Development of tools for ChIP-seq data analysis and transcription factor identification

Mercier, Eloi


Permalink

https://hdl.handle.net/1866/6038

Thesis or Dissertation

Mercier_Eloi_2011_memoire.pdf (7.682Mb)

2011-10 (degree granted: 2011-12-01)

Author(s)

Mercier, Eloi

Advisor(s)

Gottardo, Raphaël

Level

Master's

Discipline

Bioinformatics

Keywords

Abstract(s)

ChIP-seq is a technology that combines chromatin immunoprecipitation with high-throughput sequencing, allowing transcription factors to be analyzed in vivo on a genome-wide scale. Processing the large amounts of data this method generates requires substantial computing resources, and many new tools have recently been developed. However, this proliferation of software, each package performing only one step of the analysis, leads to compatibility problems and complicates the analysis. There is therefore a real need for an integrated, powerful and flexible pipeline for motif identification. Here we propose a complete pipeline for ChIP-seq data analysis, freely available in R and composed of three R packages: PICS, rGADEM and MotIV. By analyzing four data sets for the human transcription factors CTCF, STAT1, FOXA1 and ER, we demonstrated the efficiency of our pipeline and highlighted its novel features, especially the processing of results by MotIV, which led to the identification of motifs not detected by other methods.
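Post-processing discovered motifs usually involves comparing each motif with others, for example against known motif collections. The abstract does not describe how MotIV does this, so the following is only a generic illustration, not MotIV's implementation. It is a minimal Python sketch of one common similarity measure: the mean Pearson correlation of aligned position weight matrix (PWM) columns, taking the best score over all ungapped offsets. The names `from_consensus`, `pearson` and `best_ungapped_similarity` and the `min_overlap` parameter are hypothetical.

```python
from statistics import mean

BASES = "ACGT"


def from_consensus(seq, p=0.85):
    """Hypothetical helper: build a PWM (list of A/C/G/T probability columns) from a consensus string."""
    columns = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unsupported base: {base}")
        columns.append([p if b == base else (1 - p) / 3 for b in BASES])
    return columns


def pearson(x, y):
    """Pearson correlation between two equal-length PWM columns (0.0 if either column is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den > 0 else 0.0


def best_ungapped_similarity(query, target, min_overlap=4):
    """Slide query along target without gaps; return (best mean column correlation, offset)."""
    best = (float("-inf"), None)
    for offset in range(-(len(query) - min_overlap), len(target) - min_overlap + 1):
        # Pair query column i with target column i + offset where both exist.
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        score = mean(pearson(q, t) for q, t in pairs)
        if score > best[0]:  # strict '>' keeps the first offset on ties
            best = (score, offset)
    return best


if __name__ == "__main__":
    score, offset = best_ungapped_similarity(from_consensus("GATA"), from_consensus("TTGATATT"))
    print(f"best score {score:.3f} at offset {offset}")
```

Averaging the per-column correlation keeps the score on a [-1, 1] scale regardless of overlap length, and `min_overlap` stops very short overlaps from scoring deceptively well. Real tools typically add gapped alignments, other column metrics and statistical significance estimates, all of which are omitted here.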

Collections

This document disseminated on Papyrus is the exclusive property of the copyright holders and is protected by the Copyright Act (R.S.C. 1985, c. C-42). It may be used for fair dealing and non-commercial purposes, for private study or research, criticism and review as provided by law. For any other use, written authorization from the copyright holders is required.