The Bioinformatics team of the Microbiology Unit has developed expertise in the analysis and integration of data from a wide range of high-throughput technologies.
Dedicated pipelines have been developed for microbial biodiversity analysis of environmental samples, particularly for extreme environments like deep geological formations (e.g. clay layers at +200 m depth) selected for long-term subsurface disposal of radioactive waste.
Extra effort is invested in the pre-processing of the (meta)genome sequencing data to increase its reliability and ensure overall quality. For 16S rRNA-based metagenomics, this entails chimera detection and sequencing error correction via specialised algorithms. High-quality reads are essential for an accurate estimate of the microbial diversity and for the prediction of the metabolic potential of these communities.
Software packages that have been developed within this context are:
NoDe: Algorithm for sequencing error correction in 16S rRNA 454 pyrosequencing reads. The software can be downloaded here.
IPED: Algorithm for sequencing error correction in 16S rRNA Illumina paired-end sequencing reads. The software can be downloaded here.
CATCh: Ensemble classifier that integrates different software tools to predict chimeric sequences (i.e. PCR artefacts). The software can be downloaded here.
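The ensemble idea behind chimera prediction can be illustrated with a minimal sketch. It assumes each tool returns a simple chimeric/non-chimeric call per read and combines them by vote fraction; this is a simplification for illustration only and not CATCh's actual classifier. The tool names, read identifiers, and the `min_fraction` parameter are hypothetical.

```python
from typing import Dict


def ensemble_chimera_calls(
    tool_calls: Dict[str, Dict[str, bool]],
    min_fraction: float = 0.5,
) -> Dict[str, bool]:
    """Combine per-tool chimera calls into one call per read.

    tool_calls maps a tool name to {read_id: True if flagged as chimeric}.
    A read is called chimeric when the fraction of tools flagging it is
    strictly greater than min_fraction. Tools that did not report a read
    are ignored for that read.
    """
    read_ids = set()
    for calls in tool_calls.values():
        read_ids.update(calls)

    combined = {}
    for read_id in sorted(read_ids):
        votes = [calls[read_id] for calls in tool_calls.values() if read_id in calls]
        combined[read_id] = sum(votes) / len(votes) > min_fraction
    return combined


# Hypothetical calls from three chimera detection tools.
example_calls = {
    "tool_a": {"read1": True, "read2": False},
    "tool_b": {"read1": True, "read2": True},
    "tool_c": {"read1": False, "read2": False},
}
example_result = ensemble_chimera_calls(example_calls)
```

Here `read1` is flagged by two of three tools and is called chimeric, while `read2` is flagged by only one and is kept. A trained classifier working on the tools' scores, rather than a fixed vote threshold, would let the ensemble weight the more reliable tools more heavily.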
Genome sequencing: De novo assembly of bacterial genomes, and resequencing of either highly related strains isolated from the environment or mutant strains resulting from lab evolution experiments. In these (re)sequencing experiments, special attention is given to the detection of mobile genetic elements (insertion sequences, transposons), as they are major facilitators of bacterial evolution.
For genome annotation (e.g. genome project data for Cupriavidus metallidurans CH34 and resequenced, related strains, and for Arthrospira sp. PCC 8005), we rely on the MaGe platform (http://www.cns.fr/agc/microscope/), which links numerous well-known biological databases and systems and integrates results obtained by a wide range of bioinformatics methods. This allows the exploration of gene context by searching for conserved synteny, as well as the reconstruction of metabolic pathways. The advanced web interface permits multiple users to refine the automatic assignment of gene product functions.
Figure 1. Circular representation of the two main replicons of Cupriavidus metallidurans CH34.
Until a few years ago, all transcriptomics experiments were performed using expression and tiling microarrays, for which preprocessing pipelines were developed in-house for different types of microarrays (custom oligonucleotide spotted slides, Affymetrix, Nimblegen, and Agilent). However, recent advances in sequencing technologies have resulted in RNA-seq applications, where RNA is sequenced directly rather than measured indirectly via hybridization on a microarray. Dedicated pipelines have now been developed to analyse such data sets.
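Because RNA-seq yields read counts per gene rather than hybridization intensities, the counts have to be scaled by sequencing depth before samples can be compared. The sketch below shows one common scaling, counts per million (CPM), as an illustration only; it is not necessarily the normalisation used in the team's pipelines, and the gene names and counts are hypothetical.

```python
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts of one sample to counts per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical raw read counts for one sample.
raw_counts = {"geneA": 500, "geneB": 1500, "geneC": 0}
cpm = counts_per_million(raw_counts)
```

CPM corrects for library size only. Comparing expression between genes of different length within a sample would additionally require a length-aware measure.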
Regulatory network reconstruction
By combining gene expression data and regulatory DNA motif detection schemes, novel algorithms are applied to reconstruct the transcriptional regulatory networks. Recently, special efforts have been invested in the analysis of the dynamic evolution and rewiring of regulatory networks in strains subjected to adaptive lab evolution experiments under heavy metal stress.
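As a rough illustration of combining the two evidence types, the sketch below predicts a regulator-target link only when the target's expression profile correlates with that of the regulator and a binding motif occurs in the target's promoter. This is a deliberately simple stand-in and not the algorithms applied by the team; the motif, gene names, sequences, and the `min_abs_r` threshold are all hypothetical.

```python
import math
import re
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def predict_targets(
    regulator_profile: Sequence[float],
    target_profiles: Dict[str, Sequence[float]],
    promoters: Dict[str, str],
    motif_regex: str,
    min_abs_r: float = 0.8,
) -> List[str]:
    """Return genes that are both co-expressed with the regulator and carry the motif."""
    pattern = re.compile(motif_regex)
    targets = []
    for gene, profile in target_profiles.items():
        has_motif = pattern.search(promoters.get(gene, "")) is not None
        coexpressed = abs(pearson(regulator_profile, profile)) >= min_abs_r
        if has_motif and coexpressed:
            targets.append(gene)
    return sorted(targets)


# Hypothetical expression profiles over four conditions and promoter sequences.
regulator = [1.0, 2.0, 3.0, 4.0]
profiles = {
    "geneA": [2.0, 4.0, 6.0, 8.0],  # co-expressed, motif present
    "geneB": [2.0, 4.0, 6.0, 8.0],  # co-expressed, motif absent
    "geneC": [1.0, 3.0, 2.0, 1.0],  # motif present, not co-expressed
}
promoter_seqs = {
    "geneA": "AATTGACATT",
    "geneB": "AAAAAAAAAA",
    "geneC": "CCTTGACAGG",
}
predicted = predict_targets(regulator, profiles, promoter_seqs, "TTGACA")
```

Requiring both lines of evidence reduces false positives from either source alone: co-expression can arise indirectly, and short motifs match many promoters by chance.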
Contact person: Pieter Monsieurs