Ouafae Kaissi, Ahmed Moussa, Brigitte Vannier, Abdellatif Ghacham


Microarray Data Analysis, Gene Selection, Comparative Analysis.


Motivation: In the analysis of experiments involving high-density oligonucleotide chips, it is important to generate a list of genes, or ‘targets’, from the information-rich genome-wide data set. Gene selection seeks to identify the most significant genes, those showing large expression changes between the baseline experiments and the conditions. Although several algorithms, such as the t-test and other derived statistical tests, have been used for this selection, a suitable p-value cutoff remains difficult to choose. One solution is to control the False Discovery Rate (FDR). The Significance Analysis of Microarrays (SAM) and the t-test with Benjamini & Hochberg (BH) correction have been used successfully in this way. However, the reproducibility of results across different software tools, and its impact on the classification of genes and/or experiments, remains a subject of discussion.

Methods: We use two Affymetrix data sets to identify lists of genes with the SAM and t-test-BH algorithms under FDR control, run in the R/Bioconductor project, the Bioinformatics Toolbox of MathWorks, and Expander.

Results: The list of selected genes changes significantly when the two algorithms are run under the R/Bioconductor project, the Bioinformatics Toolbox of MathWorks, and Expander. Using data from public databases, we illustrate that the permutation process of the multiple statistical t-tests (SAM and BH) may affect the results of the selection process. Moreover, the list of genes produced by each tool is affected by the choice of the p-value cutoff used to identify truly differentially expressed genes. We present results clarifying the sensitivity and efficiency of the software used and its influence on the gene selection process. Hierarchical classification of the selected genes and the corresponding experiments confirms the influence of both methods and tools on the outcome of gene expression data analysis.
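
To make the FDR-controlled selection step concrete, the following is a minimal sketch of the standard Benjamini & Hochberg step-up adjustment applied to per-gene p-values. It is an illustration only, not the implementation used by R/Bioconductor, the Bioinformatics Toolbox, or Expander. The function names `benjamini_hochberg` and `select_genes`, the gene identifiers, and the example p-values are hypothetical, and the p-values are assumed to have already been computed by a per-gene t-test.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(gene_ids, p_values, fdr=0.05):
    """Keep the genes whose BH-adjusted p-value is at or below the FDR level."""
    adjusted = benjamini_hochberg(p_values)
    return [g for g, q in zip(gene_ids, adjusted) if q <= fdr]


# Hypothetical example: four genes with made-up t-test p-values.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
pvals = [0.01, 0.04, 0.03, 0.005]
selected = select_genes(genes, pvals, fdr=0.03)
```

In this sketch, changing the `fdr` argument changes which genes are retained, which is the cutoff sensitivity the abstract describes.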
