PhD Candidate Stephanie Schneider Chosen for Oral Distinction at GSI Research Symposium

October 29th, 2010

BU Bioinformatics PhD candidate Stephanie Schneider was chosen for oral distinction and gave a talk at the GSI Research Symposium, held on October 6, 2010. Ms. Schneider was chosen based on her abstract:

Improving the Interpretation of Affymetrix GeneChip Data Using Coefficient of Concordance and Graph Theory

A great deal of gene expression data is available for mining in various public repositories, such as NCBI’s Gene Expression Omnibus (GEO). This data can be a valuable resource for performing meta-analyses, cross-species comparisons, etc. The most common type of data currently found in GEO is from Affymetrix GeneChip® Arrays; the top five most common gene expression microarrays account for over 100,000 of the nearly 500,000 samples in GEO. One potential complication of interpreting Affymetrix microarray data is that each array contains multiple probe sets mapping to the same gene. The average number of probe sets per gene is approximately two, but many genes are represented by ten or more probe sets each. Occasionally, individual probe sets for the same gene show different trends in expression across experimental conditions, a situation that must be resolved in order to accurately interpret the data. We have developed a generalized and improved analysis using Kendall’s W coefficient of concordance for statistical consolidation of concordant probe set groups and graph searching algorithms for a posteriori identification of discordant groups for further analysis (e.g., detection of differential expression of splice variants). We compare this approach to other statistical approaches to this problem, as well as to the custom CDF (chip definition file) approach, and show that our approach is simpler, more widely applicable, and potentially reveals more informative results than the other approaches. In particular, our approach has revealed that certain probe sets of the same gene respond differently than others in different conditions, i.e., they may be concordant in some conditions but discordant in others, thus providing additional information about tissue-specific expression.
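
For readers unfamiliar with the statistic named in the abstract, the following is a minimal sketch of how Kendall's W can be computed for a group of probe sets measured across the same samples. It is not the authors' implementation: the function names `average_ranks` and `kendalls_w` and the example values are hypothetical, and the textbook formula is used without a correction for ties. Each probe set is treated as a "rater" that ranks the samples by expression level; W is 1 when all probe sets rank the samples identically and falls toward 0 as they disagree.

```python
def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kendalls_w(profiles):
    """Kendall's W for m expression profiles over n samples (no tie correction).

    profiles: list of m lists, each holding n expression values, one list
    per probe set, with samples in the same order in every list.
    """
    m = len(profiles)
    n = len(profiles[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two probe sets and two samples")
    ranked = [average_ranks(p) for p in profiles]
    # Sum of ranks each sample receives across all probe sets.
    rank_sums = [sum(r[j] for r in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2.0
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical values: three probe sets that rise together across four samples.
concordant = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [5.0, 6.0, 7.0, 8.0]]
# Hypothetical values: two probe sets with exactly opposite trends.
discordant = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]

print(kendalls_w(concordant))
print(kendalls_w(discordant))
```

In this sketch, a group with a high W could reasonably be consolidated into a single gene-level signal, while a low W would flag the group for closer inspection; the threshold used to separate the two cases is a choice the abstract does not specify.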