Fall 2021 Student Seminars

 

September 8

Dileep Kishore
Advisors: Daniel Segrè & Pankaj Mehta
Title: Control of microbial communities in bioreactors using deep reinforcement learning

Abstract:
Microorganisms can be engineered to produce commercially valuable molecules such as antibiotics, biofuels, and pharmaceutical products. Individual microbial species might be subjected to high metabolic load or might be incapable of simultaneously catalyzing the multiple reactions needed for a specific metabolic objective. Synthetic microbial consortia alleviate these limitations by distributing the required pathways and reactions among the individual species and are emerging as an important approach in biomanufacturing. However, owing to the complex interdependence between organisms and environments, the rational design of synthetic communities still constitutes an unresolved challenge.

Our project will uniquely leverage and combine two complementary fields of expertise. The first field is the use of genome-scale metabolic models (GEMs) to generate mechanistic predictions of growth rates and whole-cell reaction fluxes of microbes within communities grown in a simulated bioreactor. The second area is the utilization of reinforcement learning using deep neural networks to control key parameters of the simulated bioreactor. Ultimately, we aim to develop a reinforcement learning algorithm that will learn optimal environmental control strategies to steer a microbial community towards a specific goal, such as reaching a specific taxonomic distribution or producing desired metabolites. We will train the reinforcement learning framework through community-level simulations of GEMs for different microbial species, and simulate the implementation of the algorithm in experimental bioreactor systems. This project will address key questions about the rational design and engineering of synthetic microbial consortia, especially those pertaining to environmental control. The algorithm will be broadly applicable to any experimental system for which GEMs are available and help researchers automate optimal control strategies for complex microbial communities.

Aaron Chevalier
Advisor: Josh Campbell
Title: The Mutational Signature Comprehensive Analysis Toolkit (musicatk) for the discovery, prediction, and exploration of mutational signatures

Abstract:
Mutational signatures are patterns of somatic alterations in the genome caused by carcinogenic exposures or aberrant cellular processes. To provide a comprehensive workflow for preprocessing, analysis, and visualization of mutational signatures, we created the Mutational Signature Comprehensive Analysis Toolkit (musicatk) package. musicatk enables users to select different schemas for counting mutation types and easily combine count tables from different schemas. Multiple distinct methods are available to deconvolute signatures and exposures or to predict exposures in individual samples given a pre-existing set of signatures. Additional exploratory features include the ability to compare signatures to the COSMIC database, embed tumors in two dimensions with UMAP, cluster tumors into subgroups based on exposure frequencies, identify differentially active exposures between tumor subgroups, and plot exposure distributions across user-defined annotations such as tumor type. Overall, musicatk will enable users to gain novel insights into the patterns of mutational signatures observed in cancer cohorts.

 

September 22

Xingyi Shi
Advisors: Marc Lenburg & Jennifer Beane
Title: NSCLC subtype-associated molecular and cellular alterations in the normal-appearing bronchial epithelium

Abstract:
We have previously identified lung cancer-associated gene expression alterations in normal bronchial airways of ever smokers. In this study, we sought to identify airway gene expression differences between patients with squamous (LUSC) and adenocarcinoma (LUAD) NSCLC subtypes and determine if these reflect changes in cell type composition. Bronchial brushings from ever smokers undergoing bronchoscopy for suspicion of lung cancer (AEGIS I & II trials, n=938) profiled using microarrays were leveraged to identify genes up-regulated in LUAD and LUSC that were enriched for ciliary and keratinization pathways, respectively. The NSCLC signature was concordantly enriched in an independent set of bronchial brushings (n=133), normal lung tissue adjacent to LUSC and LUAD tumors (n=109), and in a dataset of bronchial brushes from patients with premalignant lesions (n=137). Using scRNA-seq data on bronchial brushes (n=17), we identified cell type-specific gene expression and observed that LUSC patients had higher expression of secretory cell-specific genes and lower expression of ciliated cell-specific genes compared to LUAD patients. Both LUAD and LUSC patients had lower levels of the immune cell-specific genes than patients without lung cancer. Our results suggest that there are different responses to injury in the normal-appearing airway that are associated with NSCLC subtype.

Dakota Hawkins
Advisors: Cynthia A. Bradham & W. Evan Johnson
Title: ICAT: A Novel Method for Identifying Cell-types Across Treatments in Single-Cell RNA Sequencing Data

Abstract:
Across many systems, from cancer to developmental biology, single-cell RNA sequencing has proven to be a powerful tool to identify and characterize distinct cell populations. Often, unsupervised clustering techniques are used to identify cell-types in a dataset. Identifying classes of cells, however, becomes more difficult when multiple biological conditions are present. In order to properly identify cell-types, current methods attempt to “integrate” biological conditions into a shared space. Groups of cells are then identified using traditional clustering methods, such as Louvain Community Detection. We present a new algorithm, ICAT (Identifying Cell-types Across Treatments), to accurately label and distinguish cell-types across biological conditions. Motivated by self-supervised learning frameworks, ICAT first learns a defining space of cell-types, transforms cells into this space, and then performs a semi-supervised clustering step to group cells. Using simulated and real data to benchmark performance, ICAT outperforms current state-of-the-art methods over a variety of metrics. Excitingly, while offering competitive stand-alone performance, ICAT can also extend current integration methods to provide performance gains and aid in interpretability. We also present a brief overview of current efforts to spatially characterize cell populations by combining scRNAseq results with fluorescent imaging.

 

October 6

Anthony Federico
Advisor: Stefano Monti
Title: Structure Learning for Gene Regulatory Networks

Abstract:  
Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of high throughput genomics data typically available. To overcome this challenge, often referred to as the “small n, large p problem,” we exploit known organizing principles of biological networks that are sparse, modular, and likely share a large portion of their underlying architecture. We present SHINE – Structure Learning for Hierarchical Networks – a framework for defining data-driven structural constraints and incorporating a shared learning paradigm for efficiently learning multiple Markov networks from high-dimensional data at p/n ratios not previously feasible. We evaluated SHINE on Pan-Cancer data comprising 23 tumor types, and found that learned tumor-specific networks exhibit expected graph properties of real biological networks, recapture previously validated interactions, and recapitulate findings in the literature. Application of SHINE to the analysis of subtype-specific breast cancer networks identified key genes and biological processes for tumor maintenance and survival as well as potential therapeutic targets for modulating known breast cancer disease genes.

Nicholas O’Neill
Advisors: Lindsay Farrer & Xiaoling Zhang
Title: Bulk Brain Tissue Cell Type Deconvolution with Bias Correction for Single-Nuclei RNA-Seq

Abstract:
Quantifying cell type percentages from bulk tissue RNA-sequencing enables researchers to better understand the components making up complex systems. When a single-cell RNA-sequencing (scRNA-seq) reference dataset is available, deconvolution algorithms calculate these percentages without the need for additional sequencing. scRNA-seq data is largely unavailable for human brain due to experimental difficulty, but various single-nuclei RNA-sequencing (snRNA-seq) datasets are available. MuSiC is a popular and accurate deconvolution algorithm based on weighted non-negative least squares regression with a multi-subject single-cell expression reference. Like other methods, it struggles to compensate for the strong bias between bulk and snRNA-seq. We propose a modification to MuSiC’s weighting scheme to compensate for this sequencing bias, named mMuSiC. We show that this modification improves estimation accuracy in simulated and real human brain data.
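
As a rough illustration of the weighted non-negative least squares idea mentioned above, the sketch below estimates cell type proportions from a toy bulk profile. It is not MuSiC or mMuSiC: the function names, the toy signature matrix, the fixed per-gene weights, and the projected gradient solver are all assumptions made for this example, and the actual weighting schemes are not described in the abstract.

```python
def weighted_nnls(signature, bulk, weights, n_iter=5000):
    """Minimize sum_g w_g * (bulk_g - sum_k signature[g][k] * p_k)^2 with p >= 0.

    Solved by projected gradient descent (an illustrative choice of solver).
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Normal-equation terms: A = S^T W S and b = S^T W y.
    A = [[sum(weights[g] * signature[g][i] * signature[g][j] for g in range(n_genes))
          for j in range(n_types)] for i in range(n_types)]
    b = [sum(weights[g] * signature[g][i] * bulk[g] for g in range(n_genes))
         for i in range(n_types)]
    # The trace of A bounds its largest eigenvalue, so this step size is safe.
    step = 1.0 / (2.0 * sum(A[i][i] for i in range(n_types)))
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        grad = [2.0 * (sum(A[i][j] * p[j] for j in range(n_types)) - b[i])
                for i in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[i] - step * grad[i]) for i in range(n_types)]
    return p


def to_proportions(p):
    """Rescale non-negative coefficients so that they sum to one."""
    total = sum(p)
    return [x / total for x in p]


# Toy data (hypothetical): 4 genes x 3 cell types.
signature = [[10.0, 1.0, 0.0],
             [1.0, 8.0, 1.0],
             [0.0, 2.0, 9.0],
             [3.0, 3.0, 3.0]]
true_props = [0.6, 0.3, 0.1]
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signature]
weights = [1.0, 1.0, 1.0, 0.5]  # hypothetical per-gene weights

estimate = to_proportions(weighted_nnls(signature, bulk, weights))
print([round(x, 4) for x in estimate])
```

In this noise-free toy case the estimate recovers the proportions used to build the bulk profile. With real data, the choice of gene weights determines how strongly each gene influences the fit, which is the part of the procedure the abstract proposes to modify.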

 

October 20

Jamie Strampe
Advisors: John Connor & Tom Kepler
Title: Differential Expression Analysis of Bundibugyo Virus-Infected Macaques

Abstract:
Bundibugyo virus (BDBV) is the most recently discovered pathogenic species of ebolavirus, with mortality rates of 25% and 51% reported in the two outbreaks it has caused in humans. BDBV infection is non-uniformly lethal in macaques, with a mortality rate of ~40%. The disease course in identically inoculated macaques ranges from asymptomatic to fatal, which allows us to characterize and compare the host immune responses of animals that died and animals that survived. In our study, 19 cynomolgus macaques were infected with BDBV and sampled longitudinally until death, which ranged from 9 to 17 days post infection (dpi), or an endpoint of 28 dpi in monkeys that survived the infection. Four surviving monkeys were found to have low clinical scores and absent viremia (which we term “mild” disease), seven surviving monkeys had clinical disease and viremia (“severe”), and eight monkeys died (“fatal”). All animals received a post-exposure prophylaxis treatment of VSV pseudotyped with viral glycoprotein (GP); 12 monkeys received a clinically relevant BDBV GP, and 7 control monkeys received a non-relevant Lassa virus GP. In both the treatment and control groups, ~58% survived infection. Our differential expression analysis shows that all animals had significantly upregulated interferon-stimulated genes (ISGs) at 2 and 5 dpi. ISGs and acute phase response genes become highly upregulated at day 7 and remain high until day 12 in severe survivors or until death in fatal animals. Gene expression returns to baseline in animals with mild disease by day 7. Animals with severe disease that survive largely overlap in gene expression with those that die; however, fatal cases could be differentiated by increases in genes associated with neutrophilia and platelet activation starting at day 7 post infection.

Boting Ning
Advisors: Avrum Spira & Marc Lenburg
Title: DReAmiR: Differential regulation analysis quantifies miRNA regulatory roles and molecular subtype-specific targets

Abstract:
Rewiring of transcriptional regulatory networks has been implicated in many biological and pathological processes. However, most current methods for detecting rewiring events (differential network connectivity) are not optimized for miRNA-mediated gene regulation and fail to systematically examine predicted target genes in study designs with multiple groups. We developed a novel method to address the current issues. The method first estimates miRNA-gene expression correlations with Spatial Quantile Normalization to remove the mean-correlation relationship. Then, for each miRNA, genes are ranked by their correlation strength per group. Enrichment patterns of predicted target genes are compared using the Anderson-Darling test and significance levels are estimated via permutation. Finally, graph embedding or difference in enrichment score maximization is performed to prioritize group-specific target genes. In miR-155 KO RNA-seq data from four mouse immune cell types, our method successfully identified miRNAs with known regulatory differences and the prioritized targets were involved in functional pathways with cell-type specificity. Moreover, the subtype-specific targets identified from the TCGA BRCA data were uniquely altered by miRNA KO in the cell line of the same subtype. Our work provides a new approach to characterize miRNA-mediated gene regulatory network rewiring across multiple groups from transcriptomic profiles. The method may offer novel insights into cell-type and cancer subtype specific miRNA regulatory roles.
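
The rank-then-permute step described above can be sketched on toy data as follows. This is only an illustration and not the DReAmiR implementation: it swaps the Anderson-Darling test for a simpler statistic (the difference in mean rank of the predicted targets between two groups), and the permutation scheme (drawing random gene sets of the same size), the function names, and the data are all assumptions made for this example.

```python
import random


def rank_genes(correlations):
    """Map each gene to its rank, with rank 1 for the most negative correlation."""
    ordered = sorted(correlations, key=lambda gene: correlations[gene])
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def mean_rank_difference(ranks_a, ranks_b, gene_set):
    """Mean rank of the gene set in group A minus its mean rank in group B."""
    mean_a = sum(ranks_a[g] for g in gene_set) / len(gene_set)
    mean_b = sum(ranks_b[g] for g in gene_set) / len(gene_set)
    return mean_a - mean_b


def permutation_p_value(corr_a, corr_b, targets, n_perm=1000, seed=0):
    """Compare the observed statistic against random gene sets of equal size."""
    ranks_a = rank_genes(corr_a)
    ranks_b = rank_genes(corr_b)
    observed = mean_rank_difference(ranks_a, ranks_b, targets)
    rng = random.Random(seed)
    genes = sorted(corr_a)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, len(targets))
        if abs(mean_rank_difference(ranks_a, ranks_b, random_set)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy miRNA-gene correlations (hypothetical) for two groups with opposite ordering.
genes = [f"gene{i:02d}" for i in range(20)]
corr_a = {g: -1.0 + 0.1 * i for i, g in enumerate(genes)}
corr_b = {g: 1.0 - 0.1 * i for i, g in enumerate(genes)}
targets = genes[:5]  # predicted targets: most negatively correlated in group A

observed, p_value = permutation_p_value(corr_a, corr_b, targets)
print(observed, p_value)
```

Here the predicted targets sit at the top of the ranking in one group and at the bottom in the other, so the observed statistic is as extreme as the toy data allow and few if any random gene sets match it.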

 

November 3

Rui Hong
Advisor: Josh Campbell
Title: Novel Method for Identifying Expression Pattern Associated with Single Cell Copy Number Variants

Abstract:
Lung adenocarcinoma (LUAD) is one of the most aggressive and fatal types of lung cancer. Patients with LUAD exhibit high resistance to conventional radiotherapy or chemotherapy. Copy number variation (CNV) is a phenomenon in which sections of the genome are repeated or deleted, and it is associated with abnormal gene expression and lung cancer susceptibility. However, most studies have focused on the characterization of CNV in LUAD patients, and the impact of CNV on the tumor transcriptome is still unclear. We develop a novel method to study the association between CNV and the downstream transcriptome profile using scRNA-seq data. First, the CNV profile is inferred from scRNA-seq data with the inferCNV package. Then, for each CNV region, we identify genes whose expression values are associated with the CNV pattern (transGenes). We perform gene set enrichment analysis on the transGenes to study pathway activity that might be affected by CNV. Finally, we use mediation analysis to identify the upstream genes located in the CNV region (cisGenes) that mediate the effect of CNV on transGene expression.

Lucas Schiffer
Advisor: Evan Johnson
Title: tuberculosis – Human Host Gene Expression Data for Machine Learning

Abstract:
Prior to the emergence of COVID-19, tuberculosis had been the leading cause of infectious disease mortality globally for many years. Mycobacterium tuberculosis, the causative agent of tuberculosis disease, is spread by aerosolization and, with sufficient exposure, results in latent or active tuberculosis. Furthermore, recent theoretical work by Drain et al. suggests disease may progress along a continuum with biomarkers corresponding to severity and temporality. However, molecular identification of either a dichotomous classification or continuous regression outcome remains ambiguous and inconsistent when studied through the lens of host transcriptomics. Machine learning offers promise to dissect host transcriptomics and produce such a model, but requires an enormous volume of input data for training and validation. The recently released tuberculosis R/Bioconductor package provides such data en masse – it features more than 10,000 samples from both microarray and sequencing studies that have been processed from raw data through a hyper-standardized, reproducible pipeline. In the context of a disease that has claimed more than 1,000,000,000 lives in the past 200 years, the tuberculosis package and its related pipeline offer novel opportunities to study host pathophysiology with state-of-the-art techniques – detailed methods of both will be described.

 

November 17

Kritika Karri
Advisor: David Waxman
Title: Transcriptomic Landscape of lncRNAs in Non-alcoholic steatohepatitis (NASH) and Liver fibrosis using Single-cell RNA sequencing of Mouse Liver

Abstract:
Long non-coding RNAs (lncRNAs) comprise a heterogeneous class of highly tissue-specific and cell-type specific RNAs whose low expression limits detection in minor cell subpopulations and whose function in both homeostatic liver and disease phenotype is poorly understood. Here, we devise an integrative approach to assemble the murine liver non-coding transcriptome from >2,000 RNA-seq datasets. The resulting comprehensive set of ~48,000 liver-expressed lncRNAs was used as a reference to analyze changes in single cell transcriptome profiles in three models of liver disease: AMLN diet-induced NASH, CCl4 (hepatotoxicant)-induced liver fibrosis, and liver exposed to TCDD, a chlorinated aromatic hydrocarbon and AhR agonist. We applied trajectory inference algorithms to uncover lncRNA zonation patterns in five major hepatic cell populations and their dysregulation in diseased states, including NASH-associated macrophages (NAMs), a hallmark of NASH linked to disease progression. Several thousand lncRNAs were dysregulated in NAMs, including lncRNAs that mark macrophage expansion during NASH. Several hundred lncRNAs were expressed in collagen-producing myofibroblasts, a key source of the fibrous scar in fibrotic liver. We also characterized changes in the hepatic lobule zonation profiles of xenobiotic-responsive lncRNAs in multiple liver cell populations. Finally, we applied regulatory network analysis using bigScale2 to associate individual lncRNAs with key biological pathways and functions. Gene centrality metrics were used to identify lncRNAs likely to have essential regulatory functions in NASH, liver fibrosis and following TCDD exposure.
Examples include: lnc10922 (Meg3) and lnc47443 (Fendrr), which respectively emerged as key regulators of Wnt signaling and innate immunity during liver fibrosis; lnc48616 (9530091C08Rik) and lnc13435 (Mir99ahg), regulators of cytoskeleton/hippo signaling; and lnc1966 (Hnf4as-1), an essential regulator of mitochondrial function and lipid metabolism in TCDD-exposed livers. Thus, we have characterized the cell-type expression, hepatic zonation and regulatory network centrality of thousands of liver lncRNAs, many of which are novel, including lncRNAs with predicted roles in liver disease and responses to xenobiotic perturbations.

Howard Fan
Advisor: W. Evan Johnson
Title: Batch Effect Correction and Regional Variation in Oral Microbiome among Adult African American Women in the US

Abstract:
The oral microbiome has been linked to several oral infectious diseases, including dental caries, gingivitis, and periodontitis. As changes in oral microbial community compositions have been shown to be associated with oral health and disease, it is important to examine factors that can influence the oral microbiome, one such factor being geographical location. However, one major obstacle in microbiome research is the high sensitivity of microbial compositions to their environment, which can result in batch effects. We investigate the use of ComBat-Seq to adjust for batch effects in microbiome data obtained from 648 participants in the Black Women’s Health Study. After adjustment, we identify microbes in the oral microbiome that are differentially abundant across geographical regions.

 

December 8

Ethel Nankya
Advisor: W. Evan Johnson
Title: Nasal microbiome biomarker predictive of lung cancer among high-risk smokers in the indeterminate pulmonary nodule setting

Abstract:
In the United States and worldwide, lung cancer is the leading cause of cancer-related mortality, with approximately 2 million new lung cancer cases diagnosed in the world every year. Reports indicate a 5-year survival rate for lung cancer of only 18.6%, lower than for many other leading cancer sites. The lower survival rate could be partly due to the fact that most lung cancer cases are diagnosed at later stages that are not curable. The National Lung Screening Trial (NLST) demonstrated that screening with low dose CT (LDCT) can increase the number of lung cancer cases diagnosed early, and showed a significant reduction in mortality (20%) in the LDCT arm compared to the chest X-ray arm.

Although LDCT improves diagnosis of lung cancer at an earlier stage, it has posed a challenge in predicting which detected lung nodules will become malignant. There has been an increase in the number of indeterminate pulmonary nodules (IPNs) being detected, and it is crucial that these IPNs are evaluated for their risk of malignancy.

The use of molecular biomarkers on a variety of biospecimens seems to be a promising way of improving risk stratification for lung cancer among the most at-risk individuals while improving decision making regarding appropriate follow-up. The airway epithelium can be used to collect samples that can aid in the development of molecular biomarkers in the IPN and lung cancer prediction setting. This is based on two proven concepts: 1) the etiologic field of injury and 2) the field of cancerization.

The role of the microbiome in lung cancer is still unknown; however, it has been postulated in COPD that cigarette smoking disrupts the airway epithelium, causing dysbiosis in the airway and leading to an increased susceptibility to respiratory pathogens. This in turn allows the presence of more virulent microbes that cause inflammation-associated events, which could lead to an increased risk of lung cancer. To date, there is no nasal microbiome biomarker that can be used as a classifier to discriminate between malignant and benign IPNs. It is therefore imperative that a less invasive nasal microbiome biomarker predictive of lung cancer among current and former smokers in the IPN setting is developed.