DMS/NIGMS 1: Multilevel stochastic orthogonal subspace transformations for robust machine learning with applications to biomedical data and Alzheimer’s disease subtyping

Sponsor: National Science Foundation (NSF)

Co-Is/Co-PIs: Mark Kon, Xiaoling Zhang

Abstract:

Late-onset Alzheimer’s Disease (AD) is the most common form of dementia, with an estimated 6.5 million Americans aged 65 and older living with AD today, a number projected to double by 2050. AD occurs in more than 35% of individuals over the age of 85 and is the fifth leading cause of death among Americans over the age of 65, with a resulting societal cost of more than $340 billion per year. In recent years, numerous studies have indicated that there are likely different forms of AD and AD-related dementia, distinguished by genetics, clinical symptoms, and biochemical pathways. Different processes and molecular pathways can lead to many clinical and physiological subtypes of AD. This may help to explain the failure of numerous clinical trials, which have usually targeted the well-known amyloid pathways and genes. To develop newer, more effective, and safer treatments, multiple new clinical targets for AD treatment are needed to increase the probability of success. Identifying the multiple processes and pathways underlying specific AD subtypes is therefore crucial. Discovering these will allow the development of targeted diagnosis and treatment that is adapted and personalized to particular AD forms. The investigators in this multifaceted project will leverage the availability of genetic, protein, and brain imaging data obtained from diverse populations to develop a mathematical foundation and protocol for identifying AD subtypes and potential drug targets tailored to these subtypes. The findings will be valuable to the medical community and will contribute to further understanding of the many different forms of AD and to advancing precision medicine approaches. The investigators are also committed to training, developing, and nurturing students’ expertise in these areas, providing them with valuable learning opportunities.

The increasing use and analysis of extensive datasets, particularly in medical and biological domains, underscores the need for advanced and precise data analysis methods. In these contexts, Machine Learning (ML)-based statistical inference is rapidly becoming a cornerstone of computational value addition. However, while much attention has been devoted to refining ML algorithms, the significance of feature engineering has been somewhat overlooked. Consequently, there is growing interest in developing a novel mathematical framework for feature construction. The key insight is to treat data as realizations of a random field in a suitable Bochner function space. By constructing a new coordinate system, the investigators can unveil well-defined patterns that can significantly enhance the accuracy of existing ML algorithms. The objectives of this project include: (I) Developing a mathematical theory and protocol for constructing innovative features to better discriminate underlying stochastic behaviors of input data, employing multilevel spaces and the Karhunen-Loève (KL) expansion for Bochner spaces. (II) Analyzing and optimizing the parameters of such multilevel feature constructions to markedly enhance the performance of ML algorithms, especially when dealing with complex and challenging inputs. (III) Identifying ML-based subtypes of Alzheimer’s Disease (AD) from available extensive AD datasets, such as genome-wide genetic, genomic, proteomic, and brain imaging data, as well as population-scale electronic health record data. With an estimated 6.5 million Americans aged 65 and older living with AD, the impact of work in this area can potentially be very significant. In particular, accurate subtyping of cases can greatly accelerate the successful development of new targeted AD drugs.
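As a rough illustration of the coordinate-system idea described above, the following Python sketch builds an empirical, single-level KL basis from sample data and uses two kinds of features: the KL coefficients and the norm of the component orthogonal to the leading KL subspace. This is not the investigators' multilevel method, which the abstract does not specify. The function names `kl_basis` and `kl_features`, the synthetic data, and the choice of three modes are all assumptions made for the example.

```python
import numpy as np


def kl_basis(samples, n_modes):
    """Empirical KL basis: rows of `samples` are realizations of the field.

    Hypothetical helper for illustration only.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Right singular vectors of the centered data are the eigenvectors
    # of the sample covariance, ordered by decreasing eigenvalue.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = singular_values**2 / (samples.shape[0] - 1)
    return mean, vt[:n_modes], eigenvalues[:n_modes]


def kl_features(x, mean, modes):
    """Return KL coefficients and the norm of the orthogonal residual."""
    centered = x - mean
    coeffs = centered @ modes.T           # coordinates in the KL subspace
    residual = centered - coeffs @ modes  # part orthogonal to that subspace
    return coeffs, np.linalg.norm(residual, axis=-1)


# Synthetic "nominal" data (an assumption): 200 realizations in 50 dimensions
# that lie close to a 3-dimensional subspace.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 50))
nominal = rng.normal(size=(200, 3)) @ mixing + 0.01 * rng.normal(size=(200, 50))
mean, modes, eigenvalues = kl_basis(nominal, n_modes=3)

# One new sample that follows the nominal pattern and one that departs from it.
in_dist = rng.normal(size=(1, 3)) @ mixing
shifted = in_dist + rng.normal(size=(1, 50))
_, r_in = kl_features(in_dist, mean, modes)
_, r_out = kl_features(shifted, mean, modes)
print("residual norm, nominal-like sample:", r_in[0])
print("residual norm, shifted sample:", r_out[0])
```

In this sketch the residual norm plays the role of a constructed feature: samples whose stochastic behavior differs from the nominal data leave more energy outside the leading KL subspace, which a downstream ML classifier could exploit.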
