ECE PhD Prospectus Defense: Chonghua Xue
- Starts: 11:30 am on Wednesday, November 13, 2024
- Ends: 1:00 pm on Wednesday, November 13, 2024
Title: Strategies for interpretable machine learning on incomplete real-world medical data
Presenter: Chonghua Xue
Advisor: Professor Vijaya B. Kolachalama
Chair: Professor Prakash Ishwar
Committee: Professor Vijaya B. Kolachalama, Professor Yannis Paschalidis, Professor Prakash Ishwar, Professor Archana Venkataraman
Google Scholar Profile: https://scholar.google.com/citations?user=f9k4jcMAAAAJ
Abstract: Real-world data poses significant challenges for traditional machine learning models: it is inherently multimodal, frequently has missing information, and often lacks the structured format that conventional algorithms need to function effectively. Common methods for handling missing features, such as data deletion or imputation, either rely on unrealistic assumptions about the data (e.g., that values are Missing Completely at Random (MCAR) or Missing at Random (MAR)) or introduce additional preprocessing complexity, such as using generative models for imputation. Another widely used approach explicitly models the missingness by adding binary indicators that signal which features are missing, but this can introduce bias and hinder model convergence. My PhD research seeks to address these limitations. I developed a self-attention-based model inherently designed to handle missing input features. Additionally, I introduced a permutation-based masking strategy during training to improve the model’s robustness to various missing-feature patterns. The framework has been tested on incomplete real-world medical data for predicting cognitive status and identifying the underlying etiologies associated with dementia. Results demonstrate that the model achieves clinician-level performance across common data-missing scenarios, with interpretability validated by experienced neurologists. This research offers a more reliable and scalable solution for applying machine learning to diverse real-world datasets, ensuring that models are both robust and interpretable. The proposed framework may also be applicable to other domains where data incompleteness is a persistent challenge.
- Location: PHO 339