Dealing with uncertain suicidal deaths due to imperfect data integration: a first step towards a data-driven suicide prevention framework (Kun Chen - University of Connecticut)

Abstract: The concept of integrating data from disparate sources to accelerate scientific discovery has generated tremendous excitement in many fields. The potential benefits of data integration, however, may be compromised by the uncertainty due to imperfect record linkage. Motivated by a suicide risk study, we propose an approach for analyzing survival data with uncertain event records arising from data integration. Specifically, deaths identified from the hospital discharge records, together with reported suicidal deaths determined by medical examiners, may still not include all the death events of patients, and the missing deaths can be recovered from a complete database of death records. Since the hospital discharge data can only be linked to the death record data by matching basic patient characteristics, a patient with a censored death time in the first dataset could be linked to multiple potential event records in the second dataset. We develop an integrative Cox proportional hazards regression (iCox), in which the uncertainty in the matched event times is modeled probabilistically. The estimation procedure combines the ideas of profile likelihood and the expectation conditional maximization (ECM) algorithm. Simulation studies demonstrate that under realistic settings of imperfect data linkage, iCox outperforms several competing approaches, including multiple imputation. A marginal screening analysis using iCox is performed to identify risk factors associated with death following suicide-related hospitalization in Connecticut. The identified diagnostic codes provide several new insights on suicide risk prediction and prevention. This study is only a first step towards data-driven suicide prevention. We will discuss other aspects of our proposal, including data unification, data fusion, and joint feature construction, selection, and predictive modeling.
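
The linkage ambiguity described in the abstract can be illustrated with a minimal sketch. It is not the iCox procedure itself; it only shows how matching on basic characteristics can return several candidate death records for one censored patient. The field names (`sex`, `birth_year`, `death_day`), the function `candidate_matches`, and the toy records are all hypothetical and do not come from the study.

```python
from collections import defaultdict

def candidate_matches(censored_patients, death_records, keys=("sex", "birth_year")):
    """Map each censored patient id to the death records that share its key values."""
    # Index the death records by the (hypothetical) matching characteristics.
    index = defaultdict(list)
    for rec in death_records:
        index[tuple(rec[k] for k in keys)].append(rec)
    # A patient may have zero, one, or several candidate death records.
    return {
        p["id"]: index.get(tuple(p[k] for k in keys), [])
        for p in censored_patients
    }

# Toy data, invented for illustration only.
patients = [
    {"id": "p1", "sex": "F", "birth_year": 1980},
    {"id": "p2", "sex": "M", "birth_year": 1975},
]
deaths = [
    {"death_id": "d1", "sex": "F", "birth_year": 1980, "death_day": 120},
    {"death_id": "d2", "sex": "F", "birth_year": 1980, "death_day": 400},
    {"death_id": "d3", "sex": "M", "birth_year": 1990, "death_day": 35},
]

matches = candidate_matches(patients, deaths)
for pid, recs in matches.items():
    print(pid, [r["death_id"] for r in recs])
```

In this toy example, patient `p1` has two candidate death records and `p2` has none. Which candidate, if any, is the true event is the kind of uncertainty the abstract says iCox models probabilistically.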

When: 4:00 pm to 5:00 pm on Thursday, November 2, 2017
Location: MCS, Room 148, 111 Cummington Mall