Building a National HIV Cohort from Routine Laboratory Data: Probabilistic Record-Linkage with Graphs

The management of chronic diseases such as HIV requires healthcare providers to link patient records across multiple interactions with the health system. South Africa’s National Health Laboratory Service (NHLS) conducts all routine laboratory monitoring for the country’s national public sector HIV program but does not use unique identifiers to track individual patients as they seek care. This lack of patient identifiers has limited the potential uses of the NHLS database for epidemiological research, policy evaluation and holistic patient care.

In a new working paper published in bioRxiv, Jacob Bor and colleagues describe the algorithm they developed to give HIV patients in South Africa unique identifiers, thereby enabling analysis of the NHLS database as a national HIV cohort. They linked data on all CD4 counts, HIV viral loads and antiretroviral therapy workup laboratory tests from 2004-2016 to the patient’s name, sex, date of birth and treatment facility.
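To make the idea of graph-based record linkage concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the example records, the field weights, the 0.85 threshold and the function names `match_score` and `assign_patient_ids` are all assumptions chosen for illustration. Each set of identifying information is treated as a node, pairs that look sufficiently similar are joined by an edge, and every connected group of nodes receives one patient identifier.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records (invented for illustration, not NHLS data).
records = [
    {"name": "Thandi Nkosi", "sex": "F", "dob": "1985-03-12", "facility": "A"},
    {"name": "Thandi Nkosi", "sex": "F", "dob": "1985-03-12", "facility": "B"},
    {"name": "Tandi Nkosi", "sex": "F", "dob": "1985-03-12", "facility": "A"},
    {"name": "Sipho Dlamini", "sex": "M", "dob": "1990-07-01", "facility": "A"},
]


def match_score(a, b):
    """Similarity in [0, 1]. The weights are assumptions, not the paper's."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dob_sim = 1.0 if a["dob"] == b["dob"] else 0.0
    sex_sim = 1.0 if a["sex"] == b["sex"] else 0.0
    facility_sim = 1.0 if a["facility"] == b["facility"] else 0.0
    return 0.5 * name_sim + 0.3 * dob_sim + 0.1 * sex_sim + 0.1 * facility_sim


def assign_patient_ids(records, threshold=0.85):
    """Link records scoring at or above the threshold, then label each
    connected component of the resulting graph with one patient ID."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Compare every pair; a real system would block candidates first,
    # because all-pairs comparison does not scale to millions of records.
    for i, j in combinations(range(len(records)), 2):
        if match_score(records[i], records[j]) >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[max(root_i, root_j)] = min(root_i, root_j)

    # Number the components in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(records))]


patient_ids = assign_patient_ids(records)
```

In this toy example the first three records, which differ only by a spelling variant or by facility, end up sharing one identifier, while the fourth record gets its own. Because links are transitive in a graph, two records can be grouped through an intermediate record even when they do not match each other directly.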

Main findings:

  • The authors identify 11.6 million unique patients with 97.7 million laboratory tests from 61 million different sets of identifying information.
  • Using data from the algorithm, the authors estimate that in 2016 there were 3.35 million patients receiving antiretroviral therapy and being virologically monitored, similar to the National Department of Health estimate of 3.50 million.

The linked NHLS database represents the first nationwide HIV cohort in any low- or middle-income country. In early work, the cohort has been used to quantify the national HIV care cascade in South Africa, to assess geographic heterogeneity in viral suppression, to assess rates of transfer across facilities, to quantify trends in clinical presentation and to assess the shifting burden of adolescents on HIV treatment. In future work, the authors plan to assess the feasibility of assigning unique identifiers in real time and using the algorithm to improve patient care.

Read the Working Paper