GRAP Opportunity – Building a national HIV cohort from South African laboratory data (Bor)

Opportunity ID: 02-Bor

Project Title: Building a national HIV cohort from South African laboratory data (Coding in R)

Faculty Mentor: Jacob Bor

Description Statement:

No national HIV cohort exists in any low- or middle-income country. South Africa’s National Health Laboratory Service (NHLS) conducts all CD4 count and viral load (VL) monitoring for the national HIV program, with over 40 million records dating back to the program's inception in 2004. Results are interfaced in real time, avoiding the potential for error inherent in chart review. We have developed and validated a record linkage algorithm that creates a unique patient identifier and enables analysis of the NHLS database as a national HIV cohort. The algorithm uses standard probabilistic linkage methods with Jaro-Winkler string comparisons. As an innovation, we allow matching thresholds to vary with the density of the network to avoid lengthy chains of linked records. The cohort covers the full history of South Africa’s HIV treatment program and will enable: (1) retrospective epidemiologic and policy evaluations; (2) real-time monitoring of the HIV care and treatment program; and, ultimately, (3) integration into electronic medical records to improve individual patient care.
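
To give applicants a feel for the string comparison mentioned above, here is a minimal sketch of Jaro-Winkler similarity combined with a match threshold that tightens as a record accumulates more candidate matches. It is written in Python purely for illustration (the project itself targets R). The functions `threshold_for` and `is_match`, and the values `base`, `step`, and `cap`, are hypothetical and are not the project's actual rule. The sketch also uses the raw similarity score directly rather than full probabilistic linkage weights.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def threshold_for(n_candidates: int, base: float = 0.90,
                  step: float = 0.02, cap: float = 0.99) -> float:
    """Hypothetical rule: demand a higher score when a record already has
    many candidate matches, to discourage long chains of linked records."""
    return min(cap, base + step * max(n_candidates - 1, 0))


def is_match(name1: str, name2: str, n_candidates: int) -> bool:
    """Accept the pair if its similarity clears the density-adjusted threshold."""
    return jaro_winkler(name1, name2) >= threshold_for(n_candidates)
```

Under these assumed parameters, the pair "MARTHA" and "MARHTA" is accepted when the record has a single candidate match but rejected when it has ten, because the threshold rises with the number of candidates.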

Scope of Work:

We are currently working to transform our algorithm into an R package that the South African government can run iteratively as new lab results come in. Sections of our code are currently written in Stata. I am looking for a student who is familiar with Stata and has expertise coding in R to assist in translating this code from Stata to R. Additional coding tasks, opportunities to contribute to innovations on our existing algorithm, and further research and publication opportunities may arise from the initial work.

Minimum skills desired:

Strong computing skills in R. Ability to work with string functions in R. Experience or familiarity with Stata (you must be able to read Stata code, but not necessarily write it). Familiarity with a Unix environment strongly preferred.

Time / Date Expectations:

10-15 hours per week, beginning January 2016. This is an open-ended assignment. Success with the initial tasks will lead to opportunities for further involvement with the project.

Additional Material Requested: Sample R script

Number of Positions: One (1)

Logistics & Support: Biweekly meetings