[Louis Fiore] We Have a Big Data Problem
3:00 PM Wednesday, February 12th, 2014
Executive Director, Massachusetts Veterans Epidemiology Research and Information Center and Million Veteran Program
Abstract: The Million Veteran Program (MVP) is an effort supported by the Department of Veterans Affairs (VA) to enroll one million Veterans who are active users of the VA healthcare system into a genetic epidemiology cohort. Participants give informed consent and HIPAA authorization for unrestricted use of their electronic medical record (EMR) data and completed case report form (CRF) data for IRB-approved research purposes. Additionally, they agree to future re-contact for the purpose of additional data collection and donate a sample of blood for storage and testing. To date, approximately 250,000 subjects have been enrolled in MVP from 50 VA Medical Centers participating in the enrollment phase of the study. Recruitment and enrollment of participants into MVP is coordinated by the Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), located at the VA Boston Healthcare System.
Informatics support for the MVP is provided by the Genomic Information System for Integrated Science (GenISIS). GenISIS consists of two functionally distinct entities: a Patient Recruitment and Enrollment (PRE) suite of applications and a Scientific Computing Environment (SCE). The PRE was developed by a VA-based informatics core supported by outside contractors and is currently in production to support subject accrual. The SCE is under development and is the focus of the remainder of this white paper.
The functionalities of the SCE include: loading and warehousing of EMR, CRF, and genomic data from VA and laboratory sources; presentation of these data to researchers in a query environment to allow ‘preparatory to research’ activities; creation of study-specific data marts; and provision of analytical pipelines for use by investigators. The intention is to create a platform that allows researchers to approach the MVP data in a governed environment, thus eliminating the need to move data to researchers.
Progress to date includes creation of governance and access applications, movement of CRF data into a ‘data warehouse’, preliminary efforts to extract patient EMR data from the VA Corporate Data Warehouse, and installation of an i2b2 query environment. Little progress has been made on implementing analytical pipelines or tools to support analysis.
Genomic data from MVP subjects is already arriving. By the end of the 2014 fiscal year (FY), GenISIS will contain genotyping data (Affy Biobank Chip) on approximately 240,000 subjects, whole exome sequencing on 24,000 subjects (from Claritas and Personalis), and whole genome sequencing on 2,400 subjects (from Claritas and Personalis). The rate and types of additional genomic data for FY 2015 and beyond are yet to be established but will presumably increase steadily for several years.
The GenISIS team is seeking advice on how to approach two requirements. The first is to create the SCE so that data arriving in FY 2014 can be used by researchers in mid-FY 2015. We are focusing on making phenotype and genotype data available for the 240,000 subjects tested with the Affy Biobank Chip as well as for the 24,000 with whole exome sequence data. The second requirement is to define a long-term solution for storage and compute capability for the full spectrum of phenotype and genotype data that will be made available over the lifetime of the project.
Bio: Louis Fiore is the Executive Director of the Massachusetts Veterans Epidemiology Research and Information Center and of the Million Veteran Program (MVP). He is very interested in utilizing the genetic and electronic medical records system information from the MVP to provide a large independent confirmatory population to validate specific host cell genetic loci associated with herpes zoster, herpes zoster-associated pain, and postherpetic neuralgia identified in your analysis of data from the Shingles Prevention Study. The MVP should be ideally suited for this purpose, and Lou is happy to collaborate on this important endeavor.