New Data from Old Sources: A Machine Learning Approach to Census Record Linking

Wed@Hariri: Meet Our Fellows Series

Speaker:
James Feigenbaum, Assistant Professor, Economics, College of Arts & Sciences

When:
Wednesday, January 21, 2019
3:15pm – 5:00pm (Networking and refreshments 3:15pm – 3:40pm in Room 416, talk 3:45pm – 4:30pm in Room 315, reception to follow) 

Where:
Economics Building, 270 Bay State Road, Boston, MA

Abstract: The recent digitization of complete-count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. However, linking with simple algorithms is challenging when data are enumerated and transcribed with error and names are common and change over time, while hand linking, though accurate, is expensive, slow, and not replicable. I will present a machine learning approach that trains on the actual matches made by a skilled researcher or genealogist to make implicit linking rules explicit. I will also present preliminary results from two new projects that exploit linked data to demonstrate the possibilities of the complete-count historical censuses. First, I will use changes in name patterns among the African American population from 1860 to 1870 to predict antebellum enslavement status and trace forward the effects of enslavement intra- and intergenerationally. Second, I will use genealogically linked data to follow women across censuses and through name changes at marriage to study the effects of automation and the technological destruction of a common occupation: local telephone operators.
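As a rough illustration of the kind of supervised linking the abstract describes, the sketch below trains a classifier on hand-labeled candidate record pairs and then scores a new, unlabeled pair. The toy data, the SequenceMatcher-based name similarity, the age-gap feature, and the choice of a random forest are assumptions for illustration only; they are not the speaker's actual specification.

```python
# Illustrative sketch of supervised record linkage on hand-labeled census pairs.
# All names, features, and model choices here are assumptions for illustration.
import difflib

import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def string_similarity(a: str, b: str) -> float:
    """Crude name-similarity score in [0, 1] using the standard library."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Toy candidate pairs: one record from the 1900 census, one from 1910,
# with a hand-assigned label (1 = same person, 0 = different people).
pairs = pd.DataFrame({
    "first_1900": ["John", "John", "Mary", "Wm"],
    "first_1910": ["John", "Jon", "Marie", "Thomas"],
    "last_1900": ["Smith", "Smith", "Oleson", "Baker"],
    "last_1910": ["Smith", "Smyth", "Olson", "Carter"],
    "age_1900": [23, 23, 31, 40],
    "age_1910": [33, 34, 41, 52],
    "label": [1, 1, 1, 0],
})

# Turn each candidate pair into features that mirror what a hand linker checks:
# how similar the names are and whether the ages line up a decade apart.
features = pd.DataFrame({
    "first_sim": [string_similarity(a, b)
                  for a, b in zip(pairs["first_1900"], pairs["first_1910"])],
    "last_sim": [string_similarity(a, b)
                 for a, b in zip(pairs["last_1900"], pairs["last_1910"])],
    "age_gap": (pairs["age_1910"] - pairs["age_1900"] - 10).abs(),
})

# Fit a classifier on the hand-labeled pairs; its predicted probabilities make
# the hand linker's implicit rules explicit and reusable on unlabeled pairs.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(features, pairs["label"])

# Score a new, unlabeled candidate pair with the trained model.
new_pair = pd.DataFrame({
    "first_sim": [string_similarity("Jas", "James")],
    "last_sim": [string_similarity("Olsen", "Olson")],
    "age_gap": [abs(36 - 25 - 10)],
})
print(model.predict_proba(new_pair)[:, 1])  # estimated match probability
```

In practice the training pairs would come from matches made by a skilled researcher or genealogist, and the fitted model would be applied at scale to candidate pairs drawn from the complete-count censuses.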


Bio: James Feigenbaum is an Assistant Professor of Economics at Boston University. His primary research interests lie at the intersection of economic history and labor economics. Making use of recently digitized and transcribed historical census sources, James has studied intergenerational mobility, inequality, the returns to education, and the long-run effects of early twentieth-century public policy and environmental shocks. To construct these new longitudinal data sources, which follow individuals across decades in the early 1900s, James developed new machine learning approaches to historical record linkage, focused on the specific challenges of working with imperfect and noisy historical data. He received his Ph.D. in Economics from Harvard University in 2016 and joined BU after a one-year postdoc at Princeton University.