CISS Affiliates Share Their Favorite Data Sets in Celebration of ‘Love Data Week’
It’s “Love Data Week” (February 9-13, 2026)! Love Data Week is an international celebration of data, taking place every year during the week of Valentine’s Day. Universities and research organizations are encouraged to host and participate in data-related events and activities. Learn more here from ICPSR.
- All members of the BU community are also invited to a community social at the Mugar Library on Wednesday, February 11th from 1-3pm in celebration of Love Data Week! Come make cards, enjoy treats, and learn more about data offerings at the Library.
CISS has compiled information on the ‘favorite data sets’ of BU social scientists and shares their responses in the article below.
Taylor Boas (CAS/Political Science).
Favorite Data Set? AmericasBarometer surveys
Why? The AmericasBarometer is an incredibly high-quality cross-national survey project that administers a common questionnaire across all Latin American countries, plus the U.S. and Canada, every two years, going back several decades; they have occasionally covered the Anglophone Caribbean as well. I think I have used their data in virtually everything I have written on Latin America. It is my go-to source for data on public opinion and political attitudes in the region.
One of Your Publications That Uses The Data Set? Smith, Amy Erica and Taylor C. Boas. 2024. “Religion, Sexuality Politics, and the Transformation of Latin American Electorates.” British Journal of Political Science 54, 3: 816–835. (Winner of the 2025 Seligson Prize for the best scholarship using LAPOP’s AmericasBarometer data, but I would love them even if they didn’t give me an award!).
Favorite Data Set? It’s hard to pick just one! I’ll say Midlife in the United States (MIDUS).
Why? MIDUS is a multiwave survey spanning several birth cohorts, so it’s a great resource for studying both within-person change and sociohistorical change. It also has incredibly rich psychosocial measures and biomarkers, along with extensive demographic data, so it is well-suited to understanding the mechanisms that account for social disparities in health.
One of Your Publications That Uses The Data Set? Carr, Deborah, and Eun Ha Namkung. 2021. “Physical disability at work: how functional limitation affects perceived discrimination and interpersonal relationships in the workplace.” Journal of Health and Social Behavior 62, 4: 545-561.

Randall P. Ellis (CAS/Economics).
Favorite Data Set? MarketScan Commercial Claims and Encounters data, 2006-2024
Why? This amazingly rich panel dataset from 2006 to 2024 has health insurance claims and enrollment data on 20+ million people under age 65 who have private health insurance, and it is available to BU faculty and staff for FREE if used for unfunded research. Funded research requires paying a licensing fee of up to $25k. Post-docs and graduate students must get prior approval and work under close supervision of faculty. I have used it for over 25 publications. My largest data analysis to date used 7 years of weekly data on over 20 million people (7 × 52 × 20 × 10^6 ≈ 7.3 billion observations) to explore influenza seasonality and covariates such as school calendar dates and family structure. Sample questions: How is medical spending related to illnesses, age, gender, month of year, siblings, plan type, and cost sharing? How much are plans and well-insured consumers paying for specific drugs and procedures through their insurer? How do HMOs, high deductibles, and health savings accounts change spending patterns? Is health care spending on only children different from spending on first-of-multiple children or second-and-beyond children? Are babies born following IVF procedures, and their mothers, sicker than other babies and mothers? Who is using GLP-1 drugs?
One of Your Publications That Uses The Data Set? Andriola, Corinne, Randall P Ellis, Jeffrey J Siracuse, Alex Hoagland, Tzu-Chun Kuo, Heather E Hsu, Allan Walkey, Karen E Lasser, Arlene S Ash (2024). “A Novel Machine Learning Algorithm for Creating Risk-adjusted Payment Formulas” JAMA Health Forum, Apr 5;5(4):e240625.
Favorite Data Set? National Longitudinal Survey of Youth 1979 (NLSY79).
Why? It covers a very broad range of topics and follows a representative sample (subject to some attrition bias) of people who were ages 14 to 22 in 1979 through to the present, when this cohort is nearing retirement.
One of Your Publications That Uses The Data Set? Lang, Kevin, and Manove, M., 2011. “Education and Labor Market Discrimination.” American Economic Review, 100 (June): 1467-96.
Jonathan Mijs (CAS/Sociology).
Favorite Data Set? International Social Survey Programme.
Why? Representative survey repeated about every 5-10 years since the 1980s, fielded across dozens and dozens of countries, which has allowed me to describe patterns and trends in people’s beliefs about inequality (across countries and over time).
One of Your Publications That Uses The Data Set? Mijs, Jonathan J B (2021). “The paradox of inequality: income inequality and belief in meritocracy go hand in hand.” Socio-Economic Review, 19(1): 7–35.
Favorite Data Set? I’m an ethnographer, so most of the data I draw from I’ve collected myself, but I also regularly rely on the CDC’s Youth Risk Behavior Surveillance System (YRBSS), which is unfortunately now in peril.
Why? I’ve used YRBSS to incorporate nationally representative statistics on bullying into my own ethnographic claims about bullying. It helps me make arguments about the connections between what I observed happening in a small community and larger trends across the country.
One of Your Publications That Uses The Data Set? Miller, Sarah. (forthcoming June 2026). The Tolerance Generation: Growing Up Online in the Anti-Bullying Era. Chicago: University of Chicago Press.
Spencer Piston (CAS/Political Science).
Favorite Data Set? American National Election Studies (ANES) time series survey.
Why? It’s super comprehensive, carefully vetted, and speaks to core questions in the field of American political behavior, such as: how do people decide for whom to vote in elections? Who turns out to vote, who doesn’t, and why?
One of Your Publications That Uses The Data Set? Piston, Spencer. 2010. “How Explicit Racial Prejudice Hurt Obama in the 2008 Election.” Political Behavior 32(4): 431-451.
Ian Sue Wing (CAS/Earth & Environment).
Favorite Data Set? American Housing Survey (AHS).
Why? AHS is a biennial household survey that provides a window into the living conditions of urban residents across America. I use this dataset to investigate how households’ heat exposures, incomes, and living contexts affect their propensity to adopt air conditioning, and what the implications are for inequality in heat health risk and capacity to adapt to climate change.
***
We will continue to compile a listing of faculty users of particular data sets. This resource may help to foster collaborations and to introduce seasoned data users to novices hoping to learn a particular data set. If you have a data set you’d like to share, or if you’re seeking help in finding or using a data set, please complete this very brief survey. Thank you!
If you are looking for personalized assistance identifying, accessing, and using data sets from ICPSR, you are welcome to contact Lucy, Social Work and Social Science Librarian, at LFlamm@BU.edu.


