The Big Question
Can data help us solve the world’s biggest problems?
Algorithms choose our music and our movies. They give us driving directions and shopping recommendations. They help researchers understand public health, the environment, and the galaxy. They also scan and identify faces, spread propaganda, and enable malicious hacking. Few areas of our lives are untouched by the power of algorithms, and the big data behind them, for better or worse. And with the constant tracking of our movements and actions, the accuracy of that data continues to improve.
Access to all of this information yields powers unimaginable to previous generations—powers that can be a panacea or a weapon. The US Bureau of Labor Statistics estimates that employment in the data science field will grow by 36 percent by 2031—nearly matched by 35 percent growth in information security analyst jobs. And look no farther than Comm Ave to see how the rising stature of data science is reshaping academia: the Center for Computing & Data Sciences, which will house the mathematics & statistics and computer science departments as well as the new Faculty of Computing & Data Sciences, is BU’s largest building.
With the demand for data science surging, arts&sciences asked three faculty members: Can we trust big data to help solve the world’s problems—amid concerns about privacy, bias, and ethics—rather than make them worse?
Alisa Bokulich, a professor of philosophy, has devoted her career to considering the history and philosophy of science, and more recently the philosophy of data. She teaches courses on how science and technology intersect with race, gender, and values. Bokulich has directed the Center for Philosophy & History of Science since 2010.
Jonathan Huggins, an assistant professor of mathematics and statistics, studies the development of machine learning and artificial intelligence methods to address real-world problems in a way that’s efficient and trustworthy. He is a founding member of the BU Faculty of Computing & Data Sciences.
Adam Smith, a professor of computer science and engineering, studies privacy in data analysis and recently shared the ACM Paris Kanellakis Award for his contribution to the development of differential privacy, which helps to protect individual information while allowing the study of statistical databases. He is a founding member of the BU Faculty of Computing & Data Sciences.
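Differential privacy, mentioned in the bio above, has a concrete core that fits in a few lines. What follows is a minimal Python sketch of the Laplace mechanism, the canonical building block of the field; it is an illustration under stated assumptions, not Smith’s own implementation, and the dataset, query, and epsilon value are all hypothetical.

```python
import numpy as np

def private_count(values, predicate, epsilon, rng=None):
    """Release a count under epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (its "sensitivity"), so Laplace noise with scale 1/epsilon
    masks any single individual's contribution.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical example: how many incomes in a toy database exceed $50,000?
incomes = [38_000, 52_000, 47_000, 91_000, 60_000]
print(private_count(incomes, lambda x: x > 50_000, epsilon=0.5))
# Smaller epsilon means more noise: stronger privacy, less accuracy.
```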
Adam Smith
The widespread collection of data presents an enormous opportunity to make better-informed decisions and allocate resources where they are needed, but also a challenge: to make sure the world we are creating is one we want to live in.
Fundamentally, data collection is the act of watching others. That perspective highlights the pitfalls of a data-rich society: Are we seeing everyone? Are we observing what we really care about? Who chooses what we observe and uses the results? Do we understand how we are being watched? Does being watched change the way we act?
I think we can get the benefits of data-driven decisions and technologies and minimize the costs implicit in these questions. But doing so will entail changing the way we regulate technologies and slowing down the disruption-based way we adopt new ones.
Data-driven tech is largely developed and deployed by companies (though governments—notably in China and Russia, but also in the US—do their share). Collection is limited only by concern for public optics or by the efforts of conscientious employees. Either way, those limitations are implemented despite the business incentives, which encourage more collection and sharing to support targeted advertising and increase user “engagement.” We are thus at the mercy of the companies’ goodwill.
That points to a need for well-resourced public agencies that can monitor and limit what companies and other agencies collect and share as well as new structures—akin to cooperatives or unions—that give people collective control over their data’s use. Such structures rely on broad public understanding of the technology being deployed. Despite major progress in the sophistication of mainstream tech journalism, we are still way behind.
Harnessing data and creating the infrastructure to manage it well is a grand challenge for our society, on a similar scale to controlling climate change or depolarizing public discourse. With the right changes, we can tackle it.
Alisa Bokulich
Three myths have long plagued discussions of big data and must be dispelled if we are to move forward.
The first myth is that data are an unmediated window onto the world, purely objective, and theory-free. However, data are better understood as the record of a process of inquiry. All data are shaped by the aims and assumptions of the people, instruments, and algorithms that create them, bearing the fingerprints of the contexts in which they are collected, processed, and stored. The more we know about these contexts, aims, and assumptions, the more reliable our use of the data will be.
The second myth is that data are value-free. Data reflect the biases and values of the social, political, and cultural context in which they are produced. We have seen many cases where data used for machine learning and AI have reinforced racist and sexist biases in our culture. Neither the “bigness” nor automation of this data absolves us of our social and moral responsibility to do the hard work needed to identify and guard against social harms.
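One way to make the second myth concrete: bias in training data shows up as measurable gaps in a model’s outputs. Below is a minimal, hedged Python sketch of one standard check, the demographic-parity gap; the loan-approval framing, group labels, and numbers are all hypothetical, and a small gap is necessary but not sufficient evidence of fairness.

```python
from collections import defaultdict

def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): per-group rates of the favorable outcome (1)
    and the difference between the highest and lowest rate.

    A gap near 0 is a necessary, not sufficient, condition for fairness;
    it says nothing about *why* the rates differ.
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

# Toy loan-approval predictions for two hypothetical groups:
preds  = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap, rates = demographic_parity_gap(preds, groups)
print(rates)  # {'A': 0.75, 'B': 0.25}
print(gap)    # 0.5 -- group A is approved three times as often
```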
The third myth is that big data are autonomous and that data science can stand alone. The responsible use of big data is going to require investing in a wider range of expertise. There is an interdisciplinary field that combines the humanities with the detailed knowledge required to engage data-science issues on a technical level: the history and philosophy of science and technology (HPST). HPST scholars can trace the histories of data sets, instruments, and algorithms, identify theoretical assumptions, and reveal systematic biases. They can help uncover the unanticipated harms of big data practices and develop data ethics solutions. This critical work will require investing in the relevant interdisciplinary expertise and sustaining collaboration between HPST scholars and data scientists. Any data science initiative that fails to do so is irresponsible.
Successfully harnessing the potential of big data will require dispelling these three myths and scaling up critical thinking, ethical responsibility, and interdisciplinary collaboration in proportion to the size of the data. We cannot “trust” big data to solve the world’s problems for us; we need to roll up our sleeves and commit to doing the hard work ourselves.
Jonathan Huggins
There’s no guarantee that big data is a force for good. But I remain optimistic that, on the whole, it will be. The data science community—including those working on AI and machine learning—is acutely aware of the challenging technical and ethical problems created by big data and by the huge, complex systems that ingest it. That is why so much research effort right now is going toward so-called “trustworthy AI.” But let me highlight two challenges that will require not just technical solutions but some combination of legal, social, and structural changes.
The first is that “trustworthiness” is an overloaded term: what it means for a big-data system to be trustworthy depends on what that system does and who needs to trust it. For example, consider two AI systems: one performs image-based skin cancer diagnosis and the other removes misinformation from a social media platform. Physicians and patients must trust the diagnostic system, while the platform’s owners, consumers, and content creators must trust the misinformation-removal system. But each of these groups will have different views about what makes a system trustworthy: some will care more about privacy or fairness; others will care more about robustness (can the system be manipulated?) or accuracy.
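To make one of those criteria concrete, robustness has a simple operational reading: small, plausible perturbations of an input should not flip the system’s output. The Python sketch below is a crude random-perturbation probe, not how any production diagnostic system is actually audited; the toy model, noise scale, and trial count are all assumptions for illustration.

```python
import numpy as np

def is_robust(predict, x, n_trials=100, noise_scale=0.01, rng=None):
    """Crude robustness probe: does the predicted label survive
    small random perturbations of the input features?

    Passing this check is evidence, not proof, of robustness;
    an adversary searches for worst-case perturbations, not random ones.
    """
    rng = rng or np.random.default_rng()
    baseline = predict(x)
    for _ in range(n_trials):
        perturbed = x + rng.normal(0.0, noise_scale, size=x.shape)
        if predict(perturbed) != baseline:
            return False
    return True

# Toy "diagnostic" model: flags a case if two features sum past 1.
predict = lambda x: int(x[0] + x[1] > 1.0)
print(is_robust(predict, np.array([0.9, 0.8])))    # True: far from the decision boundary
print(is_robust(predict, np.array([0.49, 0.51])))  # likely False: right on the boundary
```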
The second challenge is that truly big-data AI systems can be built only by a small number of companies and organizations (Google, Meta, Apple, etc.) that have the necessary data and computing infrastructure, which can cost tens of millions of dollars. We still need to figure out how to audit and democratize these systems in a way that balances legitimate organizational interests against society’s interest in these systems being fair and individuals’ interest in having their data kept private.