Bostonia is published in print three times a year and updated weekly on the web.
A generation ago, the internet changed everything. Today, data science is proving just as revolutionary. Fueled by the abundance of personal information on the internet (yours, ours, everyone’s), data science is making business smarter, healthcare more efficient, technology easier, and sports more fun to watch (and play). But it has also made all of us more vulnerable. This article, the second in a five-story series, comes as Boston University is investing aggressively in big data and is poised to build a 17-story Data Sciences Center on Commonwealth Avenue that will house its mathematics and statistics and computer science departments. As BU President Robert A. Brown said: “This is the science that’s going to change the way we behave, driving our behavior for the next 50 or 100 years.”
In countries around the world, political events—from local town council votes all the way up to presidential elections—are being influenced, analyzed, and charted with help from data science, and specifically, machine learning. To understand just how quickly, and dramatically, data can upend the universe, look no further than the 2016 US presidential election. The data science firm Cambridge Analytica, hired by the Trump campaign, got its hands on data from 50 million Facebook users without their permission—including where they live, what types of advertisements would most likely appeal to them, and other personal preferences—then used big data and machine learning to micro-target voters who were deemed persuadable.
During that time, the New York Times reports, pro-Trump bots (autonomous software applications that automatically send targeted messages through social media) generated one-quarter of all Twitter traffic about the election, and in the days leading up to the election they outnumbered pro-Clinton bots five to one.
Across the pond, Cambridge Analytica worked its magic on the Brexit campaign, with advice from Steve Bannon, who also worked for the Trump campaign. And in 2017, the company aided the reelection of Kenyan President Uhuru Kenyatta. Steven Rosenzweig, a College of Arts & Sciences assistant professor of political science whose research has focused on African politics, says the company’s work in Kenya’s last two elections, both on behalf of President Kenyatta’s campaign, worried many people.
“Cambridge Analytica’s involvement—allegedly involving party branding, writing campaign speeches, and running a social media campaign—was a source of great controversy,” says Rosenzweig. “This was particularly true among the influential group of public intellectuals and activists known as Kenyans on Twitter or KOT. Particularly problematic were potential violations of privacy and the spread of inflammatory messages in a volatile political context with a history of violence.”
How is Cambridge Analytica doing today after influencing so many elections worldwide? It filed for bankruptcy and shut down in 2018 amid a string of political controversies and scandals.
The impact the company had may be felt for decades. However, in a larger political context, Rosenzweig thinks big data sometimes gets more attention than it deserves, at least for the moment. “So far,” he says, “I think the evidence that big data is having a substantial influence on politics is fairly limited and its impact sometimes overstated. But psychological motivations are key to people’s political decision-making, and data-driven strategies that are able to tap into those are quite likely to have a real impact.”
Data analytics played a lesser-known role in President Barack Obama’s 2008 campaign, which, guided by surveys taken in battleground states, assigned potential voters two scores: one for the likelihood that they would vote, and one for the likelihood that, if they did, they would vote for Obama. And the next presidential campaign, in 2020, is already giving data science a leadership role. In February 2018, President Trump named Brad Parscale, his former digital advisor, manager for his reelection bid.
Data science is also used as a purely observational tool, one that can reveal the workings, or failings, of some long-standing political processes. In 2018, three BU political scientists used big data to study local political participation in housing and development policy. Katherine Levine Einstein, Maxwell Palmer, and David Glick compiled a data set by coding thousands of instances of people who chose to speak about housing development at planning and zoning board meetings in 97 cities and towns in eastern Massachusetts, then matched the participants with voter and property tax data. The researchers found that speakers tended to be male, older, whiter, and more likely to be homeowners than most residents of their towns, and they overwhelmingly opposed new housing developments. In fact, two-thirds of speakers opposed housing, and only 14 percent were in favor of building.
To learn more about how closely these meeting speakers represented the views of community residents, the researchers juxtaposed the opinions of meeting attendees with the vote on a statewide housing ballot referendum. Again, the views of speakers were not aligned with those of their broader communities.
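The comparison described above can be illustrated with a small sketch. It is not the researchers’ code: the comment records, the position labels, and the referendum figure below are all hypothetical, chosen only to show how the share of speakers on each side might be set against a community-wide vote.

```python
from collections import Counter


def position_shares(comments):
    """Return the fraction of coded comments that take each position."""
    counts = Counter(c["position"] for c in comments)
    total = sum(counts.values())
    return {position: n / total for position, n in counts.items()}


# Hypothetical coded comments from one town's meeting minutes.
comments = [
    {"speaker": "resident_1", "position": "oppose"},
    {"speaker": "resident_2", "position": "oppose"},
    {"speaker": "resident_3", "position": "oppose"},
    {"speaker": "resident_4", "position": "oppose"},
    {"speaker": "resident_5", "position": "support"},
    {"speaker": "resident_6", "position": "neutral"},
]

# Hypothetical share of the town's voters who backed housing in a referendum.
referendum_support = 0.5

shares = position_shares(comments)

# A negative gap means speakers were less supportive than voters at large.
gap = shares.get("support", 0.0) - referendum_support

print(f"speakers in favor: {shares['support']:.0%}")
print(f"voters in favor:   {referendum_support:.0%}")
print(f"representation gap: {gap:+.2f}")
```

In this toy example, a gap well below zero would be the signal that the people who speak at meetings are not a fair sample of the town.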
That matters, the researchers say, because the dynamic contributes to the failure of towns to produce a sufficient housing supply. If local politicians hear predominantly from people opposed to a certain issue, it’s logical that they may be persuaded to vote against it, based on what they think their community wants. “Our study shows how political inequalities contribute to rising housing prices,” says Einstein, a CAS assistant professor of political science. “An unrepresentative group of white homeowners are able to take advantage of land use institutions to stop and delay the construction of new housing. Their actions help to block newcomers from accessing desirable communities.”
It also matters in theoretical terms, because the research shows that some supposedly democratic institutions that we have depended on for hundreds of years are, in fact, fundamentally undemocratic. “More broadly,” the researchers write, the study “reveals that institutions designed to enhance democratic responsiveness may have perverse consequences on participation, the views that policymakers hear, and/or outcomes.” Their study, “Who Participates in Local Government? Evidence from Meeting Minutes,” was published in October 2018 in Perspectives on Politics.
Elsewhere at BU, Mark Crovella, a CAS professor of computer science, and Dino Christenson, a CAS associate professor of political science, working with researchers at other schools, used big data to predict which 2016 presidential candidate the public preferred, as well as how those preferences changed throughout the campaign and what the influencing events might have been. Crovella and Christenson analyzed the web-browsing histories of more than 100,000 Americans over the two months immediately prior to the election to pinpoint likely voter choice. Using data that was provided by Comscore, a kind of Nielsen rating of the internet, the researchers analyzed two terabytes of data, which included 70 million websites. They then correlated browsing patterns with public opinion polls.
Crovella says their methodology requires two things: web-browsing records, and an initial poll to calibrate their machine-learning component, so the machine knows what it’s looking for.
That, says Crovella, was the hard part, because while some websites are obviously biased, many are more nuanced. Also, he says, a visit to a particular site may not indicate the visitor’s political leanings. The researchers had to work backward, starting with traditional opinion polls to describe a particular leaning. “Let’s say you have a poll that shows that on a particular day 60 percent of people in a particular state were leaning Democratic,” Crovella says. “You use that to train an algorithm to look at everyone in the data set. You can get an idea of what a Democratic voter looks like in terms of website visits and you carry that forward, looking at subsequent visits and asking how the data is changing.”
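To make the idea of calibrating on a poll more concrete, here is a deliberately simplified sketch. It is not the algorithm from the paper; it swaps in a much cruder technique, scoring each website by the average poll result of the regions its visitors come from, then averaging those scores over later visits. All names and numbers (`poll_share`, `state_x`, `site_a`, and so on) are hypothetical.

```python
from collections import defaultdict


def site_lean_scores(visits, poll_share):
    """Score each site by the mean poll share of its visitors' regions.

    visits: (user, region, site) tuples from the calibration period.
    poll_share: region -> fraction leaning Democratic in the initial poll.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for _user, region, site in visits:
        total[site] += poll_share[region]
        count[site] += 1
    return {site: total[site] / count[site] for site in total}


def estimate_lean(later_visits, scores):
    """Average the scores of the already-scored sites seen in a later period."""
    known = [scores[site] for _user, _region, site in later_visits if site in scores]
    return sum(known) / len(known) if known else None


# Hypothetical calibration poll: 60 percent lean Democratic in one state.
poll_share = {"state_x": 0.6, "state_y": 0.4}

# Hypothetical browsing records from the day of the poll.
calibration_visits = [
    ("u1", "state_x", "site_a"), ("u2", "state_x", "site_a"),
    ("u3", "state_y", "site_b"), ("u4", "state_y", "site_b"),
    ("u1", "state_x", "site_c"), ("u3", "state_y", "site_c"),
]
scores = site_lean_scores(calibration_visits, poll_share)

# Carry the scores forward to two later (hypothetical) days of traffic.
day_1 = [("u5", "state_x", "site_a"), ("u6", "state_x", "site_a"), ("u7", "state_x", "site_c")]
day_2 = [("u5", "state_x", "site_b"), ("u6", "state_x", "site_b"), ("u7", "state_x", "site_c")]

lean_day_1 = estimate_lean(day_1, scores)
lean_day_2 = estimate_lean(day_2, scores)
print(f"day 1 estimate: {lean_day_1:.2f}, day 2 estimate: {lean_day_2:.2f}")
```

A drop from the first estimate to the second would be read as browsing shifting toward sites associated with less Democratic-leaning regions. A real system would need far more care, for instance with sites that carry little political signal.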
The researchers say their new data-driven methodology is faster and much less expensive than traditional polling, and that it can zero in on small areas, like towns, and on specific political events that might influence opinions. The research, “Assessing Candidate Preference through Web Browsing,” was published in Proceedings of ACM KDD 2018, London, UK.
Crovella and Christenson’s original work turned up some interesting findings. Their study suggests, for example, that a last-minute dip in support for Hillary Clinton was not precipitated by a letter to Congress that reported that the FBI had found another batch of emails on Clinton’s email server. Instead, the research indicates that support for Clinton had already begun to decline three days before that event.
“This flies in the face of conventional wisdom,” says Christenson. “One of the things that makes social science so difficult is measurement. While polling can be pretty good at this, many polls have a hard time picking up fine-grained movements in particular locales and at particular times. With our approach, we were able to detect the shift in public opinion in close to real time.”
The two researchers are developing a method to accomplish the same goals with encrypted data, which would better protect the privacy of the people whose browsing is analyzed. They hope to build a web function that will make their technology available to social scientists and public opinion researchers.
“Ultimately,” says Crovella, “we’d like to provide a new kind of high-resolution microscope for use by the community, and we’d like to be able to open our system to researchers studying opinion dynamics on a wide range of topics.”
“I see this project as having the potential to provide a reliable and valid measure of public opinion that is not limited by time, money, or location, and therefore can provide unique insights into a number of substantive questions across a host of fields,” Christenson says. “The potential applications are virtually endless.”