A generation ago, the internet changed everything. Today, data science is proving just as revolutionary. Fueled by the abundance of personal information on the internet—yours, ours, everyone’s—data science is making business smarter, healthcare more efficient, technology easier, and sports more fun to watch (and play). But it’s also made all of us more vulnerable. This article, the final part of a five-story series, comes as Boston University is investing aggressively in the world of big data and is poised to build a 17-story Center for Computing & Data Sciences on Commonwealth Avenue that will house its mathematics and statistics and computer science departments. As BU President Robert A. Brown said: “This is the science that’s going to change the way we behave, driving our behavior for the next 50 or 100 years.”
Like most stories about data science, this one begins with a very big number: 150 million. That’s roughly how many people, worldwide, subscribe to Netflix. And because Netflix knows what all of those people watch, when they watch, and on what sort of device they watch, that number provides Netflix with the data it needs to determine the narratives, characters, and lengths of the programming to invest in. So when you binge on a season of Stranger Things over a school vacation week, then enjoy a weeknight with the comedy special of Ellen DeGeneres, and finally settle in on a chilly Saturday night to watch the recent Best Foreign Language Film Academy Award winner Roma, you aren’t just consuming art. You’re shaping it.
Gone are the days when a reporter’s key job responsibilities consisted solely of compelling storytelling and factual accuracy. And for public relations and marketing specialists, it’s no longer enough just to get a powerful message out or to produce a riveting advertising campaign.
Now, everything is about the metrics, the data, the numbers—the number of retweets, shares, or likes; the number of people who watched a video and the time they spent watching it; the detailed analytics behind how many people read a story, and from what demographic groups. And once that information is collected and studied, it’s all about learning from it, targeting content at specific audiences, and understanding what worked and didn’t work.
In the world of media, jobs that used to be about words and images are suddenly all about numbers. Is that a good thing?
Of the industries that have been transformed by technology, media may have had the most turbulent ride, and print media the most painful. First, the internet cut the legs out from under newspapers, stealing lucrative classified ads and moving them to online sites like Craigslist. Retail ads left next, swiped by Google and other ad servers, who directed them not to publications, but to readers who fit a profile described by an algorithm.
Chris Wells, a Boston University College of Communication assistant professor of journalism, who has studied the influence of social media, says the more recent growth of online platforms like Facebook has “platformitized” the media. That platformization, he says, “has severely threatened news outlets’ already weak online advertising.”
These days, algorithms are also supporting traditional media, as online newspapers use them to predict which stories might appeal to which readers. “The trackability inherent in digital and social media has transformed the use of audience metrics in newsrooms,” says Wells. “Editors and journalists are much more aware of exactly how well every story is doing, and they are making future editorial decisions based on that knowledge.”
The same strategy is largely responsible for the success of online video platforms like Netflix and Amazon, which go to great lengths to hone their content for viewer preferences. “Big data is playing a much larger role in the development of television series,” says Cathy Perron (COM’99), a COM associate professor of film and television and director of the Master’s in Media Ventures program. “Platforms like Netflix, Amazon, and Hulu know so much more about viewer preferences and habits. Their data can tell them what kind of content resonates with their subscribers, how many episodes a viewer tends to watch, when the television is paused during the airing of a program, which characters viewers prefer. This makes the development process less of a financial risk for the platform, and it offers recommended content to viewers looking to sample a new series.”
Perron says online platforms’ ability to exploit those metrics has left traditional broadcast media painfully disadvantaged. However, she says, some cable boxes are starting to collect more data, and on-demand programming is starting to track which programs are viewed.
On the editorial side of newspapers, the big change precipitated by technology is greatly diminished staffing, a direct result of greatly decreased revenues. But Wells points out that while social media may have taken ad dollars and readers from traditional news organizations, it hasn’t usurped their most important function: publishing important stories that change our lives.
“It’s not social media that is breaking new stories,” he says. “That’s being done in large part by traditional media, by outlets like the New York Times and the Washington Post, and also by new entrants like BuzzFeed, the Daily Beast, Breitbart, and others. Cases like #MeToo illustrate the hybrid nature of this. A lot of the critical elements of the #MeToo movement came in the form of prominent stories in the New York Times and the New Yorker.”
For newsrooms, the most dramatic influence of algorithms may be yet to come. At some media companies, including the Associated Press, Forbes, and the Los Angeles Times, algorithms are literally writing the news. Computers are using natural language generation to build automated content from data that is readily available, such as financial results. The technology, which has proven suitable for small and straightforward stories, offers some potential advantages over human reporters, at least for the business side of things: algorithms are tireless, inexpensive, and they can generate content on demand, as long as the facts are available. And for visual representations of data, algorithms—when used well by human reporters and editors—are telling compelling, colorful, and interactive stories about everything from the wealth of nations to the NBA playoffs.
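For readers curious about what automated writing from financial data can look like in practice, here is a minimal Python sketch of the simplest approach: filling a sentence template from a structured record. It is an illustration only, not the system used by any of the organizations named above. The `QuarterlyResult` record, its field names, and the company in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuarterlyResult:
    # Hypothetical record; the field names are illustrative assumptions,
    # not taken from any real financial data feed.
    company: str
    quarter: str
    revenue_musd: float        # revenue in millions of US dollars
    prior_revenue_musd: float  # revenue for the same quarter a year earlier


def describe_change(current: float, prior: float) -> str:
    """Turn two numbers into a phrase such as 'rose 10.0 percent from'."""
    pct = (current - prior) / prior * 100
    if abs(pct) < 0.05:
        return "was unchanged from"
    verb = "rose" if pct > 0 else "fell"
    return f"{verb} {abs(pct):.1f} percent from"


def write_brief(result: QuarterlyResult) -> str:
    """Fill a fixed two-sentence template from the record."""
    sentences = [
        f"{result.company} reported {result.quarter} revenue of "
        f"${result.revenue_musd:,.1f} million."
    ]
    # Only compare with last year when there is a usable prior figure.
    if result.prior_revenue_musd > 0:
        change = describe_change(result.revenue_musd, result.prior_revenue_musd)
        sentences.append(f"Revenue {change} the same quarter a year earlier.")
    return " ".join(sentences)
```

Calling `write_brief(QuarterlyResult("Example Corp", "second-quarter", 110.0, 100.0))` produces a two-sentence brief. The template never changes, which is why this style of automation suits small, formulaic stories and still depends on the facts being available and correct.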
Maggie Mulvihill, a COM associate professor of the practice of computational journalism, sees those stories as the inevitable offspring of big data. After all, she says, data has always been the quantifying force in journalism, and the newly available renderings of computer programs have pushed it to the fore.
The availability of data, lots of data, she says, has made it easier for reporters to get their hands on that quantifying force. “We used to go to city hall and ask them about suspended liquor licenses, and if they weren’t mad at you, you could get them,” says Mulvihill, who is also a Faculty Fellow at the Rafik B. Hariri Institute for Computing and Computational Science & Engineering and cofounder of the New England Center for Investigative Reporting. “These days, it’s all digitized, and in many cases you can download it.”
While Mulvihill’s courses in data journalism do not delve into algorithmic analyses, they do teach data journalism skills, including how to scrape data from the web, analyze it, and present data in narrative and visual forms. Today, she says, these are necessary skills for serious journalists: for those reporters who know how to analyze it, big data yields big stories.
The most significant effect of big data on media is not the nature of content; it’s the change in the direction of the flow of content. While online readers still seek out stories with greater speed and specificity than ever, stories also chase readers, sent by algorithms and guided by the reader’s demonstrated preferences. The good news is that readers get what they want. The bad news is that readers may get only what they want, and become less exposed to alternative viewpoints, having existing convictions reinforced rather than challenged. Worse, says Lei Guo, a COM assistant professor of emerging media studies, social media pump a filter bubble full of fake news.
That matters, says Guo, because fake news has been shown to influence the content on reputable news sites, and particularly on partisan news sites. One of Guo’s recent studies, conducted with Chris Vargo, a University of Colorado assistant professor of mass media, used computational methods to analyze the influence of fake news on fact-based media covering Donald Trump and Hillary Clinton during the 2016 presidential election.
“We looked at what fake news sites were talking about, and we looked at what legitimate news sites were talking about,” says Guo. “We found that fake news will shift the focus of legitimate news websites. They don’t report false information, but they will pay attention to what is being said.”
Guo is currently working on a project aimed at helping people escape the mental confines of the filter bubble. “We want to suggest a way to recommend articles that you might not necessarily like,” she says. “We think it’s important for people to be intellectually challenged.”
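One simple way to picture a recommender that pushes against the filter bubble is to rank articles by how rarely a reader has opened their topic. The Python sketch below does only that. It is an assumption-laden illustration, not a description of Guo's project: the function name, the topic labels, and the ranking rule are all hypothetical.

```python
from collections import Counter


def recommend_outside_bubble(history_topics, candidates, k=2):
    """Return up to k candidate titles, least-read topics first.

    history_topics: topic labels of articles the reader already opened.
    candidates: list of (title, topic) pairs available to recommend.
    Both inputs are hypothetical; a real system would need far richer signals.
    """
    seen = Counter(history_topics)
    total = sum(seen.values()) or 1  # avoid dividing by zero for new readers
    # Score each candidate by the share of the reader's history devoted to
    # its topic. Counter returns 0 for topics the reader has never opened.
    ranked = sorted(candidates, key=lambda item: seen[item[1]] / total)
    return [title for title, _topic in ranked[:k]]
```

For a reader whose history is mostly politics, this ranking surfaces science or arts stories ahead of more politics. Ties keep the order in which the candidates were supplied, because Python's sort is stable.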
Big data isn’t just changing media; it’s changing the study of media, by providing analytical tools to researchers like Guo and her colleague Jacob Groshek, a COM associate professor of emerging media studies. “One reason big data is so important to social scientists is because it can show how media influence each other, and also how media influence people,” Guo says.
Groshek’s research has revealed that contrary to popular opinion, people who used social media were not more likely to support Donald Trump or Bernie Sanders during the run-up to the 2016 presidential election. He also found that people who got more of their news from television were more likely to support Trump.
Like Guo, Groshek is hoping to use algorithms to fix some of the problems that algorithms have created. He is developing algorithms that can identify the source of misinformation on social media. “If we can identify the source,” he says, “we can possibly intervene and slow the dissemination of misinformation.”