{"id":12115,"date":"2018-11-07T10:52:49","date_gmt":"2018-11-07T15:52:49","guid":{"rendered":"https:\/\/www.bu.edu\/cs\/?p=12115"},"modified":"2020-03-18T14:23:06","modified_gmt":"2020-03-18T18:23:06","slug":"crovella-election-data","status":"publish","type":"post","link":"https:\/\/www.bu.edu\/cs\/2018\/11\/07\/crovella-election-data\/","title":{"rendered":"CAS Professors Use Web Browsing Data to Predict Election Results"},"content":{"rendered":"<p><span class=\"dropcap\">From <a href=\"http:\/\/www.bu.edu\/today\/2018\/better-election-predictions\/\">BU Today<\/a> \u2014\u00a0A<\/span>s the predictions for the 2016 presidential election remind us, polling the electorate is an imperfect science. Most polls claimed that Hillary Clinton would be our next president\u2014it seemed a foregone conclusion\u2014and most polls were wrong, although many forecasts for the popular vote were very close\u2014off by less than one percentage point. Election polling has always been inexact. It has also been time-consuming, expensive, and lacking the ability to measure the influence of short-lived events, like a candidate\u2019s speech, or to read the electorate of small geographic areas.<\/p>\n<p>Now, two Boston University professors believe they have found an alternative, one that is not only similarly accurate, but has the potential to be faster and less expensive, can target areas as small as towns, and can measure the people\u2019s response to specific issues and events. 
The methodology, which correlates web browsing patterns with public opinion from polls, was developed by two College of Arts &amp; Sciences faculty: <a href=\"https:\/\/www.bu.edu\/cs\/mark-crovella\/\">Mark Crovella,<\/a>\u00a0a professor of computer science, and <a href=\"http:\/\/www.bu.edu\/polisci\/people\/faculty\/christenson\/\">Dino Christenson,<\/a> an associate professor of political science.<\/p>\n<p>They worked with Giovanni Comarela from the University of Vicosa (formerly a BU PhD student under Crovella), Ramakrishnan Durairajan at the University of Oregon, and Paul Barford at the University of Wisconsin, Madison. Barford, who also works for <a href=\"https:\/\/www.comscore.com\/\">comScore, Inc.,<\/a> a kind of Nielsen ratings of the internet, negotiated an arrangement with comScore, which provided the researchers with the web browsing histories of more than 100,000 US residents over the 56-day period preceding the 2016 election.<\/p>\n<p>All the data the researchers used was specifically authorized and released for this kind of research by the users who generated the data. The researchers\u2019 analysis of that data\u2014two terabytes\u2019 worth, containing 70 million websites\u2014showed exactly when and where voters made decisions that led to the election of Donald Trump.<\/p>\n<div id=\"attachment28867\" class=\"wp-caption alignleft\">\n<p><img src=\"http:\/\/www.bu.edu\/research\/files\/2018\/10\/dino-christensen-18-1858-DINO-032.jpg\" alt=\"Political Scientist at Boston University, Dino Christenson, leans against a whiteboard in his office.\" width=\"400\" class=\"size-full wp-image-28867\" \/><\/p>\n<p class=\"wp-caption-text\">Dino Christenson plans to build a web API that could be used by academic researchers to gauge public opinion. 
Photo by Cydney Scott<\/p>\n<\/div>\n<p>It also suggested that contrary to popular and expert opinion, a last-minute dip in support for Hillary Clinton was not precipitated by a letter to Congress from FBI director James Comey that revealed that the FBI had found a new batch of relevant emails on Hillary Clinton\u2019s server. Crovella and Christenson\u2019s analysis clearly indicated that support for Clinton began to decline on October 25, 2016, three days before the letter was sent. That doesn\u2019t mean, says Christenson, that the letter had no impact on support for the Democratic candidate. \u201cThe previous slippage could have just been a coincidence,\u201d he says. \u201cIt may have been a small dip that would have rebounded had it not been for the letter\u2026but the findings certainly cast doubt on the Comey letter as the first mover.\u201d<\/p>\n<p>For Crovella and Christenson, the importance of that finding is its proof that their methodology can measure the influence of single, brief events, such as a particular campaign stop, or a Supreme Court decision, or a scandalous news report\u2014a valuable potential for candidates and pollsters.<\/p>\n<p>\u201cLet\u2019s say a candidate flies in to a city, makes a speech, and flies out,\u201d says Crovella. \u201cHow much of an effect does that have? A typical political poll is too coarse an instrument to measure that. A poll, even one that\u2019s well done, takes three or four days to get a large enough response to be statistically significant. You can\u2019t measure something that had an effect that lasted two days. That\u2019s washed out of the measurement process.\u201d<\/p>\n<p>Similarly, says Crovella, the large numbers needed to give a traditional poll statistical significance prevent it from drilling down on small populations. 
\u201cBecause there are a lot of people participating in our data, we can look at political leanings of different populations on an early, localized geographical basis,\u201d says Crovella. \u201cWe can do this in a fairly fine-grained way in space and time, because we\u2019ve got records of their browsing behavior, their websites, on a minute-by-minute, hour-by-hour, day-by-day basis.\u201d<\/p>\n<p>Crovella and Christenson also say that their method can gauge big-picture support more accurately than current polling methods do. Their research, \u201c<a>Assessing Candidate Preference through Web Browsing History<\/a>,\u201d by Giovanni Comarela, Ramakrishnan Durairajan, Paul Barford, Dino Christenson, and Mark Crovella, is published in <em>Proceedings of ACM KDD 2018<\/em>, London, UK.<\/p>\n<p>Ultimately, says Crovella, the polling system needs two things: \u201cIt needs the records of web browsing, and it needs some kind of initial poll to calibrate the machine-learning component to learn what it\u2019s looking for.\u201d<\/p>\n<p>Calibration was the hard part, as well as the reason that massive computing power was brought to bear. How exactly does one translate website visits into reliable indicators of political leanings? Some websites are clearly biased toward one candidate or party, but many are not. And a visit to a particular site may not necessarily mean that the visitor shares the site\u2019s opinion.<\/p>\n<p>Step one was finding a credible way to determine \u201cground truth,\u201d a term that describes criteria based on real-world evidence that is used to train a machine-learning algorithm. 
Crovella worked backwards, starting, somewhat ironically, with the results of traditional opinion polls.<\/p>\n<div id=\"attachment28874\" class=\"wp-caption alignleft\">\n<p><img src=\"http:\/\/www.bu.edu\/research\/files\/2018\/10\/mark-crovella2-18-1593-MARKPOEM-049.jpg\" alt=\"Boston University computer scientist, Mark Crovella, leaning against a row of servers.\" width=\"400\" class=\"size-full wp-image-28874\" \/><\/p>\n<p class=\"wp-caption-text\">Mark Crovella says his methodology can quickly measure the public\u2019s response to specific events, such as campaign rallies. Photo by Cydney Scott<\/p>\n<\/div>\n<p>\u201cLet\u2019s say you have a poll from September 1, and it shows that on this day 60 percent of the people in Michigan are leaning toward the Democratic party. You use that to train a machine-learning algorithm to look at all of the individuals in your data set and decide which of them must make up that 60 percent. Then you have an idea of what a Democratic voter looks like in terms of their website visits. You carry that forward, looking at subsequent visits and asking how the data set is changing. This method was not previously well-developed, and we had to find a new way to apply it to data that was as large as what we were studying.\u201d<\/p>\n<p>Crovella and Christenson point out that now that they have developed their approach with data that was donated, they are developing methods to accomplish the same ends that operate on encrypted data. This will improve user privacy, because no computer (other than the user\u2019s own computer) will be able to see a user\u2019s web browsing data.<\/p>\n<p>Unsurprisingly, Crovella and Christenson\u2019s initial analysis taught them a few things about their methodology, as well as the sentiments of voters. They learned, for example, which browsing habits were the best indicators of political leanings. \u201cWe found that referrals from social media are very informative,\u201d says Crovella. 
\u201cWe found that if you simply type a search into a browser and click on that link, it\u2019s not as likely to tell us something about your political leanings. But if you follow a link that was referred to you by a friend, it is likely that that\u2019s indicative of your political leanings.\u201d<\/p>\n<p>What\u2019s next? Crovella and Christenson plan to build a web function that will make their technology and methodology available to other social scientists and public opinion researchers. Crovella says they would like to build a system that social scientists can use to answer questions \u201clike if someone goes to Chicago and gives a speech, how much does it move the needle and how long does it stay moved?\u201d<\/p>\n<p>\u201cI would like to have a web API where any academic researcher could go on any day to query public opinion,\u201d says Christenson. \u201cOne could type in their outcome of interest as well as the geographic area of the country and period of time, and in return get estimates of the related public opinion dynamics in real time. The applications are potentially quite broad. You could look at the public\u2019s position on candidates, representatives, policy issues, even local events, like campaign stops or school board elections, assuming there is an underlying partisan or ideological dimension, and you wouldn\u2019t have to spend tens of thousands of dollars on a poll or even have a poll in the field for the time period or region of interest.\u201d<\/p>\n<p>Perhaps because he is a longtime observer of political polls and a trained survey researcher, Christenson is sympathetic to the shortcomings of traditional polls.<\/p>\n<p>\u201cThere is going to be error whenever you try to generalize,\u201d he says. 
\u201cAnd when there\u2019s an electorate that\u2019s as divided as the United States, it\u2019s not surprising that polls would be off, especially by small margins in locales where we don\u2019t have a great deal of data collection.\u201d Still, he suggests, public opinion is too important to be marked by the limitations and costs of polls, at least if there is a way to improve upon them. And now there just might be.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From BU Today \u2014\u00a0As the predictions for the 2016 presidential election remind us, polling the electorate is an imperfect science. Most polls claimed that Hillary Clinton would be our next president\u2014it seemed a foregone conclusion\u2014and most polls were wrong, although many forecasts for the popular vote were very close\u2014off by less than one percentage point. [&hellip;]<\/p>\n","protected":false},"author":15310,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/posts\/12115"}],"collection":[{"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/users\/15310"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/comments?post=12115"}],"version-history":[{"count":4,"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/posts\/12115\/revisions"}],"predecessor-version":[{"id":12119,"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/posts\/12115\/revisions\/12119"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/media?parent=12115"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/categories?post=12115"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\
/\/www.bu.edu\/cs\/wp-json\/wp\/v2\/tags?post=12115"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}