Azer Bestavros, founding director of Boston University’s Rafik B. Hariri Institute for Computing and Computational Science & Engineering, was practically giddy. It was the first BU Data Science (BUDS) Day and the ninth-floor conference room of the Photonics Center, where the institute was hosting the event, was standing-room only.
“I thought there might be 80, 90 registrants,” said Bestavros, a professor of computer science and head of the University’s Data Science Initiative, welcoming the participants with—what else—data. “They told me there were 262. I was shocked—really?”
Not only that, but the data science geeks—faculty and students from physics, mathematics and statistics, computer science, electrical and computer engineering, systems engineering, biostatistics—were there with people from the humanities and the social sciences as well as BU’s Questrom School of Business, Sargent College of Health & Rehabilitation Sciences, College of Communication, and schools of Social Work, Public Health, Law, and Medicine.
Bestavros had the data. The registrants were from 66 different disciplines, departments, and offices across the University, including the libraries, information systems and technology, and career services. It was the sort of diverse, cross-disciplinary crowd that Bestavros and the event’s co-chairs, Dino P. Christenson, an associate professor of political science and a former Hariri Institute Junior Fellow, and Prakash Ishwar, an associate professor of electrical and computer engineering, had hoped to draw.
“Why are you here?” Bestavros asked his crowd. “Is it because of the whole ‘data science is the sexiest job’ thing? Maybe it’s about how you’re going to make a ton of money. Maybe it’s about the data that’s coming at us and we don’t know what to do with it; we’re drowning in it. Maybe a lot of you are here to figure out how you can float.”
Or maybe they had all come together, on a wintry Friday morning at the end of January 2016, because they knew “that data science has become the common language of all disciplines.” Data science breaks down the walls between disciplines, said Bestavros—“at least we can talk, at least we can all be in the same room.”
For the next seven or eight hours, faculty, students, and staff connected through data science, brainstorming about its possibilities, reporting on how it was transforming their work in an astonishing array of disciplines—physics, neuroscience, health analytics, cancer research, genomics, the social sciences, marketing, law, even art history. They raised big questions: Can a robot learn how to teach physics? How do you know you can trust the data from the crowd? How can you bring all these different networks together with the right information to actually improve people’s lives?
A newcomer’s question: What is data science? Co-host and political scientist Christenson explained: “Data science is a broad term—perhaps overly broad—used to characterize a number of different fields, including political science, that are interested in the systems and processes for extracting knowledge from data. It uses statistical and computational tools to collect, curate, store, analyze, model, and visualize various types of data.”
Addressing the audience before lunch, Vice President and Associate Provost for Research Gloria Waters commended Bestavros for the interdisciplinary community of scholars he has created at the Hariri Institute. She said that the day’s events, including the talks and the poster sessions, demonstrated “the excellence, the depth of work” in data science at BU. She noted that data science is one of BU’s “research peaks,” an area that Waters, along with President Robert Brown and Provost Jean Morrison, is committed to investing in and excelling at.
“It’s absolutely clear we have world-class faculty in basic science—in math and statistics, computer science, electrical and computer engineering—and faculty who are doing amazing work in applications of data science,” Waters said. Recruiting additional top data science faculty is a primary goal of the Data Science Initiative that Bestavros is leading, Waters added.
At the event, 12 faculty panel speakers from multiple disciplines spoke for 10 minutes each about how their data-driven research related to one of three broad themes: vision; networks; and health, markets, and policy.
Kicking off the panel focused on vision and visual-data-driven research, Jodi Cranston, a College of Arts & Sciences professor of Renaissance art, made the case for small data. “Most scholars in humanities fear big data because it involves technology,” she said. She gave a quick slideshow tour of her “Mapping Titian” project, an archive and teaching web application that documents the relationship between the artworks of 16th-century Venetian Renaissance artist Titian and their changing locations and historical context (the project was funded, in part, by the Hariri Institute).
“You could think about how movements of artwork are affected by disease, natural disasters, population changes, economic crises, political events,” Cranston said. “Recognizing the potential wide applicability of small data in the humanities helps strengthen the human underlying all humanities research.”
Advances in brain imaging have produced a treasure trove of data for neuroscientists. “I study the brain and the brain is a great problem for big data because the brain has one billion neurons,” said Michael Hasselmo, a professor of psychological and brain sciences and director of BU’s Center for Systems Neuroscience, beginning his vision talk. Hasselmo explained how he is studying the coding of space and time by neurons in rats as part of his work in understanding memory in humans.
“I’m an algorithms guy,” said another vision panel speaker, Brian Kulis, a College of Engineering (ENG) assistant professor of electrical and computer engineering who works on machine learning and big data analysis. Kulis defined machine learning as “a set of tools used to make predictions from data.” These tools are useful in many areas, he said, from driverless cars to robotics.
Margrit Betke, a professor of computer science, uses big data to help visually impaired people with things such as navigating busy intersections on foot, reading medication instruction labels, and setting the temperature control in their apartments. She explained how she and a team of students—with the aid of crowdsourcing—insert tags of text on images on a web page. A visually impaired person “takes a photo of their temperature control, uploads it to the internet, and then some friendly person in the world will type the answer back to them: ‘This is your temperature setting.’”
Betke ticked off a few of her other current collaborations: She and Stan Sclaroff, a professor of computer science, are designing a machine-learning text recognition system. She is working on cell tracking with Joyce Wong, a professor of biomedical engineering. She and a team of biologists are tracking and analyzing the behavior of bats in caves in Texas.
Collaboration was the mantra of the day. “We live and die by our collaborations,” said W. Evan Johnson, an associate professor of medicine and biostatistics at the School of Medicine, who underscored the role of team science in his lab’s work in tracking the evolution of cancer tumors and drug response in cancer cells. Some collaborations are more successful than others, he said.
Biostatisticians are after “the best method to do something,” he said. Biologists, on the other hand, “want to be the first person to discover something.” The two goals—best and first—don’t always converge. The key, he said, is to find collaborators who want to contribute their skills to a joint project and who understand what’s in it for everyone involved.
Johnson, whose research falls at the intersection of statistics, computing, biology, and medicine, said his two teenage sons deserved some of the credit for motivating his research. “They get a kick out of telling people, ‘My dad’s a doctor but not the kind that helps people,’” he said. The audience laughed. “I’ve made it my goal to do something that helps people,” he went on. “How can we use biological big data to inform and influence how patients are treated in the clinic?”
During a break, Bestavros noted the multitude of ways the speakers managed to collect the data for their research. Law professor Michael J. Meurer, for example, purchases the data he uses to study patent trolls. For his research into the sharing economy, Georgios Zervas, a Questrom assistant professor of marketing, computer scientist, and Junior Faculty Fellow at Hariri, analyzes publicly available data from sources such as Airbnb and federal, state, and municipal websites.
“It starts with getting the data, cleaning the data, scraping the data,” Bestavros said. “We have to worry about security and privacy, then we have to worry about doing the analytics. We mine it for information that advances our understanding, and we check if our findings make sense. Finally we have to communicate this in very different ways.”
Speaking of communication, 26 students from colleges and schools across the University—public health, medicine, engineering, business, communication, arts and sciences—participated in the day’s poster session. Sahar Abi Hassan, a doctoral candidate in political science, presented her work on interest groups and the Supreme Court. Abi Hassan said she had been introduced to data science through her department’s required Quantitative Methods 1 course. “From there, I just became fascinated with data science and its great potential for social sciences,” she said. “Working with data allows me to find patterns in political and social phenomena that otherwise would be hidden.”
Praising Abi Hassan’s work, Bestavros said he hoped the day had demonstrated the importance of data science and education. “If you’re a student in political science or sociology or marketing or business or journalism and the whole area is now going to become data-driven, you need to learn at least the basics of data science,” he said. “It’s not something that only computer scientists need to learn. Data touches everything we do.”