Show Me the Data


By Sara Rimer, Photos by Dave Green

First BU Data Science Day Draws Cross-Disciplinary Crowd

Azer Bestavros, director of the Hariri Institute for Computing and Computational Science & Engineering (center), and the co-chairs of Hariri’s first BU Data Science Day, ECE Professor Prakash Ishwar (left) and Dino Christenson (right), were delighted by the large, cross-disciplinary crowd that came together to talk about how data science is transforming research.

Azer Bestavros, founding director of the Rafik B. Hariri Institute for Computing and Computational Science & Engineering, was practically giddy. It was the first BU Data Science (BUDS) Day, and the Photonics Center ninth-floor conference room, where the institute was hosting the event, was standing room only.

“I thought there might be 80, 90 registrants,” said Bestavros, a College of Arts & Sciences professor of computer science and systems engineering, and head of BU’s Data Science Initiative, welcoming the participants with—what else—data. “They told me there were 262. I was shocked—really?”

Not only that, but the data science geeks—faculty and students from physics, mathematics and statistics, computer science, electrical and computer engineering, systems engineering, biostatistics—were there with people from the humanities and the social sciences as well as from CAS, the Questrom School of Business, Sargent College of Health & Rehabilitation Sciences, the College of Engineering, the College of Communication, the School of Social Work, the School of Public Health, the School of Law, and the School of Medicine.

Bestavros had the data. The registrants were from 66 different disciplines, departments, and offices across the University, including the libraries, Information Systems and Technology, and Career Services. It was the sort of diverse, cross-disciplinary crowd that Bestavros and event cochairs Dino P. Christenson, a CAS associate professor of political science and a former Hariri Institute Junior Fellow, and Prakash Ishwar, an ENG associate professor of electrical and computer engineering, had hoped to draw.

“Why are you here?” Bestavros asked his crowd. “Is it because of the whole ‘data science is the sexiest job’ thing? Maybe it’s about how you’re going to make a ton of money. Maybe it’s about the data that’s coming at us and we don’t know what to do with it; we’re drowning in it. Maybe a lot of you are here to figure out how you can float.”

Or maybe they had all come together, on a wintry Friday morning at the end of January 2016, because they knew “that data science has become the common language of all disciplines.” Data science breaks down the walls between disciplines, said Bestavros—“at least we can talk, at least we can all be in the same room.”

For the next seven to eight hours, faculty, students, and staff connected through data science, brainstorming about its possibilities, reporting on how it was transforming their work in an astonishing array of disciplines—physics, neuroscience, health analytics, cancer research, genomics, the social sciences, marketing, law, even art history. They raised big questions: Can a robot learn how to teach physics? How do you know you can trust the data from crowdsourcing? How can you bring all these different networks together with the right information to actually improve people’s lives?

Josée Dupuis (right), an SPH professor of biostatistics, fields an audience question after her talk about using large genomic data sets to understand the genetic architecture of diseases such as type 2 diabetes. Other panel speakers were Jason Bohland (GRS’07) (left), a SAR assistant professor of health sciences and speech, language, and hearing sciences, and Eric Kolaczyk (center), a CAS professor of mathematics and statistics.

A newcomer’s question: “What is data science?” Christenson explained: “Data science is a broad term—perhaps overly broad—used to characterize a number of different fields, including political science, that are interested in the systems and processes for extracting knowledge from data. It uses statistical and computational tools to collect, curate, store, analyze, model, and visualize various types of data.”

Addressing the audience before lunch, Gloria Waters, vice president and associate provost for research, commended Bestavros for the interdisciplinary community of scholars he has assembled at the Hariri Institute. She said that the day’s events, including the talks and the poster sessions, demonstrated “the excellence, the depth of work” in data science at BU. She noted that data science is one of BU’s “research peaks,” an area in which she, President Robert A. Brown, and Provost Jean Morrison are committed to investing and excelling.

“It’s absolutely clear we have world-class faculty in basic science—in math and statistics, computer science, electrical and computer engineering—and faculty who are doing amazing work in applications of data science,” Waters said. Recruiting additional top data science faculty is a primary goal of the Data Science Initiative that Bestavros is leading, she added.

At the event, 12 faculty panel speakers from multiple disciplines spoke for 10 minutes each about how their data-driven research related to one of three broad themes: vision, networks, and health, markets, and policy.

Kicking off the panel focused on vision and visual-data-driven research, Jodi Cranston, a CAS professor of Renaissance art, made the case for small data. “Most scholars in humanities fear big data because it involves technology,” she said. She gave a quick slideshow tour of her Mapping Titian project, an archive and teaching web application that documents the relationship between the artworks of 16th-century Venetian Renaissance artist Titian and their changing locations and historical context (the project was funded, in part, by the Hariri Institute).

“You could think about how movements of artwork are affected by disease, natural disasters, population changes, economic crises, political events,” Cranston said. “Recognizing the potential wide applicability of small data in the humanities helps strengthen the human underlying all humanities research.”


Advances in brain imaging have produced a treasure trove of data for neuroscientists. “I study the brain, and the brain is a great problem for big data because the brain has one billion neurons,” said Michael Hasselmo, a CAS professor of psychological and brain sciences and director of BU’s Center for Systems Neuroscience, beginning his vision talk. Hasselmo explained how he is studying the coding of space and time by neurons in rats as part of his work in understanding memory in humans.

“I’m an algorithms guy,” said another vision panel speaker, Brian Kulis, an ENG assistant professor of electrical and computer engineering, who works on machine learning and big data analysis. Kulis defined machine learning as “a set of tools used to make predictions from data.” These tools are useful in many areas, he said, from driverless cars to robotics.

Margrit Betke, a CAS professor of computer science, uses big data to help visually impaired people with things such as navigating busy intersections on foot, reading medication instruction labels, and setting the temperature control in their apartments. She explained how she and a team of students—with the aid of crowdsourcing—insert tags of text on images on a web page. A visually impaired person “takes a photo of their temperature control, uploads it to the internet, and then some friendly person in the world will type the answer back to them: ‘This is your temperature setting.’”

W. Evan Johnson, a MED associate professor of medicine and biostatistics, underscored the role of team science in his work tracking cancer tumors and drug response in cancer cells. He said he wants to use biological big data to improve what happens in the clinic for patients.

Betke ticked off a few of her other current collaborations: she and Stan Sclaroff, a CAS professor of computer science, are designing a machine-learning text recognition system. She is working on cell tracking with Joyce Wong, an ENG professor of biomedical engineering. She and a team of biologists are tracking and analyzing the behavior of bats in caves in Texas.

Collaboration was the mantra of the day. “We live and die by our collaborations,” said W. Evan Johnson, a MED associate professor of medicine and biostatistics, who underscored the role of team science in his lab’s work in tracking the evolution of cancer tumors and drug response in cancer cells. Some collaborations are more successful than others, he said.

Biostatisticians are after “the best method to do something,” he said. Biologists, on the other hand, “want to be the first person to discover something.” The two goals—best and first—don’t always converge. The key, he said, is to find collaborators who want to contribute their skills to a joint project and who understand what’s in it for everyone involved.

Johnson, whose research falls at the intersection of statistics, computing, biology, and medicine, said his two teenage sons deserved some of the credit for motivating his research. “They get a kick out of telling people, ‘My dad’s a doctor, but not the kind that helps people,’” he said to laughter from the audience. “I’ve made it my goal to do something that helps people,” he went on. “How can we use biological big data to inform and influence how patients are treated in the clinic?”

During a break, Bestavros noted the multitude of ways the speakers managed to collect the data for their research. Michael J. Meurer, a LAW professor, for example, purchases the data he uses to study patent trolls. For his research into the sharing economy, Georgios Zervas (GRS’11), a Questrom assistant professor of marketing, a computer scientist, and a Hariri Junior Faculty Fellow, analyzes publicly available data from sources such as Airbnb and federal, state, and municipal websites.

From schools and colleges across the University, 26 students participated in the data science poster session. Rajita Menon (GRS’18), a doctoral candidate in physics, presented her work on microbial interactions in the human gut.

“It starts with getting the data, cleaning the data, scraping the data,” Bestavros said. “We have to worry about security and privacy, then we have to worry about doing the analytics. We mine it for information that advances our understanding, and we check if our findings make sense. Finally, we have to communicate this in very different ways.”

Speaking of communication, 26 students from colleges and schools across the University—public health, medicine, engineering, business, communication, arts and sciences—participated in the day’s poster session. Sahar Abi Hassan (GRS’19), a doctoral candidate in political science, presented her work on interest groups and the Supreme Court. Abi Hassan said she had been introduced to data science through her department’s required Quantitative Methods 1 course. “From there, I just became fascinated with data science and its great potential for social sciences,” she said. “Working with data allows me to find patterns in political and social phenomena that otherwise would be hidden.”

Commending Abi Hassan’s work, Bestavros said he hoped the day had demonstrated the importance of data science and education. “If you’re a student in political science or sociology or marketing or business or journalism and the whole area is now going to become data-driven, you need to learn at least the basics of data science,” he said. “It’s not something that only computer scientists need to learn. Data touches everything we do.”

A version of this article was originally published in BU Research.