SAIL: A Resource from Literature to Medicine – Hariri Institute lab puts computer expertise to work for researchers

By Joel Brown
Cathie Jo Martin had a problem.
The College of Arts & Sciences professor of political science wanted to compare British and Danish cultural attitudes toward education through the prism of classic literature. She needed to collect and distill data on word use in hundreds of novels written in two languages. But a computer programmer she is not.
Enter SAIL, the Software & Application Innovation Lab at the Rafik B. Hariri Institute for Computing and Computational Science & Engineering, a boutique group of technologists and interns that puts programming expertise at the service of faculty members from across the University, whether that means completing a complex research project or creating a mobile app.
“I just can’t say enough good things about them and this process,” Martin says. “They’ve been unbelievably helpful.”
While the University’s Research Computing Services group within Information Services & Technology provides computing horsepower and a wide variety of specialized services to researchers, the SAIL team works more like a start-up or a close collaborator, offering individual, dedicated service and a fleet-footed approach.
“We have to learn something new every time,” says SAIL director Andre Lapets (GRS’11), the institute’s director of research development, as well as a CAS computer science lecturer. “We might look at something and say, we don’t actually know the technology for this project, but we know that we can learn it, because that’s the culture we cultivate in SAIL.”
The growing SAIL team currently includes three software engineers, a project analyst, and Lapets, as well as several interns. It generally works on half a dozen projects at a time, from both the Medical Campus and the Charles River Campus, and has applied its expertise to about 20 projects since it began operations in March 2015.
Azer Bestavros, founding director of the Hariri Institute and a CAS professor of computer science, created SAIL to fill a specific gap. “There are so many instances where what faculty need to move their research past a certain point is a demonstrative artifact, some demo, some platform,” says Linda Grosser (Questrom’14), the institute’s director of program and project management.
“The idea is that they’re helping to usher BU faculty into the digital age,” says Martin. “I think it was a tall order to usher me into the digital age.”

A terabyte of Dickens and friends

Britain didn’t offer mass public education until 1870, while comparatively small, rural Denmark had it by 1814. What made the difference? Martin’s thesis, boiled down, is that Britons saw education as a benefit to the individual, while Danes saw it as a benefit to society.
Looking for evidence, she turned to popular novels. She knew she was on the right track after a close reading of a handful of British titles like Robinson Crusoe and David Copperfield and equivalent Danish classics such as Niels Klim’s Underground Travels and Only a Fiddler (she speaks the language). But to prove her thesis, she would need to study many more books, which meant computerizing her project.
Last spring, Jack Ammerman, associate University librarian for digital initiatives and open access, steered Martin toward SAIL. Lapets and senior software engineer Frederick Jansen walked her through the initial steps and assigned intern Ben Getchell (CGS’09, CAS’11, GRS’19) to work with her.
The first step was obtaining a corpus of about 300 British novels from the HathiTrust Digital Library and a similar stash from the Royal Library in Copenhagen.
“I was pretty excited about this project. I like the humanities,” says Getchell, who studied philosophy and math as an undergraduate and is now working on a master’s in computer science. “At first it was converting the corpora we had into common text files, parsing XML files and text documents and writing code to extract text and ignore typos.”
With a terabyte of data involved, that took half the summer. Then Martin used computational techniques to choose 200 relevant keywords (e.g., individual, education, freedom, self) and to explore how their use varied across time and country.
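The article does not describe SAIL’s code, but the basic step of measuring how keyword use varied across time and country can be sketched as keyword rates normalized by corpus size. This is a minimal illustration only, assuming each cleaned novel is available as a hypothetical (country, year, text) tuple and that Danish texts would be searched with their own translated keyword list, which the article does not show. The function names and the per-10,000-words normalization are illustrative choices, not SAIL’s actual method.

```python
import re
from collections import Counter, defaultdict

# Hypothetical subset of the ~200 keywords; the full list is not given in the article.
# Danish novels would need a translated keyword list (an assumption, not shown here).
KEYWORDS = {"individual", "education", "freedom", "self"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (Danish letters included)."""
    return re.findall(r"[a-zæøå]+", text.lower())


def keyword_rates(novels, keywords=KEYWORDS, per=10_000):
    """Return {(country, decade): {keyword: occurrences per `per` words}}.

    `novels` is an iterable of (country, year, text) tuples, a hypothetical
    input format standing in for the cleaned corpus files.
    """
    counts = defaultdict(Counter)  # keyword counts per (country, decade)
    totals = Counter()             # total word count per (country, decade)
    for country, year, text in novels:
        key = (country, year // 10 * 10)  # bucket publication years by decade
        tokens = tokenize(text)
        totals[key] += len(tokens)
        counts[key].update(t for t in tokens if t in keywords)
    # Normalize so corpora and decades of different sizes can be compared.
    return {
        key: {kw: counts[key][kw] * per / totals[key] for kw in sorted(keywords)}
        for key in totals
        if totals[key]
    }
```

Normalizing by total word count matters because the British and Danish corpora, and individual decades within them, differ in size; raw counts would largely reflect how many books a period contributed.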
“The techniques we’re using are usually applied to things like tweets or movie reviews, and all from the last five years,” Lapets says. “One of the challenges was how to apply those techniques consistently to large data sets in which language changes over hundreds of years.”
“I think I spent the first month apologizing for not knowing more, and they would just wave their hands and say, ‘Calm down, Cathie,’” Martin says. “They were very supportive, doing a lot of hand-holding and explaining really elementary stuff to me, which I appreciated.”
SAIL’s computations revealed that British works of literature were more likely to associate schools and learning with individualism and freedom than were Danish works, which linked schooling to nation and people. The data, most of it anyway, supported Martin’s thesis, which, she hopes, will eventually become part of a book.
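One simple way to measure whether a body of text “associates” schooling with one idea rather than another is to count which words appear near schooling terms. The sketch below is a hypothetical illustration using a fixed token window; the article does not say which technique SAIL applied, and the target words and window size here are assumptions.

```python
import re
from collections import Counter


def cooccurrence(text, targets, window=10):
    """Count how often each word appears within `window` tokens of any target word.

    `targets` is a hypothetical set of schooling terms (e.g. {"school", "learning"});
    the window size is an illustrative assumption.
    """
    tokens = re.findall(r"[a-zæøå]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in targets:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target occurrence itself
                    counts[tokens[j]] += 1
    return counts
```

Comparing how often, say, “freedom” versus “nation” falls near “school” in each corpus, after normalizing for corpus size, gives a crude version of the British/Danish contrast; methods such as word embeddings refine the same distributional idea.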
On many projects, research grants fund SAIL’s involvement, but when they don’t, there are sometimes Hariri Research Awards. SAIL helped Martin apply for a $25,000 Hariri grant that would, among other things, pay Getchell and poli sci grad student Ozgur Bozcaga (GRS’22) for their hours on the project. She will continue processing data with their help through the fall.
“It was a small grant,” Martin says, “but man, did I get the bang for the buck on this one.”

Helping spinal injury patients

A team led by Alan Jette, a School of Public Health professor of health law, policy, and management and director of the Health & Disability Research Institute, spent more than a decade developing a new measure to assess the functional abilities of people with spinal cord injuries, called the Spinal Cord Injury Functional Index, or SCI-FI for short. SCI-FI is a computerized adaptive test (CAT): a computer algorithm chooses each question based on the patient’s previous answers, so the test is tailored to each individual. It has been tested on more than 1,000 patients over several years, and now the team is ready to make SCI-FI available worldwide.
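The adaptive idea behind a CAT can be sketched with item response theory: keep a running estimate of the patient’s ability, and ask next whichever remaining question is most informative at that estimate. The sketch below assumes a simplified two-parameter logistic (2PL) model with yes/no items, a hypothetical five-item bank, and an expected a posteriori (EAP) estimate computed on a grid. SCI-FI’s actual items, parameters, scoring, and stopping rules are not described in the article, and real functional measures may use multi-category responses instead.

```python
import math

# Hypothetical item bank: (item_id, discrimination a, difficulty b) under a
# 2PL model. These are NOT SCI-FI's real items or parameters.
ITEMS = [("q1", 1.2, -1.5), ("q2", 0.8, -0.5), ("q3", 1.5, 0.0),
         ("q4", 1.0, 0.7), ("q5", 1.3, 1.6)]

GRID = [g / 10 for g in range(-40, 41)]  # ability values from -4.0 to 4.0


def p_yes(theta, a, b):
    """Probability of a 'yes' (able to do the task) at ability theta under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a, b):
    """Fisher information the item provides at ability theta."""
    p = p_yes(theta, a, b)
    return a * a * p * (1.0 - p)


def estimate_theta(responses):
    """EAP ability estimate with a standard normal prior, evaluated on GRID."""
    weights = []
    for t in GRID:
        w = math.exp(-t * t / 2)  # unnormalized N(0, 1) prior
        for (_, a, b), answer in responses:
            p = p_yes(t, a, b)
            w *= p if answer else 1.0 - p  # multiply in each answer's likelihood
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def run_cat(answer_fn, max_items=3):
    """Administer up to max_items questions, each the most informative at the current estimate.

    `answer_fn` is a hypothetical callback that takes an item id and returns True/False.
    """
    remaining = list(ITEMS)
    responses = []
    theta = 0.0  # start at the prior mean
    for _ in range(max_items):
        item = max(remaining, key=lambda it: information(theta, it[1], it[2]))
        remaining.remove(item)
        responses.append((item, answer_fn(item[0])))
        theta = estimate_theta(responses)  # update after every answer
    return theta, [item[0] for item, _ in responses]
```

Because each question is picked for the patient’s current estimate, two patients can see different questions and still be placed on the same scale, which is what lets a CAT stay short without losing precision.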
Mary Slavin (SAR’81), director of education at the Health & Disability Research Institute, is working to convert SCI-FI from a desktop software package to a web-based version so clinicians and researchers can access it easily. She found that vendors who could do the conversion were expensive, a roadblock for a measure with such a limited market. She began to look for help within BU, and she found SAIL. “They immediately understood what we needed to do,” she says.
SCI-FI “has an interesting history and a maybe even more interesting future,” says Jansen. “We have to talk about rolling this out in actual hospitals and having clinicians use it, with the idea that this would become the national standard of tracking progress for people with spinal cord injuries.”
The SAIL team is moving SCI-FI to the web and making the test software modular so it can be easily adapted for other conditions. Plans are in the pipeline to adapt the web-based SCI-FI platform for a measure to assess people with burn injuries, among other applications.
“I think having an in-house, high-tech consulting company, if I can call them that, that can meet the needs of faculty is really critical,” Slavin says. “Having a group that can customize solutions with new technology that is project-specific and not within the expertise of the researchers is critical. Without them, we can only bring this innovation so far.”
Lapets says there is another thing that the SAIL team has in common with a start-up: their hours.
“We’re usually here until pretty late in the evening,” he says. “They’re very long days, but it’s exciting, and I think everyone in the staff here feels like they’re investing in something that will be very valuable.”