Jonathan Huggins: Computation, Statistics, and Real-World Problems

Jonathan Huggins joined BU in 2020 after a Postdoctoral Research Fellowship at Harvard’s Department of Bioinformatics. During the past few years, he has been busy researching, giving talks, writing papers, and, of course, teaching! A Data Science Faculty Fellow, he participates in various collaborative efforts that go beyond his teaching and research agendas. Recently, he also co-created Massachusetts COVID Vaccination Help, which helped secure thousands of COVID vaccine appointments for people across the state.

His research focuses on the development of robust, reliable machine learning and AI methods that are scalable to real-world problems, big datasets and complex models. He aims to create algorithms with trustworthy accuracy guarantees.

Hear what he has to say about his background and current work below.


Tell us about your current position at Boston University.

I’m an Assistant Professor in the Department of Mathematics & Statistics. So, I teach undergraduate and graduate classes, and do research, including publishing papers and advising graduate students on their research.

Can you talk about your current research? Is there a specific problem you are working on right now? 

My research centers on the development of fast, trustworthy machine learning and AI methods that balance the need for computational efficiency and the desire for statistical optimality with the inherent imperfections that come from real-world problems, large datasets, and complex models. I’m particularly interested in generalizations of Bayesian inference that provide correct uncertainty quantification when the usual assumptions required to use Bayesian methods do not hold. An interesting theme in my recent work is that good computational and statistical properties often go together rather than being in competition. For example, in a preprint that will be appearing soon, we’ve shown that using an algorithm called stochastic gradient Langevin dynamics (SGLD) can lead to more statistically robust uncertainty quantification. SGLD was originally proposed as a fast heuristic to draw approximate samples from the usual Bayesian posterior. Most work on SGLD has tried to show that it can provide a good approximation (perhaps after tweaking the algorithm). But in practice SGLD doesn’t approximate the posterior well – and that’s OK! Because by leveraging the “stochastic gradient” part of the algorithm – which is what makes SGLD fast – we can also get statistical robustness.
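For readers unfamiliar with SGLD, here is a minimal sketch of the basic update, written for a toy problem: inferring the mean of Gaussian data with known unit variance. It is an illustration only, not the method from the preprint mentioned above. The function name, the toy model, the prior variance, and the step size are all assumptions chosen for the example. Each step estimates the gradient of the log posterior from a small random batch (the "stochastic gradient" part) and adds Gaussian noise scaled to the step size (the "Langevin" part).

```python
import numpy as np


def sgld_gaussian_mean(data, n_steps=5000, batch_size=32,
                       step_size=1e-4, prior_var=100.0, seed=0):
    """Toy SGLD sampler for the mean of unit-variance Gaussian data.

    Hypothetical example: the model is x_i ~ Normal(theta, 1) with prior
    theta ~ Normal(0, prior_var). All defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Draw a minibatch without replacement.
        batch = rng.choice(data, size=batch_size, replace=False)
        # Gradient of the log prior for Normal(0, prior_var).
        grad_log_prior = -theta / prior_var
        # Minibatch gradient of the log likelihood, rescaled to the full data size.
        grad_log_lik = (n / batch_size) * np.sum(batch - theta)
        # Langevin step: half-step along the gradient plus Normal(0, step_size) noise.
        noise = np.sqrt(step_size) * rng.normal()
        theta = theta + 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise
        samples[t] = theta
    return samples
```

With a fixed step size and small batches, the spread of the samples generally does not match the exact posterior spread, because the minibatch gradient adds its own noise. That gap is the behavior the answer above refers to when it says SGLD does not approximate the posterior well in practice.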

What motivated you to pursue the field you work in?

I’ve been broadly interested in AI and machine learning since I was a kid. I was lucky enough to work with an MIT graduate student doing robotics research one summer when I was in high school. It was a wonderful experience that got me interested in doing research and in academia. But I also learned I didn’t want to deal with actual robots – they are such a pain! After that I did an internship where I worked on natural language processing. That’s where I got introduced to probabilistic machine learning and I’ve never looked back.

How were you introduced to data science?

My internship working on natural language processing was my first introduction to what we now might call data science. 

How do you implement data science into your work?

My research is really about developing new methods and theories for data science. My applied data science work is focused on methods to enable more effective scientific discovery from high-throughput and multi-modal genomic data, with a focus on cancer genomics. 

How do you incorporate computer science into the courses you teach?

Data science is all about bringing many disciplines together for the successful analysis of data. I work at the intersection of two of them, computer science and statistics, so I always try to incorporate both into the courses I teach. It’s not enough to just learn about data science methods and the theory behind them. You also need to be able to implement them efficiently, and that’s what computer science is all about. For example, in Spring 2022 I’ll teach a seminar course tentatively titled “From Stochastic Processes to Algorithms and Back.” As the name suggests, it’s both a computer science course and a math & statistics course. Students will learn to use tools from probability theory and statistics to design and analyze algorithms, then implement those algorithms to confirm they perform as expected on real data.

What is a goal you hope to accomplish by the end of the upcoming academic year?

I hope to have helped all of the graduate students I advise successfully publish their first papers.

What is your role in CDS? Is there anything you are looking forward to within the role, moving forward?

I’m a Founding Member of CDS, which means I carry out the usual duties of a department faculty member, such as attending faculty meetings, voting on proposals, and advising graduate students. In addition, I’ve become involved in helping to develop CDS’s new academic program. This summer I worked with a small group of CDS faculty to design some of the core undergraduate courses for the CDS major. And, going forward, I will be a member of the Academic Policy Committee for CDS, which will provide oversight of the quality of the academic programs in CDS and suggest changes to programs as needed. I’m very excited about this new role and the unique opportunity it offers to help shape the many exciting and innovative academic programs CDS is developing.