MSDS Curriculum
At Boston University’s Faculty of Computing & Data Sciences (CDS), the MS in Data Science (MSDS) program equips students with advanced analytical, computational, and problem-solving skills. Grounded in real-world application and interdisciplinary collaboration, the curriculum prepares graduates to lead in data science, AI, and emerging technology fields.
The 32-credit program is designed with flexibility in mind, allowing students to pursue academic or professional career paths and complete the degree in as little as nine months (two semesters). Students choose between a Core Methods Focused Concentration and an Applied Methods Focused Concentration, tailoring their studies to their interests and goals. In addition to core and concentration coursework, the program offers the option to extend learning through a summer internship or a master’s thesis course—enabling completion over 16 months. Please note that the summer internship course is only available to students completing the program in 16 months. All students begin the program in September; a spring entry term is not offered.
Requirements
Eight semester courses (32 credits) approved for graduate study are required.
Course requirements include 5 competency courses, with at least one in each of the following areas:
- A1 Modeling and Predictive Analytics
- A2 Data-Centric Computing
- A3 Machine Learning and AI
- A4 Social Impact
- A5 Security and Privacy
Plus 3 additional courses:
- CDS DS 701 Tools for Data Science (Must be taken in the Fall Semester)
- Concentration Elective 1
- Concentration Elective 2
CDS DS 701: Tools for Data Science
The goal of the course is to give students exposure to, and practical experience in, formulating data science questions – particularly learning how to ask good questions in a specific domain. The course covers methods of obtaining data and common methods of processing data from a practical standpoint. It is organized around a semester-long group project in which students are placed into teams and engage with “clients” who bring data science questions from a particular domain.
Competency Courses
Below is only a sample list of courses. The actual course list varies each semester. Once enrolled, students will receive an updated list of the available courses that semester.
A1 Modeling and Predictive Analytics
- A1 Modeling and Predictive Analytics - Covers the formulation of statistical models to describe data, methods for fitting models to data, and use of models for prediction and inference. Approved courses that fall into this category are:
CDS DS 644 ML for Business Analytics
This course develops your ability to solve real-world problems using machine learning. Through hands-on application in Python, you will learn to build, evaluate, and interpret predictive models to derive meaningful insights from data. By the end of this course, you will be able to:
- Apply core machine learning algorithms, including regression, lasso, ridge, decision trees, and ensemble methods to appropriate problems.
- Implement essential techniques like cross-validation and regularization to build robust and reliable models.
- Analyze and address the ethical challenges in machine learning, such as data bias and algorithmic fairness, in practical scenarios.
- Execute an end-to-end predictive modeling project, from data preparation to presenting actionable findings.
CAS MA 568 Statistical Analysis of Point Process Data
Introduces the theory of point processes and develops practical problem-solving skills to construct models, assess goodness-of-fit, and perform estimation from point process data. Applications to neural data, earthquake analysis, financial modeling, and queuing theory.
CAS MA 575 Linear Models
Topics to be covered include simple and multiple linear regression, regression with polynomials or factors, analysis of variance, weighted and generalized least squares, transformations, regression diagnostics, variable selection, and extensions of linear models.
CAS MA 576 Generalized Linear Models
Covers topics in linear models beyond MA 575: generalized linear models, analysis of binary and polytomous data, log-linear models, multivariate response models, non-linear models, graphical models, and relevant model selection techniques. Additional topics in modern regression as time allows.
CAS MA 578 Bayesian Statistics
The principles and methods of Bayesian statistics. Subjective probability, Bayes rule, posterior distributions, predictive distributions. Computationally based inference using Monte Carlo integration, Markov chain simulation. Hierarchical models, mixture models, model checking, and methods for Bayesian model selection.
CAS MA 581 Probability
Basic probability, conditional probability, independence. Discrete and continuous random variables, mean and variance, functions of random variables, moment generating function. Jointly distributed random variables, conditional distributions, independent random variables. Methods of transformations, law of large numbers, central limit theorem.
CAS MA 582 Mathematical Statistics
Point estimation including unbiasedness, efficiency, consistency, sufficiency, minimum variance unbiased estimator, Rao-Blackwell theorem, and Rao-Cramer inequality. Maximum likelihood and method of moment estimations; interval estimation; tests of hypothesis, uniformly most powerful tests, uniformly most powerful unbiased tests, likelihood ratio test, and chi-square test.
CAS MA 583 Introduction to Stochastic Processes
Basic concepts and techniques of stochastic processes as they are most often used to construct models for a variety of problems of practical interest. Topics include Markov chains, Poisson process, birth and death processes, queuing theory, renewal processes, and reliability.
CAS MA 585 Time Series and Forecasting
Autocorrelation and partial autocorrelation functions; stationary and nonstationary processes; ARIMA and Seasonal ARIMA model identification, estimation, diagnostics, and forecasting. Modeling financial data via ARCH and GARCH models. Volatility estimation; additional topics, including long-range dependence and state-space models.
CAS MA 589 Computational Statistics
Topics from computational statistics that are relevant to modern statistical applications: random number generation, sampling, Monte Carlo methods, computational inference, MCMC methods, graphical models, data partitioning, and bootstrapping. Emphasis on developing solid conceptual understanding of the methods through applications.
CAS MA 592 Intro to Causal Inference
Concepts and methods for causal inference. You may have heard "association does not imply causation." But, what implies causation? In this course, we study how to estimate causal effects from data. We cover both experimental and non-experimental settings.
CDS DS 722 Mathematical Foundations of Data Science and Machine Learning
Mathematical Foundations of Data Science and Machine Learning equips you with the essential mathematical tools and concepts needed for future DS courses. This course is intended for first-year graduate students who need to refresh or expand their mathematical knowledge. The course covers essential topics on Linear Algebra, Optimization, as well as Probability and Statistics.
QST BA 830 Business Experimentation and Causal Methods
This course teaches students how to measure impact in business situations and how to evaluate others' claims of impact. We will draw on a branch of statistics called causal inference that studies when data can be used to measure cause and effect. The course will begin by discussing randomized controlled trials, the most reliable way of measuring effects, and will move on to other methods that can be used when experiments are not feasible or unavailable. We will learn how to implement these methods in Python. Causal inference has become especially important for digital businesses because they are often able to run experiments and to harness 'big data' to make decisions. We will illustrate the methods we learn with examples drawn from digital businesses such as Airbnb, eBay, and Uber and through topic areas such as price targeting, balancing digital marketplaces, reputation systems, measuring influence in social networks, and algorithmic design. We will also use data from other business and social science applications.
A2 Data-Centric Computing
- A2 Data-Centric Computing - Covers the algorithmic and programming techniques and system designs for processing, analysis, and management of data at scale. Approved courses that fall into this category are:
CDS DS 522 Stochastic Methods for Algorithms
Application of stochastic process theory to design and analyze algorithms used in statistics and machine learning, especially Markov chain Monte Carlo and stochastic optimization methods. Emphasizes connecting theoretical results to practice through combination of proofs, numerical experiments, and expository writing.
CDS DS 591 Engineering for Big Data Workloads
This course is designed for graduate students and advanced undergraduates pursuing careers in data science, computer science, or related fields. Today’s organizations increasingly rely on big data engineering to achieve strategic goals, optimize operations, and gain a competitive edge. This course will not only provide students with comprehensive technical knowledge in designing big data solutions but will also emphasize the critical role of business acumen in engineering. Students will explore how big data can drive business transformation by learning to deploy, optimize, and scale data-driven solutions. By the end of the course, students will have worked on a real-world project, applying the concepts covered to create solutions that meet both technical specifications and business objectives. This approach prepares students to contribute effectively in professional environments where technical solutions must align with organizational strategies, costs, and customer value.
CDS DS 551 Data Engineering at Scale
Welcome to "Data Engineering at Scale," a course designed to immerse you in the fascinating world of large-scale data management, processing, and analytics. Throughout this course, we will focus on a mythical but powerful application called the "Epidemic Engine". This application gathers information about potential health events, aggregates this data, publishes it in diverse ways, and ultimately attempts to predict epidemics. The Epidemic Engine, while hypothetical, embodies the principles and challenges of real-world data engineering systems that power today's most innovative technologies, from social networks to streaming platforms to cutting-edge AI research.
CDS DS 595 Special Topics in Physical and Engineering Sciences - AI for Science
The goal of the course is to equip students with the tools necessary to understand and carry out research at the forefront of AI and the natural sciences. Prerequisites: Multivariable calculus, linear algebra, probability theory; familiarity with neural networks and deep learning frameworks (PyTorch or JAX); proficiency in Python. Exemplary topics:
- Preliminaries: the AI4Science landscape, core ML concepts, automatic differentiation, Bayesian statistics, simulators, common scientific data modalities
- Scientific computing infrastructure: data management, compute accelerators, benchmarking and evaluation, reproducibility
- Bayesian inference: MCMC and variational methods
- Generative modeling (e.g., diffusion models) and surrogate models
- Differentiable programming for scientific computing
- Neural network building blocks: encoding scientific inductive biases
- Neural ODEs and operator learning
- Uncertainty quantification
- Interpretability and symbolic regression
- Foundation models and LLMs for scientific applications
- Case studies from across the natural sciences
CAS CS 561 Data System Architectures
Discusses the design of data systems that can address the modern challenges of managing and accessing large, ever-growing, diverse sets of data, often streaming from heterogeneous sources, in the context of continuously evolving hardware and software. We use examples from several data management areas including relational systems, distributed database systems, key-value stores, NewSQL and NoSQL systems, data systems for machine learning (and machine learning for data systems), interactive analytics, and data management as a service.
ENG EC 503 Introduction to Learning from Data
This is an introductory graduate course in (classical) machine learning covering the basic principles and methods of four major non-sequential supervised and unsupervised learning problems namely, classification, regression, clustering, and dimensionality reduction. A variety of contemporary applications will be explored through homeworks and a project.
ENG EC 528 / CAS CS 528 Cloud Computing
Fundamentals of cloud computing covering IaaS platforms, OpenStack, key Big Data platforms, and data center scale systems. Examines influential publications in cloud computing. Culminates in a group project supervised by a mentor from industry or academia.
GRS CS 660 Graduate Introduction to Database Systems
Graduate introduction to database management systems. Examines entity-relationship, relational, and object-oriented data models; commercial query languages: SQL, relational algebra, relational calculus, and QBE; file organization, indexing and hashing, query optimization, transaction processing, concurrency control and recovery, integrity, and security.
CAS MA 539 Methods of Scientific Computing
An introduction to topics including computational linear algebra, solutions of linear equations, numerical integration and solution of differential equations, finite element methods, and methods of stochastic simulation (i.e., Monte Carlo methods).
A3 Machine Learning and AI
- A3 Machine Learning and AI - Covers supervised, unsupervised, and reinforcement learning methods applied to structured and unstructured data. Approved courses that fall into this category are:
CDS DS 542 Deep Learning for Data Science
In this course, students will gain an understanding of the fundamentals of deep learning and then apply those concepts in exercises and applications in Python. We'll start with the origins of artificial neural networks, learn about loss functions, and understand gradient descent, backpropagation, and various training optimization techniques. Students will become familiar with canonical network architectures such as multi-layer perceptrons, convolutional neural networks, recurrent neural networks, LSTMs and GRUs, attention, and transformers. Through explanations, examples, and exercises, students will build intuition on how deep learning algorithms work and how they are implemented in popular deep learning frameworks such as PyTorch. Students will be able to define, train, and evaluate deep learning models as well as adapt deep learning frameworks to new functionality. Students will gain exposure to pre-trained large language models and other foundation models and the concepts of few-shot learning and reasoning. Finally, students will be able to apply many of the techniques they learned in a final class project.
CAS CS 542 Principles of Machine Learning
Introduction to modern machine learning concepts, techniques, and algorithms. Topics include regression, kernels, support vector machines, feature selection, boosting, clustering, hidden Markov models, and Bayesian networks. Programming assignments emphasize taking theory into practice, through applications on real-world data sets.
CDS DS 592 Special Topics: Intro to Sequential Decision Making
This course introduces the study, design, and analysis of algorithms for sequential decision making, with a particular focus on bandit algorithms and other topics in statistical learning theory. Designed for upper undergraduate and graduate students, the course covers foundational concepts and cutting-edge research in multi-armed bandits, linear bandits, and contextual bandits. Students will gain an understanding of fundamental algorithmic principles in sequential decision making, such as optimism and multiplicative weights, as well as bandit algorithms such as UCB, EXP3, and OFUL. Additionally, the class will cover bandit problems in the general function approximation regime via the study of algorithms such as SquareCB and statistical dimensions for function approximation, including the eluder dimension, dissimilarity dimension, and decision estimation coefficient. Finally, the course will also explore miscellaneous yet essential topics such as online model selection and offline estimation. Through a combination of theoretical insights and practical applications, students will gain a comprehensive understanding of how to design, analyze, and implement algorithms for sequential decision-making tasks.
CDS DS 593 Special Topics in Data Science Methodologies - Theory and Applications of Large Language Models
In this course, students will become savvy consumers and sophisticated developers of LLMs and related tooling. We will start by orienting ourselves to the history of natural language processing and the current state of AI tools, which students will learn to critically evaluate and use extensively throughout the course. Students will develop a deep intuition for LLM concepts including attention and transformer architectures, sampling, and search, and will build small models from scratch. They will then apply pre-trained LLMs to solve real-world problems, working with advanced techniques including fine-tuning, prompt engineering, RAG, and AI agents. Throughout, the course emphasizes bias, safety, and responsible deployment. Through reflections, labs, and projects, students will demonstrate learning and develop a professional portfolio.
CAS CS 505 Introduction to Natural Language Processing
Natural language processing (NLP) is a field of AI which aims to equip computers with the ability to intelligently process natural (human) language. This course explores statistical and machine learning techniques for the automatic analysis of natural language data.
CAS CS 523 Deep Learning (We recommend CDS DS 542 over CS 523 unless you have time conflicts)
Mathematical and machine learning background for deep learning. Feed-forward networks. Backpropagation. Training strategies for deep networks. Architectures such as convolutional, recurrent, and transformer networks. Deep reinforcement and unsupervised learning. Exposure to modern programming tools and libraries. Other recent topics, time permitting.
CAS MA 615 DS in R
Introduction to R, the computer language written by and for statisticians. Emphasis on data exploration, statistical analysis, problem solving, reproducibility, and multimedia delivery.
CDS DS 543 Introduction to Reinforcement Learning
This course aims to present a math-lite introduction to reinforcement learning. We will cover (1) the basics of Markov Decision Processes, (2) primary algorithmic paradigms including model-based, value-based, and policy-based learning, and (3) modern challenges and open problems in RL.
GRS CS 640 Artificial Intelligence
A4 Social Impact
- A4 Social Impact - Covers considerations of the social implications from the deployment of data science and AI systems, including issues of ethics, fairness, and bias. Approved courses that fall into this category are:
CDS DS 587 DS in Human Contexts
Where do statistical and computational insights lose historic social contexts? What are the impacts of datafication on individuals and communities? How do social and technical systems reify or challenge social hierarchies? Through a survey of academic literature, community-produced knowledge and coverage of technology in the popular press, this course will explore these themes as they relate to labor and automation, surveillance and the legal system, social media governance, and digital inclusion.
CDS DS 680 Data Science, Society, and Ethics
This course develops students' ability to critically examine and question the interplay between artificial intelligence (AI), data science, and computational technologies on the one hand, and society and public policy on the other. Students will complete exercises to demonstrate their facility with key ethics tools and techniques, and analyze a series of real-world case studies presented alongside ethical tools and analyses that are useful both for staying alert to emerging ethical challenges and responding to them as they arise in both employment settings and everyday life.
CDS DS 684 AI Ethics
This course addresses questions about AI, data science, society, and ethics through a series of real-world case studies, lectures on ethical tools and analytical methods, and a semester-long project in applied AI ethics in partnership with nonprofit and for-profit businesses focusing on an ethics benchmark for chat-based AI systems. This course design is intended to help students stay alert to emerging ethical challenges and respond to them as they arise in both employment settings and everyday life.
A5 Security and Privacy
- A5 Security and Privacy - Covers methods and algorithms that protect user privacy, guarantee information security, and assess system. Approved courses that fall into this category are:
CAS CS 538 Fundamentals of Cryptography
Basic algorithms to guarantee confidentiality and authenticity of data. Definitions and proofs of security for practical constructions. Topics include perfectly secure encryption, pseudorandom generators, RSA and ElGamal encryption, Diffie-Hellman key agreement, RSA signatures, secret sharing, and block and stream ciphers.
ENG EC 521 Cybersecurity
Fundamentals of security related to computers and computer networks. Laws and ethics. Social engineering and psychology-based attacks. Information gathering, network mapping, service enumeration, and vulnerability scanning. Operating system security related to access control, exploits, and disk forensics. Shellcoding. Wired and wireless network security at the physical, network, and application layers. Theoretical lessons are augmented with case studies and demonstrative experimental labs.
CAS CS 595 Blockchains and Their Applications
Blockchain technology amalgamates technical tools, economic mechanisms, and system design patterns. It facilitates the construction of information systems with novel combinations of robustness, decentralization, privacy, cost, and flexibility. Beyond their initial use in cryptocurrencies such as Bitcoin, blockchains have become a promising and powerful technology in business, financial services, law, and other areas. This course covers blockchain technology in a comprehensive, systematic, and interdisciplinary way. It surveys major approaches, variants, and applications of blockchains in these areas. Beyond a solid grasp of the principles, the course aims to build familiarity with practice through numerous case studies and hands-on projects. To facilitate its interdisciplinary perspective, this course will be open to two categories of students: students with Computer Science background (graduate or advanced undergraduate), and graduate students with a substantial Business or Law background and a working knowledge of computer programming. Projects will be done in heterogeneous teams combining these categories, and will center on devising and analyzing sample applications of blockchain technology, including both prototype implementations and analysis of its business/legal implications. Topics covered: disentangling 'blockchain'; cryptographic prerequisites; assets and their representations; on-chain programming; state consensus; deployments; decentralized applications (Dapps/Web3); protocol governance; protocol revenue and business models; market structure; privacy and authorization; regulation.
CDS DS 653 Crypto for DS
This course investigates techniques for performing trustworthy data analyses without a trusted party, and for conducting data science without data. The first half of the course investigates cryptocurrencies, the blockchain technology underpinning them, and the incentives for each participant, while the second half of the course focuses on privacy and anonymity using advanced tools from cryptography. The course concludes with a broader exploration into the power of conducting data science without being able to see the underlying data.
CAS CS 548 Advanced Cryptography
Advanced techniques to preserve confidentiality and authenticity against active attacks, zero-knowledge proofs; Fiat-Shamir signature schemes; non-malleable public-key encryption; authenticated symmetric encryption; secure multiparty protocols for tasks ranging from Byzantine agreement to mental poker to threshold cryptography.
CDS DS 593 Privacy-Conscious Computer Systems
Are you worried about web services abusing your data or someone observing your web search history? Do you wrap your phone (or yourself) in tin foil to protect your information? Do you wish to live in a world where web applications are privacy-conscious? Join us while we learn about the computer systems that power up the web and how we can redesign them to better protect users’ privacy. We will look at state of the art datacenter systems for GDPR compliance, compilers and runtimes for automatic policy enforcement, and cryptographic tools for private machine learning and web search. Along the way, we will learn about different normative perspectives on privacy, how users perceive it, and what challenges data scientists and web developers face while striving to achieve it!
Core Methods Concentration
One CDS DS 701: Tools for Data Science course plus two courses from any of the following Group A areas (see above):
A1 - Modeling and Predictive Analytics
A2 - Data-Centric Computing
A3 - Machine Learning and AI
Applied Methods Concentration
One CDS DS 701: Tools for Data Science plus two courses from the approved list of applied methods courses (all courses below are 4 credits unless otherwise noted). Students are free to take any two courses from the entire list below. Some courses naturally form pathways but pathways are not a requirement; students may mix and match across applied areas.
Data Science Pathway:
CDS DS 719: Data Science Product Management
CDS DS 593: Special Topics in Data Science Methodologies - Data Engineering at Scale (Might change to other competencies)
GRS CS 630: Graduate Algorithms
Business Pathway:
QST BA 860: Marketing Analytics
QST BA 870: Financial Analytics
QST BA 875: Operations and Supply Chain Analytics
QST BA 880: People Analytics
QST BA 815: Competing with Analytics
QST BA 843 / QST IS 843: Big Data Analytics for Business
QST BA 878: Machine Learning and Data Infrastructure in Health Care
Computational Biology Pathway:
CDS DS 526: Critical Reading in Biological Data Science
CDS DS 630: Intro to Bioinformatics and Computational Biology
CDS DS 596: Special Topics in Natural, Biological and Medical Science (Might change to other competencies)
ENG BE 562: Computational Biology: Machine Learning Fundamentals
ENG BF 527: Applications in Bioinformatics
ENG BF 768: Biological Database Systems
Social Technical Pathway:
CDS DS 680: AI Ethics
CDS DS 587: Data Science in Human Context
Security and Privacy Pathway:
CDS DS 653: Cryptography for Data Science
CDS DS 593: Privacy-Conscious Computer Systems
CDS DS 593: Special Topics in DS Methodologies - Privacy in Practice (Might change to other competencies)
CAS CS 538: Fundamentals of Cryptography
CAS CS 548: Advanced Cryptography
ENG EC 521: Cybersecurity
