Boston University’s Sarah Milligan Launches First Comprehensive R Package Bringing Variational Autoencoders to Tabular Data

By Hariri Institute Staff

Sarah Milligan, a 2025 Hariri Institute Graduate Student Fellow and PhD candidate in Biostatistics at the BU School of Public Health, has developed an R package that brings Variational Autoencoders (VAEs) to tabular data. It is the first comprehensive tool of its kind available in the R programming language.

VAEs are a type of machine learning model typically used for images or text, but Milligan’s work adapts them to the structured data found in spreadsheets and databases, such as medical records. The package lets researchers fit VAEs to tabular data to uncover patterns and generate synthetic datasets, making it easier to share and analyze data securely while accelerating research.

The package, AutoTab, provides flexible regularization options, supports β-VAE training via warm-up or cyclical annealing, introduces a new method for weighting distributional types within the evidence lower bound (ELBO), the objective a VAE is trained to maximize, and offers numerous additional customization features, all achievable in as few as three lines of code.
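For readers unfamiliar with the annealing strategies mentioned above, the following is a minimal, generic sketch of how a β weight on the KL term of a VAE loss might be scheduled across training epochs. It is written in Python purely for illustration and is not AutoTab's API or implementation. The function names (`warmup_beta`, `cyclical_beta`, `weighted_loss`) and all parameters are hypothetical, and the linear ramp shape is an assumption.

```python
def warmup_beta(epoch, warmup_epochs, beta_max=1.0):
    """Linear warm-up: beta rises from 0 to beta_max, then stays there.

    Hypothetical helper for illustration; not part of AutoTab.
    """
    if warmup_epochs <= 0:
        return beta_max
    return beta_max * min(1.0, epoch / warmup_epochs)


def cyclical_beta(epoch, cycle_length, ramp_fraction=0.5, beta_max=1.0):
    """Cyclical annealing: beta ramps up at the start of each cycle,
    holds at beta_max, then resets to 0 when the next cycle begins.

    Hypothetical helper for illustration; not part of AutoTab.
    """
    position = (epoch % cycle_length) / cycle_length  # in [0, 1)
    return beta_max * min(1.0, position / ramp_fraction)


def weighted_loss(reconstruction_loss, kl_divergence, beta):
    """Beta-VAE style loss: reconstruction term plus beta-weighted KL term."""
    return reconstruction_loss + beta * kl_divergence


if __name__ == "__main__":
    # Print both schedules side by side for the first 12 epochs.
    for epoch in range(12):
        print(epoch,
              round(warmup_beta(epoch, warmup_epochs=10), 2),
              round(cyclical_beta(epoch, cycle_length=6), 2))
```

In both schedules, a small β early in training lets the model concentrate on reconstruction before the KL penalty takes full effect. The cyclical variant repeats that ramp several times over the course of training.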

By combining biostatistics with cutting-edge machine learning, this open-source tool introduces new statistical methods to the field and supports reproducible research. It is now available on the Comprehensive R Archive Network (CRAN), opening these powerful capabilities to R users, data scientists, and researchers across disciplines.

Access the package here: https://cran.r-project.org/web/packages/autotab/index.html