{"id":10431,"date":"2023-10-26T10:50:15","date_gmt":"2023-10-26T14:50:15","guid":{"rendered":"https:\/\/www.bu.edu\/cds-faculty\/?p=10431"},"modified":"2025-02-28T05:43:26","modified_gmt":"2025-02-28T10:43:26","slug":"machine-learning-symposium","status":"publish","type":"post","link":"https:\/\/www.bu.edu\/cds-faculty\/2023\/10\/26\/machine-learning-symposium\/","title":{"rendered":"Four-Week Machine Learning Symposium, Fall 2024"},"content":{"rendered":"<p><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-636x358.jpg\" alt=\"\" width=\"636\" height=\"358\" class=\"alignnone size-medium wp-image-14393\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-636x358.jpg 636w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-1024x576.jpg 1024w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-768x432.jpg 768w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-1536x864.jpg 1536w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-1200x675.jpg 1200w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-992x558.jpg 992w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-1500x844.jpg 1500w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2-300x169.jpg 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Fall-2024-Main-2.jpg 1600w\" sizes=\"(max-width: 636px) 100vw, 636px\" \/><\/p>\n<p><span>The <strong>CDS Machine Learning Symposium,<\/strong> developed by\u00a0Assistant Professors <\/span><span><a href=\"https:\/\/www.bu.edu\/cds-faculty\/profile\/aldo-pacchiano\/\">Aldo Pacchiano<\/a> and <a href=\"https:\/\/www.bu.edu\/cds-faculty\/profile\/xuezhou-jack-zhang\/\">Xuezhou Zhang<\/a>, and <\/span><span>hosted by the Faculty of Computing and Data Sciences, brings together leading scholars in machine learning to delve into the cutting-edge developments and foundational technologies 
shaping the field of machine learning. By uniting experts from various technical disciplines, including algorithmic design, model architectures, and optimization techniques, the symposium aims to illuminate the latest advancements and challenges in core machine learning methodologies.<\/span><\/p>\n<hr \/>\n<h3><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.16.42-AM-150x150.png\" alt=\"Kiante Brantley, Assistant Professor, Harvard University\" width=\"150\" height=\"150\" class=\"alignright wp-image-14808 size-thumbnail\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.16.42-AM-150x150.png 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.16.42-AM-300x300.png 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.16.42-AM-100x100.png 100w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.16.42-AM.png 430w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/>Past Talks Fall 2024<\/h3>\n<h3>Efficient Policy Optimization Techniques for LLMs<\/h3>\n<h3><span>Kiante Brantley, Assistant Professor, Harvard University<\/span><\/h3>\n<p>Date: Friday, December 6, at 10:00 AM<\/p>\n<p><strong>Location:<\/strong> CDS 1646<\/p>\n<p>Post-training is essential for enhancing large language model (LLM) capabilities and aligning them to human preferences. One of the most widely used post-training techniques is reinforcement learning from human feedback (RLHF). In this talk, I will first discuss the challenges of applying RL to LLM training. Next, I will introduce RL algorithms that tackle these challenges by utilizing key properties of the underlying problem. Additionally, I will present an approach that simplifies the RL policy optimization process for LLMs to relative reward regression. 
Finally, I will extend this idea to develop a policy optimization technique for multi-turn reinforcement learning from human feedback.<\/p>\n<p>Kiant\u00e9 Brantley is an Assistant Professor in the Kempner Institute and School of Engineering and Applied Sciences (SEAS) at Harvard University. He completed his Ph.D. in Computer Science at the University of Maryland College Park, advised by Dr. Hal Daum\u00e9 III. After graduating, he completed his postdoctoral studies at Cornell University, working with Thorsten Joachims. His research focuses on problems at the intersection of machine learning and interactive decision-making, with the goal of improving the decision-making capabilities of foundation models. He has received several awards with his colleagues, including spotlight talks at ICLR 2023 and ICLR 2019. He has also received multiple fellowships, including the NSF LSAMP BD Fellowship and the NSF CI Fellow Postdoctoral Fellowship. In his spare time, he enjoys playing sports; his favorite sport at the moment is powerlifting.<a href=\"https:\/\/www.eventbrite.com\/e\/fall-2024-machine-learning-symposium-tickets-1042083659277?aff=oddtdtcreator\"><\/a><\/p>\n<hr \/>\n<h3><\/h3>\n<h3>Synthetic Potential Outcomes and Causal Mixture Identifiability<\/h3>\n<h3><span><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.08.51-AM-150x150.png\" alt=\"Bijan Mazaheri, Postdoctoral Associate, Broad Institute of MIT and Harvard University; Incoming Assistant Professor, Dartmouth\" width=\"150\" height=\"150\" class=\"alignright wp-image-14809 size-thumbnail\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.08.51-AM-150x150.png 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.08.51-AM-300x300.png 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.08.51-AM-100x100.png 100w, 
https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.08.51-AM.png 430w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/>Bijan Mazaheri, Postdoctoral Associate, Broad Institute of MIT and Harvard University; Incoming Assistant Professor, Dartmouth<\/span><\/h3>\n<p>Date: Friday, November 22, at 2:00 PM<\/p>\n<p><strong>Location:<\/strong> CDS 1646<\/p>\n<p><span>Heterogeneous data from multiple populations, sub-groups, or sources can be represented as a &#8220;mixture model\u201d with a single latent class influencing all of the observed covariates. Heterogeneity can be resolved at multiple levels by grouping populations according to different notions of similarity. This talk, presented by Dr. Bijan Mazaheri, proposes grouping with respect to the causal response of an intervention or perturbation on the system. Mazaheri will show that this definition is distinct from standard notions, such as similar covariate values (e.g. clustering) or similar correlations between covariates (e.g. Gaussian mixture models). To solve the problem, Mazaheri will describe \u201csynthetically\u201d sampling from a counterfactual distribution using higher-order multi-linear moments of the observable data. To understand how these &#8220;causal mixtures&#8221; fit in with more classical notions, a hierarchy of mixture identifiability will be developed. Reflecting on this hierarchy, Mazaheri will discuss the role of causal modeling as a guiding framework for data science.<\/span><\/p>\n<p>Dr. Bijan Mazaheri is an Eric and Wendy Schmidt Postdoctoral Fellow at the Broad Institute of MIT and Harvard University. Bijan is broadly interested in the task of combining data and knowledge from multiple places, topics, and modalities. Before starting at the Broad, Bijan was an NSF Graduate Research Fellow and Amazon AI4Science Fellow at Caltech, supervised by Shuki Bruck and Leonard Schulman. 
Bijan also studied at the University of Cambridge under a Herschel Smith Fellowship and holds a BA from Williams College. Bijan is starting an assistant professorship at Dartmouth Engineering in January and is recruiting Ph.D. students.<\/p>\n<hr \/>\n<h3><span><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.34.57-AM-150x150.png\" alt=\"Han Shao,\u00a0Postdoctoral Associate, Harvard University; Incoming Assistant Professor, UMD\" width=\"150\" height=\"150\" class=\"wp-image-14811 size-thumbnail alignright\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.34.57-AM-150x150.png 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.34.57-AM-300x300.png 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2024-11-20-at-6.34.57-AM-100x100.png 100w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><\/span><\/h3>\n<h3><span>Learning from Strategic Data Sources<\/span><\/h3>\n<h3><span>Han Shao,\u00a0Postdoctoral Associate, Harvard University; Incoming Assistant Professor, UMD<\/span><\/h3>\n<p>Date: Friday, November 8, at 11:00 AM<\/p>\n<p><strong>Location:<\/strong> CDS 1101<\/p>\n<p><span><strong>Abstract:<\/strong>\u00a0In contrast with standard classification tasks, strategic classification involves agents strategically modifying their features in an effort to receive favorable predictions. For instance, given a classifier determining loan approval based on credit scores, applicants may open or close their credit cards and bank accounts to fool the classifier. The learning goal is to find a classifier robust against strategic manipulations. Various settings, based on what and when information is known, have been explored in strategic classification. In this talk, Shao will focus on addressing a fundamental question: the learnability gaps between strategic classification and standard learning. 
This talk is based on joint work with Avrim Blum, Omar Montasser, Lee Cohen, Yishay Mansour, and Shay Moran (arxiv.org\/abs\/2305.16501 published at NeurIPS&#8217;23, arxiv.org\/abs\/2402.19303 published at COLT&#8217;24).<\/span><\/p>\n<p><span><strong>Bio:<\/strong> Han Shao is a CMSA postdoc at Harvard, hosted by Cynthia Dwork and Ariel Procaccia. She will join the Department of Computer Science at the University of Maryland in Fall 2025 as an Assistant Professor. She completed her PhD at TTIC, where she was advised by Avrim Blum. Her research focuses on the theoretical foundations of machine learning, particularly on fundamental questions arising from human social and adversarial behaviors in the learning process. She is interested in understanding how these behaviors affect machine learning systems and developing methods to enhance accuracy and robustness. Additionally, she is interested in gaining a theoretical understanding of empirical observations concerning adversarial robustness.<\/span><\/p>\n<hr \/>\n<p><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/charapod-headshot-new-150x150.jpg\" alt=\"\" width=\"150\" height=\"150\" class=\"alignright wp-image-14696 size-thumbnail\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/charapod-headshot-new-150x150.jpg 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/charapod-headshot-new-300x300.jpg 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/charapod-headshot-new-600x600.jpg 600w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/charapod-headshot-new-550x550.jpg 550w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/charapod-headshot-new-710x710.jpg 710w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/charapod-headshot-new-100x100.jpg 100w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><\/p>\n<h3><span>Learning in Strategic Environments: From Calibrated Agents to General Information Asymmetry<\/span><\/h3>\n<h3><span>Chara Podimata, Assistant Professor, 
MIT<\/span><\/h3>\n<p>Date: Friday, November 15, at 12:00 PM<\/p>\n<p><strong>Location:<\/strong> CDS 1646<\/p>\n<p><span>In this talk, Podimata will discuss learning in principal-agent games where there is information asymmetry between what the principal and the agent know about each other\u2019s chosen actions. They will introduce a generalization of the standard Stackelberg Games (SGs) framework: Calibrated Stackelberg Games (CSGs). In CSGs, a principal repeatedly interacts with an agent who (contrary to standard SGs) does not have direct access to the principal\u2019s action but instead best-responds to calibrated forecasts about it. Podimata will show that in CSGs, the principal can achieve utility that converges to the optimum Stackelberg value of the game (i.e., the value that they could achieve had the agent known the principal\u2019s strategy all along) both in finite and continuous settings, and that no higher utility is achievable. Finally, they will discuss a meta-question: when learning in strategic environments, can agents overcome uncertainty about their preferences to achieve outcomes they could have achieved absent any uncertainty? And can they do this solely through interactions with each other?<\/span><\/p>\n<p><span>Chara Podimata is a 1942 Career Development Professor of Operations Research and Statistics at MIT. Her research focuses on incentive-aware ML and more broadly on social computing both from a theoretical and a practical standpoint. Her research is supported by Amazon and the MacArthur Foundation through an x-grant. She received her PhD from Harvard. 
In her free time, she runs and spends time with her dog, Terra.<\/span><\/p>\n<hr \/>\n<h2>Fall 2023\/Past Speakers<\/h2>\n<p><span><div class=\"bu_collapsible_container \" aria-live=\"polite\" data-customize-animation=\"false\"><h3 class=\"bu_collapsible\" aria-expanded=\"false\"tabindex=\"0\" role=\"button\">Fall 2023 Speaker Lineup<\/h3><div class=\"bu_collapsible_section\" style=\"display: none;\"><\/span><\/p>\n<p><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/1_Jinglong_Zhao_pic-150x150.jpeg\" alt=\"BU CDS Machine Learning Symposium\" width=\"150\" height=\"150\" class=\"alignright wp-image-10457 size-thumbnail\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/1_Jinglong_Zhao_pic-150x150.jpeg 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/1_Jinglong_Zhao_pic-300x300.jpeg 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/1_Jinglong_Zhao_pic-100x100.jpeg 100w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><\/p>\n<h3>Adaptive Neyman Allocation<\/h3>\n<h3>Jinglong Zhao, Assistant Professor, Questrom School of Business<\/h3>\n<p>Date: Wednesday, Nov. 1st at 4:30 PM<\/p>\n<p>Location: CDS 1750<\/p>\n<p>Abstract: In experimental design, Neyman allocation refers to the practice of allocating subjects into treated and control groups, potentially in unequal numbers proportional to their respective standard deviations, with the objective of minimizing the variance of the treatment effect estimator. This widely recognized approach increases statistical power in scenarios where the treated and control groups have different standard deviations, as is often the case in social experiments, clinical trials, marketing research, and online A\/B testing. However, Neyman allocation cannot be implemented unless the standard deviations are known in advance. 
Fortunately, the multi-stage nature of the aforementioned applications allows the use of earlier stage observations to estimate the standard deviations, which further guide allocation decisions in later stages. In this paper, we introduce a competitive analysis framework to study this multi-stage experimental design problem. We propose a simple adaptive Neyman allocation algorithm, which almost matches the information-theoretic limit of conducting experiments. Using online A\/B testing data from a social media site, we demonstrate the effectiveness of our adaptive Neyman allocation algorithm, highlighting its practicality especially when applied with only a limited number of stages.<\/p>\n<p>Bio: Jinglong Zhao is an Assistant Professor of Operations and Technology Management at Questrom School of Business at Boston University. He works at the interface between optimization and econometrics. His research leverages discrete optimization techniques to design field experiments with applications in online platforms. Jinglong completed his PhD in Social and Engineering Systems and Statistics at Massachusetts Institute of Technology.<\/p>\n<hr \/>\n<p><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/2_Stephen_McAleer_pic-150x150.png\" alt=\"BU CDS Machine Learning Symposium\" width=\"150\" height=\"150\" class=\"alignright wp-image-10458 size-thumbnail\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/2_Stephen_McAleer_pic-150x150.png 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/2_Stephen_McAleer_pic-300x300.png 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/2_Stephen_McAleer_pic-100x100.png 100w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><\/p>\n<h3>Toward General Virtual Agents<\/h3>\n<h3>Stephen McAleer, Postdoc,\u00a0Carnegie Mellon University<\/h3>\n<p>Date &amp; Time: Thursday, Nov. 
9th at 4:30 PM<\/p>\n<p>Location: CDS 1646<\/p>\n<p>Abstract: Agents capable of carrying out general tasks on a computer can greatly improve efficiency and productivity. Ideally, such agents should be able to solve new computer tasks presented to them through natural language commands. However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this talk, I show that pre-trained LLMs are able to achieve state-of-the-art performance on MiniWoB, a popular computer task benchmark, by recursively criticizing and improving outputs. I then argue that RLHF is a promising approach toward improving LLM agents, and introduce new work on countering over-optimization in RLHF via constrained RL.<\/p>\n<p>Bio: Stephen McAleer is a postdoc at Carnegie Mellon University working with Tuomas Sandholm. His research has led to the first reinforcement learning algorithm to solve the Rubik&#8217;s cube and the first algorithm to achieve expert-level performance on Stratego. His work has been published in Science, Nature Machine Intelligence, ICML, NeurIPS, and ICLR, and has been featured in news outlets such as the Washington Post, the LA Times, MIT Technology Review, and Forbes. 
He received a PhD in computer science from UC Irvine working with Pierre Baldi, and a BS in mathematics and economics from Arizona State University.<\/p>\n<hr \/>\n<p><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/Screen-Shot-2023-10-27-at-6.56.06-AM-150x150.png\" alt=\"BU CDS Machine Learning Symposium\" width=\"150\" height=\"150\" class=\"size-thumbnail wp-image-10462 alignright\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2023-10-27-at-6.56.06-AM-150x150.png 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2023-10-27-at-6.56.06-AM-300x300.png 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2023-10-27-at-6.56.06-AM-600x600.png 600w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2023-10-27-at-6.56.06-AM-550x550.png 550w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2023-10-27-at-6.56.06-AM-710x710.png 710w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/Screen-Shot-2023-10-27-at-6.56.06-AM-100x100.png 100w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><\/p>\n<h3>Geospatial Machine Learning: Data-focused Algorithm Design, Development, and Evaluation<\/h3>\n<h3>Esther Rolf, Postdoc at Harvard University; Incoming Assistant Professor, University of Colorado, Boulder<\/h3>\n<p>Date &amp; Time: Wednesday, Nov. 15th at 3 PM<\/p>\n<p>Location: CDS 548<\/p>\n<p>Bio: Esther Rolf is a postdoctoral fellow with the Harvard Data Science Initiative and the Center for Research on Computation and Society. In fall of 2024, Esther will join University of Colorado, Boulder as an assistant professor of computer science.<\/p>\n<p>Esther&#8217;s research in statistical and geospatial machine learning blends methodological and applied techniques to study and design machine learning algorithms and systems with an emphasis on usability, data-efficiency and fairness. 
Some of her specific projects span developing algorithms and infrastructure for reliable environmental monitoring using machine learning, responsible and fair algorithm design and use, and the influence of data acquisition and representation on the efficacy and applicability of machine learning systems.<\/p>\n<p>Esther completed her PhD in Computer Science in 2022 at UC Berkeley, where she was advised by Ben Recht and Michael I. Jordan. Esther\u2019s PhD was supported by an NSF Graduate Research Fellowship, a Google Research Fellowship, and a UC Berkeley Stonebreaker Fellowship. Esther has won best paper awards at ICML (2018) and the Workshop on AI for Social Good at NeurIPS (2019), and the impact of her work has been recognized with an SDG Digital Gamechangers award (2023) from the United Nations Development Programme and the International Telecommunication Union.<\/p>\n<hr \/>\n<p><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/4_david_pic-150x150.png\" alt=\"BU CDS Machine Learning Symposium\" width=\"150\" height=\"150\" class=\"size-thumbnail wp-image-10460 alignright\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/4_david_pic-150x150.png 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/4_david_pic-100x100.png 100w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/4_david_pic.png 300w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><\/p>\n<h3>Machine Learning in the Space of Datasets: an Optimal Transport Perspective<\/h3>\n<h3>David Alvarez, Assistant Professor at Harvard; Senior Researcher at MSR New England<\/h3>\n<p>Date: Wednesday, Nov. 29th at\u00a03 PM<\/p>\n<p>Location: CDS 548<\/p>\n<p>Abstract: Machine learning \u2014as taught in classrooms and textbooks\u2014 typically involves a single, fixed, and homogeneous dataset on which models are trained and evaluated. But machine learning in practice is rarely so \u2018pristine\u2019. 
In most real-life applications, clean labeled data is typically scarce, so it is often necessary to leverage multiple heterogeneous data sources. In particular, there is an almost-universal discrepancy between training and testing data distributions. This phenomenon has been profoundly amplified by the recent advent of massive reusable \u2018pre-trained\u2019 deep learning models, which rely on vast amounts of highly heterogeneous datasets for training, and are then re-purposed for a variety of distinct (and often unrelated) tasks. This emerging paradigm of \u2018Machine Learning on collections of datasets\u2019 necessitates new theoretical and algorithmic tools. In this talk, I will argue that Optimal Transport provides an ideal framework on which to lay the foundations for this novel paradigm. It allows us to define semantically meaningful distances between datasets, to elucidate correspondences between them, and to solve optimization objectives over them. Through applications in dataset selection, transfer learning, and dataset shaping, I will show that besides enjoying sound theoretical footing, these OT-based approaches yield powerful, highly scalable, and at times surprisingly insightful methods.<\/p>\n<p>Bio: David Alvarez-Melis is an assistant professor of computer science at Harvard SEAS, and is a faculty affiliate at the Harvard Data Science Initiative, the Kempner Institute, and the Center for Research on Computation and Society (CRCS). Before Harvard, he spent a few years at Microsoft Research New England, as part of the core Machine Learning and Statistics group. His research seeks to make machine learning more broadly applicable (especially to data-poor applications) and trustworthy (e.g., robust and interpretable). 
For this, he draws on ideas from various fields including statistics, optimization and applied mathematics, and takes inspiration from problems arising in the application of machine learning to the natural sciences.<\/p>\n<hr \/>\n<p><img loading=\"lazy\" src=\"\/cds-faculty\/files\/2023\/10\/5_ashok_pic-150x150.png\" alt=\"BU CDS Machine Learning Symposium\" width=\"150\" height=\"150\" class=\"size-thumbnail wp-image-10461 alignright\" srcset=\"https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/5_ashok_pic-150x150.png 150w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/5_ashok_pic-300x300.png 300w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/5_ashok_pic-600x600.png 600w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/5_ashok_pic-550x550.png 550w, https:\/\/www.bu.edu\/cds-faculty\/files\/2023\/10\/5_ashok_pic-100x100.png 100w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/><\/p>\n<h3><span>Learning Rate Tuning from Theory to Practice<\/span><\/h3>\n<h3>Ashok Cutkosky, Assistant Professor, Boston University\u00a0Department of Electrical and Computer Engineering<\/h3>\n<p>Date &amp; Time: Thursday, Dec. 7th at 4:30 PM<\/p>\n<p>Location: CDS 1646<\/p>\n<p>Abstract: Training large neural networks requires careful tuning of a large number of so-called &#8220;hyperparameters&#8221; &#8211; improper tuning results in dramatically worse performance. The learning rate is perhaps the most important such hyperparameter, and over the past decades a wide variety of schemes for setting this value have been proposed and gone in and out of popularity. Although the core ideas from many of the more venerable methods (e.g. the Adam optimizer) are directly inspired by theory, in recent years there has been a significant divergence between learning rates used in practice and those suggested by theory. 
In this talk, I will provide an overview of some recent work that can help shed light on this theory\/practice gap via new analysis of learning rates that not only justifies the heuristics popular in practice, but also provides actionable guidance for improving on these heuristics.<\/p>\n<p>Bio: Ashok Cutkosky is an assistant professor in the ECE department at Boston University. Previously, he was a research scientist at Google, and he earned a PhD in computer science from Stanford University in 2018. He is interested in all aspects of machine learning and stochastic optimization theory. He has worked extensively on optimization algorithms for machine learning that adaptively tune themselves to the a priori unknown statistical properties of their input datasets, as well as on non-convex stochastic optimization.<\/p>\n<p><\/div>\n<\/div>\n<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The CDS Machine Learning Symposium brings together leading scholars in machine learning to delve into the cutting-edge developments and foundational technologies shaping the field of machine 
learning.<\/p>\n","protected":false},"author":18026,"featured_media":10468,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[245,1],"tags":[379,417,26,378],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/posts\/10431"}],"collection":[{"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/users\/18026"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/comments?post=10431"}],"version-history":[{"count":50,"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/posts\/10431\/revisions"}],"predecessor-version":[{"id":15657,"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/posts\/10431\/revisions\/15657"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/media\/10468"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/media?parent=10431"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/categories?post=10431"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bu.edu\/cds-faculty\/wp-json\/wp\/v2\/tags?post=10431"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}