Weekly Algorithms and Theory seminars will be held on **Mondays from 1:30-2:30pm**. The seminar is held simultaneously in person and over Zoom. Talks are not recorded unless requested by the speaker. Lunch will be served at **12:45pm** for people who attend in-person.

We provide a Google Calendar link for the seminar and other BU TCS events in case some people find it useful.

**Student Organizers: **Ephraim Linder, Fabian Spaeh, Connor Wagaman**Faculty Organizer: **Krzysztof Onak

## Semester Summary

Date |
Speaker |
Talk Title |

September 11 | – | Theory Lunch |

September 18 | Sidhanth Mohanty | High-dimensional expansion in random geometric graphs |

September 25 | Bailey Flanigan | Mitigating manipulation incentives in citizens’ assembly selection |

October 5 | Dragos Ristache | Testing Connectedness of Images |

October 10 | Margalit Glasgow | Characterizing γ-Regret in Structured Bandit Problems |

October 16 | Vladimir Podolskii | One-Way Communication Complexity of Partial XOR Functions |

October 23 | Freddy Reiber | Matching markets 101 plus some more |

October 30 | Zihan Tan | On (1+ε)-Approximate Flow Sparsifiers |

November 6 | Sandeep Silwal | Data structures for density estimation |

November 13 | Andrea Lincoln | An Introduction to Fine-Grained Complexity + Some Open Problems |

November 20 | Ludmila Glinskih | Branching Program Complexity of Minimization Problems |

November 27 | Sitan Chen | Theoretical foundations for diffusion models |

December 5 | Howard Straubing | An Old Problem in Automata and Logic |

UP NEXT

**Date:** December 5 2023

**Speaker:** Howard Straubing

**Title: **An Old Problem in Automata and Logic

**Abstract: **I will survey some recent (and some not-so-recent) progress on an old problem: In the 1990’s, it was shown (Barrington, Compton , Straubing, and Théren) that the regular languages definable by first-order sentences with no restrictions on the numerical predicates (i.e., the atomic formulas giving relations on the positions in a string) could be defined by sentences in which all these relations could themselves be computed by finite automata (the regular numerical predicates). More succinctly, regular languages require only regular numerical predicates in their logical definitions, proving a conjecture originally made by Robert McNaughton in 1960.

This simple-sounding characterization of first-order definable regular languages had to wait for advances in circuit complexity in order to be solved: it was finally proved by appeal to the theorem of Furst, Saxe and Sipser showing that the circuit complexity class AC0 cannot count the number of 1’s in an input string modulo any n>1. This led to a number of questions, which for the most part remain unsolved: First, does the property hold for logics other than first-order logic? For example, does it continue to hold for Sigma-k formulas, or for Boolean combinations of Sigma-k formulas, or for generalized first-order sentences containing modular counting quantifiers? Second, a somewhat more vague question: is there an ‘elementary’ proof of this fact about logic and automata, one that does not depend on circuit lower bounds? Such a proof for modular quantifiers would settle a long-standing open question about the circuit complexity class ACC. Much of the talk will be devoted to a 2020 paper (Borlido, Gehrke, Krebs, Straubing) giving such an elementary proof of the property for Boolean combinations of Sigma-1 sentences. I will also mention recent progress (Barloy, Cadilhac, Papeman, Zeume, LICS 2022) on establishing the property for Sigma-2 sentences, and older work by Hansen and Koucky suggesting that things are very differeent for modular quantifiers.

EARLIER THIS SEMESTER

**Date:** November 27 2023

**Speaker:** Sitan Chen

**Title: **Theoretical foundations for diffusion models

**Abstract: **I will describe recent progress on providing rigorous convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL⋅E 2. In the first part of the talk, I will show that such SGMs can efficiently sample from essentially any realistic data distribution, even ones which are highly non-log-concave. In the second part of the talk, I will show how to extend these guarantees to *deterministic *samplers (e.g. DDIMs) based on discretizing the so-called probability flow ODE, which ultimately leads to faster convergence. All of these results assume access to an oracle for score estimation; time permitting, at the end I will briefly touch upon how to provably implement this oracle for interesting classes of distributions like Gaussian mixtures.

**Date:** November 20 2023

**Speaker:** Ludmila Glinskih

**Title: **Branching Program Complexity of Minimization Problems

**Abstract: **In this talk I will show that every read-once nondeterministic branching program computing the Minimum Circuit Size Problem (MCSP) on inputs of length N has size Omega(N^{loglog(N)}). This is the first superpolynomial lower bound on the size of 1-NBP computing MCSP. This lower bound is tight for the version of MCSP restricted to a linear circuit size parameter. To show this result we adapt a conditional lower bound of Ilango on the deterministic Turing Machine time complexity of computing partial-MCSP, the generalization of MCSP to partial functions. Nevertheless, our lower bound is unconditional and holds even for the total MCSP function.

**Date:** November 13 2023

**Speaker:** Andrea Lincoln

**Title: **An Introduction to Fine-Grained Complexity + Some Open Problems

**Abstract: **Do you want to know more about fine-grained complexity? Do you want to hear about the kinds of hypotheses we make, the reduction techniques, and the fun results? If yes, this is the talk for you! I will also give some open problems of varying levels of difficulty at the end of the talk.

**Date:** November 6 2023

**Speaker:** Sandeep Silwal

**Title: **Data structures for density estimation

**Abstract: **The talk is motivated by the question: how efficiently can we search distributions? The problem is modeled as follows: we are given knowledge of k discrete distributions v_i for 1 <= i <= k over the domain [n] = {1,…,n} which we can preprocess. Then we get samples from an unknown discrete distribution p, also over [n]. The goal is to output the closest distribution to p among the v_i’s in TV distance (up to some small additive error). The state of the art algorithms require Theta(log k) samples and run in near linear time.

We introduce a fresh perspective on the problem and ask if we can output the closest distribution in **sublinear** time. This question is particularly motivated as the problem is a generalization of the traditional nearest neighbor search problem: if we take enough samples, we can learn p explicitly up to low TV distance and then find the closest v_i in o(k) time using standard neast neighbor search algorithms.

However, this approach requires Omega(n) samples. Thus, it is natural to ask: can we obtain both sublinear number of samples and sublinear query time? We present some nice progress towards this question.

This is joint work with Anders Aamand, Alex Andoni, Justin Chen, Piotr Indyk, and Shyam Narayanan, and appeared in ICML 23.

**Date:** October 30 2023

**Speaker:** Zihan Tan

**Title: **On (1+ε)-Approximate Flow Sparsifiers

**Abstract: **Given a large graph G with a subset |T|=k of its vertices called terminals, a quality-q flow sparsifier is a small graph H that contains T and preserves all multicommodity flows that can be routed between terminals in T, to within factor q. The problem of constructing flow sparsifiers with good (small) quality and (small) size has been a central problem in graph compression for decades.

A natural approach of constructing O(1)-quality flow sparsifiers, which was adopted in most previous constructions, is contraction. Andoni, Krauthgamer, and Gupta constructed a sketch of size f(k,ε) that stores all feasible multicommodity flows up to factor (1+ε), raised the question of constructing quality-(1+ε) flow sparsifiers whose size only depends on k,\eps (but not the number of vertices in the input graph G), and proposed a contraction-based framework towards it using their sketch result. In this paper, we settle their question for contraction-based flow sparsifiers, by showing that quality-(1+ε) contraction-based flow sparsifiers with size f(ε) exist for all 5-terminal graphs, but not all 6-terminal graphs. Our hardness result on 6-terminal graphs improves upon a recent hardness result by Krauthgamer and Mosenzon on exact (quality-1) flow sparsifiers, for contraction-based constructions. Our construction and proof utilize the notion of tight span in metric geometry.

This talk is based on joint work with Yu Chen.

**Date:** October 23 2023

**Speaker:** Freddy Reiber

**Title: **Matching Market Design 101

**Abstract: **Matching markets are an increasingly common type of combinatorial market, which, unlike traditional markets, take into account the different preferences that participants in a market may have. Common examples of matching markets include housing and labor markets.

In this talk, we discuss the fundamental results and associated proofs for markets in which agents have the ability to transfer utility or money. Fundamental to this is the Shapley and Shubik result that cooperation is maintainable in the bipartite case. We present this result, as well as a one sided truthful mechanism to naturally find such outcomes. Next, we provide characterizations about the set of stable outcomes, including its lattice formation and a fundamental proof about the maximum payment an agent can receive in any stable outcome. We then close (if time allows) with a discussion on problems of investment in matching markets.

While matching market design is an interdisciplinary field, the talk is designed for a general theoretical computer science audience. No econ knowledge is assumed, although familiarity with standard combinatorial optimization techniques like LP-Duality is.

**Date:** October 16 2023

**Speaker:** Vladimir Podolskii

**Title: **One-Way Communication Complexity of Partial XOR Functions

**Abstract: **Boolean function F(x,y), where x and y are n-bit boolean vectors is an XOR function if F(x,y) = f(x + y) for some function f on n input bits, where x+y is a bitwise XOR. XOR functions are relevant in communication complexity, partially for allowing Fourier analytic techniques. Deterministic communication complexity of total XOR functions is relatively well understood. Montanaro and Osbourne (2009) observed that one-sided communication complexity of F is exactly equal to nonadaptive parity decision tree complexity of f. Hatami et al. (2018) showed that unrestricted communication complexity of F is polynomially related to parity decision tree complexity of f.

In this talk we discuss the connection between communication complexity of XOR functions and parity decision trees for partial functions. We show that in case of one-sided communication complexity, whether these measures are equal, depends on the number of undefined inputs of f. More precisely, if the number of undefined points of f is small, then one-sided communication complexity of F is still equal to nonadaptive parity decision tree complexity of f. At the same time, we provide a family of examples of partial functions f with a larger number of undefined points, such that these measures are not equal. For certain values of parameters we get an exponential gap. Our separation results translate to the case of two-sided communication complexity as well.

**Date:** October 10 2023

**Speaker:** Margalit Glasgow

**Title: **Characterizing γ-Regret in Structured Bandit Problems

**Abstract: **We give a statistical characterization of the γ-regret for arbitrary structured bandit problems, the regret which arises when comparing against a benchmark that is γ times the optimal solution. The γ-regret emerges in structured bandit problems over a function class F where finding an exact optimum of f ∈ F is intractable. Our characterization is given in terms of the γ-DEC, a statistical complexity parameter for the class F, which is a modification of the constrained Decision-Estimation Coefficient (DEC) ofFoster et al. [2023]. This complexity measure can be viewed as an analog of the VC dimension (which governs the statistical complexity of PAC learning) for interactive learning. Our lower bound shows that the γ-DEC is a fundamental limit for any model class F: for any algorithm, there exists some f ∈ F for which the γ-regret of that algorithm scales (nearly) with the γ-DEC of F. We provide an upper bound showing that there exists an algorithm attaining a nearly matching γ-regret.

**Date:** October 5 2023

**Speaker:** Dragos Ristache

**Title: **Testing Connectedness of Images

**Abstract: **We investigate algorithms for testing whether an image is connected. Given a proximity parameter ε∈(0, 1) and query access to a black-and-white image represented by an n× n matrix of Boolean pixel values, a (1-sided error) connectedness tester accepts if the image is connected and rejects with probability at least 2/3 if the image is ε-far from connected. We show that connectedness can be tested nonadaptively with O (1/ε²) queries and adaptively with O (1/ε^{3/2}√{log1/ε}) queries. The best connectedness tester to date, by Berman, Raskhodnikova, and Yaroslavtsev (STOC 2014) had query complexity O (1/ε² log 1/ε) and was adaptive. We also prove that every nonadaptive, 1-sided error tester for connectedness must make Ω (1/ε log 1/ε) queries.

**Date:** September 25 2023

**Speaker:** Bailey Flanigan

**Title: **Mitigating manipulation incentives in citizens’ assembly selection

**Abstract: **Citizens’ assemblies are a rapidly-emerging paradigm for making democratic decisions. In a citizens’ assembly, a randomly-chosenpanel of volunteers convenes to learn about a political issue, deliberate, and then come to a policy recommendation*. *Despite a recent flurry of research on algorithms for selecting assembly participants, a key property of these algorithms remains studied: their *manipulability.*In particular, the worry is that because these algorithms must satisfy demographic representation constraints according to volunteers’*self-reported*attributes, volunteers may be incentivized to*misreport*these attributes to increase their chance of being selected, decrease someone else’s chance, and/or increase the expected number of seats given to their own group.

*Manipulation-robust selection of citizens’ assemblies,*we study the extent to which agents can achieve these goals under different selection algorithms. Strikingly, we first show that

*Leximin*— an algorithm that is widely used in practice for its fairness — is highly manipulable, permitting dishonest individuals to increase their chance of selection to 1, and permitting relatively small coalitions to steal half of all available assembly seats. We show the same result for

*Nash Welfare,*another algorithm that is widely available for use. In light of these algorithms’ high manipulability, we introduce a new class of selection algorithms based on ℓ

*p*norms. We show that the manipulability of the ℓ

*p*norm-based algorithm decreases as O(1/n^{1-1/p}) as the number of volunteers

*n*grows, approaching the optimal rate of O(1/

*n*) as

*p*grows large. These bounds are confirmed via experiments in eight real-world datasets.

**Date:** September 18 2023

**Speaker:** Sidhanth Mohanty

**Title: **High-dimensional expansion in random geometric graphs

**Abstract: **High-dimensional expansion is a generalization of expansion to hypergraphs where the expansion of certain random walks are witnessed by local neighborhoods.

This talk is on the topic of constructing natural probability distributions over sparse high-dimensional expanders (HDXes). On one hand, (standard/1-dimensional) sparse expanders are plentiful, since a constant degree random graph is an expander with high probability. On the other hand, most natural random models over hypergraphs fail to exhibit 2-dimensional expansion unless the average degree exceeds sqrt(number of vertices).