CGSW 10.0 Student Research Presentation Abstracts

		Session 1: Autonomous Control
10:30am	10:45am	Wenliang Liu	Learning Robust and Correct Controllers from Signal Temporal Logic Specifications Using BarrierNet We consider the problem of learning a neural network controller for a system required to satisfy a Signal Temporal Logic (STL) specification. We exploit STL quantitative semantics to define a notion of robust satisfaction. Guaranteeing the correctness of a neural network controller is a difficult problem that has received a lot of attention recently. We provide a general procedure to construct a set of trainable High Order Control Barrier Functions (HOCBFs) enforcing the satisfaction of formulas in a fragment of STL. We use the BarrierNet, implemented by a differentiable Quadratic Program (dQP) with HOCBF constraints, as the last layer of the neural network controller, to guarantee the satisfaction of the STL formulas. We train the HOCBFs together with other neural network parameters to further improve the robustness of the controller. Simulation results demonstrate that our approach ensures satisfaction and outperforms existing algorithms.
10:45am	11:00am	Mela Coffey	Assessing Reputation to Improve Team Performance in Heterogeneous Multi-Robot Coverage We consider a heterogeneous multi-robot team, where robots are equipped with different capabilities to serve discrete events in an environment. We utilize a heterogeneous coverage control approach to partition the space according to robot capabilities and the estimated probability density, such that each robot is responsible for serving the events in its assigned region. As the team serves events, we assign each robot a reputation, which is then used to adjust the size of a robot’s region, thus adjusting the amount of space a robot serves. To make heterogeneous multi-robot teams more robust in deployment, we cannot assume agents all have equal performance in sensing, actuation, or communication. Suppose we deploy a multi-robot team to visit specific points in an environment and take various measurements or collect samples. It is very possible that a ground robot tasked with collecting samples becomes slowed by mud, or the camera of a drone capturing images becomes faulty. Conversely, some robots may perform better than expected, either with variations in design or different wear over time. Accounting for these individual variations will improve the overall team performance. While prior works accommodate instantaneous variations or differences in physical characteristics, we calculate the reputation, which one can generally think of as a history of performance. We consider a heterogeneous team of n robots, i ∈ {1, …, n}. Each robot has at least one of m capabilities, j ∈ {1, …, m}, to serve events in a convex environment Q ⊂ ℝ², with points in Q denoted q. We invoke a weighted Voronoi-based coverage control approach (Figure 1) to assign regions of Q to robots, such that each robot is responsible for covering the events in its cell(s). 
This approach partitions the space into m partitions, assigning robot i to partition P_j if robot i has capability j. The probability that an event at point q will require capability j is represented by the density function ϕ_j(q). Reputation to Improve Performance. As robots serve events, we assign each robot a reputation based on its ability to meet events. We use this reputation to adjust the weights of each robot cell, ultimately adjusting the amount of area a robot covers. We define the performance p_i^j of robot i for capability j within a time window as the ratio of the number of events of type j served by robot i to the total number of events of type j that appear in the robot cell. We then define the reputation memory M_i^j for robot i for each of its capabilities j as the set of the N most recent performance metrics p_i^j. We define the reputation r_i^j as the average of the reputation memory M_i^j. Intuitively, as a robot i serves more events requiring capability j, its reputation associated with capability j increases, and thus the cell size increases. Estimating Event Probability. The density function ϕ_j(q) represents the probability an event requiring capability j will appear at any point q. Rather than assuming robots have knowledge of event density functions, we estimate the density functions over time based on current and past event appearances. This allows robots to adapt to evolving, stochastic demand while optimally positioning themselves to serve events. Robots estimate the density function ϕ_j(q) as events occur using kernel density estimation (KDE) to estimate the probability that events will occur in Q. We carried out 25 randomized simulations each of our approach (with weights) and the comparison algorithm (without weights). As a performance metric, we recorded the total number of events met by the end of the trial. 
We observe our weighted approach outperforms the non-weighted approach, with a statistical significance of p < 0.001 in the scenario where m = 2, and p < 0.05 when m = 3. We therefore conclude that, by adjusting the size of a Voronoi cell based on robot reputation, the number of events the team meets as a whole increases.
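The reputation bookkeeping described in this abstract (performance ratio per time window, a memory of the N most recent values, and their average) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the class name `Reputation`, the handling of windows with no events, and the default value before any data is recorded are all assumptions.

```python
from collections import deque


class Reputation:
    """Reputation of one robot for one capability (hypothetical helper)."""

    def __init__(self, memory_size):
        # Reputation memory: keeps only the N most recent performance values.
        self.memory = deque(maxlen=memory_size)

    def record_window(self, events_served, events_appeared):
        # Performance = events served / events that appeared in the robot's cell.
        # Assumption: windows with no events are skipped (the abstract does not say).
        if events_appeared > 0:
            self.memory.append(events_served / events_appeared)

    def value(self, default=1.0):
        # Reputation = average of the memory.
        # Assumption: return `default` until the first window is recorded.
        if not self.memory:
            return default
        return sum(self.memory) / len(self.memory)
```

In a coverage controller, the returned value would then scale the weight of the robot's Voronoi cell; how the weight enters the partition is not specified in the abstract and is left out here.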
11:00am	11:15am	Anni Li	Safe Optimal Interactions Between Automated and Human-Driven Vehicles in Mixed Traffic with Event-Triggered Control Barrier Functions This paper studies safe driving interactions between Human-Driven Vehicles (HDVs) and Connected and Automated Vehicles (CAVs) in mixed traffic where the dynamics and control policies of HDVs are unknown and hard to predict. In order to address this challenge, we employ event-triggered Control Barrier Functions (CBFs) to estimate the HDV model online, construct data-driven and state-feedback safety controllers, and transform constrained optimal control problems for CAVs into a sequence of event-triggered quadratic programs. We show that we can ensure collision avoidance between HDVs and CAVs and demonstrate the robustness and flexibility of our framework on different types of human drivers in lane-changing scenarios while guaranteeing safety with human-in-the-loop interactions.
11:15am	11:30am	Sabbir Ahmad	Optimal Control of Connected Automated Vehicles with Event-Triggered Control Barrier Functions: a Test Bed for Safe Optimal Merging We address the problem of safely coordinating a network of Connected and Automated Vehicles (CAVs) in conflict areas of a traffic network. Such problems can be solved through a combination of tractable optimal control problems and Control Barrier Functions (CBFs) that guarantee the satisfaction of all constraints. These solutions can be reduced to a sequence of Quadratic Programs (QPs) which are efficiently solved online over discrete time steps. However, guaranteeing the feasibility of the CBF-based QP method within each discretized time interval requires the careful selection of time steps which need to be sufficiently small. This creates computational requirements and communication rates between agents which may limit the controller’s application to real CAVs. We tackle this limitation by adopting an event-triggered control approach for CAVs such that the next QP is triggered by properly defined events with a safety guarantee. We present a laboratory-scale test bed developed to emulate merging roadways using mobile robots as CAVs. We present results to demonstrate how the event-triggered scheme is computationally efficient and can handle measurement uncertainties and noise compared to time-driven control while guaranteeing safety.
		Session 2: Systems and Hardware
11:45am	12:00pm	Arslan Riaz	Ultra-Low Energy Universal Soft-Detection Decoding The inevitable presence of noise or interference in a system exposes the transmitted or stored data to the risk of harmful corruption. To mitigate the loss of valuable information, error correcting codes are widely used. By adding redundant bits, these codes not only enable the integrity of the data to be validated but also allow errors introduced during transmission or storage to be rectified. Researchers have developed diverse codes with unique structures with the goal of facilitating error correction for different applications. Decoding the data at the receiver requires a specialized decoder that implements the sophisticated algorithm in hardware based on the code structure. Hardware realizations for these decoders have been implemented but the hardware is tightly coupled to the code and a unique implementation is necessary for each decoder, making the system less flexible. Recently proposed Guessing Random Additive Noise Decoding (GRAND) has challenged these limitations by introducing a noise-centric approach to decode the data that is entirely independent of the codebook structure. GRAND relies on guessing the noise that affects a transmission rather than using the codebook structure. This feature makes GRAND a universal decoding algorithm that is especially suited for high-rate short-length codes. GRAND eliminates the need for standardization and provides a future-proof decoding solution. In a soft-detection channel, in addition to the information bits, the confidence of the receiver in terms of the reliability of bits is provided. In such a soft-detection scenario, Ordered Reliability Bits Guessing Random Additive Noise Decoding (ORBGRAND) provides the optimal maximum-likelihood (ML) decoding. 
In this work, we will demonstrate ORBGRAND in 40 nm CMOS featuring: 1) ultra-low energy consumption of 0.76 pJ/b and low power consumption of 4.9 mW compared to state-of-the-art soft-detection decoders; 2) high fabricated throughput performance of 6.5 Gbps while utilizing only a small core area of 0.4 mm²; 3) ability to abandon the sorting of soft information on-the-fly with dynamic clock gating, saving power and energy, while automatically adapting its performance to channel conditions; 4) energy- and area-efficient integer partitions architecture for accurate ordered-reliability bit patterns with a Logistic Weight (LW) up to 104 and error correction up to 13 bits; and 5) reconfigurable architecture that supports codeword (CW) lengths between 32 and 256 bits. A demo of the chip will also be presented.
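The noise-guessing idea behind GRAND can be illustrated with a small hard-detection sketch in Python. It is not ORBGRAND and not the chip's architecture: it guesses bit-flip patterns in order of increasing Hamming weight (the likelihood order for a binary symmetric channel with flip probability below 1/2), whereas ORBGRAND orders its guesses using the receiver's bit reliabilities. The function names and the `max_weight` abandonment rule are assumptions made for the example; the only code-specific ingredient is a parity-check matrix `H` used to test codebook membership.

```python
from itertools import combinations


def is_codeword(H, word):
    # A binary word is a codeword iff every parity check sums to 0 modulo 2.
    return all(sum(h * w for h, w in zip(row, word)) % 2 == 0 for row in H)


def grand_decode(H, received, max_weight):
    """Return (codeword, flipped_positions), or (None, None) if abandoned."""
    n = len(received)
    # Guess noise patterns from fewest bit flips to most bit flips.
    for weight in range(max_weight + 1):
        for flips in combinations(range(n), weight):
            candidate = list(received)
            for i in flips:
                candidate[i] ^= 1  # remove the guessed noise
            if is_codeword(H, candidate):
                return candidate, flips
    return None, None  # no codeword found within the guessing budget
```

Because the decoder only ever asks whether a candidate is in the codebook, the same loop works for any linear code by swapping `H`, which is the sense in which the approach is universal.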
12:00pm	12:15pm	Zeynep Ece Kizilates	Interleaved Noise Recycling in a Soft-Detection Scenario Using the ORBGRAND Decoder Real-world communication channels are often subject to correlated noise, degrading the decoding performance of Forward Error Correction (FEC) decoders and diminishing channel capacity. This is especially critical as state-of-the-art decoders are optimized to decode signals affected by independent noise per bit. To mitigate this, interleaving is commonly employed as a standard technique in digital communication channels to break the correlation effect. Interleaving, when combined with the noise recycling technique, allows for the utilization of the correlation information hidden in consecutive signals to improve the channel capacity and decoding performance. Unlike other techniques, interleaved noise recycling is a universal technique compatible with any modulation scheme or codebook and requires only minimal receiver-side changes. Previous studies have showcased interleaved noise recycling in a hard-detection decoding scenario using the Guessing Random Additive Noise Decoding (GRAND) chip, the first integrated universal decoder. This study extends the application of interleaved noise recycling to a soft-detection scenario, employing the Ordered Reliability Bits Guessing Random Additive Noise Decoding (ORBGRAND) chip. A novel dynamic lead channel selection technique with interleaved noise recycling in a soft-detection scenario reveals a notable improvement, providing up to a 2 dB gain. Additionally, decoding latency improves by up to 80X, while decoding energy consumption reduces by up to 75X when the correlation in the noise is dominant.
12:15pm	12:30pm	Juan Pacheco Garcia	A Comparison of Mechanics Simplifications in Pose Estimation for Thermally-Actuated Soft Robot Limbs Soft robots have experienced limited use outside of controlled environments, where their pose in space is difficult to estimate, a problem that is particularly worsened when actuated by smart thermoelectric materials with difficult-to-model mechanics. In order to estimate the pose of soft robots in real-world settings while being computationally practical, this work presents a comparative study of assumptions and simplifications made on a model of a soft robot. To do so, this article represents a planar soft robot arm, shown in Fig. 1, as a discretized many-link rigid arm, shown in Fig. 2, mapping material stiffness and actuator states to torques at the robot’s joints. Four different sets of assumptions are proposed for these mappings, varying in how stiffness is distributed throughout the arm, as well as the linearity and/or hysteresis of the actuator torques. We demonstrate how to calibrate each model from experimental data in a soft arm powered by shape memory alloy (SMA) wires that contract via Joule heating. Then, we perform hardware tests to predict the robot’s pose in open-loop, using only actuator temperature, and compare model performance under each simplifying assumption. Through our experimentation, we discovered that we cannot assume the tested soft system can be described with a homogeneous bending stiffness even though the material of the system is homogeneous. Further testing demonstrated that calculating different torsional spring constants for different discretized links resulted in the most accurate estimations. Experimentation also demonstrated the hysteretic behavior of the SMAs and their dependency on temperature and torsional spring constants. Future work plans to adapt the system from 2D to 3D. 
Furthermore, this model will be integrated with closed loop feedback control to enable real time responses to human and environmental contact. Using only temperature sensing to estimate the robot pose allows other forms of sensing, such as the bending angle, to be used to discern deviations from the predicted pose. This improves the detection of not just environmental contact, but also provides information about the nature of the contact, which in turn can inform higher level decision making and path planning.
12:30pm	12:45pm	Shashwath Bharadwaj	Mitigating Misattributions in Single-Photon Detector Arrays with Row-Column Readouts Single-photon detector arrays have been of increasing interest in recent years in applications such as lidar, remote sensing, and quantum optics. State-of-the-art single-photon detectors use superconducting nanowires due to advantages like near-unity quantum efficiency and low dark counts. However, scaling these arrays to kilopixel and megapixel resolutions has been challenging due to requirements of cryogenic cooling and complex readout mechanisms. Recent efforts to implement large-scale arrays of superconducting nanowire single-photon detectors (SNSPDs) involve the use of a row-column readout architecture where each row and column is read out on a separate line instead of each pixel individually. While this mechanism reduces the required number of readout lines, it imposes a limitation on the incident photon flux for unambiguous signal reconstruction. When multiple photons are incident on the array within the period of a single readout, the spatial locations of their incidences become ambiguous since the readouts only provide a set of possible pixels where incidences could have occurred. Traditional signal reconstruction techniques assume that photons were incident at each candidate pixel and hence introduce misattributions in the reconstructed image. A simple workaround to avoid misattributions is to only use readouts with a single detected photon for reconstruction. However, this results in the underutilization of measured data, especially in high flux conditions where the number of readout frames with multiple detected photons is high. To overcome this limitation, this work introduces a novel method to resolve up to 4-photon coincidences in single-photon detector arrays with row-column readouts. 
By utilizing unambiguous measurements to estimate probabilities of detection at each pixel, ambiguous multiphoton counts are redistributed among the candidate spatial locations to achieve a reduction in the mean-squared error (MSE) of the reconstruction compared to conventional techniques. With our method, we show that arrays can be operated at high incident photon fluxes, while simultaneously achieving an increase in the peak signal-to-noise ratio (PSNR) of 4-10 dB compared to previous methods. The application of this method to imaging natural scenes is demonstrated using Monte Carlo experiments in Fig 1. The naïve reconstruction introduces horizontal and vertical streaks due to multiphoton coincidences. While the single-photon reconstruction is free of these artifacts, the reconstructed image is noisy. The multiphoton reconstruction (our method) achieves a PSNR improvement of 6-10 dB compared to the naïve reconstruction and 4-6 dB compared to the single-photon reconstruction and is free from multiphoton artifacts. Note that these results were obtained at the same mean photons per frame (PPF) value of 3. The change in the performance of these methods with variation in incident photon flux is shown in Fig 2. The multiphoton estimator achieves the lowest MSE at a PPF of ~1.4. In contrast, the lowest MSEs for the single-photon and naïve estimators are achieved at PPF values of ~1.1 and 0.35 respectively. This shows that in addition to producing better reconstructions, our method allows imaging at higher incident photon fluxes as compared to conventional reconstruction techniques.
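The ambiguity of a row-column readout, and the general idea of redistributing an ambiguous count using per-pixel estimates, can be sketched as follows. This is only a rough illustration under stated assumptions, not the estimator from the abstract: the function names are hypothetical, the number of photons in the frame is taken as given, and the proportional split over `pixel_weight` (standing in for detection probabilities estimated from unambiguous frames) is an assumed rule.

```python
from itertools import product


def candidate_pixels(rows_fired, cols_fired):
    # A row-column readout only reports which rows and which columns fired,
    # so every (row, column) combination is a possible incidence location.
    return list(product(sorted(rows_fired), sorted(cols_fired)))


def redistribute(rows_fired, cols_fired, pixel_weight, n_photons):
    """Split n_photons over the candidate pixels in proportion to pixel_weight."""
    cands = candidate_pixels(rows_fired, cols_fired)
    total = sum(pixel_weight.get(p, 0.0) for p in cands)
    if total == 0:
        # Assumption: with no prior information, split the count evenly.
        return {p: n_photons / len(cands) for p in cands}
    return {p: n_photons * pixel_weight.get(p, 0.0) / total for p in cands}
```

For comparison, the naïve reconstruction described above would add a full count to every candidate pixel, which is what produces the horizontal and vertical streaks.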
		Session 3: Systems Fundamentals
2:15pm	2:30pm	Shiwen Yang	Attractor-Based Coevolving Dot Product Random Graph Model (ABCDPRGM) Explore the dynamics of polarization and coalescence with our ABCDPRGM. This innovative framework based on the random dot product graph (RDPG) captures the interplay between attractor-driven forces and coevolutionary processes in dynamic networks. Our model reveals how entities polarize or coalesce over time, offering valuable insights into complex systems. We also present inference methods tailored for the parameters of this model. By leveraging properties of RDPG, these methods enable the extraction of meaningful insights from observed network behaviors with lightning speed, thus enhancing our understanding and predictive capabilities of such dynamic networks.
2:30pm	2:45pm	Danyang Li	Inference and Prediction with Neural Networks Based on Temporal Logic In recent years, machine learning techniques using neural networks have achieved great success in a wide range of fields, with a variety of architectures such as convolutional, recurrent, etc. However, the common drawback of neural networks is the lack of human-interpretability of the models. This property is important in robotics research, particularly in ensuring safety guarantees, facilitating human-robot interactions and advancing control strategies. Temporal logic tools have attracted lots of attention due to the rich expressiveness of temporal logic formulae. Signal Temporal Logic (STL) is an expressive formal language to describe spatio-temporal properties that are understandable to humans. It facilitates both qualitative and quantitative analyses, enabling one to not only understand the underlying mechanisms qualitatively but also to conduct quantitative assessments. STL inference is the process of providing formal descriptions of system behaviors from observed data in the form of STL formulae. We present a neural-symbolic framework designed for the inference and prediction of time-series data. The neurons of the network can be translated into logic propositions, allowing the entire network to be transferred into an STL formula. The output of the neural network is a robustness degree of the STL formula over the input signal. This degree quantifies the satisfaction or violation of the signal concerning the STL formula translated by the neural network. The training process involves optimizing the neural network parameters to achieve a robustness degree that satisfies certain predefined requirements. However, the computation of the robustness degree includes multiple non-differentiable operations. To address the challenge of non-differentiability within STL, we propose various approximation methods to align with gradient-based methods. 
Importantly, this adaptation does not compromise the qualitative guarantees. The proposed neural network is a template-free model that can dynamically adapt to diverse structures without fixing specific elements of the temporal logic formula. Our overarching goal is to offer a versatile tool that can not only serve as an interpretable neural network but also integrate seamlessly with other machine learning techniques. The interpretable nature is crucial for applications where understanding the decision-making process is essential, such as in autonomous vehicles. This work has achieved great success in classification problems, including binary classification and attribute-based multi-class classification. For binary classification, our neural network classifies desired and undesired behaviors such that the desired behaviors satisfy the inferred STL formula, while undesired ones violate it. For multi-class classification, our neural network extends its capabilities to classify signals based on specified attributes. These attributes denote shared features among signals across various classes. Furthermore, it can also contribute to the design of controllers by inferring temporal logic formulae from observed system trajectories. This inference process aims to discern specifications governing the system’s behavior. For future work, our research can be extended to predict trajectories that adhere to the identified temporal logic specifications. Keywords— Machine Learning, Neural Network, Formal Methods, Temporal Logic, Classification
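To make the robustness degree concrete, the following Python sketch evaluates two simple STL formulas on a discrete-time signal and shows one common smooth surrogate for the non-differentiable `min`. It is a minimal illustration only: the function names are hypothetical, and the log-sum-exp soft minimum is an assumed choice, since the abstract does not say which approximations the authors use.

```python
import math


def rob_always_gt(signal, c, a, b):
    # Robustness of "always in [a, b], x > c": the worst-case margin x(t) - c.
    return min(x - c for x in signal[a:b + 1])


def rob_eventually_gt(signal, c, a, b):
    # Robustness of "eventually in [a, b], x > c": the best-case margin x(t) - c.
    return max(x - c for x in signal[a:b + 1])


def soft_min(values, beta=10.0):
    # Smooth surrogate for min: -(1/beta) * log(sum(exp(-beta * v))).
    # It never exceeds the true minimum and approaches it as beta grows.
    m = min(values)  # shift for numerical stability
    return m - math.log(sum(math.exp(-beta * (v - m)) for v in values)) / beta
```

A positive robustness value means the signal satisfies the formula and a negative value means it violates it; because this soft minimum is a lower bound on the true minimum, a positive smoothed value still implies satisfaction.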
2:45pm	3:00pm	Liangting Wu	IKSPARK: A Robot Inverse Kinematics Solver Inverse kinematics (IK) is an important problem in robot control and motion planning; however, the nonlinearity of the map from joint angles to robot configurations makes the problem nonconvex. In this work, we propose an inverse kinematics solver named IKSPARK (Inverse Kinematics using Semidefinite Programming And RanK minimization). IKSPARK works in the space of rotation matrices of the link reference frames rather than joint angles, allowing it to incorporate convex constraints for a variety of kinematic constraints such as spherical or revolute joints with angle limits, prismatic joints, and open/closed kinematic chains. To overcome the nonlinearity of the manifold of rotation matrices SO(3), we propose a semidefinite programming (SDP) relaxation of the kinematic constraints followed by a rank minimization via maximization of a convex function. Along the way, we show that the feasible set of an IK problem is exactly the intersection of a convex set and rank-1 matrices. Our algorithm to obtain rank-1 solutions has guaranteed local convergence. Unlike some traditional solvers, IKSPARK does not require an initial guess, and can be applied to robots with complex structures. Compared to other work that performs SDP relaxation for IK problems, our formulation is simpler, and uses variables with smaller sizes. We validate our approach via simulations on different robots, comparing against a standard IK method.
3:00pm	3:15pm	Brennan Brodt	Gathering Data from Risky Situations with Pareto-Optimal Trajectories This paper proposes a formulation for the risk-aware path planning problem which utilizes multi-objective optimization to dynamically plan trajectories that satisfy multiple complex mission specifications. In the setting of persistent monitoring, we develop a method for representing environmental information and risk in a way that allows for local sampling to generate Pareto-dominant solutions over a receding horizon. We propose two algorithms capable of solving these problems: a dense sampling approach and an improved method utilizing noisy gradient descent. Simulation results demonstrate the efficacy of our methods at persistently gathering information while avoiding risk, robust to randomly-generated environments.
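The notion of Pareto dominance used in this abstract can be shown with a short filter over sampled candidates. This is a generic sketch, not the authors' planner: it assumes each candidate trajectory has already been scored as an `(information, risk)` pair, with information to be maximized and risk to be minimized, and the function names are hypothetical.

```python
def dominates(a, b):
    # a and b are (information, risk) pairs.
    # a dominates b if it is no worse in both objectives and strictly better in one.
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(candidates):
    # Keep every candidate that no other candidate dominates.
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]
```

In a receding-horizon setting, a filter like this would be applied to the locally sampled trajectories at each planning step, after which one of the surviving trade-offs is selected.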
		Session 4: Applications of Machine Learning
3:30pm	3:45pm	Ruizhao Zhu	Learning to Drive Anywhere Human drivers can seamlessly adapt their driving decisions across geographical locations with diverse conditions and rules of the road, e.g., left vs. right-hand traffic. In contrast, existing models for autonomous driving have been thus far only deployed within restricted operational domains, i.e., without accounting for varying driving behaviors across locations or model scalability. In this work, we propose Learning to Drive Anywhere (AnyD), a single geographically-aware conditional imitation learning (CIL) model that can efficiently learn from heterogeneous and globally distributed data with dynamic environmental, traffic, and social characteristics. Our key insight is to introduce a high-capacity geo-location-based channel attention mechanism that effectively adapts to local nuances while also flexibly modeling similarities among regions in a data-driven manner. By optimizing a contrastive imitation objective, our proposed approach can efficiently scale across the inherently imbalanced data distributions and location-dependent events. We demonstrate the benefits of our AnyD agent across multiple datasets, cities, and scalable deployment paradigms, i.e., centralized, semi-supervised, and distributed agent training. Specifically, AnyD outperforms CIL baselines by over 14% in open-loop evaluation and 30% in closed-loop testing on CARLA.
3:45pm	4:00pm	Yingqing Chen	Scalable Adaptive Traffic Light Control Over a Traffic Network Including Turns, Transit Delays, and Blocking We develop adaptive data-driven traffic light controllers for a grid-like traffic network considering straight, left-turn, and right-turn traffic flows. The analysis incorporates transit delays and blocking effects on vehicle movements between neighboring intersections. Using a stochastic hybrid system model with parametric traffic light controllers, we use Infinitesimal Perturbation Analysis (IPA) to derive a data-driven cost gradient estimator with respect to these parameters. We then iteratively adjust them through an online gradient-based algorithm to improve mean vehicle waiting times. By integrating a flexible modeling framework to represent diverse intersection and traffic network configurations with an event-driven, IPA-based adaptive controller, we propose a general, scalable, adaptive framework for real-time traffic light control in multi-intersection traffic networks.
4:00pm	4:15pm	Monan Ma	Discovering Nonlinear Governing Laws in Nano-Electro-Mechanical Systems (NEMS) Using Symbolic Regression Consider an Olympic gymnast performing a routine on a suspended beam. The routine may be complicated, but the physics of the beam is straightforward: a force causes the beam to bend. If one can measure the bending curvature of the beam, then one can infer something about the force and mass of the athlete. If the aforementioned scenario is to be miniaturized by a factor of roughly 10 million, one would get a nano-beam, which is a nanomechanical sensor capable of detecting equally small and lively nanoscale phenomena, such as the motion of bacteria when exposed to antibiotics, mass of a coronavirus, or presence of a single molecule of a flammable gas, such as methane, in air. These nano-beam-based devices are broadly referred to as nano-electro-mechanical systems (NEMS) and have applications in precision metrology, cancer research, quantum computing and nanotechnology. Continuing the macroscale analogy above, if the gymnast doubles their force, the beam will bend twice as much. This is a response of a linear system. However, nano-beams and NEMS generally do not behave linearly. Due to many contributing factors in real-world situations, such as geometric, material and dissipation-related nonlinear effects, nanoscale sensors are governed by more complicated, nonlinear laws that remain elusive. However, they are important for enabling precise control and characterization of these devices in order to make them a reliable medical device or a viable commercial product. To discover these nonlinear governing laws, I implement a machine-learning approach called sparse identification of nonlinear dynamics (SINDy), which is based on a thresholded linear regression model that is able to recover a symbolic form from time-series data. Briefly, the algorithm works as follows. 
I first build a Michelson interferometer that allows me to take ultrasensitive measurements of a NEMS device under known perturbations in the time domain. This is an experimental step. Then, I define a search space of potential candidates for a system of nonlinear governing equations. I sparsify the system while maintaining fit accuracy and eventually converge to an interpretable system of nonlinear differential equations that gives rise to my experimental data set. This is the machine learning step. To confirm the results, I introduce Pareto front analysis for leveraging model simplicity and fit accuracy, and utilize established analytical expressions to guide the machine learning process. In a broader perspective, this cross-disciplinary, physics-informed approach is potentially applicable to many nonlinear systems beyond nanoscale systems, such as climate change, disease modeling and even fluctuations in stock markets.
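The sparsification step at the core of SINDy is commonly implemented as sequentially thresholded least squares, which can be sketched with NumPy as below. This is a generic illustration, not the presenter's code: the function name `stlsq`, the polynomial candidate library in the usage note, and the threshold value are assumptions, and real measurements would additionally require estimating derivatives from noisy data.

```python
import numpy as np


def stlsq(theta, dxdt, threshold, n_iter=10):
    """Sparse coefficients xi such that theta @ xi approximates dxdt.

    theta: (samples, candidates) matrix of candidate terms evaluated on the data.
    dxdt:  (samples,) vector of measured time derivatives.
    """
    # Start from the ordinary least-squares fit over all candidate terms.
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune terms with negligible coefficients
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving candidate terms.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi
```

As a usage example, with a candidate library of columns `[1, x, x**2, x**3]` and synthetic data generated from `dx/dt = -2*x + 0.5*x**3`, the routine returns nonzero coefficients only for the `x` and `x**3` columns, which is the symbolic form of the governing law.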
		Session 5: Efficiency and Security in Cloud and HPC
4:30pm	4:45pm	Zongshun Zhang	PraxiPaaS: An Efficient PaaS Cluster Container Package Discovery Framework Using Machine Learning Due to the increasing complexity of cloud architectures, automatically tracking and inspecting container packages in Platform-as-a-Service (PaaS) clusters are challenging tasks. This introspection capability, however, is critical to identify vulnerable packages and compile an accurate Software Bill of Materials (SBOM). For example, with SBOM, company cluster admins can restrict employees from using non-commercial licensing tools in the production environment and discover outdated packages containing known vulnerabilities. Motivated by introspection frameworks focusing on virtual machine (VM) settings and ML methods for software discovery, we design PraxiPaaS as a framework to inspect PaaS container images with a highly scalable ML inference pipeline by scanning file changes during package installations. Our ML model adopts a discovery-by-example approach, utilizing container layer file system changes as package fingerprints, matching them with trained ones collected from test containers. The ML pipeline includes a bagging of word2vec encoders and a corresponding bagging of ML models, i.e., XGBoost. Specifically, we modify the bagging method to train each submodel to recognize a fixed number of packages. This design allows optimizing the balance between incremental training time and F1 score by configuring the number of packages attributed to each submodel. Our evaluation shows the bagging of models provides an exponential drop in incremental training time from 30 minutes to 0.5 s with 32 CPU cores while maintaining an F1 score of 0.85 using 10 package labels per model across different datasets, compared with the previously used single ML model design. We deploy a prototype of PraxiPaaS in the New England Research Cloud (NERC) OpenShift test cluster as well. Keywords— Software Discovery, PaaS, XGBoost, Bagging, Incremental Training
4:45pm	5:00pm	Efe Sencan	Machine Learning-Based Analytics for Automated Computing System Management High-Performance Computing (HPC) systems play a crucial role in advancing scientific discovery and societal developments, thanks to their capacity to execute calculations reaching the quintillion scale. However, performance variations in these systems, often caused by network contention, hardware issues, and shared resource conflicts, adversely affect energy and power efficiency, leading to higher operational costs. To address these challenges, HPC systems use advanced monitoring frameworks that track extensive numeric multivariate time-series telemetry data for application resource usage. Given the scale and complexity of this data, manual analysis is impractical, positioning machine learning (ML) as an essential tool for automated performance analysis. Yet, the lack of labeled data presents a significant challenge in effectively training ML models. Our research addresses this by reducing the reliance on labeled data while maintaining high detection and diagnosis accuracy. In line with this, we have developed two frameworks: ALBADross, which minimizes the need for labeled samples in diagnosing anomalies, and Prodigy, an unsupervised anomaly detection framework designed for easy integration with existing monitoring systems. ALBADross is a novel active learning-based framework that diagnoses previously encountered performance anomalies in HPC systems using significantly fewer labeled samples compared to state-of-the-art ML-based frameworks. Our framework combines an active learning-based query strategy and a supervised classifier to minimize the number of labeled samples required to achieve a target anomaly diagnosis score. We evaluate our framework on a production HPC system and a testbed HPC cluster using real and proxy applications. 
We show that ALBADross achieves a 0.95 F1-score using 28x fewer labeled samples than a supervised approach with equal F1-score, even when there are previously unseen applications and application inputs in the test dataset. To further reduce the reliance on labeled anomalous data and demonstrate practical deployment, we introduce Prodigy, a variational autoencoder-based anomaly detection framework. Prodigy outperforms the state-of-the-art alternatives by achieving a 0.95 F1-score when detecting performance anomalies. We also provide a real system implementation of Prodigy that enables seamless integration with monitoring frameworks and facilitates swift deployment in real-world settings. We deploy Prodigy on a production HPC system and demonstrate 88% accuracy in detecting anomalies. Prodigy also includes an interface that provides job- and node-level analysis and explanations for anomaly predictions.
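The abstract above does not name the query strategy ALBADross uses, so the following is only a minimal sketch of one common active learning choice, margin-based uncertainty sampling, offered as an assumption. The function names `margin` and `select_queries` and the sample identifiers are hypothetical. The sketch shows how a labeling budget is spent on the samples whose predicted class is least certain, which is the general mechanism by which active learning reduces the number of labels needed.

```python
def margin(probabilities):
    """Gap between the two most likely classes; a small gap means uncertain."""
    top_two = sorted(probabilities, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_queries(unlabeled, budget):
    """Pick the sample ids with the smallest margins.

    unlabeled: dict mapping a sample id to the classifier's class
    probabilities for that sample (at least two classes).
    budget: how many samples an annotator is asked to label this round.
    """
    ranked = sorted(unlabeled, key=lambda sample_id: margin(unlabeled[sample_id]))
    return ranked[:budget]


# Hypothetical classifier outputs for three unlabeled application runs.
predictions = {
    "run-a": [0.90, 0.05, 0.05],  # confident prediction
    "run-b": [0.40, 0.35, 0.25],  # nearly tied top classes
    "run-c": [0.55, 0.45, 0.00],  # moderately uncertain
}
print(select_queries(predictions, budget=2))  # ['run-b', 'run-c']
```

After the selected samples are labeled, the supervised classifier would be retrained and the loop repeated until the target diagnosis score is reached.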
5:00pm	5:15pm	Chathura Rajapaksha	IOMMU Deferred Invalidation Vulnerability: Exploit and Defense Modern computing systems are complex, with many Input/Output (IO) devices such as Network Interface Cards (NICs) and accelerators connected to them. Direct Memory Access (DMA) for IO devices was introduced as a performance optimization a few decades ago, allowing IO devices to access system memory directly without involving the Central Processing Unit (CPU). However, DMA introduces a security vulnerability as IO devices are given direct access to system memory, exposing privileged data to potentially malicious IO devices. The attacks that exploit the DMA feature are known as DMA attacks. Modern systems are equipped with an IO Memory Management Unit (IOMMU) to mitigate DMA attacks. In the presence of an IOMMU, IO devices perform DMA using IO Virtual Addresses (IOVAs), which are translated to Physical Addresses (PAs) by the IOMMU using OS-maintained IO page tables. The IOMMU uses the IO page table entries to verify the read/write permission of each memory access, constraining each DMA request to a region approved by the OS. Furthermore, the IOMMU contains an IO Translation Lookaside Buffer (IOTLB) to cache recently translated IOVA-to-PA mappings. IOMMU protection comes at the cost of reduced throughput in IO-intensive workloads, mainly due to the high IOTLB invalidation latency. The Linux OS eliminates this bottleneck by deferring the IOTLB invalidation requests to a later time. This opens a vulnerability window during which a memory region is unmapped but the relevant IOTLB entry remains. In this work, we present a proof-of-concept exploit, empirically demonstrating that a malicious DMA-capable device can use this vulnerability window to leak data used by other IO devices. 
To address the need for secure, low-overhead DMA operations for high-throughput IO workloads (e.g., multi-threaded writes to NVMe storage, 200 Gbps network), we propose a hardware-assisted mitigation for the deferred invalidation vulnerability. The proposed mitigation improves the overall IO throughput compared to strict invalidation while providing the same security guarantee by preventing the reuse of stale IOTLB entries until the IOTLB is invalidated. The proposed mitigation works at the IOMMU hardware level, making it compatible with any DMA operation. We demonstrate the proof-of-concept exploit on an x86 machine emulated by QEMU. The emulated machine contains two IO devices (a storage device and an Intel 82574L NIC) and an Intel IOMMU. We show that the malicious NIC can take advantage of the deferred invalidation of its unmapped memory to access data that was read from the storage device. We also implement and evaluate the proposed mitigation within the QEMU Intel IOMMU implementation using the same QEMU setup we used for the exploit. The proposed mitigation achieves 12.7% higher throughput compared to the strict invalidation mode while providing the same security guarantees. We chose QEMU for the implementation and evaluation because support for an IOMMU is not available in open-source hardware. Additionally, we demonstrate that the variations in the average network throughput of QEMU in different IOMMU modes follow the same trend as that of a real hardware system. Keywords: IOMMU, DMA attacks, IOTLB, deferred invalidation
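The vulnerability window described in the abstract above can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the authors' exploit or mitigation and not how a real IOMMU or the Linux driver is implemented: `ToyIOMMU`, `can_access`, and the addresses are hypothetical, pages are modeled as dictionary keys, and permissions are reduced to "mapped or not". It shows only that, with deferred invalidation, a translation cached in the IOTLB stays usable after the OS has unmapped the page, until the queued invalidation is flushed.

```python
class ToyIOMMU:
    """Toy model of IOVA-to-PA translation with an IOTLB cache."""

    def __init__(self, deferred):
        self.deferred = deferred  # True: queue invalidations, False: strict
        self.page_table = {}      # IOVA -> PA, maintained by the OS
        self.iotlb = {}           # cached IOVA -> PA translations
        self.pending = []         # IOVAs whose invalidation is deferred

    def map(self, iova, pa):
        self.page_table[iova] = pa

    def translate(self, iova):
        """Translate a device access, consulting the IOTLB first."""
        if iova in self.iotlb:
            return self.iotlb[iova]
        if iova not in self.page_table:
            raise PermissionError(f"DMA fault at IOVA {iova:#x}")
        self.iotlb[iova] = self.page_table[iova]
        return self.iotlb[iova]

    def unmap(self, iova):
        """Remove the mapping; invalidate the IOTLB now or later."""
        del self.page_table[iova]
        if self.deferred:
            self.pending.append(iova)  # stale entry survives until flush()
        else:
            self.iotlb.pop(iova, None)

    def flush(self):
        """Process all queued invalidations at once."""
        for iova in self.pending:
            self.iotlb.pop(iova, None)
        self.pending.clear()


def can_access(iommu, iova):
    """Return True if a device access to iova would succeed."""
    try:
        iommu.translate(iova)
        return True
    except PermissionError:
        return False


lazy = ToyIOMMU(deferred=True)
lazy.map(0x1000, 0xA000)
lazy.translate(0x1000)           # device touches the page; IOTLB now caches it
lazy.unmap(0x1000)               # OS unmaps, invalidation is only queued
print(can_access(lazy, 0x1000))  # True: the vulnerability window
lazy.flush()
print(can_access(lazy, 0x1000))  # False: window closed after invalidation
```

In strict mode (`deferred=False`) the same sequence faults immediately after `unmap`, which corresponds to the security baseline that the abstract's mitigation aims to match at higher throughput.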
