ECE PhD Prospectus Defense: Shining Yang

  • Starts: 2:00 pm on Monday, June 1, 2026
  • Ends: 4:00 pm on Monday, June 1, 2026

Title: SmartNIC-Based System with Software-Hardware Co-Design for Machine Learning Applications

Presenter: Shining Yang

Advisor: Professor Martin Herbordt

Chair: Professor Tali Moreshet

Committee: Professor Martin Herbordt, Professor Richard Brower, Professor Tali Moreshet, Professor Richard West

Google Scholar Link: https://scholar.google.com/citations?user=cm-EF2EAAAAJ&hl=en

Abstract: Most AI training and a growing fraction of AI inference are performed on large-scale GPU-based distributed systems. As AI models continue their rapid evolution, with model size and architectural complexity growing at an unprecedented rate, inefficiencies in these systems are increasingly being exposed. Bottlenecks arise not only from insufficient hardware resources but also from dead time resulting from suboptimal component interactions. One such bottleneck addressed in this dissertation is limited GPU memory capacity. A common approach to alleviating this limitation is to use CPUs, which provide substantially larger memory capacity, as offload engines. While this strategy reduces GPU memory pressure, it introduces new challenges: performance may become limited by constrained data-transfer bandwidth and the increasing complexity of coordinating computation across heterogeneous devices.

Our thesis is that system-level inefficiencies in large-scale distributed AI systems can be significantly reduced by integrating SmartNICs as active orchestration components that coordinate communication, data movement, and synchronization across heterogeneous devices. A SmartNIC is a programmable network interface device that integrates on-board processing capability with high-speed networking. By providing programmable packet processing and direct access to host and device memory, SmartNICs create new opportunities to restructure communication paths and execution pipelines. Rather than treating SmartNICs as isolated accelerators, we view them as part of a broader system co-design space that includes in-network computation, communication coordination, and multi-device cooperation.

This dissertation investigates SmartNIC-based mechanisms that offload communication and control tasks from host CPUs and GPUs to SmartNICs. These mechanisms aim to reduce GPU-side overhead, improve data-movement and control efficiency, and increase communication–computation overlap in distributed AI systems. The potential impact of this architecture is a reduction in the hardware scale required to support large AI model workloads, enabling researchers and smaller organizations without access to massive GPU clusters to run such workloads.

Preliminary work for this thesis investigates how SmartNIC-GPU-CPU (SGC) system designs can improve the efficiency of distributed training for both dense and sparse AI models. Results show that the proposed SGC system achieves up to 1.6x throughput improvement for dense model training. For sparse mixture-of-experts (MoE) models, it achieves up to 1.4x latency reduction and 1.6x throughput improvement.

Location: PHO 339
