Current Projects


3D Stacked Systems for Enabling Low-Power High-Performance Computing

3D stacking is an attractive method for designing high-performance chips: it provides high transistor integration densities, improves manufacturing yield through smaller chip area, reduces wire length and capacitance, and enables heterogeneous integration of different technologies on the same chip. 3D stacking, however, comes with several key challenges, such as higher on-chip temperatures and a lack of mature design and evaluation tools.
This project focuses on several key aspects that will enable cost-efficient design of future high-performance 3D stacks: (1) Thermal modeling and management of 3D systems; (2) Modeling and control of novel cooling techniques (e.g., microchannel-based liquid cooling and phase-change materials) to improve cooling efficiency; (3) Architecture-level performance evaluation and optimization of 3D design strategies to maximize performance and energy efficiency of real-life applications; (4) Exploration of heterogeneous integration opportunities, such as stacking processors with DRAM layers or with silicon-photonic network layers.
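To illustrate why thermal modeling matters for 3D stacks (item 1 above), the sketch below computes steady-state layer temperatures with a lumped thermal-resistance network. It is a simplified illustration only, not the project's modeling tool: the function name, the single heat-removal path through the heat sink, and the resistance values are all assumptions.

```python
def stack_temperatures(layer_powers, r_sink, r_interlayer, t_ambient=45.0):
    """Steady-state temperature (deg C) of each layer, ordered from the heat sink outward.

    layer_powers: power dissipated in each layer (W), nearest-to-sink first.
    r_sink: sink-to-ambient thermal resistance (K/W), an assumed lumped value.
    r_interlayer: thermal resistance between adjacent layers (K/W), also assumed.
    """
    if not layer_powers:
        return []
    # All heat leaves through the sink, so the nearest layer sees the total power.
    temp = t_ambient + r_sink * sum(layer_powers)
    temps = [temp]
    for i in range(1, len(layer_powers)):
        # Heat generated in layer i and beyond must cross the interface below layer i.
        temp += r_interlayer * sum(layer_powers[i:])
        temps.append(temp)
    return temps
```

For example, `stack_temperatures([20.0, 10.0], r_sink=0.5, r_interlayer=0.25)` places the layer farther from the sink at a higher temperature than the layer next to it, which is the basic effect that makes thermal management harder in stacked designs.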



Energy-Efficient Mobile Computing

Mobile devices handle diverse workloads, ranging from simple daily tasks (e.g., text messaging, e-mail) to complex graphics and media processing, while operating under limited battery capacities. Growing computational power and heat densities in modern mobile devices also pose thermal challenges (e.g., elevated chip, battery, and skin temperatures). Because cooling capabilities are insufficient, these challenges lead to frequent throttling and undesired performance fluctuations. Designing practical management solutions is complicated by the diversity in computational needs of different software programs and by the added complexity of the hardware architecture (e.g., specialized accelerators and heterogeneous CPUs). Addressing these concerns requires revisiting existing management techniques in mobile devices to improve both thermal and energy efficiency without sacrificing user experience.
Our research on the energy and thermal efficiency of mobile devices focuses on (1) designing lightweight online frameworks for monitoring the energy/thermal status and for assessing the performance sensitivity of applications to hardware and software tunables; (2) developing practical runtime management strategies that minimize energy consumption and mitigate thermally induced performance losses while maintaining a satisfactory user experience; (3) generating software tools and workload sets that enable evaluation of emerging mobile workloads under realistic usage profiles.



Managing Server Energy Efficiency

The diversity of the elements contributing to computing energy efficiency (e.g., CPUs, memories, cooling units, software application properties, and the availability of operating system controls and virtualization) requires system-level assessment and optimization. Our work on managing server energy efficiency focuses on designing: (1) the sensing and actuation mechanisms needed for a server node to operate at a desired dynamic power level (as in power capping), (2) resource management techniques for native and virtualized systems that let several software applications share available resources efficiently, (3) cooling control mechanisms that are aware of the interdependence of performance, power, temperature-dependent leakage power, and cooling power.
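As an illustration of item (1), a power cap can be enforced with a simple feedback loop that steps the CPU frequency down when measured power exceeds the cap and back up when there is enough headroom. The sketch below is hypothetical: the frequency levels, the toy linear power model in `measured_power`, and all function names are assumptions made for illustration, not the mechanisms developed in this project.

```python
# Hypothetical frequency levels (GHz); a real server exposes its own DVFS states.
FREQ_LEVELS = [1.2, 1.6, 2.0, 2.4, 2.8]


def measured_power(freq_ghz, utilization):
    """Toy stand-in for a power sensor: static power plus a dynamic term (W)."""
    return 40.0 + 30.0 * freq_ghz * utilization


def next_level(level, power, cap, margin=15.0):
    """Step down one level if over the cap, up one level if well below it."""
    if power > cap and level > 0:
        return level - 1
    if power < cap - margin and level < len(FREQ_LEVELS) - 1:
        return level + 1
    return level


def run_capping(cap, utilization, steps=20):
    """Run the control loop and return the (frequency, power) trace."""
    level = len(FREQ_LEVELS) - 1  # start at the highest frequency
    trace = []
    for _ in range(steps):
        power = measured_power(FREQ_LEVELS[level], utilization)
        trace.append((FREQ_LEVELS[level], power))
        level = next_level(level, power, cap)
    return trace
```

Calling `run_capping(cap=100.0, utilization=1.0)` steps the frequency down until the modeled power no longer exceeds the cap. The `margin` should be larger than the power difference between adjacent frequency levels; otherwise the loop can oscillate between two levels.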



Simulation and Management of HPC Systems

Additional levels of management and planning decisions take place at the data center level. These decisions, such as job allocation across the computing nodes, impact energy consumption and performance. HPC applications, e.g., scientific computing loads, typically occupy many server nodes, run for a long time, and include heavy data exchange and communication among the threads of the application.

Our work in this domain focuses on optimizing the cooling energy of the data center and the performance of HPC applications simultaneously. This work includes developing simulation methods that can accurately estimate power and performance of realistic workloads running on large-scale systems with hundreds or thousands of nodes. We also design strategies to assess and optimize system resilience.



Data Centers in the Smart Grid

The way electricity cost is assessed is changing with developments in the smart grid and power markets. Instead of solely reducing its energy consumption, a data center may achieve a lower overall cost by participating in demand response programs, in which it regulates its power consumption as requested by the electricity provider. This is because providers (e.g., independent system operators, or ISOs) offer incentives for participation in such programs, as ISOs have to match supply and demand in real time.

Our work on integrating data centers into the smart grid is the first to use a data center as a grid load stabilizer. Specifically, we focus on designing the techniques needed for a data center to accurately follow a dynamic regulation signal broadcast by the ISO. This work involves designing (1) the optimization mechanisms that determine the power consumption required to maintain a desired quality of service and the regulation amount the data center can offer, and (2) the practical techniques that enable a data center to budget its power among computing and cooling nodes and to follow a power signal accurately.
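The idea of following a regulation signal can be sketched as follows. The sketch assumes one common formulation, in which the data center bids an average power and a regulation reserve, and the ISO's signal y(t) in [-1, 1] scales that reserve to give the power target. The formulation, the function names, and the error metric are assumptions for illustration; the source does not specify them.

```python
def power_target(avg_power, reserve, signal):
    """Target power (W) when the signal y in [-1, 1] scales the offered reserve."""
    if not -1.0 <= signal <= 1.0:
        raise ValueError("regulation signal must lie in [-1, 1]")
    return avg_power + reserve * signal


def mean_tracking_error(actual, avg_power, reserve, signals):
    """Mean absolute deviation from the target, normalized by the reserve."""
    if not signals or len(actual) != len(signals):
        raise ValueError("need one power sample per signal value")
    errors = [
        abs(p - power_target(avg_power, reserve, y)) / reserve
        for p, y in zip(actual, signals)
    ]
    return sum(errors) / len(errors)
```

With an average power of 500 and a reserve of 100 (in the same power unit), a signal value of 0.5 yields a target of 550. A data center that measures its actual power at each signal update can use `mean_tracking_error` to quantify how accurately it follows the signal.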



Efficient Data Analytics for the Cloud

Emerging cloud service platforms host hundreds of thousands of virtual machines (VMs), each of which evolves differently from the time it is booted. As a result, the benefit of starting from a finite, well-arranged set of VM images vanishes. Cloud service operators therefore face challenges in continuously managing, monitoring, and maintaining a large number of diversely evolving VMs throughout their life cycles in order to discover potential resilience and vulnerability issues in a timely manner.

Our research on efficient data analytics for the cloud investigates automated cloud analytics solutions based on machine learning for system management and software vulnerability discovery. We propose feature extraction methods to generate condensed “fingerprints” from the comprehensive metadata of target applications, systems, or vulnerabilities. Our project involves building comprehensive and adaptive knowledge bases using a large set of fingerprint samples from real cloud systems. We use various machine learning algorithms as part of the proposed discovery and identification framework.
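A minimal sketch of the fingerprinting idea is shown below. The source does not specify the feature extraction method or the learning algorithms, so this example substitutes feature hashing and cosine-similarity matching as stand-ins. The function names, the token inputs (here, file paths), and the vector size are hypothetical.

```python
import hashlib
import math


def fingerprint(metadata_tokens, size=64):
    """Hash metadata tokens (e.g., file paths) into a fixed-length count vector."""
    vec = [0] * size
    for token in metadata_tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % size] += 1
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(sample_tokens, knowledge_base, size=64):
    """Return the label whose stored fingerprint is most similar to the sample."""
    fp = fingerprint(sample_tokens, size)
    return max(knowledge_base, key=lambda label: cosine(fp, knowledge_base[label]))
```

A knowledge base here is simply a dictionary that maps a label to a stored fingerprint, for example `{"nginx": fingerprint([...]), "mysql": fingerprint([...])}`. `identify` then returns the label that best matches a new sample's metadata.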