PDF version will be uploaded here (as of 2025-03-12).
Keynote Presentations
The Challenges of Delivering Power to and Cooling the Cerebras Wafer-Scale Engine
Jean-Philippe Fricker (Cerebras Systems, Inc.)
Abstract: As AI workloads push the boundaries of computational power, traditional chip architectures struggle to keep pace. Nowhere is this more evident than in wafer-scale computing, where delivering power and managing thermals become critical engineering challenges. This keynote explores the evolving landscape of high-performance computing and the infrastructure bottlenecks limiting future progress.
We begin by examining the ever-growing compute demands and why traditional datacenter infrastructure is struggling to keep up. We’ll analyze the impact of system density on both power delivery and cooling, highlighting the inefficiencies of conventional approaches. Next, we’ll look at how past innovations—such as water cooling—are making a resurgence as viable solutions.
Finally, we’ll explore how Cerebras has tackled these challenges head-on, leveraging novel architectural and cooling innovations to unlock unprecedented performance. We’ll compare this approach with conventional solutions to understand why wafer-scale integration represents a paradigm shift in AI computing. Join us for an in-depth look at the future of high-performance computing and what it takes to meet the growing demands of AI inference and training while overcoming power and cooling challenges.
Jean-Philippe (J.P.) Fricker is Chief System Architect and Co-Founder at Cerebras Systems. Before co-founding Cerebras, J.P. was Senior Hardware Architect at rack-scale flash array startup DSSD (acquired by EMC). Prior to DSSD, J.P. was Lead System Architect at SeaMicro where he designed three generations of fabric-based computer systems. Earlier in his career, J.P. was Director of Hardware Engineering at Alcatel-Lucent and Director of Hardware Engineering at Riverstone Networks. He holds an MS in Electrical Engineering from École Polytechnique Fédérale de Lausanne, Switzerland, and has authored 42 patents.
A Content-Addressable Engine for Associative Processing
José Martínez (Cornell University)
Abstract: TBD
José Martínez is the Lee Teng-hui Professor of Engineering at Cornell University. He has been very fortunate to work with some extraordinary people, and as a result his research has received a number of awards over the years; among them: two IEEE Micro Top Picks papers, an HPCA Best Paper award, MICRO and HPCA Best Paper nominations, an NSF CAREER Award, two IBM Faculty Awards, two Qualcomm Faculty Awards, and a Distinguished Educator Award from the University of Illinois’ Computer Science Department (his graduate alma mater). On the teaching side, he has been recognized with two Kenneth A. Goldman ’71 and one Dorothy and Fred Chau MS’74 College of Engineering teaching awards, a Ruth and Joel Spira Award for Teaching Excellence, three times as the most influential college educator of a Merrill Presidential Scholar (Andrew Tibbits ’07, Gulnar Mirza ’16, and Angela Jin ’21), and as the student-elected 2011 Tau Beta Pi Professor of the Year in the College of Engineering. He is an IEEE Fellow and Vice Chair of ACM SIGARCH.
Specialized Hardware and Open-Source Tools for Scientific Computing and Instruments
Kazutomo Yoshii (Argonne National Laboratory)
Abstract: High-performance computing (HPC) faces critical challenges as transistor scaling slows, limiting further gains in computational performance and energy efficiency. Scientific instrumentation, meanwhile, faces a different obstacle: rapidly increasing data rates. Instruments such as advanced X-ray detectors generate terabytes of data per second, making it impractical to transmit raw data downstream. On-chip data processing and reduction at the source are now essential to address this bottleneck.
Data movement, rather than computation, has become the dominant factor limiting system performance across both HPC and scientific instruments. The performance gap between processors and memory exacerbates inefficiencies, leaving many data-intensive workloads unable to fully utilize processing capabilities. To optimize system performance, strategies such as data compression and reduction within hardware are increasingly necessary. Data flow and streaming computing paradigms can significantly improve data handling in both HPC and scientific instruments by facilitating efficient, continuous data transfer.
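As a software analogy for reduction at the source (our illustration only; the instruments discussed in the talk perform this in hardware), the sketch below collapses each raw detector frame into a small summary record as it is produced, so only the summaries travel downstream. The frame shape, Poisson photon model, and hit threshold are arbitrary assumptions.

```python
# Software analogy for streaming data reduction at the source
# (illustrative only; the detectors in the talk do this in hardware).
import numpy as np

rng = np.random.default_rng(3)

def detector_frames(n_frames, shape=(256, 256)):
    # Stand-in for a detector emitting raw frames.
    for _ in range(n_frames):
        yield rng.poisson(1.0, size=shape).astype(np.uint16)

def reduce_at_source(frames, threshold=5):
    # Collapse each frame to a tiny summary instead of shipping raw pixels.
    for frame in frames:
        hits = np.argwhere(frame > threshold)       # sparse "photon hits"
        yield {"total_counts": int(frame.sum()), "n_hits": len(hits)}

n = 1000
summaries = list(reduce_at_source(detector_frames(n)))
raw_mb = n * 256 * 256 * 2 / 1e6                    # uint16 frames
print(f"raw stream: {raw_mb:.0f} MB -> {len(summaries)} small records")
```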
Specialized hardware accelerators offer a promising solution by enhancing both performance and energy efficiency across scientific domains. These accelerators also have the potential to revolutionize scientific instruments by enabling real-time data handling at the source. However, hardware specialization requires expertise in design, verification, integration, and sharable resources such as open-source hardware libraries — all of which remain scarce globally.
Open-source ecosystems, featuring tools such as Chisel, Verilator, FireSim, Chipyard, Mosaic, and OpenROAD, stimulate collaboration and community-driven prototyping by enabling the sharing of research ideas and innovations. These tools, along with open standards like RISC-V, make hardware innovation more accessible, particularly to professionals from software backgrounds. Cultivating strong open-source collaborations could pave the way for advances in both scientific computing and instrumentation.
Kazutomo Yoshii is a Principal Experimental Systems Specialist at Argonne National Laboratory. He earned an M.S. in Computer Science from Toyohashi University of Technology, Japan, in 1994. His career began at Hitachi’s research facility in Japan, where he developed medical imaging analysis software for functional MRI data. In 1998, he joined Turbolinux, contributing to the Linux operating system in both Japan and Santa Fe, New Mexico. In 2002, he transitioned to Mountain View Data, focusing on dynamic provisioning systems for cluster environments. Since December 2004, he has been with Argonne, actively engaging in co-design activities for supercomputers and scientific experimental systems. His recent work focuses on custom accelerator designs and streaming near-sensor processing architectures. His research interests include high-performance computing, power-aware computing, reconfigurable dataflow computing, hardware development tools, and hardware specialization.
Title: TBD
Bora Baloglu (Intel)
Abstract: TBD
Bio: TBA
Title: TBD
Jim Keller (Tenstorrent)
Abstract: TBD
Jim Keller is CEO of Tenstorrent and a veteran hardware engineer. Prior to joining Tenstorrent, he served two years as Senior Vice President of Intel’s Silicon Engineering Group. He has held roles as Tesla’s Vice President of Autopilot and Low Voltage Hardware, Corporate Vice President and Chief Cores Architect at AMD, and Vice President of Engineering and Chief Architect at P.A. Semi, which was acquired by Apple Inc. Jim has led multiple successful silicon designs over the decades, from the DEC Alpha processors to AMD’s K7/K8/K12, HyperTransport, and Zen family, the Apple A4/A5 processors, and Tesla’s self-driving car chip.
Invited Presentation
Device-Algorithm Co-optimization for Analog In-Memory Computing
Sangbum Kim (Seoul National University)
Abstract: Analog in-memory deep learning is a computing architecture that improves the efficiency of deep learning by performing calculations in memory rather than shuttling data back and forth between memory and processors. This can greatly reduce the energy consumption and latency of deep learning workloads.
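To make the computing-in-memory idea concrete, the minimal sketch below models a resistive crossbar performing a matrix-vector multiply in place: weights are stored as device conductances, inputs arrive as voltages, and each accumulated column current is one output of the dot product. The model, the 5% variability figure, and the differential (+/-) weight encoding are illustrative assumptions, not hardware from the talk.

```python
# Minimal model of an analog in-memory matrix-vector multiply on a
# resistive crossbar (illustrative; device parameters are assumptions).
import numpy as np

rng = np.random.default_rng(42)

W = rng.uniform(-1, 1, size=(4, 8))   # target weight matrix
G = np.abs(W)                         # conductances (normalized); signs are
sign = np.sign(W)                     # realized with paired +/- columns in practice
x = rng.uniform(0, 1, size=4)         # input vector, applied as row voltages

# Ohm's law per device (I = G * V) and Kirchhoff's current law per column
# yield the dot products "for free" as accumulated column currents.
variability = 1 + 0.05 * rng.normal(size=G.shape)   # ~5% device-to-device spread
I = (x[:, None] * sign * G * variability).sum(axis=0)

print("analog  :", np.round(I, 3))
print("digital :", np.round(x @ W, 3))
```

Note how the device-variability term perturbs every output: this is exactly the accuracy concern raised in the next paragraph.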
However, several challenges must be solved before analog in-memory deep learning can be used in real-world applications. For example, analog memory devices can exhibit significant variability and noise, which can impact the accuracy of calculations. Additionally, the lack of symmetry and linearity in weight updates impedes the acceleration of on-chip training, the most computationally expensive operation in deep learning.
In this talk, I will discuss some recent device-algorithm co-optimization studies. In the first example, an array of capacitor-based synaptic cells is co-optimized with the Tiki-Taka algorithm, which was introduced to mitigate the non-ideal characteristics of synaptic cells storing analog weights but whose detailed practical implementation had yet to be demonstrated. The ultralow leakage current of IGZO TFTs can be utilized to build a novel 6T1C synaptic cell that efficiently implements the Tiki-Taka algorithm.
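For readers unfamiliar with Tiki-Taka, the heavily simplified sketch below shows the two-array structure at the heart of the published algorithm: noisy gradient updates land on a fast array A, whose contents are periodically transferred to a slow array C, keeping A near zero where device updates are most symmetric. The toy regression problem, the saturating update model, and all constants are our assumptions, not the 6T1C implementation from the talk.

```python
# Heavily simplified sketch of Tiki-Taka's two-array transfer scheme
# (toy problem, device model, and constants are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = np.zeros(n)                 # fast array: absorbs noisy SGD updates
C = np.zeros(n)                 # slow array: accumulates transferred values
w_target = rng.normal(size=n)   # weights a linear layer should learn

def device_update(w, dw, beta=0.5):
    # Toy non-ideal device: step size shrinks as the weight saturates,
    # differently for up and down pulses (update asymmetry).
    return w + dw * np.exp(-beta * np.sign(dw) * w)

lr, transfer_lr, transfer_every = 0.1, 0.5, 10
for step in range(2000):
    x = rng.normal(size=n)
    err = x @ (A + C) - x @ w_target        # effective weight is A + C
    A = device_update(A, -lr * err * x)     # gradients land on the fast array
    if step % transfer_every == 0:
        i = (step // transfer_every) % n    # transfer one entry per period
        C[i] += transfer_lr * A[i]          # read from A, write into C
        A[i] -= transfer_lr * A[i]          # and remove the moved portion

print("max weight error:", np.abs(A + C - w_target).max().round(3))
```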
In the second example, we demonstrate that neuromorphic hardware based on phase-change memory can efficiently implement the Boltzmann Machine on Spiking Neural Networks (sBM). The 6T2R sBM chip not only tolerates device noise but also capitalizes on it, thanks to the algorithm’s inherently stochastic nature.
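As a toy numerical illustration of how intrinsic randomness becomes a feature rather than a defect (our sketch, not the 6T2R circuit): a Boltzmann-machine unit must fire with sigmoid probability, and comparing its deterministic activation against a logistic-distributed noise source yields exactly that behavior.

```python
# Toy illustration: a stochastic Boltzmann-machine unit realized by
# thresholding against intrinsic noise (illustrative, not the 6T2R circuit).
import numpy as np

rng = np.random.default_rng(7)

def sample_unit(activation, n_trials=100_000):
    # A logistic-distributed noise source turns a plain comparator into
    # an exact sampler for p = sigmoid(activation).
    noise = rng.logistic(size=n_trials)
    return (activation + noise > 0).mean()

for a in (-2.0, 0.0, 2.0):
    p_ideal = 1 / (1 + np.exp(-a))
    print(f"a={a:+.1f}  sigmoid={p_ideal:.3f}  noisy comparator={sample_unit(a):.3f}")
```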
These examples suggest that the efficient implementation of neuromorphic in-memory computing systems is feasible by pairing synaptic devices with optimal algorithms to mitigate the non-ideal characteristics of these devices. Concurrently, synaptic and neuronal devices must be optimized to meet a new set of requirements imposed by these innovative algorithms.
Sangbum Kim is an associate professor in the Department of Materials Science and Engineering, Seoul National University. From 2010 to 2018, he was with the IBM T.J. Watson Research Center in NY, USA. He is currently working on novel semiconductor memory materials and devices for various memory applications such as brain-inspired neuromorphic computing, storage-class memory, and embedded memory. He served as a subcommittee chair for international conferences on semiconductor devices, including the 2023 International Reliability Physics Symposium (IRPS) and the 2022, 2023 International Electron Devices Meeting (IEDM). Additionally, he served as the finance chair for the 2024 International Memory Workshop (IMW) and is currently serving as the technical program chair for IMW 2025. He received his B.S. degree from Seoul National University, Seoul, Korea, in 2001 and his M.S. and Ph.D. degrees from Stanford University, Stanford, CA, in 2005 and 2010, respectively, all in electrical engineering. His Ph.D. dissertation focused on the scalability and reliability of phase change memory (PCM).
Panel Discussion
Topic: “Sustainable AI: Emerging Architectures, Devices, and Quantum Computing Towards Future Computing”
Tohru Ishihara (Nagoya University)
Abstract: TBD
Special Sessions (Invited Lectures)
“Next-Generation Quantum Computing: A Computer Architect’s Perspective”
Jangwoo Kim (Seoul National University / MangoBoost Inc.)
Abstract: Quantum computing is the next-generation computing paradigm, and researchers are actively working across its many domains (e.g., qubit manufacturing, qubit-interface control processors, programming and compilers, and applications). Because real-world quantum applications require millions of qubits, the field is moving from Noisy Intermediate-Scale Quantum (NISQ) devices to fault-tolerant quantum computers (FTQC). In this talk, I first introduce the key challenges in developing a scalable and reliable quantum computer in the FTQC era. Next, I will introduce my research covering quantum computer modeling, quantum control processors, quantum interface methods, distributed quantum computers, and reliable quantum computers. By integrating these outcomes, we have been contributing to the realization of real-world quantum computers in the FTQC era.
Jangwoo Kim is a full professor in the Department of Electrical and Computer Engineering at Seoul National University. He is also the CEO and founder of MangoBoost Inc., which provides next-generation HW/SW solutions to maximize the efficiency of datacenters. He earned his PhD degree from Carnegie Mellon University and his BS and MEng degrees from Cornell University. Prior to his academic career, he contributed to developing UltraSPARC T4 CPUs and servers at Sun Microsystems and Oracle Corporation. His current research interests lie in server and system architecture, cryogenic and quantum computer architecture, and AI/neuromorphic computing.
“Reliability and Efficiency in Deep Learning Processing Systems”
Alex Orailoglu (UC San Diego)
Abstract: Artificial intelligence techniques driven by deep learning have experienced significant progress in the past decade. The usage of deep learning methods has increased dramatically in practical application domains such as autonomous driving, healthcare, and robotics, where the utmost hardware resource efficiency, as well as strict hardware safety and reliability requirements, are often imposed. The increasing computational cost of deep learning models has traditionally been tackled through model compression and domain-specific accelerator design. As the cost of conventional fault-tolerance methods is often prohibitive in consumer electronics, the study of functional safety and reliability for deep learning hardware is still in its infancy.
This talk outlines a novel approach to delivering dramatic boosts in hardware safety, reliability, and resource efficiency through a synergistic co-design paradigm. We start off by reviewing the unique algorithmic characteristics of deep neural networks, including plasticity in the design process, resiliency to small numerical perturbations, and inherent redundancy, as well as the unique micro-architectural properties of deep learning accelerators, such as regularity. The advocated approaches reshape deep neural networks and strategically enhance deep neural network accelerators, prioritizing overall functional correctness and minimizing the associated costs by exploiting the statistical nature of deep neural networks. Experimental results demonstrate that deep neural networks equipped with the proposed techniques can maintain accuracy gracefully even at extreme hardware error rates. As a result, the described methodology can embed strong safety and reliability characteristics in mission-critical deep learning applications at a negligible cost.
The proposed strategies further offer promising avenues for handling the micro-architectural challenges of deep neural network accelerators and boosting resource efficiency through the synergistic co-design of deep neural networks and hardware micro-architectures. Practical data analysis techniques, coupled with a novel feature elimination algorithm, identify a minimal set of computation units that capture the information content of a layer and squash the rest. Linear transformations on the subsequent layer ensure accuracy retention despite the removal of a significant portion of the computation units. We further demonstrate that novel complementary sparsity patterns can offer the utmost expressiveness with inherent, hardware-exploitable regularity. A novel dynamic training method converts the expressiveness of such sparsity configurations into highly accurate and compact sparse neural networks.
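To illustrate the elimination-plus-compensation idea in the final paragraph (a minimal sketch under our own assumptions, not the speaker’s algorithm): remove a layer’s low-information units, then fit a linear map on the surviving activations so the next layer’s inputs are approximately preserved. Variance scoring and least-squares compensation here are stand-ins for the techniques named in the abstract.

```python
# Minimal sketch of unit elimination with next-layer linear compensation
# (variance scoring and least squares are illustrative stand-ins).
import numpy as np

rng = np.random.default_rng(0)

# Toy hidden activations with redundancy (low-rank + noise) and the
# next layer's weight matrix.
Z = rng.normal(size=(1000, 32))
H = Z @ rng.normal(size=(32, 64)) + 0.01 * rng.normal(size=(1000, 64))
W_next = rng.normal(size=(64, 10))

# 1. Keep the units that carry the most information (variance here).
k = 32
keep = np.argsort(H.var(axis=0))[-k:]

# 2. Fit a linear map T so the kept units reconstruct all 64 units:
#    H[:, keep] @ T ~= H  (least squares).
T, *_ = np.linalg.lstsq(H[:, keep], H, rcond=None)

# 3. Fold the reconstruction into the next layer, so inference only
#    ever computes the kept units.
W_next_pruned = T @ W_next

rel_err = (np.linalg.norm(H[:, keep] @ W_next_pruned - H @ W_next)
           / np.linalg.norm(H @ W_next))
print(f"half the units removed, relative output error: {rel_err:.4f}")
```

Because the compensation is folded into the next layer’s weights, the pruned network incurs no extra operations at inference time.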
Alex Orailoglu is an expert in robust systems and design. He has chaired a great many technical conferences, including the leading conferences of both the VLSI reliability (IEEE VLSI Test Symposium) and embedded systems (IEEE/ACM CODES+ISSS) research domains. Many of his doctoral students have attained top-notch university research faculty positions in the United States and globally. He holds an S.B. degree cum laude in Applied Mathematics from Harvard College and M.S. and Ph.D. degrees in Computer Science from the University of Illinois at Urbana-Champaign. He is currently a Professor of Computer Science and Engineering at the University of California, San Diego, La Jolla, CA, USA, where he leads the Architecture, Reliability, Test, and Security (ARTS) Laboratory, focusing on VLSI test, reliability, security, embedded systems, and processor and neural network architectures. He has published more than 300 peer-reviewed articles and has founded numerous technical conferences, including HLDVT, SASP, and NanoArch. Prof. Orailoglu has served as an IEEE Computer Society Distinguished Lecturer and is a Golden Core Member of the IEEE Computer Society.