PDF version will be uploaded here (as of 2025-03-12).
Keynote Presentations
The Challenges of Delivering Power to and Cooling the Cerebras Wafer-Scale Engine
Jean-Philippe Fricker (Cerebras Systems, Inc.)
Abstract: As AI workloads push the boundaries of computational power, traditional chip architectures struggle to keep pace. Nowhere is this more evident than in wafer-scale computing, where delivering power and managing thermals become critical engineering challenges. This keynote explores the evolving landscape of high-performance computing and the infrastructure bottlenecks limiting future progress.
We begin by examining the ever-growing compute demands and why traditional datacenter infrastructure is struggling to keep up. We’ll analyze the impact of system density on both power delivery and cooling, highlighting the inefficiencies of conventional approaches. Next, we’ll look at how past innovations—such as water cooling—are making a resurgence as viable solutions.
Finally, we’ll explore how Cerebras has tackled these challenges head-on, leveraging novel architectural and cooling innovations to unlock unprecedented performance. We’ll compare this approach with conventional solutions to understand why wafer-scale integration represents a paradigm shift in AI computing. Join us for an in-depth look at the future of high-performance computing and what it takes to meet the growing demands of AI inference and training while overcoming power and cooling challenges.
Jean-Philippe (J.P.) Fricker is Chief System Architect and Co-Founder at Cerebras Systems. Before co-founding Cerebras, J.P. was Senior Hardware Architect at rack-scale flash array startup DSSD (acquired by EMC). Prior to DSSD, J.P. was Lead System Architect at SeaMicro where he designed three generations of fabric-based computer systems. Earlier in his career, J.P. was Director of Hardware Engineering at Alcatel-Lucent and Director of Hardware Engineering at Riverstone Networks. He holds an MS in Electrical Engineering from École Polytechnique Fédérale de Lausanne, Switzerland, and has authored 42 patents.
A Content-Addressable Engine for Associative Processing
José Martínez (Cornell University)
Abstract: TBD
José Martínez is the Lee Teng-hui Professor of Engineering at Cornell University. He has been very fortunate to work with some extraordinary people, and as a result his research has received a number of awards over the years; among them: two IEEE Micro Top Picks papers, an HPCA Best Paper award, MICRO and HPCA Best Paper nominations, an NSF CAREER Award, two IBM Faculty Awards, two Qualcomm Faculty Awards, and a Distinguished Educator Award from the University of Illinois’ Computer Science Department (his graduate alma mater). On the teaching side, he has been recognized with two Kenneth A. Goldman ’71 and one Dorothy and Fred Chau MS’74 College of Engineering teaching awards, a Ruth and Joel Spira Award for Teaching Excellence, three times as the most influential college educator of a Merrill Presidential Scholar (Andrew Tibbits ’07, Gulnar Mirza ’16, and Angela Jin ’21), and as the student-elected 2011 Tau Beta Pi Professor of the Year in the College of Engineering. He is an IEEE Fellow and Vice Chair of ACM SIGARCH.
Specialized Hardware and Open-Source Tools for Scientific Computing and Instruments
Kazutomo Yoshii (Argonne National Laboratory)
Abstract: High-performance computing (HPC) faces critical challenges as transistor scaling slows, limiting further gains in computational performance and energy efficiency. Scientific instrumentation, meanwhile, faces a different obstacle: rapidly increasing data rates. Instruments such as advanced X-ray detectors generate terabytes of data per second, making it impractical to transmit raw data downstream. On-chip data processing and reduction at the source are now essential to address this bottleneck.
Data movement, rather than computation, has become the dominant factor limiting system performance across both HPC and scientific instruments. The performance gap between processors and memory exacerbates inefficiencies, leaving many data-intensive workloads unable to fully utilize processing capabilities. To optimize system performance, strategies such as data compression and reduction within hardware are increasingly necessary. Data flow and streaming computing paradigms can significantly improve data handling in both HPC and scientific instruments by facilitating efficient, continuous data transfer.
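As a software analogy for reduction at the source (our illustration only; the instruments discussed in the talk perform this in hardware), the sketch below collapses each raw detector frame into a small summary record as it is produced, so only the summaries travel downstream. The frame shape, Poisson photon model, and hit threshold are arbitrary assumptions.

```python
# Software analogy for streaming data reduction at the source
# (illustrative only; the detectors in the talk do this in hardware).
import numpy as np

rng = np.random.default_rng(3)

def detector_frames(n_frames, shape=(256, 256)):
    # Stand-in for a detector emitting raw frames.
    for _ in range(n_frames):
        yield rng.poisson(1.0, size=shape).astype(np.uint16)

def reduce_at_source(frames, threshold=5):
    # Collapse each frame to a tiny summary instead of shipping raw pixels.
    for frame in frames:
        hits = np.argwhere(frame > threshold)       # sparse "photon hits"
        yield {"total_counts": int(frame.sum()), "n_hits": len(hits)}

n = 1000
summaries = list(reduce_at_source(detector_frames(n)))
raw_mb = n * 256 * 256 * 2 / 1e6                    # uint16 frames
print(f"raw stream: {raw_mb:.0f} MB -> {len(summaries)} small records")
```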
Specialized hardware accelerators offer a promising solution by enhancing both performance and energy efficiency across scientific domains. These accelerators also have the potential to revolutionize scientific instruments by enabling real-time data handling at the source. However, hardware specialization requires expertise in design, verification, integration, and sharable resources such as open-source hardware libraries — all of which remain scarce globally.
Open-source ecosystems, featuring tools such as Chisel, Verilator, FireSim, Chipyard, Mosaic, and OpenROAD, stimulate collaboration and community-driven prototyping by enabling the sharing of research ideas and innovations. These tools, along with open standards like RISC-V, make hardware innovation more accessible, particularly to professionals from software backgrounds. Cultivating strong open-source collaborations could pave the way for advances in both scientific computing and instrumentation.
Kazutomo Yoshii is a Principal Experimental Systems Specialist at Argonne National Laboratory. He earned an M.S. in Computer Science from Toyohashi University of Technology, Japan, in 1994. His career began at Hitachi’s research facility in Japan, where he developed medical imaging analysis software for functional MRI data. In 1998, he joined Turbolinux, contributing to the Linux operating system in both Japan and Santa Fe, New Mexico. In 2002, he transitioned to Mountain View Data, focusing on dynamic provisioning systems for cluster environments. Since December 2004, he has been with Argonne, actively engaging in co-design activities for supercomputers and scientific experimental systems. His recent work focuses on custom accelerator designs and streaming near-sensor processing architectures. His research interests include high-performance computing, power-aware computing, reconfigurable dataflow computing, hardware development tools, and hardware specialization.
Title: TBD
Bora Baloglu (Intel)
Abstract: TBD
Bio: TBA
Title: TBD
Jim Keller (Tenstorrent)
Abstract: TBD
Jim Keller is CEO of Tenstorrent and a veteran hardware engineer. Prior to joining Tenstorrent, he served two years as Senior Vice President of Intel’s Silicon Engineering Group. He has held roles as Tesla’s Vice President of Autopilot and Low Voltage Hardware, Corporate Vice President and Chief Cores Architect at AMD, and Vice President of Engineering and Chief Architect at P.A. Semi, which was acquired by Apple Inc. Jim has led multiple successful silicon designs over the decades, from the DEC Alpha processors to AMD’s K7/K8/K12, HyperTransport, and Zen family, the Apple A4/A5 processors, and Tesla’s self-driving car chip.
Invited Presentation
Device-Algorithm Co-optimization for Analog In-Memory Computing
Sangbum Kim (Seoul National University)
Abstract: Analog in-memory deep learning is a computing architecture that improves the efficiency of deep learning by performing calculations in memory rather than shuttling data back and forth between memory and processors. This can greatly reduce the energy consumption and latency of deep learning workloads.
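To make the computing-in-memory idea concrete, the minimal sketch below models a resistive crossbar performing a matrix-vector multiply in place: weights are stored as device conductances, inputs arrive as voltages, and each accumulated column current is one output of the dot product. The model, the 5% variability figure, and the differential (+/-) weight encoding are illustrative assumptions, not hardware from the talk.

```python
# Minimal model of an analog in-memory matrix-vector multiply on a
# resistive crossbar (illustrative; device parameters are assumptions).
import numpy as np

rng = np.random.default_rng(42)

W = rng.uniform(-1, 1, size=(4, 8))   # target weight matrix
G = np.abs(W)                         # conductances (normalized); signs are
sign = np.sign(W)                     # realized with paired +/- columns in practice
x = rng.uniform(0, 1, size=4)         # input vector, applied as row voltages

# Ohm's law per device (I = G * V) and Kirchhoff's current law per column
# yield the dot products "for free" as accumulated column currents.
variability = 1 + 0.05 * rng.normal(size=G.shape)   # ~5% device-to-device spread
I = (x[:, None] * sign * G * variability).sum(axis=0)

print("analog  :", np.round(I, 3))
print("digital :", np.round(x @ W, 3))
```

Note how the device-variability term perturbs every output: this is exactly the accuracy concern raised in the next paragraph.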
However, several challenges must be solved before analog in-memory deep learning can be used in real-world applications. For example, analog memory devices can exhibit significant variability and noise, which can impact the accuracy of calculations. Additionally, the lack of symmetry and linearity in weight updates impedes the acceleration of on-chip training, the most computationally expensive operation in deep learning.
In this talk, I will discuss some recent device-algorithm co-optimization studies. In the first example, an array of capacitor-based synaptic cells is co-optimized with the Tiki-Taka algorithm, which was introduced to mitigate the non-ideal characteristics of synaptic cells storing analog weights but whose detailed practical implementation had yet to be demonstrated. The ultralow leakage current of IGZO TFTs can be utilized to build a novel 6T1C synaptic cell that efficiently implements the Tiki-Taka algorithm.
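For readers unfamiliar with Tiki-Taka, the heavily simplified sketch below shows the two-array structure at the heart of the published algorithm: noisy gradient updates land on a fast array A, whose contents are periodically transferred to a slow array C, keeping A near zero where device updates are most symmetric. The toy regression problem, the saturating update model, and all constants are our assumptions, not the 6T1C implementation from the talk.

```python
# Heavily simplified sketch of Tiki-Taka's two-array transfer scheme
# (toy problem, device model, and constants are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = np.zeros(n)                 # fast array: absorbs noisy SGD updates
C = np.zeros(n)                 # slow array: accumulates transferred values
w_target = rng.normal(size=n)   # weights a linear layer should learn

def device_update(w, dw, beta=0.5):
    # Toy non-ideal device: step size shrinks as the weight saturates,
    # differently for up and down pulses (update asymmetry).
    return w + dw * np.exp(-beta * np.sign(dw) * w)

lr, transfer_lr, transfer_every = 0.1, 0.5, 10
for step in range(2000):
    x = rng.normal(size=n)
    err = x @ (A + C) - x @ w_target        # effective weight is A + C
    A = device_update(A, -lr * err * x)     # gradients land on the fast array
    if step % transfer_every == 0:
        i = (step // transfer_every) % n    # transfer one entry per period
        C[i] += transfer_lr * A[i]          # read from A, write into C
        A[i] -= transfer_lr * A[i]          # and remove the moved portion

print("max weight error:", np.abs(A + C - w_target).max().round(3))
```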
In the second example, we demonstrate that neuromorphic hardware based on phase-change memory can efficiently implement the Boltzmann Machine on Spiking Neural Networks (sBM). The 6T2R sBM chip not only tolerates device noise but also capitalizes on it, thanks to the algorithm’s inherently stochastic nature.
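As a toy numerical illustration of how intrinsic randomness becomes a feature rather than a defect (our sketch, not the 6T2R circuit): a Boltzmann-machine unit must fire with sigmoid probability, and comparing its deterministic activation against a logistic-distributed noise source yields exactly that behavior.

```python
# Toy illustration: a stochastic Boltzmann-machine unit realized by
# thresholding against intrinsic noise (illustrative, not the 6T2R circuit).
import numpy as np

rng = np.random.default_rng(7)

def sample_unit(activation, n_trials=100_000):
    # A logistic-distributed noise source turns a plain comparator into
    # an exact sampler for p = sigmoid(activation).
    noise = rng.logistic(size=n_trials)
    return (activation + noise > 0).mean()

for a in (-2.0, 0.0, 2.0):
    p_ideal = 1 / (1 + np.exp(-a))
    print(f"a={a:+.1f}  sigmoid={p_ideal:.3f}  noisy comparator={sample_unit(a):.3f}")
```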
These examples suggest that the efficient implementation of neuromorphic in-memory computing systems is feasible by pairing synaptic devices with optimal algorithms to mitigate the non-ideal characteristics of these devices. Concurrently, synaptic and neuronal devices must be optimized to meet a new set of requirements imposed by these innovative algorithms.
Sangbum Kim is an associate professor in the Department of Materials Science and Engineering, Seoul National University. From 2010 to 2018, he was with the IBM T.J. Watson Research Center in NY, USA. He is currently working on novel semiconductor memory materials and devices for various memory applications such as brain-inspired neuromorphic computing, storage-class memory, and embedded memory. He served as a subcommittee chair for international conferences on semiconductor devices, including the 2023 International Reliability Physics Symposium (IRPS) and the 2022, 2023 International Electron Devices Meeting (IEDM). Additionally, he served as the finance chair for the 2024 International Memory Workshop (IMW) and is currently serving as the technical program chair for IMW 2025. He received his B.S. degree from Seoul National University, Seoul, Korea, in 2001 and his M.S. and Ph.D. degrees from Stanford University, Stanford, CA, in 2005 and 2010, respectively, all in electrical engineering. His Ph.D. dissertation focused on the scalability and reliability of phase change memory (PCM).
Panel Discussion
Topic: “Sustainable AI: Emerging Architectures, Devices, and Quantum Computing Towards Future Computing”
Tohru Ishihara (Nagoya University)
Abstract: TBD
Special Sessions (Invited Lectures)
“Next-Generation Quantum Computing: A Computer Architect’s Perspective”
Jangwoo Kim (Seoul National University / MangoBoost Inc.)
Abstract: Quantum computing is the next-generation computing paradigm, and researchers are actively working across its many domains (e.g., qubit manufacturing, qubit-interface control processors, programming and compilers, and applications). Because real-world quantum applications require millions of qubits, the field is moving from Noisy Intermediate-Scale Quantum (NISQ) devices to fault-tolerant quantum computers (FTQC). In this talk, I first introduce the key challenges in developing a scalable and reliable quantum computer in the FTQC era. Next, I will introduce my research covering quantum computer modeling, quantum control processors, quantum interface methods, distributed quantum computers, and reliable quantum computers. By integrating these outcomes, we have been contributing to the realization of real-world quantum computers in the FTQC era.
Jangwoo Kim is a full professor in the Department of Electrical and Computer Engineering at Seoul National University. He is also the CEO and founder of MangoBoost Inc., which provides next-generation HW/SW solutions to maximize the efficiency of datacenters. He earned his PhD degree from Carnegie Mellon University and his BS and MEng degrees from Cornell University. Prior to his academic career, he contributed to developing UltraSPARC T4 CPUs and servers at Sun Microsystems and Oracle Corporation. His current research interests lie in server and system architecture, cryogenic and quantum computer architecture, and AI/neuromorphic computing.
“Reliability and Efficiency in Deep Learning Processing Systems”
Alex Orailoglu (UC San Diego)
Abstract: Artificial intelligence techniques driven by deep learning have experienced significant progress in the past decade. The usage of deep learning methods has increased dramatically in practical application domains such as autonomous driving, healthcare, and robotics, where the utmost hardware resource efficiency, as well as strict hardware safety and reliability requirements, are often imposed. The increasing computational cost of deep learning models has traditionally been tackled through model compression and domain-specific accelerator design. As the cost of conventional fault-tolerance methods is often prohibitive in consumer electronics, the study of functional safety and reliability for deep learning hardware is still in its infancy.
This talk outlines a novel approach to delivering dramatic boosts in hardware safety, reliability, and resource efficiency through a synergistic co-design paradigm. We start off by reviewing the unique algorithmic characteristics of deep neural networks, including plasticity in the design process, resiliency to small numerical perturbations, and inherent redundancy, as well as the unique micro-architectural properties of deep learning accelerators, such as regularity. The advocated approaches reshape deep neural networks and strategically enhance deep neural network accelerators, prioritizing overall functional correctness and minimizing the associated costs by exploiting the statistical nature of deep neural networks. Experimental results demonstrate that deep neural networks equipped with the proposed techniques can maintain accuracy gracefully even at extreme hardware error rates. As a result, the described methodology can embed strong safety and reliability characteristics in mission-critical deep learning applications at a negligible cost.
The proposed strategies further offer promising avenues for handling the micro-architectural challenges of deep neural network accelerators and boosting resource efficiency through the synergistic co-design of deep neural networks and hardware micro-architectures. Practical data analysis techniques, coupled with a novel feature elimination algorithm, identify a minimal set of computation units that capture the information content of a layer and squash the rest. Linear transformations on the subsequent layer ensure accuracy retention despite the removal of a significant portion of the computation units. We further demonstrate that novel complementary sparsity patterns can offer the utmost expressiveness with inherent, hardware-exploitable regularity. A novel dynamic training method converts the expressiveness of such sparsity configurations into highly accurate and compact sparse neural networks.
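To illustrate the elimination-plus-compensation idea in the final paragraph (a minimal sketch under our own assumptions, not the speaker’s algorithm): remove a layer’s low-information units, then fit a linear map on the surviving activations so the next layer’s inputs are approximately preserved. Variance scoring and least-squares compensation here are stand-ins for the techniques named in the abstract.

```python
# Minimal sketch of unit elimination with next-layer linear compensation
# (variance scoring and least squares are illustrative stand-ins).
import numpy as np

rng = np.random.default_rng(0)

# Toy hidden activations with redundancy (low-rank + noise) and the
# next layer's weight matrix.
Z = rng.normal(size=(1000, 32))
H = Z @ rng.normal(size=(32, 64)) + 0.01 * rng.normal(size=(1000, 64))
W_next = rng.normal(size=(64, 10))

# 1. Keep the units that carry the most information (variance here).
k = 32
keep = np.argsort(H.var(axis=0))[-k:]

# 2. Fit a linear map T so the kept units reconstruct all 64 units:
#    H[:, keep] @ T ~= H  (least squares).
T, *_ = np.linalg.lstsq(H[:, keep], H, rcond=None)

# 3. Fold the reconstruction into the next layer, so inference only
#    ever computes the kept units.
W_next_pruned = T @ W_next

rel_err = (np.linalg.norm(H[:, keep] @ W_next_pruned - H @ W_next)
           / np.linalg.norm(H @ W_next))
print(f"half the units removed, relative output error: {rel_err:.4f}")
```

Because the compensation is folded into the next layer’s weights, the pruned network incurs no extra operations at inference time.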
Alex Orailoglu is an expert in robust systems and design. He has chaired a great many technical conferences, including the leading conferences of both the VLSI reliability (IEEE VLSI Test Symposium) and embedded systems (IEEE/ACM CODES+ISSS) research domains. Many of his doctoral students have attained top-notch university research faculty positions in the United States and globally. He holds an S.B. degree cum laude in Applied Mathematics from Harvard College and M.S. and Ph.D. degrees in Computer Science from the University of Illinois at Urbana-Champaign. He is currently a Professor of Computer Science and Engineering at the University of California, San Diego, La Jolla, CA, USA, where he leads the Architecture, Reliability, Test, and Security (ARTS) Laboratory, focusing on VLSI test, reliability, security, embedded systems, and processor and neural network architectures. He has published more than 300 peer-reviewed articles and has founded numerous technical conferences, including HLDVT, SASP, and NanoArch. Prof. Orailoglu has served as an IEEE Computer Society Distinguished Lecturer and is a Golden Core Member of the IEEE Computer Society.