The 15th anniversary of IEEE annual symposium
CALL FOR PARTICIPATION
Keynote
Tofu Interconnect Controller for Fujitsu's Highly Scalable Supercomputer
Yuichiro Ajima (Fujitsu Ltd., Japan)
Abstract: The K computer, currently the world's fastest
supercomputer, combines 88,128 processor chips using an
interconnection network called Tofu Interconnect. Fujitsu's new
supercomputer system FX10 is also powered by the Tofu interconnect. We
developed an interconnect controller (ICC) chip which integrates all
active components of the Tofu interconnect. In this talk, we will
present a technical overview of the ICC chip. The ICC chip provides a
Tofu network router, four Tofu network interfaces, and a Tofu barrier
interface. The Tofu network router provides four internal and ten
external ports. Each internal port connects to one Tofu network
interface, and the external ports are used to construct a six-dimensional
mesh/torus network. The Tofu network interface supports Remote Direct
Memory Access (RDMA) communication, and a Tofu barrier interface
provides offload capability for synchronization and reduction
communication.
Yuichiro Ajima is a system architect in the Next-Generation
Technical Computing Unit at Fujitsu. His research focuses on
high-performance computing system architecture. Ajima has a PhD in
information engineering from the University of Tokyo. He is a member
of the Information Processing Society of Japan and IEEE.
Nonvolatile Logic-in-Memory Architecture Using an MTJ/MOS-Hybrid Structure and Its Applications
Takahiro Hanyu (Tohoku University, Japan)
Abstract: The communication bottleneck between memory and logic modules
has become an increasingly serious problem, causing large power
dissipation in recent nanometer-scale VLSI chips. One method to
solve such emerging VLSI-chip problems is to use "nonvolatile"
logic-in-memory architecture. In this architecture, nonvolatile
storage elements are distributed over a logic-circuit plane, so the
architecture is expected to achieve both ultra-low power and reduced
interconnection delay through a large reduction in the number of global
interconnections and volatile storage elements. In this
presentation, I demonstrate concrete standby power-free logic circuits
based on a nonvolatile logic-in-memory structure using magnetic tunnel
junction (MTJ) devices in combination with MOS transistors. The MTJ
device with spin-injection write capability is the only device that
combines all of the following features: large resistance ratio,
virtually unlimited endurance, fast read/write access, scalability,
CMOS-process compatibility, and nonvolatility. It is therefore well
suited to implementing MOS/MTJ-hybrid logic circuits with a
logic-in-memory architecture. As typical examples of the proposed
nonvolatile logic-in-memory circuitry, an MTJ-based nonvolatile
Look-Up Table (LUT) circuit for an instant power-ON/OFF Field
Programmable Gate Array and an MTJ-based nonvolatile Ternary
Content-Addressable Memory are also demonstrated together with the
fabricated test-chip results.
Takahiro Hanyu received the B.E., M.E. and D.E. degrees
in Electronic Engineering from Tohoku University, Sendai,
Japan, in 1984, 1986 and 1989, respectively. He is currently
a Professor in the Research Institute of Electrical Communication,
Tohoku University. His general research interests include
nonvolatile logic circuits and their applications to ultra-low-power
VLSI processors. He received the Sakai Memorial
Award from the Information Processing Society of Japan in 2000,
the Judge's Special Award at the 9th LSI Design of the Year
from the Semiconductor Industry News of Japan in 2002, the
APEX Paper Award of the Japanese Society of Applied Physics in
2009, the Excellent Paper Award of IEICE, Japan in 2010,
the Ichikawa Academic Award in 2010, and the Best Paper Award
at the IEEE Computer Society International Symposium on VLSI
2010. Dr. Hanyu is a senior member of the IEEE.
The Expanding Universe of Embedded Imaging
Masaki Hiraga (Morpho, Inc., Japan)
Abstract: Embedded devices have been evolving at a tremendous speed
for the past 10 years, especially mobile phones. Multi-core CPUs and
GPGPUs are becoming increasingly popular, and the resolution of display
devices and digital cameras keeps increasing. Image processing
is mainly performed in parallel, so it is well matched to these
advances in hardware. As a result, highly complex imaging
technology that was once used only on supercomputers and
workstations now runs on embedded devices. By combining imaging
technology with portability of mobile devices and network
communications, image processing applications with new concepts are
now entering the market. In this session, the evolution of
mobile phone hardware and software over the past 10 years will be
reviewed from an image processing perspective, and present and future
imaging technology will be discussed.
Masaki Hiraga is President of Morpho, Inc. He received
his DSc degree from the University of Tokyo, Graduate School of
Science, Department of Information Science. He founded Morpho, Inc. in
2004. Morpho, Inc. is a leading provider of imaging software solutions
for mobile devices. Customers utilizing Morpho's software
technologies include carriers, processing platform providers and
mobile device manufacturers, making the company a global leader in
mobile imaging.
Application Scalability - Key to Low Power, Performance Growth, and Exascale
Wen-mei Hwu (Univ. of Illinois, USA)
Abstract: Parallelism has become the main avenue of performance
growth and power reduction. Once an application achieves good
performance for a given hardware and data set, it must be able to
scale effectively in terms of hardware parallelism and data
size. Parallelism scalability allows the application to take advantage
of a wide range of current and future generation hardware. Data
scalability allows the application to handle the ever-increasing data
size in the real world while managing the increasingly limited memory
bandwidth. The rise of CPU-GPU heterogeneous computing has
significantly boosted the pace of progress in this field. There has
been rapid progress in numeric methods, algorithm design, programming
techniques, compiler transformations and optimization tools for
developing scalable applications. In preparing petascale
applications for deployment on Blue Waters, we have been further
accelerating this revolution. In this talk, I will discuss these
recent advances and their implications for the future course of computing
and computer design.
Wen-mei W. Hwu is a Professor and holds the Sanders-AMD
Endowed Chair in the Department of Electrical and Computer
Engineering, University of Illinois at Urbana-Champaign. He is also
CTO of MulticoreWare Inc., chief scientist of UIUC Parallel Computing
Institute and director of the IMPACT research group
(www.crhc.uiuc.edu/Impact). He co-directs the UIUC CUDA Center of
Excellence and serves as one of the principal investigators of the
$208M NSF Blue Waters Petascale computer project. For his
contributions, he received the ACM SIGARCH Maurice Wilkes Award, the
ACM Grace Murray Hopper Award, the ISCA Influential Paper Award, and
the Distinguished Alumni Award in Computer Science of the University
of California, Berkeley. He is a fellow of IEEE and ACM. Dr. Hwu
received his Ph.D. degree in Computer Science from the University of
California, Berkeley.
The IBM Blue Gene/Q Supercomputer
George Liang-Tai Chiu (IBM, USA)
Abstract: Blue Gene/Q™ is the third generation in the IBM
Blue Gene® line of massively parallel supercomputer systems, and
is scalable to deliver a peak performance of twenty PetaFLOP/s and
beyond. The aim of the Blue Gene platform remains the same, namely to
build a massively parallel high performance computing (HPC) system out
of highly power-efficient processor chips. Such power-efficient chips,
in turn, allow very dense packaging, which consequently results in
superior power efficiency, space utilization, and total cost of
ownership. A focus on reliability during all phases of the design
also contributes to the feasibility of scaling to large but reliable
systems.
The heart of a Blue Gene/Q system is its Compute chip, implemented as
a System-on-a-Chip (SOC) design. It combines processors, memory
hierarchy and network communications on a single ASIC. Integrating
these functions on a single chip reduces the number of chip-to-chip
interfaces, thereby reducing power, while increasing performance,
reliability and bandwidth. It also reduces network cost
substantially. This presentation will discuss the Blue Gene/Q Compute
chip architecture and design, emphasizing the aspects that result in a
peak performance increase of 15x versus the previous generation, Blue
Gene/P, while achieving a power efficiency increase of 5.6x.
The Blue Gene/Q Compute (BQC) chip is a 19 x 19 mm chip in IBM's Cu-45
(45nm SOI) technology. The chip functionally contains 18 processor
cores, intended to be used as 16 user cores, 1 core for operating
system services, and 1 core as a spare. The processor core is an
augmented version of the 4-way multithreaded Power A2 core used on the
IBM PowerEN™ chip. Blue Gene/Q-specific modifications include a
Quad Floating Point Unit (QPU) with a 4-way SIMD architecture
supporting integrated scalar and vector floating-point arithmetic. The
QPU can concurrently execute up to 8 floating-point operations (based
on a 4-wide FMA instruction), a store instruction and a load
instruction. The QPU also provides a set of permute instructions to
support efficient vector data reorganization, and instructions for
complex number arithmetic that act on adjacent vector element pairs.
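The figure of 8 floating-point operations follows from counting a fused multiply-add as two operations (one multiply, one add) on each of the 4 SIMD lanes. A minimal Python sketch of that arithmetic is given here. The function name `fma4` and the sample values are illustrative assumptions, not part of the QPU instruction set, and rounding behavior is not modelled.

```python
from typing import List, Tuple

LANES = 4  # 4-way SIMD width stated in the abstract


def fma4(a: List[float], b: List[float], c: List[float]) -> Tuple[List[float], int]:
    """Model one 4-wide fused multiply-add: result[i] = a[i] * b[i] + c[i].

    Hypothetical helper for illustration only. Returns the result vector
    and the number of floating-point operations counted, taking one
    multiply plus one add per lane.
    """
    if not (len(a) == len(b) == len(c) == LANES):
        raise ValueError("each operand must have exactly 4 elements")
    result = [x * y + z for x, y, z in zip(a, b, c)]
    flops = 2 * LANES  # 2 operations per lane, 4 lanes
    return result, flops


if __name__ == "__main__":
    values, flops = fma4(
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 2.0, 2.0, 2.0],
        [0.5, 0.5, 0.5, 0.5],
    )
    print(values, flops)
```

Counting operations this way is only a convention for quoting peak rates; it says nothing about the clock frequency, which the abstract does not give.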
In addition, each processor core interfaces, via a sophisticated
L1-prefetching unit and a crossbar switch, to a 32 MB central L2
cache, which uses embedded DRAM for data storage. The L2 cache allows
for the storage of multiple data versions per address. The versioning
can be used for advanced cache management techniques such as
Speculative Execution (SE) and Transactional Memory (TM). These
techniques support aggressive multithreading of applications, as
hardware will detect and deal with access conflicts. L2 cache access
misses are handled by two integrated memory controllers that interface
to DDR3 memory (16GB, directly attached to the BQC chip).
The BQC on-chip networking logic supports 10 bidirectional 2GB/s links
to neighboring chips, allowing the chips to be interconnected into a
high-bandwidth, low-latency 5-D torus network, as well as providing
for an additional IO link. The on-chip network logic incorporates
routing between these ports, DMA facilities to support remote memory
access, and hardware-assist facilities for broadcast and reduction
operations.
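The link count matches the topology: in a 5-D torus every node has two neighbors per dimension, so 5 x 2 = 10 links. The following Python sketch enumerates those neighbors with wrap-around. The function name `torus_neighbors` and the torus shape used in the example are assumptions chosen for illustration; the abstract does not state the dimensions of any installed system.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes one hop away from `node` in a torus of shape `dims`.

    Each dimension contributes a -1 and a +1 neighbor; coordinates wrap
    around at the edges, which is what distinguishes a torus from a mesh.
    """
    if len(node) != len(dims):
        raise ValueError("node and dims must have the same number of dimensions")
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(coord))
    return neighbors


if __name__ == "__main__":
    # Hypothetical 5-D shape; every size is at least 3 so that the two
    # neighbors along each dimension are distinct nodes.
    dims = (4, 4, 4, 4, 4)
    hops = torus_neighbors((0, 0, 0, 0, 0), dims)
    print(len(hops))
    print(hops[:2])
```

If a dimension had size 2, both directions along it would reach the same node, so the number of distinct neighbors would be smaller than the number of links.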
As a result of these architectural features, BQC is a power-efficient
compute chip, optimized for a wide range of parallel applications.
Blue Gene/Q systems have held the Green500 top spot three consecutive
times since November 2010, achieving a power efficiency of ~2
GigaFLOPS/Watt. Blue Gene/Q also received the top honor of Graph500 in
November 2011 in a data analytics application.
George Liang-Tai Chiu (Fellow, IEEE) is the Senior Manager
of Advanced High Performance Systems in the Systems Department at the
Thomas J. Watson Research Center, responsible for the overall hardware
and software of the Blue Gene Platform. He received a Ph.D. degree in
astrophysics from the University of California at Berkeley in 1978,
and an MS degree in Computer Science from Polytechnic University in
1995. He joined IBM in 1980 after having been on the staff of Yale
University. Dr. Chiu has worked on picosecond device and internal
node characterization, laser beam and electron beam contactless
testing techniques, functional testing of chips and packages, optical
lithography, display technologies, computer packaging, and
supercomputing. Dr. Chiu is one of the three co-founders of the Blue
Gene project, and he has been in charge of the Blue Gene supercomputer
since 1999. In 2007, he became the Principal Investigator of the
Nuclear Energy Advanced Modeling and Simulations (NEAMS) project. In
2010, he was appointed as an Industrial Council Member of the CASL
(Consortium for Advanced Simulation of Light Water Reactors)
organization overseeing the Oak Ridge nuclear reactor research.
He has published over 400 papers and taught numerous short courses in
the areas mentioned above. He holds fifty-two patents
internationally. He received an IBM Corporate Award in 2005, the
Gerstner Award for Client Excellence in 2005, the EE ACE Awards as
part of the Blue Gene/L System Design Team in 2005, three IBM
Outstanding Technical Achievement Awards, nine Invention Achievement
Awards from IBM, and the National Medal of Technology and Innovation for
Blue Gene from the US Department of Energy in 2009. Dr. Chiu is a
member of the International Astronomical Union, IBM Academy of
Technology, and a Fellow of the Institute of Electrical and
Electronics Engineers.
Panel Discussion
"Technology exchange: Supercomputing and Embedded computing"
Organizer and Moderator:
Hideharu Amano (Keio Univ., Japan)
Panelists:
George Liang-Tai Chiu (IBM, USA)
Yuichiro Ajima (Fujitsu)
Wen-mei Hwu (Univ. of Illinois)
Felipe Cruz (Nagasaki Univ.)
Toru Shimizu (Renesas Electronics)
Abstract: The most important challenge for next-generation
supercomputers is packing in as many computing elements as possible
within limited energy and space. The rapid advance of personal mobile
devices has likewise pushed embedded systems to provide powerful
computing functions within limited energy and space. The common keys are
many-core systems and accelerators. Programming techniques for making
the best use of complicated hierarchical multi-core systems are
another key. This panel discusses techniques from each field
that can be useful in the other, and how to exchange them
across market boundaries.
Panelists' biographies
Felipe Cruz is a Postdoctoral Research Fellow at Nagasaki
University. He works at the Nagasaki Advanced Computing Center where
he focuses on Scientific Computing for low-cost and energy-efficient
high performance computing systems. For more details, please visit
his homepage.
For the other panelists, please see their biographies under the keynote presentations.
Special Invited Presentation
Seahawk - Optimizing power efficiency in high performance Cortex-A15 processor implementations
Dermot O'Driscoll and Sumit Sahai (ARM, UK)
Abstract: TBA
Special Sessions (invited lectures)
Advanced Virtual Prototyping of Multiprocessor SoCs
Frédéric Pétrot (TIMA Laboratory, France)
Abstract: Virtual prototyping is a technology whose goal is to
simulate the behavior of an entire digital system, including the
software running on the processors, and the digital hardware. It
relies on specific modeling approaches, at different levels of
abstraction, so that speed/accuracy trade-offs can be made. This talk
will review the challenges of virtual prototyping techniques, and
introduce the levels of abstraction that have been agreed upon. We
will then focus more specifically on the interpretation of software
code and detail two techniques: an interpretive one based on dynamic
binary translation and a native one making use of hardware-assisted
virtualization.
Frédéric Pétrot received the DEA (master's) and PhD degrees in
Computer Science from Université Pierre et Marie Curie (Paris VI),
Paris, France, in 1990 and 1994, respectively. From 1995 to 2004, he
was an assistant professor, and
contributed actively to the Alliance VLSI CAD System and the Disydent
ESL environment. F. Pétrot joined TIMA in September 2004, and
holds a professor position at the Grenoble Institute of Technology,
France, where, since 2007, he heads the System Level Synthesis
group. His main research interests are in system level design of
integrated systems, and include computer-aided design of digital
systems, and architecture and software for homogeneous and heterogeneous
multiprocessor systems on chip.
The Challenges of Analyzing Embedded Processor Behavior In the Age of Complex SoCs
Markus Levy (EEMBC, USA)
Abstract: Drawing on the experience of the Embedded Microprocessor
Benchmark Consortium (EEMBC), this presentation will detail the
methodology used to develop benchmarks that target horizontal
technologies such as floating-point and multicore and vertical
technologies such as smartphones, automotive, and Android. In addition
to performance-related aspects, I will also discuss battery-life
measurement techniques for smartphones, a subject that is often
fraught with misinterpretation and abuse. Developing these
benchmarks involves many challenges, such as
ensuring repeatability, portability, and the ability to defeat
unwarranted optimizations. Furthermore, these diverse and popular
topics present the design engineer with unique challenges in trying to
understand how to analyze the embedded processor and system
behavior. Therefore, this presentation will also explain how to apply
these benchmark techniques to designing next-generation processors and
systems, and how system designers can use them when making tradeoffs
between performance and power.
Markus Levy is founder and president of EEMBC. He is also
president of The Multicore Association and chairman of the Multicore
Developers Conference. Mr. Levy was previously a senior analyst at
In-Stat/MDR and an editor at EDN magazine, focusing in both roles on
processors for the embedded industry. Levy began his career in the
semiconductor industry at Intel Corporation, where he served as both a
senior applications engineer and customer training specialist for
Intel's microprocessor and flash memory products. He is the co-author
of Designing with Flash Memory, the only technical book on this
subject, and received several patents while at Intel for his ideas
related to flash memory architecture and usage as a disk drive
alternative. He is also a volunteer firefighter.