Keynote Presentations
Advancing Moore’s Law: Opening New Horizons
Tsuyoshi Abe (Intel, Japan)
Abstract: 2015 is the 50th anniversary of Moore’s Law. Since 1965, the semiconductor industry has been thriving on Moore’s Law, enabling new semiconductor devices with higher functionality and complexity, while controlling power, cost, and size. The Internet-of-Things (IoT) era is right at our doorstep and industry experts predict that by the year 2020, there will be well over 50 billion devices connected to the Internet. These IoT devices together with cloud computing will continue to drive demand for high performance semiconductors with low power consumption. As a result, Moore’s Law will play a pivotal role in controlling the cost of semiconductors. This presentation will focus on Intel’s 14nm process technology featuring the 2nd generation tri-gate transistors. Specifically, it will illustrate the advantages of the new process technology over the previous generation and how Moore’s Law will benefit the development of leading edge semiconductors by reducing cost per transistor and improving performance/watt.
Tsuyoshi Abe joined Intel K.K. in 1985 and held various management positions in microprocessor system development, system support for embedded board computers, engineering training, and application engineering for PC, server, and embedded applications. Since October 2012, he has served as Director and Senior Executive Officer, Vice President, and General Manager of the Technology & Manufacturing Group of Intel K.K. He earned an engineering degree from Kinki University. He completed studies in Management of Technology and a doctoral course in Regional Environment Systems at the Graduate School of Engineering Management, Shibaura Institute of Technology.
The Kalray MPPA Mission-Critical Supercomputer on a Chip
Benoît Dupont de Dinechin (Kalray)
Abstract: The Kalray MPPA-256 processors implement a supercomputer architecture on chips manufactured with 28nm CMOS technology. These processors achieve high performance, low power, and timing dependability by distributing a total of 256 user cores and 32 system cores across 16 compute clusters of 16+1 cores, and 4 I/O subsystems with a quad-core. All cores implement the same 5-issue VLIW architecture, which is optimized for MCU as well as DSP types of computing. Each compute cluster and I/O subsystem owns a private address space, while communication and synchronization between them is ensured by data and control Networks-On-Chip (NoC). The MPPA-256 processors are also fitted with high-performance I/O interfaces, in particular DDR3, PCI Gen3, Ethernet 10G, Interlaken and GPIO. The MPPA-256 programming environment includes GCC-based C, C++, and FORTRAN compilers with OpenMP support, and stand-alone or Eclipse GDB-based debuggers. The local memory spaces are either visible from the applications, or transparently turned into a cache of the external DDR memory thanks to a distributed shared memory run-time system. This makes it possible to support the global memory of the OpenCL data-parallel and task-parallel programming models. Current uses of the MPPA-256 processors include data center CPU acceleration such as real-time H.265 video encoding, Monte Carlo simulations, or cryptanalysis. The CPU accelerators are based on a PCI-e card with 4x MPPA-256 processors for a total of 1024 cores. Advanced uses of the MPPA-256 processors are in mission-critical applications such as avionics functions, airborne image analysis, and automotive ADAS (Advanced Driver Assistance Systems) functions.
Benoît Dupont de Dinechin is Chief Technology Officer of Kalray. He is the main architect of the Kalray VLIW core and the co-architect of the Multi-Purpose Processing Array (MPPA) processor. Benoît also defined the Kalray software roadmap and contributed to its implementation. He published most of the Kalray-related scientific results, in particular in the areas of cyclostatic dataflow scheduling and of network calculus applications to the NoC. Before joining Kalray, Benoît was in charge of Research and Development of the STMicroelectronics Software, Tools, Services division, and was promoted to STMicroelectronics Fellow in 2008. Prior to STMicroelectronics, Benoît worked at the Cray Research park (Minnesota, USA), where he developed the software pipeliner of the Cray T3E production compilers. Benoît earned an engineering degree in Radar and Telecommunications from the Ecole Nationale Supérieure de l’Aéronautique et de l’Espace (Toulouse, France), and a doctoral degree in computer systems from the University Pierre et Marie Curie (Paris) under the direction of Prof. Paul Feautrier. He completed his post-doctoral studies at McGill University (Montreal, Canada) at the ACAPS laboratory led by Prof. Guang R. Gao.
Low Power and High Speed Working Memory with Spintronics and Vertical MOSFET Technology
Tetsuo Endoh (Tohoku University, Japan)
Abstract: In semiconductor memories such as working memories (SRAM, DRAM) and storage memories (NAND memory), it is becoming difficult to meet target performance by scaling technologies alone. Especially for high speed working memories at 1X nm and beyond, large power consumption raises more serious issues because of rapidly increasing memory capacity, operation speed, and leakage current of scaled CMOS. Moreover, the speed gaps between memory levels, as well as the gap between the operation speed of MPUs and that of working memories, have widened year by year. Against this background, this invited talk discusses the future direction of the semiconductor memory hierarchy. It shows that, by using 3D stacked memories based on Vertical MOSFETs and STT-MRAMs, the current issues of cell density, speed gap, and power consumption can be overcome simultaneously and a novel memory hierarchy can be achieved. In addition, from the viewpoint of future high-end memory systems, the impact of memory technologies that combine Vertical MOSFETs with spintronics devices such as MTJs is discussed. Finally, nonvolatile logic is presented as one application of STT-MRAM.
Tetsuo Endoh joined Toshiba Co in 1987 and was engaged in R&D of NAND Memory. He became a Lecturer at the Research Institute of Electrical Communication, Tohoku University in 1995. He is now Director of the Center for Innovative Integrated Electronic Systems (CIES), Deputy Director of the Center for Spintronics Integrated Systems (CSIS), and a professor in the Department of Electrical Engineering, Graduate School of Engineering, Tohoku University. His current interests are novel 3D structured device technology such as Vertical MOSFET, high density memory such as SRAM, DRAM, 3D-NAND and STT-MRAM, and Beyond CMOS technology such as spintronics-based nonvolatile logic. He is also interested in power management technology such as power IC and power circuit technology.
Riding the Perfect Storm, Bringing Mobile Compute to the Data Centre
John Goodacre (ARM / University of Manchester, UK)
Abstract: EUROSERVER is a European Commission FP7-funded project that combines the technology trends of nanotechnology 3D integration and low-power mobile SoC processor integration with the impossible requirements of next generation cloud and high performance compute, in order to investigate and build a scalable, cost-effective, and flexible ARM-based server system architecture suitable across multiple markets. This talk will introduce the vision and the goals for the project and the approach the consortium is taking to realize a groundbreaking solution out of this perfect storm.
John Goodacre joined ARM in February 2002 and took responsibility for their platform architecture. Today he is Director of Technology and Systems, focused on various programs around the company’s long-term roadmap. He has an active role across the research community and is involved in many research programs across both hardware and software architectures. Prior to leading the development of ARM MPCore multicore technology while at ARM, he specialized in enterprise software at Microsoft in Redmond as Group Program Manager and architect delivering Exchange 2000 Conferencing Server. In addition, John was recently appointed Professor of Computer Architectures in the Advanced Processor Technologies group in the School of Computer Science at the University of Manchester, where he is able to pursue his research interests across both academia and industry platforms.
How can Medical Electronics Revolutionise Health Care by 2050?
Rudy Lauwereins (IMEC)
Abstract: Today’s medical practice lacks prevention, is slow and expensive, and often treats symptoms instead of root causes. In this visionary presentation, I will predict what medicine could look like in 2050, enabled by modern electronics. I will present four dream scenarios of a healthier world: a world where illness is prevented, a world without cancer, a world where personalised spare parts are produced, and a world without neurological and psychiatric disorders. Based on advanced prototypes of today, I will explain why these visionary scenarios might become a reality. And I will leave the audience with a large number of ethical questions.
Rudy Lauwereins is vice president of imec, which performs world-leading research and delivers industry-relevant technology solutions through global partnerships in nano-electronics, ICT, healthcare and energy. He is director of imec’s Smart Systems Technology Office, guiding the strategic research decisions in vision and telecommunication systems, and in (bio)medical and lifestyle electronics. He also leads the imec Academy, coordinating all external and internal training curricula. He is a part-time Full Professor at the Katholieke Universiteit Leuven, Belgium, where he teaches Computer Architectures in the Master of Science in Electrotechnical Engineering program. Before joining imec in 2001, he had held a tenured professorship in the Faculty of Engineering at the Katholieke Universiteit Leuven since 1993. He obtained a Ph.D. in Electrical Engineering in 1989. Professor Lauwereins has authored and co-authored more than 400 publications in international journals, books and conference proceedings. He is a fellow of the IEEE.
Data Centric Systems: Architecture and Solutions for Technical Computing, Big Data, and High Performance Analytics
Michael Rosenfield (IBM Research Division)
Abstract: Computing systems will need to evolve in two fundamental ways: they must target solution driven workflows and they must be designed in such a way as to explicitly accommodate the impact of big data and complex analytics. The system requirements of classic modeling and simulation (HPC or technical computing) will converge with those of big data and analytics. As an example, HPC systems will need to be optimized to perform well on modeling and simulation, but must also focus on other important elements of the overall workflow, which include data management and manipulation coupled with associated analytics. Traditional machine balance points are no longer sustainable, nor achievable, with standard approaches. At a macro level, workflows will take advantage of different elements of the systems hierarchy, at dynamically varying times, in different ways and with different data distributions throughout the hierarchy. This leads us to a data centric design point that has the flexibility to handle these data-driven demands. This flexibility will come from balanced & composable systems built from modular components with computation distributed to all elements of the system hierarchy. Data Centric Systems (DCS) focus on the problem of data location, and the principle that moving computing to the data will lead to more cost effective and efficient systems than prior generation systems. DCS systems, characterized by heterogeneous hardware, will provide leadership capabilities for Big Data, complex analytics, modeling/simulation and cognitive computing. DCS system software will allow the hardware to be used efficiently. Fully exploiting heterogeneous high performance capabilities will require additional evolution and innovation in programming models. A central motivator for DCS is to ensure the attributes of the architecture and implementation lead to commercially viable Exascale-class systems.
This means that investments in programming models, languages and software development will be preserved for the future and that new optimized code will be positioned to take advantage of Exascale features.
Michael Rosenfield is the Vice President of Data Centric Systems at the IBM Research Division in Yorktown Heights, NY. Previously, he was Director of Data Centric Systems. The Data Centric Systems organization develops current and future data-driven technical computing and analytics systems technologies, putting computing power everywhere data resides and minimizing data in motion and energy consumption. Major research areas in Data Centric Systems include current and future system architecture and design, system software, workflow performance analysis, and the convergence of Big Data, Analytics, Modeling, and Simulation. Prior to his current position, he was Director of Smarter Energy, focusing on the coordination, strategy, and plan for IBM Research’s worldwide activities. Smarter Energy ranges from photovoltaics, energy storage, and chip, system and datacenter level power through smart grid enablement, standards, and joint partnerships, working closely with IBM’s Energy and Utilities industry team as well as IBM’s Services, Software, and Hardware Divisions. Before becoming Director of Smarter Energy, Mike was the Director of VLSI Systems at the IBM Research Division in Yorktown Heights, NY. VLSI Systems focused on high performance microprocessor design, microarchitecture, lower power design techniques, improved designer productivity, the management of technology complexity, and design automation tools in support of IBM’s microprocessor and ASIC design teams. He was also the Research Division Area Strategist for Microprocessors and Tools. Before becoming Director of VLSI Systems, Mike was the Director of the Austin Research Lab, one of IBM’s first worldwide research labs. At ARL, he focused on new server systems architectures, systems-level power management/optimization, VLSI design, and design automation tools.
Before joining ARL, he was the Senior Manager of VLSI Design and Microarchitecture at IBM Research and held management positions in parallel communication architectures and electron-beam lithography for integrated circuit manufacturing. He started his career at IBM working on electron-beam lithography modeling and proximity correction techniques. He has a BS in Physics from the University of Vermont and an MS and Ph.D. from the University of California, Berkeley.
Invited Presentations
Acceleration Methods of Accurate Ego-Motion Using an Image Recognition Hardware for Advanced Driver Assistance Systems
Motoki Kimura (Renesas Electronics, Japan)
Abstract: An accurate ego-motion estimation algorithm based on optical flow and stereo matching has been implemented on the R-Car H2 SoC, which has eight CPU cores and image recognition hardware for ADAS (Advanced Driver Assistance Systems) applications. The image recognition hardware consists of a cluster of sixteen programmable floating-point based processors and four dedicated cores for image operations, which are tightly connected through an internal bus and SRAM. These two types of cores have been implemented carefully in R-Car H2 based on the analysis of the open source based computer vision libraries frequently used for prototyping and application development, so as to offer enough programmability and processing capability for embedded image recognition applications. In this talk, we will introduce the architectures of these cores in the image recognition processor, and show acceleration methods exploiting the capability of the R-Car H2 SoC, in order to achieve real-time operation of one of the most accurate ego-motion estimation algorithms in the world.
Motoki Kimura received his M.S. and Ph.D. degrees from Osaka University, Japan, in 2003 and 2006, respectively. From 2006 through 2010, he was a hardware engineer for video codec design at Renesas Technology Corporation, Japan. In 2010, he was transferred to Renesas Electronics, where he has since been the development lead of a video decoder hardware IP for an embedded SoC. From 2012 through 2013, he was a visiting scholar at the University of California, San Diego, where his research interest was FPGA/ASIC development of object detection hardware. Currently, he is a front end application engineer, and his interests include both hardware and software development for image recognition applications in Advanced Driver Assistance Systems.
Heterogeneous Multi-Core SoC for ADAS and Image Recognition Applications
Takashi Miyamori (Toshiba Corporation, Japan)
Abstract: In recent years, image recognition technologies have become practical on embedded systems for automotive, digital-consumer and mobile products. For automotive applications, they are the key technologies for Advanced Driver Assistance Systems (ADAS) and will lead us to a safer automotive society. We have developed heterogeneous multi-core SoCs for ADAS and other image recognition applications. Because these applications require tremendous computing power, achieving such high performance with low power consumption is a major challenge. Furthermore, high accuracy recognition is also required. We proposed a heterogeneous multicore architecture that consists of energy efficient VLIW processor cores with a SIMD coprocessor and hardware accelerators. Novel image features, such as CoHOG (Co-occurrence Histograms of Oriented Gradients) and color-based features, are introduced, and dedicated hardware accelerators have been developed for them. In this presentation, we will introduce the architectures of these image recognition SoCs. Our latest SoC is composed of two 4-core processor clusters and 14 hard-wired accelerators and achieves a peak of 1.9TOPS.
Takashi Miyamori received the B.S. and M.S. degrees in electrical engineering from Keio University, Japan, in 1985 and 1987, respectively. In 1987, he joined Toshiba Corporation, where he was engaged in the research and development of microprocessors. He is currently a Senior Manager of the Digital Media SoC Department at the Center for Semiconductor Research & Development, working on the development of image recognition SoCs, image signal processing hardware and software, and multi-core processors for embedded applications.
ExaScaler-1: The Power-Efficient Submersion Many-Core Processor Based Supercomputer
Sunao Torii (ExaScaler / PEZY Computing, Japan)
Abstract: We have developed a proprietary many-core processor based supercomputer, “ExaScaler-1”. “Suiren”, the first installed system of ExaScaler-1, consists of four 8U tanks totaling 32 units (32U) and achieves over 190TFlops HPL (High Performance Linpack benchmark) performance and 4.95GFlops/W power efficiency. These values rank 369th in the TOP500 list and 2nd in the Green500 list of November 2014. Suiren is the only supercomputer ranked in both the TOP500 list and the Green500 list that uses a general-purpose many-core accelerator developed by a venture company. ExaScaler-1 adopts the PEZY-SC many-core processor as its calculation accelerator. PEZY-SC integrates 1,024 MIMD (Multiple Instruction-stream Multiple Data-stream) processing elements (PEs) on a chip. It also integrates 8-ch 64bit 2.4GHz DDR3/4 SDRAM interfaces and 34MB of on-chip cache and scratch pad memory, sufficient to meet memory bandwidth requirements. PEZY-SC consumes around 80W of dynamic power and 10W of leakage power. Each unit of ExaScaler-1 combines two Intel Xeon E5-2660v2 processors and 8 PEZY-SC chips with 512GB SDRAM and consumes around 1.3kW. To keep the whole system at a low temperature and to minimize a tank machine’s installation space, we have developed a new submersion liquid cooling system from scratch. Because it immerses the whole motherboard in coolant, it keeps the temperature low enough even for small parts such as power supply and memory modules, as well as for high-power devices such as CPUs. This not only reduces chip leakage current but also increases system reliability. Because this system adopts 1-phase thermal conduction cooling with an open-roof-top tank rather than the popular 2-phase cooling, it reduces both maintenance cost and manufacturing expenditure for cooling tanks. In my talk, I will present ExaScaler-1’s cooling system as well as the PEZY-SC many-core chip architecture.
Furthermore, I will explain our future development plans for the new era of Exa-scale computing systems.
Sunao Torii is General Manager of R&D and Chief Technology Officer of ExaScaler Inc. He joined NEC in 1992, where he conducted research on parallel computer and microprocessor architecture. The control-flow parallel chip multiprocessor Merlot and the mobile low-power high-performance multiprocessor MP211 were developed in 2000 and 2005, respectively. These technologies were adopted in the successor mobile processor Medity. Medity received the 18th prize for the global environment award from the Minister of Economy, Trade and Industry. In 2009, he was a group leader for low-power network on chip (NoC) architecture in the NEDO many-core green IT project. In 2010, he moved to Renesas Electronics and developed an image recognition processor. He joined PEZY Computing K.K. in Mar. 2014. He received the best paper award of the IPSJ journal in 1998.
Panel Discussion
Topics: “Computing Technology for Autonomous Driving”
Organizer / Moderator:
Shinpei Kato (Nagoya University, Japan)
Panelists:
Takashi Miyamori (Toshiba)
Tadashi Kamada (Denso)
Tsuguo Nobe (Intel)
Mandali Khalesi (HERE)
Abstract: Autonomous driving is becoming more and more multidisciplinary. Not only vehicular technologies but also computing, networking, and data management technologies are involved in autonomous driving. Of particular interest is the trade-off between in-vehicle computing and cloud computing to support the artificial intelligence of autonomous driving. Perception and planning for autonomy require high-performance computing, while battery-driven vehicles must consider power problems. Offloading such computations onto the cloud could be a drastic solution, though safety and reliability of driving remain major concerns. Data management is also a grand challenge of autonomous driving. In particular, high-precision maps are considered to be the common infrastructure to self-localize vehicles and efficiently route them to their destinations. Unfortunately, current navigation systems are not well suited to high-precision maps, and the sustainable management of map data also remains an open problem. These problems of autonomous driving are not specific to particular technologies but need to be addressed by tight coordination of multiple technologies. This panel gathers experts from multiple areas across vehicles, computing platforms, maps, and consumer electronics.
Special Sessions (invited lectures)
Adaptive Many-Core Architectures for Speed and Power Convergence in Advanced Technology Nodes
Edith Beigné (CEA-LETI MINATEC)
Abstract: With the increasing complexity of today’s many-core applications, extremely high performance has become the main requirement. However, high performance means not only high speed but also low power. For example, in wireless internet devices, very high speed is mandatory for games or video computing, while it is necessary to save dynamic and static power for low speed applications in order to improve the battery life. The convergence between high speed and low power is very difficult to reach. Most of the time, ultra low power architectures cannot reach high speed and, conversely, a lot of power is consumed at high speed. Using energy efficient architectures is the only way to achieve a good compromise between speed and power. In this talk, we will first give an overview of fine-grain adaptive Voltage and Frequency Scaling architectures for many-core. Hardware issues will be discussed and some design solutions will be proposed for good performance results. These architectures, however, require a Wide Voltage Range of operation to reduce power and increase energy efficiency. We will then focus, during the second part of this talk, on Ultra Wide Voltage Range (UWVR) design challenges at the nanometer regime. How can the trade-off between leakage, variability and speed be improved at low voltage? Obviously the trend is to use thin film devices. Undoped thin-film planar FDSOI devices are investigated in this presentation as an alternative to bulk devices in the 28nm node and beyond, thanks to their excellent short-channel electrostatic control, low leakage currents and immunity to random dopant fluctuation. This compelling technology appears to meet the needs of nomadic devices, combining high performance and low power consumption. A major challenge for this technology is to provide various device threshold voltages (VT), trading off power consumption and speed.
This presentation will finally highlight the development of a UWVR multi-VT design platform in FDSOI planar technology on Ultra Thin Body and Box (UTBB) for the 28nm node. The efficient use of an adaptive voltage and frequency scaling architecture has been demonstrated on a 32-bit VLIW DSP exhibiting outstanding silicon results in terms of speed and energy. Efficient Body Biasing (BB) is shown to provide highly effective performance tuning for high energy efficiency. To conclude, this talk will give a short overview of FDSOI performance for future Internet of Things applications.
Edith Beigné joined CEA-LETI MINATEC in 1998, first working on RFID systems for biomedical applications. She then focused on asynchronous systems and circuits, specifically for ultra low power mixed-signal systems and cryptographic circuits. Since 2005, she has been in charge of the low power design team within the digital laboratory, developing innovative features for fine-grain power control and local voltage and frequency scaling. Since 2009, her main focus has been managing power and variability issues in advanced technology nodes for high energy efficiency. She has led complex, innovative SoC designs in 65nm and 32nm bulk, and now in 28nm and 14nm FDSOI technologies, for adaptive voltage and frequency scaling architectures based on GALS structures. Her main focus today is automatic performance regulation for ultra low power circuits.
System-Level Energy Management in Many-Core Systems Utilizing Distributed Speed-Power Controllers
Anca Molnos (CEA-LETI)
Abstract: Energy efficiency is one of the crucial concerns today in computing systems ranging from small connected devices to large data-centers. This issue is addressed at various levels, and recently we have witnessed a lot of progress in methods to control speed and power consumption of digital circuits. Notable examples are fine-grain adaptive voltage and frequency scaling, and the adoption of new technologies such as Fully-Depleted Silicon On Insulator (FDSOI). These advances, however, bring new knobs to trade off power and speed, e.g., supply voltage and body-bias voltage, which, in turn, open interesting questions about how to take full advantage of their potential at the software level. In this talk, we will present methods to reduce the power consumption of applications and the tradeoffs therein. As a research vehicle, we use a low-power many-core architecture with several power domains and distributed speed-power controllers. We will study the impact of adaptive voltage scaling and discuss methods to determine the optimal power modes, both with benefits at system level, in the context of advanced technologies such as FDSOI.
Anca Molnos received her M.Sc. degree in computer science from the “Politehnica” University of Bucharest, Romania and the Ph.D. degree in computer engineering from the Delft University of Technology, The Netherlands, in 2001 and 2009, respectively. Between 2006 and 2009 she was a senior scientist at NXP Semiconductors, The Netherlands, working on low-power multi-processors and distributed real-time systems. From 2009 to 2012 she was a researcher with the Delft University of Technology, working on embedded multi-core resource management for low-power and quality of service. In January 2013 she joined CEA LETI, where her research focuses on developing energy-aware software, energy and variability management, and frameworks for adaptable parallel systems. She has co-authored more than 40 papers in journals and international conferences and several patents, and she has served on the technical committees of prestigious international conferences and workshops, including ICCD’10-’14, SCOPES’12-’14, and ICCAD’14, and as program co-chair of MPSoC’14.
Towards Open-Source Development of Autonomous Vehicles
Shinpei Kato (Nagoya University, Japan)
Abstract: Autonomous driving is composed of perception, planning and control technologies. Perception components are supposed to understand driving scenes in real-time with, for example, object detection and self-localization. Planning components use the results of perception to determine the driving path, including behaviors and motions. Finally, control components drive the vehicle in accordance with the plan. These components are often developed in different communities and are not designed to coordinate with each other, so they are not integrated as a reliable system. This talk introduces open-source software for autonomous driving, which provides all necessary components integrated as a system. Research and development of autonomous driving can build on top of this software, using provided components or adding new components to enrich the system. To the best of my knowledge, this is the first piece of work on open solutions for autonomous driving.
Shinpei Kato is an Associate Professor in the School of Information Science at Nagoya University. He received his B.S., M.S., and Ph.D. degrees from Keio University in 2004, 2006, and 2008, respectively. He has also worked at The University of Tokyo, Carnegie Mellon University, and University of California, Santa Cruz from 2009 to 2012. His research interests include operating systems, real-time systems, and parallel and distributed systems.