Six Companies Get DOE PathForward Grants For Exascale Computing

June 15, 2017

By Allison Proffitt  

June 15, 2017 | The Department of Energy today announced a $258 million investment with the intention of delivering at least one exascale-capable computing system by 2021. AMD, Cray, Hewlett Packard Enterprise, IBM, Intel, and NVIDIA received DOE grants to support R&D in three areas: hardware technology, software technology, and application development.

The six companies will receive funding from the Department of Energy’s Exascale Computing Project (ECP) as part of its PathForward program. The $258 million in funding will be allocated over three years, and companies are committing to providing additional funding amounting to at least 40% of their total project cost, bringing the total investment to at least $430 million.
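The quoted figures are mutually consistent: if the companies' cost share is at least 40% of the total, DOE's $258 million covers at most 60%, which implies the $430 million total. A quick check of that arithmetic:

```python
# Sanity check of the PathForward funding figures quoted above.
doe_funding = 258e6   # DOE's three-year investment, in dollars
cost_share = 0.40     # companies cover at least 40% of total project cost

# If DOE's $258M covers the remaining 60% of the total, the implied total is:
total = doe_funding / (1 - cost_share)
print(f"Implied total investment: ${total / 1e6:.0f} million")  # -> $430 million
```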

“PathForward bridges the gap between open-ended architecture R&D and advanced product development that’s focused on the delivery of first-of-a-kind capable exascale systems,” explained Paul Messina, Director of the Exascale Computing Project, yesterday on a press briefing call. The program investment is for research, Messina clarified, not for the building of systems.

The PathForward program issued its request for proposals last June, seeking projects that would “improve application performance and developer productivity while maximizing energy efficiency and reliability of an exascale system.”

The ECP and PathForward model is one of co-design by DOE and private industry, Messina emphasized. “The work funded under PathForward has been strategically aligned to address those key challenges to development of [exascale computing, including] innovative memory architectures, higher speed interconnects, improved reliability of systems, and a process for increasing computer power capability without prohibitive increases in energy demand.”  

Exa At Scale

Exascale systems will be at least 50 times faster than the nation’s most powerful computer today—the Titan system at Oak Ridge National Laboratory. Global competition for exascale is fierce. While the U.S. has five of the 10 fastest computers in the world, Titan ranks third behind two systems in China.

Exascale has been a moving target for the United States (and other countries) for years. In his 2012 budget request, President Obama proposed funds for an exascale computing system, but by the time the budget was approved in 2013, the Department of Energy’s funding for supercomputing had been trimmed. The grants announced today align with a December 2016 goal of deploying at least one exascale system by 2021. Early this year, an engineer at China’s National Supercomputer Center said the center expected an exascale prototype by the end of 2017, though the finished product won’t be operational until about 2020.

The ECP’s goal for an exascale computer is not merely achieving exaflops (10^18 floating point operations per second) at peak performance, but exascale capability, “measured in actual application performance,” Messina said. Application performance figures of merit will be the most important criteria, the PathForward RFP stipulated, and Messina noted that the LINPACK benchmark would not be used.
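The article’s “at least 50 times faster” comparison can be checked against these definitions. Assuming Titan’s widely reported sustained LINPACK rate of roughly 17.6 petaflops (a figure taken here for illustration, not stated in the article):

```python
# Back-of-envelope comparison of an exaflop machine against Titan.
titan_flops = 17.6e15   # Titan's ~17.6-petaflop sustained LINPACK rate (assumed)
exaflop = 1e18          # 1 exaflop = 10**18 floating point operations per second

speedup = exaflop / titan_flops
print(f"An exaflop machine is ~{speedup:.0f}x Titan")  # -> ~57x
```

The ~57x result lines up with the article’s “at least 50 times faster” claim.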

Winners’ Plans

The grant winners echoed the same goals in their remarks during the briefing. Several eschewed peak-performance machines, talking instead about sustained performance. IBM mentioned accelerating cognitive computing. HPE discussed memory-driven computing architecture. AMD and NVIDIA both touted GPUs.

Every company representative expressed excitement about working with DOE workloads and use cases, but most also emphasized their commercial direction. “We believe the work we do has to also be viable for commercial products. Building custom systems is too expensive and doesn’t make sense to us,” said IBM’s James Sexton.

AMD

It’s an exciting time for computing in general, said Alan Lee, AMD Corporate Vice President for Research and Advanced Development. While the PathForward program is focused on HPC, the benefits of higher performance, lower power consumption, and efficient accelerator architectures apply to mobile devices, cloud servers, machine learning, data science, and more, he said. He summarized AMD’s planned projects under the PathForward program to include streamlining GPUs for HPC and machine learning; taking x86 CPU performance to the next level; providing advanced memory organization to feed the compute engines; developing new system architectures that are easier for programmers to use; supporting open next-generation interconnects for tying together the components of the system; and enabling novel integration technologies to maximize performance.

Cray

“When people hear exascale, they often think about the peak floating point performance of a machine,” said Steve Scott, Senior Vice President and Chief Technology Officer at Cray. “At Cray we care very little about peak performance. We’re really focused and committed to delivering sustained performance on real workloads that matter to our customers.” Scott outlined myriad challenges to sustained exascale performance, including power efficiency and cooling, and the architecture of nodes, interconnects, and storage. “System manageability and resiliency at scale are particularly challenging as we push the frontiers,” he said. “Our PathForward project will focus on several of these topics as well as building systems that are highly flexible and upgradable over time in order to take advantage of various processor, network, and storage technologies.”

Hewlett Packard Enterprise

HPE’s PathForward investment leverages several years of R&D investments in memory-driven computing, said Mike Vildibill, VP, Advanced Technologies, Exascale Development & Federal R&D Programs. It’s an architecture that puts memory, not processing, at the center of the computing platform to realize new levels of performance and efficiency, and Vildibill called it “fundamentally required” to achieve the power efficiencies needed for exascale systems. The PathForward investment will accelerate HPE research into silicon photonics, balanced system architectures, and better interconnects, Vildibill said. The company also said it plans to develop open architectures based on open industry standards, particularly using the Gen-Z chip-to-chip protocol. In an HPE blog, Bill Mannel, Vice President and General Manager at HPE, outlined how he expects exascale computing to progress. “I predict that we will start to see exascale-class systems within three years (roughly by 2020), but they will not be very efficient in terms of programming efficiency, power and footprint. Moreover, they might reach one exaFLOPS briefly, for a few selected programs that have been highly optimized with heroic efforts… We’re looking to comfortably exceed one exaFLOPS all day long, on a wide class of applications, with Memory-Driven Computing.”

Intel

High performance computing has established itself as one of the three pillars of scientific discovery along with theory and experiment, said Al Gara, Intel Fellow, Data Center Group Chief Architect, Exascale Systems. “The usability of these systems is really fundamental. We’re not just building FLOPS. We really want to build highly scalable machines that are really usable,” he said. “Exascale from Intel’s perspective is not just about high performance computing, it’s also about artificial intelligence and data analytics. We think that these three really are all now part of the solution.” Gara said artificial intelligence is “probably the fastest growing segment of computing” as we find ways to efficiently use data to learn relationships and make accurate predictions. He highlighted use cases in building more efficient batteries—a grand challenge problem for the nation—and healthcare. “The ability to both analyze large databases to find patterns and develop personal treatments… are areas where exascale will really be targeted.”

IBM 

“We do believe that future computing is going to be very data-centric, and we are focused very much on building solutions that will allow complex analytics, modeling, and simulation to execute on very large datasets,” said James Sexton, IBM Fellow and Director of Data Centric Systems, IBM Research. The PathForward initiative will accelerate IBM’s capabilities to deliver cognitive, flexible, cost-effective, and energy-efficient exascale systems, he said, emphasizing that solutions will be both on-premises and cloud solutions.

NVIDIA

NVIDIA plans to focus its development on improving the performance, resilience, and efficiency of GPU accelerators. “It’s important to understand that Summit [a supercomputing system to be installed at Oak Ridge National Laboratory next year] and Sierra [a system to be delivered to Lawrence Livermore National Laboratory later this year] and all the machines we build with GPUs are not just FLOP machines, and not intended to accelerate LINPACK,” said William Dally, Chief Scientist and SVP of Research at NVIDIA. “That’s actually not important for competitiveness. What is important for competitiveness is accelerating a combination of AI and simulation, combining the ability to do deep learning very effectively with traditional, physical modeling. That’s how future scientific computations are being done.” NVIDIA believes GPUs will be essential to achieving energy efficiency as well. “DOE has ambitious goals for improving power efficiency, to achieve exascale performance using only 20-30 megawatts,” wrote Steve Keckler, Vice President of Architecture Research, on the NVIDIA blog today. “By comparison, an exascale system built with CPUs alone could consume hundreds of megawatts.”
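Keckler’s 20-30 megawatt target implies a concrete efficiency bar. Working only from the figures quoted above (an exaflop is 10^18 FLOP/s), the required energy efficiency is:

```python
# Energy-efficiency bar implied by DOE's 20-30 megawatt exascale power target.
exaflop = 1e18                        # FLOP/s
power_low, power_high = 20e6, 30e6    # 20 MW and 30 MW, in watts

# Required efficiency in gigaflops per watt at each end of the range.
eff_at_30mw = exaflop / power_high / 1e9
eff_at_20mw = exaflop / power_low / 1e9
print(f"Required efficiency: {eff_at_30mw:.0f}-{eff_at_20mw:.0f} GFLOPS/W")
# -> Required efficiency: 33-50 GFLOPS/W
```

For context, a machine drawing “hundreds of megawatts,” as Keckler's CPU-only comparison suggests, would land an order of magnitude below that bar.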