New Machine Tops The Green500 List

The top 23 systems on this year’s list are all heterogeneous.


The Green500 has released its latest list of the 500 most energy-efficient supercomputers, and it has a new leader: L-CSC from the Helmholtz Center, the first supercomputer to surpass the 5 GigaFLOPS/watt barrier. L-CSC is yet another heterogeneous system, built on AMD FirePro S9150 GPU accelerators and Intel Xeon E5-2690v2 10C 3GHz processors.

IBM and NVIDIA aren’t standing still in this race, though. They announced the award of a significant share of a $425M U.S. DOE grant to build two new supercomputers targeting 150 to 300 PetaFLOPS of peak performance, with installation expected in 2017. The DOE Exascale Challenge sets a target of 1 ExaFLOPS within a 20-40 MW power budget by 2020, and these two new systems will be part of the race toward that goal.
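As a rough check on how far that target is from today’s best, the DOE goal can be turned into an efficiency figure by dividing 1 ExaFLOPS by the power budget. The Python sketch below does only that arithmetic; the helper name `required_gflops_per_watt` is ours, and the comparison point is the 5 GigaFLOPS/watt mark that L-CSC has just crossed.

```python
# Convert a performance target and a power budget into the efficiency it implies.
# Inputs come from the text: 1 ExaFLOPS within 20-40 MW, compared with ~5 GFLOPS/W today.

EXAFLOPS = 1e18  # floating-point operations per second


def required_gflops_per_watt(flops: float, power_watts: float) -> float:
    """Efficiency (GFLOPS/W) needed to deliver `flops` within `power_watts`."""
    return flops / power_watts / 1e9


for budget_mw in (20, 40):
    eff = required_gflops_per_watt(EXAFLOPS, budget_mw * 1e6)
    # Express the requirement as a multiple of the ~5 GFLOPS/W mark L-CSC just crossed.
    print(f"{budget_mw} MW budget -> {eff:.0f} GFLOPS/W, {eff / 5:.0f}x the 5 GFLOPS/W mark")
```

At 40 MW the goal works out to 25 GFLOPS/W and at 20 MW to 50 GFLOPS/W, roughly 5x to 10x the efficiency of the current Green500 leader.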

Tianhe-2, with an Rmax of 33.862 PetaFLOPS, still holds the number 1 position on the Top500 list but has slipped from 49th to 64th on the Green500 list. While Tianhe-2 has a peak of ~55 PetaFLOPS, its Rmax is only about 60% of that value, and it is quite typical for these machines to have peak scores significantly higher than their Rmax scores. Rmax scores are based on the Linpack benchmark, and the Top500 site states, “In particular, the operation count for the algorithm must be 2/3 n^3 + O(n^2) double precision floating point operations.” This is intended to provide a more “realistic” measure of a machine’s actual performance on “real” problems. We mention it to put into context the peak claims vs. the Rmax measurements that are used to rank systems on the Top500 and Green500 lists.
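To make the peak-versus-measured gap concrete, the sketch below computes Tianhe-2’s Rmax/Rpeak ratio from the figures above and encodes the leading 2/3 n^3 term of the Linpack operation count. Treating ~55 PetaFLOPS as the exact peak is an approximation, and the lower-order O(n^2) term is deliberately left out; `hpl_flop_count` is a hypothetical helper, not part of any benchmark code.

```python
# Rmax/Rpeak for Tianhe-2, using the figures quoted in the text.
RMAX_PFLOPS = 33.862  # measured Linpack result
RPEAK_PFLOPS = 55.0   # approximate theoretical peak ("~55 PetaFLOPS")


def hpl_flop_count(n: int) -> float:
    """Leading term, 2/3 * n^3, of the Linpack operation count for an n x n system.

    The O(n^2) term in the Top500 rule is deliberately omitted in this sketch.
    """
    return 2.0 / 3.0 * n ** 3


ratio = RMAX_PFLOPS / RPEAK_PFLOPS
print(f"Rmax/Rpeak = {ratio:.1%}")  # prints "Rmax/Rpeak = 61.6%"

# Hypothetical use: a measured rate is the operation count divided by the solve time.
# rate_flops = hpl_flop_count(n) / solve_seconds
```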

Notably, the top 23 systems on the Green500 list are all heterogeneous. The runner-up, from Japan, is Suiren, powered by PEZY-SC many-core accelerators paired with Intel Xeon E5-2660v2 10C 2.2GHz processors. In 3rd place, also from Japan, is the former top position holder TSUBAME-KFC, using NVIDIA K20x GPU accelerators paired with Intel Xeon E5-2620v2 6C 2.1GHz processors.

Figure 1. Energy Efficiency vs. Total Compute Capability

Returning to the energy efficiency vs. total compute capability comparison we made in July’s blog, it would still seem much harder to build a machine that tops the Top500 and also does well on the Green500. Figure 1 above plots the top 100 systems of the Green500 list and shows a “frontier” line in red, with L-CSC now the most efficient system.

Even with L-CSC raising the left end of the line, Piz Daint still sits above it, suggesting that it is perhaps pushing the energy-efficiency frontier further than its competitors. The slope of the line on this log-log plot is ~-0.216. This indicates that for every factor of 10 increase in system compute capability we lose about 39% of our efficiency (10^-0.216 ≈ 0.61); in other words, we lose about an order of magnitude in efficiency for every 4+ orders of magnitude increase in performance. Extrapolating this out to 1 ExaFLOPS suggests that such a system built today would theoretically operate at ~915 MFLOPS/W, implying a whopping 1.09 GW of system power, about 27x the 40 MW target.
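These figures follow from treating the frontier as a power law, which is what a straight line on a log-log plot means: efficiency scales as performance raised to the slope. The Python sketch below reproduces the per-decade loss, the 4+ decades figure and the 1.09 GW estimate from the ~-0.216 slope and ~915 MFLOPS/W quoted above. The function names and the illustrative anchor point (1000 MFLOPS/W at 1 PetaFLOPS) are hypothetical; the actual anchor used for the fitted line is not given here.

```python
SLOPE = -0.216  # fitted log-log slope of the efficiency frontier (from the text)
EXAFLOPS = 1e18


def efficiency_factor_per_decade(slope: float) -> float:
    """Multiplier applied to efficiency for each 10x increase in performance."""
    return 10 ** slope


def extrapolate_efficiency(eff0: float, perf0: float, perf: float, slope: float) -> float:
    """Power law eff = eff0 * (perf / perf0) ** slope, a straight line on a log-log plot."""
    return eff0 * (perf / perf0) ** slope


def system_power_watts(perf_flops: float, mflops_per_watt: float) -> float:
    """Power needed to sustain `perf_flops` at the given efficiency."""
    return perf_flops / (mflops_per_watt * 1e6)


factor = efficiency_factor_per_decade(SLOPE)
print(f"Efficiency kept per 10x performance: {factor:.3f} (loss ~{1 - factor:.0%})")
print(f"Decades of performance per 10x efficiency loss: {1 / abs(SLOPE):.2f}")

# Hypothetical anchor, for illustration only: 1000 MFLOPS/W at 1 PetaFLOPS.
print(f"One decade later: {extrapolate_efficiency(1000, 1e15, 1e16, SLOPE):.0f} MFLOPS/W")

power = system_power_watts(EXAFLOPS, 915)  # ~915 MFLOPS/W extrapolated in the text
print(f"1 ExaFLOPS at 915 MFLOPS/W: {power / 1e9:.2f} GW, {power / 40e6:.1f}x a 40 MW budget")
```

Plugging a real system’s measured Rmax and efficiency in as `perf0` and `eff0` and setting `perf` to 1e18 gives the kind of exascale projection discussed above.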

If we instead draw a line through just L-CSC and Piz Daint, its slope indicates about a 33% loss in efficiency for every factor of 10 increase in performance. Extrapolating along this line gives a projected efficiency of ~1300 MFLOPS/W at 1 ExaFLOPS, still about 19x short of the efficiency needed to meet the 40 MW goal. Making up a theoretical factor of 19 in 6 years could be a challenge.
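The two-point line and the size of the remaining gap can be checked the same way. In the sketch below, `slope_through` is a hypothetical helper that computes the log-log slope between any two systems (such as L-CSC and Piz Daint, whose exact coordinates are not listed in the text); the remaining lines convert the 33% per-decade loss into a slope, reproduce the ~19x gap from the ~1300 MFLOPS/W extrapolation, and express that gap as a compound yearly improvement over the 6 years to 2020.

```python
import math


def slope_through(p1: float, e1: float, p2: float, e2: float) -> float:
    """Slope of the straight line through two (performance, efficiency) points on a log-log plot."""
    return math.log10(e2 / e1) / math.log10(p2 / p1)


def shortfall(perf_flops: float, mflops_per_watt: float, budget_watts: float) -> float:
    """How many times the power budget a system at this efficiency would draw."""
    return perf_flops / (mflops_per_watt * 1e6) / budget_watts


def annual_gain(total_factor: float, years: float) -> float:
    """Compound yearly improvement factor needed to close `total_factor` in `years`."""
    return total_factor ** (1 / years)


# A 33% loss per decade keeps 67% of the efficiency for each 10x in performance.
print(f"Slope for a 33% loss per decade: {math.log10(0.67):.3f}")

gap = shortfall(1e18, 1300, 40e6)  # ~1300 MFLOPS/W extrapolated in the text
print(f"Gap at 1 ExaFLOPS: {gap:.1f}x")
print(f"Needed efficiency gain per year over 6 years: {annual_gain(gap, 6) - 1:.0%}")
```

Closing a ~19x gap in 6 years means improving efficiency by roughly 60% or more every single year.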

On the plus side, the AMD, NVIDIA and PEZY-SC accelerators are all implemented in 28nm CMOS technology, so we can hope that moving them to smaller technology nodes soon will provide a significant boost in efficiency. It will be interesting to see the efficiency ratings of the new top-end machines in 2017, and how much farther the technology will still need to progress in the remaining 3 years toward the Exascale goal.