Supercomputing Efficiency Lags Performance Gains

Fifteen years of data shows the challenge of creating efficient systems with millions of processor cores.


In last month’s article, Top 500: Frontier is Still on Top, I wrote about the latest versions of the Top500 and Green500 lists. Power is an incredibly important aspect of designing a world-leading supercomputer. (Why, I can remember back to when you could run the world’s fastest machine on only a couple of megawatts of power.)

The first Green500 list was published back in 2013. Happy 10-year anniversary, Green500. Power data was incorporated into the Top500 starting five years earlier, though, in 2008, so I went back to 2008 to look at the progress made over the past 15 years from an efficiency and performance standpoint. Figure 1 below shows the efficiency gains over that period. There are certainly less computationally powerful machines with significantly higher efficiency ratings, but part of the challenge in creating the world’s fastest supercomputer is achieving that efficiency at massive computational scale, where data must be distributed across millions of processor cores.

Fig. 1: Efficiency ([TFlop/s] / kW)

The gains in efficiency have clearly been following an exponential path. The path is marked by quantum steps in improvement every 2 to 4 years, at an overall rate of about 1.38x per year, or roughly 10x every 7.2 years. The red diamonds denote the points used to calculate the best-fit line shown on the graph.
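For reference, converting an annual improvement factor into a "years per 10x" figure is just a logarithm. A minimal sketch, assuming the 1.38x-per-year reading of the fitted line above:

```python
import math

def years_per_10x(annual_factor: float) -> float:
    """Years needed for a 10x gain at a fixed annual improvement factor."""
    return math.log(10) / math.log(annual_factor)

print(years_per_10x(1.38))  # ~7.15 years, i.e. the "roughly 10x every 7.2 years" above
```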

Fig. 2: Performance (TFlop/s)

Figure 2 shows the gains in performance over the past 15 years. It too follows an exponential curve, but at approximately 1.56x per year, which corresponds to a 10x gain every 5.2 years. The gains in performance are significantly outpacing the gains in efficiency. (Again, the red diamonds correspond to the same machines as in figure 1.) At these rates, efficiency falls behind performance by about 10x every 18.5 years. In fact, what we see from the above charts is that the top machine in 2008, the then-new No. 1 system Roadrunner, broke the petaflop/s barrier while using a “mere” 2.345 MW of power. Compare that with today’s Frontier, which originally broke the exaflop/s barrier and currently sits at 1.194 exaflop/s while using 22.7 MW of power. As the curves would predict, 15 years brings about 3 orders of magnitude in performance gains but closer to only 2 orders of magnitude in efficiency, meaning the power drawn by the top system should grow by nearly an order of magnitude, which is just what we see in the Roadrunner-to-Frontier comparison.
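As a rough cross-check on that last claim, here is a small sketch comparing the actual 2008-to-2023 gains against what the two fitted rates would predict. The only number not quoted above is Roadrunner's headline result of roughly 1.03 petaflop/s from the June 2008 Top500 list:

```python
# Actual systems quoted above: Roadrunner (June 2008) vs. Frontier (June 2023).
roadrunner_pflops, roadrunner_mw = 1.026, 2.345   # ~1 Pflop/s at 2.345 MW
frontier_pflops, frontier_mw = 1194.0, 22.7       # 1.194 Eflop/s at 22.7 MW

power_gain = frontier_mw / roadrunner_mw                 # ~9.7x   (nearly an order of magnitude)
perf_gain = frontier_pflops / roadrunner_pflops          # ~1,160x (~3 orders of magnitude)
eff_gain = perf_gain / power_gain                        # ~120x   (~2 orders of magnitude)

# What 15 years at the fitted annual rates would predict.
print(1.56 ** 15)           # ~790x performance
print(1.38 ** 15)           # ~125x efficiency
print((1.56 / 1.38) ** 15)  # ~6.3x power
print(perf_gain, eff_gain, power_gain)
```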

If these curves continue forward at the same pace, it would be roughly 2038 before we see a zettascale computing system, and it would be operating in the neighborhood of 200 MW. According to “Understanding Data Center Energy Consumption,” citing the US Department of Energy, the largest data centers, with tens of thousands of devices, require over 100 MW of power, which is enough to power approximately 80,000 households. 200 MW, then, seems like a lot of power for one machine. Fifteen years is a long time by technological standards, though, and there will be many challenges. It’s also possible that other forms of computing will move to the forefront in terms of computational importance over that time.
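For the curious, that extrapolation is just the two fitted growth rates run forward. A hedged sketch, anchored on Frontier's current numbers (the result shifts somewhat depending on whether you anchor on the actual machine or on the fitted trend lines, but it lands in the same neighborhood):

```python
import math

annual_perf, annual_eff = 1.56, 1.38                   # fitted annual growth factors from the charts
current_eflops, current_mw, year = 1.194, 22.7, 2023   # Frontier, June 2023 list

# Years until the top system reaches 1 zettaflop/s (1,000 Eflop/s) at the performance trend.
years_to_zetta = math.log(1000 / current_eflops) / math.log(annual_perf)

# Power grows at the ratio of the two trends, since performance outpaces efficiency.
power_mw = current_mw * (annual_perf / annual_eff) ** years_to_zetta

print(year + years_to_zetta, power_mw)  # roughly 2038 and ~145 MW from this anchor
```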


