Building the systems that power machine learning is an immensely complex task.
Artificial intelligence and machine learning are hot: many startups, exciting new applications and a lot of venture money. The technology promises to change the world. Whether it’s autonomous vehicles, domestic robots or machines that replace doctors and lawyers, the implications are astounding, and somewhat frightening. Let’s put the socio-economic dimension of this discussion aside for a moment and look at the enabling technology.
A lot of machine learning is being done in the cloud today. Software running on massive compute farms is being trained to recognize facial expressions, household pets, tumors in an X-ray, things like that. While this fundamental work is critical to progress, it’s not enough. Many of these systems will need to work in real time, and that requires massive local processing capability. We’ve already seen some of these systems, and eSilicon is working on a few as well: huge FinFET ASICs with HBM2 memory integrated in a 2.5D package. The challenges to design these systems are immense.
Compared to just 18 months ago, the entire design enterprise is a lot more complex. A 20X increase in compute requirements is not uncommon. Design team size increases in the range of 5-10X are also typical. And then there are the IP, process technology and package choices: many options, but only one chance to pick the right combination to hit the power, performance and area (PPA) target of the final system. There needs to be a way to manage all this with low risk and high certainty of success.
What we’re discovering is that machine learning can be applied to the problem of designing machine learning chips. To begin, you need a massive knowledge base of all combinations of process technologies, technology options, IP and package configurations. This does take years to build, but it’s well worth it. You also need to capture the profiles of CPU, disk, memory and I/O bandwidth required for many types of advanced designs, and for the various steps in the design process.
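To make the idea concrete, here is a minimal sketch of what one record in such a knowledge base might look like. The field names and structure are hypothetical, chosen only to illustrate the kind of configuration and resource-profile data described above; they are not eSilicon’s actual schema.

```python
# Illustrative only: a hypothetical record structure for a design knowledge base.
from dataclasses import dataclass
from typing import List

@dataclass
class ResourceProfile:
    """Compute resources observed for one step of the design flow."""
    step: str                 # e.g. "synthesis", "place_and_route", "signoff"
    cpu_core_hours: float
    peak_memory_gb: float
    disk_gb: float
    io_bandwidth_gbps: float

@dataclass
class DesignRecord:
    """One completed design: its configuration choices and measured outcomes."""
    process_node: str          # e.g. "7nm FinFET"
    ip_blocks: List[str]       # e.g. ["HBM2 PHY", "SerDes"]
    package: str               # e.g. "2.5D interposer"
    resource_profiles: List[ResourceProfile]
    achieved_power_w: float
    achieved_freq_mhz: float
    area_mm2: float
    schedule_weeks: float
```

Accumulating thousands of records like this, across process nodes, IP combinations and package options, is what takes years, and it is the raw material the algorithms below work from.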
One can then begin to apply machine learning algorithms to this data. We see significant improvements in schedule predictability and PPA compliance, and an overall reduction in program risk, when these kinds of techniques are adopted. A system that designs machine learning systems is still on the horizon, but using machine learning to help build machine learning chips is very real.
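As one simple illustration of the idea, historical design records could feed a predictive model that estimates schedule (or PPA compliance) for a proposed configuration before it is committed to. The features, target and model choice below are assumptions made for the sake of the sketch, not a description of any particular production flow.

```python
# Illustrative only: predicting design schedule from past design configurations.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Each row encodes one historical design:
# [node_nm, n_ip_blocks, pkg_type (0=flip-chip, 1=2.5D), cpu_core_hours, peak_mem_gb]
X = np.array([
    [7, 3, 1, 120_000, 512],
    [14, 2, 0, 40_000, 256],
    [7, 5, 1, 200_000, 768],
    [16, 2, 0, 30_000, 128],
    # ...in practice, many more historical designs
])
y = np.array([52.0, 34.0, 61.0, 30.0])  # actual schedule, in weeks

model = GradientBoostingRegressor()

# Estimate how well the model generalizes to unseen designs.
scores = cross_val_score(model, X, y, cv=2, scoring="neg_mean_absolute_error")
print("Schedule prediction error (weeks):", -scores.mean())

# Score a proposed new configuration before committing resources to it.
model.fit(X, y)
proposed = np.array([[7, 4, 1, 150_000, 640]])
print("Predicted schedule:", model.predict(proposed)[0], "weeks")
```

With a real knowledge base behind it, the same pattern extends naturally to predicting power, frequency and area outcomes, which is where the PPA-compliance and risk-reduction benefits come from.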