Custom design is gaining ground against standardized approaches in a variety of new applications.
Custom hardware is undergoing a huge resurgence across a variety of new applications, pushing the semiconductor industry to the other side of Makimoto’s Wave.
Tsugio Makimoto, the technologist who identified the chip industry’s 10-year cyclical swings between standardization and customization, predicted there always will be room in ASICs for general-purpose processors. But it’s becoming more difficult to rely on off-the-shelf processing elements as the benefits of Moore’s Law diminish at each new node. This is particularly true for a variety of new application areas, where there are more and different types of data to process.
“One area is artificial intelligence—especially some specific machine-learning types of applications,” said Juan Rey, vice president of engineering, Calibre, at Mentor, a Siemens Business. “Also, huge specific domains have a very clear argument for developing customer-specific hardware. Cryptocurrency is one area where there is so much investment and so much focus because of companies that are doing essentially mining, along with other activities. Another area is medical imaging, where a clear combination of a well-defined market and a single algorithm, or a few algorithms, that control the vast majority of either the performance or the power consumption are the key requirements to be tackled for that specific industry.”
In segments where the computation that needs to be performed is out of the ordinary, such as AI, many engineering teams are coming up with their own processor architectures as opposed to going with a general-purpose processor in order to optimize power and ultimately cost, said Carlos Maciàn, senior director of AI strategy and products at eSilicon. “Because they have the freedom to deviate from the standard von Neumann architecture, they are introducing novel concepts such as near-memory compute, in-memory compute, and others.”
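Why deviating from von Neumann pays off is easiest to see in a back-of-envelope energy model. The sketch below is purely illustrative — the per-operation energies are rough, 45nm-class ballpark figures of the kind often cited in computer-architecture talks, not measured data — but it captures the argument for near-memory compute: fetching an operand from off-chip DRAM can cost hundreds of times more energy than the arithmetic that consumes it.

```c
/* Illustrative energy model for why near-memory compute pays off.
 * The per-op energies below are rough ballpark assumptions, not
 * measured data; the point is the ratio, not the absolute values. */
#include <stdio.h>

#define PJ_MAC        1.0     /* 32-bit multiply-accumulate */
#define PJ_SRAM_READ  5.0     /* small on-chip SRAM access  */
#define PJ_DRAM_READ  640.0   /* off-chip DRAM access       */

int main(void) {
    long macs = 1000000L;        /* 1M MACs, e.g. one small layer */
    long operands = 2 * macs;    /* two operand fetches per MAC   */

    /* Far case: every operand streams in from external DRAM. */
    double far_pj  = macs * PJ_MAC + operands * PJ_DRAM_READ;
    /* Near case: operands are staged once into SRAM beside the
       datapath and reused from there. (Staging cost is amortized
       and ignored here to keep the sketch minimal.) */
    double near_pj = macs * PJ_MAC + operands * PJ_SRAM_READ;

    printf("DRAM-bound:  %.1f uJ\n", far_pj / 1e6);
    printf("near-memory: %.1f uJ\n", near_pj / 1e6);
    printf("ratio:       %.0fx\n", far_pj / near_pj);
    return 0;
}
```

Under these assumed numbers the data movement, not the compute, dominates by roughly two orders of magnitude — which is precisely the imbalance near-memory and in-memory architectures attack.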
That freedom to customize is giving newer instruction-set architectures such as RISC-V an edge, thanks to the ISA’s open-source nature, Maciàn said. “The fact that it is license-free, and that it is open-source, has enabled engineering teams to develop their own instruction set extensions—in particular for AI, being able to operate on vectors and matrixes. That is a huge advantage over the best you can accomplish with a general-purpose processor, which is why RISC-V is also very popular in that space.”
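What such an extension looks like from the software side can be sketched in a few lines. The fused multiply-accumulate below is entirely hypothetical — it is not part of any ratified RISC-V extension — but the mechanism is real: RISC-V reserves custom opcode space, and the GNU assembler’s `.insn` directive lets a team emit its own encoding before full compiler support exists.

```c
/* Sketch of invoking a hypothetical RISC-V custom instruction.
 * The fused MAC here is invented for illustration only -- it is
 * not part of any ratified RISC-V extension. */
#include <stdint.h>
#include <stdio.h>

static inline int32_t mac32(int32_t acc, int32_t a, int32_t b) {
#if defined(__riscv) && defined(USE_CUSTOM_MAC)
    /* Encode into the CUSTOM_0 major opcode via the GNU assembler's
       .insn directive (R-type: opcode, funct3, funct7, rd, rs1, rs2).
       This only executes on a core that implements the instruction. */
    int32_t rd;
    __asm__ volatile (".insn r CUSTOM_0, 0x0, 0x0, %0, %1, %2"
                      : "=r"(rd)
                      : "r"(a), "r"(b));
    return acc + rd;
#else
    /* Portable fallback: plain multiply-accumulate. */
    return acc + a * b;
#endif
}

/* Dot product built from the primitive; with a vector/matrix custom
   extension the whole loop body could collapse into one instruction. */
int32_t dot(const int32_t *x, const int32_t *y, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac32(acc, x[i], y[i]);
    return acc;
}

int main(void) {
    int32_t x[4] = {1, 2, 3, 4}, y[4] = {5, 6, 7, 8};
    printf("dot = %d\n", dot(x, y, 4));  /* prints 70 */
    return 0;
}
```

The hypothetical `USE_CUSTOM_MAC` build flag keeps the code buildable everywhere while letting the hot loop switch to the custom encoding on silicon that supports it — the software-side pattern that makes ISA extension practical.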
Rupert Baines, CEO of UltraSoC, agreed that general-purpose compute by definition will always be with us, but the current trend is toward more customization. “There is a pendulum in the industry, and you do see new architectures coming, specializations coming, and then things swing back towards the more general-purpose world.”
He recalled that some of the first projects he worked on as an engineer were a bit like the AI architectures of today. “At the time it was specialized architectures for running object-oriented code and machines that ran Smalltalk natively, or ran this natively and then after a while everyone said, ‘This is a bit silly. Why don’t we just buy an x86 and use a better compiler?’ And the pendulum swings.”
However, for some things the pendulum gets stuck, Baines noted. “I don’t think anyone is ever going to be using general-purpose Intel MMX graphics instructions. Nobody’s going to go back to that anymore, because graphics is such a standard staple that having an architecture there will make sense; having a co-processor will always make sense. Maybe AI is going to be similar, in that there will be a space and a segment, and there’ll be a co-processor that will be an optimized architecture.”
And as in a particular ecological environment, certain pressures push things in a certain direction. “In an IoT system with insane cost pressures, you’re going to end up with different solutions. An inference engine running in a low-cost IoT device is going to end up with a different answer than a training engine running in a data center. Right now we are in a Cambrian explosion of those new ideas and new things, so they’re being thrown out there. There are 40 or 50 different AI startups, all coming out with variations on a theme. Then Intel, Nvidia, etc., are all trying to work out what the optimum balance is, too. Some of those guys will succeed and thrive and flourish, and some of them won’t. It’s a fascinating time for that,” he said.
Others are seeing a similar shift. Chris Jones, vice president of marketing at Codasip, pointed out that amid the relentless push for lower power, chips are becoming more and more specialized, with less silicon area consumed by general-purpose compute.
“However, the decision for general-purpose versus dedicated-task engines or programmable accelerators all comes down to software,” Jones said. “There always will be instances where the software that will be run on a given chip is largely unknown, and if the software load is indeterminate, all the chip designer can do is provide a robust general compute platform where performance is purely a function of core frequency and memory latency.”
At the opposite end of the spectrum, the software to be run is 100% known prior to chip tape-out, Jones explained, be it security offload, sensing, physical-layer comms protocols or an inference algorithm. “In that case the tradeoff becomes hard-wired logic versus an optimized task engine. Given infinite time and resources, a hard-wired logic block running a single task will almost always be smaller and lower power than a programmable element. However, we live in a world where there are always limits on schedule and manpower, so dedicated optimized processors make a lot of sense. Plus, programmable accelerators offer the benefit of making post-silicon changes in software rather than chip re-spins, saving money and prolonging the lifespan of a given chip design if standards evolve.”
Fig. 1: Customized design approach. Source: Cadence
System-level concerns
Because architectural tradeoffs must be made at the system level, the architect is always weighing application-specific hardware against hardware with some level of programmability, said Prasad Saggurti, director of product marketing at Synopsys. “Depending on the size of the market, you either go for a customized solution or you do a more general-purpose solution.”
And while one engineering team may choose an entirely custom solution, the pendulum may be swinging slightly back toward general purpose in artificial intelligence applications. As Saggurti pointed out, in certain market segments like AI and embedded vision, some types of computation occur far more often than others, and IP providers armed with that knowledge can build more specialized, application-optimized building blocks. Synopsys’ logic libraries, for instance, include standard cells with gates that help make embedded vision processors and cloud AI/machine-learning processors much smaller and lower power.
“Because we are providing these building blocks, we talk to our customers — chip designers — and see how they’re using our libraries and our memories,” he said. “We see that there are certain optimizations we can do to make their products better. In that sense, we have a special HPC kit, which includes a bunch of special logic library cells that make these AI implementations smaller and lower power. It still meets their performance requirements, but it ends up doing it in a lower-power implementation.”
At the chip level, Saggurti noted, there is a mix of specialized circuitry and general-purpose compute. “Engineering teams are not avoiding specialized circuitry. They are putting in more and more of it. If somebody is doing a cryptocurrency chip, there’s a lot of specialized circuitry running at very, very low voltages. We support those teams with very-low-voltage memories and libraries, especially. But at other times the restrictions are different. AI chips are much more complex, and lots of different types of memories and libraries are needed.”
In short, he said, there is the GPU approach and then there is the pure CPU approach. For AI, people are somewhere in between. For cryptocurrency chips, it is completely specialized hardware. “Also, it is specific to whether you’re doing a CNN engine or inference or a training engine—those things also make a difference because whether somebody uses a typical DDR or HBM2 or they are thinking of using GDDR, all these things are driven by system architecture requirements. Of course that is external memory, but they also play into this. It doesn’t impact on-chip SRAM, but their choice of HBM, the end application, and whether it’s an inference engine or a neural network CNN engine, all these things make a difference.”
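The system-level arithmetic behind those memory choices is straightforward. The figures in the sketch below are illustrative assumptions, not vendor specifications, but they show why a modest inference engine may be satisfied by a single DDR channel while a training engine in the data center reaches for HBM2 or GDDR.

```c
/* Back-of-envelope bandwidth check of the kind that steers the
 * DDR-vs-GDDR-vs-HBM2 choice. All figures here are rough
 * illustrative assumptions, not vendor specifications. */
#include <stdio.h>

int main(void) {
    double weights_gb = 0.05;   /* 50 MB of weights: a mid-size CNN */
    double rate       = 100.0;  /* forward passes per second        */

    /* Worst case for inference: weights re-streamed from external
       memory on every pass (no on-chip caching of the model). */
    double infer_gbs = weights_gb * rate;

    /* Training adds backward passes plus activation and gradient
       traffic -- crudely modeled here as 6x the inference stream,
       and in practice run at far higher batch rates as well. */
    double train_gbs = 6.0 * infer_gbs;

    double ddr4_gbs = 25.6;   /* ~one DDR4-3200 channel */
    double hbm2_gbs = 256.0;  /* ~one HBM2 stack        */

    printf("inference needs ~%.0f GB/s (DDR4 channel: %s)\n",
           infer_gbs, infer_gbs < ddr4_gbs ? "fits" : "exceeds");
    printf("training needs ~%.0f GB/s (DDR4: %s, HBM2: %s)\n",
           train_gbs, train_gbs < ddr4_gbs ? "fits" : "exceeds",
           train_gbs < hbm2_gbs ? "fits" : "exceeds");
    return 0;
}
```

Even with these deliberately conservative numbers, the training case outruns a single DDR4 channel while fitting comfortably in one HBM2 stack — the kind of result that makes the external-memory decision an architecture decision.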
Still, Baines maintains “general-purpose compute by its very nature is always going to be the dominant paradigm, and other things will be ebbing and flowing, waxing and waning. Some of those gain a stable position and there’s an ecosystem that grows up with them.”
The fact remains that architectures today for everything from applications processors to wireless baseband to machine learning consist of as little general-purpose compute as they can get away with. On top of that, there is a constellation of dedicated offload engines with custom instruction sets, interfaces, memory subsystems, and so on, optimized for a specific or limited number of algorithms, said Codasip’s Jones. “This trend is in part responsible for the groundswell of interest in RISC-V as a platform. Given the modular nature of the architecture and its collection of standard extensions, users can create processors with little extraneous logic. This provides potentially enormous power savings, as it has been shown that custom instruction sets can radically reduce the cycle count of a particular piece of code, allowing the user to run the core at a much lower frequency while still maintaining the needed performance.”
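Jones’ cycle-count argument reduces to simple arithmetic. In the sketch below, every number is invented for illustration: a fused custom instruction is assumed to cut an inner loop from 8 cycles per sample to 2, so the clock needed to hold throughput drops 4x, and because dynamic power scales roughly with V²·f, any voltage headroom the slower clock opens up compounds the saving.

```c
/* Why fewer cycles per task lets the core run slower and cooler.
 * Dynamic power ~ C * V^2 * f. Every constant below is invented
 * purely for illustration. */
#include <stdio.h>

int main(void) {
    double samples_per_s = 48000.0;  /* required throughput       */
    double base_cps = 8.0;           /* cycles/sample, stock ISA  */
    double cust_cps = 2.0;           /* cycles/sample, fused insn */

    double f_base = samples_per_s * base_cps;  /* required clock  */
    double f_cust = samples_per_s * cust_cps;

    /* Assume the lower clock allows the supply to drop 1.0V -> 0.8V. */
    double p_base = 1.0 * 1.0 * f_base;        /* ~ V^2 * f */
    double p_cust = 0.8 * 0.8 * f_cust;

    printf("clock: %.0f kHz -> %.0f kHz\n", f_base / 1e3, f_cust / 1e3);
    printf("relative dynamic power: %.2fx\n", p_cust / p_base);
    return 0;
}
```

Under these assumptions the core meets the same throughput at a quarter of the clock and roughly 16% of the original dynamic power — the mechanism behind the claim that custom instructions buy power, not just speed.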
So is general-purpose compute still applicable, given the number of examples of custom solutions? “Yes, it is,” said Frank Schirrmeister, senior group director for product management and marketing at Cadence. “It comes down to how to get to functionality.”
In his mind there are seven ways to accomplish this. “On the extreme left is putting it all in pure, manually implemented hardware. A lot of people are still doing this, and that’s where you have specific functionality; that’s where differentiation happens. That’s where you get all the differentiation from implementation. You can use specific technology, you can do things like analog/mixed-signal, which otherwise would be really difficult in digital, and you can do all kinds of cool stuff in pure hardware. At the far other end is pure software, which is the generic compute.”
And so the wave continues to cycle.