From an architectural perspective, there’s been more discussion lately about offloading tasks to specialized hardware. Is it worth the trouble?
Looking at SoC design from an architectural viewpoint, I’m hearing more discussion lately about the option of offloading tasks to specialized hardware.
Dark silicon makes this especially relevant, Krishna Balachandran at Cadence pointed out. If an SoC has four or eight ARM processors, all with the same complexity, or cores like graphics processors, and you cannot run them all at full performance because they have to be shut off, then you are not utilizing the hardware you have because of a power problem.
This is what leads to the notion of offloading some of the functions into specialized, non-processor hardware that is implemented almost like an ASIC on the chip. He said this specialized logic acts as a hardware accelerator for something like a DSP function or some kind of baseband, where there is a lot of computation that gets handled by the dedicated hardware. It doesn’t get shut off, but it also doesn’t consume as much power because it has been implemented in a very optimal fashion. By doing this, he said, you are able to use the silicon and you’re not keeping it dark.
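To make the idea concrete, here is a minimal C sketch of what offloading a DSP-style computation to a fixed-function block can look like from the software side. The register layout, the ACC_BASE address, and the HAVE_FIR_ACC guard are all invented for illustration and are not from Cadence; the software path shows the same task running on a general-purpose core.

/*
 * Hypothetical sketch: offloading a small DSP-style task (a 4-tap FIR sum)
 * to a fixed-function accelerator instead of running it on a general-purpose
 * core. The register layout, ACC_BASE address, and HAVE_FIR_ACC guard are
 * invented for illustration; the software path is the portable fallback.
 */
#include <stdint.h>
#include <stdio.h>

#define TAPS 4

/* Invented memory-mapped register map for the hypothetical accelerator. */
typedef struct {
    volatile uint32_t ctrl;            /* bit 0: start */
    volatile uint32_t status;          /* bit 0: done  */
    volatile int32_t  coeff[TAPS];
    volatile int32_t  sample[TAPS];
    volatile int32_t  result;
} fir_acc_t;

/* What the same task costs when it runs on a general-purpose core. */
static int32_t fir_sw(const int32_t *coeff, const int32_t *sample)
{
    int32_t acc = 0;
    for (int i = 0; i < TAPS; i++)
        acc += coeff[i] * sample[i];
    return acc;
}

#ifdef HAVE_FIR_ACC
#define ACC_BASE 0x40000000u           /* assumption: SoC-specific address */
static fir_acc_t *const fir_acc = (fir_acc_t *)ACC_BASE;

static int32_t fir_hw(const int32_t *coeff, const int32_t *sample)
{
    for (int i = 0; i < TAPS; i++) {
        fir_acc->coeff[i]  = coeff[i];
        fir_acc->sample[i] = sample[i];
    }
    fir_acc->ctrl = 1u;                /* kick off the fixed-function block */
    while ((fir_acc->status & 1u) == 0)
        ;                              /* busy-wait; real code would sleep  */
    return fir_acc->result;
}
#endif

int main(void)
{
    const int32_t coeff[TAPS]  = { 1, 2, 3, 4 };
    const int32_t sample[TAPS] = { 5, 6, 7, 8 };

#ifdef HAVE_FIR_ACC
    printf("hw result: %d\n", fir_hw(coeff, sample));
#else
    printf("sw result: %d\n", fir_sw(coeff, sample));
#endif
    return 0;
}

The point of the split is that the multiply-accumulate loop, which burns cycles and power on a general-purpose core, collapses into a handful of register writes when a dedicated block exists for it.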
Then, Steve Carlson, also at Cadence, noted that there used to be a nice replication of homogeneous cores that allowed a divide-and-conquer, cookie-cutter approach to the implementation. However, when these hardware accelerator units, which look more like ASIC or random-logic implementations, get thrown into the mix, routing, congestion, and memory access become irregular, and the problem gets much harder. “It requires a lower density and that contributes to the unlightable solution as well, where you have nothing going on because of the interconnect complexity of the design. And that’s wasted money.”
Balachandran noted that memory is pretty much dark anyway, because at any one point in time a big memory is only accessing one word, whether that is a 32-bit word or whatever width the memory has; the rest of the memory is not used. Unless a dual-port or multi-port memory is being used, most of the memory can be shut off. “By definition, a memory is always dark. So, moving a lot of memory on chip is a good idea because you’re not going to use the power budget up because you can turn off all the banks in the memory that you’re not accessing, and having the memory closer on-chip, you’re going to get a performance benefit. It’s well understood that having more and more memory on chip improves the performance so this is becoming a trend where the area on the chip is being used because of the dark silicon problem to move more memory on board.”
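A back-of-the-envelope way to see why a banked memory is “always dark”: only the bank that holds the addressed word needs to be powered for a given access. The sketch below models that with low-order address interleaving; the bank count and the decode scheme are assumptions for illustration, not figures from Balachandran.

/*
 * Minimal model (not from the article) of why a banked memory is "always
 * dark": a single access touches exactly one bank, so every other bank can
 * be power-gated or held in retention. The bank count and the low-order
 * interleaving used to pick the bank are assumptions for illustration.
 */
#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS 8

/* Decode which bank a word address falls in (low-order interleaving). */
static unsigned bank_of(uint32_t word_addr)
{
    return word_addr % NUM_BANKS;
}

int main(void)
{
    uint32_t word_addr = 40961;        /* arbitrary example access */
    unsigned active = bank_of(word_addr);

    for (unsigned b = 0; b < NUM_BANKS; b++)
        printf("bank %u: %s\n", b, b == active ? "ACTIVE" : "power-gated");

    /* Only 1 of NUM_BANKS banks draws dynamic power for this access. */
    printf("active fraction: 1/%d of the array\n", NUM_BANKS);
    return 0;
}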
As with many architectural decisions, the balance is a challenge. As an engineer, what is your experience with fighting back against dark silicon using different architectures? Leave us your comments below.
“Fighting back against dark silicon using different architectures”
A proof-of-concept dual-core chip in 65nm was manufactured, and 97% of the chip is used for memory blocks. It was possible to reach 350 MHz drawing 18 mA at 1.2V with both cores active. The cores are cacheless and there is no memory controller.
Calculations show that with 14nm technology, an area of 238 mm² could hold 4,096 cores, 672 MByte of ROM and 400 MByte of RAM, and consume 31 W at 1.6 GHz.
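For scale, the quoted figures work out as follows; this quick calculation uses only the numbers stated in the comment above (18 mA at 1.2V for the 65nm part, 31 W and 4,096 cores for the 14nm projection).

/* Quick arithmetic check using only the figures quoted in the comment. */
#include <stdio.h>

int main(void)
{
    double i_a = 0.018;                /* 18 mA, both cores active at 350 MHz */
    double v   = 1.2;                  /* supply voltage in volts             */
    double p_w = i_a * v;              /* measured 65nm chip power            */

    printf("65nm dual-core: %.1f mW total, %.1f mW per core\n",
           p_w * 1e3, p_w * 1e3 / 2.0);

    /* Projected 14nm figures as stated: 4096 cores, 31 W at 1.6 GHz. */
    printf("14nm projection: %.2f mW per core\n", 31.0 * 1e3 / 4096.0);
    return 0;
}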