Chip area is never fully utilized, creating opportunities for on-chip monitoring and improved reliability.
Chipmakers are pushing to utilize more of the unused portion of a design for different functions, reducing margin in the rest of the chip to more clearly define that white space.
White space typically is used to relieve back-end routing congestion before all of the silicon area is used up. But a significant amount of space still remains unused. That provides an opportunity for inserting monitoring, inspection, and other features that will add value without raising the die size.
“Advanced nodes end up somewhere in the vicinity of 60% to 70% [utilization],” said Raanan Gewirtzman, chief business officer at proteanTecs. “You don’t go beyond that in advanced chips because of the routing challenges.”
Monitoring circuits, inspection features, and other additions can be quite small if designed with die size in mind. With white space distributed broadly around the logic, there are many opportunities for insertion. But any added circuits would need to avoid further stressing the interconnects.
Too much metal
Each new silicon process node represents a shrink from previous nodes. Ideally, all features on all layers would scale down by the same amount. But that’s hardly ever the case – in particular at the dimensions envisioned in advanced silicon nodes.
Transistor scaling has proceeded more aggressively than metal scaling. As designs are routed, metal congestion, particularly on lower and middle layers, has become the limiting factor. While the metal layers are stressed, the transistor layers are not. “Standard cells are shrinking far more than the spaces between each metal,” said Eliot Gerstner, senior design engineering architect for Tensilica IP at Cadence.
Transistors have room to be packed closer together, but there is no routing space available to connect them. Color-aware placement also has a role here. As a result, many designers find that silicon surface utilization tops out at around 70%. The phrase “white space” may be current, but it’s not a totally new concept. “People called it ‘dark silicon,’” said João Geada, chief technologist at Ansys. “That’s been an expression all the way back to 16nm, and possibly even before.”
Bradley Geden, product marketing director for Design Compiler at Synopsys, agreed. “Because of the restrictive nature of double- and triple-patterning, it’s getting much, much harder to push utilization over 75%,” he said. “And then you have the pin density as well as the smaller footprints of the finFET. That’s resulting in higher congestion and, subsequently, lower utilization.”
This white space, while widely distributed, is found mostly in logic circuits. “If you’ve got logic, that’s probably where you’re looking at the larger white spaces,” said Geden. Memory blocks tend to be highly optimized by the foundries, and so there’s precious little extra space to be found within them.
As memory takes up an increasing percentage of SoC die area, the overall percentage of the complete die that’s available as white space is reduced. “If we figured out how to pack the memory — and memory is dominating — then for the overall SoC, we are looking at 25% to 30% white space,” said Geden. But within those logic areas, the question remains as to whether there’s something useful that could be done with it, as long as it goes easy on the routing.
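The dilution effect can be illustrated with a simple area-weighted average. The sketch below is only an illustration: the function name, the 70% logic utilization, and the 95% memory utilization are assumptions chosen for the example, not figures from any of the companies quoted here.

```python
def die_white_space(logic_fraction, logic_utilization, memory_utilization):
    """Fraction of the whole die left as white space.

    logic_fraction     -- share of die area occupied by logic blocks (0..1)
    logic_utilization  -- placed-cell utilization inside logic (assumed)
    memory_utilization -- utilization inside memory blocks (assumed)
    The rest of the die (1 - logic_fraction) is treated as memory.
    """
    if not 0.0 <= logic_fraction <= 1.0:
        raise ValueError("logic_fraction must be between 0 and 1")
    memory_fraction = 1.0 - logic_fraction
    return (logic_fraction * (1.0 - logic_utilization)
            + memory_fraction * (1.0 - memory_utilization))


# Hypothetical numbers: logic at 70% utilization, memory at 95%.
for logic_fraction in (0.8, 0.6, 0.4):
    ws = die_white_space(logic_fraction, 0.70, 0.95)
    print(f"logic = {logic_fraction:.0%} of die -> white space = {ws:.1%}")
```

Under these assumed inputs, the die-level white space shrinks as the logic share of the die shrinks, even though utilization inside the logic itself is unchanged.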
While utilization in the 70% range is often attested, there isn’t universal agreement on that number. The numbers Cadence experiences are higher. “We’re seeing utilization as high as the mid-80s,” said Rob Knoth, product management director, digital and signoff group at Cadence. “We don’t see these nodes as qualitatively different.” At that level, there is certainly less available white space, but the question still remains as to whether something can be done with it.
Within a logic block, the white space tends to be relatively evenly distributed. That means only small circuits or features could be accommodated in a blank area without moving existing circuits around. By the time this white space is identified, the circuits for the chip’s main mission have been laid out. So any leveraging of the space cannot result in changes to the existing layout.
It’s also true that not all white space can be used for something else. “At the more advanced geometries, there are rules that may make it hard to use all the white space,” noted Steven Woo, fellow and distinguished inventor at Rambus. “There are keep-out regions, and there are spacing rules that can contribute to [unusable space].”
Given the opportunity, however, there are several ideas being implemented that take advantage of this “free” silicon.
Monitoring is a growing trend
One option is to use the space for on-chip monitoring circuits. If those monitors are small enough, then they can be tucked into the small pockets of available space. “There’s a lot more interest in putting metrology directly on chips from everywhere,” said Geada. “That tends to fit nicely into empty space.”
Such circuits would monitor local conditions and signals, so most interconnects would be local. Critically, they couldn’t impact the main functionality of the chip. “Conceptually, we want our Agents to run ‘under the hood’ of the monitored chip’s functionality,” said Gewirtzman.
There are other types of monitors, too. “There are structural monitors and functional monitors,” said Steve Pateras, senior director of marketing, hardware analytics and test at Synopsys. “So we want to be able to place these monitors and sensors throughout the chip to be able to extract that information. Since we’re adding a lot of gates, we want to do so in a way where we’re utilizing the white space [while] not adding to the congestion.”
ProteanTecs said its monitors are small enough to place within the white space. Gewirtzman illustrated the unusual opportunity. “There’s space for gates, and there is less place for routing, but we don’t need global routing,” he said. These Agents can be sprinkled around the die with no area impact.
Of course, monitors need to report their results, and that requires interconnects. Instead of creating a new set of dedicated interconnects, however, providers prefer to move the data using interconnects already available on the chip, such as a network-on-chip (NoC). Then, a small amount of local routing would be needed for the monitor circuitry, along with a way to hop onto the network, but no further long-range interconnect would be required.
Exactly what the communication endpoint consists of will vary widely by chip and application. If analytics is the goal, then the data eventually must be reported to the cloud. In any given system, there is likely to be a single point of external communication. If there are monitors on the chip responsible for that communication, it’s a matter of getting the monitor data to the communication block using existing on-chip interconnect.
But for monitors on other chips — those that don’t handle the external communication — the data can’t be sent directly to the cloud. Instead, it must be delivered to the chip that will connect to the cloud. How that happens depends on the design. In some cases, out-of-band signals may already be available to deliver this additional payload. In others, it may be delivered in-band through some other functional port. ProteanTecs said that, in their experience, monitor usage has never been stymied by a lack of data transport. “No dedicated channel is required for our Agents,” said Gewirtzman.
Fig. 1: Monitor data can be transmitted to the cloud using existing on-chip routing resources, existing chip-to-chip communication channels, and existing communications infrastructure to connect to the cloud. Source: Bryon Moyer/Semiconductor Engineering
The ability to make use of existing chip-to-chip or chip-to-cloud channels for the monitoring data suggests there is room available within the existing bandwidth, which is yet another kind of white space. The silicon white space is application-independent, so numbers are available for that. But communications white space will depend strongly on the application, so there are no general numbers available to quantify it.
That said, proteanTecs’ empirical experience has been that there has always been room for communicating the monitoring data. Sending may be delayed by other more urgent communication, but there always will be space.
The other consideration is buffer space for storing monitor data. Because monitor data reporting isn’t time-critical, it’s likely that accumulated data will need to be stored pending transport. One might expect that dedicated memory would be needed for that buffer. Here again, however, there appears to be excess memory available that can serve that purpose. “[The existing memory] is more than enough for the telemetry we need,” said Gewirtzman. In effect, the memory has its own white space that can be put to use buffering monitor data.
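A minimal sketch of that buffering idea follows, assuming a fixed-size region of existing memory and a channel that is only occasionally free. The class and method names are hypothetical, and the drop-oldest policy is an assumption made for the example. It is not a description of how proteanTecs or anyone else implements telemetry.

```python
from collections import deque


class TelemetryBuffer:
    """Bounded store for monitor readings awaiting transport (illustrative)."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self._readings = deque(maxlen=capacity)
        self.dropped = 0  # count of readings overwritten before being sent

    def record(self, reading):
        """Store a reading; if the buffer is full, the oldest one is lost."""
        if len(self._readings) == self._readings.maxlen:
            self.dropped += 1
        self._readings.append(reading)

    def drain(self, channel_idle, max_items):
        """Send up to max_items readings, but only when the channel is free."""
        if not channel_idle:
            return []  # more urgent traffic has the channel; keep waiting
        count = min(max_items, len(self._readings))
        return [self._readings.popleft() for _ in range(count)]

    def __len__(self):
        return len(self._readings)


buf = TelemetryBuffer(capacity=3)
for reading in (1, 2, 3, 4):
    buf.record(reading)
print(buf.drain(channel_idle=False, max_items=2))  # channel busy
print(buf.drain(channel_idle=True, max_items=2))   # channel free
```

Because monitor data is not time-critical, the sketch simply waits for an idle channel rather than competing with functional traffic.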
There’s one more consideration when inserting monitors — how they’re powered. One factor here is whether the monitors will be powered by an analog or digital power source. Analog power is typically available around the periphery of the die, serving any I/Os that have analog components. If that power supply were needed in the interior of the chip, it would severely disrupt the existing power layout. According to proteanTecs, some monitoring technologies require this analog power.
Because the internal areas of the chip are typically digital, a digital power supply is likely to be available throughout the chip. So for monitors that are built strictly from digital circuits, peripheral power would not need to be routed to the interior of the chip. “That alleviates a lot of the pressure as compared to traditional sensors,” said Gewirtzman. “With them, there is not just an issue of size and power consumption. It’s also a different supply.”
Large, complex SoCs often have multiple power domains, however, some of which may be powered down at any given time. According to proteanTecs, any such domains will need to be powered up when the time comes to make a measurement. Switching power domains normally is controlled by firmware, so that code would need to include monitors as a consideration when deciding when to power on and off a domain. The routing, meanwhile, would need to be always on.
The net result of all of this white space is that large numbers of monitors can be installed around the chip using existing silicon area, minimal metal area, and existing memory, with their data communicated over existing channels. That makes it possible to add this functionality with no increase in die size.
How these monitors are inserted
But this raises an important question. Given a limited amount of space, how does one know what to put in there? What if you have more to do than there is space available? And what if utilization rates do rise to the 80% range or higher – something that would otherwise be considered a good thing?
That’s where priorities, budgets, and design software can help. Some monitors are important, while some are nice-to-have. Some monitor types are larger and some are smaller. Exact sizes, of course, will depend on who is offering the monitors. Any impact on performance also must be considered. “What’s key is to minimize the effect on PPA,” said Pateras. “You don’t want to be adding these monitors and all of a sudden your design is 10% slower.”
So the first thing needed is a clear set of priorities. That ensures the available space will be used first for the most important things before adding less-important monitors. Having a budget for the number of gates is also necessary so that, as one works down the priority list, one can tell what else there is room for.
Priorities and budgets then can be fed into software that scours the die, identifies white space, and places monitors according to the designer’s needs. “Agent type, number and configuration are determined by the Proteus suite of EDA tools,” said Gewirtzman. “These tools analyze each design block and, together with the user preference regarding gate budget as well as expected coverage, run optimization algorithms and eventually recommend the optimal number, type and configuration of Agents. The software makes sure to keep them between the cracks or the low utilization areas of the chip.” Once the strategy is approved, insertion is automated.
Synopsys also leverages its EDA tool suite to support monitor insertion. “Monitors are inserted through the same automated RTL integration flow that we provide for design-for-test (DFT) and built-in self-test (BiST) integration,” said Pateras.
It’s not necessarily an exhaustive process that identifies every monitor and ensures that it gets in somehow. In fact, one may stop inserting monitors once the available space is used up, leaving some on the list that didn’t get implemented. As long as the priorities are in place, and as long as all of the important monitors were instantiated, that’s sufficient.
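The priority-and-budget approach can be sketched as a simple greedy selection. This is an illustration only. It is not the algorithm used by the proteanTecs or Synopsys tools, which are described as running their own optimization. The monitor names, gate counts, and the convention that a lower number means higher priority are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monitor:
    name: str
    priority: int  # hypothetical convention: 1 = critical, larger = less important
    gates: int     # estimated gate count for this monitor


def select_monitors(candidates, gate_budget):
    """Place monitors in priority order until the gate budget runs out.

    Returns (placed, skipped, remaining_budget). Monitors that do not fit
    are skipped rather than forcing any change to the existing layout.
    """
    placed, skipped = [], []
    remaining = gate_budget
    # Most important first; within a priority level, smaller monitors first.
    for monitor in sorted(candidates, key=lambda m: (m.priority, m.gates)):
        if monitor.gates <= remaining:
            placed.append(monitor)
            remaining -= monitor.gates
        else:
            skipped.append(monitor)
    return placed, skipped, remaining


candidates = [
    Monitor("timing_margin", priority=1, gates=400),
    Monitor("thermal", priority=1, gates=250),
    Monitor("voltage_droop", priority=2, gates=300),
    Monitor("aging", priority=3, gates=200),
]
placed, skipped, left = select_monitors(candidates, gate_budget=1000)
print("placed: ", [m.name for m in placed])
print("skipped:", [m.name for m in skipped])
print("gates left:", left)
```

In this made-up case the two critical monitors and one lower-priority monitor fit, and the nice-to-have item at the bottom of the list is left out, which is the acceptable outcome described above.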
As went test, so goes monitoring?
This “let’s do what we can” approach reflects the novel nature of monitors. Designers are trying them and using them where possible to keep track of chip behavior for the lifetime of the chip. As such, some might see it as a nice feature rather than an essential feature.
Synopsys noted that DFT circuitry also was looked at somewhat askance in the early days. Scan chains and controllers, in this case, required additional area. “Twenty years ago, BiST was not very popular,” noted Pateras. “People didn’t want to put it there. The designers hemmed and hawed that it was always an issue of the overhead and whether or not you want to do it.”
Based on the increasing complexity of SoCs, internal testing circuitry is no longer considered a luxury. It’s now a necessity, even though it usually consumes extra die area. “[Today] there’s no discussion, there’s no question, and it gets done as part of the design requirement,” said Pateras. If it adds to the die area, then one can attempt to make the circuits more efficient, but leaving them out entirely – or badly compromising test coverage by leaving some parts out – simply won’t fly. Any cost saved in die area would be lost in the discovery of faulty parts that weren’t tested completely.
Synopsys sees this as a possibility with monitors, as well, in the future. “Monitoring is going to follow the same kind of path,” said Pateras. “Some of the critical applications will drive this disruptive adoption first, but eventually it will migrate to more and more mainstream usage.”
The idea of watching the performance of chips during their lifetime is relatively new, and the benefits of doing so haven’t been universally embraced by the industry. If, in fact, die monitoring provides a measurable payoff on advanced chips, monitors will change from being optional to being mandatory, much in the way it has happened with test circuits.
That could change the calculus of inserting monitors. “It’s not so much whether to use monitors or not,” Pateras said. “It’s a question of how many I can afford. Where can I place them? There is that distinction between critical monitoring, nice-to-have monitoring, and opportunistic monitoring.”
White space for inspection
Inspection is another area that can leverage white space. Most inspection features reside in scribe lines on a wafer. They’ll be destroyed when the wafer is diced up. Is there an opportunity to include such features on the die itself?
With one exception, that doesn’t seem to be happening. Scribe lines are consistent from die to die, but white space varies immensely. Having different placement for inspection features on different dies would require inspection equipment to know more in order to find the features.
PDF Solutions, however, is explicitly moving in that direction. It has an e-beam inspection tool that takes advantage of white space. While e-beam inspection always had value, it can be very slow as traditionally implemented because the beam must scan across an entire wafer many times. Multi-column e-beam machines help with this, but scanning the entire wafer can be an inefficient way of identifying the features to be inspected in high-volume production.
The white space-based approach inserts features where there is space within the die. Which feature depends on the needs of the design and the layer being built. “You could put a single contact to see whether that contact is open or closed,” said Indranil De, vice president of design for inspection at PDF. “And that’s basically detecting an electrical failure.”
The e-beam tool charges the contact (in this example). If it’s a complete connection, that charge will flow away through the substrate. If not, then the charge will remain, and voltage-contrast techniques are used to identify that spot.
Fig. 2: An e-beam tool testing the connectivity of a contact. On the right, electrons streaming from the e-beam tool are swept to the grounded chuck. On the left, the incompletely formed contact has no such path, so the electrons remain, setting up a voltage contrast that can be detected. Source: Bryon Moyer/Semiconductor Engineering
The inspection-time savings come from the fact that the tool inserting the features reports the location of those features to the e-beam tool. “You need an e-beam tool that can inspect at those locations only,” said De, indicating that the tool must be capable of accepting such locations as a part of a recipe. “It knows precisely which locations to look for, and it inspects only those areas.”
Those inspection features then act both to identify possible yield limiters during product bring-up and to get an early warning of equipment excursions during manufacturing. The approach complements scribe-line inspection features with in-die features.
The future
There’s one other, less obvious way to use the white space: in the service of better hardware security. In an effort to thwart reverse engineering, dummy circuits could be placed in areas where there’s room. This adds a layer of obfuscation for anyone trying to dig into the chip to reveal secrets. “One area we’re investigating is security,” said Pateras. “You get into obfuscation work where you want to add more gates that are decoys.”
All of these ways of leveraging white space have something of an optional feel now. Future trends portend the possible mandating of monitors, inspection features, or decoy circuits on the one hand and improvements in silicon utilization on the other hand. Those competing trends may someday force a reconciliation — are these uses of white space still to be considered opportunistic? Or have they become strategic? And if they all become strategic, how does one partition out the available white space for the different needs?
As Michael Frank, fellow and chief architect at Arteris IP, noted, “Gates are free.” It remains to be seen whether that will continue to be the case in the future.