Design flows matter for getting the low-power/high-performance benefits of the latest nodes.
…Doomed.
Well, maybe that’s a little harsh, but your job won’t be getting any easier; that “happily ever after” may be harder to achieve than you think, and there are a number of reasons why. And by “me” (the one in whom that power is vested), I’m really talking about the power of the consumer market as a whole and our collective, insatiable demand for newer, shinier…well, just plain “cooler” high-performance gadgets that we interact with every day of our lives in ever-increasing numbers.
“Cool” devices are king
To take stock of what felt like an explosion of new devices, I decided to audit myself and my little family of five. What I found is likely representative of the average first-world household in 2018, with maybe a slight correction for being a geek living in Silicon Valley. What I learned was surprising, both for the sheer number and for the wide variety of electronic devices ingrained in our lives. Between our smartphones, tablets, smart TVs, laptops, cameras, smart speakers, way too many plastic talking/beeping kids’ toys, smart home appliances, sports/activity trackers and smartwatches, and sensor-laden vehicles, I counted 89 separate electronic devices (many with multiple subsystems). Of those, 34 percent had Bluetooth connectivity, and 46 percent had some form of WiFi or cellular internet connectivity. A very unscientific guesstimate put the number of discrete semiconductor integrated circuits in those devices at well over 15,000 (a modern car with advanced safety and infotainment features can account for thousands by itself).
The first thing this tells me is that my family needs to go camping or hiking more often and leave all our devices at home. The second is that SoC designers’ jobs, while becoming ever more challenging, are very safe. But at a macro level, it hits home how reliant we have all become on this army of devices and ICs that make our daily lives more comfortable, productive and entertaining.
Most interestingly, though, and here is where we introduce the “low” into that “power vested in me”: more than half (61 percent) of these devices primarily ran off batteries. Yet many of them still delivered top-end performance, serving high-speed data feeds, often presented on ultra-high-pixel-density displays, all from compact, lightweight form factors. It’s at this intersection of high performance (frequency and capacity) and low power (efficiency and temperature) that your hardest design challenges exist. It’s also where a combination of advanced-node process technology and cutting-edge tool/flow recipes must come together to address those challenges, with solutions that can tape out chips and ship products within a timeframe in which we users still consider them “cool.”
Pushing the process
These rapidly multiplying electronic devices are driving requirements that often demand gigahertz performance in milliwatt power envelopes. Many of our older devices are heavily mixed-signal in nature and were developed on silicon process technology above 90nm, while a larger subset of devices living in the “IoT” space have basic wireless connectivity and low-to-medium performance requirements. The needs of these current-generation IoT devices are still being met by more mature or mainstream process nodes from 65nm down to 20nm, but many of those same semiconductor providers are pushing into FinFET nodes at 16/14nm for their highest-volume IoT parts, or looking to the foundries for low-power process techniques like FD-SOI. Most interestingly, the highest-performing of these devices almost all use chips designed in the 16-10nm range today. Further, nearly all of those same chip makers have begun developing at 7nm, or are investigating a transition soon, for the next wave of devices that will be “cool” to consumers as they hit the shelves in the coming years.
As you waded into the early FinFET process nodes to take advantage of their power and performance benefits, you were forced to grapple with coloring for double patterning, on-chip variation and more pronounced electromigration effects. And moving to 10 or 7nm introduces a new wave of design requirements, including pin-access-aware placement, wire resistance mitigation, statistical OCV (SOCV) handling, routing halos, via alignment and insertion (including advanced new via structures), trim metal, self-heating effects, triple patterning and bus routing.
In light of this increasingly process-driven design complexity, what design flow changes are needed to take advantage of the low-power/high-performance benefits of these most advanced process nodes?
Feeling the flow
When you’re developing in the 20nm-to-7nm range, it’s paramount to pick tool partners that have a deep, ongoing relationship with your foundry of choice and a full suite of certified tools at your target node. Validating that the point tools are certified is only half the battle, though. Certification provides a foundry-defined baseline of compatibility with the process (addressing some of the critical new requirements listed above), guarantees an acceptable accuracy of results versus final silicon, and proves the given tool can successfully fulfill its function in the flow toward a working test chip on that process. What certification doesn’t do is guarantee that you’ll achieve the best power, performance and area (PPA) possible, run fast enough to get results in a timeframe that meets your schedule, or converge to your target goals within a reasonable number of iterations through the flow.
To have confidence that the flow itself will deliver on the above, you should be looking for three key ingredients from your tool vendor(s):
Fast
Today’s most advanced designs have hundreds of millions of placeable instances, and the ability to consume those full-chip designs in signoff tools for STA, power analysis and physical DRC/LVS is critical. Massively parallel techniques that leverage distributed computing (and, ideally, third-party cloud infrastructure) are now essential. And with such large chips, physical block sizes need to grow as well, so the design can be partitioned into manageable blocks without exponentially growing the design team. Moving from block sizes of 1-2 million instances up to 5-10 million instances or more requires implementation tools that can break optimization and analysis into parallelizable jobs, leveraging both shared-memory multithreading and distributed computing techniques. You need results fast, and each tool in the flow needs to deliver that speed and capacity, or the entire flow will be governed by the slowest point tool’s bottleneck.
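To make that divide-and-merge idea concrete, here is a minimal sketch of fanning one analysis task out across worker processes and combining the results. This is a toy illustration, not any real EDA tool’s API: TimingPath, analyze_chunk and parallel_sta are hypothetical names, and production tools partition work far more intelligently than simple round-robin.

```python
# Toy sketch of splitting a large analysis job across worker processes.
# All names (TimingPath, analyze_chunk, parallel_sta) are illustrative only.
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass

@dataclass
class TimingPath:
    name: str
    delay_ps: float     # estimated path delay
    required_ps: float  # required arrival time

def analyze_chunk(paths):
    """Compute slack for one partition of paths; runs in its own process."""
    return {p.name: p.required_ps - p.delay_ps for p in paths}

def parallel_sta(paths, n_workers=8):
    # Round-robin partition of the design, one chunk per worker.
    # (On spawn-based platforms, call this under `if __name__ == "__main__":`.)
    chunks = [paths[i::n_workers] for i in range(n_workers)]
    slacks = {}
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(analyze_chunk, chunks):
            slacks.update(partial)
    return min(slacks.values())  # worst slack across the whole design
```

The same pattern scales from shared-memory threads on one host to distributed jobs across a compute farm or cloud; the hard part in real tools is keeping the merged result identical to what a single-process run would report.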
Smart
The most cutting-edge EDA tools these days use very advanced development techniques to help you eke out those last few MHz, shrink your die by a few percent or shave a couple of milliwatts off your power spec. Given the design sizes and tool runtimes at very advanced nodes, having early estimates of power and physical issues at the RTL stage is critical, but those estimates need to be accurate, and ideally tied to a full downstream implementation flow, to be trusted. Optimization engines can no longer be driven by just one or two variables; they must concurrently consider power, performance, area, congestion, EM/IR, DFM and physical process constraints as cost functions. As a designer, you should be able to dig deep with your tool vendors and understand how they’re leveraging machine learning and applying AI techniques in their development process to improve both tool usability and final results.
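As a sketch of what “concurrently consider” might mean, here’s one simple way to fold several of those metrics into a single cost that an optimizer could minimize. The metric names and weights are invented for illustration; real engines use far more sophisticated, process-aware cost models.

```python
# Hedged sketch of a multi-objective cost function: timing, power, area and
# congestion are weighed together instead of optimizing one variable at a time.
# Metric names and weights are assumptions, not any vendor's actual model.
def placement_cost(metrics, weights=None):
    """metrics: 'wns_ps' (worst negative slack, ps), 'power_mw',
    'area_um2' and 'congestion' (routing overflow ratio, 0..1)."""
    w = weights or {"timing": 4.0, "power": 1.0, "area": 0.5, "congestion": 2.0}
    timing_penalty = max(0.0, -metrics["wns_ps"])  # only violations cost
    return (w["timing"] * timing_penalty
            + w["power"] * metrics["power_mw"]
            + w["area"] * 1e-3 * metrics["area_um2"]
            + w["congestion"] * metrics["congestion"] ** 2)
```

The squared congestion term is one way to make a hotspot increasingly expensive as it approaches unroutability; in practice, every term would also be bounded by hard constraints such as EM/IR limits and DFM rules.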
Full-Flow
While each individual tool in the flow needs to be foundry certified, run really fast and have massive capacity, if the tools aren’t predictive of and correlated with one another, the whole flow will need to iterate more than your schedule will allow, or worse, never converge. At the most advanced nodes, it’s becoming clear that the best way to deliver a truly convergent flow is to share engines and algorithms between sister tools within the flow, which means adopting a full-flow solution from a vendor that offers competitive tools for every step your design flow requires. If your high-level synthesis tool doesn’t understand the congestion your RTL synthesis tool will see during physical synthesis, your design won’t converge. If your RTL synthesis tool doesn’t use the same placement and routing decisions and R/C extraction in its early physical models as your implementation tools, your design won’t converge. If your place-and-route implementation tool isn’t using true signoff-engine accuracy during optimization for power, timing, physical signoff and DFM, your design won’t converge.
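To see why correlation matters so much for schedule, consider a deliberately simplified convergence model: on each pass through the flow, the optimizer fixes what its internal estimate says is broken, but signoff only credits a fraction of that fix proportional to how well the two engines agree. This is a toy model with assumed numbers, not any vendor’s algorithm.

```python
# Toy model: how estimate/signoff correlation governs flow convergence.
# Starting slack, tolerance and iteration cap are assumptions for illustration.
def iterations_to_close(correlation, start_wns_ps=-50.0, tol_ps=1.0, max_iter=25):
    wns = start_wns_ps
    for i in range(1, max_iter + 1):
        # Residual violation shrinks by the fraction of the "fix"
        # that the signoff engine actually agrees was fixed.
        wns *= (1.0 - correlation)
        if abs(wns) < tol_ps:
            return i      # timing closed within tolerance
    return None           # flow never converges on schedule

print(iterations_to_close(0.95))  # well-correlated engines: 2 passes
print(iterations_to_close(0.30))  # poorly correlated engines: 11 passes
```

Even in this crude model, dropping correlation from 95 to 30 percent multiplies the number of full-flow iterations several times over, and each of those iterations is a multi-day run at advanced-node design sizes.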
So when choosing your advanced-node tool flow for high-performance/low-power designs, make sure what you’re getting is not just fully certified, but also fast, smart and full-flow.
As an SoC designer, your challenge of designing concurrently for high-performance and low-power silicon to help deliver the coolest end devices may feel daunting, but you’re not actually doomed. There is a clear path forward to help you find your “happily ever after.” By adopting the latest process technologies and pairing them with intelligent tools and flows that deliver high capacity via massive parallelism and advanced optimizations via integrated engines, you can still be the hero and get the girl [or guy] in the end! Back to those powers in which I was vested: “you may now kiss the bride [groom]”…just don’t take too long a honeymoon, as that next tapeout is just around the corner.