IBM’s Telum II Voltage Guard Band Reduction

An approach that improves energy efficiency and reduces power using three coupled control loops.

The 2025 International Solid-State Circuits Conference (ISSCC) was held in San Francisco from February 16th to 20th. IBM presented three papers[1,2,3] based on its Samsung 5nm Telum II chip. They are interesting both for the technology and for the specifics of the design and the measures taken to improve energy efficiency and reduce power.

My first recollection of a company specializing in guard-band reduction is seeing Silicon Metrics at the Design Automation Conference (DAC) in the late ’90s. (Silicon Metrics was acquired by Magma, which was itself acquired by Synopsys, in the EDA industry’s familiar pattern of bigger fish eating smaller fish.) One of the company’s founders, Farid Najm, has focused much of his career on this area and on power savings. There’s an article, Two Constraints-Based Techniques To Address Power-Related Challenges In SoC Design, that gives more background on this topic.

This article focuses mostly on the three coupled control loops IBM used to reduce voltage guard banding in the Telum II and zNext System. It also gives a quick overview of the Telum II and of the SRAM voltage control, which is decoupled from the core.

Figure 1 below shows an overview of the Telum II, including a die photo and labeled blocks of the processor. There are 8 cores on the chip, 3 on the left and 5 on the right. The left side also includes a data processing unit (DPU). Ten 36MB L2 caches sit in the middle of the chip (light blue).

Fig. 1: Telum II Chip Overview.[1]

Figure 2 below shows core-specific information and is actually a conglomeration of information from two presentations.[1,2] It includes markup information clearly showing the 7 fully abutted blocks used in the core and the labels for each block. Rough locations for the placement of digital droop sensors (DDS), droop mitigation units (DMU) and the core recovery unit (CRU) are also shown.

At ~9.52 mm² per core, the 8 cores total 76.16 mm², which implies that only about 1/8 (12.7%) of the total chip area is used for the cores.
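The core-area arithmetic can be checked in a few lines. This is a quick sketch that uses only the figures quoted above. The die area it prints is back-calculated from the 12.7% share, so treat it as an implied value rather than a number taken from the papers.

```python
CORE_AREA_MM2 = 9.52   # approximate per-core area quoted above
NUM_CORES = 8
CORE_SHARE = 0.127     # share of the die quoted above (about 1/8)

total_core_area = CORE_AREA_MM2 * NUM_CORES
# Back-calculated from the quoted share; an implied figure, not a published one.
implied_die_area = total_core_area / CORE_SHARE

print(f"cores: {total_core_area:.2f} mm^2")
print(f"implied die area: ~{implied_die_area:.0f} mm^2")
```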

Fig. 2: Telum II Core.

It’s clear that the L2 cache takes up a significant portion of the die area. The 36MB L2 cache has a 3.6ns (~20 clock cycles) access time and is designed with Samsung’s high-density SRAM bitcell. 360MB of virtual L3 cache is accessible in 11.5ns (~64 cycles) and 2.8GB of virtual L4 in 48.5ns[4] (~267 cycles). The access time increases by more than 3x going from L2 to virtual L3, and by more than 4x going from virtual L3 to virtual L4. To deal with the memory wall, the on-die cache dwarfs the chip area devoted to the 8 cores.
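As a rough cross-check of the cycle counts and ratios above, the conversion from nanoseconds to clock cycles can be sketched as follows. It assumes the 5.5GHz core clock cited elsewhere in this article. The helper name is made up for illustration, and the results land close to (not exactly on) the rounded cycle counts quoted above.

```python
CLOCK_GHZ = 5.5  # core clock cited in the article

latencies_ns = {"L2": 3.6, "virtual L3": 11.5, "virtual L4": 48.5}

def ns_to_cycles(ns, ghz=CLOCK_GHZ):
    """A clock running at f GHz ticks f times per nanosecond."""
    return ns * ghz

cycles = {level: ns_to_cycles(ns) for level, ns in latencies_ns.items()}
for level, n in cycles.items():
    print(f"{level}: {latencies_ns[level]} ns is about {n:.1f} cycles")

# Latency ratios between adjacent cache levels.
l3_over_l2 = latencies_ns["virtual L3"] / latencies_ns["L2"]
l4_over_l3 = latencies_ns["virtual L4"] / latencies_ns["virtual L3"]
print(f"L2 -> virtual L3: {l3_over_l2:.2f}x")
print(f"virtual L3 -> virtual L4: {l4_over_l3:.2f}x")
```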

Fig. 3: 36MB L2 Cache with SRAM Voltage Regulators.

A separate power supply is used to drive the SRAM cells and periphery logic. This allows the voltage on the cores to be changed in response to workload scenarios without concern that the change will create SRAM issues. On-chip regulators generate the stand-alone SRAM supply, which reduces system and integration complexity. The regulator is implemented as a dual-loop analog design, with a high-precision slow loop that sets the DC value. Distributed micro-regulators respond to fast switching transients, and voltage noise is further suppressed via a predictive activation scheme. This enables the SRAM supply to be set to a level that optimizes yield, power, and performance. The placement of the regulators is shown above in Figure 3.

Focusing now on the core, one might wonder why a dynamic voltage scaling scheme wasn’t employed. Power-supply redundancy and security requirements made fast voltage changes impossible. For the IBM zNext, a sophisticated voltage control loop (VCL) was developed that achieves power savings similar to those of the previous POWER10 processor[5], while keeping performance loss below 0.5% even though voltage changes take milliseconds instead of microseconds. A control loop was also added to react to core recoveries.

Figure 4 below shows a diagram of the three control loops (Timing Protection, Performance Protection, and Guard Band Optimization) that implement the VCL for Telum II. Each control loop is described below.

Fig. 4: Diagram of Telum II’s three coupled control loops.

We look at the timing protection control loop first. Timing protection is implemented using droop mitigation only. A digital droop sensor (DDS), based on the same principles we looked at in Staying Within the Margins, was deployed with a latch-tapped delay line that has 24 possible values (this may be helpful when reading the DDS values in Figures 9 and 11 later). Figure 5 below shows how VSET can be eliminated and an operating voltage found dynamically, based on a VCRIT setting that causes instruction throttling by the droop mitigation unit (DMU) to kick in. The right part of the figure shows, diagrammatically, the impact on the voltage guard band.

Fig. 5: The timing protection control loop.
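The decision the timing protection loop makes can be sketched in a few lines of Python. This is only a toy model, not IBM’s implementation. The 24-value range comes from the description above, while the function names, the linear 10 mV-per-step sensor model, the VCRIT code, and the supply trace are all invented for illustration.

```python
DDS_LEVELS = 24  # the delay line described above has 24 possible values

def dds_code(supply_mv, floor_mv=700, step_mv=10):
    """Hypothetical sensor model: one DDS step per 10 mV above a 700 mV floor.

    A real DDS infers voltage from how far an edge propagates down the
    delay line; the linear mapping here is an assumption.
    """
    code = (supply_mv - floor_mv) // step_mv
    return max(0, min(DDS_LEVELS - 1, code))

def should_throttle(supply_mv, vcrit_code):
    """Timing protection: throttle whenever the sensed level is below VCRIT."""
    return dds_code(supply_mv) < vcrit_code

VCRIT_CODE = 14                                   # assumed threshold
supply_trace_mv = [900, 880, 830, 820, 860, 900]  # invented droop event
throttle_trace = [should_throttle(mv, VCRIT_CODE) for mv in supply_trace_mv]
print(throttle_trace)
```

In this trace, throttling is active only for the two samples where the supply dips below the assumed VCRIT level, and it releases as soon as the supply recovers.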

Figure 6 below shows the effectiveness of the timing protection loop. When the DMU throttling is enabled, there is a significant 50% reduction in voltage droop.

Fig. 6: Comparison results with timing protection control loop disabled and enabled.

Next, we look at the performance protection control loop. Its mission is to keep the performance impact at or below the 0.5% target. It does this by monitoring the impact of the time spent below VCRIT, where throttling is active. If core throttling becomes too detrimental to performance, the performance protection control loop raises the voltage so that the impact returns to or stays within the target. Figure 7 below diagrams this behavior.

Fig. 7: Performance protection control loop.
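A minimal sketch of this loop’s decision, under loud assumptions: the performance impact is approximated here by the fraction of cycles spent throttled in an observation window, and the 5 mV step and the function name are hypothetical. Only the 0.5% target comes from the text above.

```python
PERF_IMPACT_TARGET = 0.005  # the 0.5% target described above

def performance_protection(throttled_cycles, window_cycles, voltage_mv, step_mv=5):
    """Return the (possibly raised) voltage after one observation window.

    Assumption: performance impact is approximated by the fraction of
    cycles in the window during which throttling was active.
    """
    impact = throttled_cycles / window_cycles
    if impact > PERF_IMPACT_TARGET:
        return voltage_mv + step_mv  # too much throttling: raise the voltage
    return voltage_mv                # within target: leave the voltage alone

within_target = performance_protection(40, 10_000, 850)  # 0.4% impact
over_target = performance_protection(80, 10_000, 850)    # 0.8% impact
print(within_target, over_target)
```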

Previous work on Razor[6-9] enabled robust error recovery and used an adaptive voltage scaling scheme to raise voltages. In Telum II, the voltage can be raised by the performance protection control loop to maintain performance, but the adjustment made by the guard band optimization control loop is to raise VCRIT so that throttling kicks in sooner. If this were to lead to an unacceptable drop in performance, i.e., too much throttling, then the performance protection control loop described above would raise the voltage. Figure 8 shows a diagram of how VCRIT is raised after a recoverable error event.

Fig. 8: Guard band optimization control loop.
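One way to picture the guard band optimization loop is as a small state update. This is a sketch under stated assumptions, not the algorithm from the papers: the step sizes, the voltage floor, the function name, and the choice to lower the voltage by one step whenever no error is seen are all hypothetical. What it takes from the text is that a recoverable error raises VCRIT rather than the voltage.

```python
MAX_DDS_CODE = 23  # top of the 24-value DDS range described earlier

def guard_band_optimization(voltage_mv, vcrit_code, recoverable_error,
                            v_step_mv=5, v_floor_mv=700):
    """One update of a toy guard band optimization loop.

    Returns the new (voltage_mv, vcrit_code) pair.
    """
    if recoverable_error:
        # The core recovered from an error: raise VCRIT so throttling starts sooner.
        return voltage_mv, min(MAX_DDS_CODE, vcrit_code + 1)
    # No error seen: keep seeking a lower voltage (assumed behavior).
    return max(v_floor_mv, voltage_mv - v_step_mv), vcrit_code

state = (850, 12)  # invented starting point: 850 mV, VCRIT code 12
for error in [False, False, True, False]:
    state = guard_band_optimization(*state, error)
print(state)
```

Starting from the invented state above, the voltage steps down twice, VCRIT is bumped once after the error, and the downward search then resumes.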

Figure 9 below shows the effectiveness of the guard band optimization control loop and how the system seeks a lower voltage level. If the voltage falls to a point where recoverable errors occur, then there’s a dynamic adaptation of the timing protection by raising the VCRIT. If this were to lead to an unacceptable degradation in performance, then the performance protection control loop would raise the voltage. All three control loops are coupled together to provide a stable voltage operating environment.

Fig. 9: Guard band optimization control loop effectiveness.

Figure 10 shows a composite diagram indicating the impact of each control loop on the effective voltage guard band.

Fig. 10: Diagram of each control loop’s contribution to voltage guard band reduction.

Figure 11 shows the overall gains in energy and power efficiency using the Telum II coupled control loop implementation.

Fig. 11: Workload power savings results.

The three coupled control loops used in IBM Telum II and zNext System protect against timing violations by using digital droop sensors (DDS) and droop mitigation units (DMU) that throttle the instruction rate. Simultaneously, performance is maintained by raising the voltage if needed to prevent too much instruction throttling. Furthermore, there’s additional guard band protection from detecting violations caused by sudden low voltages that lead to recoverable errors handled by the core recovery unit (CRU). This creates a dynamic environment where the system is continuously seeking to run at the lowest voltage possible. The result is better performance, cores that always run at 5.5GHz, and voltage-squared reductions in energy and power, all while maintaining 99.999999% (eight nines) availability.

All in all, it dynamically optimizes guard bands by simultaneously leveraging both robust droop mitigation and robust error recovery to deliver significant chip power savings of 18% and system level power savings of 10% while maintaining average throttling below 0.5%.
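The “voltage squared” relationship mentioned above follows from the standard approximation that dynamic power is proportional to C·V²·f. The sketch below shows what that implies for a few voltage reductions at a fixed frequency. The percentages are illustrative inputs, not IBM’s measured values, and the model ignores leakage power.

```python
def dynamic_power_ratio(voltage_ratio, frequency_ratio=1.0):
    """Dynamic power scales as C * V^2 * f, so the ratio scales as V^2 * f."""
    return voltage_ratio ** 2 * frequency_ratio

savings = {}
for reduction_pct in (2, 5, 10):  # illustrative voltage reductions
    ratio = dynamic_power_ratio(1 - reduction_pct / 100)
    savings[reduction_pct] = (1 - ratio) * 100
    print(f"{reduction_pct}% lower voltage -> "
          f"{savings[reduction_pct]:.2f}% lower dynamic power")
```

Because the exponent is 2, a small voltage reduction yields roughly twice that percentage in dynamic power savings, which is why trimming guard band is so valuable at a fixed clock frequency.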

References

  1. G. Strevig, et al., “IBM Telum II: Next Generation 5.5GHz Microprocessor with On-Die Data Processing Unit and Improved AI Accelerator,” Paper 2.2, 2025 ISSCC, February 17, 2025.
  2. T. Webel, et al., “Dynamic Guard-Band Features of the IBM zNext System,” Paper 8.1, 2025 ISSCC, February 18, 2025.
  3. D. Wolpert, et al., “IBM Telum II Processor Design-Technology Co-Optimizations for Power, Performance, Area, and Reliability,” Paper 37.1, 2025 ISSCC, February 19, 2025.
  4. C. Berry, “IBM Telum II Processor and IBM Spyre Accelerator Chip for AI,” Hot Chips, 2024.
  5. B.T. Vanderpool et al., “Deterministic Frequency Boost and Voltage Enhancements on the POWER10 Processor,” ISSCC, pp. 218-219, 2022.
  6. D. Ernst et al., “Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation,” IEEE/ACM MICRO, pp. 7-18, 2003.
  7. S. Das et al., “A Self-Tuning DVS Processor Using Delay-Error Detection and Correction,” IEEE JSSC, vol. 41, no. 4, pp. 792-804, 2006.
  8. S. Das et al., “Razor II: In Situ Error Detection and Correction for PVT and SER Tolerance,” IEEE JSSC, vol. 44, no. 1, pp. 32-48, 2009.
  9. J.P. Kulkarni et al., “A 409GOPS/W Adaptive and Resilient Domino Register File in 22nm Tri-Gate CMOS Featuring In-Situ Timing Margin and Error Detection for Tolerance To Within-Die Variation, Voltage Droop, Temperature And Aging,” ISSCC, pp. 82-83, 2015.

Bonus for readers who have read to the end

At the plenary session of invited papers, Navid Shahriari of Intel gave a presentation titled “AI Era Innovation Matrix” that touched on several interesting topics and ideas. There was one item that I found very cool. It starts at the ~10:16 mark, “Advancements in Fault Isolation and Debug.” Traditional IREM is highly compromised in the nodes using the new PowerVia, so an e-beam imaging capability was developed that can see down to the wire and fin level to isolate and debug issues. Figure Bonus shows transistors switching using voltage contrast. As Navid said, “I can never get tired of watching this image.”

Figure Bonus: E-beam logic state imaging of transistors.

The video is available from this website. Thanks for reading!


