Bus-based scan data distribution architecture enables true bottom-up DFT flows.
The dramatic rise in manufacturing test time for today’s large and complex SoCs is rooted in the use of traditional approaches to moving scan test data from chip-level pins to core-level scan channels. The pin-multiplexing (mux) approach works fine for smaller designs but can become problematic with an increase in the number of cores and the design complexity on today’s SoCs. The next revolution in DFT tools to take test time, test cost, and DFT implementation effort eliminates the challenges of pin-mux approach by decoupling the core-level DFT requirements from the chip-level test delivery resources.
A common way to connect core-level scan channels to chip-level pins is by letting a mux network decide which cores are connected directly to chip-level pins. This works fine for smaller designs, but becomes problematic as the number of cores grows, the levels of hierarchy increase, and designs become more complex. It presents barriers to efficiently testing cores in parallel to save on time and cost. Challenges include:
These challenges have been addressed by using a bottom-up approach in which the DFT engineers allocate a fixed number of scan channels early in the flow, usually the same number for each core. This is the easiest approach, but it can end up wasting bandwidth because the different cores that are grouped together for testing might have different scan chain lengths and pattern counts (figure 1).
Fig. 1: In a hierarchical DFT flow, putting less effort into the mux network can lead to sub-optimal bandwidth usage.
One solution to this wasted bandwidth issue to save test time is to reallocate scan resources, but doing so involves reconfiguring compression, rerouting the scan channels, and regenerating patterns (figure 2).
Fig. 2: Building a more complex mux network to better align scan channel input and outputs will save test time, but at the cost of implementation effort.
Is the additional effort worth the savings in test time? Each DFT team must decide on these tradeoffs.
The traditional pin-muxed approach is also not very effective in dealing with tiled designs because of the challenges in routing scan channels through cores.
A new approach to distributing scan test data across an SoC—called Streaming Scan Network (SSN)— reduces both DFT effort and test time, with full support for tiled designs and optimization for identical cores. The SSN approach is based on the principle of decoupling core-level test requirements from chip-level test resources by using a high-speed synchronous bus to deliver packetized scan test data to the cores.
The number of scan channels per core is independent of the width of the SSN bus and the number of scan channels at chip level and from the number of cores in the design. Delivering test data in this way simplifies planning and implementation and allows core grouping to be defined later in the flow, during pattern retargeting rather than during the initial design. The SSN architecture is flexible—the bus width is determined by the number of scan pins available—and eases routing congestion and timing closure because it eliminates top-level test mode muxing, which also makes it ideal for abutted tile-based designs.
Part of the SSN architecture is the core-level host nodes that generate the DFT signals locally. The host nodes ensure that the right data is picked up from SSN bus and sent to scan inputs of the core and that the output data is placed back onto the bus. Each node knows what to do and when to do it based on a simple configuration step done using IJTAG (IEEE 1687). Which groups of cores will be tested together and which will be tested sequentially is configurable, not hardwired, with the SSN approach. The configuration is done as a setup step once per pattern set, and once it’s done, all the data on the SSN bus is payload.
As an example, take a design in which two cores are to be tested concurrently (figure 3). Block A has 5 scan channels, Block B has 4 scan channels. A packet is the total amount of data needed to perform one shift cycle across the two cores being tested concurrently. So the packet size in this example is 9 bits. However, there are 8 pins available for test, so the SSN bus is 8 bits.
Fig. 3: Testing two blocks at the same time. In a pin-mux scan access method, this would require nine chip-level scan input pins and nine scan output pins. With SSN the packet size is 9 bits, which is delivered on an 8-bit bus.
The table on the left side of figure 3 shows how the data is streamed through the synchronous SSN bus to the cores. It will take two cycles to deliver all the data. The first SSN bus cycle delivers all the data for one shift cycle for Block A. Block B has to wait until the second SSN cycle. The host nodes know where the data corresponding to that core resides on the bus and when to pulse the shift clock.
SSN contains several capabilities to reduce test time and test data volume. One is independent shift and capture. In many retargeting schemes, the capture cycles of all the affected cores must be aligned. If multiple cores are shifting in concurrently (figure 4) and they have different scan lengths, some of the cores with shorter chains need to be padded to perform capture at the same time for all cores. With SSN, the host nodes are programmed so that each core can shift independently, but capture occurs concurrently once all the cores have completed scan load/unload.
Fig. 4: When capture cycles must be aligned, some cores need padding, which is a waste of data and test time.
SSN also has bandwidth tuning. Rather than shifting as many bits as there are scan channels per packet, it can allocate fewer bits to a core that requires less data. For a core that has fewer patterns, less data is allocated per packet, which ultimately reduces test time.
SSN is a scalable method for testing any number of identical cores with a constant amount of test data and test time. For identical cores, the repair circuitry is included in each host node. Data provided to the identical cores is scan input, expect data, and mask data. That allows SSN to do a comparison inside each core. The accumulated status across all identical cores is then shifted out on the SSN bus. A pass/fail bit per core is also captured in the host and scanned out through IJTAG.
SSN was developed in collaboration with several leading semiconductor companies. We presented a paper with Intel at the International Test Conference 2020 that describes the technology and shows some key results of Intel’s validation of SSN. Compared to a pin-muxed solution, they saw a reduction in test data volume of 43% and also a reduction of test cycles of 43%. Steps in the design and retargeting flow were between 10x-20x faster with SSN.
SSN eliminates the tradeoffs between having an effective, streamlined implementation flow, or minimizing test cost.
Leave a Reply