Speed Matters

When a customer tells you they want better performance and more headroom, you probably should listen.


By Frank Ferro
Speed is the shiny object, the undisputed premium and, in many ways, the ultimate carrot with customers when designing advanced SoCs. There are moments when the conversation temporarily shifts to area, or to some special feature, but we always come back to speed, or more specifically, frequency. This is without a doubt the first and most important requirement ‘gate’ to pass through. If your product can’t meet the speed target, real or perceived, then it is difficult to advance the discussion to the next stage. And as we all know, the customer is always right even when they aren’t. But no one disputes the need for speed or its place on the throne.

SoCs Now. If we look at heterogeneous SoC implementations to date, memory bandwidth often limits overall system performance, so running the processor and the system bus faster may not always help. If this is true, then why run faster? There are several reasons from a customer’s perspective. If we look at mobile applications processors, for example, it is difficult to know exactly what will run on the device given the large number of applications for mobile computing, many of which need to run simultaneously. Having that extra speed provides some confidence that the SoC can handle those times when peak bandwidth is demanded.

However, even with an initial high-frequency design specification, a single-channel DRAM subsystem still puts practical limits on bandwidth. Having multiple heterogeneous cores competing for DRAM, even with the best QoS algorithms, results in processors that spend time waiting for memory. Given this fact, the designer needs to find an optimization point for the system (by doing system performance analysis) that maximizes bandwidth with the most efficient use of system resources (i.e., frequency and gates). This is a real challenge today, and one that has not yet been completely solved. And it doesn’t look like SoC designers will get a chance to catch their breath anytime soon, given that market pressure is rapidly driving requirements for even higher levels of performance, and with them, SoC complexity.

SoCs or Compute Platforms. Going forward, SoC architectures are taking on the attributes of large-scale computing platforms, with multiple CPU clusters sharing cache-coherent memory. So now the problem for the embedded SoC is more challenging in some ways. In addition to the CPU compute clusters, the system has to deal with all the other heterogeneous processor cores and subsystems, with some cores needing to be cache-coherent with the CPU memory while other subsystems have real-time processing requirements. The challenge is further complicated by the fact that in embedded SoCs the memory is less distributed than in a traditional computing environment, so the DRAM remains a potential bottleneck if special care is not taken. These new requirements are clearly putting pressure on the overall system to run faster. But by how much – 2X, 3X?

Let’s look at an SoC with the CPU(s) running at 2GHz as an example (the target speed for next-generation tablet processors). In this SoC, the on-chip network after the cache will need to run at 1GHz, or half the processor speed. Compare this to applications processors in the market today, where on-chip networks typically run at about 200MHz to 400MHz with an 800MHz processor (one quarter to one half the processor speed). We now need at least a 2.5X to 5X speed increase in the network to keep pace with the latest processors. But again we have to ask the question: Do I really need to run this fast, or am I just wasting performance that I can’t utilize? To answer this question, we can look at two of the new architectural components of the SoC: cache coherency and multi-channel DRAM.
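The ratios above are easy to sanity-check with a quick back-of-the-envelope calculation. A minimal sketch in Python, using the illustrative frequencies from the text (the function name and the fixed half-speed cache assumption are ours, not a vendor formula):

```python
def required_network_speedup(old_net_mhz, new_cpu_mhz, cache_ratio=0.5):
    """Assume the on-chip network must match the cache, which runs at
    cache_ratio * CPU clock. Return the new network clock and the
    speedup needed relative to an existing network clock."""
    new_net_mhz = new_cpu_mhz * cache_ratio
    return new_net_mhz, new_net_mhz / old_net_mhz

# Today's applications processor: 800MHz CPU, 200-400MHz on-chip network.
# Next-generation target: 2GHz CPU, network at half the CPU speed.
target, speedup_worst = required_network_speedup(200, 2000)  # slowest network today
_, speedup_best = required_network_speedup(400, 2000)        # fastest network today
print(target)         # 1000.0 MHz
print(speedup_worst)  # 5.0x
print(speedup_best)   # 2.5x
```

The 2.5X-5X range quoted in the text falls directly out of the two endpoints of today's typical network clocks.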

Adding hardware cache coherency to the embedded SoC is intended to improve performance by reducing the number of external DRAM accesses, which ultimately translates into a better user experience when running applications on a mobile device. Having cores outside the processor cluster, like the GPU, remain coherent with the CPU cache will require transactions such as cache-to-cache data transfers. As mentioned above, the processor cache usually runs at half the speed of the processor, so the on-chip network will need to support transfers at 1GHz to handle these types of transactions.

The need for increased bandwidth is also forcing the memory subsystem to support multiple channels of DRAM. Multiple channels expand the memory bandwidth, allowing for a peak data rate increase. Looking specifically at Wide I/O memory (planned for use in mobile applications processors), it has four channels of 16-byte (128-bit) DRAM running at 266.6MHz. It is easy to envision an application that will demand full memory bandwidth, so provisioning the GPU interface, for example, to take full advantage of the entire memory bandwidth across all channels ensures peak performance when needed. Supporting this level of performance requires a network speed of 1066MHz (266.6MHz x 4).
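The same Wide I/O numbers can be turned into an aggregate peak-bandwidth figure. A small sketch, assuming a single network interface of the same 16-byte width must absorb all four channels (the variable names are ours; one transfer per clock is a simplifying assumption):

```python
CHANNELS = 4           # Wide I/O channel count
BYTES_PER_BEAT = 16    # 128-bit channel width
CHANNEL_MHZ = 266.6    # per-channel clock

# Aggregate peak bandwidth across all DRAM channels, in GB/s,
# assuming one 16-byte transfer per clock per channel.
peak_gbs = CHANNELS * BYTES_PER_BEAT * CHANNEL_MHZ * 1e6 / 1e9

# A single network port of the same 16-byte width must clock
# CHANNELS times faster than one channel to keep all channels busy.
net_mhz = CHANNELS * CHANNEL_MHZ

print(round(peak_gbs, 1))  # ~17.1 GB/s
print(net_mhz)             # 1066.4 MHz
```

This is where the 1066MHz network requirement in the text comes from: it is simply the per-channel clock scaled by the channel count, given a network the same width as one channel.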

Is the customer always right? In this case, by asking for more speed and headroom, customers were clearly anticipating the architectural evolution now taking place in next-generation SoCs. The two examples above suggest the answer is yes: more speed designed into the system ensures that the SoC will cover the widest set of use cases, supporting a host of current and new applications. So when a customer tells you they require 2X the speed, listen. They are usually on to something.

–Frank Ferro is director of marketing at Sonics.

