Performance-IP: Less Memory Latency

Embedded IP improves performance by identifying and isolating random requests.

The combination of more functionality on chips plus more contention for memories is forcing companies to look at different ways to improve performance.

Just adding more processing power doesn’t guarantee improved performance, and throwing more memory at a problem—either SRAM or multiple levels of cache—is expensive and not always faster. There are too many processors and too many requests vying for that memory, and continually shrinking features is actually making it harder to route signals because the wires are longer and thinner.

As a result, companies have begun looking at a number of architectural options, dissecting what is directly causing the bottlenecks. Startup Performance-IP's focus is memory latency. The company has developed embedded IP that can separate out random memory requests, which otherwise tie up both the memory and the traffic to and from it. The result is a more linear connection between processor and memory, improving performance, power, and even bandwidth.

“The memory request optimizer is basically a next-generation pre-fetch engine,” said Greg Recupero, Performance-IP’s CTO. “The technology analyzes the request stream, which can appear random, and allows it to improve spatial locality. Basically, you collect this data to create a virtual request stream, and with that information products can operate with greater efficiency.”
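The idea of turning an apparently random request stream into a "virtual" linear one can be illustrated with a minimal sketch. Performance-IP has not published its algorithm, so everything below is an assumption: the 4KB region size, the stride detection, and the function names are purely illustrative of how interleaved linear streams can be recovered and prefetched.

```python
# Hypothetical sketch only: recovering spatial locality from an
# apparently random request stream. Region size, stride detection,
# and all names are illustrative, not Performance-IP's design.
from collections import defaultdict

def build_virtual_streams(requests):
    """Group requests by address region, then detect a constant stride
    within each region -- turning interleaved (seemingly random) traffic
    back into linear per-stream sequences that can be prefetched."""
    REGION = 4096  # assume 4 KB regions separate independent streams
    streams = defaultdict(list)
    for addr in requests:
        streams[addr // REGION].append(addr)
    prefetches = []
    for addrs in streams.values():
        if len(addrs) < 3:
            continue  # not enough history to trust a pattern
        deltas = {addrs[i + 1] - addrs[i] for i in range(len(addrs) - 1)}
        if len(deltas) == 1:                  # constant stride detected
            stride = deltas.pop()
            last = addrs[-1]
            prefetches.extend(last + stride * k for k in range(1, 3))
    return prefetches

# Two interleaved linear streams look random when merged:
mixed = [0, 8192, 64, 8256, 128, 8320, 192, 8384]
print(sorted(build_virtual_streams(mixed)))  # [256, 320, 8448, 8512]
```

Here the merged trace alternates between two address ranges, but once each range is tracked separately, both turn out to be simple 64-byte strides that can be fetched ahead of demand.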

The advantage of this approach is the ability to optimize different profiles or modes, said Recupero. For example, in a smart phone, the phone mode doesn’t need high performance, so the memory optimization technology would run in low optimization mode. But it can be configured to run in “moderate” or “aggressive” modes for other applications. The key is holding some requests in a response buffer, which acts like a micro-cache, just long enough to decrease the false fetch rate.
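A rough sketch of how mode-dependent prefetch depth and a small response buffer could interact is below. The mode names come from the article; the buffer depth, prefetch distances, and class design are assumptions for illustration only. The point is that a speculative line lives briefly in the micro-cache, so a wrong guess is evicted cheaply rather than polluting a real cache.

```python
# Hypothetical sketch of mode-dependent prefetching into a small
# response buffer. Buffer depth and per-mode prefetch distances are
# illustrative values, not Performance-IP's actual parameters.
from collections import OrderedDict

MODES = {"low": 1, "moderate": 2, "aggressive": 4}  # lines fetched ahead

class ResponseBuffer:
    """Micro-cache: holds speculatively fetched lines just long enough
    to be consumed; stale speculations age out instead of polluting a
    real cache level."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.lines = OrderedDict()

    def fill(self, addr):
        self.lines[addr] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict oldest speculation

    def lookup(self, addr):
        return self.lines.pop(addr, False)   # hit: consume the line

def access(buf, addr, mode, line=64):
    hit = buf.lookup(addr)
    for k in range(1, MODES[mode] + 1):      # prefetch ahead per mode
        buf.fill(addr + k * line)
    return hit

buf = ResponseBuffer()
hits = [access(buf, a, "moderate") for a in range(0, 256, 64)]
print(hits)  # [False, True, True, True] -- only the first access misses
```

In "low" mode the buffer stays nearly empty, trading performance for power; "aggressive" mode fetches further ahead at the cost of more false fetches, which is why bounding the buffer's lifetime matters.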

“There is basically a pool of trackers,” he said. “Once the client request stream has a sufficient number of trackers, the traffic becomes very linear. That’s when you see the greatest performance improvement. It can begin within several requests. This is going to be important in ADAS, particularly for compute-intensive processing and analysis of video data. We’re also seeing customers in the low end of the market, where they can improve performance in talking to a flash controller.”

Each tracker takes up about 1,500 gates, and trackers typically are implemented in pools of 32. Software can utilize all or some of those trackers, which can be tuned for each application.
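Those figures make the area trade-off easy to estimate. The sizing function below uses only the numbers quoted in the article; the per-application allocation interface is an illustrative assumption of how "all or some" trackers might be enabled.

```python
# Back-of-envelope sizing from the article's figures: ~1,500 gates per
# tracker, pools of 32. The allocation function is an illustrative
# sketch of software enabling only some trackers per application.
GATES_PER_TRACKER = 1500
POOL_SIZE = 32

def pool_gate_count(trackers_enabled=POOL_SIZE):
    """Gate cost of a tracker pool with a given number enabled."""
    assert 0 < trackers_enabled <= POOL_SIZE
    return trackers_enabled * GATES_PER_TRACKER

print(pool_gate_count())    # full pool: 48000 gates
print(pool_gate_count(8))   # lightly tuned application: 12000 gates
```

So a fully populated pool costs roughly 48,000 gates, small enough that per-application tuning is mostly about power and false-fetch behavior rather than silicon area.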

Performance-IP was started in 2013. The company, which is self-funded, emerged from stealth mode earlier this year.