From Game Theory To The Unified Theory of Coherency

What you need to worry about with interconnects.

Adam Smith said that the best result comes from everyone in the group doing what is best for himself. But he’s only half right because the best result would come from everyone in the group doing what is best for himself and the group.

If you are wondering where you might have heard this before, it was Russell Crowe playing John Nash in the movie “A Beautiful Mind.” John Nash was an American mathematician and a Nobel Memorial Prize winner who made fundamental contributions to game theory. In a more formal way, a Nash equilibrium occurs when decision-makers choose the best possible strategy taking into account the decisions of others.

To leap from game theory to interconnects: coherency requirements have taken us by storm, not just in terms of heterogeneous features but in terms of scalability. In the already complex world of non-coherent interconnects, which have to deal with dozens of protocols, width adaptation, reorder buffering, ordering requirements, and QoS, it is easy to lose sight of the big picture. And when coherency is treated as an add-on, the result can be a disjointed solution optimized for conflicting non-coherent and coherent requirements. Don’t get me wrong. There is a place for that, but only in very specific use cases that I will discuss later.

Let’s look at a handful of concerns that arise when we dare to ignore our Nobel Memorial Prize winner. Below are high-level block diagrams of the two systems to convey the message. Fig. 1 shows a representative logical diagram of a system with separate interconnects for coherent and non-coherent traffic. Fig. 2 shows NetSpeed’s Gemini, which integrates both coherent and non-coherent requirements on a single logical network.

[Fig. 1 and Fig. 2: block diagrams of the two systems (image: netspeed1)]

Among the factors to consider:

Coherent and Non-Coherent Bandwidth: “You’re only as good as the weakest link.” In a multi-interconnect solution, the weakest link is the connection between the interconnects. For example, in Fig. 1 the non-coherent traffic entering the coherent side is constrained and faces heavy arbitration, which both increases latency and degrades bandwidth. As the number of masters grows, this solution scales poorly. The solution is to allocate bandwidth statically at design time, not only within the coherent and non-coherent traffic classes but also between them. That provides a configurable logical separation without the need to physically decouple them.
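To make the idea of a design-time static allocation concrete, here is a minimal Python sketch. It is an illustration only, not how any particular interconnect tool does it: the function name allocate_bandwidth, the weights, the master names, and the 64 GB/s link figure are all hypothetical. It splits a shared link first between the coherent and non-coherent classes, then among the masters inside each class.

```python
from fractions import Fraction


def allocate_bandwidth(link_gbps, class_weights, master_weights):
    """Split a link's bandwidth between traffic classes, then among
    the masters inside each class, in proportion to static weights.

    All names and weights are hypothetical, chosen for illustration.
    """
    total_class = sum(class_weights.values())
    allocation = {}
    for cls, class_weight in class_weights.items():
        # Share of the link reserved for this traffic class.
        class_share = Fraction(link_gbps) * class_weight / total_class
        masters = master_weights[cls]
        total_master = sum(masters.values())
        for master, weight in masters.items():
            # Share of the class budget reserved for this master.
            allocation[(cls, master)] = class_share * weight / total_master
    return allocation


# Hypothetical example: a 64 GB/s link, weighted 3:1 toward coherent traffic.
example = allocate_bandwidth(
    64,
    class_weights={"coherent": 3, "non_coherent": 1},
    master_weights={
        "coherent": {"cpu": 2, "gpu": 1},
        "non_coherent": {"dma": 1, "display": 1},
    },
)

for (cls, master), share in sorted(example.items()):
    print(f"{cls:>12} {master:<8} {float(share):5.1f} GB/s")
```

Because the shares are fixed at design time, a burst of non-coherent traffic cannot starve the coherent masters, and the split between the two classes becomes a configuration parameter rather than a physical boundary.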

Dynamic Coherency Participation: In a physically divided interconnect, the decision has already been made to treat traffic differently based on which island it belongs to. In today’s heterogeneous systems with dynamically changing workloads, there is a need to update the coherency, snooping, and caching abilities of IPs at runtime.

System Cache: The only L3 available in multi-interconnect solutions serves only the needs of the coherent space. RAM is an expensive piece of real estate, and one would want it to be shared and reused heterogeneously.

Latency-Sensitive IPs: Some slave IPs hanging off the non-coherent interconnect (e.g., interrupt controller, OCRAM) might have very specific low-latency requirements from the CPU. In some coherent solutions, there aren’t enough slave ports to support all the latency-sensitive slave IPs. Traveling across an interconnect to reach these latency-sensitive IPs defeats the purpose, and it gets complicated at peak usage.

Interconnect Deadlock: Deadlock can occur at different layers: protocol, routing or transport. The traffic flowing back and forth between the interconnects increases the chances of deadlock due to inter-network dependencies. One way around this is to analyze a dependency graph algorithmically and eliminate deadlocks at construction time, creating interconnects that are deadlock-free at the network level and at the protocol level.
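The dependency-graph idea can be sketched in a few lines of Python. This is a generic cycle check using depth-first search, offered as an assumption about what such an analysis might look like rather than a description of any vendor's algorithm. The channel names and the find_cycle helper are hypothetical. An edge from A to B means that a message holding channel A may wait for channel B; any cycle is a potential deadlock.

```python
def find_cycle(deps):
    """Return a list of channels forming a dependency cycle, or None.

    deps maps each channel to the channels it may wait on.
    Uses a depth-first search with white/grey/black colouring.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(deps) | {d for targets in deps.values() for d in targets}
    color = {n: WHITE for n in nodes}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in deps.get(node, ()):
            if color[nxt] == GREY:
                # Back edge: nxt is on the current path, so we found a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for n in sorted(nodes):
        if color[n] == WHITE:
            cycle = visit(n)
            if cycle:
                return cycle
    return None


# Hypothetical two-interconnect system: requests cross the bridge in both
# directions on shared channels, which creates a circular wait.
bridged = {
    "coh_req": ["bridge_c2n"],
    "bridge_c2n": ["ncoh_req"],
    "ncoh_req": ["bridge_n2c"],
    "bridge_n2c": ["coh_req"],
}

# Same system with the return path moved to a separate response channel.
separated = {
    "coh_req": ["bridge_c2n"],
    "bridge_c2n": ["ncoh_req"],
    "ncoh_req": ["ncoh_rsp"],
    "ncoh_rsp": [],
}

bridged_cycle = find_cycle(bridged)
separated_cycle = find_cycle(separated)
print("bridged:", bridged_cycle)
print("separated:", separated_cycle)
```

Running a check like this while the interconnect is being constructed, rather than after it is built, is what lets a cycle be broken (for example by adding a separate channel) before it ever reaches silicon.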

Power Management: With power becoming one of the main metrics, the ability to shut down interfaces and ports is a key feature. The CPU or any master should be able to gracefully shut down interfaces by flushing the traffic. Ideally, the traffic should be terminated as close to the master as possible. With disjoint interconnects, the task of shutting down interfaces in different interconnects is complicated, if not impossible.
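A graceful shutdown of this kind can be pictured as a small state machine at the master's port. The Python sketch below is a simplified model under assumed behavior: the MasterPort class, its state names, and its methods are hypothetical and do not correspond to any specific protocol's power-down handshake. New requests are refused at the port as soon as shutdown is requested, and the port turns off only once outstanding transactions have drained.

```python
class MasterPort:
    """Toy model of a port that drains traffic before powering down.

    States (hypothetical): ACTIVE -> DRAINING -> OFF.
    """

    def __init__(self, name):
        self.name = name
        self.state = "ACTIVE"
        self.outstanding = 0  # transactions issued but not yet completed

    def issue(self):
        """Try to issue a new transaction; refused unless ACTIVE."""
        if self.state != "ACTIVE":
            return False
        self.outstanding += 1
        return True

    def complete(self):
        """Retire one outstanding transaction."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding transaction to complete")
        self.outstanding -= 1
        if self.state == "DRAINING" and self.outstanding == 0:
            self.state = "OFF"

    def request_shutdown(self):
        """Stop accepting traffic; power off once everything has drained."""
        if self.state == "ACTIVE":
            self.state = "OFF" if self.outstanding == 0 else "DRAINING"


# Hypothetical usage: two reads in flight when shutdown is requested.
port = MasterPort("cpu0")
port.issue()
port.issue()
port.request_shutdown()
state_after_request = port.state      # still draining
accepted_while_draining = port.issue()  # refused at the port
port.complete()
port.complete()
state_after_drain = port.state
print(state_after_request, accepted_while_draining, state_after_drain)
```

Refusing traffic at the port itself is the "terminate as close to the master as possible" part. With two separate interconnects, each would need its own copy of this sequence, and the two would have to be coordinated.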

Performance Exploration: Since the interconnect is at the heart of the SoC, the ability to predict its behavior up front with performance exploration has become a key part of the SoC development process. The traffic patterns used for this have also become complex and now emulate real use cases. Piecemeal performance exploration of multiple interconnects in isolation might not give the best insight into dependencies across the interconnects.

Programming Model: The goal is a uniform programming model across all traffic without distinguishing between coherent and non-coherent systems.

Development Time: Last but not least, multi-interconnects increase the development time in terms of design, verification and physical implementation.

Now, there are specific systems that might require multi-interconnects at different levels of abstraction, for reasons like:

A. Isolation due to specific security or safety requirements;
B. Divisions based on specific local connectivity requirements; and
C. Hierarchy created by different design teams and subsystem ownership.

Being able to partition while allowing architects to visualize, architect and design a single interconnect solution is the goal.

To close, I’ll take the liberty of quoting Buddha, “All things appear and disappear because of the concurrence of causes and conditions. Nothing ever exists entirely alone; everything is in relation to everything else.”