Tale Of Two HLS Viewpoints

Was high-level synthesis misguided and has the industry adopted a solution that was ineffective, ill-defined and ill-conceived? The answer depends on your perspective.


The Design Automation Conference attracts several co-located conferences, symposiums and other gatherings, often on more specialized topics than would appeal to general DAC attendees. Some of them are more research-focused, but one conference is somewhat unusual in that its subject has transitioned to commercial tool development and yet remains an active area of research. That area is Electronic System Level design and the conference is ESLSyn.

Semiconductor Engineering attended one of the keynote talks given by Professor Forest Brewer from UC Santa Barbara. Brewer created a tool (Chippe) as a graduate in 1988 and a later development became the Synopsys Protocol Compiler. While he has been out of the field of synthesis for a while, he still has some blunt views about the commercial direction of high-level synthesis and some of the choices that have been made.


Why would the industry adopt a solution that is ineffective, ill-defined and ill-conceived? That was the question that Professor Brewer posed. He said that the industry had tackled an almost unsolvable problem when it worked on high-level synthesis and both research and industry had taken the wrong approach with it. And yet EDA companies are showing clear success with it. How can they both be right?

Professor Brewer’s talk
ESL has been redefined many times, with discussions dating back to 1971. Many universities participated in the development. The abstraction level was one step above logic and automata. We did a Pascal-like language, and many languages have been tried. Some are good at describing certain parts of the problem but not others, and other languages are good at those remaining parts but not the rest.

These days the parts that work are done in C-like languages. People are using them to explore RTL design. It is basically a mapping and scheduling problem being done in an automated or semi-automated way. Things like loop unrolling and programmatic transformations are tough. What we want to do is describe the algorithms.

Because of the lack of an equivalence model, there is no canonical form for the input format, and that is a problem. The lifetime model for variables is undecided. It is very difficult to make interfaces that make sense for anything more complicated than FIFOs. The things that you can do are very small scale, and this is because we missed the boat. The algorithms that we chose to use are not scalable and the metrics we chose to push did not make sense.

The HLS problem is not that hard, but we let the wrong pieces creep in and tried to bite off a little bit too much. In addition, the generality of the solution is limited. People chose a model for operations that was far too simple, despite the fact that they cannot solve the problem. You cannot make efficient code from it, you can’t build code snippets and drop them in as alternatives for hardware because the two things are incompatible from a timing point of view and a communications point of view.

We should be measuring things such as how much of the available parallelism was actually found. We use sequential languages that have enormous numbers of inadvertent dependences. This was not smart.
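To make the proposed metric concrete, here is a minimal Python sketch. It is not from Brewer's talk, and the function names `critical_path` and `avg_parallelism` and the graph encoding are illustrative assumptions. It compares summing eight values with a sequential accumulator, where each add depends on the previous one, against the same sum written as a balanced tree. The available parallelism is estimated as the number of operations divided by the longest dependence chain.

```python
from functools import lru_cache


def critical_path(deps):
    """Length, in operations, of the longest dependence chain.

    deps maps each operation name to the list of operations it depends on.
    """
    @lru_cache(maxsize=None)
    def depth(op):
        # An operation sits one level below its deepest predecessor.
        return 1 + max((depth(p) for p in deps[op]), default=0)

    return max(depth(op) for op in deps)


def avg_parallelism(deps):
    """Operation count divided by critical path length."""
    return len(deps) / critical_path(deps)


# Sequential accumulation (acc = acc + x[i]): every add waits on the last one.
chain = {f"add{i}": ([f"add{i - 1}"] if i > 0 else []) for i in range(7)}

# The same seven adds arranged as a balanced tree: 4 adds, then 2, then 1.
tree = {
    "a0": [], "a1": [], "a2": [], "a3": [],
    "b0": ["a0", "a1"], "b1": ["a2", "a3"],
    "c0": ["b0", "b1"],
}

print(critical_path(chain), avg_parallelism(chain))
print(critical_path(tree), avg_parallelism(tree))
```

Both graphs contain seven adds, but the chain has a critical path of seven operations and the tree has a critical path of three. The chain's dependences come from how the loop was written rather than from the arithmetic, at least for integer addition, where reordering is safe.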

In CAD, we have a history of very carefully defining the model and the capabilities of that model. Without that it is difficult to grade the quality of results. With HLS, when results were shown, people did not believe in them because a very slight modification of the input code would give you very different results.

The model is the one that was defined for software and that was defined 65 years ago. Does that make sense? I don’t think so. We have made the whole thing artificially hard.

Brewer talked about a design that he had been involved with and the difficulties they had in replicating results. This led his research in a new direction.

I had the idea to create an efficient event language to codify interface behavior and to tie functional models of the behavior to event signals. The functional models could be as simple as Verilog/C code snippets. This led to investigation of synchronous languages such as Esterel, which is the European standard for concurrent specification of safety critical flight systems. We defined a compiler that became the Synopsys Protocol Compiler and led to PSL/Sugar. This is the way to define event scenarios for the Verilog language and leads to very efficient local automata. All of these are appropriate languages to describe interfaces and this is one of the largest problems facing the industry.

It is time for the industry to have a cogent standard interface language that can be synthesized to either hardware or software. It should have an asynchronous handshake and response mode even though it could be synchronously sampled, and it should have an efficient mapping to in-order transaction buses such as AMBA.

Brewer then talked about system-level architectural decisions that have been made and how they limit the performance of many electronic functions. He argued that total performance is limited by the need to have a function in the design that performs synchronization and ensures consistency of operation. He believes this artificially slows down the rate at which computation and communications can be performed.

Brewer proposes a model with a hierarchy of abstraction. Temporal abstraction decouples function from timing, and only functional dependencies are captured in the specification. Functional abstraction encapsulates all behaviors in both data and control within transactions, so analysis can always be simplified along transaction boundaries. Spatial abstraction relies on data and control encapsulation to allow full exploitation of potential locality. On top of this are annotations of resource transactions that allow a scheduler/binder to automatically make decisions about the physical location of components and also allow for communications link modeling and data sequencing.

Market success for commercial HLS
While HLS may not be the killer app that would displace RTL synthesis and raise the entire industry up to a higher level of abstraction, it has emerged as a viable tool for transforming algorithms into hardware blocks. To get the industry viewpoint, Semiconductor Engineering talked to Dave Pursley, senior principal product manager for HLS at Cadence.

The fundamental premise that this is a hard problem is true. It is NP-complete to find the exact, optimal implementation for a given set of constraints. This means there are a lot of challenges, but equating that difficulty with HLS not working or not being used departs from reality.

One of the biggest hurdles for adoption of HLS is education in terms of defining a repeatable methodology that is broadly applicable and useable enough that everyone can follow it. That is still a big challenge. Consider if you tried to take the source code for Microsoft Word and compile it to hardware. It is not going to work because it is too unbounded. In reality there are bounded methodologies that enable broad enough descriptions while providing ways to express the interfaces and the relationships between the interfaces, the timing and power constraints and to be able to derive implementations.

If anyone believes the goal of HLS is to take software and transform it into optimized hardware, then they do not have the correct view or the pragmatic view of what can be achieved. It can take a very abstract description in C, C++ or SystemC and create an RTL implementation.

Using a compiler you can find the dependencies of variables. You can see when a variable is first written and all of the places where it is read. If you re-use the variable and assign to it in a different place, from a compiler perspective this is a new variable, and doing the analysis that way makes it a little easier. There are difficulties when you have multiple threads, a variable is assigned in one thread and read by other threads, and there is no synchronization. Then you would not know how long you need to keep the variable around, but this is not a fundamental issue.
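The renaming idea Pursley describes can be illustrated with a small Python sketch. This is an assumption-laden illustration for single-threaded, straight-line code, not a description of how any commercial HLS tool works. The function name `rename_and_lifetimes` and the `name#version` labels are hypothetical. Each assignment creates a fresh value, and the lifetime of that value runs from its definition to its last read.

```python
def rename_and_lifetimes(stmts):
    """Rename re-used variables and compute value lifetimes.

    stmts is a list of (dest, [sources]) tuples in program order.
    Returns a dict mapping each renamed value, such as 't#1', to a
    (definition_index, last_read_index) tuple.
    """
    version = {}    # variable name -> current version number
    lifetimes = {}  # renamed value -> [definition index, last read index]
    for i, (dest, srcs) in enumerate(stmts):
        # Reads refer to the most recent version of each source variable.
        # Names never assigned in stmts are treated as external inputs.
        for s in srcs:
            if s in version:
                lifetimes[f"{s}#{version[s]}"][1] = i
        # Every assignment creates a new value, even if the name is re-used.
        version[dest] = version.get(dest, 0) + 1
        lifetimes[f"{dest}#{version[dest]}"] = [i, i]
    return {name: tuple(span) for name, span in lifetimes.items()}


program = [
    ("t", ["a", "b"]),  # 0: t = a + b
    ("x", ["t"]),       # 1: x = t * 2
    ("t", ["c"]),       # 2: t = c - 1   (re-uses the name t)
    ("y", ["t", "x"]),  # 3: y = t + x
]

for value, span in rename_and_lifetimes(program).items():
    print(value, span)
```

After renaming, the two assignments to `t` become separate values, `t#1` and `t#2`, whose lifetimes do not overlap, so a tool could in principle map both onto the same register. The multi-threaded case mentioned above is outside what this sketch handles.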

In the early days of RTL synthesis, there was inconsistency of results. If you changed the input description very slightly, you could get completely different results. The industry worked out the right and wrong ways to describe things and this is true of HLS as well. This goes back to education. We do have a knowledge base that contains tips and tricks, but it is not as simple as following a handful of guidelines. Part of it is methodology, and this is the same as it was for RTL.

Very early on we realized that you have to be able to simulate and synthesize the same thing. This means you have to be able to express your interfaces in a cycle accurate manner and you also want to be able to do this at the high level of abstraction. This means that interface IP is essential. When you are just synthesizing software you soon get to the point where it has to communicate with other blocks and the system and you can’t think about this later. That is not hardware. Interfaces are fundamental to the problem.

HLS provides two means of getting quality of results. One is to be able to do better local optimization in things such as registers, and HLS can do this better than most humans. The other, the real winner, is the ability to consider many different architectures.
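As a rough illustration of what considering many architectures means, the Python sketch below schedules the same four-tap multiply-accumulate dataflow graph under three different resource budgets. It uses a simple greedy list scheduler with unit-latency operations. That is a heuristic chosen here for brevity, not the optimal (NP-complete) formulation and not the algorithm of any particular tool. The function name `list_schedule` and the graph encoding are assumptions made for this example.

```python
def list_schedule(ops, limits):
    """Greedy resource-constrained scheduling with unit-latency operations.

    ops maps an operation name to (kind, [predecessor names]).
    limits maps each kind to the number of units available (at least 1).
    Returns a dict mapping each operation name to its execution cycle.
    """
    cycle_of = {}
    cycle = 0
    while len(cycle_of) < len(ops):
        used = {kind: 0 for kind in limits}
        for name, (kind, preds) in ops.items():
            if name in cycle_of:
                continue
            # Ready only if every predecessor finished in an earlier cycle.
            ready = all(p in cycle_of and cycle_of[p] < cycle for p in preds)
            if ready and used[kind] < limits[kind]:
                cycle_of[name] = cycle
                used[kind] += 1
        cycle += 1
    return cycle_of


# Four multiplies feeding a small adder tree.
fir = {
    "m0": ("mul", []), "m1": ("mul", []),
    "m2": ("mul", []), "m3": ("mul", []),
    "s0": ("add", ["m0", "m1"]),
    "s1": ("add", ["m2", "m3"]),
    "s2": ("add", ["s0", "s1"]),
}

for limits in ({"mul": 4, "add": 2}, {"mul": 2, "add": 1}, {"mul": 1, "add": 1}):
    schedule = list_schedule(fir, limits)
    latency = max(schedule.values()) + 1
    print(limits, "latency:", latency, "cycles")
```

With four multipliers and two adders the graph finishes in three cycles. With two multipliers and one adder it takes four, and with one of each it takes six. Each budget corresponds to a different area and latency point from the same source description, which is the kind of trade-off that is tedious to explore by hand in RTL.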

At the end of the day, the users have spoken and most of the top semiconductor companies are using HLS. They pushed the EDA vendors into the C, C++, SystemC camp, and they are using HLS to create chips that are making them money. Those chips may not be optimal, but they are probably closer to optimal than they would have been if an engineer had designed them without help from the tools. Does this mean that Professor Brewer is wrong? No. Academia should continue to strive for excellence and a better solution, one that can provide more optimal designs. But what exists today is good enough.
