What’s The Best Way To Sell An Inference Engine?

The hardware choices for AI inference engines are chips, chiplets, and IP. Multiple considerations must be weighed.

The burgeoning AI market has seen innumerable startups funded on the strength of their ideas about building faster, lower-power, and/or lower-cost AI inference engines. Part of the go-to-market dynamic has involved deciding whether to offer a chip or IP — with some newcomers pivoting between chip and IP implementations of their ideas.

The fact that some companies choose to sell chips while others license IP suggests there isn’t one obviously correct way to offer inference technology. And the various pivots suggest either that founders aren’t considering the whole picture realistically, or that the calculus simply isn’t straightforward. Considerations include target markets, available funding, and perhaps even whether the solution on offer is an accelerator or a processor. But behind the scenes, software development may muddy the outcomes.

In this article, “chips” refers to dedicated AI processing engines. CPUs and GPUs can handle any AI application, but performance, power, and/or cost are likely to suffer. The discussion here isn’t about generic processors being assigned inference workloads, but about chips dedicated to performing inference with fewer painful design compromises.

Today, the predominant decision is between selling a chip and licensing IP. Chiplets can factor into this decision as well, but so far nearly all of the chiplets in use, aside from HBM memory, have been developed in-house. The future promise, however, is a chiplet market in which a “chip” designer may build one chiplet and attach it to another sold by a different company. The chiplet option then will play a much bigger role, with companies blending aspects of chips and IP as a fundamental go-to-market decision.

Revenue
Any investment decision starts with money. Startups require seed money to exist, and seed money requires the prospect of a return on that investment. That return comes via profit, and while profit requires revenue, revenue itself isn’t a guarantee of profits. Moreover, even if revenue brings profit, more revenue doesn’t necessarily mean higher profit margins.

The revenue model for chips is a per-piece price. Profit is naturally set by costs, which are highest for a chip. “If you’re in the chip game, you have an incredible amount of investment up front,” said Paul Karazuba, vice president of marketing at Expedera. “You’re going to make a bet on one or two things, and those are going to be extremely big bets. But the payoff is potentially huge. The amount of revenue that one can capture in a chip is going to be much higher than what you would have with IP.”

Steve Roddy, chief marketing officer at Quadric, agreed. “The economics of narrow market fit mean that the only viable business model is to be a full silicon provider — unless the market you’ve targeted is a huge market, like mobile phones, and you have a sure-fire path to cracking the top five AP chip suppliers. More likely, you’ve targeted a market with perhaps a 50 million or 100 million unit annual volume. If you try to bring your accelerator solution to market as IP, charging 10 or 20 cents per unit, the financials could never pan out. Ten million dollars in royalty potential for 100% market share is not a viable business plan. But gaining 25% market share of a 100 million-unit market with a $10 chip solution means annual revenues of $250 million. A venture capitalist will write a check to chase $250 million in annual revenue, but not $10 million.”
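
Roddy’s arithmetic is easy to sanity-check. The short sketch below is purely illustrative, using only the volume, royalty, share, and price figures from his example to compare the revenue ceilings of the two models.

```python
# Illustrative check of the revenue arithmetic in Roddy's example above.
# The unit volume, royalty, share, and ASP figures all come from the quote;
# nothing here is real company data.

market_units = 100_000_000      # annual unit volume of the target market

# IP model: per-unit royalty, even at an unrealistic 100% market share
royalty_per_unit = 0.10         # $0.10/unit (low end of the 10-20 cent range)
ip_ceiling = market_units * royalty_per_unit
print(f"IP royalty ceiling at 100% share: ${ip_ceiling / 1e6:.0f}M/year")

# Chip model: 25% market share at a $10 average selling price
chip_share = 0.25
chip_asp = 10.00                # dollars per chip
chip_revenue = market_units * chip_share * chip_asp
print(f"Chip revenue at 25% share:        ${chip_revenue / 1e6:.0f}M/year")
```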

The revenue model for IP is typically a licensing fee plus per-use royalties, with the possible addition of non-recurring engineering (NRE) charges. How the IP revenue elements balance out depends on negotiations. Smaller customers trying to conserve cash may prefer higher royalties so that they end up paying when they’re making money. Large companies, by contrast, try to avoid royalties — especially for high-volume applications such as smartphones.

“There are many ways to structure a royalty, but typically they’re always based on some amount of money per chip made,” said Karazuba. “The larger [an IP provider] you are, the less likely you are to deviate from your standard royalty.”
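
As a minimal sketch of that structure, the following model uses hypothetical fee, royalty, NRE, and volume figures, since real deal terms are negotiated case by case, to show how the same license can be weighted toward upfront payments or deferred into per-unit royalties.

```python
# A minimal sketch of the IP revenue structure described above.
# The fee, royalty, NRE, and volume figures are hypothetical, chosen only
# to show how payments can be weighted up front or deferred into royalties.

def ip_revenue(license_fee: float, royalty_per_chip: float,
               units_shipped: int, nre: float = 0.0) -> float:
    """Lifetime revenue to the IP vendor from one license."""
    return license_fee + royalty_per_chip * units_shipped + nre

units = 5_000_000   # hypothetical lifetime volume of the customer's chip

# Cash-poor startup: small upfront fee, higher royalty (pays when shipping)
startup_deal = ip_revenue(license_fee=250_000, royalty_per_chip=0.40,
                          units_shipped=units)

# Large customer: bigger paid-up fee plus NRE, no per-unit royalty
big_co_deal = ip_revenue(license_fee=2_000_000, royalty_per_chip=0.0,
                         units_shipped=units, nre=500_000)

print(f"Royalty-weighted deal: ${startup_deal / 1e6:.2f}M")
print(f"Paid-up deal:          ${big_co_deal / 1e6:.2f}M")
```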

Larger sales prospects may end up designing their own IP for internal use rather than pay royalties, and several of them have done just that. While startup-originated IP tends to involve some kind of differentiating circuit cleverness, the internally developed engine IP is more straightforward because it just needs to work well enough. It isn’t competing in an open market.

“All the mobile guys do their own thing,” said Gordon Cooper, product manager for ARC AI processors at Synopsys. “If you have 100 million units that you’re shipping, you don’t want to pay royalties.”

Profit
Chips come with additional costs that IP doesn’t bear. Because most IP blocks are proven out through a test chip, both chips and IP require a physical design and tape-out. But producing that silicon is a development cost for IP, whereas it’s a production cost for a chip. Package design and production costs are unnecessary for both IP and chiplets.

“Chips are going to require a lot more people — not only engineering people, but operations and production people,” said Karazuba. “You’re going to have to think about the RMA process. With IP, it’s a much more capital-friendly environment. You need your engineers, you need a few support staff, but that’s really all you need.”

IP has a different customer than chips. It’s sold to other chipmakers rather than systems companies. Chiplets acquired from an open market would be similar, which raises a concern if a company wants to do both chips and IP. “You really can’t be an IP and a chip company,” said Karazuba. “In many cases, you’ll be competing against your own customers, especially if they’re merchant silicon makers.”

Fig. 1: Cost burdens for different AI implementations. Not shown is software, which applies to all four options. Chips and chiplets share similar whole-chip design, but the chiplet is still likely to be easier because all it needs, in addition to the AI function, is the interface design for communicating with other chiplets. Source: Bryon Moyer/Semiconductor Engineering

For IP companies, service typically ends once the customer’s design is complete and qualified. The chipmaker that bought the IP then services the system builders. If some of the IP-related service includes functional customizations, those costs may be offset by NRE charges.

Based on these costs, IP tends to have higher margins than chips. But the total profit still may be larger with chips if the opportunity is big enough. “Your top-line [IP] revenue isn’t going to approach what it would have been with chips, but your profitability is so much higher,” said Karazuba.
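
Hypothetical numbers make the margin-versus-total-profit distinction concrete. In the sketch below, with all figures invented for illustration, the IP business earns a far higher margin while the chip business, carrying its much heavier cost structure, still returns more absolute profit.

```python
# Hypothetical illustration of Karazuba's point: IP margins run higher,
# but a chip business can still deliver more absolute profit at scale.
# All figures are invented for illustration.

def profile(name: str, revenue: float, costs: float) -> None:
    profit = revenue - costs
    print(f"{name}: revenue ${revenue / 1e6:.0f}M, "
          f"profit ${profit / 1e6:.0f}M, margin {profit / revenue:.0%}")

# IP business: modest revenue; mostly engineers, no production or RMA costs
profile("IP  ", revenue=15e6, costs=5e6)

# Chip business: large revenue, but masks, wafers, packaging, ops, and RMA
profile("Chip", revenue=250e6, costs=175e6)
```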

Future-proofing the solution
At first blush it might appear that building chips places more limitations on the future marketability of a solution than IP does, given that IP hasn’t been cast into silicon. “When you’re a chip company, as soon as your chip is taped out, its functionality is what its functionality is,” said Karazuba. Brute-force solutions such as GPUs have ultimate flexibility, but dedicated inference chips must find a sweet spot that’s efficient enough for the target system while maintaining flexibility for changes in the market.

“From a programmable point of view, unless you have a small subset of algorithms — and that’s all you want to do, and you don’t have to future-proof your product — that would probably be a deeply embedded application,” said Synopsys’ Cooper. “Maybe then you can get away with hard-wiring it.”

General-purpose dedicated inference chips are uncommon. Most chips address specific workloads, limiting the breadth necessary for the design. If multiple opportunities share characteristics with only a few exceptions, then a superset chip may make sense as a way to reduce operational costs, with unnecessary circuits fused out in production. But if the opportunities were simple object recognition and natural-language processing, for example, then a superset chip would have to disable too much of the silicon to be cost-effective, and separate chips are more attractive.
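
The superset-versus-separate-chips question is ultimately a break-even calculation. The sketch below uses invented NRE, volume, and unit-cost figures purely to show the shape of the tradeoff: carrying and fusing out unused circuits raises per-unit cost, while a second dedicated chip adds another tape-out’s worth of NRE.

```python
# Hypothetical break-even sketch for the superset-chip decision.
# All figures are invented; only the shape of the tradeoff matters.

nre_per_chip = 20e6         # assumed NRE for each additional tape-out
units_per_market = 10e6     # assumed lifetime volume in each target market

dedicated_unit_cost = 4.00  # die carrying only what one market needs
superset_unit_cost = 4.50   # modestly larger die; unused blocks fused off

# Two dedicated chips: two NREs, but each market ships the cheaper die
two_chips = 2 * nre_per_chip + 2 * units_per_market * dedicated_unit_cost

# One superset chip: one NRE, but both markets carry the larger die
superset = nre_per_chip + 2 * units_per_market * superset_unit_cost

print(f"Two dedicated chips: ${two_chips / 1e6:.0f}M total cost")
print(f"One superset chip:   ${superset / 1e6:.0f}M total cost")

# Here the superset wins ($110M vs. $120M). Widen the feature gap, as with
# object recognition vs. NLP, and the larger die erases the NRE savings.
```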

Different markets also have different high-level needs, and one chip may not satisfy all. “If you’re making an automotive processor, or a smartphone processor, or a home speaker, or an edge-type processor, you can’t just move them across markets,” said Karazuba. “IP lets you move a bit more easily.”

That being the case, the future-proofing necessary for such a chip involves expectations of how the target market will evolve. With vision processing, for example, if a market strategist expects infrared imaging to become as important as visible-light imaging, then the ability to include that fourth channel alongside the three visible RGB channels may make sense. Over-provisioning, however, will simply reduce profitability.

How far ahead to look?
Chips also have an expected sales lifetime, and that expectation affects system design. Once designers move to a new technology, the old solution starts to become obsolete. A roadmap may anticipate eventual obsolescence, with a replacement chip two or three years out. In that case, the “future” consists of a few years, and that can be managed.

“A smartphone has an expectation of the time that they’re actively building this generation of the cell phone,” said Karazuba. “There are going to be additional networks that are going to be released once the phone gets into production. So regardless of whether you’re building a chip or whether you’re building IP, you still have to have the ability to use new networks.”

That said, AI technology is evolving at a torrid pace. New technologies can appear seemingly overnight, as large language models did, completely upsetting well-thought-out strategic plans. Such black-swan events are, by definition, impossible to anticipate.

As an example, a specific style of CNN was the hot topic for a long time. “Many of us learned the hard way,” said Cooper. “We all did 3 × 3 convolution, and it was great until separable convolutions came out.”

Greater flexibility would have helped bridge that change, but how much more flexibility to include is a decision each company must make. “If you’re trying to be at the forefront of the state-of-the-art and run the latest greatest models, then maybe sacrificing flexibility for efficiency is probably not a great move,” said Ian Bratt, vice president of machine-learning technology and fellow at Arm.

Others agree. “Anyone developing a general-purpose AI processing chip (or chiplet or IP) needs to balance how programmable and general-purpose the device is against its performance and efficiency,” said Russ Klein, program director for the High-Level Synthesis Division at Siemens EDA. “Keeping it programmable and general-purpose makes it more likely to still be useful as new algorithms are deployed. Committing certain features into hardware will improve performance and efficiency, which makes the design more valuable today, but it increases the risk of future obsolescence. Make it too general-purpose and reprogrammable and it loses its differentiation from CPUs and GPUs.”

Makers of IP have more wiggle room. “Because we’re an IP provider, we’re more general-purpose,” said Jason Lawley, product marketing director for AI IP at Cadence. But it depends on how the IP is marketed. If the intent is that it comes shrink-wrapped with the necessary tools to configure and implement the IP with minimal handholding, then it must be as thoroughly thought-out as a chip would be.

Smaller IP providers often have a piece of foundation IP that they then customize per customer requirements. That reduces the necessary IP design time up front, bringing early revenue with a minimally viable product. Additional changes and features then can be added incrementally as opportunities materialize.

“I see new networks literally every single day,” said Karazuba. “And I am able to build support for those networks into my hardware and software code tree and do releases regularly — weekly, if not daily.”

Splitting the difference with chiplets
Chiplets are somewhere between chips and IP, sharing many of the costs associated with chips, but not the package development and production costs. They’re particularly useful if the inference engine requires an advanced silicon node while other elements inside the package don’t, or when a single chip would be too large.

“If you’re on the leading-edge node because of your accelerator, then you have to pay a premium across the full die [when doing a monolithic chip],” noted Todd Koelling, senior director, product and solutions marketing at Synopsys. “If you break the reticle, then you’ve got to break the die into chiplets.”

Chiplets are the middle-ground option. “Chiplets offer an opportunity to produce silicon to capture not as much revenue as a chip, but more revenue than you would have as an IP company,” said Karazuba. “But it still brings in a lot of the costs that you would have as a chip company.”

The industry is still working out the optimum formula. “Chiplet technology may well shake up the financial calculus that determines investor decisions,” said Klein. “For successful IP licensing companies, chiplets will offer a path that increases per-unit revenues by an order of magnitude, while reducing licensing design starts and thus licensing revenues. But chiplets are unlikely to meaningfully change the game for narrowly defined AI accelerators. An accelerator aimed at one market needs to be offered as a full-chip solution to generate sufficient revenues, and chiplets will only marginally alter whether the full-chip solution is a monolithic die or a multi-chiplet package.”

Chiplets also provide more flexibility than a chip, but less than IP. “Any system that includes an AI accelerator on a chiplet has the potential of quickly pivoting to a new chiplet with the latest algorithms faster than one could redesign a complete ASIC, assuming there are standard interfaces,” noted Klein.

Show me the money
For all three AI inference engine implementations, volume revenue can begin as soon as the design is production-ready, but only for lead designs that the customer has been developing in parallel with the inference-engine design. Most other customers will be learning about product availability after the engine has been released, so volume revenue must await system-design completion.

A theoretical and simplistic look at time-to-volume-revenue shows chips taking the longest and IP requiring the least time. Chips require more design work than chiplets, and their production must include packaging, which also lengthens qualification and production. Chiplets may see faster physical design, fabrication, and qualification. IP requires physical design, production, and qualification only for a test chip, not a production chip, and it requires no volume production to stock shelves.

Fig. 2: Qualitative comparison of time-to-market. Durations can vary dramatically and are not shown to scale. Strategic lead customers may do their system design in parallel with the work necessary to bring the inference engine into production. This analysis assumes that software is designed in parallel and is not the rate-limiting step, but that assumption can easily be wrong. Source: Bryon Moyer/Semiconductor Engineering

It’s important to note that the actual times necessary for each of these steps can vary widely by company, depending on money and personnel available — not to mention competence. So an IP implementation poorly executed could take (and, for some startups, has taken) longer to revenue than a full chip.

What’s your function?
Quadric’s Roddy has a different take on the best way to go to market. “How to pick the most likely path to success? The answer turns out to be surprisingly straightforward: is the innovative architecture an accelerator or a full-fledged processor? Accelerators are, by definition, specialized function blocks that offload a portion of AI/ML algorithms from host CPUs or DSPs. If an architecture is an accelerator, it has narrow applicability to either a small class of algorithms or a known current-day set of algorithms. If an architect has chosen to accelerate a small subset of known algorithms, that architect has already narrowed down the scope of use to a single application or a small cluster of similar end-market uses that share a known workload.”

A processor, however, is a more general-purpose programmable component that can apply to a broader range of workloads. Both IP and chips have been proven successful here.

“As has been proven by successful CPU IP and DSP IP companies over the past 20 years, the wide applicability of a fully C/C++ programmable processor offers hundreds of potential end markets with more than a thousand potential licensees,” said Roddy. “Arm purports to have signed deals with more than one thousand companies over its long tenure; Cadence’s Tensilica division has reported more than 500 customers. Quadric has chosen the IP licensing model because the breadth of segments that can use the architecture reduces market concentration risk, and the ability to avoid mask sets and manufacturing costs reduces capital requirements.”

No one right answer
Ultimately, the choice of how a product should go to market will be based on many factors, from revenue to profitability to time-to-revenue. Startups must find a sweet spot that makes both investors and customers happy.

The calculus isn’t easy, as demonstrated by the startups that have pivoted from one offering to another. Some have moved from chip to IP, others the reverse. In the end, many companies haven’t made it through the development pipeline, so it’s hard to find solid evidence as to the wisdom or folly of the decisions made.

Development delays often have more to do with software than with the specifics of the hardware. Most of the software effort addresses implementing a network on a given piece of hardware, and that work is the same regardless of how the hardware is sold. An abiding focus on hardware at the expense of software is likely to render the chip-versus-IP decision moot.

Fortunately, the market opportunity is enormous. NVIDIA has set an incredible price umbrella, which both reveals how much money customers are willing to pay and creates an enticement to offer something less expensive or less energy-hungry. Once startups start sharing the stage with NVIDIA, it will be easier to assess objectively how inference solutions can be sold successfully.

Related Reading
Startup Challenges In A Changing EDA World
Without innovation, it may not be possible to fully utilize technological advances.
AI Drives IC Design Shifts At The Edge
Rollout of artificial intelligence has created a whole new set of challenges, along with a dizzying array of innovative options and tradeoffs.
HW and SW Architecture Approaches For Running AI Models
Custom hardware tailored to specific models can unlock performance gains and energy savings that generic hardware cannot achieve, but there are tradeoffs.
Mass Customization For AI Inference
The number of approaches to process AI inferencing is widening to deal with unique applications and larger and more complex models.


