Navigating The GPU Revolution

Potential cost and time benefits are driving GPU adoption, despite challenges.


Experts at the Table: Semiconductor Engineering sat down to discuss the impact of GPU acceleration on mask design and production and other process technologies, with Aki Fujimura, CEO of D2S; Youping Zhang, head of ASML Brion; Yalin Xiong, senior vice president and general manager of the BBP and reticle products division at KLA; and Kostas Adam, vice president of engineering at Synopsys. What follows are excerpts of that conversation.


L-R: D2S’ Fujimura; ASML’s Zhang; KLA’s Xiong; Synopsys’ Adam

SE: GPU acceleration is having a major impact on the design and production of masks and other process technologies. What applications is GPU acceleration best suited for?

Fujimura: There are many applications, but I would highlight three in particular. First is image processing. Graphics processing units are great for graphics processing, or even video processing, which is a lot of images in succession. Second is the simulation of natural phenomena, which includes tasks like Gaussian convolution for mask effects or lithography simulation. These operations are well-suited for GPUs due to their parallel processing capabilities. Third, in the past decade, the rise of deep learning has significantly expanded GPU usage, particularly for training. Although deep learning inference can run on various platforms, training typically relies on GPUs, contributing to the success of companies like NVIDIA.
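As a rough illustration of the first two categories, here is a minimal sketch of a Gaussian convolution applied to a rasterized mask tile, with the work offloaded to a GPU when one is available. The CuPy library and the tile contents are assumptions made for illustration, not something from the discussion.

```python
# Minimal sketch: Gaussian blur of a rasterized mask tile, GPU-accelerated
# when CuPy (and a CUDA device) is available, with a CPU fallback.
import numpy as np
from scipy.ndimage import gaussian_filter          # CPU reference path

try:
    import cupy as cp
    from cupyx.scipy.ndimage import gaussian_filter as gpu_gaussian_filter
    HAVE_GPU = True
except ImportError:
    HAVE_GPU = False

def blur_mask(mask_pixels: np.ndarray, sigma: float) -> np.ndarray:
    """Apply a Gaussian blur to a 2-D mask raster, on the GPU when possible."""
    if HAVE_GPU:
        result = gpu_gaussian_filter(cp.asarray(mask_pixels), sigma)
        return cp.asnumpy(result)                  # copy the result back to host memory
    return gaussian_filter(mask_pixels, sigma)

# Illustrative use: a 2k x 2k binary mask tile blurred with a 3-pixel sigma.
tile = (np.random.rand(2048, 2048) > 0.5).astype(np.float32)
blurred = blur_mask(tile, sigma=3.0)
```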

Xiong: I fully agree. Image processing, simulation, and deep-learning-based applications are where GPUs excel. In the end, it's really about cost reduction. You can do a majority of that processing on a regular CPU, but the GPU reduces the cost of the computation. That is really attractive to us, especially in semiconductor manufacturing.

Zhang: We see GPU acceleration as a cost reduction mechanism. For the same cost you get a faster turnaround time. In addition, there are cases where the amount of computation is so heavy that it's not practical to achieve an acceptable turnaround time with CPUs, and GPUs allow the task to be completed in a reasonable time.

Adam: All of these applications are extremely relevant, and it does make sense to consider using GPU acceleration. However, one aspect to delve into further is the realization that many applications in our domain are not solely simulation or image processing tasks. In these cases, the bottleneck often lies in the weakest link, particularly if it cannot be effectively accelerated on the GPU. For instance, if a scenario involves 90% simulation and 10% other processes, even if the simulation can be accelerated significantly, the overall speed-up is limited by that remaining 10%. Therefore, we’re actively seeking applications where GPU acceleration can be leveraged throughout the entire process, maximizing its benefits. In many instances, we’re revisiting algorithms to minimize non-accelerated components, aiming for a higher percentage of GPU-accelerated tasks, which promises significant performance improvements.
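Adam's point is the familiar Amdahl's-law ceiling. A back-of-the-envelope sketch, with illustrative numbers rather than figures from the discussion:

```python
# Amdahl's law: overall speedup when only part of a workload is accelerated.
def overall_speedup(accelerated_fraction: float, accel_factor: float) -> float:
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / accel_factor)

# 90% of the run is simulation sped up 50x; the other 10% stays on the CPU.
print(overall_speedup(0.90, 50.0))   # about 8.5x, far below 50x
# Even with infinite acceleration of the simulation, the ceiling is 1/0.10 = 10x.
```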

SE: Adoption of GPUs for this kind of work is relatively recent. Why hasn't it happened before, and why is it happening now?

Xiong: Before deep learning took off, it was a tradeoff between flexibility and cost reduction, and it was not always one-sided. The answer wasn't clear in all applications. That's probably one of the key reasons GPUs weren't adopted until recently. But deep learning tilted the balance and made it possible.

Fujimura: Using GPUs has been questionable in terms of economic benefit. It may be able to accomplish a task 10 times faster, which is meaningful, but it's still only 10 times faster. Is that enough to justify the costs? There are other applications where maybe it's only 3 times faster, and then the economics become even more questionable. The mask-making process is changing that equation because the world of mask making is becoming more curvilinear, which takes a lot more processing time. GPUs, being pixel-oriented, handle complex shapes more resiliently, as their runtime is largely determined by the number of pixels involved. As the industry shifts toward curvilinear shapes, GPUs are poised to play a more prominent role in mask-making. NVIDIA CEO Jensen Huang coined "GPGPU" to mean "general-purpose GPU" about a dozen years ago to differentiate this kind of professional-grade use from gaming and other personal use. That marked a significant milestone in the accessibility and reliability of GPUs for industrial applications. It allowed the extremely high design cost of GPGPUs to be amortized over the already successful gaming platform, enabling the leading edge and eventually enabling deep learning-based AI. Now, the term GPGPU isn't even necessary.
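A toy sketch of the pixel-oriented argument: on a fixed raster, a curvilinear feature costs the same one test per pixel as a Manhattan rectangle, whereas an edge-list algorithm slows down as the boundary description grows. The grid size and shapes below are invented for illustration.

```python
# Toy comparison: pixel-based evaluation cost is set by the raster size,
# not by whether the feature boundary is rectilinear or curvilinear.
import numpy as np

GRID = 2048
ys, xs = np.mgrid[0:GRID, 0:GRID]

# Rectilinear feature: a Manhattan rectangle.
rect = (xs > 500) & (xs < 1500) & (ys > 800) & (ys < 1200)

# Curvilinear feature: an ellipse of comparable size.
ellipse = ((xs - 1000) / 500.0) ** 2 + ((ys - 1000) / 200.0) ** 2 < 1.0

# Either way the work is GRID * GRID per-pixel tests, which map naturally
# onto a GPU's parallel threads; a contour-based algorithm instead scales
# with the complexity of the edge description.
print(rect.sum(), ellipse.sum())   # pixel counts inside each feature
```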

Adam: Addressing why it didn't happen earlier, it's worth revisiting some historical context. There were notable technical successes in the past that didn't translate into commercial success. Around 2008, Brion spearheaded the development of an OPC verification module designed to run on FPGAs, marking a significant technical achievement. During my tenure at Mentor Graphics, we collaborated with IBM to create an accelerated solution for the IBM Cell Broadband Engine, which was originally developed for the PlayStation in partnership with Sony and Toshiba. This engine, equipped with multiple internal cores, showed promise for OPC simulations, offering a speed-up of approximately 5 to 10 times. However, IBM discontinued the processor, leading to the eventual demise of our project, in spite of the technical goals being met. Also, despite the availability of GPUs at the time, the programming ecosystem, particularly CUDA, was still in its infancy, making GPU utilization challenging. Fast forward to today, and the GPU ecosystem has matured significantly. GPUs are now more readily available, with major cloud providers like AWS offering GPU-equipped servers for compute-intensive tasks, and CUDA is a mature programming environment with numerous scientific computing applications available as open source. While deep learning applications have been the primary driver of GPU investment, other industries, including ours, benefit from this improved accessibility and performance.

Zhang: Indeed, Brion initially adopted FPGA acceleration. We also looked at GPUs at the time, but didn't pursue them because they lacked the precision required for our applications. Today, specialized AI processors boast impressive speed for certain computations, but they still don't offer the precision we need. Additionally, ensuring consistent and reproducible results is crucial in OPC applications. Unfortunately, GPU vendors haven't fully addressed this concern, as different generations may produce varying results. Furthermore, unlike with CPUs, code written for one vendor's GPUs may not carry over directly to another's. These factors have somewhat constrained our adoption of GPU acceleration, making it essential to carefully evaluate its suitability for specific applications.

SE: What else needs to change to make GPUs work? Is it primarily a software issue? Or is it a hardware issue from vendor to vendor? What’s the challenge with getting GPUs integrated into the manufacturing process?

Zhang: The primary consideration for integrating GPUs into a customer's production is achieving cost reduction. However, there are several challenges. Deciding on server configurations that cover a wide range of applications while ensuring both speedups and cost-effectiveness is not straightforward. The diversity of applications complicates the task of determining the optimal configuration. Forecasting future needs adds another layer of complexity, as we must anticipate requirements beyond the current year. Overall, addressing these issues requires a comprehensive approach that considers software, hardware, and cost.

Adam: The GPU solution needs to be as robust as the CPU solution. If you are going to deploy in a data center, with hundreds or thousands of servers working in parallel on the problem we are computing, you need to be able to guarantee a similar level of robustness and reliability. It also needs to make economic sense for the user to adopt the GPU solution. For example, even if you deliver a solution that is 10 times faster, which is already a good achievement, it doesn't make sense to use GPUs if it is cheaper to buy and operate a CPU data center that is 10 times larger. It's a significant issue when running the same calculation twice doesn't yield identical numerical results. There's no explicit guarantee from GPU vendors that the same numerical output will be maintained across generations, or even across different architectures from the same vendor. This inconsistency poses a major challenge for EDA vendors. We need to develop software that can handle these numerical differences, which is quite challenging. Additionally, there needs to be acceptance from users that perfect numerical precision might not always be achievable. Users may need to adjust their expectations toward accepting solutions within a small tolerance range. Another challenge lies in the programming ecosystem. Currently, separate development teams are needed for different GPU architectures, as there's no unified compiler like those available for traditional CPU architectures. While solutions exist, bridging this gap remains a significant challenge in GPU integration.
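The reproducibility issue Adam raises ultimately traces back to floating-point addition not being associative: a parallel reduction that groups terms differently can return a slightly different sum. The CPU-only sketch below, using made-up data, demonstrates that underlying effect; it is not a reproduction of any particular GPU's behavior.

```python
# Floating-point sums depend on evaluation order, the root cause of
# run-to-run and platform-to-platform numerical differences.
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(1_000_000).astype(np.float32)

serial_sum = np.float32(0.0)
for v in values:                       # one fixed left-to-right order
    serial_sum += v

pairwise_sum = values.sum()            # NumPy's pairwise (tree) summation
chunked_sum = np.sum([chunk.sum() for chunk in np.split(values, 1000)])

# The three results are typically not bit-identical, even though each one
# is a mathematically valid sum of the same numbers.
print(serial_sum, pairwise_sum, chunked_sum)
```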

Fujimura: Initially, when GPUs first emerged, they weren't IEEE standard-compliant, because it didn't matter if a game or a displayed image was off by one least-significant bit in one color somewhere. When GPGPUs took off, that was quickly taken care of. But massively parallel computing has to be careful about execution order to get the same answer every time, so GPU application software engineers need to pay attention to that. However, in general, the challenges we face are not unique to GPUs. Mask-making equipment, for instance, often operates far beyond its depreciation cycle, with some machines still in use today that were shipped before Google was even founded in 1998. Maintaining such equipment is crucial, whether or not GPUs existed at the time. It's a complex problem inherent in our industry, and companies have adapted to manage it over time. Despite the potential cost savings of replacing aging equipment with newer, more efficient alternatives, the challenge lies in the extensive qualification process required for any changes, including thorough testing.

Another aspect unique to GPUs is that they are typically only utilized for performance-sensitive tasks. This means that when leveraging GPU acceleration, software must not only prioritize functionality but also ensure reliability and optimize performance. Unlike CPU-based platforms, where variations in component speed may not significantly impact overall system performance, even slight differences in GPU performance can have a considerable effect when using GPU acceleration. As a result, porting software between different GPU platforms can be more challenging, often leading to a preference for sticking with a specific generation of GPU for a given generation of software, particularly for software attached to a specific piece of equipment.


