Implementing very high data rate algorithms using adaptive computing architectures.
Medical ultrasound is among the most attractive diagnostic imaging modalities because it is minimally invasive and uses no ionizing radiation. As ultrasound expands into a wider range of applications thanks to its non-invasive nature and its ability to image soft tissue, there is growing demand for advanced imaging techniques in ultrasound beamformers, for multi-dimensional visualization, and for artificial intelligence to assist in diagnosing critical ailments.
Today, however, medical ultrasound is still limited for real-time use by challenges such as sequential data acquisition, low frame rates (10 to 50 frames per second), and sub-optimal image focus, which can be attained only at a single depth.
In standard sequential imaging, the complete image is built line by line. Each line is scanned across a set of points, say from left to right, with a transmitted beam focused on a given point; the line at each lateral position is then produced using dynamic focusing during reception. The scanner then moves a bit deeper and scans again, repeating the process many times to build the full image.
In order to improve ultrasound image quality, different transmit focal depths are used, and the final image is obtained using a recombination of these partial images corresponding to various depths.
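The line-by-line process with multiple focal depths described above can be sketched as a simple acquisition loop. The `transmit` and `receive` functions and all parameter values here are hypothetical stand-ins for a scanner's front-end driver, purely for illustration:

```python
# Illustrative sketch of sequential acquisition; `transmit` and `receive`
# are hypothetical stand-ins for the scanner's front-end driver.
N_LINES = 128                        # lateral scan lines per frame
FOCAL_DEPTHS_M = [0.03, 0.06, 0.09]  # example transmit focal depths (m)

def combine(partials):
    """Merge partial lines from different focal zones (simple average)."""
    return [sum(v) / len(partials) for v in zip(*partials)]

def acquire_frame(transmit, receive):
    """One frame = N_LINES x len(FOCAL_DEPTHS_M) transmit/receive events;
    each event must wait for the full round-trip echo before the next fires."""
    frame = []
    for line in range(N_LINES):
        partials = []
        for depth in FOCAL_DEPTHS_M:
            transmit(line, focus=depth)     # focused transmit beam
            partials.append(receive(line))  # dynamic focusing on receive
        frame.append(combine(partials))
    return frame
```

The key point of the sketch is the event count: every extra focal depth multiplies the number of round-trip waits per frame, which is exactly what caps the frame rate.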
Additionally, ultrasound imaging is limited by the speed of sound in tissue, roughly 1,540 meters per second. A pulse takes about 200 microseconds to travel down to 15 centimeters and for the echo to return, limiting acquisition to about 5,000 scan lines per second. For an image of 100 to 200 lines, this yields frame rates of 10 to 50 frames per second (fps), i.e., 100 ms to 20 ms per frame.
Take cardiac imaging as an example: the aortic valve (present in the human heart and in most animal hearts) opens and closes on time scales of around 200 ms. Even at 50 fps, you therefore capture only about 10 snapshots (200 ms / 20 ms = 10) of the moving valve. This is not enough for real-time imaging requirements.
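The timing argument above is easy to reproduce. The sketch below redoes the arithmetic with the figures from the text (the speed of sound and depth are the article's; the rest is plain round-trip math):

```python
# Back-of-the-envelope sequential-acquisition frame rates, using the
# figures from the text (1,540 m/s in soft tissue, 15 cm depth).
C_SOUND = 1540.0   # speed of sound in soft tissue, m/s
DEPTH = 0.15       # imaging depth, m

# Round trip: the pulse travels down and the echo travels back.
t_line = 2 * DEPTH / C_SOUND          # ~195 us per scan line
lines_per_second = 1.0 / t_line       # ~5,100 lines/s

for n_lines in (100, 200):
    fps = lines_per_second / n_lines
    print(f"{n_lines} lines -> {fps:.0f} fps ({1000 / fps:.0f} ms/frame)")

# An aortic valve event of ~200 ms therefore yields only about
# 200 ms / 20 ms = 10 snapshots at the 50 fps upper bound.
```

Adding multiple transmit focal depths per line multiplies the round trips per frame, which is how rates fall toward the 10 fps lower bound.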
Therefore, sequential data acquisition is not adequate to achieve the desired frame rate and image quality in critical ultrasound diagnosis.
One way to address these challenges is UltraFast imaging. UltraFast imaging is a paradigm shift from sequential acquisition to fully parallel acquisition of the whole imaging plane, using either spherical or plane waves. This allows optimal focus at every point in the image and the acquisition of thousands of images per second, resulting in high image quality, accuracy, and scan depth. It also enables functional imaging with high accuracy for both high and low velocities, and the complete dataset makes more accurate retrospective measurements possible.
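To make "complete parallel acquisition" concrete, here is a minimal NumPy sketch of zero-angle plane-wave delay-and-sum beamforming: one unfocused transmit insonifies the whole plane, and every pixel is reconstructed from that single shot. The linear-array geometry and nearest-sample interpolation are simplifying assumptions, not the article's implementation:

```python
import numpy as np

def das_plane_wave(rf, fs, c, elem_x, img_x, img_z):
    """Delay-and-sum beamforming of one zero-angle plane-wave shot.

    rf      : (n_elements, n_samples) received RF data
    fs      : sampling rate [Hz]
    c       : speed of sound [m/s]
    elem_x  : (n_elements,) lateral element positions [m]
    img_x, img_z : 1-D grids of image pixel coordinates [m]
    """
    n_elem, n_samp = rf.shape
    img = np.zeros((img_z.size, img_x.size))
    for iz, z in enumerate(img_z):
        for ix, x in enumerate(img_x):
            # Transmit delay: the plane wave reaches depth z after z/c.
            # Receive delay: echo travels from (x, z) back to each element.
            t = (z + np.sqrt(z**2 + (x - elem_x) ** 2)) / c
            idx = np.round(t * fs).astype(int)   # nearest-sample interpolation
            valid = idx < n_samp
            img[iz, ix] = rf[np.nonzero(valid)[0], idx[valid]].sum()
    return img
```

Because a single transmit yields a full frame, the frame rate is bounded only by the round-trip time of one shot, which is how thousands of frames per second become possible; the cost moves from acquisition time to the massive per-pixel computation this loop represents.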
But UltraFast techniques have been limited to research scanners and have proved difficult to implement on a commercially viable scanner because of their vast requirements in computing resources, system size, and power dissipation.
Now, let's explore how the adaptive system-on-chips (SoCs) from AMD-Xilinx can further innovate in UltraFast imaging.
Xilinx Versal adaptive SoCs are the newest generation of adaptive compute acceleration platform (ACAP) devices, featuring closely coupled multiprocessors, an FPGA fabric, and the new "intelligent engines," or AI Engines (AIE), with a highly parallel, tiled SIMD-VLIW architecture (SIMD: Single Instruction, Multiple Data; VLIW: Very Long Instruction Word). The various blocks are tightly coupled by a Network-on-Chip (NoC), allowing fast data movement between them.
The AIE array is the main computational unit for the UltraFast algorithms (such as plane-wave and synthetic-aperture imaging). It is a large matrix of SIMD/VLIW processors connected in a mesh (refer to figure 1 below). Each processor has its own instruction and data memory, and each processor can share memory with its neighbors. All of the processors are connected by an innovative interconnect with massive bandwidth of several terabytes per second. This structure provides the unprecedented level of parallelism required to implement such algorithms.
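As a rough mental model (not AMD's actual mapping), a reduction such as the beamforming sum can be partitioned across a row of tiles, with each tile summing its own block of channels and cascading a partial result to its neighbor through shared memory:

```python
import numpy as np

def mesh_partial_sums(channel_data, n_tiles):
    """Toy model: map a per-channel reduction onto a row of tiles.

    Each 'tile' owns a contiguous block of channels, computes a local
    partial sum (SIMD-style), and cascades it to its neighbor, mimicking
    the neighbor-to-neighbor memory sharing of the AIE mesh.
    """
    blocks = np.array_split(channel_data, n_tiles, axis=0)
    carried = np.zeros(channel_data.shape[1])
    for block in blocks:
        carried = carried + block.sum(axis=0)  # local sum, then cascade on
    return carried
```

The result equals a plain sum over all channels; the point of the decomposition is that on real hardware the per-tile local sums run concurrently, with only the short cascade serialized.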
Figure 1 below shows the architecture of the new AI Engines in the Versal SoC.
Fig. 1
This new adaptive computing architecture with the AIE array lets medical equipment makers implement very high data rate algorithms on a single fully embedded device: for example, parallel software beamformers for real-time scanning and 3D/4D visualization, AI/ML for region-of-interest selection and inference assistance, and image-reconstruction offload in endoscopy, robotic surgery, and radiology.
Using this new adaptive SoC, AMD-Xilinx embarked on building a practical UltraFast beamformer with the help of Dr. Joergen Jensen from the Technical University of Denmark. Dr. Jensen helped develop the algorithms, and AMD-Xilinx, together with our partner, implemented the beamformer in an example design along with software libraries for the Versal adaptive SoC.
The block diagram below (figure 2) is a high-level representation of an example UltraFast beamformer on a single Versal adaptive SoC, combining FPGA, CPU and hardware accelerators for AI and digital signal processing.
Fig. 2
The FPGA part of the device manages the transducers, acquires the echoes, and stores the data in external DDR4 memory. It can communicate with the host system over PCIe, or expand to and synchronize with another module via Precision Time Protocol over 10/25 Gigabit Ethernet channels.
A key benefit is the design flow. Since many scientists design their systems in MATLAB, Model Composer provides MATLAB support: it is a model-based design tool that enables rapid design within the MATLAB and Simulink environment and accelerates the path to production on Versal adaptive SoC devices through automatic code generation. This is strengthened by a set of C++ templates that wrap the basic API of the AI Engines (shown in figure 3 below).
Fig. 3
Figure 4 below shows the Vitis unified software environment with the accelerated libraries used for ‘UltraFast’ ultrasound Imaging.
Fig. 4
AMD-Xilinx demonstrated the full capabilities of such an ultrasound beamformer at RSNA 2021, the 107th annual meeting of the Radiological Society of North America, held in Chicago in December 2021. The demonstrator uses a Versal AI Core series VCK190 evaluation board connected to an AMD workstation, a Dell Alienware machine with an AMD Radeon RX 6900 XT GPU for rendering the image. The ultrasound data is provided in real time by a wireless probe to show actual performance, or it can come from a simulator such as Field II, which most ultrasound imaging scientists use to validate algorithms against a reference image.
We measured and generated performance benchmarks for the UltraFast beamformer on both the Versal adaptive SoC and a competing GPU, presenting data for two applications: abdominal imaging and small-parts imaging. The software environment for the Versal device is Vitis 2021.2 (with support for future versions); the GPU uses CUDA.
Tables 1 and 2 below summarize the performance results in frames per second (fps) for a single beamformer with 64 active elements and 200 lines of resolution on the Xilinx Versal adaptive SoC, the Nvidia RTX 2020 GPU, and a PC with an Intel Core i7 processor.
Table 1: Small Parts Imaging
Table 2: Abdominal Imaging
The linear and matched-filter interpolation results are presented for both 32-bit floating-point (FP32) and 16-bit integer (Int16) data types. As the numbers show, the Versal platform can not only implement a full beamformer using UltraFast techniques but also significantly outperform a gaming GPU and a PC: in some of the critical algorithms, a 44x gain over the GPU for linear interpolation in integer and a 27x gain in floating point. For Catmull-Rom spline interpolation, one of the most challenging kernels to implement, the Versal advantage over the GPU grows further, from 91x to 160x.
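For reference, the Catmull-Rom kernel mentioned above interpolates samples at fractional delays from four neighboring samples. A minimal NumPy sketch (not the benchmarked implementation) looks like this:

```python
import numpy as np

def catmull_rom(samples, t):
    """Catmull-Rom spline interpolation of a 1-D signal at fractional
    positions t (in samples); a sketch of the fractional-delay
    interpolation step used in beamforming."""
    i = np.floor(t).astype(int)
    u = t - i
    # Gather the four neighbouring samples p0..p3 around each position,
    # clamping indices at the signal edges.
    p0 = samples[np.clip(i - 1, 0, samples.size - 1)]
    p1 = samples[np.clip(i,     0, samples.size - 1)]
    p2 = samples[np.clip(i + 1, 0, samples.size - 1)]
    p3 = samples[np.clip(i + 2, 0, samples.size - 1)]
    # Standard Catmull-Rom weights (tension 0.5).
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * u
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * u**2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * u**3)
```

On a linear ramp the spline reproduces the input exactly, which makes a quick sanity check; its four taps and cubic weight evaluation per output sample are what make it so much heavier than linear interpolation.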
With this new innovation, a single embedded SoC device can now enable a commercially viable real-time ultrasound beamformer using UltraFast algorithms. This opens new capabilities in critical-illness diagnosis: optimally focused images at every point in the image and thousands of images per second, resulting in high image quality and accuracy.