Comparing the impact of using megapixel images and larger models.
If you want high-performance AI inference in your SoC, such as Super-Resolution Object Detection and Recognition, the challenge is to find a solution that can meet your needs and constraints.
First, let’s look at the issue of accuracy.
Accurate detection and recognition of objects can be improved in several ways; two of the most effective are processing larger, megapixel-class images and using larger models.
A state-of-the-art Object Detection and Recognition model is Yolov5, the latest in the YOLO family.
There are eight versions of Yolov5 – the first four process 640×640 images, and the second four process 1280×1280 images.
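If you want to reproduce this kind of comparison yourself, here is a minimal sketch using the Ultralytics PyTorch Hub interface. The image filename is a placeholder, not the actual aerial photo used for the figures below.

```python
# Minimal sketch: load a 640-pixel Yolov5 variant and its 1280-pixel counterpart
# via PyTorch Hub and run both on the same image.
import torch

yolov5s  = torch.hub.load('ultralytics/yolov5', 'yolov5s')   # trained on 640x640 inputs
yolov5s6 = torch.hub.load('ultralytics/yolov5', 'yolov5s6')  # trained on 1280x1280 inputs

img = 'intersection_aerial.jpg'  # placeholder aerial photo of a congested intersection

# Pass the inference size that matches each model's training resolution
results_640  = yolov5s(img, size=640)
results_1280 = yolov5s6(img, size=1280)

results_640.print()   # per-class detection counts and timing
results_1280.print()
```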
Let’s see how a selection of these models does at detecting small objects – here is an aerial view of a congested intersection processed by Yolov5s:
Fig. 1: Aerial view of intersection as processed by Yolov5s.
43 vehicles are detected by Yolov5s, but some are missed.
Next let’s see how Yolov5s6 does with four times the pixels:
Fig. 2: Aerial view of intersection as processed by Yolov5s6.
Now 61 cars are detected. In addition to the cars, one motorbike and one person are also detected due to the super-resolution capability of the network. One thing to note is that the detection probabilities are relatively low, and as a result there are two key missed detections above: a truck in the lower left corner and a person in the lower right corner. Still, the results are much better than Yolov5s.
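To make that confidence discussion concrete, here is a short sketch continuing from the earlier snippet. It uses the Yolov5 hub results API to list per-class counts and the lowest-confidence boxes; the 0.40 threshold is purely illustrative, not the value used to produce the figures.

```python
# Inspect the detections behind Fig. 2: per-class counts and the least-confident boxes.
df = results_1280.pandas().xyxy[0]       # one row per detection: box coords, confidence, class, name

print(df['name'].value_counts())         # e.g. how many 'car', 'truck', 'person' were found

low_conf = df[df['confidence'] < 0.40]   # detections the model is least sure about
print(low_conf[['name', 'confidence']].sort_values('confidence'))
```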
Finally, we see below how Yolov5l6 does (the letter l is for large, not the numeral 1):
Fig. 3: Aerial view of intersection as processed by Yolov5l6.
Yolov5l6 detects all of the vehicles that are fully visible, but misses a few that are visually obstructed. This could be due to training (the model may have been trained to detect vehicles at street level rather than from above). Yolov5l6 also detects each vehicle with higher probability than Yolov5s6, which eliminates the missed detections seen in the previous figure. Lastly, two people in the scene are detected, which is impressive because the people are very small and difficult to spot.
An argument can be made that, in the future, even better-performing networks with even greater compute demands will be desired. This will increase the need for flexible, high-performance AI inference IP like InferX.
The benefits of using megapixel images and larger models are clear, but the compute required goes up by an order of magnitude (4x the pixels and 2-3x the layers). How can you run these big images and models within your SoC’s area and power budget?
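As a rough back-of-the-envelope check on that order-of-magnitude claim (the 2-3x layer factor comes from the text above; exact per-model FLOP counts are not):

```python
# Back-of-the-envelope compute scaling from a 640-pixel model to a large 1280-pixel model.
pixel_ratio = (1280 * 1280) / (640 * 640)  # 4x the pixels per frame
model_ratio = 2.5                          # ~2-3x the layers/compute per pixel (from the text)
print(f"~{pixel_ratio * model_ratio:.0f}x the compute")  # ~10x, i.e. an order of magnitude
```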
InferX is hardware and software IP that is available for integration in your SoC for FinFET nodes from 16nm to 3nm. InferX hardware comes as a tile that can be built into arrays for more processing power and is delivered with an AXI bus interface to connect to your SoC’s NoC.
Fig. 4: InferX compute tile.
InferX is 80% hardwired: almost all of the datapath is hardwired. But it is 100% reconfigurable because the 16 tensor processors are connected by a reconfigurable interconnect and eFPGA is used as the control plane to manage operation. eFPGA can also be used to implement new operators, which pop up all the time as models continue to evolve. Unlike fully hardwired solutions, having eFPGA means you can always adapt to changing models.
The table in Fig. 5 below shows the performance of 1 to 8 tiles in N7. InferX is optimized for low-latency batch=1 operation. And InferX works efficiently with relatively low DRAM bandwidth, which is important because each DRAM requires 100+ package balls to connect, and large ball-grid packages and their substrates become increasingly expensive as ball count grows.
Fig. 5: InferX performance in N7, batch=1, 1 DRAM for 1-2 tiles, 2 DRAMs for 4+8 tiles.
Even 8 tiles of InferX occupy only about 50 mm² in N7. 8 tiles of InferX outperform even Orin AGX at 60W while using less DRAM bandwidth than Orin AGX.
Of course, for many of your applications you may only need 1 or 2 tiles.
You can get more information on InferX at https://www.flex-logix.com.