Challenges Of Edge AI Inference

Common pitfalls in training and deploying CNN solutions, and how to avoid them.


Bringing convolutional neural networks (CNNs) to your industry—whether in medical imaging, robotics, or another vision application—has the potential to enable new functionality and reduce the compute requirements of existing workloads. A single CNN can replace more computationally expensive image processing, denoising, and object detection algorithms. However, in our experience working with customers, the same challenges arise again and again as they move an idea from conception to productization. In this article, we’ll review those common challenges and some of the solutions that can smooth the development and deployment of CNN models in your edge AI application.

Leverage existing models

We see a lot of companies attempting to build models from the ground up. However, proven architectures already exist for almost every application, so rather than reinventing the wheel, it’s often much easier to start from one of them. Starting with a known model also reduces the time, data, and effort needed to train for your application, since existing models can be retrained in a process called ‘transfer learning.’ For example, rather than defining your own pose estimation network, start with the OpenPose model and adapt it to the poses you need to recognize. Likewise, if you are performing object detection, solutions such as YOLOv3 offer a computationally simple and straightforward way to get the job done after retraining on your dataset.

Simple models are effective

For most applications, the truth is that you don’t need the latest and greatest in CNN architectures. For example, if your application only requires distinguishing a few objects with high certainty, even an unaugmented detector such as YOLOv3 can do the job. Likewise, if super resolution or image denoising is your ultimate goal, you may only need a 10-layer network with no more than 64 filters per layer. Customers benefit greatly once they realize that much simpler models than what’s at the forefront of research can solve their applications at a fraction of the computational cost. The goal is not to make the migration to CNNs any harder than it has to be.
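To make the “10-layer network with 64 filters per layer” concrete, here is a sketch of such a compact denoiser in PyTorch, following the residual-learning layout popularized by DnCNN. The depth, width, and single-channel input are illustrative assumptions, not tuned values.

```python
import torch
import torch.nn as nn

def make_denoiser(depth: int = 10, channels: int = 64) -> nn.Sequential:
    """Plain stack of 3x3 convolutions that predicts the noise residual."""
    layers = [nn.Conv2d(1, channels, kernel_size=3, padding=1),
              nn.ReLU(inplace=True)]
    for _ in range(depth - 2):
        layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    # Final layer maps back to one channel: the estimated noise.
    layers.append(nn.Conv2d(channels, 1, kernel_size=3, padding=1))
    return nn.Sequential(*layers)

model = make_denoiser()
noisy = torch.randn(1, 1, 64, 64)    # one grayscale patch
denoised = noisy - model(noisy)      # subtract the predicted residual
```

A network like this has on the order of 300K parameters, a small fraction of the tens of millions in state-of-the-art research models, yet this class of architecture is a standard baseline for denoising tasks.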

Integrate quantization early

Quantizing a model down from multi-byte precisions such as FP32 or BF16 to a single-byte format such as INT8 can multiply inference speed with little to no degradation in accuracy. However, many customers get tripped up on quantization because it adds steps to the training and model-creation process that can be tricky to implement. For example, frameworks such as PyTorch and ONNX expose their own methods for quantizing models, but they’re not always compatible with each other. Flex Logix has helped customers navigate the different options for quantization (e.g., static, dynamic, quantization-aware). The main takeaways are that you should be consistent with your approach and aim to quantize from the outset of developing your model.
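As one concrete example of these framework-specific flows, PyTorch offers post-training dynamic quantization, which rewrites supported layers to store weights as INT8. The toy model below is illustrative only; static and quantization-aware flows require calibration or training steps not shown here, and convolutional layers typically go through the static flow instead.

```python
import torch
import torch.nn as nn

# A toy FP32 model for illustration.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Dynamic quantization: Linear weights are stored as INT8 and activations
# are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(1, 128))
```

The quantized model is a drop-in replacement for inference, which is why folding this step into your pipeline early is cheap insurance: you catch any accuracy or operator-support surprises before deployment rather than after.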


Most customers run into the same set of issues when attempting to bring CNNs to their industry, but Flex Logix can help ease the transition and unlock the potential of AI in your application. Using pre-existing models, optimizing models for your application, and quantizing your workload early will help you get your application running and deliver additional value to your customers.
