Building Vision-Enabled Devices To Capture The Emerging Wave In IoT

Enabling devices to perceive and interpret their surroundings in a more sophisticated manner.

The evolution of vision (the eye) is considered one of the most significant events in the history of life on Earth. Around 540 million years ago, during the Cambrian period, a sudden burst of evolutionary activity produced a wide variety of new species. Many of these species were characterized by the development of an eye, which allowed them to perceive and interact with their environment in a more sophisticated way.

Similarly, the integration of vision into Internet of Things (IoT) devices will revolutionize how these devices perceive and interact with the world. With the ability to see and interpret their surroundings, IoT devices will drive a Cambrian explosion of use cases that were not economically viable before. Many use cases are much simpler to implement with vision: there is no need for a car-presence sensor under each parking space, because a single camera sees the entire street and, with the proper AI model, can identify empty parking spaces (a minimal sketch of this idea follows).
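
The sketch below is an illustration only: the coordinates are made up, and the car bounding boxes are assumed to come from some object detector rather than from any specific library.

```python
# Minimal sketch of camera-based parking monitoring: compare the bounding boxes
# a detector reports for "car" against a fixed map of parking-space rectangles.
# All coordinates and the detector output are hypothetical placeholders.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

# Parking spaces calibrated once per camera installation (pixel coordinates).
parking_spaces = {"P1": (10, 200, 110, 320), "P2": (120, 200, 220, 320)}

# Example detector output for the current frame: boxes labelled "car".
detected_cars = [(15, 205, 105, 315)]

for space_id, space_box in parking_spaces.items():
    occupied = any(iou(space_box, car) > 0.3 for car in detected_cars)
    print(space_id, "occupied" if occupied else "free")
```

One camera and one model replace a whole grid of per-space presence sensors; the only per-site work is calibrating the parking-space rectangles.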

The following lists map vision use cases to deployment locations. Many of the same use cases apply across all categories, but their deployment and management will differ.

Vision-related applications for the smart home:

  • Security: recognize and respond to potential threats, for example intruders or unauthorized access.
  • Monitoring and analysis: analyze the environment in real-time, identifying potential issues or inefficiencies and alerting homeowners. For example, evaluating air quality.
  • Natural interaction: for intuitive interactions between humans and smart home devices, for example recognizing and responding to gestures or facial expressions.
  • Personal assistant: recognize and respond to commands given by the homeowner, allowing them to control and interact with the device in a more intuitive way.
  • Home automation or energy: recognize and respond to changes in energy usage, for example turning off appliances when not in use or adjusting the thermostat to optimize energy efficiency based on presence.
  • Health and wellness: monitor and track the health of homeowners, for example detecting changes in sleep patterns or alerting them to potential health concerns.
  • Surveillance: remote check-in on your home when you are away.
  • Baby monitoring: keep an eye on your baby while they are sleeping, giving you peace of mind and the ability to check in on them without disturbing their sleep.
  • Pet monitoring: keep an eye on your pets while you are away, allowing you to check in on them and make sure they are okay.
  • Elderly care: A smart home camera can be used to monitor elderly family members and make sure they are safe and well.
  • Object recognition: recognize delivered packages to send notification.

Vision-related applications for the smart building or office:

  • Security: enable devices to recognize and respond to potential threats, for example intruders or unauthorized access.
  • Monitoring and analysis: monitor and analyze the environment in real-time, identifying potential issues or inefficiencies and alerting building operators.
  • Natural interaction: enables more natural and intuitive interactions between humans and devices, for example recognizing and responding to gestures or facial expressions.
  • Energy management: recognize and respond to changes in energy usage, for example turning off appliances when not in use or adjusting the thermostat to optimize energy efficiency based on employee presence.
  • Facility management: monitor and track the condition of facilities, for example identifying maintenance needs or detecting changes in air quality.
  • Access control: recognize and authorize access to different areas of the building, helping to improve security and efficiency.
  • Meeting room management: recognize and respond to changes in meeting room usage, for example identifying when a room is available or adjusting the temperature based on occupancy or taking attendance.
  • Safety: recognize and respond to potential safety hazards, for example identifying the presence of hazardous materials.
  • Wayfinding: provide directions and guidance to visitors, helping them navigate the building more efficiently.

Vision-related applications for smart retail or shop:

  • Inventory management: recognize, count, and track the movement of goods, allowing for more accurate inventory management and reducing the risk of stockouts.
  • Loss prevention: recognize and respond to potential theft or fraud, helping to reduce losses for retailers.
  • Market analysis or price analysis/price optimization: analyze customer behavior and adjust prices in real-time, helping to optimize profits for retailers.
  • Supply chain management: recognize and track the movement of goods throughout the supply chain, improving efficiency and reducing the risk of delays.
  • Advertising: recognize and respond to customer interests and preferences, providing targeted advertising and recommendations.

Vision-related applications for the smart factory:

  • Monitoring and analysis: monitor and analyze the environment in real-time, identifying potential issues or inefficiencies and alerting operators.
  • Predictive maintenance: monitor the condition of equipment and predict when maintenance is needed, helping to improve efficiency and reduce downtime.
  • Quality control: inspect products for defects or inconsistencies, ensuring that only high-quality items are shipped to customers.
  • Process optimization: optimize manufacturing processes by analyzing and improving the efficiency of different steps in the production process.
  • Inventory management: recognize and track the movement of goods, allowing for more accurate inventory management and reducing the risk of stockouts.
  • Safety: recognize and respond to potential safety hazards, for example identifying the presence of hazardous materials or alerting operators to potential collisions.
  • Robotic automation: enable robots to perceive and interact with their environment, allowing them to perform tasks such as object manipulation and navigation.
  • Environmental monitoring: monitor and analyze the environment, for example monitoring air quality or tracking waste reduction efforts.
  • Supply chain management: recognize and track the movement of goods throughout the supply chain, improving efficiency and reducing the risk of delays.

Vision-related applications for the smart city:

  • Traffic management: recognize and respond to traffic patterns, helping to optimize the flow of vehicles and improve safety.
  • Public safety: recognize and respond to potential threats, for example identifying the presence of hazardous materials or detecting criminal activity.
  • Environmental monitoring: monitor and analyze the environment, for example monitoring air quality or tracking waste reduction efforts.
  • Disaster response: recognize and respond to disasters or emergencies, for example identifying the presence of gas leaks or detecting structural damage.
  • Public transportation: recognize and respond to changes in public transportation patterns, helping to optimize routes and improve efficiency.
  • Infrastructure management: monitor and track the condition of infrastructure, for example roads, bridges, and buildings, helping to identify and address potential issues.
  • Public health: monitor and track public health trends, for example detecting changes in air quality.
  • Urban planning: analyze and optimize the use of urban spaces, for example identifying areas for development or identifying inefficiencies in the use of resources.
  • Intelligent transportation: enhances the capabilities of connected vehicles, for example recognizing and responding to traffic patterns, pedestrians, and other objects in the environment.
  • Object recognition and classification: enable IoT devices to recognize and classify objects, which can be used for a variety of applications, for example inventory management, quality control, and security.

The machine learning (ML) neural networks used for vision use cases are similar across these verticals (a minimal inference sketch follows the list below):

  • Object detection and recognition: automatically detect and recognize objects in images and videos. This can be useful in a variety of applications, for example surveillance and security.
  • Image classification: classify images into different categories. For example, a machine learning model could be trained to classify a given image as a picture of a person, a dog, a cat, or a car.
  • Facial recognition: automatically identify individuals based on their facial features. This has applications in security, surveillance, and social media.
  • Gesture recognition: interpret hand and body gestures so devices can be controlled without touch.
  • Advanced Presence Sensing:
    • Detect presence and body position
    • Monitor breathing patterns
    • Monitor sleeping patterns
    • Observe and analyze gait
    • Detect falls
    • Track movement and path taken
    • Monitor how long a person remains inactive
    • Identify if someone has been incapacitated
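
All of these workloads share the same basic flow: capture a frame, run it through a trained neural network, and act on the result. The sketch below shows that flow for image classification using the TensorFlow Lite Python runtime. It is a host-side illustration only; "model.tflite" is a placeholder file name, and on a Corstone-based device the equivalent loop would typically run through TensorFlow Lite for Microcontrollers with the model compiled for the Ethos-U NPU.

```python
# Host-side sketch of the inference loop behind the vision use cases above.
# "model.tflite" is a placeholder; any image-classification model would do.
import numpy as np
from tflite_runtime.interpreter import Interpreter  # or tf.lite.Interpreter

interpreter = Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Stand-in for a camera frame, already resized to the model's input shape.
frame = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]

top = int(np.argmax(scores))
print(f"Top class index: {top}, score: {scores[top]}")
```

The same structure applies to object detection, facial recognition, and presence sensing; only the model and the post-processing of the output change.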

A comprehensive reference design offering from Arm

The latest reference design, the Corstone-320, targets all of these IoT vision market opportunities. It integrates IP, software, and prototyping platforms to reduce complexity for both the SoC designer and the software developer. The Corstone package includes a collection of system IP and a subsystem designed to integrate the following Arm IP:

  • Cortex-M85
  • Mali-C55 ISP (optional)
  • Ethos-U85 NPU (optional)
  • DMA-350 (optional)

The Corstone package includes technical reference manuals, configuration and rendering scripts, plus the verification reports. ASIC developers then build an SoC around the subsystem to meet a specific segment's requirements, or use the package to explore Arm's compute design intent before starting their custom design.

The vision use case is the most complex of the Edge AI use cases. Since most of the IP components are optional and the design is configurable and modifiable, the simpler use cases are easily addressed too.

Example SoC designed around the SSE-320 (figure)

Corstone-320 Fixed Virtual Platform (FVP)

The package includes a Fixed Virtual Platform (FVP) that models the subsystem. Software developers use the FVP to accelerate development, as application development can start without physical hardware.

Corstone-320 Software

The reference design software includes firmware, drivers for all the IP, middleware, RTOS and cloud integrations, ML models and reference applications. Software developers select the components required for their specific segment and build the IoT stack for that device using the development tools of their choice.

Three different software packages are available to developers. The first provides a basic flow for building and deploying software on the FVP:

  1. Install FVP
  2. Clone and Build
  3. Deploy on FVP

The second is for building functional ML-enabled applications:

  1. Install VS Code
  2. Add the Arm Keil Studio Pack (MDK v6) extension to VS Code
  3. Clone and Build
  4. Deploy on FVP

The third is a full device stack with firmware update, middleware, a reference ML application, and cloud connectivity. The open-source applications demonstrate keyword spotting, speech recognition, and object recognition use cases.

  1. Install FVP
  2. Clone
  3. Build and Deploy on FVP using Visual Studio Code, or
  4. Build and Deploy on FVP using Linux

Refer to the Corstone-320 software blog post, which goes into the technical details of each of these components.

Overall, the Arm Corstone has been designed for:

  • Performance (compute throughput): Right size compute for a variety of vision devices.
  • Memory bandwidth (data width, memory channels): System wide interconnect bandwidth for pixel traffic.
  • Energy efficiency (sustainability and battery life)
  • Power modes (sleep mode, standby mode, power islands): Typical Cortex-M low-power profile, along with power islands and control over which RTL block is active.
  • Design Costs (engineering time): Starting from a subsystem out of the box will jump start any SoC design.
  • Security (boot-time and runtime security mechanisms): Designed for system-wide security.
  • Observability (debug and trace): Full debug and trace of all components built into the subsystem.
  • Flexibility (one design customized for different segments): One subsystem to spin out many unique devices.
  • Software: A large amount of open-source software, as the industry now recognizes that software accounts for nearly half the cost of a typical project of this complexity. Beyond the Arm and industry open-source software, the Arm ecosystem also offers an extensive set of competitive offerings for software, ML models, and tools.

Below is a high-level description of the key IP that the subsystem integrates:

Cortex-M85:

The highest-performing Cortex-M processor with Arm Helium technology, the Cortex-M85 provides a natural upgrade path for Cortex-M based applications that require significantly higher performance and increased security.

In addition to Arm TrustZone technology, the Cortex-M85 integrates the new pointer authentication and branch target identification (PACBTI) architectural extension to mitigate return-oriented programming (ROP) and jump-oriented programming (JOP) security exploits.

Advantages of using Arm Cortex-M processors in low-cost, low-power IoT vision devices:

  • Energy efficiency: Arm Cortex-M processors are designed to be highly energy efficient, which is important for home cameras that need to operate for long periods of time on battery.
  • High performance: Arm Cortex-M processors are known for their high performance, which is essential for home cameras that need to capture high-quality video and audio in real time.
  • Security: Arm Cortex-M processors are designed with security in mind, giving home cameras a hardware foundation to defend against hacking and other forms of cyber-attack.
  • Flexibility: Arm Cortex-M processors are highly versatile and can be used in a wide range of home cameras.

The Arm Cortex-M85 is a great choice for such cameras because it offers a combination of energy efficiency, high performance, security, and flexibility.

Ethos-U85:

3rd generation NPU from Arm

  • Configurations from 128 to 2048 MACs per cycle, delivering up to 4 TOPS of ML acceleration at 1 GHz (see the arithmetic sketch after this list).
  • Higher energy efficiency, targeting a 20% improvement over the current generation of Ethos-U.
  • Support for Transformer networks along with CNNs.
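
As a sanity check on the headline figure, a back-of-the-envelope calculation (ours, not an Arm specification) counts each multiply-accumulate as two operations:

```python
# Counting one multiply-accumulate (MAC) as 2 operations: a multiply and an add.
macs_per_cycle = 2048           # largest Ethos-U85 configuration
clock_hz = 1_000_000_000        # 1 GHz example clock
tops = macs_per_cycle * 2 * clock_hz / 1e12
print(f"{tops:.2f} TOPS")       # ~4.10 TOPS, in line with the "up to 4 TOPS" figure
```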

Mali-C55:

  • Up to 1.2 gigapixels/second throughput with low power consumption (put into perspective in the estimate after this list)
  • 2:1 HDR stitching, digital overlap (DOL), and dual-pixel HDR to address challenging lighting and weather conditions
  • Choice of processing blocks, including color noise reduction, downscaler in full resolution output pipeline, and single or double buffered configurations
  • Enhances on-device processing, as the output from the ISP may be redirected directly to the NPU
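
To put the throughput figure into perspective, the estimate below (a simplification that ignores blanking intervals and other pipeline overheads) converts 1.2 gigapixels/second into approximate upper-bound frame rates at common resolutions:

```python
# Simplified estimate: max frames per second = pixel throughput / pixels per frame.
throughput_pps = 1.2e9  # 1.2 gigapixels/second
for name, (w, h) in {"1080p": (1920, 1080), "4K UHD": (3840, 2160)}.items():
    print(f"{name}: ~{throughput_pps / (w * h):.0f} fps upper bound")
# 1080p: ~579 fps, 4K UHD: ~145 fps (before real-world overheads)
```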

DMA-350:

  • Latest AMBA standards
  • Supports up to 8 channels
  • Single or Dual AXI5 Data Interface
  • TrustZone support for Cortex-M and Cortex-A
  • Stream interface for tiny heterogeneous processing

Other IP used in the subsystem:

  • CoreLink NIC-450: Configurable AMBA AXI network interconnect (64-bit and 128-bit).
  • DMA-350: Up to 8 channels of secure data transfer for low-power operation.
  • CoreSight SoC-600M: Comprehensive library of components for the creation of debug and trace functionality within multi-core systems.
  • CoreLink SDC-600: Secure Debug Channel provides a dedicated path for authenticating debug accesses.
  • CoreLink PCK-600: Suite of pre-verified power control IP to simplify system power and clock management infrastructure.
  • CoreLink LPD-500: Stand-alone component to distribute Q-channel power control interfaces.
  • Runtime Security Engine components: key management unit, life-cycle management unit, and security alarm manager.

Conclusion

The integration of vision into IoT devices represents a significant opportunity for innovation. By allowing these devices to perceive and interpret their surroundings in a more sophisticated manner, it unlocks a wide range of new applications and capabilities that were previously not economically viable. Just as the evolution of the eye triggered the Cambrian explosion in species, the integration of vision into IoT devices has the potential to drive a similar explosion of innovation and evolution in IoT devices.

The Corstone-320 reference design for low-cost, low-power Intelligent IoT Vision is the easiest means to develop devices for these subsegments as the combination of integrated software and hardware dramatically reduces the complexity of SoC design and accelerates software development.

Finally, Arm has a large ecosystem of AI partners that supply competitive ML models and software to meet the diversity of IoT vision use-case requirements, from the most powerful high-end devices to battery-operated ambient applications.


