General-purpose humanoid robots need all their senses to function equally well; vision and movement are the farthest along, but others are catching up.
Humanoid robots are making significant progress in performing a wide array of tasks by leveraging generative and agentic AI, and none too soon. Projections indicate that they will become integral to daily life in the coming years.
Today, humanoid robots are mostly found in electronics and automotive factories, in warehouses and logistics, and in professional cleaning settings. [1] China’s humanoid robot output is projected to rise 94% in 2026. [2] Soon, humanoids are expected to enter the home as companions or assistants, especially for the elderly.
And that’s just for starters. “Robotics may be the biggest product category of all time,” said Anirudh Devgan, CEO of Cadence, in a recent presentation. “The projection is $25 trillion. The whole GDP of the world is $110 trillion. So this is huge if this happens.”
Fig. 1: Time frame for realizing different technologies. Source: CadenceLIVE Silicon Valley 2026
While humanoids are human-like, they do not need to replicate adult humans in every way. For example, they could be as small as a child or have only three fingers. Humanoid sensors parallel human senses in many ways. Each sense, along with its related sensors, is under development and faces its own challenges, and some are further along than others. Vision and language are in the lead, while smell and taste trail the rest.
“We’ve gotten good at natural language technology because it is generally used everywhere, not just in robots, so they benefit from the development of scale,” said Marc Swinnen, director of product marketing at Synopsys. “Vision is also well developed, but it faces the same challenges as self-driving cars. It needs to interpret objects, and that’s not a trivial problem. As for haptics, they’ve gotten good at pressure sensors and even feeling texture. Each researcher will claim that their sense is the most difficult, but it might be the whole shebang that is hard.”
Anything related to physical AI needs work, including touch or pressure, but AI is enabling humanoids to learn faster. “We have always had the sensors,” said Sathishkumar Balasubramanian, head of products at Siemens EDA. “We always had ways to differentiate from sensing something and converting that to a digital image, or digital footprint, that can be different from touching A or touching B. We always had that. But we didn’t know how to make it work like a human. With LLMs, you can make a humanoid do that. It can be the brain to drive everything — the voice interpretation, touch, anything you do physically, how you reason, and how you react. Because of LLMs and transformers, we can do it very fast.”
When it comes to touch, humanoids are taking cues from industrial robots.
“There’s still very little work done on real-world touching in terms of humanoid hands,” said Sam Toba, senior product marketing manager at Synaptics. “But there’s a hand on the end of an arm attachment for industrial robots. The applications for arms are very, very wide.”
Another developing area is voice recognition and speech. Applications are growing, whether it’s a humanoid connected to an LLM talking to a human, or a humanoid interacting with smart appliances in the home.
“The research here is changing rapidly,” said John Weil, vice president and general manager for IoT and edge AI processor business at Synaptics. “What you and I perceive as a voice model from a few years ago, compared to what is possible today, is maybe a 100X difference in capability.”
Giovanni Campanella, robotics and industrial automation general manager at Texas Instruments, considers touch and hearing to be the most challenging senses. “These are the ones that people understand less,” he said. “The camera is very well understood, because people can easily relate to it. It’s our eyes. There is a lot of literature, and the ecosystem is strong on the camera side and in software. When it comes to the hearing piece and the touch piece, there is very little today, and that’s what customers understand less. That’s also the one they want to focus on the most. Customers are moving from just having cameras in robots, to adding more of these other senses to make it smarter.”
In China, robotics companies are working on vacuum cleaners with vision and AI. “Here at the edge, they are doing so much visual interpretation of the room, for example, understanding if there is a water patch, or whether it’s a wood floor or carpets,” said Adam White, division president of power and sensor systems at Infineon Technologies. “They also map the room using AI so they can save battery power to make sure they’re effectively cleaning the floor. On top of that, when they clean the floor, they analyze dust particles to work out where in the room the most usage happens. This is the journey, going from generative AI into agentic AI, and then into the physical AI.”

Fig. 2: Vision is complex, but well understood. Source: Infineon OktoberTech
As sensing and on-device processing improve, attention is shifting to what humanoids can reliably do with that data.
Here, Nvidia is working to solve accuracy and manipulation. “In physical AI, and most robotics applications, the accuracy requirements are 99-point how many nines of accuracy?” said Deepu Talla, vice president and general manager of robotics and edge AI at Nvidia. “In some cases, it could be two. Maybe for an autonomous vehicle it needs to be 10. In surgical robotics, it needs to be even higher. That’s the big challenge that we have.”
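To put the "nines" in perspective, here is a minimal sketch that converts a number of nines after the decimal point into an expected failure count. It assumes the quote means accuracy of the form 99.<N nines> percent (so two nines is 99.99%); the function names are illustrative and not taken from any Nvidia tooling.

```python
from fractions import Fraction


def accuracy_from_nines(nines_after_point: int) -> Fraction:
    """Exact accuracy for 99.<n nines> percent (assumed reading of the quote)."""
    if nines_after_point < 0:
        raise ValueError("nines_after_point must be non-negative")
    # 99% is 1 - 10**-2, 99.99% is 1 - 10**-4, so n nines gives 1 - 10**-(n + 2).
    return 1 - Fraction(1, 10 ** (nines_after_point + 2))


def expected_failures(nines_after_point: int, attempts: int) -> Fraction:
    """Expected number of failed attempts at the given accuracy level."""
    return attempts * (1 - accuracy_from_nines(nines_after_point))


if __name__ == "__main__":
    for nines in (2, 10):
        failures = expected_failures(nines, 1_000_000)
        print(f"{nines} nines: {float(failures)} expected failures per million attempts")
```

Under this reading, two nines still allows about 100 failures per million attempts, while ten nines allows about one failure per trillion attempts.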
Simple manipulation includes gripping with two fingers or suction. “The ultimate prize is to be general-purpose, to be able to do fine-grained, dexterous manipulation that would require advanced sensors and actuators and control in a safe, real-time manner,” said Talla. “We need the combination of general-purpose brain data collection and all the physical things that’re going to go inside the robot to come together.”
Nvidia is also working with Cadence to embed agentic AI for physical AI, combining physical AI chip IP with robotics simulation libraries to help close the “sim‑to‑real” gap for robots and autonomous machines, including humanoids. AI agents coordinate across the workflow with virtual training, physics models, and mission‑scale scenario simulation to help solve complex, real‑world scenarios.
These types of solutions are needed to solve the twin challenges of reasoning and physical movement. “The intricacies of humanoid robots are a very complex mechanical problem to solve first,” said Matthew Bubis, director of product management at Imagination Technologies. “You also have to solve the question of how to get the outputs of the AI model to control all of these extremely complex mechanical systems. So robotics has two problems to solve, whereas automotive has one. Robotics is both an AI problem and a mechanical technology problem, whereas automotive has all the mechanical and sensor technology already there. For humanoids, it’s a question of that final but very difficult step in controlling the outputs.”
Touch
One of the most challenging humanoid outputs is the movement of the hands and fingers.
When it comes to hand sensing, touch means contact, with sensors to measure force, shear, slip, and temperature. Often, the system will include an IMU (inertial measurement unit), too. Tactile sensors rely on a range of technologies, including capacitive, piezoelectric, optical, magnetic, inductive, and resistive sensing, each with its pros and cons.
“The sensing methods are similar, but what’s connected to the chip, the physical sensor, will look very different,” said Synaptics’ Toba.
All of the raw data then gets aggregated to become a meaningful event. “Our touch chips run machine learning-enabled algorithms,” Toba said. “We already use them for noise detection, but they can be used for force detection, so there’s a lot of processing done to provide that data. We provide capacitive sensing, and the big advantage is that it’s very fast. When you’re touching something, you need to be able to sense motion fast. You need to sense force, and especially shear, fast. For example, when you’re holding a can, and it’s slipping, that’s shear. You want to react to that so it doesn’t fall through your hands.”
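The slipping-can example can be sketched with a simple friction rule. This is an illustration only, not Synaptics' algorithm: it assumes a basic Coulomb friction model, in which an object starts to slip once shear force exceeds the friction coefficient times the grip (normal) force. The coefficient `mu` and the safety `margin` are hypothetical values.

```python
def grip_adjustment(normal_n: float, shear_n: float,
                    mu: float = 0.5, margin: float = 1.2) -> float:
    """Return the extra grip force (newtons) needed to keep an object from slipping.

    Assumes Coulomb friction: slip begins when |shear| > mu * normal.
    mu and margin are hypothetical defaults chosen for illustration.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if margin < 1:
        raise ValueError("margin must be at least 1")
    # Grip force needed to hold the measured shear, with a safety margin.
    required = margin * abs(shear_n) / mu
    # Never ask the fingers to loosen; only report additional force needed.
    return max(0.0, required - normal_n)


if __name__ == "__main__":
    print(grip_adjustment(normal_n=4.0, shear_n=1.0))  # grip already sufficient
    print(grip_adjustment(normal_n=2.0, shear_n=2.0))  # shear rising, squeeze harder
```

The point of the sketch is the latency requirement Toba describes: the shear reading has to arrive and be acted on before the object has moved far.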

Fig. 3: Pressure sensors for an AI-powered robotic hand, based on technology from Grinn and Synaptics. Source: Synaptics Tech Day
Edge pre-processing on an MCU or MPU in the palm of a hand avoids overloading the CPU. “If every single finger had a line to the host, it would be a physical burden on the mechanical design,” Toba noted. “We filter data and noise, and that allows us to filter the amount of data that gets passed on to the host, because there’s so much data generated by your finger. There are four different sensing modes sensing all the time, and very frequently. If you pass everything on to the host, you’re going to overload it. What we can also do is save power. The ‘look for touch’ feature goes into low power, then wakes to tell the host something is happening. There are different levels of processing, power, and CPUs. That’s a common theme.”
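A rough sketch of that kind of palm-side filtering follows. It is not Synaptics firmware: the thresholds, the normalized 0-to-1 readings, and the function name are assumptions made for illustration. Readings below a touch threshold are dropped (the low-power "look for touch" state), and during contact a value is forwarded to the host only when it changes by more than a deadband.

```python
from typing import Iterable, List, Tuple


def filter_events(samples: Iterable[float],
                  touch_threshold: float = 0.2,
                  deadband: float = 0.05) -> List[Tuple[int, float]]:
    """Return the (sample index, value) pairs that would be sent to the host.

    touch_threshold and deadband are hypothetical, in normalized sensor units.
    """
    forwarded: List[Tuple[int, float]] = []
    last_sent = None  # None means no contact is currently being reported
    for index, value in enumerate(samples):
        if value < touch_threshold:
            # No contact: report the release once, then stay quiet.
            if last_sent is not None:
                forwarded.append((index, 0.0))
                last_sent = None
            continue
        # Contact: forward the first touch and any significant change.
        if last_sent is None or abs(value - last_sent) >= deadband:
            forwarded.append((index, value))
            last_sent = value
    return forwarded


if __name__ == "__main__":
    readings = [0.0, 0.01, 0.5, 0.51, 0.52, 0.6, 0.1, 0.0]
    print(filter_events(readings))
```

In this example, eight raw samples shrink to three host messages: touch start, a significant change, and release.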
Aggregating data from different types of touch sensors is a form of sensor fusion, although the term usually refers to combining different modalities, such as radar, lidar, touch, and voice, rather than different types of tactile sensors within one modality, explained Nebu Philips, senior director of strategy and business development at Synaptics. The smallest sensors, used in the grid sensor, are 5 x 5 millimeters, supporting 60 channels. There are also parts as small as 3 x 3 millimeters, depending on the channel count.
Closed-loop processing in the hand helps speed up finger reactions, but processing can also be done centrally if the communication protocols are fast enough.
“You have up to 30 sensors spread around the palm,” said TI’s Campanella. “In the fingers, magnetic or capacitive sensors create a kind of matrix that allows you to tell where the touch is coming from. It also allows you to detect or sense the intensity of the touch. That’s important because then you can feed that back to your motor control. Let’s say you don’t squeeze the glass right, and you break the glass. That is a closed feedback loop where the processing needs to happen really fast. There are different types of approaches there. Some people want to do processing directly at the edge and close the loop as fast as possible with the motor control. But you can also send that back to the brain of the robots if you have really fast communication, usually gigabits (per second) and above. There is Ethernet today, and SerDes such as FPD Link, which is also used in the automotive space, and that can go even beyond the gigabit.”
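One common way to turn a sensor matrix into a touch location and intensity is a pressure-weighted centroid. The sketch below assumes a small rectangular grid of readings and a hypothetical noise floor; it is not a description of any TI reference design.

```python
from typing import List, Optional, Tuple


def locate_touch(grid: List[List[float]],
                 noise_floor: float = 0.0) -> Optional[Tuple[float, float, float]]:
    """Return (row, col, total) for a grid of pressure readings.

    row and col are the pressure-weighted centroid in cell coordinates, and
    total is the summed intensity. Returns None if nothing exceeds noise_floor.
    """
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value > noise_floor:
                total += value
                row_sum += r * value
                col_sum += c * value
    if total == 0.0:
        return None
    return (row_sum / total, col_sum / total, total)


if __name__ == "__main__":
    palm = [
        [0.0, 0.0, 0.0],
        [0.0, 2.0, 2.0],
        [0.0, 0.0, 0.0],
    ]
    print(locate_touch(palm))
```

The location and total intensity are the two values Campanella describes feeding back to motor control, whether that loop closes in the hand or in the central processor.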
Still, different use cases require different approaches. “In a humanoid, or for all types of robot arms, there doesn’t need to be the full hand with five fingers,” said Robert Otręba, owner of Grinn, and an embedded IoT specialist. “It could be two fingers or one finger to provide some touch. Very often, people think that humanoids need to look exactly like humans, but we can translate human behavior into the robotic space using a combination of touch controllers and miniaturized chips. Already on this small chip, there could be some gathering of the data, pre-processing, filtering out all the noise, and some preconditioning of the signal. This miniature chip can be as close as possible to the sensing element. This makes the most sense because then you don’t have to transfer signals over longer wires and provide additional disruption to the signal. The next, second level of computing of this data could be implemented in the hand itself. It doesn’t need to be in the full heart of the system. This adds smartness to the hand and pre-processes data from a few sensors before offloading to the main system, which is the heart of the robot.”
In industrial use cases, the touch controller on a robot arm can be combined with video processing to detect real hand gestures and provide data. “It’s cheapest to do processing through Wi-Fi into some external system, depending on the application,” said Otręba. “But there are several ways of doing this, and smartness at every single step is important.”
Hearing and speaking natural language
Humanoids are expected to listen and respond in natural language, ideally in real time and with a suitable accent. But different languages and regional accents can present problems.
“Philosophically, voice technology is not difficult,” said Synaptics’ Weil. “It’s about scoping the problem and understanding, ‘Is it one model or is it 10 models?’ You could do one universal English model, or you could add more intelligence in the system and try to figure out a locale of where you are, and then dynamically change that model to an English one that is more localized, which improves your speed and response. If you try to solve world peace and do English for all, then the model gets bigger, and maybe you start to struggle with affordability. You need more silicon or more memory. That’s the balancing act we’re helping customers with right now.”
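The choice between one universal model and several localized ones can be pictured as a simple fallback lookup. The sketch below is hypothetical: the model names and the locale tags are invented for illustration, and a real system would also have to weigh memory and silicon cost, as Weil notes.

```python
from typing import Dict


def select_model(locale: str, available: Dict[str, str],
                 universal_key: str = "en") -> str:
    """Pick the most specific voice model for a locale tag such as 'en-GB'.

    Falls back from the exact locale, to the base language, to a universal model.
    The registry contents are hypothetical.
    """
    if locale in available:
        return available[locale]
    language = locale.split("-")[0]
    if language in available:
        return available[language]
    return available[universal_key]


if __name__ == "__main__":
    models = {"en": "english-universal", "en-GB": "english-uk"}
    print(select_model("en-GB", models))  # localized model exists
    print(select_model("en-AU", models))  # falls back to the universal model
```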
For example, in Japan, customers were impressed with a model’s native language, but there were still complaints. “One person said it sounds too young in their word choice, or it’s not quite at the respect level they want,” said Weil. “They said, ‘We don’t want it to sound like an 18-year-old. We want it to sound like a 35-year-old.’”
Models can generally figure out slang. “At the end of the day, it’s just matching words,” said Weil. “First, it has to get the English into token words that describe things, and then it goes and matches that with a nearest neighbor algorithm. When we try to visualize this, to teach customers how it works, we use a 3D space kind of view, showing how these different words are floating in space with relativeness to each other, and then relativeness to a question and answer set, and that’s the RAG (retrieval-augmented generation) concept. What you’re doing is taking language translation, moving from spoken to written to alpha, to a numerical approach to the problem, to matching it to a database — that’s where the magic really happens — and then turning it around and reading it back out. That’s where the Japanese were, saying, ‘It’s great that it understood me, but when it speaks back to me, it’s not using the right words.’”
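The matching step Weil describes can be illustrated with a toy nearest-neighbor search. The three-dimensional vectors below are invented for readability, echoing the 3D visualization he mentions; real embeddings have hundreds or thousands of dimensions. Cosine similarity is an assumed distance measure here, since the source does not say which one is used.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float], store: Dict[str, Sequence[float]]) -> str:
    """Return the stored entry whose vector is most similar to the query."""
    return max(store, key=lambda key: cosine(query, store[key]))


if __name__ == "__main__":
    # Hypothetical embeddings for two stored question-and-answer entries.
    answers = {
        "make espresso": (0.9, 0.1, 0.0),
        "set timer": (0.0, 0.2, 0.9),
    }
    spoken_request = (0.8, 0.2, 0.1)  # hypothetical embedding of the user's words
    print(nearest(spoken_request, answers))
```

In a RAG pipeline, the retrieved entry would then be passed to the language model to generate the spoken reply, which is the stage where word choice and register come into play.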
Contextual understanding is also key for the AI model to determine when it speaks or stays silent. “When I talk or my spouse talks, the kids understand context and intent,” said Weil. “They don’t even have to be looking at me. They can understand, based on my tone or direction, that the conversation was being directed toward them before I said their name. With an espresso machine, we as an industry know how to voice-activate it, and we know how to have it talk back. But if I’m talking to my wife on the other side of the room about her wanting me to make a cup of espresso for her, I don’t necessarily want the machine queuing already. We have to teach the machine context.”
Embedded products in the home, such as countertop appliances and thermostats, face the same balancing act as robots. “Do you use a service like Google Voice and deploy a cloud-based system that has a small agent on every device?” said Weil. “That’s what the big guys want you to do. The product companies making a special machine are not willing to share with Google or Apple, just as automotive OEMs might not want Apple CarPlay or an Android system. They want their own quality, context-aware system, and that’s what people are trying to figure out. We did a demo at CES where we used beamforming mics to bolt onto it, as well, so that not only does the machine listen to you, it understands where the audio is coming from. If I walk up and face the machine, it already knows there’s more context.”
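Microphone arrays can work out where audio is coming from because sound reaches each microphone at a slightly different time. The sketch below shows only that delay-to-angle relationship for two microphones and a distant source. It is not a full beamformer and not the CES demo; the microphone spacing and delay are hypothetical inputs.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at 20 °C


def arrival_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the angle of a distant sound source from a two-microphone delay.

    0 degrees means the source is straight ahead (broadside to the pair).
    Uses the far-field relation sin(angle) = c * delay / spacing.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to the valid range so measurement noise cannot break asin().
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.1  # hypothetical 10 cm between microphones
    delay = 0.05 / SPEED_OF_SOUND_M_S  # sound travels 5 cm farther to one mic
    print(round(arrival_angle_deg(delay, spacing), 1))
```

An angle estimate like this is one input a device could use to judge whether a speaker is facing it, which is the kind of context Weil describes.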
OEMs are talking about humanoids interacting with an oven, microwave, or refrigerator. “Maybe the IoT gives comments,” said TI’s Campanella. “‘Now the food is ready.’ Then the robot will go and grab the food and serve it to you. But there is a lot of noise in an environment. In homes, you could have kids talking or the vacuum cleaner going, so it needs to be able to distinguish where the noise or the voice source is coming from. It becomes crucial.”
For edge voice applications, several features are needed to make sure a system distinguishes the right things and isolates the right sounds. “It’s about having a good signal chain and the ability to amplify the real signal from the noise, with an audio codec,” said Campanella. “You already have a lot of intelligence integrated, and now, with hardware accelerators at the edge, you’re able to train models even before you deploy them to distinguish those voice commands. They’re able to distinguish a specific voice set from others. It’s a combination of analog and EP (embedded processing), having a very good signal chain with a high SNR (signal-to-noise ratio), and an MCU with hardware accelerators that do the job. That’s the key here to solve that challenge.”
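For reference, signal-to-noise ratio is usually expressed in decibels as ten times the base-10 logarithm of signal power over noise power. The sketch below computes it from two lists of samples; the sample values are made up, and in practice separating signal from noise is the hard part that the signal chain and codec address.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average power of a sampled waveform (mean of squared samples)."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise power is zero; SNR is undefined")
    signal_power = mean_power(signal)
    if signal_power == 0.0:
        raise ValueError("signal power is zero; SNR is undefined")
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    voice = [1.0, -1.0, 1.0, -1.0]        # hypothetical voice samples
    background = [0.1, -0.1, 0.1, -0.1]   # hypothetical noise samples
    print(round(snr_db(voice, background), 1))
```

A tenfold difference in amplitude, as in this example, corresponds to a 20 dB SNR.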
Conclusion
Different markets are adopting robots, humanoids, and human-machine interfaces at different rates. “You go to China, look at the cars that are being manufactured there, and they’re pushing their leading innovation into the in-car experience with voice interfaces, massive displays, and the like,” said Rob Fisher, senior director of product management at Imagination Technologies. “That’s where the consumer base is really demanding that type of advancement in the user experience. In Europe, we’re slightly more conservative and swayed more by safety features. But in China and Asia, it’s the user experience and that type of innovation that’s really selling cars and robots.”
In a recent report, Kearney broke down current robotic applications and the types of robots used in each, noting that humanoids still have limited adoption compared to articulated or collaborative robots. Further, humanoids are seen in just four of nine potential settings. Professional cleaning is one application where humanoids are seen more than other robots. [4]
“You have robots everywhere, doing many things, so there are lots of challenges,” said Matt Commens, senior director of product management at Synopsys. “At CES, we saw many companies trying to mimic a human being. It requires a lot of sophisticated software to make decisions, lots of motors, lots of sensors, lots of wireless communication between them, and then the whole environment. That’s what customers are developing today, so hopefully we’ll see more commercial ones soon, to do the housework.”
Industry 4.0 is already filled with robots. “Everything that a person was doing to manufacture a car, now it’s a robot,” Commens observed. “Everything is related to automation, doing repetitive tasks. ‘I want a robot to do this.’ We have our AI agents in our computers, so we want to have the same thing in real life. But in real life, it’s more than software. It’s also hardware.”
[Editor’s note: Future articles will explore central versus distributed compute in humanoids in greater detail, as well as vision and other senses.]
References
[1] China’s Humanoid Robot Output to Surge 94% in 2026; Unitree and AgiBot to Capture Nearly 80% Market Share (TrendForce)
[2] Olaf: Bringing an Animated Character to Life in the Physical World (Disney Research Imagineering)
[3] Under the Skin of America’s Humanoid Robots: Chinese Technology (The Wall Street Journal)
[4] The robots running modern industry (Kearney)