SPONSOR BLOG

How Good Is Your Data?

As machines begin training and talking to other machines, the question takes on new meaning.

July 6th, 2017 - By: Ed Sperling

Machines can be taught by other machines. They also can talk to other machines on their own, with no human intervention, which is the great attraction of the Internet of Things.

Sensor clusters or other trucks can pass along critical data that alerts a multi-trailered truck to slow down or take a different route. And sensors feeding a variety of data, such as temperature or vibration, can help isolate the cause of an anomaly and recommend maintenance before a problem erupts, or shut down a production line before the problem causes further damage.

And that’s just the beginning. With inferencing, those same trucks may be able to detect changes in weather or traffic patterns well before they need to slow down—possibly even before starting out on their trip. And a control system may be able to do predictive analysis so well that it shifts jobs to a different site just before that maintenance is required, saving both time and money.

These are simple examples to illustrate the usefulness of data. But what if the data is wrong? No machine learning systems in place today understand whether data is dirty, clean, or somewhere in between. For the most part, these systems operate under close human supervision. As these systems become more mature, however, machines will teach machines. That’s the whole purpose of machine learning. It takes best practices for getting one or more jobs done in the context of what those machines are likely to encounter, and it produces responses that fall within an acceptable Gaussian distribution.
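The sensor example can be sketched in a few lines to show how much depends on the starting data. This is only an illustration, not a method described here: the function name, the vibration values, the baseline size, and the three-sigma threshold are all assumptions. It flags readings that stray too far from a baseline the system assumes is known-good.

```python
from statistics import mean, stdev

def flag_anomalies(readings, baseline_size=5, threshold=3.0):
    """Return indices of readings that deviate from the baseline mean by
    more than `threshold` standard deviations.

    Assumption: the first `baseline_size` readings are treated as
    known-good. Nothing in the code can verify that they really are.
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for i in range(baseline_size, len(readings)):
        deviation = abs(readings[i] - mu)
        # With a perfectly flat baseline, any difference counts as anomalous.
        if (sigma == 0 and deviation > 0) or (sigma > 0 and deviation / sigma > threshold):
            flagged.append(i)
    return flagged

# Hypothetical vibration readings (mm/s); index 7 is a clear spike.
clean = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 2.0, 4.8, 2.1]
# Same readings, but one bad value has crept into the baseline.
dirty = [2.0, 2.1, 9.0, 2.0, 2.2, 2.1, 2.0, 4.8, 2.1]

print(flag_anomalies(clean))  # the spike at index 7 is flagged
print(flag_anomalies(dirty))  # nothing is flagged
```

With the clean baseline the spike is caught. With a single bad value in the baseline, the mean and spread shift enough that the same spike passes unnoticed, and the code gives no sign that anything is wrong.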

In an industrial assembly line, this is fairly straightforward. And with roads, barring an attack of rabid wild animals or the sudden appearance of sinkholes, a truck should be able to navigate common road conditions. But in a more complex situation, such as basing market decisions on social media chatter, the lines become fuzzier. This becomes like the proverbial game of telephone, taught in primary school, where the teacher whispers something to the first student, who then whispers it to the next student, and so on. By the time it reaches the last person in the room, the message no longer resembles the original one.

The same is true for data bias in machine learning, where the starting point for learning may be slightly off. That skew is carried forward and, in certain instances, magnified. Machines don’t understand nuances that come naturally to people, which is why carmakers are having so much trouble getting autonomous vehicles to make turns when pedestrians are in the crosswalk. The car will sit and wait indefinitely until no one is crossing the street. A person driving the car would slip through in a fraction of the time.

But with machines, as with people, bad data is hard to correct. And it gets compounded by other bad data, so multiple not-so-bad inputs may produce a very bad result. Consider the models of stock market growth and opportunities heading into 2001, prior to the dot-com crash, and into 2007, prior to the worst downturn in the semiconductor industry’s history. As more machines begin teaching more machines, it becomes even harder to trace the origin of the problem and to correct all of the machine learning that has been built upon that data.
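A rough bit of arithmetic shows how quickly a small skew can compound. The sketch below assumes a deliberately simple model that is not taken from any real system: each machine in a chain passes along what it learned with the same small multiplicative bias. The function name and the 5% figure are hypothetical.

```python
def compounded_error(per_stage_bias, stages):
    """Relative error after `stages` hand-offs, assuming each stage
    multiplies the value it receives by (1 + per_stage_bias)."""
    return (1 + per_stage_bias) ** stages - 1

# A 5% skew at each hand-off, followed through a growing chain of machines.
for stages in (1, 5, 10, 20):
    print(stages, f"{compounded_error(0.05, stages):.1%}")
```

Under that assumption, a 5% skew per hand-off grows to roughly 63% after ten hand-offs and to more than 160% after twenty. Real systems will not compound this neatly, but the direction is the same: the longer the chain, the further the result drifts from the original data.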

Machines can do things that people cannot. In many cases, they can do things better than people. But the data used to train them, on which all machine learning and modeling is done and which increasingly will be shared among machines without human intervention, is often not as clean as it needs to be. Like all data, it is subject to bias, human input error, and anomalies that crop up as data is fused together from a variety of different sources.

At this point, however, there is far more effort going into getting these machine learning systems to work and far too little effort being put into making sure the initial information is correct. Bad data requires workarounds, and workarounds are like software patches on software patches. Sooner or later they produce unexpected results, which in the case of machine learning and machine-to-machine communication is an unknown built upon an unknown.

Related Stories
The Darker Side Of Machine Learning
Machine learning needs techniques to prevent adversarial use, along with better data protection and management.
Machine Learning Meets IC Design
There are multiple layers in which machine learning can help with the creation of semiconductors, but getting there is not as simple as for other application areas.
The Great Machine Learning Race
Chip industry repositions as technology begins to take shape; no clear winners yet.
Plugging Holes In Machine Learning
Part 2: Short- and long-term solutions to make sure machines behave as expected.
What’s Missing From Machine Learning
Part 1: Teaching a machine how to behave is one thing. Understanding possible flaws after that is quite another.
Building Chips That Can Learn
Machine learning, AI, require more than just power and performance.
What Does An AI Chip Look Like?
As the market for artificial intelligence heats up, so does confusion about how to build these systems.
AI Storm Brewing
The acceleration of artificial intelligence will have big social and business implications.

Ed Sperling
Ed Sperling is the editor in chief of Semiconductor Engineering.

