Learning How To Forget

Until learning systems are given the ability to forget, we need the reset button, and this is not sustainable.


There has been a lot of talk recently about the right to be forgotten, or data privacy rights. These require companies that hold data about us to remove it when properly requested. This might be data that was collected as we browse the Internet, or from online shopping. Or perhaps it is collected as we drive our cars past cameras, through GPS tracking of our cellphones, or in many other ways – some of which we don’t even know about.

When asked to delete data, companies often protest that this task is too difficult, and they are right. However, that does not mean they can keep designing systems that do not provide this basic functionality. If they cannot do it 10 years from now, it is because they chose to ignore the problem. I know personally that if someone asked me to remove all record of a particular piece of data, it would be a difficult task. I have backup systems that retain files even when deleted – in case their deletion was an accident. I have mirroring of data on multiple machines. I have dark storage, which ensures that even if someone attempted to hack me, they could not get to all copies of the files. I might be able to eliminate it eventually, but the problem is that everything in society is set up to preserve data, not to remove it.

Even if data can be successfully deleted, there is still a host of problems. That data became part of my knowledge set, and I may have used it in other documents that I created. Derivative work does not attach itself to original sources, unless you are putting together a paper that is going to be peer reviewed or needs to cite all references. Even if you did keep track of every usage of every piece of data, it is still lodged in your mind.

The human mind cannot be wiped and retrained. We have no idea how to do this. In the future, you could recreate a piece of information from memory, or you could have learned from that piece of data and used it to create something else that could not have existed without it. This has always been a problem when an employee leaves one company and joins a competitor. How much information can they take with them? And how do you prevent what was secret information from being used to enhance the competitor’s product – even if done unintentionally?

But there is another level to this where AI is concerned. If data has been used to train a system, how do you remove that data from what it has learned without resetting to zero and doing it all over, except with that one piece of data removed? That would be highly inefficient, and given the enormous power that training consumes today, it is not a sustainable strategy for the future.

The same problem exists when attempting to put together AI systems that tackle some aspects of the functional verification problem, such as deciding which testcases should be run and in what order. It is slightly different in that you are not asking for data to be deleted, but there is derived data that has become invalid every time the design changes. Including that data will lead to incorrect actions going forward, or potentially cause the verification process to be less efficient.

Whenever a change is made to the design, what verification information has been invalidated? Clearly, coverage data has been affected, but the effectiveness of every test has potentially been changed as well. You no longer know which tests will provide meaningful data, and you will not know that until everything has been redone. Similarly, you no longer know which tool switches will provide the best results, or even enable a result to be found.

When I listen to most people in the industry talk about this, it is clear they don’t have a real answer, or do not comprehend the problem fully. They just assume you can keep going forward for a while, and then at some point you press the reset button and recollect all the data. But even then, they continue to use some information from the past to try to improve the rate at which new information will be gathered. So long as that old information is only used for ordering, the worst thing that can happen is that results are collected more slowly than expected. But you have to ensure that no invalidated data can find its way back into the system.
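
The "ordering only" rule can be pictured with a small sketch. It assumes historical scores are kept per test and may be out of date. They decide the order in which tests run, but every result is collected fresh. The names `order_tests`, `run_regression`, `stale_scores` and `run_test` are hypothetical and do not correspond to any particular tool.

```python
from typing import Callable, Dict, List


def order_tests(tests: List[str], stale_scores: Dict[str, float]) -> List[str]:
    """Order tests by historical usefulness, highest first.

    stale_scores may be out of date after a design change, so it is
    treated purely as a hint. Tests with no score are given 0.0 and,
    assuming scores are non-negative, sort last.
    """
    return sorted(tests, key=lambda t: -stale_scores.get(t, 0.0))


def run_regression(
    tests: List[str],
    stale_scores: Dict[str, float],
    run_test: Callable[[str], bool],
) -> Dict[str, bool]:
    """Run every test; old data picks the order, never the outcome."""
    fresh_results: Dict[str, bool] = {}
    for name in order_tests(tests, stale_scores):
        # Always re-run. A stored result is never reused here.
        fresh_results[name] = run_test(name)
    return fresh_results
```

If the old scores are wrong, the cost in this sketch is a poor ordering, not a wrong result, because nothing from `stale_scores` ends up in `fresh_results`.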

The best solution for functional verification would be to find a way that ties aspects of a design directly to the testcases that are supposed to be able to activate that piece of the design. This would solve many problems for the entire verification flow, but it is not clear that anyone truly wants to solve that problem either. Requirements tracing systems may be the starting point, given that they attempt to record that relationship.
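
As a rough illustration of what such a tie might enable, the sketch below keeps a map from design blocks to the testcases meant to activate them, and forgets derived data only for tests linked to a changed block. Everything here is an assumption made for illustration: the `TraceMap` class, its methods, and the idea that a change to one block affects only the tests linked to it, which a real design may not honor.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TraceMap:
    """Hypothetical link between design blocks and the tests meant to exercise them."""

    def __init__(self) -> None:
        self._tests_for_block: Dict[str, Set[str]] = defaultdict(set)
        self._coverage: Dict[str, float] = {}  # test name -> last measured coverage

    def link(self, block: str, test: str) -> None:
        """Record that a test is supposed to activate a design block."""
        self._tests_for_block[block].add(test)

    def record(self, test: str, coverage: float) -> None:
        """Store derived data (here, a coverage figure) for a test."""
        self._coverage[test] = coverage

    def invalidate(self, changed_blocks: Iterable[str]) -> Set[str]:
        """Forget derived data for tests tied to changed blocks.

        Returns the set of tests that need to be re-run.
        """
        stale: Set[str] = set()
        for block in changed_blocks:
            stale |= self._tests_for_block.get(block, set())
        for test in stale:
            self._coverage.pop(test, None)
        return stale

    def valid_coverage(self) -> Dict[str, float]:
        """Return only the derived data that has not been invalidated."""
        return dict(self._coverage)
```

A short usage example, with made-up block and test names:

```python
tm = TraceMap()
tm.link("alu", "t_add")
tm.link("fifo", "t_fifo")
tm.record("t_add", 0.6)
tm.record("t_fifo", 0.3)
to_rerun = tm.invalidate(["alu"])  # only t_add loses its stored coverage
```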
