Power/Performance Bits: Aug. 25

AI architecture optimization; THz encoding for 6G.


AI architecture optimization
Researchers at Rice University, Stanford University, University of California Santa Barbara, and Texas A&M University proposed two complementary methods for optimizing data-centric processing.

The first, called TIMELY, is an architecture developed for “processing-in-memory” (PIM). A promising PIM platform is resistive random access memory, or ReRAM. While other ReRAM PIM accelerator architectures have been proposed, Yingyan Lin, an assistant professor of electrical and computer engineering at Rice and director of the university’s Efficient and Intelligent Computing (EIC) Lab, said experiments run on more than 10 deep neural network models found TIMELY was 18 times more energy efficient and delivered more than 30 times the computational density of the most competitive state-of-the-art ReRAM PIM accelerator.

“The problem is that for large-scale deep neural networks, which are state-of-the-art for machine learning today, more than 90% of the electricity needed to run the entire system is consumed in moving data between the memory and processor,” said Lin.

“There are no one-for-all answers, as different applications require machine-learning algorithms that might differ a lot in terms of algorithm structure and complexity, while having different task accuracy and resource consumption — like energy cost, latency and throughput — tradeoff requirements,” she said. “Many researchers are working on this, and big companies like Intel, IBM and Google all have their own designs.”

TIMELY, which stands for “Time-domain, In-Memory Execution, LocalitY,” achieves its performance by eliminating two major sources of inefficiency: frequent accesses to the main memory to handle intermediate inputs and outputs, and the interface between local and main memories.

In the main memory, data is stored digitally, but it must be converted to analog when it is brought into the local memory for in-memory processing. In prior ReRAM PIM accelerators, the team noted, the resulting values are converted from analog back to digital and sent to the main memory. If those values are later fetched from the main memory into local ReRAM for subsequent operations, they are converted to analog yet again, and so on.

TIMELY avoids the overhead of both unnecessary main-memory accesses and the data conversions at the interface by using analog-format buffers within the local memory. In this way, the researchers said, TIMELY keeps most of the required data within local memory arrays, which improves efficiency.
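The difference can be made concrete with a minimal Python sketch that counts conversions and main-memory accesses for a chain of layers. The model is an assumption made for illustration, not the researchers' accounting: each layer takes one input tensor and produces one output tensor, and the function name `count_conversions` and the 10-layer chain are hypothetical. The buffered case is idealized, since TIMELY is described as keeping data local only "mostly."

```python
def count_conversions(num_layers: int, analog_buffers: bool) -> dict:
    """Count conversions and main-memory accesses for a chain of layers.

    Simplified, assumed model: one input tensor and one output tensor per layer.
    """
    if analog_buffers:
        # Idealized buffered case: convert once on the way in, keep every
        # intermediate result in analog form locally, convert once on the way out.
        dac = 1
        adc = 1
        main_memory_accesses = 2
    else:
        # Prior approach as described in the text: every layer reads digital
        # data (digital-to-analog) and writes its result back (analog-to-digital).
        dac = num_layers
        adc = num_layers
        main_memory_accesses = 2 * num_layers
    return {"dac": dac, "adc": adc, "main_memory_accesses": main_memory_accesses}


baseline = count_conversions(num_layers=10, analog_buffers=False)
buffered = count_conversions(num_layers=10, analog_buffers=True)
print("without analog buffers:", baseline)
print("with analog buffers:   ", buffered)
```

In this toy model the cost of the prior approach grows with the number of layers, while the buffered case stays constant. That is the intuition behind the design, not a measurement of it.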

The second proposal, called SmartExchange, is a design that combines algorithmic and accelerator hardware innovations to save energy.

“It can cost about 200 times more energy to access the main memory — the DRAM — than to perform a computation, so the key idea for SmartExchange is enforcing structures within the algorithm that allow us to trade higher-cost memory for much-lower-cost computation,” Lin said.

“For example, let’s say our algorithm has 1,000 parameters,” said Lin. “In a conventional approach, we will store all the 1,000 in DRAM and access as needed for computation. With SmartExchange, we search to find some structure within this 1,000. We then need to only store 10, because if we know the relationship between these 10 and the remaining 990, we can compute any of the 990 rather than calling them up from DRAM.

“We call these 10 the ‘basis’ subset, and the idea is to store these locally, close to the processor to avoid or aggressively reduce having to pay costs for accessing DRAM.”
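Lin's 1,000-parameter example can be worked through in a short Python sketch. The 200-to-1 energy ratio and the 1,000/10/990 split come from the quotes above. Everything else is an assumption for illustration: the rebuild rule in `rebuild_parameter` is a hypothetical stand-in for whatever structure SmartExchange actually enforces, each rebuilt parameter is charged one unit of computation, and reads from local storage are treated as free.

```python
DRAM_ACCESS_COST = 200  # relative energy per DRAM access (the quoted ~200x figure)
COMPUTE_COST = 1        # relative energy per computation (the unit of comparison)

NUM_PARAMS = 1000       # parameters in the example
BASIS_SIZE = 10         # parameters kept as the locally stored "basis" subset


def rebuild_parameter(basis, index):
    """Recompute parameter `index` from the basis instead of fetching it.

    Hypothetical rule, chosen only for illustration: parameter i equals
    basis[i % len(basis)] scaled by an integer factor that depends on i.
    """
    scale = index // len(basis) + 1
    return basis[index % len(basis)] * scale


# Illustrative basis values; real values would come from training.
basis = [float(k + 1) for k in range(BASIS_SIZE)]

# Conventional approach: every parameter is fetched from DRAM.
conventional_energy = NUM_PARAMS * DRAM_ACCESS_COST

# Basis approach: fetch the basis once, recompute the remaining parameters.
smart_energy = BASIS_SIZE * DRAM_ACCESS_COST + (NUM_PARAMS - BASIS_SIZE) * COMPUTE_COST

print("conventional:", conventional_energy)
print("basis + recompute:", smart_energy)
print("ratio: %.1f" % (conventional_energy / smart_energy))
print("parameter 25 rebuilt as:", rebuild_parameter(basis, 25))
```

Under these assumptions the parameter traffic costs 200,000 units conventionally and 2,990 units with the basis, roughly a 67-fold reduction. The figure covers only this toy accounting and should not be read as a result from the research.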

The researchers used the SmartExchange algorithm and their custom hardware accelerator to experiment on seven benchmark deep neural network models and three benchmark datasets. They found the combination reduced latency by as much as 19 times compared to state-of-the-art deep neural network accelerators.

THz encoding for 6G
Researchers from ITMO University are working on ways to encode data in the terahertz range and have shown a way to modify terahertz pulses in order to use them for data transmission.

As 5G communications roll out around the world, “we’re talking about 6G technologies,” said Egor Oparin, a staff member of ITMO University’s Laboratory of Femtosecond Optics and Femtotechnologies. “They will increase data transfer speeds by anywhere from 100 to 1,000 times, but implementing them will require us to switch to the terahertz range.”

Currently, a technology for simultaneous transfer of multiple data channels over a single physical channel has been successfully implemented in the infrared (IR) range. This technology is based on the interaction between two broadband IR pulses with a bandwidth measured in tens of nanometers. In the terahertz range, the bandwidth of such pulses would be much larger – and so, in turn, would be their capacity for data transfer.

One challenge, however, has been ensuring the interference of two pulses, which would result in a so-called pulse train or frequency comb used to encode data. “In the terahertz range, pulses tend to contain a small number of field oscillations; literally one or two per pulse,” said Oparin. “They are very short and look like thin peaks on a graph. It is quite challenging to achieve interference between such pulses, as they are difficult to overlap.”

The team’s work focused on extending the pulse in time so that it would last several times longer but still be measured in picoseconds. In this case, the different frequencies within a pulse would not occur simultaneously but would follow one another in succession, a technique called chirping or linear-frequency modulation. This brings another challenge, however: although chirping technologies are well developed for the infrared range, there is a lack of research on the technique’s use in the terahertz range.

“We’ve turned to the technologies used in the microwave range,” said Oparin. “They actively employ metal waveguides, which tend to have high dispersion, meaning that different emission frequencies propagate at different speeds there. But in the microwave range, these waveguides are used in single mode, or, to put it differently, the field is distributed in one configuration, in a specific, narrow frequency band, and, as a rule, in one wavelength. We took a similar waveguide of a size suitable for the terahertz range and passed a broadband signal through it so that it would propagate in different configurations; because of this, the pulse became longer in duration, changing from two to about seven picoseconds, which is three and a half times more. This became our solution.”
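The dispersion Oparin describes can be illustrated with the textbook group-velocity formula for an ideal hollow metal waveguide, v_g = c * sqrt(1 - (f_c / f)^2), where f_c is a mode's cutoff frequency. The Python sketch below is a rough illustration under stated assumptions, not the ITMO team's model: the waveguide length, the two mode cutoffs and the frequency list are hypothetical values chosen only to show the trend.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def group_delay_ps(freq_thz, cutoff_thz, length_m):
    """Transit time, in picoseconds, of one frequency in one waveguide mode.

    Uses the ideal hollow-metal-waveguide group velocity
    v_g = c * sqrt(1 - (f_c / f)^2); losses are ignored.
    """
    if freq_thz <= cutoff_thz:
        raise ValueError("frequency is at or below cutoff and does not propagate")
    group_velocity = C_M_PER_S * math.sqrt(1.0 - (cutoff_thz / freq_thz) ** 2)
    return length_m / group_velocity * 1e12


# All values below are hypothetical, for illustration only.
LENGTH_M = 0.01
MODE_CUTOFFS_THZ = {"mode A": 0.15, "mode B": 0.30}
FREQS_THZ = [0.4, 0.6, 0.8, 1.0]

delays = {
    name: [group_delay_ps(f, cutoff, LENGTH_M) for f in FREQS_THZ]
    for name, cutoff in MODE_CUTOFFS_THZ.items()
}

for name, per_mode in delays.items():
    for f, d in zip(FREQS_THZ, per_mode):
        print("%s  %.1f THz  %.2f ps" % (name, f, d))

all_delays = [d for per_mode in delays.values() for d in per_mode]
spread_ps = max(all_delays) - min(all_delays)
print("arrival-time spread: %.2f ps" % spread_ps)
```

Within a mode, lower frequencies arrive later than higher ones, and a mode with a higher cutoff is slower at any given frequency. Both effects spread a broadband pulse out in time, which is consistent with the stretching from two to about seven picoseconds described in the quote.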

By using a waveguide, the researchers were able to stretch the pulses to the duration that theory requires. This made it possible to achieve interference between two chirped pulses that together create a pulse train. “What’s great about this pulse train is that it exhibits a dependence between a pulse’s structure in time and the spectrum,” explained Oparin. “So we have temporal form, or, simply put, field oscillations in time, and spectral form, which represents those oscillations in the frequency domain. Let’s say we’ve got three peaks, three substructures in the temporal form, and three corresponding substructures in the spectral form. By using a special filter to remove parts of the spectral form, we can ‘blink’ in the temporal form and the other way around. This could be the basis for data encoding in the terahertz band.”
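Why two chirped pulses produce a train of peaks can be seen in a small numerical sketch. If two copies of a linearly chirped pulse are offset by a delay, their instantaneous frequencies differ by a constant amount at every moment (chirp rate times delay), so the combined intensity beats at that frequency. The Python code below assumes Gaussian envelopes and hypothetical values for the width, carrier, chirp rate and delay. It is a simplified model, not a reconstruction of the experiment.

```python
import cmath
import math


def chirped_pulse(t_ps, center_ps, width_ps, f0_thz, chirp_thz_per_ps):
    """Complex field of a Gaussian pulse with a linear frequency chirp."""
    local_t = t_ps - center_ps
    envelope = math.exp(-(local_t / width_ps) ** 2)
    # Instantaneous frequency is f0 + chirp * local_t.
    phase = 2 * math.pi * f0_thz * local_t + math.pi * chirp_thz_per_ps * local_t ** 2
    return envelope * cmath.exp(1j * phase)


# Hypothetical parameters, for illustration only.
WIDTH_PS = 4.0
F0_THZ = 0.5
CHIRP_THZ_PER_PS = 0.25
DELAY_PS = 2.0

# Two delayed copies of a linear chirp differ in frequency by chirp * delay.
beat_frequency_thz = CHIRP_THZ_PER_PS * DELAY_PS

times = [-10.0 + 0.01 * i for i in range(2201)]
intensity = [
    abs(
        chirped_pulse(t, 0.0, WIDTH_PS, F0_THZ, CHIRP_THZ_PER_PS)
        + chirped_pulse(t, DELAY_PS, WIDTH_PS, F0_THZ, CHIRP_THZ_PER_PS)
    ) ** 2
    for t in times
]

# Keep local maxima that reach at least 10% of the strongest peak.
threshold = 0.1 * max(intensity)
peaks = [
    times[i]
    for i in range(1, len(times) - 1)
    if intensity[i] > intensity[i - 1]
    and intensity[i] >= intensity[i + 1]
    and intensity[i] > threshold
]

print("beat frequency: %.2f THz" % beat_frequency_thz)
print("peak times (ps):", [round(p, 2) for p in peaks])
```

The peaks are spaced by roughly one over the beat frequency. The spacing is only approximate because the Gaussian envelope shifts each maximum slightly. Because each moment in a chirped pulse corresponds to a particular frequency, removing part of the spectrum removes part of the temporal structure, which is the link between the two forms that Oparin describes.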
