Harnessing Computational Storage For Faster Data Processing

Addressing the verification challenges of complex NVMe architectures.


By Ujjwal Negi and Prashant Dixit

In the evolving landscape of data storage, computational storage devices (CSDs) are revolutionizing how we process and store data. By embedding processing capabilities within storage units, these devices enable in-situ data manipulation, minimizing data movement between storage and CPUs and dramatically improving performance and efficiency. This paradigm shift has increased the demand for rigorous verification methods to ensure the reliability and compliance of CSDs.

This article sheds light on the computational storage paradigm, covering the key features of CSDs, comparing their performance characteristics with those of traditional storage systems, and describing the associated verification challenges. It explores how Avery NVMe Verification IP addresses these challenges through advanced protocol compliance checks, stimuli generation, logging mechanisms, and coverage tools—ensuring optimal performance and reliability. Additionally, the article discusses system-level validation using virtual in-circuit simulation (VICS) to streamline testing and improve verification efficiency.

Key terms to know:

  • Subsystem Local Memory (SLM): A host-accessible memory area within an NVMe subsystem, made up of multiple memory namespaces. It provides dedicated byte-addressable memory for compute namespaces, enabling efficient data processing.
  • Programs: Self-contained units of code that perform specific tasks on stored data, classified as downloadable programs (uploaded by the host) or device-defined programs (pre-installed). These programs execute a range of computational tasks, such as data encryption, compression, and analytics, directly within the storage infrastructure.
  • Compute namespaces: Specialized regions in an NVMe subsystem for storing and executing computational programs, distinct from traditional storage namespaces that primarily hold data. They provide isolated environments for running tasks independently of the host, using SLM ranges for inputs, intermediate results, and outputs. Memory access is restricted to namespaces within the same reachability association, ensuring data security and preventing conflicts from simultaneous access. The sketch below illustrates how these pieces fit together.
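To make these terms concrete, the following minimal Python sketch models how SLM ranges, programs, and compute namespaces relate. The class and field names are illustrative only and are not drawn from the NVMe specification.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SLMRange:
    """A byte-addressable window of Subsystem Local Memory (illustrative)."""
    start: int    # byte offset within the SLM namespace
    length: int   # size of the range in bytes

@dataclass
class Program:
    """A self-contained compute unit: downloadable (host-supplied) or device-defined."""
    name: str
    downloadable: bool   # True if uploaded by the host, False if pre-installed

@dataclass
class ComputeNamespace:
    """Executes programs against the SLM ranges it is allowed to reach."""
    nsid: int
    reachable_slm: List[SLMRange] = field(default_factory=list)
    programs: List[Program] = field(default_factory=list)

    def can_access(self, rng: SLMRange) -> bool:
        # Access is limited to ranges in the same reachability association.
        return rng in self.reachable_slm
```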

Fig. 1: SLM and compute namespace.

Program execution in CSDs

Executing programs on compute namespaces requires initial setup, including loading and activating programs and creating memory ranges. These ranges define specific portions of memory within the SSD that the compute namespace can access. Only relevant data within these ranges is targeted for computation, reducing unnecessary access. Instead of reading data to the host and writing it back, the execute program command runs directly within the compute namespace. The program accesses data from the defined SLM ranges and performs computations without transferring data outside the SSD. Only the results may be transferred back to the host, reducing bus traffic.
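A rough host-side sketch of this flow is shown below. The csd object and helpers such as load_program, create_slm_range, and execute_program are hypothetical stand-ins for the corresponding NVMe computational-programs commands, not a real driver API.

```python
def offload_filter(csd, input_lba_range, program_image):
    """Illustrative host-side flow for running a filter program on a CSD.

    'csd' is assumed to expose hypothetical wrappers around the NVMe
    computational-programs commands; names and arguments are illustrative only.
    """
    # One-time setup: load and activate the downloadable program on a compute namespace.
    slot = csd.load_program(program_image)
    csd.activate_program(slot)

    # Define the SLM ranges the program may touch: input data plus an output buffer.
    in_range = csd.create_slm_range(size=len(input_lba_range) * 4096)
    out_range = csd.create_slm_range(size=64 * 1024)

    # Stage the target data into SLM and execute in place; no round trip to the host.
    csd.copy_lbas_to_slm(input_lba_range, in_range)
    csd.execute_program(slot, inputs=[in_range], outputs=[out_range])

    # Only the (much smaller) result set crosses the bus back to the host.
    return csd.read_slm(out_range)
```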

Fig. 2: Program execution on CSD.

Traditional vs. computational storage

In traditional SSDs, data is read from the SSD to the host for processing, resulting in high data movement and increased latency. In contrast, CSDs minimize data transfers by allowing programs to execute directly within the compute namespace. This reduces latency and enhances bandwidth efficiency, thus enabling more responsive systems, especially in data-heavy environments.
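A back-of-the-envelope comparison makes the effect concrete; the dataset size and query selectivity below are assumed values chosen purely for illustration.

```python
# Hypothetical query: scan 10 GB of stored records, of which only 0.1% match.
dataset_mb = 10_000        # data to be scanned, in MB
selectivity = 0.001        # fraction of records that satisfy the query

traditional_mb_moved = dataset_mb               # whole dataset read to the host
csd_mb_moved = dataset_mb * selectivity         # only matching results returned

print(f"Traditional SSD: {traditional_mb_moved:,.0f} MB over the bus")
print(f"CSD (in-situ):   {csd_mb_moved:,.0f} MB over the bus")
# Traditional SSD: 10,000 MB over the bus
# CSD (in-situ):   10 MB over the bus
```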

Fig. 3: Traditional vs. computational storage model.

Performance Characteristic | SSD | CSD
Read/Write Speed | 500 MB/s (SATA) to 7,000 MB/s (NVMe) | Comparable to SSDs; enhanced by computation
Latency | Generally low; typically 10-20 μs (NVMe) | Lower; typically 5-15 μs
Throughput | High; up to 500,000 IOPS (NVMe) | Higher; can exceed 500,000 IOPS (depends on computational workload distribution)

Table 1: Performance characteristics comparison of SSDs and CSDs.

Verification strategies with NVMe VIP

The complexity of CSDs introduces new verification challenges. Protocol compliance, memory management, and the interaction between different namespaces require thorough validation. To tackle these challenges, Avery NVMe Verification IP offers a comprehensive suite of tools:

  • Protocol Compliance: Avery NVMe Verification IP ensures protocol compliance in complex NVMe architectures with an embedded monitor that decodes transport packets and manages the address space. It uses a shadow NVM storage model for data scoreboarding, logs transport-level packets via a beat logger, and includes a protocol suite with over 1,800 checks to validate correct operation. A simplified sketch of this scoreboarding approach follows this list.
  • Comprehensive Stimuli Generation: Verification of computational storage designs involves generating comprehensive stimuli for various scenarios, including corner cases, normal operations, and stress conditions. Avery NVMe Verification IP streamlines this process with features such as a transport-independent stimulus library of 600+ compliance tests, a highly configurable command structure for tailored testing, randomization of stimuli for diverse test cases, error injection to simulate faults, automation of command creation via APIs, integration with the UNH-IOL compliance suite, and support for both blocking and non-blocking transaction modes to accommodate different testing requirements.
  • Logging and Debugging Tools: Avery NVMe Verification IP features essential debugging tools for complex computational storage designs. The transaction logger assigns unique debug IDs to commands for traceability and captures critical attributes for fault isolation. The beat logger aggregates transport transactions under a single NVMe transaction, facilitating anomaly analysis and insights into command execution flow. Together, these tools enhance reliability during concurrent computational and standard NVMe operations.
  • Coverage Metrics and Analysis: Avery NVMe Verification IP employs robust coverage analysis techniques to ensure comprehensive validation of CSDs. It utilizes functional, code, and cross-coverage metrics to evaluate the effectiveness of test scenarios and identify coverage gaps. Questa Verification IQ (VIQ) enhances this process by providing advanced analytics and visualization tools, enabling real-time tracking of coverage metrics and facilitating regression analysis. Questa VIQ includes features like gap analysis, which identifies the most impactful areas for coverage improvement, and a coverage analyzer that streamlines the identification of uncovered scenarios. This data-driven approach optimizes test strategies, ensuring thorough validation of all operational scenarios and significantly enhancing the reliability and performance of CSDs.
  • Performance Assessment: The performance logger evaluates operational efficiency by providing metrics on latency, throughput, and input/output operations per second (IOPS). It tracks latency during command processing phases, identifying bottlenecks. Throughput measures data transfer rates in megabytes per second (MB/s), ensuring performance meets expectations. IOPS indicates the number of read and write operations per second, with higher values reflecting greater efficiency during concurrent computational and NVMe operations. A simplified illustration of these calculations also follows this list.
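To illustrate the scoreboarding idea from the protocol compliance item above, here is a minimal shadow-storage model in Python. It is a toy stand-in for the VIP's shadow NVM storage model, with invented class and method names.

```python
class ShadowNvmModel:
    """Toy shadow storage model for data scoreboarding (illustrative only).

    Writes observed on the interface update the shadow copy; read completions
    are compared against it, and any mismatch is flagged as a data error.
    """
    def __init__(self):
        self.shadow = {}            # LBA -> expected data block

    def observe_write(self, lba, data):
        self.shadow[lba] = data     # track what the device should now hold

    def check_read(self, lba, data):
        expected = self.shadow.get(lba)
        if expected is not None and expected != data:
            raise AssertionError(f"Scoreboard mismatch at LBA {lba:#x}")

# Example: a write followed by a matching read passes; a corrupted read would raise.
sb = ShadowNvmModel()
sb.observe_write(0x100, b"\xAA" * 4096)
sb.check_read(0x100, b"\xAA" * 4096)
```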
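As a simplified illustration of the performance assessment item above, the sketch below derives average latency, throughput, and IOPS from a list of completed I/O records. It is a toy model of what a performance logger captures, not the actual implementation.

```python
def summarize_performance(completions):
    """Derive basic performance metrics from completed I/O records.

    'completions' holds (submit_time_s, complete_time_s, bytes) tuples,
    a simplified stand-in for what a performance logger would capture.
    """
    latencies = [done - start for start, done, _ in completions]
    total_bytes = sum(size for _, _, size in completions)
    window_s = max(done for _, done, _ in completions) - min(start for start, _, _ in completions)

    return {
        "avg_latency_us": 1e6 * sum(latencies) / len(latencies),
        "throughput_mb_s": (total_bytes / 1e6) / window_s,
        "iops": len(completions) / window_s,
    }

# Example: three 4 KiB reads completing within a 30 microsecond window.
print(summarize_performance([(0.0, 10e-6, 4096), (5e-6, 18e-6, 4096), (12e-6, 30e-6, 4096)]))
```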

System-level validation with VICS

Virtual In-Circuit Simulation (VICS) provides system-level validation by co-simulating host and embedded software with SoC hardware, ensuring thorough testing before tape-out. VICS integrates with industry-standard benchmarking tools like PCIe-CV and FIO on virtual platforms, such as QEMU and ARM Fast Models. This setup allows for flexible simulation environments and maintains full debugging capabilities by combining SystemVerilog and UVM-based verification IP (PCIe, CXL, Ethernet, AMBA).

VICS enables early detection of issues, reducing the risk of silicon failures and accelerating the verification process. By simulating real-world interactions between hardware and software, it ensures that CSDs meet system-level requirements, validating both protocol compliance and system functionality prior to deployment.

Fig. 4: System-level simulation for NVMe using VICS.

As the digital landscape evolves, computational storage devices are becoming increasingly vital for enhancing data processing efficiency. By embedding computational capabilities directly within storage, CSDs significantly reduce latency and improve bandwidth utilization, addressing the growing demands of data-intensive applications.

Avery NVMe Verification IP offers a robust suite of features, flexible integration, and comprehensive validation tools essential for ensuring the performance and protocol compliance of CSDs, helping teams optimize these devices while maintaining system integrity.

For a more in-depth treatment of this topic, please read the new whitepaper from Siemens EDA, Accelerating verification of computational storage designs using Avery NVMe Verification IP.

Prashant Dixit works on the development of verification solutions for UCIe-based designs at Siemens EDA. With a strong background in the storage domain, he also manages the Storage Verification IPs team, focusing on the development and testing of NVMe and NVMe over Fabrics testing solutions. Prior to his role at Siemens EDA, Dixit contributed to the design and verification of IP and SoCs in the networking and storage domains at Samsung. Dixit earned a Bachelor of Technology in Electronics and Communication from Uttar Pradesh Technical University in 2004 and a Master of Engineering degree in Microelectronics from BITS Pilani in 2006.


