Knowing where trouble spots lie is vital in creating a verification plan to avoid them.
Non-Volatile Memory Express (NVMe) is gaining rapidly in mindshare among consumers and vendors. Some industry analysts are forecasting that PCIe-based NVMe will become the dominant storage interface over the next few years. With its high-performance, low-latency characteristics and its availability for virtually all platforms, NVMe is a game changer. For the first time, storage devices and storage subsystems interact with host computers in a fundamentally different way than under any previous storage protocol.
NVMe is an optimized, high-performance, scalable host controller interface designed to address the needs of enterprise and client systems that utilize PCI Express-based solid-state storage. Designed to move beyond the dark ages of hard disk drive technology, NVMe is built from the ground up for non-volatile memory (NVM) technologies. It is designed to provide efficient access to storage devices built with non-volatile memory, from today's NAND flash to future, higher-performing persistent memory technologies.
With the rise of NVMe, verification engineers must be aware of the common pitfalls experienced while verifying PCIe-based NVMe controllers. By knowing the areas where these are most likely to occur, engineers can create a verification plan checklist to avoid them with confidence.
Common NVMe verification pitfalls
1. Not testing register properties
At first glance, controller register testing may seem obvious, since registers implement the controller's basic functionality, yet thorough register testing is often neglected. Looking at the problem more closely, subtle issues can be difficult to detect with merely a simple test.
Areas to focus on while writing register tests include reset and default values, register access attributes (read-only, read/write, and write-1-to-clear fields), and the behavior of reserved bits.
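As a minimal sketch, a register property check might look like the following C fragment. It assumes the controller registers have already been memory-mapped at bar0 by the test environment; the helper name and error handling are illustrative, while the CAP offset and the CAP.MQES field come from the NVMe controller register map.

```c
#include <stdint.h>
#include <stdio.h>

#define NVME_REG_CAP 0x00 /* Controller Capabilities, 64-bit, read-only */

/* 'bar0' is assumed to point at the memory-mapped controller registers
 * (obtained elsewhere by the test environment). */
static int check_cap_is_read_only(volatile uint8_t *bar0)
{
    volatile uint64_t *cap = (volatile uint64_t *)(bar0 + NVME_REG_CAP);

    uint64_t before = *cap;
    *cap = ~before;          /* attempt to flip every bit of a RO register */
    uint64_t after = *cap;

    if (after != before) {
        fprintf(stderr, "CAP changed after write: %llx -> %llx\n",
                (unsigned long long)before, (unsigned long long)after);
        return -1;
    }

    /* Read-only properties can also be decoded and checked, e.g. the
     * maximum queue entries supported (CAP.MQES, bits 15:0, 0's based). */
    printf("CAP.MQES reports %llu entries\n",
           (unsigned long long)(before & 0xFFFF) + 1);
    return 0;
}
```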
2. Failing to explore non-trivial queue-related scenarios
As per NVMe protocol basics, queues are circular in nature: an entry is added to a queue at the current tail location and popped from the current head location. The tail pointer is incremented by one for each entry pushed into the queue, and the head pointer is incremented by one for each entry popped from the queue. When either pointer reaches the maximum queue depth, it rolls back to point to the first location of its queue; this is the wrapping condition.
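A minimal C model of this pointer arithmetic, using hypothetical structure and function names, makes the wrapping and full/empty conditions explicit:

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative model of NVMe queue pointer arithmetic. 'depth' is the
 * number of slots in the queue (assumed >= 2). */
struct nvme_queue_model {
    uint16_t head;   /* next entry to be consumed */
    uint16_t tail;   /* next free slot for a new entry */
    uint16_t depth;  /* total slots; usable entries = depth - 1 */
};

/* Advance a pointer by one, wrapping back to slot 0 past the end. */
static uint16_t advance(uint16_t ptr, uint16_t depth)
{
    return (uint16_t)((ptr + 1) % depth);
}

static bool queue_is_empty(const struct nvme_queue_model *q)
{
    return q->head == q->tail;
}

/* Full when advancing the tail would collide with the head, which is
 * why a full queue holds one entry fewer than its size. */
static bool queue_is_full(const struct nvme_queue_model *q)
{
    return advance(q->tail, q->depth) == q->head;
}
```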
Since NVMe is based on a paired Submission and Completion Queue mechanism, commands are placed by host software into a Submission Queue and completions are placed into the associated Completion Queue by the controller. This queue pairing is an important aspect not to be left unexplored.
One of the gray areas where issues are frequently seen is many-to-one queue mapping, where multiple Submission Queues (SQs) utilize the same Completion Queue (CQ). In designs where such relationships exist, controllers are prone to reporting incorrect Submission Queue ID and Command ID fields in the posted completion queue entry.
The next points of interest are the minimum and maximum queue depth conditions. Although the maximum queue depth can theoretically be 64K entries, the depth is limited to the maximum supported by the controller, as reported in the CAP.MQES field. It is important to remember that the total number of entries a queue holds when full is one less than the queue size.
Figure 1: Controller queues.
Another important scenario is when I/O queues are non-contiguous. It is imperative to note that when a queue is configured as non-contiguous, each PRP Entry in its PRP List shall have an offset of 0h. If there is a PRP Entry with a non-zero offset, the controller should return a PRP Offset Invalid error. Also, creating an I/O Submission Queue that utilizes a PRP List is only valid if the controller supports non-contiguous queues, as indicated by the CAP.CQR field. Scenarios related to queue rollover conditions cannot be overlooked and must be included in the verification plan.
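As a sketch, a checker for this rule only needs to confirm that every PRP List entry is aligned to the memory page size derived from CC.MPS (page size = 2^(12 + MPS)); the function name and arguments below are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* For a non-contiguous I/O queue, every PRP List entry must have an
 * offset of 0h, i.e. be aligned to the memory page size configured
 * through CC.MPS. */
static bool prp_list_offsets_are_zero(const uint64_t *prp_list,
                                      size_t num_entries,
                                      unsigned mps /* CC.MPS field */)
{
    uint64_t page_size = 1ULL << (12 + mps);

    for (size_t i = 0; i < num_entries; i++) {
        if (prp_list[i] & (page_size - 1))
            return false;   /* controller should flag PRP Offset Invalid */
    }
    return true;
}
```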
3. Not focusing on controllers attached to PCIe virtual functions
Single Root I/O Virtualization (SR-IOV) is a specification that allows a PCIe device to appear to be multiple separate physical PCIe devices. It introduces the idea of the virtual function (VF) in addition to the physical function (PF); VFs can be considered "lightweight" functions that lack full configuration resources. VFs implement a subset of the traditional configuration space and share some configuration space with the associated PF.
The most common problems in this category are related to the BDF calculation, where BDF means bus, device, and function number. Once the VF Enable field is set in the SR-IOV capability, VFs are created and will respond to configuration transactions. They will not, however, be discovered automatically by legacy enumeration software. SR-IOV therefore provides a mechanism that allows SR-IOV-aware software to locate VFs: the First VF Offset and VF Stride fields, applied to a PF's routing ID, identify the location of all the VFs associated with that PF (a sketch of this arithmetic appears after Figure 2). During this newer enumeration process, a unique BDF is assigned to each implemented VF along with the PF.
Figure 2 illustrates an NVMe subsystem that supports SR-IOV and has one PF and four VFs. An NVMe controller is associated with each VF, with each controller having a private namespace and access to a namespace shared by all controllers, labeled NS E.
Figure 2: PCIe device supporting Single Root I/O Virtualization.
Another error-prone configuration is when more than 256 VFs are supported per PF. SR-IOV allows a device to implement hundreds of VFs, but to do this an SR-IOV device may require software to allocate more than one bus number (in order to address more than 256 functions).
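A sketch of this routing ID arithmetic, using illustrative field values, shows how each VF's BDF is derived from the PF's routing ID, First VF Offset, and VF Stride:

```c
#include <stdint.h>
#include <stdio.h>

/* Routing ID (RID) arithmetic from the SR-IOV capability: the RID of
 * VF n (1-based) is PF RID + First VF Offset + (n - 1) * VF Stride.
 * Without ARI, a RID decodes as bus[15:8], device[7:3], function[2:0]. */
static uint16_t vf_routing_id(uint16_t pf_rid,
                              uint16_t first_vf_offset,
                              uint16_t vf_stride,
                              unsigned n /* 1-based VF index */)
{
    return (uint16_t)(pf_rid + first_vf_offset + (n - 1) * vf_stride);
}

int main(void)
{
    uint16_t pf_rid = (1 << 8) | (0 << 3) | 0;  /* bus 1, device 0, func 0 */

    for (unsigned n = 1; n <= 4; n++) {
        uint16_t rid = vf_routing_id(pf_rid, 1 /* offset */, 1 /* stride */, n);
        printf("VF%u -> bus %u, device %u, function %u\n",
               n, (unsigned)(rid >> 8), (unsigned)((rid >> 3) & 0x1F),
               (unsigned)(rid & 0x7));
    }
    return 0;
}
```

If the bus portion of a computed VF routing ID exceeds the PF's bus number, the device consumes additional bus numbers, which is exactly the more-than-256-VFs situation described above.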
Other SR-IOV primitives also deserve attention while verifying SR-IOV-based NVMe controllers, such as the behavior when the VF Enable bit is cleared and the response of individual VFs to function level reset.
4. Overlooking interrupt and mask handling complexity
NVMe allows a controller to be configured to signal interrupts in one of four modes: pin-based, single-message MSI, multi-message MSI, and MSI-X. While pin-based and MSI interrupts use the NVMe-defined registers (INTMS/INTMC) for interrupt mask and clear, MSI-X uses a mask table defined as part of the PCIe specification. Another point to consider is that pin-based and single-message MSI use only a single vector, multi-message MSI can use up to 32 vectors, while MSI-X supports a maximum of 2K (2048) interrupt vectors.
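For reference, the vector counts available in the MSI and MSI-X modes can be decoded from the PCIe capability registers; the helper names below are illustrative, and reading configuration space is assumed to happen elsewhere in the test environment.

```c
#include <stdint.h>

/* MSI: Multiple Message Capable is bits 3:1 of Message Control and is
 * log2-encoded, so the vector count is 1 << MMC (at most 32). */
static unsigned msi_vectors(uint16_t msi_msg_ctrl)
{
    return 1u << ((msi_msg_ctrl >> 1) & 0x7);
}

/* MSI-X: Table Size is bits 10:0 of Message Control, encoded as N-1,
 * allowing up to 2048 vectors. */
static unsigned msix_vectors(uint16_t msix_msg_ctrl)
{
    return (msix_msg_ctrl & 0x7FF) + 1u;
}
```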
Which interrupt mode is in effect depends on the controller's MSI and MSI-X configuration: pin-based interrupts are used when neither MSI nor MSI-X is enabled; single-message MSI is used when MSI is enabled with a single vector allocated; multi-message MSI is used when MSI is enabled with more than one vector allocated; and MSI-X is used when MSI-X is enabled.
While verifying interrupt mechanisms, these mode-selection rules and the corresponding masking behavior should be checked to hold true concurrently.
Though interrupt processing and mask handling are basic functionality for signaling command completion by controllers, and this feature is exercised in every NVMe command flow, some specific scenarios are worth considering while creating a test plan.
Case 1: For SR-IOV enabled devices, the following points should be acknowledged: PFs may implement INTx, but VFs must not implement INTx; and each PF and VF must implement its own unique interrupt capabilities.
Case 2: Interrupt aggregation, also referred to as interrupt coalescing, mitigates host interrupt overhead by reducing the rate at which interrupt requests are generated by a controller. Controllers supporting coalescing may combine multiple interrupt triggers for a particular vector.
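When exercising coalescing, the aggregation parameters are programmed through the Set Features command with the Interrupt Coalescing feature (Feature Identifier 08h). A small sketch of building Command Dword 11 for that feature follows; the helper name is illustrative.

```c
#include <stdint.h>

#define NVME_FEAT_IRQ_COALESCE 0x08 /* Interrupt Coalescing feature identifier */

/* Command Dword 11 for Interrupt Coalescing: bits 7:0 carry the
 * Aggregation Threshold (THR, a 0's based completion entry count) and
 * bits 15:8 the Aggregation Time (TIME, in 100 microsecond units). */
static uint32_t irq_coalesce_dw11(uint8_t entries_per_irq, uint8_t time_100us)
{
    uint8_t thr = (uint8_t)(entries_per_irq ? entries_per_irq - 1 : 0);
    return (uint32_t)thr | ((uint32_t)time_100us << 8);
}
```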
5. Misunderstanding metadata concepts
Metadata is contextual information about a particular LBA of data. It is additional data allocated on a per logical block basis. There is no requirement for how the host makes use of the metadata area. One of the most common usages for metadata is to convey end-to-end protection information. The metadata may be transferred by the controller to or from the host in one of two ways. One of the transfer mechanisms shall be selected for each namespace when it is formatted; transferring a portion of metadata with one mechanism and a portion with the other mechanism is not supported.
The first mechanism for transferring the metadata is as a contiguous part of the logical block that it is associated with. The metadata is transferred at the end of the associated logical block, forming an extended logical block. This mechanism is illustrated in Figure 3. In this case, both the logical block data and logical block metadata are pointed to by the PRP1 and PRP2 pointers (or SGL Entry 1 if SGLs are used).
Figure 3: Metadata as extended LBA
Erroneous behavior is likely in this scenario when the controller transfers fewer total bytes than expected. For example, if the LBA data size is 512B and the metadata per block is configured as 16B, the expected total transfer for one logical block is 512 + 16 = 528B. When PRPs are used, this 528B transfer is indicated only by specifying the number of logical blocks to be written or read (Command Dword 12, bits 15:0, NLB, a 0's based value); the total number of bytes is not explicitly specified anywhere. A faulty controller may nevertheless transfer fewer bytes than the expected 528B. Other factors should also be considered while calculating metadata bytes, such as the current LBA format in use and whether the selected namespace supports metadata.
Another opportunity for mistakes in the extended LBA mechanism arises with SGL transfers. While for PRPs the controller internally derives the total number of bytes to be transferred, for SGLs the total byte count, including both data and metadata, must be set in the Length fields of the SGL descriptors.
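A small helper, with hypothetical names, captures the byte-count expectation for extended LBAs; it reproduces the 528-byte example above, and the same total is what the SGL Length fields must cover.

```c
#include <stdint.h>
#include <stdio.h>

/* Expected transfer length for an extended-LBA namespace: every
 * logical block carries its metadata inline, so the controller moves
 * (data + metadata) bytes per block. NLB in Command Dword 12 is a
 * 0's based block count. */
static uint64_t extended_lba_transfer_bytes(uint32_t lba_data_size,
                                            uint32_t metadata_size,
                                            uint16_t nlb /* 0's based */)
{
    return (uint64_t)(nlb + 1) * (lba_data_size + metadata_size);
}

int main(void)
{
    /* One 512B logical block with 16B of metadata: NLB = 0 -> 528 bytes. */
    printf("%llu bytes expected\n",
           (unsigned long long)extended_lba_transfer_bytes(512, 16, 0));
    return 0;
}
```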
The second mechanism for transferring the metadata is as a separate buffer of data. This mechanism is illustrated in Figure 4. In this case, the metadata is pointed to with the Metadata Pointer, while the logical block data is pointed to by the Data Pointer.
Figure 4: Metadata as separate buffer
The indispensable characteristic to test in the separate-buffer mode is the handling of the Metadata Pointer (MPTR): the controller must transfer metadata to or from the buffer it addresses, independently of the logical block data transfer described by the Data Pointer.
Conclusion
Considering the number and complexity of features supported by the NVMe specification, it is no surprise that the verification effort is huge. Fortunately, the NVMe Questa Verification IP (QVIP) sequence library makes it easy to verify these problematic scenarios and helps accelerate verification cycles. By focusing their verification effort on these five key areas and applying NVMe QVIP, verification engineers will save themselves a lot of trouble.
To learn more about NVMe QVIP and other supported protocols, please visit us at https://www.mentor.com/products/fv/simulation-verification-ip/?cmpid=10169.