NVMe commands and extensions

This chapter proposes an extension of the NVMe protocol with vendor-specific commands. The custom commands extend both the Admin and I/O command sets. All additional commands are encoded in the vendor-specific opcode space.

Basic NVMe operations

Once the device has been probed successfully, you can perform basic NVMe operations with nvme-cli, a tool that sends commands to NVMe devices.

Currently supported commands are:

  • list

  • list-subsys

  • id-ctrl

  • id-ns

  • list-ns

  • smart-log

  • set-feature

  • read

  • write

  • fw-download

  • fw-commit

NVMe Vendor extensions

To use the accelerator, you need to be able to control it over the NVMe interface. The control functionality can be implemented with additional, vendor-specific commands. This way the accelerator device remains compatible with generic NVMe software, and custom software is required only for the additional accelerator functionality. Some commands defined in the NVMe standard also support vendor extensions, which can be used to implement basic features such as retrieving accelerator logs.

Get Log Page (Log Page Identifier 0xC0)

The Get Log Page command returns a data buffer containing the log page for the specified accelerator. The buffer has a variable length and contains a list of log entry descriptors.

Accelerators are identified using numeric ID values. The Accelerator ID is provided in bits 15:0 of the Log Specific Identifier field.
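As an illustration, the sketch below packs the two command dwords a host would fill in for this log page. The field positions (LID in CDW10 bits 7:0, RAE in CDW10 bit 15, the 0's-based dword count split across CDW10 bits 31:16 and CDW11 bits 15:0, and the Log Specific Identifier in CDW11 bits 31:16) follow the standard Get Log Page layout as commonly documented for NVMe 1.4 and should be checked against the specification revision you target. The function name is hypothetical.

```python
LID_ACCEL_LOG = 0xC0  # Log Page Identifier used by this proposal


def get_log_page_dwords(accel_id: int, length_bytes: int, rae: bool) -> tuple[int, int]:
    """Return (CDW10, CDW11) for a Get Log Page command targeting one accelerator."""
    if not 0 <= accel_id <= 0xFFFF:
        raise ValueError("Accelerator ID must fit in 16 bits")
    if length_bytes <= 0 or length_bytes % 4:
        raise ValueError("length must be a positive multiple of 4 bytes")
    numd = length_bytes // 4 - 1  # standard Get Log Page uses a 0's-based dword count
    cdw10 = LID_ACCEL_LOG | (int(rae) << 15) | ((numd & 0xFFFF) << 16)
    cdw11 = ((numd >> 16) & 0xFFFF) | (accel_id << 16)  # LSID carries the Accelerator ID
    return cdw10, cdw11
```

For example, `get_log_page_dwords(3, 4, True)` describes a 4-byte header read for accelerator 3 with RAE set.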

Accelerator logs are packed into Log Entry descriptors. The tables below contain the descriptors’ structure:

Table 1 Get Accelerator Log Page data structure

  Bytes   Description
  0:3     Log page length (in bytes)
  4:15    Reserved for future use
  16:N    Accelerator log entry descriptor list

Table 2 Accelerator Log Entry descriptor

  Bytes   Description
  0:3     Descriptor length (in bytes)
  4:11    Timestamp
  12:15   Entry type ID
  16:N    Accelerator-specific information (optional)

The optional information block can carry additional details for the log entry, e.g. an error message string.

Entry types are identified with unique IDs. The exact ID list is to be defined. Below is an example list:

Table 3 Entry type IDs

  ID   Description
  0    Invalid firmware ID was selected
  1    Invalid input buffer configuration
  2    Invalid output buffer configuration
  3    Accelerator specific error
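The log page layout above can be walked with a short parser. This is a minimal sketch, not a reference implementation. It assumes little-endian fields, that the log page length and each descriptor length include their own headers, and that the timestamp is an unsigned 64-bit value of unspecified unit. The names `LogEntry` and `parse_log_page` are hypothetical.

```python
import struct
from dataclasses import dataclass

LOG_HEADER_SIZE = 16    # bytes 0:3 length, bytes 4:15 reserved
ENTRY_HEADER_SIZE = 16  # length (4) + timestamp (8) + entry type ID (4)


@dataclass
class LogEntry:
    timestamp: int
    type_id: int
    info: bytes  # optional accelerator-specific information


def parse_log_page(buf: bytes) -> list[LogEntry]:
    """Split an accelerator log page into its entry descriptors."""
    (page_len,) = struct.unpack_from("<I", buf, 0)
    if page_len < LOG_HEADER_SIZE or page_len > len(buf):
        raise ValueError("inconsistent log page length")
    entries = []
    off = LOG_HEADER_SIZE
    while off < page_len:
        if off + ENTRY_HEADER_SIZE > page_len:
            raise ValueError("truncated entry header")
        desc_len, timestamp, type_id = struct.unpack_from("<IQI", buf, off)
        # Assumption: the descriptor length covers the 16-byte header too.
        if desc_len < ENTRY_HEADER_SIZE or off + desc_len > page_len:
            raise ValueError("inconsistent descriptor length")
        info = bytes(buf[off + ENTRY_HEADER_SIZE:off + desc_len])
        entries.append(LogEntry(timestamp, type_id, info))
        off += desc_len
    return entries
```

Validating each length before advancing keeps a malformed descriptor from causing an endless loop or an out-of-range read.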

Custom NVMe commands

Controlling accelerator-related features requires a set of custom commands on top of what NVMe provides. The NVMe standard supports vendor-specific commands that use a separate range of opcodes. The following sections list the custom (vendor-specific) commands that extend the Admin and I/O command sets.

Admin command set extension

Custom admin commands are used to obtain information about the device, to query the status of the accelerators, and to perform basic accelerator control.

For commands that transfer data to or from the device, the DPTR field of the NVMe command frame specifies the buffer location.

Accelerator Identify (0xC2)

The Accelerator Identify command returns a data buffer that describes the custom accelerators available in the device. The command can also be used to determine whether the connected device is an NVMe accelerator device. The data structure has a variable length, which is determined by reading the first 8 bytes: the first 4 bytes must hold the magic value, and the next 4 bytes hold the structure length. Once the length is known, the whole buffer can be retrieved.

Accelerators are described using descriptors. The tables below depict data structures used by the Accelerator Identify command.

Table 4 Accelerator Identify descriptor structure

  Bytes   Description
  0:3     Magic value (“WDC0”)
  4:7     Descriptor length (in bytes)
  8:15    Reserved for future use
  16:N    Accelerator descriptor list

Table 5 Accelerator descriptor list entry

  Bytes   Description
  0:3     Accelerator descriptor length (in bytes)
  4:5     Accelerator ID (unique within the device)
  6:7     Reserved for future use
  8:N     Accelerator capabilities list

Table 6 Accelerator capabilities list entry

  Bytes   Description
  0:3     Capability ID
  4:15    Capability specific data

Capabilities are identified with unique IDs. The exact ID list is to be defined. Below is an example list:

Table 7 Capability IDs

  ID   Description
  0    Accelerator supports firmware exchange
  1    Accelerator supports input buffer of size defined in bytes 8:15 of the capability descriptor
  2    Accelerator supports output buffer of size defined in bytes 8:15 of the capability descriptor
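The nested Identify structure (header, accelerator descriptors, fixed 16-byte capability entries) can be decoded as sketched below. The sketch assumes little-endian fields, lengths that include their own headers, and a buffer size stored as an unsigned 64-bit value. The function names are hypothetical.

```python
import struct

MAGIC = b"WDC0"
IDENTIFY_HEADER_SIZE = 16  # magic (4) + length (4) + reserved (8)
ACCEL_HEADER_SIZE = 8      # length (4) + Accelerator ID (2) + reserved (2)
CAP_ENTRY_SIZE = 16        # Capability ID (4) + capability specific data (12)


def parse_identify(buf: bytes) -> dict[int, dict[int, bytes]]:
    """Return {Accelerator ID: {Capability ID: capability specific data}}."""
    if buf[:4] != MAGIC:
        raise ValueError("not an NVMe accelerator device (bad magic)")
    (total_len,) = struct.unpack_from("<I", buf, 4)
    if total_len < IDENTIFY_HEADER_SIZE or total_len > len(buf):
        raise ValueError("inconsistent Identify length")
    accelerators = {}
    off = IDENTIFY_HEADER_SIZE
    while off < total_len:
        if off + ACCEL_HEADER_SIZE > total_len:
            raise ValueError("truncated accelerator descriptor")
        acc_len, acc_id = struct.unpack_from("<IH", buf, off)
        body = acc_len - ACCEL_HEADER_SIZE
        if body < 0 or body % CAP_ENTRY_SIZE or off + acc_len > total_len:
            raise ValueError("inconsistent accelerator descriptor length")
        caps = {}
        for cap_off in range(off + ACCEL_HEADER_SIZE, off + acc_len, CAP_ENTRY_SIZE):
            (cap_id,) = struct.unpack_from("<I", buf, cap_off)
            caps[cap_id] = bytes(buf[cap_off + 4:cap_off + CAP_ENTRY_SIZE])
        accelerators[acc_id] = caps
        off += acc_len
    return accelerators


def buffer_size(cap_data: bytes) -> int:
    """Buffer size for capability IDs 1 and 2.

    Bytes 8:15 of the capability descriptor are bytes 4:11 of its data block.
    """
    return struct.unpack_from("<Q", cap_data, 4)[0]
```

Caching the returned dictionary on the host avoids re-reading the capabilities before every transfer.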

Get Accelerator Status (0xC6)

The Get Accelerator Status command is used to retrieve information about the current status of the selected Accelerator available in the system. The command returns a variable length buffer containing a list of status descriptors.

Accelerators are identified using numeric ID values. The Accelerator ID is provided in bits 15:0 of the CDW12 field. Bit 31 of CDW12 is the Retain Asynchronous Event (RAE) flag: when it is set, the status information is not modified until it is read with the flag cleared. This mechanism allows the host to read the header of the status data buffer, determine the length of the whole transaction, and then read the whole buffer.
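The CDW12 encoding described here is small enough to show directly. The helper name is hypothetical, and bits 30:16 are assumed to be zero because the text does not assign them.

```python
def status_cdw12(accel_id: int, rae: bool) -> int:
    """Encode CDW12 for Get Accelerator Status: ID in bits 15:0, RAE in bit 31."""
    if not 0 <= accel_id <= 0xFFFF:
        raise ValueError("Accelerator ID must fit in 16 bits")
    return accel_id | (int(rae) << 31)
```

A host would typically pass `rae=True` for the short header read and `rae=False` for the full read that follows.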

Accelerator statuses are packed into Status descriptors. Statuses are tied to accelerator capabilities, e.g. buffers can report how much data was processed. The tables below summarize the descriptors’ structure:

Table 8 Get Accelerator Status data structure

  Bytes   Description
  0:3     Descriptor length
  4:7     Status ID
  8:31    Reserved for future use
  32:N    Accelerator status descriptor list

Table 9 Accelerator status descriptor

  Bytes   Description
  0:3     Capability ID
  4:31    Status specific data
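A status buffer with this layout can be decoded as follows. The sketch assumes little-endian fields and that the descriptor length is expressed in bytes and covers the 32-byte header. Status ID values are not defined in this chapter, so the value is returned as-is. The function name is hypothetical.

```python
import struct

STATUS_HEADER_SIZE = 32  # length (4) + Status ID (4) + reserved (24)
STATUS_DESC_SIZE = 32    # Capability ID (4) + status specific data (28)


def parse_status(buf: bytes) -> tuple[int, dict[int, bytes]]:
    """Return (Status ID, {Capability ID: status specific data})."""
    total_len, status_id = struct.unpack_from("<II", buf, 0)
    body = total_len - STATUS_HEADER_SIZE
    if body < 0 or body % STATUS_DESC_SIZE or total_len > len(buf):
        raise ValueError("inconsistent status length")
    descriptors = {}
    for off in range(STATUS_HEADER_SIZE, total_len, STATUS_DESC_SIZE):
        (cap_id,) = struct.unpack_from("<I", buf, off)
        descriptors[cap_id] = bytes(buf[off + 4:off + STATUS_DESC_SIZE])
    return status_id, descriptors
```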

Global Accelerator Control (0xC0)

The Global Accelerator Control command is used to enable or disable the accelerator subsystem.

The CDW12 field contains the operation identifier, which takes one of the following values:

  • 0x00 - Enable accelerator subsystem

  • 0x01 - Disable accelerator subsystem
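To make the command encoding concrete, the sketch below packs a 64-byte submission queue entry for Global Accelerator Control. The entry layout (CDW0 with the opcode in bits 7:0 and the command identifier in bits 31:16, NSID, two reserved dwords, MPTR, DPTR as PRP1/PRP2, then CDW10 to CDW15) is the standard NVMe one. The helper names are hypothetical, and a real host would normally hand these fields to the operating system's passthrough interface rather than build the entry by hand.

```python
import struct

GLOBAL_ACCEL_CONTROL = 0xC0
OP_ENABLE = 0x00
OP_DISABLE = 0x01


def build_sqe(opcode, cid=0, nsid=0, prp1=0, prp2=0, cdw10_15=(0, 0, 0, 0, 0, 0)):
    """Pack a 64-byte little-endian NVMe submission queue entry (PRP mode)."""
    if len(cdw10_15) != 6:
        raise ValueError("expected exactly six dwords for CDW10..CDW15")
    cdw0 = (opcode & 0xFF) | ((cid & 0xFFFF) << 16)
    # CDW0, NSID, CDW2, CDW3 | MPTR, PRP1, PRP2 | CDW10..CDW15
    return struct.pack("<4I3Q6I", cdw0, nsid, 0, 0, 0, prp1, prp2, *cdw10_15)


def global_control_sqe(operation, cid=0):
    """Global Accelerator Control: the operation ID goes in CDW12."""
    if operation not in (OP_ENABLE, OP_DISABLE):
        raise ValueError("unknown operation ID")
    return build_sqe(GLOBAL_ACCEL_CONTROL, cid=cid, cdw10_15=(0, 0, operation, 0, 0, 0))
```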

I/O commands

I/O commands are used to transfer data to and from the accelerators.

Bits 15:0 of the CDW12 field contain the Accelerator ID.

Send data to accelerator (0x81)

This command fills the accelerator input buffer with data from host memory.

The CDW10 field contains the number of dwords to transfer.

Read data from accelerator (0x82)

This command copies the accelerator output buffer to host memory.

The CDW10 field contains the number of dwords to transfer.

Send Firmware to accelerator (0x85)

This command uploads firmware from host memory to the firmware buffer of the selected accelerator.

The CDW10 field contains the number of dwords to transfer. The CDW13 field contains the Firmware ID.

Read Firmware from accelerator (0x86)

This command downloads the firmware with the selected ID from the accelerator firmware buffer to the host.

The CDW10 field contains the number of dwords to transfer. The CDW13 field contains the Firmware ID.
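The four host-memory transfer commands share the same field usage, which the sketch below collects in one helper. It assumes CDW10 holds a plain (not 0's-based) dword count, as the text states, and that a byte length which is not a multiple of 4 is rounded up by the caller's buffer. The helper and key names are hypothetical.

```python
TRANSFER_OPCODES = {
    "send_data": 0x81,
    "read_data": 0x82,
    "send_firmware": 0x85,
    "read_firmware": 0x86,
}


def transfer_fields(kind, accel_id, length_bytes, firmware_id=0):
    """Return the opcode and command dwords for a host-memory transfer command."""
    opcode = TRANSFER_OPCODES[kind]
    if not 0 <= accel_id <= 0xFFFF:
        raise ValueError("Accelerator ID must fit in 16 bits")
    if length_bytes <= 0:
        raise ValueError("length must be positive")
    is_firmware = kind.endswith("firmware")
    if firmware_id and not is_firmware:
        raise ValueError("Firmware ID only applies to firmware commands")
    return {
        "opcode": opcode,
        "cdw10": (length_bytes + 3) // 4,  # number of dwords, rounded up
        "cdw12": accel_id,                 # Accelerator ID in bits 15:0
        "cdw13": firmware_id if is_firmware else 0,
    }
```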

Use local storage as accelerator input (0x88)

This command fills the accelerator input buffer with data from local storage.

This command reuses the CDW10 to CDW13 field layout of the standard Read command, relocated to CDW12 to CDW15. CDW14 and CDW15 of the original Read command are not used.

Use local storage as accelerator output (0x8c)

This command copies the accelerator output buffer to local storage.

This command reuses the CDW10 to CDW13 field layout of the standard Write command, relocated to CDW12 to CDW15. CDW14 and CDW15 of the original Write command are not used.
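The relocation can be pictured as shifting each dword index by two. The sketch below starts from the standard Read/Write layout (starting LBA split across CDW10 and CDW11, 0's-based block count in CDW12 bits 15:0, dataset management hints in CDW13) and moves it to CDW12 to CDW15. The text places the Accelerator ID in CDW12 bits 15:0 for I/O commands and does not say how that combines with the relocated fields, so the sketch leaves the Accelerator ID aside. The function name is hypothetical.

```python
def relocated_rw_fields(slba: int, nblocks: int, dsm: int = 0) -> dict[str, int]:
    """Map the standard Read/Write CDW10..CDW13 contents onto CDW12..CDW15."""
    if not 1 <= nblocks <= 0x10000:
        raise ValueError("block count must be between 1 and 65536")
    standard = {
        10: slba & 0xFFFFFFFF,          # starting LBA, lower 32 bits
        11: (slba >> 32) & 0xFFFFFFFF,  # starting LBA, upper 32 bits
        12: nblocks - 1,                # 0's-based number of logical blocks
        13: dsm,                        # dataset management hints
    }
    # Relocation: standard CDWn becomes CDW(n + 2).
    return {f"cdw{index + 2}": value for index, value in standard.items()}
```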

Basic Accelerator Control (0x91)

The Basic Accelerator Control command is used to control the selected accelerator.

The CDW13 field holds the operation identifier, which takes one of the following values:

  • 0x00 - Reset accelerator - revert the accelerator to a blank stopped state.

  • 0x01 - Start accelerator - verify the accelerator configuration (i.e. data buffers, eBPF app) and start processing.

  • 0x02 - Stop accelerator - stop processing without modifying the configuration.

  • 0x03 - Set active firmware - select the firmware which will be

More operations are to be defined.

The result of an operation can be retrieved with the Get Accelerator Status command.

The Start accelerator operation uses additional fields.

The DPTR field contains the location of the Argument List in host memory. The CDW10 field contains the Argument List length in dwords. The CDW14 field contains the Firmware ID.
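The field usage for Basic Accelerator Control can be summarized in two small helpers, one for the plain operations and one for Start accelerator with its additional fields. This is a sketch with hypothetical names. It only models the fields the text defines and leaves every other dword at zero.

```python
BASIC_ACCEL_CONTROL = 0x91
OP_RESET = 0x00
OP_START = 0x01
OP_STOP = 0x02
OP_SET_ACTIVE_FIRMWARE = 0x03


def control_fields(accel_id: int, operation: int) -> dict[str, int]:
    """Fields common to every Basic Accelerator Control operation."""
    if not 0 <= accel_id <= 0xFFFF:
        raise ValueError("Accelerator ID must fit in 16 bits")
    return {
        "opcode": BASIC_ACCEL_CONTROL,
        "cdw10": 0,
        "cdw12": accel_id,   # Accelerator ID in bits 15:0
        "cdw13": operation,  # operation identifier
        "cdw14": 0,
    }


def start_fields(accel_id: int, firmware_id: int, arg_list_dwords: int) -> dict[str, int]:
    """Start accelerator: adds the Argument List length and the Firmware ID."""
    if arg_list_dwords < 0:
        raise ValueError("Argument List length cannot be negative")
    fields = control_fields(accel_id, OP_START)
    fields["cdw10"] = arg_list_dwords  # Argument List length in dwords
    fields["cdw14"] = firmware_id      # firmware slot used by the accelerator
    return fields
```

The Argument List itself is not part of these fields. Its host memory address travels in DPTR.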

The Argument List contains zero or more concatenated entries. The table below depicts an Argument List entry.

Table 10 Argument List entry

  Bytes   Description
  0:3     Argument entry length in bytes
  4:N     Argument entry value
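Packing an Argument List is a matter of concatenating length-prefixed entries. The sketch below makes two assumptions the text leaves open: the entry length counts its own 4-byte header, and every value is already a multiple of 4 bytes, so that the total length is a whole number of dwords without inventing a padding rule. The function name is hypothetical.

```python
import struct


def pack_argument_list(values: list[bytes]) -> tuple[bytes, int]:
    """Return (packed Argument List, its length in dwords for CDW10)."""
    out = bytearray()
    for value in values:
        if len(value) % 4:
            raise ValueError("argument values must be a multiple of 4 bytes")
        # Assumption: the entry length includes the 4-byte length field itself.
        out += struct.pack("<I", 4 + len(value))
        out += value
    return bytes(out), len(out) // 4
```

An empty list yields an empty buffer and a dword count of zero, which matches the "zero or more entries" wording.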

Example command flow

An example command flow that utilizes NVMe command set extensions is shown below.

  1. Get basic information about the system:

    1. Send the Accelerator Identify command with the length set to 8 bytes.

      1. Verify that the first 4 bytes of the response match the magic value.

      2. Use the remaining 4 bytes as the Identify structure length.

    2. Send the Accelerator Identify command again using the length obtained earlier.

      1. Process the list of accelerators.

      2. Process the list of capabilities for each accelerator.

  2. Enable accelerator subsystem

    1. Send the Global Accelerator Control command with the operation ID set to Enable accelerator subsystem.

  3. Load firmware to the accelerator (applicable only to accelerators supporting firmware reloading)

    1. Check accelerator capabilities to verify that firmware loading is supported. The host software should cache the capabilities so that it does not have to read them each time.

    2. Send the Send Firmware to accelerator command.

      1. Use the accelerator ID field to select which accelerator will receive the firmware.

      2. Set the firmware ID to select firmware slot.

  4. Send data to accelerator input buffer

    1. Check accelerator capabilities to get the input buffer size and use that as the upper limit on input data size. The host software should cache the capabilities so that it does not have to read them each time.

    2. Use the accelerator ID field to select which accelerator will receive the data.

    3. Send the data using either

      • Send data to accelerator, or

      • Use local storage as accelerator input

  5. Start processing

    1. Send the Basic Accelerator Control command.

      1. Use the accelerator ID to select which accelerator should start.

      2. Set operation ID to Start accelerator.

      3. Set the Firmware ID field to select firmware slot that should be used by the accelerator.

      4. Pack all the firmware arguments into the Argument List.

      5. Set correct list length and use DPTR to point to list location in memory.

  6. Monitor accelerator status

    1. Send the Get Accelerator Status command with the length set to 4 bytes and RAE=1.

      1. Use the Accelerator ID field to select the target accelerator.

      2. Use the returned value as the status structure length.

    2. Send the Get Accelerator Status command again using the retrieved length and RAE=0.

      1. Check the Status ID field to see if the accelerator is still processing, is stopped or if an error has occurred.

      2. Use status descriptors to check capability-specific status information, e.g. the amount of output data produced for output data buffer capability.

  7. Retrieve accelerator logs

    1. Send the Get Log Page command with Log Page Identifier = 0xC0, RAE=1 and the length set to 4 bytes.

      1. Use the Log Specific Identifier field to provide target accelerator ID.

      2. Use the returned value as the log page length.

    2. Send the Get Log Page command again using the retrieved length and RAE=0.

      1. Process the log entry list.

  8. Retrieve data from the accelerator output buffer

    1. Use the output data length obtained from Get Accelerator Status to see how much output data was produced.

    2. Transfer the data using either:

      • Read data from accelerator, or

      • Use local storage as accelerator output
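Several steps of this flow follow the same pattern: read a short header first, then read the whole structure with the length it reports. The sketch below shows that pattern for Accelerator Identify and Get Accelerator Status. The transport is a hypothetical callable `send_admin(opcode, cdw12, length)` that returns `length` bytes, because the chapter does not define how the transfer length is passed to the device. `FakeDevice` is a stand-in used only to exercise the logic, and little-endian length fields are assumed.

```python
import struct

MAGIC = b"WDC0"
ACCEL_IDENTIFY = 0xC2
GET_ACCEL_STATUS = 0xC6
RAE = 1 << 31  # Retain Asynchronous Event flag in CDW12


def read_identify(send_admin) -> bytes:
    """Step 1: read 8 bytes, check the magic value, then read the full structure."""
    head = send_admin(ACCEL_IDENTIFY, 0, 8)
    if head[:4] != MAGIC:
        raise RuntimeError("device is not an NVMe accelerator device")
    (length,) = struct.unpack_from("<I", head, 4)
    return send_admin(ACCEL_IDENTIFY, 0, length)


def read_status(send_admin, accel_id: int) -> bytes:
    """Step 6: read the length with RAE=1, then the full buffer with RAE=0."""
    head = send_admin(GET_ACCEL_STATUS, accel_id | RAE, 4)
    (length,) = struct.unpack_from("<I", head, 0)
    return send_admin(GET_ACCEL_STATUS, accel_id, length)


class FakeDevice:
    """Hypothetical in-memory device that returns canned buffers."""

    def __init__(self, identify: bytes, status: bytes):
        self.buffers = {ACCEL_IDENTIFY: identify, GET_ACCEL_STATUS: status}
        self.calls = []

    def send_admin(self, opcode: int, cdw12: int, length: int) -> bytes:
        self.calls.append((opcode, cdw12, length))
        return self.buffers[opcode][:length]
```

Passing the transport in as a callable keeps the flow logic testable without hardware. A real host would substitute a function that issues the command through its NVMe driver.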