v-extensions-riscv

Commands used

Note

This section was generated using:

kenning optimize test \
    --json-cfg \
        kenning-scenarios/renode-magic-wand-iree-bare-metal-inference.json \
    --measurements \
        ./results.json \
    --verbosity \
        INFO

kenning report \
    --report-types \
        performance \
        classification \
        renode_stats \
    --report-path \
        springbok-magic-wand/report.md \
    --report-name \
        v-extensions-riscv \
    --model-names \
        magic_wand_fp32 \
    --measurements \
        results.json \
    --verbosity \
        INFO \
    --to-html \
        report-html

General information for magic_wand_fp32

Model framework:

tensorflow ver. 2.14.1

Input JSON:

{
    "dataset": {
        "type": "kenning.datasets.magic_wand_dataset.MagicWandDataset",
        "parameters": {
            "window_size": 128,
            "window_shift": 128,
            "noise_level": 20,
            "dataset_root": "build/MagicWandDataset",
            "inference_batch_size": 1,
            "download_dataset": true,
            "force_download_dataset": false,
            "external_calibration_dataset": null,
            "split_fraction_test": 0.2,
            "split_fraction_val": null,
            "split_seed": 1234,
            "reduce_dataset": 1.0
        }
    },
    "dataconverter": {
        "type": "kenning.dataconverters.modelwrapper_dataconverter.ModelWrapperDataConverter",
        "parameters": {}
    },
    "optimizers": [
        {
            "type": "kenning.optimizers.iree.IREECompiler",
            "parameters": {
                "model_framework": "keras",
                "backend": "llvm-cpu",
                "compiler_args": [
                    "iree-llvm-debug-symbols=false",
                    "iree-vm-bytecode-module-strip-source-map=true",
                    "iree-vm-emit-polyglot-zip=false",
                    "iree-llvm-target-triple=riscv32-pc-linux-elf",
                    "iree-llvm-target-cpu=generic-rv32",
                    "iree-llvm-target-cpu-features=+m,+f,+zvl512b,+zve32x,+zve32f",
                    "iree-llvm-target-abi=ilp32"
                ],
                "compiled_model_path": "./build/tflite-magic-wand.vmfb",
                "location": "host"
            }
        }
    ],
    "platform": {
        "type": "kenning.platforms.bare_metal.BareMetalPlatform",
        "parameters": {
            "uart_port": "/tmp/uart",
            "uart_baudrate": 115200,
            "uart_log_port": "/tmp/renode_uart_pbasyoaz/uart_log",
            "uart_log_baudrate": 115200,
            "auto_flash": false,
            "openocd_path": "openocd",
            "simulated": true,
            "runtime_binary_path": "build/build-riscv/iree-runtime/iree_runtime",
            "platform_resc_path": "gh://antmicro:kenning-bare-metal-iree-runtime/sim/config/springbok.resc;branch=main",
            "resc_dependencies": [
                "sim/config/platforms/springbok.repl",
                "third-party/iree-rv32-springbok/sim/config/infrastructure/SpringbokRiscV32.cs"
            ],
            "post_start_commands": [
                "sysbus.vec_controlblock WriteDoubleWord 0xc 0"
            ],
            "disable_opcode_counters": false,
            "disable_profiler": false,
            "profiler_dump_path": "build/profiler.dump",
            "profiler_interval_step": 10.0,
            "runtime_init_log_msg": "Inference server started",
            "runtime_init_timeout": 30,
            "name": "rv32-springbok"
        }
    },
    "protocol": {
        "type": "kenning.protocols.uart.UARTProtocol",
        "parameters": {
            "port": "/tmp/uart",
            "baudrate": 115200,
            "packet_size": 4096,
            "endianness": "little",
            "timeout": 30
        }
    },
    "model_wrapper": {
        "type": "kenning.modelwrappers.classification.tflite_magic_wand.MagicWandModelWrapper",
        "parameters": {
            "window_size": 128,
            "model_path": "kenning:///models/classification/magic_wand.h5",
            "model_name": null
        }
    },
    "runtime": {
        "type": "kenning.runtimes.iree.IREERuntime",
        "parameters": {
            "save_model_path": "./build/tflite-magic-wand.vmfb",
            "driver": "local-sync",
            "llext_binary_path": null,
            "disable_performance_measurements": false
        }
    }
}
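The `iree-llvm-target-cpu-features` entry in the configuration above determines which RISC-V extensions the compiled model may use. The following is a minimal sketch, not part of Kenning, that splits that string and derives the minimum vector register length implied by the `zvl512b` feature. The helper names `parse_features` and `min_vlen_bits` are hypothetical; the lane count assumes 32-bit elements and a register group of one register (LMUL = 1).

```python
import re

# Feature string copied from the "compiler_args" list in the input JSON.
FEATURES = "+m,+f,+zvl512b,+zve32x,+zve32f"


def parse_features(feature_string):
    """Return the names of features enabled with a leading '+' (hypothetical helper)."""
    return [item[1:] for item in feature_string.split(",") if item.startswith("+")]


def min_vlen_bits(enabled_features):
    """Return the largest minimum VLEN (in bits) requested by a zvl<N>b feature, or None."""
    widths = []
    for name in enabled_features:
        match = re.fullmatch(r"zvl(\d+)b", name)
        if match:
            widths.append(int(match.group(1)))
    return max(widths) if widths else None


enabled = parse_features(FEATURES)
vlen = min_vlen_bits(enabled)

# With 32-bit elements and LMUL = 1, one vector register holds vlen / 32 elements.
lanes_fp32 = vlen // 32

print(enabled)
print(f"minimum VLEN: {vlen} bits, 32-bit elements per register: {lanes_fp32}")
```

Here `zve32x` and `zve32f` enable the embedded vector subsets for integer and single-precision floating-point elements of up to 32 bits, which matches the `vle32.v` and `vfmadd.vf` instructions listed later in the report.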

Inference performance metrics for magic_wand_fp32

Inference time

Figure 1 Inference time

First inference duration (usually including allocation time): 0.001847460000000023 s,
Mean: 0.0018692477142857266 s,
Standard deviation: 1.848366028774941e-05 s,
Median: 0.001874684999999987 s.
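The summary statistics above can be reproduced from a list of per-inference durations. The sketch below is illustrative only: the `sample_durations` values are hypothetical and are not read from `results.json`, and the use of the population standard deviation is an assumption about how the report computes it. The last lines derive an approximate throughput from the mean reported above.

```python
import statistics


def summarize_durations(durations):
    """Summarize per-inference durations given in seconds (hypothetical helper)."""
    return {
        "first": durations[0],
        "mean": statistics.fmean(durations),
        # Population standard deviation; the report may use a different estimator.
        "std": statistics.pstdev(durations),
        "median": statistics.median(durations),
    }


# Hypothetical durations in seconds, chosen only to resemble the reported scale.
sample_durations = [0.00185, 0.00187, 0.00188, 0.00186]
summary = summarize_durations(sample_durations)
print(summary)

# Mean inference time copied from the report.
reported_mean_s = 0.0018692477142857266

# Inferences per second if passes ran back to back with no other overhead.
throughput = 1.0 / reported_mean_s
print(f"approximate throughput: {throughput:.1f} inferences/s")
```

The throughput figure ignores UART transfer time and any other overhead between passes, so it is an upper bound on what the simulated platform would sustain.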

Inference quality metrics for magic_wand_fp32

Figure 2 Confusion matrix

Accuracy: 1.0
Mean precision: 0.9999999996266236
Mean sensitivity: 0.9999999996266236
G-mean: 0.9999999996266236
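The quality metrics above are all derived from the confusion matrix. The following sketch shows one common way to compute them, assuming rows are true classes and columns are predicted classes, and taking G-mean as the geometric mean of the per-class sensitivities. The function name and the example matrix are hypothetical; the exact formulas used by the report (for example, how empty classes are handled) may differ.

```python
import math


def classification_metrics(cm):
    """Compute summary metrics from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    Assumes every class has at least one true and one predicted sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    row_sums = [sum(row) for row in cm]                                # samples per true class
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]    # samples per predicted class
    precision = [cm[i][i] / col_sums[i] for i in range(n)]
    sensitivity = [cm[i][i] / row_sums[i] for i in range(n)]
    return {
        "accuracy": correct / total,
        "mean_precision": sum(precision) / n,
        "mean_sensitivity": sum(sensitivity) / n,
        "g_mean": math.prod(sensitivity) ** (1.0 / n),
    }


# Hypothetical 4-class matrix with no misclassifications (not the report's data).
example_cm = [
    [10, 0, 0, 0],
    [0, 12, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 11],
]
metrics = classification_metrics(example_cm)
print(metrics)
```

With a purely diagonal matrix, as in this example, every metric evaluates to 1, which is consistent with the near-perfect values reported above.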

Renode performance measurements for magic_wand_fp32

Count of instructions used during inference

Figure 3 Histogram of used instructions during inference

Figure 4 Histogram of V Vector Extension instructions during inference

Executed instructions counters

Figure 5 Count of executed instructions per second for cpu during benchmark

Figure 6 Cumulative count of executed instructions for cpu during benchmark

Peripheral access counters

Figure 7 Count of uart0 reads per second during benchmark

Figure 8 Cumulative count of uart0 reads during benchmark

Figure 9 Count of uart0 writes per second during benchmark

Figure 10 Cumulative count of uart0 writes during benchmark

Instructions stats

Instruction count per inference pass: 242006
V Vector Extension instructions percentage: 13.529403187953761 %
Top 10 instructions and their counts per inference pass:
- addi: 44757
- sw: 28597
- lw: 28135
- vle32.v: 13136
- bne: 13059
- flw: 12278
- bltu: 10609
- vfmadd.vf: 9552
- beq: 9339
- lbu: 9121
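The figures above can be combined to estimate how many vector instructions run per inference pass and how much of the total the top 10 instructions account for. The sketch below uses only numbers copied from this section. Treating every mnemonic that starts with `v` as a vector instruction is a heuristic that holds for this list but is an assumption, not the report's own classification.

```python
# Values copied from the "Instructions stats" section.
total_instructions = 242006
vector_share_percent = 13.529403187953761

top10 = {
    "addi": 44757,
    "sw": 28597,
    "lw": 28135,
    "vle32.v": 13136,
    "bne": 13059,
    "flw": 12278,
    "bltu": 10609,
    "vfmadd.vf": 9552,
    "beq": 9339,
    "lbu": 9121,
}

# Approximate number of vector instructions per inference pass.
vector_count = total_instructions * vector_share_percent / 100

# Fraction of all executed instructions covered by the top 10 mnemonics.
top10_share = sum(top10.values()) / total_instructions

# Vector instructions that appear in the top 10 (heuristic: mnemonic starts with "v").
vector_in_top10 = sum(count for name, count in top10.items() if name.startswith("v"))

print(f"vector instructions per pass: about {vector_count:.0f}")
print(f"top 10 share of all instructions: {top10_share:.1%}")
print(f"vector instructions within the top 10: {vector_in_top10}")
```

The two vector mnemonics in the top 10, a unit-stride 32-bit load and a fused multiply-add with a scalar operand, make up most of the estimated vector instruction count, while scalar address arithmetic and loads/stores (`addi`, `sw`, `lw`) still dominate the overall mix.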

Memory allocation stats

Host bytes peak: 1536
Host bytes allocated: 215040
Host bytes freed: 213504
Device bytes peak: 33536
Device bytes allocated: 2329536
Device bytes freed: 2312752
Compiled model size: 26944

Host memory is the memory of the CPU that controls the accelerator, while device memory is the memory of the accelerator itself.
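As a quick check on the allocation counters, the difference between bytes allocated and bytes freed gives the number of bytes still held when the statistics were collected. The sketch below does that arithmetic with the values from this section; it assumes the allocated and freed counters are cumulative totals, which the report does not state explicitly.

```python
# Values copied from the "Memory allocation stats" section (all in bytes).
memory_stats = {
    "host": {"peak": 1536, "allocated": 215040, "freed": 213504},
    "device": {"peak": 33536, "allocated": 2329536, "freed": 2312752},
}

# Bytes still allocated at collection time, assuming cumulative counters.
outstanding = {
    name: stats["allocated"] - stats["freed"]
    for name, stats in memory_stats.items()
}

for name, stats in memory_stats.items():
    print(
        f"{name}: peak {stats['peak']} B, "
        f"still allocated {outstanding[name]} B"
    )
```

Under that assumption, the host side still holds exactly its peak of 1536 bytes, and the device side holds about half of its 33536-byte peak.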

Last update: 2025-04-15