v-extensions-riscv

Commands used

Note

This section was generated using:

kenning optimize test \
    --json-cfg \
        kenning-scenarios/renode-magic-wand-iree-bare-metal-inference.json \
    --measurements \
        ./results.json \
    --verbosity \
        INFO

kenning report \
    --report-types \
        performance \
        classification \
        renode_stats \
    --report-path \
        springbok-magic-wand/report.md \
    --report-name \
        v-extensions-riscv \
    --model-names \
        magic_wand_fp32 \
    --measurements \
        results.json \
    --verbosity \
        INFO \
    --to-html \
        report-html

General information for magic_wand_fp32

Model framework:

  • tensorflow ver. 2.14.1

Input JSON:

{
    "dataset": {
        "type": "kenning.datasets.magic_wand_dataset.MagicWandDataset",
        "parameters": {
            "window_size": 128,
            "window_shift": 128,
            "noise_level": 20,
            "dataset_root": "build/MagicWandDataset",
            "inference_batch_size": 1,
            "download_dataset": true,
            "force_download_dataset": false,
            "external_calibration_dataset": null,
            "split_fraction_test": 0.2,
            "split_fraction_val": null,
            "split_seed": 1234,
            "reduce_dataset": 1.0
        }
    },
    "dataconverter": {
        "type": "kenning.dataconverters.modelwrapper_dataconverter.ModelWrapperDataConverter",
        "parameters": {}
    },
    "optimizers": [
        {
            "type": "kenning.optimizers.iree.IREECompiler",
            "parameters": {
                "model_framework": "keras",
                "backend": "llvm-cpu",
                "compiler_args": [
                    "iree-llvm-debug-symbols=false",
                    "iree-vm-bytecode-module-strip-source-map=true",
                    "iree-vm-emit-polyglot-zip=false",
                    "iree-llvm-target-triple=riscv32-pc-linux-elf",
                    "iree-llvm-target-cpu=generic-rv32",
                    "iree-llvm-target-cpu-features=+m,+f,+zvl512b,+zve32x,+zve32f",
                    "iree-llvm-target-abi=ilp32"
                ],
                "compiled_model_path": "./build/tflite-magic-wand.vmfb",
                "location": "host"
            }
        }
    ],
    "platform": {
        "type": "kenning.platforms.bare_metal.BareMetalPlatform",
        "parameters": {
            "uart_port": "/tmp/uart",
            "uart_baudrate": 115200,
            "uart_log_port": "/tmp/renode_uart_vhb12i1s/uart_log",
            "uart_log_baudrate": 115200,
            "auto_flash": false,
            "openocd_path": "openocd",
            "simulated": true,
            "runtime_binary_path": "build/build-riscv/iree-runtime/iree_runtime",
            "platform_resc_path": "gh://antmicro:kenning-bare-metal-iree-runtime/sim/config/springbok.resc;branch=main",
            "resc_dependencies": [
                "sim/config/platforms/springbok.repl",
                "third-party/iree-rv32-springbok/sim/config/infrastructure/SpringbokRiscV32.cs"
            ],
            "post_start_commands": [
                "sysbus.vec_controlblock WriteDoubleWord 0xc 0"
            ],
            "disable_opcode_counters": false,
            "disable_profiler": false,
            "profiler_dump_path": "build/profiler.dump",
            "profiler_interval_step": 10.0,
            "runtime_init_log_msg": "Inference server started",
            "runtime_init_timeout": 30,
            "name": "rv32-springbok"
        }
    },
    "protocol": {
        "type": "kenning.protocols.uart.UARTProtocol",
        "parameters": {
            "port": "/tmp/uart",
            "baudrate": 115200,
            "packet_size": 4096,
            "endianness": "little",
            "timeout": 30
        }
    },
    "model_wrapper": {
        "type": "kenning.modelwrappers.classification.tflite_magic_wand.MagicWandModelWrapper",
        "parameters": {
            "window_size": 128,
            "model_path": "kenning:///models/classification/magic_wand.h5",
            "model_name": null
        }
    },
    "runtime": {
        "type": "kenning.runtimes.iree.IREERuntime",
        "parameters": {
            "save_model_path": "./build/tflite-magic-wand.vmfb",
            "driver": "local-sync",
            "llext_binary_path": null,
            "disable_performance_measurements": false
        }
    }
}
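Several fields in the scenario above have to agree across blocks: the platform and protocol share a UART port and baud rate, the dataset and model wrapper share a window size, and the optimizer writes the file that the runtime loads. The following is a minimal sketch of such a cross-check, not part of Kenning itself; the helper name `check_scenario` is hypothetical, and the dictionary is a trimmed copy of the JSON above.

```python
# Trimmed copy of the scenario above: only the fields being cross-checked.
scenario = {
    "dataset": {"parameters": {"window_size": 128}},
    "optimizers": [
        {"parameters": {"compiled_model_path": "./build/tflite-magic-wand.vmfb"}}
    ],
    "platform": {"parameters": {"uart_port": "/tmp/uart", "uart_baudrate": 115200}},
    "protocol": {"parameters": {"port": "/tmp/uart", "baudrate": 115200}},
    "model_wrapper": {"parameters": {"window_size": 128}},
    "runtime": {"parameters": {"save_model_path": "./build/tflite-magic-wand.vmfb"}},
}


def check_scenario(cfg):
    """Hypothetical helper: return a list of mismatches (empty if consistent)."""
    platform = cfg["platform"]["parameters"]
    protocol = cfg["protocol"]["parameters"]
    # Each entry: (description, left value, right value).
    pairs = [
        ("UART port", platform["uart_port"], protocol["port"]),
        ("UART baud rate", platform["uart_baudrate"], protocol["baudrate"]),
        (
            "window size",
            cfg["dataset"]["parameters"]["window_size"],
            cfg["model_wrapper"]["parameters"]["window_size"],
        ),
        (
            "compiled model path",
            cfg["optimizers"][-1]["parameters"]["compiled_model_path"],
            cfg["runtime"]["parameters"]["save_model_path"],
        ),
    ]
    return [f"{name}: {a!r} != {b!r}" for name, a, b in pairs if a != b]


problems = check_scenario(scenario)
print(problems or "scenario fields are consistent")
```

The check reads the last optimizer's output path because only one optimizer is listed here; a scenario with several optimizers may need a different rule.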

Inference performance metrics for magic_wand_fp32

Inference time

Figure 1 Inference time

  • First inference duration (usually including allocation time): 0.001866450000000075 s,

  • Mean: 0.001864803642857146 s,

  • Standard deviation: 1.710471485588794e-05 s,

  • Median: 0.0018686549999999968 s.
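Two derived figures make these timings easier to compare across runs: the throughput implied by the mean, and the relative spread (coefficient of variation). The sketch below only does arithmetic on the values listed above; it assumes back-to-back inferences and ignores UART transfer time, which the report does not break out.

```python
# Values copied from the list above (seconds).
mean_s = 0.001864803642857146
std_s = 1.710471485588794e-05
median_s = 0.0018686549999999968

# Inferences per second if inference passes ran back to back (assumption).
throughput = 1.0 / mean_s

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv_percent = 100.0 * std_s / mean_s

# Relative gap between median and mean; a small value suggests a symmetric
# distribution without heavy outliers.
median_gap_percent = 100.0 * (median_s - mean_s) / mean_s

print(f"throughput: {throughput:.1f} inferences/s")
print(f"coefficient of variation: {cv_percent:.2f} %")
print(f"median vs. mean: {median_gap_percent:+.2f} %")
```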

Inference quality metrics for magic_wand_fp32

Figure 2 Confusion matrix

  • Accuracy: 0.9928571428571429

  • Mean precision: 0.9947916662942757

  • Mean sensitivity: 0.9945652170191094

  • G-mean: 0.9945203413480738
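The four metrics above can all be derived from a confusion matrix like the one in Figure 2. The sketch below uses the common definitions (macro-averaged precision and sensitivity, and G-mean as the geometric mean of per-class sensitivities); that these match Kenning's implementation exactly is an assumption, and the function name `classification_metrics` and the 3-class matrix are hypothetical illustrations, not the data behind this report.

```python
import math


def classification_metrics(cm):
    """Compute summary metrics from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    Assumes every class has at least one sample and one prediction.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    # Sensitivity (recall): correct predictions over all samples of the true class.
    sensitivity = [cm[i][i] / sum(cm[i]) for i in range(n)]
    # Precision: correct predictions over everything predicted as that class.
    precision = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]
    return {
        "accuracy": correct / total,
        "mean_precision": sum(precision) / n,
        "mean_sensitivity": sum(sensitivity) / n,
        "g_mean": math.prod(sensitivity) ** (1.0 / n),
    }


# Hypothetical 3-class matrix with one misclassified sample; NOT Figure 2's data.
toy_cm = [
    [9, 1, 0],
    [0, 10, 0],
    [0, 0, 10],
]
metrics = classification_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For orientation, the reported accuracy equals 139/140 to the printed precision, which is consistent with a single misclassified window, although the report does not state the test-set size.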

Renode performance measurements for magic_wand_fp32

Count of instructions used during inference

Figure 3 Histogram of instructions used during inference

Figure 4 Histogram of V Vector Extension instructions during inference

Executed instructions counters

Figure 5 Count of executed instructions per second for cpu during benchmark

Figure 6 Cumulative count of executed instructions for cpu during benchmark

Peripheral access counters

Figure 7 Count of uart0 reads per second during benchmark

Figure 8 Cumulative count of uart0 reads during benchmark

Figure 9 Count of uart0 writes per second during benchmark

Figure 10 Cumulative count of uart0 writes during benchmark

Instructions stats

  • Instruction count per inference pass: 241480

  • V Vector Extension instructions percentage: 13.558857687523679 %

  • Top 10 instructions by count per inference pass:

    • addi: 44757

    • sw: 28597

    • lw: 28077

    • vle32.v: 13136

    • bne: 13000

    • flw: 12278

    • bltu: 10492

    • vfmadd.vf: 9552

    • beq: 9281

    • lbu: 9062
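The counters above can be combined to see how much of each inference pass the top 10 instructions cover, and how many of the vector instructions fall inside that list. This is a sketch using only the numbers reported here; identifying vector instructions by a leading `v` in the mnemonic is a heuristic that happens to hold for this list, not how the report itself classifies them.

```python
# Values copied from the instruction stats above.
total_instructions = 241480
vector_percent = 13.558857687523679

top10 = {
    "addi": 44757,
    "sw": 28597,
    "lw": 28077,
    "vle32.v": 13136,
    "bne": 13000,
    "flw": 12278,
    "bltu": 10492,
    "vfmadd.vf": 9552,
    "beq": 9281,
    "lbu": 9062,
}

# Share of one inference pass covered by the ten most frequent instructions.
top10_total = sum(top10.values())
top10_share = 100.0 * top10_total / total_instructions

# Approximate number of vector instructions per pass, from the percentage.
vector_estimate = total_instructions * vector_percent / 100.0

# Vector instructions that appear in the top 10 (heuristic: mnemonic starts with "v").
vector_in_top10 = sum(n for name, n in top10.items() if name.startswith("v"))

print(f"top 10 cover {top10_total} instructions ({top10_share:.2f} %)")
print(f"vector instructions per pass: about {vector_estimate:.0f}")
print(f"of which in the top 10: {vector_in_top10}")
```

The vector estimate is not a whole number, which suggests the per-pass counters are averages over several passes rather than counts from a single pass.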

Memory allocation stats

  • Host bytes peak: 1536

  • Host bytes allocated: 215040

  • Host bytes freed: 213504

  • Device bytes peak: 33536

  • Device bytes allocated: 2329536

  • Device bytes freed: 2312752

  • Compiled model size: 26944

Host memory is the memory of the CPU that controls the accelerator; device memory is the memory of the accelerator itself.
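Subtracting freed bytes from allocated bytes gives the amount still held when the statistics were collected. The sketch below applies that arithmetic to the values above; reading the difference as "still allocated at collection time" is an interpretation, not something the report states.

```python
# Values copied from the memory allocation stats above (bytes).
memory_stats = {
    "host": {"peak": 1536, "allocated": 215040, "freed": 213504},
    "device": {"peak": 33536, "allocated": 2329536, "freed": 2312752},
}

# Bytes allocated but not freed when the statistics were collected.
outstanding = {
    name: stats["allocated"] - stats["freed"]
    for name, stats in memory_stats.items()
}

for name, stats in memory_stats.items():
    print(
        f"{name}: peak {stats['peak']} B, "
        f"outstanding {outstanding[name]} B"
    )
```

For the host, the outstanding amount equals the reported peak; for the device, it is roughly half of the peak.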


Last update: 2025-02-25