Using Kenning via command line arguments

Kenning provides several scripts for training, compilation and benchmarking of deep learning models on various target hardware. The executable scripts are present in the kenning.scenarios module. Sample bash scripts using the scenarios are located in the scripts directory in the repository.

The runnable scripts in scenarios rely on implementations of the classes from the kenning.core module to perform actions such as in-framework inference, model training, model compilation and model benchmarking on target hardware.

Command-line arguments for classes

Each class (Dataset, ModelWrapper, Optimizer and others) provided to the runnable scripts in scenarios can define command-line arguments that configure the behavior of an object of the given class.

Each class in kenning.core implements the form_argparse and from_argparse methods. The former creates an argparse group with the parameters of the given class. The latter takes the arguments parsed by argparse and returns an object of that class.
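
The pattern described above can be illustrated with a minimal sketch. ExampleDataset is a hypothetical stand-in and not a Kenning class, and the return value of form_argparse (a parser plus its group) is an assumption made for this illustration; the actual signatures in kenning.core may differ.

```python
import argparse


class ExampleDataset:
    """Hypothetical stand-in for a kenning.core-style class (not part of Kenning)."""

    def __init__(self, root: str, batch_size: int = 1):
        self.root = root
        self.batch_size = batch_size

    @classmethod
    def form_argparse(cls):
        # No -h here, so the parser can later be used as a parent
        # of the final parser built by a scenario script.
        parser = argparse.ArgumentParser(add_help=False)
        group = parser.add_argument_group(title="Dataset arguments")
        group.add_argument("--dataset-root", required=True,
                           help="Path to the dataset directory")
        group.add_argument("--inference-batch-size", type=int, default=1,
                           help="The batch size for providing the input data")
        return parser, group

    @classmethod
    def from_argparse(cls, args: argparse.Namespace):
        # Build the object from the already parsed arguments.
        return cls(args.dataset_root, args.inference_batch_size)


parser, _ = ExampleDataset.form_argparse()
args = parser.parse_args(["--dataset-root", "build/pet-dataset"])
dataset = ExampleDataset.from_argparse(args)
print(dataset.root, dataset.batch_size)
```

Keeping argument definition and object construction on the class itself means a scenario script does not need to know which options a particular implementation accepts.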

Model training

kenning.scenarios.model_training performs model training using Kenning’s ModelWrapper and Dataset objects. To get the list of training parameters, select the model and training dataset to use (e.g. the TensorFlowPetDatasetMobileNetV2 model and the PetDataset dataset) and run:

python -m kenning.scenarios.model_training \
    kenning.modelwrappers.classification.tensorflow_pet_dataset.TensorFlowPetDatasetMobileNetV2 \
    kenning.datasets.pet_dataset.PetDataset \
    -h

This will list the possible parameters that can be used to configure the dataset, the model, and the training parameters. For the above call, the output is as follows:

positional arguments:
  modelwrappercls       ModelWrapper-based class with inference implementation to import
  datasetcls            Dataset-based class with dataset to import

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        The batch size for training
  --learning-rate LEARNING_RATE
                        The learning rate for training
  --num-epochs NUM_EPOCHS
                        Number of epochs to train for
  --logdir LOGDIR       Path to the training logs directory
  --verbosity {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Verbosity level

Inference model arguments:
  --model-path MODEL_PATH
                        Path to the model

Dataset arguments:
  --dataset-root DATASET_ROOT
                        Path to the dataset directory
  --download-dataset    Downloads the dataset before taking any action
  --inference-batch-size INFERENCE_BATCH_SIZE
                        The batch size for providing the input data
  --classify-by {species,breeds}
                        Determines if classification should be performed by species or by breeds
  --image-memory-layout {NHWC,NCHW}
                        Determines if images should be delivered in NHWC or NCHW format

Note

The list of options depends on ModelWrapper and Dataset.
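
One way such class-dependent help can be produced is two-stage parsing: first read only the class path, then import the class and let it contribute its own options. The sketch below is an assumption about the general technique, not Kenning's actual code; load_class, build_parser, DemoDataset and the demo_plugins module are hypothetical names introduced for illustration.

```python
import argparse
import importlib
import sys
import types


def load_class(dotted_path: str):
    """Import `package.module.ClassName` and return the class object."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)


def build_parser(argv):
    # Stage 1: read only the positional class path; other options are ignored.
    base = argparse.ArgumentParser(add_help=False)
    base.add_argument("datasetcls", help="Dataset-based class with dataset to import")
    known, _ = base.parse_known_args(argv)

    # Stage 2: import the class and merge its argument group into the final parser.
    cls = load_class(known.datasetcls)
    class_parser, _ = cls.form_argparse()
    return argparse.ArgumentParser(parents=[base, class_parser]), cls


class DemoDataset:
    """Hypothetical class exposing a form_argparse-style method."""

    @classmethod
    def form_argparse(cls):
        parser = argparse.ArgumentParser(add_help=False)
        group = parser.add_argument_group(title="Dataset arguments")
        group.add_argument("--dataset-root", default="build/demo",
                           help="Path to the dataset directory")
        return parser, group


# Register the demo class under an importable name so the sketch is self-contained.
demo_module = types.ModuleType("demo_plugins")
demo_module.DemoDataset = DemoDataset
sys.modules["demo_plugins"] = demo_module

argv = ["demo_plugins.DemoDataset", "--dataset-root", "build/pet-dataset"]
parser, cls = build_parser(argv)
args = parser.parse_args(argv)
print(cls.__name__, args.dataset_root)
```

In this sketch the class path has to come before any class-specific options, because the first stage cannot yet tell which of the unknown options take values.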

With these options, a training run can be configured as follows:

python -m kenning.scenarios.model_training \
    kenning.modelwrappers.classification.tensorflow_pet_dataset.TensorFlowPetDatasetMobileNetV2 \
    kenning.datasets.pet_dataset.PetDataset \
    --logdir build/logs \
    --dataset-root build/pet-dataset \
    --model-path build/trained-model.h5 \
    --batch-size 32 \
    --learning-rate 0.0001 \
    --num-epochs 50

This will train the model with a 0.0001 learning rate and batch size 32 for 50 epochs. The trained model will be saved as build/trained-model.h5.

In-framework inference performance measurements

The kenning.scenarios.inference_performance script runs inference on a given model in the framework it was trained in. It requires you to provide:

  • a ModelWrapper-based object wrapping the model to be tested,

  • a Dataset-based object wrapping the dataset applicable to the model,

  • a path to the output JSON file with performance and quality metrics gathered during inference by the Measurements object.

An example call of the script is as follows:

python -m kenning.scenarios.inference_performance \
    kenning.modelwrappers.classification.tensorflow_pet_dataset.TensorFlowPetDatasetMobileNetV2 \
    kenning.datasets.pet_dataset.PetDataset \
    build/tensorflow_pet_dataset_mobilenetv2.json \
    --model-path kenning/resources/models/classification/tensorflow_pet_dataset_mobilenetv2.h5 \
    --dataset-root build/pet-dataset/ \
    --download-dataset

The script downloads the dataset to the build/pet-dataset directory, loads the tensorflow_pet_dataset_mobilenetv2.h5 model, runs inference on all images from the dataset and collects performance and quality metrics throughout the run. The data stored in the JSON file can later be rendered into a report, as described in Generating performance reports.
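
Before rendering a report, it can be handy to take a quick look at what the output file contains. The sketch below makes no assumption about the layout of Kenning's measurements beyond the top level being a JSON object; summarize_json is a hypothetical helper, and the keys in the demo file are placeholders rather than real metric names.

```python
import json
import tempfile
from pathlib import Path


def summarize_json(path):
    """Return a short description of every top-level entry of a JSON object."""
    data = json.loads(Path(path).read_text())
    summary = {}
    for key, value in data.items():
        if isinstance(value, list):
            summary[key] = f"list of {len(value)} items"
        elif isinstance(value, dict):
            summary[key] = f"object with {len(value)} keys"
        else:
            summary[key] = repr(value)
    return summary


# Demo on a throwaway file; the keys are placeholders, not Kenning's schema.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "measurements.json"
    sample_path.write_text(
        json.dumps({"placeholder_series": [1, 2, 3], "placeholder_value": 5})
    )
    summary = summarize_json(sample_path)
print(summary)
```

For a real run, the path would be the output file passed to the script, for example build/tensorflow_pet_dataset_mobilenetv2.json.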

ONNX conversion

kenning.scenarios.onnx_conversion empirically tests the ONNX conversion for various frameworks and generates a report containing a support matrix. The matrix shows whether model export to ONNX and model import from ONNX are supported for a given framework and model. An example report, along with the command used to generate it, is available in ONNX support in deep learning frameworks.

kenning.scenarios.onnx_conversion requires a list of ONNXConversion classes that implement model providers and a conversion method. For example, consider the following call:

python -m kenning.scenarios.onnx_conversion \
    build/models-directory \
    build/onnx-support.rst \
    --converters-list \
        kenning.onnxconverters.pytorch.PyTorchONNXConversion \
        kenning.onnxconverters.tensorflow.TensorFlowONNXConversion \
        kenning.onnxconverters.mxnet.MXNetONNXConversion

In this call, the conversion is tested for three frameworks: PyTorch, TensorFlow and MXNet. The successfully converted ONNX models are stored in the build/models-directory directory. The final RST file with the report is saved as build/onnx-support.rst.
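
The idea of an empirically built support matrix can be sketched as follows. This is a simplified illustration, not the ONNXConversion API: the function names, the status labels and the framework and model names are all hypothetical, and the export and import steps are treated as independent callables that raise on failure.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[], None]


def check(step: Step) -> str:
    """Run one conversion step and map its outcome to a matrix cell."""
    try:
        step()
    except Exception:
        return "unsupported"
    return "supported"


def build_support_matrix(
    candidates: Dict[str, Dict[str, Tuple[Step, Step]]]
) -> List[Tuple[str, str, str, str]]:
    """Return rows of (framework, model, export status, import status)."""
    rows = []
    for framework, models in candidates.items():
        for model, (export_step, import_step) in models.items():
            rows.append((framework, model, check(export_step), check(import_step)))
    return rows


def ok() -> None:
    """Placeholder for a conversion that succeeds."""


def fails() -> None:
    """Placeholder for a conversion that raises."""
    raise RuntimeError("operator not supported")


matrix = build_support_matrix({
    "framework_a": {"model_x": (ok, ok), "model_y": (ok, fails)},
    "framework_b": {"model_x": (fails, ok)},
})
for row in matrix:
    print("{:<12} {:<8} export={:<11} import={}".format(*row))
```

Each row records the result of actually attempting the conversion, which is what makes the matrix empirical rather than based on declared support.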

Testing inference on target hardware

The kenning.scenarios.inference_tester and kenning.scenarios.inference_server are used for inference testing on target hardware. The inference_tester loads the dataset and the model, compiles the model and runs inference either locally or remotely using inference_server.

The inference_server receives the model and input data, and sends output data and statistics.

Both inference_tester and inference_server require a Runtime-based class to determine the model execution flow. The two scripts communicate using the protocol implemented in a RuntimeProtocol-based class.
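
To make the exchange between the two scripts more concrete, here is a minimal sketch of a framed message protocol over a socket. It is not Kenning's wire format: the header layout, the message type numbers and the function names are assumptions made for illustration. It only shows how a configurable endianness and a maximum packet size (compare the --endianness and --packet-size options) can enter such a protocol.

```python
import socket
import struct

ENDIAN_PREFIX = {"little": "<", "big": ">"}
MSG_MODEL, MSG_OUTPUT = 1, 2  # hypothetical message type identifiers


def send_message(sock, msg_type, payload, endianness="little"):
    # Header: 2-byte message type followed by 4-byte payload length.
    header = struct.pack(ENDIAN_PREFIX[endianness] + "HI", msg_type, len(payload))
    sock.sendall(header + payload)


def recv_exact(sock, size, packet_size):
    # Read exactly `size` bytes, never asking for more than `packet_size` at once.
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(min(packet_size, remaining))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock, endianness="little", packet_size=4096):
    fmt = ENDIAN_PREFIX[endianness] + "HI"
    header = recv_exact(sock, struct.calcsize(fmt), packet_size)
    msg_type, length = struct.unpack(fmt, header)
    return msg_type, recv_exact(sock, length, packet_size)


# Both ends live in one process here; a real setup would use two machines.
tester, server = socket.socketpair()
try:
    send_message(tester, MSG_MODEL, b"compiled-model-bytes", endianness="big")
    request = recv_message(server, endianness="big", packet_size=8)
    send_message(server, MSG_OUTPUT, b"ok", endianness="big")
    reply = recv_message(tester, endianness="big", packet_size=8)
finally:
    tester.close()
    server.close()
print(request, reply)
```

Both sides must agree on the endianness, since it determines how the header fields are decoded.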

At the end, the inference_tester returns the benchmark data in the form of a JSON file extracted from the Measurements object.

The kenning.scenarios.inference_tester requires:

  • a ModelWrapper-based class that implements model loading, I/O processing and optionally model conversion to ONNX format,

  • a Runtime-based class that implements data processing and the inference method for the compiled model on the target hardware,

  • a Dataset-based class that implements data sample fetching and model evaluation,

  • a path to the output JSON file with performance and quality metrics.

An Optimizer-based class can be provided to compile the model for a given target if needed.

When running remotely, it also requires a RuntimeProtocol-based class to communicate with the kenning.scenarios.inference_server.

To print the list of available arguments, run:

python3 -m kenning.scenarios.inference_tester \
    kenning.modelwrappers.classification.tensorflow_pet_dataset.TensorFlowPetDatasetMobileNetV2 \
    kenning.runtimes.tvm.TVMRuntime \
    kenning.datasets.pet_dataset.PetDataset \
    --modelcompiler-cls kenning.compilers.tvm.TVMCompiler \
    --protocol-cls kenning.runtimeprotocols.network.NetworkProtocol \
    -h

With the above classes, the help can look as follows:

positional arguments:
  modelwrappercls       ModelWrapper-based class with inference implementation to import
  runtimecls            Runtime-based class with the implementation of model runtime
  datasetcls            Dataset-based class with dataset to import
  output                The path to the output JSON file with measurements

optional arguments:
  -h, --help            show this help message and exit
  --modelcompiler-cls MODELCOMPILER_CLS
                        Optimizer-based class with compiling routines to import
  --protocol-cls PROTOCOL_CLS
                        RuntimeProtocol-based class with the implementation of communication between inference tester and inference
                        runner
  --convert-to-onnx CONVERT_TO_ONNX
                        Before compiling the model, convert it to ONNX and use in compilation (provide a path to save here)
  --verbosity {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Verbosity level

Inference model arguments:
  --model-path MODEL_PATH
                        Path to the model

Compiler arguments:
  --compiled-model-path COMPILED_MODEL_PATH
                        The path to the compiled model output
  --model-framework {onnx,keras,darknet}
                        The input type of the model, framework-wise
  --target TARGET       The kind or tag of the target device
  --target-host TARGET_HOST
                        The kind or tag of the host (CPU) target device
  --opt-level OPT_LEVEL
                        The optimization level of the compilation
  --libdarknet-path LIBDARKNET_PATH
                        Path to the libdarknet.so library, for darknet models

Runtime arguments:
  --save-model-path SAVE_MODEL_PATH
                        Path where the model will be uploaded
  --target-device-context {llvm,stackvm,cpu,c,cuda,nvptx,cl,opencl,aocl,aocl_sw_emu,sdaccel,vulkan,metal,vpi,rocm,ext_dev,hexagon,webgpu}
                        What accelerator should be used on target device
  --target-device-context-id TARGET_DEVICE_CONTEXT_ID
                        ID of the device to run the inference on
  --input-dtype INPUT_DTYPE
                        Type of input tensor elements

Dataset arguments:
  --dataset-root DATASET_ROOT
                        Path to the dataset directory
  --download-dataset    Downloads the dataset before taking any action
  --inference-batch-size INFERENCE_BATCH_SIZE
                        The batch size for providing the input data
  --classify-by {species,breeds}
                        Determines if classification should be performed by species or by breeds
  --image-memory-layout {NHWC,NCHW}
                        Determines if images should be delivered in NHWC or NCHW format

Runtime protocol arguments:
  --host HOST           The address to the target device
  --port PORT           The port for the target device
  --packet-size PACKET_SIZE
                        The maximum size of the received packets, in bytes.
  --endianness {big,little}
                        The endianness of data to transfer

The kenning.scenarios.inference_server requires only:

  • a RuntimeProtocol-based class for the implementation of the communication,

  • a Runtime-based class for the implementation of runtime routines on device.

Both classes may require some additional arguments that can be listed with the -h flag.

An example script for the inference_tester is:

python -m kenning.scenarios.inference_tester \
    kenning.modelwrappers.classification.tensorflow_pet_dataset.TensorFlowPetDatasetMobileNetV2 \
    kenning.runtimes.tflite.TFLiteRuntime \
    kenning.datasets.pet_dataset.PetDataset \
    ./build/google-coral-devboard-tflite-tensorflow.json \
    --modelcompiler-cls kenning.compilers.tflite.TFLiteCompiler \
    --protocol-cls kenning.runtimeprotocols.network.NetworkProtocol \
    --model-path ./kenning/resources/models/classification/tensorflow_pet_dataset_mobilenetv2.h5 \
    --model-framework keras \
    --target "edgetpu" \
    --compiled-model-path build/compiled-model.tflite \
    --inference-input-type int8 \
    --inference-output-type int8 \
    --host 192.168.188.35 \
    --port 12345 \
    --packet-size 32768 \
    --save-model-path /home/mendel/compiled-model.tflite \
    --dataset-root build/pet-dataset \
    --inference-batch-size 1 \
    --verbosity INFO

The above runs with the following inference_server setup:

python -m kenning.scenarios.inference_server \
    kenning.runtimeprotocols.network.NetworkProtocol \
    kenning.runtimes.tflite.TFLiteRuntime \
    --host 0.0.0.0 \
    --port 12345 \
    --packet-size 32768 \
    --save-model-path /home/mendel/compiled-model.tflite \
    --delegates-list libedgetpu.so.1 \
    --verbosity INFO

Note

This run was tested on a Google Coral Devboard device.

kenning.scenarios.inference_tester can also be executed locally. In this case, the --protocol-cls argument can be skipped. An example call is as follows:

python3 -m kenning.scenarios.inference_tester \
    kenning.modelwrappers.classification.tensorflow_pet_dataset.TensorFlowPetDatasetMobileNetV2 \
    kenning.runtimes.tvm.TVMRuntime \
    kenning.datasets.pet_dataset.PetDataset \
    ./build/local-cpu-tvm-tensorflow-classification.json \
    --modelcompiler-cls kenning.compilers.tvm.TVMCompiler \
    --model-path ./kenning/resources/models/classification/tensorflow_pet_dataset_mobilenetv2.h5 \
    --model-framework keras \
    --target "llvm" \
    --compiled-model-path ./build/compiled-model.tar \
    --opt-level 3 \
    --save-model-path ./build/compiled-model.tar \
    --target-device-context cpu \
    --dataset-root ./build/pet-dataset/ \
    --inference-batch-size 1 \
    --download-dataset \
    --verbosity INFO
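
The local and remote variants differ mainly in whether the protocol arguments are present. The sketch below assembles the argument list in Python without running it; build_tester_command is a hypothetical helper, it uses only flags that appear in the examples above, and the reduced set of options is an assumption that may not be sufficient for every target.

```python
import shlex


def build_tester_command(output, remote_host=None, port=12345):
    """Assemble an inference_tester call; protocol flags are added only for remote runs."""
    cmd = [
        "python3", "-m", "kenning.scenarios.inference_tester",
        "kenning.modelwrappers.classification.tensorflow_pet_dataset."
        "TensorFlowPetDatasetMobileNetV2",
        "kenning.runtimes.tvm.TVMRuntime",
        "kenning.datasets.pet_dataset.PetDataset",
        output,
        "--modelcompiler-cls", "kenning.compilers.tvm.TVMCompiler",
        "--model-framework", "keras",
        "--target", "llvm",
        "--compiled-model-path", "./build/compiled-model.tar",
        "--save-model-path", "./build/compiled-model.tar",
        "--dataset-root", "./build/pet-dataset/",
    ]
    if remote_host is not None:
        cmd += [
            "--protocol-cls", "kenning.runtimeprotocols.network.NetworkProtocol",
            "--host", remote_host,
            "--port", str(port),
        ]
    return cmd


local_cmd = build_tester_command("./build/local.json")
remote_cmd = build_tester_command("./build/remote.json", remote_host="192.168.188.35")
print(shlex.join(local_cmd))
print(shlex.join(remote_cmd))
```

The resulting list can be passed to subprocess.run on a machine where Kenning is installed.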

Note

For more examples of running inference_tester and inference_server, check the kenning/scripts directory. Directories with scripts for client and server calls for various target devices, deep learning frameworks and compilation frameworks can be found in the kenning/scripts/edge-runtimes directory.

Running inference

kenning.scenarios.inference_runner is used to run inference locally on a pre-compiled model.

kenning.scenarios.inference_runner requires:

  • a ModelWrapper-based class that performs I/O processing specific to the model,

  • a Runtime-based class that runs inference on target using the compiled model,

  • a DataProvider-based class that implements fetching of data samples from various sources,

  • a list of OutputCollector-based classes that implement output processing for the specific use case.
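
How these four kinds of classes could cooperate is sketched below. All class and method names here are hypothetical and chosen for illustration only; they are not Kenning's interfaces. The point is the data flow: the provider yields samples, the wrapper handles model-specific pre- and postprocessing, the runtime executes the model, and every collector receives the result.

```python
class ListDataProvider:
    """Hypothetical provider yielding a fixed list of samples."""

    def __init__(self, samples):
        self.samples = samples

    def __iter__(self):
        return iter(self.samples)


class DoublingModelWrapper:
    """Hypothetical wrapper with model-specific I/O processing."""

    def preprocess_input(self, sample):
        return [2 * value for value in sample]

    def postprocess_output(self, raw_output):
        return {"score": raw_output}


class SumRuntime:
    """Hypothetical runtime; summing stands in for running a compiled model."""

    def run(self, model_input):
        return sum(model_input)


class ListCollector:
    """Hypothetical collector that stores every processed output."""

    def __init__(self):
        self.results = []

    def process_output(self, sample, output):
        self.results.append(output)


def run_inference(provider, wrapper, runtime, collectors):
    for sample in provider:
        model_input = wrapper.preprocess_input(sample)
        raw_output = runtime.run(model_input)
        output = wrapper.postprocess_output(raw_output)
        for collector in collectors:
            collector.process_output(sample, output)


collector = ListCollector()
run_inference(
    ListDataProvider([[1, 2], [3, 4], [5, 6]]),
    DoublingModelWrapper(),
    SumRuntime(),
    [collector],
)
print(collector.results)
```

Because the collectors are passed as a list, several of them can consume the same output, for example one printing names and another drawing detections.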

To print the list of available arguments, run:

python3 -m kenning.scenarios.inference_runner \
    kenning.modelwrappers.detectors.darknet_coco.TVMDarknetCOCOYOLOV3 \
    kenning.runtimes.tvm.TVMRuntime \
    kenning.dataproviders.camera_dataprovider.CameraDataProvider \
    --output-collectors kenning.outputcollectors.name_printer.NamePrinter \
    -h

With the above classes, the help can look as follows:

positional arguments:
  modelwrappercls       ModelWrapper-based class with inference implementation to import
  runtimecls            Runtime-based class with the implementation of model runtime
  dataprovidercls       DataProvider-based class used for providing data

optional arguments:
  -h, --help            show this help message and exit
  --output-collectors OUTPUT_COLLECTORS [OUTPUT_COLLECTORS ...]
                        List to the OutputCollector-based classes where the results will be passed
  --verbosity {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Verbosity level

Inference model arguments:
  --model-path MODEL_PATH
                        Path to the model
  --classes CLASSES     File containing Open Images class IDs and class names in CSV format to use (can be generated using
                        kenning.scenarios.open_images_classes_extractor) or class type

Runtime arguments:
  --disable-performance-measurements
                        Disable collection and processing of performance metrics
  --save-model-path SAVE_MODEL_PATH
                        Path where the model will be uploaded
  --target-device-context {llvm,stackvm,cpu,c,cuda,nvptx,cl,opencl,aocl,aocl_sw_emu,sdaccel,vulkan,metal,vpi,rocm,ext_dev,hexagon,webgpu}
                        What accelerator should be used on target device
  --target-device-context-id TARGET_DEVICE_CONTEXT_ID
                        ID of the device to run the inference on
  --input-dtype INPUT_DTYPE
                        Type of input tensor elements
  --runtime-use-vm      At runtime use the TVM Relay VirtualMachine
  --use-json-at-output  Encode outputs of models into a JSON file with base64-encoded arrays

DataProvider arguments:
  --video-file-path VIDEO_FILE_PATH
                        Video file path (for cameras, use /dev/videoX where X is the device ID eg. /dev/video0)
  --image-memory-layout {NHWC,NCHW}
                        Determines if images should be delivered in NHWC or NCHW format
  --image-width IMAGE_WIDTH
                        Determines the width of the image for the model
  --image-height IMAGE_HEIGHT
                        Determines the height of the image for the model

OutputCollector arguments:
  --print-type {detector,classificator}
                        What is the type of model that will input data to the NamePrinter

An example script for inference_runner:

python3 -m kenning.scenarios.inference_runner \
    kenning.modelwrappers.detectors.darknet_coco.TVMDarknetCOCOYOLOV3 \
    kenning.runtimes.tvm.TVMRuntime \
    kenning.dataproviders.camera_dataprovider.CameraDataProvider \
    --output-collectors kenning.outputcollectors.detection_visualizer.DetectionVisualizer kenning.outputcollectors.name_printer.NamePrinter \
    --disable-performance-measurements \
    --model-path ./kenning/resources/models/detection/yolov3.weights \
    --save-model-path ../compiled-model.tar \
    --target-device-context "cuda" \
    --verbosity INFO \
    --video-file-path /dev/video0

Generating performance reports

kenning.scenarios.inference_performance and kenning.scenarios.inference_tester each return a JSON file with the benchmark results. The file contains both performance metrics and quality metrics.

The data from JSON files can be analyzed, processed and visualized by the kenning.scenarios.render_report script. This script parses the information in JSON files and returns an RST file with the report, along with visualizations.

It requires:

  • a JSON file with benchmark data,

  • a report name for use in the RST file and for creating Sphinx refs to figures,

  • an RST output file name,

  • --root-dir specifying the root directory of the Sphinx documentation where the RST file will be embedded (it is used to compute relative paths),

  • --img-dir specifying the path where the figures should be saved,

  • --report-types, which is a list describing the types the report falls into.

An example call and the resulting RST file can be observed in Sample autogenerated report.

Currently, the available report types are:

  • performance - this is the most common report type that renders information about overall inference performance metrics, such as inference time, CPU usage, RAM usage, or GPU utilization,

  • classification - this report is specific to the classification task and renders classification-specific quality figures and metrics, such as confusion matrices, accuracy, precision and G-mean,

  • detection - this report is specific to the detection task and renders detection-specific quality figures and metrics, such as recall-precision curves or mean average precision.
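
As a reference for the classification metrics named above, the sketch below derives them from a confusion matrix. It is an illustration rather than the report's implementation: classification_metrics is a hypothetical helper, and G-mean is taken here to be the geometric mean of per-class recall, which is a common definition but an assumption with respect to Kenning.

```python
import math


def classification_metrics(confusion):
    """Compute metrics from confusion[i][j]: samples of true class i predicted as class j.

    Assumes every class has at least one sample and at least one prediction,
    otherwise a division by zero occurs.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    # Recall: correct predictions divided by all samples of the true class (row sum).
    recall = [confusion[i][i] / sum(confusion[i]) for i in range(n)]
    # Precision: correct predictions divided by all predictions of the class (column sum).
    precision = [
        confusion[i][i] / sum(confusion[r][i] for r in range(n)) for i in range(n)
    ]
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "g_mean": math.prod(recall) ** (1 / n),
    }


metrics = classification_metrics([[8, 2], [1, 9]])
print(metrics)
```

For the two-class matrix in the example, 17 of 20 samples lie on the diagonal, so the accuracy is 0.85, and the per-class recalls are 0.8 and 0.9.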


Last update: 2023-05-30