Developing Kenning blocks

This chapter describes the development process of Kenning components.

To run the examples below, install Kenning with the required dependencies as follows:

pip install "kenning[tensorflow] @ git+https://github.com/antmicro/kenning.git"

Model and I/O metadata

Since not all model formats supported by Kenning provide information about inputs and outputs or other data required for compilation or runtime purposes, each model processed by Kenning comes with a JSON file describing an I/O specification (and other useful metadata).

The JSON file with metadata for a file <model-name>.<ext> is saved as <model-name>.<ext>.json in the same directory (e.g. metadata for resnet50.tflite is stored in resnet50.tflite.json).

An example metadata file (here for ResNet50 trained for the ImageNet classification problem) looks as follows:

{
    "input": [
        {
            "name": "input_0",
            "shape": [1, 224, 224, 3],
            "dtype": "int8",
            "order": 0,
            "scale": 1.0774157047271729,
            "zero_point": -13,
            "prequantized_dtype": "float32"
        }
    ],
    "output": [
        {
            "name": "output_0",
            "shape": [1, 1000],
            "dtype": "int8",
            "order": 0,
            "scale": 0.00390625,
            "zero_point": -128,
            "prequantized_dtype": "float32"
        }
    ]
}

Another sample metadata file, for the YOLOv4 detection model, may look as follows:

{
    "input": [
        {
            "name": "input",
            "shape": [1, 3, 608, 608],
            "dtype": "float32"
        }
    ],
    "output": [
        {
            "name": "output",
            "shape": [1, 255, 76, 76],
            "dtype": "float32"
        },
        {
            "name": "output.3",
            "shape": [1, 255, 38, 38],
            "dtype": "float32"
        },
        {
            "name": "output.7",
            "shape": [1, 255, 19, 19],
            "dtype": "float32"
        }
    ],
    "processed_output": [
        {
            "name": "detection_output",
            "type": "List",
            "dtype": {
                "type": "List",
                "dtype": "kenning.datasets.helpers.detection_and_segmentation.DetectObject"
            }
        }
    ],
    "model_version": 4
}

In general, the metadata file consists of four fields:

  • input is a specification of input data received by model wrapper (before preprocessing),

  • processed_input is a specification of data passed to wrapped model (after preprocessing),

  • output is a specification of data returned by wrapped model (before postprocessing),

  • processed_output is a specification of output data returned by model wrapper (after postprocessing).

If processed_input or processed_output is not specified, it is assumed that no processing takes place and that it is the same as input or output, respectively. Each array consists of dictionaries describing the model's inputs and outputs.

Additionally, any non-conflicting miscellaneous keywords may be included in the metadata JSON, but they will not be validated by Kenning.

Parameters common to all fields (except for the miscellaneous ones):

  • name - input/output name,

  • shape - input/output tensor shape; if processed_input is present, input can contain a list of valid shapes,

  • dtype - input/output data type.

Parameters specific to processed_input and output:

  • order - some of the runtimes/optimizers allow accessing inputs and outputs by id. This field describes the id of the current input/output,

  • scale - scale parameter for the quantization purposes. Present only if the input/output requires quantization/dequantization,

  • zero_point - zero point parameter for the quantization purposes. Present only if the input/output requires quantization/dequantization,

  • prequantized_dtype - input/output data type before quantization.

Parameters specific to input:

  • mean - mean used to normalize inputs before model training,

  • std - standard deviation used to normalize inputs before model training.

Parameters specific to output and processed_output:

  • class_name - list of class names from the dataset. It is used by output collectors to present the data in a human-readable way.

Parameters specific to fields whose data is not compatible with np.ndarray:

  • type - can be either List, Dict or Any.

  • dtype - if type is List, it can specify the type of the elements, represented as a full importable path (e.g. kenning.datasets.helpers.detection_and_segmentation.SegmObject for segmentation results). It can also hold a dictionary with the specification of the list items.

  • fields - if type is Dict, it should be a dictionary mapping field names to dictionaries following the input/output specification format. Example from PyTorchCOCOMaskRCNN:

{
    "type": "Dict",
    "fields": {
        "boxes": {"shape": [-1, 4], "dtype": "float32"},
        "labels": {"shape": [-1], "dtype": "int64"},
        "scores": {"shape": [-1], "dtype": "float32"},
        "masks": {"shape": [-1, 1, 416, 416], "dtype": "float32"}
    }
}

The model metadata is used by all classes in Kenning in order to understand the format of the inputs and outputs.
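
To illustrate how these fields are typically consumed, below is a minimal sketch (independent of Kenning classes) that loads the metadata stored next to a hypothetical resnet50.tflite model and dequantizes its int8 output using the scale and zero_point parameters; the model name and the raw_output array are placeholders:

import json
from pathlib import Path

import numpy as np

# Hypothetical model path; the metadata is stored next to it
# as <model-name>.<ext>.json
model_path = Path("resnet50.tflite")
with open(f"{model_path}.json") as spec_file:
    io_spec = json.load(spec_file)

out_spec = io_spec["output"][0]

# Placeholder for a quantized model output with the declared shape and dtype
raw_output = np.zeros(out_spec["shape"], dtype=out_spec["dtype"])

# Affine dequantization: real_value = scale * (quantized_value - zero_point)
if "scale" in out_spec and "zero_point" in out_spec:
    real_output = out_spec["scale"] * (
        raw_output.astype(np.float32) - out_spec["zero_point"]
    )
else:
    real_output = raw_output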

Warning

The Optimizer objects can affect the format of inputs and outputs, e.g. quantize the network. It is crucial to update the I/O specification when a block modifies it.

Implementing a new Kenning component

First, check the Kenning API for the available Kenning building blocks and their documentation. Adding a new block to Kenning is a matter of creating a new class that inherits from one of the kenning.core classes, providing configuration parameters, and implementing methods - at least the ones that are not implemented in the base class.

For example purposes, let’s create a sample Optimizer-based class, which will convert an input model to the TensorFlow Lite format.

First, let’s create minimal code with an empty class:

from kenning.core.optimizer import Optimizer


class TensorFlowLiteCompiler(Optimizer):
    pass

Defining arguments for core classes

Kenning classes can be created in three ways:

  • directly from the constructor in Python,

  • from command-line arguments parsed with argparse,

  • from a JSON configuration file.

To support all three methods, the newly implemented class requires creating a dictionary called arguments_structure that holds all configurable parameters of the class, along with their description, type and additional information.

This structure is used to create:

  • an argparse group, used to configure class parameters from the terminal (via the ArgumentsHandler's form_argparse method). Later, a class can be created with the from_argparse method.

  • a JSON schema, used to configure the class from a JSON file (via the ArgumentsHandler's form_parameterschema method). Later, a class can be created with the from_parameterschema method.

arguments_structure is a dictionary in the following form:

arguments_structure = {
    'argument_name': {
        'description': 'Help for the argument',
        'type': str,
        'required': True
    }
}

The argument_name is a name used in:

  • the Python constructor,

  • an argparse argument (in the form of --argument-name),

  • a JSON argument.
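
For example, the dataset_percentage argument of the Optimizer implemented later in this chapter is passed as dataset_percentage=0.25 to the Python constructor, as --dataset-percentage 0.25 on the command line, and as "dataset_percentage": 0.25 inside the "parameters" object of a JSON scenario.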

The fields describing the argument are as follows:

  • argparse_name - if there is a need for a different flag in argparse, it can be provided here,

  • description - description of the argument, displayed in JSON parsing errors or in the command-line help,

  • type - type of argument, i.e.:

    • Path from pathlib module,

    • ResourceURI - a Path-based type supporting non-local resources, e.g. from GitHub (gh://), Kenning (kenning://) or HTTPS (https://),

    • str,

    • float,

    • int,

    • bool,

    • list or list[type] or list[type1 | type2 | ...],

  • default - default value for the argument (for AutoML, the value should fall within the defined ranges/enum),

  • required - boolean telling whether the argument is required,

  • enum - a list of possible values for the argument (also used for AutoML),

  • nullable - tells whether the argument can be empty (None),

  • AutoML - specifies whether the argument is used in the AutoML flow (supported only in model wrappers),

  • item_range - tuple with the lower and upper bound of a numerical argument (type is int or float) or of the elements of a list argument (type is list); used only for AutoML,

  • list_range - tuple with the lower and upper bound of the list argument's length (used only for AutoML),

  • overridable - tells whether an argument set in JSON can be overridden from argparse (default: False).
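
As an illustration (a composite sketch only; the hidden_layers name is hypothetical and the AutoML-related fields mirror the ones used in the AutoML example later in this chapter), a single entry can combine several of the fields above:

arguments_structure = {
    'hidden_layers': {
        'argparse_name': '--hidden-layers',
        'description': 'Number of neurons in consecutive hidden layers',
        'type': list[int],
        'default': [8, 16, 8],
        'nullable': False,
        'AutoML': True,
        'list_range': (1, 16),
        'item_range': (1, 256)
    }
}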

Let’s add parameters to the example class:

from pathlib import Path
from kenning.core.optimizer import Optimizer
from kenning.core.dataset import Dataset


class TensorFlowLiteCompiler(Optimizer):
    arguments_structure = {
        'inferenceinputtype': {
            'argparse_name': '--inference-input-type',
            'description': 'Data type of the input layer',
            'default': 'float32',
            'enum': ['float32', 'int8', 'uint8']
        },
        'inferenceoutputtype': {
            'argparse_name': '--inference-output-type',
            'description': 'Data type of the output layer',
            'default': 'float32',
            'enum': ['float32', 'int8', 'uint8']
        },
        'quantize_model': {
            'argparse_name': '--quantize-model',
            'description': 'Tells if model should be quantized',
            'type': bool,
            'default': False
        },
        'dataset_percentage': {
            'description': 'Tells how much data from dataset (from 0.0 to '
                           '1.0) will be used for calibration dataset',
            'type': float,
            'default': 0.25
        }
    }

    def __init__(
            self,
            dataset: Dataset,
            compiled_model_path: Path,
            inferenceinputtype: str = 'float32',
            inferenceoutputtype: str = 'float32',
            dataset_percentage: float = 0.25,
            quantize_model: bool = False):
        self.inferenceinputtype = inferenceinputtype
        self.inferenceoutputtype = inferenceoutputtype
        self.dataset_percentage = dataset_percentage
        self.quantize_model = quantize_model
        super().__init__(dataset, compiled_model_path)

    @classmethod
    def from_argparse(cls, dataset, args):
        return cls(
            dataset,
            args.compiled_model_path,
            args.inference_input_type,
            args.inference_output_type,
            args.dataset_percentage,
            args.quantize_model
        )

In addition to the defined arguments, there are also default Optimizer arguments - the Dataset object and the path for saving the compiled model (compiled_model_path).

Also, a from_argparse object creator is implemented, since there are additional parameters (dataset) to handle. The from_parameterschema function is created automatically.

Additionally, if the new class does not inherit from any of the Kenning core classes, it should inherit directly from the ArgumentsHandler class, which is responsible for creating argparse groups and the JSON schema from arguments_structure.
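
A minimal sketch of such a standalone class could look as follows (the module path of ArgumentsHandler is an assumption here; check the Kenning API reference for its exact location):

# Assumed import path for ArgumentsHandler
from kenning.utils.args_manager import ArgumentsHandler


class MyStandaloneComponent(ArgumentsHandler):
    arguments_structure = {
        'verbosity': {
            'description': 'Logging verbosity of the component',
            'type': str,
            'default': 'INFO'
        }
    }

    def __init__(self, verbosity: str = 'INFO'):
        self.verbosity = verbosity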

The above implementation of arguments is common for all core classes.

Defining supported output and input types

The Kenning classes for consecutive steps are meant to work in a seamless manner, which means providing various ways to pass the model from one class to another.

Usually, each class can accept multiple model input formats and provides at least one output format that can be accepted in other classes (except for terminal compilers, such as Apache TVM, that compile models to a runtime library).

The list of supported output formats is represented in a class with an outputtypes list:

    outputtypes = [
        'tflite'
    ]

The supported input formats are delivered in a form of a dictionary, mapping the supported input type name to the function used to load a model:

    inputtypes = {
        'keras': kerasconversion,
        'tensorflow': tensorflowconversion
    }

Let’s update the code with supported types:

from kenning.core.optimizer import Optimizer
from kenning.core.dataset import Dataset
import tensorflow as tf
from pathlib import Path


def kerasconversion(modelpath: Path):
    model = tf.keras.models.load_model(modelpath)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    return converter


def tensorflowconversion(modelpath: Path):
    converter = tf.lite.TFLiteConverter.from_saved_model(modelpath)
    return converter


class TensorFlowLiteCompiler(Optimizer):
    arguments_structure = {
        'inferenceinputtype': {
            'argparse_name': '--inference-input-type',
            'description': 'Data type of the input layer',
            'default': 'float32',
            'enum': ['float32', 'int8', 'uint8']
        },
        'inferenceoutputtype': {
            'argparse_name': '--inference-output-type',
            'description': 'Data type of the output layer',
            'default': 'float32',
            'enum': ['float32', 'int8', 'uint8']
        },
        'quantize_model': {
            'argparse_name': '--quantize-model',
            'description': 'Tells if model should be quantized',
            'type': bool,
            'default': False
        },
        'dataset_percentage': {
            'description': 'Tells how much data from dataset (from 0.0 to '
                           '1.0) will be used for calibration dataset',
            'type': float,
            'default': 0.25
        }
    }

    outputtypes = [
        'tflite'
    ]

    inputtypes = {
        'keras': kerasconversion,
        'tensorflow': tensorflowconversion
    }

    def __init__(
            self,
            dataset: Dataset,
            compiled_model_path: Path,
            inferenceinputtype: str = 'float32',
            inferenceoutputtype: str = 'float32',
            dataset_percentage: float = 0.25,
            quantize_model: bool = False):
        self.inferenceinputtype = inferenceinputtype
        self.inferenceoutputtype = inferenceoutputtype
        self.dataset_percentage = dataset_percentage
        self.quantize_model = quantize_model
        super().__init__(dataset, compiled_model_path)

    @classmethod
    def from_argparse(cls, dataset, args):
        return cls(
            dataset,
            args.compiled_model_path,
            args.inference_input_type,
            args.inference_output_type,
            args.dataset_percentage,
            args.quantize_model
        )

Implementing unimplemented methods

The remaining aspect of developing Kenning classes is implementing unimplemented methods. Unimplemented methods raise the NotImplementedError.

In case of the Optimizer class, it is the compile method and get_framework_and_version.

When implementing unimplemented methods, it is crucial to follow type hints both for inputs and outputs - more details can be found in the documentation for the method. Sticking to the type hints ensures compatibility between blocks, which is required for a seamless connection between compilation components.

Let’s finish the implementation of the Optimizer-based class:

from kenning.core.optimizer import Optimizer
from kenning.core.dataset import Dataset
import numpy as np
import tensorflow as tf
from pathlib import Path
from typing import Optional, Dict, List


def kerasconversion(modelpath: Path):
    model = tf.keras.models.load_model(modelpath)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    return converter


def tensorflowconversion(modelpath: Path):
    converter = tf.lite.TFLiteConverter.from_saved_model(modelpath)
    return converter


class TensorFlowLiteCompiler(Optimizer):
    arguments_structure = {
        'inferenceinputtype': {
            'argparse_name': '--inference-input-type',
            'description': 'Data type of the input layer',
            'default': 'float32',
            'enum': ['float32', 'int8', 'uint8']
        },
        'inferenceoutputtype': {
            'argparse_name': '--inference-output-type',
            'description': 'Data type of the output layer',
            'default': 'float32',
            'enum': ['float32', 'int8', 'uint8']
        },
        'quantize_model': {
            'argparse_name': '--quantize-model',
            'description': 'Tells if model should be quantized',
            'type': bool,
            'default': False
        },
        'dataset_percentage': {
            'description': 'Tells how much data from dataset (from 0.0 to '
                           '1.0) will be used for calibration dataset',
            'type': float,
            'default': 0.25
        }
    }

    outputtypes = [
        'tflite'
    ]

    inputtypes = {
        'keras': kerasconversion,
        'tensorflow': tensorflowconversion
    }

    def __init__(
            self,
            dataset: Dataset,
            compiled_model_path: Path,
            inferenceinputtype: str = 'float32',
            inferenceoutputtype: str = 'float32',
            dataset_percentage: float = 0.25,
            quantize_model: bool = False):
        self.inferenceinputtype = inferenceinputtype
        self.inferenceoutputtype = inferenceoutputtype
        self.dataset_percentage = dataset_percentage
        self.quantize_model = quantize_model
        super().__init__(dataset, compiled_model_path)

    @classmethod
    def from_argparse(cls, dataset, args):
        return cls(
            dataset,
            args.compiled_model_path,
            args.inference_input_type,
            args.inference_output_type,
            args.dataset_percentage,
            args.quantize_model
        )

    def compile(
            self,
            inputmodelpath: Path,
            io_spec: Optional[Dict[str, List[Dict]]] = None):

        # load I/O specification for the model
        if io_spec is None:
            io_spec = self.load_io_specification(inputmodelpath)

        # load the model using chosen input type
        converter = self.inputtypes[self.inputtype](inputmodelpath)

        # preparing model compilation using class arguments
        if self.quantize_model:
            converter.optimizations = [tf.lite.Optimize.DEFAULT]
            converter.target_spec.supported_ops = [
                tf.lite.OpsSet.TFLITE_BUILTINS_INT8
            ]
        converter.inference_input_type = tf.as_dtype(self.inferenceinputtype)
        converter.inference_output_type = tf.as_dtype(self.inferenceoutputtype)

        # dataset can be used during compilation i.e. for calibration
        # purposes
        if self.dataset and self.quantize_model:
            def generator():
                for entry in self.dataset.calibration_dataset_generator(
                        self.dataset_percentage):
                    yield [np.array(entry, dtype=np.float32)]
            converter.representative_dataset = generator

        # compile and save the model
        tflite_model = converter.convert()
        with open(self.compiled_model_path, 'wb') as f:
            f.write(tflite_model)

        # update the I/O specification
        interpreter = tf.lite.Interpreter(model_content=tflite_model)
        signature = interpreter.get_signature_runner()

        def update_io_spec(sig_det, int_det, key):
            for order, spec in enumerate(io_spec[key]):
                old_name = spec['name']
                new_name = sig_det[old_name]['name']
                spec['name'] = new_name
                spec['order'] = order

            quantized = any([det['quantization'][0] != 0 for det in int_det])
            new_spec = []
            for det in int_det:
                spec = [
                    spec for spec in io_spec[key]
                    if det['name'] == spec['name']
                ][0]

                if quantized:
                    scale, zero_point = det['quantization']
                    spec['scale'] = scale
                    spec['zero_point'] = zero_point
                    spec['prequantized_dtype'] = spec['dtype']
                    spec['dtype'] = np.dtype(det['dtype']).name
                new_spec.append(spec)
            io_spec[key] = new_spec

        update_io_spec(
            signature.get_input_details(),
            interpreter.get_input_details(),
            'input'
        )
        update_io_spec(
            signature.get_output_details(),
            interpreter.get_output_details(),
            'output'
        )

        # save updated I/O specification
        self.save_io_specification(inputmodelpath, io_spec)

    def get_framework_and_version(self):
        return 'tensorflow', tf.__version__

There are several important things regarding the code snippet above:

  • The information regarding inputs and outputs can be collected with the self.load_io_specification method, present in all classes.

  • The information about the input format to use is delivered in the self.inputtype field - it is updated automatically by a function that negotiates the best supported format between the previous and the current block.

  • If the I/O metadata is affected by the current block, it needs to be updated and saved along with the compiled model using the self.save_io_specification method.

Using the implemented block

In a Python script, the above class can be used with other classes from the Kenning API as is - it is regular Python code.
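
For example, a minimal standalone usage of the class could look as follows (the paths are illustrative; the input type is set by hand here, whereas in a full pipeline it is negotiated automatically, and the model's I/O metadata JSON is expected to be present next to the model file):

from pathlib import Path

from my_optimizer import TensorFlowLiteCompiler

compiler = TensorFlowLiteCompiler(
    dataset=None,  # no dataset is needed when quantization is disabled
    compiled_model_path=Path('./build/compiled-model.tflite'),
    quantize_model=False
)
# In a pipeline the input type is selected automatically based on the
# previous block; here it is set manually for demonstration purposes
compiler.inputtype = 'keras'
compiler.compile(Path('./build/model.h5'))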

To use the implemented block in a JSON scenario (as described in Defining optimization pipelines in Kenning), the module implementing the class needs to be available from the current directory, or the path to the module needs to be added to the PYTHONPATH variable.

Let’s assume that the class was implemented in the my_optimizer.py file. The scenario can look as follows:

{
    "model_wrapper":
    {
        "type": "kenning.modelwrappers.classification.tensorflow_pet_dataset.TensorFlowPetDatasetMobileNetV2",
        "parameters":
        {
            "model_path": "kenning:///models/classification/tensorflow_pet_dataset_mobilenetv2.h5"
        }
    },
    "dataset":
    {
        "type": "kenning.datasets.pet_dataset.PetDataset",
        "parameters":
        {
            "dataset_root": "./build/pet-dataset"
        }
    },
    "optimizers":
    [
        {
            "type": "my_optimizer.TensorFlowLiteCompiler",
            "parameters":
            {
                "compiled_model_path": "./build/compiled-model.tflite",
                "inference_input_type": "float32",
                "inference_output_type": "float32"
            }
        }
    ],
    "runtime":
    {
        "type": "kenning.runtimes.tflite.TFLiteRuntime",
        "parameters":
        {
            "save_model_path": "./build/compiled-model.tflite"
        }
    }
}

The type field in the optimizers entry demonstrates usage of the implemented TensorFlowLiteCompiler from the my_optimizer.py module.
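
Assuming the scenario above is saved as scenario.json, it can be launched along the following lines (the exact subcommand and flags may differ between Kenning versions, so consult the Kenning CLI help):

# Make my_optimizer.py importable and run the optimization scenario
PYTHONPATH="$(pwd):$PYTHONPATH" kenning optimize --cfg ./scenario.json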

This sums up the Kenning development process.

Implementing Kenning runtime blocks

Implementing new Runners for KenningFlow

The process of creating a new Runner is almost the same as implementing the Kenning components described above, with a few additional steps.

First of all, the new component needs to inherit from the Runner class (not necessarily directly). Then you need to implement the following methods:

  • cleanup - cleans up resources after execution is stopped,

  • should_close - (optional) returns a boolean indicating whether the runner ended processing and requests closing. Possible reasons: a signal from the terminal, a user request in the GUI, end of data to process, an error, and more. The default implementation always returns False,

  • run - method that receives the runner inputs, processes them and returns the obtained results.

Inputs for the runner are passed to the run method as a dictionary where the key is the input name specified in the KenningFlow JSON and the value is simply the input value.

E.g. for DetectionVisualizer defined in JSON as

{
    "type": "kenning.outputcollectors.detection_visualizer.DetectionVisualizer",
    "parameters": {
        "output_width": 608,
        "output_height": 608
    },
    "inputs": {
        "frame": "cam_frame",
        "detection_data": "predictions"
    }
}

the run method accesses the inputs as follows:

from typing import Any, Dict

def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
    input_data = inputs['frame']
    output_data = inputs['detection_data']
    self.process_output(input_data, output_data)

Note

In the example above, the run method does not contain a return statement, because this runner does not have any outputs. If you want to create a runner with outputs, this method should return a similar dictionary containing outputs.
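
For instance, a hypothetical runner producing a single output named resized_frame (declared under "outputs" in the KenningFlow JSON) could return it like this:

from typing import Any, Dict

def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
    frame = inputs['frame']
    # self.resize is a hypothetical helper of this runner
    resized = self.resize(frame)
    return {'resized_frame': resized}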

Adjusting ModelWrapper for AutoML flow

Implementing a ModelWrapper

To create an AutoML model, the default ModelWrapper has to implement the AutoMLModel interface. The description of a regular ModelWrapper implementation can be found in the Model and I/O metadata section, but it has to be extended with a few additional steps, like in the AutoPyTorch-based example with a simple fully-connected neural network:

  • make the ModelWrapper inherit from the kenning.core.automl.AutoMLModel class (line 35),

  • implement the model class with parameters that can be tuned by AutoML (lines 14-32),

  • point to the model class in the ModelWrapper using full class path (line 52),

  • define which ModelWrapper arguments can be tuned (lines 41-49),

  • implement or override the inherited methods; in the case of AutoPyTorchModel, they are:

    • (required) get_io_specification_from_dataset - creates IO spec based on the dataset (lines 119-129),

    • (optional) model_params_from_context - generates additional parameters for the model based on the dataset and optional platform (lines 65-75),

    • (optional) define_forbidden_clauses - adds forbidden configurations to the search space,

    • (optional) register_components - adds custom components to AutoPyTorch.

Listing 15 fc_automl.py
from typing import Any, Dict, List, Optional, Tuple

import numpy as np
import torch
from torch import nn

from kenning.automl.auto_pytorch import AutoPyTorchModel
from kenning.core.platform import Platform
from kenning.datasets.anomaly_detection_dataset import AnomalyDetectionDataset
from kenning.modelwrappers.frameworks.pytorch import PyTorchWrapper
from kenning.utils.resource_manager import PathOrURI


class FCNetwork(nn.Sequential):
    """
    The simple fully-connected neural network model.
    """
    def __init__(
        self,
        # The input shape is provided by AutoPyTorch
        input_shape: Tuple[int, ...],
        # Remaining arguments should have the same names as parameters
        # from `arguments_structure` and `model_params_from_context` method
        neurons: List[int],
        outputs: int,
    ):
        self.neurons = neurons
        super().__init__(
            *[nn.Linear(in_, out) for in_, out in zip(
                [input_shape[-1]] + neurons, neurons + [outputs]
            )]
        )


class PyTorchFullyConnected(PyTorchWrapper, AutoPyTorchModel):
    """
    The example AutoML-compatible fully connected model.
    """

    arguments_structure = {
        # Required: Example parameter which can be adjusted by AutoML flow
        "neurons": {
            "description": "List of neurons in consecutive layers",
            "type": list[int],
            "default": [8, 16, 8],
            "AutoML": True,
            "list_range": (1, 16),
            "item_range": (1, 256),
        },
    }
    # Required: The path to the class representing model
    model_class = f"{FCNetwork.__module__}.{FCNetwork.__name__}"

    def __init__(
        self,
        model_path: PathOrURI,
        dataset: AnomalyDetectionDataset,
        from_file: bool = True,
        model_name: Optional[str] = None,
        neurons: List[int] = [8, 16, 8],
    ):
        super().__init__(model_path, dataset, from_file, model_name)
        self.neurons = neurons

    # Optional: Method generating additional model parameters
    # based on the dataset
    @classmethod
    def model_params_from_context(
        cls,
        dataset: AnomalyDetectionDataset,
        platform: Optional[Platform] = None,
    ):
        return {
            "outputs": len(dataset.get_class_names()),
        }

    def create_model_structure(self):
        self.model = FCNetwork(
            input_shape=(
                -1,
                self.dataset.window_size * self.dataset.num_features
            ),
            neurons=self.neurons,
            **self.model_params_from_context(self.dataset)
        )

    def prepare_model(self):
        if self.model_prepared:
            return None

        if self.from_file:
            self.load_model(self.model_path)
            self.model_prepared = True
        else:
            self.create_model_structure()

            def weights_init(m):
                # Initialize only the Linear layers of the network
                if isinstance(m, nn.Linear):
                    torch.nn.init.xavier_uniform_(m.weight)
                    torch.nn.init.zeros_(m.bias)

            self.model.apply(weights_init)
            self.model_prepared = True
            self.save_model(self.model_path)
        self.model.to(self.device)

    def preprocess_input(self, X: List[Any]) -> List[Any]:
        import torch

        X = np.asarray(X[0])
        return [torch.from_numpy(X.reshape(X.shape[0], -1)).to(self.device)]

    def postprocess_outputs(self, y: List[Any]) -> List[np.ndarray]:
        output = np.argmax(
            y[0].detach().cpu().numpy(),
            axis=-1,
        ).reshape(-1).astype(np.int8)
        return (output,)

    # Required: Method generating IO specification
    # based on the dataset
    @classmethod
    def get_io_specification_from_dataset(
        cls,
        dataset,
    ) -> Dict[str, List[Dict]]:
        return cls._get_io_specification(
            dataset.num_features,
            dataset.window_size,
            dataset.batch_size,
        )

    @classmethod
    def _get_io_specification(
        cls,
        num_features,
        window_size,
        batch_size: int = -1,
    ) -> Dict[str, List[Dict]]:
        return {
            "input": [
                {
                    "name": "input_1",
                    "shape": (
                        batch_size,
                        window_size,
                        num_features,
                    ),
                    "dtype": "float32",
                }
            ],
            "processed_input": [
                {
                    "name": "input_1",
                    "shape": (
                        batch_size,
                        window_size * num_features
                        if window_size > 0 and num_features > 0
                        else -1,
                    ),
                    "dtype": "float32",
                }
            ],
            "output": [
                {
                    "name": "distances",
                    "shape": (batch_size, 2),
                    "dtype": "float32",
                }
            ],
            "processed_output": [
                {
                    "name": "anomalies",
                    "shape": (batch_size,),
                    "dtype": "int8",
                }
            ],
        }

    def get_io_specification_from_model(self) -> Dict[str, List[Dict]]:
        return self.get_io_specification_from_dataset(self.dataset)

    @classmethod
    def derive_io_spec_from_json_params(
        cls, json_dict: Dict
    ) -> Dict[str, List[Dict]]:
        return cls._get_io_specification(-1, -1)

Using the implemented model

To use the model, its full module path has to be specified in the use_models parameter:

Listing 16 fc-scenario.yml
dataset:
  type: AnomalyDetectionDataset
  parameters:
    dataset_root: ./workspace/CATS
    csv_file: kenning:///datasets/anomaly_detection/cats_nano.csv
    split_fraction_test: 0.1
    split_seed: 12345
    inference_batch_size: 1

automl:
  type: AutoPyTorchML
  parameters:
    time_limit: 2
    use_models:
      # Use model wrapper defined in fc_automl.py
      - fc_automl.PyTorchFullyConnected
    output_directory: ./workspace/automl-results
    n_best_models: 5
    optimize_metric: f1
    min_budget: 1
    max_budget: 2

Running the simplest AutoML flow, assuming the implementation and the scenario are saved as fc_automl.py and fc-scenario.yml respectively, boils down to:

# Make sure the required dependencies are installed
pip install "kenning[torch,anomaly_detection,auto_pytorch] @ git+https://github.com/antmicro/kenning.git"
# Run AutoML flow with new model
PYTHONPATH="$(pwd):$PYTHONPATH" kenning automl --cfg ./fc-scenario.yml
