Quantization Speedup

class nni.compression.pytorch.quantization_speedup.ModelSpeedupTensorRT(model, input_shape, config=None, onnx_path='default_model.onnx', extra_layer_bits=32, strict_datatype=True, calibrate_type=tensorrt.CalibrationAlgoType.ENTROPY_CALIBRATION_2, calib_data_loader=None, calibration_cache='calibration.cache', batchsize=1, input_names=['actual_input_1'], output_names=['output1'])
Parameters
  • model (PyTorch model) – The model to speed up by quantization.

  • input_shape (tuple) – The input shape of the model, which is passed to torch.onnx.export.

  • config (dict) – Configuration recording the layer names and the number of bits for each layer.

  • onnx_path (str) – The path where the ONNX model converted from the PyTorch model is stored.

  • extra_layer_bits (int) – The number of bits used to quantize layers that are not listed in config.

  • strict_datatype (bool) – Whether to constrain layer bits to the numbers given in config. If True, every layer is set strictly to its given number of bits. Otherwise, the precision of these layers is chosen automatically by TensorRT.

  • calibrate_type (tensorrt.tensorrt.CalibrationAlgoType) – The calibration algorithm. See https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Int8/Calibrator.html for details.

  • calib_data_loader (numpy array) – The data used to calibrate the quantized model.

  • calibration_cache (str) – The path where the calibration cache file is stored.

  • batchsize (int) – The batch size of calibration and inference

  • input_names (list) – Input names of the ONNX model, passed to torch.onnx.export when generating the ONNX model.

  • output_names (list) – Output names of the ONNX model, passed to torch.onnx.export when generating the ONNX model.
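
The interaction between config and extra_layer_bits described above can be illustrated with a small pure-Python sketch. The helper resolve_layer_bits is hypothetical and not part of NNI, and the per-layer layout of config (a 'weight_bits' key under each layer name) is an assumption made for illustration; check the NNI quantization documentation for the exact format your version expects.

```python
# Hypothetical helper, NOT part of NNI: it only illustrates the documented
# rule that layers missing from `config` fall back to `extra_layer_bits`.
def resolve_layer_bits(layer_names, config=None, extra_layer_bits=32):
    config = config or {}
    resolved = {}
    for name in layer_names:
        if name in config:
            # Assumed layout: each entry stores its bit width under 'weight_bits'.
            resolved[name] = config[name]["weight_bits"]
        else:
            resolved[name] = extra_layer_bits
    return resolved


# Assumed example config: layer names mapped to their bit settings.
config = {
    "conv1": {"weight_bits": 8, "output_bits": 8},
    "fc1": {"weight_bits": 16, "output_bits": 16},
}

bits = resolve_layer_bits(["conv1", "conv2", "fc1"], config)
print(bits)  # {'conv1': 8, 'conv2': 32, 'fc1': 16}
```

Here conv2 is absent from config, so it receives the default extra_layer_bits value of 32.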

compress()

Get the ONNX config and build the TensorRT engine.

export_quantized_model(path)

Export the quantized TensorRT model engine, which can only be loaded through the TensorRT deserialize API.

Parameters

path (str) – The path to which the model is exported

inference(test_data)

Run inference with the built TensorRT engine.

Parameters

test_data (PyTorch tensor) – Model input tensor
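
A typical call sequence (construct, compress, then inference) might look like the following minimal sketch. The function quantize_and_run and its speedup_cls argument are hypothetical names introduced here so the flow can be exercised without a GPU; the real class needs NNI, TensorRT and a CUDA-capable device. The structure of the value returned by inference is not specified above, so it is passed through unchanged.

```python
def quantize_and_run(model, input_shape, test_data, config=None, speedup_cls=None):
    """Build a TensorRT engine for `model` and run one inference call."""
    if speedup_cls is None:
        # Imported lazily: requires NNI, TensorRT and a CUDA-capable GPU.
        from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT
        speedup_cls = ModelSpeedupTensorRT

    engine = speedup_cls(model, input_shape, config=config)
    engine.compress()                     # get the ONNX config and build the engine
    result = engine.inference(test_data)  # run the built engine on the input tensor
    return engine, result
```

Calling compress() before inference() matters in this sketch, because inference runs on the engine that compress() builds.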

load_quantized_model(path)

Load a quantized TensorRT model engine from the specified path.

Parameters

path (str) – The path of the exported model to load
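
Exporting and reloading an engine can be sketched as a simple round trip. The helper export_then_reload is a hypothetical name, and it assumes both arguments are ModelSpeedupTensorRT instances (or objects exposing the same two methods); whether a freshly constructed instance needs any further setup before loading is not stated above and should be verified against your NNI version.

```python
def export_then_reload(built_engine, fresh_engine, path):
    """Serialize a built engine to `path`, then load it into another instance."""
    # `built_engine` is assumed to have had compress() called already.
    built_engine.export_quantized_model(path)
    # `fresh_engine` is assumed to be a second instance that has not been compressed.
    fresh_engine.load_quantized_model(path)
    return fresh_engine
```

Reloading a saved engine this way is intended to avoid repeating the ONNX export and engine build on later runs.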