Model Compression API Reference

Compressors

Compressor

class nni.compression.pytorch.compressor.Compressor(model, config_list, optimizer=None)[source]

Abstract base class for PyTorch compressors.

compress()[source]

Compress the model with the algorithm implemented by the subclass.

The model will be instrumented, and the user should never edit it after calling this method. self.modules_to_compress records all the to-be-compressed layers.

Returns

The model with the specified modules compressed.

Return type

torch.nn.Module

get_modules_to_compress()[source]

Obtain all the to-be-compressed modules.

Returns

A list of layers, each of which is a tuple (layer, config), where layer is a LayerInfo and config is a dict.

Return type

list

get_modules_wrapper()[source]

Obtain all the wrapped modules.

Returns

A list of the wrapped modules.

Return type

list

reset(checkpoint=None)[source]

Reset the model state dict and the model wrappers.

select_config(layer)[source]

Find the configuration for a layer by parsing self.config_list.

Parameters

layer (LayerInfo) -- one layer

Returns

The retrieved configuration for this layer. If it is None, this layer should not be compressed.

Return type

config or None
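The lookup described above can be sketched in plain Python. This is a minimal illustration, not NNI's implementation: the LayerInfo stand-in, the rule that a later matching entry overrides an earlier one, and the treatment of a missing op_types or op_names key as "no restriction" are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for LayerInfo: only a name and a type string.
LayerInfo = namedtuple("LayerInfo", ["name", "type"])


def select_config(layer, config_list):
    """Return the last config entry that matches `layer`, or None."""
    selected = None
    for config in config_list:
        # Assumption: a missing key places no restriction on that field.
        if "op_types" in config and layer.type not in config["op_types"]:
            continue
        if "op_names" in config and layer.name not in config["op_names"]:
            continue
        selected = config
    return selected


config_list = [
    {"sparsity": 0.5, "op_types": ["Conv2d"]},
    {"sparsity": 0.8, "op_names": ["conv2"], "op_types": ["Conv2d"]},
]
conv1_config = select_config(LayerInfo("conv1", "Conv2d"), config_list)
conv2_config = select_config(LayerInfo("conv2", "Conv2d"), config_list)
fc_config = select_config(LayerInfo("fc", "Linear"), config_list)
```

Here conv1 picks up the general Conv2d entry, conv2 picks up the more specific entry, and fc gets None, meaning it would be left uncompressed.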

set_wrappers_attribute(name, value)[source]

Register attributes used in the wrapped module's forward method. If the value is a torch.Tensor, it is registered as a buffer in the wrapper and will be saved by model.state_dict. Otherwise, the value is just a regular variable in the wrapper.

Parameters
  • name (str) -- name of the variable

  • value (any) -- value of the variable

update_epoch(epoch)[source]

If the user wants to update the model every epoch, the user can override this method. This method should be called at the beginning of each epoch.

Parameters

epoch (num) -- the current epoch number
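As a rough sketch of the calling convention, the loop below invokes update_epoch first in every epoch. EpochAwarePruner is a hypothetical stand-in written for this example, not an NNI class.

```python
class EpochAwarePruner:
    """Hypothetical stand-in for a Compressor subclass that tracks the epoch."""

    def __init__(self):
        self.current_epoch = None

    def update_epoch(self, epoch):
        # Record the epoch so later mask calculations can depend on it.
        self.current_epoch = epoch


pruner = EpochAwarePruner()
seen_epochs = []
for epoch in range(3):
    pruner.update_epoch(epoch)  # called first, before any training step
    seen_epochs.append(pruner.current_epoch)
    # ... train for one epoch here ...
```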

validate_config(model, config_list)[source]

Subclasses can optionally implement this method to check whether config_list is valid.

class nni.compression.pytorch.compressor.Pruner(model, config_list, optimizer=None)[source]

Prune to an exact pruning level specification.

mask_dict

Dictionary for saving masks. Each key is a layer name and each value is a tensor with the same shape as the layer's weight.

Type

dict

calc_mask(wrapper, **kwargs)[source]

Pruners should overload this method to provide masks for weight tensors. The mask must have the same shape and type as the weight. It is applied to the weight with the mul() operation. This method is effectively hooked to the forward() method of the model.

Parameters

wrapper (Module) -- calculate mask for wrapper.module's weight
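To illustrate what applying a mask by multiplication means, here is a small sketch that uses flat Python lists in place of tensors. The helper name apply_mask is an assumption for this example; with real tensors the same effect comes from an element-wise multiply.

```python
def apply_mask(weight, mask):
    """Element-wise multiply a flat weight list by a 0/1 mask of the same length."""
    if len(weight) != len(mask):
        raise ValueError("mask must have the same shape as the weight")
    return [w * m for w, m in zip(weight, mask)]


weight = [0.4, 0.1, -0.7, 0.05]
mask = [1.0, 0.0, 1.0, 0.0]  # 0.0 marks a pruned position
masked_weight = apply_mask(weight, mask)
```

Pruned positions become zero while the remaining weights pass through unchanged, which is why the mask must match the weight's shape.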

compress()[source]

Compress the model with the algorithm implemented by the subclass.

The model will be instrumented, and the user should never edit it after calling this method. self.modules_to_compress records all the to-be-compressed layers.

Returns

The model with the specified modules compressed.

Return type

torch.nn.Module

export_model(model_path, mask_path=None, onnx_path=None, input_shape=None, device=None, dummy_input=None, opset_version=None)[source]

Export pruned model weights, masks and, optionally, an onnx model.

Parameters
  • model_path (str) -- path to save pruned model state_dict

  • mask_path (str) -- (optional) path to save mask dict

  • onnx_path (str) -- (optional) path to save onnx model

  • input_shape (list or tuple) -- input shape to the onnx model, used for creating a dummy input tensor for torch.onnx.export. If the input has a complex structure (e.g., a tuple), please create the input directly and pass it to dummy_input instead. Note: this argument is deprecated and will be removed; please use dummy_input instead.

  • device (torch.device) -- device of the model, on which the dummy input tensor for exporting the onnx file is placed; the tensor is placed on cpu if `device` is None. Only useful when both onnx_path and input_shape are passed. Note: this argument is deprecated and will be removed; please use dummy_input instead.

  • dummy_input (torch.Tensor or tuple) -- dummy input to the onnx model; used when input_shape is not enough to specify the dummy input. The user should ensure that dummy_input is on the same device as the model.

  • opset_version (int) -- opset_version parameter for torch.onnx.export; only useful when onnx_path is not None. If not passed, torch.onnx.export will use its default opset_version.

get_pruned_weights(dim=0)[source]

Log the simulated prune sparsity.

Parameters

dim (int) -- the pruned dimension.

load_model_state_dict(model_state)[source]

Load the state dict saved from the unwrapped model.

Parameters

model_state (dict) -- state dict saved from the unwrapped model

class nni.compression.pytorch.compressor.Quantizer(model, config_list, optimizer=None, dummy_input=None)[source]

Base class for PyTorch quantizers.

export_model(model_path, calibration_path=None, onnx_path=None, input_shape=None, device=None)[source]

Export quantized model weights and calibration parameters.

Parameters
  • model_path (str) -- path to save quantized model weight

  • calibration_path (str) -- (optional) path to save quantization parameters after calibration

  • onnx_path (str) -- (optional) path to save onnx model

  • input_shape (list or tuple) -- input shape to onnx model

  • device (torch.device) -- device of the model, used to place the dummy input tensor for exporting the onnx file. The tensor is placed on cpu if `device` is None.

Return type

Dict

export_model_save(model, model_path, calibration_config=None, calibration_path=None, onnx_path=None, input_shape=None, device=None)[source]

This method helps save the PyTorch model, calibration config and onnx model in the quantizer.

Parameters
  • model (pytorch model) -- PyTorch model to be saved

  • model_path (str) -- path to save the PyTorch model

  • calibration_config (dict) -- (optional) config of calibration parameters

  • calibration_path (str) -- (optional) path to save quantization parameters after calibration

  • onnx_path (str) -- (optional) path to save onnx model

  • input_shape (list or tuple) -- input shape to onnx model

  • device (torch.device) -- device of the model, used to place the dummy input tensor for exporting the onnx file. The tensor is placed on cpu if `device` is None.

find_conv_bn_patterns(model, dummy_input)[source]

Find all Conv-BN patterns, used for batch normalization folding.

Parameters
  • model (torch.nn.Module) -- model to be analyzed.

  • dummy_input (tuple of torch.Tensor) -- inputs to the model, used for generating the torchscript

fold_bn(*inputs, wrapper)[source]

Simulate batch normalization folding in the training graph. The folded weight and bias are returned for the following operations.

Parameters
  • inputs (tuple of torch.Tensor) -- inputs for the module

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

Return type

Tuple of torch.Tensor
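For intuition, the standard batch normalization folding arithmetic for a single output channel can be sketched as follows. This plain-Python example is an assumption-laden illustration: it folds given statistics into one channel's weights and bias, and the function name, argument names and scalar layout are hypothetical rather than taken from NNI.

```python
import math


def fold_bn_channel(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN statistics of one output channel into its conv weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    folded_weight = [w * scale for w in weight]
    folded_bias = beta + (bias - mean) * scale
    return folded_weight, folded_bias


# One output channel with three weights; eps=0 keeps the arithmetic exact.
folded_w, folded_b = fold_bn_channel(
    weight=[1.0, -2.0, 0.5], bias=0.5,
    gamma=4.0, beta=1.0, mean=0.25, var=4.0, eps=0.0,
)
```

With gamma = 4 and var = 4 the scale is 2, so every weight doubles and the bias becomes beta + (bias - mean) * 2.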

load_calibration_config(calibration_config)[source]

This function helps the quantizer set quantization parameters by loading them from a calibration_config exported by another quantizer or by itself. Its main use is to help a quantization-aware training quantizer set appropriate initial parameters, so that the training process is more flexible and converges quickly. It also enables the quantizer to resume a quantized model by loading parameters from the config.

Parameters

calibration_config (dict) -- dict that saves quantization parameters. A quantizer can export its own calibration config, e.g., calibration_config = quantizer.export_model(model_path, calibration_path)

quantize_input(inputs, wrapper, **kwargs)[source]

Quantizers should overload this method to quantize the input. This method is effectively hooked to forward() of the model.

Parameters
  • inputs (Tensor) -- inputs that need to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_output(output, wrapper, **kwargs)[source]

Quantizers should overload this method to quantize the output. This method is effectively hooked to forward() of the model.

Parameters
  • output (Tensor) -- output that needs to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_weight(wrapper, **kwargs)[source]

Quantizers should overload this method to quantize the weight. This method is effectively hooked to forward() of the model.

Parameters

wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

record_shape(model, dummy_input)[source]

Record the input and output shapes of each module to be quantized.

Parameters
  • model (torch.nn.Module) -- model to be recorded.

  • dummy_input (tuple of torch.Tensor) -- inputs to the model.

Module Wrapper

class nni.compression.pytorch.compressor.PrunerModuleWrapper(module, module_name, module_type, config, pruner)[source]
forward(*inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class nni.compression.pytorch.compressor.QuantizerModuleWrapper(module, module_name, module_type, config, quantizer, bn_module=None)[source]
forward(*inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Weight Masker

class nni.algorithms.compression.pytorch.pruning.weight_masker.WeightMasker(model, pruner, **kwargs)[source]
calc_mask(sparsity, wrapper, wrapper_idx=None)[source]

Calculate the mask of the given layer.

Parameters
  • sparsity (float) -- pruning ratio; the preserved weight ratio is 1 - sparsity

  • wrapper (PrunerModuleWrapper) -- layer wrapper of this layer

  • wrapper_idx (int) -- index of this wrapper in the pruner's list of all wrappers

Returns

Dictionary for storing masks. Keys of the dict: 'weight_mask' (weight mask tensor) and 'bias_mask' (bias mask tensor, optional).

Return type

dict
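A minimal sketch of a masker that returns such a dictionary, assuming a simple magnitude criterion on a flat list of weights. The function name calc_level_mask and the criterion are illustrative assumptions; real maskers operate on tensors and may also return a 'bias_mask'.

```python
def calc_level_mask(weight, sparsity):
    """Mask the `sparsity` fraction of entries with the smallest magnitude."""
    num_prune = int(len(weight) * sparsity)
    # Indices sorted by ascending absolute value; the first num_prune are pruned.
    order = sorted(range(len(weight)), key=lambda i: abs(weight[i]))
    pruned = set(order[:num_prune])
    weight_mask = [0.0 if i in pruned else 1.0 for i in range(len(weight))]
    return {"weight_mask": weight_mask}


masks = calc_level_mask([0.9, -0.05, 0.3, 0.01, -0.6], sparsity=0.4)
```

With five weights and sparsity 0.4, two entries are masked and three are preserved, matching a preserved ratio of 1 - sparsity.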

class nni.algorithms.compression.pytorch.pruning.structured_pruning_masker.StructuredWeightMasker(model, pruner, preserve_round=1, dependency_aware=False, global_sort=False)[source]

A structured pruning masker base class that prunes convolutional layer filters.

Parameters
  • model (nn.Module) -- model to be pruned

  • pruner (Pruner) -- A Pruner instance used to prune the model

  • preserve_round (int) -- after pruning, the number of preserved filters/channels is rounded to a multiple of preserve_round. For example, for a Conv2d layer with 32 output channels and sparsity 0.2, if preserve_round is 1 (no rounding), then int(32 * 0.2) = 6 filters are pruned and 32 - 6 = 26 filters are preserved. If preserve_round is 4, the number of preserved filters is rounded up to 28 (which is divisible by 4) and only 4 filters are pruned.
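The preserve_round arithmetic from the example above can be reproduced with a few lines of Python. The helper name filters_to_prune is hypothetical, and the sketch only mirrors the rounding rule as described in the parameter text.

```python
def filters_to_prune(out_channels, sparsity, preserve_round=1):
    """Return (num_pruned, num_preserved) after rounding the preserved count up."""
    num_pruned = int(out_channels * sparsity)
    num_preserved = out_channels - num_pruned
    if preserve_round > 1:
        # Round the preserved count up to the next multiple of preserve_round.
        num_preserved = -(-num_preserved // preserve_round) * preserve_round
        num_preserved = min(num_preserved, out_channels)
        num_pruned = out_channels - num_preserved
    return num_pruned, num_preserved


no_rounding = filters_to_prune(32, 0.2, preserve_round=1)
rounded = filters_to_prune(32, 0.2, preserve_round=4)
```

The first call gives 6 pruned and 26 preserved filters; the second gives 4 pruned and 28 preserved, as in the description.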

calc_mask(sparsity, wrapper, wrapper_idx=None, **depen_kwargs)[source]

Calculate the mask for the wrapper.

Parameters
  • sparsity (float/list of float) -- The target sparsity of the wrapper. If the mask is calculated in the normal way, sparsity is a float. In contrast, if the mask is calculated in the dependency-aware way, sparsity is a list of floats, each of which corresponds to the sparsity of a layer.

  • wrapper (PrunerModuleWrapper/list of PrunerModuleWrappers) -- The wrapper of the target layer. If the mask is calculated in the normal way, wrapper is an instance of PrunerModuleWrapper; otherwise wrapper is a list of PrunerModuleWrapper instances.

  • wrapper_idx (int/list of int) -- The index of the wrapper.

  • depen_kwargs (dict) -- The keyword arguments for the dependency-aware mode.

get_channel_sum(wrapper, wrapper_idx)[source]

Calculate the importance weight for each channel. To support the dependency-aware mode for a one-shot pruner, this function must be implemented.

Parameters
  • wrapper (PrunerModuleWrapper) -- layer wrapper of this layer

  • wrapper_idx (int) -- index of this wrapper in the pruner's list of all wrappers

Returns

Tensor that indicates the importance of each channel

Return type

tensor
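As one possible importance criterion, the sketch below sums the absolute weights of each output channel (an L1-style score). It uses nested Python lists instead of tensors, and the function name l1_channel_importance is an assumption for this example; actual maskers define their own criteria.

```python
def l1_channel_importance(filters):
    """Return one score per output channel: the sum of its absolute weights."""
    return [sum(abs(w) for w in flat_filter) for flat_filter in filters]


# Three output channels, each flattened to a list of weights.
filters = [
    [0.5, -0.5, 1.0],
    [0.25, 0.0, -0.25],
    [-2.0, 1.0, 0.0],
]
channel_importance = l1_channel_importance(filters)
```

Under this criterion the second channel has the lowest score and would be the first candidate for pruning.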

get_mask(base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None)[source]

Calculate the mask of the given layer.

Parameters
  • base_mask (dict) -- The basic mask with the same shape as the weight; every item in the basic mask is 1.

  • weight (tensor) -- the module weight to be pruned

  • num_prune (int) -- Number of filters to prune

  • wrapper (PrunerModuleWrapper) -- layer wrapper of this layer

  • wrapper_idx (int) -- index of this wrapper in the pruner's list of all wrappers

  • channel_masks (Tensor) -- Channels of this layer that are masked in advance. In the dependency-aware mode, before calculating the masks for each layer, a common mask is calculated for all the layers in the dependency set. Pruners that do not support the dependency-aware mode can ignore this parameter.

Returns

Dictionary for storing masks

Return type

dict
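A rough sketch of how a filter-level mask could be built from per-channel importance scores and num_prune. The helper get_filter_mask and its list-based layout are hypothetical; it starts from an all-ones mask and zeroes whole filters, which is the general idea behind structured masks.

```python
def get_filter_mask(importance, filter_size, num_prune):
    """Build masks that zero the num_prune least important filters."""
    order = sorted(range(len(importance)), key=lambda i: importance[i])
    pruned = set(order[:num_prune])
    # Start from an all-ones base mask and zero whole filters.
    weight_mask = [
        [0.0 if i in pruned else 1.0] * filter_size
        for i in range(len(importance))
    ]
    bias_mask = [0.0 if i in pruned else 1.0 for i in range(len(importance))]
    return {"weight_mask": weight_mask, "bias_mask": bias_mask}


filter_masks = get_filter_mask(
    importance=[2.0, 0.5, 3.0, 1.0], filter_size=3, num_prune=2
)
```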

Pruners

class nni.algorithms.compression.pytorch.pruning.sensitivity_pruner.SensitivityPruner(model, config_list, evaluator, finetuner=None, base_algo='l1', sparsity_proportion_calc=None, sparsity_per_iter=0.1, acc_drop_threshold=0.05, checkpoint_dir=None)[source]

This pruner prunes the model based on the sensitivity of each layer.

Parameters
  • model (torch.nn.Module) -- model to be compressed

  • evaluator (function) -- Validation function for the model. This function should return the accuracy on the validation dataset. Its input parameters can be specified through the eval_args and eval_kwargs parameters of the compress function if needed. Example:

        import torch

        def evaluator(model):
            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
            val_loader = ...
            model.eval()
            correct = 0
            with torch.no_grad():
                for data, target in val_loader:
                    data, target = data.to(device), target.to(device)
                    output = model(data)
                    # get the index of the max log-probability
                    pred = output.argmax(dim=1, keepdim=True)
                    correct += pred.eq(target.view_as(pred)).sum().item()
            accuracy = correct / len(val_loader.dataset)
            return accuracy

  • finetuner (function) -- Finetune function for the model. This parameter is optional; if it is not None, the sensitivity pruner finetunes the model after pruning in each iteration. Its input parameters can be specified through the finetune_args and finetune_kwargs parameters of the compress function if needed. Example:

        import torch

        def finetuner(model, epoch=3):
            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
            train_loader = ...
            criterion = torch.nn.CrossEntropyLoss()
            optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
            model.train()
            for _ in range(epoch):
                for data, target in train_loader:
                    data, target = data.to(device), target.to(device)
                    optimizer.zero_grad()
                    output = model(data)
                    loss = criterion(output, target)
                    loss.backward()
                    optimizer.step()

  • base_algo (str) -- base pruning algorithm. level, l1, l2 or fpgm, by default l1.

  • sparsity_proportion_calc (function) -- This function generates the sparsity proportion between the conv layers according to the sensitivity analysis results. A default function is provided to quantify the sparsity proportion from these results, and users can customize this function according to their needs. The input of this function is a dict, for example: {'conv1': {0.1: 0.9, 0.2: 0.8}, 'conv2': {0.1: 0.9, 0.2: 0.8}}, in which 'conv1' is the name of the conv layer, and 0.1: 0.9 means that when the sparsity of conv1 is 0.1 (10%), the model's validation accuracy equals 0.9.

  • sparsity_per_iter (float) -- The sparsity of the model that the pruner tries to prune in each iteration.

  • acc_drop_threshold (float) -- The hyperparameter used to quantify the sensitivity of each layer.

  • checkpoint_dir (str) -- The directory path in which to save checkpoints during pruning.
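To make the sensitivity dict format concrete, here is a sketch of one way a custom proportion function might consume it: for each layer, take the largest tested sparsity whose accuracy drop stays within a threshold. The function name, the extra baseline_acc argument and the selection rule are assumptions for illustration; they are not the default function shipped with the pruner.

```python
def max_sparsity_within_drop(sensitivities, baseline_acc, acc_drop_threshold=0.05):
    """Per layer, pick the largest sparsity whose accuracy drop is within the threshold."""
    proportions = {}
    for layer, results in sensitivities.items():
        allowed = [
            sparsity for sparsity, acc in results.items()
            if baseline_acc - acc <= acc_drop_threshold
        ]
        # Layers that tolerate no tested sparsity get proportion 0.
        proportions[layer] = max(allowed, default=0.0)
    return proportions


sensitivities = {
    "conv1": {0.1: 0.9, 0.2: 0.8},
    "conv2": {0.1: 0.9, 0.2: 0.88},
}
proportions = max_sparsity_within_drop(sensitivities, baseline_acc=0.92)
```

In this example conv1 tolerates only 10% sparsity, while conv2 tolerates 20%, so conv2 would receive the larger share of pruning.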

calc_mask(wrapper, **kwargs)[source]

Pruners should overload this method to provide masks for weight tensors. The mask must have the same shape and type as the weight. It is applied to the weight with the mul() operation. This method is effectively hooked to the forward() method of the model.

Parameters

wrapper (Module) -- calculate mask for wrapper.module's weight

compress(eval_args=None, eval_kwargs=None, finetune_args=None, finetune_kwargs=None, resume_sensitivity=None)[source]

This function iteratively prunes the model according to the results of the sensitivity analysis.

Parameters
  • eval_args (list) -- Positional parameters for the evaluator.

  • eval_kwargs (dict) -- Keyword parameters for the evaluator, which is called as evaluator(*eval_args, **eval_kwargs).

  • finetune_args (list) -- Positional parameters for the finetuner function, if needed.

  • finetune_kwargs (dict) -- Keyword parameters for the finetuner function, if needed.

  • resume_sensitivity -- Resume the sensitivity results from this file.

create_cfg(ratios)[source]

Generate the cfg_list for the pruner according to the prune ratios.

Parameters

ratios -- For example: {'conv1': 0.2}

Returns

For example: [{'sparsity': 0.2, 'op_names': ['conv1'], 'op_types': ['Conv2d']}]

Return type

cfg_list
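The mapping from ratios to a cfg_list shown in the example can be sketched as below. This standalone function is an illustration of the input and output formats only; the op_type argument is an assumption added so that 'Conv2d' is not hard-coded.

```python
def create_cfg(ratios, op_type="Conv2d"):
    """Turn {layer_name: prune_ratio} into a list of per-layer pruning configs."""
    return [
        {"sparsity": ratio, "op_names": [name], "op_types": [op_type]}
        for name, ratio in ratios.items()
    ]


cfg_list = create_cfg({"conv1": 0.2})
```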

current_sparsity()[source]

The sparsity of the weight.

load_sensitivity(filepath)[source]

Load the sensitivity results exported by the sensitivity analyzer.

normalize(ratios, target_pruned)[source]

Normalize the prune ratio of each layer according to the total ratio already pruned and the final target total pruning ratio.

Parameters
  • ratios -- Dict object that saves the prune ratio for each layer

  • target_pruned -- The amount of weights expected to be pruned in this iteration

Returns

The normalized prune ratios for each layer.

Return type

new_ratios
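One simplified way to think about this normalization is to rescale the per-layer ratios so that the pruned weights add up to the requested share of the whole model. The sketch below does only that: the weight_counts argument is hypothetical, and it ignores weights that were already pruned in earlier iterations, so it should be read as an approximation of the idea rather than the method's actual logic.

```python
def normalize(ratios, weight_counts, target_pruned):
    """Scale per-layer ratios so the pruned weights total target_pruned of the model."""
    total_weights = sum(weight_counts.values())
    currently_pruned = sum(ratios[name] * weight_counts[name] for name in ratios)
    if currently_pruned == 0:
        return dict(ratios)
    factor = target_pruned * total_weights / currently_pruned
    return {name: ratio * factor for name, ratio in ratios.items()}


weight_counts = {"conv1": 100, "conv2": 300}  # hypothetical weight counts
new_ratios = normalize({"conv1": 0.5, "conv2": 0.5}, weight_counts, target_pruned=0.25)
```

After scaling, 25 + 75 = 100 weights are pruned, which is 25% of the 400 weights in this toy model.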

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) -- List of pruning configs

class nni.algorithms.compression.pytorch.pruning.one_shot_pruner.OneshotPruner(model, config_list, pruning_algorithm='level', dependency_aware=False, dummy_input=None, **algo_kwargs)[source]

Prune the model to an exact pruning level in one shot.

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) -- List of pruning configs

class nni.algorithms.compression.pytorch.pruning.one_shot_pruner.LevelPruner(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • sparsity : The sparsity to which the specified operations are compressed.

    • op_types : Operation types to prune.
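A config_list using the two supported keys might look like the following. The layer types and sparsity values are arbitrary examples, and check_config_list is a hypothetical sanity check written for this sketch, not part of the library.

```python
# Prune 50% of the weights in every Linear layer and 80% in every Conv2d layer.
config_list = [
    {"sparsity": 0.5, "op_types": ["Linear"]},
    {"sparsity": 0.8, "op_types": ["Conv2d"]},
]


def check_config_list(config_list):
    """Hypothetical check: sparsity in (0, 1) and op_types a list of strings."""
    for config in config_list:
        if not 0 < config["sparsity"] < 1:
            raise ValueError(f"sparsity out of range: {config['sparsity']}")
        if not all(isinstance(t, str) for t in config["op_types"]):
            raise TypeError("op_types must be a list of strings")
    return True


config_ok = check_config_list(config_list)
```

Following the constructor signature above, such a list would be passed as the second argument, as in LevelPruner(model, config_list).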

class nni.algorithms.compression.pytorch.pruning.one_shot_pruner.L1FilterPruner(model, config_list, dependency_aware=False, dummy_input=None)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • sparsity : The sparsity to which the specified operations are compressed.

    • op_types : Only Conv2d is supported in L1FilterPruner.

  • dependency_aware (bool) -- Whether to prune the model in a dependency-aware way. If it is True, this pruner will prune the model according to the l1-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is set to True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

  • dummy_input (torch.Tensor) -- The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.

class nni.algorithms.compression.pytorch.pruning.one_shot_pruner.L2FilterPruner(model, config_list, dependency_aware=False, dummy_input=None)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • sparsity : The sparsity to which the specified operations are compressed.

    • op_types : Only Conv2d is supported in L2FilterPruner.

  • dependency_aware (bool) -- Whether to prune the model in a dependency-aware way. If it is True, this pruner will prune the model according to the l2-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is set to True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

  • dummy_input (torch.Tensor) -- The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.

class nni.algorithms.compression.pytorch.pruning.one_shot_pruner.FPGMPruner(model, config_list, dependency_aware=False, dummy_input=None)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • sparsity : The sparsity to which the specified operations are compressed.

    • op_types : Only Conv2d is supported in FPGMPruner.

  • dependency_aware (bool) -- Whether to prune the model in a dependency-aware way. If it is True, this pruner will prune the model according to the l2-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is set to True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

  • dummy_input (torch.Tensor) -- The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.

class nni.algorithms.compression.pytorch.pruning.iterative_pruner.IterativePruner(model, config_list, optimizer=None, pruning_algorithm='slim', trainer=None, criterion=None, num_iterations=20, epochs_per_iteration=5, dependency_aware=False, dummy_input=None, **algo_kwargs)[source]

Prune the model during the training process.

compress()[source]

Compress the model with the algorithm implemented by the subclass.

The model will be instrumented, and the user should never edit it after calling this method. self.modules_to_compress records all the to-be-compressed layers.

Returns

The model with the specified modules compressed.

Return type

torch.nn.Module

class nni.algorithms.compression.pytorch.pruning.iterative_pruner.SlimPruner(model, config_list, optimizer, trainer, criterion, sparsifying_training_epochs=10, scale=0.0001, dependency_aware=False, dummy_input=None)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • sparsity : The sparsity to which the specified operations are compressed.

    • op_types : Only BatchNorm2d is supported in SlimPruner.

  • optimizer (torch.optim.Optimizer) -- Optimizer used to train the model

  • trainer (function) -- Function used to sparsify BatchNorm2d scaling factors. Users should write this function as a normal function that trains the PyTorch model and takes model, optimizer, criterion and epoch as arguments.

  • criterion (function) -- Function used to calculate the loss between the target and the output. For example, you can use torch.nn.CrossEntropyLoss() as input.

  • sparsifying_training_epochs (int) -- The number of channel sparsity regularization training epochs before pruning.

  • scale (float) -- Penalty parameter for sparsification.

  • dependency_aware (bool) -- Whether to prune the model in a dependency-aware way. If it is True, this pruner will prune the model according to the l2-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is set to True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

  • dummy_input (torch.Tensor) -- The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.
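A trainer with the (model, optimizer, criterion, epoch) signature described above could be written along these lines. The make_trainer factory and the train_loader it closes over are assumptions for this sketch, and device handling is omitted; the body is an ordinary single-epoch training loop.

```python
def make_trainer(train_loader):
    """Build a trainer with the (model, optimizer, criterion, epoch) signature."""

    def trainer(model, optimizer, criterion, epoch):
        # One pass over the data; `epoch` is available for logging or schedules.
        model.train()
        for data, target in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()

    return trainer
```

Binding the data loader through a closure keeps the trainer's own argument list limited to the four names the pruner expects.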

validate_config(model, config_list)[source]

Subclasses can optionally implement this method to check whether config_list is valid.

class nni.algorithms.compression.pytorch.pruning.iterative_pruner.TaylorFOWeightFilterPruner(model, config_list, optimizer, trainer, criterion, sparsifying_training_batches=1, dependency_aware=False, dummy_input=None, global_sort=False)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • sparsity : The percentage of convolutional filters to be pruned.

    • op_types : Currently only Conv2d is supported in TaylorFOWeightFilterPruner.

  • optimizer (torch.optim.Optimizer) -- Optimizer used to train the model

  • trainer (function) -- Function used to train the model. Users should write this function as a normal function that trains the PyTorch model and takes model, optimizer, criterion and epoch as arguments.

  • criterion (function) -- Function used to calculate the loss between the target and the output. For example, you can use torch.nn.CrossEntropyLoss() as input.

  • sparsifying_training_batches (int) -- The number of batches used to collect the contributions. Note that this number must be less than the number of batches in one epoch.

  • dependency_aware (bool) -- Whether to prune the model in a dependency-aware way. If it is True, this pruner will prune the model according to the l2-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is set to True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

  • dummy_input (torch.Tensor) -- The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.

  • global_sort (bool) -- Whether to prune the model in a global-sort way; currently supported only by TaylorFOWeightFilterPruner. If it is True, channel contributions are sorted globally across the model, and whether a specific channel is pruned depends on this global information.
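The difference between per-layer and global sorting can be shown with a toy example. The function below is a hypothetical illustration using plain lists of contribution scores; it is not how the pruner stores or ranks contributions internally.

```python
def select_pruned_filters(contributions, sparsity, global_sort=False):
    """Return the set of (layer, filter_index) pairs chosen for pruning."""
    if global_sort:
        # Rank every filter in the model together and prune the lowest overall.
        ranked = sorted(
            (score, layer, idx)
            for layer, scores in contributions.items()
            for idx, score in enumerate(scores)
        )
        num_prune = int(len(ranked) * sparsity)
        return {(layer, idx) for _, layer, idx in ranked[:num_prune]}
    # Otherwise rank filters inside each layer independently.
    pruned = set()
    for layer, scores in contributions.items():
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        pruned.update((layer, i) for i in order[: int(len(scores) * sparsity)])
    return pruned


contributions = {"conv1": [0.9, 0.8, 0.7, 0.6], "conv2": [0.1, 0.2, 0.3, 0.4]}
per_layer = select_pruned_filters(contributions, sparsity=0.5)
global_choice = select_pruned_filters(contributions, sparsity=0.5, global_sort=True)
```

With per-layer sorting each layer loses half of its filters; with global sorting all pruned filters come from conv2, because its contributions are the lowest in the whole model.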

class nni.algorithms.compression.pytorch.pruning.iterative_pruner.ActivationAPoZRankFilterPruner(model, config_list, optimizer, trainer, criterion, activation='relu', sparsifying_training_batches=1, dependency_aware=False, dummy_input=None)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • sparsity : The percentage of convolutional filters to be pruned.

    • op_types : Only Conv2d is supported in ActivationAPoZRankFilterPruner.

  • optimizer (torch.optim.Optimizer) -- Optimizer used to train the model

  • trainer (function) -- Function used to train the model. Users should write this function as a normal function that trains the PyTorch model and takes model, optimizer, criterion and epoch as arguments.

  • criterion (function) -- Function used to calculate the loss between the target and the output. For example, you can use torch.nn.CrossEntropyLoss() as input.

  • activation (str) -- The activation type.

  • sparsifying_training_batches (int) -- The number of batches used to collect the contributions. Note that this number must be less than the number of batches in one epoch.

  • dependency_aware (bool) -- Whether to prune the model in a dependency-aware way. If it is True, this pruner will prune the model according to the l2-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is set to True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

  • dummy_input (torch.Tensor) -- The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.

class nni.algorithms.compression.pytorch.pruning.iterative_pruner.ActivationMeanRankFilterPruner(model, config_list, optimizer, trainer, criterion, activation='relu', sparsifying_training_batches=1, dependency_aware=False, dummy_input=None)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • sparsity : The percentage of convolutional filters to be pruned.

    • op_types : Only Conv2d is supported in ActivationMeanRankFilterPruner.

  • optimizer (torch.optim.Optimizer) -- Optimizer used to train the model.

  • trainer (function) -- Function used to train the model. Users should write this function as a normal function that trains the PyTorch model and takes model, optimizer, criterion and epoch as arguments.

  • criterion (function) -- Function used to calculate the loss between the target and the output. For example, you can use torch.nn.CrossEntropyLoss() as input.

  • activation (str) -- The activation type.

  • sparsifying_training_batches (int) -- The number of batches used to collect the contributions. Note that this number must be less than the number of batches in one epoch.

  • dependency_aware (bool) -- Whether to prune the model in a dependency-aware way. If it is True, this pruner will prune the model according to the l2-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is set to True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

  • dummy_input (torch.Tensor) -- The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.

class nni.algorithms.compression.pytorch.pruning.iterative_pruner.AGPPruner(model, config_list, optimizer, trainer, criterion, num_iterations=10, epochs_per_iteration=1, pruning_algorithm='level')[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned.

  • config_list (list) --

    Supported keys:
    • sparsity : The sparsity to which the specified operations are compressed.

    • op_types : See the supported types of your specific pruning algorithm.

  • optimizer (torch.optim.Optimizer) -- Optimizer used to train the model.

  • trainer (function) -- Function used to train the model.

  • criterion (function) -- Function used to calculate the loss between the target and the output. For example, you can use torch.nn.CrossEntropyLoss() as input.

  • num_iterations (int) -- Total number of iterations in the pruning process. The mask is calculated at the end of each iteration.

  • epochs_per_iteration (int) -- The number of training epochs for each iteration.

  • pruning_algorithm (str) -- Algorithm used to prune the model. Choose from ['level', 'slim', 'l1', 'l2', 'fpgm', 'taylorfo', 'apoz', 'mean_activation']; the default is 'level'.

calc_mask(wrapper, wrapper_idx=None)[source]

Calculate the mask of the given layer. Scale factors with the smallest absolute value in the BN layer are masked.

Parameters
  • wrapper (Module) -- the layer to instrument the compression operation

  • wrapper_idx (int) -- index of this wrapper in the pruner's list of all wrappers

Returns

Dictionary for storing masks. Keys of the dict: 'weight_mask' (weight mask tensor) and 'bias_mask' (bias mask tensor, optional).

Return type

dict | None

compress()[source]

Compress the model with the algorithm implemented by the subclass.

The model will be instrumented, and the user should not modify it after calling this method. self.modules_to_compress records all the to-be-compressed layers.

Returns

model with specified modules compressed.

Return type

torch.nn.Module

compute_target_sparsity(config)[source]

Calculate the sparsity for pruning.

Parameters

config (dict) -- Layer's pruning config

Returns

Target sparsity to be pruned

Return type

float
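As an illustration of how a target sparsity can grow across pruning iterations, the sketch below uses the cubic schedule from the AGP paper (Zhu and Gupta, "To prune, or not to prune"). The function name `agp_target_sparsity` and the parameters `initial_sparsity` and `start_iteration` are hypothetical and not part of the NNI API; the exact formula used by compute_target_sparsity may differ.

```python
def agp_target_sparsity(final_sparsity, iteration, num_iterations,
                        initial_sparsity=0.0, start_iteration=0):
    """Cubic AGP-style schedule: sparsity rises quickly at first, then flattens."""
    span = num_iterations - start_iteration
    if span <= 0 or iteration >= num_iterations:
        return final_sparsity
    if iteration <= start_iteration:
        return initial_sparsity
    # Fraction of the pruning schedule that has elapsed, in (0, 1).
    progress = (iteration - start_iteration) / span
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


# Target sparsity at each of 10 iterations for a final sparsity of 0.8.
schedule = [agp_target_sparsity(0.8, t, num_iterations=10) for t in range(11)]
```

Under this assumed schedule most of the pruning happens in the early iterations, when the network still has plenty of redundancy, and the final iterations only make small adjustments.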

update_epoch(epoch)[source]

Update epoch.

Parameters

epoch (int) -- current training epoch

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) -- List of pruning configs

class nni.algorithms.compression.pytorch.pruning.iterative_pruner.ADMMPruner(model, config_list, trainer, criterion=CrossEntropyLoss(), num_iterations=30, epochs_per_iteration=5, row=0.0001, base_algo='l1')[source]

A PyTorch implementation of the ADMM Pruner algorithm.

Parameters
  • model (torch.nn.Module) -- Model to be pruned.

  • config_list (list) -- List of pruning configs.

  • trainer (function) -- Function used for the first subproblem. Users should write this function as a normal function to train the PyTorch model and include model, optimizer, criterion, epoch as function arguments.

  • criterion (function) -- Function used to calculate the loss between the target and the output. By default, we use CrossEntropyLoss in ADMMPruner. For example, you can use torch.nn.CrossEntropyLoss() as input.

  • num_iterations (int) -- Total number of iterations in pruning process. We will calculate mask after we finish all iterations in ADMMPruner.

  • epochs_per_iteration (int) -- Training epochs of the first subproblem.

  • row (float) -- Penalty parameters for ADMM training.

  • base_algo (str) -- Base pruning algorithm. level, l1, l2 or fpgm, by default l1. Given the sparsity distribution among the ops, the assigned base_algo is used to decide which filters/channels/weights to prune.
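A trainer with the argument list described above (model, optimizer, criterion, epoch) could look like the following minimal sketch. The factory `make_trainer` and the `train_loader` argument are hypothetical helpers introduced for illustration, and the sketch assumes the batches are already on the same device as the model.

```python
def make_trainer(train_loader):
    """Build a trainer(model, optimizer, criterion, epoch) closure over a data loader."""
    def trainer(model, optimizer, criterion, epoch):
        # One pass over the training data; `epoch` is accepted for logging or scheduling.
        model.train()
        for data, target in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()
    return trainer
```

Binding the data loader in a closure keeps the trainer's signature limited to the four arguments the pruner passes in.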

compress()[source]

Compress the model with ADMM.

Returns

model with specified modules compressed.

Return type

torch.nn.Module

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) -- List of pruning configs

class nni.algorithms.compression.pytorch.pruning.auto_compress_pruner.AutoCompressPruner(model, config_list, trainer, evaluator, dummy_input, criterion=CrossEntropyLoss(), num_iterations=3, optimize_mode='maximize', base_algo='l1', start_temperature=100, stop_temperature=20, cool_down_rate=0.9, perturbation_magnitude=0.35, admm_num_iterations=30, admm_epochs_per_iteration=5, row=0.0001, experiment_data_dir='./')[source]

A PyTorch implementation of the AutoCompress pruning algorithm.

Parameters
  • model (pytorch model) -- The model to be pruned.

  • config_list (list) --

    Supported keys:
    • sparsity : The target overall sparsity.

    • op_types : The operation type to prune.

  • trainer (function) -- Function used for the first subproblem of ADMM Pruner. Users should write this function as a normal function to train the PyTorch model and include model, optimizer, criterion, epoch as function arguments.

  • criterion (function) -- Function used to calculate the loss between the target and the output. By default, we use CrossEntropyLoss. For example, you can use torch.nn.CrossEntropyLoss() as input.

  • evaluator (function) --

    Function to evaluate the pruned model. This function should take model as its only parameter and return a scalar value. Example:

    def evaluator(model):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        val_loader = ...
        model.eval()
        correct = 0
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                # get the index of the max log-probability
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()
        accuracy = correct / len(val_loader.dataset)
        return accuracy
    

  • dummy_input (pytorch tensor) -- The dummy input for `jit.trace`. Users should put it on the correct device before passing it in.

  • num_iterations (int) -- Number of overall iterations.

  • optimize_mode (str) -- Optimize mode, maximize or minimize. Defaults to maximize.

  • base_algo (str) -- Base pruning algorithm. level, l1, l2 or fpgm, by default l1. Given the sparsity distribution among the ops, the assigned base_algo is used to decide which filters/channels/weights to prune.

  • start_temperature (float) -- Start temperature of the simulated annealing process.

  • stop_temperature (float) -- Stop temperature of the simulated annealing process.

  • cool_down_rate (float) -- Cool down rate of the temperature.

  • perturbation_magnitude (float) -- Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.

  • admm_num_iterations (int) -- Number of iterations of ADMM Pruner.

  • admm_epochs_per_iteration (int) -- Training epochs of the first optimization subproblem of ADMMPruner.

  • row (float) -- Penalty parameters for ADMM training.

  • experiment_data_dir (string) -- PATH to store temporary experiment data.

calc_mask(wrapper, **kwargs)[source]

Pruners should override this method to provide a mask for weight tensors. The mask must have the same shape and type as the weight. It is applied to the weight with the mul() operation. This method is effectively hooked to the forward() method of the model.

Parameters

wrapper (Module) -- calculate the mask for wrapper.module's weight
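To make the idea of a mask applied by element-wise multiplication concrete, here is a minimal sketch that uses plain Python lists in place of tensors. The helper `magnitude_mask` is hypothetical, and the magnitude-based criterion is an assumption chosen for illustration; each pruner defines its own criterion.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest-magnitude weights."""
    num_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending absolute value; the first num_prune are masked out.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_prune])
    return [0.0 if i in pruned else 1.0 for i in range(len(weights))]


weights = [0.5, -0.1, 0.05, -0.9]
mask = magnitude_mask(weights, sparsity=0.5)
# Element-wise multiplication, the list analogue of weight.mul(mask).
masked = [w * m for w, m in zip(weights, mask)]
```

The mask has the same length as the weight list, mirroring the requirement that it match the shape of the weight tensor.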

compress()[source]

Compress the model with AutoCompress.

Returns

model with specified modules compressed.

Return type

torch.nn.Module

export_model(model_path, mask_path=None, onnx_path=None, input_shape=None, device=None)[source]

Export pruned model weights, masks and, optionally, the ONNX model.

Parameters
  • model_path (str) -- path to save the pruned model state_dict

  • mask_path (str) -- (optional) path to save the mask dict

  • onnx_path (str) -- (optional) path to save the ONNX model

  • input_shape (list or tuple) -- input shape of the ONNX model, used to create a dummy input tensor for torch.onnx.export. If the input has a complex structure (e.g., a tuple), create the input directly and pass it as dummy_input instead. Note: this argument is deprecated and will be removed; please use dummy_input instead.

  • device (torch.device) -- device of the model, on which the dummy input tensor is placed when exporting the ONNX file; the tensor is placed on the CPU if `device` is None. Only useful when both onnx_path and input_shape are passed. Note: this argument is deprecated and will be removed; please use dummy_input instead.

  • dummy_input (torch.Tensor or tuple) -- dummy input to the ONNX model, used when input_shape is not enough to specify the dummy input. The user should ensure that dummy_input is on the same device as the model.

  • opset_version (int) -- opset_version parameter for torch.onnx.export; only useful when onnx_path is not None. If not passed, torch.onnx.export uses its default opset_version.

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) -- List of pruning configs

class nni.algorithms.compression.pytorch.pruning.net_adapt_pruner.NetAdaptPruner(model, config_list, short_term_fine_tuner, evaluator, optimize_mode='maximize', base_algo='l1', sparsity_per_iteration=0.05, experiment_data_dir='./')[source]

A PyTorch implementation of the NetAdapt compression algorithm.

Parameters
  • model (pytorch model) -- The model to be pruned.

  • config_list (list) --

    Supported keys:
    • sparsity : The target overall sparsity.

    • op_types : The operation type to prune.

  • short_term_fine_tuner (function) --

    Function to fine-tune the masked model for a short term. This function should take model as its only required parameter and fine-tune the model briefly after each pruning iteration. Example:

    def short_term_fine_tuner(model, epoch=3):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        train_loader = ...
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        model.train()
        for _ in range(epoch):
            for batch_idx, (data, target) in enumerate(train_loader):
                data, target = data.to(device), target.to(device)
                optimizer.zero_grad()
                output = model(data)
                loss = criterion(output, target)
                loss.backward()
                optimizer.step()
    

  • evaluator (function) --

    Function to evaluate the masked model. This function should take model as its only parameter and return a scalar value. Example:

    def evaluator(model):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        val_loader = ...
        model.eval()
        correct = 0
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                # get the index of the max log-probability
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()
        accuracy = correct / len(val_loader.dataset)
        return accuracy
    

  • optimize_mode (str) -- Optimize mode, maximize or minimize. Defaults to maximize.

  • base_algo (str) -- Base pruning algorithm. level, l1, l2 or fpgm, by default l1. Given the sparsity distribution among the ops, the assigned base_algo is used to decide which filters/channels/weights to prune.

  • sparsity_per_iteration (float) -- Sparsity to prune in each iteration.

  • experiment_data_dir (str) -- PATH to save experiment data, including the config_list generated for the base pruning algorithm and the performance of the pruned model.

calc_mask(wrapper, **kwargs)[source]

Pruners should override this method to provide a mask for weight tensors. The mask must have the same shape and type as the weight. It is applied to the weight with the mul() operation. This method is effectively hooked to the forward() method of the model.

Parameters

wrapper (Module) -- calculate the mask for wrapper.module's weight

compress()[source]

Compress the model.

Returns

model with specified modules compressed.

Return type

torch.nn.Module

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) -- List of pruning configs

class nni.algorithms.compression.pytorch.pruning.simulated_annealing_pruner.SimulatedAnnealingPruner(model, config_list, evaluator, optimize_mode='maximize', base_algo='l1', start_temperature=100, stop_temperature=20, cool_down_rate=0.9, perturbation_magnitude=0.35, experiment_data_dir='./')[source]

A PyTorch implementation of the Simulated Annealing compression algorithm.

Parameters
  • model (pytorch model) -- The model to be pruned.

  • config_list (list) --

    Supported keys:
    • sparsity : The target overall sparsity.

    • op_types : The operation type to prune.

  • evaluator (function) --

    Function to evaluate the pruned model. This function should include model as the only parameter, and returns a scalar value. Example:

    def evaluator(model):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        val_loader = ...
        model.eval()
        correct = 0
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                # get the index of the max log-probability
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()
        accuracy = correct / len(val_loader.dataset)
        return accuracy
    

  • optimize_mode (str) -- Optimize mode, maximize or minimize, by default maximize.

  • base_algo (str) -- Base pruning algorithm. level, l1, l2 or fpgm, by default l1. Given the sparsity distribution among the ops, the assigned base_algo is used to decide which filters/channels/weights to prune.

  • start_temperature (float) -- Start temperature of the simulated annealing process.

  • stop_temperature (float) -- Stop temperature of the simulated annealing process.

  • cool_down_rate (float) -- Cool down rate of the temperature.

  • perturbation_magnitude (float) -- Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.

  • experiment_data_dir (string) -- PATH to save experiment data, including the config_list generated for the base pruning algorithm, the performance of the pruned model and the pruning history.
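The interaction between start_temperature, stop_temperature, cool_down_rate and perturbation_magnitude can be sketched as a simple cooling loop. The generator `annealing_schedule` is a hypothetical helper, and the rule that the magnitude shrinks linearly with the temperature ratio is an assumption made for illustration; the parameter descriptions above only state that the magnitude decreases with the current temperature.

```python
def annealing_schedule(start_temperature=100, stop_temperature=20,
                       cool_down_rate=0.9, perturbation_magnitude=0.35):
    """Yield (temperature, magnitude) pairs until the temperature reaches the stop value."""
    temperature = float(start_temperature)
    while temperature > stop_temperature:
        # Assumed rule: the magnitude shrinks linearly with the temperature ratio.
        magnitude = temperature / start_temperature * perturbation_magnitude
        yield temperature, magnitude
        # Geometric cooling: multiply by the cool-down rate after every step.
        temperature *= cool_down_rate


steps = list(annealing_schedule())
```

A cool_down_rate closer to 1 yields more annealing steps and therefore more evaluator calls, so it trades search quality against running time.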

calc_mask(wrapper, **kwargs)[source]

Pruners should override this method to provide a mask for weight tensors. The mask must have the same shape and type as the weight. It is applied to the weight with the mul() operation. This method is effectively hooked to the forward() method of the model.

Parameters

wrapper (Module) -- calculate the mask for wrapper.module's weight

compress(return_config_list=False)[source]

Compress the model with Simulated Annealing.

Returns

model with specified modules compressed.

Return type

torch.nn.Module

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) -- List of pruning configs

class nni.algorithms.compression.pytorch.pruning.lottery_ticket.LotteryTicketPruner(model, config_list, optimizer=None, lr_scheduler=None, reset_weights=True)[source]
Parameters
  • model (pytorch model) -- The model to be pruned

  • config_list (list) --

    Supported keys:
    • prune_iterations : The number of rounds for the iterative pruning.

    • sparsity : The final sparsity when the compression is done.

  • optimizer (pytorch optimizer) -- The optimizer for the model

  • lr_scheduler (pytorch lr scheduler) -- The lr scheduler for the model, if used

  • reset_weights (bool) -- Whether to reset the weights and optimizer at the beginning of each round.

calc_mask(wrapper, **kwargs)[source]

Generate the mask for the given weight.

Parameters

wrapper (Module) -- The layer to be pruned

Returns

The mask for this weight. It is `None`, because this pruner calculates and assigns masks in `prune_iteration_start`, so nothing needs to be done in this function.

Return type

tensor

get_prune_iterations()[source]

Return the range of iterations. In the first prune iteration the masks are all ones, so one more iteration is added.

Returns

A list of pruning iterations

Return type

list
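The "one more iteration" rule can be sketched as follows. Both helpers are hypothetical, and the per-round sparsity schedule (keeping the same fraction of the remaining weights in every round) is an assumption made for illustration rather than a statement about the pruner's internals.

```python
def prune_iterations_range(prune_iterations):
    """Iteration 0 trains with all-ones masks, so there is one extra round."""
    return list(range(prune_iterations + 1))


def sparsity_at(iteration, prune_iterations, final_sparsity):
    """Assumed schedule: keep the same fraction of remaining weights each round."""
    keep_once = (1.0 - final_sparsity) ** (1.0 / prune_iterations)
    return max(1.0 - keep_once ** iteration, 0.0)


rounds = prune_iterations_range(3)
schedule = [sparsity_at(i, 3, final_sparsity=0.875) for i in rounds]
```

With three pruning rounds there are four entries in `rounds`; the first corresponds to training the dense model before any weight is masked.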

prune_iteration_start()[source]

Control the pruning procedure based on the updated epoch number. Should be called at the beginning of the epoch.

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) --

    Supported keys:
    • prune_iterations : The number of rounds for the iterative pruning.

    • sparsity : The final sparsity when the compression is done.

class nni.algorithms.compression.pytorch.pruning.transformer_pruner.TransformerHeadPruner(model, config_list, head_hidden_dim, attention_name_groups=None, dummy_input=None, ranking_criterion='l1_weight', global_sort=False, num_iterations=1, epochs_per_iteration=1, optimizer=None, trainer=None, criterion=None, forward_runner=None, **algo_kwargs)[source]

A pruner specialized for pruning attention heads in models belonging to the transformer family.

Parameters
  • model (torch.nn.Module) -- Model to be pruned. Expects a model from the transformers library (e.g., BertModel). This pruner can work with other customized transformer models, but some ranking modes might fail.

  • config_list (list) --

    Supported keys:
    • sparsity : The target sparsity for the operations to be compressed.

    • op_types : Optional. Operation types to prune. (Should be 'Linear' for this pruner.)

    • op_names : Optional. Operation names to prune.

  • head_hidden_dim (int) -- Hidden dimension of each attention head (e.g., 64 for BERT). We assume that head_hidden_dim is constant across the entire model.

  • attention_name_groups (list (Optional)) -- List of groups of names for weights of each attention layer. Each element should be a four-element list, with the first three corresponding to Q_proj, K_proj, V_proj (in any order) and the last one being output_proj.

  • dummy_input (torch.Tensor (Optional)) -- Input to model's forward method, used to infer module grouping if attention_name_groups is not specified. This tensor is used by the underlying torch.jit.trace to infer the module graph.

  • ranking_criterion (str) --

    The criterion for ranking attention heads. Currently we support:
    • l1_weight: l1 norm of Q_proj, K_proj, and V_proj

    • l2_weight: l2 norm of Q_proj, K_proj, and V_proj

    • l1_activation: l1 norm of the output of attention computation

    • l2_activation: l2 norm of the output of attention computation

    • taylorfo: l1 norm of the output of attention computation * gradient for this output

      (check more details in the masker documentation)

  • global_sort (bool) -- Whether to rank the heads globally or locally before deciding which heads to prune.

  • num_iterations (int) -- Number of pruning iterations. Defaults to 1 (one-shot pruning). If num_iterations > 1, the pruner will split the sparsity specified in config_list uniformly and assign a fraction to each pruning iteration.

  • epochs_per_iteration (int) -- Number of finetuning epochs before the next pruning iteration. Only used when num_iterations > 1. If num_iterations is 1, then no finetuning is performed by the pruner after pruning.

  • optimizer (torch.optim.Optimizer) -- Optimizer used to train model

  • trainer (function) -- Function used to finetune the model between pruning iterations. Only used when num_iterations > 1 or ranking_criterion is 'taylorfo'. Users should write this function as a normal function to train the PyTorch model and include model, optimizer, criterion, epoch as function arguments. Note that the trainer is also used for collecting gradients for pruning if ranking_criterion is 'taylorfo'. In that case, epoch=None will be passed.

  • criterion (function) -- Function used to calculate the loss between the target and the output. Only used when num_iterations > 1 or ranking_criterion is 'taylorfo'. For example, you can use torch.nn.CrossEntropyLoss() as input.

  • forward_runner (function) -- Function used to perform a "dry run" on the model on the entire train/validation dataset in order to collect data for pruning required by the criteria 'l1_activation' or 'l2_activation'. Only used when ranking_criterion is 'l1_activation' or 'l2_activation'. Users should write this function as a normal function that accepts a PyTorch model and runs forward on the model using the entire train/validation dataset. This function is not expected to perform any backpropagation or parameter updates.
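As an illustration of the attention_name_groups and config_list arguments, the sketch below builds both for a 12-layer encoder. The module names follow the naming used by Hugging Face's BertModel, which is an assumption here; other models will use different names, and the sparsity value is arbitrary.

```python
NUM_LAYERS = 12  # assumed: a BERT-base-sized encoder

# One four-element group per attention layer: Q, K, V projections, then the output projection.
attention_name_groups = [
    [
        f"encoder.layer.{i}.attention.self.query",
        f"encoder.layer.{i}.attention.self.key",
        f"encoder.layer.{i}.attention.self.value",
        f"encoder.layer.{i}.attention.output.dense",  # output projection goes last
    ]
    for i in range(NUM_LAYERS)
]

# Prune half of the heads, restricted to the grouped Linear modules.
config_list = [{
    "sparsity": 0.5,
    "op_types": ["Linear"],
    "op_names": [name for group in attention_name_groups for name in group],
}]
```

Listing the grouped names in op_names keeps non-attention Linear layers, such as the feed-forward blocks, out of the pruning config.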

calc_mask(wrapper, **kwargs)[source]

Pruners should override this method to provide a mask for weight tensors. The mask must have the same shape and type as the weight. It is applied to the weight with the mul() operation. This method is effectively hooked to the forward() method of the model.

Parameters

wrapper (Module) -- calculate the mask for wrapper.module's weight

compress()[source]

Compress the model with the algorithm implemented by the subclass.

The model will be instrumented, and the user should not modify it after calling this method. self.modules_to_compress records all the to-be-compressed layers.

Returns

model with specified modules compressed.

Return type

torch.nn.Module

group_weight_names_by_graph()[source]

Populate self.attention_name_groups by running inference on the module graph. Currently, the group inferred by AttentionWeightDependency is limited to a set of four weights, with the first three corresponding to Q_proj, K_proj, V_proj (in any order) and the last one being output_proj.

group_weights_by_name()[source]

Populate self.masking_groups using the groups specified by the user in attention_name_groups.

remove_ungrouped_modules()[source]

Remove non-attention weights that might be mistakenly captured by a simplified config_list. Also update the corresponding list of layer information (self.modules_to_compress).

update_mask()[source]

Calculate and update masks for each masking group. If global_sort is set, the masks for all groups are calculated together, and then the groups are updated individually.

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be pruned

  • config_list (list) -- List of pruning configs

validate_weight_groups()[source]
Sanity checks:
  • Q, K, V projection weights in each group must have the same shape

  • output projection weight shape must match total hidden dimension (inferred from Q, K, V projection)

  • Four weights in a group must have the same sparsity in their config

  • If global_sort is specified, all weights must have the same sparsity

  • head_hidden_dim must be a divisor of the output dimension of the projection weights (i.e., the resulting head number must be an integer)
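The shape-related checks in this list can be sketched as a standalone function. `check_attention_group` is a hypothetical helper, not the pruner's implementation; it assumes weight shapes given as (out_features, in_features), as for torch.nn.Linear, and leaves out the sparsity-related checks.

```python
def check_attention_group(q_shape, k_shape, v_shape, out_shape, head_hidden_dim):
    """Return the number of heads, or raise ValueError if the group is inconsistent.

    Shapes are (out_features, in_features), as for torch.nn.Linear weights.
    """
    if not (q_shape == k_shape == v_shape):
        raise ValueError("Q, K, V projection weights must have the same shape")
    hidden = q_shape[0]
    # The output projection consumes the concatenated heads as its input.
    if out_shape[1] != hidden:
        raise ValueError("output projection does not match the total hidden dimension")
    if hidden % head_hidden_dim != 0:
        raise ValueError("head_hidden_dim must divide the projection output dimension")
    return hidden // head_hidden_dim


# A BERT-base-sized layer: hidden size 768 and head dimension 64.
num_heads = check_attention_group((768, 768), (768, 768), (768, 768), (768, 768), 64)
```

Running checks of this kind before any mask is computed turns a mis-specified attention_name_groups entry into an immediate error instead of a silently wrong mask.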

Quantizers

class nni.algorithms.compression.pytorch.quantization.NaiveQuantizer(model, config_list, optimizer=None)[source]

Quantize weights to 8 bits.

quantize_weight(wrapper, **kwargs)[source]

Quantizers should override this method to quantize weights. This method is effectively hooked to forward() of the model.

Parameters

wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

validate_config(model, config_list)[source]

Subclasses can optionally implement this method to check whether config_list is valid.
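For intuition about 8-bit weight quantization, the sketch below simulates it on a plain list of floats. The helper `fake_quantize_int8` is hypothetical; the symmetric scale (largest magnitude divided by 127) and round-to-nearest are assumptions made for illustration and may not match what NaiveQuantizer does internally.

```python
def fake_quantize_int8(weights):
    """Quantize to signed 8-bit levels and map back to floats (simulated quantization)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights), 1.0
    scale = max_abs / 127  # the largest magnitude maps to +/-127
    # Round to the nearest integer level and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return [q * scale for q in quantized], scale


original = [0.0, 0.3, -1.27, 1.27]
dequantized, scale = fake_quantize_int8(original)
```

Each dequantized value differs from the original by at most half a quantization step, which is the rounding error that quantization-aware methods try to make the model robust to.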

class nni.algorithms.compression.pytorch.quantization.QAT_Quantizer(model, config_list, optimizer, dummy_input=None)[source]

Quantizer defined in: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf

export_model(model_path, calibration_path=None, onnx_path=None, input_shape=None, device=None)[source]

Export quantized model weights and, optionally, calibration parameters.

Parameters
  • model_path (str) -- path to save the quantized model weights

  • calibration_path (str) -- (optional) path to save the quantization parameters after calibration

  • onnx_path (str) -- (optional) path to save the ONNX model

  • input_shape (list or tuple) -- input shape of the ONNX model

  • device (torch.device) -- device of the model, used to place the dummy input tensor for exporting the ONNX file. The tensor is placed on the CPU if `device` is None.

Return type

Dict

load_calibration_config(calibration_config)[source]

This function helps a quantizer set its quantization parameters by loading them from a calibration_config exported by another quantizer or by itself. Its main use is to give a quantization-aware training quantizer appropriate initial parameters, so that training is more flexible and converges quickly. It also lets a quantizer resume quantizing a model by loading parameters from the config.

Parameters

calibration_config (dict) -- dict that stores quantization parameters. A quantizer can export its own calibration config, e.g., calibration_config = quantizer.export_model(model_path, calibration_path)

quantize_input(inputs, wrapper, **kwargs)[source]

Quantizers should override this method to quantize inputs. This method is effectively hooked to forward() of the model.

Parameters
  • inputs (Tensor) -- inputs that need to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_output(output, wrapper, **kwargs)[source]

Quantizers should override this method to quantize outputs. This method is effectively hooked to forward() of the model.

Parameters
  • output (Tensor) -- output that needs to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_weight(wrapper, **kwargs)[source]

Quantizers should override this method to quantize weights. This method is effectively hooked to forward() of the model.

Parameters

wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

step_with_optimizer()[source]

Overrides the compressor step method; quantization only happens after a certain number of steps.

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be quantized

  • config_list (list of dict) -- List of configurations

class nni.algorithms.compression.pytorch.quantization.DoReFaQuantizer(model, config_list, optimizer)[source]

Quantizer using the DoReFa scheme, as defined in: Zhou et al., DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients (https://arxiv.org/abs/1606.06160)

export_model(model_path, calibration_path=None, onnx_path=None, input_shape=None, device=None)[source]

Export quantized model weights and, optionally, calibration parameters.

Parameters
  • model_path (str) -- path to save the quantized model weights

  • calibration_path (str) -- (optional) path to save the quantization parameters after calibration

  • onnx_path (str) -- (optional) path to save the ONNX model

  • input_shape (list or tuple) -- input shape of the ONNX model

  • device (torch.device) -- device of the model, used to place the dummy input tensor for exporting the ONNX file. The tensor is placed on the CPU if `device` is None.

Return type

Dict

quantize_weight(wrapper, **kwargs)[source]

Quantizers should override this method to quantize weights. This method is effectively hooked to forward() of the model.

Parameters

wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be quantized

  • config_list (list of dict) -- List of configurations

class nni.algorithms.compression.pytorch.quantization.BNNQuantizer(model, config_list, optimizer)[source]

Binarized Neural Networks, as defined in: Binarized Neural Networks: Training Deep Neural Networks with Weights and Outputs Constrained to +1 or -1 (https://arxiv.org/abs/1602.02830)

export_model(model_path, calibration_path=None, onnx_path=None, input_shape=None, device=None)[source]

Export quantized model weights and, optionally, calibration parameters.

Parameters
  • model_path (str) -- path to save the quantized model weights

  • calibration_path (str) -- (optional) path to save the quantization parameters after calibration

  • onnx_path (str) -- (optional) path to save the ONNX model

  • input_shape (list or tuple) -- input shape of the ONNX model

  • device (torch.device) -- device of the model, used to place the dummy input tensor for exporting the ONNX file. The tensor is placed on the CPU if `device` is None.

Return type

Dict

quantize_output(output, wrapper, **kwargs)[source]

Quantizers should override this method to quantize outputs. This method is effectively hooked to forward() of the model.

Parameters
  • output (Tensor) -- output that needs to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_weight(wrapper, **kwargs)[source]

Quantizers should override this method to quantize weights. This method is effectively hooked to forward() of the model.

Parameters

wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

validate_config(model, config_list)[source]
Parameters
  • model (torch.nn.Module) -- Model to be quantized

  • config_list (list of dict) -- List of configurations

class nni.algorithms.compression.pytorch.quantization.LsqQuantizer(model, config_list, optimizer, dummy_input=None)[source]

Quantizer defined in: Learned Step Size Quantization (ICLR 2020) https://arxiv.org/pdf/1902.08153.pdf

export_model(model_path, calibration_path=None, onnx_path=None, input_shape=None, device=None)[source]

Export quantized model weights and, optionally, calibration parameters.

Parameters
  • model_path (str) -- path to save the quantized model weights

  • calibration_path (str) -- (optional) path to save the quantization parameters after calibration

  • onnx_path (str) -- (optional) path to save the ONNX model

  • input_shape (list or tuple) -- input shape of the ONNX model

  • device (torch.device) -- device of the model, used to place the dummy input tensor for exporting the ONNX file. The tensor is placed on the CPU if `device` is None.

Return type

Dict

static grad_scale(x, scale)[source]

Used to scale the gradient. Given a tensor x, we have y = grad_scale(x, scale) = x in the forward pass, which means that this function does not change the value of x. In the backward pass, we have:

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial x} = \text{scale} \cdot \frac{\partial L}{\partial y}$$

This means that the original gradient of x is scaled by a factor of scale. Applying this function to an nn.Parameter scales its gradient without changing its value.
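One common way to obtain this behaviour uses a stop-gradient operator, written sg(·) below (for example, `detach()` in PyTorch). This is a sketch of the general trick and an assumption about how such a function can be built, not a statement about the actual implementation:

```latex
y = \operatorname{sg}\bigl(x - \text{scale}\cdot x\bigr) + \text{scale}\cdot x
\quad\Longrightarrow\quad
y = x \ \text{(forward value)},
\qquad
\frac{\partial y}{\partial x} = \text{scale} \ \text{(backward pass)}
```

The term inside sg(·) contributes to the forward value but not to the gradient, so only the second term, whose derivative is scale, is differentiated.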

load_calibration_config(calibration_config)[source]

This function helps a quantizer set its quantization parameters by loading them from a calibration_config exported by another quantizer or by itself. Its main use is to give a quantization-aware training quantizer appropriate initial parameters, so that training is more flexible and converges quickly. It also lets a quantizer resume quantizing a model by loading parameters from the config.

Parameters

calibration_config (dict) -- dict that stores quantization parameters. A quantizer can export its own calibration config, e.g., calibration_config = quantizer.export_model(model_path, calibration_path)

quantize_input(inputs, wrapper, **kwargs)[source]

Quantizers should override this method to quantize inputs. This method is effectively hooked to forward() of the model.

Parameters
  • inputs (Tensor) -- inputs that need to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_output(output, wrapper, **kwargs)[source]

Quantizers should override this method to quantize outputs. This method is effectively hooked to forward() of the model.

Parameters
  • output (Tensor) -- output that needs to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_weight(wrapper, **kwargs)[source]

Quantizers should override this method to quantize weights. This method is effectively hooked to forward() of the model.

Parameters

wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

static round_pass(x)[source]

A simple way to implement the straight-through estimator (STE) operation.

step_with_optimizer()[source]

Overrides the compressor step method; quantization only happens after a certain number of steps.

class nni.algorithms.compression.pytorch.quantization.ObserverQuantizer(model, config_list, optimizer=None)[source]

This quantizer uses observers to record weight/output statistics to get quantization information. The whole process can be divided into three steps:

  1. It will register observers to the place where quantization would happen (just like registering hooks).

  2. The observers would record tensors' statistics during calibration.

  3. Scale & zero point would be obtained after calibration.

Note that the observer type, tensor dtype and quantization qscheme are hard-coded for now. Support for customizing them is under development.
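The three steps above can be sketched with a simple min/max observer. The class `RunningMinMax` and the function `affine_qparams` are hypothetical, and the asymmetric 8-bit affine scheme (qmin=0, qmax=255) is an assumption made for illustration; the observer type and qscheme actually used by ObserverQuantizer are not specified here.

```python
def affine_qparams(min_val, max_val, qmin=0, qmax=255):
    """Compute (scale, zero_point) for asymmetric affine quantization."""
    # The representable range must contain zero so that 0.0 quantizes exactly.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        return 1.0, qmin  # degenerate case: every observed value was zero
    zero_point = int(round(qmin - min_val / scale))
    return scale, max(qmin, min(qmax, zero_point))


class RunningMinMax:
    """Tracks the running minimum and maximum of every value it observes."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))


observer = RunningMinMax()                        # step 1: register an observer
for batch in ([-2.0, 0.5, 1.0], [3.0, 6.0]):      # step 2: record statistics during calibration
    observer.observe(batch)
scale, zero_point = affine_qparams(observer.min_val, observer.max_val)  # step 3
```

Because the parameters depend only on recorded statistics, no training is needed; a handful of representative calibration batches is enough for this kind of scheme.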

compress()[源代码]

Calculate quantization information of each tensor. Note that the inference of the compressed model will no longer update the corresponding. Instead, the quantization process will be simulated, which is used to test the accuracy of the quantization.

export_model(model_path, calibration_path=None, onnx_path=None, input_shape=None, device=None)[source]

Export quantized model weights and, optionally, calibration parameters.

Parameters
  • model_path (str) -- path to save the quantized model weights

  • calibration_path (str) -- (optional) path to save the quantization parameters after calibration

  • onnx_path (str) -- (optional) path to save the ONNX model

  • input_shape (list or tuple) -- input shape of the ONNX model

  • device (torch.device) -- device of the model, used to place the dummy input tensor when exporting the ONNX file. The tensor is placed on the CPU if `device` is None

Return type

Dict

quantize_input(inputs, wrapper, **kwargs)[source]

Quantizers should overload this method to quantize input. This method is effectively hooked to forward() of the model.

Parameters
  • inputs (Tensor) -- inputs that need to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_output(output, wrapper, **kwargs)[source]

Quantizers should overload this method to quantize output. This method is effectively hooked to forward() of the model.

Parameters
  • output (Tensor) -- output that needs to be quantized

  • wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

quantize_weight(wrapper, **kwargs)[source]

Quantizers should overload this method to quantize weight. This method is effectively hooked to forward() of the model.

Parameters

wrapper (QuantizerModuleWrapper) -- the wrapper for the original module

validate_config(model, config_list)[source]

Subclasses can optionally implement this method to check whether config_list is valid.

Model Speedup

Quantization Speedup

class nni.compression.pytorch.quantization_speedup.backend.BaseModelSpeedup(model, config)[source]

Base speedup class for backend engines.

compress()[source]

This function should be overridden by subclasses to build the inference engine that will be used to process input data.

export_quantized_model(path)[source]

This function should be overridden by subclasses to export the quantized model to the given path.

inference(test_data)[source]

This function should be overridden by subclasses to provide inference ability; it should return the output and the inference time.

Parameters

test_data (numpy data) -- test data given to the inference engine

Returns

  • numpy data -- output data generated by inference

  • float -- latency of the inference process
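
The contract above (build an engine in compress(), then return both output and latency from inference()) can be sketched as follows. EchoSpeedup is a hypothetical stand-in that does not import NNI or subclass BaseModelSpeedup, its "engine" is a plain Python function, and reporting latency in seconds is an assumption.

```python
import time


class EchoSpeedup:
    """Hypothetical stand-in for a BaseModelSpeedup subclass."""

    def compress(self):
        # A real backend would build its inference engine here.
        self.engine = lambda batch: [2 * v for v in batch]

    def inference(self, test_data):
        # Time only the engine call and return (output, latency in seconds).
        start = time.perf_counter()
        output = self.engine(test_data)
        latency = time.perf_counter() - start
        return output, latency


speedup = EchoSpeedup()
speedup.compress()
output, latency = speedup.inference([1, 2, 3])
```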

class nni.compression.pytorch.quantization_speedup.integrated_tensorrt.ModelSpeedupTensorRT(model, input_shape, config=None, onnx_path='default_model.onnx', extra_layer_bits=32, strict_datatype=True, calibrate_type=tensorrt.CalibrationAlgoType.ENTROPY_CALIBRATION_2, calib_data_loader=None, calibration_cache='calibration.cache', batchsize=1, input_names=['actual_input_1'], output_names=['output1'])[source]

compress()[source]

Get the ONNX config and build the TensorRT engine.

export_quantized_model(path)[source]

Export the TensorRT quantized model engine, which can only be loaded by the TensorRT deserialize API.

Parameters

path (str) -- The path of the exported model

inference(test_data)[source]

Run inference with the built TensorRT engine.

Parameters

test_data (pytorch tensor) -- Model input tensor

load_quantized_model(path)[source]

Load the TensorRT quantized model engine from the specified path.

Parameters

path (str) -- The path of the exported model

class nni.compression.pytorch.quantization_speedup.calibrator.Calibrator(*args: Any, **kwargs: Any)[source]

get_batch(names)[source]

This function defines how calibration data is fed for each batch.

Parameters

names (str) -- The names of the network inputs for each object in the bindings array

Returns

A list of device memory pointers set to the memory containing each network input's data, or an empty list if there are no more batches for calibration. You can allocate these device buffers with pycuda, for example, and then cast them to int to retrieve the pointers

Return type

list

read_calibration_cache()[source]

If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.

Returns

A cache object which contains calibration parameters for quantization

Return type

cache object

write_calibration_cache(cache)[source]

Write the calibration cache to the specified path.

Parameters

cache (str) -- The calibration cache to write
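
The read/write pair amounts to a simple file-backed cache. The sketch below shows that pattern with free functions; the explicit cache_file argument, the byte payload, and the temporary directory are assumptions for illustration (the Calibrator methods themselves take no path argument).

```python
import os
import tempfile


def read_calibration_cache(cache_file):
    # Reuse an existing cache; otherwise fall through and return None implicitly.
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return f.read()


def write_calibration_cache(cache_file, cache):
    # Persist the calibration cache so later builds can skip calibration.
    with open(cache_file, "wb") as f:
        f.write(cache)


cache_file = os.path.join(tempfile.mkdtemp(), "calibration.cache")
first_read = read_calibration_cache(cache_file)   # no cache yet
write_calibration_cache(cache_file, b"example-calibration-bytes")
second_read = read_calibration_cache(cache_file)  # cache is now reused
```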

Compression Utilities

Sensitivity Utilities

class nni.compression.pytorch.utils.sensitivity_analysis.SensitivityAnalysis(model, val_func, sparsities=None, prune_type='l1', early_stop_mode=None, early_stop_value=None)[source]

analysis(val_args=None, val_kwargs=None, specified_layers=None)[source]

This function analyzes the sensitivity to pruning of each conv layer in the target model. If specified_layers is not set, all conv layers are analyzed by default. Users can restrict the analysis to several layers, or parallelize the analysis process, through the specified_layers parameter.

Parameters
  • val_args (list) -- args for val_func

  • val_kwargs (dict) -- kwargs for val_func

  • specified_layers (list) -- list of layer names whose sensitivity should be analyzed. If this variable is set, only the conv layers specified in the list are analyzed. Users can also use this option to parallelize the sensitivity analysis easily.

Returns

sensitivities -- dict object that stores the trajectory of the accuracy/loss as the prune ratio changes

Return type

dict

export(filepath)[source]

Export the results of the sensitivity analysis to a CSV file. The first line of the CSV file describes the content structure: it consists of 'layername' followed by the list of sparsities. Each line below records the validation metric returned by val_func when the layer is under different sparsities. Note that, due to the early_stop option, some layers may not have metrics for all sparsities. For example:

    layername, 0.25, 0.5, 0.75
    conv1, 0.6, 0.55
    conv2, 0.61, 0.57, 0.56

Parameters

filepath (str) -- Path of the output file
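
A file in this layout can be read back with the standard csv module. The sketch below parses the example rows from an in-memory string (an assumption made to keep it self-contained; a real file would be opened from filepath) and copes with rows shortened by early stopping.

```python
import csv
import io

csv_text = """layername, 0.25, 0.5, 0.75
conv1, 0.6, 0.55
conv2, 0.61, 0.57, 0.56
"""

reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
header = next(reader)
sparsities = [float(s) for s in header[1:]]

sensitivities = {}
for row in reader:
    if not row:
        continue
    name, metrics = row[0], [float(m) for m in row[1:]]
    # zip stops at the shorter sequence, so early-stopped layers
    # simply have fewer sparsity entries.
    sensitivities[name] = dict(zip(sparsities, metrics))
```

Here conv1 has no entry for sparsity 0.75 because its row stops after two metrics.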

load_state_dict(state_dict)[source]

Update the weights of the model.

update_already_pruned(layername, ratio)[source]

Set the already-pruned ratio for the target layer.

Topology Utilities

class nni.compression.pytorch.utils.shape_dependency.ChannelDependency(model, dummy_input, traced_model=None, prune_type='Filter')[source]

build_dependency()[source]

Build the channel dependency for the conv layers in the model.

property dependency_sets

Get the list of dependency sets.

Returns

dependency_sets -- list of the dependency sets. For example, [set(['conv1', 'conv2']), set(['conv3', 'conv4'])]

Return type

list

export(filepath)[source]

Export the channel dependencies as a CSV file. Layers on the same line have output-channel dependencies with each other. For example, layer1.1.conv2, conv1, and layer1.0.conv2 have output-channel dependencies with each other, which means the numbers of output channels (filters) of these three layers must be the same; otherwise the model may have shape conflicts. Output example:

    Dependency Set,Convolutional Layers
    Set 1,layer1.1.conv2,layer1.0.conv2,conv1
    Set 2,layer1.0.conv1
    Set 3,layer1.1.conv1
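
To show how such a file might be consumed, the sketch below parses the example output into a list of sets and checks whether two layers must keep matching output-channel counts. The in-memory string and the helper name must_match are assumptions for illustration.

```python
import csv
import io

csv_text = """Dependency Set,Convolutional Layers
Set 1,layer1.1.conv2,layer1.0.conv2,conv1
Set 2,layer1.0.conv1
Set 3,layer1.1.conv1
"""

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
dependency_sets = [set(row[1:]) for row in reader if row]


def must_match(layer_a, layer_b):
    # Layers sharing a set must keep the same number of output channels.
    return any(layer_a in s and layer_b in s for s in dependency_sets)
```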

class nni.compression.pytorch.utils.shape_dependency.GroupDependency(model, dummy_input, traced_model=None)[source]

build_dependency()[source]

Build the group dependency for the conv layers in the model. This function returns the group count of each conv layer. Note that the group count of a conv layer here may be larger than its original groups value, because the input channels of a group conv layer are also divided into groups. To make this clear, assume we have two group conv layers, conv1(group=2) and conv2(group=4), where conv2 takes the output features of conv1 as input. We then have to ensure that the filters of conv1 can still be divided into 4 groups after filter pruning, because the input channels of conv2 must be divisible into 4 groups.

Returns

self.dependency -- key: the name of the conv layer, value: the minimum value that the number of filters must be divisible by.

Return type

dict
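
The conv1/conv2 example can be worked through numerically. The sketch below combines a layer's own groups value with those of the group convs consuming its output by taking the least common multiple; using the LCM is an assumption about how the constraints combine, and required_divisor is a hypothetical helper, not part of the library.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def required_divisor(own_groups, consumer_groups):
    """Smallest filter-count divisor satisfying a layer and its consumers.

    own_groups: the layer's own `groups` value.
    consumer_groups: `groups` values of the group convs consuming its output.
    """
    return reduce(lcm, consumer_groups, own_groups)


# conv1(group=2) feeds conv2(group=4), as in the example above.
conv1_divisor = required_divisor(2, [4])
conv2_divisor = required_divisor(4, [])
```

With these inputs conv1 must keep a filter count divisible by 4, which is larger than its own groups value of 2.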

export(filepath)[source]

Export the group dependency to a CSV file. Each line describes a convolution layer: the first field is the PyTorch module name of the conv layer, and the second field is the group count of the filters in this layer. Note that the group count may be larger than the layer's original group number. Output example:

    Conv layer, Groups
    Conv1, 1
    Conv2, 2
    Conv3, 4

class nni.compression.pytorch.utils.mask_conflict.GroupMaskConflict(masks, model, dummy_input, traced=None)[source]

fix_mask()[source]

Fix the mask conflicts for layers that have group dependencies. This function should be called before the mask inference of the 'speedup' module.

class nni.compression.pytorch.utils.mask_conflict.ChannelMaskConflict(masks, model, dummy_input, traced=None)[source]

fix_mask()[source]

Fix the mask conflicts for layers that have shape dependencies. This function should be called before the mask inference of the 'speedup' module. Only structured pruning masks are supported.

Model FLOPs/Parameters Counter

nni.compression.pytorch.utils.counter.count_flops_params(model, x, custom_ops=None, verbose=True, mode='default')[source]

Count the FLOPs and parameters of the given model. This function identifies the mask on each module and takes the pruned shape into consideration. Note that, for structured pruning, only the remaining filters are identified from the mask; pruned input channels are not taken into consideration, so the calculated FLOPs will be larger than the real number.

FLOPs are counted "per sample", which means that if the input has a batch size larger than 1, the calculated FLOPs should not differ from those for a batch size of 1.

Parameters
  • model (nn.Module) -- Target model.

  • x (tuple or tensor) -- The input shape of the data (a tuple), or a tensor or tuple of tensors used as input data.

  • custom_ops (dict) -- A mapping from module type (torch.nn.Module) to custom operation. The custom operation is a callback function that calculates the module's FLOPs and parameters; it overrides the default operation. For reference, see the ops in ModelProfiler.

  • verbose (bool) -- If False, mute detailed information about modules. Default is True.

  • mode (str) -- How information is collected. If the mode is set to default, only information about convolution and linear layers is collected. If the mode is set to full, other operations are also collected.

Returns

Total FLOPs, total parameters, and a detailed list of results, respectively. The list of results is a list of dicts, each of which contains (name, module_type, weight_shape, flops, params, input_size, output_size) as its keys.

Return type

tuple of int, int and list of dict
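
To make the per-sample counting and the overestimation note concrete, the sketch below computes FLOPs and parameters for a single Conv2d by hand. The helper conv2d_flops_params is hypothetical, and counting one multiply-accumulate as one FLOP (plus one operation per output element for the bias) is an assumed convention that may differ from what count_flops_params does internally.

```python
def conv2d_flops_params(in_channels, out_channels, kernel_size, output_size,
                        groups=1, bias=True):
    """Per-sample multiply-accumulate count and parameter count of a Conv2d.

    Counting one multiply-accumulate as one FLOP is an assumed convention.
    """
    k_h, k_w = kernel_size
    out_h, out_w = output_size
    weight_params = out_channels * (in_channels // groups) * k_h * k_w
    params = weight_params + (out_channels if bias else 0)
    # Every weight is used once per output position; bias adds one op per output.
    flops = weight_params * out_h * out_w
    if bias:
        flops += out_channels * out_h * out_w
    return flops, params


full_flops, full_params = conv2d_flops_params(3, 64, (3, 3), (32, 32))
# Pruning half of the filters halves both numbers for this layer.
pruned_flops, pruned_params = conv2d_flops_params(3, 32, (3, 3), (32, 32))

# The following layer really sees 32 input channels, but a counter that
# ignores pruned input channels would still use 64 and overestimate.
next_counted, _ = conv2d_flops_params(64, 64, (3, 3), (32, 32))
next_actual, _ = conv2d_flops_params(32, 64, (3, 3), (32, 32))
```

No batch dimension appears anywhere in the calculation, which is what "per sample" means here.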