Model Compression API Reference
Compressors
Compressor
class nni.compression.pytorch.compressor.Compressor(model, config_list, optimizer=None)
Abstract base class for PyTorch compressors.
compress()
Compresses the model with the algorithm implemented by the subclass.
The model is instrumented, so the user should not edit it after calling this method. self.modules_to_compress records all the layers to be compressed.
- Returns
model with specified modules compressed.
- Return type
torch.nn.Module
get_modules_to_compress()
Returns all the modules to be compressed.
- Returns
A list of layers, each a tuple (layer, config), where layer is a LayerInfo and config is a dict.
- Return type
list
get_modules_wrapper()
Returns all the wrapped modules.
- Returns
a list of the wrapped modules
- Return type
list
select_config(layer)
Finds the configuration for a layer by parsing self.config_list.
- Parameters
layer (LayerInfo) – one layer
- Returns
The configuration retrieved for this layer; if it is None, the layer should not be compressed.
- Return type
config or None
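To make the lookup concrete, the following sketch reproduces the idea of select_config in plain Python. It is a hedged illustration, not NNI's code: LayerInfo below is a hypothetical two-field stand-in, a config is assumed to match when every selector it specifies (op_types, op_names) matches the layer, later matches are assumed to override earlier ones, and an 'exclude' key is assumed to turn compression off.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerInfo:
    # Hypothetical stand-in for NNI's LayerInfo: only a name and a type.
    name: str
    type: str


def select_config(layer: LayerInfo, config_list: List[dict]) -> Optional[dict]:
    """Return the config that applies to `layer`, or None if it must not be compressed."""
    selected = None
    for config in config_list:
        op_types = config.get("op_types")
        op_names = config.get("op_names")
        # A config matches only if every selector it specifies matches the layer.
        if op_types is not None and layer.type not in op_types:
            continue
        if op_names is not None and layer.name not in op_names:
            continue
        selected = config  # assumption: later matches override earlier ones
    if selected is None or selected.get("exclude", False):
        return None
    return selected


configs = [
    {"sparsity": 0.5, "op_types": ["Conv2d"]},
    {"exclude": True, "op_names": ["conv1"]},
]
print(select_config(LayerInfo("conv2", "Conv2d"), configs))  # the sparsity-0.5 config
print(select_config(LayerInfo("conv1", "Conv2d"), configs))  # None (excluded)
```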
set_wrappers_attribute(name, value)
Registers attributes used in the wrapped modules' forward method. If the value is a torch.Tensor, it is registered as a buffer in each wrapper and is therefore saved by model.state_dict(). Otherwise, it is stored as a regular attribute of the wrapper.
- Parameters
name (str) – name of the variable
value (any) – value of the variable
class nni.compression.pytorch.compressor.Pruner(model, config_list, optimizer=None)
Prunes the model to an exact pruning level specification.
mask_dict
Dictionary for saving masks; the key is the layer name and the value is a tensor with the same shape as the layer's weight.
- Type
dict
calc_mask(wrapper, **kwargs)
Pruners should override this method to provide masks for weight tensors. The mask must have the same shape and dtype as the weight, and it is applied to the weight with an element-wise mul() operation. This method is effectively hooked into the model's forward() method.
- Parameters
wrapper (Module) – calculate mask for wrapper.module’s weight
compress()
Compresses the model with the algorithm implemented by the subclass.
The model is instrumented, so the user should not edit it after calling this method. self.modules_to_compress records all the layers to be compressed.
- Returns
model with specified modules compressed.
- Return type
torch.nn.Module
export_model(model_path, mask_path=None, onnx_path=None, input_shape=None, device=None)
Exports the pruned model weights, the masks and, optionally, an ONNX model.
- Parameters
model_path (str) – path to save pruned model state_dict
mask_path (str) – (optional) path to save mask dict
onnx_path (str) – (optional) path to save the ONNX model
input_shape (list or tuple) – input shape of the ONNX model
device (torch.device) – device of the model, used to place the dummy input tensor when exporting the ONNX file. The tensor is placed on the CPU if `device` is None.
class nni.compression.pytorch.compressor.Quantizer(model, config_list, optimizer=None)
Base class for PyTorch quantizers.
quantize_input(*inputs, wrapper, **kwargs)
Quantizers should override this method to quantize the inputs. This method is effectively hooked into the model's forward().
- Parameters
inputs (Tensor) – inputs that need to be quantized
wrapper (QuantizerModuleWrapper) – the wrapper of the original module
quantize_output(output, wrapper, **kwargs)
Quantizers should override this method to quantize the output. This method is effectively hooked into the model's forward().
- Parameters
output (Tensor) – output that needs to be quantized
wrapper (QuantizerModuleWrapper) – the wrapper of the original module
quantize_weight(weight, wrapper, **kwargs)
Quantizers should override this method to quantize the weight. This method is effectively hooked into the model's forward().
- Parameters
weight (Tensor) – weight that needs to be quantized
wrapper (QuantizerModuleWrapper) – the wrapper of the original module
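As a hedged illustration of what a quantize_weight override might compute, the sketch below performs naive symmetric per-tensor fake quantization in plain Python. The function name fake_quantize_symmetric and the 8-bit symmetric scheme are assumptions for illustration; they are not the scheme of any particular NNI quantizer.

```python
from typing import List, Tuple


def fake_quantize_symmetric(weight: List[float], num_bits: int = 8) -> Tuple[List[int], float, List[float]]:
    """Symmetric per-tensor fake quantization: returns (integer codes, scale, dequantized values)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max((abs(w) for w in weight), default=0.0)
    if max_abs == 0.0:
        # All-zero tensor: any scale works and every code is zero.
        return [0] * len(weight), 1.0, [0.0] * len(weight)
    scale = max_abs / qmax
    # Round to the nearest integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weight]
    dequantized = [c * scale for c in codes]
    return codes, scale, dequantized


codes, scale, deq = fake_quantize_symmetric([0.5, -1.0, 0.25])
print(codes, scale, deq)
```

The dequantized values are what a fake-quantization forward pass would feed to the wrapped module, so each one differs from the original weight by at most half a quantization step.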
Module Wrapper
class nni.compression.pytorch.compressor.PrunerModuleWrapper(module, module_name, module_type, config, pruner)
forward(*inputs)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
class nni.compression.pytorch.compressor.QuantizerModuleWrapper(module, module_name, module_type, config, quantizer)
forward(*inputs)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Weight Masker
class nni.algorithms.compression.pytorch.pruning.weight_masker.WeightMasker(model, pruner, **kwargs)
calc_mask(sparsity, wrapper, wrapper_idx=None)
Calculates the mask of the given layer.
- Parameters
sparsity (float) – pruning ratio; the preserved weight ratio is 1 - sparsity
wrapper (PrunerModuleWrapper) – layer wrapper of this layer
wrapper_idx (int) – index of this wrapper in the pruner's list of wrappers
- Returns
Dictionary for storing masks, with the keys 'weight_mask' (the weight mask tensor) and, optionally, 'bias_mask' (the bias mask tensor).
- Return type
dict
class nni.algorithms.compression.pytorch.pruning.structured_pruning.StructuredWeightMasker(model, pruner, preserve_round=1, dependency_aware=False)
A structured pruning masker base class that prunes the filters of convolutional layers.
- Parameters
model (nn.Module) – model to be pruned
pruner (Pruner) – A Pruner instance used to prune the model
preserve_round (int) – after pruning, the number of preserved filters/channels is rounded up to a multiple of preserve_round. For example, for a Conv2d layer with 32 output channels and sparsity 0.2, if preserve_round is 1 (no rounding), int(32 * 0.2) = 6 filters are pruned and 32 - 6 = 26 filters are preserved. If preserve_round is 4, the number of preserved filters is rounded up to 28 (a multiple of 4), so only 4 filters are pruned.
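The preserve_round arithmetic above can be checked with a few lines of Python. This is a sketch that follows the description of preserve_round; the function name is hypothetical, and the masker's actual rounding code may differ in edge cases, for example when rounding up would exceed the number of filters (clamped here, as an assumption).

```python
def num_filters_to_prune(num_total: int, sparsity: float, preserve_round: int = 1) -> int:
    """Number of filters to prune, with the preserved count rounded up to a multiple of preserve_round."""
    num_prune = int(num_total * sparsity)
    if preserve_round > 1:
        num_preserve = num_total - num_prune
        # Ceiling division, then scale back up to a multiple of preserve_round.
        num_preserve = -(-num_preserve // preserve_round) * preserve_round
        # Assumption: never preserve more filters than the layer has.
        num_preserve = min(num_preserve, num_total)
        num_prune = num_total - num_preserve
    return num_prune


print(num_filters_to_prune(32, 0.2))                    # 6 pruned, 26 preserved
print(num_filters_to_prune(32, 0.2, preserve_round=4))  # 4 pruned, 28 preserved
```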
calc_mask(sparsity, wrapper, wrapper_idx=None, **depen_kwargs)
Calculates the mask for the wrapper.
- Parameters
sparsity (float/list of float) – The target sparsity of the wrapper. If the mask is calculated in the normal way, sparsity is a float. If it is calculated in the dependency-aware way, sparsity is a list of floats, each corresponding to the sparsity of one layer.
wrapper (PrunerModuleWrapper/list of PrunerModuleWrappers) – The wrapper of the target layer. In the normal way, wrapper is a PrunerModuleWrapper instance; in the dependency-aware way, it is a list of PrunerModuleWrappers.
wrapper_idx (int/list of int) – The index of the wrapper.
depen_kwargs (dict) – The keyword arguments for the dependency-aware mode.
get_channel_sum(wrapper, wrapper_idx)
Calculates the importance weight of each channel. This function must be implemented to support the dependency-aware mode for this one-shot pruner.
- Parameters
wrapper (PrunerModuleWrapper) – layer wrapper of this layer
wrapper_idx (int) – index of this wrapper in the pruner's list of wrappers
- Returns
Tensor that indicates the importance of each channel
- Return type
tensor
get_mask(base_mask, weight, num_prune, wrapper, wrapper_idx, channel_masks=None)
Calculates the mask of the given layer.
- Parameters
base_mask (dict) – The basic mask with the same shape as the weight; all items in the basic mask are 1.
weight (tensor) – the module weight to be pruned
num_prune (int) – Number of filters to prune
wrapper (PrunerModuleWrapper) – layer wrapper of this layer
wrapper_idx (int) – index of this wrapper in the pruner's list of wrappers
channel_masks (Tensor) – Channels of this layer that are masked in advance. In the dependency-aware mode, a common mask is calculated for all the layers in the dependency set before the mask of each layer is calculated. Pruners that do not support the dependency-aware mode can ignore this parameter.
- Returns
dictionary for storing masks
- Return type
dict
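A minimal sketch of the kind of dictionary get_mask returns, assuming a simplified layout in which the weight is a list of flattened filters and the channel importance has already been computed (for example by get_channel_sum). filter_masks is a hypothetical helper, not part of NNI.

```python
from typing import Dict, List, Optional


def filter_masks(
    weight: List[List[float]],
    bias: Optional[List[float]],
    channel_importance: List[float],
    num_prune: int,
) -> Dict[str, list]:
    """Zero out the `num_prune` filters with the lowest importance.

    `weight` is a list of filters, each flattened to a list of floats (assumed layout).
    """
    order = sorted(range(len(channel_importance)), key=lambda i: channel_importance[i])
    pruned = set(order[:num_prune])
    masks = {
        # Every element of a pruned filter gets 0, every other element gets 1.
        "weight_mask": [[0.0 if i in pruned else 1.0 for _ in filt] for i, filt in enumerate(weight)]
    }
    if bias is not None:
        # One bias entry per output filter, masked together with its filter.
        masks["bias_mask"] = [0.0 if i in pruned else 1.0 for i in range(len(bias))]
    return masks


masks = filter_masks([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [0.1, 0.2, 0.3], [5.0, 1.0, 3.0], num_prune=1)
print(masks)
```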
Pruners
class nni.algorithms.compression.pytorch.pruning.sensitivity_pruner.SensitivityPruner(model, config_list, evaluator, finetuner=None, base_algo='l1', sparsity_proportion_calc=None, sparsity_per_iter=0.1, acc_drop_threshold=0.05, checkpoint_dir=None)
Prunes the model based on the sensitivity of each layer.
- Parameters
model (torch.nn.Module) – model to be compressed
evaluator (function) – validation function for the model. This function should return the accuracy on the validation dataset. The input parameters of the evaluator can be specified with the eval_args and eval_kwargs parameters of the compress function if needed. Example:
    def evaluator(model):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        val_loader = ...
        model.eval()
        correct = 0
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                # get the index of the max log-probability
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()
        accuracy = correct / len(val_loader.dataset)
        return accuracy
finetuner (function) – finetune function for the model. This parameter is optional; if it is not None, the sensitivity pruner fine-tunes the model after pruning in each iteration. The input parameters of the finetuner can be specified with the finetune_args and finetune_kwargs parameters of the compress function if needed. Example:
    def finetuner(model, epoch=3):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        train_loader = ...
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        model.train()
        for _ in range(epoch):
            for _, (data, target) in enumerate(train_loader):
                data, target = data.to(device), target.to(device)
                optimizer.zero_grad()
                output = model(data)
                loss = criterion(output, target)
                loss.backward()
                optimizer.step()
base_algo (str) – base pruning algorithm: level, l1, l2 or fpgm; l1 by default.
sparsity_proportion_calc (function) – This function generates the sparsity proportion between the conv layers according to the sensitivity analysis results. A default function is provided, and users can customize it according to their needs. The input of this function is a dict, for example: {'conv1': {0.1: 0.9, 0.2: 0.8}, 'conv2': {0.1: 0.9, 0.2: 0.8}}, in which 'conv1' is the name of a conv layer, and 0.1: 0.9 means that when the sparsity of conv1 is 0.1 (10%), the model's validation accuracy is 0.9.
sparsity_per_iter (float) – The sparsity of the model that the pruner tries to prune in each iteration.
acc_drop_threshold (float) – The hyperparameter used to quantify the sensitivity of each layer.
checkpoint_dir (str) – The directory path to save the checkpoints during pruning.
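The sensitivity dictionary format described for sparsity_proportion_calc can be turned into per-layer proportions in many ways. The sketch below shows one hypothetical rule, not NNI's default function: for each layer, take the largest tested sparsity whose accuracy drop from an assumed baseline accuracy stays within acc_drop_threshold, then normalize those values into proportions. Both function names and the original_acc argument are assumptions.

```python
from typing import Dict

Sensitivities = Dict[str, Dict[float, float]]  # layer -> {sparsity: accuracy}


def max_tolerable_sparsity(
    sensitivities: Sensitivities, original_acc: float, acc_drop_threshold: float
) -> Dict[str, float]:
    """Largest tested sparsity per layer whose accuracy drop stays within the threshold."""
    result = {}
    for layer, acc_by_sparsity in sensitivities.items():
        ok = [s for s, acc in acc_by_sparsity.items() if original_acc - acc <= acc_drop_threshold]
        result[layer] = max(ok) if ok else 0.0
    return result


def sparsity_proportion(
    sensitivities: Sensitivities, original_acc: float, acc_drop_threshold: float = 0.05
) -> Dict[str, float]:
    """Hypothetical rule: layers that tolerate more sparsity get a larger share."""
    tolerable = max_tolerable_sparsity(sensitivities, original_acc, acc_drop_threshold)
    total = sum(tolerable.values())
    if total == 0:
        # No layer tolerates pruning within the threshold: split evenly.
        return {layer: 1.0 / len(tolerable) for layer in tolerable}
    return {layer: s / total for layer, s in tolerable.items()}


sens = {"conv1": {0.1: 0.90, 0.2: 0.80}, "conv2": {0.1: 0.93, 0.2: 0.92}}
print(sparsity_proportion(sens, original_acc=0.94))
```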
calc_mask(wrapper, **kwargs)
Pruners should override this method to provide masks for weight tensors. The mask must have the same shape and dtype as the weight, and it is applied to the weight with an element-wise mul() operation. This method is effectively hooked into the model's forward() method.
- Parameters
wrapper (Module) – calculate mask for wrapper.module’s weight
compress(eval_args=None, eval_kwargs=None, finetune_args=None, finetune_kwargs=None, resume_sensitivity=None)
Iteratively prunes the model according to the results of the sensitivity analysis.
- Parameters
eval_args (list) – positional arguments for the evaluator
eval_kwargs (dict) – keyword arguments for the evaluator; the evaluator is called as evaluator(*eval_args, **eval_kwargs)
finetune_args (list) – positional arguments for the finetuner, if needed
finetune_kwargs (dict) – keyword arguments for the finetuner, if needed
resume_sensitivity – the file from which to resume the sensitivity results.
create_cfg(ratios)
Generates the cfg_list for the pruner according to the prune ratios.
- Parameters
ratios – For example: {'conv1': 0.2}
- Returns
For example: [{'sparsity': 0.2, 'op_names': ['conv1'], 'op_types': ['Conv2d']}]
- Return type
cfg_list
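A plain-Python sketch of the mapping shown in the create_cfg example. It assumes, as the example does, that every named layer is a Conv2d; the real method may derive op_types differently.

```python
from typing import Dict, List


def create_cfg(ratios: Dict[str, float], op_type: str = "Conv2d") -> List[dict]:
    """Turn {layer_name: prune_ratio} into a config_list with one entry per layer."""
    # Assumption: every named layer is of type `op_type`, as in the documented example.
    return [
        {"sparsity": ratio, "op_names": [name], "op_types": [op_type]}
        for name, ratio in ratios.items()
    ]


print(create_cfg({"conv1": 0.2, "conv2": 0.35}))
```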
load_sensitivity(filepath)
Loads the sensitivity results exported by the sensitivity analyzer.
normalize(ratios, target_pruned)
Normalizes the prune ratio of each layer according to the total ratio already pruned and the final target total pruning ratio.
- Parameters
ratios – Dict object that saves the prune ratio of each layer
target_pruned – The amount of weights expected to be pruned in this iteration
- Returns
The normalized prune ratio of each layer.
- Return type
new_ratios
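One plausible reading of normalize, sketched under loud assumptions: the per-layer ratios are rescaled so that the total number of weights they would prune equals target_pruned. The layer_sizes argument and the clamping to 1.0 are hypothetical additions; the documented method takes only ratios and target_pruned.

```python
from typing import Dict


def normalize_ratios(
    ratios: Dict[str, float], layer_sizes: Dict[str, int], target_pruned: float
) -> Dict[str, float]:
    """Rescale per-layer ratios so that sum(ratio * size) equals `target_pruned` weights."""
    planned = sum(ratios[name] * layer_sizes[name] for name in ratios)
    if planned == 0:
        return dict(ratios)
    factor = target_pruned / planned
    # Clamp so that no layer is asked to prune more than all of its weights.
    return {name: min(ratio * factor, 1.0) for name, ratio in ratios.items()}


print(normalize_ratios({"conv1": 0.1, "conv2": 0.2}, {"conv1": 100, "conv2": 100}, target_pruned=60))
```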
class nni.algorithms.compression.pytorch.pruning.one_shot.OneshotPruner(model, config_list, pruning_algorithm='level', optimizer=None, **algo_kwargs)
Prunes the model to an exact pruning level in one shot.
calc_mask(wrapper, wrapper_idx=None)
Calculates the mask of the given layer.
- Parameters
wrapper (Module) – the module to instrument the compression operation
wrapper_idx (int) – index of this wrapper in the pruner's list of wrappers
- Returns
Dictionary for storing masks, with the keys 'weight_mask' (the weight mask tensor) and, optionally, 'bias_mask' (the bias mask tensor).
- Return type
dict
class nni.algorithms.compression.pytorch.pruning.one_shot.LevelPruner(model, config_list, optimizer=None)
- Parameters
model (torch.nn.Module) – Model to be pruned
config_list (list) –
- Supported keys:
sparsity : The sparsity that the specified operations are compressed to.
op_types : Operation types to prune.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model
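LevelPruner masks individual weights by magnitude. The sketch below shows that idea on a flat list in plain Python, together with the element-wise application of the mask; level_mask is a hypothetical helper, not NNI's masker.

```python
from typing import List


def level_mask(weight: List[float], sparsity: float) -> List[float]:
    """Mask that zeros the `sparsity` fraction of weights with the smallest magnitude."""
    num_prune = int(len(weight) * sparsity)
    order = sorted(range(len(weight)), key=lambda i: abs(weight[i]))
    pruned = set(order[:num_prune])
    return [0.0 if i in pruned else 1.0 for i in range(len(weight))]


weight = [0.5, -0.1, 0.3, -0.8]
mask = level_mask(weight, sparsity=0.5)
# The mask has the weight's shape and is applied element-wise, as calc_mask describes.
pruned_weight = [w * m for w, m in zip(weight, mask)]
print(mask, pruned_weight)
```

With NNI itself, the documented usage is to construct LevelPruner(model, config_list), with a config_list such as [{'sparsity': 0.5, 'op_types': ['Conv2d']}], and call compress().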
class nni.algorithms.compression.pytorch.pruning.one_shot.SlimPruner(model, config_list, optimizer=None)
- Parameters
model (torch.nn.Module) – Model to be pruned
config_list (list) –
- Supported keys:
sparsity : The sparsity that the specified operations are compressed to.
op_types : Only BatchNorm2d is supported in SlimPruner.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model
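SlimPruner ranks channels by the absolute value of the BatchNorm2d scaling factors. The sketch below assumes a single global threshold computed across all configured BN layers; slim_masks and the dict-of-lists input are hypothetical simplifications.

```python
from typing import Dict, List


def slim_masks(bn_weights: Dict[str, List[float]], sparsity: float) -> Dict[str, List[float]]:
    """Mask the channels whose BN scaling factor |gamma| is among the smallest globally."""
    all_abs = sorted(abs(g) for gammas in bn_weights.values() for g in gammas)
    num_prune = int(len(all_abs) * sparsity)
    if num_prune == 0:
        return {name: [1.0] * len(gammas) for name, gammas in bn_weights.items()}
    # One global threshold across all BN layers; ties at the threshold are also pruned.
    threshold = all_abs[num_prune - 1]
    return {
        name: [0.0 if abs(g) <= threshold else 1.0 for g in gammas]
        for name, gammas in bn_weights.items()
    }


print(slim_masks({"bn1": [0.9, 0.1, 0.5], "bn2": [0.05, 0.7]}, sparsity=0.4))
```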
class nni.algorithms.compression.pytorch.pruning.one_shot.L1FilterPruner(model, config_list, optimizer=None, dependency_aware=False, dummy_input=None)
- Parameters
model (torch.nn.Module) – Model to be pruned
config_list (list) –
- Supported keys:
sparsity : The sparsity that the specified operations are compressed to.
op_types : Only Conv2d is supported in L1FilterPruner.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model
dependency_aware (bool) – Whether to prune the model in a dependency-aware way. If True, this pruner prunes the model according to the l1-norm of the weights and the channel dependencies or group dependencies of the model. In this way, the pruner forces conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependencies between the conv layers.
dummy_input (torch.Tensor) – The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.
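The filter ranking behind L1FilterPruner (and, with p=2, L2FilterPruner) can be sketched in plain Python. Each filter is assumed to be flattened into a list; filter_norms and filters_to_prune are hypothetical helpers, and the number of pruned filters follows the int(num_filters * sparsity) rule described for StructuredWeightMasker with preserve_round=1.

```python
import math
from typing import List


def filter_norms(weight: List[List[float]], p: int = 1) -> List[float]:
    """Per-filter l1-norm (p=1) or l2-norm (any other p); each filter is a flattened list."""
    if p == 1:
        return [sum(abs(x) for x in filt) for filt in weight]
    return [math.sqrt(sum(x * x for x in filt)) for filt in weight]


def filters_to_prune(weight: List[List[float]], sparsity: float, p: int = 1) -> List[int]:
    """Indices of the int(num_filters * sparsity) filters with the smallest norm."""
    norms = filter_norms(weight, p)
    num_prune = int(len(weight) * sparsity)
    order = sorted(range(len(weight)), key=lambda i: norms[i])
    return sorted(order[:num_prune])


conv_weight = [[1.0, -1.0], [0.1, 0.1], [2.0, 0.0], [-0.5, 0.2]]
print(filters_to_prune(conv_weight, sparsity=0.5))       # l1 ranking, as in L1FilterPruner
print(filters_to_prune(conv_weight, sparsity=0.5, p=2))  # l2 ranking, as in L2FilterPruner
```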
class nni.algorithms.compression.pytorch.pruning.one_shot.L2FilterPruner(model, config_list, optimizer=None, dependency_aware=False, dummy_input=None)
- Parameters
model (torch.nn.Module) – Model to be pruned
config_list (list) –
- Supported keys:
sparsity : The sparsity that the specified operations are compressed to.
op_types : Only Conv2d is supported in L2FilterPruner.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model
dependency_aware (bool) – Whether to prune the model in a dependency-aware way. If True, this pruner prunes the model according to the l2-norm of the weights and the channel dependencies or group dependencies of the model. In this way, the pruner forces conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependencies between the conv layers.
dummy_input (torch.Tensor) – The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.
class nni.algorithms.compression.pytorch.pruning.one_shot.FPGMPruner(model, config_list, optimizer=None, dependency_aware=False, dummy_input=None)
- Parameters
model (torch.nn.Module) – Model to be pruned
config_list (list) –
- Supported keys:
sparsity : The sparsity that the specified operations are compressed to.
op_types : Only Conv2d is supported in FPGMPruner.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model
dependency_aware (bool) – Whether to prune the model in a dependency-aware way. If True, this pruner prunes the model according to the filter importance computed by this pruner and the channel dependencies or group dependencies of the model. In this way, the pruner forces conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependencies between the conv layers.
dummy_input (torch.Tensor) – The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.
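FPGM prunes the filters that are closest to the geometric median of the layer's filters, on the reasoning that they can be represented by the others. The sketch below uses the common approximation of ranking each filter by its summed Euclidean distance to all filters; fpgm_filters_to_prune is a hypothetical helper.

```python
import math
from typing import List


def fpgm_filters_to_prune(weight: List[List[float]], num_prune: int) -> List[int]:
    """Prune filters closest to the geometric median, i.e. with the smallest summed distance."""

    def dist(a: List[float], b: List[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # A filter that is close to all the others is considered replaceable by them.
    scores = [sum(dist(f, g) for g in weight) for f in weight]
    order = sorted(range(len(weight)), key=lambda i: scores[i])
    return sorted(order[:num_prune])


print(fpgm_filters_to_prune([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [5.0, 5.0]], num_prune=1))
```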
class nni.algorithms.compression.pytorch.pruning.one_shot.TaylorFOWeightFilterPruner(model, config_list, optimizer=None, statistics_batch_num=1, dependency_aware=False, dummy_input=None)
- Parameters
model (torch.nn.Module) – Model to be pruned
config_list (list) –
- Supported keys:
sparsity : The percentage of convolutional filters to be pruned.
op_types : Currently only Conv2d is supported in TaylorFOWeightFilterPruner.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model
statistics_batch_num (int) – The number of batches used to collect statistics.
dependency_aware (bool) – Whether to prune the model in a dependency-aware way. If True, this pruner prunes the model according to the filter importance computed by this pruner and the channel dependencies or group dependencies of the model. In this way, the pruner forces conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependencies between the conv layers.
dummy_input (torch.Tensor) – The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.
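A sketch of the Taylor first-order importance score, assuming the commonly used form in which each filter's importance is the sum of (weight * gradient)^2 over its elements, accumulated over statistics_batch_num batches. The nested-list layout and the function name are hypothetical.

```python
from typing import List, Tuple

Filters = List[List[float]]  # one flattened list per output filter


def taylor_fo_importance(batches: List[Tuple[Filters, Filters]]) -> List[float]:
    """Accumulate, per filter, the sum of (weight * grad)^2 across batches."""
    importance: List[float] = []
    for weight, grad in batches:
        contrib = [sum((w * g) ** 2 for w, g in zip(wf, gf)) for wf, gf in zip(weight, grad)]
        importance = contrib if not importance else [a + b for a, b in zip(importance, contrib)]
    return importance


w = [[1.0, 2.0], [3.0, 0.0]]
g = [[1.0, 1.0], [0.0, 5.0]]
# statistics_batch_num=2: the same (weight, grad) pair observed in two batches.
print(taylor_fo_importance([(w, g), (w, g)]))
```

The filter with the smallest accumulated score (here the second one, whose weight-gradient products are all zero) would be pruned first.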
class nni.algorithms.compression.pytorch.pruning.one_shot.ActivationAPoZRankFilterPruner(model, config_list, optimizer=None, activation='relu', statistics_batch_num=1, dependency_aware=False, dummy_input=None)
- Parameters
model (torch.nn.Module) – Model to be pruned
config_list (list) –
- Supported keys:
sparsity : The percentage of convolutional filters to be pruned.
op_types : Only Conv2d is supported in ActivationAPoZRankFilterPruner.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model
activation (str) – The activation type; 'relu' by default.
statistics_batch_num (int) – The number of batches used to collect activation statistics.
dependency_aware (bool) – Whether to prune the model in a dependency-aware way. If True, this pruner prunes the model according to the filter importance computed by this pruner and the channel dependencies or group dependencies of the model. In this way, the pruner forces conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependencies between the conv layers.
dummy_input (torch.Tensor) – The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.
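APoZ (Average Percentage of Zeros) measures how often a channel's activation is zero after the activation function; channels with a high APoZ are pruned first. The sketch below assumes activations are given per sample and per channel as flattened lists; apoz and apoz_filters_to_prune are hypothetical names.

```python
from typing import List

# activations[sample][channel] is the flattened post-ReLU feature map of one channel.
Activations = List[List[List[float]]]


def apoz(activations: Activations) -> List[float]:
    """Average Percentage of Zeros per output channel."""
    num_channels = len(activations[0])
    zeros = [0] * num_channels
    totals = [0] * num_channels
    for sample in activations:
        for c, values in enumerate(sample):
            zeros[c] += sum(1 for v in values if v == 0.0)
            totals[c] += len(values)
    return [z / t for z, t in zip(zeros, totals)]


def apoz_filters_to_prune(activations: Activations, num_prune: int) -> List[int]:
    scores = apoz(activations)
    # A higher APoZ means the channel is zero more often, so it is pruned first.
    order = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(order[:num_prune])


acts = [
    [[0.0, 0.0, 1.0, 2.0], [1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0, 0.0], [0.0, 5.0, 6.0, 7.0], [0.0, 0.0, 0.0, 0.0]],
]
print(apoz(acts), apoz_filters_to_prune(acts, num_prune=1))
```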
class nni.algorithms.compression.pytorch.pruning.one_shot.ActivationMeanRankFilterPruner(model, config_list, optimizer=None, activation='relu', statistics_batch_num=1, dependency_aware=False, dummy_input=None)
- Parameters
model (torch.nn.Module) – Model to be pruned
config_list (list) –
- Supported keys:
sparsity : The percentage of convolutional filters to be pruned.
op_types : Only Conv2d is supported in ActivationMeanRankFilterPruner.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model.
activation (str) – The activation type; 'relu' by default.
statistics_batch_num (int) – The number of batches used to collect activation statistics.
dependency_aware (bool) – Whether to prune the model in a dependency-aware way. If True, this pruner prunes the model according to the filter importance computed by this pruner and the channel dependencies or group dependencies of the model. In this way, the pruner forces conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that if this flag is True, dummy_input cannot be None, because the pruner needs a dummy input to trace the dependencies between the conv layers.
dummy_input (torch.Tensor) – The dummy input used to analyze the topology constraints. Note that dummy_input should be on the same device as the model.
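The mean-activation criterion ranks channels by their average activation and prunes the lowest. This sketch assumes the same hypothetical per-sample, per-channel layout of flattened lists; mean_activation and mean_activation_filters_to_prune are illustrative names.

```python
from typing import List

# activations[sample][channel] is the flattened post-activation feature map of one channel.
Activations = List[List[List[float]]]


def mean_activation(activations: Activations) -> List[float]:
    """Mean activation value per output channel over all samples and positions."""
    num_channels = len(activations[0])
    sums = [0.0] * num_channels
    counts = [0] * num_channels
    for sample in activations:
        for c, values in enumerate(sample):
            sums[c] += sum(values)
            counts[c] += len(values)
    return [s / n for s, n in zip(sums, counts)]


def mean_activation_filters_to_prune(activations: Activations, num_prune: int) -> List[int]:
    scores = mean_activation(activations)
    # Channels with the smallest mean activation contribute least and are pruned first.
    order = sorted(range(len(scores)), key=lambda c: scores[c])
    return sorted(order[:num_prune])


acts = [
    [[0.0, 0.0, 1.0, 2.0], [1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0, 0.0], [0.0, 5.0, 6.0, 7.0], [0.0, 0.0, 0.0, 0.0]],
]
print(mean_activation(acts), mean_activation_filters_to_prune(acts, num_prune=1))
```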
class nni.algorithms.compression.pytorch.pruning.lottery_ticket.LotteryTicketPruner(model, config_list, optimizer=None, lr_scheduler=None, reset_weights=True)
- Parameters
model (pytorch model) – The model to be pruned
config_list (list) –
- Supported keys:
prune_iterations : The number of rounds of iterative pruning.
sparsity : The final sparsity when the compression is done.
optimizer (pytorch optimizer) – The optimizer for the model
lr_scheduler (pytorch lr scheduler) – The learning-rate scheduler for the model, if used
reset_weights (bool) – Whether to reset the weights and the optimizer at the beginning of each round.
calc_mask(wrapper, **kwargs)
Generates the mask for the given weight.
- Parameters
wrapper (Module) – The layer to be pruned
- Returns
The mask for this weight. It is `None` because this pruner calculates and assigns masks in `prune_iteration_start`, so nothing needs to be done in this function.
- Return type
tensor
get_prune_iterations()
Returns the range of pruning iterations. In the first iteration all masks are ones, so one extra iteration is added.
- Returns
A list for pruning iterations
- Return type
list
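get_prune_iterations adds one round because round 0 trains with all-one masks. The sketch below pairs that with one common per-round schedule, assumed here rather than taken from this reference: each round keeps the same fraction of the remaining weights, so that the final round reaches the configured sparsity. prune_rounds and round_sparsity are hypothetical names.

```python
from typing import List


def prune_rounds(prune_iterations: int) -> List[int]:
    # Round 0 trains with all-one masks, hence one extra round.
    return list(range(prune_iterations + 1))


def round_sparsity(final_sparsity: float, prune_iterations: int, round_idx: int) -> float:
    """Sparsity after `round_idx` rounds if every round keeps the same fraction of remaining weights."""
    keep_once = (1.0 - final_sparsity) ** (1.0 / prune_iterations)
    return 1.0 - keep_once ** round_idx


for r in prune_rounds(3):
    # With reset_weights=True, the surviving weights would be reset before each round.
    print(r, round(round_sparsity(0.875, 3, r), 6))
```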
class nni.algorithms.compression.pytorch.pruning.agp.AGPPruner(model, config_list, optimizer, pruning_algorithm='level')
- Parameters
model (torch.nn.Module) – Model to be pruned.
config_list (list) –
- Supported keys:
initial_sparsity: the sparsity when the compressor starts to compress.
final_sparsity: the sparsity when the compressor finishes compressing.
start_epoch: the epoch at which the compressor starts to compress; defaults to epoch 0.
end_epoch: the epoch at which the compressor finishes compressing.
frequency: the compressor compresses once every frequency epochs; defaults to frequency=1.
optimizer (torch.optim.Optimizer) – Optimizer used to train the model.
pruning_algorithm (str) – Algorithm used to prune the model, chosen from ['level', 'slim', 'l1', 'l2', 'fpgm', 'taylorfo', 'apoz', 'mean_activation']; level by default.
calc_mask(wrapper, wrapper_idx=None)
Calculates the mask of the given layer using the selected pruning_algorithm.
- Parameters
wrapper (Module) – the layer to instrument the compression operation
wrapper_idx (int) – index of this wrapper in the pruner's list of wrappers
- Returns
Dictionary for storing masks, with the keys 'weight_mask' (the weight mask tensor) and, optionally, 'bias_mask' (the bias mask tensor).
- Return type
dict | None
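AGPPruner follows the gradual pruning idea of the AGP paper (Zhu and Gupta, 2017), in which sparsity grows cubically from initial_sparsity to final_sparsity between start_epoch and end_epoch. The sketch below assumes span = end_epoch - start_epoch and an update every frequency epochs inside that range; NNI's exact span and boundary handling may differ, and both function names are hypothetical.

```python
def agp_target_sparsity(
    epoch: int, initial_sparsity: float, final_sparsity: float, start_epoch: int, end_epoch: int
) -> float:
    """Cubic schedule: s = s_f + (s_i - s_f) * (1 - progress)^3, with progress clamped to [0, 1]."""
    span = end_epoch - start_epoch
    progress = min(max((epoch - start_epoch) / span, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def should_update(epoch: int, start_epoch: int, end_epoch: int, frequency: int = 1) -> bool:
    """Assumption: masks are recomputed every `frequency` epochs inside [start_epoch, end_epoch]."""
    return start_epoch <= epoch <= end_epoch and (epoch - start_epoch) % frequency == 0


for epoch in range(0, 11):
    if should_update(epoch, 0, 10, frequency=2):
        print(epoch, round(agp_target_sparsity(epoch, 0.0, 0.8, 0, 10), 4))
```

The cubic shape prunes aggressively early, while the network still has redundant weights, and slows down as the target sparsity is approached.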
class nni.algorithms.compression.pytorch.pruning.admm_pruner.ADMMPruner(model, config_list, trainer, num_iterations=30, training_epochs=5, row=0.0001, base_algo='l1')
A PyTorch implementation of the ADMM Pruner algorithm.
- Parameters
model (torch.nn.Module) – Model to be pruned.
config_list (list) – List of pruning configs.
trainer (function) –
Function used for the first subproblem. Users should write this function as a normal function that trains the PyTorch model and takes model, criterion, optimizer, epoch and callback as arguments. Here callback acts as an L2 regularizer, as presented in formula (7) of the original paper. The logic of callback is implemented inside the pruner; users only need to call callback() between loss.backward() and optimizer.step(). Example:
    def trainer(model, criterion, optimizer, epoch, callback):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        train_loader = ...
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            # callback should be inserted between loss.backward() and optimizer.step()
            if callback:
                callback()
            optimizer.step()
num_iterations (int) – Total number of iterations.
training_epochs (int) – Training epochs of the first subproblem.
row (float) – Penalty parameter (rho) for ADMM training.
base_algo (str) – Base pruning algorithm. level, l1, l2 or fpgm, by default l1. Given the sparsity distribution among the ops, the assigned base_algo is used to decide which filters/channels/weights to prune.
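The ADMM formulation alternates between training the weights W (the first subproblem, run through trainer), projecting W + U onto the sparsity constraint to obtain Z, and updating the scaled dual variable U. The sketch below shows the Z/U updates and the gradient of the penalty term on flat lists. The names are hypothetical, magnitude-based projection stands in for base_algo='level', and how NNI's callback applies the penalty internally is not assumed.

```python
from typing import List, Tuple


def project_to_sparsity(values: List[float], sparsity: float) -> List[float]:
    """Euclidean projection onto vectors with at most (1 - sparsity) * n non-zeros:
    keep the largest-magnitude entries and zero the rest."""
    num_prune = int(len(values) * sparsity)
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    pruned = set(order[:num_prune])
    return [0.0 if i in pruned else v for i, v in enumerate(values)]


def admm_zu_update(W: List[float], U: List[float], sparsity: float) -> Tuple[List[float], List[float]]:
    """Z- and U-updates that follow the W-subproblem (the training run using `trainer`)."""
    Z = project_to_sparsity([w + u for w, u in zip(W, U)], sparsity)
    U_next = [u + w - z for u, w, z in zip(U, W, Z)]
    return Z, U_next


def admm_penalty_grad(W: List[float], Z: List[float], U: List[float], row: float) -> List[float]:
    """Gradient of (row / 2) * ||W - Z + U||^2 with respect to W."""
    return [row * (w - z + u) for w, z, u in zip(W, Z, U)]


W = [0.5, -0.1, 0.3, -0.8]
Z, U = admm_zu_update(W, [0.0] * len(W), sparsity=0.5)
print(Z, U, admm_penalty_grad(W, Z, U, row=1e-4))
```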
class nni.algorithms.compression.pytorch.pruning.auto_compress_pruner.AutoCompressPruner(model, config_list, trainer, evaluator, dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1', start_temperature=100, stop_temperature=20, cool_down_rate=0.9, perturbation_magnitude=0.35, admm_num_iterations=30, admm_training_epochs=5, row=0.0001, experiment_data_dir='./')
A PyTorch implementation of the AutoCompress pruning algorithm.
- Parameters
model (pytorch model) – The model to be pruned.
config_list (list) –
- Supported keys:
sparsity : The target overall sparsity.
op_types : The operation type to prune.
trainer (function) –
Function used for the first subproblem of ADMM Pruner. Users should write this function as a normal function that trains the PyTorch model and takes model, criterion, optimizer, epoch and callback as arguments. Here callback acts as an L2 regularizer, as presented in formula (7) of the original paper. The logic of callback is implemented inside the pruner; users only need to call callback() between loss.backward() and optimizer.step(). Example:
    def trainer(model, criterion, optimizer, epoch, callback):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        train_loader = ...
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            # callback should be inserted between loss.backward() and optimizer.step()
            if callback:
                callback()
            optimizer.step()
evaluator (function) –
function to evaluate the pruned model. This function should take model as its only parameter and return a scalar value. Example:
    def evaluator(model):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        val_loader = ...
        model.eval()
        correct = 0
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                # get the index of the max log-probability
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()
        accuracy = correct / len(val_loader.dataset)
        return accuracy
dummy_input (pytorch tensor) – The dummy input for `jit.trace`; users should put it on the right device before passing it in.
num_iterations (int) – Number of overall iterations.
optimize_mode (str) – optimize mode, maximize or minimize, by default maximize.
base_algo (str) – Base pruning algorithm. level, l1, l2 or fpgm, by default l1. Given the sparsity distribution among the ops, the assigned base_algo is used to decide which filters/channels/weights to prune.
start_temperature (float) – Start temperature of the simulated annealing process.
stop_temperature (float) – Stop temperature of the simulated annealing process.
cool_down_rate (float) – Cool down rate of the temperature.
perturbation_magnitude (float) – Initial perturbation magnitude applied to the sparsities. The magnitude decreases as the temperature decreases.
admm_num_iterations (int) – Number of iterations of ADMM Pruner.
admm_training_epochs (int) – Training epochs of the first optimization subproblem of ADMM Pruner.
row (float) – Penalty parameter (rho) for ADMM training.
experiment_data_dir (str) – Path to store temporary experiment data.
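AutoCompressPruner drives its sparsity search with simulated annealing. The sketch below illustrates the knobs named above (start_temperature, stop_temperature, cool_down_rate and perturbation_magnitude) with geometric cooling, a perturbation that shrinks linearly with temperature, and the Metropolis acceptance rule. The loop condition and the linear scaling are assumptions, not taken from this reference, and the function names are hypothetical.

```python
import math
import random
from typing import Callable, List


def temperature_schedule(start: float = 100.0, stop: float = 20.0, cool_down_rate: float = 0.9) -> List[float]:
    """Geometric cooling: T_{k+1} = cool_down_rate * T_k while T_k > stop."""
    temps = []
    t = start
    while t > stop:
        temps.append(t)
        t *= cool_down_rate
    return temps


def perturbation(t: float, start: float = 100.0, magnitude: float = 0.35) -> float:
    """Perturbation magnitude shrinks linearly with the current temperature (assumed)."""
    return magnitude * t / start


def accept(
    new_score: float,
    old_score: float,
    t: float,
    maximize: bool = True,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Metropolis rule: always accept improvements, accept worse moves with prob exp(delta / T)."""
    delta = new_score - old_score if maximize else old_score - new_score
    if delta >= 0:
        return True
    return rng() < math.exp(delta / t)


temps = temperature_schedule()
print(len(temps), temps[:3], perturbation(temps[-1]))
```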
calc_mask(wrapper, **kwargs)
Pruners should override this method to provide masks for weight tensors. The mask must have the same shape and dtype as the weight, and it is applied to the weight with an element-wise mul() operation. This method is effectively hooked into the model's forward() method.
- Parameters
wrapper (Module) – calculate mask for wrapper.module’s weight
compress()
Compresses the model with AutoCompress.
- Returns
model with specified modules compressed.
- Return type
torch.nn.Module
export_model(model_path, mask_path=None, onnx_path=None, input_shape=None, device=None)
Exports the pruned model weights, the masks and, optionally, an ONNX model.
- Parameters
model_path (str) – path to save pruned model state_dict
mask_path (str) – (optional) path to save mask dict
onnx_path (str) – (optional) path to save the ONNX model
input_shape (list or tuple) – input shape of the ONNX model
device (torch.device) – device of the model, used to place the dummy input tensor when exporting the ONNX file. The tensor is placed on the CPU if `device` is None.
class nni.algorithms.compression.pytorch.pruning.net_adapt_pruner.NetAdaptPruner(model, config_list, short_term_fine_tuner, evaluator, optimize_mode='maximize', base_algo='l1', sparsity_per_iteration=0.05, experiment_data_dir='./')
A PyTorch implementation of the NetAdapt compression algorithm.
- Parameters
model (pytorch model) – The model to be pruned.
config_list (list) –
- Supported keys:
sparsity : The target overall sparsity.
op_types : The operation type to prune.
short_term_fine_tuner (function) –
function to fine-tune the masked model for a short term after each pruning iteration. This function should take model as its only required parameter. Example:
    def short_term_fine_tuner(model, epoch=3):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        train_loader = ...
        criterion = torch.nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        model.train()
        for _ in range(epoch):
            for batch_idx, (data, target) in enumerate(train_loader):
                data, target = data.to(device), target.to(device)
                optimizer.zero_grad()
                output = model(data)
                loss = criterion(output, target)
                loss.backward()
                optimizer.step()
evaluator (function) –
Function to evaluate the masked model. It should accept model as its only argument and return a scalar value. Example:
import torch

def evaluator(model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    val_loader = ...  # your validation DataLoader
    model.eval()
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    accuracy = correct / len(val_loader.dataset)
    return accuracy
optimize_mode (str) – Whether to maximize or minimize the value returned by evaluator; by default maximize.
base_algo (str) – Base pruning algorithm: level, l1, l2 or fpgm; by default l1. Given the sparsity distribution among the ops, base_algo decides which filters, channels or weights to prune.
sparsity_per_iteration (float) – sparsity to prune in each iteration.
experiment_data_dir (str) – Path of the directory in which to save experiment data, including the config_list generated for the base pruning algorithm and the performance of the pruned model.
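A simplified sketch of the greedy per-iteration search behind NetAdapt may help clarify how sparsity_per_iteration works. This is not NNI's implementation. Layer sizes stand in for the model, and the hypothetical evaluate callback receives a dict of per-layer sparsities rather than a model. Higher scores are treated as better, as with optimize_mode='maximize'. The short-term fine-tuning step is only marked by a comment.

```python
def netadapt_sketch(layer_sizes, target_sparsity, sparsity_per_iteration, evaluate):
    """Greedy NetAdapt-style search over per-layer sparsities (simplified).

    layer_sizes: {layer name: number of prunable weights}
    evaluate: hypothetical callback scoring {layer name: sparsity}; higher is better.
    """
    total = sum(layer_sizes.values())
    sparsities = {name: 0.0 for name in layer_sizes}

    def overall(sp):
        return sum(sp[n] * layer_sizes[n] for n in layer_sizes) / total

    while overall(sparsities) < target_sparsity - 1e-9:
        # weights to remove this iteration, capped so the target is not overshot
        step = min(sparsity_per_iteration, target_sparsity - overall(sparsities))
        budget = step * total
        best = None
        for name, size in layer_sizes.items():
            new_sparsity = sparsities[name] + budget / size
            if new_sparsity > 1.0:
                continue  # this layer alone cannot absorb the budget
            candidate = dict(sparsities)
            candidate[name] = new_sparsity
            score = evaluate(candidate)
            if best is None or score > best[0]:
                best = (score, candidate)
        if best is None:
            break  # no single layer can absorb the budget
        sparsities = best[1]
        # the real pruner fine-tunes the model here (short_term_fine_tuner)
    return sparsities


sizes = {"conv1": 100, "conv2": 100}
# toy score: pruning conv1 is assumed to hurt ten times more than pruning conv2
result = netadapt_sketch(sizes, 0.2, 0.05, lambda sp: -(10 * sp["conv1"] + sp["conv2"]))
print(result)
```

Each iteration proposes one candidate per layer and keeps the best-scoring one. The cost of an iteration therefore grows with the number of prunable layers.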
-
calc_mask(wrapper, **kwargs)
Pruners should overload this method to provide masks for weight tensors. The mask must have the same shape and type as the weight. It is applied to the weight with the mul() operation. This method is effectively hooked to the forward() method of the model.
- Parameters
wrapper (Module) – calculate mask for wrapper.module’s weight
-
class nni.algorithms.compression.pytorch.pruning.simulated_annealing_pruner.SimulatedAnnealingPruner(model, config_list, evaluator, optimize_mode='maximize', base_algo='l1', start_temperature=100, stop_temperature=20, cool_down_rate=0.9, perturbation_magnitude=0.35, experiment_data_dir='./')
A PyTorch implementation of the simulated annealing compression algorithm.
- Parameters
model (pytorch model) – The model to be pruned.
config_list (list) –
- Supported keys:
sparsity : The target overall sparsity.
op_types : The operation type to prune.
evaluator (function) –
Function to evaluate the pruned model. It should accept model as its only argument and return a scalar value. Example:
import torch

def evaluator(model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    val_loader = ...  # your validation DataLoader
    model.eval()
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    accuracy = correct / len(val_loader.dataset)
    return accuracy
optimize_mode (str) – Whether to maximize or minimize the value returned by evaluator; by default maximize.
base_algo (str) – Base pruning algorithm: level, l1, l2 or fpgm; by default l1. Given the sparsity distribution among the ops, base_algo decides which filters, channels or weights to prune.
start_temperature (float) – Start temperature of the simulated annealing process.
stop_temperature (float) – Stop temperature of the simulated annealing process.
cool_down_rate (float) – Cool down rate of the temperature.
perturbation_magnitude (float) – Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.
experiment_data_dir (str) – Path of the directory in which to save experiment data, including the config_list generated for the base pruning algorithm, the performance of the pruned model and the pruning history.
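The sketch below shows the general shape of a simulated-annealing search over per-layer sparsities, using the parameter names from the signature above. It is an illustration under stated assumptions, not NNI's code:

- It uses the standard Metropolis acceptance rule.
- It assumes the perturbation magnitude shrinks linearly with the temperature.
- It omits the constraint that keeps the overall sparsity at the target.
- It takes a hypothetical evaluate callback that scores a list of sparsities, where higher is better.

```python
import math
import random


def anneal_sparsities(initial, evaluate, start_temperature=100.0, stop_temperature=20.0,
                      cool_down_rate=0.9, perturbation_magnitude=0.35, seed=0):
    """Simulated annealing over a list of per-layer sparsities (maximizes `evaluate`)."""
    rng = random.Random(seed)
    current, current_score = list(initial), evaluate(initial)
    best, best_score = list(current), current_score
    temperature = start_temperature
    while temperature > stop_temperature:
        # assumed schedule: the perturbation shrinks linearly with the temperature
        magnitude = perturbation_magnitude * temperature / start_temperature
        candidate = [min(max(s + rng.uniform(-magnitude, magnitude), 0.0), 1.0)
                     for s in current]
        score = evaluate(candidate)
        delta = score - current_score
        # Metropolis rule: always accept improvements, accept worse moves
        # with probability exp(delta / temperature)
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = list(candidate), score
        temperature *= cool_down_rate
    return best, best_score


# toy objective: prefer sparsities close to 0.3 for both layers
best, best_score = anneal_sparsities([0.5, 0.5], lambda sp: -sum((s - 0.3) ** 2 for s in sp))
print(best, best_score)
```

When score differences are much smaller than the temperature, exp(delta / temperature) is close to 1 and most worse candidates are accepted. In this sketch, the scale of the evaluator's output therefore matters.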
-
calc_mask(wrapper, **kwargs)
Pruners should overload this method to provide masks for weight tensors. The mask must have the same shape and type as the weight. It is applied to the weight with the mul() operation. This method is effectively hooked to the forward() method of the model.
- Parameters
wrapper (Module) – calculate mask for wrapper.module’s weight
Quantizers
-
class nni.algorithms.compression.pytorch.quantization.quantizers.NaiveQuantizer(model, config_list, optimizer=None)
Quantize weights to 8 bits.
-
quantize_weight(wrapper, **kwargs)
Quantizers should overload this method to quantize weights. This method is effectively hooked to the forward() method of the model.
- Parameters
wrapper (QuantizerModuleWrapper) – the wrapper for the original module, whose weight is to be quantized
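As an illustration of 8-bit weight quantization, the sketch below uses symmetric per-tensor fake quantization. The largest weight magnitude is mapped to 127, values are rounded to integers, and the integers are scaled back to floats. The rounding mode and scale bookkeeping are assumptions, and NaiveQuantizer's exact arithmetic may differ.

```python
def fake_quantize_int8(weights):
    """Symmetric per-tensor 8-bit fake quantization of a flat list of floats.

    Returns the dequantized weights and the scale that maps int8 values to floats.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights), 1.0  # all-zero tensor: nothing to quantize
    scale = max_abs / 127  # the largest magnitude maps to the int8 value 127
    # round to the nearest integer and clamp to the symmetric int8 range
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return [v * scale for v in q], scale


weights = [0.0, 0.5, -1.27, 0.031]
dequantized, scale = fake_quantize_int8(weights)
print(scale, dequantized)
```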
-
-
class nni.algorithms.compression.pytorch.quantization.quantizers.QAT_Quantizer(model, config_list, optimizer=None)
Quantizer defined in: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)
-
quantize_output(output, wrapper, **kwargs)
Quantizers should overload this method to quantize outputs. This method is effectively hooked to the forward() method of the model.
- Parameters
output (Tensor) – output that needs to be quantized
wrapper (QuantizerModuleWrapper) – the wrapper for the original module
-
quantize_weight(wrapper, **kwargs)
Quantizers should overload this method to quantize weights. This method is effectively hooked to the forward() method of the model.
- Parameters
wrapper (QuantizerModuleWrapper) – the wrapper for the original module, whose weight is to be quantized
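The affine scheme from the paper cited above represents a real value r as r = S (q - Z), with a scale S and an integer zero point Z. The pure-Python sketch below computes S and Z from the observed minimum and maximum of a list of values, then fake-quantizes the values. It does not show how QAT_Quantizer tracks these ranges during training, and that behavior should not be inferred from this code.

```python
def affine_fake_quantize(values, num_bits=8):
    """Quantize to integers q in [0, 2**num_bits - 1] and map back via r = S * (q - Z)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # widen the range to include 0 so that zero is exactly representable
    rmin = min(min(values), 0.0)
    rmax = max(max(values), 0.0)
    if rmax == rmin:  # every value is 0
        return list(values), 1.0, 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = max(qmin, min(qmax, round(qmin - rmin / scale)))
    q = [max(qmin, min(qmax, round(r / scale) + zero_point)) for r in values]
    return [scale * (v - zero_point) for v in q], scale, zero_point


values = [-1.0, 0.0, 0.4, 2.0]
dequantized, scale, zero_point = affine_fake_quantize(values)
print(scale, zero_point, dequantized)
```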
-
-
class nni.algorithms.compression.pytorch.quantization.quantizers.DoReFaQuantizer(model, config_list, optimizer=None)
Quantizer using the DoReFa scheme, as defined in: Zhou et al., DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients (https://arxiv.org/abs/1606.06160)
-
quantize_weight(wrapper, **kwargs)
Quantizers should overload this method to quantize weights. This method is effectively hooked to the forward() method of the model.
- Parameters
wrapper (QuantizerModuleWrapper) – the wrapper for the original module, whose weight is to be quantized
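For k > 1 bits, the DoReFa paper quantizes weights as w_q = 2 * quantize_k(tanh(w) / (2 * max|tanh(w)|) + 1/2) - 1. Here quantize_k rounds a value in [0, 1] to the nearest of 2^k evenly spaced levels. The sketch below implements only this forward formula on a list of floats. It leaves out the straight-through gradient and how DoReFaQuantizer applies the formula to tensors.

```python
import math


def quantize_k(r, k):
    """Round r in [0, 1] to the nearest of 2**k evenly spaced levels."""
    n = 2 ** k - 1
    return round(r * n) / n


def dorefa_quantize_weights(weights, k):
    """Forward pass of DoReFa-Net k-bit weight quantization (k > 1)."""
    t = [math.tanh(w) for w in weights]
    max_t = max(abs(v) for v in t)
    if max_t == 0.0:
        return [0.0 for _ in weights]  # degenerate all-zero input
    # squash to [0, 1], quantize, then map back to [-1, 1]
    return [2 * quantize_k(v / (2 * max_t) + 0.5, k) - 1 for v in t]


print(dorefa_quantize_weights([-1.0, 0.0, 0.2, 1.0], k=2))
```

With k = 2, the outputs can only take the four values -1, -1/3, 1/3 and 1.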
-
-
class nni.algorithms.compression.pytorch.quantization.quantizers.BNNQuantizer(model, config_list, optimizer=None)
Binarized Neural Networks, as defined in: Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 (https://arxiv.org/abs/1602.02830)
-
quantize_output(output, wrapper, **kwargs)
Quantizers should overload this method to quantize outputs. This method is effectively hooked to the forward() method of the model.
- Parameters
output (Tensor) – output that needs to be quantized
wrapper (QuantizerModuleWrapper) – the wrapper for the original module
-
quantize_weight(wrapper, **kwargs)
Quantizers should overload this method to quantize weights. This method is effectively hooked to the forward() method of the model.
- Parameters
wrapper (QuantizerModuleWrapper) – the wrapper for the original module, whose weight is to be quantized
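The BNN paper binarizes with the deterministic sign function (+1 if x >= 0, otherwise -1). It trains through this function with a straight-through estimator that passes the gradient only where |x| <= 1. The sketch below shows both rules on plain lists. Note that torch.sign maps 0 to 0, so an implementation built on it can differ from this sketch at exactly zero.

```python
def binarize(values):
    """Deterministic binarization from the BNN paper: +1 if x >= 0, else -1."""
    return [1.0 if v >= 0 else -1.0 for v in values]


def binarize_backward(values, upstream_grads):
    """Straight-through estimator: keep the gradient where |x| <= 1, zero it elsewhere."""
    return [g if abs(v) <= 1 else 0.0 for v, g in zip(values, upstream_grads)]


x = [-0.3, 0.0, 2.5]
print(binarize(x))                            # forward pass
print(binarize_backward(x, [0.1, 0.2, 0.3]))  # backward pass
```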
-
Compression Utilities
Sensitivity Utilities
-
class nni.compression.pytorch.utils.sensitivity_analysis.SensitivityAnalysis(model, val_func, sparsities=None, prune_type='l1', early_stop_mode=None, early_stop_value=None)
-
analysis(val_args=None, val_kwargs=None, specified_layers=None)
This function analyzes the sensitivity to pruning of each conv layer in the target model. If specified_layers is not set, all conv layers are analyzed by default. Users can analyze only selected layers, or parallelize the analysis process, through the specified_layers parameter.
- Parameters
val_args (list) – args for val_func
val_kwargs (dict) – kwargs for val_func
specified_layers (list) – list of names of the layers whose sensitivity should be analyzed. If this variable is set, only the conv layers specified in the list are analyzed. Users can also use this option to parallelize the sensitivity analysis easily.
- Returns
sensitivities – dict object that stores the trajectory of the accuracy/loss when the prune ratio changes
- Return type
dict
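The analysis loop can be sketched as follows. Each layer is pruned alone at increasing sparsities, the validation metric is recorded, and the layer is abandoned once early stopping triggers. The evaluate callback and the early-stop rule (stop once the metric drops below early_stop_value) are assumptions for illustration. SensitivityAnalysis supports several early_stop_mode values whose exact semantics are not reproduced here.

```python
def analyze_sensitivity(layers, sparsities, evaluate, early_stop_value=None):
    """Return {layer: {sparsity: metric}}, pruning one layer at a time.

    evaluate(layer, sparsity) is a hypothetical callback that prunes only `layer`
    to `sparsity`, runs validation, restores the layer and returns the metric.
    """
    results = {}
    for layer in layers:  # pass a subset here to analyze (or parallelize) selectively
        results[layer] = {}
        for sparsity in sorted(sparsities):
            metric = evaluate(layer, sparsity)
            results[layer][sparsity] = metric
            # assumed early-stop rule: abandon this layer once the metric is too low
            if early_stop_value is not None and metric < early_stop_value:
                break
    return results


# made-up validation accuracies standing in for real pruning and validation runs
fake_metrics = {
    "conv1": {0.25: 0.6, 0.5: 0.55, 0.75: 0.5},
    "conv2": {0.25: 0.61, 0.5: 0.57, 0.75: 0.56},
}
sensitivities = analyze_sensitivity(
    ["conv1", "conv2"], [0.25, 0.5, 0.75],
    lambda layer, s: fake_metrics[layer][s], early_stop_value=0.56,
)
print(sensitivities)
```

The made-up metrics reproduce the shape of the CSV example under export. conv1 stops after sparsity 0.5 because 0.55 falls below the threshold.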
-
export(filepath)
Export the results of the sensitivity analysis to a CSV file. The first line of the file describes its structure: 'layername' followed by the list of sparsities. Each following line records the validation metrics returned by val_func for one layer under the different sparsities. Note that, due to early stopping (early_stop_mode and early_stop_value), some layers may not have metrics for all sparsities. For example:
layername, 0.25, 0.5, 0.75
conv1, 0.6, 0.55
conv2, 0.61, 0.57, 0.56
- Parameters
filepath (str) – Path of the output file
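A minimal sketch of writing results in the layout described above with Python's csv module follows. The function name export_sensitivity_csv is hypothetical. It writes to any file object, so a real file would be opened with open(filepath, 'w', newline='').

```python
import csv
import io


def export_sensitivity_csv(results, sparsities, fileobj):
    """Write 'layername' plus the sparsity list, then one row of metrics per layer."""
    writer = csv.writer(fileobj)
    writer.writerow(["layername"] + list(sparsities))
    for layer, metrics in results.items():
        # layers stopped early simply have fewer metric columns
        writer.writerow([layer] + [metrics[s] for s in sparsities if s in metrics])


results = {
    "conv1": {0.25: 0.6, 0.5: 0.55},
    "conv2": {0.25: 0.61, 0.5: 0.57, 0.75: 0.56},
}
buffer = io.StringIO()
export_sensitivity_csv(results, [0.25, 0.5, 0.75], buffer)
print(buffer.getvalue())
```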
-
Topology Utilities
-
class nni.compression.pytorch.utils.shape_dependency.ChannelDependency(model=None, dummy_input=None, traced_model=None)
-
property dependency_sets
Get the list of dependency sets.
- Returns
dependency_sets – list of the dependency sets. For example, [set(['conv1', 'conv2']), set(['conv3', 'conv4'])]
- Return type
list
-
export(filepath)
Export the channel dependencies as a CSV file. Layers on the same line have output-channel dependencies on each other. For example, layer1.1.conv2, conv1 and layer1.0.conv2 depend on each other, which means these three layers must have the same number of output channels (filters); otherwise the model may have shape conflicts.
Output example:
Dependency Set,Convolutional Layers
Set 1,layer1.1.conv2,layer1.0.conv2,conv1
Set 2,layer1.0.conv1
Set 3,layer1.1.conv1
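Element-wise additions, such as the residual connections in ResNet blocks, require the added tensors to have the same number of channels. The layers that produce those tensors must therefore keep their filter counts in sync, which is how dependency sets arise. The sketch below assumes these linked pairs are already known; ChannelDependency derives them by tracing the model graph, which is not shown. It merges the pairs with a union-find structure and formats the sets like the output example above. Both helper functions are hypothetical.

```python
def dependency_sets(layers, linked_pairs):
    """Merge layers connected by `linked_pairs` into sets with union-find."""
    parent = {layer: layer for layer in layers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in linked_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for layer in layers:  # dicts keep insertion order, so sets follow `layers`
        groups.setdefault(find(layer), []).append(layer)
    return [set(members) for members in groups.values()]


def export_rows(sets):
    """Format the sets like the CSV output example (members sorted for stability)."""
    rows = ["Dependency Set,Convolutional Layers"]
    for i, members in enumerate(sets, start=1):
        rows.append(",".join([f"Set {i}"] + sorted(members)))
    return rows


layers = ["conv1", "layer1.0.conv1", "layer1.0.conv2", "layer1.1.conv1", "layer1.1.conv2"]
# assumed residual additions: conv1 with layer1.0.conv2, layer1.0.conv2 with layer1.1.conv2
pairs = [("conv1", "layer1.0.conv2"), ("layer1.0.conv2", "layer1.1.conv2")]
print("\n".join(export_rows(dependency_sets(layers, pairs))))
```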
-
class nni.compression.pytorch.utils.shape_dependency.GroupDependency(model=None, dummy_input=None, traced_model=None)
-
build_dependency()
Build the group dependency for the conv layers in the model. This function returns the group count of each conv layer. Note that the group count of a conv layer may be larger than its original number of groups, because the input channels of group conv layers are grouped as well. For example, assume two group conv layers, conv1 (groups=2) and conv2 (groups=4), where conv2 takes the output features of conv1 as input. Then we must ensure that the filters of conv1 can still be divided into 4 groups after filter pruning, because the input channels of conv2 must be divided into 4 groups.
- Returns
self.dependency – key: the name of a conv layer; value: the minimum number by which the layer's filter count must be divisible.
- Return type
dict
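The constraint in the example above can be computed as a least common multiple. A conv layer's filter count must be divisible by its own groups and by the groups of every group conv that consumes its output. The sketch below assumes the direct producer-consumer links are given as a dict, whereas the real implementation works on a traced graph. The helper names are hypothetical.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def filter_divisors(conv_groups, consumers):
    """Return {conv: number its filter count must be divisible by}.

    conv_groups: {conv name: groups}; consumers: {conv name: convs fed by its output}.
    """
    divisors = {}
    for name, groups in conv_groups.items():
        # a conv's own output channels must be divisible by its groups (a PyTorch rule),
        # and by the groups of every group conv that reads its output
        required = [groups] + [conv_groups[c] for c in consumers.get(name, [])]
        divisors[name] = reduce(lcm, required)
    return divisors


print(filter_divisors({"conv1": 2, "conv2": 4}, {"conv1": ["conv2"]}))
```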
-
export(filepath)
Export the group dependency to a CSV file. Each line describes one convolution layer: the first field is the PyTorch module name of the conv layer, and the second is the group count of the filters in this layer. Note that the group count may be larger than the layer's original number of groups.
Output example:
Conv layer, Groups
Conv1, 1
Conv2, 2
Conv3, 4
-
-
class nni.compression.pytorch.utils.mask_conflict.CatMaskPadding(masks, model, dummy_input=None, traced=None)
-
class nni.compression.pytorch.utils.mask_conflict.GroupMaskConflict(masks, model=None, dummy_input=None, traced=None)
Model FLOPs/Parameters Counter
-
nni.compression.pytorch.utils.counter.count_flops_params(model, x, custom_ops=None, verbose=True, mode='default')
Count the FLOPs and parameters of the given model. This function identifies the mask on each module and takes the pruned shape into account. Note that for structured pruning, only the remaining filters are identified from the mask, and the pruned input channels are not taken into account, so the calculated FLOPs will be larger than the real number.
- Parameters
model (nn.Module) – Target model.
x (tuple or tensor) – Either the input shape (a tuple), or a tensor or tuple of tensors used as input data.
custom_ops (dict) – A mapping from a module type (a torch.nn.Module subclass) to a custom operation. The custom operation is a callback function that calculates the module's FLOPs and parameters, and it overrides the default operation. For reference, please see ops in ModelProfiler.
verbose (bool) – If False, mute detailed information about modules. Default is True.
mode (str) – The mode of collecting information. If set to default, only the information of convolution and linear layers is collected. If set to full, other operations are collected as well.
- Returns
A tuple of the total FLOPs, the total number of parameters and a detailed list of results. The detailed results are a list of dicts, each with the keys name, module_type, weight_shape, flops, params, input_size and output_size.
- Return type
tuple of int, int and list
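The sketch below makes the note about structured pruning concrete. It counts the multiply-accumulates of a Conv2d layer and ignores bias, which is an assumed convention; count_flops_params may count differently. Suppose conv1 is pruned from 64 to 32 filters. The following conv2 then really sees only 32 input channels, but a counter that only looks at each layer's own mask still uses 64, which doubles the estimate for conv2.

```python
def conv2d_macs(in_channels, out_channels, kernel_size, output_hw, groups=1):
    """Multiply-accumulates of a Conv2d layer (bias ignored)."""
    kh, kw = kernel_size
    oh, ow = output_hw
    # each output element needs (in_channels / groups) * kh * kw multiply-adds
    return out_channels * oh * ow * (in_channels // groups) * kh * kw


# conv2: 3x3 kernel, 128 filters, 8x8 output; its input comes from a pruned conv1
counted = conv2d_macs(64, 128, (3, 3), (8, 8))  # pruned input channels ignored
actual = conv2d_macs(32, 128, (3, 3), (8, 8))   # conv1 kept 32 of its 64 filters
print(counted, actual)
```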