# Pruner¶

## Basic Pruner¶

### Level Pruner¶

class nni.compression.pytorch.pruning.LevelPruner(model, config_list, mode='normal', balance_gran=None)[source]

This is a basic pruner, and in some papers called it magnitude pruning or fine-grained pruning. It will mask the smallest magnitude weights in each specified layer by a saprsity ratio configured in the config list.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• op_types : Operation types to be pruned.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• mode (str) – ‘normal’ or ‘balance’. If setting ‘normal’ mode, target tensor will be pruned in the way of finegrained pruning. If setting ‘balance’ mode, a specal sparse pattern will chosen by pruner. Take linear operation an example, weight tensor will be split into sub block whose shape is aligned to balance_gran. Then finegrained pruning will be applied internal of sub block. This sparsity pattern have more chance to achieve better trade-off between model performance and hardware acceleration. Please refer to releated paper for further information Balanced Sparsity for Efficient DNN Inference on GPU.

• balance_gran (Optional[List]) –

Balance_gran is for special sparse pattern balanced sparsity, Default value is None which means pruning without awaring balance, namely normal finegrained pruning. If passing list of int, LevelPruner will prune the model in the granularity of multi-dimension block. Attention that the length of balance_gran should be smaller than tensor dimension. For instance, in Linear operation, length of balance_gran should be equal or smaller than two since dimension of pruning weight is two. If setting balbance_gran = [5, 5], sparsity = 0.6, pruner will divide pruning parameters into multiple block with tile size (5,5) and each bank has 5 * 5 values and 10 values would be kept after pruning. Finegrained pruning is applied in the granularity of block so that each block will kept same number of non-zero values after pruning. Such pruning method “balance” the non-zero value in tensor which create chance for better hardware acceleration.

Note: If length of given balance_gran smaller than length of pruning tensor shape, it will be made up

in right align(such as example 1).

example 1:

operation: Linear pruning tensor: weight pruning tensor shape: [32, 32] sparsity: 50% balance_gran: [4]

pruning result: Weight tensor whose shape is [32, 32] will be split into 256 [1, 4] sub blocks.

Each sub block will be pruned 2 values.

example 2:

operation: Linear pruning tensor: weight pruning tensor shape: [64, 64] sparsity: 25% balance_gran: [32, 32]

pruning result: Weight tensor whose shape is [64, 64] will be split into 4 [32, 32] sub blocks.

Each sub block will be pruned 256 values.

Examples

>>> model = ...
>>> from nni.compression.pytorch.pruning import LevelPruner
>>> config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
>>> pruner = LevelPruner(model, config_list)


For detailed example please refer to examples/model_compress/pruning/level_pruning_torch.py

### L1 Norm Pruner¶

class nni.compression.pytorch.pruning.L1NormPruner(model, config_list, mode='normal', dummy_input=None)[source]

L1 norm pruner computes the l1 norm of the layer weight on the first dimension, then prune the weight blocks on this dimension with smaller l1 norm values. i.e., compute the l1 norm of the filters in convolution layer as metric values, compute the l1 norm of the weight by rows in linear layer as metric values.

For more details, please refer to PRUNING FILTERS FOR EFFICIENT CONVNETS.

In addition, L1 norm pruner also supports dependency-aware mode.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• op_types : Conv2d and Linear are supported in L1NormPruner.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• mode (str) – ‘normal’ or ‘dependency_aware’. If prune the model in a dependency-aware way, this pruner will prune the model according to the l1-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that, if set ‘dependency_aware’ , the dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

• dummy_input (Optional[Tensor]) – The dummy input to analyze the topology constraints. Note that, the dummy_input should on the same device with the model.

### L2 Norm Pruner¶

class nni.compression.pytorch.pruning.L2NormPruner(model, config_list, mode='normal', dummy_input=None)[source]

L2 norm pruner is a variant of L1 norm pruner. The only different between L2 norm pruner and L1 norm pruner is L2 norm pruner prunes the weight with the smallest L2 norm of the weights.

L2 norm pruner also supports dependency-aware mode.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• op_types : Conv2d and Linear are supported in L2NormPruner.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• mode (str) – ‘normal’ or ‘dependency_aware’. If prune the model in a dependency-aware way, this pruner will prune the model according to the l2-norm of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that, if set ‘dependency_aware’ , the dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

• dummy_input (Optional[Tensor]) – The dummy input to analyze the topology constraints. Note that, the dummy_input should on the same device with the model.

Examples

>>> model = ...
>>> from nni.compression.pytorch.pruning import L2NormPruner
>>> config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
>>> pruner = L2NormPruner(model, config_list)


For detailed example please refer to examples/model_compress/pruning/norm_pruning_torch.py

### FPGM Pruner¶

class nni.compression.pytorch.pruning.FPGMPruner(model, config_list, mode='normal', dummy_input=None)[source]

FPGM pruner prunes the blocks of the weight on the first dimension with the smallest geometric median. FPGM chooses the weight blocks with the most replaceable contribution.

For more details, please refer to Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration.

FPGM pruner also supports dependency-aware mode.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• op_types : Conv2d and Linear are supported in FPGMPruner.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• mode (str) – ‘normal’ or ‘dependency_aware’. If prune the model in a dependency-aware way, this pruner will prune the model according to the FPGM of weights and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that, if set ‘dependency_aware’ , the dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

• dummy_input (Optional[Tensor]) – The dummy input to analyze the topology constraints. Note that, the dummy_input should on the same device with the model.

Examples

>>> model = ...
>>> from nni.compression.pytorch.pruning import FPGMPruner
>>> config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
>>> pruner = FPGMPruner(model, config_list)


For detailed example please refer to examples/model_compress/pruning/fpgm_pruning_torch.py

### Slim Pruner¶

class nni.compression.pytorch.pruning.SlimPruner(model: Module, config_list: List[Dict], evaluator: Evaluator, training_epochs: int, scale: float = 0.0001, mode='global')[source]
class nni.compression.pytorch.pruning.SlimPruner(model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None], traced_optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], training_epochs: int, scale: float = 0.0001, mode='global')

Slim pruner adds sparsity regularization on the scaling factors of batch normalization (BN) layers during training to identify unimportant channels. The channels with small scaling factor values will be pruned.

For more details, please refer to Learning Efficient Convolutional Networks through Network Slimming.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• total_sparsity : This is to specify the total sparsity for all layers in this config, each layer may have different sparsity.

• max_sparsity_per_layer : Always used with total_sparsity. Limit the max sparsity of each layer.

• op_types : Only BatchNorm2d is supported in SlimPruner.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous trainer, traced_optimizer and criterion API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (trainer, traced_optimizer and criterion) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• training_epochs – The epoch number for training model to sparsify the BN weight.

• scale – Penalty parameter for sparsification, which could reduce overfitting.

• mode – ‘normal’ or ‘global’. If prune the model in a global way, all layer weights with same config will be considered uniformly. That means a single layer may not reach or exceed the sparsity setting in config, but the total pruned weights meet the sparsity setting.

Notes

For detailed example please refer to examples/model_compress/pruning/slim_pruning_torch.py

### Activation APoZ Rank Pruner¶

class nni.compression.pytorch.pruning.ActivationAPoZRankPruner(model: Module, config_list: List[Dict], evaluator: Evaluator, training_steps: int, activation: str = 'relu', mode: str = 'normal', dummy_input: Optional[Tensor] = None)[source]
class nni.compression.pytorch.pruning.ActivationAPoZRankPruner(model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None], traced_optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], training_batches: int, activation: str = 'relu', mode: str = 'normal', dummy_input: Optional[Tensor] = None)

Activation APoZ rank pruner is a pruner which prunes on the first weight dimension, with the smallest importance criterion APoZ calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion APoZ is explained in the paper Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures.

The APoZ is defined as: $$APoZ_{c}^{(i)} = APoZ\left(O_{c}^{(i)}\right)=\frac{\sum_{k}^{N} \sum_{j}^{M} f\left(O_{c, j}^{(i)}(k)=0\right)}{N \times M}$$

Activation APoZ rank pruner also supports dependency-aware mode.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• op_types : Conv2d and Linear are supported in ActivationAPoZRankPruner.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous trainer, traced_optimizer and criterion API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (trainer, traced_optimizer and criterion) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• training_steps – The step number used to collect activations.

• mode – ‘normal’ or ‘dependency_aware’. If prune the model in a dependency-aware way, this pruner will prune the model according to the activation-based metrics and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that, if set ‘dependency_aware’ , the dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

• dummy_input – The dummy input to analyze the topology constraints. Note that, the dummy_input should on the same device with the model.

Notes

For detailed example please refer to examples/model_compress/pruning/activation_pruning_torch.py

### Activation Mean Rank Pruner¶

class nni.compression.pytorch.pruning.ActivationMeanRankPruner(model: Module, config_list: List[Dict], evaluator: Evaluator, training_steps: int, activation: str = 'relu', mode: str = 'normal', dummy_input: Optional[Tensor] = None)[source]
class nni.compression.pytorch.pruning.ActivationMeanRankPruner(model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None], traced_optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], training_batches: int, activation: str = 'relu', mode: str = 'normal', dummy_input: Optional[Tensor] = None)

Activation mean rank pruner is a pruner which prunes on the first weight dimension, with the smallest importance criterion mean activation calculated from the output activations of convolution layers to achieve a preset level of network sparsity.

The pruning criterion mean activation is explained in section 2.2 of the paper Pruning Convolutional Neural Networks for Resource Efficient Inference.

Activation mean rank pruner also supports dependency-aware mode.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• op_types : Conv2d and Linear are supported in ActivationPruner.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous trainer, traced_optimizer and criterion API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (trainer, traced_optimizer and criterion) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• training_steps – The step number used to collect activations.

• mode – ‘normal’ or ‘dependency_aware’. If prune the model in a dependency-aware way, this pruner will prune the model according to the activation-based metrics and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that, if set ‘dependency_aware’ , the dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

• dummy_input – The dummy input to analyze the topology constraints. Note that, the dummy_input should on the same device with the model.

Notes

For detailed example please refer to examples/model_compress/pruning/activation_pruning_torch.py

### Taylor FO Weight Pruner¶

class nni.compression.pytorch.pruning.TaylorFOWeightPruner(model: Module, config_list: List[Dict], evaluator: Evaluator, training_steps: int, mode: str = 'normal', dummy_input: Optional[Tensor] = None)[source]
class nni.compression.pytorch.pruning.TaylorFOWeightPruner(model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None], traced_optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], training_batches: int, mode: str = 'normal', dummy_input: Optional[Tensor] = None)

Taylor FO weight pruner is a pruner which prunes on the first weight dimension, based on estimated importance calculated from the first order taylor expansion on weights to achieve a preset level of network sparsity. The estimated importance is defined as the paper Importance Estimation for Neural Network Pruning.

$$\widehat{\mathcal{I}}_{\mathcal{S}}^{(1)}(\mathbf{W}) \triangleq \sum_{s \in \mathcal{S}} \mathcal{I}_{s}^{(1)}(\mathbf{W})=\sum_{s \in \mathcal{S}}\left(g_{s} w_{s}\right)^{2}$$

Taylor FO weight pruner also supports dependency-aware mode.

What’s more, we provide a global-sort mode for this pruner which is aligned with paper implementation.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• total_sparsity : This is to specify the total sparsity for all layers in this config, each layer may have different sparsity.

• max_sparsity_per_layer : Always used with total_sparsity. Limit the max sparsity of each layer.

• op_types : Conv2d and Linear are supported in TaylorFOWeightPruner.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous trainer, traced_optimizer and criterion API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (trainer, traced_optimizer and criterion) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• training_steps – The step number used to collect gradients.

• mode

‘normal’, ‘dependency_aware’ or ‘global’.

If prune the model in a dependency-aware way, this pruner will prune the model according to the taylorFO and the channel-dependency or group-dependency of the model. In this way, the pruner will force the conv layers that have dependencies to prune the same channels, so the speedup module can better harvest the speed benefit from the pruned model. Note that, if set ‘dependency_aware’ , the dummy_input cannot be None, because the pruner needs a dummy input to trace the dependency between the conv layers.

If prune the model in a global way, all layer weights with same config will be considered uniformly. That means a single layer may not reach or exceed the sparsity setting in config, but the total pruned weights meet the sparsity setting.

• dummy_input – The dummy input to analyze the topology constraints. Note that, the dummy_input should on the same device with the model.

Notes

For detailed example please refer to examples/model_compress/pruning/taylorfo_pruning_torch.py

class nni.compression.pytorch.pruning.ADMMPruner(model: Module, config_list: List[Dict], evaluator: Evaluator, iterations: int, training_epochs: int, granularity: str = 'fine-grained')[source]
class nni.compression.pytorch.pruning.ADMMPruner(model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None], traced_optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], iterations: int, training_epochs: int, granularity: str = 'fine-grained')

Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique, by decomposing the original nonconvex problem into two subproblems that can be solved iteratively. In weight pruning problem, these two subproblems are solved via 1) gradient descent algorithm and 2) Euclidean projection respectively.

During the process of solving these two subproblems, the weights of the original model will be changed. Then a fine-grained pruning will be applied to prune the model according to the config list given.

This solution framework applies both to non-structured and different variations of structured pruning schemes.

For more details, please refer to A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• rho : Penalty parameters in ADMM algorithm.

• op_types : Operation types to be pruned.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous trainer, traced_optimizer and criterion API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (trainer, traced_optimizer and criterion) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• iterations – The total iteration number in admm pruning algorithm.

• training_epochs – The epoch number for training model in each iteration.

• granularity – ‘fine-grained’ or ‘coarse-grained’. If ‘coarse-grained’ is set, ADMM pruner will generate masks on output channels wise. In original admm pruning paper, author implemented a fine-grained admm pruning. In auto-compress paper, author used coarse-grained admm pruning.

Notes

## Scheduled Pruners¶

### Linear Pruner¶

class nni.compression.pytorch.pruning.LinearPruner(model: Module, config_list: List[Dict], pruning_algorithm: str, total_iteration: int, log_dir: str = '.', keep_intermediate_result: bool = False, evaluator: Evaluator | None = None, speedup: bool = False, pruning_params: Dict = {})[source]
class nni.compression.pytorch.pruning.LinearPruner(model: Module, config_list: List[Dict], pruning_algorithm: str, total_iteration: int, log_dir: str = '.', keep_intermediate_result: bool = False, finetuner: _LEGACY_FINETUNER | None = None, speedup: bool = False, dummy_input: Any | None = None, evaluator: _LEGACY_EVALUATOR | None = None, pruning_params: Dict = {})

Linear pruner is an iterative pruner, it will increase sparsity evenly from scratch during each iteration.

For example, the final sparsity is set as 0.5, and the iteration number is 5, then the sparsity used in each iteration are [0, 0.1, 0.2, 0.3, 0.4, 0.5].

Parameters:
• model (Module) – The origin unwrapped pytorch model to be pruned.

• config_list (List[Dict]) – The origin config list provided by the user.

• pruning_algorithm (str) – Supported pruning algorithm [‘level’, ‘l1’, ‘l2’, ‘fpgm’, ‘slim’, ‘apoz’, ‘mean_activation’, ‘taylorfo’, ‘admm’]. This iterative pruner will use the chosen corresponding pruner to prune the model in each iteration.

• total_iteration (int) – The total iteration number.

• log_dir (str) – The log directory use to saving the result, you can find the best result under this folder.

• keep_intermediate_result (bool) – If keeping the intermediate result, including intermediate model and masks during each iteration.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous finetuner, dummy_input and old evaluator API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (finetuner, dummy_input and old evaluator) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• speedup – If set True, speedup the model at the end of each iteration to make the pruned model compact.

• pruning_params – If the chosen pruning_algorithm has extra parameters, put them as a dict to pass in.

Notes

For detailed example please refer to examples/model_compress/pruning/iterative_pruning_torch.py

### AGP Pruner¶

class nni.compression.pytorch.pruning.AGPPruner(model: Module, config_list: List[Dict], pruning_algorithm: str, total_iteration: int, log_dir: str = '.', keep_intermediate_result: bool = False, evaluator: Evaluator | None = None, speedup: bool = False, pruning_params: Dict = {})[source]
class nni.compression.pytorch.pruning.AGPPruner(model: Module, config_list: List[Dict], pruning_algorithm: str, total_iteration: int, log_dir: str = '.', keep_intermediate_result: bool = False, finetuner: _LEGACY_FINETUNER | None = None, speedup: bool = False, dummy_input: Any | None = None, evaluator: _LEGACY_EVALUATOR | None = None, pruning_params: Dict = {})

This is an iterative pruner, which the sparsity is increased from an initial sparsity value $$s_{i}$$ (usually 0) to a final sparsity value $$s_{f}$$ over a span of $$n$$ pruning iterations, starting at training step $$t_{0}$$ and with pruning frequency $$\Delta t$$:

$$s_{t}=s_{f}+\left(s_{i}-s_{f}\right)\left(1-\frac{t-t_{0}}{n \Delta t}\right)^{3} \text { for } t \in\left\{t_{0}, t_{0}+\Delta t, \ldots, t_{0} + n \Delta t\right\}$$

For more details please refer to To prune, or not to prune: exploring the efficacy of pruning for model compression.

Parameters:
• model (Module) – The origin unwrapped pytorch model to be pruned.

• config_list (List[Dict]) – The origin config list provided by the user.

• pruning_algorithm (str) – Supported pruning algorithm [‘level’, ‘l1’, ‘l2’, ‘fpgm’, ‘slim’, ‘apoz’, ‘mean_activation’, ‘taylorfo’, ‘admm’]. This iterative pruner will use the chosen corresponding pruner to prune the model in each iteration.

• total_iteration (int) – The total iteration number.

• log_dir (str) – The log directory use to saving the result, you can find the best result under this folder.

• keep_intermediate_result (bool) – If keeping the intermediate result, including intermediate model and masks during each iteration.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous finetuner, dummy_input and old evaluator API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (finetuner, dummy_input and old evaluator) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• speedup – If set True, speedup the model at the end of each iteration to make the pruned model compact.

• pruning_params – If the chosen pruning_algorithm has extra parameters, put them as a dict to pass in.

Notes

For detailed example please refer to examples/model_compress/pruning/iterative_pruning_torch.py

### Lottery Ticket Pruner¶

class nni.compression.pytorch.pruning.LotteryTicketPruner(model: Module, config_list: List[Dict], pruning_algorithm: str, total_iteration: int, log_dir: str = '.', keep_intermediate_result: bool = False, evaluator: Evaluator | None = None, speedup: bool = False, reset_weight: bool = True, pruning_params: Dict = {})[source]
class nni.compression.pytorch.pruning.LotteryTicketPruner(model: Module, config_list: List[Dict], pruning_algorithm: str, total_iteration: int, log_dir: str = '.', keep_intermediate_result: bool = False, finetuner: _LEGACY_FINETUNER | None = None, speedup: bool = False, dummy_input: Optional[Tensor] = None, evaluator: _LEGACY_EVALUATOR | None = None, reset_weight: bool = True, pruning_params: Dict = {})

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, authors Jonathan Frankle and Michael Carbin,provides comprehensive measurement and analysis, and articulate the lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that – when trained in isolation – reach test accuracy comparable to the original network in a similar number of iterations.

In this paper, the authors use the following process to prune a model, called iterative prunning:

1. Randomly initialize a neural network f(x;theta_0) (where theta0 follows D{theta}).

2. Train the network for j iterations, arriving at parameters theta_j.

3. Prune p% of the parameters in theta_j, creating a mask m.

4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).

5. Repeat step 2, 3, and 4.

If the configured final sparsity is P (e.g., 0.8) and there are n times iterative pruning, each iterative pruning prunes 1-(1-P)^(1/n) of the weights that survive the previous round.

Parameters:
• model (Module) – The origin unwrapped pytorch model to be pruned.

• config_list (List[Dict]) – The origin config list provided by the user.

• pruning_algorithm (str) – Supported pruning algorithm [‘level’, ‘l1’, ‘l2’, ‘fpgm’, ‘slim’, ‘apoz’, ‘mean_activation’, ‘taylorfo’, ‘admm’]. This iterative pruner will use the chosen corresponding pruner to prune the model in each iteration.

• total_iteration (int) – The total iteration number.

• log_dir (str) – The log directory use to saving the result, you can find the best result under this folder.

• keep_intermediate_result (bool) – If keeping the intermediate result, including intermediate model and masks during each iteration.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous finetuner, dummy_input and old evaluator API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (finetuner, dummy_input and old evaluator) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• speedup – If set True, speedup the model at the end of each iteration to make the pruned model compact.

• reset_weight – If set True, the model weight will reset to the original model weight at the end of each iteration step.

• pruning_params – If the chosen pruning_algorithm has extra parameters, put them as a dict to pass in.

Notes

For detailed example please refer to examples/model_compress/pruning/iterative_pruning_torch.py

### Simulated Annealing Pruner¶

class nni.compression.pytorch.pruning.SimulatedAnnealingPruner(model: Module, config_list: List[Dict], evaluator: Evaluator, start_temperature: float = 100, stop_temperature: float = 20, cool_down_rate: float = 0.9, perturbation_magnitude: float = 0.35, pruning_algorithm: str = 'level', pruning_params: Dict = {}, log_dir: Union[str, Path] = '.', keep_intermediate_result: bool = False, speedup: bool = False)[source]
class nni.compression.pytorch.pruning.SimulatedAnnealingPruner(model: Module, config_list: List[Dict], evaluator: Callable[[Module], float], start_temperature: float = 100, stop_temperature: float = 20, cool_down_rate: float = 0.9, perturbation_magnitude: float = 0.35, pruning_algorithm: str = 'level', pruning_params: Dict = {}, log_dir: Union[str, Path] = '.', keep_intermediate_result: bool = False, finetuner: _LEGACY_FINETUNER | None = None, speedup: bool = False, dummy_input: Optional[Tensor] = None)

We implement a guided heuristic search method, Simulated Annealing (SA) algorithm. As mentioned in the paper, this method is enhanced on guided search based on prior experience. The enhanced SA technique is based on the observation that a DNN layer with more number of weights often has a higher degree of model compression with less impact on overall accuracy.

• Randomly initialize a pruning rate distribution (sparsities).

• While current_temperature < stop_temperature:

1. generate a perturbation to current distribution

2. Perform fast evaluation on the perturbated distribution

3. accept the perturbation according to the performance and probability, if not accepted, return to step 1

4. cool down, current_temperature <- current_temperature * cool_down_rate

For more details, please refer to AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates.

Parameters:
• model (Module) – The origin unwrapped pytorch model to be pruned.

• config_list (List[Dict]) – The origin config list provided by the user.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous finetuner, dummy_input and old evaluator API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (finetuner, dummy_input and old evaluator) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• start_temperature – Start temperature of the simulated annealing process.

• stop_temperature – Stop temperature of the simulated annealing process.

• cool_down_rate – Cool down rate of the temperature.

• perturbation_magnitude – Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.

• pruning_algorithm – Supported pruning algorithm [‘level’, ‘l1’, ‘l2’, ‘fpgm’, ‘slim’, ‘apoz’, ‘mean_activation’, ‘taylorfo’, ‘admm’]. This iterative pruner will use the chosen corresponding pruner to prune the model in each iteration.

• pruning_params – If the chosen pruning_algorithm has extra parameters, put them as a dict to pass in.

• log_dir – The log directory use to saving the result, you can find the best result under this folder.

• keep_intermediate_result – If keeping the intermediate result, including intermediate model and masks during each iteration.

• speedup – If set True, speedup the model at the end of each iteration to make the pruned model compact.

Notes

For detailed example please refer to examples/model_compress/pruning/simulated_anealing_pruning_torch.py

### Auto Compress Pruner¶

class nni.compression.pytorch.pruning.AutoCompressPruner(model: Module, config_list: List[Dict], total_iteration: int, admm_params: Dict, sa_params: Dict, log_dir: str = '.', keep_intermediate_result: bool = False, evaluator: Evaluator | None = None, speedup: bool = False)[source]
class nni.compression.pytorch.pruning.AutoCompressPruner(model: Module, config_list: List[Dict], total_iteration: int, admm_params: Dict, sa_params: Dict, log_dir: str = '.', keep_intermediate_result: bool = False, finetuner: Optional[Callable[[Module], None]] = None, speedup: bool = False, dummy_input: Optional[Tensor] = None, evaluator: Optional[Callable[[Module], float]] = None)

For total iteration number $$N$$, AutoCompressPruner prune the model that survive the previous iteration for a fixed sparsity ratio (e.g., $$1-{(1-0.8)}^{(1/N)}$$) to achieve the overall sparsity (e.g., $$0.8$$):

1. Generate sparsities distribution using SimulatedAnnealingPruner
2. Perform ADMM-based pruning to generate pruning result for the next iteration.


For more details, please refer to AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates.

Parameters:
• model (Module) – The origin unwrapped pytorch model to be pruned.

• config_list (List[Dict]) – The origin config list provided by the user.

• total_iteration (int) – The total iteration number.

The parameters passed to the ADMMPruner.

• evaluatorLightningEvaluator or TorchEvaluator or TransformersEvaluator.

The same with the evaluator of AutoCompressPruner input parameter.

• iterationsint.

The total iteration number in admm pruning algorithm.

• training_epochsint.

The epoch number for training model in each iteration.

• sa_params (Dict) –

The parameters passed to the SimulatedAnnealingPruner.

• evaluatorLightningEvaluator or TorchEvaluator or TransformersEvaluator.

The same with the evaluator of AutoCompressPruner input parameter.

• start_temperaturefloat. Default: 100.

Start temperature of the simulated annealing process.

• stop_temperaturefloat. Default: 20.

Stop temperature of the simulated annealing process.

• cool_down_ratefloat. Default: 0.9.

Cooldown rate of the temperature.

• perturbation_magnitudefloat. Default: 0.35.

Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.

• pruning_algorithmstr. Default: ‘level’.

Supported pruning algorithm [‘level’, ‘l1’, ‘l2’, ‘fpgm’, ‘slim’, ‘apoz’, ‘mean_activation’, ‘taylorfo’, ‘admm’].

• pruning_paramsDict. Default: dict().

If the chosen pruning_algorithm has extra parameters, put them as a dict to pass in.

• log_dir (str) – The log directory used to save the result, you can find the best result under this folder.

• keep_intermediate_result (bool) – If keeping the intermediate result, including intermediate model and masks during each iteration.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous finetuner, dummy_input and old evaluator API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (finetuner, dummy_input and old evaluator) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• speedup – If set True, speedup the model at the end of each iteration to make the pruned model compact.

Notes

The full script can be found here.

### AMC Pruner¶

class nni.compression.pytorch.pruning.AMCPruner(total_episode: int, model: Module, config_list: List[Dict], evaluator: Evaluator, pruning_algorithm: str = 'l1', log_dir: str = '.', keep_intermediate_result: bool = False, ddpg_params: dict = {}, pruning_params: dict = {}, target: str = 'flops')[source]
class nni.compression.pytorch.pruning.AMCPruner(total_episode: int, model: Module, config_list: List[Dict], dummy_input: Tensor, evaluator: Callable[[Module], float], pruning_algorithm: str = 'l1', log_dir: str = '.', keep_intermediate_result: bool = False, finetuner: Optional[Callable[[Module], None]] = None, ddpg_params: dict = {}, pruning_params: dict = {}, target: str = 'flops')

AMC pruner leverages reinforcement learning to provide the model compression policy. According to the author, this learning-based compression policy outperforms conventional rule-based compression policy by having a higher compression ratio, better preserving the accuracy and freeing human labor.

For more details, please refer to AMC: AutoML for Model Compression and Acceleration on Mobile Devices.

Suggust config all total_sparsity in config_list a same value. AMC pruner will treat the first sparsity in config_list as the global sparsity.

Parameters:
• total_episode (int) – The total episode number.

• model (Module) – The model to be pruned.

• config_list (List[Dict]) –

Supported keys :
• total_sparsity : This is to specify the total sparsity for all layers in this config, each layer may have different sparsity.

• max_sparsity_per_layer : Always used with total_sparsity. Limit the max sparsity of each layer.

• op_types : Operation type to be pruned.

• op_names : Operation name to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous finetuner, dummy_input and old evaluator API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (finetuner, dummy_input and old evaluator) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• pruning_algorithm (str) – Supported pruning algorithm [‘l1’, ‘l2’, ‘fpgm’, ‘apoz’, ‘mean_activation’, ‘taylorfo’]. This iterative pruner will use the chosen corresponding pruner to prune the model in each iteration.

• log_dir (str) – The log directory use to saving the result, you can find the best result under this folder.

• keep_intermediate_result (bool) – If keeping the intermediate result, including intermediate model and masks during each iteration.

• ddpg_params (Dict) – Configuration dict to configure the DDPG agent, any key unset will be set to default implicitly. - hidden1: hidden num of first fully connect layer. Default: 300 - hidden2: hidden num of second fully connect layer. Default: 300 - lr_c: learning rate for critic. Default: 1e-3 - lr_a: learning rate for actor. Default: 1e-4 - warmup: number of episodes without training but only filling the replay memory. During warmup episodes, random actions ares used for pruning. Default: 100 - discount: next Q value discount for deep Q value target. Default: 0.99 - bsize: minibatch size for training DDPG agent. Default: 64 - rmsize: memory size for each layer. Default: 100 - window_length: replay buffer window length. Default: 1 - tau: moving average for target network being used by soft_update. Default: 0.99 - init_delta: initial variance of truncated normal distribution. Default: 0.5 - delta_decay: delta decay during exploration. Default: 0.99 # parameters for training ddpg agent - max_episode_length: maximum episode length. Default: 1e9 - epsilon: linear decay of exploration policy. Default: 50000

• pruning_params (Dict) – If the pruner corresponding to the chosen pruning_algorithm has extra parameters, put them as a dict to pass in.

• target (str) – ‘flops’ or ‘params’. Note that the sparsity in other pruners always means the parameters sparse, but in AMC, you can choose flops sparse. This parameter is used to explain what the sparsity setting in config_list refers to.

Notes

The full script can be found here.

## Other Pruner¶

### Movement Pruner¶

class nni.compression.pytorch.pruning.MovementPruner(model: Module, config_list: List[Dict], evaluator: Evaluator, warm_up_step: int, cool_down_beginning_step: int, training_epochs: int | None = None, training_steps: int | None = None, regular_scale: float | None = None, movement_mode: Literal['hard', 'soft'] = 'hard', sparse_granularity: Literal['auto', 'finegrained'] = 'finegrained')[source]
class nni.compression.pytorch.pruning.MovementPruner(model: Module, config_list: List[Dict], trainer: Callable[[Module, Optimizer, Callable], None], traced_optimizer: Optimizer, criterion: Callable[[Tensor, Tensor], Tensor], training_epochs: int, warm_up_step: int, cool_down_beginning_step: int)

Movement pruner is an implementation of movement pruning. This is a “fine-pruning” algorithm, which means the masks may change during each fine-tuning step. Each weight element will be scored by the opposite of the sum of the product of weight and its gradient during each step. This means the weight elements moving towards zero will accumulate negative scores, the weight elements moving away from zero will accumulate positive scores. The weight elements with low scores will be masked during inference.

The following figure from the paper shows the weight pruning by movement pruning.

For more details, please refer to Movement Pruning: Adaptive Sparsity by Fine-Tuning.

Parameters:
• model (Module) – Model to be pruned.

• config_list (List[Dict]) –

Supported keys:
• sparsity : This is to specify the sparsity for each layer in this config to be compressed.

• sparsity_per_layer : Equals to sparsity.

• op_types : Operation types to be pruned.

• op_names : Operation names to be pruned.

• op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.

• exclude : Set True then the layers setting by op_types and op_names will be excluded from pruning.

• evaluator (nni.compression.pytorch.utils.evaluator.Evaluator) –

evaluator is used to replace the previous trainer, traced_optimizer and criterion API. NNI will use the evaluator to intervene in the model training process, so as to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry for intervention training in the future. Usually you just need to wrap some classes with nni.trace or package the training process as a function to initialize the evaluator. Please refer Compression Evaluator for a full tutorial on how to initialize a evaluator.

The following are two simple examples, if you use pytorch_lightning, please refer to nni.compression.pytorch.LightningEvaluator, if you use native pytorch, please refer to nni.compression.pytorch.TorchEvaluator:

# LightningEvaluator example
import pytorch_lightning
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression.pytorch import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

def training_model(model, optimizer, criterion, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
# max_steps, max_epochs might be None, which means unlimited training time,
# so here we need set a default termination condition (by default, total_epochs=10, total_steps=100000).
total_epochs = max_epochs if max_epochs else 10
total_steps = max_steps if max_steps else 100000
current_step = 0

for epoch in range(total_epochs):
...
result = model(input_data)
loss = criterion(result, target)
loss.backward()
optimizer.step()
current_step += 1
if current_step >= total_steps:
return
lr_scheduler.step()

traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)
criterion = F.nll_loss

from nni.compression.pytorch import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimziers=traced_optimizer, criterion=criterion)


The old API (trainer, traced_optimizer and criterion) is still supported and will be deprecated in v3.0. If you want to consult the old API, please refer to v2.8 pruner API.

• warm_up_step – The total optimizer.step() number before start pruning for warm up. Make sure warm_up_step is smaller than cool_down_beginning_step.

• cool_down_beginning_step – The number of steps at which sparsity stops growing, note that the sparsity stop growing doesn’t mean masks not changed. The sparsity after each optimizer.step() is: total_sparsity * (1 - (1 - (current_step - warm_up_step) / (cool_down_beginning_step - warm_up_step)) ** 3).

• training_epochs – The total epoch number for training the model. Make sure the total optimizer.step() in training_epochs is bigger than cool_down_beginning_step. If both training_epochs and training_steps are set, pruning will stop when either is reached.

• training_steps – The total step number for training the model. Make sure training_epochs is bigger than cool_down_beginning_step. If both training_epochs and training_steps are set, pruning will stop when either is reached.

• regular_scale – Use to scale the movement score regular loss. In ‘soft’ mode, higher regular scale means higher final sparsity. The recommended range is 1 ~ 30.

• movement_mode

‘hard’ or ‘soft’. Note that in ‘soft’ mode, sparsity set in the config_list means the sparsify threshold, ‘soft’ mode cannot precisely control the sparsity rate, but usually has higher performance compared with ‘hard’ mode. sparsity in ‘soft’ mode usually set to 0.1, and using regular_scale to control the final relative sparsity.

For detailed differences between ‘hard’ and ‘soft’, please refer to the paper. In short, ‘hard’ means that the corresponding layer is pruned to a fixed ratio by the topk method according to the movement score, which is the sparsity ratio set in config_list. ‘soft’ means that the final sparsity size will not be fixed, but the generation of the mask will be controlled by a threshold, and the positions corresponding to scores below the threshold will be masked during the movement training process.

• sparse_granularity – This is an experimental interface, by default, apply ‘finegrained’ pruning. If ‘auto’ is set, will try to apply structure pruning. For the attention layer, will apply block sparse with size [head_width, head_width]. For the following two linear layers (FFN), will apply output channel pruning for the first linear, and the input channel pruning for the second one. ‘auto’ only support partial hugingface transformers right now (bart, bert, t5).

Notes

For detailed example please refer to examples/model_compress/pruning/movement_pruning_glue.py