Pruner on NNI Compressor¶

Index of supported pruning algorithms

Level Pruner
AGP Pruner
Lottery Ticket Hypothesis
Slim Pruner
Filter Pruners with Weight Rank
- FPGM Pruner
- L1Filter Pruner
- L2Filter Pruner
Filter Pruners with Activation Rank
- APoZ Rank Pruner
- Activation Mean Rank Pruner
Filter Pruners with Gradient Rank
- Taylor FO On Weight Pruner

Level Pruner¶

This is a basic one-shot pruner: you set a target sparsity level as a fraction (for example, 0.6 prunes 60% of the weights).

The weights in the specified layer are first sorted by absolute value. The smallest-magnitude weights are then masked to zero until the desired sparsity level is reached.
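The sort-and-mask step described above can be sketched in plain Python. This is only an illustration of magnitude pruning on a flat list of weights; the function name `level_prune_mask` is hypothetical and is not part of the NNI API.

```python
def level_prune_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude weights.

    Hypothetical helper for illustration only, not an NNI function.
    """
    num_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


weights = [0.5, -0.1, 0.05, -0.9, 0.3]
mask = level_prune_mask(weights, sparsity=0.6)
# Applying the mask keeps only the two largest-magnitude weights.
masked_weights = [w * m for w, m in zip(weights, mask)]
```

With a sparsity of 0.6 and five weights, three weights are masked and the two with the largest absolute values survive.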

Usage¶

Tensorflow code

from nni.compression.tensorflow import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model_graph, config_list)
pruner.compress()

PyTorch code

from nni.compression.torch import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
pruner.compress()

User configuration for Level Pruner¶

sparsity: The target sparsity to which the specified operations are compressed.

AGP Pruner¶

This is an iterative pruner. In ‘To prune, or not to prune: exploring the efficacy of pruning for model compression’, authors Michael Zhu and Suyog Gupta provide an algorithm that prunes the weights gradually:

We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps, starting at training step t0 and with pruning frequency ∆t: The binary weight masks are updated every ∆t steps as the network is trained to gradually increase the sparsity of the network while allowing the network training steps to recover from any pruning-induced loss in accuracy. In our experience, varying the pruning frequency ∆t between 100 and 1000 training steps had a negligible impact on the final model quality. Once the model achieves the target sparsity sf , the weight masks are no longer updated. The intuition behind this sparsity function in equation
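The equation that the quotation refers to is not reproduced on this page. In the cited paper the schedule is the cubic function s_t = sf + (si − sf)(1 − (t − t0)/(n∆t))^3 for t in {t0, t0 + ∆t, ..., t0 + n∆t}. The sketch below evaluates that schedule; the function name and argument names are hypothetical, and NNI's AGP_Pruner may compute its per-epoch target differently.

```python
def agp_sparsity(step, initial_sparsity, final_sparsity,
                 start_step, num_steps, frequency):
    """Target sparsity at a training step under the cubic schedule.

    Hypothetical helper; argument names mirror si, sf, t0, n and delta-t.
    """
    if step <= start_step:
        return initial_sparsity
    end_step = start_step + num_steps * frequency
    if step >= end_step:
        return final_sparsity
    # Masks change only every `frequency` steps, so snap to the last update.
    last_update = start_step + ((step - start_step) // frequency) * frequency
    progress = (last_update - start_step) / (num_steps * frequency)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


# 0% to 80% sparsity over 10 pruning steps, one update per step.
schedule = [agp_sparsity(t, 0.0, 0.8, 0, 10, 1) for t in range(11)]
```

The schedule rises quickly at first and flattens as it approaches the final sparsity, so most of the pruning happens while the network still has many redundant weights.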

Usage¶

The code below prunes all weights from 0% to 80% sparsity over 10 epochs.

Tensorflow code

from nni.compression.tensorflow import AGP_Pruner
config_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 0,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': ['default']
}]
pruner = AGP_Pruner(tf.get_default_graph(), config_list)
pruner.compress()

PyTorch code

from nni.compression.torch import AGP_Pruner
config_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 0,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': ['default']
}]
pruner = AGP_Pruner(model, config_list)
pruner.compress()

Add the code below to your training code to update the epoch number each time an epoch finishes.

Tensorflow code

pruner.update_epoch(epoch, sess)

PyTorch code

pruner.update_epoch(epoch)

See the example for more information.

User configuration for AGP Pruner¶

initial_sparsity: The sparsity at which the compressor starts compressing.
final_sparsity: The sparsity reached when the compressor finishes compressing.
start_epoch: The epoch at which the compressor starts compressing; defaults to 0.
end_epoch: The epoch at which the compressor finishes compressing.
frequency: The compressor prunes once every frequency epochs; defaults to 1.

Lottery Ticket Hypothesis¶

In ‘The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks’, authors Jonathan Frankle and Michael Carbin provide comprehensive measurement and analysis, and articulate the lottery ticket hypothesis: dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations.

In this paper, the authors use the following process, called iterative pruning, to prune a model:

1. Randomly initialize a neural network f(x;theta_0) (where theta_0 follows D_{theta}).
2. Train the network for j iterations, arriving at parameters theta_j.
3. Prune p% of the parameters in theta_j, creating a mask m.
4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
5. Repeat steps 2, 3, and 4.

If the configured final sparsity is P (e.g., 0.8) and there are n rounds of iterative pruning, each round prunes 1-(1-P)^(1/n) of the weights that survived the previous round.
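As a quick check of this formula, the sketch below computes the per-round rate for P = 0.8 and n = 5 and confirms that applying it five times leaves 20% of the weights. The function name is hypothetical; only the arithmetic comes from the text above.

```python
def per_round_prune_rate(final_sparsity, prune_iterations):
    """Fraction of surviving weights to prune in each round: 1-(1-P)^(1/n)."""
    return 1.0 - (1.0 - final_sparsity) ** (1.0 / prune_iterations)


rate = per_round_prune_rate(0.8, 5)

# Apply the same rate to the survivors of each of the 5 rounds.
remaining = 1.0
for _ in range(5):
    remaining *= 1.0 - rate
cumulative_sparsity = 1.0 - remaining
```

Each round removes roughly 27.5% of the weights still present, which compounds to the configured final sparsity of 0.8.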

Usage¶

PyTorch code

from nni.compression.torch import LotteryTicketPruner
config_list = [{
    'prune_iterations': 5,
    'sparsity': 0.8,
    'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, config_list, optimizer)
pruner.compress()
for _ in pruner.get_prune_iterations():
    pruner.prune_iteration_start()
    for epoch in range(epoch_num):
        ...

The above configuration runs 5 rounds of iterative pruning. Because all 5 rounds are executed in the same run, LotteryTicketPruner needs the model and optimizer (and the lr_scheduler, if one is used) so that it can reset their states each time a new prune iteration starts. Use get_prune_iterations to get the pruning iterations, and invoke prune_iteration_start at the beginning of each iteration. epoch_num should be large enough for the model to converge, because the hypothesis is that the performance (accuracy) obtained in later rounds with high sparsity can be comparable to that obtained in the first round. Simple reproduction results can be found here.

A TensorFlow version will be supported later.

User configuration for LotteryTicketPruner¶

prune_iterations: The number of rounds of iterative pruning.
sparsity: The final sparsity when the compression is done.

Slim Pruner¶

This is a one-shot pruner, proposed in ‘Learning Efficient Convolutional Networks through Network Slimming’ by Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan and Changshui Zhang.

../_images/slim_pruner.png

Slim Pruner prunes channels in the convolution layers by masking the corresponding scaling factors in the subsequent batch normalization (BN) layers. L1 regularization should be applied to the BN scaling factors during training. When pruning, the scaling factors of all BN layers are ranked globally, so the sparse model is found automatically for a given sparsity.
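The global ranking is what distinguishes this pruner from per-layer methods, so a small sketch may help. It pools hypothetical BN scaling factors from two layers, finds one global threshold, and masks the channels at or below it. The function name, layer names and values are made up for illustration, and this is not NNI's implementation.

```python
def slim_channel_masks(bn_scales, sparsity):
    """Mask channels whose BN scaling factor falls under a global threshold.

    bn_scales maps a (hypothetical) layer name to its list of scaling factors.
    """
    all_scales = sorted(abs(g) for scales in bn_scales.values() for g in scales)
    num_prune = int(len(all_scales) * sparsity)
    if num_prune == 0:
        return {name: [1] * len(scales) for name, scales in bn_scales.items()}
    # One threshold shared by every layer; ties at the threshold are also masked.
    threshold = all_scales[num_prune - 1]
    return {
        name: [0 if abs(g) <= threshold else 1 for g in scales]
        for name, scales in bn_scales.items()
    }


bn_scales = {
    "bn1": [0.9, 0.01, 0.5, 0.02],
    "bn2": [0.03, 0.8, 0.7, 0.6],
}
masks = slim_channel_masks(bn_scales, sparsity=0.5)
```

Half of all channels are masked overall, but `bn1` loses three channels and `bn2` only one: the per-layer sparsity falls out of the global ranking rather than being set by hand.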

Usage¶

PyTorch code

from nni.compression.torch import SlimPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list)
pruner.compress()

User configuration for Slim Pruner¶

sparsity: The target sparsity to which the specified operations are compressed.
op_types: Only BatchNorm2d is supported in Slim Pruner.

WeightRankFilterPruner¶

WeightRankFilterPruner is a series of pruners which prune the filters with the smallest importance criterion calculated from the weights in convolution layers to achieve a preset level of network sparsity.

FPGM Pruner¶

This is a one-shot pruner. FPGM Pruner is an implementation of the paper Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration.

FPGMPruner prunes the filters closest to the geometric median of the filters in a layer, since those filters are the most redundant.

../_images/fpgm_fig1.png

Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
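To make the contrast with norm-based criteria concrete, the sketch below ranks filters by the sum of their Euclidean distances to all other filters in the layer and selects the ones with the smallest sum, a common way to approximate "closest to the geometric median". The function name and the toy 2-D filters are hypothetical, and NNI's FPGMPruner may differ in detail.

```python
import math


def fpgm_select(filters, num_prune):
    """Indices of the filters with the smallest total distance to the others."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    totals = [sum(dist(f, g) for g in filters) for f in filters]
    order = sorted(range(len(filters)), key=lambda i: totals[i])
    return sorted(order[:num_prune])


# Flattened toy filters: three similar ones, one tiny one, one large one.
filters = [
    [5.0, 5.0],
    [5.1, 5.0],
    [4.9, 5.1],
    [0.1, 0.0],
    [10.0, 10.0],
]
redundant = fpgm_select(filters, num_prune=1)
# For comparison: the filter a smaller-norm-less-important rule would pick.
smallest_norm = min(range(len(filters)), key=lambda i: math.hypot(*filters[i]))
```

Here the geometric-median criterion selects filter 0, which sits in the middle of a cluster of near-duplicates, while a norm-based criterion would select filter 3.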

Usage¶

Tensorflow code

from nni.compression.tensorflow import FPGMPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2D']
}]
pruner = FPGMPruner(model, config_list)
pruner.compress()

PyTorch code

from nni.compression.torch import FPGMPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = FPGMPruner(model, config_list)
pruner.compress()

Note: FPGM Pruner is used to prune convolutional layers within deep neural networks, therefore the op_types field supports only convolutional layers.

Add the code below to update the epoch number at the beginning of each epoch.

Tensorflow code

pruner.update_epoch(epoch, sess)

PyTorch code

pruner.update_epoch(epoch)

See the example for more information.

User configuration for FPGM Pruner¶

sparsity: The fraction of convolutional filters to prune.

L1Filter Pruner¶

This is a one-shot pruner, proposed in ‘PRUNING FILTERS FOR EFFICIENT CONVNETS’ by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf. The reproduced experiment results can be found here.

../_images/l1filter_pruner.png

L1Filter Pruner prunes filters in the convolution layers.

The procedure of pruning m filters from the ith convolutional layer is as follows:

1. For each filter $F_{i,j}$, calculate the sum of its absolute kernel weights $s_j=\sum_{l=1}^{n_i}\sum|K_l|$.
2. Sort the filters by $s_j$.
3. Prune $m$ filters with the smallest sum values and their corresponding feature maps. The kernels in the next convolutional layer corresponding to the pruned feature maps are also removed.
4. A new kernel matrix is created for both the $i$th and $i+1$th layers, and the remaining kernel weights are copied to the new model.
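The procedure above can be sketched on nested Python lists, where a layer is a list of filters, a filter is a list of kernels (one per input channel), and a kernel is a flat list of weights. All names and values are hypothetical. Note that this sketch physically removes filters as the paper describes, whereas the usage shown on this page goes through `compress()`.

```python
def l1_filter_scores(layer):
    """s_j for every filter: the sum of absolute weights over all its kernels."""
    return [sum(abs(w) for kernel in filt for w in kernel) for filt in layer]


def filters_to_prune(layer, m):
    """Indices of the m filters with the smallest s_j."""
    scores = l1_filter_scores(layer)
    order = sorted(range(len(layer)), key=lambda j: scores[j])
    return sorted(order[:m])


def remove_filters(layer, next_layer, pruned):
    """Drop pruned filters and the matching input kernels of the next layer."""
    keep = [j for j in range(len(layer)) if j not in pruned]
    new_layer = [layer[j] for j in keep]
    new_next = [[filt[j] for j in keep] for filt in next_layer]
    return new_layer, new_next


layer = [
    [[0.5, -0.5], [0.25, 0.25]],    # s_0 = 1.5
    [[0.0, 0.125], [-0.125, 0.0]],  # s_1 = 0.25
    [[1.0, -1.0], [1.0, 1.0]],      # s_2 = 4.0
]
# Each next-layer filter has one kernel per output channel of `layer`.
next_layer = [[[0.1], [0.2], [0.3]], [[0.4], [0.5], [0.6]]]

scores = l1_filter_scores(layer)
pruned = filters_to_prune(layer, m=1)
new_layer, new_next = remove_filters(layer, next_layer, pruned)
```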

Usage¶

PyTorch code

from nni.compression.torch import L1FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1FilterPruner(model, config_list)
pruner.compress()

User configuration for L1Filter Pruner¶

sparsity: The target sparsity to which the specified operations are compressed.
op_types: Only Conv1d and Conv2d are supported in L1Filter Pruner.

L2Filter Pruner¶

This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner.
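The L1 and L2 criteria do not always agree on which filter is least important. The toy example below, with made-up flattened filters and hypothetical helper names, shows a case where they pick different filters.

```python
import math


def l1_norm(filt):
    return sum(abs(w) for w in filt)


def l2_norm(filt):
    return math.sqrt(sum(w * w for w in filt))


filters = [
    [3.0, 0.0, 0.0, 0.0],  # L1 = 3, L2 = 3: one large weight
    [1.0, 1.0, 1.0, 1.0],  # L1 = 4, L2 = 2: many small weights
]
# Index of the filter each criterion would prune first.
l1_choice = min(range(len(filters)), key=lambda j: l1_norm(filters[j]))
l2_choice = min(range(len(filters)), key=lambda j: l2_norm(filters[j]))
```

The L2 norm weights a single large value more heavily than several small ones, so it keeps filter 0 and prunes filter 1, while the L1 norm does the opposite.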

Usage¶

PyTorch code

from nni.compression.torch import L2FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2FilterPruner(model, config_list)
pruner.compress()

User configuration for L2Filter Pruner¶

sparsity: The target sparsity to which the specified operations are compressed.
op_types: Only Conv1d and Conv2d are supported in L2Filter Pruner.

ActivationRankFilterPruner¶

ActivationRankFilterPruner is a series of pruners which prune the filters with the smallest importance criterion calculated from the output activations of convolution layers to achieve a preset level of network sparsity.

ActivationAPoZRankFilterPruner¶

We implemented it as a one-shot pruner. It prunes convolutional layers based on the APoZ criterion, which is explained in the paper Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures. Iterative pruning based on APoZ will be supported in a future release.

The APoZ is defined as:

../_images/apoz.png
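In the cited paper, APoZ (Average Percentage of Zeros) is the fraction of a channel's post-ReLU activations that are exactly zero, averaged over input samples and spatial positions. The sketch below computes it for made-up activations; the names are hypothetical and the way NNI turns APoZ into a filter ranking is not shown here.

```python
def apoz(activations):
    """Fraction of zero values in one channel's post-ReLU activations.

    `activations` is flattened over samples and spatial positions.
    """
    return sum(1 for a in activations if a == 0) / len(activations)


# Hypothetical post-ReLU outputs for three channels.
channel_outputs = {
    0: [0.0, 0.0, 0.0, 1.2],
    1: [0.3, 0.0, 2.1, 0.7],
    2: [0.0, 0.0, 0.0, 0.0],
}
scores = {c: apoz(a) for c, a in channel_outputs.items()}
# A channel that is almost always zero contributes little, so it goes first.
prune_first = max(scores, key=scores.get)
```

A higher APoZ means the channel is inactive more often, so channel 2 (always zero) is the first candidate for pruning.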

Usage¶

PyTorch code

from nni.compression.torch import ActivationAPoZRankFilterPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()

Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the op_types field supports only convolutional layers.

See the example for more information.

User configuration for ActivationAPoZRankFilterPruner¶

sparsity: The fraction of convolutional filters to prune.
op_types: Only Conv2d is supported in ActivationAPoZRankFilterPruner.

ActivationMeanRankFilterPruner¶

We implemented it as a one-shot pruner. It prunes convolutional layers based on the mean activation criterion, which is explained in section 2.2 of the paper Pruning Convolutional Neural Networks for Resource Efficient Inference. Other pruning criteria mentioned in this paper will be supported in a future release.
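A minimal sketch of the mean activation criterion, assuming each channel's outputs have been flattened over samples and positions: channels are ranked by their average activation and the lowest-ranked fraction is selected. The names and numbers are hypothetical and this is not NNI's code.

```python
def mean_activation(activations):
    """Average output value of one channel."""
    return sum(activations) / len(activations)


# Hypothetical flattened outputs for four channels.
channel_outputs = {
    0: [0.0, 0.5, 0.5, 1.0],
    1: [2.0, 4.0, 0.0, 2.0],
    2: [0.0, 0.0, 0.25, 0.25],
    3: [1.0, 1.0, 1.0, 1.0],
}
scores = {c: mean_activation(a) for c, a in channel_outputs.items()}

sparsity = 0.5
num_prune = int(len(scores) * sparsity)
# Channels with the smallest mean activation are selected for pruning.
pruned = sorted(sorted(scores, key=scores.get)[:num_prune])
```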

Usage¶

PyTorch code

from nni.compression.torch import ActivationMeanRankFilterPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = ActivationMeanRankFilterPruner(model, config_list)
pruner.compress()

Note: ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the op_types field supports only convolutional layers.

See the example for more information.

User configuration for ActivationMeanRankFilterPruner¶

sparsity: The fraction of convolutional filters to prune.
op_types: Only Conv2d is supported in ActivationMeanRankFilterPruner.

GradientRankFilterPruner¶

GradientRankFilterPruner is a series of pruners which prune the filters with the smallest importance criterion calculated from the gradients of convolution layers to achieve a preset level of network sparsity.

TaylorFOWeightFilterPruner¶

We implemented it as a one-shot pruner. It prunes convolutional layers based on the first-order Taylor expansion on weights. The estimated importance of filters is defined as in the paper Importance Estimation for Neural Network Pruning. Other pruning criteria mentioned in this paper will be supported in a future release.

../_images/importance_estimation_sum.png
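In the cited paper, the first-order estimate of a filter's importance is the sum over its weights of (gradient × weight)². The sketch below evaluates that sum for made-up weights and gradients. The function name is hypothetical, and how NNI accumulates gradients over batches is not covered here.

```python
def taylor_fo_importance(weights, grads):
    """Sum of (gradient * weight)^2 over the weights of one filter."""
    return sum((g * w) ** 2 for w, g in zip(weights, grads))


# Hypothetical flattened weights and loss gradients for three filters.
filters = [
    {"weights": [1.0, -2.0], "grads": [0.5, 0.5]},
    {"weights": [4.0, 4.0], "grads": [0.0, 0.125]},
    {"weights": [0.5, 0.5], "grads": [2.0, -2.0]},
]
scores = [taylor_fo_importance(f["weights"], f["grads"]) for f in filters]
least_important = min(range(len(scores)), key=lambda j: scores[j])
```

Filter 1 has the largest weights but almost no gradient, so the criterion ranks it as least important, which a pure weight-magnitude ranking would not do.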

Usage¶

PyTorch code

from nni.compression.torch import TaylorFOWeightFilterPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = TaylorFOWeightFilterPruner(model, config_list, optimizer)
pruner.compress()

See the example for more information.

User configuration for TaylorFOWeightFilterPruner¶

sparsity: The fraction of convolutional filters to prune.
op_types: Currently, only Conv2d is supported in TaylorFOWeightFilterPruner.