Compression Config Specification

Each sub-config in the config list is a dict, and the scope of each setting (key) is limited to that sub-config. If multiple sub-configs target the same layer, the later ones overwrite the earlier ones.
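
For illustration, a minimal sketch (the module name fc2 and the sparsity values are hypothetical; the keys are described below) of how a later sub-config overrides an earlier one for the same layer:

config_list = [{
    'op_types': ['Linear'],
    'sparsity_per_layer': 0.5
}, {
    'op_names': ['fc2'],
    'sparsity_per_layer': 0.8   # fc2 is also a Linear layer, so this later setting wins
}]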

Common Keys in Config


op_types

The type of the layers targeted by this sub-config. If op_names is not set in this sub-config, all layers in the model that match this type will be selected. If op_names is also set, the selected layers must satisfy both type and name.


op_names

The names of the layers targeted by this sub-config. If op_types is also set in this sub-config, the selected layers must satisfy both type and name.


exclude

The exclude and sparsity keywords are mutually exclusive and cannot appear in the same sub-config. If exclude is set in a sub-config, the layers selected by this sub-config will not be compressed.

Special Keys for Pruning


op_partial_names

This key will be shared with the quantization config in the future.

This key is for pruning layers whose names contain a common sub-string. NNI will collect all module names in the model, find the names that contain one of the op_partial_names, and append them to op_names.
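
For example, a sketch (the sub-string and the sparsity value are hypothetical) that prunes every layer whose name contains 'conv':

config_list = [{
    'op_partial_names': ['conv'],   # expanded to e.g. conv1, conv2, ... in op_names
    'sparsity_per_layer': 0.5
}]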


sparsity_per_layer

The sparsity ratio of each selected layer.

e.g., a sparsity_per_layer of 0.8 means each selected layer will have 80% of its weight values masked. If layer_1 (500 parameters) and layer_2 (1000 parameters) are selected by this sub-config, then 400 parameters will be masked in layer_1 and 800 parameters in layer_2.


total_sparsity

The overall sparsity ratio across all selected layers, which means the sparsity ratio may no longer be even between layers.

e.g., a total_sparsity of 0.8 means 80% of the parameters covered by this sub-config will be masked. If layer_1 (500 parameters) and layer_2 (1000 parameters) are selected by this sub-config, then 1200 parameters will be masked in layer_1 and layer_2 in total; how these parameters are distributed between the two layers is determined by the pruning algorithm.
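
A sketch contrasting the two keys on the same hypothetical layers:

# per-layer: layer_1 and layer_2 are each masked at exactly 80%
config_list = [{'op_names': ['layer_1', 'layer_2'], 'sparsity_per_layer': 0.8}]

# total: 80% of the combined parameters are masked, distribution decided by the algorithm
config_list = [{'op_names': ['layer_1', 'layer_2'], 'total_sparsity': 0.8}]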


sparsity

sparsity is a legacy config key from pruning v1 with the same meaning as sparsity_per_layer. It can still be used for now, but it will be deprecated in the future.


max_sparsity_per_layer

This key is usually used together with total_sparsity. It limits the maximum sparsity ratio of each layer.

In the total_sparsity example above, 1200 parameters need to be masked, and it is possible that all parameters in layer_1 would be masked. To avoid this, max_sparsity_per_layer can be set to 0.9, which means at most 450 parameters can be masked in layer_1 and at most 900 parameters in layer_2.
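
A sketch of that combination (the layer names are hypothetical):

config_list = [{
    'op_names': ['layer_1', 'layer_2'],
    'total_sparsity': 0.8,
    'max_sparsity_per_layer': 0.9   # no single layer may exceed 90% sparsity
}]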

Special Keys for Quantization


quant_types

Currently, NNI supports three quantization types: 'weight', 'input', and 'output'. The value can be a str or a List[str]. Note that 'weight' and 'input' are always quantized together, e.g., ['input', 'weight'].


quant_bits

The bit width of quantization. When the value is a dict, each key is a quantization type set in quant_types and each value is the bit width, e.g., {'weight': 8}; when the value is an int, all quantization types share the same bit width.
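
Both forms as a sketch (the op type is illustrative):

# per-type bit widths
config_list = [{'op_types': ['Conv2d'], 'quant_types': ['input', 'weight'],
                'quant_bits': {'input': 8, 'weight': 8}}]

# a single int: all quantization types share the same bit width
config_list = [{'op_types': ['Conv2d'], 'quant_types': ['input', 'weight'],
                'quant_bits': 8}]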


quant_start_step

A key specific to the QAT quantizer. It disables quantization until the model has run for a certain number of steps, which allows the network to enter a more stable state where the output quantization ranges do not exclude a significant fraction of values. The default value is 0.
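
A sub-config sketch using this key (the step count is illustrative):

config_list = [{
    'op_types': ['Conv2d'],
    'quant_types': ['input', 'weight'],
    'quant_bits': {'input': 8, 'weight': 8},
    'quant_start_step': 1000   # keep full precision for the first 1000 training steps
}]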


Suppose we want to compress the following model:

import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        ...  # forward pass omitted

First, we need to determine where to compress. Use the following config list to select all Conv2d modules and the module named fc1:

config_list = [{'op_types': ['Conv2d']}, {'op_names': ['fc1']}]

Sometimes we may need to compress all modules of a certain type except for a few special ones. Writing out all the module names would be laborious in this case, so we can use exclude to quickly specify the compression target modules:

config_list = [{
    'op_types': ['Conv2d', 'Linear']
}, {
    'exclude': True,
    'op_names': ['fc2']
}]

For the model we want to compress, the above two config lists are equivalent: both select conv1, conv2, and fc1 as compression targets.

Let’s take a simple pruning config list example, pruning all Conv2d modules with 50% sparsity, and pruning fc1 with 80% sparsity:

config_list = [{
    'op_types': ['Conv2d'],
    'total_sparsity': 0.5
}, {
    'op_names': ['fc1'],
    'total_sparsity': 0.8
}]
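
As a sketch of how such a config list is consumed, assuming the one-shot L1NormPruner from nni.compression.pytorch.pruning (other pruners take the config list the same way):

from nni.compression.pytorch.pruning import L1NormPruner

# wrap the selected modules and compute the masks
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()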

Then if you want to try model quantization, here is a simple config list example:

config_list = [{
    'op_types': ['Conv2d'],
    'quant_types': ['input', 'weight'],
    'quant_bits': {'input': 8, 'weight': 8}
}, {
    'op_names': ['fc1'],
    'quant_types': ['input', 'weight'],
    'quant_bits': {'input': 8, 'weight': 8}
}]
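
A sketch of applying it, assuming the QAT_Quantizer from nni.compression.pytorch.quantization (it also needs the training optimizer):

from nni.compression.pytorch.quantization import QAT_Quantizer

# wrap the selected modules so fake quantization runs during training
quantizer = QAT_Quantizer(model, config_list, optimizer)
quantizer.compress()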