Framework Related
Pruner
- class nni.algorithms.compression.v2.pytorch.base.Pruner(model, config_list)
The abstract base class for pruning algorithms. Inherit from this class and implement _reset_tools to customize a pruner.
- export_model(model_path, mask_path=None)
Export the pruned model weights and, optionally, the masks.
- Parameters
model_path (str) – Path to save the pruned model state_dict. The weights and biases have already been multiplied by the masks.
mask_path (Optional[str]) – Path to save the mask dict.
- get_modules_wrapper()
- Returns
An ordered dict whose keys are module names and whose values are the corresponding module wrappers.
- Return type
OrderedDict[str, PrunerModuleWrapper]
- get_origin2wrapped_parameter_name_map()
Get the name mapping of parameters from the original model to the wrapped model.
- Returns
A dict {original_model_parameter_name: wrapped_model_parameter_name}.
- Return type
Dict[str, str]
PrunerModuleWrapper
- class nni.algorithms.compression.v2.pytorch.base.PrunerModuleWrapper(module, module_name, config)
Wraps a module to enable data parallelism, forward method customization and buffer registration.
- Parameters
module (Module) – The module the user wants to compress.
module_name (str) – The name of the module to compress; the wrapper module shares the same name.
config (Dict) – The configuration the user specifies for compression.
BasicPruner
- class nni.algorithms.compression.v2.pytorch.pruning.basic_pruner.BasicPruner(model, config_list)
- compress()
Generate the masks. The pruning process is divided into three stages: self.data_collector collects the data used to calculate the specified metric, self.metrics_calculator calculates the metric, and self.sparsity_allocator generates the masks based on the metric.
- Returns
The wrapped model and the masks.
- Return type
Tuple[Module, Dict]
- reset_tools()
Reset self.data_collector, self.metrics_calculator and self.sparsity_allocator. Subclasses need to implement this function to complete the pruning process. See compress() to understand how NNI uses these three parts to generate masks for the bound model.
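To make the division of labour between reset_tools() and compress() concrete, here is a framework-free sketch. It does not use NNI: SketchBasicPruner, SketchMagnitudePruner and the lambda-based tools are hypothetical stand-ins, and plain Python lists replace torch tensors. It only mirrors the control flow described above: a subclass picks the three tools, and compress() chains them as data, then metrics, then masks.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]  # module name -> flattened weight (stand-in for a Tensor)
Masks = Dict[str, Dict[str, List[float]]]


class SketchBasicPruner:
    """Hypothetical, framework-free imitation of the BasicPruner control flow."""

    def __init__(self, weights: Weights, sparsity: float):
        self.weights = weights
        self.sparsity = sparsity
        self.reset_tools()

    def reset_tools(self) -> None:
        # Subclasses must set the three tools that compress() relies on.
        raise NotImplementedError

    def compress(self) -> Tuple[Weights, Masks]:
        data = self.data_collector()               # stage 1: collect data
        metrics = self.metrics_calculator(data)    # stage 2: compute metrics
        masks = self.sparsity_allocator(metrics)   # stage 3: turn metrics into masks
        return self.weights, masks


class SketchMagnitudePruner(SketchBasicPruner):
    def reset_tools(self) -> None:
        self.data_collector = lambda: self.weights
        # The metric of each element is its absolute value.
        self.metrics_calculator = lambda data: {
            name: [abs(w) for w in ws] for name, ws in data.items()}
        self.sparsity_allocator = self._allocate

    def _allocate(self, metrics: Dict[str, List[float]]) -> Masks:
        masks = {}
        for name, scores in metrics.items():
            num_pruned = int(len(scores) * self.sparsity)
            # The smallest-magnitude elements are masked out.
            order = sorted(range(len(scores)), key=scores.__getitem__)
            pruned = set(order[:num_pruned])
            masks[name] = {"weight": [0.0 if i in pruned else 1.0 for i in range(len(scores))]}
        return masks


pruner = SketchMagnitudePruner({"fc1": [0.3, -1.2, 0.05, 0.7]}, sparsity=0.5)
_, masks = pruner.compress()
print(masks)  # {'fc1': {'weight': [0.0, 1.0, 0.0, 1.0]}}
```

Keeping the tools as swappable attributes means a new pruning criterion only needs a new reset_tools(), while compress() stays unchanged.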
DataCollector
- class nni.algorithms.compression.v2.pytorch.pruning.tools.DataCollector(compressor)
An abstract class for collecting the data needed by the compressor.
- Parameters
compressor (Pruner) – The compressor bound to this DataCollector.
MetricsCalculator
- class nni.algorithms.compression.v2.pytorch.pruning.tools.MetricsCalculator(scalers=None)
An abstract class for calculating a kind of metric from the given data.
- Parameters
scalers (Dict[str, Dict[str, Scaling]] | Scaling | None) – The scaler is used to scale the metrics’ size: it scales each metric to the same size as the shrunk mask in the sparsity allocator. To use different scalers for different pruning targets in different modules, pass a dict {module_name: {target_name: scaler}}. For a module name that is not specified, scalers[‘_default’][target_name] is used. For a target name that is not specified, scalers[module_name][‘_default’] is used. Passing in a single scaler instead of a dict of scalers is treated as passing in {‘_default’: {‘_default’: scalers}}. Passing in None means no scaling is needed.
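The fallback rules for scalers are easiest to check with a small lookup function. The sketch below is a hypothetical reading of the documented rules, not NNI's own resolution code (which may differ in edge cases); strings stand in for Scaling objects, and normalize_scalers and resolve_scaler are illustrative names.

```python
from typing import Any, Dict, Optional

ScalerTable = Dict[str, Dict[str, Any]]


def normalize_scalers(scalers: Any) -> Optional[ScalerTable]:
    """Apply the documented shorthand rules for the `scalers` argument."""
    if scalers is None:
        return None  # None means no scaling is needed
    if not isinstance(scalers, dict):
        # A single scaler is treated as {'_default': {'_default': scaler}}.
        return {"_default": {"_default": scalers}}
    return scalers


def resolve_scaler(scalers: Any, module_name: str, target_name: str) -> Any:
    """Pick the scaler for (module_name, target_name), falling back to '_default' keys."""
    table = normalize_scalers(scalers)
    if table is None:
        return None
    # Unspecified module name -> look in scalers['_default'].
    module_scalers = table.get(module_name, table.get("_default", {}))
    # Unspecified target name -> use that module's '_default' entry.
    return module_scalers.get(target_name, module_scalers.get("_default"))


scalers = {
    "_default": {"_default": "S_default", "bias": "S_bias"},
    "conv1": {"weight": "S_conv1_weight", "_default": "S_conv1"},
}
print(resolve_scaler(scalers, "conv1", "weight"))  # S_conv1_weight
print(resolve_scaler(scalers, "conv1", "bias"))    # S_conv1
print(resolve_scaler(scalers, "conv2", "bias"))    # S_bias
print(resolve_scaler(scalers, "conv2", "weight"))  # S_default
print(resolve_scaler("S_only", "fc", "weight"))    # S_only
```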
- calculate_metrics(data)
- Parameters
data (Dict) – A dict holding the data used to calculate metrics. It usually has the format {module_name: tensor_type_data}.
- Returns
A dict whose keys are layer names and whose values are the metrics. Note that each metric has the same size as the data along dim.
- Return type
Dict[str, Tensor]
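The note that a metric "has the same size as the data along dim" means every other axis is reduced away. The sketch below illustrates this with a hypothetical L1-norm calculator over 2-D data (nested lists stand in for tensors, and the `dim` argument is an assumption made for illustration, not part of the documented signature).

```python
from typing import Dict, List

Matrix = List[List[float]]


def l1_metric(data: Matrix, dim: int) -> List[float]:
    """Reduce |x| over every axis except `dim`, so len(result) == size of data along dim."""
    if dim == 0:
        return [sum(abs(x) for x in row) for row in data]
    if dim == 1:
        return [sum(abs(row[j]) for row in data) for j in range(len(data[0]))]
    raise ValueError("this sketch only handles 2-D data")


def calculate_metrics(data: Dict[str, Matrix], dim: int = 0) -> Dict[str, List[float]]:
    """Hypothetical calculate_metrics: one L1-norm metric per module."""
    return {name: l1_metric(tensor, dim) for name, tensor in data.items()}


data = {"fc1": [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]}  # shape 2 x 3
print(calculate_metrics(data, dim=0))  # {'fc1': [3.5, 4.0]}
print(calculate_metrics(data, dim=1))  # {'fc1': [1.0, 5.0, 1.5]}
```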
SparsityAllocator
- class nni.algorithms.compression.v2.pytorch.pruning.tools.SparsityAllocator(pruner, scalers=None, continuous_mask=True)
A base class for allocating masks based on metrics.
- Parameters
pruner (Pruner) – The pruner bound to this SparsityAllocator.
scalers (Dict[str, Dict[str, Scaling]] | Scaling | None) – The scaler is used to scale the masks’ size. It shrinks a mask of the same size as the pruning target to the same size as the metric, or expands a mask of the same size as the metric to the same size as the pruning target. To use different scalers for different pruning targets in different modules, pass a dict {module_name: {target_name: scaler}}. If the allocator meets an unspecified module name, it tries to use scalers[‘_default’][target_name] to scale its mask. If the allocator meets an unspecified target name, it tries to use scalers[module_name][‘_default’] to scale its mask. Passing in a single scaler instead of a dict of scalers is treated as passing in {‘_default’: {‘_default’: scalers}}. Passing in None means no scaling is needed.
continuous_mask (bool) – If set to True, the parts that have already been masked are masked first. If set to False, a part that has been masked may be unmasked if its corresponding metric increases.
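Shrinking and expanding can be pictured as block-wise operations on a 1-D mask. This is a hypothetical sketch, not NNI's Scaling class: the block size `kernel` and the "keep a block if any element in it is kept" reduction rule are assumptions chosen for illustration.

```python
from typing import List


def shrink(mask: List[float], kernel: int) -> List[float]:
    """Target-sized mask -> metric-sized mask: one value per block of `kernel` elements."""
    assert len(mask) % kernel == 0, "mask length must be a multiple of the block size"
    # Assumed reduction rule: a block is kept (1.0) if any element in it is kept.
    return [max(mask[i:i + kernel]) for i in range(0, len(mask), kernel)]


def expand(mask: List[float], kernel: int) -> List[float]:
    """Metric-sized mask -> target-sized mask: repeat each value `kernel` times."""
    return [v for v in mask for _ in range(kernel)]


target_mask = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
small = shrink(target_mask, kernel=2)
print(small)                     # [1.0, 0.0, 1.0]
print(expand(small, kernel=2))   # [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
```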
- common_target_masks_generation(metrics)
Generate masks for metrics-dependent targets.
- Parameters
metrics (Dict[str, Tensor]) – The format is {module_name: weight_metric}. The weight metric usually has the same size as the shrunk mask.
- Returns
The format is {module_name: {target_name: mask}}. Each returned mask has the same size as its target.
- Return type
Dict[str, Dict[str, Tensor]]
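The sketch below shows one way a metric can be turned into masks, and what continuous_mask changes. It is a hypothetical illustration with plain lists rather than NNI's implementation. It assumes a simple "prune the lowest-scoring fraction" rule and realizes "masked parts are masked first" by setting previously masked positions to negative infinity.

```python
import math
from typing import Dict, List, Optional


def common_target_masks(metrics: Dict[str, List[float]], sparsity: float,
                        old_masks: Optional[Dict[str, List[float]]] = None,
                        continuous_mask: bool = True) -> Dict[str, Dict[str, List[float]]]:
    masks = {}
    for name, metric in metrics.items():
        metric = list(metric)
        if continuous_mask and old_masks and name in old_masks:
            # Already-masked positions get -inf, so they are pruned first and stay masked.
            metric = [m if keep else -math.inf for m, keep in zip(metric, old_masks[name])]
        num_pruned = int(len(metric) * sparsity)
        order = sorted(range(len(metric)), key=metric.__getitem__)
        pruned = set(order[:num_pruned])
        masks[name] = {"weight": [0.0 if i in pruned else 1.0 for i in range(len(metric))]}
    return masks


metrics = {"conv1": [0.9, 0.1, 0.5, 0.7]}
old_masks = {"conv1": [1.0, 1.0, 1.0, 0.0]}  # position 3 was masked in an earlier step
print(common_target_masks(metrics, 0.5, old_masks, continuous_mask=True))
# {'conv1': {'weight': [1.0, 0.0, 1.0, 0.0]}}
print(common_target_masks(metrics, 0.5, old_masks, continuous_mask=False))
# {'conv1': {'weight': [1.0, 0.0, 0.0, 1.0]}}  (position 3 is unmasked again)
```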
- generate_sparsity(metrics)
The main function of SparsityAllocator: generate a set of masks based on the given metrics.
- Parameters
metrics (Dict) – A metric dict with format {module_name: weight_metric}
- Returns
The masks format is {module_name: {target_name: mask}}.
- Return type
Dict[str, Dict[str, Tensor]]
- special_target_masks_generation(masks)
Some pruning targets’ mask generation depends on other targets, e.g., the bias mask depends on the weight mask. This function generates these masks and is called at the end of generate_sparsity.
- Parameters
masks (Dict[str, Dict[str, Tensor]]) – The format is {module_name: {target_name: mask}}. It is usually the return value of common_target_masks_generation.
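As an example of a mask that depends on another target, here is a hypothetical way to derive a bias mask from a 2-D weight mask. It is not NNI's code: the rule "a bias entry survives only if its output channel keeps at least one weight" is an assumption for illustration, and nested lists stand in for tensors.

```python
from typing import Any, Dict


def special_target_masks(masks: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    for targets in masks.values():
        weight_mask = targets.get("weight")
        if weight_mask is None or "bias" in targets:
            continue  # nothing to derive, or the bias mask already exists
        # Assumed rule: keep a bias entry if its output channel (row) keeps any weight.
        targets["bias"] = [1.0 if any(row) else 0.0 for row in weight_mask]
    return masks


masks = {"fc1": {"weight": [[1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]}}
print(special_target_masks(masks)["fc1"]["bias"])  # [1.0, 0.0, 1.0]
```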
BasePruningScheduler
- class nni.algorithms.compression.v2.pytorch.base.BasePruningScheduler
- get_best_result()
- Returns
The task result with the best performance, including the task id, the compact model, the masks on the compact model, the score and the config list used in this task.
- Return type
Tuple[int, Module, Dict[str, Dict[str, Tensor]], float, List[Dict]]
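The return type is a 5-tuple, so a caller typically unpacks it positionally. The sketch below is hypothetical. It assumes that a higher score means better performance, uses strings in place of real models, and best_result is an illustrative helper, not part of NNI.

```python
from typing import Any, Dict, List, Optional, Tuple

# (task_id, compact_model, masks, score, config_list), matching the documented return type
TaskResult = Tuple[int, Any, Dict[str, Dict[str, Any]], float, List[Dict]]


def best_result(results: List[TaskResult]) -> Optional[TaskResult]:
    """Return the result with the highest score, or None if no task has finished."""
    return max(results, key=lambda r: r[3], default=None)


results = [
    (0, "model_0", {}, 0.71, [{"sparsity": 0.3, "op_types": ["Conv2d"]}]),
    (1, "model_1", {}, 0.74, [{"sparsity": 0.5, "op_types": ["Conv2d"]}]),
]
task_id, model, masks, score, config_list = best_result(results)
print(task_id, score)  # 1 0.74
```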
TaskGenerator
- class nni.algorithms.compression.v2.pytorch.pruning.tools.TaskGenerator(origin_model, origin_masks={}, origin_config_list=[], log_dir='.', keep_intermediate_result=False)
This class is used to generate the config list for the pruner in each iteration.
- Parameters
origin_model (Optional[Module]) – The original, unwrapped PyTorch model to be pruned.
origin_masks (Optional[Dict[str, Dict[str, Tensor]]]) – The pre-existing masks on the original model. These masks may be user-defined or generated by previous pruning.
origin_config_list (Optional[List[Dict]]) – The original config list provided by the user. Note that this config_list directly configures the original model, so the sparsity introduced by origin_masks should also be recorded in origin_config_list.
log_dir (Union[str, Path]) – The log directory used for saving the task generator log.
keep_intermediate_result (bool) – Whether to keep the intermediate results, including the intermediate models and masks from each iteration.
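One simple policy a task generator could follow is to ramp each configured sparsity linearly towards its target over a fixed number of iterations. The sketch below is hypothetical: linear_config_lists and total_iteration are illustrative names, and this schedule is only one possible policy, not the behaviour of the TaskGenerator base class.

```python
from typing import Dict, Iterator, List


def linear_config_lists(origin_config_list: List[Dict], total_iteration: int) -> Iterator[List[Dict]]:
    """Yield one config list per iteration, ramping each sparsity linearly up to its target."""
    for i in range(1, total_iteration + 1):
        # dict(cfg, sparsity=...) copies the config, so origin_config_list is left untouched.
        yield [dict(cfg, sparsity=round(cfg["sparsity"] * i / total_iteration, 4))
               for cfg in origin_config_list]


origin = [{"sparsity": 0.8, "op_types": ["Conv2d"]}]
for config_list in linear_config_lists(origin, total_iteration=4):
    print(config_list)  # sparsity goes 0.2, 0.4, 0.6, 0.8
```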
Quantizer
- class nni.compression.pytorch.compressor.Quantizer(model, config_list, optimizer=None, dummy_input=None)
Base class for PyTorch quantizers.
- export_model(model_path, calibration_path=None, onnx_path=None, input_shape=None, device=None)
Export the quantized model weights and calibration parameters, and optionally an ONNX model.
- Parameters
model_path (str) – path to save the quantized model weights
calibration_path (str) – (optional) path to save the quantization parameters after calibration
onnx_path (str) – (optional) path to save the ONNX model
input_shape (list or tuple) – input shape used to export the ONNX model
device (torch.device) – device of the model, used to place the dummy input tensor for exporting the ONNX file; the tensor is placed on CPU if `device` is None
- Return type
Dict
- export_model_save(model, model_path, calibration_config=None, calibration_path=None, onnx_path=None, input_shape=None, device=None)
This method helps save the PyTorch model, the calibration config and the ONNX model in the quantizer.
- Parameters
model (pytorch model) – PyTorch model to be saved
model_path (str) – path to save the PyTorch model
calibration_config (dict) – (optional) config of calibration parameters
calibration_path (str) – (optional) path to save the quantization parameters after calibration
onnx_path (str) – (optional) path to save the ONNX model
input_shape (list or tuple) – input shape used to export the ONNX model
device (torch.device) – device of the model, used to place the dummy input tensor for exporting the ONNX file; the tensor is placed on CPU if `device` is None
- find_conv_bn_patterns(model, dummy_input)
Find all Conv-BN patterns, used for batch normalization folding.
- Parameters
model (torch.nn.Module) – model to be analyzed.
dummy_input (tuple of torch.Tensor) – inputs to the model, used for generating the TorchScript graph
- fold_bn(*inputs, wrapper)
Simulate batch normalization folding in the training graph. The folded weight and bias are returned for the subsequent operations.
- Parameters
inputs (tuple of torch.Tensor) – inputs for the module
wrapper (QuantizerModuleWrapper) – the wrapper for the original module
- Return type
Tuple of torch.Tensor
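The arithmetic behind batch normalization folding is the standard identity: per output channel c, W'_c = gamma_c * W_c / sqrt(var_c + eps) and b'_c = beta_c + gamma_c * (b_c - mean_c) / sqrt(var_c + eps). The sketch below checks this identity on a tiny linear layer using plain lists. It illustrates the math only, not NNI's fold_bn (which works on the wrapper and on tensors), and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def fold_bn_params(weight: Matrix, bias: List[float], gamma: List[float], beta: List[float],
                   running_mean: List[float], running_var: List[float],
                   eps: float = 1e-5) -> Tuple[Matrix, List[float]]:
    """Fold per-channel BN statistics into the preceding layer's weight and bias."""
    folded_w, folded_b = [], []
    for c, row in enumerate(weight):
        factor = gamma[c] / math.sqrt(running_var[c] + eps)  # gamma / sqrt(var + eps)
        folded_w.append([w * factor for w in row])
        folded_b.append(beta[c] + (bias[c] - running_mean[c]) * factor)
    return folded_w, folded_b


def linear(weight: Matrix, bias: List[float], x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    # Inference-mode BN using running statistics.
    return [g * (yv - m) / math.sqrt(s + eps) + bt
            for yv, g, bt, m, s in zip(y, gamma, beta, mean, var)]


W, b = [[1.0, 2.0], [-1.0, 0.5]], [0.5, -0.5]
gamma, beta, mean, var = [2.0, 0.5], [0.1, -0.2], [1.0, 0.0], [4.0, 1.0]
x = [1.0, -1.0]

reference = batch_norm(linear(W, b, x), gamma, beta, mean, var)
fw, fb = fold_bn_params(W, b, gamma, beta, mean, var)
folded = linear(fw, fb, x)
print(all(math.isclose(r, f, abs_tol=1e-9) for r, f in zip(reference, folded)))  # True
```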
- load_calibration_config(calibration_config)
This function helps a quantizer set its quantization parameters by loading a calibration_config exported by another quantizer or by itself. Its main use is to help a quantization-aware training quantizer set appropriate initial parameters, so that training is more flexible and converges faster. It also lets a quantizer resume quantizing a model by loading parameters from the config.
- Parameters
calibration_config (dict) – dict that stores the quantization parameters. A quantizer can export its own calibration config, e.g., calibration_config = quantizer.export_model(model_path, calibration_path)
- quantize_input(inputs, wrapper, **kwargs)
Quantizers should override this method to quantize the input. This method is effectively hooked to forward() of the model.
- Parameters
inputs (Tensor) – inputs that need to be quantized
wrapper (QuantizerModuleWrapper) – the wrapper for the original module
- quantize_output(output, wrapper, **kwargs)
Quantizers should override this method to quantize the output. This method is effectively hooked to forward() of the model.
- Parameters
output (Tensor) – output that needs to be quantized
wrapper (QuantizerModuleWrapper) – the wrapper for the original module
- quantize_weight(wrapper, **kwargs)
Quantizers should override this method to quantize the weight. This method is effectively hooked to forward() of the model.
- Parameters
wrapper (QuantizerModuleWrapper) – the wrapper for the original module
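A common thing for an override of quantize_weight or quantize_output to do is affine fake quantization. The value is mapped to the integer grid with a scale and zero point, clamped to [qmin, qmax], and mapped back to float, so the rest of the float graph sees the rounding error. The sketch below shows only that arithmetic on a plain list; fake_quantize is a hypothetical helper, not an NNI function.

```python
from typing import List


def fake_quantize(values: List[float], scale: float, zero_point: int,
                  qmin: int, qmax: int) -> List[float]:
    """Quantize to the integer grid, clamp to [qmin, qmax], then dequantize."""
    out = []
    for v in values:
        # Python's round() rounds halves to even, as torch.round does.
        q = round(v / scale) + zero_point
        q = min(max(q, qmin), qmax)
        out.append((q - zero_point) * scale)
    return out


print(fake_quantize([0.04, 0.26, -20.0], scale=0.1, zero_point=0, qmin=-128, qmax=127))
# approximately [0.0, 0.3, -12.8]; -20.0 is clamped to qmin
```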
QuantizerModuleWrapper
QuantGrad
- class nni.compression.pytorch.compressor.QuantGrad(*args, **kwargs)
Base class for overriding the backward function of the quantization operation.
- classmethod get_bits_length(config, quant_type)
Get the number of bits from the quantization config.
- Parameters
config (Dict) – the configuration for quantization
quant_type (str) – the quantization type
- Returns
The number of bits for the given quantization type.
- Return type
int
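The bit width in a quantization config can plausibly be either one integer shared by all quantization types or a per-type dict. The sketch below shows such a lookup. It assumes the config stores the width under a "quant_bits" key, as NNI quantizer configs commonly do, but the branching logic is an assumption about get_bits_length, not a copy of it.

```python
from typing import Dict, Union


def get_bits_length(config: Dict, quant_type: str) -> int:
    """Hypothetical lookup: return the bit width configured for `quant_type`."""
    bits: Union[int, Dict[str, int]] = config["quant_bits"]
    if isinstance(bits, int):
        return bits          # one bit width shared by every quant type
    return bits[quant_type]  # per-type bit widths, e.g. {"weight": 8, "output": 4}


print(get_bits_length({"quant_bits": 8}, "weight"))                         # 8
print(get_bits_length({"quant_bits": {"weight": 8, "output": 4}}, "output"))  # 4
```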
- static quant_backward(tensor, grad_output, quant_type, scale, zero_point, qmin, qmax)
Subclasses should override this method to provide a customized backward function. The default implementation is the Straight-Through Estimator.
- Parameters
tensor (Tensor) – input of the quantization operation
grad_output (Tensor) – gradient of the output of the quantization operation
quant_type (QuantType) – the type of quantization; it can be QuantType.INPUT, QuantType.WEIGHT or QuantType.OUTPUT, so you can define different behavior for different types.
scale (Tensor) – scale for quantizing the tensor
zero_point (Tensor) – zero_point for quantizing the tensor
qmin (Tensor) – quant_min for quantizing the tensor
qmax (Tensor) – quant_max for quantizing the tensor
- Returns
gradient of the input of the quantization operation
- Return type
Tensor
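A Straight-Through Estimator treats rounding as the identity in the backward pass. A common variant also zeroes the gradient wherever the value was clamped to [qmin, qmax]. The sketch below shows that clipped variant on plain lists. It is a hypothetical helper, not NNI's quant_backward, and a pure STE would instead return grad_output unchanged everywhere.

```python
from typing import List


def ste_backward(tensor: List[float], grad_output: List[float], scale: float,
                 zero_point: int, qmin: int, qmax: int) -> List[float]:
    grad_input = []
    for x, g in zip(tensor, grad_output):
        q = round(x / scale) + zero_point
        # Pass the gradient straight through inside the range; zero it where x was clamped.
        grad_input.append(g if qmin <= q <= qmax else 0.0)
    return grad_input


print(ste_backward([0.5, 20.0, -20.0], [1.0, 1.0, 1.0], scale=0.1,
                   zero_point=0, qmin=-128, qmax=127))  # [1.0, 0.0, 0.0]
```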