    Framework Related

    Pruner

    class nni.algorithms.compression.v2.pytorch.base.Pruner(model, config_list)

    The abstract base class for pruning algorithms. Inherit from this class and implement _reset_tools to customize a pruner.

    compress()
    Returns

    Return the wrapped model and mask.

    Return type

    Tuple[Module, Dict]

    export_model(model_path, mask_path=None)

    Export the pruned model weights, the masks, and optionally an ONNX model.

    Parameters
    • model_path (str) – Path to save the pruned model state_dict. The weights and biases have already been multiplied by the masks.

    • mask_path (Optional[str]) – Path to save the mask dict.

    get_modules_wrapper()
    Returns

    An ordered dict whose keys are module names and whose values are the wrappers of those modules.

    Return type

    OrderedDict[str, PrunerModuleWrapper]

    get_origin2wrapped_parameter_name_map()

    Get the mapping of parameter names from the original model to the wrapped model.

    Returns

    A dict of the form {original_model_parameter_name: wrapped_model_parameter_name}.

    Return type

    Dict[str, str]

    load_masks(masks)

    Load existing masks onto the wrappers. After loading, you can train the model with these masks applied.

    Parameters

    masks (Dict[str, Dict[str, torch.Tensor]]) – The masks dict, with the format {'op_name': {'weight': mask, 'bias': mask}}.
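
    The mask layout described above can be illustrated without NNI or PyTorch. The following is a minimal sketch in which plain Python lists stand in for torch.Tensor values. The layer name fc1 and the helper apply_masks are hypothetical and are not part of NNI.

```python
# Illustrative only: plain lists stand in for torch.Tensor values.
weights = {"fc1": {"weight": [0.5, -1.2, 0.3, 2.0], "bias": [0.1, 0.2, 0.3, 0.4]}}

# Same layout as the `masks` argument: {op_name: {"weight": mask, "bias": mask}}.
masks = {"fc1": {"weight": [1, 0, 1, 0], "bias": [1, 0, 1, 0]}}


def apply_masks(weights, masks):
    """Hypothetical helper: multiply every parameter element-wise by its mask."""
    masked = {}
    for op_name, params in weights.items():
        masked[op_name] = {
            name: [v * m for v, m in zip(values, masks[op_name][name])]
            for name, values in params.items()
        }
    return masked


masked_weights = apply_masks(weights, masks)
print(masked_weights["fc1"]["weight"])
```

    Positions where the mask is 0 become zero, which mirrors the note that exported weights and biases have already been multiplied by the masks.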

    show_pruned_weights(dim=0)

    Log the simulated pruning sparsity.

    Parameters

    dim (int) – The pruned dimension.

    PrunerModuleWrapper

    class nni.algorithms.compression.v2.pytorch.base.PrunerModuleWrapper(module, module_name, config)

    Wrap a module to enable data parallelism, forward method customization, and buffer registration.

    Parameters
    • module (torch.nn.modules.module.Module) – The module the user wants to compress.

    • config (Dict) – The configuration that the user specifies for compression.

    • module_name (str) – The name of the module to compress. The wrapper module shares the same name.

    BasicPruner

    class nni.algorithms.compression.v2.pytorch.pruning.basic_pruner.BasicPruner(model, config_list)
    compress()

    Generate the masks. The pruning process is divided into three stages: self.data_collector collects the data used to calculate the specified metric, self.metrics_calculator calculates the metric, and self.sparsity_allocator generates the masks based on the metric.

    Returns

    Return the wrapped model and mask.

    Return type

    Tuple[Module, Dict]

    reset_tools()

    This function resets self.data_collector, self.metrics_calculator and self.sparsity_allocator. Subclasses need to implement this function to complete the pruning process. See compress() to understand how NNI uses these three parts to generate masks for the bound model.
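
    The three-stage flow (collect data, calculate metrics, allocate sparsity) can be sketched in plain Python. This is only an illustration of the data flow under simplifying assumptions: the Toy* classes are hypothetical, they operate on nested lists instead of tensors, and the metric (sum of absolute values per row) is an arbitrary choice rather than what any particular NNI pruner uses.

```python
class ToyDataCollector:
    """Stage 1: collect per-module data (here, the weights themselves)."""

    def __init__(self, weights):
        self.weights = weights

    def collect(self):
        return dict(self.weights)


class ToyMetricsCalculator:
    """Stage 2: one metric per output row (sum of absolute values)."""

    def calculate_metrics(self, data):
        return {
            name: [sum(abs(v) for v in row) for row in rows]
            for name, rows in data.items()
        }


class ToySparsityAllocator:
    """Stage 3: mask out the rows with the smallest metric."""

    def __init__(self, sparsity):
        self.sparsity = sparsity

    def generate_sparsity(self, metrics):
        masks = {}
        for name, metric in metrics.items():
            n_prune = int(len(metric) * self.sparsity)
            order = sorted(range(len(metric)), key=lambda i: metric[i])
            pruned = set(order[:n_prune])
            masks[name] = [0 if i in pruned else 1 for i in range(len(metric))]
        return masks


weights = {"fc": [[0.1, -0.1], [2.0, 1.0], [0.0, 0.3], [-1.5, 0.5]]}
data = ToyDataCollector(weights).collect()
metrics = ToyMetricsCalculator().calculate_metrics(data)
masks = ToySparsityAllocator(sparsity=0.5).generate_sparsity(metrics)
print(masks)
```

    With a sparsity of 0.5, the two rows with the smallest metric (rows 0 and 2) are masked out.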

    DataCollector

    class nni.algorithms.compression.v2.pytorch.pruning.tools.DataCollector(compressor)

    An abstract class for collecting the data needed by the compressor.

    Parameters

    compressor (nni.algorithms.compression.v2.pytorch.base.pruner.Pruner) – The compressor bound to this DataCollector.

    collect()

    Collect the data needed by the compressor, e.g., module weights or the outputs of activation functions.

    Returns

    Usually has a format like {module_name: tensor_type_data}.

    Return type

    Dict

    reset()

    Reset the DataCollector.

    MetricsCalculator

    class nni.algorithms.compression.v2.pytorch.pruning.tools.MetricsCalculator(dim=None, block_sparse_size=None)

    An abstract class for calculating a metric from the given data.

    Parameters
    • dim (Optional[Union[int, List[int]]]) –

      The dimensions in the collected data that correspond to the weight dimensions being pruned. None means a one-to-one correspondence between the pruned dimensions and the data, which is equivalent to setting dim to all data dimensions. Only these dimensions are kept; all other dimensions of the data are reduced.

      Example:

      Suppose you want to prune a Conv2d weight at the filter level, and the weight size is (32, 16, 3, 3) [out-channel, in-channel, kernel-size-1, kernel-size-2]. Then the dimensions being pruned are [0], which means you want to prune the filters (out-channels).

      Case 1: Directly collect the conv module weight as the data used to calculate the metric. The data then has size (32, 16, 3, 3). Note that dimension 0 of the data corresponds to dimension 0 of the weight being pruned, so in this case dim=0 is set in __init__.

      Case 2: Use the output of the conv module as the data used to calculate the metric. The data then has size (batch_num, 32, feature_map_size_1, feature_map_size_2). Note that dimension 1 of the data corresponds to dimension 0 of the weight being pruned, so in this case dim=1 is set in __init__.

      In both cases, the metric of this module has size (32,).

    • block_sparse_size (Optional[Union[int, List[int]]]) –

      Describes the size of the block that each metric value represents. By default, None means the block size is ones(len(dim)). Make sure len(dim) == len(block_sparse_size); each position in block_sparse_size corresponds to the same position in dim.

      Example:

      The weight being pruned has size (768, 768), and you want to apply block sparsity on dim=[0] with block size [64, 768]. Then you can set block_sparse_size=[64]. The final metric size is (12,).
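
    The role of dim in the two cases above can be shown on small 2-D data. The sketch below is a simplification: it uses nested lists rather than 4-D tensors, the helper reduce_to_dim is hypothetical, and the reduction (sum of absolute values) is an assumed metric chosen only for illustration.

```python
def reduce_to_dim(data, dim):
    """Sum absolute values of a 2-D nested list over every axis except `dim`."""
    if dim == 0:
        return [sum(abs(v) for v in row) for row in data]
    if dim == 1:
        return [sum(abs(row[j]) for row in data) for j in range(len(data[0]))]
    raise ValueError("this 2-D sketch only supports dim 0 or 1")


# Case 1: the data is the weight itself, shape (out_channels=3, in_channels=2).
weight = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]
metric_from_weight = reduce_to_dim(weight, dim=0)

# Case 2: the data is a module output, shape (batch=2, out_channels=3).
output = [[1.0, 0.0, -3.0], [2.0, 0.5, 1.0]]
metric_from_output = reduce_to_dim(output, dim=1)

print(metric_from_weight, metric_from_output)
```

    In both cases the metric has one value per output channel, even though the channel axis sits at a different position in the collected data.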

    calculate_metrics(data)
    Parameters

    data (Dict) – A dict holding the data used to calculate metrics. Usually has a format like {module_name: tensor_type_data}.

    Returns

    The key is the layer name and the value is the metric. Note that the metric has the same size as the data on dim.

    Return type

    Dict[str, Tensor]
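
    The block_sparse_size example for a (768, 768) weight can be checked with a short sketch. The helper block_metric is hypothetical and the aggregation (sum of absolute values over each block) is an assumption; only the resulting metric size follows from the description above.

```python
def block_metric(weight, block_rows):
    """Sum |w| over each block of `block_rows` consecutive rows (all columns)."""
    if len(weight) % block_rows != 0:
        raise ValueError("row count must be divisible by the block size")
    return [
        sum(abs(v) for row in weight[start:start + block_rows] for v in row)
        for start in range(0, len(weight), block_rows)
    ]


# A (768, 768) weight with blocks of size [64, 768] on dim=[0].
weight = [[1.0] * 768 for _ in range(768)]
metric = block_metric(weight, block_rows=64)
print(len(metric))  # 12
```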

    SparsityAllocator

    class nni.algorithms.compression.v2.pytorch.pruning.tools.SparsityAllocator(pruner, dim=None, block_sparse_size=None, continuous_mask=True)

    An abstract class for allocating masks based on metrics.

    Parameters
    • pruner (nni.algorithms.compression.v2.pytorch.base.pruner.Pruner) – The pruner bound to this SparsityAllocator.

    • dim (Optional[Union[int, List[int]]]) –

      The weight dimensions being pruned. The metric size should equal the size of the weight being pruned on these dimensions. None means a one-to-one correspondence between the pruned dimensions and the metric, which is equivalent to setting dim to all dimensions of the weight being pruned. The mask is expanded to the weight size according to dim.

      Example:

      The weight being pruned has size (2, 3, 4), and dim=1 means that weight dimension 1 is being pruned. Then the metric should have size (3,), e.g., metric=[0.9, 0.1, 0.8]. Assuming some SparsityAllocator produces the mask [1, 0, 1] on weight dimension 1, this dimension mask is expanded to the final mask [[[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]], [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]].

    • block_sparse_size (Optional[Union[int, List[int]]]) –

      Describes the size of the block that each metric value represents. By default, None means the block size is ones(len(dim)). Make sure len(dim) == len(block_sparse_size); each position in block_sparse_size corresponds to the same position in dim.

      Example:

      The metric size is (12,) and block_sparse_size=[64]. The mask is first expanded to (768,) and then expanded according to dim.

    • continuous_mask (bool) – If True, inherit the masks already in the wrapper.
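
    The two expansion steps described in the parameters above (block expansion, then expansion along dim) can be reproduced with nested lists. This is a minimal sketch: expand_blocks and expand_mask are hypothetical helpers, and expand_mask only handles 3-D shapes, which is enough for the (2, 3, 4) example.

```python
def expand_blocks(block_mask, block_size):
    """Repeat each entry `block_size` times, e.g. (12,) -> (768,) for size 64."""
    return [m for m in block_mask for _ in range(block_size)]


def expand_mask(dim_mask, shape, dim):
    """Broadcast a 1-D mask along axis `dim` of a 3-D weight of size `shape`."""
    d0, d1, d2 = shape
    return [
        [[dim_mask[(i, j, k)[dim]] for k in range(d2)] for j in range(d1)]
        for i in range(d0)
    ]


# The (2, 3, 4) example with dim=1 and dimension mask [1, 0, 1].
final_mask = expand_mask([1, 0, 1], shape=(2, 3, 4), dim=1)
print(final_mask)

# The block example: a (12,) mask with block_sparse_size=[64] becomes (768,).
row_mask = expand_blocks([1, 0] * 6, block_size=64)
print(len(row_mask))  # 768
```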

    generate_sparsity(metrics)
    Parameters

    metrics (Dict) – A metric dict. The key is the name of the layer and the value is its metric.

    BasePruningScheduler

    class nni.algorithms.compression.v2.pytorch.base.BasePruningScheduler
    compress()

    The main loop of the pruning schedule.

    generate_task()
    Returns

    Return the next pruning task.

    Return type

    Optional[Task]

    get_best_result()
    Returns

    Return the task result with the best performance, including the task id, the compact model, the masks on the compact model, the score, and the config list used in this task.

    Return type

    Tuple[int, Module, Dict[str, Dict[str, Tensor]], float, List[Dict]]

    pruning_one_step(task)

    Prune the model defined in the task.

    Parameters

    task (nni.algorithms.compression.v2.pytorch.base.scheduler.Task) – The pruning task in this step.

    Returns

    Return the result of the task in this step.

    Return type

    TaskResult

    record_task_result(task_result)
    Parameters

    task_result (nni.algorithms.compression.v2.pytorch.base.scheduler.TaskResult) – The result of the task.
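
    How the scheduler methods above might fit together in a main loop can be sketched as follows. This is not NNI's implementation: ToyScheduler is hypothetical, a task is reduced to a target sparsity, a task result to a (task_id, sparsity, score) tuple, and the score formula is a made-up stand-in for pruning and evaluating a real model.

```python
class ToyScheduler:
    """Hypothetical scheduler mirroring the method names documented above."""

    def __init__(self, sparsities):
        self._pending = list(sparsities)  # each "task" is just a target sparsity
        self._results = []                # (task_id, sparsity, score) tuples

    def generate_task(self):
        return self._pending.pop(0) if self._pending else None

    def pruning_one_step(self, task):
        # Stand-in for pruning and evaluation: the score drops as sparsity grows.
        score = 1.0 - task ** 2
        return (len(self._results), task, score)

    def record_task_result(self, task_result):
        self._results.append(task_result)

    def get_best_result(self):
        if not self._results:
            return None
        return max(self._results, key=lambda result: result[2])

    def compress(self):
        # Main loop: keep pulling tasks until none are left.
        task = self.generate_task()
        while task is not None:
            self.record_task_result(self.pruning_one_step(task))
            task = self.generate_task()


scheduler = ToyScheduler([0.2, 0.5, 0.8])
scheduler.compress()
best = scheduler.get_best_result()
print(best)
```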

    TaskGenerator

    class nni.algorithms.compression.v2.pytorch.pruning.tools.TaskGenerator(origin_model, origin_masks={}, origin_config_list=[], log_dir='.', keep_intermediate_result=False)

    This class is used to generate the config list for the pruner in each iteration.

    Parameters
    • origin_model (Optional[torch.nn.modules.module.Module]) – The original unwrapped PyTorch model to be pruned.

    • origin_masks (Optional[Dict[str, Dict[str, torch.Tensor]]]) – The pre-existing masks on the original model. These masks may be user-defined or generated by a previous pruning run.

    • origin_config_list (Optional[List[Dict]]) – The original config list provided by the user. Note that this config list applies directly to the original model, which means the sparsity introduced by origin_masks should also be recorded in origin_config_list.

    • log_dir (Union[str, pathlib.Path]) – The directory in which the task generator log is saved.

    • keep_intermediate_result (bool) – Whether to keep the intermediate results, including the intermediate model and masks of each iteration.

    get_best_result()
    Returns

    If self._best_task_id is not None, return the best task id, the best compact model, the masks on the compact model, the score, and the config list used in this task.

    Return type

    Optional[Tuple[int, Module, Dict[str, Dict[str, Tensor]], float, List[Dict]]]

    next()
    Returns

    Return the next task from pending tasks.

    Return type

    Optional[Task]

    receive_task_result(task_result)
    Parameters

    task_result (nni.algorithms.compression.v2.pytorch.base.scheduler.TaskResult) – The result of the task.
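
    The next() / receive_task_result() / get_best_result() cycle can be illustrated with a toy generator that queues one config list per iteration. Everything here is an assumption made for illustration: ToyTaskGenerator is hypothetical, the linear sparsity schedule is just one possible policy, the config keys are used only as sample data, and receive_task_result takes a plain (task_id, score) pair instead of a TaskResult object.

```python
from collections import deque


class ToyTaskGenerator:
    """Hypothetical generator: one config list per iteration, linear schedule."""

    def __init__(self, origin_config_list, total_iteration):
        self._pending = deque()
        for i in range(total_iteration):
            fraction = (i + 1) / total_iteration
            config_list = [
                dict(config, sparsity=config["sparsity"] * fraction)
                for config in origin_config_list
            ]
            self._pending.append((i, config_list))
        self._issued = {}   # task_id -> config list handed out by next()
        self._best = None   # (task_id, score, config_list)

    def next(self):
        if not self._pending:
            return None
        task_id, config_list = self._pending.popleft()
        self._issued[task_id] = config_list
        return task_id, config_list

    def receive_task_result(self, task_id, score):
        if self._best is None or score > self._best[1]:
            self._best = (task_id, score, self._issued[task_id])

    def get_best_result(self):
        return self._best


generator = ToyTaskGenerator([{"sparsity": 0.8, "op_types": ["Conv2d"]}], total_iteration=4)
issued_sparsities = []
task = generator.next()
while task is not None:
    task_id, config_list = task
    issued_sparsities.append(config_list[0]["sparsity"])
    # Made-up score that gets worse with each iteration.
    generator.receive_task_result(task_id, score=1.0 - 0.1 * task_id)
    task = generator.next()

print(issued_sparsities)
print(generator.get_best_result())
```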

    Quantizer

    class nni.compression.pytorch.compressor.Quantizer(model, config_list, optimizer=None, dummy_input=None)

    Base class for PyTorch quantizers.

    export_model(model_path, calibration_path=None, onnx_path=None, input_shape=None, device=None)

    Export quantized model weights and calibration parameters.

    Parameters
    • model_path (str) – path to save the quantized model weights

    • calibration_path (str) – (optional) path to save the quantization parameters after calibration

    • onnx_path (str) – (optional) path to save the ONNX model

    • input_shape (list or tuple) – input shape of the ONNX model

    • device (torch.device) – device of the model, used to place the dummy input tensor when exporting the ONNX file. The tensor is placed on the CPU if `device` is None.

    Return type

    Dict

    export_model_save(model, model_path, calibration_config=None, calibration_path=None, onnx_path=None, input_shape=None, device=None)

    This method helps save the PyTorch model, the calibration config, and the ONNX model in a quantizer.

    Parameters
    • model (pytorch model) – PyTorch model to be saved

    • model_path (str) – path to save the PyTorch model

    • calibration_config (dict) – (optional) config of calibration parameters

    • calibration_path (str) – (optional) path to save the quantization parameters after calibration

    • onnx_path (str) – (optional) path to save the ONNX model

    • input_shape (list or tuple) – input shape of the ONNX model

    • device (torch.device) – device of the model, used to place the dummy input tensor when exporting the ONNX file. The tensor is placed on the CPU if `device` is None.

    find_conv_bn_patterns(model, dummy_input)

    Find all Conv-BN patterns, used for batch normalization folding.

    Parameters
    • model (torch.nn.Module) – model to be analyzed.

    • dummy_input (tuple of torch.tensor) – inputs to the model, used for generating the TorchScript

    fold_bn(*inputs, wrapper)

    Simulate batch normalization folding in the training graph. The folded weight and bias are returned for use in the following operations.

    Parameters
    • inputs (tuple of torch.Tensor) – inputs for the module

    • wrapper (QuantizerModuleWrapper) – the wrapper for the original module

    Return type

    Tuple of torch.Tensor
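
    For readers unfamiliar with batch normalization folding, the sketch below shows the standard folding identity on a tiny linear layer using plain Python. It is not NNI's fold_bn: the helper names are hypothetical, fixed statistics are used instead of the batch statistics a training graph would see, and a linear layer stands in for a convolution.

```python
import math


def linear(weight, bias, x):
    """y = W x + b for a small dense layer stored as nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-channel normalization: gamma * (y - mean) / sqrt(var + eps) + beta."""
    return [
        g * (yi - m) / math.sqrt(v + eps) + bt
        for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)
    ]


def fold_bn_params(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the per-channel BN statistics into the layer's weight and bias."""
    folded_weight, folded_bias = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        factor = g / math.sqrt(v + eps)
        folded_weight.append([w * factor for w in row])
        folded_bias.append(bt + (b - m) * factor)
    return folded_weight, folded_bias


weight = [[1.0, 2.0], [0.5, -1.0]]
bias = [0.1, -0.2]
gamma, beta = [2.0, 0.5], [0.0, 1.0]
mean, var = [0.3, -0.1], [4.0, 0.25]
x = [1.5, -2.0]

reference = batch_norm(linear(weight, bias, x), gamma, beta, mean, var)
folded_weight, folded_bias = fold_bn_params(weight, bias, gamma, beta, mean, var)
folded = linear(folded_weight, folded_bias, x)
print(reference, folded)
```

    The folded layer reproduces the layer-plus-BN output up to floating-point error, which is why the folded weight and bias can be quantized in place of the originals.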

    load_calibration_config(calibration_config)

    This function helps a quantizer set its quantization parameters by loading them from a calibration_config exported by another quantizer or by itself. Its main use is to give a quantization-aware training quantizer appropriate initial parameters, so that training is more flexible and converges quickly. It also lets a quantizer resume a quantized model by loading its parameters from the config.

    Parameters

    calibration_config (dict) – A dict that stores quantization parameters. A quantizer can export its own calibration config, e.g., calibration_config = quantizer.export_model(model_path, calibration_path)

    quantize_input(inputs, wrapper, **kwargs)

    Quantizers should override this method to quantize the input. This method is effectively hooked to forward() of the model.

    Parameters
    • inputs (Tensor) – inputs that need to be quantized

    • wrapper (QuantizerModuleWrapper) – the wrapper for the original module

    quantize_output(output, wrapper, **kwargs)

    Quantizers should override this method to quantize the output. This method is effectively hooked to forward() of the model.

    Parameters
    • output (Tensor) – output that needs to be quantized

    • wrapper (QuantizerModuleWrapper) – the wrapper for the original module

    quantize_weight(wrapper, **kwargs)

    Quantizers should override this method to quantize the weight. This method is effectively hooked to forward() of the model.

    Parameters

    wrapper (QuantizerModuleWrapper) – the wrapper for the original module
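
    As a rough idea of what an overriding quantize_weight might compute, the sketch below applies a common affine quantize-then-dequantize ("fake quantization") step to a list of floats. It is an assumption for illustration only: fake_quantize is a hypothetical helper, it works on plain lists rather than on the wrapper's tensors, and concrete NNI quantizers define their own schemes.

```python
import math


def fake_quantize(values, scale, zero_point, qmin, qmax):
    """Quantize to integers in [qmin, qmax], then map back to floats."""
    result = []
    for value in values:
        q = math.floor(value / scale + 0.5) + zero_point  # round half up
        q = max(qmin, min(qmax, q))                       # clamp to the integer range
        result.append((q - zero_point) * scale)
    return result


weights = [-1.0, -0.26, 0.0, 0.13, 0.9]
quantized = fake_quantize(weights, scale=0.25, zero_point=0, qmin=-2, qmax=2)
print(quantized)
```

    Values outside the representable range are clamped to its ends, and values inside it snap to the nearest multiple of the scale.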

    record_shape(model, dummy_input)

    Record the input and output shapes of each module to be quantized.

    Parameters
    • model (torch.nn.Module) – model to be recorded.

    • dummy_input (tuple of torch.tensor) – inputs to the model.

    QuantizerModuleWrapper

    class nni.compression.pytorch.compressor.QuantizerModuleWrapper(module, module_name, module_type, config, quantizer, bn_module=None)

    QuantGrad

    class nni.compression.pytorch.compressor.QuantGrad(*args, **kwargs)

    Base class for overriding the backward function of a quantization operation.

    classmethod get_bits_length(config, quant_type)

    Get the number of bits for a quantization config.

    Parameters
    • config (Dict) – the configuration for quantization

    • quant_type (str) – quant type

    Returns

    n-bits for quantization configuration

    Return type

    int

    static quant_backward(tensor, grad_output, quant_type, scale, zero_point, qmin, qmax)

    This method should be overridden by subclasses to provide a customized backward function. The default implementation is the Straight-Through Estimator.

    Parameters
    • tensor (Tensor) – input of quantization operation

    • grad_output (Tensor) – gradient of the output of quantization operation

    • quant_type (QuantType) – the type of quantization; it can be QuantType.INPUT, QuantType.WEIGHT, or QuantType.OUTPUT. You can define different behavior for different types.

    • scale (Tensor) – scale for quantizing tensor

    • zero_point (Tensor) – zero_point for quantizing tensor

    • qmin (Tensor) – quant_min for quantizing tensor

    • qmax (Tensor) – quant_max for quantizing tensor

    Returns

    gradient of the input of quantization operation

    Return type

    Tensor
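
    The Straight-Through Estimator mentioned above treats the quantization step as the identity during the backward pass. The sketch below shows that idea on plain lists, together with a clipped variant that is common in the literature. ste_backward is a hypothetical helper, not NNI's code, and the clipped variant is an assumption about what a subclass might choose to do.

```python
def ste_backward(tensor, grad_output, scale, zero_point, qmin, qmax, clip=False):
    """Straight-Through Estimator: pass the output gradient through unchanged.

    With clip=True (a common variant), the gradient is zeroed where the input
    lies outside the range representable by [qmin, qmax].
    """
    if not clip:
        return list(grad_output)
    low = (qmin - zero_point) * scale
    high = (qmax - zero_point) * scale
    return [g if low <= x <= high else 0.0 for x, g in zip(tensor, grad_output)]


tensor = [-1.0, 0.1, 0.9]
grad_output = [0.3, -0.2, 0.5]

plain = ste_backward(tensor, grad_output, scale=0.25, zero_point=0, qmin=-2, qmax=2)
clipped = ste_backward(tensor, grad_output, scale=0.25, zero_point=0, qmin=-2, qmax=2, clip=True)
print(plain, clipped)
```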

    © Copyright 2022, Microsoft.