Quantizer

QAT Quantizer

class nni.compression.quantization.QATQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator, quant_start_step: int = 0)

class nni.compression.quantization.QATQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator, quant_start_step: int = 0, existed_wrappers: Dict[str, ModuleWrapper] | None = None)

Quantizer defined in the paper Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference.

The authors, Benoit Jacob, Skirmantas Kligys, et al., describe an algorithm that quantizes the model during training. In the paper's words:

We propose an approach that simulates quantization effects in the forward pass of training. Backpropagation still happens as usual, and all weights and biases are stored in floating point so that they can be easily nudged by small amounts. The forward propagation pass however simulates quantized inference as it will happen in the inference engine, by implementing in floating-point arithmetic the rounding behavior of the quantization scheme:

Weights are quantized before they are convolved with the input. If batch normalization (see [17]) is used for the layer, the batch normalization parameters are “folded into” the weights before quantization.

Activations are quantized at points where they would be during inference, e.g. after the activation function is applied to a convolutional or fully connected layer’s output, or after a bypass connection adds or concatenates the outputs of several layers together such as in ResNets.
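The simulated quantization described above can be illustrated in plain Python. This is a minimal sketch, not NNI's implementation: the function name `fake_quantize`, the unsigned 8-bit range, and the min/max range estimate are all assumptions made for the illustration. It rounds each value to an integer grid and immediately maps it back to floating point, which is the "rounding behavior implemented in floating-point arithmetic" that the quoted passage refers to.

```python
def fake_quantize(values, num_bits=8):
    """Quantize then dequantize floats, staying in floating point throughout."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the observed range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    out = []
    for v in values:
        q = round(v / scale) + zero_point      # map onto the integer grid
        q = max(qmin, min(qmax, q))            # clamp to the representable range
        out.append((q - zero_point) * scale)   # map back to floating point
    return out


weights = [-1.0, -0.31, 0.0, 0.42, 1.0]
simulated = fake_quantize(weights, num_bits=8)
```

Each entry of `simulated` differs from the original weight by at most one quantization step, while the stored weights themselves stay in floating point and can still receive small gradient updates.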

Parameters:

model (torch.nn.Module) – Model to be quantized.
config_list (List[Dict]) – A list of dicts. Each dict specifies which modules to quantize and how to quantize them. Please refer to Compression Config Specification for more information.

evaluator (Evaluator) –

NNI uses the evaluator to intervene in the model training process in order to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry point for intervening in training in the future. Usually you only need to wrap some classes with nni.trace, or package the training process as a function, to initialize the evaluator. Please refer to Compression Evaluator for a full tutorial on how to initialize an evaluator.

The following are three simple examples. If you use native PyTorch, please refer to TorchEvaluator; if you use pytorch_lightning, please refer to LightningEvaluator; if you use the Hugging Face transformers Trainer, please refer to TransformersEvaluator:

# LightningEvaluator example
import nni
import pytorch_lightning
from pytorch_lightning.loggers import TensorBoardLogger
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

# A user-defined `training_step` should follow this parameter signature:
# the first parameter is `batch`, the second is `model`.
# It should return the loss, a tuple whose first element is the loss,
# or a dict with the key 'loss'.
def training_step(batch, model, *args, **kwargs):
    input_data, target = batch
    result = model(input_data)
    return F.nll_loss(result, target)

# A user-defined `training_model` should follow this parameter signature:
# (model, optimizer, `training_step`, lr_scheduler, max_steps, max_epochs, ...).
# Note that `training_step` should be defined outside of `training_model`.
def training_model(model, optimizer, training_step, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
    # max_steps and max_epochs might be None, which means unlimited training time,
    # so we set a default termination condition here (total_epochs=10, total_steps=100000).
    total_epochs = max_epochs if max_epochs else 10
    total_steps = max_steps if max_steps else 100000
    current_step = 0

    # init dataloader
    train_dataloader = ...

    for epoch in range(total_epochs):
        ...
        for batch in train_dataloader:
            optimizer.zero_grad()
            loss = training_step(batch, model)
            loss.backward()
            optimizer.step()
            current_step += 1
            if current_step >= total_steps:
                return
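        # lr_scheduler may be None when no scheduler is passed to the evaluator.
        if lr_scheduler is not None:
            lr_scheduler.step()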

import nni
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)

from nni.compression import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimizers=traced_optimizer, training_step=training_step)

# TransformersEvaluator example
from transformers.trainer import Trainer
trainer = nni.trace(Trainer)(model=model, args=training_args)

from nni.compression import TransformersEvaluator
evaluator = TransformersEvaluator(trainer)

quant_start_step (int) – The number of warm-up training steps to run before QAT begins.
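The effect of quant_start_step can be pictured as a simple gate on the training step counter. The sketch below is an assumption about the general idea, not NNI's internal logic: the helper `should_quantize` is hypothetical, and whether the boundary step itself counts as warm-up is a guess.

```python
def should_quantize(current_step, quant_start_step=0):
    """Return True once the warm-up phase is over (boundary handling is assumed)."""
    return current_step >= quant_start_step


# With quant_start_step=3, steps 0..2 train in plain floating point.
schedule = [should_quantize(step, quant_start_step=3) for step in range(5)]
```

With the default of 0, simulated quantization would be active from the first step under this reading.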

Examples

>>> from nni.compression.quantization import QATQuantizer
>>> from nni.compression.utils import TorchEvaluator
>>> model = ...
>>> optimizer = ...
>>> max_steps, max_epochs = ..., ...
>>> evaluator = TorchEvaluator(train, optimizer, training_step)
>>> quantizer = QATQuantizer(model, configure_list, evaluator)
>>> _, calibration_config = quantizer.compress(max_steps, max_epochs)
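The example passes a `configure_list` without defining it. A minimal sketch of what such a list might look like is given here; the module names ('conv1', 'fc1', 'relu1') are hypothetical, and the keys and values are an assumption that should be checked against the Compression Config Specification before use.

```python
# Hypothetical config: module names depend entirely on the model being quantized.
configure_list = [
    {
        "op_names": ["conv1", "fc1"],                       # modules to quantize
        "target_names": ["_input_", "weight", "_output_"],  # what to quantize in each module
        "quant_dtype": "int8",
        "quant_scheme": "affine",
        "granularity": "default",
    },
    {
        "op_names": ["relu1"],
        "target_names": ["_output_"],
        "quant_dtype": "int8",
        "quant_scheme": "affine",
        "granularity": "default",
    },
]
```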

DoReFa Quantizer

class nni.compression.quantization.DoReFaQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator)

class nni.compression.quantization.DoReFaQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator, existed_wrappers: Dict[str, ModuleWrapper] | None = None)

DoReFa quantizer, as defined in the paper DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. The authors, Shuchang Zhou, Yuxin Wu, et al., propose an algorithm named DoReFa that quantizes weights, activations, and gradients during training.
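As an illustration of the weight quantization rule from the DoReFa-Net paper, here is a plain-Python sketch for bit widths greater than 1. The function names are chosen for this example, and the code says nothing about how NNI implements the quantizer; activation and gradient quantization are omitted.

```python
import math


def quantize_k(r, k):
    """Quantize r in [0, 1] to one of 2**k evenly spaced levels in [0, 1]."""
    levels = 2 ** k - 1
    return round(r * levels) / levels


def dorefa_weights(weights, k):
    """k-bit weight quantization (k > 1); every output lies in [-1, 1]."""
    squashed = [math.tanh(w) for w in weights]
    max_abs = max(abs(t) for t in squashed) or 1.0   # guard against all-zero weights
    # Map tanh(w) into [0, 1], quantize, then map back to [-1, 1].
    return [2 * quantize_k(t / (2 * max_abs) + 0.5, k) - 1 for t in squashed]


quantized = dorefa_weights([-0.8, -0.1, 0.0, 0.3, 1.2], k=2)
```

With k=2 every weight lands on one of four levels between -1 and 1. During training the rounding step is treated as the identity in the backward pass (a straight-through estimator), which this forward-only sketch does not show.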

Parameters:

model (torch.nn.Module) – Model to be quantized.
config_list (List[Dict]) – A list of dicts. Each dict specifies which modules to quantize and how to quantize them. Please refer to Compression Config Specification for more information.

evaluator (Evaluator) –

NNI uses the evaluator to intervene in the model training process in order to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry point for intervening in training in the future. Usually you only need to wrap some classes with nni.trace, or package the training process as a function, to initialize the evaluator. Please refer to Compression Evaluator for a full tutorial on how to initialize an evaluator.

The following are three simple examples. If you use native PyTorch, please refer to TorchEvaluator; if you use pytorch_lightning, please refer to LightningEvaluator; if you use the Hugging Face transformers Trainer, please refer to TransformersEvaluator:

# LightningEvaluator example
import nni
import pytorch_lightning
from pytorch_lightning.loggers import TensorBoardLogger
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

# A user-defined `training_step` should follow this parameter signature:
# the first parameter is `batch`, the second is `model`.
# It should return the loss, a tuple whose first element is the loss,
# or a dict with the key 'loss'.
def training_step(batch, model, *args, **kwargs):
    input_data, target = batch
    result = model(input_data)
    return F.nll_loss(result, target)

# A user-defined `training_model` should follow this parameter signature:
# (model, optimizer, `training_step`, lr_scheduler, max_steps, max_epochs, ...).
# Note that `training_step` should be defined outside of `training_model`.
def training_model(model, optimizer, training_step, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
    # max_steps and max_epochs might be None, which means unlimited training time,
    # so we set a default termination condition here (total_epochs=10, total_steps=100000).
    total_epochs = max_epochs if max_epochs else 10
    total_steps = max_steps if max_steps else 100000
    current_step = 0

    # init dataloader
    train_dataloader = ...

    for epoch in range(total_epochs):
        ...
        for batch in train_dataloader:
            optimizer.zero_grad()
            loss = training_step(batch, model)
            loss.backward()
            optimizer.step()
            current_step += 1
            if current_step >= total_steps:
                return
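        # lr_scheduler may be None when no scheduler is passed to the evaluator.
        if lr_scheduler is not None:
            lr_scheduler.step()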

import nni
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)

from nni.compression import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimizers=traced_optimizer, training_step=training_step)

# TransformersEvaluator example
from transformers.trainer import Trainer
trainer = nni.trace(Trainer)(model=model, args=training_args)

from nni.compression import TransformersEvaluator
evaluator = TransformersEvaluator(trainer)

Examples

>>> from nni.compression.quantization import DoReFaQuantizer
>>> from nni.compression.utils import TorchEvaluator
>>> model = ...
>>> optimizer = ...
>>> max_steps, max_epochs = ..., ...
>>> evaluator = TorchEvaluator(train, optimizer, training_step)
>>> quantizer = DoReFaQuantizer(model, configure_list, evaluator)
>>> _, calibration_config = quantizer.compress(max_steps, max_epochs)

BNN Quantizer

class nni.compression.quantization.BNNQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator)

class nni.compression.quantization.BNNQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator, existed_wrappers: Dict[str, ModuleWrapper] | None = None)

BinaryNet quantization, as defined in the paper Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. From the abstract:

We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time. At training-time the binary weights and activations are used for computing the parameters gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power-efficiency.
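The deterministic binarization and the straight-through gradient estimator described in the BNN paper can be sketched in a few lines of plain Python. The function names are chosen for this example and the code is only an illustration of the idea, not NNI's implementation.

```python
def binarize(x):
    """Deterministic binarization: +1 for x >= 0, otherwise -1."""
    return 1.0 if x >= 0 else -1.0


def ste_grad(x, upstream_grad):
    """Straight-through estimator: pass the gradient through only where |x| <= 1."""
    return upstream_grad if abs(x) <= 1.0 else 0.0


latent = [-1.7, -0.2, 0.0, 0.6, 2.3]           # real-valued weights kept for training
binary = [binarize(w) for w in latent]         # values used in the forward pass
grads = [ste_grad(w, 1.0) for w in latent]     # gradient reaching each latent weight
```

The real-valued `latent` weights are what the optimizer updates; only their binarized copies take part in the forward computation.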

Parameters:

model (torch.nn.Module) – Model to be quantized.
config_list (List[Dict]) – A list of dicts. Each dict specifies which modules to quantize and how to quantize them. Please refer to Compression Config Specification for more information.

evaluator (Evaluator) –

NNI uses the evaluator to intervene in the model training process in order to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry point for intervening in training in the future. Usually you only need to wrap some classes with nni.trace, or package the training process as a function, to initialize the evaluator. Please refer to Compression Evaluator for a full tutorial on how to initialize an evaluator.

The following are three simple examples. If you use native PyTorch, please refer to TorchEvaluator; if you use pytorch_lightning, please refer to LightningEvaluator; if you use the Hugging Face transformers Trainer, please refer to TransformersEvaluator:

# LightningEvaluator example
import nni
import pytorch_lightning
from pytorch_lightning.loggers import TensorBoardLogger
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

# A user-defined `training_step` should follow this parameter signature:
# the first parameter is `batch`, the second is `model`.
# It should return the loss, a tuple whose first element is the loss,
# or a dict with the key 'loss'.
def training_step(batch, model, *args, **kwargs):
    input_data, target = batch
    result = model(input_data)
    return F.nll_loss(result, target)

# A user-defined `training_model` should follow this parameter signature:
# (model, optimizer, `training_step`, lr_scheduler, max_steps, max_epochs, ...).
# Note that `training_step` should be defined outside of `training_model`.
def training_model(model, optimizer, training_step, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
    # max_steps and max_epochs might be None, which means unlimited training time,
    # so we set a default termination condition here (total_epochs=10, total_steps=100000).
    total_epochs = max_epochs if max_epochs else 10
    total_steps = max_steps if max_steps else 100000
    current_step = 0

    # init dataloader
    train_dataloader = ...

    for epoch in range(total_epochs):
        ...
        for batch in train_dataloader:
            optimizer.zero_grad()
            loss = training_step(batch, model)
            loss.backward()
            optimizer.step()
            current_step += 1
            if current_step >= total_steps:
                return
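        # lr_scheduler may be None when no scheduler is passed to the evaluator.
        if lr_scheduler is not None:
            lr_scheduler.step()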

import nni
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)

from nni.compression import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimizers=traced_optimizer, training_step=training_step)

# TransformersEvaluator example
from transformers.trainer import Trainer
trainer = nni.trace(Trainer)(model=model, args=training_args)

from nni.compression import TransformersEvaluator
evaluator = TransformersEvaluator(trainer)

Examples

>>> from nni.compression.quantization import BNNQuantizer
>>> from nni.compression.utils import TorchEvaluator
>>> model = ...
>>> optimizer = ...
>>> max_steps, max_epochs = ..., ...
>>> evaluator = TorchEvaluator(train, optimizer, training_step)
>>> quantizer = BNNQuantizer(model, configure_list, evaluator)
>>> _, calibration_config = quantizer.compress(max_steps, max_epochs)

LSQ Quantizer

class nni.compression.quantization.LsqQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator)

class nni.compression.quantization.LsqQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator, existed_wrappers: Dict[str, ModuleWrapper] | None = None)

LsqQuantizer, as defined in the paper Learned Step Size Quantization. The authors, Steven K. Esser, Jeffrey L. McKinstry, et al., provide an algorithm that trains the quantization scales with gradients.

The authors introduce a novel means to estimate and scale the task loss gradient at each weight and activation layer’s quantizer step size, such that it can be learned in conjunction with other network parameters.
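The quantizer, its step-size gradient, and the gradient scale from the LSQ paper can be written out for a single scalar. This is a plain-Python sketch with names chosen for this example; it follows the paper's formulas and is not taken from NNI's code. Here `qn` and `qp` are the numbers of negative and positive quantization levels, for example 4 and 3 for signed 3-bit data.

```python
import math


def lsq_quantize(v, s, qn, qp):
    """Quantize v with step size s: scale, clip to [-qn, qp], round, rescale."""
    return round(max(-qn, min(qp, v / s))) * s


def lsq_step_size_grad(v, s, qn, qp):
    """Gradient of the quantized value w.r.t. s, with round() treated as identity."""
    ratio = v / s
    if ratio <= -qn:
        return -qn
    if ratio >= qp:
        return qp
    return round(ratio) - ratio


def lsq_grad_scale(num_elements, qp):
    """Scale applied to the step-size gradient: 1 / sqrt(N * qp)."""
    return 1.0 / math.sqrt(num_elements * qp)


inside = lsq_step_size_grad(0.26, 0.1, qn=4, qp=3)    # value within the clip range
clipped = lsq_step_size_grad(1.0, 0.1, qn=4, qp=3)    # value clipped at the top
```

Values inside the clipping range contribute a gradient equal to their rounding error, while clipped values contribute the clip bound itself, which is what lets the step size grow when many values saturate.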

Parameters:

model (torch.nn.Module) – Model to be quantized.
config_list (List[Dict]) – A list of dicts. Each dict specifies which modules to quantize and how to quantize them. Please refer to Compression Config Specification for more information.

evaluator (Evaluator) –

NNI uses the evaluator to intervene in the model training process in order to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry point for intervening in training in the future. Usually you only need to wrap some classes with nni.trace, or package the training process as a function, to initialize the evaluator. Please refer to Compression Evaluator for a full tutorial on how to initialize an evaluator.

The following are three simple examples. If you use native PyTorch, please refer to TorchEvaluator; if you use pytorch_lightning, please refer to LightningEvaluator; if you use the Hugging Face transformers Trainer, please refer to TransformersEvaluator:

# LightningEvaluator example
import nni
import pytorch_lightning
from pytorch_lightning.loggers import TensorBoardLogger
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

# A user-defined `training_step` should follow this parameter signature:
# the first parameter is `batch`, the second is `model`.
# It should return the loss, a tuple whose first element is the loss,
# or a dict with the key 'loss'.
def training_step(batch, model, *args, **kwargs):
    input_data, target = batch
    result = model(input_data)
    return F.nll_loss(result, target)

# A user-defined `training_model` should follow this parameter signature:
# (model, optimizer, `training_step`, lr_scheduler, max_steps, max_epochs, ...).
# Note that `training_step` should be defined outside of `training_model`.
def training_model(model, optimizer, training_step, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
    # max_steps and max_epochs might be None, which means unlimited training time,
    # so we set a default termination condition here (total_epochs=10, total_steps=100000).
    total_epochs = max_epochs if max_epochs else 10
    total_steps = max_steps if max_steps else 100000
    current_step = 0

    # init dataloader
    train_dataloader = ...

    for epoch in range(total_epochs):
        ...
        for batch in train_dataloader:
            optimizer.zero_grad()
            loss = training_step(batch, model)
            loss.backward()
            optimizer.step()
            current_step += 1
            if current_step >= total_steps:
                return
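        # lr_scheduler may be None when no scheduler is passed to the evaluator.
        if lr_scheduler is not None:
            lr_scheduler.step()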

import nni
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)

from nni.compression import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimizers=traced_optimizer, training_step=training_step)

# TransformersEvaluator example
from transformers.trainer import Trainer
trainer = nni.trace(Trainer)(model=model, args=training_args)

from nni.compression import TransformersEvaluator
evaluator = TransformersEvaluator(trainer)

Examples

>>> from nni.compression.quantization import LsqQuantizer
>>> from nni.compression.utils import TorchEvaluator
>>> model = ...
>>> optimizer = ...
>>> max_steps, max_epochs = ..., ...
>>> evaluator = TorchEvaluator(train, optimizer, training_step)
>>> quantizer = LsqQuantizer(model, configure_list, evaluator)
>>> _, calibration_config = quantizer.compress(max_steps, max_epochs)

PTQ Quantizer

class nni.compression.quantization.PtqQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator)

class nni.compression.quantization.PtqQuantizer(model: Module, config_list: List[Dict], evaluator: Evaluator, existed_wrappers: Dict[str, ModuleWrapper] | None = None)

Post Training Quantization
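The source gives no further description of the algorithm, so the following is only a generic sketch of what post-training quantization usually involves: observing value ranges on calibration data and deriving a scale and zero point from them, with no weight updates. The helpers `calibrate` and `quantize`, the unsigned 8-bit range, and the min/max observer are assumptions for the illustration and may differ from what PtqQuantizer does.

```python
def calibrate(batches, num_bits=8):
    """Derive affine quantization parameters from the observed value range."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    lo, hi = min(lo, 0.0), max(hi, 0.0)        # keep 0.0 representable
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax or 1.0            # fall back to 1.0 for all-zero data
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to an unsigned integer code, clamping out-of-range values."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


# Two hypothetical calibration batches of activations.
scale, zero_point = calibrate([[0.0, 1.0], [-0.5, 2.05]])
codes = [quantize(x, scale, zero_point) for x in (-0.5, 0.0, 2.05, 100.0)]
```

Values outside the calibrated range, such as the last input, saturate at the largest code, which is why representative calibration data matters.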

Parameters:

model (torch.nn.Module) – Model to be quantized.
config_list (List[Dict]) – A list of dicts. Each dict specifies which modules to quantize and how to quantize them. Please refer to Compression Config Specification for more information.

evaluator (Evaluator) –

NNI uses the evaluator to intervene in the model training process in order to perform training-aware model compression. All training-aware model compression will use the evaluator as the entry point for intervening in training in the future. Usually you only need to wrap some classes with nni.trace, or package the training process as a function, to initialize the evaluator. Please refer to Compression Evaluator for a full tutorial on how to initialize an evaluator.

The following are three simple examples. If you use native PyTorch, please refer to TorchEvaluator; if you use pytorch_lightning, please refer to LightningEvaluator; if you use the Hugging Face transformers Trainer, please refer to TransformersEvaluator:

# LightningEvaluator example
import nni
import pytorch_lightning
from pytorch_lightning.loggers import TensorBoardLogger
lightning_trainer = nni.trace(pytorch_lightning.Trainer)(max_epochs=1, max_steps=50, logger=TensorBoardLogger(...))
lightning_data_module = nni.trace(pytorch_lightning.LightningDataModule)(...)

from nni.compression import LightningEvaluator
evaluator = LightningEvaluator(lightning_trainer, lightning_data_module)

# TorchEvaluator example
import torch
import torch.nn.functional as F

# A user-defined `training_step` should follow this parameter signature:
# the first parameter is `batch`, the second is `model`.
# It should return the loss, a tuple whose first element is the loss,
# or a dict with the key 'loss'.
def training_step(batch, model, *args, **kwargs):
    input_data, target = batch
    result = model(input_data)
    return F.nll_loss(result, target)

# A user-defined `training_model` should follow this parameter signature:
# (model, optimizer, `training_step`, lr_scheduler, max_steps, max_epochs, ...).
# Note that `training_step` should be defined outside of `training_model`.
def training_model(model, optimizer, training_step, lr_scheduler, max_steps, max_epochs, *args, **kwargs):
    # max_steps and max_epochs might be None, which means unlimited training time,
    # so we set a default termination condition here (total_epochs=10, total_steps=100000).
    total_epochs = max_epochs if max_epochs else 10
    total_steps = max_steps if max_steps else 100000
    current_step = 0

    # init dataloader
    train_dataloader = ...

    for epoch in range(total_epochs):
        ...
        for batch in train_dataloader:
            optimizer.zero_grad()
            loss = training_step(batch, model)
            loss.backward()
            optimizer.step()
            current_step += 1
            if current_step >= total_steps:
                return
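        # lr_scheduler may be None when no scheduler is passed to the evaluator.
        if lr_scheduler is not None:
            lr_scheduler.step()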

import nni
traced_optimizer = nni.trace(torch.optim.SGD)(model.parameters(), lr=0.01)

from nni.compression import TorchEvaluator
evaluator = TorchEvaluator(training_func=training_model, optimizers=traced_optimizer, training_step=training_step)

# TransformersEvaluator example
from transformers.trainer import Trainer
trainer = nni.trace(Trainer)(model=model, args=training_args)

from nni.compression import TransformersEvaluator
evaluator = TransformersEvaluator(trainer)

Examples

>>> from nni.compression.quantization import PtqQuantizer
>>> from nni.compression.utils import TorchEvaluator
>>> model = ...
>>> optimizer = ...
>>> max_steps, max_epochs = ..., ...
>>> evaluator = TorchEvaluator(train, optimizer, training_step)
>>> quantizer = PtqQuantizer(model, configure_list, evaluator)
>>> _, calibration_config = quantizer.compress(max_steps, max_epochs)