Strategy¶

Multi-trial Strategy¶

Random¶

class nni.retiarii.strategy.Random(variational=False, dedup=True, model_filter=None)[source]¶

Random search on the search space.

Parameters

variational (bool) – Do not dry run to get the full search space. Used when the search space has variational size or candidates. Default: false.
dedup (bool) – Do not try the same configuration twice. When variational is true, deduplication is not supported. Default: true.
model_filter (Callable[[Model], bool]) – Feed the model and return a bool. This will filter the models in search space and select which to submit.
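The interaction of dedup and model_filter can be illustrated with a small pure-Python sketch. It does not use NNI: the dict-based search space, the function name random_search, and the draw limit are all assumptions made for illustration only.

```python
import random


def random_search(space, n_trials, dedup=True, model_filter=None, seed=0):
    """Sample configurations uniformly from a fixed-size search space.

    `space` maps a choice name to its candidate list (hypothetical format,
    not the NNI model space representation).
    """
    rng = random.Random(seed)
    total = 1
    for candidates in space.values():
        total *= len(candidates)

    seen, accepted = set(), []
    # Bound the number of draws so that a strict filter cannot loop forever.
    for _ in range(100 * n_trials):
        if len(accepted) >= n_trials or (dedup and len(seen) >= total):
            break
        config = {name: rng.choice(cands) for name, cands in space.items()}
        key = tuple(sorted(config.items()))
        if dedup:
            if key in seen:
                continue  # already tried, draw again
            seen.add(key)
        if model_filter is not None and not model_filter(config):
            continue  # rejected by the filter, never submitted
        accepted.append(config)
    return accepted
```

With deduplication enabled, the sketch stops once every configuration has been seen, even if fewer than n_trials were accepted.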

GridSearch¶

class nni.retiarii.strategy.GridSearch(shuffle=True)[source]¶

Traverse the search space and try all the possible combinations one by one.

Parameters: shuffle (bool) – Shuffle the order in a candidate list, so that they are tried in a random order. Default: true.
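As a rough illustration of the traversal, the following pure-Python sketch enumerates every combination of a dict-based search space and shuffles each candidate list first when shuffle is true. The search space format and the grid_search name are hypothetical and are not part of NNI.

```python
import itertools
import random


def grid_search(space, shuffle=True, seed=0):
    """Yield every combination of candidates exactly once."""
    rng = random.Random(seed)
    names = list(space)
    candidate_lists = []
    for name in names:
        candidates = list(space[name])
        if shuffle:
            rng.shuffle(candidates)  # shuffle the order within each candidate list
        candidate_lists.append(candidates)
    for combo in itertools.product(*candidate_lists):
        yield dict(zip(names, combo))
```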

RegularizedEvolution¶

class nni.retiarii.strategy.RegularizedEvolution(optimize_mode='maximize', population_size=100, sample_size=25, cycles=20000, mutation_prob=0.05, on_failure='ignore', model_filter=None)[source]¶

Algorithm for regularized evolution (i.e. aging evolution). Follows “Algorithm 1” in Real et al. “Regularized Evolution for Image Classifier Architecture Search”.

Parameters

optimize_mode (str) – Can be one of “maximize” and “minimize”. Default: maximize.
population_size (int) – The number of individuals to keep in the population. Default: 100.
cycles (int) – The number of cycles (trials) the algorithm should run for. Default: 20000.
sample_size (int) – The number of individuals that should participate in each tournament. Default: 25.
mutation_prob (float) – Probability that mutation happens in each dim. Default: 0.05.
on_failure (str) – Can be one of “ignore” and “worst”. If “ignore”, simply give up on the model and find a new one. If “worst”, mark the model as -inf when maximizing (inf when minimizing), so that the algorithm “learns” to avoid such models. Default: ignore.
model_filter (Callable[[Model], bool]) – Feed the model and return a bool. This will filter the models in search space and select which to submit.
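The aging-evolution loop can be sketched in a few lines of pure Python. This is a simplified illustration, not the NNI implementation: the dict-based search space and the evaluate callback are hypothetical, only maximization is shown, and here cycles counts the steps after the initial population is filled, which may differ from how NNI counts trials.

```python
import collections
import random


def regularized_evolution(space, evaluate, population_size=10, sample_size=3,
                          cycles=50, mutation_prob=0.05, seed=0):
    """Return the best (config, score) pair seen during the search."""
    rng = random.Random(seed)

    def random_config():
        return {name: rng.choice(cands) for name, cands in space.items()}

    def mutate(parent):
        child = dict(parent)
        for name, cands in space.items():
            if rng.random() < mutation_prob:  # mutate each dimension independently
                child[name] = rng.choice(cands)
        return child

    population = collections.deque()
    history = []
    # Fill the initial population with random individuals.
    while len(population) < population_size:
        config = random_config()
        individual = (config, evaluate(config))
        population.append(individual)
        history.append(individual)

    for _ in range(cycles):
        # Tournament: the best of a random sample becomes the parent.
        tournament = rng.sample(list(population), sample_size)
        parent = max(tournament, key=lambda ind: ind[1])
        child = mutate(parent[0])
        individual = (child, evaluate(child))
        population.append(individual)
        history.append(individual)
        population.popleft()  # aging: discard the oldest, not the worst

    return max(history, key=lambda ind: ind[1])
```

Removing the oldest individual rather than the worst one is what distinguishes aging evolution from plain tournament selection.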

TPE¶

class nni.retiarii.strategy.TPE[source]¶

The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach.

Find the details in Algorithms for Hyper-Parameter Optimization.

SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model.

PolicyBasedRL¶

class nni.retiarii.strategy.PolicyBasedRL(max_collect=100, trial_per_collect=20, policy_fn=None)[source]¶

Algorithm for policy-based reinforcement learning. This is a wrapper of algorithms provided in tianshou (PPO by default), and can be easily customized with other algorithms that inherit BasePolicy (e.g., REINFORCE as in this paper).

Parameters

max_collect (int) – How many times the collector runs to collect trials for RL. Default: 100.
trial_per_collect (int) – How many trials (trajectories) the collector collects each time. After each collection, the trainer samples a batch from the replay buffer and performs the update. Default: 20.
policy_fn (function) – Takes ModelEvaluationEnv as input and returns a policy. See PolicyBasedRL._default_policy_fn() for an example.

One-shot Strategy¶

Note

The usage of one-shot has been refreshed in v2.8. Please see legacy one-shot trainers for the old-style one-shot strategies.

DARTS¶

class nni.retiarii.strategy.DARTS(**kwargs)[source]¶

Continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Reference.

DARTS is one of the most fundamental one-shot algorithms. It repeats iterations, each consisting of two training phases. Phase 1 is the architecture step, in which model parameters are frozen and the architecture parameters are trained. Phase 2 is the model step, in which architecture parameters are frozen and model parameters are trained.

The current implementation corresponds to DARTS (1st order) in the paper. Second order (unrolled 2nd-order derivatives) is not supported yet.

New in version 2.8: Supports searching for ValueChoices on operations, with the technique described in FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions. One difference is that, in DARTS, we are using Softmax instead of GumbelSoftmax.

The supported mutation primitives of DARTS are:

nni.retiarii.nn.pytorch.LayerChoice.
nni.retiarii.nn.pytorch.InputChoice.
nni.retiarii.nn.pytorch.ValueChoice (only when used in Linear, Conv2d, BatchNorm2d, MultiheadAttention).
nni.retiarii.nn.pytorch.Repeat.
nni.retiarii.nn.pytorch.Cell.
nni.retiarii.nn.pytorch.NasBench201Cell.

Parameters

mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.

Mutation hooks are callables that take a Module and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules() on each submodule. For each submodule, the hooks are invoked in order, so later hooks can see the results of earlier ones. Modules processed by mutation_hooks are replaced by the returned module, stored in nas_modules, and become the focus of the NAS algorithm.

The default_mutation_hooks of each one-shot module are appended to this hook list.

More specifically, each hook receives four arguments:
1. a module that might be processed,
2. the name of the module in its parent module,
3. a memo dict whose usage depends on the particular algorithm,
4. keyword arguments (configurations).
Note that the memo is meant to be read and written by the hooks themselves. No hooks are called on the root module.

The return value can be one of three kinds:
1. a tuple of BaseSuperNetModule or None, and a boolean,
2. a boolean,
3. a BaseSuperNetModule or None.
The boolean value, suppress, indicates whether the subsequent hooks should be skipped. When it is true, the subsequent hooks are never invoked. When no boolean is given, it is assumed to be false. If None appears in place of a BaseSuperNetModule, the hook suggests keeping the module unchanged, and nothing happens.

An example of a mutation hook is given in no_default_hook(). However, it is recommended to implement mutation hooks by deriving from BaseSuperNetModule and adding its classmethod mutate to this list.
arc_learning_rate (float) – Learning rate for architecture optimizer. Default: 3.0e-4
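The return conventions for mutation hooks described under mutation_hooks can be illustrated with a small pure-Python sketch. The helper names normalize_hook_result and run_hooks are hypothetical and are not part of NNI. Plain objects stand in for modules, and the sketch assumes the keyword arguments are passed as a single dict, as in the signature of BaseSuperNetModule.mutate.

```python
def normalize_hook_result(result):
    """Map any of the three return kinds to (replacement_or_None, suppress)."""
    if isinstance(result, tuple):
        replacement, suppress = result
        return replacement, bool(suppress)
    if isinstance(result, bool):
        return None, result  # a bare boolean only controls suppression
    return result, False     # a module or None; suppress defaults to false


def run_hooks(module, name, memo, hooks, mutate_kwargs):
    """Apply hooks in order; later hooks see the result of earlier ones."""
    for hook in hooks:
        replacement, suppress = normalize_hook_result(
            hook(module, name, memo, mutate_kwargs))
        if replacement is not None:
            module = replacement  # None means "keep the module unchanged"
        if suppress:
            break                 # the remaining hooks are never invoked
    return module
```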

ENAS¶

class nni.retiarii.strategy.ENAS(**kwargs)[source]¶

An RL controller learns to generate the best network on a super-net. See ENAS paper.

There are two steps in an epoch.

First, train the model parameters.
Second, train the ENAS RL agent. The agent produces a sample of the model architecture to get the best reward.

Note

ENAS requires the evaluator to report metrics via self.log in its validation_step. See explanation of reward_metric_name for details.

The supported mutation primitives of ENAS are:

nni.retiarii.nn.pytorch.LayerChoice.
nni.retiarii.nn.pytorch.InputChoice.
nni.retiarii.nn.pytorch.ValueChoice (only when used in Linear, Conv2d, BatchNorm2d, MultiheadAttention).
nni.retiarii.nn.pytorch.Repeat.
nni.retiarii.nn.pytorch.Cell.
nni.retiarii.nn.pytorch.NasBench201Cell.

Parameters

mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.

Mutation hooks are callables that take a Module and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules() on each submodule. For each submodule, the hooks are invoked in order, so later hooks can see the results of earlier ones. Modules processed by mutation_hooks are replaced by the returned module, stored in nas_modules, and become the focus of the NAS algorithm.

The default_mutation_hooks of each one-shot module are appended to this hook list.

More specifically, each hook receives four arguments:
1. a module that might be processed,
2. the name of the module in its parent module,
3. a memo dict whose usage depends on the particular algorithm,
4. keyword arguments (configurations).
Note that the memo is meant to be read and written by the hooks themselves. No hooks are called on the root module.

The return value can be one of three kinds:
1. a tuple of BaseSuperNetModule or None, and a boolean,
2. a boolean,
3. a BaseSuperNetModule or None.
The boolean value, suppress, indicates whether the subsequent hooks should be skipped. When it is true, the subsequent hooks are never invoked. When no boolean is given, it is assumed to be false. If None appears in place of a BaseSuperNetModule, the hook suggests keeping the module unchanged, and nothing happens.

An example of a mutation hook is given in no_default_hook(). However, it is recommended to implement mutation hooks by deriving from BaseSuperNetModule and adding its classmethod mutate to this list.
ctrl_kwargs (dict) – Optional kwargs that will be passed to ReinforceController.
entropy_weight (float) – Weight of sample entropy loss in RL.
skip_weight (float) – Weight of skip penalty loss. See ReinforceController for details.
baseline_decay (float) – Decay factor of reward baseline, which is used to normalize the reward in RL. At each step, the new reward baseline will be equal to baseline_decay * baseline_old + reward * (1 - baseline_decay).
ctrl_steps_aggregate (int) – Number of steps for which the gradients will be accumulated, before updating the weights of RL controller.
ctrl_grad_clip (float) – Gradient clipping value of controller.
reward_metric_name (str or None) – The name of the metric that is treated as the reward. It has no effect when the evaluator returns only one metric. If there are multiple metrics, it looks for the metric with key name default by default. If reward_metric_name is specified, it looks for reward_metric_name instead. If the key is not found, it raises an exception indicating that multiple metrics were found.
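The baseline update and the reward lookup described for baseline_decay and reward_metric_name can be written out as a short pure-Python sketch. The function names are hypothetical, and the exception type raised when the key is missing is an assumption, since the text above only says that an exception is raised.

```python
def update_baseline(baseline_old, reward, baseline_decay):
    """Exponential moving average of the reward, as in the formula above."""
    return baseline_decay * baseline_old + reward * (1 - baseline_decay)


def pick_reward(metrics, reward_metric_name=None):
    """Select the reward from a dict of metrics reported by the evaluator."""
    if len(metrics) == 1:
        # With a single metric, reward_metric_name has no effect.
        return next(iter(metrics.values()))
    key = reward_metric_name if reward_metric_name is not None else "default"
    if key not in metrics:
        # Assumption: KeyError is used here purely for illustration.
        raise KeyError(f"multiple metrics found, but none is named {key!r}")
    return metrics[key]
```

A baseline_decay close to 1 makes the baseline change slowly, so a single unusually high or low reward has little influence on it.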

class nni.retiarii.oneshot.pytorch.enas.ReinforceController(fields, lstm_size=64, lstm_num_layers=1, tanh_constant=1.5, skip_target=0.4, temperature=None, entropy_reduction='sum')[source]¶

A controller that mutates the graph with RL.

Parameters

fields (list of ReinforceField) – List of fields to choose.
lstm_size (int) – Controller LSTM hidden units.
lstm_num_layers (int) – Number of layers for stacked LSTM.
tanh_constant (float) – Logits will be equal to tanh_constant * tanh(logits). Don’t use tanh if this value is None.
skip_target (float) – Target probability that a skip connection (chosen by InputChoice) will appear. If the number of chosen inputs deviates from this target, a sample skip penalty (a KL divergence) is added.
temperature (float) – Temperature constant that divides the logits.
entropy_reduction (str) – Can be one of sum and mean. How the entropy of multi-input-choice is reduced.
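The effect of tanh_constant and temperature on the controller logits can be sketched in pure Python. The order of the two operations (divide by the temperature first, then apply the tanh squashing) is an assumption, since the parameter descriptions above do not state it. The function names are hypothetical.

```python
import math


def shape_logits(logits, tanh_constant=1.5, temperature=None):
    """Apply the optional temperature and tanh squashing to raw logits."""
    if temperature is not None:
        logits = [x / temperature for x in logits]
    if tanh_constant is not None:
        # Bounds every logit to the interval [-tanh_constant, tanh_constant].
        logits = [tanh_constant * math.tanh(x) for x in logits]
    return logits


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Bounding the logits keeps the sampling distribution from collapsing onto one candidate too early, which is the usual motivation for this kind of squashing.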

GumbelDARTS¶

class nni.retiarii.strategy.GumbelDARTS(**kwargs)[source]¶

Choose the best block by using Gumbel Softmax random sampling and differentiable training. See FBNet and SNAS.

This is a DARTS-based method that uses Gumbel-softmax to simulate a one-hot distribution. Essentially, it tries to mimic the behavior of sampling one path on forward by gradually cooling down the temperature, aiming to bridge the gap between differentiable architecture weights and the discretization of architectures.

New in version 2.8: Supports searching for ValueChoices on operations, with the technique described in FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions.

The supported mutation primitives of GumbelDARTS are:

nni.retiarii.nn.pytorch.LayerChoice.
nni.retiarii.nn.pytorch.InputChoice.
nni.retiarii.nn.pytorch.ValueChoice (only when used in Linear, Conv2d, BatchNorm2d, MultiheadAttention).
nni.retiarii.nn.pytorch.Repeat.
nni.retiarii.nn.pytorch.Cell.
nni.retiarii.nn.pytorch.NasBench201Cell.

Parameters

mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.

Mutation hooks are callables that take a Module and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules() on each submodule. For each submodule, the hooks are invoked in order, so later hooks can see the results of earlier ones. Modules processed by mutation_hooks are replaced by the returned module, stored in nas_modules, and become the focus of the NAS algorithm.

The default_mutation_hooks of each one-shot module are appended to this hook list.

More specifically, each hook receives four arguments:
1. a module that might be processed,
2. the name of the module in its parent module,
3. a memo dict whose usage depends on the particular algorithm,
4. keyword arguments (configurations).
Note that the memo is meant to be read and written by the hooks themselves. No hooks are called on the root module.

The return value can be one of three kinds:
1. a tuple of BaseSuperNetModule or None, and a boolean,
2. a boolean,
3. a BaseSuperNetModule or None.
The boolean value, suppress, indicates whether the subsequent hooks should be skipped. When it is true, the subsequent hooks are never invoked. When no boolean is given, it is assumed to be false. If None appears in place of a BaseSuperNetModule, the hook suggests keeping the module unchanged, and nothing happens.

An example of a mutation hook is given in no_default_hook(). However, it is recommended to implement mutation hooks by deriving from BaseSuperNetModule and adding its classmethod mutate to this list.
gumbel_temperature (float) – The initial temperature used in gumbel-softmax.
use_temp_anneal (bool) – If true, a linear annealing will be applied to gumbel_temperature. Otherwise, run at a fixed temperature. See SNAS for details.
min_temp (float) – The minimal temperature for annealing. There is no need to set this if use_temp_anneal is false.
arc_learning_rate (float) – Learning rate for architecture optimizer. Default: 3.0e-4
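To make the roles of gumbel_temperature, use_temp_anneal and min_temp more concrete, here is a pure-Python sketch of Gumbel-softmax sampling with a linear temperature schedule. It is not the NNI implementation: lists of floats stand in for tensors, and the exact schedule (linear in epoch / max_epochs) is an assumption.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Soft sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)   # keep u away from 0 to avoid log(0)
        g = -math.log(-math.log(u))    # Gumbel(0, 1) noise
        noisy.append((x + g) / temperature)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_temperature(initial, min_temp, epoch, max_epochs):
    """Linearly decay from `initial` (epoch 0) to `min_temp` (last epoch)."""
    remaining = 1.0 - min(epoch / max_epochs, 1.0)
    return min_temp + (initial - min_temp) * remaining
```

As the temperature falls, the soft sample concentrates on a single candidate, which is how the method approximates sampling one path.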

RandomOneShot¶

class nni.retiarii.strategy.RandomOneShot(**kwargs)[source]¶

Train a super-net with uniform path sampling. See reference.

In each epoch, model parameters are trained after a uniformly random sampling of each choice. Notably, the exported result is also a random sample of the search space.

The supported mutation primitives of RandomOneShot are:

nni.retiarii.nn.pytorch.LayerChoice.
nni.retiarii.nn.pytorch.InputChoice.
nni.retiarii.nn.pytorch.ValueChoice (only when used in Linear, Conv2d, BatchNorm2d, MultiheadAttention).
nni.retiarii.nn.pytorch.Repeat.
nni.retiarii.nn.pytorch.Cell.
nni.retiarii.nn.pytorch.NasBench201Cell.

Parameters

mutation_hooks (list[MutationHook]) –

Extra mutation hooks to support customized mutation on primitives other than built-ins.

Mutation hooks are callables that take a Module and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules() on each submodule. For each submodule, the hooks are invoked in order, so later hooks can see the results of earlier ones. Modules processed by mutation_hooks are replaced by the returned module, stored in nas_modules, and become the focus of the NAS algorithm.

The default_mutation_hooks of each one-shot module are appended to this hook list.

More specifically, each hook receives four arguments:

a module that might be processed,
the name of the module in its parent module,
a memo dict whose usage depends on the particular algorithm,
keyword arguments (configurations).

Note that the memo is meant to be read and written by the hooks themselves. No hooks are called on the root module.

The return value can be one of three kinds:

a tuple of BaseSuperNetModule or None, and a boolean,
a boolean,
a BaseSuperNetModule or None.

The boolean value, suppress, indicates whether the subsequent hooks should be skipped. When it is true, the subsequent hooks are never invoked. When no boolean is given, it is assumed to be false. If None appears in place of a BaseSuperNetModule, the hook suggests keeping the module unchanged, and nothing happens.

An example of a mutation hook is given in no_default_hook(). However, it is recommended to implement mutation hooks by deriving from BaseSuperNetModule and adding its classmethod mutate to this list.

Proxyless¶

class nni.retiarii.strategy.Proxyless(**kwargs)[source]¶

A low-memory-consuming optimized version of differentiable architecture search. See reference.

This is a DARTS-based method that resamples the architecture to reduce memory consumption. Essentially, it samples one path on forward, and implements its own backward to update the architecture parameters based on only one path.

The supported mutation primitives of Proxyless are:

Parameters

mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.

Mutation hooks are callables that take a Module and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules() on each submodule. For each submodule, the hooks are invoked in order, so later hooks can see the results of earlier ones. Modules processed by mutation_hooks are replaced by the returned module, stored in nas_modules, and become the focus of the NAS algorithm.

The default_mutation_hooks of each one-shot module are appended to this hook list.

More specifically, each hook receives four arguments:
1. a module that might be processed,
2. the name of the module in its parent module,
3. a memo dict whose usage depends on the particular algorithm,
4. keyword arguments (configurations).
Note that the memo is meant to be read and written by the hooks themselves. No hooks are called on the root module.

The return value can be one of three kinds:
1. a tuple of BaseSuperNetModule or None, and a boolean,
2. a boolean,
3. a BaseSuperNetModule or None.
The boolean value, suppress, indicates whether the subsequent hooks should be skipped. When it is true, the subsequent hooks are never invoked. When no boolean is given, it is assumed to be false. If None appears in place of a BaseSuperNetModule, the hook suggests keeping the module unchanged, and nothing happens.

An example of a mutation hook is given in no_default_hook(). However, it is recommended to implement mutation hooks by deriving from BaseSuperNetModule and adding its classmethod mutate to this list.
arc_learning_rate (float) – Learning rate for architecture optimizer. Default: 3.0e-4

Customization¶

Multi-trial¶

class nni.retiarii.Sampler[source]: Handles Mutator.choice() calls.

class nni.retiarii.strategy.BaseStrategy[source]¶

nni.retiarii.execution.budget_exhausted()[source]¶

nni.retiarii.execution.get_and_register_default_listener(engine)[source]¶

nni.retiarii.execution.get_execution_engine()[source]¶

nni.retiarii.execution.is_stopped_exec(model)[source]¶

nni.retiarii.execution.list_models(*models)[source]¶

nni.retiarii.execution.query_available_resources()[source]¶

nni.retiarii.execution.set_execution_engine(engine)[source]¶

nni.retiarii.execution.submit_models(*models)[source]¶

nni.retiarii.execution.wait_models(*models)[source]¶

One-shot¶

base_lightning¶

class nni.retiarii.oneshot.pytorch.base_lightning.BaseOneShotLightningModule(model, mutation_hooks=None)[source]¶

The base class for all one-shot NAS modules.

In NNI, we try to separate the “search” part and “training” part in one-shot NAS. The “training” part is defined with evaluator interface (has to be lightning evaluator interface to work with oneshot). Since the lightning evaluator has already broken down the training into minimal building blocks, we can re-assemble them after combining them with the “search” part of a particular algorithm.

After the re-assembling, this module has defined all the search + training. The experiment can use a lightning trainer (which is another part in the evaluator) to train this module, so as to complete the search process.

Essential functions, such as preprocessing the user’s model, redirecting lightning hooks for the user’s model, configuring optimizers, and exporting the NAS result, are implemented in this class.

nas_modules¶

Modules that have been mutated, which the search algorithms should care about.

Type: list[BaseSuperNetModule]

model¶

PyTorch lightning module. A model space with training recipe defined (wrapped by LightningModule in evaluator).

Type: pl.LightningModule

Parameters

model (pytorch_lightning.LightningModule) – A LightningModule that defines computations, train/val loops, and optimizers in a single class. When used in NNI, the model is the combination of instances of the evaluator and the base model (to be precise, a base model wrapped with LightningModule in the evaluator).
mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.

Mutation hooks are callables that take a Module and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules() on each submodule. For each submodule, the hooks are invoked in order, so later hooks can see the results of earlier ones. Modules processed by mutation_hooks are replaced by the returned module, stored in nas_modules, and become the focus of the NAS algorithm.

The default_mutation_hooks of each one-shot module are appended to this hook list.

More specifically, each hook receives four arguments:
1. a module that might be processed,
2. the name of the module in its parent module,
3. a memo dict whose usage depends on the particular algorithm,
4. keyword arguments (configurations).
Note that the memo is meant to be read and written by the hooks themselves. No hooks are called on the root module.

The return value can be one of three kinds:
1. a tuple of BaseSuperNetModule or None, and a boolean,
2. a boolean,
3. a BaseSuperNetModule or None.
The boolean value, suppress, indicates whether the subsequent hooks should be skipped. When it is true, the subsequent hooks are never invoked. When no boolean is given, it is assumed to be false. If None appears in place of a BaseSuperNetModule, the hook suggests keeping the module unchanged, and nothing happens.

An example of a mutation hook is given in no_default_hook(). However, it is recommended to implement mutation hooks by deriving from BaseSuperNetModule and adding its classmethod mutate to this list.

architecture_optimizers()[source]¶

Get architecture optimizers from all optimizers. Use this to get your architecture optimizers in training_step().

Returns: opts – Architecture optimizers defined in configure_architecture_optimizers(). This will be None if there are no architecture optimizers.
Return type: list[Optimizer], Optimizer, None

call_lr_schedulers(batch_index)[source]¶

Function that imitates the lightning trainer’s behavior of calling the user’s lr schedulers. Since automatic_optimization is turned off by this class, you can use this function to make schedulers behave as if they were automatically handled by the lightning trainer.

Parameters: batch_index (int) – batch index

call_weight_optimizers(method)[source]¶

Function that imitates the lightning trainer’s behavior of calling the user’s optimizers. Since automatic_optimization is turned off by this class, you can use this function to make user optimizers behave as if they were automatically handled by the lightning trainer.

Parameters: method (str) – Method to call. Only step and zero_grad are supported now.

configure_architecture_optimizers()[source]¶

Hook kept for subclasses. A specific NAS method inheriting this base class should return its architecture optimizers here if architecture parameters are needed. Note that lr schedulers are not supported now for architecture_optimizers.

Returns: arc_optimizers – Optimizers used by a specific NAS algorithm. Return None if no architecture optimizers are needed.
Return type: list[Optimizer], Optimizer

configure_optimizers()[source]¶

Combine architecture optimizers and the user’s model optimizers. You can override configure_architecture_optimizers() if architecture optimizers are needed in your NAS algorithm.

For now, the model is tested against evaluators in nni.retiarii.evaluator.pytorch.lightning, where it returns only one optimizer. For extensibility, code for other return value types is also implemented.

default_mutation_hooks()[source]¶

Override this to define class-default mutation hooks.

export()[source]¶

Export the NAS result, ideally the best choice for each of the nas_modules. You may implement an export method for your customized nas_modules.

Returns: Keys are names of nas_modules, and values are the choice indices of them.
Return type: dict

mutate_kwargs()[source]¶

Extra keyword arguments passed to mutation hooks. Usually algo-specific.

resample()[source]¶

Trigger a resample for each of the nas_modules. Sometimes (e.g., in differentiable cases), it does nothing.

Returns: Sampled architecture.
Return type: dict

search_space_spec()[source]¶

Get the search space specification from nas_modules.

Returns: Key is the name of the choice, value is the corresponding ParameterSpec.
Return type: dict

training_step(batch, batch_idx)[source]¶: This is the implementation of what happens in training loops of one-shot algos. It usually calls self.model.training_step which implements the real training recipe of the users’ model.

weight_optimizers()[source]¶

Get user optimizers from all optimizers. Use this to get user optimizers in training_step().

Returns: opts – Optimizers defined by the user’s model. This will be None if there are no user optimizers.
Return type: list[Optimizer], Optimizer, None

class nni.retiarii.oneshot.pytorch.base_lightning.BaseSuperNetModule[source]¶

Mutated module in super-net. Usually, the feed-forward of the module itself is undefined. It has to be resampled with resample() so that a specific path is selected. (Sometimes this is not required, for example in a differentiable super-net.)

A super-net module usually corresponds to one sample, with two exceptions:

A module can have multiple parameter specs. For example, a convolution-2d can sample kernel size and channels at the same time.
Multiple modules can share one parameter spec. For example, multiple layer choices with the same label.

For value choice compositions, the parameter specs are bound to the underlying (original) value choices, rather than their compositions.

export(memo)[source]¶

Export the final architecture within this module. It should have the same keys as search_space_spec().

Parameters: memo (dict[str, Any]) – Use memo to avoid exporting the same label multiple times.

classmethod mutate(module, name, memo, mutate_kwargs)[source]¶

This is a mutation hook that creates a BaseSuperNetModule. The method should be implemented in each specific super-net module, because they usually have specific rules about what kind of modules to operate on.

Parameters

module (nn.Module) – The module to be mutated (replaced).
name (str) – Name of this module. With full prefix. For example, module1.block1.conv.
memo (dict) – Memo to enable sharing parameters among mutated modules. It should be read and written by mutate functions themselves.
mutate_kwargs (dict) – Algo-related hyper-parameters, and some auxiliary information.

Returns

The mutation result, along with an optional boolean flag indicating whether to suppress follow-up mutation hooks. See BaseOneShotLightningModule for details.

Return type

Union[BaseSuperNetModule, bool, tuple[BaseSuperNetModule, bool]]

resample(memo)[source]¶

Resample the super-net module.

Parameters: memo (dict[str, Any]) – Used to ensure the consistency of samples with the same label.
Returns: Sampled result. If nothing new is sampled, it should return an empty dict.
Return type: dict
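The role of memo in resample() can be shown with a toy sketch in pure Python. ToyChoice and resample_all are hypothetical stand-ins, not NNI classes. The sketch assumes that the caller accumulates the returned dicts into the memo it passes to the next module.

```python
import random


class ToyChoice:
    """Stand-in for a super-net module that holds one labelled choice."""

    def __init__(self, label, candidates, rng):
        self.label = label
        self.candidates = candidates
        self.rng = rng
        self.selected = None

    def resample(self, memo):
        if self.label in memo:
            # Another module with the same label was sampled already: reuse it.
            self.selected = memo[self.label]
            return {}  # nothing new was sampled
        self.selected = self.rng.choice(self.candidates)
        return {self.label: self.selected}


def resample_all(modules):
    """Resample every module, sharing one memo so equal labels agree."""
    memo = {}
    for module in modules:
        memo.update(module.resample(memo))
    return memo
```

Two modules that share a label therefore always end up with the same selection, and the label appears only once in the sampled architecture.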

search_space_spec()[source]¶

Space specification (sample points). Mapping from spec name to ParameterSpec. The names in choices should be in the same format as in export.

For example:

{"layer1": ParameterSpec(values=["conv", "pool"])}

nni.retiarii.oneshot.pytorch.base_lightning.no_default_hook(module, name, memo, mutate_kwargs)[source]¶

Add this hook at the end of your hook list to raise an error for unsupported mutation primitives.

nni.retiarii.oneshot.pytorch.base_lightning.traverse_and_mutate_submodules(root_module, hooks, mutate_kwargs, topdown=True)[source]¶

Traverse the module-tree of root_module, and call hooks on every tree node.

Parameters

root_module (nn.Module) – User-defined model space. Since this method is called in the __init__ of BaseOneShotLightningModule, it’s usually a pytorch_lightning.LightningModule. The mutation will be in-place on root_module.
hooks (list[MutationHook]) – List of mutation hooks. See BaseOneShotLightningModule for how to write hooks. When a hook returns a module, the original module is replaced (mutated) with the new module.
mutate_kwargs (dict) – Extra keyword arguments passed to hooks.
topdown (bool, default = True) – If topdown is true, hooks are called on a node first, before traversing its sub-modules (i.e., pre-order DFS). Otherwise, sub-modules are traversed first, before calling hooks on this node (i.e., post-order DFS).

Returns

modules – The replacement result.

Return type

dict[str, nn.Module]
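The difference between the two traversal orders selected by topdown can be seen on a toy tree. The Node class and the traverse function below are hypothetical stand-ins written in pure Python. They only record the visiting order and do not replace any module.

```python
class Node:
    """Minimal stand-in for a module with named children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def traverse(node, visit, topdown=True, prefix=""):
    """Call `visit` on every descendant of `node`, but never on `node` itself."""
    for child in node.children:
        full_name = prefix + child.name
        if topdown:
            visit(full_name)            # pre-order: parent before its children
        traverse(child, visit, topdown, prefix=full_name + ".")
        if not topdown:
            visit(full_name)            # post-order: children before the parent
```

For a root with children block (containing conv and relu) and head, pre-order visits block, block.conv, block.relu, head, while post-order visits block.conv, block.relu, block, head.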

dataloader¶

class nni.retiarii.oneshot.pytorch.dataloader.ConcatLoader(loaders, mode='min_size')[source]¶

This loader is the same as CombinedLoader in PyTorch-Lightning, but it concatenates sub-loaders instead of loading them in parallel.

Parameters

loaders (dict[str, Any]) –
For example,
```
{
    "train": DataLoader(train_dataset),
    "val": DataLoader(val_dataset)
}
```
In this example, the loader will first produce the batches from “train”, then “val”.
mode (str) – Only “min_size” is supported for now.
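The concatenation order can be pictured with a minimal generator in pure Python. Plain lists stand in for DataLoader objects, and tagging each batch with the name of its loader is an illustrative choice here, not a statement about what ConcatLoader yields.

```python
def concat_loaders(loaders):
    """Yield all batches of the first loader, then the second, and so on."""
    for name, loader in loaders.items():  # dicts preserve insertion order
        for batch in loader:
            yield batch, name
```

For example, list(concat_loaders({"train": [1, 2], "val": [3]})) exhausts "train" before producing anything from "val".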

supermodule.differentiable¶

class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedCell(op_factory, num_nodes, num_ops_per_node, num_predecessors, preprocessor, postprocessor, concat_dim, memo, mutate_kwargs, label)[source]¶

Implementation of Cell under differentiable context.

An architecture parameter is created on each edge of the fully connected graph.

export(memo)[source]¶

Tricky export.

Reference: https://github.com/quark0/darts/blob/f276dd346a09ae3160f8e3aca5c7b193fda1da37/cnn/model_search.py#L135

We don’t avoid selecting operations like none here, because it looks like a different search space.

resample(memo)[source]¶: Differentiable doesn’t need to resample.

class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedInput(n_candidates, n_chosen, alpha, softmax, label)[source]¶

Mixed input. Forward returns a weighted sum of candidates. Implementation is very similar to DifferentiableMixedLayer.

Parameters

n_candidates (int) – Expected number of input candidates.
n_chosen (int) – Expected number of inputs finally chosen.
alpha (Tensor) – Tensor that stores the “learnable” weights.
softmax (nn.Module) – Customizable softmax function. Usually nn.Softmax(-1).
label (str) – Name of the choice.

label¶

Name of the choice.

Type: str

export(memo)[source]¶: Choose the operator with the top n_chosen logits.
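A minimal pure-Python sketch of selecting the top n_chosen logits follows. The function name export_top_inputs is hypothetical, a list of floats stands in for the alpha tensor, and returning the indices in ascending order is an assumption.

```python
def export_top_inputs(alpha, n_chosen):
    """Return the indices of the n_chosen largest logits, in ascending order."""
    ranked = sorted(range(len(alpha)), key=lambda i: alpha[i], reverse=True)
    return sorted(ranked[:n_chosen])
```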

forward(inputs)[source]¶: Forward takes a list of input candidates.

named_parameters(*args, **kwargs)[source]¶: Named parameters excluding architecture parameters.

parameters(*args, **kwargs)[source]¶: Parameters excluding architecture parameters.

reduction(items, weights)[source]¶

Override this for customized reduction.

resample(memo)[source]¶: Do nothing. Differentiable layer doesn’t need resample.

class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedLayer(paths, alpha, softmax, label)[source]¶

Mixed layer, in which fprop is decided by a weighted sum of several layers. Proposed in DARTS: Differentiable Architecture Search.

The weight alpha is usually learnable, and optimized on validation dataset.

The differentiable sampling layer requires all operators to return the same shape for a given input, because all outputs are combined in a weighted sum to produce the final output.
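The weighted sum described above can be sketched without any framework. The names softmax and mixed_forward are illustrative helpers, not part of the library, and scalar inputs stand in for tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_forward(paths, alpha, x):
    """Run every candidate on x and combine the outputs with softmax(alpha)."""
    weights = softmax(alpha)
    outputs = [op(x) for _, op in paths]
    return sum(w * y for w, y in zip(weights, outputs))


# Each path is a (name, callable) pair, mirroring the paths parameter.
paths = [
    ("identity", lambda x: x),
    ("double", lambda x: 2 * x),
    ("zero", lambda x: 0.0),
]
alpha = [0.0, 0.0, 0.0]  # equal logits give equal weights of 1/3
result = mixed_forward(paths, alpha, 3.0)
```

With equal logits, the result is the plain average of the three candidate outputs. The shape requirement follows directly: the sum is only defined when every candidate returns outputs of the same shape.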

Parameters

paths (list[tuple[str, nn.Module]]) – Layers to choose from. Each is a tuple of name, and its module.
alpha (Tensor) – Tensor that stores the “learnable” weights.
softmax (nn.Module) – Customizable softmax function. Usually nn.Softmax(-1).
label (str) – Name of the choice.

op_names¶

Operator names.

Type: str

label¶

Name of the choice.

Type: str

export(memo)[source]¶: Choose the operator with the maximum logit.

forward(*args, **kwargs)[source]¶: The forward of mixed layer accepts same arguments as its sub-layer.

named_parameters(*args, **kwargs)[source]¶: Named parameters excluding architecture parameters.

parameters(*args, **kwargs)[source]¶: Parameters excluding architecture parameters.

reduction(items, weights)[source]¶

Override this for customized reduction.

resample(memo)[source]¶: Do nothing. Differentiable layer doesn’t need resample.

class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedRepeat(blocks, depth, softmax, memo)[source]¶

Implementation of Repeat in a differentiable supernet. Result is a weighted sum of possible prefixes, sliced by possible depths.

If the output is not a single tensor, it will be summed at every independent dimension. See weighted_sum() for details.
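A minimal sketch of the "weighted sum of prefixes" idea, assuming scalar outputs and a hypothetical helper repeat_forward. The weights are taken as given here; in the real module they would come from the architecture parameters.

```python
def repeat_forward(blocks, depths, weights, x):
    """Weighted sum of the outputs obtained after each candidate depth."""
    prefix_outputs = {}
    out = x
    for i, block in enumerate(blocks, start=1):
        out = block(out)
        if i in depths:
            # Output of the prefix made of the first i blocks.
            prefix_outputs[i] = out
    return sum(w * prefix_outputs[d] for d, w in zip(depths, weights))


blocks = [lambda v: v + 1] * 3   # three identical toy blocks
depths = [1, 2, 3]               # candidate depths
weights = [0.5, 0.25, 0.25]      # assumed already normalized
result = repeat_forward(blocks, depths, weights, 0.0)
```

The prefixes produce 1, 2 and 3 respectively, so the result is 0.5 * 1 + 0.25 * 2 + 0.25 * 3.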

export(memo)[source]¶: Choose argmax for each leaf value choice.

reduction(items, weights, depths)[source]¶

Override this for customized reduction.

resample(memo)[source]¶: Do nothing.

class nni.retiarii.oneshot.pytorch.supermodule.differentiable.MixedOpDifferentiablePolicy(operation, memo, mutate_kwargs)[source]¶

Implements the differentiable sampling in mixed operation.

One mixed operation can have multiple value choices in its arguments. Thus the _arch_alpha here is a parameter dict, and named_parameters filters out multiple parameters with _arch_alpha as its prefix.

When this class is asked for forward_argument, it returns a distribution, i.e., a dict from int to float based on its weights.
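To make the returned distribution concrete, here is a small sketch. It assumes the dict keys are the candidate values of the argument and that the weights are a softmax over one architecture parameter; the function name distribution is hypothetical.

```python
import math


def distribution(candidates, alpha):
    """Map each integer candidate to its softmax weight."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}


# Three candidate channel counts; the last logit is log(2) larger.
dist = distribution([16, 32, 64], [0.0, 0.0, math.log(2.0)])
```

In this example the largest candidate receives half of the probability mass and the other two share the rest equally.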

All the parameters (_arch_alpha, parameters(), _softmax) are saved as attributes of operation, rather than self, because this class itself is not a nn.Module, and saved parameters here won’t be optimized.

export(operation, memo)[source]¶

Export is argmax for each leaf value choice.

resample(operation, memo)[source]¶

Differentiable. Do nothing in resample.

supermodule.sampling¶

class nni.retiarii.oneshot.pytorch.supermodule.sampling.MixedOpPathSamplingPolicy(operation, memo, mutate_kwargs)[source]¶

Implements the path sampling in mixed operation.

One mixed operation can have multiple value choices in its arguments. Each value choice can be further decomposed into “leaf value choices”. We sample the leaf nodes, and compose them into the values on arguments.

export(operation, memo)[source]¶

Export is also random for each leaf value choice.

resample(operation, memo)[source]¶

Random sample for each leaf value choice.

class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingCell(op_factory, num_nodes, num_ops_per_node, num_predecessors, preprocessor, postprocessor, concat_dim, memo, mutate_kwargs, label)[source]¶

The implementation of super-net cell follows DARTS.

When factory_used is true, it reconstructs the cell for every possible combination of operation and input index, because for different input indices, the cell factory could instantiate different operations (e.g., with different stride). On export, we first have the best (operation, input) pairs, then select the best num_ops_per_node.

loose_end is not supported yet, because it will cause more problems (e.g., shape mismatch). We assume loose_end to be all regardless of its configuration.

A supernet cell can’t slim its own weight to fit into a sub network, which is also a known issue.

export(memo)[source]¶: Randomly choose one to export.

resample(memo)[source]¶: Randomly choose one path if label is not found in memo.

class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingInput(n_candidates, n_chosen, reduction_type, label)[source]¶

Mixed input. Takes a list of tensors as input, selects some of them and returns the sum.

_sampled¶

Sampled input indices.

Type: int or list of int

export(memo)[source]¶: Randomly choose one name if label isn’t found in memo.

reduction(items, sampled)[source]¶

Override this to implement customized reduction.

resample(memo)[source]¶: Randomly choose one path or multiple paths if label is not found in memo. If one path is selected, only one integer will be in self._sampled. If multiple paths are selected, a list will be in self._sampled.
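The int-versus-list behaviour described for resample can be sketched as follows. The function resample_input and its signature are hypothetical, and sorting the multi-path result is an assumption made only to keep the example tidy.

```python
import random


def resample_input(n_candidates, n_chosen, label, memo, rng=random):
    """Return the sampled input index (int) or indices (list of int)."""
    if label in memo:
        # A decision already recorded under this label is reused as-is.
        return memo[label]
    if n_chosen == 1:
        return rng.randrange(n_candidates)
    # Sample distinct indices without replacement.
    return sorted(rng.sample(range(n_candidates), n_chosen))


rng = random.Random(0)
single = resample_input(4, 1, "in", {}, rng)
multi = resample_input(4, 2, "in", {}, rng)
fixed = resample_input(4, 2, "in", {"in": [0, 3]}, rng)
```

A single integer is produced when one path is requested, a list when several are, and an existing entry in memo short-circuits the sampling.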

class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingLayer(paths, label)[source]¶

Mixed layer, in which fprop is decided by exactly one inner layer or sum of multiple (sampled) layers. If multiple modules are selected, the result will be summed and returned.
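A framework-free sketch of this forward rule, assuming candidates are looked up by name and that scalars stand in for tensors. The function path_sampling_forward is hypothetical, and plain summation is used as the default reduction.

```python
def path_sampling_forward(paths, sampled, x):
    """Run only the sampled candidate(s); sum the results if several."""
    if isinstance(sampled, (list, tuple)):
        return sum(paths[name](x) for name in sampled)
    return paths[sampled](x)


paths = {"inc": lambda v: v + 1, "double": lambda v: v * 2}
one = path_sampling_forward(paths, "inc", 3)
both = path_sampling_forward(paths, ["inc", "double"], 3)
```

Unlike the differentiable layer, candidates that were not sampled are never executed.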

_sampled¶

Sampled module indices.

Type: int or list of str

label¶

Name of the choice.

Type: str

export(memo)[source]¶: Randomly choose one name if label isn’t found in memo.

reduction(items, sampled)[source]¶

Override this to implement customized reduction.

resample(memo)[source]¶: Randomly choose one path if label is not found in memo.

class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingRepeat(blocks, depth)[source]¶

Implementation of Repeat in a path-sampling supernet. Samples one or some of the prefixes of the repeated blocks.

_sampled¶

Sampled depth.

Type: int or list of int

export(memo)[source]¶: Randomly choose one if none of the choices is in memo.

reduction(items, sampled)[source]¶

Override this to implement customized reduction.

resample(memo)[source]¶: Since depth is based on ValueChoice, we only need to randomly sample each leaf value choice.

supermodule.proxyless¶

Implementation of ProxylessNAS: a hybrid approach between differentiable and sampling. The support remains limited. Known limitations include:

No support for multiple arguments in forward.
No support for mixed-operation (value choice).
The code contains duplicates and needs refactoring.

class nni.retiarii.oneshot.pytorch.supermodule.proxyless.ProxylessMixedInput(n_candidates, n_chosen, alpha, softmax, label)[source]¶

Proxyless version of differentiable input choice. See ProxylessLayerChoice for implementation details.

export(memo)[source]¶: Choose the argmax if label isn’t found in memo.

resample(memo)[source]¶: Sample one path based on alpha if label is not found in memo.

class nni.retiarii.oneshot.pytorch.supermodule.proxyless.ProxylessMixedLayer(paths, alpha, softmax, label)[source]¶

Proxyless version of differentiable mixed layer. It resamples a single path every time, rather than going through the softmax.

export(memo)[source]¶: Choose the argmax if label isn’t found in memo.

resample(memo)[source]¶: Sample one path based on alpha if label is not found in memo.
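The difference between resample (sample one path according to alpha) and export (take the argmax) can be sketched as below. Both helpers are hypothetical, and treating softmax(alpha) as the sampling distribution is an assumption about what "based on alpha" means.

```python
import math
import random


def sample_path(alpha, rng):
    """Draw one path index with probability proportional to exp(alpha)."""
    m = max(alpha)
    weights = [math.exp(a - m) for a in alpha]
    return rng.choices(range(len(alpha)), weights=weights, k=1)[0]


def export_path(alpha):
    """Pick the index of the largest logit."""
    return max(range(len(alpha)), key=lambda i: alpha[i])


alpha = [0.2, 1.5, -0.3]
rng = random.Random(0)
sampled = sample_path(alpha, rng)   # stochastic, changes between calls
best = export_path(alpha)           # deterministic
```

Only the sampled path would be evaluated in a forward pass, which is what keeps the memory cost at that of a single path.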

supermodule.operation¶

Operations that support weight sharing at a fine-grained level, which is commonly known as super-kernel (as in channel search), or weight entanglement.

class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedBatchNorm2d(module_kwargs)[source]¶

Mixed BatchNorm2d operation.

Supported arguments are:

num_features
eps (only supported in path sampling)
momentum (only supported in path sampling)

For path-sampling, prefix of weight, bias, running_mean and running_var are sliced. For weighted cases, the maximum num_features is used directly.
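Prefix slicing for the path-sampling case can be illustrated with plain lists standing in for the per-channel tensors. The helper slice_batchnorm_state is hypothetical.

```python
def slice_batchnorm_state(state, num_features):
    """Keep the first num_features entries of every per-channel vector."""
    return {name: values[:num_features] for name, values in state.items()}


# Statistics allocated for the largest candidate (4 channels).
full_state = {
    "weight": [1.0, 1.0, 1.0, 1.0],
    "bias": [0.0, 0.0, 0.0, 0.0],
    "running_mean": [0.1, 0.2, 0.3, 0.4],
    "running_var": [1.0, 1.1, 1.2, 1.3],
}
sampled_state = slice_batchnorm_state(full_state, num_features=2)
```

All four vectors are cut to the same length, so the sliced module behaves like a BatchNorm2d with the sampled num_features.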

Momentum is required to be float. PyTorch BatchNorm supports a case where momentum can be none, which is not supported here.

bound_type¶: alias of torch.nn.modules.batchnorm.BatchNorm2d

class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedConv2d(module_kwargs)[source]¶

Mixed conv2d op.

Supported arguments are:

in_channels
out_channels
groups
stride (only supported in path sampling)
kernel_size
padding
dilation (only supported in path sampling)

padding will be the “max” padding in differentiable mode.

Mutable groups is NOT supported in most cases of differentiable mode. However, we do support one special case when the group number is proportional to in_channels and out_channels. This is often the case of depth-wise convolutions.

For channels, prefix will be sliced. For kernels, we take the smaller kernel from the center, rounding the offset down (toward the top left). For example:

max_kernel = 5*5, sampled_kernel = 3*3, then we take [1: 4]
max_kernel = 5*5, sampled_kernel = 2*2, then we take [1: 3]
□ □ □ □ □   □ □ □ □ □
□ ■ ■ ■ □   □ ■ ■ □ □
□ ■ ■ ■ □   □ ■ ■ □ □
□ ■ ■ ■ □   □ □ □ □ □
□ □ □ □ □   □ □ □ □ □
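The index arithmetic behind the two cases above can be sketched as follows. The helpers center_slice and crop_kernel are hypothetical, and nested lists stand in for the two spatial dimensions of a kernel.

```python
def center_slice(max_kernel, sampled_kernel):
    """Start and end of the centered window; the offset is floored."""
    start = (max_kernel - sampled_kernel) // 2
    return start, start + sampled_kernel


def crop_kernel(weight, sampled_kernel):
    """Crop a square 2-D kernel to sampled_kernel x sampled_kernel."""
    lo, hi = center_slice(len(weight), sampled_kernel)
    return [row[lo:hi] for row in weight[lo:hi]]


# A 5x5 kernel whose entries encode their position (row * 5 + col).
full = [[r * 5 + c for c in range(5)] for r in range(5)]
crop3 = crop_kernel(full, 3)   # rows and columns 1..3
crop2 = crop_kernel(full, 2)   # rows and columns 1..2
```

For an even kernel inside an odd one, the floor in the offset is what shifts the window toward the top left, matching the right-hand diagram.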

bound_type¶: alias of torch.nn.modules.conv.Conv2d

class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedLinear(module_kwargs)[source]¶

Mixed linear operation.

Supported arguments are:

in_features
out_features

Prefix of weight and bias will be sliced.
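A small sketch of this slicing, using nested lists in the (out_features, in_features) layout that torch.nn.Linear uses for its weight. The helper slice_linear is hypothetical.

```python
def slice_linear(weight, bias, in_features, out_features):
    """Keep the first out_features rows and first in_features columns."""
    sliced_weight = [row[:in_features] for row in weight[:out_features]]
    sliced_bias = bias[:out_features]
    return sliced_weight, sliced_bias


# Largest candidate: 3 output features, 4 input features.
weight = [[10 * r + c for c in range(4)] for r in range(3)]
bias = [0.1, 0.2, 0.3]
w, b = slice_linear(weight, bias, in_features=2, out_features=2)
```

Smaller candidates therefore share the top-left corner of the largest weight matrix.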

bound_type¶: alias of torch.nn.modules.linear.Linear

class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedMultiHeadAttention(module_kwargs)[source]¶

Mixed multi-head attention.

Supported arguments are:

embed_dim
num_heads (only supported in path sampling)
kdim
vdim
dropout (only supported in path sampling)

At init, it constructs the largest possible Q, K, V dimension. At forward, it slices the prefix to weight matrices according to the sampled value. For in_proj_bias and in_proj_weight, three parts will be sliced and concatenated together: [0, embed_dim), [max_embed_dim, max_embed_dim + embed_dim), [max_embed_dim * 2, max_embed_dim * 2 + embed_dim).
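The three index ranges listed above can be generated with a short helper. The name in_proj_indices is hypothetical; the ranges themselves are exactly the ones stated in the text.

```python
def in_proj_indices(embed_dim, max_embed_dim):
    """Row indices kept from the stacked Q, K, V projection."""
    indices = []
    for part in range(3):  # Q, K and V blocks, each max_embed_dim rows
        start = part * max_embed_dim
        indices.extend(range(start, start + embed_dim))
    return indices


# Largest embed_dim is 4, sampled embed_dim is 2.
rows = in_proj_indices(embed_dim=2, max_embed_dim=4)
```

Each block contributes its own prefix, so the kept rows are not one contiguous range unless embed_dim equals max_embed_dim.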

Warning

All candidates of embed_dim should be divisible by all candidates of num_heads.
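A search space can be checked against this constraint before it is used. The helper below is a hypothetical pre-flight check, not a library function.

```python
def find_indivisible(embed_dims, num_heads):
    """Return every (embed_dim, num_heads) pair that violates divisibility."""
    return [(e, h) for e in embed_dims for h in num_heads if e % h != 0]


ok = find_indivisible([64, 128], [2, 4, 8])
bad = find_indivisible([60, 128], [4, 8])
```

An empty list means every combination of candidates is valid; any returned pair identifies a combination that would fail when sampled together.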

bound_type¶: alias of torch.nn.modules.activation.MultiheadAttention

class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedOperation(module_kwargs)[source]¶

This is the base class for all mixed operations. It’s what you should inherit to support a new operation with ValueChoice.

It contains commonly used utilities that will ease the effort to write customized mixed operations, i.e., operations with ValueChoice in their arguments. To customize, please write your own mixed operation, and add the hook to the mutation_hooks parameter when using the strategy.

By design, for a mixed operation to work in a specific algorithm, at least two classes are needed.

One class needs to inherit this class, to control operation-related behavior, such as how to initialize the operation such that the sampled operation can be its sub-operation.
The other one needs to inherit MixedOperationSamplingPolicy, which controls algo-related behavior, such as sampling.

The two classes are linked with sampling_policy attribute in MixedOperation, whose type is set via mixed_op_sampling in mutate_kwargs when MixedOperation.mutate() is called.

With this design, one mixed-operation (e.g., MixedConv2d) can work in multiple algorithms (e.g., both DARTS and ENAS), saving the engineering effort to rewrite all operations for each specific algo.

This class should also define a bound_type, to control the matching type in mutate, and an argument_list, to control which arguments can be dynamically used in forward. This list will also be used in mutate for sanity check.
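The two-class split described above can be illustrated with a toy example that does not depend on the library. ToySamplingPolicy and ToyMixedLinear are hypothetical stand-ins: they do not inherit from MixedOperationSamplingPolicy or MixedOperation, and their constructor signatures are simplified assumptions.

```python
import random


class ToySamplingPolicy:
    """Algorithm side: decides which candidate value each argument takes."""

    def __init__(self, candidates, seed=0):
        self.candidates = candidates
        self.rng = random.Random(seed)
        self.sampled = {}

    def resample(self, memo):
        for name, options in self.candidates.items():
            # Reuse a decision from memo if present, else sample randomly.
            self.sampled[name] = memo.get(name, self.rng.choice(options))
        return dict(self.sampled)

    def forward_argument(self, name):
        return self.sampled[name]


class ToyMixedLinear:
    """Operation side: allocates the largest weight, slices it on forward."""

    argument_list = ["in_features", "out_features"]

    def __init__(self, candidates, policy_cls):
        largest = {k: max(v) for k, v in candidates.items()}
        self.weight = [
            [1.0] * largest["in_features"]
            for _ in range(largest["out_features"])
        ]
        self.sampling_policy = policy_cls(candidates)

    def resample(self, memo):
        return self.sampling_policy.resample(memo)

    def forward(self, x):
        args = {
            name: self.sampling_policy.forward_argument(name)
            for name in self.argument_list
        }
        return self.forward_with_args(x, **args)

    def forward_with_args(self, x, in_features, out_features):
        rows = self.weight[:out_features]
        return [sum(w * v for w, v in zip(r[:in_features], x)) for r in rows]


op = ToyMixedLinear(
    {"in_features": [2, 4], "out_features": [1, 3]}, ToySamplingPolicy
)
op.resample({"in_features": 2, "out_features": 3})
y = op.forward([1.0, 2.0])
```

Swapping ToySamplingPolicy for a different policy class changes how arguments are chosen without touching the slicing logic, which is the reuse the design is aiming for.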

export(memo)[source]¶: Delegates to MixedOperationSamplingPolicy.export().

forward(*args, **kwargs)[source]¶: First get sampled arguments, then forward with the sampled arguments (by calling forward_with_args).

forward_argument(name)[source]¶

Get the argument used in forward. This is often related to algo. We redirect this to sampling policy.

forward_with_args(*args, **kwargs)[source]¶: To control real fprop. The accepted arguments are argument_list, appended by forward arguments in the bound_type.

classmethod mutate(module, name, memo, mutate_kwargs)[source]¶: Find value choice in module’s arguments and replace the whole module.

resample(memo)[source]¶: Delegates to MixedOperationSamplingPolicy.resample().

super_init_argument(name, value_choice)[source]¶

Get the initialization argument when constructing super-kernel, i.e., calling super().__init__(). This is often related to specific operator, rather than algo.

For example:

def super_init_argument(self, name, value_choice):
    return max(value_choice.candidates)

class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedOperationSamplingPolicy(operation, memo, mutate_kwargs)[source]¶

Algo-related part for mixed Operation.

MixedOperation delegates its resample and export to this policy (or its subclass), so that one Operation can be easily combined with different kinds of sampling.

One SamplingStrategy corresponds to one mixed operation.

export(operation, memo)[source]¶

The handler of MixedOperation.export().

forward_argument(operation, name)[source]¶

Computing the argument with name used in operation’s forward. Usually a value, or a distribution of value.

resample(operation, memo)[source]¶

The handler of MixedOperation.resample().