Strategy¶
Multi-trial Strategy¶
Random¶
- class nni.retiarii.strategy.Random(variational=False, dedup=True, model_filter=None)[source]¶
Random search on the search space.
- Parameters
variational (bool) – Do not dry run to get the full search space. Used when the search space has variational size or candidates. Default: false.
dedup (bool) – Do not try the same configuration twice. When variational is true, deduplication is not supported. Default: true.
model_filter (Callable[[Model], bool]) – Feed the model and return a bool. This will filter the models in search space and select which to submit.
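The interaction of dedup and model_filter can be illustrated with a small stand-alone sketch. It does not use NNI: the search space is a plain dict, and the names random_search, space and max_tries are hypothetical, introduced only for this illustration.

```python
import random
from typing import Callable, Dict, Iterator, List, Optional


def random_search(space: Dict[str, List],
                  dedup: bool = True,
                  model_filter: Optional[Callable[[dict], bool]] = None,
                  max_tries: int = 1000,
                  seed: int = 0) -> Iterator[dict]:
    """Yield random configurations, optionally deduplicated and filtered."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(max_tries):
        # Draw one candidate per choice, independently and uniformly.
        sample = {name: rng.choice(candidates) for name, candidates in space.items()}
        key = tuple(sorted(sample.items()))
        if dedup and key in seen:
            continue  # never try the same configuration twice
        seen.add(key)
        if model_filter is not None and not model_filter(sample):
            continue  # the filter decides which samples get submitted
        yield sample


space = {"kernel": [3, 5, 7], "width": [16, 32]}
picked = list(random_search(space, model_filter=lambda s: s["kernel"] != 7))
```

Here the filter rejects every configuration with a kernel size of 7, so at most four distinct configurations are produced.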
GridSearch¶
RegularizedEvolution¶
- class nni.retiarii.strategy.RegularizedEvolution(optimize_mode='maximize', population_size=100, sample_size=25, cycles=20000, mutation_prob=0.05, on_failure='ignore', model_filter=None)[source]¶
Algorithm for regularized evolution (i.e. aging evolution). Follows “Algorithm 1” in Real et al. “Regularized Evolution for Image Classifier Architecture Search”.
- Parameters
optimize_mode (str) – Can be one of “maximize” and “minimize”. Default: maximize.
population_size (int) – The number of individuals to keep in the population. Default: 100.
cycles (int) – The number of cycles (trials) the algorithm should run for. Default: 20000.
sample_size (int) – The number of individuals that should participate in each tournament. Default: 25.
mutation_prob (float) – Probability that mutation happens in each dimension. Default: 0.05.
on_failure (str) – Can be one of “ignore” and “worst”. If “ignore”, simply give up the model and find a new one. If “worst”, mark the model as -inf when maximizing (or inf when minimizing), so that the algorithm “learns” to avoid such models. Default: ignore.
model_filter (Callable[[Model], bool]) – Feed the model and return a bool. This will filter the models in search space and select which to submit.
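A minimal sketch of aging evolution may help to read the parameters above. It is plain Python, not the NNI implementation. It assumes maximization, treats cycles as the total number of evaluated models (as in Algorithm 1 of the paper), and uses a hypothetical dict-based search space and evaluate callback.

```python
import collections
import random


def regularized_evolution(space, evaluate, population_size=10, sample_size=3,
                          cycles=60, mutation_prob=0.5, seed=0):
    """Aging evolution over a dict-of-lists search space (maximizing)."""
    rng = random.Random(seed)
    population = collections.deque()  # oldest individual sits on the left
    history = []

    def mutate(parent):
        child = dict(parent)
        for name, candidates in space.items():
            # Each dimension mutates independently with mutation_prob.
            if rng.random() < mutation_prob:
                child[name] = rng.choice(candidates)
        return child

    # Warm-up: fill the population with random individuals.
    while len(population) < population_size:
        config = {name: rng.choice(c) for name, c in space.items()}
        individual = (config, evaluate(config))
        population.append(individual)
        history.append(individual)

    while len(history) < cycles:
        # Tournament: the best of a random sample becomes the parent.
        tournament = rng.sample(list(population), sample_size)
        parent = max(tournament, key=lambda ind: ind[1])
        child = mutate(parent[0])
        individual = (child, evaluate(child))
        population.append(individual)
        history.append(individual)
        population.popleft()  # aging: discard the oldest, not the worst

    return max(history, key=lambda ind: ind[1]), history


space = {"depth": [1, 2, 3, 4], "width": [8, 16, 32]}
best, history = regularized_evolution(space, lambda c: c["depth"] * c["width"])
```

Removing the oldest individual rather than the worst one is what distinguishes regularized (aging) evolution from plain tournament selection.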
TPE¶
- class nni.retiarii.strategy.TPE[source]¶
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach.
Find the details in Algorithms for Hyper-Parameter Optimization.
SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model.
PolicyBasedRL¶
- class nni.retiarii.strategy.PolicyBasedRL(max_collect=100, trial_per_collect=20, policy_fn=None)[source]¶
Algorithm for policy-based reinforcement learning. This is a wrapper of algorithms provided in tianshou (PPO by default), and can be easily customized with other algorithms that inherit BasePolicy (e.g., REINFORCE as in this paper).
- Parameters
max_collect (int) – How many times the collector runs to collect trials for RL. Default: 100.
trial_per_collect (int) – How many trials (trajectories) the collector collects each time. After each collect, the trainer samples a batch from the replay buffer and does the update. Default: 20.
policy_fn (function) – Takes ModelEvaluationEnv as input and returns a policy. See PolicyBasedRL._default_policy_fn() for an example.
One-shot Strategy¶
Note
The usage of one-shot has been refreshed in v2.8. Please see legacy one-shot trainers for the old-style one-shot strategies.
DARTS¶
- class nni.retiarii.strategy.DARTS(**kwargs)[source]¶
Continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Reference.
The DARTS algorithm is one of the most fundamental one-shot algorithms. DARTS repeats iterations, where each iteration consists of 2 training phases. Phase 1 is the architecture step, in which model parameters are frozen and the architecture parameters are trained. Phase 2 is the model step, in which architecture parameters are frozen and model parameters are trained.
The current implementation corresponds to DARTS (1st order) in the paper. Second order (unrolled 2nd-order derivatives) is not supported yet.
New in version 2.8: Supports searching for ValueChoices on operations, with the technique described in FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions. One difference is that, in DARTS, we are using Softmax instead of GumbelSoftmax.
The supported mutation primitives of DARTS are:
nni.retiarii.nn.pytorch.ValueChoice
(only when used in Linear, Conv2d, BatchNorm2d, MultiheadAttention).
- Parameters
mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.
Mutation hooks are callables that take a Module as input and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules(), on each submodule. For each submodule, the hooks in the list are invoked in order, and later hooks can see the results of previous hooks. The modules that are processed by mutation_hooks will be replaced by the returned module, stored in nas_modules, and be the focus of the NAS algorithm.
The default_mutation_hooks of each one-shot module are appended to the hook list.
To be more specific, the input arguments are four arguments:
a module that might be processed,
name of the module in its parent module,
a memo dict whose usage depends on the particular algorithm,
keyword arguments (configurations).
Note that the memo should be read/written by hooks. There won’t be any hooks called on the root module.
The returned value can be one of three kinds:
a tuple of BaseSuperNetModule or None, and a boolean,
a boolean,
BaseSuperNetModule or None.
The boolean value is suppress, which indicates whether the following hooks should be skipped. When it’s true, it suppresses the subsequent hooks, and they will never be invoked. Without a boolean value specified, it’s assumed to be false. If a None value appears in the place of BaseSuperNetModule, it means the hook suggests keeping the module unchanged, and nothing will happen.
An example of a mutation hook is given in no_default_hook(). However, it’s recommended to implement mutation hooks by deriving BaseSuperNetModule and adding its classmethod mutate to this list.
arc_learning_rate (float) – Learning rate for architecture optimizer. Default: 3.0e-4
ENAS¶
- class nni.retiarii.strategy.ENAS(**kwargs)[source]¶
RL controller learns to generate the best network on a super-net. See ENAS paper.
There are 2 steps in an epoch.
Firstly, training model parameters.
Secondly, training ENAS RL agent. The agent will produce a sample of model architecture to get the best reward.
Note
ENAS requires the evaluator to report metrics via self.log in its validation_step. See the explanation of reward_metric_name for details.
The supported mutation primitives of ENAS are:
nni.retiarii.nn.pytorch.ValueChoice
(only when used in Linear, Conv2d, BatchNorm2d, MultiheadAttention).
- Parameters
mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.
Mutation hooks are callables that take a Module as input and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules(), on each submodule. For each submodule, the hooks in the list are invoked in order, and later hooks can see the results of previous hooks. The modules that are processed by mutation_hooks will be replaced by the returned module, stored in nas_modules, and be the focus of the NAS algorithm.
The default_mutation_hooks of each one-shot module are appended to the hook list.
To be more specific, the input arguments are four arguments:
a module that might be processed,
name of the module in its parent module,
a memo dict whose usage depends on the particular algorithm,
keyword arguments (configurations).
Note that the memo should be read/written by hooks. There won’t be any hooks called on the root module.
The returned value can be one of three kinds:
a tuple of BaseSuperNetModule or None, and a boolean,
a boolean,
BaseSuperNetModule or None.
The boolean value is suppress, which indicates whether the following hooks should be skipped. When it’s true, it suppresses the subsequent hooks, and they will never be invoked. Without a boolean value specified, it’s assumed to be false. If a None value appears in the place of BaseSuperNetModule, it means the hook suggests keeping the module unchanged, and nothing will happen.
An example of a mutation hook is given in no_default_hook(). However, it’s recommended to implement mutation hooks by deriving BaseSuperNetModule and adding its classmethod mutate to this list.
ctrl_kwargs (dict) – Optional kwargs that will be passed to ReinforceController.
entropy_weight (float) – Weight of sample entropy loss in RL.
skip_weight (float) – Weight of skip penalty loss. See ReinforceController for details.
baseline_decay (float) – Decay factor of the reward baseline, which is used to normalize the reward in RL. At each step, the new reward baseline will be equal to baseline_decay * baseline_old + reward * (1 - baseline_decay).
ctrl_steps_aggregate (int) – Number of steps for which the gradients will be accumulated before updating the weights of the RL controller.
ctrl_grad_clip (float) – Gradient clipping value of the controller.
reward_metric_name (str or None) – The name of the metric that is treated as the reward. It has no effect when only one metric is returned from the evaluator. If there are multiple metrics, by default the metric with key name default is used. If reward_metric_name is specified, the metric with that name is used. Otherwise an exception is raised, indicating that multiple metrics were found.
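The baseline update and the reward lookup described above can be sketched in a few lines of plain Python. The function names update_baseline and pick_reward are hypothetical, and the lookup order (explicit name first, then the default key) is an assumption; the text above does not fix the precedence.

```python
from typing import Dict, Optional


def update_baseline(baseline_old: float, reward: float, baseline_decay: float) -> float:
    """Exponential moving average of the reward, as in the formula above."""
    return baseline_decay * baseline_old + reward * (1 - baseline_decay)


def pick_reward(metrics: Dict[str, float], reward_metric_name: Optional[str] = None) -> float:
    """Choose which reported metric serves as the RL reward."""
    if len(metrics) == 1:
        # With a single metric, reward_metric_name has no effect.
        return next(iter(metrics.values()))
    if reward_metric_name is not None:
        return metrics[reward_metric_name]
    if "default" in metrics:
        return metrics["default"]
    raise KeyError("multiple metrics found; specify reward_metric_name")


baseline = 0.0
for reward in [1.0, 1.0, 1.0]:
    baseline = update_baseline(baseline, reward, baseline_decay=0.9)
```

With a decay close to 1 the baseline moves slowly toward recent rewards, which keeps the normalized reward (reward minus baseline) stable between controller updates.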
- class nni.retiarii.oneshot.pytorch.enas.ReinforceController(fields, lstm_size=64, lstm_num_layers=1, tanh_constant=1.5, skip_target=0.4, temperature=None, entropy_reduction='sum')[source]¶
A controller that mutates the graph with RL.
- Parameters
fields (list of ReinforceField) – List of fields to choose.
lstm_size (int) – Controller LSTM hidden units.
lstm_num_layers (int) – Number of layers for stacked LSTM.
tanh_constant (float) – Logits will be equal to tanh_constant * tanh(logits). tanh is not used if this value is None.
skip_target (float) – Target probability that skipconnect (chosen by InputChoice) will appear. If the chosen number of inputs is away from the skip_connect, a sample skip penalty, which is a KL divergence, is added.
temperature (float) – Temperature constant that divides the logits.
entropy_reduction (str) – Can be one of sum and mean. How the entropy of multi-input-choice is reduced.
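The effect of temperature and tanh_constant on the controller logits can be shown with a small plain-Python sketch. The helper names are hypothetical, and applying the temperature before the tanh is an assumption about the order of the two operations.

```python
import math
from typing import List, Optional


def transform_logits(logits: List[float],
                     tanh_constant: Optional[float] = 1.5,
                     temperature: Optional[float] = None) -> List[float]:
    """Apply the optional temperature and tanh squashing to raw logits."""
    if temperature is not None:
        logits = [x / temperature for x in logits]
    if tanh_constant is not None:
        # Bounds every logit to the interval [-tanh_constant, tanh_constant].
        logits = [tanh_constant * math.tanh(x) for x in logits]
    return logits


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(transform_logits([10.0, -10.0, 0.0]))
```

Because the squashed logits are bounded, no choice can reach a probability of exactly 0 or 1, which keeps the controller exploring.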
GumbelDARTS¶
- class nni.retiarii.strategy.GumbelDARTS(**kwargs)[source]¶
Choose the best block by using Gumbel Softmax random sampling and differentiable training. See FBNet and SNAS.
This is a DARTS-based method that uses gumbel-softmax to simulate a one-hot distribution. Essentially, it tries to mimic the behavior of sampling one path in the forward pass by gradually cooling down the temperature, aiming to bridge the gap between differentiable architecture weights and the discretization of architectures.
New in version 2.8: Supports searching for ValueChoices on operations, with the technique described in FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions.
The supported mutation primitives of GumbelDARTS are:
nni.retiarii.nn.pytorch.ValueChoice
(only when used in Linear, Conv2d, BatchNorm2d, MultiheadAttention).
- Parameters
mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.
Mutation hooks are callables that take a Module as input and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules(), on each submodule. For each submodule, the hooks in the list are invoked in order, and later hooks can see the results of previous hooks. The modules that are processed by mutation_hooks will be replaced by the returned module, stored in nas_modules, and be the focus of the NAS algorithm.
The default_mutation_hooks of each one-shot module are appended to the hook list.
To be more specific, the input arguments are four arguments:
a module that might be processed,
name of the module in its parent module,
a memo dict whose usage depends on the particular algorithm,
keyword arguments (configurations).
Note that the memo should be read/written by hooks. There won’t be any hooks called on the root module.
The returned value can be one of three kinds:
a tuple of BaseSuperNetModule or None, and a boolean,
a boolean,
BaseSuperNetModule or None.
The boolean value is suppress, which indicates whether the following hooks should be skipped. When it’s true, it suppresses the subsequent hooks, and they will never be invoked. Without a boolean value specified, it’s assumed to be false. If a None value appears in the place of BaseSuperNetModule, it means the hook suggests keeping the module unchanged, and nothing will happen.
An example of a mutation hook is given in no_default_hook(). However, it’s recommended to implement mutation hooks by deriving BaseSuperNetModule and adding its classmethod mutate to this list.
gumbel_temperature (float) – The initial temperature used in gumbel-softmax.
use_temp_anneal (bool) – If true, a linear annealing will be applied to gumbel_temperature. Otherwise, run at a fixed temperature. See SNAS for details.
min_temp (float) – The minimal temperature for annealing. No need to set this if use_temp_anneal is False.
arc_learning_rate (float) – Learning rate for architecture optimizer. Default: 3.0e-4
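A plain-Python sketch of gumbel-softmax with a linearly annealed temperature is given here for orientation. It assumes the temperature moves linearly from gumbel_temperature to min_temp over max_epochs; the exact schedule used by GumbelDARTS is not stated above, and the function names and numeric values are illustrative.

```python
import math
import random
from typing import List


def annealed_temperature(epoch: int, max_epochs: int,
                         gumbel_temperature: float = 1.0,
                         min_temp: float = 0.33) -> float:
    """Linear interpolation from the initial temperature down to min_temp."""
    progress = epoch / max_epochs
    return (1 - progress) * (gumbel_temperature - min_temp) + min_temp


def gumbel_softmax(logits: List[float], temperature: float, rng: random.Random) -> List[float]:
    """Softmax over logits perturbed by Gumbel(0, 1) noise."""
    noisy = []
    for x in logits:
        u = rng.random()
        # The small constants guard against log(0).
        gumbel = -math.log(-math.log(u + 1e-20) + 1e-20)
        noisy.append((x + gumbel) / temperature)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
hot = gumbel_softmax([0.2, 0.1, -0.3], annealed_temperature(0, 10), rng)
cold = gumbel_softmax([0.2, 0.1, -0.3], annealed_temperature(10, 10), rng)
```

As the temperature drops, each sample concentrates more of its mass on a single candidate, which is how the method approximates a one-hot choice while staying differentiable.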
RandomOneShot¶
- class nni.retiarii.strategy.RandomOneShot(**kwargs)[source]¶
Train a super-net with uniform path sampling. See reference.
In each epoch, model parameters are trained after a uniformly random sampling of each choice. Notably, the exporting result is also a random sample of the search space.
The supported mutation primitives of RandomOneShot are:
nni.retiarii.nn.pytorch.ValueChoice
(only when used in Linear, Conv2d, BatchNorm2d, MultiheadAttention).
- Parameters
mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.
Mutation hooks are callables that take a Module as input and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules(), on each submodule. For each submodule, the hooks in the list are invoked in order, and later hooks can see the results of previous hooks. The modules that are processed by mutation_hooks will be replaced by the returned module, stored in nas_modules, and be the focus of the NAS algorithm.
The default_mutation_hooks of each one-shot module are appended to the hook list.
To be more specific, the input arguments are four arguments:
a module that might be processed,
name of the module in its parent module,
a memo dict whose usage depends on the particular algorithm,
keyword arguments (configurations).
Note that the memo should be read/written by hooks. There won’t be any hooks called on the root module.
The returned value can be one of three kinds:
a tuple of BaseSuperNetModule or None, and a boolean,
a boolean,
BaseSuperNetModule or None.
The boolean value is suppress, which indicates whether the following hooks should be skipped. When it’s true, it suppresses the subsequent hooks, and they will never be invoked. Without a boolean value specified, it’s assumed to be false. If a None value appears in the place of BaseSuperNetModule, it means the hook suggests keeping the module unchanged, and nothing will happen.
An example of a mutation hook is given in no_default_hook(). However, it’s recommended to implement mutation hooks by deriving BaseSuperNetModule and adding its classmethod mutate to this list.
Proxyless¶
- class nni.retiarii.strategy.Proxyless(**kwargs)[source]¶
A low-memory-consuming optimized version of differentiable architecture search. See reference.
This is a DARTS-based method that resamples the architecture to reduce memory consumption. Essentially, it samples one path on forward, and implements its own backward to update the architecture parameters based on only one path.
The supported mutation primitives of Proxyless are:
- Parameters
mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.
Mutation hooks are callables that take a Module as input and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules(), on each submodule. For each submodule, the hooks in the list are invoked in order, and later hooks can see the results of previous hooks. The modules that are processed by mutation_hooks will be replaced by the returned module, stored in nas_modules, and be the focus of the NAS algorithm.
The default_mutation_hooks of each one-shot module are appended to the hook list.
To be more specific, the input arguments are four arguments:
a module that might be processed,
name of the module in its parent module,
a memo dict whose usage depends on the particular algorithm,
keyword arguments (configurations).
Note that the memo should be read/written by hooks. There won’t be any hooks called on the root module.
The returned value can be one of three kinds:
a tuple of BaseSuperNetModule or None, and a boolean,
a boolean,
BaseSuperNetModule or None.
The boolean value is suppress, which indicates whether the following hooks should be skipped. When it’s true, it suppresses the subsequent hooks, and they will never be invoked. Without a boolean value specified, it’s assumed to be false. If a None value appears in the place of BaseSuperNetModule, it means the hook suggests keeping the module unchanged, and nothing will happen.
An example of a mutation hook is given in no_default_hook(). However, it’s recommended to implement mutation hooks by deriving BaseSuperNetModule and adding its classmethod mutate to this list.
arc_learning_rate (float) – Learning rate for architecture optimizer. Default: 3.0e-4
Customization¶
Multi-trial¶
- class nni.retiarii.Sampler[source]
Handles Mutator.choice() calls.
One-shot¶
base_lightning¶
- class nni.retiarii.oneshot.pytorch.base_lightning.BaseOneShotLightningModule(model, mutation_hooks=None)[source]¶
The base class for all one-shot NAS modules.
In NNI, we try to separate the “search” part and “training” part in one-shot NAS. The “training” part is defined with evaluator interface (has to be lightning evaluator interface to work with oneshot). Since the lightning evaluator has already broken down the training into minimal building blocks, we can re-assemble them after combining them with the “search” part of a particular algorithm.
After the re-assembling, this module has defined all the search + training. The experiment can use a lightning trainer (which is another part in the evaluator) to train this module, so as to complete the search process.
Essential functions such as preprocessing the user’s model, redirecting lightning hooks for the user’s model, configuring optimizers and exporting the NAS result are implemented in this class.
- nas_modules¶
Modules that have been mutated, which the search algorithms should care about.
- Type
list[BaseSuperNetModule]
- model¶
PyTorch lightning module. A model space with training recipe defined (wrapped by LightningModule in evaluator).
- Type
pl.LightningModule
- Parameters
inner_module (pytorch_lightning.LightningModule) – A LightningModule that defines computations, train/val loops, and optimizers in a single class. When used in NNI, the inner_module is the combination of instances of evaluator + base model (to be precise, a base model wrapped with LightningModule in evaluator).
mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than built-ins.
Mutation hooks are callables that take a Module as input and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules(), on each submodule. For each submodule, the hooks in the list are invoked in order, and later hooks can see the results of previous hooks. The modules that are processed by mutation_hooks will be replaced by the returned module, stored in nas_modules, and be the focus of the NAS algorithm.
The default_mutation_hooks of each one-shot module are appended to the hook list.
To be more specific, the input arguments are four arguments:
a module that might be processed,
name of the module in its parent module,
a memo dict whose usage depends on the particular algorithm,
keyword arguments (configurations).
Note that the memo should be read/written by hooks. There won’t be any hooks called on the root module.
The returned value can be one of three kinds:
a tuple of BaseSuperNetModule or None, and a boolean,
a boolean,
BaseSuperNetModule or None.
The boolean value is suppress, which indicates whether the following hooks should be skipped. When it’s true, it suppresses the subsequent hooks, and they will never be invoked. Without a boolean value specified, it’s assumed to be false. If a None value appears in the place of BaseSuperNetModule, it means the hook suggests keeping the module unchanged, and nothing will happen.
An example of a mutation hook is given in no_default_hook(). However, it’s recommended to implement mutation hooks by deriving BaseSuperNetModule and adding its classmethod mutate to this list.
- architecture_optimizers()[source]¶
Get architecture optimizers from all optimizers. Use this to get your architecture optimizers in training_step().
- Returns
opts – Architecture optimizers defined in configure_architecture_optimizers(). This will be None if there are no architecture optimizers.
- Return type
list[Optimizer], Optimizer, None
- call_lr_schedulers(batch_index)[source]¶
Function that imitates the lightning trainer’s behavior of calling the user’s lr schedulers. Since auto_optimization is turned off by this class, you can use this function to make schedulers behave as if they were automatically handled by the lightning trainer.
- Parameters
batch_idx (int) – batch index
- call_weight_optimizers(method)[source]¶
Function that imitates the lightning trainer’s behavior of calling the user’s optimizers. Since auto_optimization is turned off by this class, you can use this function to make user optimizers behave as if they were automatically handled by the lightning trainer.
- Parameters
method (str) – Method to call. Only step and zero_grad are supported now.
- configure_architecture_optimizers()[source]¶
Hook kept for subclasses. A specific NAS method inheriting this base class should return its architecture optimizers here if architecture parameters are needed. Note that lr schedulers are not supported now for architecture_optimizers.
- Returns
arc_optimizers – Optimizers used by a specific NAS algorithm. Return None if no architecture optimizers are needed.
- Return type
list[Optimizer], Optimizer
- configure_optimizers()[source]¶
Combine architecture optimizers and the user’s model optimizers. You can overwrite configure_architecture_optimizers() if architecture optimizers are needed in your NAS algorithm.
For now, model is tested against evaluators in nni.retiarii.evaluator.pytorch.lightning, and it only returns 1 optimizer. But for extensibility, code for other return value types is also implemented.
- export()[source]¶
Export the NAS result, ideally the best choice of each nas_modules. You may implement an export method for your customized nas_modules.
- Returns
Keys are names of nas_modules, and values are the choice indices of them.
- Return type
dict
- resample()[source]¶
Trigger the resample for each nas_modules. Sometimes (e.g., in differentiable cases), it does nothing.
- Returns
Sampled architecture.
- Return type
dict
- search_space_spec()[source]¶
Get the search space specification from nas_modules.
- Returns
Key is the name of the choice, value is the corresponding ParameterSpec.
- Return type
dict
- training_step(batch, batch_idx)[source]¶
This is the implementation of what happens in training loops of one-shot algos. It usually calls self.model.training_step which implements the real training recipe of the users’ model.
- weight_optimizers()[source]¶
Get user optimizers from all optimizers. Use this to get user optimizers in training_step().
- Returns
opts – Optimizers defined by the user’s model. This will be None if there are no user optimizers.
- Return type
list[Optimizer], Optimizer, None
- class nni.retiarii.oneshot.pytorch.base_lightning.BaseSuperNetModule[source]¶
Mutated module in super-net. Usually, the feed-forward of the module itself is undefined. It has to be resampled with resample() so that a specific path is selected. (Sometimes, this is not required, for example in a differentiable super-net.)
A super-net module usually corresponds to one sample, with two exceptions:
A module can have multiple parameter specs. For example, a convolution-2d can sample kernel size and channels at the same time.
Multiple modules can share one parameter spec. For example, multiple layer choices with the same label.
For value choice compositions, the parameter specs are bound to the underlying (original) value choices, rather than their compositions.
- export(memo)[source]¶
Export the final architecture within this module. It should have the same keys as search_space_spec().
- Parameters
memo (dict[str, Any]) – Use memo to avoid exporting the same label multiple times.
- classmethod mutate(module, name, memo, mutate_kwargs)[source]¶
This is a mutation hook that creates a BaseSuperNetModule. The method should be implemented in each specific super-net module, because they usually have specific rules about what kind of modules to operate on.
- Parameters
module (nn.Module) – The module to be mutated (replaced).
name (str) – Name of this module. With full prefix. For example, module1.block1.conv.
memo (dict) – Memo to enable sharing parameters among mutated modules. It should be read and written by mutate functions themselves.
mutate_kwargs (dict) – Algo-related hyper-parameters, and some auxiliary information.
- Returns
The mutation result, along with an optional boolean flag indicating whether to suppress follow-up mutation hooks. See BaseOneShotLightningModule for details.
- Return type
Union[BaseSuperNetModule, bool, tuple[BaseSuperNetModule, bool]]
- nni.retiarii.oneshot.pytorch.base_lightning.no_default_hook(module, name, memo, mutate_kwargs)[source]¶
Add this hook at the end of your hook list to raise error for unsupported mutation primitives.
- nni.retiarii.oneshot.pytorch.base_lightning.traverse_and_mutate_submodules(root_module, hooks, mutate_kwargs, topdown=True)[source]¶
Traverse the module-tree of root_module, and call hooks on every tree node.
- Parameters
root_module (nn.Module) – User-defined model space. Since this method is called in the __init__ of BaseOneShotLightningModule, it’s usually a pytorch_lightning.LightningModule. The mutation will be in-place on root_module.
hooks (list[MutationHook]) – List of mutation hooks. See BaseOneShotLightningModule for how to write hooks. When a hook returns a module, the original module will be replaced (mutated) with the new module.
mutate_kwargs (dict) – Extra keyword arguments passed to hooks.
topdown (bool, default = True) – If topdown is true, hooks are called first, before traversing the sub-modules (i.e., pre-order DFS). Otherwise, sub-modules are traversed first, before calling hooks on this node (i.e., post-order DFS).
- Returns
modules – The replace result.
- Return type
dict[str, nn.Module]
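The hook protocol (four input arguments, three kinds of return value, and the suppress flag) can be sketched on a toy tree without PyTorch. Node, normalize and traverse_and_mutate are hypothetical stand-ins written for this illustration; they are not the NNI implementation.

```python
class Node:
    """Toy stand-in for a module with named children."""

    def __init__(self, kind, **children):
        self.kind = kind
        self.children = dict(children)


def normalize(result):
    """Map any of the three return kinds to (replacement, suppress)."""
    if isinstance(result, tuple):
        return result
    if isinstance(result, bool):
        return None, result
    return result, False


def traverse_and_mutate(node, hooks, mutate_kwargs, topdown=True,
                        prefix="", memo=None, replaced=None):
    memo = {} if memo is None else memo
    replaced = {} if replaced is None else replaced
    # Hooks run on children only, so the root itself is never hooked.
    for name, child in list(node.children.items()):
        full_name = prefix + name
        if not topdown:  # post-order: visit the sub-tree first
            traverse_and_mutate(child, hooks, mutate_kwargs, topdown,
                                full_name + ".", memo, replaced)
        for hook in hooks:
            result, suppress = normalize(hook(child, full_name, memo, mutate_kwargs))
            if result is not None:
                node.children[name] = child = result  # in-place replacement
                replaced[full_name] = result
            if suppress:
                break  # later hooks are never invoked for this child
        if topdown:  # pre-order: visit the sub-tree after the hooks
            traverse_and_mutate(child, hooks, mutate_kwargs, topdown,
                                full_name + ".", memo, replaced)
    return replaced


def mutate_choice(module, name, memo, mutate_kwargs):
    # Replace every "choice" node and suppress the remaining hooks.
    if module.kind == "choice":
        return Node("supernet"), True
    return None


seen = []


def record(module, name, memo, mutate_kwargs):
    seen.append(name)
    return False  # boolean only: keep the module, do not suppress


root = Node("root", a=Node("choice"), b=Node("block", c=Node("choice")))
replaced = traverse_and_mutate(root, [mutate_choice, record], {})
```

In this run the record hook only sees b, because mutate_choice suppresses later hooks on the two choice nodes it replaces.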
dataloader¶
- class nni.retiarii.oneshot.pytorch.dataloader.ConcatLoader(loaders, mode='min_size')[source]¶
This loader is the same as CombinedLoader in PyTorch-Lightning, but concatenates sub-loaders instead of loading them in parallel.
- Parameters
loaders (dict[str, Any]) –
For example,
{ "train": DataLoader(train_dataset), "val": DataLoader(val_dataset) }
In this example, the loader will first produce the batches from “train”, then “val”.
mode (str) – Only “min_size” is supported for now.
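The difference between concatenating and loading in parallel can be seen on plain lists standing in for data loaders. This is only a sketch of the iteration order: concat_loaders is a hypothetical helper, and what each yielded item looks like in the real ConcatLoader is not specified here.

```python
from itertools import chain


def concat_loaders(loaders):
    """Yield every batch of each loader in turn, in dict insertion order."""
    return chain.from_iterable(loaders.values())


loaders = {"train": [[1, 2], [3, 4]], "val": [[5, 6]]}

# Concatenation: all "train" batches first, then all "val" batches.
concatenated = list(concat_loaders(loaders))

# Parallel loading, for contrast: one batch from every loader per step.
parallel = list(zip(*loaders.values()))
```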
supermodule.differentiable¶
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedCell(op_factory, num_nodes, num_ops_per_node, num_predecessors, preprocessor, postprocessor, concat_dim, memo, mutate_kwargs, label)[source]¶
Implementation of Cell under differentiable context.
An architecture parameter is created on each edge of the fully-connected graph.
- export(memo)[source]¶
Tricky export.
Reference: https://github.com/quark0/darts/blob/f276dd346a09ae3160f8e3aca5c7b193fda1da37/cnn/model_search.py#L135
We don’t avoid selecting operations like none here, because it looks like a different search space.
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedInput(n_candidates, n_chosen, alpha, softmax, label)[source]¶
Mixed input. Forward returns a weighted sum of candidates. Implementation is very similar to DifferentiableMixedLayer.
- Parameters
n_candidates (int) – Expected number of input candidates.
n_chosen (int) – Expected number of inputs finally chosen.
alpha (Tensor) – Tensor that stores the “learnable” weights.
softmax (nn.Module) – Customizable softmax function. Usually nn.Softmax(-1).
label (str) – Name of the choice.
- label¶
Name of the choice.
- Type
str
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedLayer(paths, alpha, softmax, label)[source]¶
Mixed layer, in which fprop is decided by a weighted sum of several layers. Proposed in DARTS: Differentiable Architecture Search.
The weight alpha is usually learnable, and optimized on validation dataset.
Differentiable sampling layer requires all operators returning the same shape for one input, as all outputs will be weighted summed to get the final output.
- Parameters
paths (list[tuple[str, nn.Module]]) – Layers to choose from. Each is a tuple of name, and its module.
alpha (Tensor) – Tensor that stores the “learnable” weights.
softmax (nn.Module) – Customizable softmax function. Usually nn.Softmax(-1).
label (str) – Name of the choice.
- op_names¶
Operator names.
- Type
str
- label¶
Name of the choice.
- Type
str
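The weighted sum computed by a mixed layer can be written out with scalars in place of tensors. The sketch below is plain Python rather than the NNI class: mixed_forward is a hypothetical function, and the three toy operations are chosen only so that all of them return the same shape.

```python
import math
from typing import Callable, List, Tuple


def softmax(alpha: List[float]) -> List[float]:
    """Numerically stable softmax over the architecture weights."""
    peak = max(alpha)
    exps = [math.exp(a - peak) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_forward(paths: List[Tuple[str, Callable[[float], float]]],
                  alpha: List[float], x: float) -> float:
    """Output is the softmax(alpha)-weighted sum of every candidate's output."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, (_, op) in zip(weights, paths))


paths = [
    ("identity", lambda x: x),
    ("double", lambda x: 2 * x),
    ("zero", lambda x: 0.0),
]
uniform = mixed_forward(paths, [0.0, 0.0, 0.0], 3.0)
skewed = mixed_forward(paths, [0.0, 5.0, 0.0], 3.0)
```

With equal alpha values every candidate contributes one third of the output; raising one alpha pulls the output toward that candidate, which is what the architecture optimizer exploits.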
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedRepeat(blocks, depth, softmax, memo)[source]¶
Implementation of Repeat in a differentiable supernet. Result is a weighted sum of possible prefixes, sliced by possible depths.
If the output is not a single tensor, it will be summed at every independent dimension. See weighted_sum() for details.
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.MixedOpDifferentiablePolicy(operation, memo, mutate_kwargs)[source]¶
Implements the differentiable sampling in mixed operation.
One mixed operation can have multiple value choices in its arguments. Thus the _arch_alpha here is a parameter dict, and named_parameters filters out multiple parameters with _arch_alpha as its prefix.
When this class is asked for forward_argument, it returns a distribution, i.e., a dict from int to float based on its weights.
All the parameters (_arch_alpha, parameters(), _softmax) are saved as attributes of operation, rather than self, because this class itself is not an nn.Module, and saved parameters here won’t be optimized.
supermodule.sampling¶
- class nni.retiarii.oneshot.pytorch.supermodule.sampling.MixedOpPathSamplingPolicy(operation, memo, mutate_kwargs)[source]¶
Implements the path sampling in mixed operation.
One mixed operation can have multiple value choices in its arguments. Each value choice can be further decomposed into “leaf value choices”. We sample the leaf nodes, and compose them into the values on arguments.
- class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingCell(op_factory, num_nodes, num_ops_per_node, num_predecessors, preprocessor, postprocessor, concat_dim, memo, mutate_kwargs, label)[source]¶
The implementation of super-net cell follows DARTS.
When factory_used is true, it reconstructs the cell for every possible combination of operation and input index, because for different input indices, the cell factory could instantiate different operations (e.g., with different stride). On export, we first have the best (operation, input) pairs, then select the best num_ops_per_node.
loose_end is not supported yet, because it will cause more problems (e.g., shape mismatch). We assume loose_end to be all regardless of its configuration.
A supernet cell can’t slim its own weight to fit into a sub network, which is also a known issue.
- class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingInput(n_candidates, n_chosen, reduction_type, label)[source]¶
Mixed input. Takes a list of tensors as input, selects some of them and returns the sum.
- _sampled¶
Sampled input indices.
- Type
int or list of int
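The select-and-sum behavior of the mixed input can be sketched as follows. Plain lists stand in for tensors, only the sum reduction is covered, and the function mixed_input is a hypothetical name rather than NNI code.

```python
def mixed_input(tensors, sampled):
    """Select the sampled inputs and return their element-wise sum.

    `sampled` mirrors the _sampled attribute: an int or a list of ints.
    """
    indices = [sampled] if isinstance(sampled, int) else list(sampled)
    chosen = [tensors[i] for i in indices]
    # zip(*chosen) groups the values at the same position across inputs.
    return [sum(values) for values in zip(*chosen)]

inputs = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
summed = mixed_input(inputs, [0, 2])   # inputs 0 and 2 are added
single = mixed_input(inputs, 1)        # a single index passes through
```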
- class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingLayer(paths, label)[source]¶
Mixed layer, in which fprop is decided by exactly one inner layer or sum of multiple (sampled) layers. If multiple modules are selected, the result will be summed and returned.
- _sampled¶
Sampled module indices.
- Type
int or list of str
- label¶
Name of the choice.
- Type
str
supermodule.proxyless¶
Implementation of ProxylessNAS: a hybrid approach between differentiable and sampling. The support remains limited. Known limitations include:
No support for multiple arguments in forward.
No support for mixed-operation (value choice).
The code contains duplicates. Needs refactor.
- class nni.retiarii.oneshot.pytorch.supermodule.proxyless.ProxylessMixedInput(n_candidates, n_chosen, alpha, softmax, label)[source]¶
Proxyless version of differentiable input choice. See ProxylessLayerChoice for implementation details.
supermodule.operation¶
Operations that support weight sharing at a fine-grained level, which is commonly known as super-kernel (as in channel search), or weight entanglement.
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedBatchNorm2d(module_kwargs)[source]¶
Mixed BatchNorm2d operation.
Supported arguments are:
num_features
eps (only supported in path sampling)
momentum (only supported in path sampling)
For path sampling, the prefixes of weight, bias, running_mean and running_var are sliced. For weighted cases, the maximum num_features is used directly.
Momentum is required to be a float. PyTorch BatchNorm supports a case where momentum can be None, which is not supported here.
- bound_type¶
alias of
torch.nn.modules.batchnorm.BatchNorm2d
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedConv2d(module_kwargs)[source]¶
Mixed conv2d op.
Supported arguments are:
in_channels
out_channels
groups
stride (only supported in path sampling)
kernel_size
padding
dilation (only supported in path sampling)
padding will be the “max” padding in differentiable mode.
Mutable groups is NOT supported in most cases of differentiable mode. However, we do support one special case when the group number is proportional to in_channels and out_channels. This is often the case of depth-wise convolutions.
For channels, the prefix will be sliced. For kernels, we take the small kernel from the center and round it to floor (left top). For example:
max_kernel = 5*5, sampled_kernel = 3*3, then we take [1: 4]
max_kernel = 5*5, sampled_kernel = 2*2, then we take [1: 3]
□ □ □ □ □   □ □ □ □ □
□ ■ ■ ■ □   □ ■ ■ □ □
□ ■ ■ ■ □   □ ■ ■ □ □
□ ■ ■ ■ □   □ □ □ □ □
□ □ □ □ □   □ □ □ □ □
- bound_type¶
alias of
torch.nn.modules.conv.Conv2d
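The kernel slicing rule described for MixedConv2d can be reproduced with a small index calculation. This is a sketch on nested lists rather than tensors, and center_crop_kernel is a hypothetical helper; the floor division is what rounds an even-sized kernel toward the top left.

```python
def center_crop_kernel(kernel, size):
    """Take a size x size window from the center of a square kernel."""
    max_size = len(kernel)
    start = (max_size - size) // 2  # floor: ties go to the top-left
    end = start + size
    return [row[start:end] for row in kernel[start:end]]

# A 5x5 kernel whose entries encode their own position (row * 5 + col).
max_kernel = [[r * 5 + c for c in range(5)] for r in range(5)]
k3 = center_crop_kernel(max_kernel, 3)  # rows and columns [1:4]
k2 = center_crop_kernel(max_kernel, 2)  # rows and columns [1:3]
```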
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedLinear(module_kwargs)[source]¶
Mixed linear operation.
Supported arguments are:
in_features
out_features
Prefix of weight and bias will be sliced.
- bound_type¶
alias of
torch.nn.modules.linear.Linear
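Prefix slicing for MixedLinear can be sketched on nested lists as below. The weight layout (out_features, in_features) matches torch.nn.Linear; the helpers slice_linear and linear are hypothetical and only illustrate the idea.

```python
def slice_linear(weight, bias, in_features, out_features):
    """Keep the leading rows and columns of the largest weight and bias."""
    sub_weight = [row[:in_features] for row in weight[:out_features]]
    sub_bias = bias[:out_features]
    return sub_weight, sub_bias

def linear(x, weight, bias):
    """y = W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

# Largest layer: out_features=3, in_features=4.
weight = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
bias = [0.5, 1.5, 2.5]

# Sampled sub-layer: out_features=2, in_features=2.
w, b = slice_linear(weight, bias, in_features=2, out_features=2)
y = linear([1.0, 1.0], w, b)
```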
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedMultiHeadAttention(module_kwargs)[source]¶
Mixed multi-head attention.
Supported arguments are:
embed_dim
num_heads (only supported in path sampling)
kdim
vdim
dropout (only supported in path sampling)
At init, it constructs the largest possible Q, K, V dimension. At forward, it slices the prefix of the weight matrices according to the sampled value. For in_proj_bias and in_proj_weight, three parts will be sliced and concatenated together: [0, embed_dim), [max_embed_dim, max_embed_dim + embed_dim), [max_embed_dim * 2, max_embed_dim * 2 + embed_dim).
Warning
All candidates of embed_dim should be divisible by all candidates of num_heads.
- bound_type¶
alias of
torch.nn.modules.activation.MultiheadAttention
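For MixedMultiHeadAttention, the three in_proj slices and the divisibility requirement can be written out as index arithmetic. The helpers in_proj_rows and check_divisible are hypothetical names used only for this sketch.

```python
def in_proj_rows(embed_dim, max_embed_dim):
    """Row indices kept from in_proj_weight / in_proj_bias.

    The Q, K and V blocks each start at a multiple of max_embed_dim,
    and the first embed_dim rows of each block are kept.
    """
    rows = []
    for part in range(3):
        start = part * max_embed_dim
        rows.extend(range(start, start + embed_dim))
    return rows

def check_divisible(embed_dims, num_heads):
    """True if every embed_dim candidate is divisible by every num_heads candidate."""
    return all(e % h == 0 for e in embed_dims for h in num_heads)

rows = in_proj_rows(embed_dim=2, max_embed_dim=4)
ok = check_divisible([64, 128], [2, 4])
bad = check_divisible([36, 64], [8])
```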
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedOperation(module_kwargs)[source]¶
This is the base class for all mixed operations. It’s what you should inherit to support a new operation with ValueChoice.
It contains commonly used utilities that will ease the effort to write customized mixed operations, i.e., operations with ValueChoice in their arguments. To customize, please write your own mixed operation, and add the hook into the mutation_hooks parameter when using the strategy.
By design, for a mixed operation to work in a specific algorithm, at least two classes are needed.
One class needs to inherit this class, to control operation-related behavior, such as how to initialize the operation such that the sampled operation can be its sub-operation.
The other one needs to inherit MixedOperationSamplingPolicy, which controls algo-related behavior, such as sampling.
The two classes are linked with the sampling_policy attribute in MixedOperation, whose type is set via mixed_op_sampling in mutate_kwargs when MixedOperation.mutate() is called.
With this design, one mixed-operation (e.g., MixedConv2d) can work in multiple algorithms (e.g., both DARTS and ENAS), saving the engineering effort to rewrite all operations for each specific algo.
This class should also define a bound_type, to control the matching type in mutate, and an argument_list, to control which arguments can be dynamically used in forward. This list will also be used in mutate for sanity check.
- export(memo)[source]¶
Delegates to MixedOperationSamplingPolicy.export().
- forward(*args, **kwargs)[source]¶
First get sampled arguments, then forward with the sampled arguments (by calling forward_with_args).
- forward_argument(name)[source]¶
Get the argument used in forward. This is often related to the algorithm, so we redirect it to the sampling policy.
- forward_with_args(*args, **kwargs)[source]¶
Controls the actual forward propagation. The accepted arguments are those in argument_list, followed by the forward arguments of the bound_type.
- classmethod mutate(module, name, memo, mutate_kwargs)[source]¶
Find value choices in the module’s arguments and replace the whole module.
- resample(memo)[source]¶
Delegates to MixedOperationSamplingPolicy.resample().
- super_init_argument(name, value_choice)[source]¶
Get the initialization argument when constructing super-kernel, i.e., calling super().__init__(). This is often related to specific operator, rather than algo.
For example:
def super_init_argument(self, name, value_choice):
    return max(value_choice.candidates)
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedOperationSamplingPolicy(operation, memo, mutate_kwargs)[source]¶
Algo-related part for mixed Operation.
MixedOperation delegates its resample and export to this policy (or its subclass), so that one Operation can be easily combined with different kinds of sampling.
One SamplingStrategy corresponds to one mixed operation.
- export(operation, memo)[source]¶
The handler of MixedOperation.export().
- forward_argument(operation, name)[source]¶
Computes the argument with name used in the operation’s forward. Usually a value, or a distribution of values.
- resample(operation, memo)[source]¶
The handler of MixedOperation.resample().
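The split between an operation class and a sampling policy can be illustrated with a toy pair of classes. This is only a sketch of the delegation pattern in plain Python: ToyMixedOperation and ToySamplingPolicy are hypothetical, do not inherit from the NNI classes, and take a plain dict of candidate lists where NNI would use ValueChoice objects.

```python
import random

class ToySamplingPolicy:
    """Algo-side toy class: decides which value each argument takes."""

    def __init__(self, candidates, seed=0):
        self._candidates = candidates
        self._rng = random.Random(seed)
        self._sampled = {}

    def resample(self, operation, memo):
        self._sampled = {n: self._rng.choice(c) for n, c in self._candidates.items()}
        return dict(self._sampled)

    def export(self, operation, memo):
        return dict(self._sampled)

    def forward_argument(self, operation, name):
        return self._sampled[name]

class ToyMixedOperation:
    """Operation-side toy class: knows how to build and run the super-kernel."""

    argument_list = ["out_features"]

    def __init__(self, candidates, policy_cls):
        # Build the largest operation so every sample is a sub-operation.
        self.max_args = {n: self.super_init_argument(n, c) for n, c in candidates.items()}
        self.sampling_policy = policy_cls(candidates)

    def super_init_argument(self, name, candidates):
        return max(candidates)

    def resample(self, memo):
        return self.sampling_policy.resample(self, memo)

    def export(self, memo):
        return self.sampling_policy.export(self, memo)

    def forward_argument(self, name):
        return self.sampling_policy.forward_argument(self, name)

    def forward(self, x):
        sampled = {n: self.forward_argument(n) for n in self.argument_list}
        return self.forward_with_args(x, **sampled)

    def forward_with_args(self, x, out_features):
        # Toy computation: keep only the first out_features entries.
        return x[:out_features]

op = ToyMixedOperation({"out_features": [2, 4]}, ToySamplingPolicy)
op.resample({})
y = op.forward([1, 2, 3, 4, 5])
```

Swapping ToySamplingPolicy for another policy class changes how arguments are chosen without touching the operation, which is the reuse the design aims for.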