Strategy¶
Multi-trial Strategy¶
Random¶
GridSearch¶
RegularizedEvolution¶
TPE¶
PolicyBasedRL¶
One-shot Strategy¶
Note
The usage of one-shot strategies has been refreshed in v2.8. Please refer to legacy one-shot trainers for the old-style one-shot strategies.
DARTS¶
ENAS¶
- class nni.retiarii.oneshot.pytorch.enas.ReinforceController(fields, lstm_size=64, lstm_num_layers=1, tanh_constant=1.5, skip_target=0.4, temperature=None, entropy_reduction='sum')[source]¶
A controller that mutates the graph with RL.
- Parameters:
fields (list of ReinforceField) – List of fields to choose.
lstm_size (int) – Controller LSTM hidden units.
lstm_num_layers (int) – Number of layers for stacked LSTM.
tanh_constant (float) – Logits will be equal to tanh_constant * tanh(logits). tanh is not applied if this value is None.
skip_target (float) – Target probability that a skip-connect (chosen by InputChoice) will appear. If the sampled number of inputs deviates from this target, a KL-divergence skip penalty is added to the sample.
temperature (float) – Temperature constant that divides the logits.
entropy_reduction (str) – Can be one of sum or mean. How the entropy of a multi-input-choice is reduced.
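For orientation, a minimal construction sketch follows. The ReinforceField arguments shown (name, number of candidates, whether exactly one is chosen) are assumptions for illustration, not documented on this page:
from nni.retiarii.oneshot.pytorch.enas import ReinforceController, ReinforceField

# `fields` describes the choices the controller mutates; the ReinforceField
# signature used here (name, total, choose_one) is an assumption.
fields = [
    ReinforceField('conv_choice', 3, True),    # a LayerChoice with 3 candidates
    ReinforceField('skip_inputs', 2, False),   # an InputChoice over 2 predecessors
]
controller = ReinforceController(fields, lstm_size=64, tanh_constant=1.5, entropy_reduction='sum')
# Logits are post-processed as described above:
#   logits = logits / temperature            (if temperature is not None)
#   logits = tanh_constant * tanh(logits)    (if tanh_constant is not None)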
GumbelDARTS¶
RandomOneShot¶
Proxyless¶
Customization¶
Multi-trial¶
- class nni.retiarii.Sampler[source]
Handles Mutator.choice() calls.
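A minimal sketch of a custom sampler, assuming the method to override is choice(candidates, mutator, model, index) and that it returns one of the candidates:
import random

from nni.retiarii import Sampler

class RandomChoiceSampler(Sampler):
    def choice(self, candidates, mutator, model, index):
        # Called once per Mutator.choice(); return one element of `candidates`.
        return random.choice(candidates)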
One-shot¶
base_lightning¶
- class nni.retiarii.oneshot.pytorch.base_lightning.BaseOneShotLightningModule(model, mutation_hooks=None)[source]¶
The base class for all one-shot NAS modules.
In NNI, we try to separate the “search” part and the “training” part in one-shot NAS. The “training” part is defined with the evaluator interface (it has to be a Lightning evaluator to work with one-shot). Since the Lightning evaluator has already broken the training down into minimal building blocks, we can re-assemble those blocks after combining them with the “search” part of a particular algorithm.
After the re-assembly, this module defines all of the search + training. The experiment can use a Lightning trainer (which is another part of the evaluator) to train this module, so as to complete the search process.
Essential functions such as preprocessing the user’s model, redirecting Lightning hooks for the user’s model, configuring optimizers and exporting the NAS result are implemented in this class.
- nas_modules¶
Modules that have been mutated, which the search algorithms should care about.
- Type:
list[BaseSuperNetModule]
- model¶
PyTorch Lightning module. A model space with a training recipe defined (wrapped by LightningModule in the evaluator).
- Type:
pl.LightningModule
- Parameters:
inner_module (pytorch_lightning.LightningModule) – A LightningModule that defines computations, train/val loops and optimizers in a single class. When used in NNI, the inner_module is the combination of the evaluator instance and the base model (to be precise, a base model wrapped with LightningModule in the evaluator).
mutation_hooks (list[MutationHook]) –
Extra mutation hooks to support customized mutation on primitives other than the built-ins.
Mutation hooks are callables that take a Module and return a BaseSuperNetModule. They are invoked in traverse_and_mutate_submodules() on each submodule. For each submodule, the hooks in the list are invoked sequentially; later hooks can see the results of earlier hooks. Modules processed by mutation_hooks will be replaced by the returned module, stored in nas_modules, and become the focus of the NAS algorithm. The hook list will be appended with default_mutation_hooks in each one-shot module.
To be more specific, a hook takes four input arguments:
a module that might be processed,
the name of the module in its parent module,
a memo dict whose usage depends on the particular algorithm,
keyword arguments (configurations).
Note that the memo should be read/written by the hooks themselves. No hook will be called on the root module.
The return value can be one of three kinds:
a tuple of BaseSuperNetModule or None, and a boolean,
a boolean,
a BaseSuperNetModule or None.
The boolean value is suppress, which indicates whether the following hooks should be called. When it is true, the subsequent hooks are suppressed and will never be invoked. Without a boolean value specified, it is assumed to be false. If None appears in place of BaseSuperNetModule, the hook suggests keeping the module unchanged, and nothing will happen.
An example of a mutation hook is given in no_default_hook(). However, it is recommended to implement mutation hooks by deriving BaseSuperNetModule and adding its classmethod mutate to this list. A sketch of a hand-written hook is shown below.
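The sketch below follows the hook conventions described above; MySuperDropout is hypothetical, and only the signature and return conventions come from this documentation:
import torch.nn as nn

from nni.retiarii.oneshot.pytorch.base_lightning import BaseSuperNetModule

class MySuperDropout(BaseSuperNetModule):
    """Hypothetical super-net replacement for nn.Dropout (details omitted)."""

def my_mutation_hook(module, name, memo, mutate_kwargs):
    if isinstance(module, nn.Dropout):
        # Replace the module and suppress the remaining hooks for it.
        return MySuperDropout(), True
    # None means: keep this module unchanged; later hooks may still process it.
    return None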
- advance_lr_schedulers(batch_idx)[source]¶
Advance the learning rates, when manual optimization is turned on.
Only a partial implementation of the full Lightning scheduler loop is included here. Advanced features like reduce-lr-on-plateau are not supported.
- advance_optimization(loss, batch_idx, gradient_clip_val=None, gradient_clip_algorithm=None)[source]¶
Run the optimizers defined in the evaluator, when manual optimization is turned on.
Call this method when the model should be optimized. To keep it as neat as possible, we only implement the basic zero_grad, backward, grad_clip and step here. Many hooks and pre/post-processing steps are omitted. Override this method if you need more advanced behavior.
Only part of the full optimizer loop is implemented here.
- Parameters:
batch_idx (int) – The current batch index.
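As an illustration, a one-shot subclass might combine these helpers in its training step roughly as follows (a sketch assuming manual optimization is enabled; MyOneShotModule is hypothetical):
from nni.retiarii.oneshot.pytorch.base_lightning import BaseOneShotLightningModule

class MyOneShotModule(BaseOneShotLightningModule):
    def training_step(self, batch, batch_idx):
        # Delegate the loss computation to the inner LightningModule.
        loss = self.model.training_step(batch, batch_idx)
        # zero_grad -> backward -> grad_clip -> step on the evaluator's optimizers.
        self.advance_optimization(loss, batch_idx)
        # Advance the step-based learning rate schedulers.
        self.advance_lr_schedulers(batch_idx)
        return loss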
- architecture_optimizers()[source]¶
Get the optimizers configured in configure_architecture_optimizers().
- configure_architecture_optimizers()[source]¶
Hook kept for subclasses. A specific NAS method inheriting this base class should return its architecture optimizers here if architecture parameters are needed. Note that LR schedulers are currently not supported for architecture optimizers.
- Return type:
Optimizers used by a specific NAS algorithm. Return None if no architecture optimizers are needed.
- configure_optimizers()[source]¶
Transparently configure optimizers for the inner model, unless the one-shot algorithm has its own optimizers (via configure_architecture_optimizers()), in which case the architecture optimizers will be appended to the list.
The return value is still one of the 6 types defined in PyTorch-Lightning.
- export()[source]¶
Export the NAS result, ideally the best choice of each of the nas_modules. You may implement an export method for your customized nas_modules.
- Returns:
Keys are the names of nas_modules, and values are their choice indices.
- Return type:
dict
- export_probs()[source]¶
Export the probability of each choice in the search space being chosen.
Note
If this method is not implemented for some modules, they will simply be ignored.
- Returns:
In most cases, keys are the names of nas_modules suffixed with / and the choice name. Values are the probability / logits depending on the implementation.
- Return type:
dict
- resample(memo=None)[source]¶
Trigger the resample for each of the nas_modules. Sometimes (e.g., in differentiable cases), it does nothing.
- Parameters:
memo (dict[str, Any]) – Used to ensure the consistency of samples with the same label.
- Returns:
Sampled architecture.
- Return type:
dict
- search_space_spec()[source]¶
Get the search space specification from nas_modules.
- Returns:
Key is the name of the choice, value is the corresponding ParameterSpec.
- Return type:
dict
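For orientation, the typical interplay of these methods looks roughly like the following sketch (module and values are purely illustrative):
space = one_shot_module.search_space_spec()   # e.g. {'conv_choice': ParameterSpec(...)}
arch = one_shot_module.resample()             # e.g. {'conv_choice': 1}, a sampled architecture
best = one_shot_module.export()               # e.g. {'conv_choice': 2}, the best choice per module
probs = one_shot_module.export_probs()        # e.g. {'conv_choice/0': 0.2, ...}, if implemented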
- class nni.retiarii.oneshot.pytorch.base_lightning.BaseSuperNetModule[source]¶
Mutated module in a super-net. Usually, the feed-forward of the module itself is undefined. It has to be resampled with resample() so that a specific path is selected. (Sometimes this is not required, e.g., in a differentiable super-net.)
A super-net module usually corresponds to one sample, but there are two exceptions:
A module can have multiple parameter specs. For example, a convolution-2d can sample kernel size and channels at the same time.
Multiple modules can share one parameter spec. For example, multiple layer choices with the same label.
For value choice compositions, the parameter specs are bound to the underlying (original) value choices, rather than their compositions.
- export(memo)[source]¶
Export the final architecture within this module. It should have the same keys as search_space_spec().
- Parameters:
memo (dict[str, Any]) – Use the memo to avoid exporting the same label multiple times.
- export_probs(memo)[source]¶
Export the probability / logits of each choice being chosen.
- Parameters:
memo (dict[str, Any]) – Use the memo to avoid exporting the same label multiple times.
- classmethod mutate(module, name, memo, mutate_kwargs)[source]¶
This is a mutation hook that creates a BaseSuperNetModule. The method should be implemented in each specific super-net module, because each usually has its own rules about what kinds of modules to operate on.
- Parameters:
module (nn.Module) – The module to be mutated (replaced).
name (str) – Name of this module, with full prefix. For example, module1.block1.conv.
memo (dict) – Memo to enable sharing parameters among mutated modules. It should be read and written by the mutate functions themselves.
mutate_kwargs (dict) – Algorithm-related hyper-parameters, and some auxiliary information.
- Returns:
The mutation result, along with an optional boolean flag indicating whether to suppress follow-up mutation hooks. See BaseOneShotLightningModule for details.
- Return type:
Union[BaseSuperNetModule, bool, tuple[BaseSuperNetModule, bool]]
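A hypothetical mutate implementation might look like the sketch below; MyChoicePrimitive is an imaginary primitive, and the memo is used for label sharing as described above:
import torch.nn as nn

from nni.retiarii.oneshot.pytorch.base_lightning import BaseSuperNetModule

class MyChoicePrimitive(nn.Module):
    """Imaginary choice primitive carrying a `label` attribute."""
    def __init__(self, label):
        super().__init__()
        self.label = label

class MySuperLayer(BaseSuperNetModule):
    @classmethod
    def mutate(cls, module, name, memo, mutate_kwargs):
        if isinstance(module, MyChoicePrimitive):   # module-type rule specific to this super-net module
            if module.label in memo:                # share parameters among modules with the same label
                return memo[module.label]
            memo[module.label] = replacement = cls()
            return replacement                      # or (replacement, True) to suppress later hooks
        return None                                 # keep other modules unchanged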
- nni.retiarii.oneshot.pytorch.base_lightning.no_default_hook(module, name, memo, mutate_kwargs)[source]¶
Add this hook at the end of your hook list to raise an error for unsupported mutation primitives.
- nni.retiarii.oneshot.pytorch.base_lightning.traverse_and_mutate_submodules(root_module, hooks, mutate_kwargs, topdown=True)[source]¶
Traverse the module tree of root_module, and call hooks on every tree node.
- Parameters:
root_module (nn.Module) – User-defined model space. Since this method is called in the __init__ of BaseOneShotLightningModule, it’s usually a pytorch_lightning.LightningModule. The mutation will be in-place on root_module.
hooks (list[MutationHook]) – List of mutation hooks. See BaseOneShotLightningModule for how to write hooks. When a hook returns a module, the original module will be replaced (mutated) with the new module.
mutate_kwargs (dict) – Extra keyword arguments passed to hooks.
topdown (bool, default = True) – If topdown is true, hooks are called first, before traversing the sub-modules (i.e., pre-order DFS). Otherwise, sub-modules are traversed first, before calling hooks on this node (i.e., post-order DFS).
- Returns:
modules – The replacement result.
- Return type:
dict[str, nn.Module]
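A sketch of calling this function directly (it is normally invoked for you inside BaseOneShotLightningModule; MySuperLayer and model_space are hypothetical):
from nni.retiarii.oneshot.pytorch.base_lightning import no_default_hook, traverse_and_mutate_submodules

hooks = [MySuperLayer.mutate, no_default_hook]   # no_default_hook rejects unsupported primitives
mutated = traverse_and_mutate_submodules(
    model_space,          # user-defined model space; mutation happens in place
    hooks,
    mutate_kwargs={},     # extra keyword arguments forwarded to every hook
    topdown=True,         # call hooks before descending into sub-modules
)
# `mutated` maps module names to the modules that replaced them.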
dataloader¶
- class nni.retiarii.oneshot.pytorch.dataloader.ConcatLoader(loaders, mode='min_size')[source]¶
This loader is the same as CombinedLoader in PyTorch-Lightning, but concatenates sub-loaders instead of loading them in parallel.
- Parameters:
loaders (dict[str, Any]) –
For example,
{ "train": DataLoader(train_dataset), "val": DataLoader(val_dataset) }
In this example, the loader will first produce the batches from “train”, then “val”.
mode (str) – Only supports “min_size” for now.
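A construction sketch following the example above (the datasets are illustrative):
from torch.utils.data import DataLoader
from nni.retiarii.oneshot.pytorch.dataloader import ConcatLoader

loader = ConcatLoader({
    'train': DataLoader(train_dataset, batch_size=64),
    'val': DataLoader(val_dataset, batch_size=64),
}, mode='min_size')
# Iteration exhausts the "train" loader first, then the "val" loader,
# instead of drawing from both in parallel like CombinedLoader.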
supermodule.differentiable¶
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedCell(op_factory, num_nodes, num_ops_per_node, num_predecessors, preprocessor, postprocessor, concat_dim, memo, mutate_kwargs, label)[source]¶
Implementation of Cell in a differentiable context.
Similar to PathSamplingCell, this cell only handles cells of specific kinds (e.g., with loose end).
An architecture parameter is created on each edge of the fully-connected graph.
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedInput(n_candidates, n_chosen, alpha, softmax, label)[source]¶
Mixed input. Forward returns a weighted sum of candidates. The implementation is very similar to DifferentiableMixedLayer.
- Parameters:
n_candidates (int) – Expected number of input candidates.
n_chosen (int) – Expected number of inputs finally chosen.
alpha (Tensor) – Tensor that stores the “learnable” weights.
softmax (nn.Module) – Customizable softmax function. Usually nn.Softmax(-1).
label (str) – Name of the choice.
- label¶
Name of the choice.
- Type:
str
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedLayer(paths, alpha, softmax, label)[source]¶
Mixed layer, in which fprop is decided by a weighted sum of several layers. Proposed in DARTS: Differentiable Architecture Search.
The weight alpha is usually learnable, and optimized on the validation dataset.
The differentiable sampling layer requires all operators to return the same shape for one input, as all outputs will be weighted and summed to get the final output.
- Parameters:
paths (list[tuple[str, nn.Module]]) – Layers to choose from. Each is a tuple of name, and its module.
alpha (Tensor) – Tensor that stores the “learnable” weights.
softmax (nn.Module) – Customizable softmax function. Usually nn.Softmax(-1).
label (str) – Name of the choice.
- op_names¶
Operator names.
- Type:
str
- label¶
Name of the choice.
- Type:
str
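Conceptually, the forward pass of such a layer reduces to the weighted sum sketched below (a simplification, not the actual implementation):
import torch.nn as nn

def mixed_layer_forward(x, paths, alpha, softmax=nn.Softmax(-1)):
    # paths: list of (name, module) pairs; every module must return the same shape.
    weights = softmax(alpha)                  # one weight per candidate operator
    return sum(w * op(x) for w, (_, op) in zip(weights, paths))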
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.DifferentiableMixedRepeat(blocks, depth, softmax, memo)[source]¶
Implementation of Repeat in a differentiable supernet. The result is a weighted sum of possible prefixes, sliced by possible depths.
If the output is not a single tensor, it will be summed at every independent dimension. See weighted_sum() for details.
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.GumbelSoftmax(dim=-1)[source]¶
Wrapper of F.gumbel_softmax. dim = -1 by default.
- class nni.retiarii.oneshot.pytorch.supermodule.differentiable.MixedOpDifferentiablePolicy(operation, memo, mutate_kwargs)[source]¶
Implements the differentiable sampling in mixed operation.
One mixed operation can have multiple value choices in its arguments. Thus the _arch_alpha here is a parameter dict, and named_parameters filters out the multiple parameters with _arch_alpha as their prefix.
When this class is asked for forward_argument, it returns a distribution, i.e., a dict from int to float based on its weights.
All the parameters (_arch_alpha, parameters(), _softmax) are saved as attributes of operation, rather than self, because this class itself is not an nn.Module, and parameters saved here would not be optimized.
supermodule.sampling¶
- class nni.retiarii.oneshot.pytorch.supermodule.sampling.MixedOpPathSamplingPolicy(operation, memo, mutate_kwargs)[source]¶
Implements the path sampling in mixed operation.
One mixed operation can have multiple value choices in its arguments. Each value choice can be further decomposed into “leaf value choices”. We sample the leaf nodes, and compose them into the values of the arguments.
- class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingCell(op_factory, num_nodes, num_ops_per_node, num_predecessors, preprocessor, postprocessor, concat_dim, memo, mutate_kwargs, label)[source]¶
The implementation of super-net cell follows DARTS.
When factory_used is true, it reconstructs the cell for every possible combination of operation and input index, because for different input indices, the cell factory could instantiate different operations (e.g., with different strides). On export, we first find the best (operation, input) pairs, then select the best num_ops_per_node.
loose_end is not supported yet, because it would cause more problems (e.g., shape mismatch). We assume loose_end to be all regardless of its configuration.
A supernet cell can’t slim its own weights to fit into a sub-network, which is also a known issue.
- class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingInput(n_candidates, n_chosen, reduction_type, label)[source]¶
Mixed input. Takes a list of tensors as input, selects some of them, and returns the sum.
- _sampled¶
Sampled input indices.
- Type:
int or list of int
- class nni.retiarii.oneshot.pytorch.supermodule.sampling.PathSamplingLayer(paths, label)[source]¶
Mixed layer, in which fprop is decided by exactly one inner layer or a sum of multiple (sampled) layers. If multiple modules are selected, their results will be summed and returned.
- _sampled¶
Sampled module indices.
- Type:
int or list of str
- label¶
Name of the choice.
- Type:
str
supermodule.proxyless¶
- class nni.retiarii.oneshot.pytorch.supermodule.proxyless.ProxylessMixedInput(n_candidates, n_chosen, alpha, softmax, label)[source]¶
Proxyless version of differentiable input choice. See ProxylessMixedLayer for implementation details.
supermodule.operation¶
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedBatchNorm2d(module_kwargs)[source]¶
Mixed BatchNorm2d operation.
Supported arguments are:
num_features
eps (only supported in path sampling)
momentum (only supported in path sampling)
For path sampling, prefixes of weight, bias, running_mean and running_var are sliced. For weighted cases, the maximum num_features is used directly.
Momentum is required to be a float. PyTorch BatchNorm supports the case where momentum is None, which is not supported here.
- bound_type¶
alias of
BatchNorm2d
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedConv2d(module_kwargs)[source]¶
Mixed conv2d op.
Supported arguments are:
in_channels
out_channels
groups
stride (only supported in path sampling)
kernel_size
padding
dilation (only supported in path sampling)
padding will be the “max” padding in differentiable mode.
Mutable groups is NOT supported in most cases of differentiable mode. However, we do support one special case where the group number is proportional to in_channels and out_channels. This is often the case for depth-wise convolutions.
For channels, the prefix will be sliced. For kernels, we take the smaller kernel from the center and round it down (towards the top-left). For example:
max_kernel = 5*5, sampled_kernel = 3*3, then we take [1: 4]
□ □ □ □ □
□ ■ ■ ■ □
□ ■ ■ ■ □
□ ■ ■ ■ □
□ □ □ □ □
max_kernel = 5*5, sampled_kernel = 2*2, then we take [1: 3]
□ □ □ □ □
□ ■ ■ □ □
□ ■ ■ □ □
□ □ □ □ □
□ □ □ □ □
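The center slicing can be sketched as follows (an illustrative helper, not the library’s API):
def slice_kernel(weight, sampled_kernel):
    # weight: [out_channels, in_channels, max_kernel, max_kernel]
    max_kernel = weight.shape[-1]
    start = (max_kernel - sampled_kernel) // 2   # round down, i.e. towards the top-left
    return weight[..., start:start + sampled_kernel, start:start + sampled_kernel]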
- bound_type¶
alias of
Conv2d
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedLayerNorm(module_kwargs)[source]¶
Mixed LayerNorm operation.
Supported arguments are:
normalized_shape
eps (only supported in path sampling)
For path sampling, prefixes of weight and bias are sliced. For weighted cases, the maximum normalized_shape is used directly.
eps is required to be a float.
- bound_type¶
alias of
LayerNorm
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedLinear(module_kwargs)[source]¶
Mixed linear operation.
Supported arguments are:
in_features
out_features
Prefixes of weight and bias will be sliced.
- bound_type¶
alias of
Linear
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedMultiHeadAttention(module_kwargs)[source]¶
Mixed multi-head attention.
Supported arguments are:
embed_dim
num_heads (only supported in path sampling)
kdim
vdim
dropout (only supported in path sampling)
At init, it constructs the largest possible Q, K, V dimensions. At forward, it slices the prefix of the weight matrices according to the sampled value. For in_proj_bias and in_proj_weight, three parts will be sliced and concatenated together: [0, embed_dim), [max_embed_dim, max_embed_dim + embed_dim), and [max_embed_dim * 2, max_embed_dim * 2 + embed_dim).
.Warning
All candidates of embed_dim should be divisible by all candidates of num_heads.
- bound_type¶
alias of
MultiheadAttention
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedOperation(module_kwargs)[source]¶
This is the base class for all mixed operations. It’s what you should inherit to support a new operation with ValueChoice.
It contains commonly used utilities that ease the effort of writing customized mixed operations, i.e., operations with ValueChoice in their arguments. To customize, write your own mixed operation and add its hook to the mutation_hooks parameter when using the strategy.
By design, for a mixed operation to work in a specific algorithm, at least two classes are needed.
One class needs to inherit this class, to control operation-related behavior, such as how to initialize the operation such that the sampled operation can be its sub-operation.
The other one needs to inherit MixedOperationSamplingPolicy, which controls algorithm-related behavior, such as sampling.
The two classes are linked via the sampling_policy attribute of MixedOperation, whose type is set via mixed_op_sampling in mutate_kwargs when MixedOperation.mutate() is called.
With this design, one mixed operation (e.g., MixedConv2d) can work in multiple algorithms (e.g., both DARTS and ENAS), saving the engineering effort of rewriting all operations for each specific algorithm.
This class should also define a bound_type, to control the matching type in mutate, and an argument_list, to control which arguments can be dynamically used in forward. This list will also be used in mutate for sanity checks.
- export(memo)[source]¶
Delegates to MixedOperationSamplingPolicy.export().
- export_probs(memo)[source]¶
Delegates to MixedOperationSamplingPolicy.export_probs().
- forward(*args, **kwargs)[source]¶
First get the sampled arguments, then forward with the sampled arguments (by calling forward_with_args).
- forward_argument(name)[source]¶
Get the argument used in forward. This is often related to the algorithm. We redirect this to the sampling policy.
- forward_with_args(*args, **kwargs)[source]¶
Controls the real fprop. The accepted arguments are argument_list, followed by the forward arguments of the bound_type.
- classmethod mutate(module, name, memo, mutate_kwargs)[source]¶
Find the value choices in the module’s arguments and replace the whole module.
- resample(memo)[source]¶
Delegates to MixedOperationSamplingPolicy.resample().
- slice_param(**kwargs)[source]¶
Slice the parameters and buffers for sub-net forward and state dict. When mapping=True is present in kwargs, the returned result will be wrapped in a dict.
- super_init_argument(name, value_choice)[source]¶
Get the initialization argument when constructing the super-kernel, i.e., when calling super().__init__(). This is often related to the specific operator, rather than the algorithm.
For example:
def super_init_argument(self, name, value_choice):
    return max(value_choice.candidates)
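Putting the pieces together, a hypothetical custom mixed operation might look like the sketch below (modeled on the built-in operations; the class and its details are assumptions, not part of the library):
import torch.nn as nn
import torch.nn.functional as F

from nni.retiarii.oneshot.pytorch.supermodule.operation import MixedOperation

class MixedDropout(MixedOperation, nn.Dropout):
    bound_type = nn.Dropout          # the module type matched and replaced in mutate
    argument_list = ['p']            # arguments that may carry a ValueChoice

    def super_init_argument(self, name, value_choice):
        # Construct the "largest" possible super-operation.
        return max(value_choice.candidates)

    def forward_with_args(self, p, input):
        # `p` is the sampled (or weighted) argument supplied by the sampling policy.
        return F.dropout(input, p=p, training=self.training)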
- class nni.retiarii.oneshot.pytorch.supermodule.operation.MixedOperationSamplingPolicy(operation, memo, mutate_kwargs)[source]¶
Algorithm-related part of a mixed operation.
MixedOperation delegates its resample and export to this policy (or its subclass), so that one operation can easily be combined with different kinds of sampling.
One SamplingStrategy corresponds to one mixed operation.
- export(operation, memo)[source]¶
The handler of MixedOperation.export().
- export_probs(operation, memo)[source]¶
The handler of MixedOperation.export_probs().
- forward_argument(operation, name)[source]¶
Compute the argument named name used in the operation’s forward. Usually a value, or a distribution over values.
- resample(operation, memo)[source]¶
The handler of MixedOperation.resample().