Retiarii API Reference
Inline Mutation APIs
-
class
nni.retiarii.nn.pytorch.
LayerChoice
(candidates: Union[Dict[str, torch.nn.modules.module.Module], List[torch.nn.modules.module.Module]], label: Optional[str] = None, **kwargs)
Layer choice selects one of the candidates, applies it to the inputs, and returns the result. Layer choices cannot be nested.
- Parameters
candidates (list of nn.Module or OrderedDict) – A module list to be selected from.
label (str) – Identifier of the layer choice.
-
length
Deprecated. Number of ops to choose from. len(layer_choice) is recommended instead.
- Type
int
-
names
¶ Names of candidates.
- Type
list of str
-
choices
Deprecated. A list of all candidate modules in the layer choice module. list(layer_choice) is recommended instead and serves the same purpose.
- Type
list of Module
Notes
candidates can be a list of modules or an ordered dict of named modules, for example:

    # Conv2d takes (in_channels, out_channels, kernel_size); the kernel size matches each name.
    self.op_choice = LayerChoice(OrderedDict([
        ("conv3x3", nn.Conv2d(16, 128, 3)),
        ("conv5x5", nn.Conv2d(16, 128, 5)),
        ("conv7x7", nn.Conv2d(16, 128, 7))
    ]))

Elements in a layer choice can be modified or deleted, for example with del self.op_choice["conv5x5"] or self.op_choice[1] = nn.Conv3d(...). Adding more choices is not supported yet.
-
forward
(x)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
class
nni.retiarii.nn.pytorch.
InputChoice
(n_candidates: int, n_chosen: int = 1, reduction: str = 'sum', label: Optional[str] = None, **kwargs)
Input choice selects n_chosen inputs from choose_from (which contains n_candidates keys). Use reduction to specify how the chosen inputs are reduced into one output. The options are:
none: do nothing and return the list directly.
sum: sum all the chosen inputs.
mean: take the average of all chosen inputs.
concat: concatenate all chosen inputs at dimension 1.
Customizing the reduction is not supported yet.
- Parameters
n_candidates (int) – Number of inputs to choose from. It is required.
n_chosen (int) – Recommended number of inputs to choose. If None, the mutator may select any number of inputs.
reduction (str) – mean, concat, sum or none.
label (str) – Identifier of the input choice.
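The four reduction modes can be illustrated without any framework. The following is a minimal sketch only, not NNI's implementation: nested Python lists stand in for 2-D tensors, and the helper name reduce_chosen is hypothetical.

```python
from typing import List, Sequence, Union

Matrix = List[List[float]]  # stand-in for a 2-D tensor: rows x features


def reduce_chosen(inputs: Sequence[Matrix], chosen: Sequence[int],
                  reduction: str) -> Union[Matrix, List[Matrix]]:
    """Pick the `chosen` entries of `inputs` and combine them per `reduction`."""
    picked = [inputs[i] for i in chosen]
    if reduction == "none":
        return picked  # the list of chosen inputs, unchanged
    if reduction == "concat":
        # Join along dimension 1: append matching rows to each other.
        return [sum((m[r] for m in picked), []) for r in range(len(picked[0]))]
    if reduction in ("sum", "mean"):
        scale = 1.0 / len(picked) if reduction == "mean" else 1.0
        # zip(*picked) pairs up rows; zip(*rows) pairs up values per column.
        return [[scale * sum(vals) for vals in zip(*rows)]
                for rows in zip(*picked)]
    raise ValueError(f"unsupported reduction: {reduction}")
```

With three 1x2 candidates and chosen=[0, 2], sum and mean keep the 1x2 shape, while concat produces a 1x4 result.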
-
forward
(candidate_inputs: List[torch.Tensor]) → torch.Tensor[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
class
nni.retiarii.nn.pytorch.
ValueChoice
(candidates: List[Any], label: Optional[str] = None)
ValueChoice chooses one value from candidates.
In most use scenarios, ValueChoice should be passed to the init parameters of a serializable module. For example,
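
    # nn is Retiarii's replacement for torch.nn, which provides ValueChoice.
    import nni.retiarii.nn.pytorch as nn

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, nn.ValueChoice([32, 64]), kernel_size=nn.ValueChoice([3, 5, 7]))

        def forward(self, x):
            return self.conv(x)
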
If you want to search a parameter that is used repeatedly, you can share the same value choice instance. (Sharing the label should have the same effect.) For example,

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            hidden_dim = nn.ValueChoice([128, 512])
            self.fc = nn.Sequential(
                nn.Linear(64, hidden_dim),
                nn.Linear(hidden_dim, 10)
            )

            # the following code has the same effect.
            # self.fc = nn.Sequential(
            #     nn.Linear(64, nn.ValueChoice([128, 512], label='dim')),
            #     nn.Linear(nn.ValueChoice([128, 512], label='dim'), 10)
            # )

        def forward(self, x):
            return self.fc(x)

Note that ValueChoice should be used directly. Transformations like nn.Linear(32, nn.ValueChoice([64, 128]) * 2) are not supported.
Another common use case is to initialize the values to choose from in init and call the module in forward to get the chosen value. Usually, this is used to pass a mutable value to a functional API like torch.xxx or nn.functional.xxx. For example,

    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.dropout_rate = nn.ValueChoice([0., 1.])

        def forward(self, x):
            # Calling the ValueChoice module returns the chosen value.
            return F.dropout(x, self.dropout_rate())

- Parameters
candidates (list) – List of values to choose from.
label (str) – Identifier of the value choice.
-
forward
()[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
class
nni.retiarii.nn.pytorch.
ChosenInputs
(chosen: List[int], reduction: str)[source]¶ A module that chooses from a tensor list and outputs a reduced tensor. The already-chosen version of InputChoice.
-
forward
(candidate_inputs)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
Graph Mutation APIs
-
class
nni.retiarii.
Mutator
(sampler: Optional[nni.retiarii.mutator.Sampler] = None)
Mutates graphs in a model to generate a new model. The Mutator class is used in two scenarios:
1. Inherit Mutator to implement graph mutation logic.
2. Use a Mutator subclass to implement a NAS strategy.
In scenario 1, the subclass should implement the Mutator.mutate() interface using Mutator.choice(). In scenario 2, the strategy should use the constructor or Mutator.bind_sampler() to initialize the subclass, and then use Mutator.apply() to mutate the model. For certain mutator subclasses, the strategy or sampler can use Mutator.dry_run() to predict choice candidates. Note: method names are open for discussion.
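The division of labour between a mutator and its sampler can be sketched with toy classes. This is an illustration under stated assumptions, not NNI code: ToySampler, RecordingSampler, ToyMutator and KernelSizeMutator are hypothetical stand-ins, the model is a plain dict, and the way dry_run collects candidates here is a guess at one possible mechanism.

```python
import copy
import random
from typing import Any, Dict, List, Optional


class ToySampler:
    """Decides which candidate a mutator picks (hypothetical stand-in)."""

    def choice(self, candidates: List[Any]) -> Any:
        return random.choice(candidates)


class RecordingSampler(ToySampler):
    """Always takes the first candidate and records what was offered."""

    def __init__(self) -> None:
        self.seen: List[List[Any]] = []

    def choice(self, candidates: List[Any]) -> Any:
        self.seen.append(list(candidates))
        return candidates[0]


class ToyMutator:
    """Scenario 1: subclasses implement mutate() and call self.choice()."""

    def __init__(self, sampler: Optional[ToySampler] = None) -> None:
        self.sampler = sampler

    def bind_sampler(self, sampler: ToySampler) -> "ToyMutator":
        self.sampler = sampler
        return self

    def choice(self, candidates: List[Any]) -> Any:
        return self.sampler.choice(candidates)

    def apply(self, model: Dict[str, Any]) -> Dict[str, Any]:
        mutated = copy.deepcopy(model)  # the original stays untouched
        self.mutate(mutated)
        return mutated

    def dry_run(self, model: Dict[str, Any]) -> List[List[Any]]:
        # Run the mutation with a recording sampler to learn the candidates.
        recorder = RecordingSampler()
        original, self.sampler = self.sampler, recorder
        try:
            self.apply(model)
        finally:
            self.sampler = original
        return recorder.seen

    def mutate(self, model: Dict[str, Any]) -> None:
        raise NotImplementedError


class KernelSizeMutator(ToyMutator):
    def mutate(self, model: Dict[str, Any]) -> None:
        model["kernel_size"] = self.choice([3, 5, 7])


# Scenario 2: a strategy binds a sampler, then applies the mutator.
base = {"kernel_size": None}
mutator = KernelSizeMutator().bind_sampler(ToySampler())
new_model = mutator.apply(base)
```

Keeping the decision inside the sampler means the same mutation logic can be driven by random search, a recorded dry run, or any other strategy.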
-
apply
(model: nni.retiarii.graph.Model) → nni.retiarii.graph.Model[source]¶ Apply this mutator on a model. Returns mutated model. The model will be copied before mutation and the original model will not be modified.
-
bind_sampler
(sampler: nni.retiarii.mutator.Sampler) → nni.retiarii.mutator.Mutator[source]¶ Set the sampler which will handle Mutator.choice calls.
-
class
nni.retiarii.
Model
(_internal=False)[source]¶ Represents a neural network model.
During mutation, one Model object is created for each trainable snapshot. For example, consider a mutator that inserts a node at an edge in each iteration. In one iteration, the mutator invokes 4 primitives: add node, remove edge, add edge to head, add edge to tail. These 4 primitives operate on one Model object. When they are all done, the model is set to “frozen” (trainable) status and submitted to the execution engine. Then a new iteration starts, and a new Model object is created by forking the last model.
-
status
¶ See ModelStatus.
-
root_graph
¶ The outermost graph which usually takes dataset as input and feeds output to loss function.
-
graphs
¶ All graphs (subgraphs) in this model.
-
evaluator
¶ Model evaluator
-
history
Mutation history. self is directly mutated from self.history[-1]; self.history[-1] is mutated from self.history[-2], and so on. self.history[0] is the base graph.
-
metric
¶ Training result of the model, or None if it’s not yet trained or has failed to train.
-
intermediate_metrics
¶ Intermediate training metrics. If the model is not trained, it’s an empty list.
-
fork
() → nni.retiarii.graph.Model
Create a new model which has the same topology, names, and IDs as the current one.
Can only be invoked on a frozen model. The new model will be in Mutating state.
This API is used in mutator base class.
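The relationship between fork(), the frozen and mutating states, and history can be mirrored in a few lines. This is a toy sketch, not the NNI class: ToyModel, its string status values, and its freeze() method are hypothetical names chosen for illustration.

```python
from typing import List, Optional


class ToyModel:
    """Hypothetical stand-in for a model snapshot; not the NNI class."""

    def __init__(self) -> None:
        self.status = "mutating"
        self.history: List["ToyModel"] = []
        self.nodes: List[str] = []
        self.metric: Optional[float] = None

    def freeze(self) -> None:
        self.status = "frozen"

    def fork(self) -> "ToyModel":
        if self.status != "frozen":
            raise RuntimeError("only a frozen model can be forked")
        child = ToyModel()
        child.nodes = list(self.nodes)         # same topology and names
        child.history = self.history + [self]  # parent becomes history[-1]
        return child


base = ToyModel()
base.nodes.append("conv")
base.freeze()

child = base.fork()         # starts in the mutating state
child.nodes.append("relu")  # primitives operate on the new snapshot
child.freeze()

grandchild = child.fork()
```

Each fork leaves its parent untouched, so the chain from history[0] to history[-1] records every snapshot that led to the current model.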
-
get_node_by_name
(node_name: str) → nni.retiarii.graph.Node[source]¶ Traverse all the nodes to find the matched node with the given name.
-
get_nodes_by_label
(label: str) → List[nni.retiarii.graph.Node]
Traverse all the nodes to find the matched node(s) with the given label. There could be multiple nodes with the same label. Name space name can uniquely identify a graph or node.
NOTE: the implementation does not support the class abstraction.
-
-
class
nni.retiarii.
Graph
(model: nni.retiarii.graph.Model, graph_id: int, name: Optional[str] = None, _internal: bool = False)[source]¶ Graph topology.
This class simply represents the topology, with no semantic meaning. All other information like metric, non-graph functions, mutation history, etc should go to Model.
Each graph belongs to and only belongs to one Model.
-
model
¶ The model containing (and owning) this graph.
-
id
¶ Unique ID in the model. If two models have graphs of identical ID, they are semantically the same graph. Typically this means one graph is mutated from another, or they are both mutated from one ancestor.
-
name
Mnemonic name of this graph. It should have a one-to-one mapping with ID.
-
input_names
¶ Optional mnemonic names of input parameters.
-
output_names
¶ Optional mnemonic names of output values.
-
input_node
¶ …
-
output_node
¶ …
…
-
nodes
¶ All input/output/hidden nodes.
-
edges
¶ …
-
fork
() → nni.retiarii.graph.Graph
Fork the model and return the corresponding graph in the new model. This shortcut might be helpful because many algorithms only care about the “stem” subgraph instead of the whole model.
-
get_node_by_id
(node_id: int) → Optional[nni.retiarii.graph.Node]
Returns the node which has the specified ID, or None if no node has this ID.
-
-
class
nni.retiarii.
Node
(graph, node_id, name, operation, _internal=False)[source]¶ An operation or an opaque subgraph inside a graph.
Each node belongs to and only belongs to one Graph. Nodes should never be created with constructor. Use Graph.add_node() instead.
The node itself is for topology only. Information of tensor calculation should all go inside operation attribute.
TODO: parameter of subgraph (cell) It’s easy to assign parameters on cell node, but it’s hard to “use” them. We need to design a way to reference stored cell parameters in inner node operations. e.g. self.fc = Linear(self.units) <- how to express self.units in IR?
-
graph
¶ The graph containing this node.
-
id
¶ Unique ID in the model. If two models have nodes with same ID, they are semantically the same node.
-
name
Mnemonic name. It should have a one-to-one mapping with ID.
-
operation
¶ …
-
cell
¶ Read only shortcut to get the referenced subgraph. If this node is not a subgraph (is a primitive operation), accessing cell will raise an error.
-
predecessors
¶ Predecessor nodes of this node in the graph. This is an optional mutation helper.
-
successors
¶ Successor nodes of this node in the graph. This is an optional mutation helper.
-
incoming_edges
¶ Incoming edges of this node in the graph. This is an optional mutation helper.
-
outgoing_edges
¶ Outgoing edges of this node in the graph. This is an optional mutation helper.
-
-
class
nni.retiarii.
Edge
(head: Tuple[nni.retiarii.graph.Node, Optional[int]], tail: Tuple[nni.retiarii.graph.Node, Optional[int]], _internal: bool = False)[source]¶ A tensor, or “data flow”, between two nodes.
Example forward code snippet:

    a, b, c = split(x)
    p = concat(a, c)
    q = sum(b, p)
    z = relu(q)

- Edges in above snippet:
head: (split, 0), tail: (concat, 0) # a in concat
head: (split, 2), tail: (concat, 1) # c in concat
head: (split, 1), tail: (sum, -1 or 0) # b in sum
head: (concat, null), tail: (sum, -1 or 1) # p in sum
head: (sum, null), tail: (relu, null) # q in relu
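The edge list above can be written down as plain records to make the head and tail slots concrete. The sketch below is illustrative only: ToyEdge and predecessors are hypothetical names, nodes are identified by strings rather than Node objects, None plays the role of null, and -1 is used for the order-insensitive inputs of sum.

```python
from typing import List, NamedTuple, Optional


class ToyEdge(NamedTuple):
    """(node name, slot) pairs; a hypothetical stand-in for Edge."""
    head: str
    head_slot: Optional[int]
    tail: str
    tail_slot: Optional[int]


# a, b, c = split(x); p = concat(a, c); q = sum(b, p); z = relu(q)
edges = [
    ToyEdge("split", 0, "concat", 0),     # a in concat
    ToyEdge("split", 2, "concat", 1),     # c in concat
    ToyEdge("split", 1, "sum", -1),       # b in sum (order ignored)
    ToyEdge("concat", None, "sum", -1),   # p in sum (order ignored)
    ToyEdge("sum", None, "relu", None),   # q in relu
]


def predecessors(node: str) -> List[str]:
    """Names of nodes feeding `node`, in edge-list order without repeats."""
    seen: List[str] = []
    for edge in edges:
        if edge.tail == node and edge.head not in seen:
            seen.append(edge.head)
    return seen
```

Note that split feeds concat through two separate edges, one per output slot, yet it counts as a single predecessor.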
-
graph
¶ …
-
head
¶ Head node.
-
tail
¶ Tail node.
-
head_slot
¶ Index of outputs in head node. If the node has only one output, this should be null.
-
tail_slot
¶ Index of inputs in tail node. If the node has only one input, this should be null. If the node does not care about order, this can be -1.
-
class
nni.retiarii.
Operation
(type_name: str, parameters: Dict[str, Any], _internal: bool = False)[source]¶ Calculation logic of a graph node.
The constructor is private. Use Operation.new() to create operation object.
Operation is a naive record. Do not “mutate” its attributes or store information related to a specific node. All complex logic should be implemented in the Node class.
-
type
¶ Operation type name (e.g. Conv2D). If it starts with underscore, the “operation” is a special one (e.g. subgraph, input/output).
-
parameters
¶ Arbitrary key-value parameters (e.g. kernel_size).
-
Oneshot Trainers
-
class
nni.retiarii.oneshot.pytorch.
DartsTrainer
(model, loss, metrics, optimizer, num_epochs, dataset, grad_clip=5.0, learning_rate=0.0025, batch_size=64, workers=4, device=None, log_frequency=None, arc_learning_rate=0.0003, unrolled=False)[source]¶ DARTS trainer.
- Parameters
model (nn.Module) – PyTorch model to be trained.
loss (callable) – Receives logits and ground truth label, return a loss tensor.
metrics (callable) – Receives logits and ground truth label, return a dict of metrics.
optimizer (Optimizer) – The optimizer used for optimizing the model.
num_epochs (int) – Number of epochs planned for training.
dataset (Dataset) – Dataset for training. Will be split for training weights and architecture weights.
grad_clip (float) – Gradient clipping. Set to 0 to disable. Default: 5.
learning_rate (float) – Learning rate to optimize the model.
batch_size (int) – Batch size.
workers (int) – Workers for data loading.
device (torch.device) – torch.device("cpu") or torch.device("cuda").
log_frequency (int) – Step count per logging.
arc_learning_rate (float) – Learning rate of architecture parameters.
unrolled (bool) – True if using second order optimization, else first order optimization.
-
class
nni.retiarii.oneshot.pytorch.
EnasTrainer
(model, loss, metrics, reward_function, optimizer, num_epochs, dataset, batch_size=64, workers=4, device=None, log_frequency=None, grad_clip=5.0, entropy_weight=0.0001, skip_weight=0.8, baseline_decay=0.999, ctrl_lr=0.00035, ctrl_steps_aggregate=20, ctrl_kwargs=None)[source]¶ ENAS trainer.
- Parameters
model (nn.Module) – PyTorch model to be trained.
loss (callable) – Receives logits and ground truth label, return a loss tensor.
metrics (callable) – Receives logits and ground truth label, return a dict of metrics.
reward_function (callable) – Receives logits and ground truth label, returns a tensor, which will be fed to the RL controller as reward.
optimizer (Optimizer) – The optimizer used for optimizing the model.
num_epochs (int) – Number of epochs planned for training.
dataset (Dataset) – Dataset for training. Will be split for training weights and architecture weights.
batch_size (int) – Batch size.
workers (int) – Workers for data loading.
device (torch.device) – torch.device("cpu") or torch.device("cuda").
log_frequency (int) – Step count per logging.
grad_clip (float) – Gradient clipping. Set to 0 to disable. Default: 5.
entropy_weight (float) – Weight of sample entropy loss.
skip_weight (float) – Weight of skip penalty loss.
baseline_decay (float) – Decay factor of baseline. New baseline will be equal to baseline_decay * baseline_old + reward * (1 - baseline_decay).
ctrl_lr (float) – Learning rate for RL controller.
ctrl_steps_aggregate (int) – Number of steps that will be aggregated into one mini-batch for RL controller.
ctrl_steps (int) – Number of mini-batches for each epoch of RL controller learning.
ctrl_kwargs (dict) – Optional kwargs that will be passed to ReinforceController.
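The baseline_decay formula is an exponential moving average of the rewards. A minimal sketch of the update follows; the function name update_baseline is hypothetical, and how the trainer stores or applies the baseline is not shown here.

```python
def update_baseline(baseline_old: float, reward: float,
                    baseline_decay: float = 0.999) -> float:
    """Exponential moving average of rewards, per the baseline_decay formula."""
    return baseline_decay * baseline_old + reward * (1 - baseline_decay)


# With a small decay the baseline approaches the reward quickly;
# the default of 0.999 makes it move much more slowly.
baseline = 0.0
for reward in [1.0, 1.0, 1.0]:
    baseline = update_baseline(baseline, reward, baseline_decay=0.5)
```

A decay close to 1 gives a smoother but slower baseline, which is why the default of 0.999 changes the baseline by only 0.1% of the gap per update.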
-
class
nni.retiarii.oneshot.pytorch.
ProxylessTrainer
(model, loss, metrics, optimizer, num_epochs, dataset, warmup_epochs=0, batch_size=64, workers=4, device=None, log_frequency=None, arc_learning_rate=0.001)[source]¶ Proxyless trainer.
- Parameters
model (nn.Module) – PyTorch model to be trained.
loss (callable) – Receives logits and ground truth label, return a loss tensor.
metrics (callable) – Receives logits and ground truth label, return a dict of metrics.
optimizer (Optimizer) – The optimizer used for optimizing the model.
num_epochs (int) – Number of epochs planned for training.
dataset (Dataset) – Dataset for training. Will be split for training weights and architecture weights.
warmup_epochs (int) – Number of epochs to warmup model parameters.
batch_size (int) – Batch size.
workers (int) – Workers for data loading.
device (torch.device) – torch.device("cpu") or torch.device("cuda").
log_frequency (int) – Step count per logging.
arc_learning_rate (float) – Learning rate of architecture parameters.
-
class
nni.retiarii.oneshot.pytorch.
SinglePathTrainer
(model, loss, metrics, optimizer, num_epochs, dataset_train, dataset_valid, mutator=None, batch_size=64, workers=4, device=None, log_frequency=None)[source]¶ Single-path trainer. Samples a path every time and backpropagates on that path.
- Parameters
model (nn.Module) – Model with mutables.
loss (callable) – Called with logits and targets. Returns a loss tensor.
metrics (callable) – Returns a dict that maps metrics keys to metrics data.
optimizer (Optimizer) – Optimizer that optimizes the model.
num_epochs (int) – Number of epochs of training.
dataset_train (Dataset) – Dataset of training.
dataset_valid (Dataset) – Dataset of validation.
batch_size (int) – Batch size.
workers (int) – Number of threads for data preprocessing. Not used for this trainer. May be removed in the future.
device (torch.device) – Device object. Either torch.device("cuda") or torch.device("cpu"). When None, the trainer automatically detects a GPU and prefers it over the CPU.
log_frequency (int) – Number of mini-batches to log metrics.
Strategies
-
class
nni.retiarii.strategy.
Random
(variational=False, dedup=True)[source]¶ Random search on the search space.
- Parameters
variational (bool) – Do not dry run to get the full search space. Used when the search space has variational size or candidates. Default: false.
dedup (bool) – Do not try the same configuration twice. When variational is true, deduplication is not supported. Default: true.
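The effect of dedup can be shown on a toy search space. The following is a sketch under assumptions, not the NNI strategy: the search space is a plain dict of candidate lists, and random_search is a hypothetical helper that uses simple rejection sampling.

```python
import random
from typing import Dict, Iterator, List, Tuple


def random_search(space: Dict[str, List], dedup: bool = True,
                  seed: int = 0) -> Iterator[Dict]:
    """Yield random configurations; with dedup, never repeat one."""
    rng = random.Random(seed)
    total = 1
    for candidates in space.values():
        total *= len(candidates)
    seen: set = set()
    # With dedup the generator stops once the space is exhausted;
    # without it, it keeps sampling for as long as the caller iterates.
    while not dedup or len(seen) < total:
        config = {name: rng.choice(cands) for name, cands in space.items()}
        key: Tuple = tuple(sorted(config.items()))
        if dedup and key in seen:
            continue  # already tried, draw again
        seen.add(key)
        yield config
```

Deduplication needs to know the full search space in advance, which is consistent with it being unavailable when variational is true.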
-
class
nni.retiarii.strategy.
GridSearch
(shuffle=True)[source]¶ Traverse the search space and try all the possible combinations one by one.
- Parameters
shuffle (bool) – Shuffle the order in a candidate list, so that they are tried in a random order. Default: true.
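Grid search over a small space amounts to a Cartesian product of the candidate lists. The sketch below is an assumption-laden illustration rather than the NNI strategy: grid_search is a hypothetical helper, the space is a plain dict, and shuffling is applied to each candidate list before the product is taken.

```python
import itertools
import random
from typing import Dict, Iterator, List


def grid_search(space: Dict[str, List], shuffle: bool = True,
                seed: int = 0) -> Iterator[Dict]:
    """Yield every combination; optionally shuffle each candidate list."""
    rng = random.Random(seed)
    names = list(space)
    lists = []
    for name in names:
        candidates = list(space[name])  # copy so the caller's list is kept
        if shuffle:
            rng.shuffle(candidates)
        lists.append(candidates)
    for combo in itertools.product(*lists):
        yield dict(zip(names, combo))
```

Shuffling changes only the order of traversal; every combination is still visited exactly once.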
-
class
nni.retiarii.strategy.
RegularizedEvolution
(optimize_mode='maximize', population_size=100, sample_size=25, cycles=20000, mutation_prob=0.05, on_failure='ignore')[source]¶ Algorithm for regularized evolution (i.e. aging evolution). Follows “Algorithm 1” in Real et al. “Regularized Evolution for Image Classifier Architecture Search”.
- Parameters
optimize_mode (str) – Can be one of “maximize” and “minimize”. Default: maximize.
population_size (int) – The number of individuals to keep in the population. Default: 100.
cycles (int) – The number of cycles (trials) the algorithm should run for. Default: 20000.
sample_size (int) – The number of individuals that should participate in each tournament. Default: 25.
mutation_prob (float) – Probability that mutation happens in each dim. Default: 0.05
on_failure (str) – Can be one of “ignore” and “worst”. If “ignore”, simply give up the model and find a new one. If “worst”, mark the model as -inf when maximizing (inf when minimizing), so that the algorithm “learns” to avoid such models. Default: ignore.
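How population_size, sample_size, cycles and mutation_prob interact can be seen in a compact aging-evolution loop. This is a sketch in the spirit of Algorithm 1 of Real et al., not the NNI implementation: it assumes a plain dict search space, always maximizes, counts the initial population toward cycles, ignores failures, and uses the hypothetical name regularized_evolution.

```python
import collections
import random
from typing import Callable, Dict, List


def regularized_evolution(space: Dict[str, List],
                          evaluate: Callable[[Dict], float],
                          population_size: int = 10, sample_size: int = 3,
                          cycles: int = 50, mutation_prob: float = 0.05,
                          seed: int = 0) -> Dict:
    """Aging evolution on a toy dict search space; maximizes `evaluate`."""
    rng = random.Random(seed)
    population = collections.deque()  # oldest individual on the left
    history = []

    # Fill the initial population with random configurations.
    while len(history) < min(population_size, cycles):
        config = {k: rng.choice(v) for k, v in space.items()}
        individual = (evaluate(config), config)
        population.append(individual)
        history.append(individual)

    while len(history) < cycles:
        # Tournament: the best of a random sample becomes the parent.
        k = min(sample_size, len(population))
        tournament = rng.sample(list(population), k)
        parent = max(tournament, key=lambda ind: ind[0])[1]
        # Each dimension is resampled with probability mutation_prob.
        child = {name: (rng.choice(space[name])
                        if rng.random() < mutation_prob else value)
                 for name, value in parent.items()}
        individual = (evaluate(child), child)
        population.append(individual)
        history.append(individual)
        population.popleft()  # remove the oldest, not the worst

    return max(history, key=lambda ind: ind[0])[1]
```

Removing the oldest individual rather than the worst is what distinguishes aging evolution from plain tournament selection: a model survives only if its descendants keep being selected.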