Search Space Zoo¶

DartsCell¶

DartsCell is extracted from a CNN model. A DartsCell is a directed acyclic graph containing an ordered sequence of N nodes, and each node stands for a latent representation (e.g. a feature map in a convolutional network). A directed edge from Node 1 to Node 2 is associated with an operation that transforms Node 1, and the result is stored on Node 2. The candidate operators between nodes are predefined and unchangeable. One edge represents an operation, chosen from the predefined ones, that is applied to the starting node of the edge. One cell contains two input nodes, a single output node, and n_nodes other nodes. The input nodes are defined as the cell outputs of the previous two layers. The output of the cell is obtained by applying a reduction operation (e.g. concatenation) to all the intermediate nodes. To make the search space continuous, the categorical choice of a particular operation is relaxed to a softmax over all possible operations. Once the softmax weights have been adjusted during the search, the operation with the highest probability is chosen to be part of the final structure. A CNN model can be formed by stacking several cells together, which builds a search space. Note that in the DARTS paper all cells in the model share the same structure.
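The softmax relaxation described above can be sketched in plain Python. This is only an illustration, not NNI code: the three scalar "operations" and the names `candidate_ops`, `mixed_edge` and `chosen_op` are hypothetical stand-ins for the real candidate operators on one edge.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scalar stand-ins for the candidate operations on one edge.
candidate_ops = {
    "skip": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_edge(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops.values()))

def chosen_op(alphas):
    """After the search, keep the operation with the highest probability."""
    weights = softmax(alphas)
    names = list(candidate_ops)
    best = max(range(len(names)), key=lambda i: weights[i])
    return names[best]

alphas = [0.1, 1.5, -0.3]          # one architecture weight per candidate
print(mixed_edge(1.0, alphas))     # blended output used during the search
print(chosen_op(alphas))           # discrete choice for the final structure
```

During the search every candidate contributes to the edge output; only the final selection step is discrete.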

One structure in the Darts search space is shown below. Note that NNI merges the last of the four intermediate nodes with the output node.

The predefined operators are shown here.

class nni.nas.pytorch.search_space_zoo.DartsCell(n_nodes, channels_pp, channels_p, channels, reduction_p, reduction)[source]¶

Builtin Darts Cell structure. There are n_nodes nodes in one cell, in which the first two nodes’ values are fixed to the results of the previous previous cell and the previous cell respectively. Each node connects to all later nodes through predefined operations in a mutable way. The last node accepts five inputs from the preceding nodes and concatenates them along the channel dimension as the output of the current cell, so the number of output channels is n_nodes times channels.

Parameters

n_nodes (int) – the number of nodes contained in this cell
channels_pp (int) – the number of previous previous cell’s output channels
channels_p (int) – the number of previous cell’s output channels
channels (int) – the number of output channels for each node
reduction_p (bool) – whether the previous cell is a reduction cell
reduction (bool) – whether the current cell is a reduction cell
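As a quick sanity check of the numbers in the description above, the following sketch counts the candidate inputs per node and the output channels. It assumes, as in the DARTS paper, that each node receives an edge from both cell inputs and from every earlier node; `darts_cell_summary` is a hypothetical helper, not part of NNI.

```python
def darts_cell_summary(n_nodes, channels):
    """Count candidate inputs and output channels of one cell.

    Assumption: node i (0-based) is fed by the 2 cell inputs plus the
    i nodes created before it.
    """
    inputs_per_node = [2 + i for i in range(n_nodes)]
    return {
        "inputs_per_node": inputs_per_node,
        "num_edges": sum(inputs_per_node),
        "out_channels": n_nodes * channels,  # concatenation along channels
    }

print(darts_cell_summary(n_nodes=4, channels=16))
```

With n_nodes=4 the last node has five inputs, which matches the description, and the cell outputs 4 × 16 = 64 channels.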

forward(pprev, prev)[source]¶

Parameters

pprev (torch.Tensor) – the output of the previous previous layer
prev (torch.Tensor) – the output of the previous layer

Example code¶

example code

git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best structure
python3 darts_example.py

Candidate operators¶

All supported operators for Darts are listed below.

MaxPool / AvgPool
- MaxPool: Call torch.nn.MaxPool2d. This operation applies a 2D max pooling over all input channels. Its parameters kernel_size=3 and padding=1 are fixed. The pooling result passes through a BatchNorm2d and is then returned.
- AvgPool: Call torch.nn.AvgPool2d. This operation applies a 2D average pooling over all input channels. Its parameters kernel_size=3 and padding=1 are fixed. The pooling result passes through a BatchNorm2d and is then returned.
  
  MaxPool / AvgPool with kernel_size=3 and padding=1 followed by BatchNorm2d

class nni.nas.pytorch.search_space_zoo.darts_ops.PoolBN(pool_type, C, kernel_size, stride, padding, affine=True)[source]¶

AvgPool or MaxPool with BN. pool_type must be max or avg.

Parameters

pool_type (str) – choose operation
C (int) – number of channels
kernel_size (int) – size of the pooling window
stride (int) – stride of the pooling window
padding (int) – zero-padding added to both sides of the input
affine (bool) – whether BatchNorm uses learnable affine parameters

SkipConnect

There is no operation between two nodes. Call torch.nn.Identity to forward what it gets to the output.

Zero operation

There is no connection between two nodes.

DilConv3x3 / DilConv5x5

DilConv3x3: (Dilated) depthwise separable Conv. It’s a 3x3 depthwise convolution with C_in groups, followed by a 1x1 pointwise convolution, which reduces the number of parameters. The input first passes through ReLU, then DilConv, and finally BatchNorm2d. Note that the operation is not Dilated Convolution, but we follow the convention in NAS papers and name it DilConv. The 3x3 DilConv has parameters kernel_size=3, padding=1, and the 5x5 DilConv has parameters kernel_size=5, padding=4.

class nni.nas.pytorch.search_space_zoo.darts_ops.DilConv(C_in, C_out, kernel_size, stride, padding, dilation, affine=True)[source]¶

(Dilated) depthwise separable conv. ReLU - (Dilated) depthwise separable - Pointwise - BN. If dilation == 2, 3x3 conv => 5x5 receptive field, 5x5 conv => 9x9 receptive field.

Parameters

C_in (int) – the number of input channels
C_out (int) – the number of output channels
kernel_size (int) – size of the convolving kernel
stride (int) – stride of the convolution
padding (int) – zero-padding added to both sides of the input
dilation (int) – spacing between kernel elements
affine (bool) – whether BatchNorm uses learnable affine parameters
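The receptive-field figures quoted for dilation == 2 follow from the usual formula for the effective kernel size, dilation × (kernel_size − 1) + 1. The sketch below also evaluates the output-size formula documented for torch.nn.Conv2d; both helper names are hypothetical and the code does not import PyTorch.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated kernel along one spatial dimension."""
    return dilation * (kernel_size - 1) + 1

def conv_out_size(size, kernel_size, stride, padding, dilation=1):
    """Output length along one dimension (formula from the Conv2d docs)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(effective_kernel(3, 2))            # 3x3 kernel, dilation 2
print(effective_kernel(5, 2))            # 5x5 kernel, dilation 2
print(conv_out_size(32, 5, 1, 4, 2))     # 5x5, padding 4, dilation 2, stride 1
print(conv_out_size(32, 5, 2, 4, 2))     # same kernel in a stride-2 cell
```

For the 5x5 case with padding=4 and dilation=2, stride 1 keeps a 32-pixel side at 32, and stride 2 halves it to 16.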

SepConv3x3 / SepConv5x5

Composed of two sequential DilConvs with fixed kernel_size=3, padding=1 or kernel_size=5, padding=2.

class nni.nas.pytorch.search_space_zoo.darts_ops.SepConv(C_in, C_out, kernel_size, stride, padding, affine=True)[source]¶

Depthwise separable conv. DilConv(dilation=1) * 2.

Parameters

C_in (int) – the number of input channels
C_out (int) – the number of output channels
kernel_size (int) – size of the convolving kernel
stride (int) – stride of the convolution
padding (int) – zero-padding added to both sides of the input
affine (bool) – whether BatchNorm uses learnable affine parameters
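To see why a depthwise convolution followed by a 1x1 pointwise convolution has fewer parameters than a standard convolution, the weights can be counted directly. This is a rough sketch that ignores biases and BatchNorm parameters; the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, kernel_size):
    """Weights of a dense k x k convolution (no bias)."""
    return c_in * c_out * kernel_size * kernel_size

def separable_conv_params(c_in, c_out, kernel_size):
    """Depthwise k x k conv with c_in groups plus a 1x1 pointwise conv."""
    depthwise = c_in * kernel_size * kernel_size
    pointwise = c_in * c_out
    return depthwise + pointwise

print(standard_conv_params(16, 32, 3))
print(separable_conv_params(16, 32, 3))
```

For 16 input and 32 output channels with a 3x3 kernel, the dense layer needs 4608 weights while the separable pair needs 656.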

ENASMicroLayer¶

This layer is extracted from the model designed here. A model contains several blocks that share the same architecture. A block is made up of some normal layers and reduction layers; ENASMicroLayer is a unified implementation of the two types of layers. The only difference between the two is that reduction layers apply all operations with stride=2.

ENAS Micro employs a DAG with N nodes in one cell, where the nodes represent local computations and the edges represent the flow of information between the N nodes. One cell contains two input nodes and a single output node. Each of the following nodes chooses two previous nodes as inputs, applies one of the predefined operations to each, and adds the two results as the output of this node. For example, Node 4 chooses Node 1 and Node 3 as inputs, applies MaxPool and AvgPool to them respectively, and sums the results as the output of Node 4. Nodes that do not serve as input for any other node are viewed as the output of the layer. If there are multiple output nodes, the model calculates the average of these nodes as the layer output.
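The node rule above can be traced with a toy example in plain Python. Scalars stand in for tensors, and `identity` and `halve` are hypothetical stand-ins for the real operators; the sketch only illustrates how unused nodes become outputs and are averaged, and is not the NNI implementation.

```python
def run_micro_cell(input_values, choices):
    """Evaluate a toy cell.

    input_values: {node_id: value} for the two input nodes.
    choices: {node_id: ((src_a, op_a), (src_b, op_b))} for the other nodes.
    Returns the ids of the output nodes and their average.
    """
    values = dict(input_values)
    used = set()
    for node in sorted(choices):
        (src_a, op_a), (src_b, op_b) = choices[node]
        # Apply one operation to each chosen input, then add the results.
        values[node] = op_a(values[src_a]) + op_b(values[src_b])
        used.update((src_a, src_b))
    # Nodes that feed no other node are treated as outputs.
    outputs = [n for n in sorted(values) if n not in used]
    layer_out = sum(values[n] for n in outputs) / len(outputs)
    return outputs, layer_out

identity = lambda v: v
halve = lambda v: v / 2.0

choices = {
    3: ((1, identity), (2, identity)),
    4: ((1, identity), (3, halve)),     # Node 4 picks Node 1 and Node 3
    5: ((2, identity), (3, identity)),
}
outputs, layer_out = run_micro_cell({1: 1.0, 2: 2.0}, choices)
print(outputs, layer_out)
```

Here Nodes 4 and 5 are never used as inputs, so the layer output is the average of their two values.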

The ENAS micro search space is shown below.

The predefined operators can be seen here.

class nni.nas.pytorch.search_space_zoo.ENASMicroLayer(num_nodes, in_channels_pp, in_channels_p, out_channels, reduction)[source]¶

Builtin EnasMicroLayer. Micro search designs only one building block whose architecture is repeated throughout the final architecture. A cell has num_nodes nodes and searches the topology and operations among them with reinforcement learning. The first two nodes in a layer stand for the outputs of the previous previous layer and the previous layer respectively. For each of the following nodes, the controller chooses two previous nodes and applies one operation to each. Nodes that do not serve as input for any other node are viewed as the output of the layer. If there are multiple output nodes, the model calculates the average of these nodes as the layer output. Every node’s output has out_channels channels, so the result of the layer has the same number of channels as each node.

Parameters

num_nodes (int) – the number of nodes contained in this layer
in_channels_pp (int) – the number of previous previous layer’s output channels
in_channels_p (int) – the number of previous layer’s output channels
out_channels (int) – output channels of this layer
reduction (bool) – whether a reduction operation is employed before this layer

forward(pprev, prev)[source]¶

Parameters

pprev (torch.Tensor) – the output of the previous previous layer
prev (torch.Tensor) – the output of the previous layer

The Reduction Layer is made up of two Conv operations followed by BatchNorm. Each Conv outputs C_out//2 channels, and the two results are concatenated along the channel dimension as the output. The convolutions have kernel_size=1 and stride=2, and they perform alternate sampling on the input to reduce the resolution without loss of information. This layer is wrapped in ENASMicroLayer.
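The alternate sampling can be pictured on a small grid. The sketch below assumes the second path is shifted by one pixel relative to the first, as in DARTS-style factorized reduction; the source does not state the offset, so treat it as an assumption. Nested lists stand in for one feature-map channel, and `alternate_sample` is a hypothetical helper.

```python
def alternate_sample(grid, offset):
    """Keep every second row and column, starting at `offset` (0 or 1).

    This mimics what a kernel_size=1, stride=2 convolution "sees".
    """
    return [row[offset::2] for row in grid[offset::2]]

# A 4x4 map whose entries are just their flat index, to make positions visible.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

path_a = alternate_sample(grid, 0)   # first Conv path
path_b = alternate_sample(grid, 1)   # second Conv path (assumed 1-pixel shift)
print(path_a)
print(path_b)
```

Each path halves the resolution from 4x4 to 2x2, and the two paths read different pixel positions before their outputs are concatenated along the channel dimension.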

Example code¶

example code

git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best cell structure
python3 enas_micro_example.py

Candidate operators¶

All supported operators for ENAS micro search are listed below.

MaxPool / AvgPool
- MaxPool: Call torch.nn.MaxPool2d. This operation applies a 2D max pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to kernel_size=3, stride=1 and padding=1.
- AvgPool: Call torch.nn.AvgPool2d. This operation applies a 2D average pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to kernel_size=3, stride=1 and padding=1.

class nni.nas.pytorch.search_space_zoo.enas_ops.Pool(pool_type, kernel_size, stride, padding)[source]¶

Pooling structure

Parameters

pool_type (str) – only accept max for MaxPool and avg for AvgPool
kernel_size (int) – size of the pooling window
stride (int) – stride of the pooling window
padding (int) – zero-padding added to both sides of the input

SepConv
- SepConvBN3x3: ReLU followed by a DilConv and BatchNorm. Convolution parameters are kernel_size=3, stride=1 and padding=1.
- SepConvBN5x5: The same operation as the previous one, but with kernel size and padding set to 5 and 2 respectively.

class nni.nas.pytorch.search_space_zoo.enas_ops.SepConvBN(C_in, C_out, kernel_size, padding)[source]¶

Implement SepConv followed by BatchNorm. The structure is ReLU ==> SepConv ==> BN.

Parameters

C_in (int) – the number of input channels
C_out (int) – the number of output channels
kernel_size (int) – size of the convolving kernel
padding (int) – zero-padding added to both sides of the input

SkipConnect

Call torch.nn.Identity to connect directly to the next cell.

ENASMacroLayer¶

In Macro search, the controller makes two decisions for each layer: i) the operation to perform on the result of the previous layer, and ii) which previous layers to connect to through SkipConnects. ENAS uses a controller to design the whole model architecture instead of one of its components. The output of the operation is concatenated with the tensors of the layers chosen for SkipConnect. NNI provides predefined operators for macro search, which are listed in Candidate operators.

Part of one structure in the ENAS macro search space is shown below.

class nni.nas.pytorch.search_space_zoo.ENASMacroLayer(key, prev_labels, in_filters, out_filters)[source]¶

Builtin ENAS Macro Layer. With the search space changing to the layer level, the controller decides which operation is employed and which previous layers to connect to for skip connections. The model is made up of the same layers, but the choices made in each layer may differ.

Parameters

key (str) – the name of this layer
prev_labels (str) – names of all previous layers
in_filters (int) – the number of input channels
out_filters – the number of output channels

forward(prev_list)[source]¶

Parameters: prev_list (list) – The cell selects the last element of the list as input and applies an operation on it. The cell chooses none/one/multiple tensor(s) as SkipConnect(s) from the list excluding the last element.
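The behaviour of forward(prev_list) described above can be mimicked with a toy function. Lists of channel labels stand in for tensors and list concatenation stands in for channel-wise concatenation; `macro_layer_forward`, `op` and `skip_mask` are hypothetical names, and the concatenation order is an assumption rather than something the source specifies.

```python
def macro_layer_forward(prev_list, op, skip_mask):
    """Toy version of one macro layer.

    prev_list: outputs of all earlier layers; the last one is the input.
    op: the operation the controller picked for this layer.
    skip_mask: one boolean per earlier layer (excluding the last element).
    """
    *earlier, last = prev_list
    out = op(last)
    for tensor, keep in zip(earlier, skip_mask):
        if keep:
            out = out + tensor   # stand-in for concatenation along channels
    return out

mark = lambda t: [ch + "'" for ch in t]   # hypothetical "operation"
result = macro_layer_forward([["a"], ["b"], ["c"]], mark, [True, False])
print(result)
```

With the mask `[True, False]` only the first earlier layer is kept as a SkipConnect; an all-False mask corresponds to choosing none.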

To describe the whole search space, NNI provides a model, which is built by stacking the layers.

class nni.nas.pytorch.search_space_zoo.ENASMacroGeneralModel(num_layers=12, out_filters=24, in_channels=3, num_classes=10, dropout_rate=0.0)[source]¶

The network is made by stacking ENASMacroLayer. The Macro search space contains these layers. Each layer chooses an operation from the predefined ones plus SkipConnects, and together they form a network.

Parameters

num_layers (int) – The number of layers contained in the network.
out_filters (int) – The number of each layer’s output channels.
in_channels (int) – The number of input’s channels.
num_classes (int) – The number of classes for classification.
dropout_rate (float) – Dropout layer’s dropout rate before the final dense layer.

forward(x)[source]¶

Parameters: x (torch.Tensor) – the input of the network
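For a feel of how large this search space is, the two per-layer decisions can be counted. The sketch assumes six candidate operators (the four ConvBranch variants and two PoolBranch variants listed under Candidate operators) and an independent on/off SkipConnect choice for every pair of layers; `macro_space_size` is a hypothetical helper.

```python
def macro_space_size(num_layers, num_ops):
    """Number of distinct macro architectures under the stated assumptions.

    Each layer picks one of `num_ops` operations, and each of the
    num_layers * (num_layers - 1) / 2 layer pairs may or may not be
    linked by a SkipConnect.
    """
    op_choices = num_ops ** num_layers
    skip_choices = 2 ** (num_layers * (num_layers - 1) // 2)
    return op_choices * skip_choices

size = macro_space_size(num_layers=12, num_ops=6)
print(f"{size:.2e}")
```

With the default num_layers=12 this gives roughly 1.6 × 10^29 candidate networks.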

Example code¶

example code

git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
# search the best cell structure
python3 enas_macro_example.py

Candidate operators¶

All supported operators for ENAS macro search are listed below.

ConvBranch

All input first passes into a StdConv, which is made up of a 1x1Conv followed by BatchNorm2d and ReLU. Then the intermediate result goes through one of the operations listed below. The final result passes through a BatchNorm2d and ReLU as post-processing.
- Separable Conv3x3: If separable=True, the cell uses SepConv instead of the normal Conv operation. SepConv’s kernel_size=3, stride=1 and padding=1.
- Separable Conv5x5: SepConv’s kernel_size=5, stride=1 and padding=2.
- Normal Conv3x3: If separable=False, the cell uses a normal Conv operation with kernel_size=3, stride=1 and padding=1.
- Normal Conv5x5: Conv’s kernel_size=5, stride=1 and padding=2.

class nni.nas.pytorch.search_space_zoo.enas_ops.ConvBranch(C_in, C_out, kernel_size, stride, padding, separable)[source]¶

Conv structure for Macro search. First pass through a 1x1 Conv, then a Conv operation with kernel_size equal to 3 or 5, followed by BatchNorm and ReLU.

Parameters

C_in (int) – the number of input channels
C_out (int) – the number of output channels
kernel_size (int) – size of the convolving kernel
stride (int) – stride of the convolution
padding (int) – zero-padding added to both sides of the input
separable (bool) – whether separable Conv is used

PoolBranch

All input first passes into a StdConv, which is made up of a 1x1Conv followed by BatchNorm2d and ReLU. Then the intermediate result goes through a pooling operation followed by BatchNorm.
- AvgPool: Call torch.nn.AvgPool2d. This operation applies a 2D average pooling over all input channels. Its parameters are fixed to kernel_size=3, stride=1 and padding=1.
- MaxPool: Call torch.nn.MaxPool2d. This operation applies a 2D max pooling over all input channels. Its parameters are fixed to kernel_size=3, stride=1 and padding=1.

class nni.nas.pytorch.search_space_zoo.enas_ops.PoolBranch(pool_type, C_in, C_out, kernel_size, stride, padding, affine=False)[source]¶

Pooling structure for Macro search. First pass through a 1x1 Conv, then pooling operation followed by BatchNorm2d.

Parameters

pool_type (str) – only accept max for MaxPool and avg for AvgPool
C_in (int) – the number of input channels
C_out (int) – the number of output channels
kernel_size (int) – size of the pooling kernel
stride (int) – stride of the convolution
padding (int) – zero-padding added to both sides of the input

NAS-Bench-201¶

NAS Bench 201 defines a unified search space, which is algorithm agnostic. The predefined skeleton consists of a stack of cells that share the same architecture. Every cell contains four nodes and a DAG is formed by connecting edges among them, where the node represents the sum of feature maps and the edge stands for an operation transforming a tensor from the source node to the target node. The predefined candidate operators can be found in Candidate operators.

The search space of NAS Bench 201 is shown below.

class nni.nas.pytorch.nasbench201.NASBench201Cell(cell_id, C_in, C_out, stride, bn_affine=True, bn_momentum=0.1, bn_track_running_stats=True)[source]¶

Builtin cell structure of NAS Bench 201. One cell contains four nodes. The first node serves as an input node accepting the output of the previous cell. Each other node connects to all previous nodes with edges, each representing an operation chosen from a set to transform the tensor from the source node to the target node. Every node sums all its inputs as its output.

Parameters

cell_id (str) – the name of this cell
C_in (int) – the number of input channels of the cell
C_out (int) – the number of output channels of the cell
stride (int) – stride of all convolution operations in the cell
bn_affine (bool) – If set to True, all torch.nn.BatchNorm2d in this cell will have learnable affine parameters. Default: True
bn_momentum (float) – the value used for the running_mean and running_var computation. Default: 0.1
bn_track_running_stats (bool) – When set to True, all torch.nn.BatchNorm2d in this cell tracks the running mean and variance. Default: True

forward(input)[source]¶

Parameters: input (torch.Tensor) – the output of the previous layer
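The cell topology described above is small enough to enumerate. The sketch below lists the edges of a four-node cell in which every node connects to all previous nodes, and counts the architectures obtained by assigning one of the five candidate operators to each edge. The operator labels are informal names for the operators listed under Candidate operators, and `cell_edges` is a hypothetical helper.

```python
from itertools import product

# Informal labels for the five candidate operators of NAS Bench 201.
OPS = ["avg_pool", "conv_1x1", "conv_3x3", "skip_connect", "zeroize"]

def cell_edges(num_nodes=4):
    """All (source, target) pairs with source < target."""
    return [(s, t) for t in range(1, num_nodes) for s in range(t)]

edges = cell_edges()
num_architectures = len(OPS) ** len(edges)

# Enumerating explicitly gives the same count.
enumerated = sum(1 for _ in product(OPS, repeat=len(edges)))

print(edges)
print(num_architectures, enumerated)
```

Four nodes give six edges, so the search space contains 5^6 = 15625 cells.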

Example code¶

example code

# for structure searching
git clone https://github.com/Microsoft/nni.git
cd nni/examples/nas/search_space_zoo
python3 nas_bench_201.py

Candidate operators¶

All supported operators for NAS Bench 201 are listed below.

AvgPool

If the number of input channels is not equal to the number of output channels, the input first passes through a ReLUConvBN layer with kernel_size=1, stride=1, padding=0, and dilation=0. The pooling itself calls torch.nn.AvgPool2d, which applies a 2D average pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to kernel_size=3 and padding=1.

class nni.nas.pytorch.nasbench201.nasbench201_ops.Pooling(C_in, C_out, stride, bn_affine=True, bn_momentum=0.1, bn_track_running_stats=True)[source]¶

Parameters

C_in (int) – the number of input channels
C_out (int) – the number of output channels
stride (int) – stride of the convolution
bn_affine (bool) – If set to True, torch.nn.BatchNorm2d will have learnable affine parameters. Default: True
bn_momentum (float) – the value used for the running_mean and running_var computation. Default: 0.1
bn_track_running_stats (bool) – When set to True, torch.nn.BatchNorm2d tracks the running mean and variance. Default: True

forward(x)[source]¶

Parameters: x (torch.Tensor) – input tensor

Conv
- Conv1x1: Consists of a sequence of ReLU, nn.Conv2d and BatchNorm. The Conv operation’s parameters are fixed to kernel_size=1, padding=0, and dilation=1.
- Conv3x3: Consists of a sequence of ReLU, nn.Conv2d and BatchNorm. The Conv operation’s parameters are fixed to kernel_size=3, padding=1, and dilation=1.

class nni.nas.pytorch.nasbench201.nasbench201_ops.ReLUConvBN(C_in, C_out, kernel_size, stride, padding, dilation, bn_affine=True, bn_momentum=0.1, bn_track_running_stats=True)[source]¶

Parameters

C_in (int) – the number of input channels
C_out (int) – the number of output channels
kernel_size (int) – size of the convolving kernel
stride (int) – stride of the convolution
padding (int) – zero-padding added to both sides of the input
dilation (int) – spacing between kernel elements
bn_affine (bool) – If set to True, torch.nn.BatchNorm2d will have learnable affine parameters. Default: True
bn_momentum (float) – the value used for the running_mean and running_var computation. Default: 0.1
bn_track_running_stats (bool) – When set to True, torch.nn.BatchNorm2d tracks the running mean and variance. Default: True

forward(x)[source]¶

Parameters: x (torch.Tensor) – input tensor

SkipConnect

Call torch.nn.Identity to connect directly to the next cell.

Zeroize

Generate zero tensors indicating there is no connection from the source node to the target node.

class nni.nas.pytorch.nasbench201.nasbench201_ops.Zero(C_in, C_out, stride)[source]¶

Parameters

C_in (int) – the number of input channels
C_out (int) – the number of output channels
stride (int) – stride of the convolution

forward(x)[source]¶

Parameters: x (torch.Tensor) – input tensor