NAS Benchmarks (experimental)
Introduction
To improve the reproducibility of NAS algorithms and to reduce computing resource requirements, researchers have proposed a series of NAS benchmarks such as NAS-Bench-101, NAS-Bench-201, and NDS. NNI provides a query interface for users to access these benchmarks. With just a few lines of code, researchers can evaluate their NAS algorithms easily and fairly using these benchmarks.
Prerequisites
- Please prepare a folder to house all the benchmark databases. By default, it is located at ${HOME}/.nni/nasbenchmark. You can place it anywhere you like and specify its location in the NASBENCHMARK_DIR environment variable before importing NNI.
- Please install peewee via pip install peewee, which NNI uses to connect to the database.
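The first prerequisite can be sketched in Python as follows. This is a minimal illustration, assuming the variable is read when NNI is imported, as the text above implies; the fallback path simply mirrors the documented default.

```python
import os
from pathlib import Path

# Documented default location of the benchmark databases.
default_dir = Path.home() / ".nni" / "nasbenchmark"

# Keep an existing setting if there is one, otherwise fall back to the default.
benchmark_dir = Path(os.environ.get("NASBENCHMARK_DIR", default_dir))

# The variable has to be in place before NNI is imported.
os.environ["NASBENCHMARK_DIR"] = str(benchmark_dir)

# import nni  # import NNI only after this point
```

Setting the variable in the shell before launching Python has the same effect.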
Data Preparation
To avoid storage and legality issues, we do not provide any prepared databases. Please follow the steps below.
- Clone NNI to your machine and enter the examples/nas/benchmarks directory.
git clone -b ${NNI_VERSION} https://github.com/microsoft/nni
cd nni/examples/nas/benchmarks
Replace ${NNI_VERSION} with a released version name or branch name, e.g., v1.7.
- Install dependencies via pip3 install -r xxx.requirements.txt, where xxx can be nasbench101, nasbench201 or nds.
- Generate the database via ./xxx.sh. The directory that stores the benchmark file can be configured with the NASBENCHMARK_DIR environment variable, which defaults to ~/.nni/nasbenchmark. Note that the NAS-Bench-201 dataset will be downloaded from Google Drive.
Please make sure there is at least 10GB of free disk space, and note that the conversion process can take several hours to complete.
Example Usages
Please refer to the example usages of the Benchmarks API.
NAS-Bench-101
NAS-Bench-101 contains 423,624 unique neural networks, combined with 4 variations in number of epochs (4, 12, 36, 108), each of which is trained 3 times. It is a cell-wise search space, which constructs and stacks a cell by enumerating DAGs with at most 7 operators and no more than 9 connections. All operators can be chosen from CONV3X3_BN_RELU, CONV1X1_BN_RELU and MAXPOOL3X3, except the first operator (always INPUT) and the last operator (always OUTPUT).
Notably, NAS-Bench-101 eliminates invalid cells (e.g., there is no path from input to output, or there is redundant computation). Furthermore, isomorphic cells are de-duplicated, i.e., all the remaining cells are computationally unique.
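To make the constraints above concrete, here is a minimal sketch of a cell check on an upper-triangular adjacency matrix. The function names are hypothetical and this is not the validation code used by NAS-Bench-101 or NNI; it only checks the size limits and the existence of an input-to-output path, not redundancy or isomorphism.

```python
def has_input_output_path(matrix):
    """Return True if the last vertex is reachable from vertex 0."""
    n = len(matrix)
    reachable = {0}
    # Vertices are topologically ordered, so one forward pass is enough.
    for v in range(1, n):
        if any(matrix[u][v] for u in reachable):
            reachable.add(v)
    return (n - 1) in reachable


def satisfies_basic_constraints(matrix, max_vertices=7, max_edges=9):
    """Check the size limits and input-to-output connectivity of a cell."""
    n = len(matrix)
    if n < 2 or n > max_vertices:
        return False
    # Edges may only go from a lower-numbered to a higher-numbered vertex.
    if any(matrix[u][v] for u in range(n) for v in range(u + 1)):
        return False
    if sum(sum(row) for row in matrix) > max_edges:
        return False
    return has_input_output_path(matrix)


chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
dangling = [[0, 1, 0],
            [0, 0, 0],
            [0, 0, 0]]
```

Here `chain` (input, one operator, output) passes the check, while `dangling` fails because nothing feeds the output vertex.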
API Documentation
- nni.nas.benchmarks.nasbench101.query_nb101_trial_stats(arch, num_epochs, isomorphism=True, reduction=None)
  Query trial stats of NAS-Bench-101 given conditions.
  Parameters:
  - arch (dict or None) – If a dict, it is in the format that is described in nni.nas.benchmarks.nasbench101.Nb101TrialConfig. Only trial stats matched will be returned. If None, architecture will be a wildcard.
  - num_epochs (int or None) – If int, matching results will be returned. Otherwise a wildcard.
  - isomorphism (boolean) – Whether to match essentially-same architectures, i.e., architectures with the same graph-invariant hash value.
  - reduction (str or None) – If ‘none’ or None, all trial stats will be returned directly. If ‘mean’, fields in trial stats will be averaged given the same trial config.
  Returns: A generator of nni.nas.benchmarks.nasbench101.Nb101TrialStats objects, each of which has been converted into a dict.
  Return type: generator of dict
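A query might look like the sketch below. It assumes NNI is installed and the NAS-Bench-101 database has been generated as described under Data Preparation; the architecture itself is only an illustrative example, and the helper name `query_mean_stats` is hypothetical.

```python
# An example 7-vertex cell: node 0 is the input, node 6 the output.
arch = {
    'op1': 'conv3x3-bn-relu',
    'op2': 'maxpool3x3',
    'op3': 'conv3x3-bn-relu',
    'op4': 'conv3x3-bn-relu',
    'op5': 'conv1x1-bn-relu',
    'input1': [0],
    'input2': [1],
    'input3': [2],
    'input4': [0],
    'input5': [0, 3, 4],
    'input6': [2, 5],
}


def query_mean_stats(arch, num_epochs=108):
    """Return the averaged trial stats for one architecture as a list of dicts."""
    # Imported lazily so that the module loads without NNI or the database.
    from nni.nas.benchmarks.nasbench101 import query_nb101_trial_stats
    return list(query_nb101_trial_stats(arch, num_epochs, reduction='mean'))
```

Passing `None` for `arch` or `num_epochs` turns that condition into a wildcard, as described in the parameter list.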
- nasbench101.INPUT = 'input'
- nasbench101.OUTPUT = 'output'
- nasbench101.CONV3X3_BN_RELU = 'conv3x3-bn-relu'
- nasbench101.CONV1X1_BN_RELU = 'conv1x1-bn-relu'
- nasbench101.MAXPOOL3X3 = 'maxpool3x3'
- class nni.nas.benchmarks.nasbench101.Nb101TrialConfig(*args, **kwargs)
  Trial config for NAS-Bench-101.
  - arch
    A dict with keys op1, op2, … and input1, input2, … Vertices are enumerated from 0. Since node 0 is the input node, it is skipped in this dict. Each op is one of nni.nas.benchmarks.nasbench101.CONV3X3_BN_RELU, nni.nas.benchmarks.nasbench101.CONV1X1_BN_RELU, and nni.nas.benchmarks.nasbench101.MAXPOOL3X3. Each input is a list of previous nodes. For example, input5 can be [0, 1, 3].
    Type: dict
  - num_vertices
    Number of vertices (nodes) in one cell. Should be less than or equal to 7 in default setup.
    Type: int
  - hash
    Graph-invariant MD5 string for this architecture.
    Type: str
  - num_epochs
    Number of epochs planned for this trial. Should be one of 4, 12, 36, 108 in default setup.
    Type: int
- class nni.nas.benchmarks.nasbench101.Nb101TrialStats(*args, **kwargs)
  Computation statistics for NAS-Bench-101. Each corresponds to one trial. Each config has multiple trials with different random seeds, but unfortunately the seed for each trial is unavailable. NAS-Bench-101 trains and evaluates on CIFAR-10 by default. The original training set is divided into 40k training images and 10k validation images, and the original validation set is used for test only.
  - config
    Setup for this trial data.
    Type: Nb101TrialConfig
  - train_acc
    Final accuracy on training data, ranging from 0 to 100.
    Type: float
  - valid_acc
    Final accuracy on validation data, ranging from 0 to 100.
    Type: float
  - test_acc
    Final accuracy on test data, ranging from 0 to 100.
    Type: float
  - parameters
    Number of trainable parameters, in millions.
    Type: float
  - training_time
    Duration of training in seconds.
    Type: float
- class nni.nas.benchmarks.nasbench101.Nb101IntermediateStats(*args, **kwargs)
  Intermediate statistics for NAS-Bench-101.
  - trial
    The exact trial where the intermediate result is produced.
    Type: Nb101TrialStats
  - current_epoch
    Elapsed epochs when evaluation is done.
    Type: int
  - train_acc
    Intermediate accuracy on training data, ranging from 0 to 100.
    Type: float
  - valid_acc
    Intermediate accuracy on validation data, ranging from 0 to 100.
    Type: float
  - test_acc
    Intermediate accuracy on test data, ranging from 0 to 100.
    Type: float
  - training_time
    Time elapsed in seconds.
    Type: float
- nni.nas.benchmarks.nasbench101.graph_util.nasbench_format_to_architecture_repr(adjacency_matrix, labeling)
  Converts an adjacency matrix and labeling in NAS-Bench-101 format into the architecture representation used by NNI. Imported from NAS-Bench-101 repo.
  Parameters:
  - adjacency_matrix (np.ndarray) – A 2D array of shape NxN, where N is the number of vertices. matrix[u][v] is 1 if there is a direct edge from u to v, otherwise it will be 0.
  - labeling (list of str) – A list of str that starts with input and ends with output. The intermediate nodes are chosen from candidate operators.
  Returns: Converted number of vertices and architecture.
  Return type: tuple of int and dict
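The conversion described above can be sketched as follows. This is an illustrative re-implementation under the documented formats, not NNI's source; the function name `to_architecture` is hypothetical, and a plain list of lists stands in for the NumPy array.

```python
def to_architecture(adjacency_matrix, labeling):
    """Convert (matrix, labeling) into (num_vertices, arch dict)."""
    num_vertices = len(labeling)
    assert len(adjacency_matrix) == num_vertices
    assert labeling[0] == 'input' and labeling[-1] == 'output'

    arch = {}
    for v in range(1, num_vertices):
        # The output vertex has inputs but no operator of its own.
        if v < num_vertices - 1:
            arch[f'op{v}'] = labeling[v]
        # Predecessors of v are the rows u < v with an edge u -> v.
        arch[f'input{v}'] = [u for u in range(v) if adjacency_matrix[u][v]]
    return num_vertices, arch


matrix = [[0, 1, 1],
          [0, 0, 1],
          [0, 0, 0]]
labels = ['input', 'conv3x3-bn-relu', 'output']
num_vertices, arch = to_architecture(matrix, labels)
```

For this three-vertex cell the result contains `op1`, `input1` and `input2`, matching the key layout documented for Nb101TrialConfig.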
- nni.nas.benchmarks.nasbench101.graph_util.infer_num_vertices(architecture)
  Infer number of vertices from an architecture dict.
  Parameters: architecture (dict) – Architecture in NNI format.
  Returns: Number of vertices.
  Return type: int
- nni.nas.benchmarks.nasbench101.graph_util.hash_module(architecture, vertices)
  Computes a graph-invariant MD5 hash of the matrix and label pair. This snippet is modified from code in NAS-Bench-101 repo.
  Parameters:
  - matrix (np.ndarray) – Square upper-triangular adjacency matrix.
  - labeling (list of int) – Labels of length equal to both dimensions of matrix.
  Returns: MD5 hash of the matrix and labeling.
  Return type: str
NAS-Bench-201
Paper link Open-source API Implementations
NAS-Bench-201 is a cell-wise search space that views nodes as tensors and edges as operators. The search space contains all possible densely-connected DAGs with 4 nodes, resulting in 15,625 candidates in total. Each operator (i.e., edge) is selected from a pre-defined operator set (NONE, SKIP_CONNECT, CONV_1X1, CONV_3X3 and AVG_POOL_3X3). Training approaches vary in the dataset used (CIFAR-10, CIFAR-100, ImageNet) and the number of epochs scheduled (12 and 200). Each combination of architecture and training approach is repeated 1 to 3 times with different random seeds.
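The count of 15,625 follows from 6 edges with 5 operator choices each (5^6). A short sketch that enumerates the space, using the edge keys and operator strings documented for this benchmark:

```python
from itertools import product

# Operator strings as documented for NAS-Bench-201.
OPERATORS = ['none', 'skip_connect', 'conv_1x1', 'conv_3x3', 'avg_pool_3x3']

# A densely-connected DAG with 4 nodes has one edge per ordered pair i < j.
EDGES = [f'{i}_{j}' for j in range(1, 4) for i in range(j)]

# Every assignment of one operator to each edge is a candidate architecture.
all_archs = [dict(zip(EDGES, ops)) for ops in product(OPERATORS, repeat=len(EDGES))]
```

Each element of `all_archs` is a dict in the `arch` format accepted by the query function for this benchmark.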
API Documentation
- nni.nas.benchmarks.nasbench201.query_nb201_trial_stats(arch, num_epochs, dataset, reduction=None)
  Query trial stats of NAS-Bench-201 given conditions.
  Parameters:
  - arch (dict or None) – If a dict, it is in the format that is described in nni.nas.benchmarks.nasbench201.Nb201TrialConfig. Only trial stats matched will be returned. If None, architecture will be a wildcard.
  - num_epochs (int or None) – If int, matching results will be returned. Otherwise a wildcard.
  - dataset (str or None) – If specified, can be one of the datasets available in nni.nas.benchmarks.nasbench201.Nb201TrialConfig. Otherwise a wildcard.
  - reduction (str or None) – If ‘none’ or None, all trial stats will be returned directly. If ‘mean’, fields in trial stats will be averaged given the same trial config.
  Returns: A generator of nni.nas.benchmarks.nasbench201.Nb201TrialStats objects, each of which has been converted into a dict.
  Return type: generator of dict
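A query against this benchmark might look like the following sketch, assuming NNI is installed and the NAS-Bench-201 database has been generated. The architecture is only an example and the helper name `query_nb201` is hypothetical.

```python
# One operator per edge, keyed as '<from>_<to>'.
arch = {
    '0_1': 'avg_pool_3x3',
    '0_2': 'conv_1x1',
    '0_3': 'conv_1x1',
    '1_2': 'skip_connect',
    '1_3': 'skip_connect',
    '2_3': 'skip_connect',
}


def query_nb201(arch, num_epochs=200, dataset='cifar100', reduction=None):
    """Return matching trial stats as a list of dicts."""
    # Imported lazily so that the module loads without NNI or the database.
    from nni.nas.benchmarks.nasbench201 import query_nb201_trial_stats
    return list(query_nb201_trial_stats(arch, num_epochs, dataset, reduction=reduction))
```

With `reduction='mean'`, trials that share a config (for example, different seeds) are averaged into one record.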
- nasbench201.NONE = 'none'
- nasbench201.SKIP_CONNECT = 'skip_connect'
- nasbench201.CONV_1X1 = 'conv_1x1'
- nasbench201.CONV_3X3 = 'conv_3x3'
- nasbench201.AVG_POOL_3X3 = 'avg_pool_3x3'
- class nni.nas.benchmarks.nasbench201.Nb201TrialConfig(*args, **kwargs)
  Trial config for NAS-Bench-201.
  - arch
    A dict with keys 0_1, 0_2, 0_3, 1_2, 1_3, 2_3, each of which is an operator chosen from nni.nas.benchmarks.nasbench201.NONE, nni.nas.benchmarks.nasbench201.SKIP_CONNECT, nni.nas.benchmarks.nasbench201.CONV_1X1, nni.nas.benchmarks.nasbench201.CONV_3X3 and nni.nas.benchmarks.nasbench201.AVG_POOL_3X3.
    Type: dict
  - num_epochs
    Number of epochs planned for this trial. Should be either 12 or 200.
    Type: int
  - num_channels
    Number of channels for initial convolution. 16 by default.
    Type: int
  - num_cells
    Number of cells per stage. 5 by default.
    Type: int
  - dataset
    Dataset used for training and evaluation. NAS-Bench-201 provides the following 4 options: cifar10-valid (training data is split into 25k for training and 25k for validation, validation data is used for test), cifar10 (training data is used in training, validation data is split into 5k for validation and 5k for testing), cifar100 (same protocol as cifar10), and imagenet16-120 (a subset of 120 classes in ImageNet, downscaled to 16x16, using training data for training, 6k images from validation set for validation and the other 6k for testing).
    Type: str
- class nni.nas.benchmarks.nasbench201.Nb201TrialStats(*args, **kwargs)
  Computation statistics for NAS-Bench-201. Each corresponds to one trial.
  - config
    Setup for this trial data.
    Type: Nb201TrialConfig
  - seed
    Random seed selected, for reproduction.
    Type: int
  - train_acc
    Final accuracy on training data, ranging from 0 to 100.
    Type: float
  - valid_acc
    Final accuracy on validation data, ranging from 0 to 100.
    Type: float
  - test_acc
    Final accuracy on test data, ranging from 0 to 100.
    Type: float
  - ori_test_acc
    Test accuracy on original validation set (10k for CIFAR and 12k for Imagenet16-120), ranging from 0 to 100.
    Type: float
  - train_loss
    Final cross entropy loss on training data. Note that loss could be NaN, in which case this attribute will be None.
    Type: float or None
  - valid_loss
    Final cross entropy loss on validation data.
    Type: float or None
  - test_loss
    Final cross entropy loss on test data.
    Type: float or None
  - ori_test_loss
    Final cross entropy loss on original validation set.
    Type: float or None
  - parameters
    Number of trainable parameters, in millions.
    Type: float
  - latency
    Latency in seconds.
    Type: float
  - flops
    FLOPs, in millions.
    Type: float
  - training_time
    Duration of training in seconds.
    Type: float
  - valid_evaluation_time
    Time elapsed to evaluate on validation set.
    Type: float
  - test_evaluation_time
    Time elapsed to evaluate on test set.
    Type: float
  - ori_test_evaluation_time
    Time elapsed to evaluate on original test set.
    Type: float
- class nni.nas.benchmarks.nasbench201.Nb201IntermediateStats(*args, **kwargs)
  Intermediate statistics for NAS-Bench-201.
  - trial
    Corresponding trial.
    Type: Nb201TrialStats
  - current_epoch
    Elapsed epochs.
    Type: int
  - train_acc
    Current accuracy on training data, ranging from 0 to 100.
    Type: float
  - valid_acc
    Current accuracy on validation data, ranging from 0 to 100.
    Type: float
  - test_acc
    Current accuracy on test data, ranging from 0 to 100.
    Type: float
  - ori_test_acc
    Test accuracy on original validation set (10k for CIFAR and 12k for Imagenet16-120), ranging from 0 to 100.
    Type: float
  - train_loss
    Current cross entropy loss on training data.
    Type: float or None
  - valid_loss
    Current cross entropy loss on validation data.
    Type: float or None
  - test_loss
    Current cross entropy loss on test data.
    Type: float or None
  - ori_test_loss
    Current cross entropy loss on original validation set.
    Type: float or None
NDS
The paper On Network Design Spaces for Visual Recognition released trial statistics of over 100,000 configurations (models + hyper-parameters) sampled from multiple model families, including vanilla (a feedforward network loosely inspired by VGG), ResNet and ResNeXt (residual basic block and residual bottleneck block), and NAS cells (following popular designs from NASNet, Amoeba, PNAS, ENAS and DARTS). Most configurations are trained only once with a fixed seed, except a few that are trained two or three times.
Instead of storing results obtained with different configurations in separate files, we dump them into one single database to enable comparison in multiple dimensions. Specifically, we use model_family to distinguish model types, model_spec for all hyper-parameters needed to build this model, cell_spec for detailed information on operators and connections if it is a NAS cell, and generator to denote the sampling policy through which this configuration is generated. Refer to the API documentation for details.
Available Operators
Here is a list of available operators used in NDS.
- constants.NONE = 'none'
- constants.SKIP_CONNECT = 'skip_connect'
- constants.AVG_POOL_3X3 = 'avg_pool_3x3'
- constants.MAX_POOL_3X3 = 'max_pool_3x3'
- constants.MAX_POOL_5X5 = 'max_pool_5x5'
- constants.MAX_POOL_7X7 = 'max_pool_7x7'
- constants.CONV_1X1 = 'conv_1x1'
- constants.CONV_3X3 = 'conv_3x3'
- constants.CONV_3X1_1X3 = 'conv_3x1_1x3'
- constants.CONV_7X1_1X7 = 'conv_7x1_1x7'
- constants.DIL_CONV_3X3 = 'dil_conv_3x3'
- constants.DIL_CONV_5X5 = 'dil_conv_5x5'
- constants.SEP_CONV_3X3 = 'sep_conv_3x3'
- constants.SEP_CONV_5X5 = 'sep_conv_5x5'
- constants.SEP_CONV_7X7 = 'sep_conv_7x7'
- constants.DIL_SEP_CONV_3X3 = 'dil_sep_conv_3x3'
API Documentation
- nni.nas.benchmarks.nds.query_nds_trial_stats(model_family, proposer, generator, model_spec, cell_spec, dataset, num_epochs=None, reduction=None)
  Query trial stats of NDS given conditions.
  Parameters:
  - model_family (str or None) – If str, can be one of the model families available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.
  - proposer (str or None) – If str, can be one of the proposers available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.
  - generator (str or None) – If str, can be one of the generators available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.
  - model_spec (dict or None) – If specified, can be one of the model specs available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.
  - cell_spec (dict or None) – If specified, can be one of the cell specs available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.
  - dataset (str or None) – If str, can be one of the datasets available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.
  - num_epochs (int or None) – If int, matching results will be returned. Otherwise a wildcard.
  - reduction (str or None) – If ‘none’ or None, all trial stats will be returned directly. If ‘mean’, fields in trial stats will be averaged given the same trial config.
  Returns: A generator of nni.nas.benchmarks.nds.NdsTrialStats objects, each of which has been converted into a dict.
  Return type: generator of dict
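The sketch below shows one way to call this function, assuming NNI is installed and the NDS database has been generated. The condition values are taken from the options documented for NdsTrialConfig; the helper name `query_nds` is hypothetical.

```python
# Query conditions; None acts as a wildcard for that field.
conditions = {
    'model_family': 'residual_basic',
    'proposer': 'resnet',
    'generator': 'random',
    'model_spec': None,
    'cell_spec': None,
    'dataset': 'cifar10',
}


def query_nds(conditions, num_epochs=None, reduction=None):
    """Return matching NDS trial stats as a list of dicts."""
    # Imported lazily so that the module loads without NNI or the database.
    from nni.nas.benchmarks.nds import query_nds_trial_stats
    return list(query_nds_trial_stats(**conditions, num_epochs=num_epochs, reduction=reduction))
```

Because the first six parameters have no defaults, every one of them has to be passed, with `None` for the conditions that should not filter.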
- class nni.nas.benchmarks.nds.NdsTrialConfig(*args, **kwargs)
  Trial config for NDS.
  - model_family
    Could be nas_cell, residual_bottleneck, residual_basic or vanilla.
    Type: str
  - model_spec
    If model_family is nas_cell, it contains num_nodes_normal, num_nodes_reduce, depth, width, aux and drop_prob. If model_family is residual_bottleneck, it contains bot_muls, ds (depths), num_gs (number of groups) and ss (strides). If model_family is residual_basic or vanilla, it contains ds, ss and ws.
    Type: dict
  - cell_spec
    If model_family is not nas_cell, it will be an empty dict. Otherwise, it specifies <normal/reduce>_<i>_<op/input>_<x/y>, where i ranges from 0 to num_nodes_<normal/reduce> - 1. If it is an op, the value is chosen from the constants specified previously, like nni.nas.benchmarks.nds.CONV_1X1. If it is i’s input, the value ranges from 0 to i + 1, as nas_cell uses the previous two nodes as inputs, and node 0 is actually the second node. Refer to the NASNet paper for details. Finally, another two key-value pairs, normal_concat and reduce_concat, specify which nodes are eventually concatenated into the output.
    Type: dict
  - dataset
    Dataset used. Could be cifar10 or imagenet.
    Type: str
  - generator
    Can be one of random, which generates configurations at random while keeping learning rate and weight decay fixed; fix_w_d, which further keeps width and depth fixed and is only applicable for nas_cell; and tune_lr_wd, which further tunes learning rate and weight decay.
    Type: str
  - proposer
    The paper that proposed the distribution for random sampling. Available proposers include nasnet, darts, enas, pnas, amoeba, vanilla, resnext-a, resnext-b, resnet, resnet-b (ResNet with bottleneck). See the NDS paper for details.
    Type: str
  - base_lr
    Initial learning rate.
    Type: float
  - weight_decay
    L2 weight decay applied on weights.
    Type: float
  - num_epochs
    Number of epochs scheduled, during which learning rate will decay to 0 following cosine annealing.
    Type: int
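The key layout of cell_spec described above can be spelled out with a short sketch. The helper names are hypothetical, and the input range is read as inclusive (0 through i + 1), which is an interpretation of the description rather than something confirmed against the NDS data.

```python
def cell_spec_keys(num_nodes_normal, num_nodes_reduce):
    """List the keys expected in cell_spec for a nas_cell configuration."""
    keys = []
    for cell, num_nodes in (('normal', num_nodes_normal), ('reduce', num_nodes_reduce)):
        for i in range(num_nodes):
            # Each node combines two branches, x and y, each with an op and an input.
            for kind in ('op', 'input'):
                for branch in ('x', 'y'):
                    keys.append(f'{cell}_{i}_{kind}_{branch}')
        # One extra key per cell type lists the nodes concatenated into the output.
        keys.append(f'{cell}_concat')
    return keys


def valid_inputs(i):
    """Candidate input indices for node i, assuming an inclusive range 0..i+1."""
    return list(range(i + 2))


keys = cell_spec_keys(5, 5)
```

With five nodes in each cell type this yields 42 keys: 20 per cell type for ops and inputs, plus normal_concat and reduce_concat.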
- class nni.nas.benchmarks.nds.NdsTrialStats(*args, **kwargs)
  Computation statistics for NDS. Each corresponds to one trial.
  - config
    Corresponding config for trial.
    Type: NdsTrialConfig
  - seed
    Random seed selected, for reproduction.
    Type: int
  - final_train_acc
    Final accuracy on training data, ranging from 0 to 100.
    Type: float
  - final_train_loss
    Final cross entropy loss on training data. Could be NaN (None).
    Type: float or None
  - final_test_acc
    Final accuracy on test data, ranging from 0 to 100.
    Type: float
  - best_train_acc
    Best accuracy on training data, ranging from 0 to 100.
    Type: float
  - best_train_loss
    Best cross entropy loss on training data. Could be NaN (None).
    Type: float or None
  - best_test_acc
    Best accuracy on test data, ranging from 0 to 100.
    Type: float
  - parameters
    Number of trainable parameters, in millions.
    Type: float
  - flops
    FLOPs, in millions.
    Type: float
  - iter_time
    Seconds elapsed for each iteration.
    Type: float
- class nni.nas.benchmarks.nds.NdsIntermediateStats(*args, **kwargs)
  Intermediate statistics for NDS.
  - trial
    Corresponding trial.
    Type: NdsTrialStats
  - current_epoch
    Elapsed epochs.
    Type: int
  - train_loss
    Current cross entropy loss on training data. Can be NaN (None).
    Type: float or None
  - train_acc
    Current accuracy on training data, ranging from 0 to 100.
    Type: float
  - test_acc
    Current accuracy on test data, ranging from 0 to 100.
    Type: float