NAS Benchmarks

Introduction

To improve the reproducibility of NAS algorithms and reduce computing resource requirements, researchers have proposed a series of NAS benchmarks such as NAS-Bench-101, NAS-Bench-201, and NDS. NNI provides a query interface for users to access these benchmarks. With just a few lines of code, researchers can evaluate their NAS algorithms easily and fairly using these benchmarks.

Prerequisites

  • Please prepare a folder to hold all the benchmark databases. By default, NNI looks for it at ${HOME}/.cache/nni/nasbenchmark. You can also place it anywhere you like and point NNI to it by running export NASBENCHMARK_DIR=/path/to/your/nasbenchmark before importing NNI.

  • Please install peewee via pip3 install peewee, which NNI uses to connect to the database.
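When exporting the variable in a shell is inconvenient (for example in a notebook), the same setting can be made from Python. This is a minimal sketch; the folder name is a placeholder, and the only requirement taken from the text above is that the variable is set before NNI is imported.

```python
import os
from pathlib import Path

# Hypothetical location; replace with the folder that holds your benchmark databases.
benchmark_dir = Path.home() / "nasbenchmark"

# Set the variable before any `import nni` statement runs.
os.environ["NASBENCHMARK_DIR"] = str(benchmark_dir)

print(os.environ["NASBENCHMARK_DIR"])
```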

Data Preparation

Option 2

Note

If you have files that are processed before v2.5, it is recommended that you delete them and try option 1.

  1. Clone NNI to your machine and enter the examples/nas/benchmarks directory.

    git clone -b ${NNI_VERSION} https://github.com/microsoft/nni
    cd nni/examples/nas/benchmarks
    

    Replace ${NNI_VERSION} with a released version name or branch name, e.g., v2.4.

  2. Install dependencies via pip3 install -r xxx.requirements.txt. xxx can be nasbench101, nasbench201 or nds.

  3. Generate the database via ./xxx.sh. The directory that stores the benchmark file can be configured with the NASBENCHMARK_DIR environment variable, which defaults to ~/.cache/nni/nasbenchmark. Note that the NAS-Bench-201 dataset will be downloaded from Google Drive.

Please make sure there is at least 10GB of free disk space, and note that the conversion process can take several hours to complete.

Example Usages

Please refer to the example usages of the Benchmarks API.

NAS-Bench-101

NAS-Bench-101 contains 423,624 unique neural networks, combined with 4 variations in number of epochs (4, 12, 36, 108), each of which is trained 3 times. It is a cell-wise search space, which constructs and stacks a cell by enumerating DAGs with at most 7 operators, and no more than 9 connections. All operators can be chosen from CONV3X3_BN_RELU, CONV1X1_BN_RELU and MAXPOOL3X3, except the first operator (always INPUT) and last operator (always OUTPUT).

Notably, NAS-Bench-101 eliminates invalid cells (e.g., there is no path from input to output, or there is redundant computation). Furthermore, isomorphic cells are de-duplicated, i.e., all the remaining cells are computationally unique.

API Documentation

nni.nas.benchmarks.nasbench101.query_nb101_trial_stats(arch, num_epochs, isomorphism=True, reduction=None, include_intermediates=False)[source]

Query trial stats of NAS-Bench-101 given conditions.

Parameters
  • arch (dict or None) – If a dict, it is in the format that is described in nni.nas.benchmarks.nasbench101.Nb101TrialConfig. Only trial stats matched will be returned. If None, all architectures in the database will be matched.

  • num_epochs (int or None) – If int, matching results will be returned. Otherwise a wildcard.

  • isomorphism (boolean) – Whether to match essentially identical architectures, i.e., architectures with the same graph-invariant hash value.

  • reduction (str or None) – If ‘none’ or None, all trial stats will be returned directly. If ‘mean’, fields in trial stats will be averaged given the same trial config.

  • include_intermediates (boolean) – If true, intermediate results will be returned.

Returns

A generator of nni.nas.benchmarks.nasbench101.Nb101TrialStats objects, where each of them has been converted into a dict.

Return type

generator of dict
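A hedged usage sketch of the query function documented above. The architecture is only an illustration of the dict format (7 vertices, 9 connections), and the helper name show_nb101_trials is hypothetical. The import is done inside the helper because the query needs NNI installed and the NAS-Bench-101 database already generated.

```python
import pprint

# Node 0 is the input, nodes 1-5 carry operators, node 6 is the output.
arch = {
    "op1": "conv3x3-bn-relu",
    "op2": "maxpool3x3",
    "op3": "conv3x3-bn-relu",
    "op4": "conv3x3-bn-relu",
    "op5": "conv1x1-bn-relu",
    "input1": [0],
    "input2": [1],
    "input3": [2],
    "input4": [0],
    "input5": [0, 3, 4],
    "input6": [2, 5],
}


def show_nb101_trials(arch, num_epochs=108):
    """Print the averaged stats of every trial config that matches `arch`."""
    # Lazy import: requires NNI and the prepared NAS-Bench-101 database.
    from nni.nas.benchmarks.nasbench101 import query_nb101_trial_stats

    for trial in query_nb101_trial_stats(arch, num_epochs, reduction="mean"):
        pprint.pprint(trial)


# Call show_nb101_trials(arch) once the database is in place.
```

Passing None for arch or num_epochs turns that condition into a wildcard, as described in the parameter list.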

nasbench101.INPUT = 'input'
nasbench101.OUTPUT = 'output'
nasbench101.CONV3X3_BN_RELU = 'conv3x3-bn-relu'
nasbench101.CONV1X1_BN_RELU = 'conv1x1-bn-relu'
nasbench101.MAXPOOL3X3 = 'maxpool3x3'

class nni.nas.benchmarks.nasbench101.Nb101TrialConfig(*args, **kwargs)[source]

Trial config for NAS-Bench-101.

arch

A dict with keys op1, op2, … and input1, input2, … Vertices are enumerated from 0. Since node 0 is the input node, it is skipped in this dict. Each op is one of nni.nas.benchmarks.nasbench101.CONV3X3_BN_RELU, nni.nas.benchmarks.nasbench101.CONV1X1_BN_RELU, and nni.nas.benchmarks.nasbench101.MAXPOOL3X3. Each input is a list of previous nodes. For example, input5 can be [0, 1, 3].

Type

dict

num_vertices

Number of vertices (nodes) in one cell. Should be less than or equal to 7 in default setup.

Type

int

hash

Graph-invariant MD5 string for this architecture.

Type

str

num_epochs

Number of epochs planned for this trial. Should be one of 4, 12, 36, 108 in default setup.

Type

int

class nni.nas.benchmarks.nasbench101.Nb101TrialStats(*args, **kwargs)[source]

Computation statistics for NAS-Bench-101. Each corresponds to one trial. Each config has multiple trials with different random seeds, but the seed for each trial is unavailable. NAS-Bench-101 trains and evaluates on CIFAR-10 by default. The original training set is divided into 40k training images and 10k validation images, and the original validation set is used for testing only.

config

Setup for this trial data.

Type

Nb101TrialConfig

train_acc

Final accuracy on training data, ranging from 0 to 100.

Type

float

valid_acc

Final accuracy on validation data, ranging from 0 to 100.

Type

float

test_acc

Final accuracy on test data, ranging from 0 to 100.

Type

float

parameters

Number of trainable parameters in million.

Type

float

training_time

Duration of training in seconds.

Type

float

class nni.nas.benchmarks.nasbench101.Nb101IntermediateStats(*args, **kwargs)[source]

Intermediate statistics for NAS-Bench-101.

trial

The exact trial where the intermediate result is produced.

Type

Nb101TrialStats

current_epoch

Elapsed epochs when evaluation is done.

Type

int

train_acc

Intermediate accuracy on training data, ranging from 0 to 100.

Type

float

valid_acc

Intermediate accuracy on validation data, ranging from 0 to 100.

Type

float

test_acc

Intermediate accuracy on test data, ranging from 0 to 100.

Type

float

training_time

Time elapsed in seconds.

Type

float

nni.nas.benchmarks.nasbench101.graph_util.nasbench_format_to_architecture_repr(adjacency_matrix, labeling)[source]

Converts an adjacency matrix and labeling in NAS-Bench-101 format into the number of vertices and the architecture dict used by NNI.

Parameters
  • adjacency_matrix (np.ndarray) – A 2D array of shape NxN, where N is the number of vertices. matrix[u][v] is 1 if there is a direct edge from u to v, otherwise it will be 0.

  • labeling (list of str) – A list of str that starts with input and ends with output. The intermediate nodes are chosen from candidate operators.

Returns

Converted number of vertices and architecture.

Return type

tuple of int and dict
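To make the relationship between the two formats concrete, here is a minimal pure-Python sketch of such a conversion. It is an illustration written from the parameter and return descriptions above, not the library's code; the function name to_architecture_repr is hypothetical, and a nested list stands in for the np.ndarray.

```python
def to_architecture_repr(adjacency_matrix, labeling):
    """Sketch: turn a matrix/labeling pair into (num_vertices, arch dict)."""
    num_vertices = len(adjacency_matrix)
    assert len(labeling) == num_vertices
    assert labeling[0] == "input" and labeling[-1] == "output"

    architecture = {}
    # Intermediate nodes 1..N-2 carry operators.
    for i in range(1, num_vertices - 1):
        architecture[f"op{i}"] = labeling[i]
    # Every node except the input lists the earlier nodes that feed into it.
    for i in range(1, num_vertices):
        architecture[f"input{i}"] = [
            k for k in range(i) if adjacency_matrix[k][i]
        ]
    return num_vertices, architecture


# matrix[u][v] is 1 if there is a direct edge from u to v.
matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
labels = ["input", "conv3x3-bn-relu", "maxpool3x3", "output"]

num_vertices, arch = to_architecture_repr(matrix, labels)
print(num_vertices, arch)
```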

nni.nas.benchmarks.nasbench101.graph_util.infer_num_vertices(architecture)[source]

Infer number of vertices from an architecture dict.

Parameters

architecture (dict) – Architecture in NNI format.

Returns

Number of vertices.

Return type

int
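One way to picture this inference: the architecture dict holds one op key per intermediate node, so the vertex count is the number of op keys plus the input and output nodes. The sketch below follows that reasoning; the name count_vertices is hypothetical and the code is not taken from NNI.

```python
def count_vertices(architecture):
    """Sketch: vertices = operator nodes + input node + output node."""
    op_keys = {key for key in architecture if key.startswith("op")}
    # Operators are expected to be numbered op1..opK without gaps.
    assert op_keys == {f"op{i}" for i in range(1, len(op_keys) + 1)}
    return len(op_keys) + 2


arch = {
    "op1": "conv3x3-bn-relu",
    "op2": "maxpool3x3",
    "input1": [0],
    "input2": [0],
    "input3": [1, 2],
}
print(count_vertices(arch))  # 4
```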

nni.nas.benchmarks.nasbench101.graph_util.hash_module(architecture, vertices)[source]

Computes a graph-invariant MD5 hash of the matrix and label pair. This snippet is modified from code in the NAS-Bench-101 repo.

Parameters
  • architecture (dict) – Architecture in NNI format.

  • vertices (int) – Number of vertices in the cell.

Returns

MD5 hash of the matrix and labeling.

Return type

str

NAS-Bench-201

NAS-Bench-201 is a cell-wise search space that views nodes as tensors and edges as operators. The search space contains all possible densely-connected DAGs with 4 nodes, resulting in 15,625 candidates in total. Each operator (i.e., edge) is selected from a pre-defined operator set (NONE, SKIP_CONNECT, CONV_1X1, CONV_3X3 and AVG_POOL_3X3). Training approaches vary in the dataset used (CIFAR-10, CIFAR-100, ImageNet) and the number of epochs scheduled (12 and 200). Each combination of architecture and training approach is repeated 1 to 3 times with different random seeds.
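The candidate count follows from the structure: 4 nodes give 6 edges, and each edge picks one of 5 operators, so there are 5^6 = 15,625 cells. A short sketch that enumerates them, using the operator strings and edge key names listed in the API documentation of this section:

```python
import itertools

OPS = ["none", "skip_connect", "conv_1x1", "conv_3x3", "avg_pool_3x3"]

# One edge per node pair (i, j) with i < j among the 4 nodes.
EDGES = [f"{i}_{j}" for i in range(4) for j in range(i + 1, 4)]


def enumerate_archs():
    """Yield every cell as a dict mapping edge name to operator."""
    for choice in itertools.product(OPS, repeat=len(EDGES)):
        yield dict(zip(EDGES, choice))


num_archs = sum(1 for _ in enumerate_archs())
print(EDGES)      # ['0_1', '0_2', '0_3', '1_2', '1_3', '2_3']
print(num_archs)  # 15625
```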

API Documentation

nni.nas.benchmarks.nasbench201.query_nb201_trial_stats(arch, num_epochs, dataset, reduction=None, include_intermediates=False)[source]

Query trial stats of NAS-Bench-201 given conditions.

Parameters
  • arch (dict or None) – If a dict, it is in the format that is described in nni.nas.benchmarks.nasbench201.Nb201TrialConfig. Only trial stats matched will be returned. If None, all architectures in the database will be matched.

  • num_epochs (int or None) – If int, matching results will be returned. Otherwise a wildcard.

  • dataset (str or None) – If specified, can be one of the datasets available in nni.nas.benchmarks.nasbench201.Nb201TrialConfig. Otherwise a wildcard.

  • reduction (str or None) – If ‘none’ or None, all trial stats will be returned directly. If ‘mean’, fields in trial stats will be averaged given the same trial config.

  • include_intermediates (boolean) – If true, intermediate results will be returned.

Returns

A generator of nni.nas.benchmarks.nasbench201.Nb201TrialStats objects, where each of them has been converted into a dict.

Return type

generator of dict
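A hedged usage sketch of the NAS-Bench-201 query. The architecture is an arbitrary illustration of the dict format, and the helper name show_nb201_trials is hypothetical. The import sits inside the helper because the call needs NNI installed and the NAS-Bench-201 database prepared.

```python
import pprint

# One operator per edge; keys follow the <from>_<to> naming of the arch dict.
arch = {
    "0_1": "avg_pool_3x3",
    "0_2": "conv_1x1",
    "0_3": "conv_1x1",
    "1_2": "skip_connect",
    "1_3": "skip_connect",
    "2_3": "skip_connect",
}


def show_nb201_trials(arch, num_epochs=200, dataset="cifar100"):
    """Print stats averaged over the repeated runs of each matching config."""
    # Lazy import: requires NNI and the prepared NAS-Bench-201 database.
    from nni.nas.benchmarks.nasbench201 import query_nb201_trial_stats

    for trial in query_nb201_trial_stats(
        arch, num_epochs, dataset, reduction="mean"
    ):
        pprint.pprint(trial)


# Call show_nb201_trials(arch) once the database is in place.
```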

nasbench201.NONE = 'none'
nasbench201.SKIP_CONNECT = 'skip_connect'
nasbench201.CONV_1X1 = 'conv_1x1'
nasbench201.CONV_3X3 = 'conv_3x3'
nasbench201.AVG_POOL_3X3 = 'avg_pool_3x3'

class nni.nas.benchmarks.nasbench201.Nb201TrialConfig(*args, **kwargs)[source]

Trial config for NAS-Bench-201.

arch

A dict with keys 0_1, 0_2, 0_3, 1_2, 1_3, 2_3, each of which is an operator chosen from nni.nas.benchmarks.nasbench201.NONE, nni.nas.benchmarks.nasbench201.SKIP_CONNECT, nni.nas.benchmarks.nasbench201.CONV_1X1, nni.nas.benchmarks.nasbench201.CONV_3X3 and nni.nas.benchmarks.nasbench201.AVG_POOL_3X3.

Type

dict

num_epochs

Number of epochs planned for this trial. Should be one of 12 and 200.

Type

int

num_channels

Number of channels for initial convolution. 16 by default.

Type

int

num_cells

Number of cells per stage. 5 by default.

Type

int

dataset

Dataset used for training and evaluation. NAS-Bench-201 provides the following 4 options: cifar10-valid (training data is split into 25k for training and 25k for validation, validation data is used for test), cifar10 (training data is used in training, validation data is split into 5k for validation and 5k for testing), cifar100 (same protocol as cifar10), and imagenet16-120 (a subset of 120 classes in ImageNet, downscaled to 16x16, using training data for training, 6k images from validation set for validation and the other 6k for testing).

Type

str

class nni.nas.benchmarks.nasbench201.Nb201TrialStats(*args, **kwargs)[source]

Computation statistics for NAS-Bench-201. Each corresponds to one trial.

config

Setup for this trial data.

Type

Nb201TrialConfig

seed

Random seed selected, for reproduction.

Type

int

train_acc

Final accuracy on training data, ranging from 0 to 100.

Type

float

valid_acc

Final accuracy on validation data, ranging from 0 to 100.

Type

float

test_acc

Final accuracy on test data, ranging from 0 to 100.

Type

float

ori_test_acc

Test accuracy on original validation set (10k for CIFAR and 12k for Imagenet16-120), ranging from 0 to 100.

Type

float

train_loss

Final cross entropy loss on training data. Note that loss could be NaN, in which case this attribute will be None.

Type

float or None

valid_loss

Final cross entropy loss on validation data.

Type

float or None

test_loss

Final cross entropy loss on test data.

Type

float or None

ori_test_loss

Final cross entropy loss on original validation set.

Type

float or None

parameters

Number of trainable parameters in million.

Type

float

latency

Latency in seconds.

Type

float

flops

FLOPs in million.

Type

float

training_time

Duration of training in seconds.

Type

float

valid_evaluation_time

Time elapsed to evaluate on validation set.

Type

float

test_evaluation_time

Time elapsed to evaluate on test set.

Type

float

ori_test_evaluation_time

Time elapsed to evaluate on original test set.

Type

float

class nni.nas.benchmarks.nasbench201.Nb201IntermediateStats(*args, **kwargs)[source]

Intermediate statistics for NAS-Bench-201.

trial

Corresponding trial.

Type

Nb201TrialStats

current_epoch

Elapsed epochs.

Type

int

train_acc

Current accuracy on training data, ranging from 0 to 100.

Type

float

valid_acc

Current accuracy on validation data, ranging from 0 to 100.

Type

float

test_acc

Current accuracy on test data, ranging from 0 to 100.

Type

float

ori_test_acc

Test accuracy on original validation set (10k for CIFAR and 12k for Imagenet16-120), ranging from 0 to 100.

Type

float

train_loss

Current cross entropy loss on training data.

Type

float or None

valid_loss

Current cross entropy loss on validation data.

Type

float or None

test_loss

Current cross entropy loss on test data.

Type

float or None

ori_test_loss

Current cross entropy loss on original validation set.

Type

float or None

NDS

The paper "On Network Design Spaces for Visual Recognition" released trial statistics of over 100,000 configurations (models + hyper-parameters) sampled from multiple model families, including vanilla (feedforward network loosely inspired by VGG), ResNet and ResNeXt (residual basic block and residual bottleneck block) and NAS cells (following popular designs from NASNet, Amoeba, PNAS, ENAS and DARTS). Most configurations are trained only once with a fixed seed, except a few that are trained twice or three times.

Instead of storing results obtained with different configurations in separate files, we dump them into one single database to enable comparison in multiple dimensions. Specifically, we use model_family to distinguish model types, model_spec for all hyper-parameters needed to build the model, cell_spec for detailed information on operators and connections if it is a NAS cell, and generator to denote the sampling policy through which the configuration is generated. Refer to the API documentation for details.

Available Operators

Here is a list of available operators used in NDS.

constants.NONE = 'none'
constants.SKIP_CONNECT = 'skip_connect'
constants.AVG_POOL_3X3 = 'avg_pool_3x3'
constants.MAX_POOL_3X3 = 'max_pool_3x3'
constants.MAX_POOL_5X5 = 'max_pool_5x5'
constants.MAX_POOL_7X7 = 'max_pool_7x7'
constants.CONV_1X1 = 'conv_1x1'
constants.CONV_3X3 = 'conv_3x3'
constants.CONV_3X1_1X3 = 'conv_3x1_1x3'
constants.CONV_7X1_1X7 = 'conv_7x1_1x7'
constants.DIL_CONV_3X3 = 'dil_conv_3x3'
constants.DIL_CONV_5X5 = 'dil_conv_5x5'
constants.SEP_CONV_3X3 = 'sep_conv_3x3'
constants.SEP_CONV_5X5 = 'sep_conv_5x5'
constants.SEP_CONV_7X7 = 'sep_conv_7x7'
constants.DIL_SEP_CONV_3X3 = 'dil_sep_conv_3x3'

API Documentation

nni.nas.benchmarks.nds.query_nds_trial_stats(model_family, proposer, generator, model_spec, cell_spec, dataset, num_epochs=None, reduction=None, include_intermediates=False)[source]

Query trial stats of NDS given conditions.

Parameters
  • model_family (str or None) – If str, can be one of the model families available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.

  • proposer (str or None) – If str, can be one of the proposers available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.

  • generator (str or None) – If str, can be one of the generators available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.

  • model_spec (dict or None) – If specified, can be one of the model specs available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.

  • cell_spec (dict or None) – If specified, can be one of the cell specs available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.

  • dataset (str or None) – If str, can be one of the datasets available in nni.nas.benchmarks.nds.NdsTrialConfig. Otherwise a wildcard.

  • num_epochs (int or None) – If int, matching results will be returned. Otherwise a wildcard.

  • reduction (str or None) – If ‘none’ or None, all trial stats will be returned directly. If ‘mean’, fields in trial stats will be averaged given the same trial config.

  • include_intermediates (boolean) – If true, intermediate results will be returned.

Returns

A generator of nni.nas.benchmarks.nds.NdsTrialStats objects, where each of them has been converted into a dict.

Return type

generator of dict
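A hedged usage sketch of the NDS query that leaves most conditions as wildcards. The keyword names are the parameter names from the signature above, the chosen values are only an illustration, and the helper name show_nds_trials is hypothetical. The import sits inside the helper because the call needs NNI installed and the NDS database prepared.

```python
import pprint

# None acts as a wildcard for the corresponding condition.
query = {
    "model_family": "residual_bottleneck",
    "proposer": None,
    "generator": None,
    "model_spec": None,
    "cell_spec": None,
    "dataset": "cifar10",
}


def show_nds_trials(query, limit=3):
    """Print at most `limit` trials that match the query conditions."""
    # Lazy import: requires NNI and the prepared NDS database.
    from nni.nas.benchmarks.nds import query_nds_trial_stats

    for index, trial in enumerate(query_nds_trial_stats(**query)):
        if index >= limit:
            break
        pprint.pprint(trial)


# Call show_nds_trials(query) once the database is in place.
```

Stopping early is cheap here because the function returns a generator rather than a fully materialized list.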

class nni.nas.benchmarks.nds.NdsTrialConfig(*args, **kwargs)[source]

Trial config for NDS.

model_family

Could be nas_cell, residual_bottleneck, residual_basic or vanilla.

Type

str

model_spec

If model_family is nas_cell, it contains num_nodes_normal, num_nodes_reduce, depth, width, aux and drop_prob. If model_family is residual_bottleneck, it contains bot_muls, ds (depths), num_gs (number of groups) and ss (strides). If model_family is residual_basic or vanilla, it contains ds, ss and ws.

Type

dict

cell_spec

If model_family is not nas_cell, it will be an empty dict. Otherwise, it specifies <normal/reduce>_<i>_<op/input>_<x/y>, where i ranges from 0 to num_nodes_<normal/reduce> - 1. If it is an op, the value is chosen from the constants specified previously, like nni.nas.benchmarks.nds.CONV_1X1. If it is i’s input, the value ranges from 0 to i + 1, as nas_cell uses the previous two nodes as inputs, and node 0 is actually the second node. Refer to the NASNet paper for details. Finally, another two key-value pairs, normal_concat and reduce_concat, specify which nodes are eventually concatenated into the output.

Type

dict
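The key naming scheme is easier to see when spelled out. The sketch below lists the keys a cell_spec would contain for given node counts, and the range of valid input indices for node i, following the description above. Both helper names are hypothetical and the code is not part of NNI.

```python
def cell_spec_keys(num_nodes_normal, num_nodes_reduce):
    """Sketch: list the keys expected in a nas_cell cell_spec."""
    keys = []
    cells = (("normal", num_nodes_normal), ("reduce", num_nodes_reduce))
    for cell, num_nodes in cells:
        for i in range(num_nodes):
            for kind in ("op", "input"):
                for branch in ("x", "y"):
                    keys.append(f"{cell}_{i}_{kind}_{branch}")
        # One extra entry per cell names the nodes concatenated into the output.
        keys.append(f"{cell}_concat")
    return keys


def valid_input_indices(i):
    """Inputs of node i range from 0 to i + 1, inclusive."""
    return list(range(i + 2))


keys = cell_spec_keys(num_nodes_normal=2, num_nodes_reduce=2)
print(keys[:4])
print(valid_input_indices(0))  # [0, 1]
```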

dataset

Dataset used. Could be cifar10 or imagenet.

Type

str

generator

Can be one of: random, which generates configurations at random while keeping learning rate and weight decay fixed; fix_w_d, which further keeps width and depth fixed and is only applicable for nas_cell; or tune_lr_wd, which further tunes learning rate and weight decay.

Type

str

proposer

The paper that proposed the distribution for random sampling. Available proposers include nasnet, darts, enas, pnas, amoeba, vanilla, resnext-a, resnext-b, resnet, resnet-b (ResNet with bottleneck). See the NDS paper for details.

Type

str

base_lr

Initial learning rate.

Type

float

weight_decay

L2 weight decay applied on weights.

Type

float

num_epochs

Number of epochs scheduled, during which learning rate will decay to 0 following cosine annealing.

Type

int

class nni.nas.benchmarks.nds.NdsTrialStats(*args, **kwargs)[source]

Computation statistics for NDS. Each corresponds to one trial.

config

Corresponding config for trial.

Type

NdsTrialConfig

seed

Random seed selected, for reproduction.

Type

int

final_train_acc

Final accuracy on training data, ranging from 0 to 100.

Type

float

final_train_loss

Final cross entropy loss on training data. Could be NaN (None).

Type

float or None

final_test_acc

Final accuracy on test data, ranging from 0 to 100.

Type

float

best_train_acc

Best accuracy on training data, ranging from 0 to 100.

Type

float

best_train_loss

Best cross entropy loss on training data. Could be NaN (None).

Type

float or None

best_test_acc

Best accuracy on test data, ranging from 0 to 100.

Type

float

parameters

Number of trainable parameters in million.

Type

float

flops

FLOPs in million.

Type

float

iter_time

Seconds elapsed for each iteration.

Type

float

class nni.nas.benchmarks.nds.NdsIntermediateStats(*args, **kwargs)[source]

Intermediate statistics for NDS.

trial

Corresponding trial.

Type

NdsTrialStats

current_epoch

Elapsed epochs.

Type

int

train_loss

Current cross entropy loss on training data. Can be NaN (None).

Type

float or None

train_acc

Current accuracy on training data, ranging from 0 to 100.

Type

float

test_acc

Current accuracy on test data, ranging from 0 to 100.

Type

float