Single Path One-Shot (SPOS)


Single Path One-Shot (SPOS), proposed in Single Path One-Shot Neural Architecture Search with Uniform Sampling, is a one-shot NAS method that addresses the difficulty of training one-shot NAS models: it constructs a simplified supernet trained with a uniform path sampling method, so that all underlying architectures (and their weights) are trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architecture without any fine-tuning.
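The core idea of uniform path sampling can be sketched as follows: at each training step, one candidate op per layer is drawn uniformly at random, so every op receives the same expected number of weight updates. This is an illustrative sketch only, not the NNI implementation; the layer/op structure below is hypothetical.

```python
import random

# Hypothetical supernet search space: each layer offers several candidate ops.
SEARCH_SPACE = [
    ["conv3x3", "conv5x5", "conv7x7", "identity"],  # layer 0 choices
    ["conv3x3", "conv5x5", "conv7x7", "identity"],  # layer 1 choices
    ["conv3x3", "conv5x5", "conv7x7", "identity"],  # layer 2 choices
]

def sample_uniform_path(search_space):
    """Draw one op per layer uniformly at random (a single path)."""
    return [random.choice(ops) for ops in search_space]

# Each training step updates only the weights on the sampled path,
# so all candidate ops are trained equally in expectation.
path = sample_uniform_path(SEARCH_SPACE)
```

Because the supernet weights are decoupled from any particular architecture, any sampled path can later be evaluated directly with the shared weights during the evolutionary search.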

Our implementation on NNI is based on the official repo. We implement a trainer that trains the supernet, and an evolution tuner that leverages the NNI framework to speed up the evolutionary search phase.


Here is a use case based on the search space from the paper, which uses a flops limit to perform uniform sampling.

Example code


NVIDIA DALI >= 0.16 is needed as we use DALI to accelerate the data loading of ImageNet. Installation guide

Download the flops lookup table from here (maintained by Megvii). Put op_flops_dict.pkl and checkpoint-150000.pth.tar (if you don’t want to retrain the supernet) under the data directory.

Prepare ImageNet in the standard format (follow the script here). Linking it to data/imagenet will be more convenient.
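The linking step can be done with a symlink; /path/to/imagenet below is a placeholder for wherever your prepared dataset actually lives.

```shell
# Link an already-prepared ImageNet directory into ./data
# (/path/to/imagenet is a placeholder, not a real path).
mkdir -p data
ln -sfn /path/to/imagenet data/imagenet
```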

After preparation, the following code structure is expected:

├── architecture_final.json
├── config_search.yml
├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val
│   └── op_flops_dict.pkl

Step 1. Train Supernet


This will export the checkpoint to the checkpoints directory for the next step.

NOTE: The data loading used in the official repo is slightly different from usual: they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option --spos-preprocessing will simulate this behavior and enable you to use the pretrained checkpoints.
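To make the note above concrete, a minimal sketch of that input convention is shown below: channels reordered from RGB to BGR and pixel values left in [0, 255] with no mean/std normalization. Resizing, cropping, and the DALI pipeline are omitted; the function name is our own, not an NNI or SPOS API.

```python
import numpy as np

def spos_like_preprocess(image_rgb_uint8):
    """Approximate the official repo's input format: BGR channel order,
    float values kept in [0, 255], no mean/std normalization.

    image_rgb_uint8: HWC uint8 array in RGB order.
    Returns a CHW float32 array in BGR order.
    """
    bgr = image_rgb_uint8[:, :, ::-1]        # RGB -> BGR
    chw = np.transpose(bgr, (2, 0, 1))       # HWC -> CHW
    return chw.astype(np.float32)            # values stay in 0..255

img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
out = spos_like_preprocess(img)
```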

Step 3. Train from Scratch


By default, it will use architecture_final.json. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the one found in step 2) with the --fixed-arc option.



class nni.nas.pytorch.spos.SPOSEvolution(max_epochs=20, num_select=10, num_population=50, m_prob=0.1, num_crossover=25, num_mutation=25)[source]

SPOS evolution tuner.

  • max_epochs (int) – Maximum number of epochs to run.
  • num_select (int) – Number of candidates surviving each epoch.
  • num_population (int) – Number of candidates at the start of each epoch. If candidates generated by crossover and mutation are not enough, the rest will be filled with random candidates.
  • m_prob (float) – The probability of mutation.
  • num_crossover (int) – Number of candidates generated by crossover in each epoch.
  • num_mutation (int) – Number of candidates generated by mutation in each epoch.
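The parameters above combine per epoch roughly as follows: the top num_select candidates survive, num_crossover children are produced by crossover and num_mutation by mutation (each gene mutating with probability m_prob), and the population is topped up to num_population with random candidates. The sketch below illustrates this loop on architectures encoded as lists of op indices; the encoding and helper functions are hypothetical, not NNI's internal representation.

```python
import random

def evolution_epoch(scored, search_space, num_select=10, num_population=50,
                    num_crossover=25, num_mutation=25, m_prob=0.1):
    """One evolution epoch over architectures encoded as lists of op indices.

    scored: list of (architecture, accuracy) pairs from the previous epoch.
    """
    # Keep the top `num_select` candidates as parents.
    parents = [arch for arch, _ in sorted(scored, key=lambda x: -x[1])[:num_select]]

    def random_arch():
        return [random.randrange(len(ops)) for ops in search_space]

    def crossover(a, b):
        # Each gene is taken from one of the two parents at random.
        return [random.choice(pair) for pair in zip(a, b)]

    def mutate(a):
        # Each gene is resampled with probability m_prob.
        return [random.randrange(len(ops)) if random.random() < m_prob else g
                for g, ops in zip(a, search_space)]

    children = [crossover(*random.sample(parents, 2)) for _ in range(num_crossover)]
    children += [mutate(random.choice(parents)) for _ in range(num_mutation)]
    # Fill the remainder of the population with random candidates.
    while len(children) < num_population:
        children.append(random_arch())
    return children
```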

Export a number of candidates to the checkpoints directory.

Parameters:result (dict) – Chosen architectures to be exported.
generate_multiple_parameters(parameter_id_list, **kwargs)[source]

Callback function required to implement a tuner. This will put more parameter IDs into the parameter ID queue.

receive_trial_result(parameter_id, parameters, value, **kwargs)[source]

Callback function. Receive a trial result.

trial_end(parameter_id, success, **kwargs)[source]

Callback function called when a trial ends and its resource is released.


Handle the initialization/update event of search space.

class nni.nas.pytorch.spos.SPOSSupernetTrainer(model, loss, metrics, optimizer, num_epochs, train_loader, valid_loader, mutator=None, batch_size=64, workers=4, device=None, log_frequency=None, callbacks=None)[source]

This trainer trains a supernet that can be used for evolution search.

  • model (nn.Module) – Model with mutables.
  • mutator (Mutator) – A mutator object that has been initialized with the model.
  • loss (callable) – Called with logits and targets. Returns a loss tensor.
  • metrics (callable) – Returns a dict that maps metrics keys to metrics data.
  • optimizer (Optimizer) – Optimizer that optimizes the model.
  • num_epochs (int) – Number of epochs of training.
  • train_loader (iterable) – Data loader for training. Raises StopIteration when an epoch is exhausted.
  • valid_loader (iterable) – Data loader for validation. Raises StopIteration when an epoch is exhausted.
  • batch_size (int) – Batch size.
  • workers (int) – Number of threads for data preprocessing. Not used by this trainer; may be removed in the future.
  • device (torch.device) – Device object, either torch.device("cuda") or torch.device("cpu"). When None, the trainer detects GPUs automatically and prefers GPU.
  • log_frequency (int) – Number of mini-batches to log metrics.
  • callbacks (list of Callback) – Callbacks to plug into the trainer. See Callbacks.

Train one epoch.

Parameters:epoch (int) – Epoch number starting from 0.

Validate one epoch.

Parameters:epoch (int) – Epoch number starting from 0.
class nni.nas.pytorch.spos.SPOSSupernetTrainingMutator(model, flops_func=None, flops_lb=None, flops_ub=None, flops_bin_num=7, flops_sample_timeout=500)[source]

A random mutator with flops limit.

  • model (nn.Module) – PyTorch model.
  • flops_func (callable) – Callable that takes a candidate from sample_search and returns its flops. When flops_func is None, flops-related functionality is deactivated.
  • flops_lb (number) – Lower bound of flops.
  • flops_ub (number) – Upper bound of flops.
  • flops_bin_num (int) – Number of bins the flops interval is divided into to ensure uniformity. A larger number yields more uniform sampling, but sampling becomes slower.
  • flops_sample_timeout (int) – Maximum number of sampling attempts before giving up and using a random candidate.

Implemented only to satisfy the interface of Mutator.

Sample a candidate for training. When flops_func is not None, candidates will be sampled uniformly with respect to flops.

Return type:dict
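One common way to make sampling uniform with respect to flops, consistent with the flops_lb/flops_ub/flops_bin_num/flops_sample_timeout parameters above, is to split the flops interval into bins, pick a bin uniformly, and rejection-sample until a candidate lands in that bin or the timeout is hit. The sketch below illustrates this idea; it is our own illustration under those assumptions, not NNI's actual implementation.

```python
import random

def sample_with_flops_limit(sample_fn, flops_fn, flops_lb, flops_ub,
                            bin_num=7, timeout=500):
    """Rejection-sample a candidate so that flops are roughly uniform.

    sample_fn: returns a random candidate (e.g., a random single path).
    flops_fn: maps a candidate to its flops.
    """
    bin_width = (flops_ub - flops_lb) / bin_num
    target_bin = random.randrange(bin_num)      # pick a flops bin uniformly
    lo = flops_lb + target_bin * bin_width
    hi = lo + bin_width
    for _ in range(timeout):
        cand = sample_fn()
        if lo <= flops_fn(cand) < hi:
            return cand
    return sample_fn()  # timeout: fall back to an unconstrained random candidate
```

Picking the bin first, rather than accepting any candidate inside [flops_lb, flops_ub], is what flattens the flops distribution: otherwise samples would cluster wherever the search space naturally concentrates.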

Known Limitations

  • Block search only. Channel search is not supported yet.
  • Only GPU version is provided here.

Current Reproduction Results

Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with both the official repo (run by us) and the paper.

  • The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to this issue.
  • The retrain phase is not yet aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap to the 73.61% of the official release and the 74.3% reported in the original paper.