Single Path One-Shot (SPOS)
Proposed in Single Path One-Shot Neural Architecture Search with Uniform Sampling, SPOS is a one-shot NAS method that addresses the difficulties of training one-shot NAS models. It builds a simplified supernet and trains it with a uniform path sampling method, so that all underlying architectures (and their weights) are trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine-tuning.
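To make uniform path sampling concrete, here is a minimal sketch. It assumes a supernet of `num_blocks` choice blocks with `num_choices` candidate operations each, for example 20 blocks with 4 choices as in the paper's block search space. The helper `sample_path` is hypothetical and is not NNI's API. Each step draws one candidate per block independently and uniformly, so every candidate (and its weights) is visited equally often in expectation.

```python
import random
from collections import Counter

def sample_path(num_blocks, num_choices, rng=random):
    """Hypothetical helper: uniformly pick one candidate op index per choice block."""
    return [rng.randrange(num_choices) for _ in range(num_blocks)]

# In supernet training, a fresh path would be sampled for every mini-batch; only
# the sampled ops run in the forward pass and receive gradients in the backward pass.
rng = random.Random(0)
counts = Counter()
for _ in range(4000):
    path = sample_path(num_blocks=20, num_choices=4, rng=rng)
    counts.update(path)  # how often each choice index was picked, across all blocks
```

With 4000 paths of 20 blocks, each of the 4 choice indices ends up picked close to 20000 times.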
The implementation in NNI is based on the official repo. We implement a trainer that trains the supernet, and an evolution tuner that leverages the NNI framework to speed up the evolutionary search phase. We have also shown
Here is a use case that uses the search space from the paper and shows how to use a FLOPs limit to perform uniform sampling.
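One straightforward way to combine uniform sampling with a FLOPs limit is rejection sampling. You draw a path uniformly and discard it if it exceeds the budget. This keeps the distribution uniform over the architectures that satisfy the limit.

The sketch below rests on several assumptions:
- It uses a hypothetical per-block lookup table `flops_table[block][choice]`. This is loosely in the spirit of `op_flops_dict.pkl`, whose real layout may differ.
- The FLOPs numbers are made up.
- It is not the code of the NNI example.

```python
import random

def path_flops(path, flops_table, base_flops=0.0):
    """Sum the FLOPs of the chosen op in each block plus a fixed (non-searchable) part."""
    return base_flops + sum(flops_table[block][choice] for block, choice in enumerate(path))

def sample_under_limit(flops_table, flops_limit, base_flops=0.0, rng=random, max_tries=10000):
    """Rejection sampling: uniform over all paths whose FLOPs do not exceed the limit."""
    for _ in range(max_tries):
        path = [rng.randrange(len(choices)) for choices in flops_table]
        if path_flops(path, flops_table, base_flops) <= flops_limit:
            return path
    raise RuntimeError("no path under the FLOPs limit found; is the limit too tight?")

# Hypothetical table: 3 blocks with 4 candidate ops each (made-up MFLOPs values).
table = [[10, 14, 20, 12], [10, 14, 20, 12], [10, 14, 20, 12]]
path = sample_under_limit(table, flops_limit=40, rng=random.Random(1))
```

The `max_tries` guard makes an infeasible limit fail loudly instead of looping forever.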
NVIDIA DALI >= 0.16 is required, as we use DALI to accelerate ImageNet data loading (see the DALI installation guide).
Prepare ImageNet in the standard format (follow the script here). Linking it to data/imagenet is more convenient.
After preparation, you should have the following directory structure:
spos
├── architecture_final.json
├── blocks.py
├── config_search.yml
├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val
│   └── op_flops_dict.pkl
├── dataloader.py
├── network.py
├── readme.md
├── scratch.py
├── supernet.py
├── tester.py
├── tuner.py
└── utils.py
Step 1. Train Supernet
Supernet training exports its checkpoint to the checkpoints directory for use in the next step.
NOTE: The data loading in the official repo is slightly different from the usual pipeline. It intentionally uses BGR tensors and keeps pixel values between 0 and 255 to align with the authors' own DL framework. The --spos-preprocessing option simulates this original behavior and lets you use the pretrained checkpoints.
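To illustrate the difference described in the note, here is a per-pixel sketch in plain Python.

It compares two pipelines:
- **Usual pipeline (assumed):** RGB values in [0, 1], normalized with the common ImageNet mean/std. What counts as "usual" is our assumption and is not stated in this document.
- **SPOS-style pipeline:** reverses the channel order to BGR and keeps values in [0, 255].

Both function names are hypothetical. The actual --spos-preprocessing code path may differ.

```python
# Commonly used torchvision ImageNet statistics (RGB order); an assumption here.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def usual_preprocess(rgb):
    """Hypothetical: RGB in [0, 1] -> mean/std-normalized RGB (a typical PyTorch pipeline)."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

def spos_style_preprocess(rgb):
    """Hypothetical: RGB in [0, 1] -> BGR in [0, 255], no mean/std normalization."""
    r, g, b = rgb
    return (b * 255.0, g * 255.0, r * 255.0)

pixel = (1.0, 0.5, 0.0)  # pure-ish orange in RGB
bgr = spos_style_preprocess(pixel)  # channel order flipped, scaled to [0, 255]
```

A checkpoint trained on inputs like `bgr` will see very different statistics if fed `usual_preprocess` outputs. This is why the option matters when reusing the pretrained checkpoints.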
Step 2. Evolution Search
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing each sampled architecture, recalculates the batch normalization statistics on a subset of training images and then evaluates the architecture on the full validation set.
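The BN recalculation matters because the statistics accumulated while training the supernet mix many different paths. They may therefore not match any single sampled architecture.

Below is a minimal plain-Python sketch for one channel. It assumes PyTorch's BatchNorm semantics with `momentum=None`, which gives a cumulative moving average after `reset_running_stats()`. In PyTorch, one would:
1. Reset each BN layer and set its `momentum` to `None`.
2. Forward the training subset in train mode under `torch.no_grad()`.

The helper name is hypothetical, and the NNI tester's exact procedure may differ.

```python
from statistics import mean, variance

def recalibrate_bn_channel(batches):
    """Hypothetical helper: recompute one channel's BN running stats from training batches.

    Mirrors PyTorch BatchNorm with momentum=None: after reset_running_stats()
    (mean 0, var 1), batch t updates the stats with factor 1/t, which yields the
    plain average of per-batch means and per-batch unbiased variances.
    Each batch needs at least two values for the unbiased variance.
    """
    running_mean, running_var = 0.0, 1.0  # values right after a reset
    for t, batch in enumerate(batches, start=1):
        factor = 1.0 / t
        running_mean = (1 - factor) * running_mean + factor * mean(batch)
        running_var = (1 - factor) * running_var + factor * variance(batch)
    return running_mean, running_var

stats = recalibrate_bn_channel([[1.0, 3.0], [5.0, 7.0]])  # per-batch means 2 and 6
```

Here `stats` is `(4.0, 2.0)`: the average of the two batch means and of the two unbiased batch variances.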
To make the tuner aware of the FLOPs limit and able to calculate FLOPs, we created a new tuner in tuner.py that inherits from the tuner in the SDK.
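As a rough picture of how a FLOPs-aware evolution loop fits together, here is a self-contained sketch:
- Mutation and crossover produce candidate paths.
- Any candidate over the FLOPs limit is rejected.
- The top-k candidates by score survive into the next generation.

It rests on several assumptions:
- It uses a hypothetical lookup table `table[block][choice]`.
- The `score` callable is a stand-in for the validation accuracy the tester would report.
- It is not the API of the tuner in tuner.py.
- Population size, top-k and mutation probability are arbitrary.

```python
import random

def flops_of(path, table):
    """Total FLOPs of a path, given a hypothetical per-block table[block][choice]."""
    return sum(table[block][choice] for block, choice in enumerate(path))

def evolve(table, flops_limit, score, generations=10, population=20, top_k=5,
           mutate_prob=0.1, seed=0, max_tries=100000):
    """Evolutionary search over single paths, rejecting any path over flops_limit."""
    rng = random.Random(seed)
    tries = 0

    def propose(make):
        # Keep calling make() until it returns a path within the FLOPs limit.
        nonlocal tries
        while True:
            tries += 1
            if tries > max_tries:
                raise RuntimeError("FLOPs limit too tight for this search space")
            path = make()
            if flops_of(path, table) <= flops_limit:
                return path

    def random_path():
        return [rng.randrange(len(choices)) for choices in table]

    def mutate():
        # Resample each block of a random parent with probability mutate_prob.
        base = rng.choice(parents)
        return [rng.randrange(len(table[b])) if rng.random() < mutate_prob else c
                for b, c in enumerate(base)]

    def crossover():
        # Take each block's choice from one of two distinct parents.
        mom, dad = rng.sample(parents, 2)
        return [rng.choice(genes) for genes in zip(mom, dad)]

    pool = [propose(random_path) for _ in range(population)]
    for _ in range(generations):
        parents = sorted(pool, key=score, reverse=True)[:top_k]  # keep the top-k
        children = [propose(mutate if rng.random() < 0.5 else crossover)
                    for _ in range(population - top_k)]
        pool = parents + children
    return max(pool, key=score)

# Toy search space: 6 blocks, 4 ops costing 1..4 FLOPs; "bigger choice index" scores higher.
table = [[1, 2, 3, 4] for _ in range(6)]
best = evolve(table, flops_limit=15, score=sum)
```

Filtering by FLOPs at proposal time means every candidate that reaches `score` already satisfies the budget. Candidates over the budget are never evaluated, so no evaluation is wasted on them.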
To have a search space ready for the NNI framework, first run:
nnictl ss_gen -t "python tester.py"
This generates a file called nni_auto_gen_search_space.json, which is a serialized representation of your search space.
By default, it uses checkpoint-150000.pth.tar, downloaded previously. To use the checkpoint you trained yourself in the previous step instead, specify --checkpoint in the trial command.
Then search with the evolution tuner:
nnictl create --config config_search.yml
The final architecture exported from every epoch of evolution can be found in checkpoints under the working directory of your tuner, which, by default, is
Step 3. Train from Scratch
By default, training from scratch uses architecture_final.json. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with
class nni.retiarii.oneshot.pytorch.SinglePathTrainer(model, loss, metrics, optimizer, num_epochs, dataset_train, dataset_valid, mutator=None, batch_size=64, workers=4, device=None, log_frequency=None)
Single-path trainer. Samples a single path at every step and backpropagates through that path only.
model (nn.Module) – Model with mutables.
loss (callable) – Called with logits and targets. Returns a loss tensor.
metrics (callable) – Returns a dict that maps metric keys to metric data.
optimizer (Optimizer) – Optimizer that optimizes the model.
num_epochs (int) – Number of epochs of training.
dataset_train (Dataset) – Dataset for training.
dataset_valid (Dataset) – Dataset for validation.
batch_size (int) – Batch size.
workers (int) – Number of threads for data preprocessing. Not used by this trainer; may be removed in the future.
device (torch.device) – Device object. If None, the trainer automatically detects GPUs and selects a GPU first.
log_frequency (int) – Log metrics every log_frequency mini-batches.
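The metrics argument can be any callable that returns a dict of metric names to values. Top-k accuracy is a common choice.

The sketch below uses plain Python over lists of logits so that it runs without PyTorch. It makes these assumptions:
- The callable is invoked as metrics(logits, targets), like loss. Check the trainer's source to confirm.
- The name accuracy_metrics and the "acc1"/"acc5" keys are hypothetical.

```python
def accuracy_metrics(logits, targets, topk=(1, 5)):
    """Hypothetical metrics callable: fraction of samples whose target is in the top-k logits."""
    result = {}
    for k in topk:
        hits = 0
        for row, target in zip(logits, targets):
            # Indices of the k largest logits for this sample.
            top = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
            hits += target in top
        result[f"acc{k}"] = hits / len(targets)
    return result

metrics = accuracy_metrics([[0.1, 0.9, 0.0], [0.8, 0.15, 0.05]], [1, 1], topk=(1, 2))
```

Here the first sample is a top-1 hit and the second is only a top-2 hit, so `metrics` is `{"acc1": 0.5, "acc2": 1.0}`. With real tensors one would compute the same thing with torch.topk instead of sorting lists.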
Known limitations:
- Block search only. Channel search is not supported yet.
- Only a GPU version is provided.
Current Reproduction Results
Reproduction is still ongoing. Because of the gap between the official release and the original paper, we compare our current results with both the official repo (our run) and the paper.
The evolution phase is almost aligned with the official repo. Our evolutionary algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to this issue.
The retraining phase is not aligned. Our retraining code uses the architecture released by the authors and reaches 72.14% accuracy. This is still short of the 73.61% achieved by the official release and the 74.3% reported in the original paper.