Experiment Config Reference

A config file is needed when creating an experiment. This document describes the rules to write a config file and provides some examples.

Note

This document lists field names in camelCase. When these fields are used through the NNI Python API (e.g., nni.experiment), convert the field names to snake_case.
In this document, field types are written as Python type hints. JSON objects are therefore called dict and arrays are called list.

Some fields take a path to a file or directory. Unless otherwise noted, both absolute and relative paths are supported, and ~ is expanded to the home directory.
- When written in the YAML file, relative paths are relative to the directory containing that file.
- When assigned in Python code, relative paths are relative to the current working directory.
- All relative paths are converted to absolute paths when a YAML file is loaded into a Python class and when a Python class is saved to a YAML file.
Setting a field to None or null is equivalent to not setting the field.
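
The two conventions in the note above can be sketched in plain Python. This is a minimal illustration, not NNI's own implementation: `camel_to_snake` and `resolve_config_path` are hypothetical helper names, and the sketch only mirrors the rules as stated (camelCase to snake_case, `~` expansion, and relative paths resolved against either the YAML file's directory or the current working directory).

```python
import re
from pathlib import Path
from typing import Optional


def camel_to_snake(name: str) -> str:
    """Convert a camelCase field name to snake_case (hypothetical helper)."""
    # Insert an underscore before every uppercase letter except at the start,
    # then lowercase the whole name.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def resolve_config_path(value: str, yaml_file: Optional[str] = None) -> Path:
    """Resolve a path field to an absolute path (hypothetical helper).

    If yaml_file is given, relative paths are resolved against the directory
    containing that file; otherwise against the current working directory.
    """
    path = Path(value).expanduser()  # expands a leading "~"
    if path.is_absolute():
        return path
    base = Path(yaml_file).resolve().parent if yaml_file else Path.cwd()
    return (base / path).resolve()


print(camel_to_snake("trialGpuNumber"))
print(resolve_config_path("search_space.json", "/tmp/exp/config.yml"))
```

With these helpers, `searchSpaceFile: search_space.json` written in `/tmp/exp/config.yml` would refer to a file next to that config, while the same relative value assigned in Python code would be resolved against the working directory.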

Contents

Examples
Reference

Examples

Local Mode

experimentName: MNIST
searchSpaceFile: search_space.json
trialCommand: python mnist.py
trialCodeDirectory: .
trialGpuNumber: 1
trialConcurrency: 2
maxExperimentDuration: 24h
maxTrialNumber: 100
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: local
  useActiveGpu: True

Local Mode (Inline Search Space)

searchSpace:
  batch_size:
    _type: choice
    _value: [16, 32, 64]
  learning_rate:
    _type: loguniform
    _value: [0.0001, 0.1]
trialCommand: python mnist.py
trialGpuNumber: 1
trialConcurrency: 2
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: local
  useActiveGpu: True
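
To make the inline search space above concrete, the following sketch draws one random configuration from it. It assumes the usual meaning of the two types for built-in tuners: `choice` picks one of the listed values, and `loguniform` draws a value whose logarithm is uniformly distributed between the two bounds. `sample_parameter` and `sample_search_space` are hypothetical helpers for illustration only; a real tuner such as TPE chooses values adaptively rather than uniformly at random.

```python
import math
import random
from typing import Any, Dict, Optional

# The same search space as in the YAML example, written as a Python dict.
search_space = {
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
    "learning_rate": {"_type": "loguniform", "_value": [0.0001, 0.1]},
}


def sample_parameter(spec: Dict[str, Any], rng: random.Random) -> Any:
    """Draw one value for a single search space entry (hypothetical helper)."""
    kind, value = spec["_type"], spec["_value"]
    if kind == "choice":
        return rng.choice(value)
    if kind == "loguniform":
        low, high = value
        # Uniform in log space, then mapped back with exp.
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    raise ValueError(f"unsupported _type in this sketch: {kind!r}")


def sample_search_space(space: Dict[str, Any],
                        rng: Optional[random.Random] = None) -> Dict[str, Any]:
    """Draw one full configuration from the search space."""
    rng = rng or random.Random()
    return {name: sample_parameter(spec, rng) for name, spec in space.items()}


params = sample_search_space(search_space, random.Random(0))
print(params)
```

Each trial launched by `trialCommand` receives one such set of parameters from the tuner.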

Remote Mode

experimentName: MNIST
searchSpaceFile: search_space.json
trialCommand: python mnist.py
trialCodeDirectory: .
trialGpuNumber: 1
trialConcurrency: 2
maxExperimentDuration: 24h
maxTrialNumber: 100
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: remote
  machineList:
    - host: 11.22.33.44
      user: alice
      password: xxxxx
    - host: my.domain.com
      user: bob
      sshKeyFile: ~/.ssh/id_rsa

Reference

ExperimentConfig

Field Name	Type	Description
experimentName	`str`, optional	Mnemonic name of the experiment, which will be shown in WebUI and nnictl.
searchSpaceFile	`str`, optional	Path to the JSON file containing the search space. The search space format is determined by the tuner. The common format for built-in tuners is documented here. Mutually exclusive with `searchSpace`.
searchSpace	`JSON`, optional	Search space object. The format is determined by the tuner. The common format for built-in tuners is documented here. Note that `None` means “no such field”, so an empty search space should be written as `{}`. Mutually exclusive with `searchSpaceFile`.
trialCommand	`str`	Command to launch the trial. The command is executed in bash on Linux and macOS, and in PowerShell on Windows. Note: use `python3` on Linux and macOS, and `python` on Windows.
trialCodeDirectory	`str`, optional	Default: `"."`. Path to the directory containing trial source files. All files in this directory are sent to the training machine, except those excluded by the `.nniignore` file. (See nniignore for details.)
trialConcurrency	`int`	Specify how many trials should be run concurrently. The real concurrency also depends on hardware resources and may be less than this value.
trialGpuNumber	`int` or `None`, optional	Default: `None`. This field may have slightly different meanings for different training services, especially when set to `0` or `None`. See the training service’s documentation for details. In local mode, setting the field to `0` prevents trials from accessing GPUs (by setting `CUDA_VISIBLE_DEVICES` to an empty value). When set to `None`, trials are created and scheduled as if they did not use GPUs, but they can still use all GPU resources if they want.
maxExperimentDuration	`str`, optional	Limit the duration of this experiment if specified. The duration is unlimited if not set. Format: `number + s|m|h|d`. Examples: `"10m"`, `"0.5h"`. When time runs out, the experiment will stop creating trials but continue to serve WebUI.
maxTrialNumber	`int`, optional	Limit the number of trials to create if specified. The trial number is unlimited if not set. When the budget runs out, the experiment will stop creating trials but continue to serve WebUI.
maxTrialDuration	`str`, optional	Limit the duration of each trial job if specified. The duration is unlimited if not set. Format: `number + s|m|h|d`. Examples: `"10m"`, `"0.5h"`. When time runs out, the current trial job will stop.
nniManagerIp	`str`, optional	Default: the default connection chosen by the system. IP of the current machine, used by training machines to access NNI manager. Not used in local mode. For all other modes, setting this field manually is highly recommended.
useAnnotation	`bool`, optional	Default: `False`. Enable annotation. When using annotation, `searchSpace` and `searchSpaceFile` should not be specified manually.
debug	`bool`, optional	Default: `False`. Enable debug mode. When enabled, logging will be more verbose and some internal validation will be loosened.
logLevel	`str`, optional	Default: `info` or `debug`, depending on the `debug` option. Set the log level of the whole system. Values: `"trace"`, `"debug"`, `"info"`, `"warning"`, `"error"`, `"fatal"`. When debug mode is enabled, `logLevel` is set to “debug”; otherwise, it is set to “info”. Most modules of NNI are affected by this value, including NNI manager, tuner, and training service. The exception is the trial, whose logging level is managed directly by trial code. For Python modules, “trace” acts as logging level 0 and “fatal” acts as `logging.CRITICAL`.
experimentWorkingDirectory	`str`, optional	Default: `~/nni-experiments`. Specify the directory for logs, checkpoints, metadata, and other run-time files. NNI creates a subdirectory named after the experiment ID, so it is safe to use the same directory for multiple experiments.
tunerGpuIndices	`list[int]` or `str` or `int`, optional	Limit the GPUs visible to tuner, assessor, and advisor. This will be the `CUDA_VISIBLE_DEVICES` environment variable of tuner process. Because tuner, assessor, and advisor run in the same process, this option will affect them all.
tuner	`AlgorithmConfig`, optional	Specify the tuner. The built-in tuners can be found here and you can follow this tutorial to customize a new tuner.
assessor	`AlgorithmConfig`, optional	Specify the assessor. The built-in assessors can be found here and you can follow this tutorial to customize a new assessor.
advisor	`AlgorithmConfig`, optional	Specify the advisor. NNI provides two built-in advisors: BOHB and Hyperband, and you can follow this tutorial to customize a new advisor.
trainingService	`TrainingServiceConfig`	Specify the training service.
sharedStorage	`SharedStorageConfig`, optional	Configure the shared storage. Detailed usage can be found here.
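
The duration format used by `maxExperimentDuration` and `maxTrialDuration` can be checked with a few lines of Python. The parser below is a sketch based only on the format stated in the table (`number + s|m|h|d`); `parse_duration` is a hypothetical name, not an NNI function, and NNI's own parser may accept or reject inputs differently.

```python
import re

# Seconds per unit suffix, following the documented s|m|h|d format.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
# A non-negative integer or decimal number followed by exactly one unit letter.
_DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)([smhd])")


def parse_duration(text: str) -> float:
    """Convert a duration string such as "10m" or "0.5h" to seconds."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]


for example in ("10m", "0.5h", "24h"):
    print(example, "=", parse_duration(example), "seconds")
```

Under this reading, `"0.5h"` and `"30m"` describe the same budget, and the `24h` used in the examples above corresponds to 86400 seconds.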

AlgorithmConfig

AlgorithmConfig describes a tuner / assessor / advisor algorithm.

For customized algorithms, there are two ways to describe them:

- Register the algorithm to use it like a built-in one. (preferred)
- Specify the code directory and class name directly.

Field Name	Type	Description
name	`str` or `None`, optional	Default: `None`. Name of the built-in or registered algorithm. `str` for built-in and registered algorithms, `None` for other customized algorithms.
className	`str` or `None`, optional	Default: `None`. Qualified class name of an unregistered customized algorithm. `None` for built-in and registered algorithms, `str` for other customized algorithms. Example: `"my_tuner.MyTuner"`.
codeDirectory	`str` or `None`, optional	Default: `None`. Path to the directory containing the customized algorithm class. `None` for built-in and registered algorithms, `str` for other customized algorithms.
classArgs	`dict[str, Any]`, optional	Keyword arguments passed to the algorithm class’s constructor. See the algorithm’s documentation for supported values.
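
The `None`/`str` rules in the table above amount to two mutually exclusive shapes: either `name` is set (built-in or registered algorithm), or `className` and `codeDirectory` are set (unregistered customized algorithm). The check below is a sketch of that rule as read from the table. `check_algorithm_config` is a hypothetical helper, the directory `./my_tuner_dir` is made up for illustration, and NNI's actual validation may differ in detail.

```python
from typing import Any, Dict


def check_algorithm_config(config: Dict[str, Any]) -> str:
    """Classify an AlgorithmConfig-like dict or raise ValueError (sketch)."""
    # A missing key and an explicit None are treated the same way.
    name = config.get("name")
    class_name = config.get("className")
    code_directory = config.get("codeDirectory")

    if name is not None:
        # Built-in or registered algorithm: className/codeDirectory stay unset.
        if class_name is not None or code_directory is not None:
            raise ValueError("set either name, or className and codeDirectory")
        return "builtin-or-registered"

    # Unregistered customized algorithm: both fields are needed.
    if class_name is None or code_directory is None:
        raise ValueError("customized algorithms need className and codeDirectory")
    return "custom"


builtin_tuner = {"name": "TPE", "classArgs": {"optimize_mode": "maximize"}}
custom_tuner = {
    "className": "my_tuner.MyTuner",
    "codeDirectory": "./my_tuner_dir",  # hypothetical directory
    "classArgs": {},
}

print(check_algorithm_config(builtin_tuner))
print(check_algorithm_config(custom_tuner))
```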

TrainingServiceConfig

One of the following:

LocalConfig
RemoteConfig
OpenpaiConfig
AmlConfig
DlcConfig
HybridConfig

For the Kubeflow, FrameworkController, and AdaptDL training platforms, the v1 config schema is recommended for now.