Experiment Config Reference¶
A config file is needed when creating an experiment. This document describes the rules for writing a config file and provides some examples.
Note
This document lists field names in camelCase. If you use these fields in the Pythonic way with NNI Python APIs (e.g., nni.experiment), convert the field names to snake_case.
In this document, field types are formatted as Python type hints. Therefore JSON objects are called dict and arrays are called list.
Some fields take a path to a file or directory. Unless otherwise noted, both absolute and relative paths are supported, and ~ is expanded to the home directory.
When written in a YAML file, relative paths are relative to the directory containing that file.
When assigned in Python code, relative paths are relative to the current working directory.
All relative paths are converted to absolute paths when a YAML file is loaded into a Python class and when a Python class is saved to a YAML file.
Setting a field to None or null is equivalent to not setting the field.
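The conventions in this note can be illustrated with a small sketch. It is not NNI's own loader: the helper names camel_to_snake, to_python_fields, and resolve_path are hypothetical, only top-level keys are renamed, and the directory in the usage note is made up.

```python
import re
from pathlib import Path


def camel_to_snake(name: str) -> str:
    """Convert a camelCase field name to snake_case."""
    # Insert an underscore before every uppercase letter that is not at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def to_python_fields(yaml_fields: dict) -> dict:
    """Rename top-level keys to snake_case and drop keys set to None."""
    return {
        camel_to_snake(key): value
        for key, value in yaml_fields.items()
        if value is not None  # None/null is treated like an unset field
    }


def resolve_path(path: str, base_dir: Path) -> Path:
    """Expand ~ and anchor a relative path at base_dir.

    base_dir stands for the directory containing the YAML file, or the
    current working directory when the value is assigned in Python code.
    """
    expanded = Path(path).expanduser()
    return expanded if expanded.is_absolute() else base_dir / expanded
```

For instance, camel_to_snake("trialGpuNumber") gives "trial_gpu_number", and resolve_path("search_space.json", Path("/experiments/mnist")) gives /experiments/mnist/search_space.json.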
Examples¶
Local Mode¶
experimentName: MNIST
searchSpaceFile: search_space.json
trialCommand: python mnist.py
trialCodeDirectory: .
trialGpuNumber: 1
trialConcurrency: 2
maxExperimentDuration: 24h
maxTrialNumber: 100
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: local
  useActiveGpu: True
Local Mode (Inline Search Space)¶
searchSpace:
  batch_size:
    _type: choice
    _value: [16, 32, 64]
  learning_rate:
    _type: loguniform
    _value: [0.0001, 0.1]
trialCommand: python mnist.py
trialGpuNumber: 1
trialConcurrency: 2
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: local
  useActiveGpu: True
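The same search space can also live in a separate JSON file referenced by searchSpaceFile, as in the first example. The sketch below shows one way to produce such a file from Python. The helper name write_search_space is hypothetical and the code uses only the standard library, not an NNI API.

```python
import json
from pathlib import Path

# Same search space as the inline example, expressed as a Python dict.
search_space = {
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
    "learning_rate": {"_type": "loguniform", "_value": [0.0001, 0.1]},
}


def write_search_space(space: dict, path: Path) -> Path:
    """Serialize a search space dict to a JSON file and return its path."""
    path.write_text(json.dumps(space, indent=2), encoding="utf-8")
    return path
```

Calling write_search_space(search_space, Path("search_space.json")) creates the file that the first Local Mode example points at.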
Remote Mode¶
experimentName: MNIST
searchSpaceFile: search_space.json
trialCommand: python mnist.py
trialCodeDirectory: .
trialGpuNumber: 1
trialConcurrency: 2
maxExperimentDuration: 24h
maxTrialNumber: 100
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: remote
  machineList:
    - host: 11.22.33.44
      user: alice
      password: xxxxx
    - host: my.domain.com
      user: bob
      sshKeyFile: ~/.ssh/id_rsa
Reference¶
ExperimentConfig¶
Field Name | Type | Description |
---|---|---|
experimentName | | Mnemonic name of the experiment, which will be shown in WebUI and nnictl. |
searchSpaceFile | | Path to the JSON file containing the search space. Search space format is determined by the tuner. The common format for built-in tuners is documented here. Mutually exclusive to |
searchSpace | | Search space object. The format is determined by the tuner. The common format for built-in tuners is documented here. Note that |
trialCommand | | Command to launch the trial. The command will be executed in bash on Linux and macOS, and in PowerShell on Windows. Note that using |
trialCodeDirectory | | Default: |
trialConcurrency | | Specify how many trials should be run concurrently. The real concurrency also depends on hardware resources and may be less than this value. |
trialGpuNumber | | Default: None. This field might have slightly different meanings for various training services, especially when set to In local mode, setting the field to |
maxExperimentDuration | | Limit the duration of this experiment if specified. The duration is unlimited if not set. Format: |
maxTrialNumber | | Limit the number of trials to create if specified. The trial number is unlimited if not set. When the budget runs out, the experiment will stop creating trials but continue to serve WebUI. |
maxTrialDuration | | Limit the duration of a trial job if specified. The duration is unlimited if not set. Format: |
nniManagerIp | | Default: default connection chosen by system. IP of the current machine, used by training machines to access NNI manager. Not used in local mode. Except for local mode, it is highly recommended to set this field manually. |
useAnnotation | | Default: |
debug | | Default: |
logLevel | | Default: |
experimentWorkingDirectory | | Default: |
tunerGpuIndices | | Limit the GPUs visible to tuner, assessor, and advisor. This will be the |
tuner | | Specify the tuner. The built-in tuners can be found here and you can follow this tutorial to customize a new tuner. |
assessor | | Specify the assessor. The built-in assessors can be found here and you can follow this tutorial to customize a new assessor. |
advisor | | Specify the advisor. NNI provides two built-in advisors: BOHB and Hyperband, and you can follow this tutorial to customize a new advisor. |
trainingService | | Specify the training service. |
sharedStorage | | Configure the shared storage. Detailed usage can be found here. |
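The format description for maxExperimentDuration and maxTrialDuration is cut off in this table, and the only value shown in this document is 24h. The sketch below therefore assumes a whole number followed by one of the unit suffixes s, m, h, or d. Both that format and the helper name parse_duration are assumptions for illustration, not NNI's parser.

```python
import re

# Assumed unit suffixes; only "24h" actually appears in this document.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration string such as "24h" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```

Under this assumption, parse_duration("24h") returns 86400.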
AlgorithmConfig¶
AlgorithmConfig describes a tuner / assessor / advisor algorithm.
For customized algorithms, there are two ways to describe them:
Register the algorithm to use it like a built-in one (preferred).
Specify the code directory and class name directly.
Field Name | Type | Description |
---|---|---|
name | | Default: None. Name of the built-in or registered algorithm. |
className | | Default: None. Qualified class name of an unregistered customized algorithm. |
codeDirectory | | Default: None. Path to the directory containing the customized algorithm class. |
classArgs | | Keyword arguments passed to the algorithm class's constructor. See the algorithm's documentation for supported values. |
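The two ways of describing an algorithm can be told apart by which fields are set. The sketch below is a hypothetical helper, not part of NNI. It assumes that a config uses either name alone, or className together with codeDirectory, and rejects anything else. The class name and directory in the usage note are made up.

```python
from typing import Optional


def describe_algorithm(name: Optional[str] = None,
                       class_name: Optional[str] = None,
                       code_directory: Optional[str] = None) -> str:
    """Return which of the two description styles a config uses.

    Assumption: exactly one style is used per algorithm config.
    """
    if name is not None and class_name is None and code_directory is None:
        return "registered"  # built-in or registered algorithm, looked up by name
    if name is None and class_name is not None and code_directory is not None:
        return "custom"  # unregistered class located by directory and class name
    raise ValueError("set either name, or both className and codeDirectory")
```

For example, describe_algorithm(name="TPE") returns "registered", while describe_algorithm(class_name="my_tuner.MyTuner", code_directory="./tuners") returns "custom".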
TrainingServiceConfig¶
One of the following: LocalConfig, RemoteConfig, OpenpaiConfig, AmlConfig, DlcConfig, HybridConfig.
For Kubeflow, FrameworkController, and AdaptDL training platforms, it is suggested to use v1 config schema for now.
LocalConfig¶
Detailed usage can be found here.
Field Name | Type | Description |
---|---|---|
platform | | |
useActiveGpu | | Default: If you are using a graphical OS like Windows 10 or Ubuntu desktop, set this field to |
maxTrialNumberPerGpu | | Default: |
gpuIndices | | Limit the GPUs visible to trial processes. If |
RemoteConfig¶
Detailed usage can be found here.
Field Name | Type | Description |
---|---|---|
platform | | |
machineList | | List of training machines. |
reuseMode | | Default: |
RemoteMachineConfig¶
Field Name | Type | Description |
---|---|---|
host | | IP or hostname (domain name) of the machine. |
port | | Default: |
user | | Login user name. |
password | | If not specified, |
sshKeyFile | | Path to |
sshPassphrase | | Passphrase of SSH identity file. |
useActiveGpu | | Default: If your remote machine runs a graphical OS like Ubuntu desktop, set this field to |
maxTrialNumberPerGpu | | Default: |
gpuIndices | | Limit the GPUs visible to trial processes. If |
pythonPath | | Specify a Python environment. This path will be inserted at the front of PATH. Here are some examples: If you are working with Anaconda, there are some differences. On Windows, you also have to add This is useful if preparing steps vary for different machines. |
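The pythonPath description says the path is inserted at the front of PATH. The sketch below shows what that means for command lookup order. It is only an illustration: the helper name prepend_to_path and the directory /opt/venv/bin are hypothetical, and this is not how NNI itself applies the setting.

```python
import os


def prepend_to_path(env: dict, python_path: str) -> dict:
    """Return a copy of env with python_path placed at the front of PATH."""
    updated = dict(env)
    current = updated.get("PATH", "")
    # os.pathsep is ":" on Linux and macOS and ";" on Windows.
    updated["PATH"] = python_path + os.pathsep + current if current else python_path
    return updated
```

With prepend_to_path(dict(os.environ), "/opt/venv/bin"), a python executable in /opt/venv/bin would be found before any other python on PATH.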
OpenpaiConfig¶
Detailed usage can be found here.
Field Name | Type | Description |
---|---|---|
platform | | |
host | | Hostname of OpenPAI service. This may include |
username | | OpenPAI user name. |
token | | OpenPAI user token. This can be found in your OpenPAI user settings page. |
trialCpuNumber | | Specify the CPU number of each trial to be used in OpenPAI container. |
trialMemorySize | | Specify the memory size of each trial to be used in OpenPAI container. Format: |
storageConfigName | | Specify the storage name used in OpenPAI. |
dockerImage | | Default: |
localStorageMountPoint | | Mount point of storage service (typically NFS) on the local machine. |
containerStorageMountPoint | | Mount point of storage service (typically NFS) in docker container. This must be an absolute path. |
reuseMode | | Default: |
openpaiConfig | | Embedded OpenPAI config file. |
openpaiConfigFile | | |
AmlConfig¶
Detailed usage can be found here.
Field Name | Type | Description |
---|---|---|
platform | | |
dockerImage | | Default: |
subscriptionId | | Azure subscription ID. |
resourceGroup | | Azure resource group name. |
workspaceName | | Azure workspace name. |
computeTarget | | AML compute cluster name. |
DlcConfig¶
Detailed usage can be found here.
Field Name | Type | Description |
---|---|---|
platform | | |
type | | Default: |
image | | Name and tag of docker image to run the trials. |
jobType | | Default: |
podCount | | Pod count to run a single training job. |
ecsSpec | | Training server config spec string. |
region | | The region where the PAI-DLC public cluster is located. |
nasDataSourceId | | The NAS data source ID configured on the PAI-DLC side. |
accessKeyId | | The accessKeyId of your cloud account. |
accessKeySecret | | The accessKeySecret of your cloud account. |
localStorageMountPoint | | The mount point of the NAS on the PAI-DSW server. Default: /home/admin/workspace/. |
containerStorageMountPoint | | The mount point of the NAS on the PAI-DLC side. Default: /root/data/. |
HybridConfig¶
Currently only LocalConfig, RemoteConfig, OpenpaiConfig, and AmlConfig are supported. Detailed usage can be found here.