How to Launch an Experiment from Python

..  toctree::
    :hidden:

    Start Usage <python_api_start>
    Connect Usage <python_api_connect>

Overview

Since v2.0, NNI provides a new way to launch experiments. Previously, you had to describe the experiment in a YAML configuration file and then launch it with the nnictl command. Now you can also configure and run experiments directly from a Python file, which is more convenient if you are already comfortable with Python.

Run a New Experiment

After installing nni and preparing the trial code, you can start an experiment from a Python script in the following two steps.

Step 1 - Initialize an experiment instance and configure it

from nni.experiment import Experiment
experiment = Experiment('local')

You now have an Experiment instance. Because training_service='local', this experiment will launch trials on your local machine.

See all training services supported in NNI.

experiment.config.experiment_name = 'MNIST example'
experiment.config.trial_concurrency = 2
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_command = 'python3 mnist.py'
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True

Use the form like experiment.config.foo = 'bar' to configure your experiment.

See all built-in tuners supported in NNI.

See configuration reference for more detailed usage of these fields.
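The attribute-assignment form above also works well when the settings live in a plain dictionary. The following is a minimal sketch, not part of NNI: `apply_settings` is a hypothetical helper that walks dotted keys such as `tuner.name` with `getattr` and `setattr`. It is demonstrated on a stand-in object so it runs without NNI installed; with a real experiment you would pass `experiment.config` instead.

```python
from types import SimpleNamespace


def apply_settings(config, settings):
    """Assign each dotted key in `settings` onto `config` (hypothetical helper)."""
    for dotted_key, value in settings.items():
        *parents, leaf = dotted_key.split('.')
        target = config
        for name in parents:
            # Walk down to the nested object, e.g. config.tuner
            target = getattr(target, name)
        setattr(target, leaf, value)
    return config


# Stand-in for experiment.config so the sketch runs without NNI installed.
demo_config = SimpleNamespace(tuner=SimpleNamespace(name=None))
apply_settings(demo_config, {
    'experiment_name': 'MNIST example',
    'trial_concurrency': 2,
    'tuner.name': 'TPE',
})
print(demo_config.experiment_name, demo_config.trial_concurrency, demo_config.tuner.name)
```

This sketch only handles attributes. Dictionary entries such as `class_args['optimize_mode']` would still need to be set by key.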

Step 2 - Just run

experiment.run(port=8080)

You have now launched an NNI experiment. Open localhost:8080 in your browser to observe it in real time.

Launched this way, the experiment runs in the foreground and exits automatically when it finishes.

Note

To run an experiment interactively, use start() in Step 2. When launching the experiment from a Python script, use run(), because start() is designed for interactive scenarios.

Example

Below is an example for this new launching approach. You can find this code in mnist-tfv2/launch.py.

from pathlib import Path

from nni.experiment import Experiment

search_space = {
    "dropout_rate": { "_type": "uniform", "_value": [0.5, 0.9] },
    "conv_size": { "_type": "choice", "_value": [2, 3, 5, 7] },
    "hidden_size": { "_type": "choice", "_value": [124, 512, 1024] },
    "batch_size": { "_type": "choice", "_value": [16, 32] },
    "learning_rate": { "_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1] }
}

experiment = Experiment('local')
experiment.config.experiment_name = 'MNIST example'
experiment.config.trial_concurrency = 2
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_command = 'python3 mnist.py'
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True

experiment.run(8080)
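Each parameter in the `search_space` dictionary above maps to a `_type` and a `_value`. The sketch below shows one way to sanity-check that shape before launching. `check_search_space` is a hypothetical helper, not an NNI function, and it only covers the two types used in this example (`choice` and `uniform`).

```python
def check_search_space(search_space):
    """Return a list of problems found in an NNI-style search space dict.

    Hypothetical helper: only covers the 'choice' and 'uniform' types
    that appear in the example above.
    """
    problems = []
    for name, spec in search_space.items():
        if not isinstance(spec, dict) or '_type' not in spec or '_value' not in spec:
            problems.append(f"{name}: needs '_type' and '_value'")
            continue
        kind, value = spec['_type'], spec['_value']
        if kind == 'choice':
            if not isinstance(value, list) or not value:
                problems.append(f"{name}: 'choice' needs a non-empty list")
        elif kind == 'uniform':
            if not (isinstance(value, list) and len(value) == 2 and value[0] < value[1]):
                problems.append(f"{name}: 'uniform' needs [low, high] with low < high")
        else:
            problems.append(f"{name}: type {kind!r} is not covered by this sketch")
    return problems


good = {
    "dropout_rate": {"_type": "uniform", "_value": [0.5, 0.9]},
    "batch_size": {"_type": "choice", "_value": [16, 32]},
}
# The bounds are deliberately swapped to trigger a report.
bad = {"learning_rate": {"_type": "uniform", "_value": [0.1, 0.0001]}}

print(check_search_space(good))
print(check_search_space(bad))
```

Running a check like this before `experiment.run()` catches malformed entries early, instead of after the experiment has started.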

Start and Manage a New Experiment

NNI migrates the API in NNI Client to this new launching approach. Launch the experiment with start() instead of run(), and you can then use these APIs in interactive mode.

Please refer to example usage and code file python_api_start.ipynb.

Note

run() polls the experiment status and automatically calls stop() when the experiment finishes. start() only launches a new experiment, so you need to stop it manually by calling stop().
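Because start() leaves the experiment running, it can help to pair it with stop() in a try/finally block, so the experiment is shut down even if the code in between raises. The sketch below relies only on the start(), get_status() and stop() methods listed in the API reference. `with_running_experiment` and the `FakeExperiment` stand-in are hypothetical names, and the 'RUNNING' status is a placeholder value, used so the example runs without launching anything.

```python
def with_running_experiment(experiment, action, port=8080):
    """Start `experiment`, run `action(experiment)`, and always call stop()."""
    experiment.start(port)
    try:
        return action(experiment)
    finally:
        # Unlike run(), start() does not stop the experiment for us.
        experiment.stop()


class FakeExperiment:
    """Stand-in that records calls; replace with nni.experiment.Experiment."""

    def __init__(self):
        self.calls = []

    def start(self, port):
        self.calls.append(('start', port))

    def get_status(self):
        self.calls.append(('get_status',))
        return 'RUNNING'  # placeholder status string

    def stop(self):
        self.calls.append(('stop',))


fake = FakeExperiment()
status = with_running_experiment(fake, lambda exp: exp.get_status())
print(status, fake.calls)
```

In an interactive session you would more likely call start() in one cell and stop() in a later one. The wrapper is mainly useful when the management code is itself a function.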

Connect and Manage an Existing Experiment

If you launch an experiment by nnictl and also want to use these APIs, you can use Experiment.connect() to connect to an existing experiment.

Please refer to example usage and code file python_api_connect.ipynb.

Note

You can use stop() to stop the experiment when connecting to an existing experiment.

Resume/View and Manage a Stopped Experiment

You can use Experiment.resume() and Experiment.view() to resume and view a stopped experiment. These functions behave like nnictl resume and nnictl view.

To manage the experiment afterwards, set wait_completion to False for resume(), or non_blocking to True for view(), and the function will return an Experiment instance. For more parameters, please refer to the API reference.
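The two functions use different parameter names for the non-blocking mode, which is easy to mix up. The sketch below wraps both behind one hypothetical helper, `reopen`, using the resume() and view() signatures listed in the API reference. `FakeExperiment` is a stand-in so the example runs without NNI, and 'abc123' is a placeholder experiment id.

```python
def reopen(experiment_cls, experiment_id, port=8080, read_only=False):
    """Reopen a stopped experiment without blocking and return the result.

    `experiment_cls` is expected to offer resume() and view() with the
    signatures listed in the API reference, e.g. nni.experiment.Experiment.
    """
    if read_only:
        # view() takes non_blocking=True to return instead of blocking.
        return experiment_cls.view(experiment_id, port=port, non_blocking=True)
    # resume() takes wait_completion=False for the same effect.
    return experiment_cls.resume(experiment_id, port=port, wait_completion=False)


class FakeExperiment:
    """Stand-in that reports how it was called, so the sketch runs without NNI."""

    @staticmethod
    def resume(experiment_id, port=8080, wait_completion=True, debug=False):
        return ('resume', experiment_id, port, wait_completion)

    @staticmethod
    def view(experiment_id, port=8080, non_blocking=False):
        return ('view', experiment_id, port, non_blocking)


# 'abc123' is a placeholder experiment id.
resumed = reopen(FakeExperiment, 'abc123')
viewed = reopen(FakeExperiment, 'abc123', port=8081, read_only=True)
print(resumed)
print(viewed)
```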

API Reference

Detailed usage can be found here.

Experiment

class nni.experiment.Experiment(config: nni.experiment.config.experiment_config.ExperimentConfig)
class nni.experiment.Experiment(training_service: Union[str, List[str]])

Create and stop an NNI experiment.

config

Experiment configuration.

port

Web UI port of the experiment, or None if it is not running.

classmethod connect(port: int)

Connect to an existing experiment.

Parameters

port -- The port of the web UI.

export_data()

Return exported information for all trial jobs.

Returns

List of TrialResult.

Return type

list

get_all_experiments_metadata()

Return all experiments metadata as a list.

Returns

The experiments metadata.

Return type

list

get_experiment_metadata(exp_id: str)

Return experiment metadata with specified exp_id as a dict.

Returns

The specified experiment metadata.

Return type

dict

get_experiment_profile()

Return experiment profile as a dict.

Returns

The profile of the experiment.

Return type

dict

get_job_metrics(trial_job_id=None)

Return trial job metrics.

Parameters

trial_job_id (str) -- Trial job id. If this parameter is None, the metrics of all trial jobs are returned.

Returns

Each key is a trialJobId; the corresponding value is a list of TrialMetricData.

Return type

dict

get_job_statistics()

Return trial job statistics information as a dict.

Returns

Job statistics information.

Return type

dict

get_status() -> str

Return experiment status as a str.

Returns

Experiment status.

Return type

str

get_trial_job(trial_job_id: str)

Return a trial job.

Parameters

trial_job_id (str) -- Trial job id.

Returns

A TrialJob instance corresponding to trial_job_id.

Return type

TrialJob

list_trial_jobs()

Return information for all trial jobs as a list.

Returns

List of TrialJob.

Return type

list

static resume(experiment_id: str, port: int = 8080, wait_completion: bool = True, debug: bool = False)

Resume a stopped experiment.

Parameters
  • experiment_id -- The stopped experiment id.

  • port -- The port of the web UI.

  • wait_completion -- If true, run in the foreground. If false, run in the background.

  • debug -- Whether to start in debug mode.

run(port: int = 8080, wait_completion: bool = True, debug: bool = False) -> bool

Run the experiment.

If wait_completion is True, this function blocks until the experiment finishes or fails.

It returns True when the experiment is done, or False when the experiment failed.

If wait_completion is False, this function does not block and returns None immediately.
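When an experiment is launched from a batch job or CI script, the three possible return values of run() can be mapped to a process exit code. The following is a minimal sketch. `exit_code_for` is a hypothetical helper, and treating None as success is an assumption that may not suit every pipeline.

```python
def exit_code_for(run_result):
    """Translate the value returned by Experiment.run() into a process exit code."""
    if run_result is None:
        # wait_completion=False: run() returned before the outcome was known.
        # Treating this as success is an assumption of this sketch.
        return 0
    return 0 if run_result else 1


for result in (True, False, None):
    print(result, '->', exit_code_for(result))
```

A launcher script could then end with `sys.exit(exit_code_for(experiment.run(8080)))`.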

start(port: int = 8080, debug: bool = False, run_mode: nni.experiment.experiment.RunMode = RunMode.Background) -> None

Start the experiment in the background.

This method raises an exception on failure. If it returns, the experiment should have been started successfully.

Parameters
  • port -- The port of the web UI.

  • debug -- Whether to start in debug mode.

stop() -> None

Stop the background experiment.

update_max_experiment_duration(value: str)

Update an experiment's max_experiment_duration.

Parameters

value (str) -- Strings like '1m' for one minute or '2h' for two hours. The suffix may be 's' for seconds, 'm' for minutes, 'h' for hours or 'd' for days.
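To make the duration format concrete, the sketch below converts such strings to seconds. It is an illustration of the format described above and not NNI's own parser. `duration_to_seconds` is a hypothetical helper that accepts only a whole number followed by one of the four suffixes.

```python
import re

# Seconds per unit for the four suffixes described above.
_UNIT_SECONDS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}


def duration_to_seconds(text):
    """Convert a string such as '1m' or '2h' to a number of seconds."""
    match = re.fullmatch(r'(\d+)([smhd])', text.strip())
    if match is None:
        raise ValueError(f"expected a number followed by s, m, h or d, got {text!r}")
    amount, suffix = match.groups()
    return int(amount) * _UNIT_SECONDS[suffix]


print(duration_to_seconds('1m'))  # 60
print(duration_to_seconds('2h'))  # 7200
```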

update_max_trial_number(value: int)

Update an experiment's max_trial_number.

Parameters

value (int) -- New max_trial_number value.

update_search_space(value: dict)

Update the experiment's search_space. TODO: support searchspace file.

Parameters

value (dict) -- New search_space.
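Since update_search_space() takes a complete dictionary, a common pattern is to derive the new search space from the current one rather than editing it in place. The sketch below widens a 'choice' parameter on a copy. `with_extra_choice` is a hypothetical helper, not part of NNI, and the dictionary layout follows the MNIST example earlier on this page.

```python
import copy


def with_extra_choice(search_space, name, new_value):
    """Return a copy of `search_space` with `new_value` added to a 'choice' entry."""
    spec = search_space[name]
    if spec['_type'] != 'choice':
        raise ValueError(f"{name} is not a 'choice' parameter")
    # Deep copy so the original dictionary is left untouched.
    updated = copy.deepcopy(search_space)
    if new_value not in updated[name]['_value']:
        updated[name]['_value'].append(new_value)
    return updated


current = {"batch_size": {"_type": "choice", "_value": [16, 32]}}
wider = with_extra_choice(current, "batch_size", 64)
print(wider)
```

With a running experiment, the result would then be passed on as `experiment.update_search_space(wider)`.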

update_trial_concurrency(value: int)

Update an experiment's trial_concurrency.

Parameters

value (int) -- New trial_concurrency value.

static view(experiment_id: str, port: int = 8080, non_blocking: bool = False)

View a stopped experiment.

Parameters
  • experiment_id -- The stopped experiment id.

  • port -- The port of the web UI.

  • non_blocking -- If false, run in the foreground. If true, run in the background.

Experiment Config

ExperimentConfig.experiment_name: Optional[str] = None
ExperimentConfig.search_space_file: Optional[Union[pathlib.Path, str]] = None
ExperimentConfig.search_space: Any = None
ExperimentConfig.trial_command: Optional[str] = None
ExperimentConfig.trial_code_directory: Union[pathlib.Path, str] = '.'
ExperimentConfig.trial_concurrency: int
ExperimentConfig.trial_gpu_number: Optional[int] = None
ExperimentConfig.max_experiment_duration: Optional[Union[str, int]] = None
ExperimentConfig.max_trial_number: Optional[int] = None
ExperimentConfig.nni_manager_ip: Optional[str] = None
ExperimentConfig.use_annotation: bool = False
ExperimentConfig.debug: bool = False
ExperimentConfig.log_level: Optional[str] = None
ExperimentConfig.experiment_working_directory: Union[pathlib.Path, str] = '~/nni-experiments'
ExperimentConfig.tuner_gpu_indices: Optional[Union[List[int], int, str]] = None
ExperimentConfig.tuner: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None
ExperimentConfig.assessor: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None
ExperimentConfig.advisor: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None
ExperimentConfig.training_service: Union[nni.experiment.config.training_service.TrainingServiceConfig, List[nni.experiment.config.training_service.TrainingServiceConfig]]
ExperimentConfig.shared_storage: Optional[nni.experiment.config.shared_storage.SharedStorageConfig] = None

Algorithm Config

AlgorithmConfig.name: str = None
AlgorithmConfig.class_args: Optional[Dict[str, Any]] = None
CustomAlgorithmConfig.class_name: str = None
CustomAlgorithmConfig.code_directory: Optional[Union[pathlib.Path, str]] = '.'
CustomAlgorithmConfig.class_args: Optional[Dict[str, Any]] = None

Training Service Config

Local Config

LocalConfig.platform: str = 'local'
LocalConfig.use_active_gpu: Optional[bool] = None
LocalConfig.max_trial_number_per_gpu: int = 1
LocalConfig.gpu_indices: Optional[Union[List[int], int, str]] = None

Remote Config

RemoteConfig.platform: str = 'remote'
RemoteConfig.reuse_mode: bool = True
RemoteConfig.machine_list: List[nni.experiment.config.training_services.remote.RemoteMachineConfig]
RemoteMachineConfig.host: str
RemoteMachineConfig.port: int = 22
RemoteMachineConfig.user: str
RemoteMachineConfig.password: Optional[str] = None
RemoteMachineConfig.ssh_key_file: Optional[Union[pathlib.Path, str]] = '~/.ssh/id_rsa'
RemoteMachineConfig.ssh_passphrase: Optional[str] = None
RemoteMachineConfig.use_active_gpu: bool = False
RemoteMachineConfig.max_trial_number_per_gpu: int = 1
RemoteMachineConfig.gpu_indices: Optional[Union[List[int], int, str]] = None
RemoteMachineConfig.python_path: Optional[str] = None

Openpai Config

OpenpaiConfig.platform: str = 'openpai'
OpenpaiConfig.host: str
OpenpaiConfig.username: str
OpenpaiConfig.token: str
OpenpaiConfig.trial_cpu_number: int
OpenpaiConfig.trial_memory_size: Union[str, int]
OpenpaiConfig.storage_config_name: str
OpenpaiConfig.docker_image: str = 'msranni/nni:latest'
OpenpaiConfig.local_storage_mount_point: Union[pathlib.Path, str]
OpenpaiConfig.container_storage_mount_point: str
OpenpaiConfig.reuse_mode: bool = True
OpenpaiConfig.openpai_config: Optional[Dict] = None
OpenpaiConfig.openpai_config_file: Optional[Union[pathlib.Path, str]] = None

AML Config

AmlConfig.platform: str = 'aml'
AmlConfig.subscription_id: str
AmlConfig.resource_group: str
AmlConfig.workspace_name: str
AmlConfig.compute_target: str
AmlConfig.docker_image: str = 'msranni/nni:latest'
AmlConfig.max_trial_number_per_gpu: int = 1

Shared Storage Config

Nfs Config

NfsConfig.storage_type: str = 'NFS'
NfsConfig.nfs_server: str = None
NfsConfig.exported_directory: str = None

Azure Blob Config

AzureBlobConfig.storage_type: str = 'AzureBlob'
AzureBlobConfig.storage_account_name: str = None
AzureBlobConfig.storage_account_key: Optional[str] = None
AzureBlobConfig.container_name: str = None