How to Launch an Experiment from Python¶

..  toctree::
    :hidden:

    Start Usage <python_api_start>
    Connect Usage <python_api_connect>

Overview¶

Since v2.0, NNI has provided a new way to launch experiments. Previously, you had to configure the experiment in a YAML configuration file and then launch it with the nnictl command. Now you can also configure and run experiments directly from a Python file. If you are familiar with Python programming, this is likely to be more convenient.

Run a New Experiment¶

After installing nni and preparing the trial code, you can start the experiment from a Python script in the following two steps.

Step 1 - Initialize an experiment instance and configure it¶

from nni.experiment import Experiment
experiment = Experiment('local')

You now have an Experiment instance. Because training_service is 'local', this experiment will launch trials on your local machine.

See all training services supported in NNI.

experiment.config.experiment_name = 'MNIST example'
experiment.config.trial_concurrency = 2
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_command = 'python3 mnist.py'
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True

Configure the experiment with attribute assignments of the form experiment.config.foo = 'bar'.
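Because each setting is a plain attribute assignment, several settings can be applied from a dictionary. The sketch below is illustrative only: apply_settings is a hypothetical helper, not part of NNI, and a SimpleNamespace stands in for experiment.config so that the snippet runs without NNI installed.

```python
from types import SimpleNamespace


def apply_settings(config, settings):
    """Assign each key in `settings` as an attribute on `config`.

    Hypothetical helper (not part of NNI): equivalent to writing
    `config.<key> = <value>` once per entry.
    """
    for name, value in settings.items():
        setattr(config, name, value)
    return config


# Stand-in for experiment.config so the sketch runs without NNI.
demo_config = SimpleNamespace()
apply_settings(demo_config, {
    "experiment_name": "MNIST example",
    "trial_concurrency": 2,
    "max_trial_number": 10,
})
```

Nested fields such as tuner.name are not handled by this helper and would still be assigned directly.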

See all builtin tuners supported in NNI.

See configuration reference for more detailed usage of these fields.

Step 2 - Just run¶

experiment.run(port=8080)

You have now launched an NNI experiment. Open localhost:8080 in your browser to observe the experiment in real time.

Launched this way, the experiment runs in the foreground and exits automatically when it finishes.

Note

To run an experiment interactively, use start() in Step 2. If you launch the experiment from a Python script, use run(), because start() is designed for interactive scenarios.

Example¶

Below is a complete example of this launching approach. You can find this code in mnist-tfv2/launch.py.

from pathlib import Path

from nni.experiment import Experiment

search_space = {
    "dropout_rate": { "_type": "uniform", "_value": [0.5, 0.9] },
    "conv_size": { "_type": "choice", "_value": [2, 3, 5, 7] },
    "hidden_size": { "_type": "choice", "_value": [124, 512, 1024] },
    "batch_size": { "_type": "choice", "_value": [16, 32] },
    "learning_rate": { "_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1] }
}

experiment = Experiment('local')
experiment.config.experiment_name = 'MNIST example'
experiment.config.trial_concurrency = 2
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_command = 'python3 mnist.py'
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True

experiment.run(8080)
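To make the search space format more concrete: each entry names a hyperparameter, _type says how a value is drawn, and _value gives the range or the candidates. The following minimal sketch draws one configuration at random. It is an illustration of the format only, not how NNI's tuners (such as TPE) select parameters, and sample_search_space is a hypothetical helper.

```python
import random


def sample_search_space(search_space, rng):
    """Draw one configuration from a search space dict.

    Only the two `_type` values used in the example are handled.
    This is an illustration, not how NNI's tuners choose parameters.
    """
    sample = {}
    for name, spec in search_space.items():
        kind, value = spec["_type"], spec["_value"]
        if kind == "uniform":
            low, high = value
            sample[name] = rng.uniform(low, high)
        elif kind == "choice":
            sample[name] = rng.choice(value)
        else:
            raise ValueError(f"unsupported _type: {kind}")
    return sample


search_space = {
    "dropout_rate": {"_type": "uniform", "_value": [0.5, 0.9]},
    "batch_size": {"_type": "choice", "_value": [16, 32]},
}
params = sample_search_space(search_space, random.Random(0))
```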

Start and Manage a New Experiment¶

NNI has migrated the NNI Client API to this new launching approach. Launch the experiment with start() instead of run(), and you can then use these APIs in interactive mode.

Please refer to example usage and code file python_api_start.ipynb.

Note

run() polls the experiment status and automatically calls stop() when the experiment finishes. start() only launches a new experiment, so you need to stop the experiment manually by calling stop().
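Since start() leaves the experiment running until stop() is called, one way to avoid a forgotten stop() is a small context manager. This is a sketch under stated assumptions: started is a hypothetical wrapper that relies only on the start() and stop() methods listed in the API reference, and _FakeExperiment is a stand-in so the snippet runs without NNI installed.

```python
from contextlib import contextmanager


@contextmanager
def started(experiment, port=8080):
    """Start `experiment` in the background and always stop it on exit.

    Hypothetical convenience wrapper; it only relies on the start()
    and stop() methods listed in the API reference.
    """
    experiment.start(port)
    try:
        yield experiment
    finally:
        experiment.stop()


class _FakeExperiment:
    """Stand-in that records calls, so the sketch runs without NNI."""

    def __init__(self):
        self.calls = []

    def start(self, port):
        self.calls.append(("start", port))

    def stop(self):
        self.calls.append(("stop",))


fake = _FakeExperiment()
with started(fake, port=8081) as exp:
    exp.calls.append(("work",))
```

With a real, configured Experiment in place of the stand-in, stop() would be called even if the code inside the with block raises.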

Connect and Manage an Existing Experiment¶

If you launch an experiment by nnictl and also want to use these APIs, you can use Experiment.connect() to connect to an existing experiment.

Please refer to example usage and code file python_api_connect.ipynb.

Note

After connecting to an existing experiment, you can call stop() to stop it.
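Once connected, the read-only methods in the API reference can be combined into a quick status report. The sketch below assumes only the methods documented on this page; summarize and connect_and_summarize are hypothetical helpers, and the stand-in class returns placeholder values (the "RUNNING" string and the statistics payload are not taken from NNI).

```python
def summarize(experiment):
    """Collect a few read-only facts from a running experiment."""
    return {
        "status": experiment.get_status(),
        "trial_count": len(experiment.list_trial_jobs()),
        "statistics": experiment.get_job_statistics(),
    }


def connect_and_summarize(port):
    """Attach to the experiment whose web UI listens on `port`."""
    # Imported lazily so that the rest of the sketch runs without NNI.
    from nni.experiment import Experiment

    return summarize(Experiment.connect(port))


class _FakeExperiment:
    """Stand-in with canned answers, used only to exercise summarize()."""

    def get_status(self):
        return "RUNNING"

    def list_trial_jobs(self):
        return ["trial-a", "trial-b"]

    def get_job_statistics(self):
        return {"placeholder": True}


report = summarize(_FakeExperiment())
```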

Resume/View and Manage a Stopped Experiment¶

You can use Experiment.resume() and Experiment.view() to resume and view a stopped experiment. These functions behave like nnictl resume and nnictl view.

If you want to manage the experiment, set wait_completion to False (for view(), set non_blocking to True) and the function will return an Experiment instance. For more parameters, please refer to the API reference.
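The two entry points take differently named flags, which the following sketch wraps in one function. It assumes the signatures shown in the API reference on this page; reopen is a hypothetical helper, the class is passed in as an argument so the snippet can run without NNI, and "abc123" is a placeholder experiment id.

```python
def reopen(experiment_cls, experiment_id, port=8080, read_only=False):
    """Reopen a stopped experiment and return a handle for management.

    `experiment_cls` is expected to be nni.experiment.Experiment; it is
    passed in so that the sketch can be exercised without NNI installed.
    """
    if read_only:
        # view() takes non_blocking rather than wait_completion.
        return experiment_cls.view(experiment_id, port=port, non_blocking=True)
    return experiment_cls.resume(experiment_id, port=port, wait_completion=False)


class _FakeExperiment:
    """Stand-in that echoes its arguments instead of launching anything."""

    @staticmethod
    def resume(experiment_id, port=8080, wait_completion=True, debug=False):
        return ("resume", experiment_id, port, wait_completion)

    @staticmethod
    def view(experiment_id, port=8080, non_blocking=False):
        return ("view", experiment_id, port, non_blocking)


resumed = reopen(_FakeExperiment, "abc123", port=8082)
viewed = reopen(_FakeExperiment, "abc123", port=8082, read_only=True)
```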

API Reference¶

Detailed usage can be found here.

Experiment
Experiment Config
Algorithm Config
Training Service Config
    Local Config
    Remote Config
    Openpai Config
    AML Config
Shared Storage Config

Experiment¶

class nni.experiment.Experiment(config: nni.experiment.config.experiment_config.ExperimentConfig)[source]¶

class nni.experiment.Experiment(training_service: Union[str, List[str]])

Create and stop an NNI experiment.

config¶: Experiment configuration.

port¶: Web UI port of the experiment, or None if it is not running.

classmethod connect(port: int)[source]¶

Connect to an existing experiment.

Parameters: port -- The port of the web UI.

export_data()[source]¶

Return exported information for all trial jobs.

Returns: List of TrialResult.
Return type: list

get_all_experiments_metadata()[source]¶

Return the metadata of all experiments as a list.

Returns: The experiments metadata.
Return type: list

get_experiment_metadata(exp_id: str)[source]¶

Return the metadata of the experiment with the specified exp_id as a dict.

Returns: The specified experiment metadata.
Return type: dict

get_experiment_profile()[source]¶

Return the experiment profile as a dict.

Returns: The profile of the experiment.
Return type: dict

get_job_metrics(trial_job_id=None)[source]¶

Return trial job metrics.

Parameters: trial_job_id (str) -- Trial job id. If this parameter is None, the metrics of all trial jobs are returned.
Returns: Each key is a trialJobId; the corresponding value is a list of TrialMetricData.
Return type: dict
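The returned mapping can be reduced to a per-trial count of reported metrics, as in the sketch below. It relies only on the documented return shape (a dict from trial job id to a list); count_metrics is a hypothetical helper, and the stand-in class uses placeholder strings instead of real TrialMetricData objects.

```python
def count_metrics(experiment, trial_job_id=None):
    """Return {trial job id: number of reported metrics}."""
    metrics = experiment.get_job_metrics(trial_job_id)
    return {job_id: len(records) for job_id, records in metrics.items()}


class _FakeExperiment:
    """Stand-in whose metrics are placeholder strings, not TrialMetricData."""

    def get_job_metrics(self, trial_job_id=None):
        data = {"trial-a": ["m1", "m2", "m3"], "trial-b": ["m1"]}
        if trial_job_id is None:
            return data
        return {trial_job_id: data[trial_job_id]}


all_counts = count_metrics(_FakeExperiment())
one_count = count_metrics(_FakeExperiment(), "trial-b")
```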

get_job_statistics()[source]¶

Return trial job statistics information as a dict.

Returns: Job statistics information.
Return type: dict

get_status() → str[source]¶

Return experiment status as a str.

Returns: Experiment status.
Return type: str

get_trial_job(trial_job_id: str)[source]¶

Return a trial job.

Parameters: trial_job_id (str) -- Trial job id.
Returns: A TrialJob instance corresponding to trial_job_id.
Return type: TrialJob

list_trial_jobs()[source]¶

Return information for all trial jobs as a list.

Returns: List of TrialJob.
Return type: list

static resume(experiment_id: str, port: int = 8080, wait_completion: bool = True, debug: bool = False)[source]¶

Resume a stopped experiment.

Parameters

experiment_id -- The stopped experiment id.
port -- The port of the web UI.
wait_completion -- If true, run in the foreground. If false, run in the background.
debug -- Whether to start in debug mode.

run(port: int = 8080, wait_completion: bool = True, debug: bool = False) → bool[source]¶

Run the experiment.

If wait_completion is True, this function blocks until the experiment finishes or fails. It returns True when the experiment completes and False when the experiment fails.

If wait_completion is False, this function does not block and returns None immediately.
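The blocking behaviour can be approximated by polling get_status(). This is a minimal sketch, not NNI's implementation: the status strings "DONE", "STOPPED" and "ERROR" and the 10-second default interval are assumptions that do not come from this page, and wait_for_completion is a hypothetical helper. The sleep function is injectable so that the example runs instantly against a stand-in.

```python
import time


def wait_for_completion(experiment, poll_seconds=10.0, sleep=time.sleep):
    """Block until the experiment reaches a terminal status.

    Returns True for a finished experiment and False for a failed one.
    The status strings are assumptions made for this sketch.
    """
    while True:
        status = experiment.get_status()
        if status in ("DONE", "STOPPED"):
            return True
        if status == "ERROR":
            return False
        sleep(poll_seconds)


class _FakeExperiment:
    """Stand-in that replays a fixed sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def get_status(self):
        return next(self._statuses)


naps = []  # records each requested sleep instead of actually sleeping
finished = wait_for_completion(
    _FakeExperiment(["RUNNING", "RUNNING", "DONE"]),
    poll_seconds=0.0,
    sleep=naps.append,
)
failed = wait_for_completion(_FakeExperiment(["ERROR"]), sleep=naps.append)
```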

start(port: int = 8080, debug: bool = False, run_mode: nni.experiment.experiment.RunMode = RunMode.Background) → None[source]¶

Start the experiment in the background.

This method raises an exception on failure. If it returns, the experiment has started successfully.

Parameters

port -- The port of the web UI.
debug -- Whether to start in debug mode.

stop() → None[source]¶: Stop the background experiment.

update_max_experiment_duration(value: str)[source]¶

Update an experiment's max_experiment_duration.

Parameters: value (str) -- A string such as '1m' for one minute or '2h' for two hours. The suffix may be 's' for seconds, 'm' for minutes, 'h' for hours or 'd' for days.
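The duration format can be checked locally before it is sent to the experiment. The sketch below converts such strings to seconds using only the four suffixes listed above. duration_to_seconds is a hypothetical helper, not an NNI function, and it accepts only whole numbers because this page does not say whether fractional values are allowed.

```python
_SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(value):
    """Convert strings such as '1m' or '2h' to a number of seconds."""
    number, suffix = value[:-1], value[-1:]
    if suffix not in _SECONDS_PER_UNIT or not number.isdigit():
        raise ValueError(f"expected <integer><s|m|h|d>, got {value!r}")
    return int(number) * _SECONDS_PER_UNIT[suffix]


one_minute = duration_to_seconds("1m")
two_hours = duration_to_seconds("2h")
one_day = duration_to_seconds("1d")
```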

update_max_trial_number(value: int)[source]¶

Update an experiment's max_trial_number.

Parameters: value (int) -- New max_trial_number value.

update_search_space(value: dict)[source]¶

Update the experiment's search_space. TODO: support search space file.

Parameters: value (dict) -- New search_space.

update_trial_concurrency(value: int)[source]¶

Update an experiment's trial_concurrency.

Parameters: value (int) -- New trial_concurrency value.
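The update methods can be combined to enlarge a running experiment, for example after start() in an interactive session. A minimal sketch, assuming only the two update methods documented above; scale_up is a hypothetical helper and _FakeExperiment is a stand-in that records the values it receives.

```python
def scale_up(experiment, max_trial_number, trial_concurrency):
    """Raise the trial budget and the parallelism of a live experiment."""
    experiment.update_max_trial_number(max_trial_number)
    experiment.update_trial_concurrency(trial_concurrency)


class _FakeExperiment:
    """Stand-in that stores the updated values, so the sketch runs without NNI."""

    def __init__(self):
        self.settings = {}

    def update_max_trial_number(self, value):
        self.settings["max_trial_number"] = value

    def update_trial_concurrency(self, value):
        self.settings["trial_concurrency"] = value


fake = _FakeExperiment()
scale_up(fake, max_trial_number=20, trial_concurrency=4)
```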

static view(experiment_id: str, port: int = 8080, non_blocking: bool = False)[source]¶

View a stopped experiment.

Parameters

experiment_id -- The stopped experiment id.
port -- The port of the web UI.
non_blocking -- If false, run in the foreground. If true, run in the background.

Experiment Config¶

ExperimentConfig.experiment_name: Optional[str] = None¶

ExperimentConfig.search_space_file: Optional[Union[pathlib.Path, str]] = None¶

ExperimentConfig.search_space: Any = None¶

ExperimentConfig.trial_command: Optional[str] = None¶

ExperimentConfig.trial_code_directory: Union[pathlib.Path, str] = '.'¶

ExperimentConfig.trial_concurrency: int¶

ExperimentConfig.trial_gpu_number: Optional[int] = None¶

ExperimentConfig.max_experiment_duration: Optional[Union[str, int]] = None¶

ExperimentConfig.max_trial_number: Optional[int] = None¶

ExperimentConfig.nni_manager_ip: Optional[str] = None¶

ExperimentConfig.use_annotation: bool = False¶

ExperimentConfig.debug: bool = False¶

ExperimentConfig.log_level: Optional[str] = None¶

ExperimentConfig.experiment_working_directory: Union[pathlib.Path, str] = '~/nni-experiments'¶

ExperimentConfig.tuner_gpu_indices: Optional[Union[List[int], int, str]] = None¶

ExperimentConfig.tuner: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None¶

ExperimentConfig.assessor: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None¶

ExperimentConfig.advisor: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None¶

ExperimentConfig.training_service: Union[nni.experiment.config.training_service.TrainingServiceConfig, List[nni.experiment.config.training_service.TrainingServiceConfig]]¶

ExperimentConfig.shared_storage: Optional[nni.experiment.config.shared_storage.SharedStorageConfig] = None¶

Algorithm Config¶

AlgorithmConfig.name: str = None¶

AlgorithmConfig.class_args: Optional[Dict[str, Any]] = None¶

CustomAlgorithmConfig.class_name: str = None¶

CustomAlgorithmConfig.code_directory: Optional[Union[pathlib.Path, str]] = '.'¶

CustomAlgorithmConfig.class_args: Optional[Dict[str, Any]] = None¶

Training Service Config¶

Local Config¶

LocalConfig.platform: str = 'local'¶

LocalConfig.use_active_gpu: Optional[bool] = None¶

LocalConfig.max_trial_number_per_gpu: int = 1¶

LocalConfig.gpu_indices: Optional[Union[List[int], int, str]] = None¶

Remote Config¶

RemoteConfig.platform: str = 'remote'¶

RemoteConfig.reuse_mode: bool = True¶

RemoteConfig.machine_list: List[nni.experiment.config.training_services.remote.RemoteMachineConfig]¶

RemoteMachineConfig.host: str¶

RemoteMachineConfig.port: int = 22¶

RemoteMachineConfig.user: str¶

RemoteMachineConfig.password: Optional[str] = None¶

RemoteMachineConfig.ssh_key_file: Optional[Union[pathlib.Path, str]] = '~/.ssh/id_rsa'¶

RemoteMachineConfig.ssh_passphrase: Optional[str] = None¶

RemoteMachineConfig.use_active_gpu: bool = False¶

RemoteMachineConfig.max_trial_number_per_gpu: int = 1¶

RemoteMachineConfig.gpu_indices: Optional[Union[List[int], int, str]] = None¶

RemoteMachineConfig.python_path: Optional[str] = None¶

Openpai Config¶

OpenpaiConfig.platform: str = 'openpai'¶

OpenpaiConfig.host: str¶

OpenpaiConfig.username: str¶

OpenpaiConfig.token: str¶

OpenpaiConfig.trial_cpu_number: int¶

OpenpaiConfig.trial_memory_size: Union[str, int]¶

OpenpaiConfig.storage_config_name: str¶

OpenpaiConfig.docker_image: str = 'msranni/nni:latest'¶

OpenpaiConfig.local_storage_mount_point: Union[pathlib.Path, str]¶

OpenpaiConfig.container_storage_mount_point: str¶

OpenpaiConfig.reuse_mode: bool = True¶

OpenpaiConfig.openpai_config: Optional[Dict] = None¶

OpenpaiConfig.openpai_config_file: Optional[Union[pathlib.Path, str]] = None¶

AML Config¶

AmlConfig.platform: str = 'aml'¶

AmlConfig.subscription_id: str¶

AmlConfig.resource_group: str¶

AmlConfig.workspace_name: str¶

AmlConfig.compute_target: str¶

AmlConfig.docker_image: str = 'msranni/nni:latest'¶

AmlConfig.max_trial_number_per_gpu: int = 1¶

Shared Storage Config¶

Nfs Config¶

NfsConfig.storage_type: str = 'NFS'¶

NfsConfig.nfs_server: str = None¶

NfsConfig.exported_directory: str = None¶

Azure Blob Config¶

AzureBlobConfig.storage_type: str = 'AzureBlob'¶

AzureBlobConfig.storage_account_name: str = None¶

AzureBlobConfig.storage_account_key: Optional[str] = None¶

AzureBlobConfig.container_name: str = None¶