How to Launch an Experiment from Python
Overview
Since v2.0, NNI provides a new way to launch experiments. Previously, you had to configure the experiment in a YAML configuration file and then launch it with the nnictl command. Now you can also configure and run experiments directly in a Python file. If you are familiar with Python programming, this will likely be more convenient.
Run a New Experiment
After installing nni and preparing the trial code, you can start the experiment from a Python script in the following two steps.
Step 1 - Initialize an experiment instance and configure it
from nni.experiment import Experiment
experiment = Experiment('local')
Now you have an Experiment instance, and this experiment will launch trials on your local machine because training_service='local'.
See all training services supported in NNI.
experiment.config.experiment_name = 'MNIST example'
experiment.config.trial_concurrency = 2
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_command = 'python3 mnist.py'
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True
Use assignments of the form experiment.config.foo = 'bar' to configure your experiment.
See all builtin tuners supported in NNI.
See the configuration reference for more detailed usage of these fields.
Step 2 - Just run
experiment.run(port=8080)
Now you have successfully launched an NNI experiment. You can open localhost:8080 in your browser to observe the experiment in real time.
Launched this way, the experiment runs in the foreground and exits automatically when it finishes.
Note
If you want to run an experiment interactively, use start() in Step 2. If you launch the experiment from a Python script, use run(), because start() is designed for interactive scenarios.
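As a minimal sketch of the script-style launch described above, the helper below turns the result of run() into a process exit code. It relies on the API reference on this page, which states that run() returns True when the experiment is done and False when it failed. The name run_and_report is hypothetical and not part of NNI.

```python
import sys


def run_and_report(experiment, port=8080):
    """Run an experiment in the foreground and return a shell-style exit code.

    `experiment` is expected to be an already configured
    nni.experiment.Experiment. `run_and_report` is a hypothetical helper.
    """
    # run() blocks until the experiment finishes or fails.
    succeeded = experiment.run(port=port)
    return 0 if succeeded else 1


# Typical use at the bottom of a launch script (commented out here because
# it needs a configured Experiment instance named `experiment`):
# if __name__ == '__main__':
#     sys.exit(run_and_report(experiment))
```

Returning the code instead of calling sys.exit() inside the helper keeps it easy to reuse from other scripts.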
Example
Below is an example of this new launching approach. You can find this code in mnist-tfv2/launch.py.
from pathlib import Path
from nni.experiment import Experiment
search_space = {
"dropout_rate": { "_type": "uniform", "_value": [0.5, 0.9] },
"conv_size": { "_type": "choice", "_value": [2, 3, 5, 7] },
"hidden_size": { "_type": "choice", "_value": [124, 512, 1024] },
"batch_size": { "_type": "choice", "_value": [16, 32] },
"learning_rate": { "_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1] }
}
experiment = Experiment('local')
experiment.config.experiment_name = 'MNIST example'
experiment.config.trial_concurrency = 2
experiment.config.max_trial_number = 10
experiment.config.search_space = search_space
experiment.config.trial_command = 'python3 mnist.py'
experiment.config.trial_code_directory = Path(__file__).parent
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.training_service.use_active_gpu = True
experiment.run(8080)
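To illustrate what the search space in this example describes, here is a rough sketch that draws one hyperparameter set from it. It assumes the usual meaning of the two types used above: choice picks one of the listed values, and uniform draws a float between the two bounds. The helper sample_configuration is hypothetical and uses plain random sampling; it is not NNI's sampler and does not reproduce what the TPE tuner does.

```python
import random

search_space = {
    "dropout_rate": {"_type": "uniform", "_value": [0.5, 0.9]},
    "conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},
    "hidden_size": {"_type": "choice", "_value": [124, 512, 1024]},
    "batch_size": {"_type": "choice", "_value": [16, 32]},
    "learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]},
}


def sample_configuration(space, rng=random):
    """Draw one hyperparameter set from an NNI-style search space.

    Only the two types used in the example are handled. This is an
    illustrative helper, not part of NNI.
    """
    config = {}
    for name, spec in space.items():
        kind, value = spec["_type"], spec["_value"]
        if kind == "choice":
            # Pick one of the listed candidates.
            config[name] = rng.choice(value)
        elif kind == "uniform":
            # Draw a float between the lower and upper bound.
            low, high = value
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError(f"unsupported _type: {kind!r}")
    return config


# Seeded generator so repeated runs print the same configuration.
print(sample_configuration(search_space, random.Random(0)))
```

Each trial launched by the experiment receives one such set of values, chosen by the configured tuner rather than at random.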
Start and Manage a New Experiment
NNI has migrated the API of NNI Client to this new launching approach. Launch the experiment with start() instead of run(), and you can then use these APIs in interactive mode.
Please refer to the example usage and code file python_api_start.ipynb.
Note
run() polls the experiment status and automatically calls stop() when the experiment finishes. start() only launches a new experiment, so you need to stop the experiment manually by calling stop().
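Because start() leaves stopping to the caller, one way to make sure stop() is always called is to wrap the management code in try/finally. The sketch below assumes only the start(), stop() and get_job_statistics() methods mentioned on this page. The helper with_started_experiment and its manage callback are hypothetical names, not part of NNI.

```python
def with_started_experiment(experiment, manage, port=8080):
    """Start an experiment in the background, manage it, then stop it.

    `experiment` is expected to be a configured nni.experiment.Experiment.
    `manage` is any callable that receives the running experiment.
    """
    experiment.start(port)
    try:
        # The experiment runs in the background here, so the management
        # methods from the API reference can be called on it.
        return manage(experiment)
    finally:
        # Runs even if `manage` raises, so the experiment is not left running.
        experiment.stop()


# Example call, commented out because it needs a configured Experiment:
# stats = with_started_experiment(experiment, lambda exp: exp.get_job_statistics())
```

In a notebook session you would normally call start() and stop() in separate cells; the wrapper is mainly useful when the interactive steps are later moved into a script.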
Connect and Manage an Existing Experiment
If you launched an experiment with nnictl and also want to use these APIs, you can use Experiment.connect() to connect to the existing experiment.
Please refer to the example usage and code file python_api_connect.ipynb.
Note
You can use stop() to stop the experiment when connected to an existing experiment.
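The following sketch shows how connecting to a running experiment might look. It assumes that Experiment.connect(port) returns an Experiment instance on which get_experiment_profile() and list_trial_jobs() can be called. The helper inspect_running_experiment and its connect parameter are hypothetical; the parameter exists only so the sketch can be exercised without a running experiment.

```python
def inspect_running_experiment(port, connect=None):
    """Connect to an already running experiment and return its profile and trials.

    By default this uses nni.experiment.Experiment.connect. A different
    callable can be passed as `connect`, for example a stub in tests.
    """
    if connect is None:
        # Imported lazily so the function can be defined without NNI installed.
        from nni.experiment import Experiment
        connect = Experiment.connect
    experiment = connect(port)
    return experiment.get_experiment_profile(), experiment.list_trial_jobs()


# Example call, commented out because it needs an experiment on port 8080:
# profile, trials = inspect_running_experiment(8080)
```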
Resume/View and Manage a Stopped Experiment
You can use Experiment.resume() and Experiment.view() to resume and view a stopped experiment. These functions behave like nnictl resume and nnictl view.
If you want to manage the experiment, set wait_completion to False and the functions will return an Experiment instance. For more parameters, please refer to the API reference.
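As a sketch of the management case, the helper below resumes a stopped experiment in the background and raises its trial limit. It uses the resume() signature and update_max_trial_number() from the API reference on this page. The helper name resume_with_larger_budget and its resume parameter are hypothetical; the parameter exists only so the sketch can be exercised without NNI.

```python
def resume_with_larger_budget(experiment_id, max_trial_number, port=8080, resume=None):
    """Resume a stopped experiment without blocking and update its trial limit.

    By default this uses nni.experiment.Experiment.resume. A different
    callable can be passed as `resume`, for example a stub in tests.
    """
    if resume is None:
        # Imported lazily so the function can be defined without NNI installed.
        from nni.experiment import Experiment
        resume = Experiment.resume
    # wait_completion=False makes resume() return an Experiment instance
    # instead of blocking until the experiment finishes.
    experiment = resume(experiment_id, port=port, wait_completion=False)
    experiment.update_max_trial_number(max_trial_number)
    return experiment
```

The returned instance still has to be stopped with stop() once you are done with it.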
API Reference
Detailed usage can be found here.
Experiment
- class nni.experiment.Experiment(config: nni.experiment.config.experiment_config.ExperimentConfig)
- class nni.experiment.Experiment(training_service: Union[str, List[str]])
Create and stop an NNI experiment.
- config
Experiment configuration.
- port
Web UI port of the experiment, or None if it is not running.
- classmethod connect(port: int)
Connect to an existing experiment.
- Parameters
port -- The port of web UI.
- export_data()
Return exported information for all trial jobs.
- Returns
List of TrialResult.
- Return type
list
- get_all_experiments_metadata()
Return all experiments metadata as a list.
- Returns
The experiments metadata.
- Return type
list
- get_experiment_metadata(exp_id: str)
Return experiment metadata with specified exp_id as a dict.
- Returns
The specified experiment metadata.
- Return type
dict
- get_experiment_profile()
Return experiment profile as a dict.
- Returns
The profile of the experiment.
- Return type
dict
- get_job_metrics(trial_job_id=None)
Return trial job metrics.
- Parameters
trial_job_id (str) -- Trial job id. If this parameter is None, metrics for all trial jobs are returned.
- Returns
Each key is a trialJobId, the corresponding value is a list of TrialMetricData.
- Return type
dict
- get_job_statistics()
Return trial job statistics information as a dict.
- Returns
Job statistics information.
- Return type
dict
- get_trial_job(trial_job_id: str)
Return a trial job.
- Parameters
trial_job_id (str) -- Trial job id.
- Returns
A TrialJob instance corresponding to trial_job_id.
- Return type
TrialJob
- list_trial_jobs()
Return information for all trial jobs as a list.
- Returns
List of TrialJob.
- Return type
list
- static resume(experiment_id: str, port: int = 8080, wait_completion: bool = True, debug: bool = False)
Resume a stopped experiment.
- Parameters
experiment_id -- The stopped experiment id.
port -- The port of web UI.
wait_completion -- If true, run in the foreground. If false, run in the background.
debug -- Whether to start in debug mode.
- run(port: int = 8080, wait_completion: bool = True, debug: bool = False) -> bool
Run the experiment.
If wait_completion is True, this function blocks until the experiment finishes or fails.
It returns True when the experiment is done and False when the experiment failed.
If wait_completion is False, this function does not block and returns None immediately.
- start(port: int = 8080, debug: bool = False, run_mode: nni.experiment.experiment.RunMode = RunMode.Background) -> None
Start the experiment in the background.
This method raises an exception on failure. If it returns, the experiment should have started successfully.
- Parameters
port -- The port of web UI.
debug -- Whether to start in debug mode.
- update_max_experiment_duration(value: str)
Update an experiment's max_experiment_duration.
- Parameters
value (str) -- Strings like '1m' for one minute or '2h' for two hours. SUFFIX may be 's' for seconds, 'm' for minutes, 'h' for hours or 'd' for days.
- update_max_trial_number(value: int)
Update an experiment's max_trial_number.
- Parameters
value (int) -- New max_trial_number value.
- update_search_space(value: dict)
Update the experiment's search_space. TODO: support searchspace file.
- Parameters
value (dict) -- New search_space.
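The duration strings accepted by update_max_experiment_duration() follow the number-plus-suffix form described above. As an illustration of that format only, the sketch below converts such a string to seconds. The function duration_to_seconds is hypothetical and not part of NNI, and it assumes a whole number followed by exactly one of the four suffixes; NNI's own parsing may accept more forms.

```python
import re

# Seconds per unit for the four suffixes listed in the parameter description.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(value):
    """Convert a string such as '1m' or '2h' to a number of seconds.

    Illustrative helper: accepts only a non-negative integer followed by
    one of the suffixes s, m, h or d.
    """
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"not a duration like '1m' or '2h': {value!r}")
    number, suffix = match.groups()
    return int(number) * _UNIT_SECONDS[suffix]


print(duration_to_seconds("1m"))  # 60
print(duration_to_seconds("2h"))  # 7200
```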
Experiment Config
- ExperimentConfig.experiment_name: Optional[str] = None
- ExperimentConfig.search_space_file: Optional[Union[pathlib.Path, str]] = None
- ExperimentConfig.search_space: Any = None
- ExperimentConfig.trial_command: Optional[str] = None
- ExperimentConfig.trial_code_directory: Union[pathlib.Path, str] = '.'
- ExperimentConfig.trial_concurrency: int
- ExperimentConfig.trial_gpu_number: Optional[int] = None
- ExperimentConfig.max_experiment_duration: Optional[Union[str, int]] = None
- ExperimentConfig.max_trial_number: Optional[int] = None
- ExperimentConfig.nni_manager_ip: Optional[str] = None
- ExperimentConfig.use_annotation: bool = False
- ExperimentConfig.debug: bool = False
- ExperimentConfig.log_level: Optional[str] = None
- ExperimentConfig.experiment_working_directory: Union[pathlib.Path, str] = '~/nni-experiments'
- ExperimentConfig.tuner_gpu_indices: Optional[Union[List[int], int, str]] = None
- ExperimentConfig.tuner: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None
- ExperimentConfig.assessor: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None
- ExperimentConfig.advisor: Optional[nni.experiment.config.algorithm._AlgorithmConfig] = None
- ExperimentConfig.training_service: Union[nni.experiment.config.training_service.TrainingServiceConfig, List[nni.experiment.config.training_service.TrainingServiceConfig]]
Algorithm Config
- AlgorithmConfig.name: str = None
- AlgorithmConfig.class_args: Optional[Dict[str, Any]] = None
- CustomAlgorithmConfig.class_name: str = None
- CustomAlgorithmConfig.code_directory: Optional[Union[pathlib.Path, str]] = '.'
- CustomAlgorithmConfig.class_args: Optional[Dict[str, Any]] = None
Training Service Config
Local Config
- LocalConfig.platform: str = 'local'
- LocalConfig.use_active_gpu: Optional[bool] = None
- LocalConfig.max_trial_number_per_gpu: int = 1
- LocalConfig.gpu_indices: Optional[Union[List[int], int, str]] = None
Remote Config
- RemoteConfig.platform: str = 'remote'
- RemoteConfig.reuse_mode: bool = True
- RemoteConfig.machine_list: List[nni.experiment.config.training_services.remote.RemoteMachineConfig]
- RemoteMachineConfig.host: str
- RemoteMachineConfig.port: int = 22
- RemoteMachineConfig.user: str
- RemoteMachineConfig.password: Optional[str] = None
- RemoteMachineConfig.ssh_key_file: Optional[Union[pathlib.Path, str]] = '~/.ssh/id_rsa'
- RemoteMachineConfig.ssh_passphrase: Optional[str] = None
- RemoteMachineConfig.use_active_gpu: bool = False
- RemoteMachineConfig.max_trial_number_per_gpu: int = 1
- RemoteMachineConfig.gpu_indices: Optional[Union[List[int], int, str]] = None
- RemoteMachineConfig.python_path: Optional[str] = None
Openpai Config
- OpenpaiConfig.platform: str = 'openpai'
- OpenpaiConfig.host: str
- OpenpaiConfig.username: str
- OpenpaiConfig.token: str
- OpenpaiConfig.trial_cpu_number: int
- OpenpaiConfig.trial_memory_size: Union[str, int]
- OpenpaiConfig.storage_config_name: str
- OpenpaiConfig.docker_image: str = 'msranni/nni:latest'
- OpenpaiConfig.local_storage_mount_point: Union[pathlib.Path, str]
- OpenpaiConfig.container_storage_mount_point: str
- OpenpaiConfig.reuse_mode: bool = True
- OpenpaiConfig.openpai_config: Optional[Dict] = None
- OpenpaiConfig.openpai_config_file: Optional[Union[pathlib.Path, str]] = None
AML Config
- AmlConfig.platform: str = 'aml'
- AmlConfig.subscription_id: str
- AmlConfig.resource_group: str
- AmlConfig.workspace_name: str
- AmlConfig.compute_target: str
- AmlConfig.docker_image: str = 'msranni/nni:latest'
- AmlConfig.max_trial_number_per_gpu: int = 1
Shared Storage Config
Nfs Config
- NfsConfig.storage_type: str = 'NFS'
- NfsConfig.nfs_server: str = None
- NfsConfig.exported_directory: str = None
Azure Blob Config
- AzureBlobConfig.storage_type: str = 'AzureBlob'
- AzureBlobConfig.storage_account_name: str = None
- AzureBlobConfig.storage_account_key: Optional[str] = None
- AzureBlobConfig.container_name: str = None