Experiment API Reference¶
- class nni.experiment.Experiment(config_or_platform, id=None)[source]¶
Manage NNI experiment.
You can either specify an ExperimentConfig object, or a training service name. If a platform name is used, a blank config template for that training service will be generated.
When configuration is completed, use Experiment.run() to launch the experiment.
- Parameters:
config_or_platform (ExperimentConfig | str | list[str] | None) – See ExperimentConfig.
id (str | None) – Experiment ID. If not specified, a random ID will be generated. If specified, the ID should be unique to avoid conflict with existing experiments. The only case when you need to specify an existing ID is when you want to resume an experiment.
Example
experiment = Experiment('remote')
experiment.config.trial_command = 'python3 trial.py'
experiment.config.machines.append(RemoteMachineConfig(ip=..., user_name=...))
...
experiment.run(8080)
- config¶
Experiment configuration.
- id¶
Experiment ID.
- port¶
Web portal port, or None if the experiment is not running.
- classmethod connect(port)[source]¶
Connect to an existing experiment.
- Parameters:
port (int) – The port of web UI.
- export_data()[source]¶
Return exported information for all trial jobs.
- Returns:
List of TrialResult.
- Return type:
list
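As an illustration of how the list returned by export_data() might be consumed, the sketch below picks the entry with the best final metric. It assumes that each TrialResult exposes `trialJobId`, `parameter`, and `value` attributes and that `value` is a number or numeric string; `best_trial` is a hypothetical helper, not part of NNI, and the stand-in objects replace a live experiment.

```python
from types import SimpleNamespace


def best_trial(results, maximize=True):
    """Return the result with the best final metric, or None if there is none.

    Assumption: each item has a `value` attribute convertible to float.
    """
    scored = [r for r in results if r.value is not None]
    if not scored:
        return None
    pick = max if maximize else min
    return pick(scored, key=lambda r: float(r.value))


# Stand-in objects; with a live experiment this would be experiment.export_data().
fake_results = [
    SimpleNamespace(trialJobId="a", parameter={"lr": 0.1}, value="0.82"),
    SimpleNamespace(trialJobId="b", parameter={"lr": 0.01}, value="0.91"),
]
best = best_trial(fake_results)
print(best.trialJobId, best.parameter)
```

If the reported metric is a dict rather than a scalar, the `key` function would need to select the relevant entry first.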
- get_all_experiments_metadata()[source]¶
Return all experiments metadata as a list.
- Returns:
The experiments metadata.
- Return type:
list
- get_experiment_metadata(exp_id)[source]¶
Return experiment metadata with specified exp_id as a dict.
- Returns:
The specified experiment metadata.
- Return type:
dict
- get_experiment_profile()[source]¶
Return experiment profile as a dict.
- Returns:
The profile of the experiment.
- Return type:
dict
- get_job_metrics(trial_job_id=None)[source]¶
Return trial job metrics.
- Parameters:
trial_job_id (str) – Trial job ID. If this parameter is None, the metrics of all trial jobs will be returned.
- Returns:
Each key is a trialJobId, the corresponding value is a list of TrialMetricData.
- Return type:
dict
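The returned mapping (trial job ID to a list of metric records) can be reduced to one value per trial. The following is a minimal sketch, assuming each TrialMetricData record carries `sequence` and `data` attributes; `latest_metric_per_trial` is a hypothetical helper and the records below are stand-ins for real data.

```python
from types import SimpleNamespace


def latest_metric_per_trial(metrics):
    """Map each trial job ID to the `data` of its highest-`sequence` record.

    Assumption: records expose `sequence` (int) and `data` attributes.
    Trials with no records are skipped.
    """
    latest = {}
    for trial_id, records in metrics.items():
        if records:
            latest[trial_id] = max(records, key=lambda m: m.sequence).data
    return latest


# Stand-in for experiment.get_job_metrics().
fake_metrics = {
    "a": [SimpleNamespace(sequence=0, data=0.5), SimpleNamespace(sequence=1, data=0.7)],
    "b": [],
}
latest = latest_metric_per_trial(fake_metrics)
print(latest)
```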
- get_job_statistics()[source]¶
Return trial job statistics information as a dict.
- Returns:
Job statistics information.
- Return type:
dict
- get_status()[source]¶
Return experiment status as a str.
- Returns:
Experiment status.
- Return type:
str
- get_trial_job(trial_job_id)[source]¶
Return a trial job.
- Parameters:
trial_job_id (str) – Trial job id.
- Returns:
A TrialJob instance corresponding to trial_job_id.
- Return type:
TrialJob
- has_checkpoint()[source]¶
Check whether a checkpoint of current experiment ID exists.
- Returns:
True if a checkpoint is found; False otherwise.
- Return type:
bool
- kill_trial_job(trial_job_id)[source]¶
Kill a trial job.
- Parameters:
trial_job_id (str) – Trial job id.
- list_trial_jobs()[source]¶
Return information for all trial jobs as a list.
- Returns:
List of TrialJob.
- Return type:
list
- load_checkpoint()[source]¶
Load checkpoint from local file system. Restores the status of the experiment instance.
- resume(port=8080, wait_completion=True, debug=False)[source]¶
Resume a stopped experiment.
- Parameters:
port (int) – The port of web UI.
wait_completion (bool) – If true, run in the foreground. If false, run in the background.
debug (bool) – Whether to start in debug mode.
- Return type:
See run().
- run(port=8080, wait_completion=True, debug=False)[source]¶
Run the experiment.
Pressing Ctrl-C will stop() the experiment. Otherwise, the experiment won’t be stopped even if the method returns. It has to be stopped manually with stop(), or atexit will stop() it at process exit.
- Parameters:
port (int) – The port on which NNI manager will run. It will also be the port of the web portal.
wait_completion (bool) – If wait_completion is True, this function will block until the experiment finishes or fails.
debug (bool) – Set log level to debug.
- Returns:
If wait_completion is False, this function does not block and returns None immediately. Otherwise, it returns True when the experiment is done, or False when the experiment failed.
- Return type:
bool | None
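Because the return value has three possible states, a plain truthiness check would treat None (started in the background) the same as False (failed). A small sketch of handling all three cases follows; `describe_run_result` is a hypothetical helper, not part of NNI.

```python
def describe_run_result(result):
    """Translate the bool | None result of run() into a readable message."""
    if result is None:
        # wait_completion was False: the call returned without waiting.
        return "started in background"
    return "finished successfully" if result else "failed"


for outcome in (True, False, None):
    print(outcome, "->", describe_run_result(outcome))
```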
- run_or_resume(port=8080, wait_completion=True, debug=False)[source]¶
Call run() or resume() based on the return value of has_checkpoint().
Parameters and return values are the same as for run().
- start(port=8080, debug=False, run_mode=RunMode.Background)[source]¶
Start the experiment in background.
This method will raise exception on failure. If it returns, the experiment should have been successfully started.
- Parameters:
port (int) – The port of web UI.
debug (bool) – Whether to start in debug mode.
run_mode (RunMode) – Whether to run the experiment in the foreground or the background.
- update_max_experiment_duration(value)[source]¶
Update an experiment’s max_experiment_duration
- Parameters:
value (str) – A string such as ‘1m’ for one minute or ‘2h’ for two hours. The suffix may be ‘s’ for seconds, ‘m’ for minutes, ‘h’ for hours or ‘d’ for days.
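To make the duration format concrete, the sketch below validates such a string and converts it to seconds before it is passed on. It is an illustrative helper, not NNI's own parser, and it assumes whole-number amounts only; `duration_to_seconds` is a hypothetical name.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_DURATION = re.compile(r"(\d+)([smhd])")


def duration_to_seconds(value):
    """Convert a string such as '1m' or '2h' to a number of seconds."""
    match = _DURATION.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a duration like '1m' or '2h': {value!r}")
    amount, suffix = match.groups()
    return int(amount) * _UNIT_SECONDS[suffix]


print(duration_to_seconds("1m"))
print(duration_to_seconds("2h"))
```

A check of this kind can catch a malformed value before it is sent to update_max_experiment_duration().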
- update_max_trial_number(value)[source]¶
Update an experiment’s max_trial_number
- Parameters:
value (int) – New max_trial_number value.
- update_search_space(value)[source]¶
Update the experiment’s search_space. TODO: support searchspace file.
- Parameters:
value (dict) – New search_space.
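For reference, a search space dict might look like the one below. The `_type` / `_value` layout follows NNI's documented search space format as far as I know, but the reference text above does not spell it out, so treat the layout and the parameter names as assumptions; `check_search_space` is a hypothetical helper for a quick sanity check before the update.

```python
def check_search_space(space):
    """Return a list of problems found in a search space dict (empty if none).

    Assumption: every entry is a dict with '_type' and '_value' keys.
    """
    problems = []
    for name, spec in space.items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: expected a dict")
            continue
        missing = {"_type", "_value"} - set(spec)
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
    return problems


new_space = {
    "lr": {"_type": "loguniform", "_value": [1e-5, 1e-1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
}
problems = check_search_space(new_space)
print(problems)
# With a live experiment: experiment.update_search_space(new_space)
```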