Experiment API Reference

class nni.experiment.Experiment(config_or_platform, id=None)

Manage NNI experiment.

You can specify either an ExperimentConfig object or a training service (platform) name. If a platform name is given, a blank config template for that training service is generated.

When configuration is completed, use Experiment.run() to launch the experiment.

Parameters:

config_or_platform (ExperimentConfig | str | list[str] | None) – See ExperimentConfig.
id (str | None) – Experiment ID. If not specified, a random ID will be generated. If specified, the ID should be unique to avoid conflict with existing experiments. The only case when you need to specify an existing ID is when you want to resume an experiment.

Example

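from nni.experiment import Experiment, RemoteMachineConfig

# Create a blank config template for the 'remote' training service.
experiment = Experiment('remote')
experiment.config.trial_command = 'python3 trial.py'
experiment.config.machines.append(RemoteMachineConfig(ip=..., user_name=...))
...
# Launch the experiment; the web portal is served on port 8080.
experiment.run(8080)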

config: Experiment configuration.

id: Experiment ID.

port: Web portal port, or None if the experiment is not running.

classmethod connect(port)

Connect to an existing experiment.

Parameters: port (int) – The port of the web UI.

export_data()

Return exported information for all trial jobs.

Returns: List of TrialResult.
Return type: list

get_all_experiments_metadata()

Return all experiments metadata as a list.

Returns: The experiments metadata.
Return type: list

get_experiment_metadata(exp_id)

Return experiment metadata with specified exp_id as a dict.

Returns: The specified experiment metadata.
Return type: dict

get_experiment_profile()

Return experiment profile as a dict.

Returns: The profile of the experiment.
Return type: dict

get_job_metrics(trial_job_id=None)

Return trial job metrics.

Parameters: trial_job_id (str) – Trial job ID. If this parameter is None, the metrics of all trial jobs are returned.
Returns: Each key is a trialJobId; the corresponding value is a list of TrialMetricData.
Return type: dict
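
The returned mapping can be post-processed like any other dict of lists. The sketch below picks the most recent metric for each trial. It does not import NNI: `Metric` is a hypothetical stand-in for TrialMetricData, and its `sequence` and `data` fields are assumptions made for illustration, so check the real attribute names in your NNI version.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """Hypothetical stand-in for TrialMetricData; field names are assumptions."""
    sequence: int
    data: float


def latest_metric_per_trial(metrics):
    """Return {trial_job_id: data of the entry with the highest sequence}."""
    return {
        trial_id: max(entries, key=lambda m: m.sequence).data
        for trial_id, entries in metrics.items()
        if entries  # skip trials that have not reported anything yet
    }


# Same shape as the documented return value: trial job ID -> list of metrics.
sample = {
    "trialA": [Metric(0, 0.61), Metric(1, 0.74)],
    "trialB": [Metric(0, 0.58)],
    "trialC": [],
}
print(latest_metric_per_trial(sample))
```

With a live experiment, the argument would be the result of experiment.get_job_metrics().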

get_job_statistics()

Return trial job statistics information as a dict.

Returns: Job statistics information.
Return type: dict

get_status()

Return experiment status as a str.

Returns: Experiment status.
Return type: str
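
When the experiment runs in the background, one way to wait for it is to poll the status string. The sketch below is a hypothetical helper, not part of NNI. The status names in `TERMINAL_STATUSES` are assumptions and should be checked against your NNI version. The status source and the sleep function are passed in so the loop can be exercised without a running experiment.

```python
import time

# Assumed terminal status names; check them against your NNI version.
TERMINAL_STATUSES = {"DONE", "ERROR", "STOPPED"}


def wait_for_terminal_status(get_status, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)
    return None  # still not finished after max_polls checks


# Stand-in status source; a real call would pass experiment.get_status.
fake_statuses = iter(["RUNNING", "RUNNING", "DONE"])
print(wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda _: None))
```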

get_trial_job(trial_job_id)

Return a trial job.

Parameters: trial_job_id (str) – Trial job ID.
Returns: A TrialJob instance corresponding to trial_job_id.
Return type: TrialJob

has_checkpoint()

Check whether a checkpoint of the current experiment ID exists.

Returns: True if a checkpoint is found; False otherwise.
Return type: bool

kill_trial_job(trial_job_id)

Kill a trial job.

Parameters: trial_job_id (str) – Trial job ID.

list_trial_jobs()

Return information for all trial jobs as a list.

Returns: List of TrialJob.
Return type: list

load_checkpoint()

Load a checkpoint from the local file system and restore the status of the experiment instance.

resume(port=8080, wait_completion=True, debug=False)

Resume a stopped experiment.

Parameters:

port (int) – The port of the web UI.
wait_completion (bool) – If true, run in the foreground. If false, run in the background.
debug (bool) – Whether to start in debug mode.

Returns:

See run().

run(port=8080, wait_completion=True, debug=False)

Run the experiment.

Pressing Ctrl-C calls stop() on the experiment. Otherwise, the experiment is not stopped even after this method returns: it must be stopped manually with stop(), or atexit will call stop() at process exit.

Parameters:

port (int) – The port on which the NNI manager will run. It is also the port of the web portal.
wait_completion (bool) – If wait_completion is True, this function blocks until the experiment finishes or fails.
debug (bool) – Set log level to debug.

Returns:

If wait_completion is False, this function does not block and returns None immediately.
Otherwise, it returns True when the experiment is done, or False when the experiment has failed.

Return type:

bool | None
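
Because run() has three possible return values, callers should test for None before testing truthiness. The sketch below shows one way to do that. `describe_run_result` is a hypothetical helper, not part of NNI, and the loop feeds it stand-in values instead of calling a real experiment.

```python
from typing import Optional


def describe_run_result(result: Optional[bool]) -> str:
    """Map the documented return value of run() to a readable outcome."""
    if result is None:
        # wait_completion=False: run() returned immediately.
        return "started in background"
    # wait_completion=True: run() blocked until the experiment ended.
    return "finished" if result else "failed"


# Stand-in values; a real call would be describe_run_result(experiment.run(8080)).
for value in (None, True, False):
    print(value, "->", describe_run_result(value))
```

Checking `if not result` alone would treat the background case (None) the same as a failure (False), which is why the None test comes first.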

run_or_resume(port=8080, wait_completion=True, debug=False)

Call run() or resume() based on the return value of has_checkpoint().

Parameters and return values are the same as for run().
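
The dispatch can be pictured with the following sketch. It is not NNI's implementation: it assumes that an existing checkpoint leads to resume() and a missing one to run(), and `FakeExperiment` is a hypothetical stand-in that only records which method was called.

```python
class FakeExperiment:
    """Hypothetical stand-in that records which method was called."""

    def __init__(self, checkpoint_exists):
        self._checkpoint_exists = checkpoint_exists
        self.called = None

    def has_checkpoint(self):
        return self._checkpoint_exists

    def run(self, port=8080, wait_completion=True, debug=False):
        self.called = "run"
        return True

    def resume(self, port=8080, wait_completion=True, debug=False):
        self.called = "resume"
        return True


def run_or_resume(exp, port=8080, wait_completion=True, debug=False):
    # Assumption: an existing checkpoint means resume; otherwise start fresh.
    if exp.has_checkpoint():
        return exp.resume(port, wait_completion, debug)
    return exp.run(port, wait_completion, debug)


fresh = FakeExperiment(checkpoint_exists=False)
run_or_resume(fresh)
print(fresh.called)

stopped = FakeExperiment(checkpoint_exists=True)
run_or_resume(stopped)
print(stopped.called)
```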

save_checkpoint()

Save the experiment status to local file system.

start(port=8080, debug=False, run_mode=RunMode.Background)

Start the experiment in the background.

This method raises an exception on failure. If it returns, the experiment should have started successfully.

Parameters:

port (int) – The port of the web UI.
debug (bool) – Whether to start in debug mode.
run_mode (RunMode) – Whether to run the experiment in the foreground or the background.

stop()

Stop the experiment.

update_max_experiment_duration(value)

Update an experiment’s max_experiment_duration.

Parameters: value (str) – A string such as ‘1m’ for one minute or ‘2h’ for two hours. The suffix may be ‘s’ for seconds, ‘m’ for minutes, ‘h’ for hours, or ‘d’ for days.
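
To make the suffix convention concrete, the sketch below converts such strings to seconds. `parse_duration` is a hypothetical helper, not NNI's own parser, and it assumes a whole number followed by exactly one suffix. NNI may accept other forms that this sketch rejects.

```python
import re

# Seconds per unit suffix, as listed in the parameter description.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> int:
    """Convert a string such as '1m' or '2h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(parse_duration("1m"))  # 60
print(parse_duration("2h"))  # 7200
```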

update_max_trial_number(value)

Update an experiment’s max_trial_number.

Parameters: value (int) – New max_trial_number value.

update_search_space(value)

Update the experiment’s search_space. TODO: support search space files.

Parameters: value (dict) – New search_space.
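
A search space dict maps each hyperparameter name to an entry with `_type` and `_value` keys. The sketch below builds a small example and checks that shape before it would be passed to update_search_space(). `check_search_space` is a hypothetical helper, not part of NNI, and it only checks for the two keys, not whether the sampling type or its values are valid.

```python
def check_search_space(space):
    """Raise ValueError unless every entry has '_type' and '_value' keys."""
    if not isinstance(space, dict):
        raise ValueError("search space must be a dict")
    for name, spec in space.items():
        if not isinstance(spec, dict):
            raise ValueError(f"{name}: expected a dict")
        missing = {"_type", "_value"} - set(spec)
        if missing:
            raise ValueError(f"{name}: missing {sorted(missing)}")
    return space


search_space = {
    "lr": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
}
check_search_space(search_space)
# With a running experiment: experiment.update_search_space(search_space)
```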

update_trial_concurrency(value)

Update an experiment’s trial_concurrency.

Parameters: value (int) – New trial_concurrency value.

view(port=8080, non_blocking=False)

View a stopped experiment.

Parameters:

port (int) – The port of the web UI.
non_blocking (bool) – If false, run in the foreground. If true, run in the background.

Returns:

The experiment instance itself (self).