Experiment API Reference

class nni.experiment.Experiment(config_or_platform, id=None)[source]

Manage NNI experiment.

You can either specify an ExperimentConfig object, or a training service name. If a platform name is used, a blank config template for that training service will be generated.

When configuration is completed, use Experiment.run() to launch the experiment.

Parameters:
  • config_or_platform (ExperimentConfig | str | list[str] | None) -- See ExperimentConfig.

  • id (str | None) -- Experiment ID. If not specified, a random ID will be generated. If specified, the ID should be unique to avoid conflict with existing experiments. The only case when you need to specify an existing ID is when you want to resume an experiment.

Examples

from nni.experiment import Experiment
from nni.experiment.config import RemoteMachineConfig

experiment = Experiment('remote')
experiment.config.trial_command = 'python3 trial.py'
experiment.config.machines.append(RemoteMachineConfig(ip=..., user_name=...))
...
experiment.run(8080)

config

Experiment configuration.

id

Experiment ID.

port

Web portal port, or None if the experiment is not running.

classmethod connect(port)[source]

Connect to an existing experiment.

Parameters:

port (int) -- The port of web UI.

export_data()[source]

Return exported information for all trial jobs.

Returns:

List of TrialResult.

Return type:

list

get_all_experiments_metadata()[source]

Return the metadata of all experiments as a list.

Returns:

The experiments metadata.

Return type:

list

get_experiment_metadata(exp_id)[source]

Return the metadata of the experiment with the specified exp_id as a dict.

Returns:

The specified experiment metadata.

Return type:

dict

get_experiment_profile()[source]

Return experiment profile as a dict.

Returns:

The profile of the experiment.

Return type:

dict

get_job_metrics(trial_job_id=None)[source]

Return trial job metrics.

Parameters:

trial_job_id (str) -- Trial job id. If this parameter is None, the metrics of all trial jobs will be returned.

Returns:

Each key is a trialJobId; the corresponding value is a list of TrialMetricData.

Return type:

dict
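
As a rough sketch of how the returned mapping might be consumed, the snippet below counts the metric records per trial. It uses a plain dict with placeholder strings in place of real TrialMetricData objects, and count_metrics_per_trial is a hypothetical helper, not part of NNI.

```python
from typing import Any, Dict, List


def count_metrics_per_trial(metrics: Dict[str, List[Any]]) -> Dict[str, int]:
    """Map each trial job ID to the number of metric records reported for it."""
    return {trial_id: len(records) for trial_id, records in metrics.items()}


# Stand-in for the dict returned by get_job_metrics(); IDs and records are made up.
sample_metrics = {
    "trialA": ["metric-0", "metric-1", "metric-2"],
    "trialB": ["metric-0"],
}
metric_counts = count_metrics_per_trial(sample_metrics)
print(metric_counts)
```

With a live experiment, the dict passed in would presumably be the return value of get_job_metrics() called without arguments.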

get_job_statistics()[source]

Return trial job statistics information as a dict.

Returns:

Job statistics information.

Return type:

dict

get_status()[source]

Return experiment status as a str.

Returns:

Experiment status.

Return type:

str

get_trial_job(trial_job_id)[source]

Return a trial job.

Parameters:

trial_job_id (str) -- Trial job id.

Returns:

A TrialJob instance corresponding to trial_job_id.

Return type:

TrialJob

has_checkpoint()[source]

Check whether a checkpoint exists for the current experiment ID.

Returns:

True if a checkpoint is found; False otherwise.

kill_trial_job(trial_job_id)[source]

Kill a trial job.

Parameters:

trial_job_id (str) -- Trial job id.

list_trial_jobs()[source]

Return information for all trial jobs as a list.

Returns:

List of TrialJob.

Return type:

list

load_checkpoint()[source]

Load checkpoint from local file system. Restores the status of the experiment instance.

resume(port=8080, wait_completion=True, debug=False)[source]

Resume a stopped experiment.

Parameters:
  • port (int) -- The port of web UI.

  • wait_completion (bool) -- If true, run in the foreground. If false, run in the background.

  • debug (bool) -- Whether to start in debug mode.

Returns:

See run().

run(port=8080, wait_completion=True, debug=False)[source]

Run the experiment.

Pressing Ctrl-C will stop() the experiment. Otherwise, the experiment is not stopped even after this method returns: it has to be stopped manually with stop(), or atexit will call stop() at process exit.

Parameters:
  • port (int) -- The port on which NNI manager will run. It will also be the port of web portal.

  • wait_completion (bool) -- If wait_completion is True, this function will block until the experiment finishes or fails with an error.

  • debug (bool) -- Set log level to debug.

Returns:

  • If wait_completion is False, this function does not block and returns None immediately.

  • Otherwise, it returns True when the experiment completes, or False when the experiment fails.

Return type:

bool | None
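
The three possible return values can be told apart as in the following minimal sketch. describe_run_result is a hypothetical helper written for illustration, not an NNI function; it only interprets a value shaped like the one run() is documented to return.

```python
from typing import Optional


def describe_run_result(result: Optional[bool]) -> str:
    """Translate a run()-style return value into a short status string."""
    if result is None:
        # Documented case: wait_completion=False, the call returned immediately.
        return "started in background"
    return "finished" if result else "failed"


print(describe_run_result(None))
print(describe_run_result(True))
print(describe_run_result(False))
```

Checking for None first matters, because both None and False are falsy in Python.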

run_or_resume(port=8080, wait_completion=True, debug=False)[source]

Call run() or resume() based on the return value of has_checkpoint().

Parameters and return values are the same as for run().
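
The dispatch described above can be sketched with a stand-in object. FakeExperiment and the free function run_or_resume below are illustrative only, and the assumption that an existing checkpoint leads to resume() rather than run() is inferred from the method names, not confirmed by this page.

```python
class FakeExperiment:
    """Stand-in object; it only records which method was called."""

    def __init__(self, checkpoint_exists: bool) -> None:
        self._checkpoint_exists = checkpoint_exists
        self.called = None

    def has_checkpoint(self) -> bool:
        return self._checkpoint_exists

    def run(self, port=8080, wait_completion=True, debug=False):
        self.called = "run"
        return True

    def resume(self, port=8080, wait_completion=True, debug=False):
        self.called = "resume"
        return True


def run_or_resume(exp, port=8080, wait_completion=True, debug=False):
    # Assumption: an existing checkpoint means there is something to resume.
    if exp.has_checkpoint():
        return exp.resume(port, wait_completion, debug)
    return exp.run(port, wait_completion, debug)


fresh = FakeExperiment(checkpoint_exists=False)
stopped = FakeExperiment(checkpoint_exists=True)
run_or_resume(fresh)
run_or_resume(stopped)
print(fresh.called, stopped.called)
```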

save_checkpoint()[source]

Save the experiment status to local file system.

start(port=8080, debug=False, run_mode=RunMode.Background)[source]

Start the experiment in the background.

This method raises an exception on failure. If it returns, the experiment should have been started successfully.

Parameters:
  • port (int) -- The port of web UI.

  • debug (bool) -- Whether to start in debug mode.

  • run_mode (RunMode) -- Whether to run the experiment in the foreground or background.

stop()[source]

Stop the experiment.

update_max_experiment_duration(value)[source]

Update an experiment's max_experiment_duration.

Parameters:

value (str) -- Strings like '1m' for one minute or '2h' for two hours. SUFFIX may be 's' for seconds, 'm' for minutes, 'h' for hours or 'd' for days.
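
To make the suffix convention concrete, here is a minimal sketch that converts such strings to seconds. duration_to_seconds is a hypothetical helper and not how NNI itself parses the value; it assumes a whole number followed by exactly one of the four suffixes listed above.

```python
import re

# Seconds per unit for the four documented suffixes.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(value: str) -> int:
    """Convert strings such as '1m' or '2h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration string: {value!r}")
    amount, suffix = match.groups()
    return int(amount) * _UNIT_SECONDS[suffix]


print(duration_to_seconds("1m"))
print(duration_to_seconds("2h"))
```

Rejecting anything outside the pattern keeps typos such as '2hr' from being silently misread.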

update_max_trial_number(value)[source]

Update an experiment's max_trial_number.

Parameters:

value (int) -- New max_trial_number value.

update_search_space(value)[source]

Update the experiment's search_space. TODO: support search space file.

Parameters:

value (dict) -- New search_space.
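
For illustration, a dict in NNI's search space format maps each hyperparameter name to a spec with "_type" and "_value" keys. The sketch below builds one and runs a shallow shape check before it would be passed on. looks_like_search_space is a hypothetical helper, the parameter names and ranges are made up, and the check covers only flat (non-nested) spaces.

```python
def looks_like_search_space(space) -> bool:
    """Shallow check: every entry is a dict with '_type' and '_value' keys."""
    if not isinstance(space, dict):
        return False
    return all(
        isinstance(spec, dict) and "_type" in spec and "_value" in spec
        for spec in space.values()
    )


# Example values only; pick names and ranges that match your trial code.
search_space = {
    "lr": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
}
print(looks_like_search_space(search_space))
```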

update_trial_concurrency(value)[source]

Update an experiment's trial_concurrency.

Parameters:

value (int) -- New trial_concurrency value.

view(port=8080, non_blocking=False)[source]

View a stopped experiment.

Parameters:
  • port (int) -- The port of web UI.

  • non_blocking (bool) -- If false, run in the foreground. If true, run in the background.

Returns:

This Experiment instance (self).