Overview

NNI supports the training services listed below. Users can read each service's page to learn how to configure it. NNI is also designed to be highly extensible: users can implement a custom training service for their own resources, platform, or needs.

Training Service	Description
Local	The whole experiment runs on your dev machine (i.e., a single local machine)
Remote	The trials are dispatched to your configured SSH servers
OpenPAI	Running trials on OpenPAI, a DNN model training platform based on Kubernetes
Kubeflow	Running trials with Kubeflow, a DNN model training framework based on Kubernetes
AdaptDL	Running trials on AdaptDL, an elastic DNN model training platform
FrameworkController	Running trials with FrameworkController, a DNN model training framework on Kubernetes
AML	Running trials on Azure Machine Learning (AML) cloud service
PAI-DLC	Running trials on PAI-DLC (Deep Learning Containers), a platform based on Alibaba ACK
Hybrid	Running trials on several of the above training services jointly

Training Service Under Reuse Mode

Since NNI v2.0, there are two sets of training service implementations in NNI. The newer one is called reuse mode. When reuse mode is enabled, a cluster (such as a remote machine or a compute instance on AML) launches a long-running environment, and NNI submits successive trials to these environments instead of creating a new job for every trial, which saves the job-creation time. For instance, using the OpenPAI training platform in reuse mode avoids repeatedly pulling Docker images, creating containers, and downloading data.
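To make the saving concrete, here is a simplified cost model, not how NNI itself measures anything. It assumes every job or environment pays a fixed setup cost once (pulling the image, creating the container, downloading data), every trial takes a fixed run time, and reuse mode pays the setup once per environment rather than once per trial. The names `Costs` and `total_machine_time` and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    setup_s: float  # hypothetical one-off setup per job/environment, in seconds
    trial_s: float  # hypothetical run time of a single trial, in seconds


def total_machine_time(num_trials: int, num_envs: int, costs: Costs, reuse: bool) -> float:
    """Total machine time summed over all jobs (not wall-clock time).

    Without reuse, every trial is its own job and pays the setup cost.
    With reuse, each long-running environment pays the setup cost once.
    """
    setups = min(num_envs, num_trials) if reuse else num_trials
    return setups * costs.setup_s + num_trials * costs.trial_s


costs = Costs(setup_s=120.0, trial_s=600.0)
without_reuse = total_machine_time(100, 4, costs, reuse=False)  # 100 setups
with_reuse = total_machine_time(100, 4, costs, reuse=True)      # 4 setups
print(without_reuse, with_reuse, without_reuse - with_reuse)
```

Under these assumed numbers, 100 trials on 4 environments drop from 72000 s to 60480 s of machine time. The gap grows with the setup cost and the number of trials per environment.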

Note

In reuse mode, users must make sure that each trial can run independently even when several trials run in the same job (e.g., do not load checkpoints left behind by previous trials).
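One way to keep trials independent is to scope every file a trial writes to that trial's own ID, so a later trial in the same environment never picks up an earlier trial's checkpoint. The sketch below assumes the trial ID is passed in as a string; inside a real NNI trial it could come from `nni.get_trial_id()`. The helper names `trial_checkpoint_path` and `load_checkpoint_if_any` and the file name `model.ckpt` are hypothetical.

```python
import os
import tempfile
from typing import Optional


def trial_checkpoint_path(base_dir: str, trial_id: str) -> str:
    """Return a checkpoint path that belongs to exactly one trial."""
    trial_dir = os.path.join(base_dir, trial_id)  # one subdirectory per trial ID
    os.makedirs(trial_dir, exist_ok=True)
    return os.path.join(trial_dir, "model.ckpt")


def load_checkpoint_if_any(path: str) -> Optional[bytes]:
    """Load this trial's own checkpoint if it exists, otherwise return None."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    return None


# Two trials running one after another in the same reused environment.
with tempfile.TemporaryDirectory() as base:
    path_a = trial_checkpoint_path(base, "trialA")
    with open(path_a, "wb") as f:
        f.write(b"weights-from-trial-A")  # trial A saves its checkpoint

    path_b = trial_checkpoint_path(base, "trialB")
    # Trial B starts fresh instead of resuming from trial A's weights.
    print(load_checkpoint_if_any(path_b))
```

A fixed path such as `./model.ckpt` would instead be shared by every trial that lands in the same environment, which is exactly the situation the note warns about.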