Hybrid Training Service
Hybrid training service aggregates different types of computation resources into a virtually unified resource pool, to which trial jobs are dispatched. It lets all of a user’s available computation resources work jointly on one AutoML task, and it is flexible enough to switch among different types of computation resources. For example, NNI can submit trial jobs to multiple remote machines and AML simultaneously.
NNI supports Local Training Service, Remote Training Service, OpenPAI Training Service, AML Training Service, Kubeflow Training Service, and FrameworkController Training Service for hybrid training service. Before starting an experiment using hybrid training service, users should first set up their chosen (sub) training services (e.g., remote training service) according to each training service’s own document page.
Reuse mode is disabled by default for local training service. However, when local training service is used as part of hybrid training service, reuse mode is enabled by default.
Unlike other training services (e.g., platform: remote in remote training service), hybrid training service has no dedicated keyword. Users simply list the configurations of their chosen training services under the trainingService field. Below is an example of a hybrid training service containing remote training service and local training service in the experiment configuration YAML.
# the experiment config yaml file
...
trainingService:
  - platform: remote
    machineList:
      - host: 127.0.0.1  # your machine's IP address
        user: bob
        password: bob
  - platform: local
...
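The structure of the configuration above can also be sketched in Python to make the shape explicit: trainingService is a list with one entry per platform, rather than a single mapping. The helper hybrid_training_service below is hypothetical and is not part of NNI; it only mirrors the YAML layout and checks that every entry names a platform. The host, user, and password values are the placeholders from the example above.

```python
import json


def hybrid_training_service(*services):
    """Hypothetical helper (not an NNI API): return the list that would
    sit under the `trainingService` field of the experiment config."""
    for svc in services:
        # Every sub training service is identified by its `platform` key.
        if "platform" not in svc:
            raise ValueError("each training service entry needs a 'platform' key")
    return list(services)


# Placeholder values copied from the YAML example; replace with your own.
remote = {
    "platform": "remote",
    "machineList": [
        {"host": "127.0.0.1", "user": "bob", "password": "bob"},
    ],
}
local = {"platform": "local"}

# Hybrid mode is expressed simply by listing more than one entry.
config = {"trainingService": hybrid_training_service(remote, local)}
platforms = [svc["platform"] for svc in config["trainingService"]]

print(json.dumps(config, indent=2))
```

Adding another platform to the pool amounts to appending one more entry to the list; no separate hybrid keyword is involved.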
A complete example configuration file can be found on GitHub at examples/trials/mnist-pytorch/config_hybrid.yml.