Local Training Service
With the local training service, the whole experiment (e.g., tuning algorithm, trials) runs on a single machine, i.e., the user's dev machine. The generated trials run on this machine, following the
trialConcurrency set in the configuration YAML file. If trials use GPUs, the local training service allocates the required number of GPUs to each trial, acting like a resource scheduler.
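For context, a minimal experiment configuration using the local training service might look like the sketch below. The search space file, trial command, and tuner name are placeholders for illustration, not values taken from this page:

```yaml
searchSpaceFile: search_space.json   # placeholder search space file
trialCommand: python3 trial.py       # placeholder trial command
trialGpuNumber: 1
trialConcurrency: 4
tuner:
  name: TPE
trainingService:
  platform: local
```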
Currently, reuse mode is disabled by default in the local training service.
We recommend going through the quick start first, as this page only explains the configuration of the local training service, which is one part of the experiment configuration YAML file.
```yaml
# the experiment config yaml file
trainingService:
  platform: local
  useActiveGpu: false # optional
```
The local training service supports other fields as well, such as
maxTrialNumberPerGpu, for concurrently running multiple trials on one GPU, and gpuIndices, for running trials on a subset of the GPUs on your machine. Please refer to LocalConfig in the reference for detailed usage.
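As an illustration of these fields, the snippet below (a sketch; adjust the values to your machine) restricts trials to GPUs 0 and 1 and allows up to two trials to share each GPU:

```yaml
trainingService:
  platform: local
  gpuIndices: [0, 1]        # run trials only on these GPUs
  maxTrialNumberPerGpu: 2   # allow up to two trials to share one GPU
```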
Users should set useActiveGpu to true if the local machine has GPUs and the trials use them, but the generated trials keep waiting. This usually happens on a machine running a graphical OS, such as Windows 10 or Ubuntu desktop, where the desktop session itself occupies the GPUs.
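In that case, the training service section of the configuration would look like this (a sketch of the scenario just described):

```yaml
trainingService:
  platform: local
  useActiveGpu: true  # treat GPUs occupied by the desktop session as available
```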
Next, we explain how the local training service behaves with different combinations of
trialGpuNumber and
trialConcurrency. Suppose the user's local machine has 4 GPUs. With
trialGpuNumber: 1 and
trialConcurrency: 4, 4 trials run on this machine concurrently, each using 1 GPU. With
trialGpuNumber: 2 and
trialConcurrency: 2, 2 trials run on this machine concurrently, each using 2 GPUs. Which GPU is allocated to which trial is decided by the local training service; users do not need to worry about it.
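The allocation arithmetic above can be sketched in a few lines of Python. This is an illustrative model only, not NNI's actual scheduler; the function name and its greedy assignment strategy are assumptions for the example:

```python
def allocate_gpus(total_gpus, trial_gpu_number, trial_concurrency):
    """Greedily assign `trial_gpu_number` GPUs to each of up to
    `trial_concurrency` trials; trials that cannot get enough GPUs wait.

    Returns a list of GPU-index tuples, one per runnable trial.
    """
    allocations = []
    next_gpu = 0
    for _ in range(trial_concurrency):
        if next_gpu + trial_gpu_number > total_gpus:
            break  # not enough free GPUs; remaining trials wait
        allocations.append(tuple(range(next_gpu, next_gpu + trial_gpu_number)))
        next_gpu += trial_gpu_number
    return allocations

# 4 GPUs, 1 GPU per trial, concurrency 4 -> four concurrent trials
print(allocate_gpus(4, 1, 4))  # [(0,), (1,), (2,), (3,)]
# 4 GPUs, 2 GPUs per trial, concurrency 2 -> two concurrent trials
print(allocate_gpus(4, 2, 2))  # [(0, 1), (2, 3)]
```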
A complete example configuration file can be found on GitHub: examples/trials/mnist-pytorch/config.yml.