AML Training Service
To run your trials on AzureML, use the AML training service. It programmatically submits runs to the AzureML platform and collects their metrics.
Prerequisite
1. Create an Azure account/subscription using this link. If you already have an Azure account/subscription, skip this step.
2. Install the Azure CLI on your machine by following the install guide here.
3. Authenticate to your Azure subscription from the CLI. To authenticate interactively, open a command line or terminal and run the following command:

    az login

4. Log into your Azure account with a web browser and create a Machine Learning resource. You will need to choose a resource group and specify a workspace name. Then download config.json, which will be used later.
5. Create an AML cluster as the compute target.
6. Open a command line and install the AML packages:

    python3 -m pip install azureml
    python3 -m pip install azureml-sdk
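Before launching an experiment, it can help to confirm that the tools from the steps above are reachable from Python. The following is a minimal sketch, not part of NNI: the function name `check_prerequisites` is hypothetical, and it only checks that an `az` executable is on the PATH and that a package named `azureml` is importable. It does not verify that you are logged in or that the installed versions are suitable.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether the Azure CLI and an azureml package appear to be installed.

    Hypothetical helper for illustration only; it does not check login state.
    """
    return {
        # shutil.which returns the path of the executable, or None if not found.
        "azure_cli": shutil.which("az") is not None,
        # find_spec returns None when the top-level package cannot be located.
        "azureml_package": importlib.util.find_spec("azureml") is not None,
    }
```

Calling `check_prerequisites()` returns a dictionary of booleans; any `False` entry points to a step above that may still need to be completed.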
Usage
We show an example configuration here with YAML (Python configuration should be similar).
trialConcurrency: 1
maxTrialNumber: 10
...
trainingService:
  platform: aml
  dockerImage: msranni/nni
  subscriptionId: ${your subscription ID}
  resourceGroup: ${your resource group}
  workspaceName: ${your workspace name}
  computeTarget: ${your compute target}
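The placeholder values in the configuration above can be filled in from the config.json downloaded from the Azure portal. The sketch below assumes that file uses the usual workspace keys `subscription_id`, `resource_group`, and `workspace_name`; the helper names `training_service_from_config` and `to_yaml_section` are hypothetical and are not part of NNI. The compute target is not stored in config.json, so it is passed in explicitly.

```python
import json
from pathlib import Path


def training_service_from_config(config_path, compute_target, docker_image="msranni/nni"):
    """Build the trainingService mapping from a workspace config.json.

    Assumes config.json has the keys subscription_id, resource_group
    and workspace_name (an assumption about the downloaded file).
    """
    workspace = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return {
        "platform": "aml",
        "dockerImage": docker_image,
        "subscriptionId": workspace["subscription_id"],
        "resourceGroup": workspace["resource_group"],
        "workspaceName": workspace["workspace_name"],
        "computeTarget": compute_target,
    }


def to_yaml_section(training_service):
    """Render the mapping as an indented YAML block (simple scalar values only)."""
    lines = ["trainingService:"]
    lines += [f"  {key}: {value}" for key, value in training_service.items()]
    return "\n".join(lines)
```

For example, `print(to_yaml_section(training_service_from_config("config.json", "my-cluster")))` prints a section that can be pasted into the experiment YAML, where `my-cluster` stands in for your own compute target name. The renderer does no quoting or escaping, so values containing YAML special characters would need a real YAML library.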
Configuration References
Compared with Local Training Service and Remote Training Service, AML training service supports the following additional configurations.
| Field name | Description |
|---|---|
| dockerImage | Required field. The docker image name used in the job. If you don’t want to build your own, NNI provides a docker image, msranni/nni, which is kept up to date with every NNI release. |
| subscriptionId | Required field. The subscription ID of your account, can be found in |
| resourceGroup | Required field. The resource group of your account, can be found in |
| workspaceName | Required field. The workspace name of your account, can be found in |
| computeTarget | Required field. The compute cluster name you want to use in your AML workspace. See reference and Step 5 above. |
| maxTrialNumberPerGpu | Optional field. Default 1. The maximum number of trials that may run concurrently on one GPU device. |
| useActiveGpu | Optional field. Default false. Whether to use a GPU that is already running another process. By default, NNI will use the GPU only if there is no other active process in the GPU. See Local Training Service for details. |
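The required and optional fields listed above can be checked before an experiment is submitted. The following is a minimal sketch under stated assumptions: the function `normalize_aml_section` and the two constants are hypothetical names, the defaults are copied from the table, and this is not how NNI itself validates its configuration.

```python
# Field names and defaults taken from the configuration reference table.
REQUIRED_FIELDS = (
    "dockerImage",
    "subscriptionId",
    "resourceGroup",
    "workspaceName",
    "computeTarget",
)
OPTIONAL_DEFAULTS = {"maxTrialNumberPerGpu": 1, "useActiveGpu": False}


def normalize_aml_section(section):
    """Return a copy of the trainingService section with defaults applied.

    Raises ValueError if any required field is missing or empty.
    Hypothetical helper for illustration; not NNI's own validation.
    """
    missing = [name for name in REQUIRED_FIELDS if not section.get(name)]
    if missing:
        raise ValueError("missing required AML fields: " + ", ".join(missing))
    normalized = dict(OPTIONAL_DEFAULTS)
    normalized.update(section)  # user-supplied values override the defaults
    return normalized
```

Passing a dictionary loaded from the `trainingService` part of the YAML returns the same settings with `maxTrialNumberPerGpu` and `useActiveGpu` filled in when they were omitted.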
Monitor your trial on the cloud by using AML studio
To see your trial job’s detailed status on the cloud, visit the AML studio for the workspace you created in the prerequisites above. Once the job completes, go to the Outputs + logs tab. There you can see a 70_driver_log.txt file. This file contains the standard output from a run and can be useful when you’re debugging remote runs in the cloud. Learn more about AML from here.