Overview

Track as 3 kind of objects, Project, Trial Group and Trial.

  • Project is a top level object that holds all of its trials and groups
  • TrialGroup is a set of trials. They are used to order trials together. trials can belong to multiple groups
  • Trial is the object holding all the information about a given training session. the trial object is the backbone of track and it is the object you will have to deal with the most often

Overview

from track import TrackClient

client = TrackClient('file://client_test.json')

project = client.set_project(name='paper_78997')
group = client.set_group(name='idea_4573')

trial = client.new_trial(name='final_trial_2', description='almost graduating')
trial.log_arguments(batch_size=256, lr=0.01,momentum=0.99)
trial.log_metadata(gpu='V100')

# start the trial explicitly
with trial:

    for e in range(epochs):

        for batch in dataset:

            # trial helper that compute elapsed time inside a block
            with trial.chrono('batch_time'):
                ...
                loss += ...

        trial.log_metrics(step=e, epoch_loss=loss)

    trial.log_metrics(accuracy=0.98)

client.report()

You can find the sample of a report below

{
  "revision": 1,
  "name": "final_trial_2",
  "description": "almost graduating",
  "version": "a8c3",
  "tags": {
    "workers": 8,
    "hpo": "byopt"
  },
  "parameters": {
    "batch_size": 32,
    "cuda": true,
    "workers": 0,
    "seed": 0,
    "epochs": 2,
    "arch": "convnet",
    "lr": 0.1,
    "momentum": 0.9,
    "opt_level": "O0",
    "break_after": null,
    "data": "mnist",
    "backend": null
  },
  "metadata": {},
  "metrics": {
    "epoch_loss": {
      "0": 2.306920262972514,
      "1": 2.307889754740397
    }
  },
  "chronos": {
    "runtime": 3142.5199086666107,
    "batch_time": {
      "avg": 0.6737696465350126,
      "min": 0.019209623336791992,
      "max": 445.9658739566803,
      "sd": 12.500646799505962,
      "count": 3751,
      "unit": "s"
    }
  },
  "errors": [],
  "status": {
    "value": 302,
    "name": "Completed"
  }
}

Log Metrics

User can log metrics with a step or without. step is used as key in a dictionary and should be unique

trial.log_metrics(step=e, epoch_loss=loss, metric2=value)

trial.log_metrics(cost=val)

Time things

You can easily time things with chrono. Do not forget if you are measuring GPU compute time should should synchronize to make sure the computation are done before computing the elapsed time.

with trial.chrono('long_compute'):
    sleep(100)

Save arbitrary data

You can use metadata to save information on a specific trial that might not be reflected by its parameters

trial.log_metadata(had_short_hair_when_running_this_trial=False)

Experiment Report

Get a quick overview of all the data that was saved up during training

trial.report()