torchtraining.accelerators
Accelerators enabling distributed (multi-GPU/multi-node) training.
Accelerators should be instantiated only once and used on the top-most module (in the following order):
epoch (if exists)
iteration (if exists)
step
Those are the only objects which can be “piped” into producers, for example:
tt.accelerators.Horovod(...) ** tt.iterations.Iteration(...)
Accelerators should be used in this way, although it is not always strictly necessary.
See the horovod module for a complete example; a minimal sketch follows below.
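For illustration only, a sketch of this pattern might look as follows. The model is a placeholder and the Iteration arguments are deliberately elided; see torchtraining.iterations for the actual constructor.

    import torch
    import torchtraining as tt

    model = torch.nn.Linear(10, 2)                 # placeholder torch.nn.Module
    accelerator = tt.accelerators.Horovod(model)   # instantiated only once

    # Pipe the accelerator into the top-most object, as shown above.
    # The Iteration arguments are elided; they depend on your setup.
    accelerator ** tt.iterations.Iteration(...)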
class torchtraining.accelerators.Horovod(module, rank: int = 0, per_worker_threads: int = None, comm=None)
Accelerate training using Uber’s Horovod framework.
See the torchtraining.accelerators.horovod package for more information.
Note
IMPORTANT: This object needs the horovod Python package to be visible. You can install it with pip install -U torchtraining[horovod]. Also, you should export the CUDA_HOME variable like this: CUDA_HOME=/opt/cuda pip install -U torchtraining[horovod] (your path may vary).
Parameters
module (torch.nn.Module) – Module to be broadcast to all processes.
rank (int, optional) – Root process rank. Default: 0
per_worker_threads (int, optional) – Number of threads which can be utilized by each process. Default: pytorch’s default
comm (List, optional) – List specifying ranks for the communicator, relative to the MPI_COMM_WORLD communicator, OR the MPI communicator to use. The given communicator will be duplicated. If None, Horovod will use the MPI_COMM_WORLD communicator. Default: None
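For reference, a hedged construction example with the documented arguments spelled out; the module and values below are illustrative only, not part of this page.

    import torch
    import torchtraining as tt

    module = torch.nn.Linear(128, 10)   # any torch.nn.Module to broadcast

    # Root rank 0, pytorch's default thread count, default MPI_COMM_WORLD communicator.
    accelerator = tt.accelerators.Horovod(
        module,
        rank=0,
        per_worker_threads=None,
        comm=None,
    )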
Submodules