torchtraining.accelerators
Accelerators enabling distributed (multi-GPU/multi-node) training.
Accelerators should be instantiated only once and used on the top-most object (in the following order):
epoch (if present)
iteration (if present)
step
Those are the only objects which can be “piped” into producers, for example:
tt.accelerators.Horovod(...) ** tt.iterations.Iteration(...)
Accelerators should generally be used in this way, although it is not always strictly necessary.
See the horovod module for a complete example; a minimal sketch follows below.
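A minimal sketch, assuming tt is torchtraining, model is a torch.nn.Module and iteration is an already constructed tt.iterations.Iteration (its construction from a step and a dataloader is omitted here):

import torch
import torchtraining as tt

model = torch.nn.Linear(10, 2)

# `iteration` is assumed to be built elsewhere from a step and a dataloader.
# The accelerator is instantiated once and piped into the top-most producer:
accelerated = tt.accelerators.Horovod(model) ** iteration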
class torchtraining.accelerators.Horovod(module, rank: int = 0, per_worker_threads: int = None, comm=None)
Accelerate training using Uber’s Horovod framework.
See the torchtraining.accelerators.horovod package for more information.

Note

IMPORTANT: This object needs the horovod Python package to be visible. You can install it with pip install -U torchtraining[horovod]. You should also export the CUDA_HOME variable, for example: CUDA_HOME=/opt/cuda pip install -U torchtraining[horovod] (your path may vary).

Parameters
module (torch.nn.Module) – Module to be broadcast to all processes.
rank (int, optional) – Root process rank. Default: 0
per_worker_threads (int, optional) – Number of threads which can be utilized by each process. Default: PyTorch’s default
comm (List, optional) – List specifying ranks for the communicator, relative to the MPI_COMM_WORLD communicator, OR the MPI communicator to use. The given communicator will be duplicated. If None, Horovod will use the MPI_COMM_WORLD communicator. Default: None
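A construction sketch using only the parameters documented above (the network below is an arbitrary stand-in; comm=None lets Horovod fall back to the MPI_COMM_WORLD communicator):

import torch
import torchtraining as tt

network = torch.nn.Sequential(
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)

# Broadcast the network from rank 0, keep the default per-worker thread count,
# and use the default MPI_COMM_WORLD communicator.
accelerator = tt.accelerators.Horovod(network, rank=0, per_worker_threads=None, comm=None)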
Submodules