torchtraining.accelerators
Accelerators enabling distributed (multi-GPU/multi-node) training.
Accelerators should be instantiated only once and used on the top-most module (in the following order):
epoch (if exists)
iteration (if exists)
step
Those are the only objects which can be “piped” into producers, for example:
tt.accelerators.Horovod(...) ** tt.iterations.Iteration(...)
Accelerators should be used in this way, although it is not always strictly necessary.
See the horovod module for a complete example; a minimal sketch follows below.
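For illustration only, a sketch of this pattern might look as follows. The model is a placeholder and the Iteration arguments are deliberately elided; see torchtraining.iterations for the actual constructor.

    import torch
    import torchtraining as tt

    model = torch.nn.Linear(10, 2)                 # placeholder torch.nn.Module
    accelerator = tt.accelerators.Horovod(model)   # instantiated only once

    # Pipe the accelerator into the top-most object, as shown above.
    # The Iteration arguments are elided; they depend on your setup.
    accelerator ** tt.iterations.Iteration(...)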
class torchtraining.accelerators.Horovod(module, rank: int = 0, per_worker_threads: int = None, comm=None)
Accelerate training using Uber’s Horovod framework.
See the torchtraining.accelerators.horovod package for more information.
Note
IMPORTANT: This object needs the horovod Python package to be visible. You can install it with pip install -U torchtraining[horovod]. Also, you should export the CUDA_HOME variable like this: CUDA_HOME=/opt/cuda pip install -U torchtraining[horovod] (your path may vary).
Parameters
module (torch.nn.Module) – Module to be broadcast to all processes.
rank (int, optional) – Root process rank. Default: 0
per_worker_threads (int, optional) – Number of threads which can be utilized by each process. Default: pytorch’s default
comm (List, optional) – List specifying ranks for the communicator, relative to the MPI_COMM_WORLD communicator, OR the MPI communicator to use. The given communicator will be duplicated. If None, Horovod will use the MPI_COMM_WORLD communicator. Default: None
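For reference, a hedged construction example with the documented arguments spelled out; the module and values below are illustrative only, not part of this page.

    import torch
    import torchtraining as tt

    module = torch.nn.Linear(128, 10)   # any torch.nn.Module to broadcast

    # Root rank 0, pytorch's default thread count, default MPI_COMM_WORLD communicator.
    accelerator = tt.accelerators.Horovod(
        module,
        rank=0,
        per_worker_threads=None,
        comm=None,
    )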
Submodules