torchtraining.pytorch
This module provides standard PyTorch operations (like backward) in a functional manner.
Note

IMPORTANT: This module is used almost all the time, so be sure to understand how it works.

It allows users to perform a single step of training or evaluation using PyTorch's optimizer, backward pass or gradient zeroing, for example:
    class Step(tt.steps.Step):
        def forward(self, module, sample):
            # Your forward step here
            ...
            return loss, predictions

    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network)
        ** tt.pytorch.Backward()
        ** tt.pytorch.Optimize(optimizer)
        ** tt.pytorch.Detach()
    )

    evaluation = (
        Step(criterion, gradient=False, device=device)
        ** tt.Select(predictions=1)
        ** tt.callbacks.Log(writer, "Predicted")
    )
Some other operations are also simplified (e.g. gradient accumulation); see torchtraining.pytorch.Optimize.
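For instance, a sketch of gradient accumulation over 4 batches, reusing the Step, criterion, network, optimizer and device objects from the example above:

    # Accumulate gradients over 4 batches before each optimizer step.
    # `Step`, `criterion`, `network`, `optimizer` and `device` are assumed
    # to be defined as in the introductory example.
    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network, accumulate=4)
        ** tt.pytorch.Backward(accumulate=4)
        ** tt.pytorch.Optimize(optimizer, accumulate=4)
        ** tt.pytorch.Detach()
    )

Passing the same accumulate value to ZeroGrad, Backward and Optimize is presumably what keeps loss scaling and accumulation consistent across the pipe.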
class torchtraining.pytorch.Backward(scaler=None, accumulate: int = 1, gradient: torch.Tensor = None)

Run backpropagation on output tensor.

- Parameters
  - scaler (torch.cuda.amp.GradScaler, optional) – Gradient scaler used for automatic mixed precision mode.
  - accumulate (int, optional) – Divide loss by accumulate if gradient accumulation is used. This approach averages the gradient over multiple batches. Default: 1 (no accumulation)
  - gradient (torch.Tensor, optional) – Tensor used as the initial value for backpropagation. If unspecified, torch.tensor([1.0]) is used (just like a plain tensor.backward() call).
- Returns
  Tensor after backward (possibly scaled by accumulate)
- Return type
  torch.Tensor

forward(data)

- Parameters
  - data (torch.Tensor) – Tensor on which backward will be run (possibly accumulated). Usually the loss value.
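As a sketch, Backward can also start backpropagation from a custom gradient tensor instead of the default torch.tensor([1.0]); the value 0.5 below is purely illustrative, and the surrounding objects are assumed to be defined as in the introductory example:

    import torch
    import torchtraining as tt

    # Backpropagate starting from a halved initial gradient (illustrative choice).
    backward = tt.pytorch.Backward(gradient=torch.tensor([0.5]))

    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network)
        ** backward
        ** tt.pytorch.Optimize(optimizer)
        ** tt.pytorch.Detach()
    )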
class torchtraining.pytorch.Detach

Returns a new tensor, detached from the current graph.

Note

IMPORTANT: This operation should be used before accumulating values after iteration in order not to grow the backpropagation graph.

- Returns
  Detached tensor
- Return type
  torch.Tensor
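For instance, placing Detach before a logging callback ensures downstream objects receive a graph-free tensor (a sketch reusing names from the introductory example):

    # Detach the loss so the logger does not keep the whole
    # backpropagation graph alive between iterations.
    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network)
        ** tt.pytorch.Backward()
        ** tt.pytorch.Optimize(optimizer)
        ** tt.pytorch.Detach()
        ** tt.callbacks.Log(writer, "Loss")
    )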
class torchtraining.pytorch.Optimize(optimizer, accumulate: int = 1, closure=None, scaler=None, *args, **kwargs)

Perform optimization step on the parameters stored by optimizer.

Currently specifying closure and scaler is mutually exclusive.

- Parameters
  - optimizer (torch.optim.Optimizer) – Instance of an optimizer-like object with an interface aligned with torch.optim.Optimizer.
  - accumulate (int, optional) – Divide loss by accumulate if gradient accumulation is used. This approach averages the gradient over multiple batches. Default: 1 (no accumulation)
  - closure (Callable, optional) – A closure that reevaluates the model and returns the loss. Optional for most optimizers. Default: None
  - scaler (torch.cuda.amp.GradScaler, optional) – Gradient scaler used for automatic mixed precision mode. Default: None
  - *args – Arguments passed to either scaler.step (if specified) or optimizer.step
  - **kwargs – Keyword arguments passed to either scaler.step (if specified) or optimizer.step
- Returns
  Anything passed to forward.
- Return type
  Any
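A minimal sketch of the closure argument with an optimizer that re-evaluates the model (e.g. torch.optim.LBFGS); every object below is an illustrative assumption, not part of this module:

    import torch
    import torchtraining as tt

    network = torch.nn.Linear(10, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.LBFGS(network.parameters())
    inputs, targets = torch.rand(16, 10), torch.rand(16, 1)

    def closure():
        # Standard PyTorch closure: re-evaluate the model and return the loss.
        optimizer.zero_grad()
        loss = criterion(network(inputs), targets)
        loss.backward()
        return loss

    # The closure is forwarded to the optimizer's step; remember it cannot
    # be combined with `scaler` (mutually exclusive).
    optimize = tt.pytorch.Optimize(optimizer, closure=closure)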
class torchtraining.pytorch.Schedule(scheduler, use_data: bool = False)

Run a single step of the given scheduler.

Usually placed after each step or iteration (depending on the provided scheduler instance).

- Parameters
  - scheduler (torch.optim.lr_scheduler._LRScheduler) – Instance of a scheduler-like object with an interface aligned with the torch.optim.lr_scheduler._LRScheduler base class.
  - use_data (bool) – Whether input data should be used when stepping the scheduler.
- Returns
  Value passed to the function initially
- Return type
  Any
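As an example, a step-based scheduler versus a metric-driven one (a sketch; optimizer is assumed to exist, and use_data=True for ReduceLROnPlateau is an assumption based on the parameter description):

    import torch
    import torchtraining as tt

    # Step-based scheduler: the piped data is ignored.
    step_lr = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30)
    schedule = tt.pytorch.Schedule(step_lr)

    # Metric-driven scheduler: presumably needs the piped value
    # (e.g. a validation loss), hence use_data=True (assumption).
    plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    schedule_on_metric = tt.pytorch.Schedule(plateau, use_data=True)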
class torchtraining.pytorch.UpdateGradScaler(scaler)

Update gradient scaler used with automatic mixed precision.

- Parameters
  - scaler (torch.cuda.amp.GradScaler) – Gradient scaler used for automatic mixed precision mode.
- Returns
  Anything passed to forward.
- Return type
  Any
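A sketch of a full automatic mixed precision pipe, where the same scaler is shared by Backward, Optimize and UpdateGradScaler and updated once per step, following the usual GradScaler scale/step/update order (names reused from the introductory example):

    import torch

    scaler = torch.cuda.amp.GradScaler()

    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network)
        ** tt.pytorch.Backward(scaler=scaler)
        ** tt.pytorch.Optimize(optimizer, scaler=scaler)
        ** tt.pytorch.UpdateGradScaler(scaler)
        ** tt.pytorch.Detach()
    )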
class torchtraining.pytorch.ZeroGrad(obj, accumulate: int = 1)

Zero model or optimizer gradients.

Function zero_grad() will be run on the provided object. Usually called after every step (or after multiple steps, see the accumulate argument).
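For instance (a sketch; network and optimizer are assumed to be defined by the user):

    # Zero gradients through the model...
    zero_model = tt.pytorch.ZeroGrad(network)

    # ...or through the optimizer, presumably zeroing only every 4 steps
    # when gradients are accumulated (based on the description above).
    zero_optimizer = tt.pytorch.ZeroGrad(optimizer, accumulate=4)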