torchtraining.pytorch
This module provides standard PyTorch operations (like backward) in a functional manner.
Note

IMPORTANT: This module is used almost all the time, so be sure to understand how it works.

It allows users to perform a single step of training or evaluation using PyTorch's optimizer, backward pass or gradient zeroing, for example:
    class Step(tt.steps.Step):
        def forward(self, module, sample):
            # Your forward step here
            ...
            return loss, predictions

    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network)
        ** tt.pytorch.Backward()
        ** tt.pytorch.Optimize(optimizer)
        ** tt.pytorch.Detach()
    )

    evaluation = (
        Step(criterion, gradient=False, device=device)
        ** tt.Select(predictions=1)
        ** tt.callbacks.Log(writer, "Predicted")
    )
Some other operations are also simplified (e.g. gradient accumulation); see torchtraining.pytorch.Optimize.
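For instance, a sketch of gradient accumulation over 4 batches, reusing the Step, criterion, network, optimizer and device objects from the example above:

    # Accumulate gradients over 4 batches before each optimizer step.
    # `Step`, `criterion`, `network`, `optimizer` and `device` are assumed
    # to be defined as in the introductory example.
    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network, accumulate=4)
        ** tt.pytorch.Backward(accumulate=4)
        ** tt.pytorch.Optimize(optimizer, accumulate=4)
        ** tt.pytorch.Detach()
    )

Passing the same accumulate value to ZeroGrad, Backward and Optimize is presumably what keeps loss scaling and accumulation consistent across the pipe.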
class torchtraining.pytorch.Backward(scaler=None, accumulate: int = 1, gradient: torch.Tensor = None)

Run backpropagation on output tensor.

- Parameters
  - scaler (torch.cuda.amp.GradScaler, optional) – Gradient scaler used for automatic mixed precision mode.
  - accumulate (int, optional) – Divide loss by accumulate if gradient accumulation is used. This approach averages the gradient over multiple batches. Default: 1 (no accumulation)
  - gradient (torch.Tensor, optional) – Tensor used as the initial value for backpropagation. If unspecified, torch.tensor([1.0]) is used (just like a plain tensor.backward() call).
- Returns
  Tensor after backward (possibly scaled by accumulate)
- Return type
  torch.Tensor

forward(data)

- Parameters
  - data (torch.Tensor) – Tensor on which backward will be run (possibly accumulated). Usually the loss value.
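As a sketch, Backward can also start backpropagation from a custom gradient tensor instead of the default torch.tensor([1.0]); the value 0.5 below is purely illustrative, and the surrounding objects are assumed to be defined as in the introductory example:

    import torch
    import torchtraining as tt

    # Backpropagate starting from a halved initial gradient (illustrative choice).
    backward = tt.pytorch.Backward(gradient=torch.tensor([0.5]))

    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network)
        ** backward
        ** tt.pytorch.Optimize(optimizer)
        ** tt.pytorch.Detach()
    )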
class torchtraining.pytorch.Detach

Returns a new tensor, detached from the current graph.

Note

IMPORTANT: This operation should be used before accumulating values after iteration in order not to grow the backpropagation graph.

- Returns
  Detached tensor
- Return type
  torch.Tensor
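For instance, placing Detach before a logging callback ensures downstream objects receive a graph-free tensor (a sketch reusing names from the introductory example):

    # Detach the loss so the logger does not keep the whole
    # backpropagation graph alive between iterations.
    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network)
        ** tt.pytorch.Backward()
        ** tt.pytorch.Optimize(optimizer)
        ** tt.pytorch.Detach()
        ** tt.callbacks.Log(writer, "Loss")
    )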
class torchtraining.pytorch.Optimize(optimizer, accumulate: int = 1, closure=None, scaler=None, *args, **kwargs)

Perform optimization step on the parameters stored by optimizer.

Currently specifying closure and scaler is mutually exclusive.

- Parameters
  - optimizer (torch.optim.Optimizer) – Instance of an optimizer-like object with an interface aligned with torch.optim.Optimizer.
  - accumulate (int, optional) – Divide loss by accumulate if gradient accumulation is used. This approach averages the gradient over multiple batches. Default: 1 (no accumulation)
  - closure (Callable, optional) – A closure that reevaluates the model and returns the loss. Optional for most optimizers. Default: None
  - scaler (torch.cuda.amp.GradScaler, optional) – Gradient scaler used for automatic mixed precision mode. Default: None
  - *args – Arguments passed to either scaler.step (if specified) or optimizer.step
  - **kwargs – Keyword arguments passed to either scaler.step (if specified) or optimizer.step
- Returns
  Anything passed to forward.
- Return type
  Any
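A minimal sketch of the closure argument with an optimizer that re-evaluates the model (e.g. torch.optim.LBFGS); every object below is an illustrative assumption, not part of this module:

    import torch
    import torchtraining as tt

    network = torch.nn.Linear(10, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.LBFGS(network.parameters())
    inputs, targets = torch.rand(16, 10), torch.rand(16, 1)

    def closure():
        # Standard PyTorch closure: re-evaluate the model and return the loss.
        optimizer.zero_grad()
        loss = criterion(network(inputs), targets)
        loss.backward()
        return loss

    # The closure is forwarded to the optimizer's step; remember it cannot
    # be combined with `scaler` (mutually exclusive).
    optimize = tt.pytorch.Optimize(optimizer, closure=closure)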
class torchtraining.pytorch.Schedule(scheduler, use_data: bool = False)

Run a single step of the given scheduler.

Usually placed after each step or iteration (depending on the provided scheduler instance).

- Parameters
  - scheduler (torch.optim.lr_scheduler._LRScheduler) – Instance of a scheduler-like object with an interface aligned with the torch.optim.lr_scheduler._LRScheduler base class.
  - use_data (bool) – Whether input data should be used when stepping the scheduler.
- Returns
  Value passed to the function initially
- Return type
  Any
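As an example, a step-based scheduler versus a metric-driven one (a sketch; optimizer is assumed to exist, and use_data=True for ReduceLROnPlateau is an assumption based on the parameter description):

    import torch
    import torchtraining as tt

    # Step-based scheduler: the piped data is ignored.
    step_lr = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30)
    schedule = tt.pytorch.Schedule(step_lr)

    # Metric-driven scheduler: presumably needs the piped value
    # (e.g. a validation loss), hence use_data=True (assumption).
    plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    schedule_on_metric = tt.pytorch.Schedule(plateau, use_data=True)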
class torchtraining.pytorch.UpdateGradScaler(scaler)

Update gradient scaler used with automatic mixed precision.

- Parameters
  - scaler (torch.cuda.amp.GradScaler) – Gradient scaler used for automatic mixed precision mode.
- Returns
  Anything passed to forward.
- Return type
  Any
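A sketch of a full automatic mixed precision pipe, where the same scaler is shared by Backward, Optimize and UpdateGradScaler and updated once per step, following the usual GradScaler scale/step/update order (names reused from the introductory example):

    import torch

    scaler = torch.cuda.amp.GradScaler()

    training = (
        Step(criterion, gradient=True, device=device)
        ** tt.Select(loss=0)
        ** tt.pytorch.ZeroGrad(network)
        ** tt.pytorch.Backward(scaler=scaler)
        ** tt.pytorch.Optimize(optimizer, scaler=scaler)
        ** tt.pytorch.UpdateGradScaler(scaler)
        ** tt.pytorch.Detach()
    )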
class torchtraining.pytorch.ZeroGrad(obj, accumulate: int = 1)

Zero model or optimizer gradients.

Function zero_grad() will be run on the provided object. Usually called after every step (or after multiple steps, see the accumulate argument).
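For instance (a sketch; network and optimizer are assumed to be defined by the user):

    # Zero gradients through the model...
    zero_model = tt.pytorch.ZeroGrad(network)

    # ...or through the optimizer, presumably zeroing only every 4 steps
    # when gradients are accumulated (based on the description above).
    zero_optimizer = tt.pytorch.ZeroGrad(optimizer, accumulate=4)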