
torchtraining.pytorch

This module provides standard PyTorch operations (like backward) in a functional manner.

Note

IMPORTANT: This module is used almost all the time, so be sure to understand how it works.

It allows users to run a single training or evaluation step using PyTorch’s optimizer, backward call, or gradient zeroing, for example:

import torchtraining as tt

class Step(tt.steps.Step):
    def forward(self, module, sample):
        # Your forward step here
        ...
        return loss, predictions

training = (
    Step(criterion, gradient=True, device=device)
    ** tt.Select(loss=0)
    ** tt.pytorch.ZeroGrad(network)
    ** tt.pytorch.Backward()
    ** tt.pytorch.Optimize(optimizer)
    ** tt.pytorch.Detach()
)

evaluation = (
    Step(criterion, gradient=False, device=device)
    ** tt.Select(predictions=1)
    ** tt.callbacks.Log(writer, "Predicted")
)

Some other operations are also simplified (e.g. gradient accumulation); see torchtraining.pytorch.Optimize and the sketch below.
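
A gradient accumulation setup might look like the following (a minimal sketch reusing the objects from the example above; keeping accumulate identical across the three operations is an assumption based on their parameter descriptions):

# Accumulate gradients over 4 batches before stepping and zeroing
accumulate = 4

training = (
    Step(criterion, gradient=True, device=device)
    ** tt.Select(loss=0)
    ** tt.pytorch.ZeroGrad(network, accumulate=accumulate)
    ** tt.pytorch.Backward(accumulate=accumulate)
    ** tt.pytorch.Optimize(optimizer, accumulate=accumulate)
    ** tt.pytorch.Detach()
)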

class torchtraining.pytorch.Backward(scaler=None, accumulate: int = 1, gradient: torch.Tensor = None)[source]

Run backpropagation on output tensor.

Parameters
  • scaler (torch.cuda.amp.GradScaler, optional) – Gradient scaler used for automatic mixed precision mode.

  • accumulate (int, optional) – Divide loss by accumulate if gradient accumulation is used; this averages gradients from multiple batches. Default: 1 (no accumulation)

  • gradient (torch.Tensor, optional) – Tensor used as the initial gradient for backpropagation. If unspecified, torch.tensor([1.0]) is used, just like a plain tensor.backward() call. Default: None

Returns

Tensor after backward (possibly scaled by accumulate)

Return type

torch.Tensor

forward(data)[source]
Parameters

data (torch.Tensor) – Tensor on which backward will be run (possibly accumulated). Usually the loss value.
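
In plain PyTorch terms, the core of this operation is roughly the following (a self-contained sketch based on the parameter descriptions above, not the exact implementation):

import torch

# Rough equivalent of tt.pytorch.Backward(accumulate=4) without a scaler;
# dividing the loss averages gradients over the accumulated batches
accumulate = 4
x = torch.ones(3, requires_grad=True)
loss = (x * 2.0).sum()

(loss / accumulate).backward()
print(x.grad)  # tensor([0.5000, 0.5000, 0.5000]) instead of tensor([2., 2., 2.])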

class torchtraining.pytorch.Detach[source]

Returns a new Tensor, detached from the current graph.

Note

IMPORTANT: This operation should be used before accumulating values across iterations in order not to grow the backpropagation graph.

Returns

Detached tensor

Return type

torch.Tensor

forward(data)[source]
Parameters

data (torch.Tensor) – Tensor to be detached (a new Tensor is returned).
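
The note above is easiest to see in plain PyTorch (a minimal sketch of why detaching matters when accumulating values across iterations):

import torch

x = torch.ones(3, requires_grad=True)
running_loss = torch.tensor(0.0)

for _ in range(10):
    loss = (x * 2.0).sum()
    # Without .detach() every iteration's graph would stay alive through
    # running_loss, growing memory usage; detaching keeps only the value.
    running_loss += loss.detach()

print(running_loss)  # tensor(60.)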

class torchtraining.pytorch.Optimize(optimizer, accumulate: int = 1, closure=None, scaler=None, *args, **kwargs)[source]

Perform optimization step on parameters stored by optimizer.

Currently, specifying closure and scaler is mutually exclusive (pass at most one of them).

Parameters
  • optimizer (torch.optim.Optimizer) – Instance of optimizer-like object with interface aligned with torch.optim.Optimizer.

  • accumulate (int, optional) – Step the optimizer only once every accumulate iterations when gradient accumulation is used. Default: 1 (no accumulation)

  • closure (Callable, optional) – A closure that reevaluates the model and returns the loss. Optional for most optimizers. Default: None

  • scaler (torch.cuda.amp.GradScaler, optional) – Gradient scaler used for automatic mixed precision mode. Default: None

  • *args – Arguments passed to either scaler.step (if specified) or optimizer.step

  • **kwargs – Keyword arguments passed to either scaler.step (if specified) or optimizer.step

Returns

Anything passed to forward.

Return type

Any

forward(data)[source]
Parameters

data (Any) – Anything, as it does not influence this operation.
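
In plain PyTorch terms the step performed here is roughly the following (a sketch inferred from the parameter descriptions above; the exact internals of Optimize may differ):

import torch

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.ones(1, 2)).sum().backward()
optimizer.step()  # what this operation runs when no scaler is given
# with a scaler it runs scaler.step(optimizer, *args, **kwargs) instead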

class torchtraining.pytorch.Schedule(scheduler, use_data: bool = False)[source]

Run single step of given scheduler.

Usually placed after each step or iteration (depending on the provided scheduler instance).

Parameters
  • scheduler (torch.optim.lr_scheduler._LRScheduler) – Instance of scheduler-like object with interface aligned with the torch.optim.lr_scheduler._LRScheduler base class

  • use_data (bool, optional) – Whether input data should be used when stepping the scheduler. Default: False

Returns

Value passed to forward initially

Return type

torch.Tensor

forward(data)[source]
Parameters

data (torch.Tensor) – Tensor which is optionally used to step the scheduler.
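
The use_data flag matters for schedulers like torch.optim.lr_scheduler.ReduceLROnPlateau, whose step expects a metric. A sketch follows (that use_data=True forwards the incoming tensor to scheduler.step is an assumption based on the description above):

import torch
import torchtraining as tt

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# StepLR steps unconditionally, so use_data can stay False
every_step = tt.pytorch.Schedule(
    torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
)

# ReduceLROnPlateau needs a metric (e.g. validation loss) when stepping,
# hence use_data=True so the incoming tensor reaches scheduler.step
on_plateau = tt.pytorch.Schedule(
    torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer), use_data=True
)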

class torchtraining.pytorch.UpdateGradScaler(scaler)[source]

Update gradient scaler used with automatic mixed precision.

Parameters

scaler (torch.cuda.amp.GradScaler) – Gradient scaler used for automatic mixed precision mode.

Returns

Anything passed to forward.

Return type

Any

forward(data)[source]
Parameters

data (Any) – Anything, as it does not influence this operation.
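
Putting the mixed precision pieces together, an AMP training step might look like this (a minimal sketch reusing the objects from the introductory example; the ordering mirrors the usual scale, step, update flow of torch.cuda.amp.GradScaler):

scaler = torch.cuda.amp.GradScaler()

amp_training = (
    Step(criterion, gradient=True, device=device)
    ** tt.Select(loss=0)
    ** tt.pytorch.ZeroGrad(network)
    ** tt.pytorch.Backward(scaler=scaler)
    ** tt.pytorch.Optimize(optimizer, scaler=scaler)
    ** tt.pytorch.UpdateGradScaler(scaler)
    ** tt.pytorch.Detach()
)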

class torchtraining.pytorch.ZeroGrad(obj, accumulate: int = 1)[source]

Zero model or optimizer gradients.

The zero_grad() method will be run on the provided object. Usually called after every step (or after multiple steps; see the accumulate argument).

Parameters
  • obj (torch.optim.Optimizer | torch.nn.Module) – Object whose gradients will be zeroed.

  • accumulate (int, optional) – Accumulate gradients for the specified number of iterations before zeroing them out. Default: 1 (no accumulation)

Returns

Anything passed to forward.

Return type

Any

forward(data)[source]
Parameters

data (Any) – Anything, as it does not influence this operation.
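
Both accepted types expose zero_grad(), which is what this operation calls (a plain PyTorch sketch of the equivalent calls):

import torch

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.ones(1, 2)).sum().backward()

optimizer.zero_grad()  # ZeroGrad(optimizer) runs this...
model.zero_grad()      # ...while ZeroGrad(model) runs this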