torchdata.maps

This module provides callables that can be passed to the torchdata.Dataset.map method.

The following dataset object is used throughout this documentation for brevity (unless another one is defined explicitly):

# Simple dataset yielding consecutive integers
import torchdata as td

class Example(td.Dataset):
    def __init__(self, max: int):
        self.values = list(range(max))

    def __getitem__(self, index):
        return self.values[index]

    def __len__(self):
        return len(self.values)

dataset = Example(100)

The maps below are general-purpose and can be used in various scenarios.

class torchdata.maps.After(samples: int, function: Callable)

Apply function once the specified number of samples has passed.

Useful for introducing data augmentation after an initial warm-up period. If you want direct control over when function is applied to a sample, use torchdata.maps.OnSignal instead.

Example:

# After 10 samples apply lambda mapping
dataset = dataset.map(td.maps.After(10, lambda x: -x))

Parameters

samples (int) – After how many samples function will start being applied.
function (Callable) – Function to apply to sample.

Returns

Either unchanged sample or function(sample)

Return type

Union[sample, function(sample)]
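The warm-up behaviour described above can be illustrated without the library. The following is a minimal pure-Python sketch, not the torchdata implementation: the class name AfterSketch is hypothetical, and it assumes that exactly the first `samples` calls pass through unchanged.

```python
from typing import Any, Callable


class AfterSketch:
    """Hypothetical stand-in: pass samples through, then start applying `function`."""

    def __init__(self, samples: int, function: Callable[[Any], Any]):
        self.samples = samples
        self.function = function
        self._seen = 0  # number of samples processed so far

    def __call__(self, sample: Any) -> Any:
        self._seen += 1
        # Assumption of this sketch: the first `samples` calls form the warm-up period.
        if self._seen > self.samples:
            return self.function(sample)
        return sample


mapping = AfterSketch(3, lambda x: -x)
results = [mapping(value) for value in range(1, 7)]
print(results)  # [1, 2, 3, -4, -5, -6]
```

Because the counter lives on the object, the same instance keeps applying the function on every later pass over the data unless a new instance is created.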

class torchdata.maps.Drop(*indices)

Return sample without selected elements.

Sample has to be indexable object (has __getitem__ method implemented).

Important:

Negative indexing is supported if the sample object supports it.
This function is slower than Select, and the latter should be preferred.
If you want to drop elements from a nested tuple, use Flatten first.
Returns a single element if only one element is left.
Returns None if all elements are dropped.

Example:

# Sample-wise concatenate dataset three times
new_dataset = dataset | dataset | dataset
# Zeroth and last samples dropped
selected = new_dataset.map(td.maps.Drop(0, 2))

Parameters: *indices (int) – Indices of objects to remove from the sample. If left empty, tuple containing all elements will be returned.
Returns: Tuple without selected elements
Return type: Tuple[samples]
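The return rules listed above (tuple, single element, or None) are easy to misread, so here is a small pure-Python sketch of them. The helper drop_sketch is hypothetical and is not the library's code; in particular, normalising negative indices against the sample length is an assumption of this sketch.

```python
from typing import Any, Sequence


def drop_sketch(sample: Sequence[Any], *indices: int) -> Any:
    """Hypothetical helper mirroring the documented return rules of Drop."""
    # Assumption: negative indices are normalised against the sample length.
    dropped = {index % len(sample) for index in indices}
    kept = tuple(
        item for position, item in enumerate(sample) if position not in dropped
    )
    if not kept:
        return None  # every element was dropped
    if len(kept) == 1:
        return kept[0]  # a single remaining element is unpacked
    return kept


sample = (10, 20, 30)
print(drop_sketch(sample, 0, 2))     # 20
print(drop_sketch(sample, 1))        # (10, 30)
print(drop_sketch(sample, 0, 1, 2))  # None
print(drop_sketch(sample))           # (10, 20, 30)
```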

class torchdata.maps.Except(function: Callable, *indices)

Apply function to all elements of sample except the ones specified.

Sample has to be iterable object.

Important:

If you want to apply function to all nested elements (e.g. in nested tuple), please use torchdata.maps.Flatten object first.

Example:

# Sample-wise concatenate dataset three times
new_dataset = dataset | dataset | dataset
# Every element increased by one except the first one
selected = new_dataset.map(td.maps.Except(lambda x: x+1, 0))

function

Function to apply to chosen elements of sample.

Type: Callable

*indices

Indices of objects to which function will not be applied. If left empty, function will be applied to every element of sample.

Type: int

Returns: Tuple with subsamples where some have the function applied.
Return type: Tuple[function(subsample)]
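As an illustration of the semantics described above, the following pure-Python sketch applies a function to every position except the listed ones. The helper except_sketch is hypothetical and only mirrors the documented behaviour; it is not the library's implementation.

```python
from typing import Any, Callable, Iterable, Tuple


def except_sketch(
    function: Callable[[Any], Any], sample: Iterable[Any], *indices: int
) -> Tuple[Any, ...]:
    """Hypothetical helper: apply `function` everywhere except at `indices`."""
    skipped = set(indices)
    return tuple(
        item if position in skipped else function(item)
        for position, item in enumerate(sample)
    )


sample = (5, 5, 5)
print(except_sketch(lambda x: x + 1, sample, 0))  # (5, 6, 6)
print(except_sketch(lambda x: x + 1, sample))     # (6, 6, 6)
```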

class torchdata.maps.Flatten(types: Tuple = (list, tuple))

Flatten arbitrarily nested sample.

Example:

# Nest elements
dataset = dataset.map(lambda x: (x, (x, (x, x), x),))
# Flatten no matter how deep
dataset = dataset.map(td.maps.Flatten())

Parameters: types (Tuple[type], optional) – Types to be considered non-flat. Those will be recursively flattened. Default: (list, tuple)
Returns: Tuple with elements flattened
Return type: Tuple[samples]
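The role of the types parameter is easiest to see in a small recursive sketch. The function flatten_sketch below is hypothetical pure Python written for illustration under the description above; it is not the torchdata implementation.

```python
from typing import Any, Tuple


def flatten_sketch(sample: Any, types: Tuple[type, ...] = (list, tuple)) -> Tuple[Any, ...]:
    """Hypothetical helper: recursively flatten containers whose type is in `types`."""
    if not isinstance(sample, types):
        return (sample,)  # leaf: wrap so callers can always extend
    flat = []
    for item in sample:
        flat.extend(flatten_sketch(item, types))
    return tuple(flat)


nested = (1, (2, (3, 4), 5))
print(flatten_sketch(nested))  # (1, 2, 3, 4, 5)

# With types=(tuple,), lists count as flat elements and are left intact.
print(flatten_sketch((1, [2, (3,)]), types=(tuple,)))  # (1, [2, (3,)])
```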

class torchdata.maps.OnSignal(signal: Callable[[…], bool], function: Callable)

Apply function based on boolean output of signalling function.

Useful for introducing data augmentation after an initial warm-up period. You can use it to turn a specific augmentation on or off in response to external state, for example turning on image rotations after 5 epochs and turning them off 5 epochs before the end in order to fine-tune your network.

Example:

import torch
from PIL import Image

import torchdata as td
import torchvision


# Image loading dataset
class ImageDataset(td.datasets.Files):
    def __getitem__(self, index):
        return Image.open(self.files[index])


class Handle:
    def __init__(self):
        self.value: bool = False

    def __call__(self):
        return self.value

# you can change handle.value to switch whether mapping should be applied
handle = Handle()
dataset = (
    ImageDataset.from_folder("./data")
    .map(torchvision.transforms.ToTensor())
    .cache()
    # If handle returns True, mapping will be applied
    .map(
        td.maps.OnSignal(
            handle, lambda image: image + torch.rand_like(image)
        )
    )
)

Parameters

signal (Callable) – Zero-argument callable returning a boolean that indicates whether to apply function.
function (Callable) – Function to apply to sample.

Returns

Either unchanged sample or function(sample)

Return type

Union[sample, function(sample)]
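To show the switching behaviour without image data or torch, here is a minimal pure-Python sketch. The class OnSignalSketch is a hypothetical stand-in written from the description above, not the library's code; Handle follows the example earlier in this entry.

```python
from typing import Any, Callable


class Handle:
    """Mutable flag; calling it returns the current value."""

    def __init__(self) -> None:
        self.value: bool = False

    def __call__(self) -> bool:
        return self.value


class OnSignalSketch:
    """Hypothetical stand-in: apply `function` only while `signal()` is True."""

    def __init__(self, signal: Callable[[], bool], function: Callable[[Any], Any]):
        self.signal = signal
        self.function = function

    def __call__(self, sample: Any) -> Any:
        # The signal is queried on every call, so flipping it takes effect immediately.
        if self.signal():
            return self.function(sample)
        return sample


handle = Handle()
mapping = OnSignalSketch(handle, lambda x: x * 2)

before = mapping(21)   # signal is False: sample unchanged
handle.value = True
during = mapping(21)   # signal is True: function applied
handle.value = False
after = mapping(21)    # signal is False again: sample unchanged
print(before, during, after)  # 21 42 21
```

In a training loop, the flag would typically be flipped between epochs, for instance from the code that tracks the epoch counter.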

class torchdata.maps.Repeat(n: int, function: Callable)

Apply function repeatedly to the sample.

Example:

import torchdata as td

# Creating td.Dataset instance
...
# Increase each value by 10 * 1
dataset = dataset.map(td.maps.Repeat(10, lambda x: x+1))

Parameters

n (int) – How many times the function will be applied.
function (Callable) – Function to apply.

Returns

Result of applying function to sample n times in succession.

Return type

function(sample)
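Repeated application means each call receives the previous call's output, i.e. function(function(...function(sample))). A short pure-Python sketch of that idea follows; repeat_sketch is a hypothetical helper, not the library's implementation.

```python
from typing import Any, Callable


def repeat_sketch(n: int, function: Callable[[Any], Any], sample: Any) -> Any:
    """Hypothetical helper: feed the output of `function` back into it `n` times."""
    for _ in range(n):
        sample = function(sample)
    return sample


print(repeat_sketch(10, lambda x: x + 1, 5))  # 15
print(repeat_sketch(3, lambda x: x * 2, 1))   # 8
print(repeat_sketch(0, lambda x: x + 1, 5))   # 5 (function never applied)
```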

class torchdata.maps.Select(*indices)

Select elements from sample.

Sample has to be indexable object (has __getitem__ method implemented).

Important:

Negative indexing is supported if the sample object supports it.
This function is faster than Drop and should be used if possible.
If you want to select elements from a nested tuple, use Flatten first.
Returns a single element if only one element is selected.

Example:

# Sample-wise concatenate two copies of the dataset
new_dataset = dataset | dataset
# Only second (first index) element will be taken
selected = new_dataset.map(td.maps.Select(1))

Parameters: *indices (int) – Indices of objects to select from the sample. If left empty, empty tuple will be returned.
Returns: Tuple with selected elements
Return type: Tuple[samples]
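The selection rules above can be sketched in a few lines of pure Python. The helper select_sketch is hypothetical and follows only the documented behaviour (direct indexing, single-element unpacking, empty tuple when no indices are given); it is not the torchdata implementation.

```python
from typing import Any, Sequence


def select_sketch(sample: Sequence[Any], *indices: int) -> Any:
    """Hypothetical helper: pick elements by index via the sample's __getitem__."""
    selected = tuple(sample[index] for index in indices)
    if len(selected) == 1:
        return selected[0]  # a single selected element is unpacked
    return selected


sample = (10, 20, 30)
print(select_sketch(sample, 1))      # 20
print(select_sketch(sample, 0, -1))  # (10, 30)
print(select_sketch(sample))         # ()
```

Indexing directly touches only the requested positions, which is consistent with the note that Select is faster than Drop.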

class torchdata.maps.To(function: Callable, *indices)

Apply function to specified elements of sample.

Sample has to be iterable object.

Important:

If you want to apply function to all nested elements (e.g. in nested tuple), please use torchdata.maps.Flatten object first.

Example:

# Sample-wise concatenate dataset three times
new_dataset = dataset | dataset | dataset
# Zero and first subsamples will be increased by one, last one left untouched
selected = new_dataset.map(td.maps.To(lambda x: x+1, 0, 1))

function

Function to apply to specified elements of sample.

Type: Callable

*indices

Indices to which function will be applied. If left empty, function will not be applied to anything.

Type: int

Returns: Tuple consisting of subsamples with some having the function applied.
Return type: Tuple[function(subsample)]
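To is the complement of Except: the function touches only the listed positions. The pure-Python sketch below illustrates this; to_sketch is a hypothetical helper based on the description above, not the library's code.

```python
from typing import Any, Callable, Iterable, Tuple


def to_sketch(
    function: Callable[[Any], Any], sample: Iterable[Any], *indices: int
) -> Tuple[Any, ...]:
    """Hypothetical helper: apply `function` only at the positions in `indices`."""
    chosen = set(indices)
    return tuple(
        function(item) if position in chosen else item
        for position, item in enumerate(sample)
    )


sample = (5, 5, 5)
print(to_sketch(lambda x: x + 1, sample, 0, 1))  # (6, 6, 5)
print(to_sketch(lambda x: x + 1, sample))        # (5, 5, 5)
```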

class torchdata.maps.ToAll(function: Callable)

Apply function to each element of sample.

Sample has to be iterable object.

Important:

If you want to apply function to all nested elements (e.g. in nested tuple), please use torchdata.maps.Flatten object first.

Example:

# Sample-wise concatenate dataset three times
new_dataset = dataset | dataset | dataset
# Each concatenated sample will be increased by 1
selected = new_dataset.map(td.maps.ToAll(lambda x: x+1))

function

Function to apply to each element of sample.

Type: Callable

Returns: Tuple consisting of subsamples with function applied.
Return type: Tuple[function(subsample)]
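The note about nested samples is worth a concrete illustration: the function is applied to top-level elements only, so an inner tuple is passed to it as a whole. The pure-Python sketch below uses two hypothetical helpers, to_all_sketch and flatten_sketch, which imitate the documented behaviour and are not the library's implementation.

```python
from typing import Any, Callable, Iterable, Tuple


def flatten_sketch(sample: Any, types: Tuple[type, ...] = (list, tuple)) -> Tuple[Any, ...]:
    """Hypothetical helper: recursively flatten containers whose type is in `types`."""
    if not isinstance(sample, types):
        return (sample,)
    flat = []
    for item in sample:
        flat.extend(flatten_sketch(item, types))
    return tuple(flat)


def to_all_sketch(function: Callable[[Any], Any], sample: Iterable[Any]) -> Tuple[Any, ...]:
    """Hypothetical helper: apply `function` to each top-level element."""
    return tuple(function(item) for item in sample)


nested = (1, (2, 3))

# Applied directly, the inner tuple (2, 3) reaches the function and `+ 1` fails.
try:
    to_all_sketch(lambda x: x + 1, nested)
    failed = False
except TypeError:
    failed = True

# Flattening first exposes every leaf element to the function.
incremented = to_all_sketch(lambda x: x + 1, flatten_sketch(nested))
print(failed, incremented)  # True (2, 3, 4)
```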