torchdata.modifiers

This module allows you to modify the behaviour of torchdata.cachers.

To cache only the first 20 samples in memory, you could do the following (assuming you have already created a torchdata.Dataset instance named dataset):

dataset.cache(td.modifiers.UpToIndex(20, td.cachers.Memory()))

Modifiers can also be combined intuitively using the logical operators | (or) and & (and).

Example (cache in memory the first 20 samples or samples with index 1000 and upwards):

dataset.cache(
    td.modifiers.UpToIndex(20, td.cachers.Memory())
    | td.modifiers.FromIndex(1000, td.cachers.Memory())
)

You can mix the provided modifiers or extend them by inheriting from Modifier and implementing the condition method (interface described below).

In most cases the Lambda modifier should be sufficient, for example:

# Only elements up to the `25th` whose index is also divisible by `2`
dataset = dataset.cache(
    td.modifiers.UpToIndex(25, cacher)
    & td.modifiers.Lambda(lambda index: index % 2 == 0, cacher)
)
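
To make the two combined examples above concrete, the following sketch evaluates the same conditions in plain Python, without torchdata. The helper names up_to_index, from_index and selected are hypothetical, and the boundaries (index < 20, index >= 1000) are assumptions read off the example descriptions ("first 20", "1000 and upwards") rather than taken from the library source.

```python
# Plain-Python stand-ins for the index conditions; this is not torchdata code.
def up_to_index(limit):
    # Assumed boundary: indices 0 .. limit - 1 (the first `limit` samples).
    return lambda index: index < limit


def from_index(start):
    # Assumed boundary: `start` itself is included ("start and upwards").
    return lambda index: index >= start


def selected(condition, length):
    """Return the indices in range(length) for which condition holds."""
    return [index for index in range(length) if condition(index)]


first_20 = up_to_index(20)
tail = from_index(1000)
below_25 = up_to_index(25)
even = lambda index: index % 2 == 0

# `|` example: first 20 samples or samples from index 1000 upwards.
either = selected(lambda i: first_20(i) or tail(i), 1003)

# `&` example: below index 25 and divisible by 2.
both = selected(lambda i: below_25(i) and even(i), 1003)
```

With a hypothetical dataset of 1003 samples, either holds indices 0 to 19 plus 1000 to 1002, while both holds the even indices from 0 to 24.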

class torchdata.modifiers.All(*modifiers)

Return True if all modifiers return True on given sample.

Parameters: *modifiers (List[torchdata.modifiers.Modifier]) – List of modifiers

class torchdata.modifiers.Any(*modifiers)

Return True if any modifier returns True on given sample.

Parameters: *modifiers (List[torchdata.modifiers.Modifier]) – List of modifiers
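
The reduction that All and Any describe can be sketched with Python's built-in all() and any(). The classes below (SketchCondition, SketchAll, SketchAny) are hypothetical stand-ins that model only the condition logic; they ignore cachers entirely and are not the library's implementation.

```python
class SketchCondition:
    """Hypothetical stand-in exposing only a `condition` method."""

    def __init__(self, function):
        self.function = function

    def condition(self, index):
        return self.function(index)


class SketchAll:
    def __init__(self, *modifiers):
        self.modifiers = modifiers

    def condition(self, index):
        # True only when every wrapped condition accepts the index.
        return all(m.condition(index) for m in self.modifiers)


class SketchAny:
    def __init__(self, *modifiers):
        self.modifiers = modifiers

    def condition(self, index):
        # True as soon as one wrapped condition accepts the index.
        return any(m.condition(index) for m in self.modifiers)


small = SketchCondition(lambda index: index < 10)
even = SketchCondition(lambda index: index % 2 == 0)
multiple_of_3 = SketchCondition(lambda index: index % 3 == 0)

all_three = SketchAll(small, even, multiple_of_3)
any_of_three = SketchAny(small, even, multiple_of_3)
```

Because both classes take a variable number of arguments, three or more conditions can be combined in a single call instead of chaining operators.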

class torchdata.modifiers.FromIndex(index: int, cacher)

Cache samples from the specified index onwards, leaving the rest untouched.

Parameters

index (int) – Index of sample
cacher (torchdata.cachers.Cacher) – Instance of cacher

class torchdata.modifiers.FromPercentage(p: float, length: int, cacher)

Cache samples from the specified percentage of the dataset onwards, leaving the rest untouched.

Parameters

p (float) – Percentage specified as a float in the range [0, 1].
length (int) – How many samples are in dataset. You can pass len(dataset).
cacher (torchdata.cachers.Cacher) – Instance of cacher
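
As a worked example of how p and length might translate into a starting index, consider the sketch below. The helper percentage_to_index is hypothetical, and the truncation with int() is an assumption: this page does not state how the library rounds the boundary.

```python
def percentage_to_index(p, length):
    """Hypothetical helper: map a fraction in [0, 1] to a sample index.

    Assumes truncation towards zero; the library may round differently.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    return int(p * length)


length = 200  # stands in for len(dataset)
start = percentage_to_index(0.75, length)

# Under this assumption, FromPercentage(0.75, length, cacher) would
# cover the last quarter of the dataset.
covered = [index for index in range(length) if index >= start]
```

Here start is 150, so 50 of the 200 samples would be eligible for caching.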

class torchdata.modifiers.Indices(cacher, *indices)

Cache a sample if its index is one of those specified.

Parameters

cacher (torchdata.cachers.Cacher) – Instance of cacher
*indices (int) – Indices of samples to cache

class torchdata.modifiers.Lambda(function: Callable, cacher)

Cache samples if the specified function returns True.

Parameters

function (Callable) – Single-argument callable; if it returns True, the sample is cached. The index of the sample is passed as the argument.
cacher (torchdata.cachers.Cacher) – Instance of cacher

class torchdata.modifiers.Modifier

Interface for all modifiers.

Most methods are pre-configured, so the user should not override them. In fact, only condition has to be overridden and __init__ implemented. The constructor should assign the cacher to self.cacher in order for everything to work; see the example below.

Example implementation of a modifier that caches only elements 0 to 99 with any td.cachers.Cacher:

import torchdata as td

class ExampleModifier(td.modifiers.Modifier):

    # You have to assign cacher to self.cacher so modifier works.
    def __init__(self, cacher):
        self.cacher = cacher

    def condition(self, index):
        return index < 100 # Cache if index smaller than 100
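
Why the constructor must set self.cacher can be seen in a small stand-alone sketch. SketchModifier below is a hypothetical base class that guesses at the mechanism (the proxy methods forward to self.cacher only when condition accepts the index), and a plain dict plays the role of the cacher. It is not the library's source.

```python
class SketchModifier:
    """Hypothetical base class; mimics the documented proxy methods."""

    def condition(self, index):
        raise NotImplementedError

    def __contains__(self, index):
        # Assumed: indices rejected by the condition never count as cached.
        return self.condition(index) and index in self.cacher

    def __getitem__(self, index):
        return self.cacher[index]

    def __setitem__(self, index, data):
        # Assumed: data for rejected indices is silently dropped.
        if self.condition(index):
            self.cacher[index] = data


class ExampleSketch(SketchModifier):
    def __init__(self, cacher):
        self.cacher = cacher  # the proxy methods above rely on this

    def condition(self, index):
        return index < 100


storage = {}  # a dict stands in for a real cacher
modifier = ExampleSketch(storage)
modifier[5] = "sample 5"
modifier[250] = "sample 250"  # rejected by condition, never stored
```

After these two writes only index 5 ends up in storage; without the self.cacher assignment every proxy method would fail with an AttributeError.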

__and__(other)

If both self and other return True, then use the cacher.

Important: self and other should wrap the same cacher. The cacher of the first modifier is used no matter what.

Parameters: other (Modifier) – Another modifier
Returns: Modifier concatenating both modifiers.
Return type: All
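
The note about the first modifier's cacher can be illustrated with a stand-in class. GuardedCondition is hypothetical, and for brevity its __and__ returns another GuardedCondition rather than an All instance as the library does; only the "left operand's cacher is kept" behaviour described above is modelled.

```python
class GuardedCondition:
    """Hypothetical modifier: a condition plus the cacher it guards."""

    def __init__(self, function, cacher):
        self.function = function
        self.cacher = cacher

    def condition(self, index):
        return self.function(index)

    def __and__(self, other):
        # Combined condition; the left operand's cacher is kept.
        return GuardedCondition(
            lambda index: self.condition(index) and other.condition(index),
            self.cacher,
        )


first_cache, second_cache = {}, {}
left = GuardedCondition(lambda index: index < 25, first_cache)
right = GuardedCondition(lambda index: index % 2 == 0, second_cache)

combined = left & right
```

In this sketch second_cache is never used by combined, which is why wrapping the same cacher in both operands avoids surprises.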

__contains__(index: int) → bool

Acts as an invisible proxy for the cacher’s __contains__ method.

The user should not override this method. For more information, check the torchdata.cachers.Cacher interface.

Parameters: index (int) – Index of sample

__getitem__(index: int)

Acts as an invisible proxy for the cacher’s __getitem__ method.

The user should not override this method. For more information, check the torchdata.cachers.Cacher interface.

Parameters: index (int) – Index of sample

__or__(other)

If self or other returns True, then use the cacher.

The user should not override this method.

Important: self and other should wrap the same cacher. Otherwise an exception is thrown. The cacher of the first modifier is used in such a case.

Parameters: other (Modifier) – Another modifier
Returns: Modifier concatenating both modifiers.
Return type: Any

__setitem__(index: int, data: Any) → None

Acts as an invisible proxy for the cacher’s __setitem__ method.

The user should not override this method. For more information, check the torchdata.cachers.Cacher interface.

Parameters

index (int) – Index of sample
data (typing.Any) – Data generated by dataset.

class torchdata.modifiers.UpToIndex(index: int, cacher)

Cache samples up to the specified index, leaving the rest untouched.

Parameters

index (int) – Index of sample
cacher (torchdata.cachers.Cacher) – Instance of cacher

class torchdata.modifiers.UpToPercentage(p: float, length: int, cacher)

Cache samples up to the specified percentage of the dataset, leaving the rest untouched.

Parameters

p (float) – Percentage specified as a float in the range [0, 1].
length (int) – How many samples are in dataset. You can pass len(dataset).
cacher (torchdata.cachers.Cacher) – Instance of cacher