torchdata.modifiers

This module allows you to modify behaviour of torchdata.cachers.

To cache only the first 20 samples in memory (assuming you have already created a torchdata.Dataset instance named dataset):

dataset.cache(td.modifiers.UpToIndex(20, td.cachers.Memory()))

Modifiers can also be combined using the logical operators | (or) and & (and).

Example (cache in memory the first 20 samples or samples with index 1000 and upwards):

dataset.cache(
    td.modifiers.UpToIndex(20, td.cachers.Memory())
    | td.modifiers.FromIndex(1000, td.cachers.Memory())
)

You can mix the provided modifiers or extend them by inheriting from Modifier and implementing the condition method (interface described below).

In most cases the Lambda modifier should be sufficient, for example:

# `cacher` is any cacher instance, e.g. td.cachers.Memory()
# Cache only elements up to the `25th` whose index is divisible by `2`
dataset = dataset.cache(
    td.modifiers.UpToIndex(25, cacher)
    & td.modifiers.Lambda(lambda index: index % 2 == 0, cacher)
)
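
The two combined examples above can be read as plain predicates on the sample index. The sketch below reproduces that logic without torchdata. The helper names are hypothetical, and the boundary handling (UpToIndex excluding the given index, FromIndex including it) is an assumption rather than documented behaviour.

```python
# Hypothetical stand-ins for the index conditions; not torchdata's API.
def up_to_index(limit):
    # Assumption: indices strictly below `limit` are selected.
    return lambda index: index < limit


def from_index(start):
    # Assumption: `start` itself and everything above it are selected.
    return lambda index: index >= start


def any_of(*conditions):
    # Mirrors `|`: a sample is cached if at least one condition holds.
    return lambda index: any(condition(index) for condition in conditions)


def all_of(*conditions):
    # Mirrors `&`: a sample is cached only if every condition holds.
    return lambda index: all(condition(index) for condition in conditions)


first_or_late = any_of(up_to_index(20), from_index(1000))
early_and_even = all_of(up_to_index(25), lambda index: index % 2 == 0)

# Evaluate both predicates over a pretend dataset of 1005 samples.
cached_by_or = [i for i in range(1005) if first_or_late(i)]
cached_by_and = [i for i in range(1005) if early_and_even(i)]

print(len(cached_by_or))  # 25 (indices 0-19 and 1000-1004)
print(cached_by_and)      # even indices 0, 2, ..., 24
```

Under these assumptions the | example selects 25 of the 1005 indices, while the & example selects only the even indices below 25.
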
class torchdata.modifiers.All(*modifiers)[source]

Return True if all modifiers return True on given sample.

Parameters

*modifiers (List[torchdata.modifiers.Modifier]) – List of modifiers

class torchdata.modifiers.Any(*modifiers)[source]

Return True if any modifier returns True on given sample.

Parameters

*modifiers (List[torchdata.modifiers.Modifier]) – List of modifiers

class torchdata.modifiers.FromIndex(index: int, cacher)[source]

Cache samples from the specified index onwards, leaving the rest untouched.

Parameters
  • index (int) – Index of sample

  • cacher (torchdata.cacher.Cacher) – Instance of cacher

class torchdata.modifiers.FromPercentage(p: float, length: int, cacher)[source]

Cache samples from the specified percentage of the dataset onwards, leaving the rest untouched.

Parameters
  • p (float) – Percentage specified as a float in the range [0, 1].

  • length (int) – How many samples are in dataset. You can pass len(dataset).

  • cacher (torchdata.cacher.Cacher) – Instance of cacher
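
As an illustration of how p and length relate, the sketch below converts a fraction into an index threshold in plain Python. The helper percentage_to_index is hypothetical, and truncating the fractional part is an assumption; the rounding the library actually applies is not stated here.

```python
def percentage_to_index(p: float, length: int) -> int:
    """Hypothetical helper: convert a fraction of the dataset into a sample index."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Assumption: the fractional part is truncated.
    return int(p * length)


length = 1000  # e.g. len(dataset)
threshold = percentage_to_index(0.25, length)

# Indices that a "from percentage" condition would select under this assumption.
from_quarter = [i for i in range(length) if i >= threshold]

print(threshold)          # 250
print(len(from_quarter))  # 750
```

An "up to percentage" condition would mirror this by selecting the indices below the same threshold.
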

class torchdata.modifiers.Indices(cacher, *indices)[source]

Cache samples if index is one of specified.

Parameters
  • cacher (torchdata.cacher.Cacher) – Instance of cacher

  • *indices (int) – Indices of samples to cache

class torchdata.modifiers.Lambda(function: Callable, cacher)[source]

Cache samples if specified function returns True.

Parameters
  • function (Callable) – Single-argument callable; if it returns True, the sample is cached. The index of the sample is passed as the argument.

  • cacher (torchdata.cacher.Cacher) – Instance of cacher

class torchdata.modifiers.Modifier[source]

Interface for all modifiers.

Most methods are pre-configured, so users should not override them. In fact, only condition has to be overridden and __init__ implemented. The constructor should assign cacher to self for everything to work; see the example below.

Example implementation of a modifier caching only elements with index 0 to 99 of any td.cacher.Cacher:

import torchdata as td

class ExampleModifier(td.modifiers.Modifier):

    # You have to assign cacher to self.cacher so modifier works.
    def __init__(self, cacher):
        self.cacher = cacher

    def condition(self, index):
        return index < 100 # Cache if index smaller than 100
__and__(other)[source]

If both self and other return True, then use cacher.

Important: self and other should wrap the same cacher. The cacher of the first modifier is used no matter what.

Parameters

other (Modifier) – Another modifier

Returns

Modifier concatenating both modifiers.

Return type

All

__contains__(index: int) → bool[source]

Acts as an invisible proxy for the cacher’s __contains__ method.

Users should not override this method. For more information, check the torchdata.cacher.Cacher interface.

Parameters

index (int) – Index of sample

__getitem__(index: int)[source]

Acts as an invisible proxy for the cacher’s __getitem__ method.

Users should not override this method. For more information, check the torchdata.cacher.Cacher interface.

Parameters

index (int) – Index of sample

__or__(other)[source]

If self or other returns True, then use cacher.

Users should not override this method.

Important: self and other should wrap the same cacher; otherwise an exception is thrown. The cacher of the first modifier is used in such a case.

Parameters

other (Modifier) – Another modifier

Returns

Modifier concatenating both modifiers.

Return type

Any

__setitem__(index: int, data: Any) → None[source]

Acts as an invisible proxy for the cacher’s __setitem__ method.

Users should not override this method. For more information, check the torchdata.cacher.Cacher interface.

Parameters
  • index (int) – Index of sample

  • data (typing.Any) – Data generated by dataset.
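
To show how the pieces of this interface could fit together, here is a minimal, self-contained sketch written without torchdata. All class names (SketchModifier, SketchAll, SketchAny, FirstHundred, Even) are hypothetical, a plain dict stands in for a cacher, and silently skipping writes for rejected indices is an assumption; the library's own implementation may differ.

```python
class SketchModifier:
    """Hypothetical stand-in for the Modifier interface; not torchdata's source."""

    def __init__(self, cacher):
        # The cacher must be stored on self, as the interface requires.
        self.cacher = cacher

    def condition(self, index):
        raise NotImplementedError

    def __contains__(self, index):
        # Only consult the cacher for indices this modifier accepts.
        return self.condition(index) and index in self.cacher

    def __getitem__(self, index):
        return self.cacher[index]

    def __setitem__(self, index, data):
        # Assumption: writes for rejected indices are silently skipped.
        if self.condition(index):
            self.cacher[index] = data

    def __and__(self, other):
        return SketchAll(self, other)

    def __or__(self, other):
        return SketchAny(self, other)


class SketchMix(SketchModifier):
    def __init__(self, *modifiers):
        # The cacher of the first modifier is the one that gets used.
        super().__init__(modifiers[0].cacher)
        self.modifiers = modifiers


class SketchAll(SketchMix):
    def condition(self, index):
        return all(m.condition(index) for m in self.modifiers)


class SketchAny(SketchMix):
    def condition(self, index):
        return any(m.condition(index) for m in self.modifiers)


class FirstHundred(SketchModifier):
    def condition(self, index):
        return index < 100


class Even(SketchModifier):
    def condition(self, index):
        return index % 2 == 0


storage = {}  # a plain dict standing in for a cacher
modifier = FirstHundred(storage) & Even(storage)

for index in range(200):
    modifier[index] = f"sample-{index}"

print(len(storage))         # 50
print(sorted(storage)[:3])  # [0, 2, 4]
print(4 in modifier, 5 in modifier, 150 in modifier)  # True False False
```

Only condition differs between the concrete modifiers in this sketch, which matches the guidance that the remaining methods should not be overridden.
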

class torchdata.modifiers.UpToIndex(index: int, cacher)[source]

Cache samples up to the specified index, leaving the rest untouched.

Parameters
  • index (int) – Index of sample

  • cacher (torchdata.cacher.Cacher) – Instance of cacher

class torchdata.modifiers.UpToPercentage(p: float, length: int, cacher)[source]

Cache samples up to the specified percentage of the dataset, leaving the rest untouched.

Parameters
  • p (float) – Percentage specified as a float in the range [0, 1].

  • length (int) – How many samples are in dataset. You can pass len(dataset).

  • cacher (torchdata.cacher.Cacher) – Instance of cacher