
torchdata.datasets

Concrete implementations of torchdata.Dataset and torchdata.Iterable.

The classes below extend torchdata functionality and/or make it easier for users to implement common functionality. To use standard PyTorch datasets defined by, for example, torchvision, you can use WrapDataset or WrapIterable like this:

import torchdata
import torchvision

dataset = torchdata.datasets.WrapDataset(
    torchvision.datasets.MNIST("./data", download=True)
)

After that you can use map, apply and other functionality like you normally would with either torchdata.Dataset or torchdata.Iterable.

class torchdata.datasets.ChainDataset(datasets)[source]

Concrete torchdata.Dataset responsible for chaining multiple datasets.

This class is returned when + (the addition operator) is used on an instance of torchdata.Dataset (the original torch.utils.data.Dataset can be used as well). It acts just like PyTorch’s +, i.e. torch.utils.data.ConcatDataset.

Important: This class is meant to be more of a proxy for the + operator, though you can use it directly.

Example:

# Iterate over 3 datasets consecutively
dataset = torchdata.ChainDataset([dataset1, dataset2, dataset3])

Any Dataset methods can be used normally.

datasets

List of datasets to be chained.

Type

List[Union[torchdata.Dataset, torch.utils.data.Dataset]]

__init__(datasets)[source]

Initialize self. See help(type(self)) for accurate signature.

class torchdata.datasets.ChainIterable(datasets)[source]

Concrete torchdata.Iterable responsible for chaining multiple datasets.

This class is returned when + (the addition operator) is used on an instance of torchdata.Iterable (the original torch.utils.data.IterableDataset can be used as well). It acts just like PyTorch’s + and ChainDataset.

Important: This class is meant to be more of a proxy for the + operator, though you can use it directly.

Example:

# Iterate over 3 iterable datasets consecutively
dataset = torchdata.ChainIterable([dataset1, dataset2, dataset3])

Any Iterable methods can be used normally.

datasets

List of datasets to be chained.

Type

List[Union[torchdata.Iterable, torch.utils.data.IterableDataset]]

__init__(datasets)[source]

Initialize self. See help(type(self)) for accurate signature.

class torchdata.datasets.ConcatDataset(datasets: List)[source]

Concrete torchdata.Dataset responsible for sample-wise concatenation.

This class is returned when | (the bitwise or operator) is used on an instance of torchdata.Dataset (the original torch.utils.data.Dataset can be used as well).

Important: This class is meant to be more of a proxy for the | operator, though you can use it directly.

Example:

dataset = (
    torchdata.ConcatDataset([dataset1, dataset2, dataset3])
    .map(lambda sample: sample[0] + sample[1] + sample[2])
)

Any Dataset methods can be used normally.

datasets

List of datasets to be concatenated sample-wise.

Type

List[Union[torchdata.Dataset, torch.utils.data.Dataset]]

__init__(datasets: List)[source]

Initialize self. See help(type(self)) for accurate signature.

class torchdata.datasets.ConcatIterable(datasets: List)[source]

Concrete torchdata.Iterable responsible for sample-wise concatenation.

This class is returned when | (the bitwise or operator) is used on an instance of torchdata.Iterable (the original torch.utils.data.IterableDataset can be used as well).

Important: This class is meant to be more of a proxy for the | operator, though you can use it directly.

Example:

dataset = (
    torchdata.ConcatIterable([dataset1, dataset2, dataset3])
    .map(lambda x, y, z: (x + y, z))
)

Any Iterable methods can be used normally.

datasets

List of datasets to be concatenated sample-wise.

Type

List[Union[torchdata.Iterable, torch.utils.data.IterableDataset]]

__init__(datasets: List)[source]

Initialize self. See help(type(self)) for accurate signature.

class torchdata.datasets.Files(files: List[pathlib.Path], *args, **kwargs)[source]

Create Dataset from list of files.

Each file is a separate sample. Users can use this class directly, as all necessary methods are implemented.

__getitem__ uses Python’s open and returns the file object. Its implementation looks like:

# You can modify open behaviour by passing args and kwargs to __init__
with open(self.files[index], *self.args, **self.kwargs) as file:
    return file

You can use the map method to modify the returned file, or overload __getitem__ (image-opening example below):

import torchdata
import torchvision

from PIL import Image


# Image loading dataset
class ImageDataset(torchdata.datasets.Files):
    def __getitem__(self, index):
        return Image.open(self.files[index])

# Useful class methods are inherited as well
dataset = ImageDataset.from_folder("./data", regex="*.png").map(
    torchvision.transforms.ToTensor()
)

The from_folder class method is available for the common case of creating a dataset from files in a folder.

Parameters
  • files (List[pathlib.Path]) – List of files to be used.

  • regex (str, optional) – Regex to be used for filtering. Default: * (all files)

  • *args – Arguments saved for __getitem__

  • **kwargs – Keyword arguments saved for __getitem__

__init__(files: List[pathlib.Path], *args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

filter(predicate: Callable)[source]

Remove files for which predicate returns False.

Note: This is different from torchdata.Iterable’s filter method, as the filtering is done when called, not during iteration.

Parameters

predicate (Callable) – Function-like object taking file as argument and returning boolean indicating whether to keep a file.

Returns

Modified self

Return type

Files

classmethod from_folder(path: pathlib.Path, regex: str = '*', *args, **kwargs)[source]

Create dataset from a pathlib.Path-like object.

The path should be a directory and will be expanded via the glob method using regex (if specified). Varargs and kwargs will be saved for use by the __getitem__ method.

Parameters
  • path (pathlib.Path) – Path object (directory) containing samples.

  • regex (str, optional) – Regex to be used for filtering. Default: * (all files)

  • *args – Arguments saved for __getitem__

  • **kwargs – Keyword arguments saved for __getitem__

Returns

Instance of your file based dataset.

Return type

Files

sort(key=None, reverse=False)[source]

Sort files using Python’s built-in sorted method.

Arguments are passed directly to sorted.

Parameters
  • key (Callable, optional) – Specifies a function of one argument that is used to extract a comparison key from each element. Default: None (compare the elements directly).

  • reverse (bool, optional) – Whether sorting should be descending. Default: False

Returns

Modified self

Return type

Files

class torchdata.datasets.Generator(expression)[source]

Iterable wrapping any generator expression.

expression

Generator from which one can yield via the yield from syntax.

Type

Generator expression

__init__(expression)[source]

Initialize self. See help(type(self)) for accurate signature.

class torchdata.datasets.TensorDataset(*tensors)[source]

Dataset wrapping torch.Tensor objects.

cache, map etc. enabled version of torch.utils.data.TensorDataset.

*tensors

List of tensors to be wrapped.

Type

torch.Tensor

__init__(*tensors)[source]

Initialize self. See help(type(self)) for accurate signature.

class torchdata.datasets.WrapDataset(dataset)[source]

Dataset wrapping standard torch.utils.data.Dataset and making it torchdata.Dataset compatible.

All attributes of wrapped dataset can be used normally, for example:

dataset = torchdata.datasets.WrapDataset(
    torchvision.datasets.MNIST("./data")
)
dataset.train  # True, has all MNIST attributes

dataset: torch.utils.data.Dataset

Dataset to be wrapped

__init__(dataset)[source]

Initialize self. See help(type(self)) for accurate signature.

class torchdata.datasets.WrapIterable(dataset)[source]

Iterable wrapping standard torch.utils.data.IterableDataset and making it torchdata.Iterable compatible.

All attributes of wrapped dataset can be used normally as is the case for torchdata.datasets.WrapDataset.

dataset: torch.utils.data.IterableDataset

Dataset to be wrapped

__init__(dataset)[source]

Initialize self. See help(type(self)) for accurate signature.