torchdata.datasets

Concrete implementations of torchdata.Dataset and torchdata.Iterable.

Classes below extend and/or make it easier for users to implement common functionalities. To use standard PyTorch datasets defined by, for example, torchvision, you can use WrapDataset or WrapIterable like this:

    import torchdata
    import torchvision

    dataset = torchdata.datasets.WrapDataset(
        torchvision.datasets.MNIST("./data", download=True)
    )

After that you can use map, apply and other functionalities like you normally would with either torchdata.Dataset or torchdata.Iterable.
class torchdata.datasets.ChainDataset(datasets)[source]

Concrete torchdata.Dataset responsible for chaining multiple datasets.

This class is returned when + (addition operator) is used on an instance of torchdata.Dataset (original torch.utils.data.Dataset can be used as well). Acts just like PyTorch's +, or rather torch.utils.data.ConcatDataset.

Important: This class is meant to be more of a proxy for the + operator; you can use it directly though.

Example:

    # Iterate over 3 datasets consecutively
    dataset = torchdata.ChainDataset([dataset1, dataset2, dataset3])

Any Dataset methods can be used normally.

- datasets
  List of datasets to be chained.
  - Type
    List[Union[torchdata.Dataset, torch.utils.data.Dataset]]
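The chaining semantics can be illustrated with a minimal plain-Python sketch, using lists in place of datasets; this is only an illustration of the behaviour, not torchdata's actual implementation:

```python
# Conceptual sketch of chaining, with plain lists standing in for datasets.
# NOT torchdata's implementation -- only an illustration of the semantics.
def chain_getitem(datasets, index):
    # Walk datasets in order, subtracting their lengths until the index fits.
    for dataset in datasets:
        if index < len(dataset):
            return dataset[index]
        index -= len(dataset)
    raise IndexError("index out of range")

first, second = [0, 1, 2], [10, 11]
# The chained dataset yields all samples of `first`, then all of `second`.
samples = [chain_getitem([first, second], i) for i in range(5)]
print(samples)  # [0, 1, 2, 10, 11]
```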
class torchdata.datasets.ChainIterable(datasets)[source]

Concrete torchdata.Iterable responsible for chaining multiple datasets.

This class is returned when + (addition operator) is used on an instance of torchdata.Iterable (original torch.utils.data.IterableDataset can be used as well). Acts just like PyTorch's + and ChainDataset.

Important: This class is meant to be more of a proxy for the + operator; you can use it directly though.

Example:

    # Iterate over 3 iterable datasets consecutively
    dataset = torchdata.ChainIterable([dataset1, dataset2, dataset3])

Any Iterable methods can be used normally.

- datasets
  List of datasets to be chained.
  - Type
    List[Union[torchdata.Iterable, torch.utils.data.IterableDataset]]
class torchdata.datasets.ConcatDataset(datasets: List)[source]

Concrete torchdata.Dataset responsible for sample-wise concatenation.

This class is returned when | (logical or operator) is used on an instance of torchdata.Dataset (original torch.utils.data.Dataset can be used as well).

Important: This class is meant to be more of a proxy for the | operator; you can use it directly though.

Example:

    dataset = torchdata.ConcatDataset([dataset1, dataset2, dataset3]).map(
        lambda sample: sample[0] + sample[1] + sample[2]
    )

Any Dataset methods can be used normally.

- datasets
  List of datasets to be concatenated sample-wise.
  - Type
    List[Union[torchdata.Dataset, torch.utils.data.Dataset]]
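Sample-wise concatenation can likewise be sketched in plain Python, with lists standing in for datasets; again this only illustrates the semantics, it is not torchdata's implementation:

```python
# Conceptual sketch of sample-wise concatenation, with plain lists standing
# in for datasets. NOT torchdata's implementation -- only the semantics.
def concat_getitem(datasets, index):
    # The i-th sample is a tuple of the i-th samples of every dataset.
    return tuple(dataset[index] for dataset in datasets)

d1, d2, d3 = [1, 2], [10, 20], [100, 200]
sample = concat_getitem([d1, d2, d3], 0)
print(sample)  # (1, 10, 100)
# This tuple is what a subsequent .map(lambda sample: sample[0] + sample[1] + sample[2])
# would receive, producing 111 here.
```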
class torchdata.datasets.ConcatIterable(datasets: List)[source]

Concrete torchdata.Iterable responsible for sample-wise concatenation.

This class is returned when | (logical or operator) is used on an instance of torchdata.Iterable (original torch.utils.data.IterableDataset can be used as well).

Important: This class is meant to be more of a proxy for the | operator; you can use it directly though.

Example:

    dataset = torchdata.ConcatIterable([dataset1, dataset2, dataset3]).map(
        lambda x, y, z: (x + y, z)
    )

Any Iterable methods can be used normally.

- datasets
  List of datasets to be concatenated sample-wise.
  - Type
    List[Union[torchdata.Iterable, torch.utils.data.IterableDataset]]
class torchdata.datasets.Files(files: List[pathlib.Path], *args, **kwargs)[source]

Create Dataset from a list of files.

Each file is a separate sample. Users can use this class directly as all necessary methods are implemented.

__getitem__ uses Python's open and returns a file. Its implementation looks like:

    # You can modify open behaviour by passing args and kwargs to __init__
    with open(self.files[index], *self.args, **self.kwargs) as file:
        return file

You can use the map method in order to modify the returned file, or you can overload __getitem__ (image opening example below):

    import torchdata
    import torchvision
    from PIL import Image

    # Image loading dataset
    class ImageDataset(torchdata.datasets.Files):
        def __getitem__(self, index):
            return Image.open(self.files[index])

    # Useful class methods are inherited as well
    dataset = ImageDataset.from_folder("./data", regex="*.png").map(
        torchvision.transforms.ToTensor()
    )

The from_folder class method is available for the common case of creating a dataset from files in a folder.

- Parameters
  - files (List[pathlib.Path]) – List of files to be used.
  - *args – Arguments saved for __getitem__
  - **kwargs – Keyword arguments saved for __getitem__

__init__(files: List[pathlib.Path], *args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.
filter(predicate: Callable)[source]

Remove files for which predicate returns False.

Note: This is different from torchdata.Iterable's filter method, as the filtering is done when called, not during iteration.

- Parameters
  - predicate (Callable) – Function-like object taking a file as argument and returning a boolean indicating whether to keep the file.
- Returns
  Modified self
- Return type
  Files
classmethod from_folder(path: pathlib.Path, regex: str = '*', *args, **kwargs)[source]

Create dataset from a pathlib.Path-like object.

Path should be a directory and will be extended via the glob method taking regex (if specified). Varargs and kwargs will be saved for use by the __getitem__ method.

- Parameters
  - path (pathlib.Path) – Path object (directory) containing samples.
  - regex (str, optional) – Regex to be used for filtering. Default: * (all files)
  - *args – Arguments saved for __getitem__
  - **kwargs – Keyword arguments saved for __getitem__
- Returns
  Instance of your file-based dataset.
- Return type
  Files
sort(key=None, reverse=False)[source]

Sort files using Python's built-in sorted function.

Arguments are passed directly to sorted.

- Parameters
  - key (Callable, optional) – Specifies a function of one argument that is used to extract a comparison key from each element. Default: None (compare the elements directly).
  - reverse (bool, optional) – Whether sorting should be descending. Default: False
- Returns
  Modified self
- Return type
  Files
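The glob-then-sort behaviour described for from_folder and sort can be sketched with the standard library alone. The temporary directory and file names below are made up for this illustration; the sketch only mirrors the described semantics, not torchdata's code:

```python
import pathlib
import tempfile

# Sketch of the described from_folder behaviour using only the standard
# library: glob the directory with a pattern, then sort the matches.
# Directory contents here are made up for illustration.
with tempfile.TemporaryDirectory() as folder:
    root = pathlib.Path(folder)
    for name in ("b.png", "a.png", "notes.txt"):
        (root / name).touch()

    # regex="*.png" in from_folder corresponds to Path.glob("*.png") here;
    # sorted() mirrors what sort() does to the file list.
    files = sorted(root.glob("*.png"))
    names = [f.name for f in files]
    print(names)  # ['a.png', 'b.png']
```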
class torchdata.datasets.Generator(expression)[source]

Iterable wrapping any generator expression.

- Parameters
  - expression (Generator) – Generator expression from which one can yield via the yield from syntax.
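The yield from relationship can be shown with a plain generator; this is a conceptual sketch, not torchdata's code. Note that a generator expression is exhausted after a single pass, so such a dataset can only be iterated once:

```python
# Conceptual sketch: iterating an Iterable backed by a generator expression
# amounts to re-yielding from it (NOT torchdata's actual code).
expression = (x ** 2 for x in range(4))

def iterate(expr):
    # `yield from` delegates iteration to the wrapped expression
    yield from expr

squares = list(iterate(expression))
print(squares)  # [0, 1, 4, 9]
```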
class torchdata.datasets.TensorDataset(*tensors)[source]

Dataset wrapping torch.Tensors. A cache, map etc. enabled version of torch.utils.data.TensorDataset.

- Parameters
  - *tensors (torch.Tensor) – List of tensors to be wrapped.
class torchdata.datasets.WrapDataset(dataset)[source]

Dataset wrapping standard torch.utils.data.Dataset and making it torchdata.Dataset compatible.

All attributes of the wrapped dataset can be used normally, for example:

    dataset = td.datasets.WrapDataset(
        torchvision.datasets.MNIST("./data")
    )
    dataset.train  # True, has all MNIST attributes

- Parameters
  - dataset (torch.utils.data.Dataset) – Dataset to be wrapped.
class torchdata.datasets.WrapIterable(dataset)[source]

Iterable wrapping standard torch.utils.data.IterableDataset and making it torchdata.Iterable compatible.

All attributes of the wrapped dataset can be used normally, as is the case for torchdata.datasets.WrapDataset.

- Parameters
  - dataset (torch.utils.data.IterableDataset) – Dataset to be wrapped.