torchdata.datasets

Concrete implementations of torchdata.Dataset and torchdata.Iterable.

Classes below extend these base classes and/or make it easier for users to implement common functionality. To use standard PyTorch datasets defined by, for example, torchvision, you can use WrapDataset or WrapIterable like this:
    import torchdata
    import torchvision

    dataset = torchdata.datasets.WrapDataset(
        torchvision.datasets.MNIST("./data", download=True)
    )
After that you can use map, apply and other functionalities like you normally would with either torchdata.Dataset or torchdata.Iterable.
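To illustrate what a call like map produces, here is a minimal, hypothetical sketch; the real torchdata implementation differs (method chaining, caching, etc.), but the core idea is a wrapper whose __getitem__ fetches a sample and applies the stored function:

```python
class MappedDataset:
    """Hypothetical sketch of map semantics; NOT the actual torchdata code."""

    def __init__(self, dataset, function):
        self.dataset = dataset
        self.function = function

    def __getitem__(self, index):
        # Apply the mapped function lazily, per sample.
        return self.function(self.dataset[index])

    def __len__(self):
        return len(self.dataset)


# A plain list stands in for a dataset here:
doubled = MappedDataset([1, 2, 3], lambda sample: 2 * sample)
```

Because the function is applied inside __getitem__, samples are transformed on access rather than up front.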
class torchdata.datasets.ChainDataset(datasets)

    Concrete torchdata.Dataset responsible for chaining multiple datasets.

    This class is returned when + (addition operator) is used on an instance of torchdata.Dataset (original torch.utils.data.Dataset can be used as well). Acts just like PyTorch's +, or rather torch.utils.data.ConcatDataset.

    Important: This class is meant to be more of a proxy for the + operator; you can use it directly though.

    Example:

        # Iterate over 3 datasets consecutively
        dataset = torchdata.ChainDataset([dataset1, dataset2, dataset3])

    Any Dataset methods can be used normally.

    datasets

        List of datasets to be chained.

        - Type

            List[Union[torchdata.Dataset, torch.utils.data.Dataset]]
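The chaining semantics can be sketched in plain Python as index arithmetic over consecutive datasets; this is an illustration of the behavior, not the actual ChainDataset implementation:

```python
class ChainedView:
    """Hypothetical sketch of chaining: datasets are iterated consecutively.
    NOT the actual torchdata.ChainDataset code."""

    def __init__(self, datasets):
        self.datasets = datasets

    def __getitem__(self, index):
        # Walk datasets until the index falls inside one of them.
        for dataset in self.datasets:
            if index < len(dataset):
                return dataset[index]
            index -= len(dataset)
        raise IndexError("index out of range")

    def __len__(self):
        return sum(len(dataset) for dataset in self.datasets)


# Plain lists stand in for datasets:
chained = ChainedView([[0, 1], [2], [3, 4]])
```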
class torchdata.datasets.ChainIterable(datasets)

    Concrete torchdata.Iterable responsible for chaining multiple datasets.

    This class is returned when + (addition operator) is used on an instance of torchdata.Iterable (original torch.utils.data.IterableDataset can be used as well). Acts just like PyTorch's + and ChainDataset.

    Important: This class is meant to be more of a proxy for the + operator; you can use it directly though.

    Example:

        # Iterate over 3 iterable datasets consecutively
        dataset = torchdata.ChainIterable([dataset1, dataset2, dataset3])

    Any Iterable methods can be used normally.

    datasets

        List of datasets to be chained.

        - Type

            List[Union[torchdata.Iterable, torch.utils.data.IterableDataset]]
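For iterables the same chaining behavior amounts to exhausting each dataset in turn, as itertools.chain does; a stdlib-only sketch (not the actual ChainIterable code):

```python
import itertools


class ChainedIterableView:
    """Hypothetical sketch of iterable chaining; NOT torchdata code."""

    def __init__(self, datasets):
        self.datasets = datasets

    def __iter__(self):
        # Yield everything from the first dataset, then the second, etc.
        yield from itertools.chain(*self.datasets)


chained = list(ChainedIterableView([range(2), range(2, 4)]))
```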
class torchdata.datasets.ConcatDataset(datasets: List)

    Concrete torchdata.Dataset responsible for sample-wise concatenation.

    This class is returned when | (logical or operator) is used on an instance of torchdata.Dataset (original torch.utils.data.Dataset can be used as well).

    Important: This class is meant to be more of a proxy for the | operator; you can use it directly though.

    Example:

        dataset = (
            torchdata.ConcatDataset([dataset1, dataset2, dataset3])
            .map(lambda sample: sample[0] + sample[1] + sample[2])
        )

    Any Dataset methods can be used normally.

    datasets

        List of datasets to be concatenated sample-wise.

        - Type

            List[Union[torchdata.Dataset, torch.utils.data.Dataset]]
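Sample-wise concatenation means each index yields one sample from every dataset, bundled into a tuple, which is why the map example above receives a tuple. A hypothetical sketch of that behavior (the length bound is an assumption here, not taken from torchdata):

```python
class SampleConcatView:
    """Hypothetical sketch of sample-wise concatenation;
    NOT the actual torchdata.ConcatDataset code."""

    def __init__(self, datasets):
        self.datasets = datasets

    def __getitem__(self, index):
        # One sample from every dataset at the same index, bundled together.
        return tuple(dataset[index] for dataset in self.datasets)

    def __len__(self):
        # Assumption for this sketch: the shortest dataset bounds the length.
        return min(len(dataset) for dataset in self.datasets)


pairs = SampleConcatView([[1, 2, 3], [10, 20, 30]])
```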
class torchdata.datasets.ConcatIterable(datasets: List)

    Concrete Iterable responsible for sample-wise concatenation.

    This class is returned when | (logical or operator) is used on an instance of Iterable (original torch.utils.data.IterableDataset can be used as well).

    Important: This class is meant to be more of a proxy for the | operator; you can use it directly though.

    Example:

        dataset = (
            torchdata.ConcatIterable([dataset1, dataset2, dataset3])
            .map(lambda x, y, z: (x + y, z))
        )

    Any IterableDataset methods can be used normally.

    datasets

        List of datasets to be concatenated sample-wise.

        - Type

            List[Union[torchdata.Iterable, torch.utils.data.IterableDataset]]
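For iterables, sample-wise concatenation behaves like zip: one sample is drawn from each dataset per step. A stdlib-only sketch (not the actual ConcatIterable code):

```python
class SampleConcatIterableView:
    """Hypothetical sketch of sample-wise concatenation over iterables;
    NOT torchdata code."""

    def __init__(self, datasets):
        self.datasets = datasets

    def __iter__(self):
        # zip bundles one sample from each iterable, stopping at the shortest.
        yield from zip(*self.datasets)


bundled = list(SampleConcatIterableView([range(3), "abc"]))
```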
class torchdata.datasets.Files(files: List[pathlib.Path], *args, **kwargs)

    Create Dataset from a list of files.

    Each file is a separate sample. Users can use this class directly as all necessary methods are implemented.

    __getitem__ uses Python's open and returns the file. Its implementation looks like:

        # You can modify open behaviour by passing args and kwargs to __init__
        with open(self.files[index], *self.args, **self.kwargs) as file:
            return file

    You can use the map method in order to modify the returned file, or you can overload __getitem__ (image opening example below):

        import torchdata
        import torchvision
        from PIL import Image


        # Image loading dataset
        class ImageDataset(torchdata.datasets.Files):
            def __getitem__(self, index):
                return Image.open(self.files[index])


        # Useful class methods are inherited as well
        dataset = ImageDataset.from_folder("./data", regex="*.png").map(
            torchvision.transforms.ToTensor()
        )

    The from_folder class method is available for the common case of creating a dataset from files in a folder.

    - Parameters

        files (List[pathlib.Path]) – List of files to be used.
        regex (str, optional) – Regex to be used for filtering. Default: * (all files).
        *args – Arguments saved for __getitem__.
        **kwargs – Keyword arguments saved for __getitem__.

    __init__(files: List[pathlib.Path], *args, **kwargs)

        Initialize self. See help(type(self)) for accurate signature.
    filter(predicate: Callable)

        Remove files for which predicate returns False.

        Note: This is different from torchdata.Iterable's filter method, as the filtering is done when called, not during iteration.

        - Parameters

            predicate (Callable) – Function-like object taking a file as argument and returning a boolean indicating whether to keep the file.

        - Returns

            Modified self

        - Return type

            Files
    classmethod from_folder(path: pathlib.Path, regex: str = '*', *args, **kwargs)

        Create dataset from a pathlib.Path-like object.

        Path should be a directory and will be extended via the glob method taking regex (if specified). Varargs and kwargs will be saved for use by the __getitem__ method.

        - Parameters

            path (pathlib.Path) – Path object (directory) containing samples.
            regex (str, optional) – Regex to be used for filtering. Default: * (all files).
            *args – Arguments saved for __getitem__.
            **kwargs – Keyword arguments saved for __getitem__.

        - Returns

            Instance of your file-based dataset.

        - Return type

            Files
    sort(key=None, reverse=False)

        Sort files using Python's built-in sorted function.

        Arguments are passed directly to sorted.

        - Parameters

            key (Callable, optional) – Specifies a function of one argument that is used to extract a comparison key from each element. Default: None (compare the elements directly).
            reverse (bool, optional) – Whether sorting should be descending. Default: False.

        - Returns

            Modified self

        - Return type

            Files
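The glob, filter and sort steps described above can be demonstrated with the standard library alone; this standalone sketch mimics the documented semantics (from_folder globs a directory, filter eagerly prunes the file list, sort orders it) and is not torchdata code:

```python
import pathlib
import tempfile

# Stdlib-only illustration of the pipeline: glob -> filter -> sort.
with tempfile.TemporaryDirectory() as root:
    for name in ("b.png", "a.png", "notes.txt"):
        (pathlib.Path(root) / name).touch()

    files = list(pathlib.Path(root).glob("*"))        # from_folder step
    files = [f for f in files if f.suffix == ".png"]  # filter step (eager)
    files = sorted(files, key=lambda f: f.name)       # sort step
    names = [f.name for f in files]
```

Note that, as documented, the filtering here happens at call time on the file list itself, not lazily during iteration.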
class torchdata.datasets.Generator(expression)

    Iterable wrapping any generator expression.

    - expression: Generator expression

        Generator from which one can yield via yield from syntax.
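A minimal sketch of wrapping a generator expression with yield from; this is an illustration of the idea, not the actual torchdata.datasets.Generator code:

```python
class GeneratorLike:
    """Hypothetical sketch of wrapping a generator expression;
    NOT the actual torchdata.datasets.Generator code."""

    def __init__(self, expression):
        self.expression = expression

    def __iter__(self):
        # Re-yield every element produced by the wrapped expression.
        yield from self.expression


squares = list(GeneratorLike(i * i for i in range(4)))
```

One caveat inherent to this sketch: a plain generator expression is exhausted after a single pass, so a second iteration would yield nothing unless the wrapper recreates the expression.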
class torchdata.datasets.TensorDataset(*tensors)

    Dataset wrapping torch.tensors.

    cache, map etc. enabled version of torch.utils.data.TensorDataset.

    - *tensors: torch.Tensor

        List of tensors to be wrapped.
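The indexing behavior of a tensor-wrapping dataset can be sketched without a torch dependency (lists stand in for tensors); a hypothetical illustration, not torchdata code:

```python
class TensorTupleView:
    """Hypothetical sketch of TensorDataset-style indexing; lists stand in
    for tensors. NOT torchdata code."""

    def __init__(self, *tensors):
        # All wrapped "tensors" must share their first dimension.
        assert all(len(t) == len(tensors[0]) for t in tensors)
        self.tensors = tensors

    def __getitem__(self, index):
        # Index every wrapped tensor at the same position.
        return tuple(tensor[index] for tensor in self.tensors)

    def __len__(self):
        return len(self.tensors[0])


sample = TensorTupleView([1, 2, 3], ["a", "b", "c"])[1]
```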
class torchdata.datasets.WrapDataset(dataset)

    Dataset wrapping standard torch.utils.data.Dataset and making it torchdata.Dataset compatible.

    All attributes of the wrapped dataset can be used normally, for example:

        dataset = td.datasets.WrapDataset(
            torchvision.datasets.MNIST("./data")
        )
        dataset.train  # True, has all MNIST attributes

    - dataset: torch.utils.data.Dataset

        Dataset to be wrapped
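One common way a wrapper can expose all attributes of the wrapped dataset is __getattr__ delegation; a hypothetical sketch of that mechanism (the actual WrapDataset implementation may differ):

```python
class AttributeDelegatingWrapper:
    """Hypothetical sketch of attribute delegation;
    NOT the actual torchdata.datasets.WrapDataset code."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, index):
        return self.dataset[index]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; fall through to
        # the wrapped dataset so e.g. `wrapper.train` keeps working.
        return getattr(self.dataset, name)


class DummyDataset:
    train = True

    def __getitem__(self, index):
        return index


wrapped = AttributeDelegatingWrapper(DummyDataset())
```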
class torchdata.datasets.WrapIterable(dataset)

    Iterable wrapping standard torch.utils.data.IterableDataset and making it torchdata.Iterable compatible.

    All attributes of the wrapped dataset can be used normally as is the case for torchdata.datasets.WrapDataset.

    - dataset: torch.utils.data.IterableDataset

        Dataset to be wrapped