torchdata.cachers¶
This module contains the interface needed for cachers (used in the cache method of td.Dataset).
To cache all samples on disk using Python’s pickle in the folder cache
(assuming you have already created a td.Dataset instance named dataset):
import torchdata as td
...
dataset.cache(td.cachers.Pickle("./cache"))
Users are encouraged to write their own custom cachers if the ones provided below
are too slow or otherwise unsuitable for their purposes (see the Cacher abstract interface below).
-
class torchdata.cachers.Cacher[source]¶
Interface to fulfil to make an object compatible with the torchdata.Dataset.cache method. If you want to implement your own caching functionality, inherit from this class and implement the methods described below.
-
abstract __contains__(index: int) → bool[source]¶
Return True if the sample under index is cached.
If False is returned, the cacher’s __setitem__ will be called, hence if you are not going to cache the sample under this index, you should handle that operation in that method. This is simply a boolean indicator of whether a sample is cached.
If True is returned, the cacher’s __getitem__ will be called and it is the user’s responsibility to return the correct value in that case.
- Parameters
index (int) – Index of the sample
-
abstract __getitem__(index) → Any[source]¶
Retrieve a sample from the cache.
This function MUST return a valid data sample; it is the user’s responsibility to ensure this when a custom cacher is implemented.
Return from this function the data sample which lies under its respective index.
- Parameters
index (int) – Index of the sample
-
abstract __setitem__(index: int, data: Any) → None[source]¶
Save the sample under index in the cache, or do nothing.
This function should save the sample under index so it can later be retrieved by __getitem__. If you don’t want to save a specific index, you can implement this functionality in the cacher or create a separate modifier solely for this purpose (the second approach is highly recommended).
- Parameters
index (int) – Index of the sample
data (Any) – Data generated by the dataset.
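To illustrate the contract above, here is a minimal, hypothetical cacher (standalone, not taken from torchdata's source) that deliberately stores only even-indexed samples, which the interface permits as long as __contains__ keeps returning False for the skipped indices:

```python
from typing import Any

# Hypothetical cacher following the Cacher contract described above.
# It skips odd indices in __setitem__ ("save ... or do nothing").
class EvenIndexCacher:
    def __init__(self) -> None:
        self._store: dict = {}

    def __contains__(self, index: int) -> bool:
        # False triggers __setitem__; True triggers __getitem__.
        return index in self._store

    def __getitem__(self, index: int) -> Any:
        return self._store[index]

    def __setitem__(self, index: int, data: Any) -> None:
        if index % 2 == 0:  # only cache even indices
            self._store[index] = data

cacher = EvenIndexCacher()
cacher[0] = "a"  # cached
cacher[1] = "b"  # silently skipped
assert 0 in cacher and 1 not in cacher
```

Note that skipping indices inside the cacher works, but as the __setitem__ documentation says, a separate modifier for this purpose is the recommended approach.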
-
class torchdata.cachers.Memory[source]¶
Save and load data in a Python dictionary.
This cacher is used by default inside torchdata.Dataset.
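In spirit, a dictionary-backed cacher satisfies the Cacher contract directly; the sketch below (an assumption for illustration, not torchdata's actual source) shows the miss-then-hit flow a dataset would drive:

```python
from typing import Any

# Dictionary-backed cacher sketch: samples live in a dict keyed by index,
# mirroring what Memory's description above says.
class DictCacher:
    def __init__(self) -> None:
        self.cache: dict = {}

    def __contains__(self, index: int) -> bool:
        return index in self.cache

    def __getitem__(self, index: int) -> Any:
        return self.cache[index]

    def __setitem__(self, index: int, data: Any) -> None:
        self.cache[index] = data

cacher = DictCacher()
if 0 not in cacher:       # first pass: cache miss, __setitem__ is called
    cacher[0] = "sample"
assert 0 in cacher        # later passes: cache hit, __getitem__ is called
assert cacher[0] == "sample"
```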
-
class torchdata.cachers.Pickle(path: pathlib.Path, extension: str = '.pkl')[source]¶
Save and load data from disk using the pickle module.
Data will be saved as pkl files in the specified path. If the path does not exist, it will be created.
This object can be used as a context manager and it will delete path at the end of the block:

with td.cachers.Pickle(pathlib.Path("./disk")) as pickler:
    dataset = dataset.map(lambda x: x+1).cache(pickler)
    ...  # Do something with dataset
...  # Folder removed
You can also call the clean() method manually for the same effect (though this is discouraged, as you might crash the __setitem__ method).
Important:
This cacher can act across consecutive runs; just don’t use the clean() method or delete the folder manually. If you rely on this, please ensure correct sampling (same seed and sampling order) for reproducible behaviour between runs.
path¶
Path to the folder where samples will be saved and loaded from.
- Type
pathlib.Path
-
extension¶
Extension to use for saved pickle files. Default: .pkl
- Type
str
-
__contains__(index: int) → bool[source]¶
Check whether the file exists on disk.
If the file is available, the sample is considered cached; hence you can cache data between multiple runs (if you ensure repeatable sampling).
-
__getitem__(index: int)[source]¶
Retrieve the data specified by index.
The name of the item will be equal to {self.path}/{index}{extension}.
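The disk behaviour described above can be sketched without torchdata, using the same {path}/{index}{extension} naming scheme (the folder layout and variable names here are assumptions for illustration):

```python
import pathlib
import pickle
import tempfile

# Sketch of Pickle-style disk caching: one pickle file per sample index,
# named "{path}/{index}{extension}" as the docs above describe.
tmp = tempfile.mkdtemp()
path = pathlib.Path(tmp) / "cache"
path.mkdir(parents=True, exist_ok=True)  # create the folder if missing
extension = ".pkl"
index, data = 3, {"sample": [1, 2, 3]}

target = path / f"{index}{extension}"

with target.open("wb") as f:    # __setitem__: serialize the sample to disk
    pickle.dump(data, f)

assert target.exists()          # __contains__: an existing file means cached

with target.open("rb") as f:    # __getitem__: deserialize it back
    loaded = pickle.load(f)
assert loaded == data
```

Because the cache is just files on disk, a second run that finds the same files will treat those indices as already cached, which is exactly why repeatable sampling matters across runs.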