torchdata.cachers

This module contains the interface cachers need to fulfil
(used in the cache method of td.Dataset).
To cache all samples on disk using Python's pickle in the folder cache
(assuming you have already created a td.Dataset instance named dataset):

import torchdata as td

...
dataset.cache(td.cachers.Pickle("./cache"))
Users are encouraged to write their own custom cachers if the ones
provided below are too slow or otherwise unsuitable for their purposes
(see the Cacher abstract interface below).
class torchdata.cachers.Cacher

Interface to fulfil to make an object compatible with the
torchdata.Dataset.cache method.

If you want to implement your own caching functionality, inherit from
this class and implement the methods described below.
abstract __contains__(index: int) -> bool

Return True if the sample under index is cached.

If False is returned, the cacher's __setitem__ will be called, so if
you do not intend to cache the sample under this index, you should
handle that case in __setitem__. This is simply a boolean indicator of
whether the sample is cached.

If True is returned, the cacher's __getitem__ will be called, and it is
the user's responsibility to return the correct value in that case.

Parameters
    index (int) – Index of the sample
abstract __getitem__(index) -> Any

Retrieve a sample from the cache.

This function MUST return a valid data sample; it is the user's
responsibility to ensure this when a custom cacher is implemented.
Return from this function the data sample which lies under its
respective index.

Parameters
    index (int) – Index of the sample
abstract __setitem__(index: int, data: Any) -> None

Save the sample under index in the cache, or do nothing.

This function should save the sample under index so it can later be
retrieved by __getitem__. If you do not want to save a specific index,
you can implement this functionality in the cacher or create a separate
modifier solely for this purpose (the second approach is highly
recommended).

Parameters
    index (int) – Index of the sample
    data (Any) – Data generated by the dataset
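Putting the three methods together, a minimal custom cacher might look
like the following sketch. DictCacher is a hypothetical name used for
illustration; in real usage you would inherit from
torchdata.cachers.Cacher, which is omitted here so the snippet stays
self-contained:

```python
from typing import Any


class DictCacher:
    """Minimal in-memory cacher following the Cacher interface.

    Hypothetical example class; a real implementation would inherit
    from torchdata.cachers.Cacher.
    """

    def __init__(self) -> None:
        self.cache: dict = {}

    def __contains__(self, index: int) -> bool:
        # If this returns False, __setitem__ will be called with the
        # freshly generated sample.
        return index in self.cache

    def __getitem__(self, index: int) -> Any:
        # Only called when __contains__ returned True, so the sample
        # is guaranteed to be present here.
        return self.cache[index]

    def __setitem__(self, index: int, data: Any) -> None:
        # Store the sample so later epochs can reuse it.
        self.cache[index] = data


cacher = DictCacher()
if 0 not in cacher:          # first epoch: not cached yet
    cacher[0] = "sample-0"   # dataset output is stored
print(cacher[0])             # later epochs read from the cache
```

This mirrors what td.cachers.Memory does internally: a plain dictionary
keyed by sample index.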
class torchdata.cachers.Memory

Save and load data in a Python dictionary.

This cacher is used by default inside torchdata.Dataset.
class torchdata.cachers.Pickle(path: pathlib.Path, extension: str = '.pkl')

Save and load data from disk using the pickle module.

Data will be saved as .pkl files in the specified path. If the path does
not exist, it will be created.

This object can be used as a context manager, and it will delete path at
the end of the block:

with td.cachers.Pickle(pathlib.Path("./disk")) as pickler:
    dataset = dataset.map(lambda x: x + 1).cache(pickler)
    ...  # Do something with dataset

...  # Folder removed
You can also call the clean() method manually for the same effect
(though this is discouraged, as you might crash the __setitem__ method).

Important: this cacher can act between consecutive runs; just do not use
the clean() method or delete the folder manually. If you cache across
runs, please ensure correct sampling (same seed and sampling order) for
reproducible behaviour between runs.
path
    Path to the folder where samples will be saved and loaded from.
    Type: pathlib.Path

extension
    Extension to use for saved pickle files. Default: '.pkl'
    Type: str
__contains__(index: int) -> bool

Check whether the file exists on disk.

If the file is available, the sample is considered cached; hence you can
cache data between multiple runs (if you ensure repeatable sampling).
__getitem__(index: int)

Retrieve the data specified by index.

The name of the item will be equal to {self.path}/{index}{extension}.
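The disk-backed behaviour described above (one pickle file per index,
with file existence acting as the cache indicator) can be sketched as
follows. DiskPickleCacher and its _file helper are hypothetical names
for illustration, not part of torchdata:

```python
import pathlib
import pickle
import tempfile
from typing import Any


class DiskPickleCacher:
    """Sketch of the disk-caching pattern used by td.cachers.Pickle.

    Hypothetical example class: one '{index}{extension}' pickle file
    per sample, stored under path.
    """

    def __init__(self, path: pathlib.Path, extension: str = ".pkl") -> None:
        self.path = pathlib.Path(path)
        # Create the folder if it does not exist, as documented above.
        self.path.mkdir(parents=True, exist_ok=True)
        self.extension = extension

    def _file(self, index: int) -> pathlib.Path:
        # Item name is {self.path}/{index}{extension}.
        return self.path / f"{index}{self.extension}"

    def __contains__(self, index: int) -> bool:
        # File existence is the cache indicator, which is what allows
        # caching to persist between consecutive runs.
        return self._file(index).exists()

    def __setitem__(self, index: int, data: Any) -> None:
        with open(self._file(index), "wb") as handle:
            pickle.dump(data, handle)

    def __getitem__(self, index: int) -> Any:
        with open(self._file(index), "rb") as handle:
            return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    cacher = DiskPickleCacher(pathlib.Path(tmp) / "cache")
    cacher[0] = {"sample": 0}          # written to .../cache/0.pkl
    print(0 in cacher, cacher[0])      # existence check, then load
```

Unlike the real td.cachers.Pickle, this sketch omits the context-manager
and clean() behaviour that deletes the folder afterwards.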