torchdata is a PyTorch-oriented library focused on data processing and input pipelines in general.
It extends torch.utils.data.Dataset and equips it with
functionality known from tensorflow.data.
All of that comes with minimal interference (a single call to
super().__init__()) in the original PyTorch datasets.
cache data in RAM or on disk (even partially, say first …)
torchdata.datasets designed for file reading and other general tasks
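To make the idea concrete, here is a minimal pure-Python sketch of how a dataset base class could offer map and partial caching while asking subclasses for nothing more than a single super().__init__() call. It is an illustration only: SketchDataset, _Mapped, _Cached, Squares and the fraction parameter are hypothetical names invented for this example, not torchdata's actual API or implementation, and the 0.2 fraction is an arbitrary choice.

```python
class SketchDataset:
    """Hypothetical stand-in base class (not torchdata's real code)."""

    def __init__(self):
        # Placeholder for shared state; subclasses call this exactly once.
        self._sketch_ready = True

    def map(self, function):
        """Return a dataset that lazily applies `function` to every sample."""
        return _Mapped(self, function)

    def cache(self, fraction=1.0):
        """Return a dataset keeping the first `fraction` of samples in RAM."""
        return _Cached(self, fraction)


class _Mapped(SketchDataset):
    def __init__(self, source, function):
        super().__init__()
        self.source, self.function = source, function

    def __len__(self):
        return len(self.source)

    def __getitem__(self, index):
        return self.function(self.source[index])


class _Cached(SketchDataset):
    def __init__(self, source, fraction):
        super().__init__()
        self.source = source
        self.limit = int(len(source) * fraction)  # indices below this are cached
        self.store = {}

    def __len__(self):
        return len(self.source)

    def __getitem__(self, index):
        if index in self.store:
            return self.store[index]
        sample = self.source[index]
        if index < self.limit:
            self.store[index] = sample
        return sample


class Squares(SketchDataset):
    """User dataset: the only extra requirement is super().__init__()."""

    def __init__(self, size):
        super().__init__()
        self.size = size
        self.reads = 0  # counts how often the underlying data is touched

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        self.reads += 1
        return index * index


raw = Squares(10)
# Arbitrary illustration: map every sample, then cache the first 20%.
dataset = raw.map(lambda x: x + 1).cache(fraction=0.2)

first = [dataset[0], dataset[0]]  # second access is served from the cache
last = [dataset[9], dataset[9]]   # index 9 lies outside the cached part
```

Because each call returns a new wrapper, the order of calls decides what gets cached: here the cache stores already-mapped samples, so the mapping function runs only once for cached indices.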
If you are looking for an ecosystem of supporting functions around PyTorch, check torchfunc.
The following installation methods are available:
To install the latest release:
pip install --user torchdata
To install the nightly build:
pip install --user torchdata-nightly
Both CPU and GPU-enabled torchdata Docker images are available.
You can find them at Docker Cloud at
The CPU image is based on ubuntu:18.04, and the official release can be pulled with:
docker pull szymonmaszke/torchdata:18.04
For the nightly release:
docker pull szymonmaszke/torchdata:nightly_18.04
This image is significantly lighter due to the lack of GPU support.
The following GPU-enabled images are available:
docker pull szymonmaszke/torchdata:10.1-cudnn7-runtime-ubuntu18.04
You can use nightly builds as well; just prefix the tag with nightly_, for example:
docker pull szymonmaszke/torchdata:nightly_10.1-cudnn7-runtime-ubuntu18.04