torchdata
torchdata is a PyTorch-oriented library focused on data processing and input pipelines in general.
It extends torch.utils.data.Dataset and equips it with functionality known from tensorflow.data, such as map or cache.
All of that requires minimal interference with the original PyTorch datasets: a single call to super().__init__().
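To illustrate the idea of chaining map and cache onto a map-style dataset, here is a minimal pure-Python sketch. SketchDataset, its map and cache methods, and the fraction parameter are hypothetical names chosen for this example. They mimic the concept (and the __getitem__/__len__ protocol of torch.utils.data.Dataset) but are not torchdata's actual API or implementation. Passing fraction=0.2 to cache stores only the first 20% of indices, mirroring the partial caching this page mentions.

```python
from typing import Any, Callable, List, Tuple


class SketchDataset:
    """Hypothetical map-style dataset illustrating chainable map/cache.

    It exposes __getitem__ and __len__ like torch.utils.data.Dataset, but it is
    NOT torchdata's implementation or API.
    """

    def __init__(self, data: List[Any]) -> None:
        self._data = data
        # Operations applied in order: ("map", fn) or ("cache", (limit, store)).
        self._ops: List[Tuple[str, Any]] = []

    def map(self, fn: Callable[[Any], Any]) -> "SketchDataset":
        self._ops.append(("map", fn))
        return self  # returning self allows chaining

    def cache(self, fraction: float = 1.0) -> "SketchDataset":
        # Store results of all preceding ops in RAM, but only for indices
        # below `limit` (e.g. fraction=0.2 keeps the first 20% of samples).
        limit = int(len(self._data) * fraction)
        self._ops.append(("cache", (limit, {})))
        return self

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> Any:
        start, value = 0, self._data[index]
        # Resume from the latest cache that already holds this index, if any.
        for position in range(len(self._ops) - 1, -1, -1):
            kind, payload = self._ops[position]
            if kind == "cache" and index in payload[1]:
                start, value = position + 1, payload[1][index]
                break
        # Run the remaining ops, filling any caches met along the way.
        for kind, payload in self._ops[start:]:
            if kind == "map":
                value = payload(value)
            else:
                limit, store = payload
                if index < limit:
                    store[index] = value
        return value


calls: List[int] = []


def expensive_square(x: int) -> int:
    calls.append(x)  # record every real computation
    return x * x


dataset = SketchDataset(list(range(10))).map(expensive_square).cache(0.2).map(lambda x: x + 1)
first = [dataset[i] for i in range(len(dataset))]
second = [dataset[i] for i in range(len(dataset))]
print(first == second)  # True
print(len(calls))  # 18: indices 0 and 1 are cached, the other 8 are squared twice
```

In this sketch, caching is just another step in the pipeline: only the work done before cache() is memoized, while maps placed after it still run on every access.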
Functionalities overview:
- Use map, apply, reduce or filter directly on datasets
- cache data in RAM or on disk (even partially, say the first 20%)
- Full support for PyTorch's Dataset and IterableDataset (including torchvision datasets)
- General torchdata.maps like Flatten or Select
- Concrete torchdata.datasets designed for file reading and other general tasks
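Maps such as Flatten or Select are, as this sketch assumes, per-sample transformations meant to be passed to map. The functions below are a hypothetical pure-Python approximation of what such maps do: flattening an arbitrarily nested sample into one tuple, and picking a single element out of a sample. The names flatten, select and apply_map are invented for this example (apply_map only stands in for calling map on a dataset); consult the torchdata.maps documentation for the real classes and their exact behaviour.

```python
from typing import Any, Callable, Iterable, List, Tuple


def flatten(sample: Any) -> Tuple[Any, ...]:
    """Hypothetical helper: turn a nested tuple/list sample into a flat tuple."""
    if not isinstance(sample, (tuple, list)):
        return (sample,)
    flat: List[Any] = []
    for item in sample:
        flat.extend(flatten(item))  # recurse into nested containers
    return tuple(flat)


def select(index: int) -> Callable[[Any], Any]:
    """Hypothetical helper: build a function returning one element of a sample."""
    def _select(sample: Any) -> Any:
        return sample[index]
    return _select


def apply_map(samples: Iterable[Any], fn: Callable[[Any], Any]) -> List[Any]:
    # Stand-in for dataset.map(fn): apply fn to every sample.
    return [fn(sample) for sample in samples]


samples = [((1, 2), "a"), ((3, (4, 5)), "b")]
flat = apply_map(samples, flatten)
labels = apply_map(samples, select(1))
print(flat)    # [(1, 2, 'a'), (3, 4, 5, 'b')]
print(labels)  # ['a', 'b']
```

Writing maps as small callables keeps them composable: the same function can be reused on any dataset whose samples share the same structure.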
If you are looking for an ecosystem of supporting functions around PyTorch, check out torchfunc.
Installation
The following installation methods are available:
pip:
To install the latest release:
pip install --user torchdata
To install the nightly version:
pip install --user torchdata-nightly
Docker:
Various torchdata images are available, both CPU-only and GPU-enabled.
You can find them on Docker Cloud under szymonmaszke/torchdata.
CPU
The CPU image is based on ubuntu:18.04, and the official release can be pulled with:
docker pull szymonmaszke/torchdata:18.04
For the nightly release:
docker pull szymonmaszke/torchdata:nightly_18.04
This image is significantly lighter due to the lack of GPU support.
GPU
All GPU images are based on the nvidia/cuda Docker image.
Each tag encodes the CUDA version (10.1, 10.0 or 9.2), optional cuDNN 7 support,
and the base image (ubuntu:18.04).
The following images are available:
- 10.1-cudnn7-runtime-ubuntu18.04
- 10.1-runtime-ubuntu18.04
- 10.0-cudnn7-runtime-ubuntu18.04
- 10.0-runtime-ubuntu18.04
- 9.2-cudnn7-runtime-ubuntu18.04
- 9.2-runtime-ubuntu18.04
Example pull:
docker pull szymonmaszke/torchdata:10.1-cudnn7-runtime-ubuntu18.04
You can use nightly builds as well; just prefix the tag with nightly_, for example:
docker pull szymonmaszke/torchdata:nightly_10.1-cudnn7-runtime-ubuntu18.04
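The image tags on this page follow a regular pattern: an optional nightly_ prefix, then either 18.04 for the CPU image or the CUDA version, an optional cudnn7 part, runtime and ubuntu18.04. As a hedged illustration, the hypothetical helper image_tag below composes a full image reference from those pieces. It only encodes the tags listed in this section and is not part of torchdata or Docker.

```python
from typing import Optional

SUPPORTED_CUDA = ("10.1", "10.0", "9.2")  # CUDA versions listed in this section


def image_tag(cuda: Optional[str] = None, cudnn: bool = False, nightly: bool = False) -> str:
    """Hypothetical helper: build a szymonmaszke/torchdata image reference."""
    if cuda is None:
        if cudnn:
            raise ValueError("cuDNN variants exist only for GPU images")
        tag = "18.04"  # CPU-only image
    else:
        if cuda not in SUPPORTED_CUDA:
            raise ValueError(f"unsupported CUDA version: {cuda}")
        variant = "cudnn7-runtime" if cudnn else "runtime"
        tag = f"{cuda}-{variant}-ubuntu18.04"
    if nightly:
        tag = "nightly_" + tag  # nightly builds only add a prefix
    return f"szymonmaszke/torchdata:{tag}"


print(image_tag("10.1", cudnn=True))
# szymonmaszke/torchdata:10.1-cudnn7-runtime-ubuntu18.04
print(image_tag("10.1", cudnn=True, nightly=True))
# szymonmaszke/torchdata:nightly_10.1-cudnn7-runtime-ubuntu18.04
print(image_tag(nightly=True))
# szymonmaszke/torchdata:nightly_18.04
```

The returned string is what you would pass to docker pull; rejecting unknown CUDA versions early avoids pulling a tag that does not exist.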
Contributing
If you find an issue, or you think some functionality would be useful to others and fits here well, please open a new Issue or create a Pull Request.
To get an overview of things one can do to help this project, please see the Roadmap.