torchdata
torchdata is a PyTorch-oriented library focused on data processing and input pipelines in general.
It extends torch.utils.data.Dataset and equips it with functionalities known from tensorflow.data, like map or cache.
All of that comes with minimal interference (a single call to super().__init__()) in the original PyTorch datasets.
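To make the idea of chaining map and cache on a dataset concrete, here is a minimal pure-Python sketch. It does not use torchdata or PyTorch; SketchDataset, _Mapped, _Cached and Squares are hypothetical names invented for this illustration, and the library's real implementation may work quite differently.

```python
class SketchDataset:
    """Hypothetical map-style dataset base; NOT torchdata's real class."""

    def __getitem__(self, index):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError

    def map(self, function):
        # Return a lazy view that applies `function` to each sample on access.
        return _Mapped(self, function)

    def cache(self):
        # Return a view that keeps each sample in RAM after its first access.
        return _Cached(self)


class _Mapped(SketchDataset):
    def __init__(self, source, function):
        self.source = source
        self.function = function

    def __getitem__(self, index):
        return self.function(self.source[index])

    def __len__(self):
        return len(self.source)


class _Cached(SketchDataset):
    def __init__(self, source):
        self.source = source
        self.storage = {}  # index -> already computed sample

    def __getitem__(self, index):
        if index not in self.storage:
            self.storage[index] = self.source[index]
        return self.storage[index]

    def __len__(self):
        return len(self.source)


class Squares(SketchDataset):
    """Toy dataset that counts how often the underlying data is read."""

    def __init__(self, size):
        self.size = size
        self.reads = 0

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        self.reads += 1
        return index * index

    def __len__(self):
        return self.size


raw = Squares(5)
dataset = raw.map(lambda sample: sample + 1).cache()
first = dataset[3]   # computed from the raw dataset, then stored
second = dataset[3]  # served from the in-memory cache
```

Because cache is applied after map in this sketch, the mapped value is what gets stored, so the second access touches neither the raw data nor the mapping function.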
Functionalities Overview:
- Use map, apply, reduce or filter
- cache data in RAM or on disk (even partially, say the first 20%)
- Full support for PyTorch's Dataset and IterableDataset (including torchvision)
- General torchdata.maps like Flatten or Select
- Concrete torchdata.datasets designed for file reading and other general tasks
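Partial caching, as mentioned in the list above, can be pictured with the following pure-Python sketch. PartialCache and CountingSource are hypothetical names made up for this example and are not part of torchdata; the sketch only assumes a source with __getitem__ and __len__ and non-negative indices.

```python
class CountingSource:
    """Toy data source that records how many times each index is read."""

    def __init__(self, size):
        self.size = size
        self.reads = [0] * size

    def __getitem__(self, index):
        self.reads[index] += 1
        return index * 10

    def __len__(self):
        return self.size


class PartialCache:
    """Hypothetical wrapper caching only the first `fraction` of samples in RAM."""

    def __init__(self, source, fraction):
        self.source = source
        # Indices below this limit are cached; the rest are always re-read.
        self.limit = int(len(source) * fraction)
        self.storage = {}

    def __getitem__(self, index):
        if index < self.limit:
            if index not in self.storage:
                self.storage[index] = self.source[index]
            return self.storage[index]
        return self.source[index]

    def __len__(self):
        return len(self.source)


source = CountingSource(10)
partial = PartialCache(source, fraction=0.2)

# Two full passes: only the first 20% of indices avoid a second read.
for _ in range(2):
    for index in range(len(partial)):
        partial[index]
```

After both passes, the first two indices have been read from the source once and every other index twice.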
If you are looking for an ecosystem of supporting functions around PyTorch, check torchfunc.
Installation
The following installation methods are available:
pip:
To install the latest release:
pip install --user torchdata
To install the nightly version:
pip install --user torchdata-nightly
Docker:
Various torchdata images are available, both CPU-only and GPU-enabled.
You can find them on Docker Cloud under szymonmaszke/torchdata.
CPU
The CPU image is based on ubuntu:18.04; the official release can be pulled with:
docker pull szymonmaszke/torchdata:18.04
For the nightly release:
docker pull szymonmaszke/torchdata:nightly_18.04
This image is significantly lighter than the GPU images due to the lack of GPU support.
GPU
All GPU images are based on the nvidia/cuda Docker image.
Each tag encodes the CUDA version (10.1, 10.0 or 9.2), optional CUDNN7 support and the base image (ubuntu:18.04).
The following images are available:
10.1-cudnn7-runtime-ubuntu18.04
10.1-runtime-ubuntu18.04
10.0-cudnn7-runtime-ubuntu18.04
10.0-runtime-ubuntu18.04
9.2-cudnn7-runtime-ubuntu18.04
9.2-runtime-ubuntu18.04
Example pull:
docker pull szymonmaszke/torchdata:10.1-cudnn7-runtime-ubuntu18.04
You can use nightly builds as well; just prefix the tag with nightly_, for example:
docker pull szymonmaszke/torchdata:nightly_10.1-cudnn7-runtime-ubuntu18.04
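The tag naming scheme described above can be summarised in a small helper. This is only an illustrative sketch: image_reference and its parameters are hypothetical names, not something shipped with torchdata, and the function merely assembles the image names listed in this section.

```python
REPOSITORY = "szymonmaszke/torchdata"
CUDA_VERSIONS = ("10.1", "10.0", "9.2")  # CUDA versions listed in this section


def image_reference(cuda=None, cudnn=True, nightly=False):
    """Build a Docker image reference following the tag scheme above.

    cuda=None selects the CPU image; otherwise pass one of CUDA_VERSIONS.
    """
    if cuda is None:
        tag = "18.04"  # CPU image, based on ubuntu:18.04
    else:
        if cuda not in CUDA_VERSIONS:
            raise ValueError(f"unsupported CUDA version: {cuda}")
        cudnn_part = "-cudnn7" if cudnn else ""
        tag = f"{cuda}{cudnn_part}-runtime-ubuntu18.04"
    if nightly:
        tag = "nightly_" + tag  # nightly builds share the tag, with a prefix
    return f"{REPOSITORY}:{tag}"


print("docker pull " + image_reference("10.1", nightly=True))
```

The printed command matches the nightly example pull shown above.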
Contributing
If you find any issue, or you think some functionality may be useful to others and fits here well, please open a new Issue or create a Pull Request.
To get an overview of things one can do to help this project, please see the Roadmap.