torchdata is a PyTorch-oriented library focused on data processing and input pipelines in general.

It extends torch.utils.data.Dataset and equips it with functionalities known from tensorflow.data.Dataset, such as map or cache.

All of that comes with minimal interference in original PyTorch datasets: a single call to super().__init__().
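To illustrate what chaining map and cache onto a dataset means, here is a minimal sketch in plain Python. It is not torchdata's implementation and does not use its API; the names SketchDataset, _Mapped and _Cached are hypothetical and exist only for this example.

```python
from typing import Any, Callable, Dict


class SketchDataset:
    """Hypothetical stand-in for a dataset; NOT torchdata's actual class."""

    def __init__(self, source: Any) -> None:
        # Anything supporting __getitem__ and __len__ (a list, another dataset).
        self._source = source

    def __getitem__(self, index: int) -> Any:
        return self._source[index]

    def __len__(self) -> int:
        return len(self._source)

    def map(self, function: Callable[[Any], Any]) -> "SketchDataset":
        # Wrap this dataset so that `function` is applied lazily per sample.
        return _Mapped(self, function)

    def cache(self) -> "SketchDataset":
        # Wrap this dataset so that each sample is computed at most once.
        return _Cached(self)


class _Mapped(SketchDataset):
    def __init__(self, source: SketchDataset, function: Callable[[Any], Any]) -> None:
        super().__init__(source)
        self._function = function

    def __getitem__(self, index: int) -> Any:
        return self._function(self._source[index])


class _Cached(SketchDataset):
    def __init__(self, source: SketchDataset) -> None:
        super().__init__(source)
        self._memory: Dict[int, Any] = {}  # in-memory cache keyed by index

    def __getitem__(self, index: int) -> Any:
        if index not in self._memory:
            self._memory[index] = self._source[index]
        return self._memory[index]


calls = []  # records every time the mapped function actually runs


def square(x: int) -> int:
    calls.append(x)
    return x * x


dataset = SketchDataset([1, 2, 3]).map(square).cache()
first = dataset[1]   # computed through the mapped function
second = dataset[1]  # served from the cache, square is not called again
```

In this sketch each call returns a new wrapper, so everything before cache() is memoized while anything mapped afterwards would still run on every access.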

Functionalities Overview:

If you are looking for an ecosystem of supporting functions around PyTorch, check torchfunc.


The following installation methods are available:


To install the latest release:

pip install --user torchdata

To install the nightly version:

pip install --user torchdata-nightly
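After installing, it can be handy to check which of the two distributions is present. The sketch below uses only the standard library; the helper name installed_distribution is hypothetical, and it assumes the distributions are registered under the same names used in the pip commands above.

```python
from importlib import metadata
from typing import Iterable, Optional, Tuple

# Distribution names taken from the pip commands above (assumed to match
# the names recorded in the installed package metadata).
CANDIDATES = ("torchdata", "torchdata-nightly")


def installed_distribution(
    names: Iterable[str] = CANDIDATES,
) -> Optional[Tuple[str, str]]:
    """Return (name, version) of the first installed candidate, else None."""
    for name in names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed under this name, try the next one
    return None


if __name__ == "__main__":
    print(installed_distribution())
```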


Various torchdata images are available, both CPU-only and GPU-enabled. You can find them on Docker Cloud at szymonmaszke/torchdata.


The CPU image is based on ubuntu:18.04, and the official release can be pulled with:

docker pull szymonmaszke/torchdata:18.04

For the nightly release:

docker pull szymonmaszke/torchdata:nightly_18.04

This image is significantly lighter due to the lack of GPU support.


All GPU-enabled images are based on the nvidia/cuda Docker image. Each tag encodes the CUDA version (10.1, 10.0 or 9.2), optional CUDNN7 support, and the base image (ubuntu:18.04).

The following images are available:

  • 10.1-cudnn7-runtime-ubuntu18.04

  • 10.1-runtime-ubuntu18.04

  • 10.0-cudnn7-runtime-ubuntu18.04

  • 10.0-runtime-ubuntu18.04

  • 9.2-cudnn7-runtime-ubuntu18.04

  • 9.2-runtime-ubuntu18.04

Example pull:

docker pull szymonmaszke/torchdata:10.1-cudnn7-runtime-ubuntu18.04

Nightly builds are available as well; just prefix the tag with nightly_, for example:

docker pull szymonmaszke/torchdata:nightly_10.1-cudnn7-runtime-ubuntu18.04
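The tag naming scheme used by the pull commands above can be summarized in a small helper. This is only a sketch derived from the tags listed in this document; the function image_tag and its parameters are hypothetical and are not part of torchdata or Docker.

```python
from typing import Optional

REPOSITORY = "szymonmaszke/torchdata"
CUDA_VERSIONS = ("10.1", "10.0", "9.2")  # versions listed in this document


def image_tag(cuda: Optional[str] = None, cudnn: bool = False, nightly: bool = False) -> str:
    """Build an image tag; cuda=None selects the CPU-only image."""
    if cuda is None:
        if cudnn:
            raise ValueError("CUDNN variants exist only for GPU images")
        tag = "18.04"
    else:
        if cuda not in CUDA_VERSIONS:
            raise ValueError(f"unsupported CUDA version: {cuda}")
        middle = "-cudnn7" if cudnn else ""
        tag = f"{cuda}{middle}-runtime-ubuntu18.04"
    # Nightly builds use the same tag with a "nightly_" prefix.
    return f"nightly_{tag}" if nightly else tag


def pull_command(**kwargs) -> str:
    """Return the docker pull command for the chosen image."""
    return f"docker pull {REPOSITORY}:{image_tag(**kwargs)}"


if __name__ == "__main__":
    print(pull_command(cuda="10.1", cudnn=True, nightly=True))
```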



To install with conda:

conda install -c conda-forge torchdata


If you find an issue, or you think some functionality may be useful to others and fits well here, please open a new Issue or create a Pull Request.

To get an overview of things one can do to help this project, please see the Roadmap.