
torchdata

torchdata is a PyTorch-oriented library focused on data processing and input pipelines in general.

It extends torch.utils.data.Dataset and equips it with functionality known from tensorflow.data, such as map or cache.

All of that comes with minimal interference (a single call to super().__init__()) in original PyTorch datasets.
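To make the idea of chainable map and cache concrete, here is a minimal plain-Python sketch. It is not torchdata's implementation and does not import the library: SketchDataset, Numbers, load and the loads counter are hypothetical names made up for illustration. It only shows how a single super().__init__() call could set up the state that map and cache rely on.

```python
class SketchDataset:
    """Hypothetical stand-in for a dataset with chainable map/cache (not torchdata's API)."""

    def __init__(self):
        # The one call subclasses make via super().__init__(): set up pipeline state.
        self._maps = []
        self._cache = None  # None means caching is disabled

    def map(self, function):
        # Register a per-sample transformation; return self to allow chaining.
        self._maps.append(function)
        return self

    def cache(self):
        # Enable an in-memory cache of fully transformed samples.
        self._cache = {}
        return self

    def __getitem__(self, index):
        if self._cache is not None and index in self._cache:
            return self._cache[index]
        sample = self.load(index)  # assumed hook implemented by subclasses
        for function in self._maps:
            sample = function(sample)
        if self._cache is not None:
            self._cache[index] = sample
        return sample


class Numbers(SketchDataset):
    """Toy dataset yielding its own indices; counts how often data is loaded."""

    def __init__(self, size):
        super().__init__()
        self.size = size
        self.loads = 0

    def __len__(self):
        return self.size

    def load(self, index):
        self.loads += 1
        return index


dataset = Numbers(3).map(lambda x: x * 2).cache()
first = dataset[1]   # loaded, mapped, then stored in the cache
second = dataset[1]  # served from the cache, load() is not called again
```

In this sketch the cache stores samples after all registered maps have run, so a map added after the first access would not affect already cached items; a real library may handle that ordering differently.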

Functionalities Overview:

If you are looking for an ecosystem of supporting functions around PyTorch, check torchfunc.

Installation

The following installation methods are available:

pip:

To install the latest release:

pip install --user torchdata

To install the nightly version:

pip install --user torchdata-nightly

Docker:

Various torchdata images are available, both CPU-only and GPU-enabled. You can find them on Docker Cloud at szymonmaszke/torchdata.

CPU

The CPU image is based on ubuntu:18.04, and the official release can be pulled with:

docker pull szymonmaszke/torchdata:18.04

For nightly release:

docker pull szymonmaszke/torchdata:nightly_18.04

This image is significantly lighter than the GPU images due to the lack of GPU support.

GPU

All images are based on the nvidia/cuda Docker image. Each tag encodes the CUDA version (10.1, 10.0 or 9.2), CUDNN7 support where present, and the base image (ubuntu:18.04).

The following images are available:

  • 10.1-cudnn7-runtime-ubuntu18.04

  • 10.1-runtime-ubuntu18.04

  • 10.0-cudnn7-runtime-ubuntu18.04

  • 10.0-runtime-ubuntu18.04

  • 9.2-cudnn7-runtime-ubuntu18.04

  • 9.2-runtime-ubuntu18.04

Example pull:

docker pull szymonmaszke/torchdata:10.1-cudnn7-runtime-ubuntu18.04

You can use nightly builds as well; just prefix the tag with nightly_, for example:

docker pull szymonmaszke/torchdata:nightly_10.1-cudnn7-runtime-ubuntu18.04
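The tag naming scheme described above can be summarized in a short Python sketch. The helper image_reference and its parameters are hypothetical and only mirror the tags listed in this section; it does not check that a given image actually exists in the registry.

```python
from typing import Optional

REPOSITORY = "szymonmaszke/torchdata"
CUDA_VERSIONS = ("10.1", "10.0", "9.2")  # CUDA versions listed in this section


def image_reference(cuda: Optional[str] = None, cudnn: bool = False, nightly: bool = False) -> str:
    """Build an image reference following the tag scheme listed above (illustrative helper)."""
    if cuda is None:
        # CPU image: the tag is just the Ubuntu version.
        tag = "18.04"
    else:
        if cuda not in CUDA_VERSIONS:
            raise ValueError(f"unlisted CUDA version: {cuda}")
        middle = "cudnn7-" if cudnn else ""
        tag = f"{cuda}-{middle}runtime-ubuntu18.04"
    if nightly:
        # Nightly builds use the same tag with a nightly_ prefix.
        tag = "nightly_" + tag
    return f"{REPOSITORY}:{tag}"


reference = image_reference("10.1", cudnn=True, nightly=True)
```

The resulting string can be passed to docker pull as in the examples above.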

conda:

TO BE ADDED

conda install -c conda-forge torchdata

Contributing

If you find any issue, or you think some functionality may be useful to others and fits here well, please open a new Issue or create a Pull Request.

To get an overview of things one can do to help this project, please see the Roadmap.