Enterprise: Interested in the managed version for your company? Let’s chat!

Stream your data to PyTorch and TensorFlow. No boilerplate code.

The fastest computer vision dataset management framework for data streaming, version-control and collaboration.


Creating the future of data, together


Databases, data lakes, and data warehouses are unfit for unstructured data types such as images, videos, and text. Data 2.0 allows storing computer vision data, including metadata, in multi-dimensional arrays, locally or on the cloud.

In an open, unified, and seamless way.

Write fewer lines of code, do more ML

There is a daunting way to do ML. And then there’s the Data 2.0 way.

import hub

ds = hub.Dataset(url='activeloop/mnist')

Visualize any computer vision dataset

Open data, one click away

Access and share popular or community-assembled public datasets.

Fork what you like

Found the dataset you were looking for? Fork it and start training right away!

Manage access

Create private or public repositories and organizations, and monitor access.

Introducing Data 2.0 - built for collaboration, at scale

Our open-source package Hub neatly rearranges your computer vision datasets into our Data 2.0 format, represented as NumPy-like, cloud-native arrays. A Data 2.0 dataset comes with a multitude of features for more efficient machine and deep learning workflows.
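To make the idea of a "NumPy-like, cloud-native array" more concrete, here is a minimal sketch in plain Python. It is not Hub's implementation or API: the `ChunkedArray` class, the `chunk-N` key naming, and the `fetches` counter are all hypothetical, and a plain dict stands in for a storage bucket. It only illustrates the general pattern of grouping samples into chunks so that indexing one sample reads one chunk rather than the whole dataset.

```python
class ChunkedArray:
    """Toy chunked array (hypothetical, not Hub's API).

    Samples are grouped into fixed-size chunks inside a key-value store.
    """

    def __init__(self, store, chunk_size):
        self.store = store            # dict standing in for a bucket or local directory
        self.chunk_size = chunk_size
        self.length = 0
        self.fetches = 0              # counts chunk reads, to make laziness visible

    def append(self, sample):
        # Each sample lands in the chunk determined by its position.
        key = f"chunk-{self.length // self.chunk_size}"
        self.store.setdefault(key, []).append(sample)
        self.length += 1

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Only the chunk that holds the requested sample is read.
        key = f"chunk-{index // self.chunk_size}"
        self.fetches += 1
        return self.store[key][index % self.chunk_size]


store = {}
images = ChunkedArray(store, chunk_size=4)
for i in range(10):
    images.append([[i, i], [i, i]])   # a fake 2x2 "image"

sample = images[9]                     # touches a single chunk
```

In a real system the store would be remote and the chunks would hold compressed binary data, but the indexing arithmetic would look much the same.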

Deployment anywhere

Activeloop Hub works where you work: locally, on Google Cloud, MinIO, AWS S3, or Azure, as well as on our own storage.

Dataset Version Control

Unlike file-based approaches, Hub uses a Python API. Modify dataset elements across different versions and easily switch between them.
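As a rough illustration of what API-driven dataset versioning can look like, the sketch below keeps named snapshots of a dataset and switches between them. It is a toy under stated assumptions, not Hub's API: `VersionedDataset`, `commit`, and `checkout` are hypothetical names chosen for this example, and whole-dataset snapshots are used for simplicity where a real system would more likely store only the differences.

```python
class VersionedDataset:
    """Toy versioned dataset (hypothetical, not Hub's API)."""

    def __init__(self):
        self.samples = []     # working copy that the user edits
        self.versions = {}    # version name -> immutable snapshot

    def commit(self, name):
        # Freeze the current working copy under a version name.
        self.versions[name] = tuple(self.samples)

    def checkout(self, name):
        # Replace the working copy with a stored snapshot.
        self.samples = list(self.versions[name])


ds = VersionedDataset()
ds.samples.extend(["cat", "dog"])
ds.commit("v1")

ds.samples[1] = "wolf"        # relabel one sample
ds.samples.append("bird")     # add a new one
ds.commit("v2")

ds.checkout("v1")
v1_view = list(ds.samples)
ds.checkout("v2")
v2_view = list(ds.samples)
```

Switching versions here is a function call rather than a file copy, which is the contrast with file-based approaches that the paragraph above draws.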

Dataset Filtering

Quickly modify the samples of the dataset or create a new dataset from the existing one.
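Filtering can be pictured as building a lightweight view of matching indices and, when needed, materializing that view into a new dataset. The following sketch assumes a dataset is just a list of dicts; the `select` and `materialize` helpers are hypothetical and are not part of Hub.

```python
samples = [
    {"label": 7, "image": "img0"},
    {"label": 3, "image": "img1"},
    {"label": 7, "image": "img2"},
]


def select(samples, predicate):
    """Return the indices of matching samples (a view; nothing is copied)."""
    return [i for i, sample in enumerate(samples) if predicate(sample)]


def materialize(samples, indices):
    """Build a new dataset from a view of indices."""
    return [samples[i] for i in indices]


sevens_view = select(samples, lambda s: s["label"] == 7)
sevens = materialize(samples, sevens_view)
```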


Stream data to PyTorch or TensorFlow as if the data were local.

Distributed Workloads

Train in a distributed fashion with Ray for maximum utilization of your GPUs.

Team and Access Management

Keep your datasets private, or share them with your team or anyone on the web. View or edit the data simultaneously with your colleagues.

Instant Visualization

Instantly visualize any slice of the dataset you upload to Hub in our web UI. For free.



Software 2.0 needs Data 2.0. Activeloop Hub enables it. The future of data is one line of code away.

> pip install hub

Unifying and abstracting away infrastructure for easier and highly efficient machine and deep learning.