
Database for AI
AI-native way of working with data.
No boilerplate code.
The fastest open-source dataset format for data-centric computer vision workflows.
The open-source community enabling data-centric AI
Trended #1 in Python
4.5k GitHub Stars
+10%
75+ Contributors
+31%
910+ Community members
Meet Tensie. Tensie's lit. She likes optimizing datasets & fire puns.
Your model's gonna love your data. You will, too.
Plus, we're open-source!
As a machine learning engineer, you can spend weeks setting up your data. It takes far too long to download or copy your data, collaborate on it with your team, or connect it to your ML models. Our open-source package Hub neatly rearranges your data into NumPy-like arrays on the cloud that are native to deep learning frameworks. Thanks to that, you can finally break data silos and stream your data straight to your GPUs without sacrificing performance.
Check out our Repo
Better data for your team, at scale
Features
Load Data from Anywhere
Hub works locally, on Google Cloud, AWS, MinIO, as well as Activeloop storage (no servers needed). Load data (even from Kaggle!) and directly stream datasets from cold storage to ML workflows. It's that fast.
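The idea of streaming from storage instead of downloading everything first can be illustrated with a small sketch. This is not Hub's implementation: `ChunkedStore`, `LazyDataset`, `fetch_chunk`, and the chunk size are hypothetical names chosen for illustration, and a Python dict stands in for remote object storage. The point is that reading one sample only touches the chunk that contains it.

```python
class ChunkedStore:
    """Toy stand-in for remote storage that holds samples in fixed-size chunks."""

    def __init__(self, samples, chunk_size):
        self.chunk_size = chunk_size
        self.length = len(samples)
        # Pretend each chunk is a separate object in a storage bucket.
        self._chunks = {
            start // chunk_size: samples[start:start + chunk_size]
            for start in range(0, len(samples), chunk_size)
        }
        self.fetch_count = 0  # number of simulated network requests

    def fetch_chunk(self, chunk_id):
        self.fetch_count += 1
        return self._chunks[chunk_id]


class LazyDataset:
    """Reads samples on demand, keeping only the most recent chunk in memory."""

    def __init__(self, store):
        self.store = store
        self._cached_id = None
        self._cached_chunk = None

    def __len__(self):
        return self.store.length

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        chunk_id, offset = divmod(index, self.store.chunk_size)
        if chunk_id != self._cached_id:
            # Only now do we "download" anything, and only one chunk.
            self._cached_chunk = self.store.fetch_chunk(chunk_id)
            self._cached_id = chunk_id
        return self._cached_chunk[offset]


store = ChunkedStore(samples=list(range(10)), chunk_size=4)
ds = LazyDataset(store)
first = ds[0]   # fetches chunk 0
second = ds[1]  # served from the cached chunk, no new fetch
last = ds[9]    # fetches chunk 2; chunk 1 is never requested
```

In this toy run, three reads trigger two chunk fetches and the middle chunk is never transferred, which is the property that makes streaming from cold storage practical.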
Dataset Version Control
Modify dataset elements across different versions and seamlessly switch between them. Hub's intuitive Python API works with datasets of any size and overcomes the limitations of file-based version control.
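To make the idea of switching between dataset versions concrete, here is a minimal sketch of commit and checkout semantics. It is a toy model, not Hub's API or storage format: `VersionedDataset`, `commit`, and `checkout` are hypothetical names, and each commit simply records the elements that changed since its parent.

```python
class VersionedDataset:
    """Toy dataset where each commit stores only the elements changed since its parent."""

    def __init__(self):
        self._commits = {}   # commit id -> (parent id, {index: value})
        self._head = None    # commit the working state is based on
        self._staged = {}    # uncommitted changes
        self._next_id = 0

    def __setitem__(self, index, value):
        self._staged[index] = value

    def __getitem__(self, index):
        if index in self._staged:
            return self._staged[index]
        commit_id = self._head
        # Walk back through history until some commit defines this element.
        while commit_id is not None:
            parent, changes = self._commits[commit_id]
            if index in changes:
                return changes[index]
            commit_id = parent
        raise KeyError(index)

    def commit(self):
        commit_id = self._next_id
        self._next_id += 1
        self._commits[commit_id] = (self._head, self._staged)
        self._head = commit_id
        self._staged = {}
        return commit_id

    def checkout(self, commit_id):
        if commit_id not in self._commits:
            raise KeyError(commit_id)
        # Switching versions discards any uncommitted changes in this toy model.
        self._head = commit_id
        self._staged = {}


ds = VersionedDataset()
ds[0] = "cat"
ds[1] = "dog"
v1 = ds.commit()

ds[1] = "wolf"  # relabel a single element
v2 = ds.commit()

ds.checkout(v1)
label_in_v1 = ds[1]
ds.checkout(v2)
label_in_v2 = ds[1]
unchanged_in_v2 = ds[0]
```

Because the second commit stores only the relabeled element, switching versions never copies the dataset, which is one way element-level versioning can avoid the cost of versioning whole files.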
Transformations at Scale
Quickly modify, update, or resample your datasets in order to find the optimal dataset for your models. Scale to hundreds of machines with one line of code.
Scalable Model Training
Stream your dataset to PyTorch or TensorFlow with one line of code. Train in a distributed fashion with Ray to efficiently utilize your compute resources.
Team and Access Management
Keep your datasets private, share them with your organization or anyone on the web. Have multiple data scientists working on the same data? We can handle that, too.
Dataset Querying
Easily filter, subset or query terabyte-scale datasets to generate instant insights and uncover opportunities for dataset optimization.
Visualize any computer vision dataset
Instantly visualize any slice of the dataset you upload to Hub.

- Rapidly visualize different versions of your data
- Understand your data and improve its quality
- Run transforms on the data with in-built compute infrastructure
- Train your models on our compute infrastructure
Data-centric AI is easy (if you use Hub)
import hub
from PIL import Image

ds = hub.load('hub://activeloop/mnist-train')  # Hub Dataset

# Display an image
Image.fromarray(ds.images[0].numpy())
Your models and GPUs need organized & streamable data.
The future of Machine Learning is data-centric.
Store your data in an AI-native format with Activeloop Hub.
> pip install hub
Get started with Activeloop Hub
Create an account
Hub is open source.
Join the community
Stay in the loop