Database for AI
AI-native way of working with data.
No boilerplate code.
The fastest open-source dataset format for data-centric computer vision workflows.
The open-source community enabling data-centric AI
Trended #1 in Python
Your model's gonna love your data.
You will, too
As a machine learning engineer, you can spend weeks setting up your data. It takes far too long to download or copy your data, collaborate on it with your team, or connect it to your ML models. Our open-source package Hub neatly rearranges your data into NumPy-like arrays on the cloud that are native to deep learning frameworks. Thanks to that, you can finally break data silos and stream your data straight to your GPUs without sacrificing performance.
Check out our repo
Better data for your team, at scale
Load Data from Anywhere
Hub works locally, on Google Cloud, AWS, MinIO, as well as Activeloop storage (no servers needed). Load data (even from Kaggle!) and directly stream datasets from cold storage to ML workflows. It's that fast.
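To make "anywhere" concrete, here is a minimal sketch of how one loading call can cover several storage backends. It assumes Hub selects the backend from the path prefix (`hub://`, `s3://`, `gcs://`, or a plain local path); `storage_backend` and `load_dataset` are hypothetical helpers written for this illustration and are not part of Hub.

```python
def storage_backend(path: str) -> str:
    """Name the storage backend a dataset path points to.

    Hypothetical helper for illustration only; the prefixes below are
    assumptions about how Hub paths are written.
    """
    prefixes = {
        "hub://": "Activeloop storage",
        "s3://": "AWS S3 or S3-compatible storage such as MinIO",
        "gcs://": "Google Cloud Storage",
    }
    for prefix, name in prefixes.items():
        if path.startswith(prefix):
            return name
    # Anything without a recognized prefix is treated as a local path.
    return "local filesystem"


def load_dataset(path: str):
    """Load a dataset from any of the backends above (requires Hub)."""
    import hub  # imported lazily so storage_backend works without Hub installed

    return hub.load(path)
```

With Hub installed, `load_dataset('hub://activeloop/mnist-train')` would fetch the same dataset used in the snippet at the bottom of this page; swapping in a local or bucket path is meant to be the only change needed.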
Dataset Version Control
Modify dataset elements across different versions and seamlessly switch between them. Hub's intuitive Python API works with datasets of any size and overcomes the limitations of file-based version control.
Transformations at Scale
Quickly modify, update, or resample your datasets in order to find the optimal dataset for your models. Scale to hundreds of machines with one line of code.
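As a rough sketch of what a transformation can look like, the example below flips every image top-to-bottom. It assumes the Hub 2.x transform interface (a function decorated with `@hub.compute` and run with `.eval(...)`) and a dataset with `images` and `labels` tensors like the MNIST example on this page; `flip_rows` and `make_flip_transform` are hypothetical names chosen for illustration.

```python
def flip_rows(image):
    """Flip an image top-to-bottom by reversing its rows.

    Works on nested lists and on NumPy arrays alike.
    """
    return image[::-1]


def make_flip_transform():
    """Build a Hub transform that applies flip_rows to each sample.

    Assumption: Hub 2.x exposes @hub.compute and samples expose
    `images` and `labels` tensors. Requires Hub to be installed.
    """
    import hub  # imported lazily so flip_rows can be tried without Hub

    @hub.compute
    def flip_vertical(sample_in, sample_out):
        # Write the flipped image and the unchanged label to the output sample.
        sample_out.images.append(flip_rows(sample_in.images.numpy()))
        sample_out.labels.append(sample_in.labels.numpy())
        return sample_out

    return flip_vertical


# Usage sketch (needs Hub plus an input and an output dataset):
#   flip_vertical = make_flip_transform()
#   flip_vertical().eval(ds_in, ds_out, num_workers=4)
```

The per-sample logic lives in a plain function, so you can check it on a tiny array before running it over a full dataset; `num_workers` is the knob assumed here for spreading the work across processes.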
Visualize any computer
Instantly visualize any slice of the dataset you upload to Hub.
- Rapidly visualize different versions of your data
- Understand your data and improve its quality
- Run transforms on the data with in-built compute infrastructure
- Train your models on our compute infrastructure
Open data, one click away
- Explore verified datasets by team Activeloop...
- ... public organizations (like Google!) ...
- ... and our open-source
Data-centric AI is easy (if you use Hub)
import hub
from PIL import Image

ds = hub.load('hub://activeloop/mnist-train')  # Hub Dataset

# Display the first image: index a single sample before converting it to a NumPy array
Image.fromarray(ds.images[0].numpy())