Like our dataset format for AI? Give us a ⭐ on GitHub.

Activeloop is a Database for AI that structures computer vision data in a simple, tensor-based dataset format for easier dataset streaming, querying, version control, and visualization.
Interested in the managed version for your company? Let’s chat!

Database for AI
AI-native way of working with data.
No boilerplate code.

The fastest open-source dataset format for data-centric computer vision workflows.

  • The open-source community
    enabling data-centric AI

    • 4.5k GitHub stars (trended #1 in Python)
    • 75+ contributors (+10%)
    • 910+ community members (+31%)

  • Meet Tensie. Tensie's lit. She likes
    optimizing datasets & fire puns.

  • Your model's gonna love your data.
    You will, too

    Plus, we're
    open-source!

    As a machine learning engineer, you can spend weeks setting up your data. It takes far too long to download or copy your data, collaborate on it with your team or connect it to your ML models. Our open-source package Hub neatly rearranges your data into NumPy-like arrays on the cloud that are native to deep learning frameworks. Thanks to that, you can finally break data silos and stream your data straight to your GPUs without sacrificing performance.

    Check out our Repo
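To make the "NumPy-like arrays on the cloud" idea concrete, here is a minimal sketch of chunked, lazily read storage. It is a toy illustration only and not Hub's actual storage layout or API: the `ChunkedTensor` class, the `chunks/<n>` key scheme, and the plain dict standing in for a cloud bucket are all assumptions made for this example.

```python
import pickle


class ChunkedTensor:
    """Toy stand-in for a tensor whose samples live in fixed-size chunks.

    Hypothetical class for illustration; it does not mirror Hub internals.
    """

    def __init__(self, samples, chunk_size, storage):
        self.chunk_size = chunk_size
        self.length = len(samples)
        self.storage = storage  # a dict standing in for a cloud bucket
        self.fetches = 0        # number of chunks read back so far
        # Write the samples out as independent, serialized chunks.
        for start in range(0, self.length, chunk_size):
            key = f"chunks/{start // chunk_size}"
            storage[key] = pickle.dumps(samples[start:start + chunk_size])

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Locate the one chunk that holds this sample and read only that chunk.
        chunk_id, offset = divmod(index, self.chunk_size)
        self.fetches += 1
        chunk = pickle.loads(self.storage[f"chunks/{chunk_id}"])
        return chunk[offset]


bucket = {}
tensor = ChunkedTensor([[i, i * 2] for i in range(10)], chunk_size=4, storage=bucket)
sample = tensor[5]  # reads chunk 1 only, not the whole dataset
```

Indexing a single sample touches one chunk, which is the property that lets a training loop stream data instead of downloading the full dataset first.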

Better data for your team, at scale

Features

    • Load Data from Anywhere

      Hub works locally, on Google Cloud, AWS, MinIO, as well as Activeloop storage (no servers needed). Load data (even from Kaggle!) and directly stream datasets from cold storage to ML workflows. It's that fast.
    • Dataset Version Control

      Modify dataset elements across different versions and seamlessly switch between them. Hub's intuitive Python API works with datasets of any size and overcomes the limitations of file-based version control.
    • Transformations at Scale

      Quickly modify, update, or resample your datasets in order to find the optimal dataset for your models. Scale to hundreds of machines with one line of code.
    • Scalable Model Training

      Stream your dataset to PyTorch or TensorFlow with one line of code. Train in a distributed fashion with Ray to efficiently utilize your compute resources.
    • Team and Access Management

      Keep your datasets private, or share them with your organization or with anyone on the web. Have multiple data scientists working on the same data? We can handle that, too.
    • Dataset Querying

      Easily filter, subset or query terabyte-scale datasets to generate instant insights and uncover opportunities for dataset optimization.
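The dataset version control feature above versions individual dataset elements rather than whole files. The sketch below shows one way that idea could work: each commit records only the samples that changed, and reads fall back through parent commits. It is a toy model under stated assumptions, not Hub's implementation or API. The `VersionedDataset` class and its `update`, `commit`, and `checkout` methods are hypothetical names chosen for this example.

```python
class VersionedDataset:
    """Toy element-level version control (hypothetical, not Hub's API)."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id, {index: sample}, message)
        self.head = None   # commit the dataset currently points at
        self.staged = {}   # uncommitted changes: index -> sample

    def update(self, index, sample):
        # Stage a change to a single element.
        self.staged[index] = sample

    def commit(self, message):
        # Store only the staged changes, linked to the current head.
        commit_id = f"c{len(self.commits)}"
        self.commits[commit_id] = (self.head, dict(self.staged), message)
        self.head = commit_id
        self.staged = {}
        return commit_id

    def checkout(self, commit_id):
        # Switch versions; staged changes are discarded in this toy model.
        if commit_id not in self.commits:
            raise KeyError(commit_id)
        self.head = commit_id
        self.staged = {}

    def __getitem__(self, index):
        if index in self.staged:
            return self.staged[index]
        # Walk back through history until some commit holds this element.
        commit_id = self.head
        while commit_id is not None:
            parent, changes, _ = self.commits[commit_id]
            if index in changes:
                return changes[index]
            commit_id = parent
        raise KeyError(index)


ds = VersionedDataset()
ds.update(0, "cat")
ds.update(1, "dog")
first = ds.commit("initial labels")

ds.update(1, "wolf")
second = ds.commit("fix label 1")
label_after_fix = ds[1]      # resolved from the second commit
unchanged_label = ds[0]      # inherited from the first commit

ds.checkout(first)
label_before_fix = ds[1]     # the earlier version of the same element
```

Because the second commit stores one changed label instead of a new copy of the dataset, switching versions is a pointer move rather than a file copy.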

Visualize any computer vision dataset

Instantly visualize any slice of the dataset you upload to Hub.

COCO dataset visualization on Activeloop Platform
  • Rapidly visualize different versions of your data
  • Understand your data and improve its quality
  • Run transforms on the data with in-built compute infrastructure
  • Train your models on our compute infrastructure

Open data, one click away

  • Explore verified datasets
    by team Activeloop...
  • ... public organizations
    (like Google!) ...
  • ... and our open-source
    community!

Data-centric AI is easy (if you use Hub)

import hub
from PIL import Image

ds = hub.load('hub://activeloop/mnist-train')  # Hub Dataset

# Display an image
Image.fromarray(ds.images[0].numpy())

  • Your models and GPUs need organized & streamable data.

    The future of Machine Learning is data-centric.
    Store your data in AI-native format with Activeloop Hub.

  • > pip install hub
  • Get started with Activeloop Hub
  • Create an account
  • Hub is open source. Join the community
  • Simple API for creating, storing, and collaborating on AI datasets at scale.
