Major updates: Introducing rapid querying, up to 10 users per org, and more

    We're marking a major milestone, a more powerful C++-based query engine, by making collaboration in our Platform more accessible. Read more in the Hub 2.7.1 release notes.
    By Davit Buniatyan
    3 min read, published Jul 25, 2022, updated Sep 27, 2023
    Months of hard work culminated in this release: we’ve rebuilt some features and introduced others to support more powerful and collaborative workflows, such as data lineage for your datasets.

    Query your Hub datasets in Platform using our highly-performant query engine, powered by C++

    Today’s release of Hub 2.7.1 improves queries and data lineage (you can check out how these features work together in this playbook).

    Querying just got a major revamp: you can now query large datasets in seconds using SQL-like syntax. For example:

    (select * where contains(labels, 'person') limit 1000) union (select * where contains(labels, 'frisbee') limit 1000)

    select * where contains(labels, 'car') and contains(labels, 'truck')

    select * where shape[0] > 10
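To make the semantics of these example queries concrete, here is a minimal plain-Python sketch that applies the same filters to a small in-memory list. It is an illustration only, not the C++ engine or its API: the helper names (`contains`, `select`, `union`), the sample records, and the `id` field are all hypothetical. Two further assumptions are that `shape` is an ordinary field on each sample and that `union` drops duplicates the way SQL `UNION` does.

```python
from typing import Any, Callable, Dict, List, Optional

Sample = Dict[str, Any]

def contains(sample: Sample, field: str, value: Any) -> bool:
    """True if `value` appears in the sample's list-valued field."""
    return value in sample[field]

def select(samples: List[Sample],
           where: Callable[[Sample], bool],
           limit: Optional[int] = None) -> List[Sample]:
    """Keep samples matching `where`, truncated to `limit` if given."""
    matched = [s for s in samples if where(s)]
    return matched if limit is None else matched[:limit]

def union(*results: List[Sample]) -> List[Sample]:
    """Merge result sets, keeping the first occurrence of each sample id."""
    seen = set()
    merged = []
    for result in results:
        for s in result:
            if s["id"] not in seen:
                seen.add(s["id"])
                merged.append(s)
    return merged

# Hypothetical toy data standing in for a dataset.
samples = [
    {"id": 0, "labels": ["person", "dog"], "shape": (12, 8)},
    {"id": 1, "labels": ["frisbee"], "shape": (4, 4)},
    {"id": 2, "labels": ["car", "truck"], "shape": (20, 20)},
    {"id": 3, "labels": ["person", "frisbee"], "shape": (16, 9)},
]

# (select ... 'person' limit 1000) union (select ... 'frisbee' limit 1000)
q1 = union(
    select(samples, lambda s: contains(s, "labels", "person"), limit=1000),
    select(samples, lambda s: contains(s, "labels", "frisbee"), limit=1000),
)

# select * where contains(labels, 'car') and contains(labels, 'truck')
q2 = select(samples, lambda s: contains(s, "labels", "car")
            and contains(s, "labels", "truck"))

# select * where shape[0] > 10
q3 = select(samples, lambda s: s["shape"][0] > 10)
```

Note that sample 3 carries both labels but appears only once in `q1`, because this sketch deduplicates by `id`.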

    After querying, you can save the results, or any subset of your data, as a view and optimize it for training.

    ds_view = ds[1:2:100].save_view()  # Saving views
    ds_view = ds.load_view(id, optimize=True, num_workers=4)  # Loading views

    If optimize=True, the method copies and re-chunks the selected subset of the data so that it’s materialized for streaming. Finally, to delete a view:

    ds.delete_view(id)  # Deleting views

    Full details on how/where dataset views are saved are available here
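As a rough intuition for why optimizing a view helps streaming, the following toy sketch models a dataset as a list of chunks and copies the samples selected by a view into new, contiguous chunks. This is not how Hub stores data internally: the function name `materialize_view`, the chunk layout, and the chunk size are all assumptions made for illustration.

```python
from typing import List, Sequence

def materialize_view(chunks: Sequence[Sequence[str]],
                     indices: Sequence[int],
                     chunk_size: int) -> List[List[str]]:
    """Copy the samples at `indices` into new, densely packed chunks."""
    # Flatten the original chunks so samples can be addressed by global index.
    flat = [sample for chunk in chunks for sample in chunk]
    selected = [flat[i] for i in indices]
    # Re-chunk: pack the selected samples back to back.
    return [list(selected[i:i + chunk_size])
            for i in range(0, len(selected), chunk_size)]

# Hypothetical dataset: nine samples stored in three chunks.
original = [["s0", "s1", "s2"], ["s3", "s4", "s5"], ["s6", "s7", "s8"]]

# A view that picks samples scattered across every original chunk.
view_indices = [1, 4, 7, 8]

optimized = materialize_view(original, view_indices, chunk_size=3)
```

In this toy model, reading the four samples of the unoptimized view touches all three original chunks, while the optimized copy holds the same samples in two densely packed chunks.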

    The enhanced querying feature is available on all Activeloop datasets for free, and on Growth & Enterprise plans for private datasets.

    No more need to copy all your data to Hub format

    Based on user feedback, we’ve shipped a feature that lets your Hub datasets store references to your data using hub.link. This works when the original data is saved as discrete files (images, videos, audio, etc.).

    When you want to materialize your data for training, save it as a dataset view and optimize (re-chunk) it using the ds_view = ds.load_view(id, optimize=True) API above.
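The idea of storing a reference rather than a copy can be sketched in a few lines of plain Python. The `LinkedSample` class below is hypothetical and is not the implementation behind hub.link. It only shows the general pattern: the dataset holds a path, and the file's bytes are read when the sample is actually accessed.

```python
import os
import tempfile
from dataclasses import dataclass

@dataclass(frozen=True)
class LinkedSample:
    """Hypothetical stand-in for a linked sample: holds a path, not the bytes."""
    path: str

    def read(self) -> bytes:
        # The file is opened only when the sample is accessed.
        with open(self.path, "rb") as f:
            return f.read()

with tempfile.TemporaryDirectory() as tmp:
    # Create a placeholder file that plays the role of an original image.
    image_path = os.path.join(tmp, "image_0.jpg")
    with open(image_path, "wb") as f:
        f.write(b"\xff\xd8 fake image bytes")

    # The "dataset" stores only the reference to the original file.
    dataset = [LinkedSample(image_path)]
    stored_reference = dataset[0].path

    # The bytes are fetched from the original location on demand.
    loaded = dataset[0].read()
```

Because every access goes back to the original file in this pattern, a separate materialization step makes sense before training.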

    More collaboration for less

    Up to 10 collaborators per organization

    The best datasets are built by teams, and Activeloop exists to make that collaboration more seamless. That’s why, with this update, we’re expanding our Community plan to allow up to 10 collaborators (previously 3). We believe this will help smaller ML teams, in companies and in academia, be more productive together.

    Growth plan is more accessible

    At the same time, we’ve decided to lower the price of the Growth plan to just $495 per month. This change will be reflected in our self-serve option starting this week.

    Additional Features

    • Delete samples from your data using ds.pop(index) or ds.tensor.pop(index)
    • Instead of returning all data as numpy arrays, access more information about your data using ds.tensor[index].data(), which returns a dictionary containing all the important information about your sample.

    Note that for htype = class_labels, the key “numeric” has been changed to “value” in .data(), in order to make it more consistent with other htypes.

    • Performance optimizations for small samples and bug fixes.
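If you have code that reads class labels from the dictionary returned by .data(), the key rename can be absorbed with a small helper. The sketch below is a hypothetical compatibility shim in plain Python. The function name `class_label_value` is made up for illustration, and the example dictionaries contain only the renamed key, since the full shape of the returned dictionary is not described here.

```python
from typing import Any, Mapping

def class_label_value(sample_data: Mapping[str, Any]) -> Any:
    """Return the class label from a .data()-style dict, old or new key."""
    if "value" in sample_data:      # key used from this release onward
        return sample_data["value"]
    if "numeric" in sample_data:    # key used before the rename
        return sample_data["numeric"]
    raise KeyError("expected a 'value' or 'numeric' key")

# Hypothetical dictionaries standing in for ds.tensor[index].data() output.
new_style = {"value": 3}
old_style = {"numeric": 3}

label_new = class_label_value(new_style)
label_old = class_label_value(old_style)
```

Checking "value" first means the helper keeps working once every caller has upgraded, and the fallback can then be deleted.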
