Release Notes: Hub 2.3.4 is released, new features for ingesting data from Kaggle, enhancements to hub auto and PyCon.

    tl;dr Hub 2.3.4 is released! Ingest data from Kaggle, enhancements to hub auto, dataset summary & PyCon.
    • Davit Buniatyan
    2 min read, on Apr 21, 2022
  • Hub v2.3.4 features

    We added support for the most common image formats in hub.ingest and hub.ingest_kaggle (so you can directly ingest popular datasets from Kaggle). Also, we introduced ds.summary() so you can easily understand your dataset layout. See what’s included in the screenshot!
    [Screenshot: dataset summary]
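As a rough sketch of how ingestion and the summary fit together: the snippet below previews a local image folder, then hands it to hub.ingest and prints ds.summary(). The helper names, the extension set, and the folder-per-class layout are illustrative assumptions, not part of Hub. The hub.ingest(src, dest) call follows the Hub 2.x signature as we recall it, so check it against the docs for your version.

```python
from collections import Counter
from pathlib import Path

# Extensions treated as images here. This set is an illustrative assumption,
# not the exact list of formats supported by Hub 2.3.4.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff"}


def count_images_per_class(src):
    """Count image files under each top-level folder of `src`.

    Assumes a folder-per-class layout (src/<class_name>/<image files>).
    Files sitting directly in `src` are skipped.
    """
    counts = Counter()
    for path in Path(src).rglob("*"):
        parts = path.relative_to(src).parts
        if (
            path.is_file()
            and len(parts) > 1
            and path.suffix.lower() in IMAGE_EXTENSIONS
        ):
            # The first folder below `src` is treated as the class name.
            counts[parts[0]] += 1
    return dict(counts)


def ingest_and_summarize(src, dest):
    """Ingest a local image folder with Hub and print the dataset layout.

    Hypothetical wrapper: requires `pip install hub`. Hub is imported lazily
    so the preview helper above can be used without it.
    """
    import hub

    ds = hub.ingest(src, dest)  # src: local folder, dest: dataset path
    ds.summary()                # prints the dataset layout
    return ds
```

Running count_images_per_class on the folder first is a cheap sanity check before a long ingest. For Kaggle sources, hub.ingest_kaggle plays the same role as hub.ingest but also needs a Kaggle dataset tag and credentials.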

    Now you can return data in PyTorch DataLoaders as bytes instead of tensors, using ds.pytorch(..., tobytes=True). This enables you to use libraries of your choice to decompress and remove your data. We also shipped less intrusive locking for operations on different version control branches.
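To illustrate what working on raw bytes can look like, here is a minimal sketch that reads the width and height of a PNG straight from its header, without decoding any pixels. It assumes the bytes you receive are the original compressed file contents and that the sample is a PNG; neither is confirmed by these notes. png_size and make_gray_png are hypothetical helpers. make_gray_png only builds a stand-in sample so the example runs without Hub.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(raw: bytes):
    """Return (width, height) from the IHDR chunk of PNG bytes.

    Only the header is inspected; no pixel data is decompressed.
    """
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG byte string")
    width, height = struct.unpack(">II", raw[16:24])
    return width, height


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_gray_png(width: int, height: int) -> bytes:
    """Build a tiny all-black 8-bit grayscale PNG as a stand-in sample."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with a filter byte (0 = no filter).
    rows = b"".join(b"\x00" + bytes(width) for _ in range(height))
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(rows))
        + _chunk(b"IEND", b"")
    )


sample = make_gray_png(4, 3)
print(png_size(sample))  # (4, 3)
```

In a real pipeline the bytes would come from the loader returned by ds.pytorch(..., tobytes=True) rather than from make_gray_png. The exact structure of each batch is not described here, so inspect one batch before wiring in a decoder.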

    Community contributions

    New datasets were uploaded and documented by our community members Uday Uppal (KKanji) and Manas Gupta (EMNIST). We also changed the str return to include tensor-wise information, for an issue by Suhaas Neel.

    Additionally, Bikram Maharjan added support for additional image formats in hub.auto, Sai Nikhilesh Reddy contributed to ds.summary, and the README in Chinese was merged by Jinyi Chen.

    Events

    We’re presenting at PyCon next week (Apr 27 - May 1)! Stop by our booth if you’re around, or register for Davit Buniatyan's workshop, "The Future of Handing off Data to Compute".

    Community feature

    Gradient Health, our partner company (Ouwen Huang), has published a great guide to Open-source Medical Imagery Datasets. Check it out and let us know what you think.
