Release Notes: Hub 2.3.3 is out, with exciting new version control features and several important helper functions

    tl;dr: Hub 2.3.3 is out, with a version control upgrade, helper functions, GSoC 2022, and exciting community contributions
    Davit Buniatyan
    Published Apr 6, 2022. Updated Sep 27, 2023.
    Hub 2.3.3 is out! This release brings a version control upgrade, new helper functions, GSoC 2022 news, and exciting community contributions. Here’s what’s new.

    New Hub features

    You can now delete uncommitted changes using ds.reset(). Hub 2.3.3 also lets you merge branches and commits using ds.merge(). You can copy a dataset from one location to another using hub.copy(), or using hub.deepcopy() if you also want to keep its version control history. When you append a file with hub.read(fn), the metadata from its file header is now automatically stored in ds.tensor_name.sample_info.
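
    The features above can be sketched roughly as follows. This is a minimal sketch, not an official example: the function names, the `labels` and `images` tensors, the `experiment` branch, the `mem://vc_demo` path, and the JPEG file are all assumptions made for illustration, and exact keyword arguments may differ between Hub versions. The Hub module is passed in as a parameter so the sketch still loads where Hub is not installed.

```python
try:
    import hub  # assumes Hub 2.3.3 or later is installed
except ImportError:
    hub = None  # lets the sketch load even without Hub


def version_control_demo(hub_module, path="mem://vc_demo"):
    """Commit a sample, discard an uncommitted change, then merge a branch."""
    ds = hub_module.empty(path)  # hypothetical in-memory dataset
    ds.create_tensor("labels")   # hypothetical tensor name
    ds.labels.append(0)
    ds.commit("first sample")

    # An uncommitted change that we decide to throw away.
    ds.labels.append(1)
    ds.reset()

    # Do some work on a branch, then merge it back into main.
    ds.checkout("experiment", create=True)
    ds.labels.append(2)
    ds.commit("sample added on experiment")
    ds.checkout("main")
    ds.merge("experiment")
    return ds


def copy_demo(hub_module, src, dest_plain, dest_full):
    """Copy a dataset with hub.copy() and with hub.deepcopy()."""
    hub_module.copy(src, dest_plain)
    # Per the release note, deepcopy also carries the version control history.
    hub_module.deepcopy(src, dest_full)


def read_demo(hub_module, ds, file_path):
    """Append a file via hub.read() and return the stored header metadata."""
    # Assumes file_path points at a JPEG image.
    ds.create_tensor("images", htype="image", sample_compression="jpeg")
    ds.images.append(hub_module.read(file_path))
    return ds.images.sample_info
```

    With Hub installed, calling `version_control_demo(hub)` runs the reset and merge steps against an in-memory dataset. The copy and read helpers need real source and destination paths of your own.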

    Community shoutouts

    Abid Ali Awan has written a great guide on Hub and the Activeloop Platform for KDnuggets!

    Alex Wang has uploaded and documented the KMNIST dataset on our Machine Learning Datasets Catalogue.

    Manas Gupta has documented the Google Objectron dataset on our Machine Learning Datasets Catalogue.

    Paul created an example of using Hub, TensorBoard, and Docker to train a model in PyTorch.

    Jinyi Chen is currently finalizing the Chinese version of the readme! Let us know if you’d like to translate it into other languages.

    Bikram Maharjan is working on support for additional image formats in hub.auto. Thanks also to Suhaas Neel for the multiple PRs he’s working on!

    GSoC 2022

    GSoC proposals opened yesterday. Make sure you contribute to or finalize your PRs by April 19 and apply!
