Case Study

How Matterport Decreased Data Prep Times by 80% and Enabled Multimodal AI

Discover how Matterport leveraged Deep Lake to overcome data management challenges and expedite the training process of their machine learning models

80% Faster Data Prep

Matterport: Pioneers in 3D Digital Twin Technology

Matterport, a leader in 3D digital twins, has digitized more than 35 billion square feet, making it one of the largest players in the domain. The Vision & Learning team drives the company's AI/ML capabilities. Alan Dolhasz manages the team's research activities, with a focus on computer vision and machine learning problems.

The team is also responsible for rapidly assessing new research and converting promising results into fully fledged products that answer vital questions about the scanned spaces. They are at the core of Matterport's innovation, developing machine learning models on opt-in data to predict useful information about spaces based on Matterport's extensive datasets [1]. Importantly, Matterport's highly selective approach to data utilization, aligned with customer privacy settings, ensures model accuracy and compliance amidst diverse data usage preferences.

The Challenges

Before adopting Activeloop, Matterport faced challenges in managing their colossal datasets [2]. With over 7 million scanned spaces, the sheer size of the data posed significant logistical issues.

"Imagine you take a million Matterport spaces, each one might have a hundred photographs taken inside of it. You've got effectively two images that you need to store and maintain for every one of the 10 million items in the dataset. Very quickly, this becomes impossible to carry around"

Alan Dolhasz

Manager, Machine Learning Development at Matterport
  • 1

    Rapidly evolving vast data

    The dynamic nature of Matterport's datasets [3] introduced certain challenges. As Alan observed, "With every new engineer undertaking a project, there was an initial phase dedicated to transferring data, which, while necessary, involved a considerable amount of foundational work." This meant that a significant portion of time was invested in preliminary tasks such as data preparation and basic coding routines.

  • 2

    Lack of standardization

    The absence of a unified data management standard occasionally caused variations in how ML datasets [4] were created, resulting in a less streamlined approach across projects. While this diversity in methods offered flexibility, it also underscored the potential for enhancing organizational coherence.

  • 3

    Experimentation and training models in the cloud

    Setting up a new machine learning project was slow because it involved downloading a large dataset [5] from a cloud storage service such as S3 and moving it back and forth. Transferring, storing, and tracking changes to these datasets was complex. As Alan highlighted, "Very quickly, as you scale up, this becomes super hard." This offered a valuable opportunity for streamlining processes within the dynamic environment at Matterport.

The Solution

  • With its capacity to handle multimodal data, Deep Lake significantly streamlined the data handling process for Matterport's machine learning projects.

    Deep Lake just made it super easy for us to scale horizontally the different data modalities that we use.
    Alan Dolhasz

    Manager, Machine Learning Development at Matterport
  • Deep Lake provided a uniform, efficient storage format for Matterport's datasets, allowing stakeholders across teams to store data in an ML-native format and abstracting away much of the boilerplate code required to set up a training pipeline for each project.

    Deep Lake knocked out like 80 percent of the data random work associated... because once you've done it, that's it. Nobody else has to repeat that process unless you change the dataset.
    Alan Dolhasz

    Manager, Machine Learning Development at Matterport
  • With Deep Lake's streaming dataloader, Matterport was able to stream their data to training frameworks in real time, utilizing compute resources efficiently. With Deep Lake datasets acting as 'magic links' within the code, the Matterport team could plug in whichever dataset they wanted and rapidly iterate on choosing the best model architecture for the problem at hand.

    With Deep Lake, it's literally changing one line and we can train on a completely different dataset. This is something that would take at least a day before
    Alan Dolhasz

    Manager, Machine Learning Development at Matterport
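
The "changing one line" workflow described above can be illustrated with a minimal, standard-library-only sketch. It does not use the Deep Lake API: the `demo://` URIs, the `FAKE_STORAGE` dictionary, and the `stream_batches` and `train` helpers are all hypothetical stand-ins, chosen only to show a training loop that refers to its dataset through a single identifier and consumes it batch by batch.

```python
from typing import Dict, Iterator, List

# Hypothetical in-memory stand-in for remote storage; URIs and samples are made up.
FAKE_STORAGE: Dict[str, List[dict]] = {
    "demo://spaces-rgb": [{"image": f"rgb_{i}", "label": i % 2} for i in range(10)],
    "demo://spaces-depth": [{"image": f"depth_{i}", "label": i % 3} for i in range(7)],
}


def stream_batches(uri: str, batch_size: int) -> Iterator[List[dict]]:
    """Yield one batch at a time from the dataset behind `uri`.

    A real streaming loader would fetch each batch from remote storage on
    demand; here the batches are simply sliced from an in-memory list.
    """
    samples = FAKE_STORAGE[uri]
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def train(uri: str, batch_size: int = 4) -> int:
    """Toy training loop: counts the samples it sees instead of updating a model."""
    seen = 0
    for batch in stream_batches(uri, batch_size):
        seen += len(batch)
    return seen


# The only line that changes when training on a different dataset.
DATASET_URI = "demo://spaces-rgb"

if __name__ == "__main__":
    print(train(DATASET_URI))
```

Because the training code depends only on the identifier, pointing `DATASET_URI` at `"demo://spaces-depth"` switches datasets without touching the loop itself.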

Data Visualization

Deep Lake's powerful UI for complex data visualization allowed the team to share datasets [6] easily for QA, both within the team and with other teams who may not understand their work thoroughly.

Deep Lake allowed Matterport to store and visualize multimodal datasets in one place, setting the team up for fast ML cycles.

Results

Deep Lake significantly reduced the time and effort required to get from raw data [7] to training. Implementing Deep Lake also led to substantial improvements in Matterport's operations, enabling the team to focus more on core tasks like iterating on model architecture and less on time-consuming data wrangling. It has freed up resources and made managing complex, multimodal data easier.

It just abstracted so much of this work away so we could actually focus on the hard problems.
Alan Dolhasz

Manager, Machine Learning Development at Matterport
Increased Productivity
By standardizing the data handling process, Deep Lake allowed Matterport to allocate more of their time to business logic rather than infrastructure.
80% Less Time Spent
On Training Data Preparation
From Hours to Seconds
Time to Train on a New Dataset
“Deep Lake made working on more complex data no more complicated from a data management point of view. Whether I'm working on 10 million images with 10 different modalities or a thousand images with just one modality, it's all the same from the perspective of the user of the system.”

Alan Dolhasz

Manager, Machine Learning Development at Matterport

Future Plans

Combining generative AI and property insights, Matterport’s digital twin platform aims to reshape the real estate landscape, optimizing interior design, space utilization, energy efficiency, safety, and accessibility while transforming property marketing strategies.

The company is particularly focused on leveraging multimodal data to modify spaces based on user requests. As they dive deeper into this complex data, Deep Lake's ability to efficiently manage multimodal data will be instrumental in helping Matterport achieve its future objectives.

Disclaimer

1-7. Matterport is dedicated to using only authorized data to enhance and refine their services, with a strong commitment to respecting the privacy preferences of their diverse customer base. For further details, see Matterport's Terms of Use. https://matterport.com/terms-of-use
