Case Study

Improving Audio Machine Learning Infrastructure at Ubenwa

Learn how Ubenwa, a growing force in sound-based infant medical diagnostics, achieved 2x efficiency and improved scalability with streamable, standardized Deep Lake datasets

2x Faster Data Processing

Company Background

Ubenwa develops AI-powered software for the early detection of neurological and respiratory conditions in infants using their cry. You've probably wondered at least once: why is my baby crying? Ubenwa is addressing just that. The company's machine learning organization has 3 machine learning researchers (and some occasional interns!). The startup is in the early stages of developing a machine learning system that can accurately predict neonatal distress, a critical need, especially in developing countries. The company faced several challenges in building a scalable and efficient data infrastructure to support its machine learning models. Upon joining the company, our interviewee, Arsenii, was tasked with solving these challenges.

Infant cry diagnostics with audio ML, courtesy: Ubenwa

Meet the Interviewee

Arsenii is a Lead Machine Learning Engineer at Ubenwa who has been with the company for over six months. He was responsible for building the data infrastructure and ensuring the efficient operation of the machine learning models. Before Ubenwa, Arsenii experienced all the bottlenecks of building complex data infrastructure in a quickly growing startup. He evaluated several plug-and-play solutions and chose Activeloop thanks to the quick time-to-value he experienced with Deep Lake.

“Accessing data in the cloud is like walking through quicksand, and relying on slow and unreliable file systems is like sinking deeper. Downloading data every time you run an experiment is like carrying a heavy burden that slows you down, and eventually, it might break you (and the training process). Deep Lake's on-the-fly streaming was an excellent choice for us: it was really easy to set up, and it started to bring the value of fast data loading from day one.”

Arsenii Gorin

Lead Machine Learning Engineer at Ubenwa

The Challenges

Before Activeloop, the Ubenwa ML team faced several challenges in building a scalable and efficient data infrastructure.

  • 1

    Lack of Standardization

    The data infrastructure was in its early stages, and there was no standardization in how data was loaded or processed. This led to a fragmented and disorganized data pipeline, making it difficult to scale the system.

  • 2

    Inefficient Data Loading

    The Ubenwa ML team spent a lot of time on the data loading process, which was not optimized for the company's use case. This resulted in slow and inefficient machine learning training pipelines. More importantly, for PyTorch training in the cloud, for instance, one could spend a lot of time loading data only to then hit an error in the training code.

  • 3

    No Support for Audio Data

    Ubenwa's primary data source was audio recordings of crying babies, which the existing data infrastructure was not optimized for. This was a significant bottleneck in the system, as audio data is critical for building accurate machine learning models.
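
The data loading challenge above can be illustrated with a minimal sketch in plain Python. It does not use Deep Lake or PyTorch, and every name in it (`fetch_sample`, `download_then_train`, `stream_and_train`, `buggy_train_step`) is hypothetical. The assumption is that each sample read is slow, as with cloud storage. The sketch counts how many reads happen before a bug in the training code surfaces.

```python
from typing import Callable, Iterator, List


def fetch_sample(index: int, log: List[int]) -> bytes:
    """Hypothetical stand-in for one slow cloud read of an audio clip."""
    log.append(index)  # record that this sample was fetched
    return bytes([index % 256]) * 4


def download_then_train(n: int, train_step: Callable[[bytes], None], log: List[int]) -> None:
    # Eager approach: fetch every sample before the first training step runs.
    samples = [fetch_sample(i, log) for i in range(n)]
    for sample in samples:
        train_step(sample)


def stream_samples(n: int, log: List[int]) -> Iterator[bytes]:
    # Lazy approach: a sample is fetched only when the consumer asks for it.
    for i in range(n):
        yield fetch_sample(i, log)


def stream_and_train(n: int, train_step: Callable[[bytes], None], log: List[int]) -> None:
    for sample in stream_samples(n, log):
        train_step(sample)


def buggy_train_step(sample: bytes) -> None:
    # Simulates an error in the training code.
    raise ValueError("bug in training code")


def fetches_before_failure(runner, n: int = 1000) -> int:
    """Return how many samples were fetched before the training bug surfaced."""
    log: List[int] = []
    try:
        runner(n, buggy_train_step, log)
    except ValueError:
        pass
    return len(log)


if __name__ == "__main__":
    print("download first:", fetches_before_failure(download_then_train))
    print("streaming:", fetches_before_failure(stream_and_train))
```

In this toy setup, the download-first variant pays for every read before the bug appears, while the streaming variant fails on the first sample. Real loaders usually prefetch and batch, so the gap in practice depends on those settings.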

Solution

Speed, data quality, single source of truth, & easy-to-use UI. Activeloop was the solution that Arsenii was looking for to solve the problems faced at Ubenwa. Activeloop is a scalable and efficient data infrastructure platform that supports audio data and provides a standard way of processing and loading data.

Ubenwa app UI, courtesy: Ubenwa

Results

2x the efficiency, standardized ML dataset quality, and plug-and-play scalable audio infrastructure for machine learning. Activeloop significantly improved the data infrastructure at Ubenwa, improving the efficiency and scalability of the system. Some of the key results were:

  • Increased Efficiency by 2x
    The data loading process was optimized, reducing the time spent on data loading from two weeks to just one week.
  • Standardization of Datasets for Machine Learning
    Activeloop provided a standard way of processing and loading data, resulting in a more organized and streamlined data pipeline.
  • Support for Audio Data
    Activeloop supported audio data, a critical requirement for Ubenwa's machine learning models. This allowed the Ubenwa ML team to efficiently process audio recordings of neonatal distress, which was impossible before.
  • Improved Scalability
    The efficient and standardized data pipeline enabled Ubenwa to scale its machine learning models more efficiently.

Concluding Remarks

Critical solution for scaling startups. Activeloop was a critical solution for Ubenwa's data infrastructure, providing a scalable and efficient platform for processing and loading data. The optimized data pipeline and support for audio data significantly improved the efficiency and scalability of Ubenwa's machine learning models. By adopting Activeloop, Ubenwa was able to build a more efficient and scalable system, accelerating towards its goal of detecting neonatal distress more accurately.

Medical Diagnostics, courtesy: Ubenwa