Improving Audio Machine Learning Infrastructure at Ubenwa

Learn how Ubenwa, a growing force in sound-based infant medical diagnostics, doubled efficiency and improved scalability with streamable, standardized Deep Lake datasets

Machine Learning Case Study

Company Background

Infant cry diagnostics with audio ML, courtesy: Ubenwa

Ubenwa develops AI-powered software for the early detection of neurological and respiratory conditions in infants from their cries.

You've probably wondered at least once: why is my baby crying? Ubenwa is addressing just that. The company's machine learning team consists of three researchers (and the occasional intern!). The startup is in the early stages of developing a machine learning system that can accurately predict neonatal distress, a critical need, especially in developing countries. The company faced several challenges in building a scalable and efficient data infrastructure to support its machine learning models. Upon joining the company, our interviewee, Arsenii, was tasked with solving these challenges.

Meet the interviewee

Arsenii Gorin, Lead Machine Learning Engineer at Ubenwa

Arsenii is a Lead Machine Learning Engineer at Ubenwa and has been with the company for over six months. He is responsible for building the data infrastructure and keeping the machine learning models running efficiently. Before Ubenwa, he experienced first-hand the bottlenecks of building complex data infrastructure in a fast-growing startup. He evaluated several plug-and-play solutions and chose Activeloop for the quick time-to-value he saw with Deep Lake.

Accessing data in the cloud is like walking through quicksand, and relying on slow and unreliable file systems is like sinking deeper. Downloading data every time you run an experiment is like carrying a heavy burden that slows you down, and eventually, it might break you (and the training process). Deep Lake's on-the-fly streaming was an excellent choice for us: it was really easy to set up, and it started to bring the value of fast data loading from day one.

Arsenii Gorin

Lead ML Engineer, @ubenwa_ai

Problems faced by Ubenwa

Ubenwa app UI, courtesy: Ubenwa

Before Activeloop, the Ubenwa ML team faced several challenges in building a scalable and efficient data infrastructure.

  • Lack of standardization: The data infrastructure was in its early stages, and there was no standardization in how data was loaded or processed. This led to a fragmented and disorganized data pipeline, making it difficult to scale the system.
  • Inefficient data loading: The Ubenwa ML team spent a lot of time on the data loading process, which was not optimized for the company's use case. This resulted in slow and inefficient machine learning training pipelines. More importantly, when training with PyTorch in the cloud, for instance, one could spend a long time loading data only to discover an error in the training code afterwards.
  • No support for audio data: Ubenwa's primary data source was audio recordings of crying babies, which the existing data infrastructure was not optimized for. This was a significant bottleneck in the system, as audio data is critical for building accurate machine learning models.
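
The "load everything, then hit a bug" problem from the second bullet can be illustrated with a small, library-agnostic sketch. It is not Ubenwa's pipeline and does not use the Deep Lake API. Every name here (`fetch_sample`, `download_all`, `stream_batches`, `buggy_train_step`, `count_fetches_before_failure`) is hypothetical and invented for illustration. It only assumes that a training bug surfaces on the first batch, and it counts how many samples are fetched before that happens.

```python
from typing import Callable, Iterator, List


def fetch_sample(index: int) -> bytes:
    """Hypothetical stand-in for reading one audio clip from cloud storage."""
    return f"clip-{index}".encode()


def download_all(n_samples: int, fetch: Callable[[int], bytes]) -> List[bytes]:
    """Eager loading: every sample is fetched before training can start."""
    return [fetch(i) for i in range(n_samples)]


def stream_batches(
    n_samples: int, batch_size: int, fetch: Callable[[int], bytes]
) -> Iterator[List[bytes]]:
    """Lazy loading: fetch one batch at a time, only when the loop asks for it."""
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        yield [fetch(i) for i in range(start, stop)]


def buggy_train_step(batch: List[bytes]) -> None:
    """Simulates a bug in the training code that fails on the first batch."""
    raise ValueError("shape mismatch in model input")


def count_fetches_before_failure(
    n_samples: int, batch_size: int, streaming: bool
) -> int:
    """Return how many samples were fetched before the training bug surfaced."""
    fetched = 0

    def counting_fetch(index: int) -> bytes:
        nonlocal fetched
        fetched += 1
        return fetch_sample(index)

    try:
        if streaming:
            for batch in stream_batches(n_samples, batch_size, counting_fetch):
                buggy_train_step(batch)
        else:
            data = download_all(n_samples, counting_fetch)
            for start in range(0, n_samples, batch_size):
                buggy_train_step(data[start:start + batch_size])
    except ValueError:
        pass  # the bug was found; we only care how much was loaded first
    return fetched


if __name__ == "__main__":
    print(count_fetches_before_failure(1000, 32, streaming=False))
    print(count_fetches_before_failure(1000, 32, streaming=True))
```

With these toy numbers, the eager path fetches all 1000 samples before the error appears, while the streaming path fetches only the first batch of 32. On real cloud storage the difference is time spent waiting rather than a counter.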

Solution

Speed, data quality, single source of truth, & easy-to-use UI

Activeloop was the solution that Arsenii was looking for to solve the problems faced at Ubenwa. Activeloop is a scalable and efficient data infrastructure platform that supports audio data and provides a standard way of processing and loading data.

Results achieved by Ubenwa with Activeloop

Medical Diagnostics, courtesy: Ubenwa

2x the efficiency, standardized ML dataset quality, and plug-and-play scalable audio infrastructure for machine learning

Activeloop significantly improved the data infrastructure at Ubenwa, improving the efficiency and scalability of the system. Some of the key results were:

  • Increased efficiency by 2x: The data loading process was optimized, reducing the time spent on data loading from two weeks to just one.
  • Standardization of datasets for machine learning: Activeloop provided a standard way of processing and loading data, resulting in a more organized and streamlined data pipeline.
  • Support for audio data: Activeloop supported audio data, a critical requirement for Ubenwa's machine learning models. This allowed the Ubenwa ML team to efficiently process audio recordings of neonatal distress, which was impossible before.
  • Improved scalability: The efficient and standardized data pipeline enabled Ubenwa to scale its machine learning models more easily.

Concluding remarks

Critical solution for scaling startups

Activeloop was a critical solution for Ubenwa's data infrastructure, providing a scalable and efficient platform for processing and loading data. The optimized data pipeline and support for audio data significantly improved the efficiency and scalability of Ubenwa's machine learning models. By adopting Activeloop, Ubenwa was able to build a more efficient and scalable system, accelerating towards its goal of detecting neonatal distress more accurately.
