Improving Audio Machine Learning Infrastructure at Ubenwa
Learn how Ubenwa, a growing force in sound-based infant medical diagnostics, doubled efficiency and improved scalability with streamable, standardized Deep Lake datasets
Machine Learning Case Study
Infant cry diagnostics with audio ML, courtesy: Ubenwa
Ubenwa develops AI-powered software for the early detection of neurological and respiratory conditions in infants based on their cries
You've probably wondered at least once: why is my baby crying? Ubenwa is addressing just that. The company's machine learning organization consists of three machine learning researchers (and some occasional interns!). The startup is in the early stages of developing a machine learning system that can accurately predict neonatal distress, a critical need, especially in developing countries. Along the way, the company faced several challenges in building a scalable and efficient data infrastructure to support its machine learning models. Upon joining the company, our interviewee, Arsenii, was tasked with solving these challenges.
Meet the interviewee
Arsenii Gorin, Lead Machine Learning Engineer at Ubenwa
Arsenii is a Lead Machine Learning Engineer at Ubenwa who has been with the company for over six months. He is responsible for building the data infrastructure and ensuring the efficient operation of the machine learning models. Before Ubenwa, Arsenii had experienced all the bottlenecks of building complex data infrastructure in a quickly growing startup. He evaluated several plug-and-play solutions and chose Activeloop thanks to the quick time-to-value he experienced with Deep Lake.
Accessing data in the cloud is like walking through quicksand, and relying on slow and unreliable file systems is like sinking deeper. Downloading data every time you run an experiment is like carrying a heavy burden that slows you down, and eventually, it might break you (and the training process). Deep Lake's on-the-fly streaming was an excellent choice for us: it was really easy to set up, and it started to bring the value of fast data loading from day one.
Lead ML Engineer, @ubenwa_ai
Problems faced by Ubenwa
Ubenwa app UI, courtesy: Ubenwa
Before Activeloop, the Ubenwa ML team faced several challenges in building a scalable and efficient data infrastructure.
- Lack of standardization: The data infrastructure was in its early stages, and there was no standardization in how data was loaded or processed. This led to a fragmented and disorganized data pipeline, making it difficult to scale the system.
- Inefficient data loading: The Ubenwa ML team spent a lot of time on the data loading process, which was not optimized for the company's use case. This resulted in slow and inefficient machine learning training pipelines. More importantly, when training with PyTorch in the cloud, for instance, one could spend a long time loading data only to then hit an error in the training code.
- No support for audio data: Ubenwa's primary data source was audio recordings of crying babies, which the existing data infrastructure was not optimized for. This was a significant bottleneck in the system, as audio data is critical for building accurate machine learning models.
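To illustrate the data loading problem above, here is a minimal sketch of lazy, batch-by-batch streaming in plain Python. It is not Deep Lake's API or Ubenwa's pipeline: `FakeRemoteStore`, `stream_batches`, `train_step`, and the file naming are hypothetical names invented for this example. The point it shows is that when samples are fetched only as each batch is requested, a bug in the training code can surface after the first batch rather than after the whole dataset has been downloaded.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[str, int]  # (file name, label); hypothetical record layout


class FakeRemoteStore:
    """Hypothetical stand-in for cloud storage that counts fetches."""

    def __init__(self, num_samples: int) -> None:
        self.num_samples = num_samples
        self.fetches = 0

    def fetch(self, index: int) -> Sample:
        # A real store would read audio bytes over the network here.
        self.fetches += 1
        return (f"cry_{index:04d}.wav", index % 2)


def stream_batches(store: FakeRemoteStore, batch_size: int) -> Iterator[List[Sample]]:
    """Yield batches lazily: a sample is fetched only when its batch is needed."""
    batch: List[Sample] = []
    for index in range(store.num_samples):
        batch.append(store.fetch(index))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


def train_step(batch: List[Sample]) -> float:
    """Placeholder training step: returns the fraction of positive labels."""
    return sum(label for _, label in batch) / len(batch)


store = FakeRemoteStore(num_samples=1000)
first_batch = next(stream_batches(store, batch_size=8))
first_result = train_step(first_batch)
print(store.fetches, first_result)  # prints: 8 0.5
```

Only 8 of the 1,000 simulated samples are fetched before the first training step runs, so an error in `train_step` would appear almost immediately. A download-first approach would have paid for all 1,000 fetches before reaching the same line.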
Speed, data quality, single source of truth, & easy-to-use UI
Activeloop was the solution that Arsenii was looking for to solve the problems faced at Ubenwa. Activeloop is a scalable and efficient data infrastructure platform that supports audio data and provides a standard way of processing and loading data.
Results achieved by Ubenwa with Activeloop
Medical Diagnostics, courtesy: Ubenwa
2x the efficiency, standardized ML dataset quality, and plug-and-play scalable audio infrastructure for machine learning
Activeloop significantly improved the data infrastructure at Ubenwa, improving the efficiency and scalability of the system. Some of the key results were:
- Increased efficiency by 2x: The data loading process was optimized, cutting the time spent on data loading from two weeks to just one week.
- Standardization of datasets for machine learning: Activeloop provided a standard way of processing and loading data, resulting in a more organized and streamlined data pipeline.
- Support for audio data: Activeloop supported audio data, a critical requirement for Ubenwa's machine learning models. This allowed the Ubenwa ML team to efficiently process audio recordings of neonatal distress, which was impossible before.
- Improved scalability: The efficient and standardized data pipeline enabled Ubenwa to scale its machine learning models more efficiently.
Critical solution for scaling startups
Activeloop was a critical solution for Ubenwa's data infrastructure, providing a scalable and efficient platform for processing and loading data. The optimized data pipeline and support for audio data significantly improved the efficiency and scalability of Ubenwa's machine learning models. By adopting Activeloop, Ubenwa was able to build a more efficient and scalable system, accelerating towards their goal of detecting neonatal distress more accurately.