Learn how IntelinAir, the leading crop intelligence company, transformed 1500 terabytes of aerial imagery into vital insights for farmers using scalable plug-and-play data pipelines built with Activeloop and NVIDIA.
Inference Speed: 3x (from weeks to 1-2 hours)
Compute and Storage Cost: 0.5x money spent
Data Storage: -30% storage required
Model Accuracy: 95% (+12% from the baseline model)
IntelinAir is a full-season and full-spectrum crop intelligence company focused on agriculture that delivers actionable intelligence to help farmers make data-driven decisions to improve operational efficiency, yields, and ultimately their profitability. IntelinAir, a member of NVIDIA Inception, combines the power of aerial imagery analytics through computer vision and deep learning methodologies, agronomic science, and user-friendly interface (mobile) technologies to deliver near real-time decision support to farmers.
“Farmers simply cannot afford 1000 agronomists scouting their fields to detect pests, diseases, and nutrition and irrigation problems. That is where IntelinAir comes in. Our goal is to organize and digitize the world’s crop information and performance - making it universally accessible and useful to deliver high yields, greater efficiencies, and sustainable farming to feed the human race.”
Challenges
What have been the challenges while dealing with large scale aerial data?
“We are continuously collecting data. To rapidly deliver value to our customers, we need to develop, test, scale, and deploy our best-performing models to the cloud. Doing this consistently without reusable data pipelines is a Sisyphean task. Having a stable, scalable pipeline in place was crucial to meet tight time limits from both our customers and Mother Nature.
At IntelinAir, our data is big in several ways. First, we gather high-resolution imagery (10cm/pixel), which results in aerial images of >1GB for individual fields. Next, our unstructured data is multi-spectral and multi-sensor: in addition to RGB and NIR (near-infrared) imagery, we collect thermal, topography, soil composition, weather, and management (e.g. planter files and harvest maps) data. Third, our data is temporal: we make 13 flights across the season to understand how field health evolves and to capture how management decisions impact yield. This means that we have a lot of it: in 2020 alone, we will image millions of acres of farmland across hundreds of thousands of fields and capture over 1.5 petabytes (1,500,000 gigabytes) of raw data.
“1.5 PB is a lot of data”, interjects Davit. “If one were to line up 1.5 petabytes' worth of 1 GB flash drives end to end, they would stretch across 138 football fields. 1.5 PB equates to about 10 billion photos, or 8.7% of all photos ever uploaded to Facebook. IntelinAir is not alone in its search for ways to deal with big data more efficiently. In our experience, data scientists spend more than half of their time cleaning up the mix of structured and unstructured data and preparing it as input for machine learning / AI models, rather than on actual big data analysis. At Activeloop, we managed to solve this problem by creating a fast and simple framework for building and scaling the data pipeline for IntelinAir.”
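As a quick sanity check on the data volumes quoted above, the arithmetic can be sketched in a few lines of Python. This assumes decimal (SI) units, which is what the "1,500,000 gigabytes" figure implies; the average photo size it prints is only what the quoted numbers imply, not a figure from IntelinAir or Activeloop.

```python
# Decimal (SI) units, matching "1.5 petabytes (1,500,000 gigabytes)".
BYTES_PER_GB = 1_000_000_000

raw_tb = 1500                 # "1500 terabytes" from the introduction
raw_gb = raw_tb * 1000        # terabytes -> gigabytes
raw_bytes = raw_gb * BYTES_PER_GB

# The comparison "about 10 billion photos" implies an average photo size.
photos = 10_000_000_000
kb_per_photo = raw_bytes // photos // 1000

print(f"{raw_tb} TB = {raw_gb:,} GB")
print(f"Implied average photo size: {kb_per_photo} KB")
```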
“To operate in an agile fashion, we want our data science team to focus on building high-quality models instead of fighting with data pipelines, infrastructure, and deployment challenges”, says Jennifer. “Of course, we could always hire more data scientists to tackle these issues. However, a smarter approach is to harness the power of NVIDIA GPUs and Activeloop's highly efficient, integrated processing pipelines and data storage to deliver cutting-edge analytics to our customers cost-effectively.
Furthermore, the sheer size of this data means we need to be very efficient in our data management and pipelining to avoid incurring unnecessary costs. This means we have to identify the most appropriate data for training our models and decide whether to train locally or in the cloud. Balancing the cost of compute against the cost of network transfer (computing while data is still downloading, so that we do not waste time or resources) becomes paramount. To move quickly and operate in an agile fashion, we have to experiment with different data architectures on the fly and iterate on experiments quickly. The more we can rely on reusable data pipelining, the better.”
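The idea of computing while data is still downloading can be illustrated with a small prefetching loader. This is a minimal sketch and not Activeloop's implementation: `download` and `process` are hypothetical stand-ins for a remote fetch and a model step, and the buffer size is an arbitrary choice.

```python
import queue
import threading
import time


def download(chunk_id):
    """Hypothetical stand-in for fetching one chunk of imagery from remote storage."""
    time.sleep(0.01)       # simulated network latency
    return [chunk_id] * 4  # simulated pixel data


def process(chunk):
    """Hypothetical stand-in for the compute step (e.g. a model forward pass)."""
    time.sleep(0.01)       # simulated compute time
    return sum(chunk)


def prefetching_loader(chunk_ids, buffer_size=2):
    """Yield chunks while a background thread downloads the next ones."""
    buf = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def worker():
        for cid in chunk_ids:
            buf.put(download(cid))  # blocks while the buffer is full
        buf.put(sentinel)           # signal that no more chunks will arrive

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            return
        yield item


# The download of chunk N+1 overlaps with the processing of chunk N.
results = [process(chunk) for chunk in prefetching_loader(range(5))]
print(results)
```

The bounded queue is the design choice that matters here: it keeps the network busy while compute runs, but stops the downloader from racing ahead and filling memory.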
Solution
Can you please describe Activeloop’s solution and its benefits for IntelinAir?
“Activeloop’s solution automatically ingests and preprocesses large volumes of data with scalable plug-and-play data pipelines. Our data scientists can now easily build these pipelines, try them out locally, and then effortlessly scale to the cloud. We can directly stream generated datasets to deep learning frameworks for training machine learning models.
They helped us develop an automated pipeline and platform for training classification models. We can now do this via a clean and easy-to-use interface. By simply uploading a different set of flight-codes and labels, we can train a variety of common models (ResNet, DenseNet, VGG, etc.), compare their performance, and select the best ones for deployment. All of the auto-scaling and cluster management is now handled in the background by Activeloop, and we do not have to worry about it.
For more complex models involving object detection or segmentation, we wanted greater control over the pipeline. Here, Activeloop integrated our annotation database and cloud data store with their Hub storage platform. Now, we can automatically pull our annotations (which are often in polygon form) and generate either bounding boxes or segmentation masks depending on the model we want to build. This saves us a lot of time, which we can instead spend on improving the accuracy of our algorithms.
This is breakthrough stuff, letting us generate a dataset for our different tasks with just a few parameter changes, and get past the step of dataset creation, which is often the most time-consuming part of the model-building process. We can pull small datasets for debugging, as well as pull larger ones for experimentation and training, and do each one either locally or in the cloud as needed. The dynamic computation graph Activeloop built makes it possible to pull only the data we need at a given point in time, so we save on data access costs. And the power of our NVIDIA V100 GPUs means we can crunch that data at the highest possible speeds, so training happens as quickly as possible.”
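The annotation step described in this section, turning polygon annotations into either bounding boxes or segmentation masks depending on the model being built, can be illustrated with a short pure-Python sketch. It does not use the Hub API; the function names and the `task` parameter are hypothetical, and a production pipeline would more likely rely on an optimized rasterizer.

```python
def polygon_to_bbox(polygon):
    """Return (x_min, y_min, x_max, y_max) enclosing the polygon vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def point_in_polygon(px, py, polygon):
    """Even-odd ray casting: count edge crossings to the right of the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge spans the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon, height, width):
    """Rasterize: a pixel is 1 if its center lies inside the polygon."""
    return [
        [1 if point_in_polygon(c + 0.5, r + 0.5, polygon) else 0
         for c in range(width)]
        for r in range(height)
    ]


def build_target(polygon, task, height, width):
    """Pick the target format from a single (hypothetical) task parameter."""
    if task == "detection":
        return polygon_to_bbox(polygon)
    if task == "segmentation":
        return polygon_to_mask(polygon, height, width)
    raise ValueError(f"unknown task: {task}")


# Example annotation: an axis-aligned rectangle given as polygon vertices.
polygon = [(1, 1), (4, 1), (4, 3), (1, 3)]
bbox = build_target(polygon, "detection", height=4, width=5)
mask = build_target(polygon, "segmentation", height=4, width=5)
print(bbox)
for row in mask:
    print(row)
```

Switching between the two target types is a one-parameter change, which is the kind of reuse the quote above describes.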
Experience
What do you like most about Activeloop? How was the experience working with them?
“The team at Activeloop works so fast. It's tremendous how quickly they are able to push out new features, address our concerns, and make improvements. We would discuss an idea one week, and it would be in production the next.
On numerous occasions, we hopped on calls with the Activeloop team so we could really understand how things worked. They're very available to take calls and answer questions; just super-reliable, accessible, and easy to work with.
The Activeloop team has been so dependable that they function as part of our core team. When we were up against deadlines, they dove in, embraced them as their own, and helped push us over the finish line. During the process, they helped us build several models, so we hit our targets and could see the entire system work end to end; the NVIDIA V100 GPUs did their part by ensuring that the throughput we needed was there. There is truly no part of the machine learning lifecycle that Activeloop isn't equipped to handle. And backed by NVIDIA's GPUs, we know we'll get the best when it comes to training and inference speeds.”
Collaboration
Davit, what was interesting about this collaboration with IntelinAir?
“We were delighted to be a part of this project”, Davit, the CEO of Activeloop, says. “It is not every day that you get to see a company with such a powerful vision. Solving one of the most crucial issues we face as a society - sustainable food for everyone - is a great cause to be a part of.
We loved IntelinAir’s dedication to helping farmers make crucial decisions as quickly and with as much information as possible. IntelinAir partnered with us because they felt that if they cut the time they spend on building data pipelines, they could concentrate on improving the predictive power of their models to provide better insights to farmers. While working with them, we were amazed by their customer-first attitude. All of IntelinAir’s goals are driven by maximizing the value for farmers - by catching with their technology what one cannot see with the naked eye. This showed in various ways: boosting farm efficiency by capturing early signs of plant malnutrition or irrigation problems, identifying abnormal crop conditions more accurately, and providing more tailored suggestions based on the development of the fields over time with their sophisticated ML models.
This whole project was all about empowering IntelinAir to do what they are best at - using their know-how in deep learning, computer vision, and agronomics. Together, we succeeded by contributing our expertise in processing data - intelligently and at scale.”
What's next
Jennifer, now that Activeloop has helped you resolve the issue with data pipelines, what do you want to focus on next in terms of making machine learning and deep learning more efficient at IntelinAir?
“Since we are dealing with aerial imagery, data labeling is another bottleneck of ours. From a strategic standpoint, we want to achieve a broad active learning platform and ‘close the (feedback) loop’. We can train a model with the annotations we have, deploy it, run inference on new data, correct missed predictions, and feed those back to the model for retraining. This will make our algorithms more accurate over time as they see more (challenging) examples. We are excited to be collaborating with Activeloop to establish this paradigm and support all of our models this way going forward.
IntelinAir always strives to be at the forefront of machine learning and artificial intelligence. Activeloop and NVIDIA understand how the technology will evolve, and what companies need to succeed in adapting to the new era. Together, we can leverage a multitude of cutting-edge technologies and frameworks to solve the biggest problems for our customers. Ultimately, we can help them become more agile and efficient as they shape the future of agriculture.”
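The feedback loop Jennifer describes (train, deploy, run inference, correct missed predictions, retrain) can be sketched as follows. This is an illustrative sketch only: flagging predictions by low confidence is an assumption (a common active learning heuristic), and the tile ids, labels, and threshold are hypothetical.

```python
def select_for_review(predictions, threshold=0.6):
    """Return ids of samples whose confidence falls below the threshold."""
    return [p["id"] for p in predictions if p["confidence"] < threshold]


def apply_corrections(training_set, corrections):
    """Merge human-corrected labels into a copy of the training set."""
    updated = dict(training_set)
    updated.update(corrections)
    return updated


# Existing annotations (hypothetical tile ids and labels).
training_set = {"tile_001": "healthy", "tile_002": "weeds"}

# Model output on newly collected data.
predictions = [
    {"id": "tile_003", "label": "healthy", "confidence": 0.97},
    {"id": "tile_004", "label": "weeds", "confidence": 0.52},
    {"id": "tile_005", "label": "healthy", "confidence": 0.41},
]

# Step 1: flag uncertain predictions for a human annotator.
to_review = select_for_review(predictions)

# Step 2: hypothetical annotator feedback for the flagged tiles.
corrections = {"tile_004": "weeds", "tile_005": "nutrient_deficiency"}

# Step 3: feed the corrections back for the next retraining round.
training_set = apply_corrections(training_set, corrections)
print(to_review)
print(sorted(training_set))
```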