Faster Tree Segmentation with Earthshot Labs’ Forest Inventory App
Learn how Earthshot, an AgriTech powerhouse in the environmental conservation space, achieved 5x the speed with 4x fewer resources
Earthshot Labs collaborates with world-class ecosystem conservation and regeneration organizations around the globe to deliver the highest-quality nature-based carbon projects.
With products like the LandOS platform, Earthshot equips stakeholders with state-of-the-art AI ecological forecasting, financial modeling, and risk assessment. This allows Earthshot's customers and users to focus on the highest-impact activities and evaluate the impact of reforestation or conservation projects on the land.
Quickly developing a tree segmentation app in-house to shorten the time needed to conduct forest inventory
One of Earthshot's core goals is to help scientists and volunteers conduct forest inventory. Earthshot's mobile app Biome uses a combination of machine learning and augmented reality to measure various properties of trees, including their count, height, diameter at breast height (DBH), and species. Science teams need this data to forecast forest biomass growth over time, which is critical for securing carbon market financing for reforestation projects. Earthshot's long-term vision for the project is for project managers to easily initiate and monitor reforestation projects anywhere in the world, and for millions of citizen scientists to gather invaluable forest data that greatly assists ecological science and nature projects across the globe.
Margaux, what was the business value achieved with the project?
5x faster process, 4x fewer people needed for data collection
The data collection process is highly manual and requires a lot of time and human resources. Even with enough resources, it is prone to errors because data input is non-standardized. For instance, we found that some people consider the last leaf on the tree to be the topmost point, while others consider it to be the highest point of the tree branch, which may introduce significant noise into the height data. The same happens when measuring DBH with a tape: some people measure the tree at a different height than others, which leads to significant differences in the final DBH value.
Another advantage of the Biome app is speed. When testing the app in the field, in Panama, I outperformed a team of 3-4 people by 5x on a 10-meter radius plot because measuring a single tree with the app was much faster and easier. My colleagues, who have forest inventory experience, stated that experienced foresters would be even faster! After a day or two of using the app in the forest, it actually took me less and less time to measure a whole plot. We were able to measure 20-meter plots with Biome by the end of the field trip.
What were the challenges solved with Deep Lake?
Speed, data quality, single source of truth, & easy-to-use UI
- Quickly developing & deploying a performant TensorFlow-based tree segmentation model in-app.
- Adding new data easily with the commit system and consistently ensuring its quality as it evolves.
- Establishing a single source of truth between the team members for the changing AI dataset.
- Presenting the data to the larger team in an easy-to-use UI.
In Margaux's words, "tree segmentation is a fairly straightforward project, making it very easy to get lost and overspend resources. I just needed to deploy a solution that works, and Activeloop made it simpler to ship our AI app quickly! The features that shone the most for me were the seamless versioning and instant visualization of various versions in the Deep Lake UI, as well as the fast data access that the Deep Lake format enabled."
Did you encounter data quality issues? How did you solve them?
Managing data in Firebase was increasingly difficult. Activeloop saved the day!
As we were developing the tree segmentation model for the mobile app, managing the data we had previously pooled in Firebase became increasingly difficult as app users took more photos in the field. That is when I decided to start using Activeloop. The tool helped me a lot in troubleshooting model performance. At one point, we got really odd model results after retraining, with many unusual predictions and false positives. It didn't make sense at the time because the model was specifically trained to eliminate false positives. I inspected each version and found multiple faulty masks with values from 100 to 255 (instead of binary). Such mistakes can take a long time to troubleshoot because the data came from a trusted folder. I'm glad I was able to find the problem by using Deep Lake version control.
With a specific commit ID, I could roll back to the intact version of the dataset until I figured out, using the Deep Lake visualization, exactly which commit was problematic.
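The kind of non-binary mask described above can also be caught automatically before retraining. The following is a minimal sketch in plain Python, not Earthshot's actual pipeline: the function names are hypothetical, small nested lists stand in for real mask images, and the choice of 0 and 1 as the valid binary values is an assumption (a 0/255 encoding would need a different `allowed` argument).

```python
def non_binary_values(mask, allowed=(0, 1)):
    """Return the sorted values found in `mask` that are not in `allowed`.

    `mask` is a 2D structure (list of rows). The default `allowed` values
    are an assumption about how a valid binary mask is encoded.
    """
    found = {value for row in mask for value in row}
    return sorted(found - set(allowed))


def find_faulty_masks(masks, allowed=(0, 1)):
    """Map each faulty mask name to its unexpected values.

    Masks that contain only allowed values are left out of the report.
    """
    report = {}
    for name, mask in masks.items():
        bad = non_binary_values(mask, allowed)
        if bad:
            report[name] = bad
    return report


# Toy data with hypothetical names: the second mask mixes in
# grayscale values where only 0 and 1 are expected.
masks = {
    "tree_001": [[0, 1, 1], [0, 0, 1]],
    "tree_002": [[0, 100, 255], [0, 180, 1]],
}
faulty = find_faulty_masks(masks)
print(faulty)
```

Running a check like this on every new batch of data, before it is committed to the dataset, would flag the offending files by name instead of leaving them to surface later as odd model predictions.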
What other features were notable for you?
Deep Lake enables magic links for data
While collaborating with my coworker, it was hard to work on the same data (especially with me switching between different machines). With Activeloop, I could share a simple link or commit ID to run the training scripts or notebook on all of our machines against the same dataset version, with one line of code. Much easier than always pulling the data back from the bucket in Firebase! I also started to use the commit ID for the test sets, as we record performance metrics for each model against a specific commit ID of the test set. In effect, whenever we add more data to the test set, we re-run the older models on the new commit. This helps us release the best-performing machine learning model on a specific test set.
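The bookkeeping described here, one set of metrics per model per test-set commit, can be sketched as a small registry. This is an illustration only and not Earthshot's tooling: the class, the commit IDs, the model names, and the scores are all hypothetical placeholders, and a single score per model stands in for the full set of metrics.

```python
class EvalRegistry:
    """Stores one score per (test-set commit, model) pair."""

    def __init__(self):
        self._scores = {}  # commit ID -> {model name: score}

    def record(self, test_commit, model, score):
        """Save the score a model obtained on one test-set commit."""
        self._scores.setdefault(test_commit, {})[model] = score

    def missing(self, test_commit, models):
        """Return the models not yet evaluated on this test-set commit."""
        done = self._scores.get(test_commit, {})
        return [m for m in models if m not in done]

    def best(self, test_commit):
        """Return the highest-scoring model evaluated on this commit."""
        scores = self._scores.get(test_commit)
        if not scores:
            raise KeyError(f"no results recorded for commit {test_commit!r}")
        return max(scores, key=scores.get)


# Placeholder values for illustration only, not real results.
registry = EvalRegistry()
registry.record("commit_a", "model_v1", 0.80)
registry.record("commit_a", "model_v2", 0.78)

# The test set grows, producing a new commit. Only the newest model has
# been evaluated on it so far, so the older one must be re-run.
registry.record("commit_b", "model_v2", 0.83)
todo = registry.missing("commit_b", ["model_v1", "model_v2"])
print(todo)
```

Keying results by the test-set commit keeps comparisons fair: two models are only ranked against each other when both were scored on exactly the same version of the test data.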
What were the incremental results you achieved thanks to Activeloop?
+3.65% accuracy, +7.81% IoU, +4.63% F1 score
The resulting model does what I needed it to do: it performs well on trees with weird shapes and in different lighting settings and angles, which is crucial for jungle forest inventory.
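For readers less familiar with the three metrics quoted above, the sketch below shows their standard pixel-wise definitions for binary segmentation. The article does not say how Earthshot computed its figures, so this is an assumption about the usual formulas rather than a description of their evaluation code. The function names are hypothetical, and flat lists of 0/1 values stand in for flattened mask images.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over paired pixels."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Return pixel accuracy, IoU, and F1 for two flat binary masks."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    union = tp + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # With no foreground in either mask, treat the overlap as perfect.
        "iou": tp / union if union else 1.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
    }


# Toy example: 8 pixels with 2 true positives, 1 false positive,
# 1 false negative, and 4 true negatives.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

IoU and F1 ignore true negatives, which is why they are often reported alongside accuracy for segmentation: in images where most pixels are background, accuracy alone can look high even when the tree outline is poorly captured.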