The Problem: Accessing Big Public Datasets for Computer Vision
Working on big datasets should be fun, but it usually isn't. I've definitely spent a couple of afternoons figuring out how to access the data instead of training my machine learning models. At times, I also had to worry about whether the data would fit in my machine's memory or storage once I started pre-processing it. My team and I shared this frustration, so we created a solution for it. Now you can access and start working with big datasets like Google Objectron in a matter of seconds.
This comes thanks to Data 2.0, a new approach to managing data, enabled by Activeloop's open source package Hub for storing any type of dataset as cloud-native NumPy-like arrays. To illustrate how it works, we've partnered with the Google team working on one of Google's most popular datasets: Google Objectron.
Objectron is a large dataset containing nearly four million annotated images. The images, annotations, and labels, along with other metadata, have just been transferred and converted to Hub format. After reading this blog post, you will see how simple it can be to handle terabyte-scale datasets for computer vision.
About Google Objectron
Courtesy of the Google Objectron research team
Google Objectron is a dataset of short video clips of various objects commonly found in modern settings. Developed in 2020 by a team of researchers from Google, the dataset has quickly attracted the interest of the open source community. As of March 2021, it is the most popular among all published Google research datasets. Its popularity is probably driven by the meticulous standards of data collection and annotation as well as the diversity and comprehensibility of the collected images, split into 9 separate categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.
Each category includes between nearly 500 and over 2,000 distinct objects, which translates to approximately 150,000 to 580,000 video frames that can be used for training machine learning models. The usual way of accessing Objectron data (which involves downloading, parsing, and decoding TFRecords) may be tedious and time-consuming even for experienced data scientists.
With Hub, it's just 28.6 seconds (with a little over 3 seconds per category).
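One way to check a figure like this on your own machine is to time each load separately. The sketch below is a minimal, hypothetical helper: `time_loads` and `fake_load` are not part of Hub, and the stand-in loader exists only so the snippet runs without network access.

```python
import time


def time_loads(load, tags):
    """Call load(tag) once per tag and return {tag: elapsed seconds}."""
    timings = {}
    for tag in tags:
        start = time.perf_counter()
        load(tag)
        timings[tag] = time.perf_counter() - start
    return timings


# Hypothetical stand-in loader; it does no real work.
def fake_load(tag):
    return {"tag": tag}


timings = time_loads(fake_load, ["google/bikes", "google/cups"])
total = sum(timings.values())
print(f"total: {total:.6f} s across {len(timings)} categories")
```

Passing `hub.Dataset` as `load` instead of `fake_load` should time the real datasets; the numbers will depend on your connection and machine.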
Visualizing Google Objectron
Before we delve into the data, let's quickly inspect what's included using our visualization app. The visualizer is a good starting point for working with any computer vision dataset, as it provides you with the schema (information on the data types the dataset comprises) and instructions on how to load the data. Besides, some intuition about the contents of the data is often required before selecting an appropriate machine learning solution to a given problem.
This is how visualizing the bike category of the dataset works. On the left, you can see the schema as well as specific information on a given tensor, including its shape. You can tweak some of these parameters directly in the app, e.g. you can readily decrease the opacity of a label or an image. Up to 48 images can be shown on a single page. Disclaimer: currently, we're visualizing just the name and bounding box of the image.
Accessing Google Objectron
Install the open-source package Hub.
pip install hub
Load the data for the bike category.
import hub
bikes = hub.Dataset("google/bikes")
That's it, really! You can now access the data as if it were local.
Working with Google Objectron
The most basic use of Objectron involves browsing through images and their annotations. We need additional tools for some of the visualizations. Please install the following packages:
pip install matplotlib
pip install opencv-python
Then, import these modules.
import matplotlib.pyplot as plt
import cv2
To fetch the sample image at index 4500, you can simply run this line of code:
image = bikes['image', 4500].compute()
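The explicit `.compute()` call suggests that indexing alone does not download anything and that data is only transferred when you ask for it. The following is a conceptual sketch of that lazy pattern, not Hub's actual implementation; `RemoteTensor`, `LazyView`, and `fake_download` are hypothetical names used purely for illustration.

```python
class LazyView:
    """Remembers which sample was requested; fetches nothing until compute()."""

    def __init__(self, fetch, index):
        self._fetch = fetch
        self._index = index

    def compute(self):
        # The (possibly remote) read happens only here.
        return self._fetch(self._index)


class RemoteTensor:
    """Wraps a fetch function and counts how often data is actually read."""

    def __init__(self, fetch):
        self._fetch = fetch
        self.fetch_count = 0

    def __getitem__(self, index):
        # Indexing is cheap: it only builds a view.
        return LazyView(self._counted_fetch, index)

    def _counted_fetch(self, index):
        self.fetch_count += 1
        return self._fetch(index)


# Hypothetical backing store: each "sample" is a short list of numbers.
def fake_download(index):
    return [index, index + 1, index + 2]


tensor = RemoteTensor(fake_download)
view = tensor[4500]                 # nothing fetched yet
calls_before = tensor.fetch_count
sample = view.compute()             # data is fetched here
calls_after = tensor.fetch_count
```

Deferring the read this way means that slicing a terabyte-scale dataset costs nothing until a specific sample is actually needed.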
Then plot it with matplotlib, for example by calling plt.imshow(image) followed by plt.show().
This is the image you should see after running the above code.
Finally, the core value of Objectron is its annotations: 3D bounding boxes. We can draw those with just one function.
def get_bbox(example):
    RADIUS = 10
    COLOR = (255, 255, 255)
    # Each pair holds the indices of two keypoints (1-8) joined by a box edge
    EDGES = [
        [1, 5], [2, 6], [3, 7], [4, 8],  # lines along x-axis
        [1, 3], [5, 7], [2, 4], [6, 8],  # lines along y-axis
        [1, 2], [3, 4], [5, 6], [7, 8]   # lines along z-axis
    ]
    fig, ax = plt.subplots()
    # 9 keypoints with 3 values each; x and y are scaled by the image size below
    arranged_points = example['point_2d'].reshape(9, 3)
    for i in range(arranged_points.shape[0]):
        x, y, _ = arranged_points[i]
        # A negative thickness draws a filled circle
        cv2.circle(
            example['image'],
            (int(x * example['image_width']), int(y * example['image_height'])),
            RADIUS,
            COLOR,
            -1
        )
    for edge in EDGES:
        start_points = arranged_points[edge[0]]
        start_x = int(example['image_width'] * start_points[0])
        start_y = int(example['image_height'] * start_points[1])
        end_points = arranged_points[edge[1]]
        end_x = int(example['image_width'] * end_points[0])
        end_y = int(example['image_height'] * end_points[1])
        cv2.line(example['image'], (start_x, start_y), (end_x, end_y), COLOR, 2)
    ax.imshow(example['image'])
Fetch the element and pass it to the function to plot the image with the bounding box.
element = bikes[4500].compute()
get_bbox(element)
An image of a bike with annotations that you will see after going through the steps above.
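The coordinate conversion inside get_bbox can be checked in isolation, without an image or OpenCV. This is a minimal sketch, assuming (as the multiplications in get_bbox imply) that the x and y values in `point_2d` are fractions of the image width and height; `keypoints_to_pixels` and the sample keypoints are hypothetical.

```python
import numpy as np


def keypoints_to_pixels(point_2d, image_width, image_height):
    """Convert 9 keypoints of (x, y, depth) with x, y in [0, 1] to pixel coordinates."""
    points = np.asarray(point_2d, dtype=float).reshape(9, 3)
    xs = (points[:, 0] * image_width).astype(int)   # truncates like int()
    ys = (points[:, 1] * image_height).astype(int)
    return np.stack([xs, ys], axis=1)               # shape (9, 2)


# Hypothetical keypoints: all nine placed at the image center.
flat = np.tile([0.5, 0.5, 1.0], 9)
pixels = keypoints_to_pixels(flat, image_width=640, image_height=480)
print(pixels[0])
```

Converting all keypoints in one vectorized step makes it easy to spot values that fall outside the image before anything is drawn.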
You can work with other data categories, like cup, in an identical fashion.
cups = hub.Dataset("google/cups")
cups['image'].shape
We hope this was a fun experience for you. Try to work with the dataset on your own and share your results with us. Feel free to use our Objectron-related notebook to further experiment with Hub. You can also check out our website and documentation to learn more, and join Activeloop's Slack community to ask our team more questions!
We would like to thank Adel Ahmadyan for the support on this project, as well as the researchers from Google who developed Objectron, including Liangkai Zhang, Jianing Wei, Artsiom Ablavatski and Matthias Grundmann. Jakub Boros contributed to this blog post.