Generate image embeddings using a pre-trained CNN and store them in Hub

    Computer vision and LLM training can be... ruff. Image embeddings, a lower-dimensional representation of images, make it easier. Learn more about image embeddings and how to store them in our new paw-some article featuring the Dog Breed Images Dataset.
    Margaux Masson-Forsythe
    8 min read · Sep 20, 2021 · Updated Apr 21, 2023
    Computer vision is one of the biggest challenges of Machine Learning. Humans are very good at distinguishing visual representations, but teaching this skill to a computer is no easy task.

    For instance, we can easily tell the difference between a dog and a cat, or even between different breeds of dog. A chihuahua does not look like a golden retriever, but it can be hard for a computer to learn how to distinguish between these two breeds. We will leave out contemplating which one’s cuter (as it’s much, much harder to answer this question than generating image embeddings in Python using a pre-trained CNN and storing them in Activeloop Deep Lake).

    In this article, we will thus study image embeddings: what they are, how they are generated, and why they are so useful in Computer Vision. Before we start though, if you’re reading this right now, chances are you’re considering training your own Large Language Model (LLM), fine-tuning it, or connecting an LLM to LangChain. If so, these resources may be useful:

    1. Training a CLIP model from scratch with Deep Lake: code example.
    2. Generative AI Data Infrastructure: How to Train Large Language Models (LLMs) with Deep Lake - a practical example showing high GPU utilization with Deep Lake + Lambda Labs.
    3. LangChain & GPT-4 for Code Understanding: Twitter Algorithm
    4. Ultimate Guide to LangChain & Deep Lake: Build ChatGPT to Answer Questions on Your Financial Data
    5. How we integrated GPT-4 into our product to create Text to SQL (or TQL - Tensor Query Language in our case)

    What are image embeddings?

    An image embedding is a lower-dimensional representation of the image. In other words, it is a dense vector representation of the image which can be used for many tasks such as classification.

    A convolutional neural network (CNN) can be used to create the image embedding.

    For instance, these deep learning representations are often used to build image search engines, since such engines rely on image similarity: to find images of a given class (for example, dogs), we only need to find the embedding vectors closest to a dog image’s vector.

    A good way to find those is by calculating the cosine similarity between the embeddings. Similar images will have a high cosine similarity between embeddings.
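    As a quick illustration (a sketch added for this write-up, not code from the original notebook), cosine similarity between two embedding vectors can be computed with NumPy; the random vectors below are placeholders for real image embeddings:

    import numpy as np

    def cosine_similarity(a, b):
        # Dot product of the two vectors divided by the product of their norms
        a, b = np.asarray(a), np.asarray(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Placeholder 512-dimensional "embeddings"; the real ones are generated below
    emb_a = np.random.rand(512)
    emb_b = np.random.rand(512)
    print(cosine_similarity(emb_a, emb_b))  # values close to 1.0 indicate similar images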

    Dog Breed Images Dataset from Kaggle

    For this example, we will use one of my favorite datasets: the Kaggle Dog Breed Images 🐶

    [Image: the Kaggle Dog Breed Images dataset]

    First, we need to download this dataset:

    !export KAGGLE_USERNAME="xxxx" && export KAGGLE_KEY="xxxx" && mkdir -p data && cd data \
      && kaggle datasets download -d eward96/dog-breed-images \
      && unzip -n dog-breed-images.zip && rm dog-breed-images.zip
    

    Let’s see what is in this data folder:

    [Image: contents of the data folder, listed with the ls command in the terminal]

    So here we have 10 different breeds of dog: bernese_mountain_dog, chihuahua, dachshund, jack_russell, pug, border_collie, corgi, golden_retriever, labrador, siberian_husky.

    import glob
    data_dir = 'data'
    
    list_imgs = glob.glob(data_dir + "/**/*.jpg")
    print(f"There are {len(list_imgs)} images in the dataset {data_dir}")
    

    => There are 918 images in the dataset data.

    Here is an example of how to create a Deep Lake dataset from the dog breeds folder and store it in Deep Lake cloud.

    To create the dataset, we used the torchvision modules: datasets and transforms, along with torch.utils.data.DataLoader:

    from torchvision import datasets, transforms
    import torch
    
    # create dataloader with required transforms 
    tc = transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor()              
        ])
    
    image_datasets = datasets.ImageFolder(data_dir, transform=tc)
    dloader = torch.utils.data.DataLoader(image_datasets, batch_size=10, shuffle=False)
    
    print(len(image_datasets)) # returns 918
    

    We now have a resized, batched dataset dloader ready to be used.

    NB: PyTorch's default image backend is Pillow, and when you use ToTensor(), PyTorch automatically scales all pixel values into [0, 1], so there is no need to normalize the images here.
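    As a quick sanity check (an extra step, not in the original post), you can confirm the value range of a batch coming out of dloader:

    # ToTensor() has already scaled the pixel values, so both should fall within [0, 1]
    batch, _ = next(iter(dloader))
    print(batch.min().item(), batch.max().item())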

    If we want to visualize the first image in this dataset:

    import numpy as np
    import matplotlib.pyplot as plt

    # Take the first batch and display its first image
    for img, label in dloader:
        print(np.transpose(img[0], (1, 2, 0)).shape)  # (256, 256, 3)
        print(img[0])  # pixel values are already in [0, 1]
        plt.imshow((img[0].detach().numpy().transpose(1, 2, 0) * 255).astype(np.uint8))
        plt.show()
        break
    

    [Image: the first image of the dataset displayed with matplotlib]

    We can see that the image was resized to 256x256 and is normalized.

    Generate image embeddings from the Dog Breed Images Dataset

    To generate the image embeddings, we will use a pre-trained model up to the last layer before classification, also called the penultimate layer.

    The first layers of a CNN (Convolutional Neural Network) extract features from the input image; the fully-connected layers then handle the classification and return class scores, which are passed, for example, through a softmax to determine which class has the highest probability:

    [Diagram: a CNN extracting features from the input image, with fully-connected layers and a softmax producing class probabilities]

    In our case, we will use a pre-trained ResNet-18 model:

    [Figure: ResNet-18 model architecture. Source: Almezhghwi, Khaled & Serte, Sertan (2020). Improved Classification of White Blood Cells with the Generative Adversarial Network and Deep Convolutional Neural Network. Computational Intelligence.]

    It can easily be downloaded with torch.hub.load:

    # fetch pretrained model
    model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
    

    Now we need to select the layer we want to extract features from. If we look at the architecture of ResNet-18 again:

    [Image: printout of the ResNet-18 architecture, showing its layers]

    We can see that the last layer is the fc (fully-connected) layer, where the features are classified. We want the features before the classification part of the CNN, so we want the layer just before fc: the avgpool layer.
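    To double-check this (an extra verification step, not in the original post), you can list the top-level module names of torchvision's ResNet-18 and confirm that avgpool comes right before fc:

    # Top-level modules of ResNet-18, in forward order
    print(list(model._modules.keys()))
    # ['conv1', 'bn1', 'relu', 'maxpool', 'layer1', 'layer2', 'layer3', 'layer4', 'avgpool', 'fc']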

    We can select this layer using the model object:

    # Select the desired layer
    layer = model._modules.get('avgpool')
    

    Then, we use register_forward_hook to capture the embeddings:

    def copy_embeddings(m, i, o):
        """Copy embeddings from the penultimate layer.
        """
        o = o[:, :, 0, 0].detach().numpy().tolist()
        outputs.append(o)
    
    outputs = []
    # attach hook to the penultimate layer
    _ = layer.register_forward_hook(copy_embeddings)
    

    NB: The function copy_embeddings will be called every time forward() computes an output, and it will save that output in the list outputs.

    Then, we need to set the model to inference mode:

    model.eval() # Inference mode
    

    Let’s use this model to generate embeddings for our dog breed images:

    # Generate embeddings for all images in dloader; the forward hook
    # saves them in the list outputs
    for X, y in dloader:
        _ = model(X)
    print(len(outputs)) # returns 92 (one entry per batch)
    

    Since dloader is batched, outputs is a list of 92 per-batch lists, so we need to flatten it:

    # flatten list of embeddings to remove batches
    list_embeddings = [item for sublist in outputs for item in sublist]
    
    print(len(list_embeddings)) # returns 918
    print(np.array(list_embeddings[0]).shape) # returns (512,)
    

    As expected, the length of the new flattened list list_embeddings is equal to 918, which is the number of images we have in this dog breed dataset. Plus, the shape of the first item in the list list_embeddings is (512,), which corresponds to the shape of the output of the avgpool layer.

    Send images and image embeddings to Deep Lake

    Once the embeddings of all images are generated, we do not need to compute them again: we can use them directly to perform diverse tasks such as classification, as explained previously. This easy re-use is one of the reasons embeddings are so popular in computer vision (a small classification sketch follows below).
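    For example, here is a minimal sketch (an illustration, not part of the original pipeline) of re-using the stored embeddings for classification with a k-nearest-neighbors classifier; it assumes scikit-learn is installed and relies on the list_embeddings and image_datasets objects defined above:

    from sklearn.neighbors import KNeighborsClassifier
    import numpy as np

    X = np.array(list_embeddings)          # (918, 512) matrix of image embeddings
    y = np.array(image_datasets.targets)   # breed index of each image (from ImageFolder)

    # Nearest-neighbor classification in embedding space, using cosine distance
    knn = KNeighborsClassifier(n_neighbors=5, metric='cosine')
    knn.fit(X, y)
    print(knn.predict(X[:5]))  # predicted breed indices for the first 5 images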

    Therefore, we will send our freshly generated embeddings and their images to Activeloop Deep Lake.

    First, we need to login into our Activeloop account with this command:

    !activeloop login -u username -p password
    

    You can alternatively use a Deep Lake API token to authenticate. Then, we choose the name of the canine dataset we are about to create from the dog breed images dataset:

    hub_dogs_path = "hub://margauxmforsythe/dogs_breeds_embeddings"
    

    Now, we can send our doggie data into this dataset, which will be easily accessible using the path “hub://margauxmforsythe/dogs_breeds_embeddings”. In this example, we use the “with” syntax for better performance (see more about it here):

    import deeplake
    import numpy as np
    from tqdm import tqdm

    with deeplake.empty(hub_dogs_path) as ds:
        # Create the tensors
        ds.create_tensor('images', htype = 'image',
                         sample_compression = 'jpeg')
        ds.create_tensor('embeddings')

        # Add arbitrary metadata - Optional
        ds.info.update(description = 'Dog breeds embeddings dataset')
        ds.images.info.update(camera_type = 'SLR')

        # Iterate through the images and their corresponding embeddings,
        # and append them to the hub dataset
        for i in tqdm(range(len(image_datasets))):
            img = image_datasets[i][0].detach().numpy().transpose(1, 2, 0)
            img = img * 255 # images were scaled to [0, 1] by ToTensor()
            img = img.astype(np.uint8)

            # Append to Deep Lake Dataset
            ds.images.append(img)
            ds.embeddings.append(list_embeddings[i])
    

    Our dog breed embeddings dataset is now available in Hub. Paw-some! This means we can load these images and their embeddings easily with this line:

    ds_from_hub = deeplake.dataset(hub_dogs_path)
    

    Let’s visualize some of the images and their embeddings:

    def show_image_in_ds(ds, idx=1):
        image = ds.images[idx].numpy()
        embedding = ds.embeddings[idx].numpy()
        print("Image:")
        print(image.shape)
        plt.imshow(image)
        plt.show()
        print(embedding[0:10]) # show only the first 10 values of the image embedding
    
    for i in range(4):
        show_image_in_ds(ds_from_hub, i)
    

    Alternatively, you can visualize the dataset by calling the following function:

    ds_from_hub.visualize()

    [Image: four dog images and their embeddings from the Deep Lake dataset]

    We can now easily get an image and its embedding from our Hub dataset, and start finding similar images using the similarities between embeddings! On a side note, those doggos are so beautiful, they could’ve easily been on the cover of… Vanity Fur.
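    Here is a minimal sketch of what such a similarity search could look like (an illustrative addition, assuming ds_from_hub.embeddings.numpy() returns the full (918, 512) embedding matrix):

    import numpy as np

    embeddings = ds_from_hub.embeddings.numpy()   # (918, 512) matrix of stored embeddings
    query = embeddings[0]                         # use the first image as the query

    # Cosine similarity between the query embedding and every stored embedding
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    similarities = embeddings @ query / norms
    most_similar = np.argsort(similarities)[::-1][1:6]  # top 5 matches, excluding the query itself
    print(most_similar)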

    Embeddings are routinely used across industries such as AgriTech, Autonomous Vehicles & Robotics, and Audio Processing & Enhancement.

    Here is the link to the notebook with all the steps demonstrated in this article. If you have more questions about the notebook, feel free to ask in the #community channel of team Activeloop's Slack.
