    Generate image embeddings using a pre-trained CNN and store them in Hub

    Computer vision and LLM training can be... ruff. Image embeddings, a lower-dimensional representation of images, make it easier. Learn more about image embeddings and how to store them in our new paw-some article featuring the Dog Breed Images Dataset.
• Margaux Masson-Forsythe · Sep 20, 2021 · 8 min read

Computer vision is one of the biggest challenges of Machine Learning. Humans are very good at distinguishing visual representations, but teaching this skill to a computer is no easy task.

For instance, we can easily tell the difference between a dog and a cat, or even between different breeds of dog. A chihuahua does not look like a golden retriever, but it can be hard for a computer to learn to distinguish between these two breeds. We will leave out contemplating which one’s cuter (as that question is much, much harder to answer than generating image embeddings in Python using a pre-trained CNN and storing them in Activeloop Deep Lake).

In this article, we will thus study image embeddings: what they are, how they are generated, and why they are so useful in Computer Vision. Before we start though, if you’re reading this right now, chances are you’re considering training your own Large Language Model (LLM), fine-tuning one, or connecting an LLM to LangChain. In that case, these resources may help:

1. Training a CLIP model from scratch with Deep Lake: code example.
    2. Generative AI Data Infrastructure: How to Train Large Language Models (LLMs) with Deep Lake - a practical example showing high GPU utilization with Deep Lake + Lambda Labs.
    3. LangChain & GPT-4 for Code Understanding: Twitter Algorithm
    4. Ultimate Guide to LangChain & Deep Lake: Build ChatGPT to Answer Questions on Your Financial Data
    5. How we integrated GPT-4 into our product to create Text to SQL (or TQL - Tensor Query Language in our case)

    What are image embeddings?

An image embedding is a lower-dimensional representation of the image. In other words, it is a dense vector representation of the image, which can be used for many tasks such as classification.

    A convolutional neural network (CNN) can be used to create the image embedding.

For instance, these deep learning representations are often used to build image search engines, which rely on image similarity. Indeed, to find images of one class (for example, dogs), we only need to find the embedding vectors closest to a dog image’s vector.

    A good way to find those is by calculating the cosine similarity between the embeddings. Similar images will have a high cosine similarity between embeddings.
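
To make this concrete, here is a minimal sketch of cosine similarity in NumPy (the vectors below are toy values, not real model outputs):

    import numpy as np

    def cosine_similarity(a, b):
        # cosine similarity = dot product of the two L2-normalized vectors
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    dog_a = np.array([0.9, 0.1, 0.4])  # toy embeddings, for illustration only
    dog_b = np.array([0.8, 0.2, 0.5])
    cat   = np.array([0.1, 0.9, 0.2])

    print(cosine_similarity(dog_a, dog_b))  # high: visually similar images
    print(cosine_similarity(dog_a, cat))    # lower: dissimilar images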

    Dog Breed Images Dataset from Kaggle

    For this example, we will use one of my favorite datasets: the Kaggle Dog Breed Images 🐶

[Figure: sample images from the Kaggle Dog Breed Images dataset]

    First, we need to download this dataset:

    !export KAGGLE_USERNAME="xxxx" && export KAGGLE_KEY="xxxx" \
      && mkdir -p data && cd data \
      && kaggle datasets download -d eward96/dog-breed-images \
      && unzip -n dog-breed-images.zip && rm dog-breed-images.zip
    

    Let’s see what is in this data folder:

[Figure: ls output for the data folder, listing the 10 breed subfolders]

So here we have 10 different breeds of dog: bernese_mountain_dog, chihuahua, dachshund, jack_russell, pug, border_collie, corgi, golden_retriever, labrador, siberian_husky.

    import glob
    data_dir = 'data'
    
    list_imgs = glob.glob(data_dir + "/**/*.jpg")
    print(f"There are {len(list_imgs)} images in the dataset {data_dir}")
    

    => There are 918 images in the dataset data.

Next, let’s load these images into a PyTorch dataset; later in this article we will create a Deep Lake dataset from the dog breeds folder and store it in Deep Lake cloud.

To create the PyTorch dataset, we use the torchvision modules datasets and transforms, along with torch.utils.data.DataLoader:

    from torchvision import datasets, transforms
    import torch
    
    # create dataloader with required transforms 
    tc = transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor()              
        ])
    
    image_datasets = datasets.ImageFolder(data_dir, transform=tc)
    dloader = torch.utils.data.DataLoader(image_datasets, batch_size=10, shuffle=False)
    
    print(len(image_datasets)) # returns 918
    

We now have a resized and batched dataloader, dloader, ready to be used.

NB: PyTorch’s default image backend is Pillow, and the ToTensor() transform automatically scales pixel values into [0, 1], so there is no need to normalize the images here.
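
We can verify this with a quick sanity check (using the dloader defined above):

    # pixel values should already lie in [0, 1] thanks to ToTensor()
    imgs, _ = next(iter(dloader))
    print(imgs.min().item(), imgs.max().item())  # e.g. 0.0 1.0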

    If we want to visualize the first image in this dataset:

    import numpy as np
    import matplotlib.pyplot as plt

    for img, label in dloader:
        first = img[0].numpy().transpose(1, 2, 0)  # CHW -> HWC
        print(first.shape)                         # (256, 256, 3)
        plt.imshow((first * 255).astype(np.uint8))
        plt.show()
        break
    

[Figure: the first image in the dataset, displayed with matplotlib]

    We can see that the image was resized to 256x256 and is normalized.

    Generate image embeddings from the Dog Breed Images Dataset

    To generate the image embeddings, we will use a pre-trained model up to the last layer before classification, also called the penultimate layer.

The first layers of a CNN (Convolutional Neural Network) extract the features of the input image; the fully-connected layers then handle the classification, returning class probabilities (passed through a softmax, for example) that determine which class has the highest probability score:

[Figure: a CNN’s convolutional layers extract features from the input image, and the fully-connected layers turn them into class probabilities via a softmax]

In our case, we will use a pre-trained ResNet-18 model:

[Figure: ResNet-18 model architecture. Source: Almezhghwi, K. & Serte, S. (2020), “Improved Classification of White Blood Cells with the Generative Adversarial Network and Deep Convolutional Neural Network”, Computational Intelligence]

    It can easily be downloaded with torch.hub.load:

    # fetch pretrained model
    model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
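
As an aside: with the model in hand, one quick way to realize “model up to the penultimate layer” is to chop off the final fc module (a minimal sketch; below we use a forward hook instead, which leaves the model untouched):

    # keep every child module except the final fc layer, so a forward
    # pass returns the penultimate (avgpool) activations directly
    feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
    feature_extractor.eval()

    with torch.no_grad():
        emb = feature_extractor(torch.randn(1, 3, 256, 256))  # dummy input
    print(emb.squeeze().shape)  # torch.Size([512])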
    

    Now we need to select the layer we want to extract features from. If we look at the architecture of ResNet-18 again:

[Figure: the printed ResNet-18 module listing; the final layers are avgpool and fc]
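
(You can reproduce this listing yourself by printing the model:)

    # print the module listing; for ResNet-18 it ends with
    # (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
    # (fc): Linear(in_features=512, out_features=1000, bias=True)
    print(model)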

We can see that the last layer is the fc (fully-connected) layer, where the features are classified. We want the features from before the classification part of the CNN, so we select the layer just before fc: the avgpool layer.

    We can select this layer using the model object:

    # Select the desired layer
    layer = model._modules.get('avgpool')
    

Then, we use the register_forward_hook method to get the embeddings:

    def copy_embeddings(m, i, o):
        """Copy embeddings from the penultimate layer."""
        o = o[:, :, 0, 0].detach().numpy().tolist()
        outputs.append(o)

    outputs = []
    # attach the hook to the penultimate layer
    _ = layer.register_forward_hook(copy_embeddings)
    

NB: The function copy_embeddings will be called every time after forward() has computed an output, and will save that output in the list outputs.

Then, we need to set the model to inference mode:

    model.eval() # Inference mode
    

    Let’s use this model to generate embeddings for our dog breed images:

    # Generate image embeddings for all images in dloader and save
    # them in the list outputs
    for X, y in dloader:
        _ = model(X)
    print(len(outputs))  # returns 92 (one entry per batch of 10 images)
    

    Since dloader is batched, we need to flatten the outputs:

    # flatten the list of embeddings to remove the batch dimension
    list_embeddings = [item for sublist in outputs for item in sublist]

    print(len(list_embeddings))                # returns 918
    print(np.array(list_embeddings[0]).shape)  # returns (512,)
    

As expected, the length of the new flattened list list_embeddings is equal to 918, which is the number of images in this dog breed dataset. Plus, the shape of the first item in list_embeddings is (512,), which corresponds to the shape of the output of the avgpool layer.

    Send images and image embeddings to Deep Lake

Once the embeddings of all images are generated, we do not need to compute them again; we can reuse them directly for diverse tasks such as classification, as explained previously (see the sketch below). This is one of the reasons embeddings are so popular in computer vision: they are very easy to re-use once generated.
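
For example, here is a minimal sketch of such reuse that fits a simple classifier on the precomputed embeddings (an assumption on our side: scikit-learn is available; it is not used elsewhere in this article):

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    import numpy as np

    X = np.array(list_embeddings)  # (918, 512), computed once above
    y = np.array([label for _, label in image_datasets.samples])  # breed indices

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # accuracy of the embedding-based classifier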

    Therefore, we will send our freshly generated embeddings and their images to Activeloop Deep Lake.

First, we need to log in to our Activeloop account with this command:

    !activeloop login -u username -p password
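
You can alternatively authenticate with a Deep Lake API token; a minimal sketch (assumption: Deep Lake reads the token from the ACTIVELOOP_TOKEN environment variable):

    import os

    # assumption: the API token is picked up from this environment variable
    os.environ['ACTIVELOOP_TOKEN'] = 'your-api-token'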
    

Then, we choose the name of the canine dataset we are about to create from the dog breed images dataset:

    hub_dogs_path = "hub://margauxmforsythe/dogs_breeds_embeddings"
    

Now, we can send our doggie data into this dataset, which will be easily accessible using the path “hub://margauxmforsythe/dogs_breeds_embeddings”. In this example, we use the “with” syntax for better performance (see more about it here):

    import deeplake
    import numpy as np
    from tqdm import tqdm

    with deeplake.empty(hub_dogs_path) as ds:
        # Create the tensors
        ds.create_tensor('images', htype = 'image',
                         sample_compression = 'jpeg')
        ds.create_tensor('embeddings')

        # Add arbitrary metadata - Optional
        ds.info.update(description = 'Dog breeds embeddings dataset')
        ds.images.info.update(camera_type = 'SLR')

        # Iterate through the images and their corresponding embeddings,
        # and append them to the hub dataset
        for i in tqdm(range(len(image_datasets))):
            img = image_datasets[i][0].detach().numpy().transpose(1, 2, 0)
            img = img * 255  # images were normalized to [0, 1]
            img = img.astype(np.uint8)

            # Append to Deep Lake Dataset
            ds.images.append(img)
            ds.embeddings.append(list_embeddings[i])
    

Our dog breed embeddings dataset is now available in Hub. Paw-some! This means we can load these images and their embeddings easily with this line:

    ds_from_hub = deeplake.dataset(hub_dogs_path)
    

    Let’s visualize some of the images and their embeddings:

    def show_image_in_ds(ds, idx=1):
        image = ds.images[idx].numpy()
        embedding = ds.embeddings[idx].numpy()
        print("Image:")
        print(image.shape)
        plt.imshow(image)
        plt.show()
    print(embedding[0:10])  # show only the first 10 values of the image embedding
    
    for i in range(4):
        show_image_in_ds(ds_from_hub, i)
    

Alternatively, you can visualize the dataset by calling the following function:

    ds_from_hub.visualize()

[Figure: four dog images and their embeddings from the Deep Lake dataset]

    We can now easily get an image and its embedding from our Hub dataset, and start finding similar images using the similarities between embeddings! On a side note, those doggos are so beautiful, they could’ve easily been on the cover of… Vanity Fur.
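
To close the loop, here is one way to rank the stored images by similarity to a query image (most_similar is a hypothetical helper, and we assume the 918 embeddings fit comfortably in memory):

    import numpy as np

    def most_similar(ds, query_idx, top_k=5):
        # hypothetical helper: rank all stored embeddings by cosine
        # similarity to the embedding at index query_idx
        emb = np.stack([ds.embeddings[i].numpy() for i in range(len(ds))])
        emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize
        sims = emb @ emb[query_idx]  # cosine similarities to the query
        ranked = np.argsort(-sims)   # best matches first
        return [int(i) for i in ranked if i != query_idx][:top_k]

    print(most_similar(ds_from_hub, query_idx=0))  # the 5 most similar dogs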

Embeddings are routinely used across industries such as AgriTech, Autonomous Vehicles & Robotics, and Audio Processing & Enhancement.

Here is the link to the notebook with all the steps demonstrated in this article. If you have more questions about the notebook, feel free to ask in the #community channel of team Activeloop’s Slack.
