    Build an AI QR Code Generator with ControlNet, Stable Diffusion, and LangChain

    Infuse Creativity into your QR Codes with Deep Lake, LangChain, Stable Diffusion and ControlNet and Create Eye-Catching Artistic Images
Emanuele Fenocchi, Dominik Benk
28 min read · Apr 1, 2024 (updated Apr 8, 2024)
Summary

We built a tool that can generate artistic QR codes for a specific website/URL using Deep Lake, LangChain, Stable Diffusion, and ControlNet via AUTOMATIC1111 and ComfyUI. If you want to try the code directly, just download the notebook from the repository.

Deep Lake is a Database for AI, designed to efficiently store and search large-scale AI data, including audio, video, and embeddings from text documents, which will also be utilized in this article. It offers unique storage optimization for deep learning applications, featuring data streaming, vector search, data versioning, and seamless integration with other popular frameworks such as LangChain. This comprehensive toolkit is designed to simplify the process of developing large language model (LLM) workflows; in our case, we will focus on its capability to summarize and answer questions from large-scale documents such as web pages.

Stable Diffusion is a recent development in the field of image synthesis, with exciting potential for reducing its high computational demand. It is primarily used for text-to-image generation, but is capable of a variety of other tasks such as image modification, inpainting, outpainting, upscaling, and image-to-image generation conditioned on text input. Meanwhile, ControlNet is an innovative neural network architecture that is a game-changer in managing the control of these diffusion models by integrating extra conditions. These control techniques include edge and line detection, human poses, image segmentation, depth maps, image styles, and simple user scribbles. By applying these techniques, it is possible to condition our output image on QR codes as well. If you are interested in more details, we recommend reading the original ControlNet article.

By combining all of this, we can achieve scalable generation of QR codes that are unique and more likely to attract attention. These are the steps we are going to walk you through:

    Steps

    1. Scraping the Content From a Website and Splitting It Into Documents
    2. Saving the Documents Along With Their Embeddings to Deep Lake
    3. Extracting the Most Relevant Documents
    4. Creating Prompts to Generate an Image Based on Documents
      1. Custom summary prompt + LLMChain
      2. QA retrieval + LLM
    5. Summarizing the Created Prompts
    6. Generating Simple QR From URL and inserting custom logo
    7. Generating Artistic QR Codes for Activeloop
      1. Txt2Img
        1. Content prompt
        2. Deep Lake prompt
      2. Img2Img with logo
        1. Content prompt
        2. Deep Lake prompt
    8. Generating Artistic QR Codes for E-commerce
      1. Img2Img with logo - Tommy Hilfiger
      2. Img2Img with logo - Patagonia
    9. Hands On with ComfyUI
    10. Limitations of Our Approach
    11. Conclusion
    12. FAQs
# Install dependencies
!pip install langchain deeplake openai qrcode apify_client tiktoken langchain-openai python-dotenv pydantic==1.10.8
!apt install libzbar0
!pip install qreader opencv-python
import os
import sys
import time

import cv2
import numpy as np
from dotenv import load_dotenv

from langchain_community.vectorstores import DeepLake
from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain.utilities import ApifyWrapper
from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document
from langchain.chains import RetrievalQA, LLMChain
from langchain import PromptTemplate

load_dotenv()

# Set API tokens
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
os.environ['ACTIVELOOP_TOKEN'] = os.getenv('ACTIVELOOP_TOKEN')
os.environ["APIFY_API_TOKEN"] = os.getenv('APIFY_API_TOKEN')

    Step 1: Scraping the Content From a Website and Splitting It Into Documents

First of all, we need to collect the data that will serve as the content for generating QR codes. Since the goal is to personalize them to a specific website, we provide a simple pipeline that can crawl data from a given URL. As an example, we use https://www.activeloop.ai/, from which we scraped 20 pages, but you could use any other website as long as it does not violate its Terms of Use. Or, if you wish to use another type of content, LangChain provides many other file loaders and website loaders, and you can personalize QR codes for them too!

# We use the crawler from ApifyWrapper(), which is available in LangChain.
# For convenience, we set a maximum of 20 pages to crawl with a timeout of 300 seconds.
apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={"startUrls": [{"url": "https://www.activeloop.ai/"}], "maxCrawlPages": 20},
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
    timeout_secs=300,
)

# Now the pages are loaded and split into chunks of at most 1000 characters
pages = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0, separator=".")
docs = text_splitter.split_documents(pages)
docs
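If Apify is not an option, here is a minimal sketch of the same loading step using LangChain's generic WebBaseLoader; note this is an illustration on our part, and it fetches a single page rather than crawling the whole site:

# Hypothetical alternative: load a single page without Apify
from langchain_community.document_loaders import WebBaseLoader

pages = WebBaseLoader("https://www.activeloop.ai/").load()

# the rest of the pipeline is unchanged
docs = text_splitter.split_documents(pages)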

    Step 2: Saving the Documents Along With Their Embeddings to Deep Lake

Once the website is scraped and the pages are split into documents, it’s time to generate the embeddings and save them to Deep Lake. This means we can come back to our previously scraped data at any time and don’t need to recalculate the embeddings. To do that, you need to set your Activeloop organization ID.

activeloop_org = "YOUR_ACTIVELOOP_ORG_ID"

# initialize the embedding model
embeddings = OpenAIEmbeddings()

# initialize the database; this can also be used to load an existing one
db = DeepLake(
    dataset_path=f"hub://{activeloop_org}/scraped-websites",
    embedding=embeddings,
    overwrite=False,
)

# save the documents
db.add_documents(docs)
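Because the embeddings are persisted, a later session can reopen the same dataset instead of recomputing anything. A minimal sketch (read_only is optional and assumes you only need to query):

# Reopen the previously created dataset in a new session;
# the stored vectors are reused as-is, nothing is re-embedded.
db = DeepLake(
    dataset_path=f"hub://{activeloop_org}/scraped-websites",
    embedding=embeddings,
    read_only=True,  # query-only access; omit if you plan to add more documents
)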

    Step 3: Extracting the Most Relevant Documents

Since we want to generate an image in the context of a given website that can have hundreds of pages, it is useful to filter the documents that are most relevant to our query, in order to save money on chained API calls to the LLM. For this, we are going to leverage the Deep Lake Vector Store's similarity search as well as its retrieval functionality.

To pre-filter the documents, choose the query that works best for the type of information you have scraped from the internet:

query_for_company = 'Business core of the company'

result = db.similarity_search(query_for_company, k=10)
result
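Each returned document carries its source URL in the metadata (set in the Apify mapping above), so it is easy to check which pages the search actually pulled in:

# Inspect which pages the similarity search matched
for doc in result:
    print(doc.metadata["source"], "-", doc.page_content[:80])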

For the question-answering pipeline, we can then define the retriever:

retriever = db.as_retriever(
    search_kwargs={"k": 10}
)

    Step 4: Creating Prompts to Generate an Image Based on Documents

The goal is to understand the content and generate prompts in an automated way, so that the process is scalable. We start by initializing the LLM with the default gpt-3.5-turbo-instruct model and a medium temperature to introduce some randomness.

# Initialize LLM
llm = OpenAI(temperature=0.5)

One of the many advantages of LangChain is prompt templates, which significantly help with clarity and readability. To make the output description more precise, we should also provide examples, as can be seen here.

query = "You are a prompt generator. Based on the content, write a detailed one sentence description that can be used to generate an image"

prompt_template = """{query}:

Content: {text}
"""

# set the prompt template
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["text"],
    partial_variables={"query": query}
)

The query indicates what information we expect in the output; text is the content provided to the LLM, from which it is supposed to produce a detailed description of the image. Additionally, to have more control over the output, we also create an alternative prompt that can generate a specific image type.

Using this, we then experimented with the two following approaches, which differ in what kind of text is provided.

    Option 1: Custom Summary Prompt with LLMChain

The idea is simple: we chain the description prompt over each filtered document and then apply it once again to the summarized descriptions. In other words, text is the variable that is iterated over during the LLMChain operation.

# Initialize the chain
chain = LLMChain(llm=llm, prompt=PROMPT)

# Filter the most relevant documents
result = db.similarity_search(query_for_company, k=10)

# Run the chain
image_prompt = chain.invoke(result)
image_prompt = image_prompt["text"]
image_prompt
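The single chain.invoke(result) call above stuffs all filtered documents into one prompt. The two-stage variant described above, one description per document followed by a summary pass over the descriptions, can be sketched as follows (an illustration, not the exact code we ran):

# Hedged sketch of the two-stage approach: describe each document
# separately, then apply the same prompt to the joined descriptions.
descriptions = [chain.invoke(doc.page_content)["text"] for doc in result]
image_prompt_two_stage = chain.invoke("\n".join(descriptions))["text"]
image_prompt_two_stage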

    Option 2: Retrieval Question-Answering with LLM

Here we initialize the QA retriever, which allows us to ask the LLM to explain a particular concept based on the filtered documents.

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=retriever
)

chain_answer = qa.invoke("Explain what is Deep Lake")
chain_answer

The answer is then used as the text in the PromptTemplate, without the need for any chain.

# feed the QA answer into the image-description prompt
answer = llm.invoke(PROMPT.format(text=chain_answer["result"]))
answer

    Step 5: Summarizing the Created Prompts

We experimented with different prompt setups in the previous section, and yet there is more to explore. If you are interested in perfecting your LLM prompts even further, we have an amazing course with many useful tips and tricks. Mastering prompts for image generation is, however, more art than science. Nevertheless, by providing the LLM with examples, we can see that it does a pretty good job of generating very specific image descriptions. Here are different types of prompts that we were able to generate with our approach:

    1. Content Prompt

This prompt summarizes all relevant documents scraped from Activeloop into a general but detailed image description: `high-tech, futuristic, AI-driven, advanced, complex, computer-generated, robot, machine learning, data visualization, interactive, cutting-edge technology, automation, precision, efficiency, innovation, digital transformation, smart technology, science fiction-inspired.`

    2. Deep Lake Prompt

Here we show a Question-Answering example with a detailed image description of Deep Lake: `An aerial view of a serene, glassy lake surrounded by trees and mountains, with giant blocks of data floating on the surface, each block representing a different data type such as images, videos, audio, and tabular data, all stored as tensors, while a team of data scientists in a nearby cabin focus on their work to build advanced deep learning models, powered by GPUs that are seamlessly integrated with Deep Lake.`

    Step 6: Generating Simple QR From URL and Inserting Custom Logo

Before we generate the art, we need to prepare a plain QR code for ControlNet, which can be created directly from Python. It is important to set the error correction level to 'H', which increases the probability of the QR code remaining readable, as up to 30% of the code can be covered or destroyed by an image. To generate a QR code with a logo, we created a function that takes the logo image and places it at the center of the previously generated QR code. Note that some URLs may be too long to produce a QR code that is simple and reliable enough for scanning; for such cases, you can use URL shorteners such as bit.ly.

import qrcode
from PIL import Image

def create_qrcode(url: str):
    QRcode = qrcode.QRCode(
        error_correction=qrcode.constants.ERROR_CORRECT_H
    )

    # adding the URL or text to the QR code
    QRcode.add_data(url)

    # rendering the QR code image
    QRimg = QRcode.make_image(
        back_color="white").convert('RGB')
    return QRimg

def qr_with_logo(logo_path: str, QRimg: Image.Image, output_image_name: str):
    logo = Image.open(logo_path)

    # taking base width
    basewidth = 100

    # adjust logo size, preserving its aspect ratio
    wpercent = (basewidth / float(logo.size[0]))
    hsize = int((float(logo.size[1]) * float(wpercent)))
    logo = logo.resize((basewidth, hsize))

    # center the logo on the QR code
    pos = ((QRimg.size[0] - logo.size[0]) // 2,
           (QRimg.size[1] - logo.size[1]) // 2)
    QRimg.paste(logo, pos)

    # save the QR code generated
    QRimg.save(output_image_name)
    return QRimg
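Before spending GPU time on stylization, it can be worth confirming that the plain QR code decodes back to the URL. A quick sanity check with qreader (the same library we use for filtering in Step 9):

# Sanity check: the plain QR code should decode back to the original URL
import numpy as np
from qreader import QReader

qr_img = create_qrcode("https://www.activeloop.com/")
decoded = QReader().detect_and_decode(image=np.array(qr_img))
print(decoded)  # expected to contain 'https://www.activeloop.com/'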

    Step 7: Generating Artistic QR Codes for Activeloop

First of all, keep in mind that this is still a very fresh and unexplored topic: the more pleasing-looking QR codes you want to generate, the higher the risk that they will not be readable by a scanner. This results in an endless cycle of adjusting parameters to find the most general setup. Many approaches can be applied, but their main difference lies in the ControlNet units. We had the highest success with the brightness and tile preprocessors, as well as the qrcode preprocessor. Sometimes, adding a depth preprocessor was also helpful. A great guide on how to set up the Stable Diffusion webui with the ControlNet extension to generate your first QR codes can be found, for example, here. Nevertheless, there is no single setup that works 100% of the time, and a lot of experimenting is needed, especially fine-tuning the controls' strength/start/end to achieve a desirable output.

For example, for most of the QR codes we used the following setup:

    • Negative prompt: ugly, disfigured, low quality, blurry, nsfw
    • Steps: 20
    • Sampler: DPM++ 2M Karras
    • CFG scale: 9
    • Size: 768x768
    • Model: dreamshaper_631BakedVae
    • ControlNet
      • 0: preprocessor: none, model: control_v1p_sd15_qrcode, weight: 1.1, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced
      • 1: preprocessor: none, model: control_v1p_sd15_brightness, weight: 0.3, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced

In the case of Img2Img, we would also need to add an inpaint mask to prevent any changes to the logo.

    Download and Run AUTOMATIC1111

To play with this interface, you have to clone the official repository into your workspace and double-click the webui-user.bat file.

!git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui

    Integrate ControlNet in AUTOMATIC1111

If you want to install ControlNet, follow the instructions given in the official repository:

• Open the “Extensions” tab.
• Open the “Install from URL” tab within it.
• Enter https://github.com/Mikubill/sd-webui-controlnet.git into "URL for extension’s git repository".
• Press the “Install” button.
• Wait for 5 seconds, and you will see the message "Installed into stable-diffusion-webui\extensions\sd-webui-controlnet. Use Installed tab to restart".
• Go to the “Installed” tab, click "Check for updates", and then click "Apply and restart UI". (Next time, you can also use these buttons to update ControlNet.)
• Completely restart the A1111 webui, including your terminal. (If you do not know what a "terminal" is, you can reboot your computer to achieve the same effect.)
• Download the models (see below).
• After you put the models in the correct folder, you may need to refresh to see them. The refresh button is right next to the “Model” dropdown.

If you visit the official page of AUTOMATIC1111, you can see the interface from which you can select the settings you prefer.

In the top part you can choose the type of task to perform (txt2img, img2img, and so on); the middle part of the interface holds the settings you can change. If you install the ControlNet plugin, you will also find the configurations that control how strongly ControlNet weighs on the final result.
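Alternatively, instead of clicking through the UI, the setup from above can be driven through the webui's REST API. The sketch below is an assumption-heavy illustration: it presumes the webui was started with the --api flag, the sd-webui-controlnet extension is installed, and the model names match your local files (exact field names can vary between extension versions):

# Hedged sketch: txt2img with two ControlNet units via the AUTOMATIC1111 web API.
import base64
import requests

# encode the plain QR code as base64 for the ControlNet units
with open("activeloop_qr.jpg", "rb") as f:
    qr_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": image_prompt,  # the prompt generated in Step 4
    "negative_prompt": "ugly, disfigured, low quality, blurry, nsfw",
    "steps": 20,
    "cfg_scale": 9,
    "sampler_name": "DPM++ 2M Karras",
    "width": 768,
    "height": 768,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {"input_image": qr_b64, "model": "control_v1p_sd15_qrcode",
                 "weight": 1.1, "guidance_start": 0.0, "guidance_end": 1.0},
                {"input_image": qr_b64, "model": "control_v1p_sd15_brightness",
                 "weight": 0.3, "guidance_start": 0.0, "guidance_end": 1.0},
            ]
        }
    },
}
response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
images_b64 = response.json()["images"]  # base64-encoded result images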

[Screenshots of the AUTOMATIC1111 interface with the ControlNet extension]

    Txt2Img - Generating QR Code From a Simple QR and Previously Created Prompt

Content prompt

[Four generated QR codes: cyborg-themed content prompt]

Deep Lake prompt

[Four generated QR codes: Deep Lake prompt]

    Img2Img with logo - Generating QR Code From a QR with Logo and Previously Created Prompt

Content prompt

[Four generated QR codes with logo: digital-themed content prompt]

Deep Lake prompt

[Four generated QR codes with logo: nature-themed Deep Lake prompt]

    Step 8: Generating Artistic QR Codes for E-commerce

The idea here is a little different from the previous Activeloop examples. Now we focus on product advertising: we want to generate a QR code for a single product URL. The challenge is to generate the QR code while keeping the product as similar to the original as possible, to avoid misleading information. To do this, we experimented with many preprocessors such as tile, depth, reference_only, lineart, and styles, but we found most of them too unreliable and far from similar to the original input. At this moment, we believe the most useful is the tile preprocessor, which can preserve a lot of information. The disadvantage, however, is that it does not allow for many changes during the control phase, and the QR fit can sometimes be questionable. In practice, this is done by adding another ControlNet unit:

• 2: preprocessor: none, model: control_v11f1e_sd15_tile, weight: 1.0, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced

Since the tile input image control is very strong, there's not much else we can do. Styles are one of the few extra adjustments possible, and a very useful style cheat sheet can be found here. For our purposes, however, we did not end up utilizing any of them.

Similarly to before, we generated prompts automatically from the given websites. We randomly selected 2 products: in the first case (Tommy Hilfiger) we added a logo to the initial basic QR code, while in the second case (Patagonia) we only masked the logo already present on the product. For comparison, we also provide the original input images (Sources: Patagonia, Tommy Hilfiger).

    Img2Img with Logo - Generating Tommy Hilfiger QR Code

[Four generated Tommy Hilfiger QR codes]

    Img2Img with logo - Generating Patagonia QR Code

[Four generated Patagonia QR codes]

    Step 9: Hands On with ComfyUI

A different approach from the one above takes advantage of ComfyUI, a powerful and modular Stable Diffusion GUI. The main idea is to create a schema in the GUI and transform that schema into code with an extension called ComfyUI-to-Python-Extension. We need to load the Stable Diffusion and ControlNet checkpoints we want to use.

    In our case we experimented with:

    • Diffusion models: v1-5-pruned-emaonly, dreamshaper_8 and revAnimated_v122EOL
    • ControlNet models: control_v1p_sd15_brightness, control_v1p_sd15_qrcode, control_v11f1e_sd15_tile and control_v11f1p_sd15_depth.

    Install ComfyUI

!git clone https://github.com/comfyanonymous/ComfyUI.git
%cd ComfyUI
!pip install -r requirements.txt

    Add the ComfyUI plugin to transform the schema into code

!git clone https://github.com/pydn/ComfyUI-to-Python-Extension.git
%cd ComfyUI-to-Python-Extension
!pip install -r requirements.txt

Download all the checkpoints into the right folders:

!wget https://github.com/efenocchi/QRCodeGenerator/blob/main/3_workflow_qr_codes2.json https://github.com/efenocchi/QRCodeGenerator/blob/main/2_workflow_qr_codes.json -P /content/ComfyUI/ComfyUI-to-Python-Extension
!wget https://github.com/efenocchi/QRCodeGenerator/blob/main/workflow_api.py -P /content/ComfyUI/ComfyUI-to-Python-Extension
!wget https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.safetensors -P /content/ComfyUI/models/vae
!wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth -P /content/ComfyUI/models/controlnet
!wget https://huggingface.co/latentcat/latentcat-controlnet/resolve/main/models/control_v1p_sd15_brightness.safetensors -P /content/ComfyUI/models/controlnet
!wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1e_sd15_tile.pth -P /content/ComfyUI/models/controlnet
!wget https://huggingface.co/autismanon/modeldump/resolve/main/dreamshaper_8.safetensors -P /content/ComfyUI/models/checkpoints
# quote the URL so the shell does not interpret the & characters
!wget "https://civitai.com/api/download/models/46846?type=Model&format=SafeTensor&size=full&fp=fp32" -P /content/ComfyUI/models/checkpoints

If you have problems downloading the model from CivitAI, try downloading it manually after logging in, or download it directly from Hugging Face; in the latter case, remember to choose the right name when loading the model in the following steps.

# !wget https://huggingface.co/emmajoanne/models/resolve/main/revAnimated_v122.safetensors -P /content/ComfyUI/models/checkpoints

Run the ComfyUI GUI. If you encounter problems, run this command from the terminal; for advice and error resolution, refer to the official repository.

Since this is not a ComfyUI guide and some things may not be clear, if you want to know more, consult the official repository or a free guide like this one.

This command is shown for informational purposes only: to proceed with image generation you will not need to create the schema from scratch in ComfyUI, but can directly execute the cells shown below.

#!python main.py

Below is one of the schemas we used; it is composed of one diffusion model, v1-5-pruned-emaonly, and the ControlNet models control_v1p_sd15_brightness and control_v11f1e_sd15_tile.

[Schema of the ControlNet workflow in ComfyUI]

As illustrated, the QR code was generated by combining the basic QR code image with a textual input, making it easy to merge the initial image with the generated one. In this case the positive prompt was simply “a cyborg character” and the negative one was "ugly, artefacts, bad". This pipeline produced the images shown below:

[Four generated QR codes: cyborg character]

To export the schema and transform it into Python code, follow these steps (a sketch of the conversion command follows the list):

1. Enable Dev mode Options: click the settings button (located above the Queue Prompt text in the window that appears when you activate ComfyUI) and select the “Enable Dev mode Options” box.
2. Export the schema via the "Save (API Format)" button.
3. Put this schema in the ComfyUI-to-Python-Extension folder.
4. Run the Python file comfyui_to_python.py.
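A minimal sketch of steps 3-4, assuming the exported schema was saved as workflow_api.json inside the extension folder (the default file name the converter looks for):

# Hedged sketch: convert the exported workflow into Python code.
# Assumes workflow_api.json sits in the ComfyUI-to-Python-Extension folder.
%cd /content/ComfyUI/ComfyUI-to-Python-Extension
!python comfyui_to_python.py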

As with the previous command, this explanation is given for informational purposes only: to proceed with image generation you will not need to create the schema from scratch in ComfyUI and convert it into Python code, because this step has already been done.

The functions defined in Step 6 are used to create the QR code from a URL and to place a logo at its center. Go to the main ComfyUI folder and generate the inputs:

# if you are in Colab you can simply run
# %cd /content/ComfyUI
%cd ..
!ls

img = create_qrcode('https://www.activeloop.com/')
img.save("activeloop_qr.jpg")
img_with_logo = qr_with_logo("activeloop_logo.jpg", img, "activeloop_qr_with_logo.jpg")
img_with_logo

    Set the correct path to be able to work with ComfyUI

import os
import random
import sys
from typing import Sequence, Mapping, Any, Union
import torch

def get_value_at_index(obj: Union[Sequence, Mapping], index: int) -> Any:
    """Returns the value at the given index of a sequence or mapping.

    If the object is a sequence (like a list or string), returns the value at the given index.
    If the object is a mapping (like a dictionary), returns the value at the index-th key.

    Some nodes return a dictionary; in these cases, we look under the "result" key.
    """
    try:
        return obj[index]
    except KeyError:
        return obj["result"][index]

def find_path(name: str, path: str = None) -> str:
    """
    Recursively looks at parent folders starting from the given path until it finds the given name.
    Returns the path as a string if found, or None otherwise.
    """
    # If no path is given, use the current working directory
    if path is None:
        path = os.getcwd()

    # Check if the current directory contains the name
    if name in os.listdir(path):
        path_name = os.path.join(path, name)
        print(f"{name} found: {path_name}")
        return path_name

    # Get the parent directory
    parent_directory = os.path.dirname(path)

    # If the parent directory is the same as the current directory, we've reached the root and stop the search
    if parent_directory == path:
        return None

    # Recursively call the function with the parent directory
    return find_path(name, parent_directory)

def add_comfyui_directory_to_sys_path() -> None:
    """
    Add 'ComfyUI' to sys.path
    """
    comfyui_path = find_path("ComfyUI")
    if comfyui_path is not None and os.path.isdir(comfyui_path):
        sys.path.append(comfyui_path)
        print(f"'{comfyui_path}' added to sys.path")

add_comfyui_directory_to_sys_path()

Load all the models you will need during the generation phase; in this case we need one model to generate the image from text, plus another 2-3 ControlNet models to blend the generated image with our QR code.

from nodes import (
    KSampler,
    CLIPTextEncode,
    ControlNetApplyAdvanced,
    VAEDecode,
    CheckpointLoaderSimple,
    LoadImage,
    ControlNetLoader,
    NODE_CLASS_MAPPINGS,
    EmptyLatentImage,
    VAELoader,
    SaveImage,
)

controlnetapplyadvanced = ControlNetApplyAdvanced()
ksampler = KSampler()
vaedecode = VAEDecode()
saveimage = SaveImage()
vaeloader = VAELoader()

def load_checkpoints(diffuser_model: str = "dreamshaper_8.safetensors", controlnet_1: str = "control_v11f1e_sd15_tile.pth", controlnet_2: str = "control_v1p_sd15_brightness.safetensors", controlnet_3: str = None):
    with torch.inference_mode():
        checkpointloadersimple = CheckpointLoaderSimple()
        checkpointloadersimple_4 = checkpointloadersimple.load_checkpoint(
            ckpt_name=diffuser_model
        )
        emptylatentimage = EmptyLatentImage()
        emptylatentimage_5 = emptylatentimage.generate(
            width=768, height=768, batch_size=4
        )
        controlnetloader = ControlNetLoader()
        controlnetloader_10 = controlnetloader.load_controlnet(
            control_net_name=controlnet_1
        )
        controlnetloader_11 = controlnetloader.load_controlnet(
            control_net_name=controlnet_2
        )
        controlnetloader_12 = None
        if controlnet_3 is not None:
            controlnetloader_12 = controlnetloader.load_controlnet(
                control_net_name=controlnet_3
            )
        vaeloader_24 = vaeloader.load_vae(
            vae_name="vae-ft-mse-840000-ema-pruned.safetensors"
        )
        return checkpointloadersimple_4, emptylatentimage_5, controlnetloader_10, controlnetloader_11, controlnetloader_12, vaeloader_24

Choose which models to use, passing the same names as the files in the “checkpoints” and “controlnet” folders. You can decide whether to use two ControlNets or three, by passing filenames for controlnet_1 and controlnet_2, or for controlnet_1, controlnet_2, and controlnet_3. By default only two are used; see the variant sketched below the call.

checkpointloadersimple_4, emptylatentimage_5, controlnetloader_10, controlnetloader_11, controlnetloader_12, vaeloader_24 = load_checkpoints()
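As an optional variant, the depth model downloaded earlier can be passed as a third unit:

# Optional: three ControlNet units, reusing the depth model downloaded above
checkpointloadersimple_4, emptylatentimage_5, controlnetloader_10, controlnetloader_11, controlnetloader_12, vaeloader_24 = load_checkpoints(
    controlnet_3="control_v11f1p_sd15_depth.pth"
)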

Load the CLIP text encoders which, given the prompts, let you choose which image to generate and which effects to avoid:

def load_text_and_image(prompt_text: str = None, prompt_text_negative: str = None, input_image_path: str = None):
    torch.device("cuda" if torch.cuda.is_available() else "cpu")

    with torch.inference_mode():
        cliptextencode = CLIPTextEncode()
        cliptextencode_6 = cliptextencode.encode(
            text=prompt_text, clip=get_value_at_index(checkpointloadersimple_4, 1)
        )

        cliptextencode_7 = cliptextencode.encode(
            text=prompt_text_negative,
            clip=get_value_at_index(checkpointloadersimple_4, 1),
        )

        loadimage = LoadImage()
        loadimage_14 = loadimage.load_image(image=input_image_path)

        return cliptextencode_6, cliptextencode_7, loadimage_14

This is the heart of the generation, and it lets you choose the values to attribute to the different ControlNets. If you want to play with the parameters and see how the result varies, you can change the custom values or modify the start_percent and end_percent values directly in the models.

• strength: strength of the ControlNet; 1.0 is full strength, 0.0 is no effect at all.
• start_percent: sampling step percentage at which the ControlNet starts to be applied; no matter what is set on timestep keyframes, they won’t take effect until this start_percent is reached.
• end_percent: sampling step percentage at which the ControlNet stops being applied; no matter what is set on timestep keyframes, they won’t take effect once this end_percent is reached.
def run_inference(cliptextencode_6, cliptextencode_7, loadimage_14, tile_controlnet_values: float = 0.5, brightness_controlnet_values: float = 0.35, depth_controlnet_values: float = 1.0, number_of_examples=3):
    with torch.inference_mode():
        for _ in range(number_of_examples):
            # apply the tile ControlNet to the QR code image
            controlnetapplyadvanced_28 = controlnetapplyadvanced.apply_controlnet(
                strength=tile_controlnet_values,
                start_percent=0.35,
                end_percent=0.6,
                positive=get_value_at_index(cliptextencode_6, 0),
                negative=get_value_at_index(cliptextencode_7, 0),
                control_net=get_value_at_index(controlnetloader_10, 0),
                image=get_value_at_index(loadimage_14, 0),
            )

            # chain the brightness ControlNet on top of the tile conditioning
            controlnetapplyadvanced_27 = controlnetapplyadvanced.apply_controlnet(
                strength=brightness_controlnet_values,
                start_percent=0,
                end_percent=1,
                positive=get_value_at_index(controlnetapplyadvanced_28, 0),
                negative=get_value_at_index(controlnetapplyadvanced_28, 1),
                control_net=get_value_at_index(controlnetloader_11, 0),
                image=get_value_at_index(loadimage_14, 0),
            )

            # optionally chain a third (depth) ControlNet
            controlnetapplyadvanced_26 = None
            if controlnetloader_12 is not None:
                controlnetapplyadvanced_26 = controlnetapplyadvanced.apply_controlnet(
                    strength=depth_controlnet_values,
                    start_percent=0,
                    end_percent=0.2,
                    positive=get_value_at_index(controlnetapplyadvanced_27, 0),
                    negative=get_value_at_index(controlnetapplyadvanced_27, 1),
                    control_net=get_value_at_index(controlnetloader_12, 0),
                    image=get_value_at_index(loadimage_14, 0),
                )

            # sample using the conditioning from the last ControlNet in the chain
            conditioning = controlnetapplyadvanced_26 if controlnetapplyadvanced_26 is not None else controlnetapplyadvanced_27
            ksampler_17 = ksampler.sample(
                seed=random.randint(1, 2**64),
                steps=20,
                cfg=8,
                sampler_name="euler",
                scheduler="normal",
                denoise=1,
                model=get_value_at_index(checkpointloadersimple_4, 0),
                positive=get_value_at_index(conditioning, 0),
                negative=get_value_at_index(conditioning, 1),
                latent_image=get_value_at_index(emptylatentimage_5, 0),
            )

            # decode the latents and save the images for every example
            vaedecode_18 = vaedecode.decode(
                samples=get_value_at_index(ksampler_17, 0),
                vae=get_value_at_index(vaeloader_24, 0),
            )

            saveimage_29 = saveimage.save_images(
                filename_prefix="ComfyUI", images=get_value_at_index(vaedecode_18, 0)
            )
%cd ComfyUI

# IF YOU ARE IN COLAB, IT MAY BE QUICKER TO USE THE FOLLOWING PATHS
# OUTPUT_DIR = "/content/ComfyUI/output/"
# OUTPUT_DIR_NR = "/content/ComfyUI/output/non_readable/"

# IF YOU ARE NOT IN COLAB
OUTPUT_DIR = "output/"
OUTPUT_DIR_NR = "output/non_readable/"

os.makedirs(OUTPUT_DIR_NR, exist_ok=True)

Choose one of the prompts generated in the earlier steps for your chosen company, or write a prompt from scratch:

print(image_prompt)
print(chain_answer)
print(answer)
prompt_text = "a Deep Lake in the forest"
prompt_text_negative = "ugly, bad, artifacts"

# must be in the ComfyUI/input/ folder
input_image_path = "activeloop_qr.jpg"

cliptextencode_6, cliptextencode_7, loadimage_14 = load_text_and_image(prompt_text, prompt_text_negative, input_image_path)

    Run the inference function and check the generated images in the output folder

run_inference(cliptextencode_6, cliptextencode_7, loadimage_14, tile_controlnet_values=0.5, brightness_controlnet_values=0.35, depth_controlnet_values=1.0, number_of_examples=1)
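Because there is no single setup that always yields scannable codes, a small parameter sweep can save manual trial and error. A hedged sketch that varies only the tile strength:

# Sweep the tile ControlNet strength to trade off aesthetics (lower
# strength) against scannability (higher strength).
for tile_strength in (0.4, 0.5, 0.6, 0.7):
    run_inference(
        cliptextencode_6, cliptextencode_7, loadimage_14,
        tile_controlnet_values=tile_strength,
        brightness_controlnet_values=0.35,
        number_of_examples=1,
    )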

Depending on the checkpoint loaded by CheckpointLoaderSimple (see the load_checkpoints function), you will get different results with the same prompt.

For example, we ran tests with different models: with the same prompt, some models generate images rich in detail, while others generate a QR code that differs little from the original. This shows that the prompts we write are bound to the chosen model.

    With v1-5-pruned-emaonly.safetensors

[Four QR codes generated with v1-5-pruned-emaonly]

    With dreamshaper_8.safetensors

[Four QR codes generated with dreamshaper_8]

    With revAnimated_v122EOL.safetensors

[Four QR codes generated with revAnimated_v122EOL]

    Keep only truly scannable QR codes

import os
import shutil

import cv2
from PIL import Image
from qreader import QReader

qreader = QReader()

def keep_readable_qrcodes(folder):
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)
        img = cv2.imread(path)

        if img is not None:
            decoded_text = qreader.detect_and_decode(image=img)
            # detect_and_decode returns a tuple with one entry per detected QR;
            # entries are None when detection succeeded but decoding failed
            if decoded_text and any(decoded_text):
                print(decoded_text)
            else:
                print(f"non readable: {filename}")
                shutil.move(path, OUTPUT_DIR_NR + filename)
# keep_readable_qrcodes("/content/ComfyUI/output")
keep_readable_qrcodes(OUTPUT_DIR)

Apply the logo to the generated images to make them more intriguing:

generated_image = Image.open("<YOUR_GENERATED_IMAGE_PATH>")
img_with_logo = qr_with_logo("activeloop_logo.jpg", generated_image, "generated_image_with_logo.jpg")
img_with_logo

    Limitations of Our Approach

    • Overall, the ControlNet model required extensive manual tuning of parameters. There are many methods to control the QR code generation process, but none are entirely reliable. The problem intensifies when you want to account for the input product image as well.

    • Adding an image to the input might offer more control and bring about various use-cases, but it significantly restricts the possibilities of stable diffusion. This usually only results in changes to the image’s style without fitting much of the QR structure. Moreover, we saw greater success with text-to-image compared to image-to-image with logo masks. However, the former wasn’t as desirable because we believe logos are essential in product QR codes.

    • From our examples, it’s evident that the generated products don’t exactly match the actual products one-to-one. If the goal is to advertise a specific product, even a minor mismatch could be misleading. Nonetheless, we believe that LoRA models or a different type of preprocessor model could address these issues.

• Automated image prompts can sometimes be confusing, drawing focus to unimportant details within the context. This is particularly problematic if we don’t have enough relevant textual information to build upon. This presents an opportunity to further use the Deep Lake Vector Store to analyze ImageBind embeddings for a better understanding of the content on e-commerce websites.

    Conclusion: Scalable Prompt Generation Achieved, QR Code Generation Remains Unreliable

Deep Lake combined with LangChain can significantly reduce the costs of analyzing the contents of a website to produce image descriptions in a scalable way. Thanks to the Deep Lake Vector Store, we can save a large number of documents and images along with their embeddings. This allows us to iteratively adjust the image prompts and efficiently filter based on embedding similarities. Taking into account all of the limitations we’ve discussed, we believe more experimenting with ControlNet is needed in order to generate product QR codes that are reliable and applicable to real-world businesses. The choice of checkpoint remains very important and must be made based on the specific use case.

We hope you find this useful and already have many ideas on how to build on it further. Thank you for reading, and see you in the next one.

    FAQs

    What is prompt engineering?

    Prompt engineering is the practice of carefully crafting inputs (prompts) to be given to AI models, particularly language models, in order to elicit the desired output. It involves understanding how the model interprets inputs and using that knowledge to achieve more accurate, relevant, or creative responses.

    What is Stable Diffusion?

    Stable Diffusion refers to a specific generative artificial intelligence model designed for text-to-image synthesis. This model has the capability to generate photorealistic images based on textual input. It empowers users to create stunning artwork quickly and autonomously. Additionally, besides images, Stable Diffusion can also be used for image-to-image generation or to create videos and animations. It was originally launched in 2022.

    Is Stable Diffusion free to use?

    Stable Diffusion is an open-source project, which means it is freely available for anyone to use. You can access and use Stable Diffusion without any cost, subject to the terms of its open-source license.

    What is ControlNet?

    ControlNet is an innovative neural network architecture that integrates extra conditions to manage the control of diffusion models. These techniques include edge and line detection, human poses, image segmentation, depth maps, image styles, or simple user scribbles, allowing for conditioned output images.

    What is AUTOMATIC1111?

    AUTOMATIC1111 is a robust web-based user interface (WebUI) tailored for Stable Diffusion, an AI model for text-to-image generation. It provides an intuitive platform for creating remarkable images from textual prompts.

    How to use ComfyUI?

ComfyUI is a powerful and modular Stable Diffusion GUI. To use it, you need to clone the ComfyUI repository, install its requirements, and run the main.py file. It allows for the creation of schemas from the GUI and transforms these schemas into code for image generation tasks.

What is a QR code and how does it work?

    A QR code, or Quick Response code, is a two-dimensional barcode that stores data. It works by encoding information in black squares arranged on a white grid. When scanned by a QR code reader or smartphone camera, the encoded data is decoded and can trigger actions such as opening a website or displaying text.

    How to make artistic QR codes?

Making artistic QR codes involves incorporating design elements, colors, and sometimes logos into the QR code without compromising its scannability. This can be achieved through specialized software or online tools that allow for the customization of QR codes while ensuring they remain functional.

    What is image synthesis?

    Image synthesis is the process of generating new images from textual descriptions, existing images, or a combination of both, using artificial intelligence and machine learning models. It involves creating visually coherent and contextually relevant images based on the input provided.

    What are LoRA models?

    LoRA (Low-Rank Adaptation of Large Language Models) is a popular and lightweight training technique that significantly reduces the number of trainable parameters. It works by inserting a smaller number of new weights into the model and only these are trained. In the context of AI and machine learning, particularly concerning stable diffusion models, LoRA models are mentioned as potentially useful for addressing issues in generating product QR codes that match actual products more closely.
