AI Story Generator: OpenAI Function Calling, LangChain, & Stable Diffusion

    Go from Prompt to an Illustrated Children's Book with LangChain & OpenAI Function Calling. Improve the Gen AI App with Fine-Tuning and Model Training Deep Lake
Davit Buniatyan
12 min read · Jun 19, 2023 · Updated May 31, 2024
If this Machine Learning thing never works out, you can still make passive income by mass-selling these on Amazon. Thank us later (disclaimer: this is a joke).

    Meet FableForge, AI Picture Books Generator powered by OpenAI, LangChain, Stable Diffusion, & Deep Lake

Imagine a world where children’s picture books are created on demand by children themselves, from a single prompt. Each generated image’s text and prompt pair is stored for further fine-tuning, so a story the child likes can be adapted to fit one human’s imagination perfectly.

    This is the grand vision of FableForge.

FableForge is an open-source app that generates children’s picture books from a single prompt. First, GPT-3.5/4 is instructed to write a short children’s book. Then, using the function calling feature OpenAI just announced, the text from each book page is transformed into a prompt for Stable Diffusion. These prompts are sent to Replicate, the corresponding images are generated, and all the elements are combined into a complete picture book. The matching images and prompts are stored in a Deep Lake vector database, which makes storing and visualizing multimodal data (image and text pairs) easy. Beyond that, the generated data can be streamed to machine learning frameworks in real time during training to fine-tune our generative AI model. While the latter is beyond the scope of this example, we’d love to cover how it all works together.

FableForge demo

    But first…

    What Did and Didn’t Work while Building FableForge?

    Before we look at the exact solution we eventually decided on, let’s take a glance at the approaches that didn’t work and what we learned from them:

    Didn’t Work: Instructing GPT-4 To Generate Stable Diffusion Prompts

    Initially, it seemed like it might be possible to send the LLM the text of our book and tell it to generate a prompt for each page. However, this didn’t work for a few reasons:

• Stable Diffusion was released in 2022: While Stable Diffusion might seem like "old news", it postdates the training data of GPT-3.5 and GPT-4, so to those models it’s in the future. Look at GPT-4’s response to the question, "What is Stable Diffusion?":
  What is Stable Diffusion? Nobody knew in 2021

• Teaching the LLM how to prompt is difficult: It’s possible to instruct the LLM to generate prompts without it knowing what Stable Diffusion is; giving it an exact prompt format to follow yields decent results. Unfortunately, the LLM often injects plot details or non-visual content into the prompts, no matter how often you tell it not to. These details skew the relevance of the prompts and hurt the quality of the generated images.

    What Did Work: Function Calling Capabilities

    What is OpenAI Function Calling?

On June 13th, OpenAI announced a huge update to the chat completions API: function calling! This means we can provide the chat model with a function definition, and the chat model will output a JSON object conforming to that function’s parameters.

    Now, the chat models can interpret natural language input into a structured format suitable for external tools, APIs, or database queries. The chat models are designed to detect when a function needs to be called based on the user’s input and can then respond with JSON that conforms to the described function’s signature.

    In essence, function calling is a way to bridge the gap between unstructured language input and structured, actionable output that other systems, tools, or services can use.
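As a rough sketch of what this looks like at the JSON level (the payload below is illustrative, shaped like a chat-completions message that triggered a function call; treat the exact field values as hypothetical):

```python
import json

# Illustrative response message from a chat model that decided to call a function.
message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_passage_setting",
        # "arguments" arrives as a JSON *string* that must be parsed before use.
        "arguments": '{"setting": "a green forest", "time_of_day": "nighttime"}',
    },
}

# The model signals a call by populating "function_call"; parsing "arguments"
# turns unstructured language input into structured, actionable data.
if message.get("function_call"):
    args = json.loads(message["function_call"]["arguments"])
    print(args["setting"])  # ready for downstream tools, APIs, or queries
```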

    How FableForge Uses Functions

    For our Stable Diffusion prompts, we need structured data that strictly adheres to specific rules - a function is perfect for that! Let’s take a look at one of the functions we used:

get_visual_description_function = [{
    'name': 'get_passage_setting',
    'description': 'Generate and describe the visuals of a passage in a book. Visuals only, no characters, plot, or people.',
    'parameters': {
        'type': 'object',
        'properties': {
            'setting': {
                'type': 'string',
                'description': 'The visual setting of the passage, e.g. a green forest in the pacific northwest',
            },
            'time_of_day': {
                'type': 'string',
                'description': 'The time of day of the passage, e.g. nighttime, daytime. If unknown, leave blank.',
            },
            'weather': {
                'type': 'string',
                'description': 'The weather of the passage, e.g. rain. If unknown, leave blank.',
            },
            'key_elements': {
                'type': 'string',
                'description': 'The key visual elements of the passage, e.g. tall trees',
            },
            'specific_details': {
                'type': 'string',
                'description': 'The specific visual details of the passage, e.g. moonlight',
            }
        },
        'required': ['setting', 'time_of_day', 'weather', 'key_elements', 'specific_details']
    }
}]

    With this, we can send the chat model a page from our book, the function, and instructions to infer the details from the provided page. In return, we get structured data that we can use to form a great Stable Diffusion prompt!

    LangChain and OpenAI Function Calling

    When we created FableForge, OpenAI announced the new function calling capabilities. Since then, LangChain - the open-source library we use to interact with OpenAI’s Large Language Models - has added even better support for using functions. Our implementation of functions using LangChain is as follows:

    1. Define our function: First, we define our function, as we did above with get_visual_description_function.

    2. Give the chat model access to our function: Next, we call our chat model, including our function within the functions parameter, like so:

    response = self.chat([HumanMessage(content=f'{page}')], functions=get_visual_description_function)

    3. Parse the JSON object: When the chat model uses our function, it provides the output as a JSON object. To convert the JSON object into a Python dictionary containing the function output, we can do the following:

    function_dict = json.loads(response.additional_kwargs['function_call']['arguments'])

In the function we defined earlier, ‘setting’ was one of the parameters. To access it, we can write:

    setting = function_dict['setting']

And we’re done! We can follow the same steps for each of the other parameters to extract them.
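Rather than pulling each key out by hand, the same extraction can be done in one pass over the function’s required parameters (a small convenience sketch; the `function_dict` below is hypothetical sample output, not FableForge’s actual data):

```python
# Hypothetical parsed output of get_passage_setting for one page.
function_dict = {
    "setting": "a green forest in the pacific northwest",
    "time_of_day": "nighttime",
    "weather": "light rain",
    "key_elements": "tall trees",
    "specific_details": "moonlight",
}

# Same order as the 'required' list in the function definition.
required = ["setting", "time_of_day", "weather", "key_elements", "specific_details"]

# Join the values in a fixed order into a comma-separated prompt fragment,
# skipping any fields the model left blank.
fragment = ", ".join(function_dict[k] for k in required if function_dict[k])
print(fragment)
```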

    Perfecting the Process: Using Deep Lake for Storage and Analysis

The final breakthrough in perfecting FableForge was using Deep Lake to store the generated images and text. With Deep Lake, we could store multiple modalities of data, such as images and text, in the cloud. The web-based UI provided by Deep Lake made it incredibly straightforward to display, analyze, and optimize the generated images and prompts, improving the quality of our picture book output. For future Stable Diffusion endeavors, we now have a decently sized dataset showing us which prompts work and which don’t!

    Deep Lake UI with text and image pairs

    Building FableForge

    FableForge’s open-sourced code is located here.

    FableForge consists of four main components:

    1. The generation of the text and images
    2. The combining of the text and images to create the book
    3. Saving the images and prompts to the Deep Lake dataset
    4. The UI

    Let’s take a look at each component individually, starting with the generation of the text and images. Here’s a high-level overview of the architecture:

FableForge architecture diagram

    First Component: AI Book Generation

    All code for this component can be found in the api_utils.py file.

    1. Text Generation: To generate the text for the children’s book, we use LangChain and the ChatOpenAI chat model.
    def get_pages(self):
        pages = self.chat([HumanMessage(content=f'{self.book_text_prompt} Topic: {self.input_text}')]).content
        return pages

    self.book_text_prompt is a simple prompt instructing the model to generate a children’s story. We specify the number of pages inside the prompt and what format the text should come in. The full prompt can be found in the prompts.py file.

2. Visual Prompt Generation: To produce the prompts we will use with Stable Diffusion, we use functions, as outlined above. First, we send the whole book to the model:

    def get_prompts(self):
        base_atmosphere = self.chat([HumanMessage(content=f'Generate a visual description of the overall lighting/atmosphere of this book using the function. '
                                                          f'{self.book_text}')], functions=get_lighting_and_atmosphere_function)
        summary = self.chat([HumanMessage(content=f'Generate a concise summary of the setting and visual details of the book: {self.book_text}')]).content

    Since we want our book to have a consistent style throughout, we will take the contents of base_atmosphere and append it to each individual prompt we generate later on. To further ensure our visuals stay consistent, we generate a concise summary of the visuals of the book. This summary will be sent to the model later on, accompanying each individual page, to generate our Stable Diffusion prompts.

        def generate_prompt(page, base_dict):
            prompt = self.chat([HumanMessage(content=f'General book info: {base_dict}. Passage: {page}. Infer details about the passage if they are missing, '
                                                     f'use the function with inferred details, as if you were illustrating the passage.')],
                               functions=get_visual_description_function)

    This method will be called for each individual page of the book. We send the model the info we just gathered along with a page from the book, and give it access to the get_visual_description_function function. The output of this will be a JSON object containing all the elements we need to form our prompts!

    for i, prompt in enumerate(prompt_list):
        entry = f"{prompt['setting']}, {prompt['time_of_day']}, {prompt['weather']}, {prompt['key_elements']}, {prompt['specific_details']}, " \
                f"{base_dict['lighting']}, {base_dict['mood']}, {base_dict['color_palette']}, in the style of {style}"

    Here, we combine everything. Now that we have our prompts, we can send them to Replicate’s Stable Diffusion API and get our images. Once those are downloaded, we can move on to the next step.
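Putting the pieces together, the prompt assembly can be sketched as a self-contained loop (the sample data below is hypothetical; in FableForge, `prompt_list` and `base_dict` come from the function-calling steps described earlier):

```python
# Hypothetical per-page outputs from get_visual_description_function.
prompt_list = [
    {"setting": "a green forest", "time_of_day": "nighttime",
     "weather": "light rain", "key_elements": "tall trees",
     "specific_details": "moonlight"},
]
# Hypothetical book-wide style from get_lighting_and_atmosphere_function.
base_dict = {"lighting": "soft moonlight", "mood": "calm",
             "color_palette": "muted greens and blues"}
style = "watercolor children's book illustration"

# One Stable Diffusion prompt per page: page-specific visuals first,
# then the book-wide lighting/mood/palette for stylistic consistency.
entries = []
for prompt in prompt_list:
    entry = (f"{prompt['setting']}, {prompt['time_of_day']}, {prompt['weather']}, "
             f"{prompt['key_elements']}, {prompt['specific_details']}, "
             f"{base_dict['lighting']}, {base_dict['mood']}, "
             f"{base_dict['color_palette']}, in the style of {style}")
    entries.append(entry)
print(entries[0])
```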

    Second Component: Combining Text and Images

Now that we have our text and images, we could open up MS Paint and copy-paste the text onto each corresponding image, but that would be tedious and time-consuming; instead, let’s do it programmatically. In pdf_gen_utils.py, we turn our ingredients into a proper book in these steps:

    1. Text Addition and Image Conversion: First, we take each image, resize it, and apply a fading mask to the bottom - a white space for us to place our text. We then add the text to the faded area, convert it into a PDF, and save it.
    2. Cover Generation: A book needs a cover that follows a different format than the rest of the pages. Instead of a fading mask, we take the cover image and place a white box over a portion for the title to be placed within. The other steps (resizing and saving as PDF) are the same as above.
    3. PDF Assembly: Once we have completed all the pages, we combine them into a single PDF and delete the files we no longer need.
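Step 1 can be sketched with Pillow (a minimal version under assumed inputs; FableForge’s actual pdf_gen_utils.py handles fonts, layout, and sizing more carefully):

```python
from PIL import Image, ImageDraw

# Hypothetical page image; in FableForge this is a downloaded Stable Diffusion render.
page = Image.new("RGB", (512, 512), "darkgreen")

# Fading mask: a vertical gradient, transparent at the top and opaque at the bottom,
# so the bottom of the page fades to white and leaves room for text.
gradient = Image.linear_gradient("L").resize(page.size)  # 0 (top) -> 255 (bottom)
white = Image.new("RGB", page.size, "white")
faded = Image.composite(white, page, gradient)

# Draw the page text inside the faded band, then save as a one-page PDF.
draw = ImageDraw.Draw(faded)
draw.text((20, 470), "Once upon a time...", fill="black")
faded.save("page_01.pdf", "PDF")
```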

    Third Component: Saving to Deep Lake

    Now that we have finalized our picture book, we want to store the images and prompts in Deep Lake. For this, we created a SaveToDeepLake class:

    import deeplake

    class SaveToDeepLake:
        def __init__(self, buildbook_instance, name=None, dataset_path=None):
            self.dataset_path = dataset_path
            try:
                # Load the dataset if one already exists at the given path
                self.ds = deeplake.load(dataset_path, read_only=False)
                self.loaded = True
            except Exception:
                # Otherwise create a new, empty dataset
                self.ds = deeplake.empty(dataset_path)
                self.loaded = False

            self.prompt_list = buildbook_instance.sd_prompts_list
            self.images = buildbook_instance.source_files

        def fill_dataset(self):
            if not self.loaded:
                # First run: create the tensors that will hold our data
                self.ds.create_tensor('prompts', htype='text')
                self.ds.create_tensor('images', htype='image', sample_compression='png')
            for i, prompt in enumerate(self.prompt_list):
                self.ds.append({'prompts': prompt, 'images': deeplake.read(self.images[i])})

When initialized, the class first tries to load a Deep Lake dataset from the provided path. If the dataset doesn’t exist, a new one is created.

If the dataset already exists, we simply append the prompts and images. The images can be uploaded easily using deeplake.read(), as Deep Lake is built to handle multimodal data.

If the dataset is empty, we must first create the tensors to store our data: a ‘prompts’ tensor for our prompts and an ‘images’ tensor for our images. Our images are in PNG format, so we set sample_compression to 'png'.

    Once uploaded, we can view them in the UI, as shown above.

    All code can be found in the deep_lake_utils.py file.

    Final Component: Streamlit UI

    To create a quick and simple UI, we used Streamlit. The complete code can be found in main.py.

    Our UI has three main features:

1. Prompt Format: In this text input box, the user specifies the prompt to generate the book from. This could be anything: a theme, a plot, a time period, and so on.
    2. Book Generation: Once the user has input their prompt, they can click the Generate button to generate the book. The app will run through all of the steps outlined above until it completes the generation. The user will then have a button to download their finished book.
    3. Saving to Deep Lake: The user can click the Save to Deep Lake checkbox to save the prompts and images to their Deep Lake vector database. Once the book is generated, this will run in the background, filling the user’s dataset with all their generated prompts and images.

    Streamlit is an excellent choice for quick prototyping and smaller projects like FableForge - the entire UI is less than 60 lines of code!

    Conclusion: The Future of AI-Generated Picture Books with FableForge & Deep Lake

    Developing FableForge was a perfect example of how new AI tools and methodologies can be leveraged to overcome hurdles. By leveraging the power of LangChain, OpenAI’s function calling feature, Stable Diffusion’s image generation abilities, and Deep Lake’s multimodal dataset storage and analysis capabilities, we created an app that opens up a new frontier in children’s picture book creation.

Everyone can create an app like this - we did it, too. What will matter in the end, however, is having data as your moat: using the data you gather from your users to fine-tune models, providing them personal, curated experiences as they immerse themselves in fiction. This is where Deep Lake comes into play. With its ‘data lake’ features of multimodal data visualization and streaming, Deep Lake enables teams to fine-tune their LLM performance or train entirely new ML models cost-effectively.
