    Use Deep Memory to Boost RAG Apps' Accuracy by up to +22%

    Boost Retrieval Accuracy by +41% & Cut GPT-4 Costs in Half with Deep Memory, a Feature to Boost Chat-With-Data Applications Built with RAG & Deep Lake
    • Davit Buniatyan
    7 min read · Sep 28, 2023 · Updated Dec 21, 2023
    The Problem in RAGs is Inaccurate Retrieval

    Enterprises are currently building ‘chat with your data’ applications for a variety of use cases, ranging from documenting internal processes to automating customer support. These solutions are typically implemented as Retrieval Augmented Generation (RAG) systems that provide context to Large Language Models (LLMs) like GPT-4. RAG also infuses the model with strategically relevant bits of the latest information, reducing the need to fine-tune the model separately and removing the training-data cutoff restriction. While truthfulness and faithfulness are important, the utility of the implementation ultimately depends on retrieval accuracy. At best, RAG applications achieve a retrieval accuracy of 70%, so in 30% of cases an LLM like GPT-4 cannot provide an accurate response to the user’s request. This level of inconsistency is obvious and perceptible to the end user, and it is unacceptable when applications play a crucial role in business-critical operations.

    Existing Solutions Provide Incremental Improvements

    There are several options for boosting the accuracy of RAG systems, such as adjusting the input with feature engineering, fine-tuning embeddings, employing hybrid or lexical search, reranking the final results with cross-encoders, and context-aware fine-tuning of your LLM. However, these techniques offer marginal improvements that do not fundamentally change the user experience of LLM apps. For reference, a minimal sketch of one of these techniques, cross-encoder reranking, follows below.
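
    The sketch uses the sentence-transformers library; the model name and sample documents are assumptions for the example, not from the original post.

    # A minimal cross-encoder reranking sketch. The model name and sample
    # documents are illustrative assumptions.
    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    query = "Does the APOE4 allele increase dementia risk?"
    candidates = [
        "APOE4 is a susceptibility locus for late-onset Alzheimer's disease.",
        "Apolipoproteins transport lipids through the bloodstream.",
    ]

    # Score each (query, document) pair, then sort candidates by score.
    scores = reranker.predict([(query, doc) for doc in candidates])
    reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
    print(reranked[0])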

    Another approach for increasing accuracy is to use a broader LLM context, but this often results in higher costs with only minor gains. Why?

    • Answer quality decreases, and the risk of hallucination increases.
      As shown in a recent study from Stanford, LLMs struggle to pick out key details from large contexts, especially when those details sit in the middle. The models find it hard to focus on the relevant information when given many documents, and performance degrades by up to 20% as the number of documents grows.

    • The bigger the context, the higher the cost of LLM execution.
      LLM providers bill based on data size, so adding more data to a query increases the cost. For self-hosted deployments, the cost manifests through higher compute requirements and larger infrastructure. A back-of-the-envelope sketch follows this list.
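
    To make the billing point concrete, here is a back-of-the-envelope sketch; the per-token price and chunk size are illustrative assumptions, not figures from this post.

    # Back-of-the-envelope prompt-cost sketch. The price (an assumed GPT-4
    # prompt rate) and the 500-token chunk size are illustrative, not from
    # the post.
    PRICE_PER_1K_INPUT_TOKENS = 0.03
    CHUNK_TOKENS = 500

    def prompt_cost(top_k: int) -> float:
        """Dollar cost of the retrieved context alone, per query."""
        return top_k * CHUNK_TOKENS * PRICE_PER_1K_INPUT_TOKENS / 1000

    print(prompt_cost(10))  # 0.15  -> ten chunks of context
    print(prompt_cost(3))   # 0.045 -> three chunks, ~70% cheaper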

    Introducing Deep Memory: An Actual Solution That Works

    Deep Memory significantly increases Deep Lake’s vector search accuracy, by up to +22%, by learning an index from labeled queries tailored to your application, without impacting search time. These results can be achieved with only a few hundred example pairs of prompt embeddings and the most relevant answers from the vector store.

    How Deep Memory Works

    After training, vector search is used without modification, just as before.

    Embeddings can still be computed using a model of your choice, such as OpenAI’s ada-002 or open-source models like BGE by BAAI; a minimal sketch of swapping in such a model is shown below. Furthermore, search results from Deep Memory can be further improved by combining them with lexical search or a reranker.
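
    In the sketch, the LangChain HuggingFaceEmbeddings wrapper and the BAAI/bge-small-en-v1.5 model name are assumptions for the example; any callable that returns embeddings would work.

    # A minimal sketch: any embedding callable can back the vector store.
    # HuggingFaceEmbeddings and "BAAI/bge-small-en-v1.5" are illustrative
    # choices, not prescribed by this post.
    from langchain.embeddings import HuggingFaceEmbeddings

    embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")

    # `embeddings.embed_documents` and `embeddings.embed_query` are the
    # callables passed to Deep Lake's VectorStore in the snippets below.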

    Figure: Recall comparison of lexical search, vector search, and Deep Memory at top-10 retrieved documents.

    How can you do it yourself?

    Let’s get hands-on. First, we load the indexed dataset; let’s call it corpus. It has to be in the managed database (runtime={"tensor_db": True}) to leverage Deep Memory training on the Deep Lake Managed Tensor Database.

    from deeplake import VectorStore

    # Load the indexed dataset from the Deep Lake Managed Tensor Database.
    corpus = VectorStore(
        "hub://activeloop-test/scifact-demo",
        embedding_function=embeddings.embed_documents,
        runtime={"tensor_db": True},
    )

    Then, we construct a dataset of questions and relevance. Relevance is a list of pairs (corpus.id: str, significance: int) that indicates where the answer is located inside the corpus. Sometimes an answer can be found in multiple locations or with different significance. Relevance enables Deep Memory training to correctly optimize the embedding space for higher accuracy. The goal here is to obtain a labelled query dataset similar to what your users would ask in a production setting. To observe significant improvements, we would expect a few hundred query pairs. If you don’t already have a labelled query dataset, you can also use GPT-4 to generate synthetic pairs based on chunks from the corpus, as sketched below.
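
    If you take the synthetic route, here is a minimal sketch using the OpenAI Python client (v1-style API); the model name, prompt, and the shape of `chunks` are assumptions for illustration, not from the original post.

    # A minimal sketch of generating synthetic (query, relevance) pairs from
    # corpus chunks with GPT-4. Model name, prompt, and the structure of
    # `chunks` are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()

    def synthetic_pairs(chunks):
        """chunks: list of (corpus_id, text) tuples sampled from the corpus."""
        questions, relevance = [], []
        for corpus_id, text in chunks:
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{
                    "role": "user",
                    "content": f"Write one question answerable only from this passage:\n{text}",
                }],
            )
            questions.append(response.choices[0].message.content)
            relevance.append([(corpus_id, 1)])  # the source chunk is the relevant hit
        return questions, relevance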

    Then, we kickstart the training job on the Activeloop platform.

    questions = ["question 1", ...]
    # Each query maps to a list of (corpus id, significance) pairs.
    relevance = [[(corpus.dataset.id[0], 1), ...], ...]

    job_id = corpus.deep_memory.train(
        queries=questions,
        relevance=relevance,
        embedding_function=embeddings.embed_documents,
    )

    We can monitor the training progress and observe an improvement of up to +21.5% on the validation set within a few minutes 🤯.

    corpus.deep_memory.status(job_id)

    Figure: Deep Memory training progress.

    We can enable Deep Memory directly inside vector search without waiting for the training to finish.

    corpus.search(
        embedding_data="Female carriers of the Apolipoprotein E4 (APOE4) allele have increased risk for dementia.",
        embedding_function=embeddings.embed_query,
        deep_memory=True,
    )

    A GPT-4 answer based on naive vector search would produce the following result:

    “The provided context does not explicitly state that female carriers of the Apolipoprotein E4 (APOE4) allele have an increased risk for dementia.”

    However, with Deep Memory enabled:

    “The Apolipoprotein E4 (APOE4) allele is a confirmed susceptibility locus for late-onset Alzheimer’s disease.”

    While qualitative results are good for manual inspection, to evaluate in a production setting we want to quantitatively measure the accuracy of information retrieval on previously unseen queries. This provides objective metrics to compare against naive vector search, or even lexical search.

    corpus.deep_memory.evaluate(
        queries=test_questions,      # held-out queries, built like the training set
        relevance=test_relevance,
        embedding_function=embeddings.embed_documents,
        top_k=[1, 3, 5, 10, 50, 100],
    )

    Figure: Deep Memory compared to lexical search and vector search. Recall is plotted against top-k; higher recall at lower k is better.

    As you increase k, the likelihood of the correct answer appearing in the retrieved results increases. However, since LLM API costs are proportional to the number of tokens in the context, your goal is to decrease top_k while preserving accuracy. Deep Memory lets you decrease top_k from 10 to 3 while preserving accuracy, resulting in up to 70% lower token usage, faster computations, and lower costs; see the sketch after this paragraph.
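
    In practice this just means passing a smaller k to the same search call. A minimal sketch (the query text is illustrative):

    # With Deep Memory enabled, a smaller k can preserve accuracy while
    # shrinking the LLM context. The query text is illustrative.
    results = corpus.search(
        embedding_data="Which allele is a susceptibility locus for late-onset Alzheimer's disease?",
        embedding_function=embeddings.embed_query,
        deep_memory=True,
        k=3,  # down from 10, cutting context tokens by ~70%
    )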

    Customer Spotlight: Munai Boosts Vector Search Accuracy by 41% with Deep Memory


    Health tech startup Munai, backed by the Bill & Melinda Gates Foundation, provides intelligent patient management.

    “Leveraging Deep Memory, we’ve achieved an astounding 18.6% boost in our vector search accuracy across medical documents. Such a transformation directly impacts the efficiency and efficacy of our solutions.” - Mateus Cichelero da Silva, Data Chapter Lead at Munai

    The company collaborates with one of the largest hospitals in Brazil to facilitate patient management for healthcare professionals and health insurance companies.

    “Munai is one of the leading innovative generative AI solution providers in Healthcare. We are more than excited to partner with them to deploy mission critical AI workloads into hospitals” - Davit Buniatyan, CEO of Activeloop

    Figure: Hugo Morales and Cristian Rocha, founders of Munai, a healthtech generative AI startup (credit: press handout).

    “Deep Memory marks a groundbreaking advancement in constructing precision-focused RAG systems for medical applications. In contexts as delicate as healthcare environments, this is not just an improvement; it’s a revolution” - Cristian Rocha, CEO of Munai

    Higher Accuracy at Lower Cost for Your RAG Application

    Deep Memory increases retrieval accuracy without altering your existing workflow. Additionally, by reducing the top_k input into the LLM, you can significantly cut inference costs via lower token usage.

    Benefits of Deep Memory

    1. Higher Quality: Accuracy improves by up to +22% on average by learning from queries, with customers like Munai achieving up to a 41% increase.
    2. Cost Reduction: Save up to 50% of GPT-4 cost and execution time by reducing top_k.
    3. Simple to Use: No change in Deep Lake Vector Search usage; same speed with higher accuracy. Natively integrated inside LangChain & LlamaIndex (a sketch of the LangChain path follows this list).
    4. Smaller, Faster, Cheaper: Use smaller text embeddings combined with Deep Memory, such as BGE_small (384 dims), to beat OpenAI’s ada-002 (1536 dims) and/or Elastic (BM25).
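
    For the LangChain path mentioned in point 3, here is a minimal sketch; the dataset path is a hypothetical placeholder, and forwarding deep_memory via search_kwargs is an assumption about how the retriever passes search parameters through.

    # A minimal sketch of using Deep Memory through LangChain's DeepLake
    # integration. The dataset path is a hypothetical placeholder; passing
    # `deep_memory` via `search_kwargs` is an assumption.
    from langchain.vectorstores import DeepLake

    db = DeepLake(
        dataset_path="hub://<org>/<dataset>",  # hypothetical path
        embedding=embeddings,
        runtime={"tensor_db": True},
        read_only=True,
    )

    retriever = db.as_retriever()
    retriever.search_kwargs["deep_memory"] = True
    retriever.search_kwargs["k"] = 3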

    Get Started Today

    A number of enterprise customers are already reaping the benefits of Deep Memory-powered Retrieval Augmented Generation apps. Dive into the future with our hosted training service in our Managed Tensor Database.

    Note: Deep Memory is now generally available with the latest Deep Lake version.
