<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Fountain Voyage</title>
    <link>https://www.zair.top/en/</link>
    <description>Welcome to Fountain Voyage, a personal technology blog that gathers my learning and practice in computer technology, data science, and artificial intelligence. From programming skills to algorithm analysis, from data analytics to machine learning projects, from technical knowledge to personal growth stories, there is a wealth of content waiting for you to explore. Let&#39;s learn and grow together, unlocking the endless possibilities of the tech world!</description>
    <follow_challenge>
      <feedId>65268575454017536</feedId>
      <userId>72075959463733248</userId>
    </follow_challenge>
    <generator>Hugo 0.136.4 &amp; FixIt v0.3.17-036cbb41</generator>
    <language>en</language>
    <managingEditor>blog@zair.top (Tim)</managingEditor>
    <webMaster>blog@zair.top (Tim)</webMaster>
    <copyright>This work is licensed under CC BY-NC-SA-4.0.</copyright>
    <lastBuildDate>Fri, 28 Nov 2025 14:21:01 +0000</lastBuildDate>
    <atom:link href="https://www.zair.top/en/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Comprehensive Guide to Mixture of Experts (MoE): Exploring Mixtral 8X7B, DBRX, and Deepseek-v2 Architectures and Applications</title>
      <link>https://www.zair.top/en/post/mixture-of-experts/</link>
      <pubDate>Wed, 25 Dec 2024 15:44:22 +0800</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/mixture-of-experts/</guid>
      <category domain="https://www.zair.top/en/categories/llm/">Large Language Models</category>
      <description><![CDATA[<blockquote>
<p>Dive into the architecture and working principles of Mixture of Experts (MoE) models, exploring popular frameworks like Mixtral 8X7B, DBRX, and Deepseek-v2. Learn their applications and advantages, implement an MoE model using Python, and evaluate its performance on tasks like logical reasoning, summarization, and entity extraction.</p>
</blockquote>
<p>Mixture of Experts (MoE) has been a viral concept in the world of Large Language Models (LLMs). It not only marks a breakthrough in efficiency and scalability but also offers novel solutions to handle complex tasks. In simple terms, MoE splits a large model into multiple smaller models, where each smaller model, called an &ldquo;expert,&rdquo; specializes in a specific task or dataset type. When processing a particular task, the model activates only the relevant &ldquo;experts,&rdquo; without engaging the entire network, thereby saving computational resources significantly.</p>
<p><strong>In this article, you will:</strong></p>
<ul>
<li>Master the basics of MoE model architecture and its working principles.</li>
<li>Learn about several popular MoE models, such as Mixtral 8X7B, DBRX, and Deepseek-v2.</li>
<li>Implement an MoE model in Google Colab using Python code.</li>
<li>Assess the performance of a typical MoE model on tasks like logical reasoning, summarization, and entity extraction.</li>
<li>Understand the advantages and challenges of using MoE models in complex natural language processing tasks and code generation.</li>
</ul>
<h2 id="what-is-a-mixture-of-experts-moe-model" class="heading-element"><span>1 What is a Mixture of Experts (MoE) Model?</span>
  <a href="#what-is-a-mixture-of-experts-moe-model" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Most modern deep learning models rely on neural networks with multiple layers, each containing numerous &ldquo;neurons.&rdquo; These neurons process input data, perform mathematical operations (e.g., activation functions), and pass the results to the subsequent layers. More advanced architectures, such as Transformers, employ self-attention mechanisms to capture complex patterns within data.</p>
<p>However, traditional dense architectures engage the entire network when solving any single task, which leads to extremely high computational costs. To address this, Mixture of Experts (MoE) introduces <strong>sparse architectures</strong>, where only the specific parts of the network—relevant &ldquo;experts&rdquo;—are activated based on the input task. This results in a significant reduction in computational requirements, particularly for resource-intensive tasks like natural language processing.</p>
<p>Imagine a team project where team members are divided into smaller groups, each focusing on a unique task. MoE operates in a similar way—it breaks a complex problem into smaller sub-tasks, with each &ldquo;expert&rdquo; network handling a specific sub-task.</p>
<h3 id="key-advantages-of-moe-models" class="heading-element"><span>1.1 <strong>Key Advantages of MoE Models</strong></span>
  <a href="#key-advantages-of-moe-models" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li><strong>Faster Pretraining</strong>: MoE models expedite the training process compared to traditional dense models.</li>
<li><strong>Improved Inference Speed</strong>: Even with a similar parameter count, MoE models offer faster inference.</li>
<li><strong>Higher VRAM Requirements</strong>: MoE models demand more video memory since all &ldquo;experts&rdquo; must be stored simultaneously.</li>
</ul>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/12/a7b209eb10f3d2b2c8e5414c41e62f1c.avif" alt="MoE Structure Diagram" srcset="https://cdn1.zair.top/images/2024/12/a7b209eb10f3d2b2c8e5414c41e62f1c.avif?size=small, https://cdn1.zair.top/images/2024/12/a7b209eb10f3d2b2c8e5414c41e62f1c.avif?size=medium 1.5x, https://cdn1.zair.top/images/2024/12/a7b209eb10f3d2b2c8e5414c41e62f1c.avif?size=large 2x" data-title="MoE Structure Diagram" class="suffix-invalid suffix-invalid__small suffix-invalid__large" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>An MoE model consists of two core components:</p>
<ol>
<li><strong>Experts</strong>: Smaller neural networks specialized in different tasks.</li>
<li><strong>Router</strong>: This module dynamically selects and activates the relevant experts for a given input. By activating only relevant experts, MoE optimizes performance and computational efficiency.</li>
</ol>
<h2 id="popular-moe-models" class="heading-element"><span>2 Popular MoE Models</span>
  <a href="#popular-moe-models" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>MoE models have gained significant attention in AI research due to their ability to efficiently scale large language models while maintaining high performance. Notable examples like Mixtral 8X7B leverage sparse MoE architectures to activate only subsets of experts for specific inputs. This approach matches the performance of much larger dense models while significantly improving efficiency.</p>
<p>Let’s explore some prominent MoE models and implement them in Python using Ollama on Google Colab.</p>
<h3 id="1-mixtral-8x7b" class="heading-element"><span>2.1 <strong>1. Mixtral 8X7B</strong></span>
  <a href="#1-mixtral-8x7b" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><a href="https://huggingface.co/mistralai/Mixtral-8x7B-v0.1"target="_blank" rel="external nofollow noopener noreferrer">Mixtral 8X7B<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a> is a decoder-only transformer model, where the input tokens are embedded as vectors, processed through decoder layers, and outputted as probabilities of each token occurring. The architecture incorporates a Sparse Mixture of Experts (SMoE) to handle word vectors efficiently, significantly reducing computation costs.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/12/ccb17e49fea11e6db8e4d3ce48d9f373.avif" alt="MoE Decoder" srcset="https://cdn1.zair.top/images/2024/12/ccb17e49fea11e6db8e4d3ce48d9f373.avif?size=small, https://cdn1.zair.top/images/2024/12/ccb17e49fea11e6db8e4d3ce48d9f373.avif?size=medium 1.5x, https://cdn1.zair.top/images/2024/12/ccb17e49fea11e6db8e4d3ce48d9f373.avif?size=large 2x" data-title="MoE Decoder" class="suffix-invalid suffix-invalid__small suffix-invalid__large" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p><strong>Notable Features:</strong></p>
<ul>
<li>Total experts: 8</li>
<li>Active experts per input: 2</li>
<li>Decoder layers: 32</li>
<li>Vocabulary size: 32,000</li>
<li>Embedding size: 4,096</li>
<li>Individual expert size: 5.6 billion parameters (shared with embedding, normalization layers, etc.).</li>
<li>Activated parameters: 12.8 billion.</li>
<li>Context length: 32k tokens.</li>
</ul>
<p>Mixtral 8X7B has demonstrated proficiency in various tasks, including text generation, translation, summarization, sentiment analysis, educational content, customer support automation, and research assistance. Its architecture ensures versatility across domains.</p>
<h3 id="2-dbrx" class="heading-element"><span>2.2 <strong>2. DBRX</strong></span>
  <a href="#2-dbrx" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><a href="https://github.com/databricks/dbrx"target="_blank" rel="external nofollow noopener noreferrer">DBRX<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>, developed by Databricks, is a decoder-only, Transformer-based LLM trained on next-token prediction. It uses fine-grained MoE architectures, achieving a total of 132 billion parameters where only 36 billion are activated per input. Notably, DBRX consists of more, smaller experts compared to Mixtral and Grok-1.</p>
<p><strong>Key Architectural Features:</strong></p>
<ul>
<li><strong>Fine-Grained Experts</strong>: Experts are divided into segments, enabling higher specialization without parameter inflation.</li>
<li>Number of experts: 16</li>
<li>Active experts per layer: 4</li>
<li>Decoder layers: 24</li>
<li>Active parameters: 36 billion</li>
<li>Total parameters: 132 billion</li>
<li>Context length: 32k tokens</li>
</ul>
<p>DBRX excels in use cases like code generation, mathematical reasoning, and complex language understanding.</p>
<h3 id="3-deepseek-v2" class="heading-element"><span>2.3 <strong>3. Deepseek-v2</strong></span>
  <a href="#3-deepseek-v2" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><a href="https://github.com/deepseek-ai/DeepSeek-V2"target="_blank" rel="external nofollow noopener noreferrer">Deepseek-v2<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a> employs two core ideas:</p>
<ol>
<li><strong>Fine-Grained Experts</strong>: Divided into smaller segments for more focused specialization and knowledge retrieval.</li>
<li><strong>Shared Experts</strong>: Certain universally relevant experts remain constantly activated to generalize knowledge across tasks.</li>
</ol>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/12/e813bb99ca7a99faa7fff420daaf43a7.avif" alt="DeepSeekMoE" srcset="https://cdn1.zair.top/images/2024/12/e813bb99ca7a99faa7fff420daaf43a7.avif?size=small, https://cdn1.zair.top/images/2024/12/e813bb99ca7a99faa7fff420daaf43a7.avif?size=medium 1.5x, https://cdn1.zair.top/images/2024/12/e813bb99ca7a99faa7fff420daaf43a7.avif?size=large 2x" data-title="DeepSeekMoE" class="suffix-invalid suffix-invalid__small suffix-invalid__large" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p><strong>Key Features:</strong></p>
<ul>
<li>Total parameters: 236 billion</li>
<li>Active parameters: 21 billion</li>
<li>Experts per layer: 160 (activating 8)</li>
<li>Shared experts per layer: 2</li>
<li>Active experts per layer: 8</li>
<li>Decoder layers: 60</li>
<li>Context length: 128k tokens</li>
</ul>
<p>Deepseek-v2 is exceptionally skilled in conversation-based applications like chatbots, content creation, language translation, and summarization while also excelling at code generation.</p>
<h2 id="implementing-an-moe-model-in-python" class="heading-element"><span>3 Implementing an MoE Model in Python</span>
  <a href="#implementing-an-moe-model-in-python" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Now, let’s implement an MoE model using Python.</p>
<h3 id="step-1-install-required-python-libraries" class="heading-element"><span>3.1 Step 1: Install Required Python Libraries</span>
  <a href="#step-1-install-required-python-libraries" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="err">!</span><span class="n">sudo</span> <span class="n">apt</span> <span class="n">update</span>
</span></span><span class="line"><span class="cl"><span class="err">!</span><span class="n">sudo</span> <span class="n">apt</span> <span class="n">install</span> <span class="o">-</span><span class="n">y</span> <span class="n">pciutils</span>
</span></span><span class="line"><span class="cl"><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="n">langchain</span><span class="o">-</span><span class="n">ollama</span>
</span></span><span class="line"><span class="cl"><span class="err">!</span><span class="n">curl</span> <span class="o">-</span><span class="n">fsSL</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">ollama</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">install</span><span class="o">.</span><span class="n">sh</span> <span class="o">|</span> <span class="n">sh</span>
</span></span><span class="line"><span class="cl"><span class="err">!</span><span class="n">pip</span> <span class="n">install</span> <span class="n">ollama</span><span class="o">==</span><span class="mf">0.4.2</span></span></span></code></pre></td></tr></table>
</div>
</div><h3 id="step-2-enable-multi-threading" class="heading-element"><span>3.2 Step 2: Enable Multi-Threading</span>
  <a href="#step-2-enable-multi-threading" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>Launch the Ollama server using threading:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">threading</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">subprocess</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">time</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">run_ollama_serve</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">  <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">([</span><span class="s2">&#34;ollama&#34;</span><span class="p">,</span> <span class="s2">&#34;serve&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">thread</span> <span class="o">=</span> <span class="n">threading</span><span class="o">.</span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">run_ollama_serve</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">thread</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><h3 id="step-3-fetch-a-python-compatible-model-eg-dbrx" class="heading-element"><span>3.3 Step 3: Fetch a Python-Compatible Model (e.g., DBRX)</span>
  <a href="#step-3-fetch-a-python-compatible-model-eg-dbrx" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="err">!</span><span class="n">ollama</span> <span class="n">pull</span> <span class="n">dbrx</span></span></span></code></pre></td></tr></table>
</div>
</div><h3 id="step-4-query-the-model" class="heading-element"><span>3.4 Step 4: Query the Model</span>
  <a href="#step-4-query-the-model" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>Use the DBRX model for text summarization:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">langchain_core.prompts</span> <span class="kn">import</span> <span class="n">ChatPromptTemplate</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">langchain_ollama.llms</span> <span class="kn">import</span> <span class="n">OllamaLLM</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">Markdown</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">template</span> <span class="o">=</span> <span class="s2">&#34;&#34;&#34;Question: </span><span class="si">{question}</span><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2">Answer: Let&#39;s think step by step.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">prompt</span> <span class="o">=</span> <span class="n">ChatPromptTemplate</span><span class="o">.</span><span class="n">from_template</span><span class="p">(</span><span class="n">template</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">model</span> <span class="o">=</span> <span class="n">OllamaLLM</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="s2">&#34;dbrx&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">chain</span> <span class="o">=</span> <span class="n">prompt</span> <span class="o">|</span> <span class="n">model</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Input for processing</span>
</span></span><span class="line"><span class="cl"><span class="n">input_data</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;question&#34;</span><span class="p">:</span> <span class="s1">&#39;Summarize this text into one sentence: &#34;Bob is a boy. Bob has a dog. Bob takes his dog for a walk. At the park, Bob throws a stick for the dog to retrieve. The dog chases a squirrel; Bob chases his dog. Bob finds his dog, and they return home together.&#34;&#39;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Process input and display output</span>
</span></span><span class="line"><span class="cl"><span class="n">response</span> <span class="o">=</span> <span class="n">chain</span><span class="o">.</span><span class="n">invoke</span><span class="p">(</span><span class="n">input_data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">display</span><span class="p">(</span><span class="n">Markdown</span><span class="p">(</span><span class="n">response</span><span class="p">))</span></span></span></code></pre></td></tr></table>
</div>
</div><h2 id="performance-evaluation-of-moe-models" class="heading-element"><span>4 Performance Evaluation of MoE Models</span>
  <a href="#performance-evaluation-of-moe-models" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>To better understand the practical implications of MoE models, let’s assess their performance across common natural language processing (NLP) tasks such as logical reasoning, summarization, entity extraction, and mathematical reasoning.</p>
<h3 id="1-logical-reasoning-task" class="heading-element"><span>4.1 <strong>1. Logical Reasoning Task</strong></span>
  <a href="#1-logical-reasoning-task" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><strong>Input Prompt</strong>:<br>
<em>“Provide a list of 13 words, each containing exactly 9 letters.”</em></p>
<p><strong>Model Output</strong>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown"><span class="line"><span class="cl">Here are 13 words, each containing 9 letters:  
</span></span><span class="line"><span class="cl"><span class="k">1.</span> Agriculture  
</span></span><span class="line"><span class="cl"><span class="k">2.</span> Beautiful  
</span></span><span class="line"><span class="cl"><span class="k">3.</span> Chocolate  
</span></span><span class="line"><span class="cl"><span class="k">4.</span> Dangerous  
</span></span><span class="line"><span class="cl"><span class="k">5.</span> Encyclopedia  
</span></span><span class="line"><span class="cl"><span class="k">6.</span> Fireplace  
</span></span><span class="line"><span class="cl"><span class="k">7.</span> Grammarly  
</span></span><span class="line"><span class="cl"><span class="k">8.</span> Hamburger  
</span></span><span class="line"><span class="cl"><span class="k">9.</span> Important  
</span></span><span class="line"><span class="cl"><span class="k">10.</span> Juxtapose  
</span></span><span class="line"><span class="cl"><span class="k">11.</span> Kitchener  
</span></span><span class="line"><span class="cl"><span class="k">12.</span> Landscape  
</span></span><span class="line"><span class="cl">13. Necessary</span></span></code></pre></td></tr></table>
</div>
</div><p><strong>Analysis</strong>:<br>
Out of the 13 words listed by the model, only 8 words have exactly 9 letters. This reveals a partial failure in logical reasoning and precision. While MoE models excel in efficiency and general knowledge-based tasks, their ability to handle highly specific constraints like word length in this example may require fine-tuning or extensive training data aligned with the task.</p>
<h3 id="2-summarization-task" class="heading-element"><span>4.2 <strong>2. Summarization Task</strong></span>
  <a href="#2-summarization-task" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><strong>Input Prompt</strong>:<br>
<em>“Summarize the following text into one sentence: ‘Bob is a boy. Bob has a dog. Bob takes his dog for a walk. At the park, Bob throws a stick for the dog to retrieve. The dog chases a squirrel; Bob chases his dog. Bob finds his dog, and they return home together.’”</em></p>
<p><strong>Model Output</strong>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown"><span class="line"><span class="cl">Bob enjoys a lively outing with his dog at the park, playing fetch, chasing a squirrel, and returning home together.</span></span></code></pre></td></tr></table>
</div>
</div><p><strong>Analysis</strong>:<br>
The output demonstrates strong summarization capabilities. The response is concise yet captures the key elements of the input text. This showcases the model’s competence for tasks requiring high-level understanding and condensing information for clarity.</p>
<h3 id="3-entity-extraction-task" class="heading-element"><span>4.3 <strong>3. Entity Extraction Task</strong></span>
  <a href="#3-entity-extraction-task" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><strong>Input Prompt</strong>:<br>
<em>“Extract all numeric values and their corresponding units from the following text: ‘A marathon covers a distance of 42 kilometers, with over 30,000 participants.’”</em></p>
<p><strong>Model Output</strong>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown"><span class="line"><span class="cl"><span class="k">1.</span> 42 (kilometers, marathon distance)  
</span></span><span class="line"><span class="cl">2. 30,000 (number of participants)</span></span></code></pre></td></tr></table>
</div>
</div><p><strong>Analysis</strong>:<br>
The model correctly identifies both numeric entities and provides their corresponding contexts and units. This demonstrates its proficiency in extracting meaningful information in structured or semi-structured formats, making it suitable for tasks like data mining, financial analysis, or legal document review.</p>
<h3 id="4-mathematical-reasoning-task" class="heading-element"><span>4.4 <strong>4. Mathematical Reasoning Task</strong></span>
  <a href="#4-mathematical-reasoning-task" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><strong>Input Prompt</strong>:<br>
<em>“I have 2 apples. I bought 2 more. I used 2 apples to bake a pie. After eating half the pie, how many apples do I have left?”</em></p>
<p><strong>Model Output</strong>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-markdown" data-lang="markdown"><span class="line"><span class="cl"><span class="k">1.</span> Starting with 2 apples.  
</span></span><span class="line"><span class="cl"><span class="k">2.</span> Buying 2 more brings the total to 4 apples (2 + 2 = 4).  
</span></span><span class="line"><span class="cl"><span class="k">3.</span> Using 2 apples for a pie leaves 2 apples (4 - 2 = 2).  
</span></span><span class="line"><span class="cl"><span class="k">4.</span> Eating half the pie doesn&#39;t impact the number of apples left, as it&#39;s unrelated to pie contents.  
</span></span><span class="line"><span class="cl">Final count: 2 apples.</span></span></code></pre></td></tr></table>
</div>
</div><p><strong>Analysis</strong>:<br>
The reasoning is accurate, and the model successfully arrives at the correct answer. This demonstrates the model&rsquo;s ability to perform chained reasoning and disentangle task-independent details.</p>
<h2 id="final-thoughts" class="heading-element"><span>5 Final Thoughts</span>
  <a href="#final-thoughts" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Mixture of Experts (MoE) represents a major evolution in deep learning, offering an efficient and scalable alternative to traditional dense architectures. Models like Mixtral 8X7B, DBRX, and Deepseek-v2 demonstrate how sparse activation, fine-grained expert segmentation, and shared knowledge mechanisms contribute to groundbreaking advances across diverse domains like NLP, code generation, and summarization.</p>
<p>As MoE technologies continue evolving, we expect further innovations to address their memory and routing bottlenecks, opening the door to even more complex, capable AI systems.</p>
<h2 id="frequently-asked-questions-faq" class="heading-element"><span>6 Frequently Asked Questions (FAQ)</span>
  <a href="#frequently-asked-questions-faq" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p><strong>Q1: What sets MoE models apart from traditional dense models?</strong><br>
A: MoE models activate only task-relevant experts, reducing computational demands and improving efficiency without compromising performance.</p>
<p><strong>Q2: How are experts selected in MoE models?</strong><br>
A: A routing mechanism dynamically selects the most relevant experts based on the input.</p>
<p><strong>Q3: Can MoE models handle highly complex tasks like math reasoning or programming?</strong><br>
A: Yes, models like DBRX are specifically designed for complex tasks, although some challenges persist in precision-critical queries.</p>
<p><strong>Q4: What are the hardware requirements for deploying MoE models?</strong><br>
A: GPUs with ample VRAM are critical for storing inactive experts; optimizations, such as shared experts, can help mitigate memory overhead.</p>
<p><strong>Q5: Which tasks benefit the most from MoE models?</strong><br>
A: NLP, summarization, conversational AI, code generation, and entity extraction are some of the most common and effective applications of MoE models.</p>
<h2 id="further-reading" class="heading-element"><span>7 Further Reading</span>
  <a href="#further-reading" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><ol>
<li><a href="https://mistral.ai/news/mixtral-of-experts/"target="_blank" rel="external nofollow noopener noreferrer">Mixtral of Experts Documentation – Mistral AI<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://arxiv.org/abs/2401.04089"target="_blank" rel="external nofollow noopener noreferrer">DBRX Fine-Grained MoE Whitepaper<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://github.com/deepseek-ai/DeepSeek-V2"target="_blank" rel="external nofollow noopener noreferrer">DeepSeek-v2 Open Source Repository<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://huggingface.co"target="_blank" rel="external nofollow noopener noreferrer">Hugging Face – Pretrained MoE Models<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
</ol>]]></description>
    </item>
    <item>
      <title>Support Vector Machine (SVM) from a Mathematical Perspective: Solving Optimization Problems</title>
      <link>https://www.zair.top/en/post/svm/</link>
      <pubDate>Wed, 27 Nov 2024 16:53:06 +0800</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/svm/</guid>
      <category domain="https://www.zair.top/en/categories/data-science-machine-learning/">Data Science &amp; Machine Learning</category>
      <description><![CDATA[<blockquote>
<p>Support Vector Machine (SVM) is a classic algorithm in machine learning. This article focuses on the formula derivation in SVM, such as detailed reasoning of the margin distance formula, and the formulation of the primal and dual problems. It delves into optimization problems, including constructing the Lagrangian function to handle constrained optimization problems and using KKT conditions to find optimal solutions. It also covers the characteristics of polynomial and Gaussian kernel functions.</p>
</blockquote>
<h2 id="margin-distance-derivation" class="heading-element"><span>Margin Distance Derivation</span>
  <a href="#margin-distance-derivation" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In Support Vector Machine (SVM), the equations for the positive and negative hyperplanes are respectively:
$$
\vec{w} \cdot \vec{x} + b = 1 \quad \text{(Positive Hyperplane)}
$$
$$
\vec{w} \cdot \vec{x} + b = -1 \quad \text{(Negative Hyperplane)}
$$
where $\vec{w}=(w_1, w_2)$ is the weight vector, $b$ is the bias term, and $\vec{x}=(x_1, x_2)$ is the data point.</p>
<p>Assume $\vec{x_m}$ is a point on the positive hyperplane, and $\vec{x_n}$ is a point on the negative hyperplane, then:
$$
w_1 x_{1m} + w_2 x_{2m} + b = 1 \quad \text{(1)}
$$
$$
w_1 x_{1n} + w_2 x_{2n} + b = -1 \quad \text{(2)}
$$</p>
<p>Subtracting equation (2) from equation (1), we get:
$$
w_1 (x_{1m} - x_{1n}) + w_2 (x_{2m} - x_{2n}) = 2
$$
In vector form:
$$
\vec{w} \cdot (\vec{x_m} - \vec{x_n}) = 2 \quad \text{(3)}
$$
Consider two points $\vec{x_0}$ and $\vec{x_p}$ on the decision hyperplane, which satisfy the decision hyperplane equation $\vec{w} \cdot \vec{x} + b = 0$, i.e.:
$$
w_1 x_{10} + w_2 x_{20} + b = 0
$$
$$
w_1 x_{1p} + w_2 x_{2p} + b = 0
$$
Subtracting these two equations gives:
$$
w_1 (x_{10} - x_{1p}) + w_2 (x_{20} - x_{2p}) = 0
$$
In vector form:
$$
\vec{w} \cdot (\vec{x_0} - \vec{x_p}) = 0 \quad \text{(4)}
$$
Equation (4) indicates that $\vec{w}$ is perpendicular to the vector difference between any two points on the decision hyperplane.</p>
<p>From equations (3) and (4), we know that the dot product of $\vec{w}$ and $(\vec{x_m} - \vec{x_n})$ is 2. According to the definition of vector dot product $\vec{a} \cdot \vec{b}=|\vec{a}| \cdot |\vec{b}| \cdot \cos \theta$, where $\theta$ is the angle between $\vec{w}$ and $(\vec{x_m} - \vec{x_n})$, we have:
$$
|\vec{x_m} - \vec{x_n}| \cdot \cos \theta \cdot |\vec{w}| = 2
$$
Let $L = |\vec{x_m} - \vec{x_n}| \cdot \cos \theta$, then:
$$
L \cdot |\vec{w}| = 2
$$
Solving for $L$ gives:
$$
L=\frac{2}{|\vec{w}|}
$$</p>
<p>Here, $L$ is the margin distance of the SVM.</p>
<p>In deriving the margin distance, we utilized the geometric meaning of the vector dot product, i.e., $\vec{a} \cdot \vec{b}=|\vec{a}| \cdot |\vec{b}| \cdot \cos \theta$, where $\theta$ is the angle between the two vectors. Through this relationship, we transformed the dot product into a relationship involving vector magnitudes and angles, thus deriving the expression for the margin distance.</p>
<h2 id="dual-equivalence-proof" class="heading-element"><span>Dual Equivalence Proof</span>
  <a href="#dual-equivalence-proof" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In linear Support Vector Machine (SVM), the primal problem is to find the weight vector $w$ and bias $b$ that minimize the objective function:</p>
<p>$$
\min_w f(w) = \frac{1}{2} |w|^2
$$</p>
<p>Here, $|w|^2$ represents the Euclidean norm squared of the vector $w$, i.e., the $L_2$ norm. The goal is to minimize the width of the decision boundary to achieve better generalization ability. This problem is subject to the following constraints:</p>
<p>$$
y_j (w^T x_j + b) - 1 \geq 0
$$</p>
<p>Here, $x_j$ is the $j$-th training sample, and $y_j$ is the corresponding label, taking values of +1 or -1. This ensures that all data points are correctly classified and are at least one unit distance from the decision boundary.</p>
<p>To handle this constrained optimization problem, we construct the Lagrangian function:</p>
<p>$$
L(w, b, \alpha) = f(w) - \sum_{j = 1}^n \alpha_j g_j(w, b)
$$</p>
<p>Here, $\alpha_j \geq 0$ are the Lagrange multipliers used to introduce the constraint conditions of the primal problem $g_j(w, b) = y_j (w^T x_j + b) - 1 \geq 0$.</p>
<p>Next, we define the dual function $q(\alpha)$ as:</p>
<p>$$
q(\alpha) = \min_{w, b} L(w, b, \alpha) = \min_{w, b} \left( f(w) - \sum_{j = 1}^n \alpha_j g_j(w, b) \right)
$$</p>
<p>Since $\alpha_j \geq 0$ and $g_j(w^{*}, b^{*}) \geq 0$, we can derive:</p>
<p>$$
q(\alpha) = \min_{w, b} \left( f(w) - \sum_{j = 1}^n \alpha_j g_j(w, b) \right) \leq f(w^*) - \sum_{j = 1}^n \alpha_j g_j(w^*, b^*) \leq f(w^*) \leq f(w)
$$</p>
<p>This means the dual function provides a lower bound for the primal problem. Next, we need to find an $\alpha^*$ such that:</p>
<p>$$
q(\alpha) \leq q(\alpha^*) \leq f(w^*) \leq f(w)
$$</p>
<p>The primal and dual problems of SVM can be expressed as:</p>
<p>$$
\max_{\alpha} q(\alpha) = \max_{\alpha} \min_{w, b} L(w, b, \alpha)
$$</p>
<p>With the constraint: $ \alpha_i \geq 0 $</p>
<p>And when weak duality holds, we have $q(\alpha^*) \leq f(w^*)$; when strong duality holds, i.e., Slater&rsquo;s condition is satisfied, we have $q(\alpha^*) = f(w^*)$. Slater&rsquo;s condition requires the existence of a feasible solution such that all inequality constraints are strictly satisfied, and linear SVM is linearly separable, which automatically satisfies Slater&rsquo;s condition.</p>
<p>Thus, we have:</p>
<p>$$
f(w) \geq q(\alpha^*) = f(w^*) \geq q(\alpha_i)
$$</p>
<p>From the above equation, we can deduce:</p>
<p>$$
q(\alpha^*) \geq q(\alpha_i)
$$
$$
f(w^*) \leq f(w)
$$</p>
<p>$f(w)$ finds the minimum value (primal problem), $q(\alpha)$ finds the maximum value (dual problem), and the optimal solutions of the primal and dual problems are equal, i.e.:</p>
<p>$ w^*, b^* $ are the solutions to the primal problem, $\alpha^*$ is the solution to the dual problem, and $f(w^*) = q(\alpha^*)$.</p>
<p>We can see that in linear SVM, when specific conditions (Slater&rsquo;s condition) are met, the solutions to the primal and dual problems are consistent. This provides an effective way to solve complex optimization problems, especially <strong>when the primal problem is difficult to solve directly, the dual problem can be solved indirectly</strong>.</p>
<h3 id="simple-example" class="heading-element"><span>Simple Example</span>
  <a href="#simple-example" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>To more intuitively understand the equivalence of the solutions to the primal and dual problems mentioned above, consider a simple optimization problem defined as follows:</p>
<p>The primal problem is:
$$
\min_x f(x) = x^2
$$
With the constraint:
$$
x - 1 \geq 0
$$</p>
<p>The goal of this problem is to minimize the function $f(x) = x^2$, while $x$ needs to satisfy $x \geq 1$. Intuitively, we know that when $x = 1$, $f(x) = 1$, which is the minimum value under the given constraint.</p>
<p>To verify duality, we construct the Lagrangian function:</p>
<p>$$
q(\alpha) = \min_x L(x, \alpha) = \min_x (x^2 - \alpha(x - 1))
$$</p>
<p>Here, $\alpha \geq 0$ is the Lagrange multiplier used to introduce the constraint condition $x - 1 \geq 0$ from the primal problem. By constructing the Lagrangian function, we convert the constrained optimization problem into an unconstrained problem.</p>
<p>Next, we take the partial derivative of $L(x, \alpha)$ with respect to $x$ and set it to zero:</p>
<p>$$
\frac{\partial L}{\partial x} = 0
2x - \alpha = 0
$$</p>
<p>Solving for $x$ gives:</p>
<p>$$
x = \frac{\alpha}{2}
$$</p>
<p>Substituting $x = \frac{\alpha}{2}$ into $q(\alpha)$:</p>
<p>$$
q(\alpha) = - \frac{\alpha^2}{4} + \alpha
$$</p>
<p>Now we have obtained the form of the dual function $q(\alpha)$. Next, we need to solve for the maximum value of the dual problem $\max_{\alpha} q(\alpha) $</p>
<p>To do this, we take the derivative of $q(\alpha)$ with respect to $\alpha$ and set it to zero:</p>
<p>$$
\frac{dq}{d\alpha} = - \frac{\alpha}{2} + 1 = 0
$$</p>
<p>Solving for $\alpha$ gives $$ \alpha = 2 $$</p>
<p>Substituting $\alpha = 2$ into $x = \frac{\alpha}{2}$, we get: $$ x = 1 $$</p>
<p>At this point, substituting $\alpha = 2$ into $q(\alpha)$, we calculate:</p>
<p>$$
q(\alpha) = - \frac{2^2}{4} + 2 = 1
$$</p>
<p>Through this simple example, we can see that the solution to the primal problem $x = 1$, $f(x) = 1$, is equivalent to the solution to the dual problem $\alpha = 2$, $q(\alpha) = 1$. This verifies that under certain conditions, the solutions to the dual problem and the primal problem are consistent.</p>
<p>By applying dual theory, we not only found the solution to the primal problem but also obtained the same result by solving the dual problem, thus verifying the equivalence of the solutions to the dual problem.</p>
<h2 id="solving-with-kkt-conditions" class="heading-element"><span>Solving with KKT Conditions</span>
  <a href="#solving-with-kkt-conditions" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><h3 id="svm-satisfying-kkt-conditions" class="heading-element"><span>SVM Satisfying KKT Conditions</span>
  <a href="#svm-satisfying-kkt-conditions" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>The original optimization problem of SVM is a convex optimization problem. The objective function of SVM $\frac{1}{2}|w|^2$ is a quadratic function, which is a <strong>convex function</strong> with respect to $w$. At the same time, the constraint condition $y_i(w \cdot x_i + b) \geq 1$ is linear (<strong>affine constraint</strong>), and therefore also convex. In convex optimization problems, a local optimal solution is a global optimal solution, and the KKT conditions are necessary and sufficient conditions. This means that if a point satisfies the KKT conditions, it is a global optimal solution.</p>
<p>The objective function $\frac{1}{2}|w|^2$ is continuous and differentiable, and the constraint condition $y_i(w \cdot x_i + b) \geq 1$ is also continuous and differentiable. This smoothness ensures the existence and uniqueness of gradients, allowing the gradient conditions in the KKT conditions (i.e., taking partial derivatives with respect to $w$ and $b$ and setting them to zero) to be effectively applied.</p>
<p>In convex optimization problems, the KKT conditions are not only necessary conditions but also sufficient conditions. That is, if a point satisfies the KKT conditions, it must be a global optimal solution. For SVM, by solving the KKT conditions, we can find the optimal $w^*$ and $b^*$, thereby determining the best separating hyperplane.</p>
<h3 id="solving-linear-support-vector-machine-using-kkt-conditions" class="heading-element"><span>Solving Linear Support Vector Machine Using KKT Conditions</span>
  <a href="#solving-linear-support-vector-machine-using-kkt-conditions" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>The original SVM optimization problem is to minimize $\frac{1}{2}|w|^{2}$ while satisfying the constraint $y_{i}(w\cdot x_{i}+b)\geqslant1$, where $i = 1,2,\cdots,N$.</p>
<p>First, construct the Lagrangian function $L(w,b,\alpha)=\frac{1}{2}|w|^{2}-\sum_{i = 1}^{N}\alpha_{i}(y_{i}(w\cdot x_{i}+b)-1)$, where $\alpha_{i}\geqslant0$ are the Lagrange multipliers. According to the KKT conditions, we have:</p>
<p>$$
\nabla_{w}L(w^*,b^*,\alpha^*) = w^*-\sum_{i = 1}^{N}\alpha_{i}^*y_{i}x_{i}=0
$$</p>
<p>$$
\nabla_{b}L(w^*,b^*,\alpha^*)=-\sum_{i = 1}^{N}\alpha_{i}^*y_{i}=0
$$</p>
<p>$$
\alpha_{i}^*(y_{i}(w^*\cdot x_{i}+b^*)-1)=0
$$</p>
<p>$$
y_{i}(w^*\cdot x_{i}+b^*)-1\geqslant0
$$</p>
<p>$$
\alpha_{i}^*\geqslant0
$$</p>
<p>These conditions apply to all $i = 1,2,\cdots,N$.</p>
<p>From $\nabla_{w}L(w^*,b^*,\alpha^*) = w^*-\sum_{i = 1}^{N}\alpha_{i}^*y_{i}x_{i}=0$, we can derive</p>
<p>$$
w^*=\sum_{i = 1}^{N}\alpha_{i}^*y_{i}x_{i} \quad \text{(5)}
$$
Since at least one $\alpha_{j}^*&gt;0$ exists (assuming $\alpha_{i}^*=0$ would lead to a contradiction given the solution from $\nabla_{w}L(w^*,b^*,\alpha^*) = w^*-\sum_{i = 1}^{N}\alpha_{i}^*y_{i}x_{i}=0$).</p>
<p>For solving $b^*$, by substituting $w^*=\sum_{i = 1}^{N}\alpha_{i}^*y_{i}x_{i}$ into $y_{j}(w^*\cdot x_{j}+b^*)-1 = 0$ (considering the case where $\alpha_{j}^*&gt;0$ exists), and noting that $y_{j}^{2}=1$, we obtain:</p>
<p>$$
b^*=y_{j}-\sum_{i = 1}^{N}\alpha_{i}^*y_{i}(x_{i}\cdot x_{j}) \quad \text{(6)}
$$</p>
<p>Based on the above theory, the separating hyperplane can be expressed as:</p>
<p>$$
\sum_{i = 1}^{N}\alpha_{i}^*y_{i}(x\cdot x_{i})+b^*=0
$$</p>
<p>Thus, the classification decision function can be written as:</p>
<p>$$
f(x)=\text{sign}(\sum_{i = 1}^{N}\alpha_{i}^*y_{i}(x\cdot x_{i})+b^*)
$$</p>
<p>In SVM, the complementary slackness condition $\alpha_i (y_i(w \cdot x_i + b) - 1) = 0$ indicates that if a sample point $x_i$ is not a support vector (i.e., $y_i(w \cdot x_i + b) &gt; 1$), then the corresponding Lagrange multiplier $\alpha_i$ must be zero. Conversely, if a sample point is a support vector (i.e., $y_i(w \cdot x_i + b) = 1$), then the corresponding $\alpha_i$ can be non-zero. This condition ensures that only support vectors contribute to the solution of the optimization problem, simplifying the problem-solving process.</p>
<h2 id="polynomial-and-gaussian-kernel-functions" class="heading-element"><span>Polynomial and Gaussian Kernel Functions</span>
  <a href="#polynomial-and-gaussian-kernel-functions" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>If the existing problem is not linearly separable, we can map the existing data to a higher-dimensional space, making it a linearly separable problem in that space. However, directly performing calculations in a high-dimensional feature space can be very complex. From equations (5) and (6), we know that we do not need to actually map the data to a high-dimensional space; we only need to know the inner product between data points. The role of the kernel function is to avoid explicitly performing high-dimensional feature mapping by indirectly calculating the inner product in the high-dimensional feature space through the kernel function value in the original feature space.</p>
<p>The Gaussian kernel function is a common kernel function, with the form:
$$
K(x, y) = \exp\left(-\gamma |x - y|^2\right)
$$</p>
<p>where $\gamma$ is a positive parameter that controls the width of the kernel function.</p>
<p>We can perform a Taylor expansion on the exponential function:</p>
<p>$$
\exp(z) = \sum_{k=0}^{\infty} \frac{z^k}{k!}
$$</p>
<p>Substituting $ z = -\gamma |x - y|^2 $ into the above formula, we get:</p>
<p>$$
K(x, y) = \exp\left(-\gamma |x - y|^2\right) = \sum_{k=0}^{\infty} \frac{(-\gamma |x - y|^2)^k}{k!}
$$</p>
<p>The polynomial kernel function has the form:</p>
<p>$$
K_{\text{poly}}(x, y) = (x \cdot y + c)^d
$$</p>
<p>where $ c $ is a constant term, and $ d $ is the degree of the polynomial.</p>
<p>$|x - y|^2$ can be expanded as:</p>
<p>$$
|x - y|^2 = (x - y) \cdot (x - y) = x \cdot x + y \cdot y - 2 x \cdot y
$$</p>
<p>Substituting this expression into the Taylor expansion of the Gaussian kernel function:</p>
<p>$$
K(x, y) = \sum_{k=0}^{\infty} \frac{(-\gamma (x \cdot x + y \cdot y - 2 x \cdot y))^k}{k!}
$$</p>
<p>We can see that each term $ \frac{(-\gamma (x \cdot x + y \cdot y - 2 x \cdot y))^k}{k!} $ is essentially a polynomial term, meaning each term can be expressed as a combination of different powers of $ x $ and $ y $.</p>
<p>If we closely observe each term, we can find that the Gaussian kernel function is actually obtained by harmonizing different orders of polynomial kernel functions. Each term $ \frac{(-\gamma (x \cdot x + y \cdot y - 2 x \cdot y))^k}{k!} $ can be regarded as a weighted form of a $ k $-order polynomial kernel function.</p>
<p>For example, when $ k = 1 $:</p>
<p>$$
\frac{(-\gamma (x \cdot x + y \cdot y - 2 x \cdot y))^1}{1!} = -\gamma (x \cdot x + y \cdot y - 2 x \cdot y)
$$</p>
<p>When $ k = 2 $:</p>
<p>$$
\frac{(-\gamma (x \cdot x + y \cdot y - 2 x \cdot y))^2}{2!} = \frac{\gamma^2 (x \cdot x + y \cdot y - 2 x \cdot y)^2}{2}
$$</p>
<p>These terms are polynomial forms of $ x $ and $ y $, and are weighted by the factorial $ k! $.</p>
<p>The Gaussian kernel function can be viewed as being harmonized through different orders of polynomial kernel functions in infinite dimensions. This harmonization allows the Gaussian kernel function to capture more complex nonlinear relationships in high-dimensional feature spaces. Therefore, in many nonlinear task scenarios, the Gaussian kernel function is a good choice.</p>]]></description>
    </item>
    <item>
      <title>VLDB 2024 Guangzhou Trip</title>
      <link>https://www.zair.top/en/post/trip-to-guangzhou-during-vldb2024/</link>
      <pubDate>Mon, 23 Sep 2024 11:02:06 +0800</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/trip-to-guangzhou-during-vldb2024/</guid>
      <category domain="https://www.zair.top/en/categories/experience/">Experience</category>
      <description><![CDATA[<blockquote>
<p>VLDB 2024 is a top international conference in the field of data management and databases, covering areas such as data management, database architecture, graph data management, data privacy and security, data mining, machine learning, artificial intelligence, and database systems research. This year&rsquo;s conference was held from August 26 to 30 in Guangzhou, China, bringing together researchers and enterprises from around the world. The conference featured a rich content with over 250 research presentations, 11 keynote speeches, multiple tutorials, and workshops, providing attendees with extensive learning and networking opportunities.</p>
</blockquote>
<p>Recently, I went to Guangzhou to attend VLDB 2024 and experienced the full academic conference process. I encountered many interesting things during the trip and thought of writing a blog to share.
<div class="details admonition note open">
  <div class="details-summary admonition-title"><i class="icon fa-fw fa-solid fa-pencil-alt" aria-hidden="true"></i>Note<i class="details-icon fa-solid fa-angle-right fa-fw" aria-hidden="true"></i></div>
  <div class="details-content">
    <div class="admonition-content">This article is a <strong>tourism-oriented share</strong>, including insights from Guangzhou and an introduction to the academic conference process, without specific report content and direction, feel free to enjoy.</div>
  </div>
</div></p>
<p>The trip lasted 6 days, with the itinerary as follows. The first 5 days were spent near the Pazhou Island Canton Fair Complex, mainly attending the conference and the Pearl River night cruise; the last day was a special forces-style tour, starting from Shamian Island on the west side, passing through the Thirteen Hongs Museum, walking to the Sacred Heart Cathedral, and the east side was the Huangpu Military Academy Memorial.
<div class="mapbox" data-options="{&#34;darkStyle&#34;:&#34;mapbox://styles/mapbox/dark-v10?optimize=true&#34;,&#34;fullscreen&#34;:true,&#34;geolocate&#34;:true,&#34;lat&#34;:23.10001,&#34;lightStyle&#34;:&#34;mapbox://styles/mapbox/streets-zh-v1&#34;,&#34;lng&#34;:113.35466,&#34;marked&#34;:true,&#34;markers&#34;:&#34;[{\&#34;lng\&#34;: 113.23979, \&#34;lat\&#34;: 23.10946, \&#34;description\&#34;: \&#34;Shamian Island\&#34;},{\&#34;lng\&#34;: 113.24564, \&#34;lat\&#34;: 23.11157, \&#34;description\&#34;: \&#34;Thirteen Hongs Museum\&#34;},{\&#34;lng\&#34;: 113.254735,\&#34;lat\&#34;: 23.11735, \&#34;description\&#34;: \&#34;Sacred Heart Cathedral\&#34;},{\&#34;lng\&#34;: 113.317919,\&#34;lat\&#34;: 23.111565, \&#34;description\&#34;: \&#34;Pearl River Night Cruise\&#34;},{\&#34;lng\&#34;: 113.419044,\&#34;lat\&#34;: 23.08878, \&#34;description\&#34;: \&#34;Huangpu Military Academy\&#34;}]&#34;,&#34;navigation&#34;:true,&#34;scale&#34;:true,&#34;zoom&#34;:11}" style="width: 100%; height: 20rem;"></div></p>
<h2 id="day-0-a-little-guangzhou-shock-for-the-socialites" class="heading-element"><span>Day 0: A Little Guangzhou Shock for the Socialites</span>
  <a href="#day-0-a-little-guangzhou-shock-for-the-socialites" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>At 3 PM, I took the subway from Guangzhou South Station to Pazhou Station. Before even exiting the station, I realized something was off as the subway was packed with cosplayers. Coincidentally, the mischievous map chose the exit at the Poly World Trade Center, which turned out to be the nest of the second-dimensional creatures: from the subway exit to the underground of the venue, from the Poly World Trade Center to the Canton Fair Complex, Pazhou Island was crowded with them! Not only were there many participants, but the variety of cosplay characters was also rich, ranging from domestic games to Japanese anime, from Laoda to Trump assassination&hellip; The parking lot was also filled with itasha cars, almost none of which were electric cars, probably because the owners were also deep into the second dimension.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/c156b0a801e9663247ddd67b9a9ebc95.webp" alt="Itasha cars in the parking lot" srcset="https://cdn1.zair.top/images/2024/09/c156b0a801e9663247ddd67b9a9ebc95.webp?size=small, https://cdn1.zair.top/images/2024/09/c156b0a801e9663247ddd67b9a9ebc95.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/c156b0a801e9663247ddd67b9a9ebc95.webp?size=large 2x" data-title="Itasha cars in the parking lot" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>After finally breaking through the crowd to the hotel, I found out at 9 PM when I went to eat that the comic convention hadn&rsquo;t ended yet, and they even set up a night market along the riverbank—it&rsquo;s Monday the next day! The second-dimensional atmosphere in Guangzhou is just too strong.</p>
<h2 id="day-1-academic-locusts-and-banquet-blunders" class="heading-element"><span>Day 1: Academic Locusts and Banquet Blunders</span>
  <a href="#day-1-academic-locusts-and-banquet-blunders" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>VLDB2024 was hosted by the Hong Kong University of Science and Technology (Guangzhou) at the Langham Hotel in Guangzhou. Registration on the first day started at 8 AM, and being too excited for my first academic conference, I arrived early. During registration, they handed out badges, conference manuals, schedules, meal tickets, boat tickets, a box of souvenirs, and a bunch of sponsor advertisements. The souvenirs included a hardcover notebook, a ballpoint pen, and an umbrella, available in black and blue themes.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/7bb7269f28c393c00fe6ec24e2296522.webp" alt="Registration site" srcset="https://cdn1.zair.top/images/2024/09/7bb7269f28c393c00fe6ec24e2296522.webp?size=small, https://cdn1.zair.top/images/2024/09/7bb7269f28c393c00fe6ec24e2296522.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/7bb7269f28c393c00fe6ec24e2296522.webp?size=large 2x" data-title="Registration site" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>The first day&rsquo;s conference content was entirely organized in the form of workshops. There were workshops on different themes running simultaneously, so it was necessary to choose the themes of interest in advance. I chose themes related to LLM, Knowledge Graph, and Vector Database. Understanding the presentation content was not difficult, but during the discussion session, a few guys got really into it, and their speech speed gradually increased, making it hard to follow. My English listening and speaking skills still need practice.</p>
<div class="details admonition info">
  <div class="details-summary admonition-title"><i class="icon fa-fw fa-solid fa-circle-info" aria-hidden="true"></i>What are the sessions at an academic conference?<i class="details-icon fa-solid fa-angle-right fa-fw" aria-hidden="true"></i></div>
  <div class="details-content">
    <div class="admonition-content"><h3 id="1-opening-ceremony" class="heading-element"><span>1. <strong>Opening Ceremony</strong></span>
  <a href="#1-opening-ceremony" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>The formal start of the conference.</li>
<li>Speeches by the organizers or heads of the hosting institutions, introducing the conference theme, agenda, and important attendees.</li>
<li>Sometimes includes special welcome ceremonies or cultural displays.</li>
</ul>
<h3 id="2-keynote-speech" class="heading-element"><span>2. <strong>Keynote Speech</strong></span>
  <a href="#2-keynote-speech" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Delivered by a renowned scholar or expert in a particular field, usually centered around the conference theme.</li>
<li>Keynote speeches are often inspiring, providing forward-looking or summarizing perspectives for attendees.</li>
<li>It is a highly anticipated session, generally lasting 45 minutes to an hour.</li>
</ul>
<h3 id="3-parallel-sessions" class="heading-element"><span>3. <strong>Parallel Sessions</strong></span>
  <a href="#3-parallel-sessions" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Attendees are grouped by research direction or theme, held simultaneously in multiple venues.</li>
<li>Each group has multiple presentations where researchers showcase their research findings.</li>
<li>Each presentation is followed by a brief Q&amp;A session for audience questions and discussions.</li>
</ul>
<h3 id="4-poster-session" class="heading-element"><span>4. <strong>Poster Session</strong></span>
  <a href="#4-poster-session" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Researchers display their research findings through posters, usually held in the exhibition area or during breaks.</li>
<li>Attendees can freely visit various poster booths during designated times, interact with researchers, and have discussions.</li>
<li>Poster sessions are suitable for showcasing novel or immature research projects.</li>
</ul>
<h3 id="5-panel-discussion" class="heading-element"><span>5. <strong>Panel Discussion</strong></span>
  <a href="#5-panel-discussion" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>A discussion group composed of multiple experts, engaging in dialogue on a specific topic.</li>
<li>Experts first present their views, followed by guided interaction and discussion by a moderator, and finally open to audience questions.</li>
<li>Panel discussions help explore research issues or hot topics in an academic field from multiple perspectives.</li>
</ul>
<h3 id="6-roundtable-discussion" class="heading-element"><span>6. <strong>Roundtable Discussion</strong></span>
  <a href="#6-roundtable-discussion" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Participants engage in open discussion around a specific theme in a roundtable format.</li>
<li>Roundtable discussions emphasize interactivity, allowing participants to speak, ask questions, and share views at any time.</li>
</ul>
<h3 id="7-workshop" class="heading-element"><span>7. <strong>Workshop</strong></span>
  <a href="#7-workshop" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Focuses on hands-on and interactive learning, where participants learn new skills, methods, or tools at the conference.</li>
<li>Workshops usually require pre-registration and sometimes require participants to bring their own data or materials.</li>
</ul>
<h3 id="8-networking-sessions" class="heading-element"><span>8. <strong>Networking Sessions</strong></span>
  <a href="#8-networking-sessions" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Social sessions within the conference aimed at providing attendees with a relaxed environment for informal exchanges and building collaborations.</li>
<li>Networking activities may include dinners, tea breaks, lunches, and receptions.</li>
</ul>
<h3 id="9-closing-ceremony" class="heading-element"><span>9. <strong>Closing Ceremony</strong></span>
  <a href="#9-closing-ceremony" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>The summary and closing session of the conference.</li>
<li>Organizers review the main content and outcomes of the conference, possibly recognizing outstanding presentations, papers, or research achievements.</li>
<li>Usually announces the time and location of the next conference.</li>
</ul>
<h3 id="10-qa-sessions" class="heading-element"><span>10. <strong>Q&amp;A Sessions</strong></span>
  <a href="#10-qa-sessions" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Follow each presentation or panel discussion, allowing the audience to ask questions.</li>
<li>Researchers or speakers provide detailed answers to audience questions, promoting two-way communication.</li>
</ul>
<h3 id="11-breakout-sessions" class="heading-element"><span>11. <strong>Breakout Sessions</strong></span>
  <a href="#11-breakout-sessions" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>In large conferences, attendees are often divided into several groups to discuss specific subtopics or research directions.</li>
<li>Each group focuses on a specific theme, and may report discussion outcomes or suggestions.</li>
</ul>
<h3 id="12-exhibition-booths" class="heading-element"><span>12. <strong>Exhibition Booths</strong></span>
  <a href="#12-exhibition-booths" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>In some academic conferences, especially in technical or applied fields, there are usually exhibition areas.</li>
<li>Related institutions, publishers, and companies showcase their products, technologies, services, or publications, and attendees can freely visit.</li>
</ul>
<h3 id="13-awards-and-recognition" class="heading-element"><span>13. <strong>Awards and Recognition</strong></span>
  <a href="#13-awards-and-recognition" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Some conferences have award sessions for papers, presentations, or research achievements.</li>
<li>Outstanding researchers, papers, poster displays, etc., may be recognized during the closing ceremony or a dedicated award ceremony.</li>
</ul>
<h3 id="14-coffee-breaks-and-lunches" class="heading-element"><span>14. <strong>Coffee Breaks and Lunches</strong></span>
  <a href="#14-coffee-breaks-and-lunches" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Break times during the conference, offering refreshments or lunch.</li>
<li>Coffee breaks and lunch times are also important opportunities for attendees to engage in informal exchanges and socializing.</li>
</ul>
</div>
  </div>
</div>
<p>Tea breaks were at 10 AM and 3 PM, and the 8 AM breakfast was also in the format of a tea break. For the big shots, tea breaks are a good time for socializing, discussing research content, or exchanging contact information. But for someone like me, an academic locust, tea breaks are just tea breaks. Although the food at each tea break varied a bit, it generally consisted of cookies, bread, cakes, fruits, and drinks. Among them, the mousse cake and Thai fish cake with sweet chili sauce were my favorites.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/2f666a1d1847796883f504015934c0fb.webp" alt="Tea break" srcset="https://cdn1.zair.top/images/2024/09/2f666a1d1847796883f504015934c0fb.webp?size=small, https://cdn1.zair.top/images/2024/09/2f666a1d1847796883f504015934c0fb.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/2f666a1d1847796883f504015934c0fb.webp?size=large 2x" data-title="Tea break" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>The evening welcome banquet was also very sumptuous, served buffet-style. However, there were so many people in line that I didn&rsquo;t pay attention to the dishes beforehand and ended up scooping two spoons of rice, missing out on many good dishes. The dishes included about 30% Chinese cuisine, with the rest being Japanese (sushi and sashimi), curry, vegetable salad, Western desserts, fruits, and drinks (red wine and juice), catering to different dietary preferences. However, the Chinese dishes were quite average, leading me to end up with carbs on carbs—curry rice and small cakes. Alas, I must say goodbye to the small cakes.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/7b8be626151b7f432153424206d219e1.webp" alt="Banquet scene" srcset="https://cdn1.zair.top/images/2024/09/7b8be626151b7f432153424206d219e1.webp?size=small, https://cdn1.zair.top/images/2024/09/7b8be626151b7f432153424206d219e1.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/7b8be626151b7f432153424206d219e1.webp?size=large 2x" data-title="Banquet scene" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>During the banquet, there was a mix-up. I attended the conference with another classmate, and neither of us had met our advisor in person. After selecting our dishes and randomly finding a table to sit at, we started listening to what others at the table were discussing. Then we had the following conversation: &ldquo;Don&rsquo;t you think the person across looks like our advisor?&rdquo; &ldquo;No way, such a coincidence?&rdquo; &ldquo;The research direction matches, they just said&hellip;&rdquo; &ldquo;The age doesn&rsquo;t match&hellip;&rdquo; &ldquo;The number of people matches&hellip;&rdquo; &ldquo;The height matches too&hellip;&rdquo; &ldquo;It&rsquo;s too similar&hellip;&rdquo; &ldquo;It&rsquo;s exactly like&hellip;&rdquo; &ldquo;What should we do, should we acknowledge now?&rdquo; &ldquo;Let&rsquo;s eat first.&rdquo; So we nervously finished our meal, went up to acknowledge, and found out they were not our advisor. They were PhD students from Tianjin University and East China Normal University, with similar research directions. After introducing ourselves, we became friends. Later, we found out their paper was nominated for Best Research Paper, and we were really envious.</p>
<h2 id="day-2-lion-dance-opening" class="heading-element"><span>Day 2: Lion Dance, Opening!</span>
  <a href="#day-2-lion-dance-opening" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>The most impressive part of the opening ceremony was the Guangzhou Sandpit Lion Dance performance. At the beginning of the performance, the dancers manipulated the lion to make various realistic movements, imitating the lion&rsquo;s living habits, such as shaking its head, scratching, rolling, etc. The most distinctive feature was the lion&rsquo;s eyes that could move (this was my basis for judging it as a sandpit lion). Then the lion started performing high-difficulty jumps and stepping on piles on stage, showcasing the dancers&rsquo; precise skills and strong physique. Finally, the leaders went on stage to dot the eyes, and the lion held out congratulatory couplets. The entire performance, accompanied by the rhythm of gongs and drums, made the lion sometimes mighty, sometimes agile, with both a sense of power and fun.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/948640fe6fc14f716c337844f4759641.webp" alt="Lion dance performance at the opening ceremony" srcset="https://cdn1.zair.top/images/2024/09/948640fe6fc14f716c337844f4759641.webp?size=small, https://cdn1.zair.top/images/2024/09/948640fe6fc14f716c337844f4759641.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/948640fe6fc14f716c337844f4759641.webp?size=large 2x" data-title="Lion dance performance at the opening ceremony" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>This was my first time seeing a real lion dance performance, and it was the most distinctive Guangdong lion dance. Although many games have scenes based on the Guangdong lion dance, especially the iconic pile-stepping action, these scenes often focus on rendering the performance setting, lacking the fun brought by the lion&rsquo;s imitating actions, and the sense of power is not as strong as the live performance.</p>
<p>After the lion dance performance, the conference chair, Professor Chen Lei from HKUST, introduced some general information about the conference: this conference had the highest number of attendees in VLDB&rsquo;s history, with 60% of attendees from mainland China, it was the richest conference (half a million dollars), AI-related papers were the most, and papers on database underlying design were the least.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/7863a1513eefcffea29210ab0fa95f58.webp" alt="Parallel sessions" srcset="https://cdn1.zair.top/images/2024/09/7863a1513eefcffea29210ab0fa95f58.webp?size=small, https://cdn1.zair.top/images/2024/09/7863a1513eefcffea29210ab0fa95f58.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/7863a1513eefcffea29210ab0fa95f58.webp?size=large 2x" data-title="Parallel sessions" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>After the opening ceremony, a full day of parallel sessions officially began. Parallel sessions also require selecting themes of interest in advance, and it&rsquo;s best to read the papers of interest beforehand and even prepare questions to ask. Dinner was as sumptuous as the previous day, with Italian vegetable soup and lasagna being very good! (The only) This day also had skewers, with two types: chicken wings and gluten. After dinner, I walked around and found that at any table with Chinese people, there was no one who didn&rsquo;t take skewers, and some even had plain rice with skewers, indicating that everyone still had a Chinese palate. <del>Code farmer barbecue, great potential.</del></p>
<h2 id="day-3-from-cantonese-cuisine-pinnacle-to-deck" class="heading-element"><span>Day 3: From Cantonese Cuisine Pinnacle to Deck</span>
  <a href="#day-3-from-cantonese-cuisine-pinnacle-to-deck" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>The morning started with a keynote from Google, focusing on the impact of large models on database research. This was followed by parallel sessions similar to the previous day.</p>
<h3 id="exquisite-cantonese-cuisine" class="heading-element"><span>Exquisite Cantonese Cuisine</span>
  <a href="#exquisite-cantonese-cuisine" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/6b721b32a1014f4cccd4955ff0a9ef92.webp" alt="Entering Bingsheng" srcset="https://cdn1.zair.top/images/2024/09/6b721b32a1014f4cccd4955ff0a9ef92.webp?size=small, https://cdn1.zair.top/images/2024/09/6b721b32a1014f4cccd4955ff0a9ef92.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/6b721b32a1014f4cccd4955ff0a9ef92.webp?size=large 2x" data-title="Entering Bingsheng" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>The evening banquet was held at <a href="http://www.bingsheng.com/intro/11.html"target="_blank" rel="external nofollow noopener noreferrer">Bingsheng Taste Haiyin Main Store<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>. On the way, I met two students studying in the United States, who commented that when Chinese people organize conferences, they must give enough face. This was not false, as I later learned that this restaurant is both Michelin-rated and known as the pinnacle of Cantonese cuisine. One of them, living in the West Coast, described their life as being like living in a village, with nothing to eat or play; the other, living in the East Coast, also mentioned poor dietary conditions, allowing them to maintain different sports activities daily. However, when discussing the purpose of pursuing a graduate degree, one of them mentioned it was to maintain a sense of insight and lifelong learning mindset, which I couldn&rsquo;t agree more with.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/d5c6b79be52e29d398dd4140b2795d1b.webp" alt="Dining at Bingsheng" srcset="https://cdn1.zair.top/images/2024/09/d5c6b79be52e29d398dd4140b2795d1b.webp?size=small, https://cdn1.zair.top/images/2024/09/d5c6b79be52e29d398dd4140b2795d1b.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/d5c6b79be52e29d398dd4140b2795d1b.webp?size=large 2x" data-title="Dining at Bingsheng" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>The dinner dishes were indeed very rich. Since it was inconvenient to use a phone at the table, I can only recall a few impressive dishes from memory: American ginseng pork rib soup, whole roasted suckling pig (but with two red light bulbs stuffed in its eyes, as an outsider I would call it a fel energy roasted suckling pig), garlic steamed Boston lobster, sea cucumber, fish, chicken and duck, sweet and sour pork, sweet water, tomatoes and honeydew melon, Sachima, and Pu&rsquo;er tea. Overall, it was good, with the roast suckling pig being crispy and tender, and the sweet and sour pork being tangy and delicious, with ingredients selected having a strong Lingnan characteristic. However, some of the distinctive dishes in Cantonese cuisine were the hardest to eat: the sweet water was really sweet to the point of being cloying, the American ginseng in the soup was sour and astringent, and there was a dim sum that tasted like mooncake skin wrapped with raw celery pork dumpling filling, which was really hard to accept. Interestingly, the restaurant did not provide knives and forks by default, forcing some foreigners to use chopsticks, but they seemed to have no difficulty.</p>
<p>At our table with our advisor, there were researchers from Amazon and an AP from an American university. The Amazon researcher, in front of my advisor, talked about some things from the industry&rsquo;s perspective that were not suitable for students to hear. The AP talked about the relaxed atmosphere in American universities where you can get tenure by working at your own pace. The pork rib soup might have been soured by the American ginseng at this point.</p>
<p>After dinner, all attendees walked to the pier together for the Pearl River night cruise. On the way, I met a guy in his first year of PhD at Nagoya University, who praised the PhD life in Japan, including the free-range supervision, high subsidies, seller&rsquo;s job market, and cultural environment. Can pursuing a PhD be this enjoyable?</p>
<h3 id="pearl-river-night-cruise" class="heading-element"><span>Pearl River Night Cruise</span>
  <a href="#pearl-river-night-cruise" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/fe51049529fee14e8352584425b9594b.webp" alt="Our ship from another ship" srcset="https://cdn1.zair.top/images/2024/09/fe51049529fee14e8352584425b9594b.webp?size=small, https://cdn1.zair.top/images/2024/09/fe51049529fee14e8352584425b9594b.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/fe51049529fee14e8352584425b9594b.webp?size=large 2x" data-title="Our ship from another ship" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>As night fell, the banks of the Pearl River lit up. We boarded the cruise ship, slowly sailing into the dazzling night. The high-rise buildings on both sides were brightly lit, reflecting on the shimmering river, like stars falling into the blue waves.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/42a46c5b4e60a2c6e14297ffdbca41b3.webp" alt="Exterior view of the Pearl River night cruise" srcset="https://cdn1.zair.top/images/2024/09/42a46c5b4e60a2c6e14297ffdbca41b3.webp?size=small, https://cdn1.zair.top/images/2024/09/42a46c5b4e60a2c6e14297ffdbca41b3.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/42a46c5b4e60a2c6e14297ffdbca41b3.webp?size=large 2x" data-title="Exterior view of the Pearl River night cruise" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>Along the way, we passed iconic buildings, with the Canton Tower towering majestically, its changing lights dreamlike. The skyscrapers of Zhujiang New Town shone brightly, outlining the skyline of a modern city. When the ship passed under the Haizhu Bridge, the traffic on the bridge was heavy, and the ships below were bustling, presenting a busy scene.</p>
<p>The breeze gently brushed my face, and the river water lightly tapped the ship&rsquo;s side. Folk music came from below the deck, adding a touch of leisure to the night. Tourists leaned on the railing to gaze into the distance or chatted with drinks in hand, immersed in the charming night.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/aef20be076fa4c7b76f3db07bf93f331.webp" alt="Interior view of the Pearl River night cruise" srcset="https://cdn1.zair.top/images/2024/09/aef20be076fa4c7b76f3db07bf93f331.webp?size=small, https://cdn1.zair.top/images/2024/09/aef20be076fa4c7b76f3db07bf93f331.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/aef20be076fa4c7b76f3db07bf93f331.webp?size=large 2x" data-title="Interior view of the Pearl River night cruise" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>The two-hour journey passed in a flash. Looking back at the entire journey, the bustling city and long history perfectly blended on this mother river, leaving a lasting impression.</p>
<h3 id="new-type-of-socializing" class="heading-element"><span>New Type of Socializing</span>
  <a href="#new-type-of-socializing" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>While my classmate and I were enjoying the view at the bow of the third deck, a foreigner approached us. He was carrying a backpack, sweating profusely, and seemed a bit rushed. He asked if there was Wi-Fi on the ship and if we knew the password, as he needed to contact his family. We found out there wasn&rsquo;t, so we offered to share our hotspot. After successfully connecting, we realized that foreign social media apps are mostly blocked in China, and this guy might have been looking for Wi-Fi in Guangzhou but kept getting blocked. From his perspective: being in a distant and mysterious East, unable to bypass the firewall with any Wi-Fi, losing contact with family for three days, and people unwilling to &ldquo;share&rdquo; their hotspots—it&rsquo;s really pitiful. So we tried to share the magic cat&rsquo;s ability over the local network, but this simple task on a computer couldn&rsquo;t be accomplished on a phone.</p>
<p>However, we soon discovered that this guy was a social butterfly. He kept finding topics to talk about: &ldquo;What does ICBC mean? What does &lsquo;Liede&rsquo; mean? Where are you from? What&rsquo;s that logo?&rdquo;&hellip; and kept asking to take photos with us.</p>
<p>We chatted all the way, adding a different kind of fun to the night cruise.</p>
<p>Later on Whova, we found out that this guy ranked second in activity, worked at a German university, and had a company—this didn&rsquo;t seem like someone who had lost contact with family and needed to borrow Wi-Fi everywhere.</p>
<h2 id="day-4-freebies-collection" class="heading-element"><span>Day 4: Freebies Collection</span>
  <a href="#day-4-freebies-collection" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>During VLDB2024, there were booths from sponsors like Google, Meta, Alibaba, and ByteDance. At these booths, you could fill out surveys and receive freebies. The surveys mainly investigated research directions and job preferences, and some directly collected business cards. Among all the freebies, I liked Google&rsquo;s white shirt and the penguin in the picture below the most. <del>If you don&rsquo;t know which company this penguin belongs to, once you see the genuine anti-counterfeiting sticker on it, you&rsquo;ll know it&rsquo;s Tencent&rsquo;s.</del></p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/e5cd97b0da922baad80e38f0e27d53eb.webp" alt="Tencent&rsquo;s penguin freebie" srcset="https://cdn1.zair.top/images/2024/09/e5cd97b0da922baad80e38f0e27d53eb.webp?size=small, https://cdn1.zair.top/images/2024/09/e5cd97b0da922baad80e38f0e27d53eb.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/e5cd97b0da922baad80e38f0e27d53eb.webp?size=large 2x" data-title="Tencent&rsquo;s penguin freebie" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>The afternoon presentations gave me a lot of inspiration. Some work seemed simple, even just a naive idea, but after in-depth research, it could form a complete paper, which was impressive. One memorable moment was a lone wolf from Cornell University who presented two independently completed works in the same session. Holding a microphone, with one hand in his pocket and both arms propped on the podium, his PPT adopted a simple large-font style (no slide had more than 30 words), treating the presentation as a speech. This might be what it means to be a big shot, completely next level.</p>
<p>This day marked the end of all our conference schedules, and when leaving the hotel, we encountered extremely severe thunderstorm weather with torrential rain.</p>
<h2 id="day-5-half-a-day-through-five-hundred-years-of-china" class="heading-element"><span>Day 5: Half a Day Through Five Hundred Years of China</span>
  <a href="#day-5-half-a-day-through-five-hundred-years-of-china" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>With the &ldquo;since we&rsquo;re here&rdquo; principle, I arranged a special forces-style intensive itinerary for my last day in Guangzhou. On this day, I visited several famous attractions in Guangzhou, experiencing the city&rsquo;s rich historical and cultural heritage.
In the early morning, I first arrived at the exotic Shamian Island. Strolling through this former concession area, where European-style buildings and tropical plants complement each other, it felt like traveling back in time. A standard for judging whether a place&rsquo;s scenery is beautiful enough is whether people take wedding photos there, like the Pukou Railway Station, Green Expo Garden, and Sun Yat-sen Mausoleum in Nanjing, and Shamian Island in Guangzhou.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/c9fd66a828e28a4b12b992f6ded1012e.webp" alt="Shamian Island scenery" srcset="https://cdn1.zair.top/images/2024/09/c9fd66a828e28a4b12b992f6ded1012e.webp?size=small, https://cdn1.zair.top/images/2024/09/c9fd66a828e28a4b12b992f6ded1012e.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/c9fd66a828e28a4b12b992f6ded1012e.webp?size=large 2x" data-title="Shamian Island scenery" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>After spending the morning on Shamian Island, I walked to the nearby Thirteen Hongs Museum at noon to learn about Guangzhou&rsquo;s glorious history as an important port on the Maritime Silk Road. The museum&rsquo;s exhibits and pictures vividly depict Guangzhou&rsquo;s prosperous trade from the 17th century to the semi-colonial scene of the 19th century. I learned that after Guangzhou was forced to open for trade, foreign trading houses were initially located along the river in the city center opposite the current Canton Tower, but later moved to Shamian Island due to a fire.</p>
<p>In the afternoon, I visited the Sacred Heart Cathedral, a large Gothic building and a must-visit spot recommended by Xiaohongshu. It turned out to be just a photo spot, with only the entrance accessible most of the time.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/d87b6cdd619cecaec340f8a6504a178c.webp" alt="Sacred Heart Cathedral scenery" srcset="https://cdn1.zair.top/images/2024/09/d87b6cdd619cecaec340f8a6504a178c.webp?size=small, https://cdn1.zair.top/images/2024/09/d87b6cdd619cecaec340f8a6504a178c.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/d87b6cdd619cecaec340f8a6504a178c.webp?size=large 2x" data-title="Sacred Heart Cathedral scenery" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>In the afternoon, I visited the Huangpu Military Academy site. Strolling through this place that cultivated countless revolutionary martyrs, I hoped to feel the passionate era. However, due to not doing homework in advance, it was somewhat disappointing and different from what I imagined. The original site of the Huangpu Military Academy was bombed to ashes by the Japanese during the Anti-Japanese War, and the current attractions are just efforts to restore it. Looking at the neat and orderly restored buildings, I kept having a &ldquo;Confucius Temple&rdquo; déjà vu—only feeling like I was in a scenic spot. After searching for a long time, I couldn&rsquo;t find the couplet at the main gate: &ldquo;If you want to get promoted and get rich, please go elsewhere; if you are afraid of death, don&rsquo;t enter this door.&rdquo;</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/be63cdc9e66d411f0a863944e4a1bb3c.webp" alt="Huangpu Military Academy site scenery" srcset="https://cdn1.zair.top/images/2024/09/be63cdc9e66d411f0a863944e4a1bb3c.webp?size=small, https://cdn1.zair.top/images/2024/09/be63cdc9e66d411f0a863944e4a1bb3c.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/be63cdc9e66d411f0a863944e4a1bb3c.webp?size=large 2x" data-title="Huangpu Military Academy site scenery" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>As night fell, I returned to the Pearl River, strolling along the riverside. The night view of Guangzhou, with its horizon and skyline, the Little Waist and Grand Theater, is never tiring. From the Haizhu Bridge to the Liede Bridge, the same Pearl River presents a different scene in each century from the 17th to the 21st century.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/09/6499961344854ffce2524a9d20d4be76.webp" alt="Pearl River night view" srcset="https://cdn1.zair.top/images/2024/09/6499961344854ffce2524a9d20d4be76.webp?size=small, https://cdn1.zair.top/images/2024/09/6499961344854ffce2524a9d20d4be76.webp?size=medium 1.5x, https://cdn1.zair.top/images/2024/09/6499961344854ffce2524a9d20d4be76.webp?size=large 2x" data-title="Pearl River night view" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>These days in Guangzhou felt like a journey through ancient and modern times. From the academic halls of Pazhou to the dazzling lights on both sides of the Pearl River; from the exotic charm of Shamian Island to the revolutionary relics of the Huangpu Military Academy. This city, in its unique way, perfectly integrates academia, culture, history, and modernity. Under the shadows of skyscrapers, there are also the fireworks of urban villages; under the busy bridges, there are also the figures of the homeless, reminding that this city is still in the process of continuous progress and inclusiveness. But these scenes not only do not diminish Guangzhou&rsquo;s charm but make the city appear more real and full of human touch.</p>
<p>Goodbye, Guangzhou.</p>
<div class="bilibili">
  <iframe src="//player.bilibili.com/player.html?bvid=BV1AP1GYfEBS&page=1&autoplay=false&poster=true&muted=false&danmaku=true&t=0" border="0" framespacing="0" allowfullscreen="true"></iframe>
</div>]]></description>
    </item>
    <item>
      <title>Data Exhaustion Crisis: The Data Bottleneck in AI Development and Countermeasures</title>
      <link>https://www.zair.top/en/post/data-exhaustion-crisis/</link>
      <pubDate>Wed, 21 Aug 2024 15:05:44 +0800</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/data-exhaustion-crisis/</guid>
      <category domain="https://www.zair.top/en/categories/llm/">Large Language Models</category>
      <description><![CDATA[<blockquote>
<p>We estimate that the stock of public text generated by humans is about 300 trillion tokens. If trends continue, language models will completely exhaust this stock between 2026 and 2032, or even earlier if overtrained. ——Epoch AI</p>
</blockquote>
<p>In 2006, Fei-Fei Li, then a professor at the University of Illinois (now a professor at Stanford University), saw the potential of the internet to change artificial intelligence (AI) research. Research in the field of linguistics had identified 80,000 &ldquo;noun synonym sets,&rdquo; which are collections of synonyms describing the same type of thing. Fei-Fei Li hypothesized that among the billions of images on the internet, there must be countless instances of these synonym sets. If enough of these images could be collected, a massive database could be created that surpasses any previous AI training resources. She said, &ldquo;Many people focus on models, we should focus on data.&rdquo; Thus, the ImageNet project was born.</p>
<p>The internet not only provided images but also resources for image annotation. After finding images of cats, dogs, chairs, etc., through search engines, people on Amazon&rsquo;s crowdsourcing platform Mechanical Turk manually checked and annotated these images. Eventually, a database containing millions of verified images was constructed. It was the use of part of the ImageNet data to train AlexNet in 2012 that demonstrated the great potential of &ldquo;deep learning,&rdquo; kicking off the last AI cycle and spawning an industry dependent on large amounts of annotated data.</p>
<h2 id="the-data-driven-ai-era" class="heading-element"><span>1 The Data-Driven AI Era</span>
  <a href="#the-data-driven-ai-era" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In this AI cycle, AI development has extended to large language models (LLM), which also rely on internet data for training, but in a different way. The classic training task in the field of computer vision (CV) is to predict the content of an image (image classification task), but the classic task for LLM training is to predict the missing words in a text based on context.</p>
<p>This training method does not require manually annotated data; the system can leave out words, infer, and evaluate the correctness of the answers through &ldquo;self-supervised training.&rdquo; However, this method requires a large amount of data. <strong>Generally speaking, the more text a model obtains and the larger the data volume, the better its performance (Scaling Law).</strong> The internet provides tens of billions of texts, which are as valuable to LLM as carbon deposited over billions of years is to modern industry—a precious resource that can be refined into fuel.</p>
<p>A common source of training data is Common Crawl, an internet archive containing 50 billion web pages. As AI models develop, more data is added, such as Books3—a database containing thousands of books. However, as the demand for text data by AI grows, the supply of high-quality data on the internet is gradually becoming overwhelmed. According to Epoch AI&rsquo;s estimates, by 2028, high-quality text data on the internet will be completely utilized, and the industry will face the so-called &ldquo;data wall.&rdquo; The situation is more severe in the Chinese internet, from &ldquo;the Chinese internet is collapsing&rdquo; to major platforms locking their doors, as everyone realizes the value of data and puts it in a vault. How to overcome this barrier may be one of the most challenging issues in the future development of AI, and it may also be the issue most likely to slow down its progress.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F08%2Fd834acebf20bd29c720550594dac4481.png" alt="AI Data Wall Phenomenon" srcset="https://cdn1.zair.top/images/2024%2F08%2Fd834acebf20bd29c720550594dac4481.png?size=small, https://cdn1.zair.top/images/2024%2F08%2Fd834acebf20bd29c720550594dac4481.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F08%2Fd834acebf20bd29c720550594dac4481.png?size=large 2x" data-title="AI Data Wall Phenomenon" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<h2 id="data-ownership-and-copyright-issues" class="heading-element"><span>2 Data Ownership and Copyright Issues</span>
  <a href="#data-ownership-and-copyright-issues" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>AI models increasingly rely on internet data, but the copyright issues of data are also full of controversy. Many of the data used to train large language models are often used without the consent of the copyright holders, and some AI companies have even utilized content behind paywalls. Although AI companies claim that such use falls under the &ldquo;fair use&rdquo; doctrine of copyright law, copyright holders do not agree. Getty Images sued image generation company Stability AI, accusing it of unauthorized use of its image library. The New York Times sued OpenAI and Microsoft, accusing them of infringing the copyrights of millions of articles. Stack Overflow, Reddit, and X (formerly Twitter) now charge AI companies fees. Zhihu is also interfering with crawlers like Bing and Google by using garbled text to restrict its Chinese content from being used as datasets for AI training.</p>
<p>Different regions have different attitudes toward this issue. Japan and Israel have taken a lenient stance to promote their AI industries. The EU does not have a general concept of &ldquo;fair use&rdquo; and may be stricter. Domestically, only a national data bureau has been established, clarifying that data has a dual identity as both production material and production object.</p>
<h2 id="existing-data-usage-strategies" class="heading-element"><span>3 Existing Data Usage Strategies</span>
  <a href="#existing-data-usage-strategies" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Facing the data wall, the AI field has proposed several countermeasures. One key strategy is to <strong>focus on data quality rather than quantity</strong>. AI labs no longer blindly use the entire internet&rsquo;s data to train models but instead focus more on data filtering, cleaning, and optimization to ensure that models can extract the most valuable content from it. In the past year (2024), OpenAI&rsquo;s models no longer seem to be &ldquo;far ahead,&rdquo; and everyone&rsquo;s models are performing comparably, with differences in performance on different tasks coming from the construction of training data. After all, there are many open-source algorithms and models, but very few open-source datasets.</p>
<p>Obtaining &ldquo;real-world information&rdquo; is crucial, especially when models involve a lot of reasoning, making authoritative resources like academic textbooks particularly valuable. However, finding the <strong>optimal balance between different data sources</strong> remains a mysterious art.</p>
<p>During data usage, models also face the problem of &ldquo;catastrophic forgetting&rdquo;—that is, when a system is overtrained on certain types of data, it may excel in that field while forgetting previously learned knowledge. Therefore, <strong>the order of data during training</strong> also needs careful consideration. If all data on a particular topic (such as mathematics) is concentrated at the end of the training process, the model may perform well on math problems but may weaken its abilities in other areas. This unbalanced training method exacerbates the risk of catastrophic forgetting.</p>
<p>When data involves different fields and different forms (modalities), these strategies become more complex. As new text data becomes scarce, leading models like OpenAI&rsquo;s GPT-4 and Google&rsquo;s Gemini use not only text but also images, videos, and audio for training during self-supervised learning. However, video data is particularly tricky because video files contain extremely dense data points. To simplify the problem, existing models usually extract only a few frames for simplified processing, and academia is still searching for more efficient solutions.</p>
<h2 id="synthetic-data-and-ai-self-training" class="heading-element"><span>4 Synthetic Data and AI Self-Training</span>
  <a href="#synthetic-data-and-ai-self-training" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Model capabilities can also be enhanced by fine-tuning (using additional data) based on the versions generated during self-supervised learning (pre-training versions). For example, &ldquo;supervised fine-tuning&rdquo; involves providing the model with question-answer pairs collected or created by humans to teach the model what constitutes a good answer. Another method, &ldquo;<strong>reinforcement learning based on human feedback</strong>&rdquo; (RLHF), tells the model whether an answer satisfies the questioner.</p>
<p>In RLHF, users provide feedback on the quality of the model&rsquo;s output, which is then used to adjust the model&rsquo;s parameters (weights). User interactions with chatbots, such as likes or dislikes, are particularly useful for RLHF. This is the mechanism of the &ldquo;data flywheel&rdquo;: more users bring in more data, which in turn optimizes better models. AI companies closely monitor the various questions users pose to their models, then collect data to adjust the models to cover these topics. Companies like Alibaba, ByteDance, and Minimax have launched price wars for models, and it&rsquo;s hard to say there isn&rsquo;t some consideration of this aspect.</p>
<p>As pre-training data on the internet gradually depletes, the importance of post-training is becoming increasingly prominent. Annotation companies like Scale AI and Surge AI earn hundreds of millions of dollars annually by collecting post-training data. Scale recently raised $1 billion at a valuation of $14 billion. Today&rsquo;s annotation work has surpassed the era of Mechanical Turk: top annotators can earn up to $100 per hour. Although post-training helps generate better models and meets the needs of many commercial applications, it remains an incremental improvement, addressing symptoms but not the root cause.</p>
<p>In addition to gradually overcoming the data wall, another solution is to completely bypass it by using machine-generated <strong>synthetic data</strong>. DeepMind (a subsidiary of Google) launched the AlphaGo Zero model, which is a good example. The company&rsquo;s first successful Go model was trained using data from millions of amateur games, while AlphaGo Zero did not use any existing data. Instead, it learned Go by playing 4.9 million games against itself over three days and recorded successful strategies. This &ldquo;reinforcement learning&rdquo; taught it how to respond to opponents&rsquo; moves by simulating a large number of possible responses and choosing the strategy with the highest probability of success.</p>
<p>Similar methods can also be used for LLM, such as the currently strongest open-source large model, Llama 3.1. A significant proportion of the SFT data in Llama 3.1 consists of synthetic data generated by the model, while a large proportion of the data in the SFT phase of Gemma2 is synthesized by larger models, proving that the quality of synthetic data is not inferior to that of human-annotated data.</p>
<p>Can we infinitely generate synthetic data and climb to the sky with one foot on the other? I believe the answer is no. A study published in Nature last month found that &ldquo;abusing&rdquo; synthetic data in model training can lead to &ldquo;irreversible defects.&rdquo; When models are fine-tuned with data synthesized by the model itself, it only takes a few rounds for the model to start talking nonsense, a phenomenon researchers call &ldquo;model collapse.&rdquo;</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F08%2F105aac035a19fd4f52f02f6d7aa0c11c.png" alt="Nature Current Cover: Garbage in, Garbage out." srcset="https://cdn1.zair.top/images/2024%2F08%2F105aac035a19fd4f52f02f6d7aa0c11c.png?size=small, https://cdn1.zair.top/images/2024%2F08%2F105aac035a19fd4f52f02f6d7aa0c11c.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F08%2F105aac035a19fd4f52f02f6d7aa0c11c.png?size=large 2x" data-title="Nature Current Cover: Garbage in, Garbage out." style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>The bigger issue is how to extend this method to vertical fields such as healthcare or education. In games, the definition of victory is clear, and it is easier to collect data on whether a move is beneficial. In other fields, this is much more complex. Data on &ldquo;good&rdquo; decisions is usually collected from experts, but this is both expensive and time-consuming, and the solutions are not comprehensive. How to determine whether an expert is correct is also a recursive problem.</p>
<h2 id="conclusion" class="heading-element"><span>5 Conclusion</span>
  <a href="#conclusion" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Acquiring more data will be key to maintaining rapid AI progress. Whether it is specialized data obtained from expert sources or machine-generated synthetic data, AI&rsquo;s progress depends on the continuous supply of data. As the most easily accessible data reserves gradually deplete, the AI industry has made many efforts to alleviate this issue:</p>
<ul>
<li>Emphasizing data quality and data cleaning</li>
<li>Increasing the proportion of mathematical, logical, and code data, adjusting training order</li>
<li>Using synthetic data to supplement real data</li>
</ul>
<p>But these seem unsustainable, and new data sources or sustainable alternatives must be sought; or from the algorithm architecture level, design new architectures that do not rely on data, thus ushering in the next AI cycle.</p>
<h2 id="recommended-reading" class="heading-element"><span>6 Recommended Reading</span>
  <a href="#recommended-reading" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><ol>
<li><a href="https://web.archive.org/web/20240523001943/https:/mp.weixin.qq.com/s/afg3zHPpEyRzSfOR1Aeh3w"target="_blank" rel="external nofollow noopener noreferrer">The Chinese Internet is Accelerating Collapse | He Jiayan<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://epochai.org/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data"target="_blank" rel="external nofollow noopener noreferrer">Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data | EpochAI<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://blog.csdn.net/weixin_41446370/article/details/140030906"target="_blank" rel="external nofollow noopener noreferrer">Zhihu is Interfering with Crawlers like Bing/Google with Garbled Text | CSDN<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://wallstreetcn.com/articles/3723705"target="_blank" rel="external nofollow noopener noreferrer">AI Training Data is Depleting, Synthetic Data Sparks Huge Controversy | Wall Street Insights<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://developer.volcengine.com/articles/7396884852433551379"target="_blank" rel="external nofollow noopener noreferrer">A Brief Talk on Llama3.1: From Structure, Training Process, Impact to Data Synthesis | Volcano Community<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://www.nature.com/articles/s41586-024-07566-y"target="_blank" rel="external nofollow noopener noreferrer">AI Models Collapse When Trained on Recursively Generated Data | Nature<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
</ol>]]></description>
    </item>
    <item>
      <title>Fine-tuning GPT-4o-mini to Generate Blog Articles</title>
      <link>https://www.zair.top/en/post/fine-tuning-gpt-4o-mini/</link>
      <pubDate>Wed, 24 Jul 2024 15:05:44 +0800</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/fine-tuning-gpt-4o-mini/</guid>
      <category domain="https://www.zair.top/en/categories/llm/">Large Language Models</category>
      <description><![CDATA[<p>The new model GPT-4o-mini, released on July 18, surpasses GPT-3.5 and approaches GPT-4 in performance, while costing only half as much as GPT-3.5. It is also the fastest in response time among the entire series of models. OpenAI officially opened the fine-tuning interface for GPT-4o-mini today, offering a daily free quota of 2M tokens until September 23, 2024.</p>
<p><del>It&rsquo;s not that Llama 3.1 405B is unaffordable, but GPT-4o-mini offers better cost-effectiveness.</del></p>
<h2 id="suitable-scenarios-for-fine-tuning" class="heading-element"><span>1 Suitable Scenarios for Fine-tuning</span>
  <a href="#suitable-scenarios-for-fine-tuning" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>For general simple tasks, writing prompts is sufficient for the model to perform well. For more complex tasks, you can try using a Chain of Thought to break down the complex task into multiple steps and reason through them gradually. However, for tasks requiring high precision and consistent output, fine-tuning is necessary.</p>
<p>The table below compares the advantages, disadvantages, and application scenarios of these three methods.</p>
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>Advantages</th>
          <th>Disadvantages</th>
          <th>Application Scenarios</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Fine-tuning</td>
          <td>Provides high-quality results</td>
          <td>Requires a lot of time and resources to prepare and train data</td>
          <td>Requires stable, reliable, and high-quality output</td>
      </tr>
      <tr>
          <td></td>
          <td>Suitable for complex tasks and customization in specific fields</td>
          <td>Feedback loop is slow, training cost is high</td>
          <td>Improves model performance in specific tasks or fields</td>
      </tr>
      <tr>
          <td></td>
          <td>Saves tokens, reduces latency</td>
          <td>Requires foundational knowledge of deep learning</td>
          <td>When tasks require high precision or unique style, tone, format</td>
      </tr>
      <tr>
          <td>Prompting</td>
          <td>Fast iteration and testing</td>
          <td>Depends on the quality of prompt design</td>
          <td>Quick prototyping and testing of common tasks</td>
      </tr>
      <tr>
          <td></td>
          <td>Suitable for initial exploration and general tasks</td>
          <td>May not be accurate enough for complex tasks</td>
          <td>When flexible adjustment of model output is needed</td>
      </tr>
      <tr>
          <td></td>
          <td>No need for additional data preparation and training resources</td>
          <td></td>
          <td>Not suitable for tasks with many examples and complex logic</td>
      </tr>
      <tr>
          <td>Chain of Thought</td>
          <td>Provides step-by-step logic and reasoning</td>
          <td>Increases the complexity and length of prompts</td>
          <td>Handles tasks requiring reasoning and logical steps</td>
      </tr>
      <tr>
          <td></td>
          <td>Improves performance on complex tasks</td>
          <td>Increases token usage and latency</td>
          <td>Involves multi-step problem-solving scenarios</td>
      </tr>
      <tr>
          <td></td>
          <td>Easily combines multiple strategies and tools</td>
          <td>May still not be enough for very complex tasks</td>
          <td>When a clear logical process and step-by-step execution are needed</td>
      </tr>
  </tbody>
</table>
<p>The NFL theorem tells us that no method is suitable for all scenarios, and the same applies here; fine-tuning is not necessarily better than the other two methods. However, it is clear that <strong>fine-tuning is suitable for those &ldquo;hard-to-describe tasks&rdquo;</strong>, such as a specific style and tone. Moreover, these three methods are not mutually exclusive; a fine-tuned model using carefully designed prompts, or even combined with a Chain of Thought, might achieve better results.</p>
<p>For simply writing an article or paragraph, prompts are enough. However, for a blog article considering SEO, there are many details, such as the frequency of core keywords. These details may not be fully understood by a large model, and as a user, you may not be able to describe them well in prompts. Therefore, writing such a blog article can use fine-tuning.</p>
<h2 id="preparing-data" class="heading-element"><span>2 Preparing Data</span>
  <a href="#preparing-data" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Data needs to be organized in <code>jsonl</code> format, with each line being a json. For example:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span><span class="nt">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Marv is a factual chatbot that is also sarcastic.&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;What&#39;s the capital of France?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Paris, as if everyone doesn&#39;t know that already.&#34;</span><span class="p">}]}</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span><span class="nt">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Marv is a factual chatbot that is also sarcastic.&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Who wrote &#39;Romeo and Juliet&#39;?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Oh, just some guy named William Shakespeare. Ever heard of him?&#34;</span><span class="p">}]}</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span><span class="nt">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Marv is a factual chatbot that is also sarcastic.&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;How far is the Moon from Earth?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Around 384,400 kilometers. Give or take a few, like that really matters.&#34;</span><span class="p">}]}</span></span></span></code></pre></td></tr></table>
</div>
</div><p>You can also set weights in multi-turn dialogues, with a weight of 0 indicating that the model should avoid such responses.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span><span class="nt">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Marv is a factual chatbot that is also sarcastic.&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;What&#39;s the capital of France?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Paris&#34;</span><span class="p">,</span> <span class="nt">&#34;weight&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Can you be more sarcastic?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Paris, as if everyone doesn&#39;t know that already.&#34;</span><span class="p">,</span> <span class="nt">&#34;weight&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">}]}</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span><span class="nt">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Marv is a factual chatbot that is also sarcastic.&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Who wrote &#39;Romeo and Juliet&#39;?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;William Shakespeare&#34;</span><span class="p">,</span> <span class="nt">&#34;weight&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Can you be more sarcastic?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Oh, just some guy named William Shakespeare. Ever heard of him?&#34;</span><span class="p">,</span> <span class="nt">&#34;weight&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">}]}</span>
</span></span><span class="line"><span class="cl"><span class="p">{</span><span class="nt">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Marv is a factual chatbot that is also sarcastic.&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;How far is the Moon from Earth?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;384,400 kilometers&#34;</span><span class="p">,</span> <span class="nt">&#34;weight&#34;</span><span class="p">:</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Can you be more sarcastic?&#34;</span><span class="p">},</span> <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Around 384,400 kilometers. Give or take a few, like that really matters.&#34;</span><span class="p">,</span> <span class="nt">&#34;weight&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">}]}</span></span></span></code></pre></td></tr></table>
</div>
</div><p>Of course, processing data is the most time-consuming part, and you can directly use <a href="https://github.com/Ryaang/industry-article-dataset/blob/main/all_filter_2120.jsonl"target="_blank" rel="external nofollow noopener noreferrer">the dataset I created<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>. This dataset is used for fine-tuning large models, sourced from scraping over 3000 pages across 13 categories from the <a href="https://reads.alibaba.com/"target="_blank" rel="external nofollow noopener noreferrer">reads.alibaba.com<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a> website. The open-source content includes not only the processed data but also the raw data and crawler code.</p>
<p>Upload the prepared data and record the returned file ID.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">openai</span> <span class="kn">import</span> <span class="n">OpenAI</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">client</span><span class="o">.</span><span class="n">files</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="n">file</span><span class="o">=</span><span class="nb">open</span><span class="p">(</span><span class="s2">&#34;all_filter_2120.jsonl&#34;</span><span class="p">,</span> <span class="s2">&#34;rb&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">  <span class="n">purpose</span><span class="o">=</span><span class="s2">&#34;fine-tune&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><h2 id="fine-tuning-the-model" class="heading-element"><span>3 Fine-tuning the Model</span>
  <a href="#fine-tuning-the-model" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Once the data is prepared, verified, and the token cost is confirmed, you can create a fine-tuning task.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">openai</span> <span class="kn">import</span> <span class="n">OpenAI</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">client</span><span class="o">.</span><span class="n">fine_tuning</span><span class="o">.</span><span class="n">jobs</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="n">training_file</span><span class="o">=</span><span class="s2">&#34;file-zWptPbsD37ZnemssjpsK6CnF&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">  <span class="n">model</span><span class="o">=</span><span class="s2">&#34;gpt-4o-mini&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><p>More detailed parameter configurations for this step can be found in the <a href="https://platform.openai.com/docs/api-reference/fine-tuning/create"target="_blank" rel="external nofollow noopener noreferrer">official API documentation<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F07%2F3e3c8d91530a4be8e7ca7d3d96ad3cae.png" alt="OpenAI Fine-tuning UI" srcset="https://cdn1.zair.top/images/2024%2F07%2F3e3c8d91530a4be8e7ca7d3d96ad3cae.png?size=small, https://cdn1.zair.top/images/2024%2F07%2F3e3c8d91530a4be8e7ca7d3d96ad3cae.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F07%2F3e3c8d91530a4be8e7ca7d3d96ad3cae.png?size=large 2x" data-title="OpenAI Fine-tuning UI" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>These two steps can also be quickly completed in the <a href="https://platform.openai.com/finetune"target="_blank" rel="external nofollow noopener noreferrer">UI interface<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>. After submitting the task, you can also view the progress and loss changes in real time on the UI interface.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F07%2F853e4147b0c7e82527a646eb60b612b5.png" alt="OpenAI Fine-tuning Process Log" srcset="https://cdn1.zair.top/images/2024%2F07%2F853e4147b0c7e82527a646eb60b612b5.png?size=small, https://cdn1.zair.top/images/2024%2F07%2F853e4147b0c7e82527a646eb60b612b5.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F07%2F853e4147b0c7e82527a646eb60b612b5.png?size=large 2x" data-title="OpenAI Fine-tuning Process Log" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<h2 id="invoking-the-model" class="heading-element"><span>4 Invoking the Model</span>
  <a href="#invoking-the-model" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Use the following code to query the status of the fine-tuning task. Once the job is successful, you will see the <code>fine_tuned_model</code> field filled with the model&rsquo;s name. Note this name for invocation.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">openai</span> <span class="kn">import</span> <span class="n">OpenAI</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Query fine-tuning job list</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span><span class="o">.</span><span class="n">fine_tuning</span><span class="o">.</span><span class="n">jobs</span><span class="o">.</span><span class="n">list</span><span class="p">(</span><span class="n">limit</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Query fine-tuning job details</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span><span class="o">.</span><span class="n">fine_tuning</span><span class="o">.</span><span class="n">jobs</span><span class="o">.</span><span class="n">retrieve</span><span class="p">(</span><span class="s2">&#34;ftjob-gvP0VB7RlWcF3QHdQrEVf49Y&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Cancel job</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span><span class="o">.</span><span class="n">fine_tuning</span><span class="o">.</span><span class="n">jobs</span><span class="o">.</span><span class="n">cancel</span><span class="p">(</span><span class="s2">&#34;ftjob-gvP0VB7RlWcF3QHdQrEVf49Y&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># View logs in the job</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span><span class="o">.</span><span class="n">fine_tuning</span><span class="o">.</span><span class="n">jobs</span><span class="o">.</span><span class="n">list_events</span><span class="p">(</span><span class="n">fine_tuning_job_id</span><span class="o">=</span><span class="s2">&#34;ftjob-gvP0VB7RlWcF3QHdQrEVf49Y&#34;</span><span class="p">,</span> <span class="n">limit</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Delete fine-tuned model</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="s2">&#34;ft:gpt-3.5-turbo:acemeco:suffix:abc123&#34;</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><p>The invocation method is the same as the official model; you only need to change the model name, for example:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">openai</span> <span class="kn">import</span> <span class="n">OpenAI</span>
</span></span><span class="line"><span class="cl"><span class="n">client</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">completion</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">chat</span><span class="o">.</span><span class="n">completions</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">  <span class="n">model</span><span class="o">=</span><span class="s2">&#34;ft:gpt-4o-mini-2024-07-18:personal:0724:9oMH6S7A&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">  <span class="n">messages</span><span class="o">=</span><span class="p">[</span>
</span></span><span class="line"><span class="cl">    <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;Please write an SEO article of no less than 800 words based on the title I gave you, including at least 4 subtitles by HTML format. Do not include the &lt;h1&gt; , &lt;body&gt; tag.  Do not include the &lt;html&gt; tag in the start and end of the content. Directly start with the content.&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">    <span class="p">{</span><span class="s2">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="s2">&#34;content&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;title:</span><span class="si">{</span><span class="n">task</span><span class="o">.</span><span class="n">title</span><span class="si">}</span><span class="s2">,core keyword:</span><span class="si">{</span><span class="n">task</span><span class="o">.</span><span class="n">coreKeywords</span><span class="si">}</span><span class="s2">,related keyword:</span><span class="si">{</span><span class="n">task</span><span class="o">.</span><span class="n">relatedKeywords</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">  <span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="n">completion</span><span class="o">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">message</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><h2 id="evaluating-results" class="heading-element"><span>5 Evaluating Results</span>
  <a href="#evaluating-results" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>During training, there are two metrics available for reference: loss value and token accuracy. The official explanation is as follows:</p>
<blockquote>
<p>Validation loss and validation token accuracy are calculated in two different ways—on a small batch of data during each step and on the entire validation set at the end of each epoch. The entire validation loss and entire validation token accuracy metrics are the most accurate indicators of tracking the overall performance of the model. These statistics are intended to provide a sanity check to ensure that training is proceeding smoothly (loss should decrease, token accuracy should increase).</p>
</blockquote>
<p>However, metrics are just references, and the actual effect still needs to be evaluated by yourself. The fine-tuned model has at least the following improvements:</p>
<ul>
<li>Article length increased by 20%</li>
<li>Article structure is closer to the training data</li>
<li>No more formatting errors (such as markdown format, adding CSS, etc.)</li>
</ul>
<p>An article generated with the title &ldquo;What is the Difference Between a Mural and a Mosaic?&rdquo; is as follows:</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F07%2F4821854ba99cf201f9b8795df70d0d9d.png" alt="Evaluation Results" srcset="https://cdn1.zair.top/images/2024%2F07%2F4821854ba99cf201f9b8795df70d0d9d.png?size=small, https://cdn1.zair.top/images/2024%2F07%2F4821854ba99cf201f9b8795df70d0d9d.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F07%2F4821854ba99cf201f9b8795df70d0d9d.png?size=large 2x" data-title="Evaluation Results" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<h2 id="reference-articles" class="heading-element"><span>6 Reference Articles</span>
  <a href="#reference-articles" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><ul>
<li><a href="https://platform.openai.com/docs/guides/fine-tuning"target="_blank" rel="external nofollow noopener noreferrer">https://platform.openai.com/docs/guides/fine-tuning<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
</ul>]]></description>
    </item>
    <item>
      <title>Four Design Patterns of AI Agents: A Pathway to Artificial General Intelligence</title>
      <link>https://www.zair.top/en/post/ai-agent-design-pattern/</link>
      <pubDate>Wed, 22 May 2024 18:00:22 +0800</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/ai-agent-design-pattern/</guid>
      <category domain="https://www.zair.top/en/categories/llm/">Large Language Models</category>
      <description><![CDATA[<h2 id="four-design-patterns-of-ai-agents-a-pathway-to-artificial-general-intelligence" class="heading-element"><span>Four Design Patterns of AI Agents: A Pathway to Artificial General Intelligence</span>
  <a href="#four-design-patterns-of-ai-agents-a-pathway-to-artificial-general-intelligence" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>If using AI to complete tasks is likened to writing an essay, then the non-agent approach is to have AI write it from start to finish without any modifications, while the agent approach allows AI to make multiple modifications and use tools and collaborate with the outside world during the process. In 2024, agents are seen as one of the pathways to achieving Artificial General Intelligence (AGI).</p>
<p>Foundational models have driven the development of generative AI, enabling AI Agents to automatically assist users in completing tasks. Over the past year, various agents have emerged. Inspired by Andrew Ng&rsquo;s speech at the Red Shirt AI Summit, this article combines papers published in the past year and engineering blogs like Langchain to organize existing agents and summarize design paradigms, hoping to aid in designing agents based on foundational models.</p>
<p>The following table preliminarily organizes 16 patterns:</p>
<table>
  <thead>
      <tr>
          <th>Pattern</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Passive goal creator</td>
          <td>Analyzes explicit cues from users through a conversational interface to maintain interactivity, goal tracking, and intuitiveness.</td>
      </tr>
      <tr>
          <td>Proactive goal creator</td>
          <td>Anticipates user goals by understanding human interactions and capturing context to enhance interactivity, goal tracking, and accessibility.</td>
      </tr>
      <tr>
          <td>Prompt/response optimiser</td>
          <td>Optimizes prompts/responses based on expected input or output content and format to provide standardization, response accuracy, interoperability, and adaptability.</td>
      </tr>
      <tr>
          <td>Retrieval augmented generation</td>
          <td>Enhances the knowledge update capability of agents while maintaining data privacy in local foundational model agent systems.</td>
      </tr>
      <tr>
          <td>One-shot model querying</td>
          <td>Accesses the foundational model in a single instance to generate all steps needed for planning, improving cost efficiency and simplifying processes.</td>
      </tr>
      <tr>
          <td>Incremental model querying</td>
          <td>Accesses the foundational model at each step of the plan generation process to provide supplementary context, improve response accuracy, and interpretability.</td>
      </tr>
      <tr>
          <td>Single-path plan generator</td>
          <td>Coordinates the generation of intermediate steps to achieve user goals, improving reasoning certainty, coherence, and efficiency.</td>
      </tr>
      <tr>
          <td>Multi-path plan generator</td>
          <td>Allows multiple options to be created at each step of achieving user goals, enhancing reasoning certainty, coherence, alignment with human preferences, and inclusivity.</td>
      </tr>
      <tr>
          <td>Self-reflection</td>
          <td>Enables agents to generate feedback on the planning and reasoning process and provide self-improvement guidance to enhance reasoning certainty, interpretability, continuous improvement, and efficiency.</td>
      </tr>
      <tr>
          <td>Cross-reflection</td>
          <td>Uses different agents or foundational models to provide feedback and improve the generated planning and reasoning process, enhancing reasoning certainty, interpretability, interoperability, inclusivity, scalability, and continuous improvement.</td>
      </tr>
      <tr>
          <td>Human reflection</td>
          <td>Collects human feedback to improve planning and reasoning processes, effectively aligning with human preferences, enhancing contestability, effectiveness, fairness, and continuous improvement.</td>
      </tr>
      <tr>
          <td>Voting-based cooperation</td>
          <td>Allows agents to freely express opinions and reach consensus through voting, improving diversity, effective division of labor, and fault tolerance.</td>
      </tr>
      <tr>
          <td>Role-based cooperation</td>
          <td>Assigns different roles and finalizes decisions based on the roles of agents, improving decision certainty, division of labor, fault tolerance, scalability, and accountability.</td>
      </tr>
      <tr>
          <td>Debate-based cooperation</td>
          <td>Agents provide and receive feedback through debate, adjusting their ideas and actions until consensus is reached, improving decision certainty, adaptability, interpretability, response accuracy, and critical thinking.</td>
      </tr>
      <tr>
          <td>Multimodal guardrails</td>
          <td>Controls the input and output of foundational models to meet specific requirements, such as user demands, ethical standards, and legal regulations, enhancing robustness, safety, standard alignment, and adaptability.</td>
      </tr>
      <tr>
          <td>Tool/agent registry</td>
          <td>Maintains a unified and convenient source for selecting different agents and tools, improving discoverability, efficiency, and tool applicability.</td>
      </tr>
  </tbody>
</table>
<p>These 16 patterns can all be attributed to the four paradigms proposed by Andrew Ng in his speech at the Red Shirt AI Summit, namely:</p>
<ul>
<li>Reflection</li>
<li>Tool Use</li>
<li>Planning</li>
<li>Multiagent Collaboration</li>
</ul>
<h2 id="reflection" class="heading-element"><span>1 Reflection</span>
  <a href="#reflection" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><h3 id="basic-reflection" class="heading-element"><span>1.1 Basic Reflection</span>
  <a href="#basic-reflection" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>In the context of LLM Agent construction, reflection refers to the process of prompting LLMs to observe their past steps (and potential observations from tools/environments) to assess the quality of selected actions. This feedback is then used for downstream tasks such as replanning, searching, or evaluation. The figure below shows a basic reflection pattern.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F88d05b24ee82d9bead9a2b0c6c2b2724.png" alt="Basic Reflection" srcset="https://cdn1.zair.top/images/2024%2F05%2F88d05b24ee82d9bead9a2b0c6c2b2724.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F88d05b24ee82d9bead9a2b0c6c2b2724.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F88d05b24ee82d9bead9a2b0c6c2b2724.png?size=large 2x" data-title="Basic Reflection" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>Basic Reflection</p>
<h3 id="reflexion-actor" class="heading-element"><span>1.2 Reflexion Actor</span>
  <a href="#reflexion-actor" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>Proposed by Shinn et al., Reflexion is an architecture that learns through language feedback and self-reflection. This agent reviews its task results to generate higher quality final outcomes, but at the cost of longer execution time. It mainly consists of three components:</p>
<ol>
<li>Actor (agent) with self-reflection</li>
<li>External evaluator (task-specific, e.g., code compilation steps)</li>
<li>Episodic memory that stores the reflections from (1).</li>
</ol>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2Fa33408400ae8ecf30cbbf598cf709e2f.png" alt="Reflexion Actor" srcset="https://cdn1.zair.top/images/2024%2F05%2Fa33408400ae8ecf30cbbf598cf709e2f.png?size=small, https://cdn1.zair.top/images/2024%2F05%2Fa33408400ae8ecf30cbbf598cf709e2f.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2Fa33408400ae8ecf30cbbf598cf709e2f.png?size=large 2x" data-title="Reflexion Actor" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>Reflexion Actor</p>
<h3 id="lats" class="heading-element"><span>1.3 LATS</span>
  <a href="#lats" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>Language Agent Tree Search (LATS), proposed by Zhou et al., is a general LLM Agent search algorithm that combines reflection/evaluation and search (specifically Monte Carlo Tree Search), achieving better overall task performance compared to similar techniques like ReACT, Reflexion, or Tree of Thoughts.</p>
<p>It has four main steps:</p>
<ol>
<li>
<p>Select: pick the best next actions based on the aggregate rewards from step (2). Either respond (if a solution is found or the max search depth is reached) or continue searching.</p>
</li>
<li>
<p>Expand and simulate: select the &ldquo;best&rdquo; 5 potential actions to take and execute them in parallel.</p>
</li>
<li>
<p>Reflect + Evaluate: observe the outcomes of these actions and score the decisions based on reflection (and possibly external feedback).</p>
</li>
<li>
<p>Backpropagate: update the scores of the root trajectories based on the outcomes.</p>
</li>
</ol>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F64ac3101a548d42fd1ede6715e99d143.png" alt="LATS" srcset="https://cdn1.zair.top/images/2024%2F05%2F64ac3101a548d42fd1ede6715e99d143.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F64ac3101a548d42fd1ede6715e99d143.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F64ac3101a548d42fd1ede6715e99d143.png?size=large 2x" data-title="LATS" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>LATS</p>
<h2 id="tool-use" class="heading-element"><span>2 Tool Use</span>
  <a href="#tool-use" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Invoke tools, using them in the form of functions.</p>
<h2 id="planning" class="heading-element"><span>3 Planning</span>
  <a href="#planning" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><h3 id="react" class="heading-element"><span>3.1 ReAct</span>
  <a href="#react" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>ReAct enhances the capabilities of agents by combining reasoning and action. The ReAct method allows agents to react immediately after receiving information, rather than waiting for all information to be processed. At the same time, this method emphasizes the close integration of reasoning and action, where agents not only need to analyze and understand input information but also take corresponding actions based on the analysis results. The advantage of this approach lies in its flexibility and adaptability to the environment.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F3627afc055f30f3c275fadaf2ec38080.png" alt="ReAct Framework" srcset="https://cdn1.zair.top/images/2024%2F05%2F3627afc055f30f3c275fadaf2ec38080.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F3627afc055f30f3c275fadaf2ec38080.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F3627afc055f30f3c275fadaf2ec38080.png?size=large 2x" data-title="ReAct Framework" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>ReAct Framework</p>
<h3 id="plan-and-execute" class="heading-element"><span>3.2 Plan and Execute</span>
  <a href="#plan-and-execute" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>The core idea of Plan and Execute is to first develop a multi-step plan and then execute the plan item by item. After completing specific tasks, the plan can be revisited and appropriately modified.</p>
<p>Compared to typical ReAct-style agents, which think one step at a time, the advantage of this &ldquo;plan and execute&rdquo; style is:</p>
<ol>
<li>Clear long-term planning (even very powerful LLMs find this challenging)</li>
<li>The ability to use smaller/weaker models during execution steps, using larger/better models only during planning steps</li>
</ol>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2Ffacdd82299104208c2110cff4a41ce81.png" alt="Plan-and-Execute" srcset="https://cdn1.zair.top/images/2024%2F05%2Ffacdd82299104208c2110cff4a41ce81.png?size=small, https://cdn1.zair.top/images/2024%2F05%2Ffacdd82299104208c2110cff4a41ce81.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2Ffacdd82299104208c2110cff4a41ce81.png?size=large 2x" data-title="Plan-and-Execute" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p><strong>Plan-and-Execute</strong></p>
<h3 id="rewoo" class="heading-element"><span>3.3 ReWOO</span>
  <a href="#rewoo" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>In ReWOO, Xu et al. propose an agent that combines multi-step planning and variable substitution to achieve efficient tool use. It improves ReACT-style agent architecture in the following ways:</p>
<ol>
<li>Reduces token consumption and execution time by generating the complete toolchain used at once. (ReACT-style agent architecture requires many LLM calls and has redundant prefixes because system prompts and previous steps are provided to the LLM at each reasoning step)</li>
<li>Simplifies fine-tuning. Since planning data does not depend on tool outputs, the model can theoretically be fine-tuned without actually calling the tools.</li>
</ol>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F63e39cb8db866a5a4a1db4e9970f9413.png" alt="Reasoning without Observation" srcset="https://cdn1.zair.top/images/2024%2F05%2F63e39cb8db866a5a4a1db4e9970f9413.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F63e39cb8db866a5a4a1db4e9970f9413.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F63e39cb8db866a5a4a1db4e9970f9413.png?size=large 2x" data-title="Reasoning without Observation" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p><strong>Reasoning without Observation</strong></p>
<h3 id="llmcompiler" class="heading-element"><span>3.4 LLMCompiler</span>
  <a href="#llmcompiler" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>LLMCompiler is an agent architecture that accelerates agent task execution by executing tasks in a DAG in a blitz manner. It also saves the cost of redundant token usage by reducing the number of LLM calls. It mainly consists of three parts:</p>
<ol>
<li>
<p>Planner: stream a DAG of tasks.</p>
</li>
<li>
<p>Task Fetching Unit: schedules and executes the tasks as soon as they are executable.</p>
</li>
<li>
<p>Joiner: Responds to the user or triggers a second plan.</p>
</li>
</ol>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F593ce5e78b3d68b79abffb49fe0a137d.png" alt="LLMCompiler" srcset="https://cdn1.zair.top/images/2024%2F05%2F593ce5e78b3d68b79abffb49fe0a137d.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F593ce5e78b3d68b79abffb49fe0a137d.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F593ce5e78b3d68b79abffb49fe0a137d.png?size=large 2x" data-title="LLMCompiler" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>LLMCompiler</p>
<h2 id="multiagent-collaboration" class="heading-element"><span>4 Multiagent collaboration</span>
  <a href="#multiagent-collaboration" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><h3 id="supervison" class="heading-element"><span>4.1 Supervison</span>
  <a href="#supervison" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>Manages and schedules multiple agents for collaboration through a supervisor.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F8d55fa6833db17b2e1970198230f1373.png" alt="Supervision Method" srcset="https://cdn1.zair.top/images/2024%2F05%2F8d55fa6833db17b2e1970198230f1373.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F8d55fa6833db17b2e1970198230f1373.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F8d55fa6833db17b2e1970198230f1373.png?size=large 2x" data-title="Supervision Method" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>Supervision Method</p>
<h3 id="hierarchical-teams" class="heading-element"><span>4.2 <strong>Hierarchical Teams</strong></span>
  <a href="#hierarchical-teams" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>Completes complex and large-scale tasks by organizing agents in a hierarchical, tiered manner. AutoGen is a typical representative of this approach.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F77c3c444633cb7c024fbc0ac23b8f690.png" alt="Hierarchical Team Method" srcset="https://cdn1.zair.top/images/2024%2F05%2F77c3c444633cb7c024fbc0ac23b8f690.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F77c3c444633cb7c024fbc0ac23b8f690.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F77c3c444633cb7c024fbc0ac23b8f690.png?size=large 2x" data-title="Hierarchical Team Method" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>Hierarchical Team Method</p>
<h3 id="collaboration" class="heading-element"><span>4.3 <strong>Collaboration</strong></span>
  <a href="#collaboration" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>A single agent&rsquo;s ability to use multiple (domain) tools is limited, requiring multiple agents to collaborate using more types of tools. The &ldquo;divide and conquer&rdquo; approach can be used, allowing each agent to become an &ldquo;expert&rdquo; focused on handling a specific type of problem, and then having them collaborate.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2Fd16b1684829ffde358e49d51b5d0e8c0.png" alt="A Basic Multi-Agent Collaboration" srcset="https://cdn1.zair.top/images/2024%2F05%2Fd16b1684829ffde358e49d51b5d0e8c0.png?size=small, https://cdn1.zair.top/images/2024%2F05%2Fd16b1684829ffde358e49d51b5d0e8c0.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2Fd16b1684829ffde358e49d51b5d0e8c0.png?size=large 2x" data-title="A Basic Multi-Agent Collaboration" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>A Basic Multi-Agent Collaboration</p>
<h2 id="evaluation" class="heading-element"><span>5 Evaluation</span>
  <a href="#evaluation" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>One of the most straightforward ideas is to use an agent as a &ldquo;virtual user&rdquo; for evaluation, as many task results cannot be quantitatively evaluated. However, for tasks with clear metrics (classification, regression), a tool can be directly utilized for evaluation.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F558f46c4a3d4c8ed5bd4d6f0d06837ac.png" alt="Agent-based Evaluation" srcset="https://cdn1.zair.top/images/2024%2F05%2F558f46c4a3d4c8ed5bd4d6f0d06837ac.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F558f46c4a3d4c8ed5bd4d6f0d06837ac.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F558f46c4a3d4c8ed5bd4d6f0d06837ac.png?size=large 2x" data-title="Agent-based Evaluation" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>Agent-based Evaluation</p>
<h2 id="other-ways-to-achieve-agi" class="heading-element"><span>6 Other Ways to Achieve AGI</span>
  <a href="#other-ways-to-achieve-agi" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Agents are just one promising way to achieve AGI, but not the only method. The agent approach can be organically combined with methods like RAG and user involvement. For example, Shi et al. combined agents and retrieval to solve Olympiad programming problems with large models.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024%2F05%2F787708ee6ea3b818eda88ca3ac7504f2.png" alt="Untitled" srcset="https://cdn1.zair.top/images/2024%2F05%2F787708ee6ea3b818eda88ca3ac7504f2.png?size=small, https://cdn1.zair.top/images/2024%2F05%2F787708ee6ea3b818eda88ca3ac7504f2.png?size=medium 1.5x, https://cdn1.zair.top/images/2024%2F05%2F787708ee6ea3b818eda88ca3ac7504f2.png?size=large 2x" data-title="Untitled" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<h2 id="recommended-reading" class="heading-element"><span>7 Recommended Reading</span>
  <a href="#recommended-reading" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p><a href="https://github.com/AGI-Edgerunners/LLM-Agents-Papers?tab=readme-ov-file"target="_blank" rel="external nofollow noopener noreferrer">https://github.com/AGI-Edgerunners/LLM-Agents-Papers<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></p>
<p><a href="https://github.com/zjunlp/LLMAgentPapers"target="_blank" rel="external nofollow noopener noreferrer">https://github.com/zjunlp/LLMAgentPapers<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></p>
<p><a href="https://www.cnblogs.com/bonelee/p/18153432"target="_blank" rel="external nofollow noopener noreferrer">AI agent task decomposition and scheduling classic articles - bonelee - Blog Park (cnblogs.com)<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></p>
<p><a href="https://zhuanlan.zhihu.com/p/691370751"target="_blank" rel="external nofollow noopener noreferrer">Four Agent Paradigms | CRITIC: Andrew Ng&rsquo;s Recommended Agent Design Paradigms - Zhihu (zhihu.com)<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></p>
<h2 id="references" class="heading-element"><span>8 References</span>
  <a href="#references" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><ol>
<li>Kim, Sehoon, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, and Amir Gholami. “An LLM Compiler for Parallel Function Calling.” arXiv, February 6, 2024. <a href="https://doi.org/10.48550/arXiv.2312.04511"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2312.04511<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>Liu, Yue, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, and Jon Whittle. “Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model Based Agents.” arXiv, May 16, 2024. <a href="https://doi.org/10.48550/arXiv.2405.10467"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2405.10467<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>Shi, Quan, Michael Tang, Karthik Narasimhan, and Shunyu Yao. “Can Language Models Solve Olympiad Programming?” arXiv, April 16, 2024. <a href="https://doi.org/10.48550/arXiv.2404.10952"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2404.10952<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>Shinn, Noah, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. “Reflexion: Language Agents with Verbal Reinforcement Learning.” arXiv, October 10, 2023. <a href="https://doi.org/10.48550/arXiv.2303.11366"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2303.11366<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>Wang, Lei, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, and Ee-Peng Lim. “Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models.” arXiv, May 26, 2023. <a href="https://doi.org/10.48550/arXiv.2305.04091"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2305.04091<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>Xu, Binfeng, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, and Dongkuan Xu. “ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models.” arXiv, May 22, 2023. <a href="https://doi.org/10.48550/arXiv.2305.18323"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2305.18323<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. “ReAct: Synergizing Reasoning and Acting in Language Models.” arXiv, March 9, 2023. <a href="https://doi.org/10.48550/arXiv.2210.03629"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2210.03629<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>“Yoheinakajima/Babyagi.” Accessed May 21, 2024. <a href="https://github.com/yoheinakajima/babyagi/tree/main"target="_blank" rel="external nofollow noopener noreferrer">https://github.com/yoheinakajima/babyagi/tree/main<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>“LangGraph tutorials.” Accessed May 21, 2024.  <a href="https://langchain-ai.github.io/langgraph/tutorials/"target="_blank" rel="external nofollow noopener noreferrer">https://langchain-ai.github.io/langgraph/tutorials/<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li>Zhou, Andy, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models.” arXiv, December 5, 2023. <a href="https://doi.org/10.48550/arXiv.2310.04406"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2310.04406<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
<li>Zhou, Pei, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, and Huaixiu Steven Zheng. “Self-Discover: Large Language Models Self-Compose Reasoning Structures.” arXiv, February 5, 2024. <a href="https://doi.org/10.48550/arXiv.2402.03620"target="_blank" rel="external nofollow noopener noreferrer">https://doi.org/10.48550/arXiv.2402.03620<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>.</li>
</ol>
]]></description>
    </item>
    <item>
      <title>Vector Database Comparison: Weaviate, Milvus, and Qdrant</title>
      <link>https://www.zair.top/en/post/vector-database-compare/</link>
      <pubDate>Tue, 02 Apr 2024 15:44:22 +0800</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/vector-database-compare/</guid>
      <category domain="https://www.zair.top/en/categories/llm/">Large Language Models</category>
      <description><![CDATA[<p>The success of RAG systems largely depends on their ability to efficiently acquire and process massive amounts of information. Vector databases play an irreplaceable role in this and form the core of RAG systems. Vector databases are specifically designed to store and manage high-dimensional vector data, capable of converting and storing text, images, audio, and even video into vectors (this will be elaborated on later in the article). The ultimate effect that RAG systems can achieve depends on the performance of these underlying vector databases.</p>
<p>Among the many vector databases and vector libraries, each has its own characteristics, and choosing one suitable for your application scenario requires evaluation. This article will delve into the key factors to consider when choosing a vector database for RAG, including open-source availability, CRUD (Create, Read, Update, Delete) support, distributed architecture, replica support, scalability, performance, and continuous maintenance.</p>
<p>Currently, databases specifically designed for vectors like Weaviate, Milvus, Qdrant, Vespa, and Pinecone are highly regarded in the industry. In addition, some earlier vector libraries also have this functionality. This article will also compare various vector libraries, such as FAISS, HNSWLib, ANNOY, and SQL databases supporting vector functions, such as pgvector and Supabase.</p>
<div align="center"><img src="https://cdn1.zair.top/images/2024/04/30c04fed7d88adea8257c73eadc85b1e.png" alt="Image Semantic Search Implemented with Milvus" style="width: 50%;"></div>
Image Semantic Search Implemented with Milvus
<h2 id="vector-libraries-faiss-hnswlib-annoy" class="heading-element"><span>1 Vector Libraries (FAISS, HNSWLib, ANNOY)</span>
  <a href="#vector-libraries-faiss-hnswlib-annoy" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>The difference between vector databases and vector libraries is that vector libraries are mainly used for storing static data, where indexed data is immutable. This is because vector libraries only store vector embeddings and <strong>do not store the associated objects that generate these vector embeddings</strong>. Therefore, unlike vector databases, vector libraries do not support CRUD (Create, Read, Update, Delete) operations. This means adding new documents to existing indexes in vector libraries like FAISS or ANNOY can be difficult. HNSWLib is an exception, as it has CRUD functionality and uniquely supports concurrent read and write operations. However, it also suffers from the limitations of being a vector library, such as not providing deployment ecosystems, replication capabilities, and fault tolerance.</p>
<h2 id="full-text-search-databases-elasticsearch-opensearch" class="heading-element"><span>2 Full-Text Search Databases (ElasticSearch, OpenSearch)</span>
  <a href="#full-text-search-databases-elasticsearch-opensearch" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Full-text search databases (e.g., ElasticSearch and OpenSearch) can support comprehensive text retrieval and advanced analysis functions. However, when it comes to performing vector similarity searches and handling high-dimensional data, they are not as strong as specialized vector databases. These databases often need to be used in conjunction with other tools to achieve semantic search, as they mainly rely on inverted indexes rather than vector indexes. According to Qdrant&rsquo;s test results, Elasticsearch lags in performance compared to vector databases like Weaviate, Milvus, and Qdrant.</p>
<h2 id="sql-databases-supporting-vectors-pgvector-supabase-starrocks" class="heading-element"><span>3 SQL Databases Supporting Vectors (pgvector, Supabase, StarRocks)</span>
  <a href="#sql-databases-supporting-vectors-pgvector-supabase-starrocks" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>SQL databases like pgvector provide a way to integrate vector data into existing data storage systems through their vector support extensions, but they also have some obvious drawbacks compared to dedicated vector databases.</p>
<p>The most obvious drawback is the mismatch between the relational model of traditional SQL databases and the nature of unstructured vector data. This mismatch leads to inefficient operations involving vector similarity searches, and these databases do not perform well in building indexes and handling large amounts of vector data, as detailed in the <a href="https://github.com/erikbern/ann-benchmarks"target="_blank" rel="external nofollow noopener noreferrer">ANN benchmarks<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>. Additionally, the upper limit of vector dimensions supported by pgvector (2000 dimensions) is lower compared to dedicated vector databases like Weaviate, which can handle up to 65535-dimensional vector data. In terms of scalability and efficiency, dedicated vector databases also have more advantages. SQL database extensions supporting vectors, such as pgvector, are more suitable for scenarios where the amount of vector data is small (fewer than 100,000 vectors) and vector data is only a supplementary function of the application. Conversely, if vector data is the core of the application or if there are high requirements for scalability, dedicated vector databases would be a more suitable choice.</p>
<p>As for StarRocks, it is another system running on the SQL framework, optimized for online analytical processing (OLAP) and online transaction processing (OLTP) scenarios, but not specifically optimized for vector similarity searches.</p>
<h2 id="nosql-databases-supporting-vectors-redis-mongodb" class="heading-element"><span>4 NoSQL Databases Supporting Vectors (Redis, MongoDB)</span>
  <a href="#nosql-databases-supporting-vectors-redis-mongodb" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>The newly added vector support features in NoSQL databases are still in the early stages and have not been fully tested and verified. Taking Redis Vector Similarity Search (VSS) as an example, this feature was only released in April 2022, less than two years ago. Although Redis VSS can serve as a multifunctional database, it is not optimized and designed specifically for vector similarity search.</p>
<h2 id="specialized-vector-databases-pinecone-milvus-weaviate-qdrant-vald-chroma-vespa-vearch" class="heading-element"><span>5 Specialized Vector Databases (Pinecone, Milvus, Weaviate, Qdrant, Vald, Chroma, Vespa, Vearch)</span>
  <a href="#specialized-vector-databases-pinecone-milvus-weaviate-qdrant-vald-chroma-vespa-vearch" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Specialized vector databases inherently support various vector operations, such as dot product, cosine similarity, etc. These databases are designed to handle high-dimensional data, capable of handling a large number of query requests, and can quickly complete similarity searches between vectors. To achieve these goals, they employ various indexing strategies, usually based on approximate nearest neighbor (ANN) algorithms. These algorithms need to balance efficiency, storage space usage, and search accuracy. For example, the FLAT index is a vector index that does not use any optimization or approximation techniques, meaning it can achieve 100% recall and accuracy, but it is slower and less efficient than other types of vector indexes; relatively speaking, the IVF_FLAT index sacrifices some accuracy for faster search speed; the HNSW index provides a compromise between accuracy and search speed.</p>
<p>Pinecone is a closed-source vector database maintained by a professional team, with limited scalability features in its free version. Chroma is a system specifically designed for audio data, but it has not been specially optimized for handling text data. Compared to other mainstream vector databases, Chroma has relatively scarce comprehensive performance benchmark data. Since Chroma uses SQLite as a document storage method in its 0.4 version, it may not be as scalable and efficient as other storage solutions specifically designed for vector data.</p>
<p>Vearch and Vald have shortcomings in integration with Langchain, which is very unfavorable for development use. Compared to competitors like Milvus, their developer community is smaller, and the maintenance of the open-source community is not active enough.</p>
<p>Therefore, for RAG, Weaviate, Milvus, Qdrant, and Vespa may be the best choices. In theory, the most suitable system should be selected based on performance and scalability benchmarks (see ANN Benchmarks below). However, there are also some system design and feature characteristics that need to be compared. The table below provides a visual comparison from these aspects.</p>
<table>
  <thead>
      <tr>
          <th>Database</th>
          <th>Qdrant</th>
          <th>Weaviate</th>
          <th>Milvus</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Open-source and self-hostable</td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td>Open-source license</td>
          <td>Apache-2.0</td>
          <td>BSD</td>
          <td>Apache-2.0</td>
      </tr>
      <tr>
          <td>Development language</td>
          <td>Rust</td>
          <td>Go</td>
          <td>Go, C++</td>
      </tr>
      <tr>
          <td>Github Stars</td>
          <td>17k</td>
          <td>9.2k</td>
          <td>26.2k</td>
      </tr>
      <tr>
          <td>First release date</td>
          <td>2021</td>
          <td>2019</td>
          <td>2019</td>
      </tr>
      <tr>
          <td>SDK</td>
          <td>Python, JS, Go, Java, .Net, Rust</td>
          <td>Python, JS, Java, Go</td>
          <td>Python, Java, JS, Go</td>
      </tr>
      <tr>
          <td>Hosted cloud service</td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td>Built-in text embedding</td>
          <td><a href="https://qdrant.github.io/fastembed/examples/Supported_Models/">✅FastEmbed</a></td>
          <td>✅</td>
          <td>❌</td>
      </tr>
      <tr>
          <td>Hybrid retrieval</td>
          <td>❌</td>
          <td>✅RRF*+RSF*</td>
          <td><a href="https://milvus.io/docs/multi-vector-search.md"> ✅In-table multi-vector hybrid</a></td>
      </tr>
      <tr>
          <td>Metadata filtering</td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td>BM25 support</td>
          <td>❌</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td>Text search</td>
          <td>✅</td>
          <td>✅</td>
          <td>❌</td>
      </tr>
      <tr>
          <td>Single-point multi-vector</td>
          <td>✅</td>
          <td>✅</td>
          <td></td>
      </tr>
      <tr>
          <td>Tensor search</td>
          <td>❌</td>
          <td>❌</td>
          <td>❌</td>
      </tr>
      <tr>
          <td>Langchain integration</td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td>Llama index integration</td>
          <td>✅</td>
          <td>✅</td>
          <td>✅</td>
      </tr>
      <tr>
          <td>Geo-geographic information search</td>
          <td>✅</td>
          <td>✅</td>
          <td>❌</td>
      </tr>
      <tr>
          <td>Multi-tenant support</td>
          <td>✅ via collections/metadata</td>
          <td>✅</td>
          <td></td>
      </tr>
      <tr>
          <td>Metadata and document size limit</td>
          <td>Unlimited</td>
          <td></td>
          <td></td>
      </tr>
      <tr>
          <td>Maximum dimension</td>
          <td>Unlimited</td>
          <td>65535</td>
          <td>32768</td>
      </tr>
      <tr>
          <td>Index types</td>
          <td>HNSW</td>
          <td>HNSW</td>
          <td>ANNOY, FAISS, HNSW, ScANN …</td>
      </tr>
      <tr>
          <td>Streaming index</td>
          <td>❌</td>
          <td></td>
          <td></td>
      </tr>
      <tr>
          <td>Sparse vector support</td>
          <td>❌</td>
          <td>❌</td>
          <td>❌</td>
      </tr>
      <tr>
          <td>Temporary index support (excluding server)</td>
          <td>✅</td>
          <td>❌</td>
          <td></td>
      </tr>
      <tr>
          <td>Sharding</td>
          <td></td>
          <td></td>
          <td></td>
      </tr>
      <tr>
          <td>Price</td>
          <td></td>
          <td></td>
          <td></td>
      </tr>
      <tr>
          <td>Facets (aggregation with counts)</td>
          <td>❌</td>
          <td>✅</td>
          <td></td>
      </tr>
      <tr>
          <td>Built-in image embedding</td>
          <td></td>
          <td>✅</td>
          <td></td>
      </tr>
      <tr>
          <td>Recommendation API</td>
          <td>✅</td>
          <td></td>
          <td></td>
      </tr>
      <tr>
          <td>Personalization</td>
          <td></td>
          <td></td>
          <td></td>
      </tr>
      <tr>
          <td>User events</td>
          <td></td>
          <td></td>
          <td></td>
      </tr>
      <tr>
          <td>Call built-in LLM for RAG</td>
          <td></td>
          <td><a href="https://weaviate.io/developers/weaviate/modules/reader-generator-modules/generative-openai">✅Generative Search </a></td>
          <td></td>
      </tr>
  </tbody>
</table>
<table>
  <thead>
      <tr>
          <th>Database</th>
          <th>Qdrant</th>
          <th>Weaviate</th>
          <th>Milvus</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Subjective advantages</td>
          <td>1. Can store multiple types of vectors (images, text, etc.) in one collection<br>2. Very low resource usage</td>
          <td>1. Relatively good performance<br>2. Supports built-in embedding<br>3. Supports text search<br>4. GraphQL API<br>5. Supports S3 backup</td>
          <td>1. Officially supported visual operation interface<br>2. High search accuracy<br>3. Rich SDK<br>4. GPU acceleration</td>
      </tr>
  </tbody>
</table>
<p>In summary, Qdrant has particularly low overhead, <strong>Weaviate supports a combination of vector search, object storage, and inverted index</strong>, and Milvus has the strongest performance and the most features.</p>
<h2 id="comparison-of-search-methods-in-vector-databases" class="heading-element"><span>6 Comparison of Search Methods in Vector Databases</span>
  <a href="#comparison-of-search-methods-in-vector-databases" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><div align="center"><img src="https://cdn1.zair.top/images/2024/04/15d0b710c084965f99586dceccb62898.png" alt="Comparison of Search Methods in Vector Databases" style="width: 75%;"></div>
<table>
  <thead>
      <tr>
          <th></th>
          <th>Milvus</th>
          <th>Weaviate</th>
          <th>Qdrant</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Unique search methods</td>
          <td>Multi-vector search</td>
          <td>BM25 keyword search + hybrid search</td>
          <td>Keyword filtering applied to vector search</td>
      </tr>
  </tbody>
</table>
<h3 id="milvus" class="heading-element"><span>6.1 Milvus</span>
  <a href="#milvus" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>Milvus supports two types of searches, depending on the number of vector fields in the collection: single-vector search and multi-vector search.</p>
<p>Single-vector search uses the search() method, comparing the query vector with existing vectors in the collection, returning the IDs of the most similar entities and their distances, and optionally returning the vector values and metadata of the results.</p>
<p>Multi-vector search applies to collections with two or more vector fields and is executed through the hybrid_search() method, which performs multiple approximate nearest neighbor (ANN) search requests and combines the results for re-ranking to return the most relevant matches. (Supported only in the latest 2.4.x version, with a maximum of 10 vectors for search)</p>
<p><strong>Multi-vector search</strong> is particularly suitable for complex situations requiring high precision, especially when an entity can be represented by multiple different vectors. This applies to the same data (e.g., a sentence) processed by different embedding models or when multimodal information (e.g., a person&rsquo;s image, fingerprint, and voiceprint) is converted into various vector formats. By performing &ldquo;multi-path recall&rdquo; across the table and assigning weights to these vectors, their combined effect can significantly increase recall capability and improve the effectiveness of search results.</p>
<p>Other basic search operations:</p>
<ul>
<li>Basic searches include single-vector search, batch vector search, partition search, and searches with specified output fields.</li>
<li>Filtered search refines search results based on filtering conditions of scalar fields.</li>
<li>Range search finds vectors within a specific distance range from the query vector.</li>
<li>Grouped search groups search results based on specific fields to ensure diversity in the results.</li>
</ul>
<h3 id="weaviate" class="heading-element"><span>6.2 Weaviate</span>
  <a href="#weaviate" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li>Vector similarity search: Covers a range of approximate search methods, seeking objects most similar to the query vector representation.</li>
<li>Image search: Uses images as input for similarity search.</li>
<li><strong>Keyword search: A keyword search using the BM25F algorithm to rank results.</strong></li>
<li><strong>Hybrid search: Combines BM25 and similarity search to rank results.</strong></li>
<li>Generative search: Uses search results as prompts for LLM.</li>
<li>Re-ranking: Re-ranks retrieved search results using a re-ranking module.</li>
<li>Aggregation: Aggregates data from the result set.</li>
<li>Filters: Applies conditional filters to searches.</li>
</ul>
<h3 id="qdrant" class="heading-element"><span>6.3 Qdrant</span>
  <a href="#qdrant" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>Supported basic search operations:</p>
<ul>
<li>Filtering by relevance score</li>
<li>Loading multiple search operations in a single request</li>
<li>Recommendation API</li>
<li>Grouping operations</li>
</ul>
<p>Other search methods supported by Qdrant:</p>
<blockquote>
<p><a href="https://qdrant.tech/documentation/faq/qdrant-fundamentals/#does-qdrant-support-a-full-text-search-or-a-hybrid-search"target="_blank" rel="external nofollow noopener noreferrer">Does Qdrant support a full-text search or a hybrid search?<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></p>
</blockquote>
<blockquote>
<p>Qdrant is primarily a vector search engine, and we only implement full-text support when it does not affect vector search use cases. This includes interfaces and performance.</p>
<p>What Qdrant can do:</p>
<ul>
<li>Use full-text filters for search</li>
<li><strong>Apply full-text filters to vector searches</strong> (i.e., perform vector searches within records containing specific words or phrases)</li>
<li>Perform prefix search and semantic instant search</li>
</ul>
<p>Features Qdrant plans to introduce in the future:</p>
<ul>
<li>Support for sparse vectors, such as those used in SPLADE or similar models</li>
</ul>
<p>Features Qdrant does not intend to support:</p>
<ul>
<li>BM25 or other non-vector-based retrieval or ranking functions</li>
<li>Built-in ontologies or knowledge graphs</li>
<li>Query analyzers and other NLP tools</li>
</ul>
</blockquote>
<div class="details admonition question open">
  <div class="details-summary admonition-title"><i class="icon fa-fw fa-regular fa-circle-question" aria-hidden="true"></i>What is the difference between BM25 and simple keyword search?<i class="details-icon fa-solid fa-angle-right fa-fw" aria-hidden="true"></i></div>
  <div class="details-content">
    <div class="admonition-content"><ol>
<li><strong>Relevance Scoring</strong>:
<ul>
<li><strong>Simple keyword search</strong> is usually based on term frequency: if a term appears in a document, then the document is considered relevant. This method may only count the occurrence of keywords, and all keywords are considered equally important.</li>
<li><strong>BM25</strong> uses a more complex algorithm that considers term frequency, document length, and the inverse document frequency of the term (i.e., its rarity across all documents). This means BM25 can provide a more refined relevance score, better reflecting the match between the query and the document.</li>
</ul>
</li>
<li><strong>Document Length Handling</strong>:
<ul>
<li>Simple keyword search may not consider the length of the document. This may lead to longer documents (containing more words) being overly prioritized simply because they have more opportunities to contain the keywords.</li>
<li>BM25 considers the length of the document through a normalization process within its algorithm, avoiding this bias and ensuring fairness in relevance scoring for both long and short documents.</li>
</ul>
</li>
<li><strong>Importance of Query Terms</strong>:
<ul>
<li>In simple keyword search, all keywords may be treated equally, regardless of their commonality.</li>
<li>BM25 uses inverse document frequency (IDF) to adjust the importance of each query term. This means terms that appear in fewer documents (more unique terms) will have a greater impact on the document&rsquo;s relevance score.</li>
</ul>
</li>
<li><strong>Parameter Tuning</strong>:
<ul>
<li>Simple keyword search usually does not have many configurable parameters to optimize search results.</li>
<li>BM25 provides parameters (such as k1 and b) that allow fine-tuning of the algorithm&rsquo;s sensitivity to suit different types of text and search needs.</li>
</ul>
</li>
</ol>
<p>Compared to simple keyword search, BM25 offers a more complex and refined method for evaluating the relevance between documents and queries, capable of producing more accurate and user-expected search results.</p></div>
  </div>
</div>
<p>The current dilemma is whether there is a solution that can achieve both the semantic search characteristics of vector databases and the precision characteristics of traditional keyword search.</p>
<h2 id="appendix" class="heading-element"><span>7 Appendix</span>
  <a href="#appendix" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><h3 id="ann-benchmarks" class="heading-element"><span>7.1 ANN Benchmarks</span>
  <a href="#ann-benchmarks" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><blockquote>
<p>Benchmarks are influenced by various factors affecting database performance, such as search type (filtered search or regular search), configuration settings, indexing algorithms, data embeddings, hardware, etc. In addition to the performance of benchmark tests, the selection of vector libraries should also consider distributed capabilities, support for memory replicas and caching, adopted indexing algorithms, vector similarity search capabilities (including hybrid search, filtering, and multiple similarity metrics), sharding mechanisms, clustering methods, scalability potential, data consistency, and overall system availability.</p>
</blockquote>
<p><a href="https://github.com/erikbern/ann-benchmarks"target="_blank" rel="external nofollow noopener noreferrer">ANN-Benchmarks<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a> is a primary benchmarking platform for evaluating the performance of approximate nearest neighbor search algorithms. In text retrieval, the performance of vector databases on angular metrics is often more important than their performance on Euclidean metrics. This is because angular metrics are more sensitive to the semantic similarity of text documents, while Euclidean metrics are more sensitive to document length and scale. Therefore, when considering the context of retrieval-enhanced generation, more attention should be paid to evaluating the performance of vector databases on angular datasets across different dimensions.</p>
<h4 id="glove-100-angular" class="heading-element"><span>7.1.1 glove-100-angular</span>
  <a href="#glove-100-angular" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><div align="center"><img src="https://cdn1.zair.top/images/2024/04/77413edaa99720289808ab7dd1da27cc.png" alt="glove-100-angular benchmark" style="width: 75%;"></div>
Evidently, Milvus has the highest throughput when the recall value is below 0.95. When the recall value exceeds 0.95, the throughput gap narrows. Vespa has the longest build time. Weaviate and Milvus have comparable build times, but Milvus is slightly longer. In terms of index size, Weaviate's index is the smallest. Although Milvus's index is the largest, it is still less than 1.5GB (for a dataset containing 1.2 million vectors, each with 100 dimensions).
<h4 id="nytimes-256-angular" class="heading-element"><span>7.1.2 nytimes-256-angular</span>
  <a href="#nytimes-256-angular" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><div align="center"><img src="https://cdn1.zair.top/images/2024/04/d2ef6ccb67567ccf48793c7df8520f65.png" alt="nytimes-256-angular benchmark" style="width: 75%;"></div>
The results on this dataset are similar to those on the glove-100-angular dataset. Weaviate has the longest build time and the smallest index on this dataset. Milvus's index is the largest, but it is only 440MB (for a dataset containing 290,000 vectors, each with 256 dimensions).
<h3 id="vector-similarity-metrics" class="heading-element"><span>7.2 Vector Similarity Metrics</span>
  <a href="#vector-similarity-metrics" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><table>
  <thead>
      <tr>
          <th>Metric</th>
          <th>Description</th>
          <th>Supported Databases</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cosine Distance</td>
          <td>Measures the cosine of the angle between two vectors</td>
          <td>pgvector, Pinecone, Weaviate, Qdrant, Milvus, Vespa</td>
      </tr>
      <tr>
          <td>Euclidean Distance (L2)</td>
          <td>Calculates the straight-line distance between two vectors in multidimensional space</td>
          <td>pgvector, Pinecone, Qdrant, Milvus, Vespa</td>
      </tr>
      <tr>
          <td>Inner Product (Dot Product)</td>
          <td>Calculates the sum of the products of corresponding vector components</td>
          <td>pgvector, Pinecone, Weaviate, Qdrant, Milvus</td>
      </tr>
      <tr>
          <td>L2 Squared Distance</td>
          <td>The square of the Euclidean distance between two vectors</td>
          <td>Weaviate</td>
      </tr>
      <tr>
          <td>Hamming Distance</td>
          <td>Measures the number of differences between vectors in each dimension</td>
          <td>Weaviate, Milvus, Vespa</td>
      </tr>
      <tr>
          <td>Manhattan Distance</td>
          <td>Measures the distance between vector dimensions along right-angle axes</td>
          <td>Weaviate</td>
      </tr>
  </tbody>
</table>
<p>Below is a detailed introduction to each metric, including their relative advantages, disadvantages, and suitable use cases.</p>
<h4 id="cosine-distance" class="heading-element"><span>7.2.1 Cosine Distance</span>
  <a href="#cosine-distance" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><p>Cosine distance measures the cosine of the angle between two vectors, commonly used for handling normalized or convex sets.</p>
<ul>
<li>Advantages: Mainly considers the direction of vectors, making it very suitable for high-dimensional spaces, such as text comparison, where document length is less important.</li>
<li>Disadvantages: Not suitable for scenarios requiring matching vector dimensions, such as comparing image embeddings based on pixel density. If the data does not form a convex set, it may not provide an accurate similarity measure.</li>
</ul>
<p>Cosine distance is suitable for document classification, semantic search, recommendation systems, and any other tasks involving high-dimensional and standardized data. In information retrieval, cosine distance is often used to measure the similarity between query content and document vectors, ignoring their length but focusing on semantic meaning.</p>
<h4 id="euclidean-distance-l2" class="heading-element"><span>7.2.2 Euclidean Distance L2</span>
  <a href="#euclidean-distance-l2" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><p>Euclidean distance calculates the straight-line distance between two vectors in multidimensional space, also known as the second norm.</p>
<ul>
<li>Advantages: Intuitive, easy to calculate, sensitive to both the size and direction of vectors.</li>
<li>Disadvantages: May perform poorly in high-dimensional spaces due to the &ldquo;curse of dimensionality.&rdquo;</li>
</ul>
<p>Suitable for image recognition, speech recognition, handwriting analysis, and other scenarios.</p>
<h4 id="inner-product" class="heading-element"><span>7.2.3 Inner Product</span>
  <a href="#inner-product" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><p>Inner product calculates the sum of the products of corresponding vector components, also known as the nth norm.</p>
<ul>
<li>Advantages: Fast calculation, reflects the size and direction of vectors.</li>
<li>Disadvantages: Sensitive to both the direction and size of vectors.</li>
</ul>
<p>The classic application of inner product is in the field of recommendation systems. In recommendation systems, the inner product can be used to determine the similarity between user vectors and item vectors, helping predict a user&rsquo;s interest in an item. Inner product is suitable for recommendation systems, collaborative filtering, and matrix decomposition.</p>
<h4 id="l2-squared-distance" class="heading-element"><span>7.2.4 L2 Squared Distance</span>
  <a href="#l2-squared-distance" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><p>The square of the Euclidean distance between two vectors.</p>
<ul>
<li>Advantages: Penalizes large differences between vector elements, which can be useful in some situations.</li>
<li>Disadvantages: The square operation may distort distances and is sensitive to outliers.</li>
</ul>
<p>L2 squared distance is particularly suitable for problems involving differences in individual dimensions, such as comparing the differences between two images in image processing.</p>
<h4 id="hamming-distance" class="heading-element"><span>7.2.5 Hamming Distance</span>
  <a href="#hamming-distance" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><p>Measures the number of differences between vectors in each dimension.</p>
<ul>
<li>Advantages: Suitable for comparing binary or categorical data.</li>
<li>Disadvantages: Not applicable to continuous or numerical data.</li>
</ul>
<p>The applicable scenarios are also quite specific, such as error detection and correction (categorical data); measuring the genetic distance between two DNA strands.</p>
<h4 id="manhattan-distance-l1" class="heading-element"><span>7.2.6 Manhattan Distance L1</span>
  <a href="#manhattan-distance-l1" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><p>Measures the distance between vector dimensions along right-angle axes, also known as the first norm.</p>
<ul>
<li>Advantages: More resistant to outliers than Euclidean distance.</li>
<li>Disadvantages: Less intuitive in geometric terms compared to Euclidean distance.</li>
</ul>
<p>Suitable for calculating chessboard distance and shortest path problems in logistics planning.</p>
<h2 id="references" class="heading-element"><span>8 References</span>
  <a href="#references" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><ol>
<li><a href="https://github.com/milvus-io/milvus"target="_blank" rel="external nofollow noopener noreferrer">https://github.com/milvus-io/milvus<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://www.farfetchtechblog.com/en/blog/post/powering-ai-with-vector-databases-a-benchmark-part-i/"target="_blank" rel="external nofollow noopener noreferrer">Powering AI With Vector Databases: A Benchmark - Part I - Data - Blog - F-Tech (farfetchtechblog.com)<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://qdrant.tech/documentation/faq/qdrant-fundamentals/"target="_blank" rel="external nofollow noopener noreferrer">Fundamentals - Qdrant<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://milvus.io/docs"target="_blank" rel="external nofollow noopener noreferrer">Milvus documentation<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://weaviate.io/developers/weaviate"target="_blank" rel="external nofollow noopener noreferrer">Home | Weaviate - Vector Database<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://qdrant.tech/documentation/"target="_blank" rel="external nofollow noopener noreferrer">Qdrant Documentation - Qdrant<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://qdrant.tech/use-cases/"target="_blank" rel="external nofollow noopener noreferrer">Vector Database Use Cases - Qdrant<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
</ol>
]]></description>
    </item>
    <item>
      <title>From AI Knowledge Base to RAG</title>
      <link>https://www.zair.top/en/post/rag-knowledge-base/</link>
      <pubDate>Thu, 21 Mar 2024 17:14:22 +0800</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/rag-knowledge-base/</guid>
      <category domain="https://www.zair.top/en/categories/llm/">Large Language Models</category>
      <description><![CDATA[<p>When building AI applications, there is the problem of &ldquo;AI not having seen the data in the task.&rdquo; For example, for enterprises, AI cannot grasp the information of every customer; for individuals, AI is not well aware of some personal information and privacy information. Even if AI is very capable (the ideal world model is no exception), without data for specific tasks, it loses the ability to &ldquo;analyze specific problems specifically.&rdquo;</p>
<h2 id="what-is-rag" class="heading-element"><span>1 What is RAG</span>
  <a href="#what-is-rag" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Improving the accuracy and reliability of generative AI models by retrieving external information is Retrieval-Augmented Generation (<strong>R</strong>etrieval-<strong>A</strong>ugmented <strong>G</strong>eneration). If the process of a large language model (LLM) completing a task is compared to an exam, then a large model with RAG is equivalent to an open-book exam, while without RAG, it is like a closed-book exam. RAG is a technology that helps LLMs <strong>retrieve</strong> information to improve <strong>generation</strong> results.</p>
<p>RAG was first proposed by Patrick Lewis and others in <a href="https://arxiv.org/pdf/2005.11401.pdf"target="_blank" rel="external nofollow noopener noreferrer">this paper<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>, and the company they worked for is Cohere, which currently provides API services including Embedding and Rerank models with good performance.</p>
<h2 id="why-rag-is-needed" class="heading-element"><span>2 Why RAG is needed</span>
  <a href="#why-rag-is-needed" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>The emergence of RAG is to solve some problems and deficiencies of large language models in applications. The most prominent point is the <strong>hallucination problem of large models</strong>, where the output of large models does not match facts or fabricates some answers. Also, the data used to train LLMs may be outdated, and LLMs know nothing about relatively new information.</p>
<p>RAG allows LLMs to access the latest or customized information and allows users to verify the information sources of LLMs to ensure their accuracy. The data retrieved by RAG can be public (such as search engines) or private (such as company information, personal sensitive data), which gives RAG broad application prospects. RAG is already widely used, such as Nvidia&rsquo;s NeMo Retriever reading internal company information, and Kimi Chat from the Dark Side of the Moon using search engines to answer questions.</p>
<div align="center"><img src="https://cdn1.zair.top/images/2024%2F03%2Fa4c4933267645b81ed019e29ccc8ced6.png" alt="Huang Renxun introducing NeMo Retriever at GTC2024" style="width: 75%;"></div>
Huang Renxun introducing NeMo Retriever at GTC2024
<h2 id="knowledge-base-built-around-rag" class="heading-element"><span>3 Knowledge Base Built Around RAG</span>
  <a href="#knowledge-base-built-around-rag" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>AI knowledge bases are important tools that allow AI to &ldquo;tailor to fit.&rdquo; By helping AI better complete tasks through knowledge bases, the current construction of AI knowledge bases can be done in the following three ways:</p>
<ul>
<li>Prompt Engineering</li>
<li>Fine Tuning</li>
<li>Embedding</li>
</ul>
<p>Prompt engineering is to directly build a knowledge base in the prompt, putting all the information into the prompt. This method is suitable for small-scale use, but the number of tokens that current AI models can input basically cannot meet this implementation method. In fact, even as AI develops, one day when AI&rsquo;s input window is large enough to accommodate a general knowledge base, building a knowledge base will still have its value. Because the length of the input content will affect AI&rsquo;s performance (<del>at least the current models are like this</del>), you can check <a href="https://github.com/gkamradt/LLMTest_NeedleInAHaystack"target="_blank" rel="external nofollow noopener noreferrer">Needle In A Haystack - Pressure Testing LLMs<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a> for details.</p>
<p>Fine-tuning is a form that is popular in academia, using specific task data to fine-tune on pre-trained models. This approach is actually suitable for making an industry-general large model, such as a legal industry large model, a medical large model, etc. On one hand, the training data required for fine-tuning is not small, and the cost is high; on the other hand, fine-tuning is not flexible enough, such as timely adjustments based on one or two documents. The process of fine-tuning is actually learning and generalizing the training data, rather than memorizing the content, it is more about enhancing the ability in a certain field.</p>
<p>So the most mainstream way to build a knowledge base currently is mostly using the Embedding method. And this form of knowledge base also needs to be combined with RAG to be effective.</p>
<h2 id="basic-components-of-rag" class="heading-element"><span>4 Basic Components of RAG</span>
  <a href="#basic-components-of-rag" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>A classic, basic RAG composition is shown in the figure below.</p>
<div align="center"><img src="https://cdn1.zair.top/images/2024/03/rag-base.drawio.svg" alt="Basic Components of RAG" style="width: 75%;"></div>
The RAG system mainly includes three stages: indexing, retrieval, and generation.
<h3 id="embedding" class="heading-element"><span>4.1 Embedding</span>
  <a href="#embedding" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>In this process, users need to upload documents first, and the system stores the uploaded documents in a vector database after embedding. Embedding is to convert semantically similar texts into vectors that are close in distance, so this process is commonly known as vectorization.</p>
<h3 id="retrieval" class="heading-element"><span>4.2 Retrieval</span>
  <a href="#retrieval" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>When users ask LLMs questions, the content of the question will be embedded and then matched in the vector database, querying a series of content. This is the first stage of retrieval.</p>
<h3 id="rerank" class="heading-element"><span>4.3 Rerank</span>
  <a href="#rerank" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>The content directly queried in the vector database may not be perfect, and the results often do not match the query content, so a second stage of retrieval is needed, which is Rerank. In this stage, the Rerank model will reorder the content queried in the previous stage and output the results according to relevance. After Rerank is completed, taking the Top K can be applied in the subsequent generation stage.</p>
<h2 id="implementing-rag-in-5-lines-of-code" class="heading-element"><span>5 Implementing RAG in 5 Lines of Code</span>
  <a href="#implementing-rag-in-5-lines-of-code" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><blockquote>
<p><del>An assignment statement counts as one line</del></p>
</blockquote>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">transformers</span> <span class="kn">import</span> <span class="n">RagTokenizer</span><span class="p">,</span> <span class="n">RagRetriever</span><span class="p">,</span> <span class="n">RagTokenForGeneration</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">datasets</span> <span class="kn">import</span> <span class="n">load_dataset</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">retriever</span> <span class="o">=</span> <span class="n">RagRetriever</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">config_name</span> <span class="o">=</span> <span class="s2">&#34;facebook/rag-token-nq&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">index_name</span> <span class="o">=</span> <span class="s2">&#34;compressed&#34;</span><span class="p">,</span>  <span class="c1"># Type of index</span>
</span></span><span class="line"><span class="cl">    <span class="n">use_dummy_dataset</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span>  <span class="c1"># Use a dummy dataset for testing</span>
</span></span><span class="line"><span class="cl">    <span class="n">dataset</span> <span class="o">=</span> <span class="s2">&#34;wiki_dpr&#34;</span>  <span class="c1"># Dataset used for retrieval</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Load the pre-trained tokenizer and model</span>
</span></span><span class="line"><span class="cl"><span class="n">tokenizer</span> <span class="o">=</span> <span class="n">RagTokenizer</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="s2">&#34;facebook/rag-token-nq&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">model</span> <span class="o">=</span> <span class="n">RagTokenForGeneration</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span><span class="s2">&#34;facebook/rag-token-nq&#34;</span><span class="p">,</span> <span class="n">retriever</span><span class="o">=</span><span class="n">retriever</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Input question and convert to vector</span>
</span></span><span class="line"><span class="cl"><span class="n">input_ids</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="p">(</span><span class="n">input_text</span> <span class="o">=</span> <span class="s2">&#34;What is the capital of France?&#34;</span><span class="p">,</span> 
</span></span><span class="line"><span class="cl">		<span class="n">return_tensors</span><span class="o">=</span><span class="s2">&#34;pt&#34;</span><span class="p">)</span><span class="o">.</span><span class="n">input_ids</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Generate answer</span>
</span></span><span class="line"><span class="cl"><span class="n">outputs</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">generate</span><span class="p">(</span><span class="n">input_ids</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">outputs</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">skip_special_tokens</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span></span></span></code></pre></td></tr></table>
</div>
</div><p><code>RagTokenizer</code> is used for tokenizing text, <code>RagTokenForGeneration</code> is the generator part of the RAG model, and <code>RagRetriever</code> is responsible for retrieval. <code>RagTokenizer.from_pretrained(&quot;facebook/rag-token-nq&quot;)</code> loads a pre-trained tokenizer to convert text into a format that the model can understand (i.e., tokenization). <code>RagTokenForGeneration.from_pretrained(&quot;facebook/rag-token-nq&quot;, retriever=retriever)</code> loads a pre-trained RAG model. <code>facebook/rag-token-nq</code> is the name of the model and tokenizer, which are pre-trained on the Natural Questions dataset.</p>
<h2 id="open-source-rag-implementations" class="heading-element"><span>6 Open-source RAG Implementations</span>
  <a href="#open-source-rag-implementations" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p><a href="https://github.com/langgenius/dify"target="_blank" rel="external nofollow noopener noreferrer">Dify<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a> is an LLM application development platform, with over 100,000 applications built based on <a href="http://dify.ai/"target="_blank" rel="external nofollow noopener noreferrer">Dify.AI<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>. It integrates the concept of Backend as Service and LLMOps, covering the core technology stack needed to build generative AI native applications, <strong>including a built-in RAG engine</strong>. With Dify, you can deploy capabilities similar to Assistants API and GPTs based on any model. This project is hosted by a company in Suzhou and provides SaaS services.</p>
<p><a href="https://github.com/chatchat-space/Langchain-Chatchat"target="_blank" rel="external nofollow noopener noreferrer">Langchain-Chatchat<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a> is an open-source, offline deployable retrieval-augmented generation (RAG) large model knowledge base project based on large language models like ChatGLM and application frameworks like Langchain. Initially, it only supported the ChatGLM model, but later added support for many open-source models and online models.</p>
<p>The functional comparison of the two is shown in the table below:</p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>Dify-api</th>
          <th>ChatChat</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Peripheral Capabilities</td>
          <td>General Document Reading</td>
          <td>General Document <br> Image OCR</td>
      </tr>
      <tr>
          <td>Data Sources</td>
          <td>Document Text Content <br> Vector Database</td>
          <td>Search Engine <br> Vector Database</td>
      </tr>
      <tr>
          <td>Model Support</td>
          <td>Online Embedding Model <br> Online Rerank Model <br> Online LLM</td>
          <td>Online Embedding Model <br> Offline Embedding Model <br> Offline LLM</td>
      </tr>
      <tr>
          <td>Advanced Features</td>
          <td>ES Hybrid Retrieval</td>
          <td>None</td>
      </tr>
      <tr>
          <td>Advanced RAG</td>
          <td>Not Supported</td>
          <td>Not Supported</td>
      </tr>
  </tbody>
</table>
<p>In fact, there are some features that current open-source projects do not fully cover, such as:</p>
<ul>
<li>Multimodal Capabilities</li>
<li>Traditional Relational Database Support</li>
<li>Multi-database Joint/Cross-database Information Retrieval</li>
<li>Citation Function</li>
<li>Advanced RAG</li>
<li>Evaluation Metrics</li>
</ul>
<h2 id="references" class="heading-element"><span>7 References</span>
  <a href="#references" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><ol>
<li><a href="https://github.com/gkamradt/LLMTest_NeedleInAHaystack"target="_blank" rel="external nofollow noopener noreferrer">gkamradt/LLMTest_NeedleInAHaystack: Doing simple retrieval from LLM models at various context lengths to measure accuracy (github.com)<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://research.ibm.com/blog/retrieval-augmented-generation-RAG"target="_blank" rel="external nofollow noopener noreferrer">What is retrieval-augmented generation? | IBM Research Blog<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://www.langchain.com/retrieval"target="_blank" rel="external nofollow noopener noreferrer">Retrieval (langchain.com)<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://github.com/langgenius/dify"target="_blank" rel="external nofollow noopener noreferrer">langgenius/dify (github.com)<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
<li><a href="https://github.com/chatchat-space/Langchain-Chatchat"target="_blank" rel="external nofollow noopener noreferrer">Langchain-Chatchat (github.com)<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></li>
</ol>
]]></description>
    </item>
    <item>
      <title>stderr and stdout: Understanding Logs and Output</title>
      <link>https://www.zair.top/en/post/standard-output/</link>
      <pubDate>Sat, 17 Feb 2024 20:42:32 +0000</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/standard-output/</guid>
      <category domain="https://www.zair.top/en/categories/computer-technology/">Computer Technology</category>
      <category domain="https://www.zair.top/en/categories/coding/">Coding</category>
      <description><![CDATA[<p>Standard output (<code>stdout</code>) and standard error (<code>stderr</code>) are simple concepts but play a central role in logging, error handling, and data stream management. This article explores the differences and applications of <code>stdout</code> and <code>stderr</code>, especially how to use them effectively in a Python environment.</p>
<h2 id="standard-output-stdout-and-standard-error-stderr" class="heading-element"><span>1 Standard Output (<code>stdout</code>) and Standard Error (<code>stderr</code>)</span>
  <a href="#standard-output-stdout-and-standard-error-stderr" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In most operating systems, standard output and standard error are the two main output streams of a process. They provide a mechanism for a process to send information and error messages to a terminal or file. Although these two streams may be physically the same (e.g., both displayed on the same terminal interface), they are used for different logical purposes:</p>
<ul>
<li><strong>Standard Output (<code>stdout</code>)</strong>: Typically used for outputting the results of program execution or normal operational information.</li>
<li><strong>Standard Error (<code>stderr</code>)</strong>: Specifically used for outputting error messages or warnings, which usually need to be seen or recorded even when standard output is redirected.</li>
</ul>
<h2 id="print-and-logging-in-python" class="heading-element"><span>2 <code>print</code> and <code>logging</code> in Python</span>
  <a href="#print-and-logging-in-python" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In Python, the <code>print</code> function sends information to <code>stdout</code> by default, while the <code>logging</code> module sends log messages to <code>stderr</code> by default. This distinction is made to separate normal program output from logs (including error and debug information), making it easier for developers to manage and filter output information.</p>
<h3 id="using-print" class="heading-element"><span>2.1 Using <code>print</code></span>
  <a href="#using-print" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p><code>print</code> is the most basic output function in Python, used to send information to the standard output stream. It is simple and easy to use, suitable for quick debugging or displaying information to the user. For example:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Hello, world!&#34;</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><h3 id="using-logging" class="heading-element"><span>2.2 Using <code>logging</code></span>
  <a href="#using-logging" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>The <code>logging</code> module provides a flexible framework for adding log messages to an application. Unlike <code>print</code>, <code>logging</code> supports different log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL), allowing developers to adjust the detail level and output location of logs as needed. For example:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">logging</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s1">&#39;This is an error message&#39;</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><h2 id="tqdm-and-stderr" class="heading-element"><span>3 tqdm and <code>stderr</code></span>
  <a href="#tqdm-and-stderr" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In complex or long-running programs, using a progress bar is an effective way to show the progress of a process to the user. Python&rsquo;s <code>tqdm</code> library is a widely used tool for adding progress bars to the command line. <code>tqdm</code> outputs progress information to <code>stderr</code> by default to avoid interfering with normal program output (<code>stdout</code>).</p>
<h3 id="splitting-stdout-and-stderr" class="heading-element"><span>3.1 Splitting <code>stdout</code> and <code>stderr</code></span>
  <a href="#splitting-stdout-and-stderr" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>In some cases, it is useful to separate normal output from error or log messages, such as redirecting them to different files or terminals. On the command line, redirection operators <code>&gt;</code> and <code>2&gt;</code> can be used to achieve this. In Python code, finer control can be achieved by configuring the <code>logging</code> module or using specific file objects.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">python script.py &gt; output.log 2&gt; error.log</span></span></code></pre></td></tr></table>
</div>
</div><p>By using command line redirection, Python&rsquo;s <code>print</code> function, or even the <code>logging</code> module, you can flexibly control and split these two types of output, making error handling, logging, and user interaction clearer and more orderly.</p>
<h2 id="managing-stdout-and-stderr-with-nohup" class="heading-element"><span>4 Managing <code>stdout</code> and <code>stderr</code> with <code>nohup</code></span>
  <a href="#managing-stdout-and-stderr-with-nohup" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>When deploying long-running background processes, the <code>nohup</code> command becomes an important tool. <code>nohup</code>, or &ldquo;no hang up&rdquo;, allows commands to continue running after the user logs out, which is especially useful for remotely started tasks. A key feature of <code>nohup</code> is its ability to manage <code>stdout</code> and <code>stderr</code>.</p>
<p>By default, using <code>nohup</code> to run a command will merge <code>stdout</code> and <code>stderr</code> and redirect them to the <code>nohup.out</code> file unless otherwise specified. This means that both normal output and error messages will be captured in the same file for later review. However, in some cases, it may be more useful to separate these two outputs.</p>
<h3 id="separating-stdout-and-stderr-with-nohup" class="heading-element"><span>4.1 Separating <code>stdout</code> and <code>stderr</code> with <code>nohup</code></span>
  <a href="#separating-stdout-and-stderr-with-nohup" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>To output <code>stdout</code> and <code>stderr</code> to different files when using <code>nohup</code>, you can use redirection operators in combination. For example:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">nohup python script.py &gt; output.log 2&gt; error.log <span class="p">&amp;</span></span></span></code></pre></td></tr></table>
</div>
</div><p>This command redirects <code>stdout</code> to <code>output.log</code>, <code>stderr</code> to <code>error.log</code>, and runs in the background using <code>&amp;</code>. This way, even if the terminal or SSH session is closed, the program will continue to run, and its output will be properly recorded.</p>
<h2 id="buffering-behavior-in-python" class="heading-element"><span>5 Buffering Behavior in Python</span>
  <a href="#buffering-behavior-in-python" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p><code>stdout</code> and <code>stderr</code> behave differently when buffering data. By default, <code>stdout</code> is line-buffered when connected to a terminal, caching data until a newline character is received or the buffer is full; in non-interactive mode, <code>stdout</code> is block-buffered (like a file). <code>stderr</code>, however, is always line-buffered (before Python 3.9, it was block-buffered in non-interactive mode). The following content is from the official documentation <a href="https://docs.python.org/3.12/library/sys.html#sys.stdout"target="_blank" rel="external nofollow noopener noreferrer">sys &mdash; System-specific parameters and functions — Python 3.12.2 documentation<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></p>
<blockquote>
<p>When interactive, the <code>stdout</code> stream is line-buffered. Otherwise, it is block-buffered like regular text files. The <code>stderr</code> stream is line-buffered in both cases. You can make both streams unbuffered by passing the <code>[u](&lt;https://docs.python.org/3.12/using/cmdline.html#cmdoption-u&gt;)</code> command-line option or setting the <code>[PYTHONUNBUFFERED](&lt;https://docs.python.org/3.12/using/cmdline.html#envvar-PYTHONUNBUFFERED&gt;)</code> environment variable.</p>
<p><em>Changed in version 3.9:</em> Non-interactive <code>stderr</code> is now line-buffered instead of fully buffered.</p>
</blockquote>
<p>The smaller the buffering granularity, the more timely the output, but the greater the IO cost. In Python 3.8 and earlier, <code>stdout</code> and <code>stderr</code> had the same buffering granularity, which was not very reasonable; after version 3.9, <code>stderr</code> has a smaller buffering granularity, meaning each write operation&rsquo;s output is more timely than <code>stdout</code>. This difference makes <code>stderr</code> suitable for error and log information, ensuring that even if a program crashes or exits abnormally, this information has a higher priority than standard output.</p>
<p>In C++, standard error is unbuffered (see below), which is more aggressive, but I personally think this is more reasonable.</p>
<p>Fortunately, in Python, you can disable this buffering behavior using <code>python -u</code> or by setting the <code>PYTHONUNBUFFERED</code> environment variable, or directly control the output timing by operating <code>sys.stdout.flush()</code>.</p>
<h2 id="behavior-in-python-concurrent-environments" class="heading-element"><span>6 Behavior in Python Concurrent Environments</span>
  <a href="#behavior-in-python-concurrent-environments" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>When using <code>stdout</code> and <code>stderr</code> in multithreaded or multiprocess environments, output may interleave or become chaotic because output from different threads or processes may interfere with each other when writing to a terminal or file. One way to solve this problem is to create separate output files for each thread or process, or use thread locks (thread locks) or process synchronization mechanisms (such as <code>multiprocessing.Lock</code>) to synchronize access to <code>stdout</code> or <code>stderr</code>.</p>
<h2 id="controlling-stdout-and-stderr-in-python" class="heading-element"><span>7 Controlling <code>stdout</code> and <code>stderr</code> in Python</span>
  <a href="#controlling-stdout-and-stderr-in-python" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In complex applications, you may need more flexible control over the destination of output streams. Python provides several ways to achieve this:</p>
<ul>
<li><strong>Redirecting <code>stdout</code> and <code>stderr</code></strong>: You can redirect the standard output and error output of a Python program by changing the values of <code>sys.stdout</code> and <code>sys.stderr</code>. This is particularly useful for capturing and analyzing output, or redirecting output to non-standard output devices such as graphical interfaces.</li>
<li><strong>Using the <code>subprocess</code> module</strong>: When running external commands or scripts, the <code>subprocess</code> module allows you to control the <code>stdout</code> and <code>stderr</code> streams of the command, including redirecting them to variables within the Python program, or separating or merging them.</li>
<li><strong>Advanced applications of the logging module</strong>: Python&rsquo;s <code>logging</code> module supports outputting logs to multiple destinations, including files, standard output, networks, etc. By configuring different log handlers, you can implement complex log management schemes, such as splitting logs to different outputs based on log level or message content.</li>
</ul>
<h3 id="recommendations" class="heading-element"><span>7.1 Recommendations</span>
  <a href="#recommendations" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li><strong>Manage output carefully</strong>: When designing software, clearly distinguish between output for user interaction (<code>stdout</code>) and output for error reporting or logging (<code>stderr</code>). This helps improve the usability and maintainability of the program.</li>
<li><strong>Optimize performance</strong>: Consider the performance impact of output operations, especially in scenarios with high-frequency logging or data output. Reasonable use of buffering and batch processing can reduce the impact on performance.</li>
<li><strong>Security considerations</strong>: Before outputting sensitive information, perform appropriate filtering and desensitization to avoid leaking sensitive data through logs.</li>
</ul>
<p>By deeply understanding and flexibly applying <code>stdout</code> and <code>stderr</code>, you can build more robust and manageable Python applications, effectively handle logs and output, and improve user experience and application stability.</p>
<h2 id="buffering-behavior-in-c" class="heading-element"><span>8 Buffering Behavior in C++</span>
  <a href="#buffering-behavior-in-c" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In C++, <code>stdout</code> (usually corresponding to <code>std::cout</code>) and <code>stderr</code> (corresponding to <code>std::cerr</code>) have different buffering strategies:</p>
<ul>
<li><strong><code>std::cout</code></strong> is line-buffered by default, which means that when it is connected to a terminal, the output is flushed on each newline or when the buffer is full.</li>
<li><strong><code>std::cerr</code></strong> is unbuffered by default, so data written to <code>std::cerr</code> is immediately output, which is very useful for reporting error information because it reduces the risk of error information not being output due to program crashes.</li>
</ul>
<h2 id="redirecting-stdout-and-stderr" class="heading-element"><span>9 Redirecting <code>stdout</code> and <code>stderr</code></span>
  <a href="#redirecting-stdout-and-stderr" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>In a C++ program, <code>stdout</code> and <code>stderr</code> can be redirected in several ways. A common method is to use the <code>freopen</code> function to redirect standard output or error output to a file at runtime:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="n">freopen</span><span class="p">(</span><span class="s">&#34;output.txt&#34;</span><span class="p">,</span> <span class="s">&#34;w&#34;</span><span class="p">,</span> <span class="n">stdout</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="n">freopen</span><span class="p">(</span><span class="s">&#34;error.log&#34;</span><span class="p">,</span> <span class="s">&#34;w&#34;</span><span class="p">,</span> <span class="n">stderr</span><span class="p">);</span></span></span></code></pre></td></tr></table>
</div>
</div><p>This method can be used to redirect output to a file for later analysis and debugging.</p>
<h2 id="use-in-c-multithreaded-environments" class="heading-element"><span>10 Use in C++ Multithreaded Environments</span>
  <a href="#use-in-c-multithreaded-environments" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>When using <code>std::cout</code> and <code>std::cerr</code> in multithreaded C++ programs, race conditions may occur, leading to chaotic output. To avoid this, it is recommended to use mutex locks (such as <code>std::mutex</code>) to synchronize access to these streams:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-cpp" data-lang="cpp"><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;mutex&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;thread&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp"></span>
</span></span><span class="line"><span class="cl"><span class="n">std</span><span class="o">::</span><span class="n">mutex</span> <span class="n">cout_mutex</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kt">void</span> <span class="nf">thread_function</span><span class="p">(</span><span class="kt">int</span> <span class="n">id</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&gt;</span> <span class="n">lock</span><span class="p">(</span><span class="n">cout_mutex</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">&#34;Thread &#34;</span> <span class="o">&lt;&lt;</span> <span class="n">id</span> <span class="o">&lt;&lt;</span> <span class="s">&#34; is running</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="n">t1</span><span class="p">(</span><span class="n">thread_function</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="n">t2</span><span class="p">(</span><span class="n">thread_function</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">t1</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">    <span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span></span></span></code></pre></td></tr></table>
</div>
</div><h2 id="controlling-output-in-c" class="heading-element"><span>11 Controlling Output in C++</span>
  <a href="#controlling-output-in-c" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>The C++ standard library provides <code>std::streambuf</code>, which can be used to implement finer control over <code>std::cout</code> and <code>std::cerr</code>, including redirection and custom buffering behavior. By inheriting from <code>std::streambuf</code> and overriding the corresponding member functions, you can create custom buffering strategies or redirect output to GUI components, network connections, etc.</p>
<h3 id="recommendations-1" class="heading-element"><span>11.1 Recommendations</span>
  <a href="#recommendations-1" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><ul>
<li><strong>Use buffering wisely</strong>: Choose an appropriate buffering strategy based on the application scenario. For error information that requires immediate feedback, use <code>std::cerr</code> or manually flush <code>std::cout</code>.</li>
<li><strong>Avoid direct use of standard output in multithreading</strong>: Use mutex locks or other synchronization mechanisms to ensure the consistency and order of output.</li>
<li><strong>Use redirection and custom <code>streambuf</code></strong>: To handle output more flexibly, consider using redirection or custom <code>streambuf</code> to meet special output needs, such as logging, network transmission, etc.</li>
</ul>
<p>By mastering these advanced techniques, you can effectively manage and control program output while ensuring the robustness and flexibility of C++ programs.</p>
]]></description>
    </item>
    <item>
      <title>Automatic Segmentation Tool for Long Webpage Screenshots</title>
      <link>https://www.zair.top/en/post/web-page-screenshot-segmentation/</link>
      <pubDate>Tue, 06 Feb 2024 11:29:01 +0000</pubDate><author>blog@zair.top (Tim)</author>
      <guid>https://www.zair.top/en/post/web-page-screenshot-segmentation/</guid>
      <category domain="https://www.zair.top/en/categories/tools-applications/">Tools &amp; Applications</category>
      <description><![CDATA[<h2 id="background" class="heading-element"><span>1 Background</span>
  <a href="#background" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>When needing to share or analyze web content, long screenshots are a very practical form as they can fully display the page. However, processing these long screenshots while maintaining their information integrity and readability, and facilitating subsequent operations, has always been a challenge. For example, as of early 2024, mainstream AI image models on the market still cannot handle very large and complex images. If a long screenshot is forcibly input into the model, it will result in degraded performance (many details cannot be recognized). To solve this problem, I developed a tool based on OpenCV, aimed at simplifying the processing of long screenshots while maintaining their content integrity and readability.</p>
<p>This project is open source on my Github: <a href="https://github.com/Ryaang/Web-page-Screenshot-Segmentation"target="_blank" rel="external nofollow noopener noreferrer">https://github.com/Ryaang/Web-page-Screenshot-Segmentation<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a></p>
<p>Unlike many existing tools or methods, Web-page-Screenshot-Segmentation uses OpenCV to automatically identify and follow the natural dividing lines of web content, automatically finding the most suitable segmentation points. This means that whether it is titles, paragraphs, or charts, they can be neatly retained in the segmented images without content breakage or omission.</p>
<p>Using Web-page-Screenshot-Segmentation is very simple. You only need to prepare a long screenshot, and the tool will automatically analyze the image content and intelligently decide the segmentation points. The result will be a series of complete and well-structured images, convenient for sharing and further processing.</p>
<h2 id="introduction" class="heading-element"><span>2 Introduction</span>
  <a href="#introduction" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>This project is used to segment long screenshots of web pages into several parts based on the height of the text. The main idea is to find the low-variation areas of the image and then find the segmentation lines in these areas.</p>
<p><img loading="lazy" src="https://cdn1.zair.top/images/2024/02/7288ed320d403a17bd010e49cf7c2ea5.png" alt="image-20240229161346869" srcset="https://cdn1.zair.top/images/2024/02/7288ed320d403a17bd010e49cf7c2ea5.png?size=small, https://cdn1.zair.top/images/2024/02/7288ed320d403a17bd010e49cf7c2ea5.png?size=medium 1.5x, https://cdn1.zair.top/images/2024/02/7288ed320d403a17bd010e49cf7c2ea5.png?size=large 2x" data-title="image-20240229161346869" style="background: url(/images/loading.min.svg) no-repeat center;" onload="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}this.dataset.lazyloaded='';" onerror="this.title=this.dataset.title;for(const i of ['style', 'data-title','onerror','onload']){this.removeAttribute(i);}"/></p>
<p>The output is small but complete images of the webpage, which can be used to generate webpages or train models using <a href="https://github.com/abi/screenshot-to-code"target="_blank" rel="external nofollow noopener noreferrer">Screen-to-code<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a>. More results can be found in the <a href="https://raw.githubusercontent.com/Ryaang/Web-page-Screenshot-Segmentation/main/images/"target="_blank" rel="external nofollow noopener noreferrer">images<i class="fa-solid fa-external-link-alt fa-fw fa-xs ms-1 text-secondary" aria-hidden="true"></i></a> directory.</p>
<h2 id="getting-started" class="heading-element"><span>3 Getting Started</span>
  <a href="#getting-started" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><h3 id="installation" class="heading-element"><span>3.1 Installation</span>
  <a href="#installation" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl"> pip install Web-page-Screenshot-Segmentation</span></span></code></pre></td></tr></table>
</div>
</div><h2 id="using-in-command-line" class="heading-element"><span>4 Using in Command Line</span>
  <a href="#using-in-command-line" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><p>Get the height of the segmentation lines of the image</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">python -m Web_page_Screenshot_Segmentation.master -f <span class="s2">&#34;path/to/img&#34;</span></span></span></code></pre></td></tr></table>
</div>
</div><p>The output should be a list: <code>[6, 868, 1912, 2672, 3568, 4444, 5124, 6036, 7698]</code>. It is a list of the heights of the image segmentation lines.
If you want to display this segmentation line in the image, you can add the <code>-s True</code> parameter:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">python -m Web_page_Screenshot_Segmentation.master -f <span class="s2">&#34;path/to/img&#34;</span> -s True</span></span></code></pre></td></tr></table>
</div>
</div><h3 id="draw-segmentation-lines-in-the-image" class="heading-element"><span>4.1 Draw Segmentation Lines in the Image</span>
  <a href="#draw-segmentation-lines-in-the-image" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">python -m Web_page_Screenshot_Segmentation.drawer --image_file path/to/image.jpg --hl <span class="o">[</span>100,200<span class="o">]</span> --color <span class="o">(</span>0,255,0<span class="o">)</span></span></span></code></pre></td></tr></table>
</div>
</div><h3 id="split-image" class="heading-element"><span>4.2 Split Image</span>
  <a href="#split-image" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">python -m Web_page_Screenshot_Segmentation.spliter --f path/to/image.jpg -ht <span class="s2">&#34;[233,456]&#34;</span></span></span></code></pre></td></tr></table>
</div>
</div><p>You will get the segmented images saved in the path returned by the command.</p>
<p>For more usage explanations, please refer to the help:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">python master.py --help
</span></span><span class="line"><span class="cl">python spliter.py --help</span></span></code></pre></td></tr></table>
</div>
</div><h2 id="using-from-source-code" class="heading-element"><span>5 Using from Source Code</span>
  <a href="#using-from-source-code" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h2><h3 id="split_heights-function" class="heading-element"><span>5.1 split_heights Function</span>
  <a href="#split_heights-function" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>The <code>split_heights</code> function is used to segment the image into several parts based on various thresholds. It accepts the following parameters:</p>
<ul>
<li><code>file_path</code>: The path of the image file.</li>
<li><code>split</code>: A boolean indicating whether to split the image.</li>
<li><code>height_threshold</code>: The height threshold of the low-variation area.</li>
<li><code>variation_threshold</code>: The variation threshold of the low-variation area.</li>
<li><code>color_threshold</code>: The color difference threshold.</li>
<li><code>color_variation_threshold</code>: The color difference variation threshold.</li>
<li><code>merge_threshold</code>: The minimum distance threshold between two lines.</li>
</ul>
<p>If <code>split</code> is <code>False</code>, the function returns a list of the heights of the segmentation lines; if <code>split</code> is <code>True</code>, it returns the path of the segmented images.</p>
<h4 id="example-usage" class="heading-element"><span>5.1.1 Example Usage</span>
  <a href="#example-usage" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">Web_page_Screenshot_Segmentation</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">Web_page_Screenshot_Segmentation.master</span> <span class="kn">import</span> <span class="n">split_heights</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Split the image at &#39;path/to/image.jpg&#39; into several parts</span>
</span></span><span class="line"><span class="cl"><span class="n">split_image_path</span> <span class="o">=</span> <span class="n">split_heights</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">file_path</span><span class="o">=</span><span class="s1">&#39;path/to/image.jpg&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">split</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">height_threshold</span><span class="o">=</span><span class="mi">102</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">variation_threshold</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">color_threshold</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">color_variation_threshold</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">merge_threshold</span><span class="o">=</span><span class="mi">350</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;The segmented images are saved in </span><span class="si">{</span><span class="n">split_image_path</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><p>In this example, the image at &lsquo;path/to/image.jpg&rsquo; is segmented into several parts based on the provided thresholds. The segmented images are saved in the path returned by the function.</p>
<h3 id="draw_line_from_file-function" class="heading-element"><span>5.2 draw_line_from_file Function</span>
  <a href="#draw_line_from_file-function" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h3><p>The <code>draw_line_from_file</code> function is used to draw lines on the image at specified heights. It accepts the following parameters:</p>
<ul>
<li><code>image_file</code>: The path of the image file.</li>
<li><code>heights</code>: A list of heights at which to draw lines.</li>
<li><code>color</code>: The color of the lines. The default color is red <code>(0, 0, 255)</code>.</li>
</ul>
<p>The function reads the image from the provided file path, draws lines at the specified heights, and then saves the modified image to a new file. The new file is saved in the <code>result</code> directory, with the same name as the original file but with &lsquo;result&rsquo; added before the file extension.</p>
<p>If the function encounters an error reading the image file (e.g., if the file path contains &lsquo;.&rsquo; or Chinese characters), it will throw an exception.</p>
<h4 id="example-usage-1" class="heading-element"><span>5.2.1 Example Usage</span>
  <a href="#example-usage-1" class="heading-mark">
    <svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path d="m7.775 3.275 1.25-1.25a3.5 3.5 0 1 1 4.95 4.95l-2.5 2.5a3.5 3.5 0 0 1-4.95 0 .751.751 0 0 1 .018-1.042.751.751 0 0 1 1.042-.018 1.998 1.998 0 0 0 2.83 0l2.5-2.5a2.002 2.002 0 0 0-2.83-2.83l-1.25 1.25a.751.751 0 0 1-1.042-.018.751.751 0 0 1-.018-1.042Zm-4.69 9.64a1.998 1.998 0 0 0 2.83 0l1.25-1.25a.751.751 0 0 1 1.042.018.751.751 0 0 1 .018 1.042l-1.25 1.25a3.5 3.5 0 1 1-4.95-4.95l2.5-2.5a3.5 3.5 0 0 1 4.95 0 .751.751 0 0 1-.018 1.042.751.751 0 0 1-1.042.018 1.998 1.998 0 0 0-2.83 0l-2.5 2.5a1.998 1.998 0 0 0 0 2.83Z"></path></svg>
  </a>
</h4><div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">Web_page_Screenshot_Segmentation</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">Web_page_Screenshot_Segmentation.spliter</span> <span class="kn">import</span> <span class="n">draw_line_from_file</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Draw lines at heights 100 and 200 on the image at &#39;path/to/image.jpg&#39;</span>
</span></span><span class="line"><span class="cl"><span class="n">result_image_path</span> <span class="o">=</span> <span class="n">draw_line_from_file</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">image_file</span><span class="o">=</span><span class="s1">&#39;path/to/image.jpg&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">heights</span><span class="o">=</span><span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">    <span class="n">color</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>  <span class="c1"># Draw lines in green</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;The modified image is saved in </span><span class="si">{</span><span class="n">result_image_path</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span></span></span></code></pre></td></tr></table>
</div>
</div><p>In this example, the image at &lsquo;path/to/image.jpg&rsquo; is modified to draw green lines at heights 100 and 200. The modified image is saved in the path returned by the function.</p>
]]></description>
    </item>
  </channel>
</rss>
