课程链接:https://learn.deeplearning.ai/courses/chatgpt-building-system/lesson/1/introduction

github: https://github.com/MSzgy/Building-Systems-with-the-ChatGPT-API

Introduction

This course covers the basics of how LLMs work, common LLM use cases, and safety checks on LLM input. It also introduces two prompting techniques, "Chain of Thought Reasoning" and "Chaining Prompts", and finally shows how to evaluate the quality of a model's output.

Language Models, the Chat Format and Tokens

A quick review of how LLMs are trained: they repeatedly predict the next word (token) from the words that came before.
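The "predict the next word" loop can be sketched with a toy bigram model. This is purely illustrative: a real LLM uses a trained neural network over tokens, not word-pair counts, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus,
# then repeatedly emit the most likely next word.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Most frequent continuation seen in the training data.
    return following[word].most_common(1)[0][0]

def generate(start, n_words):
    words = [start]
    for _ in range(n_words):
        words.append(predict_next(words[-1]))
    return " ".join(words)
```

Here `generate("the", 3)` walks the chain "the" -> "cat" -> "sat" -> "on", the same greedy next-token loop an LLM runs at a vastly larger scale.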

Two models are involved in training: first a base model, which does plain text completion; then an instruction-tuned model, trained on question/answer pairs.

How to go from a base model to an instruction-tuned model:

1. Prepare an instruction dataset (question/answer pairs) and fine-tune the base model on it.

2. Score the LLM's outputs and keep the helpful, high-quality answers for further tuning (human feedback).

LLMs process text as tokens rather than whole words or sentences.
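Tokenization can be illustrated with a toy greedy longest-match tokenizer over a hand-made vocabulary (the vocabulary here is invented for the example; OpenAI models use learned BPE tokenizers such as those in the `tiktoken` library):

```python
# Greedy longest-match tokenization over a tiny hand-made vocabulary.
# Real tokenizers are learned from data, but the effect is the same:
# text is split into subword units, not characters or whole words.
VOCAB = {"lol", "li", "pop", "un", "happi", "ness", "l", "i", "p", "o"}

def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"cannot tokenize at position {i}")
    return tokens
```

For example, "lollipop" splits into the subwords lol / li / pop, which is why character-level tasks (like spelling a word backwards) can trip up an LLM.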

Classification

This lesson gives an example of classifying a customer's message into a support category:

delimiter = "####"
system_message = f"""
You will be provided with customer service queries. \
The customer service query will be delimited with \
{delimiter} characters.
Classify each query into a primary category \
and a secondary category. 
Provide your output in json format with the \
keys: primary and secondary.

Primary categories: Billing, Technical Support, \
Account Management, or General Inquiry.

Billing secondary categories:
Unsubscribe or upgrade
Add a payment method
Explanation for charge
Dispute a charge

Technical Support secondary categories:
General troubleshooting
Device compatibility
Software updates

Account Management secondary categories:
Password reset
Update personal information
Close account
Account security

General Inquiry secondary categories:
Product information
Pricing
Feedback
Speak to a human

"""
user_message = f"""\
I want you to delete my profile and all of my user data"""
messages =  [  
{'role':'system', 
 'content': system_message},    
{'role':'user', 
 'content': f"{delimiter}{user_message}{delimiter}"},  
] 
response = get_completion_from_messages(messages)
print(response)
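Since the prompt asks for JSON output, the response can be parsed and validated before routing the query downstream. A minimal sketch (the helper name is hypothetical; the allowed categories mirror the system prompt above):

```python
import json

# Primary categories allowed by the system prompt.
PRIMARY_CATEGORIES = {"Billing", "Technical Support",
                      "Account Management", "General Inquiry"}

def parse_classification(response_text):
    """Parse the model's JSON reply and validate the primary category."""
    result = json.loads(response_text)
    if result.get("primary") not in PRIMARY_CATEGORIES:
        raise ValueError(f"unexpected primary category: {result.get('primary')!r}")
    return result
```

For the "delete my profile" query above, a typical reply would parse to `{"primary": "Account Management", "secondary": "Close account"}`.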

Moderation

User prompts may contain inappropriate content such as sexual or violent material, and such requests should not be answered. OpenAI provides a Moderation endpoint for detecting this kind of input.

When building the prompt, the user's message should be isolated from the rest of the prompt, e.g. by wrapping it in a delimiter such as #### so the model can distinguish the user's text from the pre-set instructions.
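The isolation step can be sketched as follows: strip any delimiter sequence from the user input before wrapping it, so a user cannot forge the delimiter to break out of their section (function name is illustrative):

```python
delimiter = "####"

def wrap_user_message(user_input):
    # Remove any delimiter sequence the user may have injected, then wrap
    # the cleaned message so the model can tell user text apart from the
    # system instructions.
    cleaned = user_input.replace(delimiter, "")
    return f"{delimiter}{cleaned}{delimiter}"
```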

Chain of Thought Reasoning

Sometimes asking the LLM a question directly produces a wrong answer. Breaking the complex problem apart, spelling out the steps in the prompt, and having the LLM answer step by step greatly improves accuracy.

Follow these steps to answer the customer queries.
The customer query will be delimited with four hashtags,\
i.e. {delimiter}. 

Step 1:{delimiter} First decide whether the user is \
asking a question about a specific product or products. \
Product category doesn't count. 

Step 2:{delimiter} If the user is asking about \
specific products, identify whether \
the products are in the following list.
All available products: 
1. Product: TechPro Ultrabook
   Category: Computers and Laptops
   Brand: TechPro
   Model Number: TP-UB100
   Warranty: 1 year
   Rating: 4.5
   Features: 13.3-inch display, 8GB RAM, 256GB SSD, Intel Core i5 processor
   Description: A sleek and lightweight ultrabook for everyday use.
   Price: $799.99

2. Product: BlueWave Gaming Laptop
   Category: Computers and Laptops
   Brand: BlueWave
   Model Number: BW-GL200
   Warranty: 2 years
   Rating: 4.7
   Features: 15.6-inch display, 16GB RAM, 512GB SSD, NVIDIA GeForce RTX 3060
   Description: A high-performance gaming laptop for an immersive experience.
   Price: $1199.99

3. Product: PowerLite Convertible
   Category: Computers and Laptops
   Brand: PowerLite
   Model Number: PL-CV300
   Warranty: 1 year
   Rating: 4.3
   Features: 14-inch touchscreen, 8GB RAM, 256GB SSD, 360-degree hinge
   Description: A versatile convertible laptop with a responsive touchscreen.
   Price: $699.99

4. Product: TechPro Desktop
   Category: Computers and Laptops
   Brand: TechPro
   Model Number: TP-DT500
   Warranty: 1 year
   Rating: 4.4
   Features: Intel Core i7 processor, 16GB RAM, 1TB HDD, NVIDIA GeForce GTX 1660
   Description: A powerful desktop computer for work and play.
   Price: $999.99

5. Product: BlueWave Chromebook
   Category: Computers and Laptops
   Brand: BlueWave
   Model Number: BW-CB100
   Warranty: 1 year
   Rating: 4.1
   Features: 11.6-inch display, 4GB RAM, 32GB eMMC, Chrome OS
   Description: A compact and affordable Chromebook for everyday tasks.
   Price: $249.99

Step 3:{delimiter} If the message contains products \
in the list above, list any assumptions that the \
user is making in their \
message e.g. that Laptop X is bigger than \
Laptop Y, or that Laptop Z has a 2 year warranty.

Step 4:{delimiter} If the user made any assumptions, \
figure out whether the assumption is true based on your \
product information. 

Step 5:{delimiter} First, politely correct the \
customer's incorrect assumptions if applicable. \
Only mention or reference products in the list of \
5 available products, as these are the only 5 \
products that the store sells. \
Answer the customer in a friendly tone.

Use the following format:
Step 1:{delimiter} <step 1 reasoning>
Step 2:{delimiter} <step 2 reasoning>
Step 3:{delimiter} <step 3 reasoning>
Step 4:{delimiter} <step 4 reasoning>
Response to user:{delimiter} <response to customer>

Make sure to include {delimiter} to separate every step.
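Since the intermediate steps are for the model's benefit, only the text after the last delimiter needs to be shown to the customer. This post-processing mirrors the output format requested above:

```python
delimiter = "####"

def extract_final_response(model_output):
    # The prompt separates every step with the delimiter;
    # only the last segment is meant for the customer.
    return model_output.split(delimiter)[-1].strip()
```

Usage: `extract_final_response(response)` on the raw model output returns just the customer-facing answer, hiding the reasoning steps.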

Chaining Prompts

How it compares with the previous section's Chain of Thought Reasoning:

Chaining Prompts and Chain of Thought Reasoning are two commonly used LLM techniques, one for handling complex tasks and one for improving the model's reasoning ability. Despite the similar names, their core ideas and use cases differ.

1. Chaining Prompts

Chaining Prompts means splitting a complex task into smaller steps and linking multiple prompts together, with each prompt's output feeding into the next prompt's input. It is especially suited to decomposing complex tasks and handling them stage by stage.

Core idea: task decomposition. Break one big problem into several small ones and solve them in order.

Typical scenarios:

1. Data processing: clean the data -> structure it -> extract key information.

2. Content generation: write an outline -> write each section -> merge.

3. Multi-step logic: solve a complex logical problem in stages.

Implementation (a sketch using LangChain's legacy chain API; assumes an OpenAI key is configured):

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: Define prompts
prompt1 = PromptTemplate(input_variables=["topic"], template="Write an outline about {topic}.")
prompt2 = PromptTemplate(input_variables=["outline"], template="Expand on this outline: {outline}.")

# Step 2: Create chains; chain1's output_key feeds prompt2's input variable
chain1 = LLMChain(prompt=prompt1, llm=llm, output_key="outline")
chain2 = LLMChain(prompt=prompt2, llm=llm, output_key="article")

# Step 3: Chain them sequentially
overall_chain = SequentialChain(
    chains=[chain1, chain2],
    input_variables=["topic"],
    output_variables=["article"],
)

Advantages:

• Easier to control model behavior.

• Each step's output is transparent and can be inspected and adjusted by hand.

Disadvantages:

• Lower efficiency: the model must be called multiple times.

• Errors can accumulate from step to step.

2. Chain of Thought Reasoning

Chain of Thought Reasoning (CoT) is a technique that helps the model reach its final conclusion through an explicit chain of intermediate reasoning. It is mainly used for problems requiring multi-step reasoning or complex logic, such as math problems and commonsense questions.

Core idea: make the reasoning explicit. Let the model "think its way to" the answer rather than produce it directly.

Typical scenarios:

1. Mathematical reasoning: work through a complex calculation step by step.

2. Logical reasoning: analyze causes and effects incrementally.

3. Question answering: combine several pieces of information to reach a conclusion.

Example prompt:

Q: A train leaves the station at 2 PM and travels at a speed of 60 miles per hour. Another train leaves the same station at 3 PM and travels at a speed of 80 miles per hour. At what time will the second train catch up to the first train?

A: Let's think step by step.

- The first train has a head start of 1 hour at 60 miles per hour, so it is 60 miles ahead.

- The relative speed of the second train compared to the first train is 80 - 60 = 20 miles per hour.

- Time taken to close the gap is 60 miles / 20 miles per hour = 3 hours.

- The second train will catch up at 3 PM + 3 hours = 6 PM.

Answer: 6 PM.
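The head-start calculation in the worked example checks out in code:

```python
# Catch-up time: head-start distance divided by relative speed.
head_start_hours = 1            # first train departs 2 PM, second 3 PM
speed_first, speed_second = 60, 80

head_start_miles = head_start_hours * speed_first       # 60 miles
relative_speed = speed_second - speed_first             # 20 mph
hours_to_catch_up = head_start_miles / relative_speed   # 3 hours
catch_up_hour = 3 + hours_to_catch_up                   # 3 PM + 3 h = 6 PM
```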

Advantages:

• Higher accuracy on complex reasoning.

• The model's reasoning process is more interpretable.

Disadvantages:

• Sometimes produces verbose, unnecessary reasoning.

• Requires careful prompt engineering.

Implementation (sketch; `llm` instantiated as in the previous example):

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Define a CoT prompt
prompt = PromptTemplate(
    input_variables=["question"],
    template="You are a helpful assistant. Solve this problem step by step:\nQuestion: {question}\nAnswer:"
)

chain = LLMChain(prompt=prompt, llm=llm)

Comparison summary

| Aspect | Chaining Prompts | Chain of Thought Reasoning |
| --- | --- | --- |
| Goal | Decompose a task and complete it step by step | Make reasoning explicit to improve logical accuracy |
| Typical use | Workflow decomposition, multi-step data processing | Math, logic, question answering |
| Implementation complexity | High: multiple steps must be designed | Medium: needs a well-designed prompt |
| Efficiency | Lower: multiple model calls | Higher: a single call suffices |
| Transparency / interpretability | High: each step can be inspected independently | High: the reasoning is shown explicitly |

The two are not mutually exclusive and can be combined: for example, use Chaining Prompts to decompose a task, and apply CoT within each step to improve accuracy.

Evaluation

To evaluate LLM outputs, you can test against a prepared dataset of ideal answers. When no fixed ideal answer exists and you only want to check how well a single response holds up, you can ask an LLM itself to judge whether the answer meets the requirements.
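When a fixed test set with ideal answers does exist, a simple string-based metric can run without any extra model call. A minimal sketch (function names are illustrative; the LLM-as-judge prompts below handle the open-ended case):

```python
def normalize(text):
    # Lowercase and collapse whitespace so trivial formatting
    # differences don't count as errors.
    return " ".join(text.lower().split())

def exact_match_accuracy(pairs):
    """pairs: list of (model_answer, ideal_answer) tuples."""
    hits = sum(normalize(got) == normalize(ideal) for got, ideal in pairs)
    return hits / len(pairs)
```

Exact match is a blunt instrument for free-form answers, which is exactly why the course falls back to an LLM grader for anything longer than a short factual reply.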

user_message = f"""\
You are evaluating a submitted answer to a question based on the context \
that the agent uses to answer the question.
Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {cust_msg}
    ************
    [Context]: {context}
    ************
    [Submission]: {completion}
    ************
    [END DATA]

Compare the factual content of the submitted answer with the context. \
Ignore any differences in style, grammar, or punctuation.
Answer the following questions:
    - Is the Assistant response based only on the context provided? (Y or N)
    - Does the answer include information that is not provided in the context? (Y or N)
    - Is there any disagreement between the response and the context? (Y or N)
    - Count how many questions the user asked. (output a number)
    - For each question that the user asked, is there a corresponding answer to it?
      Question 1: (Y or N)
      Question 2: (Y or N)
      ...
      Question N: (Y or N)
    - Of the number of questions asked, how many of these questions were addressed by the answer? (output a number)
"""
When an expert (ideal) answer is available, the grader can instead compare the submission against it:

user_message = f"""\
You are comparing a submitted answer to an expert answer on a given question. Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {cust_msg}
    ************
    [Expert]: {ideal}
    ************
    [Submission]: {completion}
    ************
    [END DATA]

Compare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.
    The submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
    (A) The submitted answer is a subset of the expert answer and is fully consistent with it.
    (B) The submitted answer is a superset of the expert answer and is fully consistent with it.
    (C) The submitted answer contains all the same details as the expert answer.
    (D) There is a disagreement between the submitted answer and the expert answer.
    (E) The answers differ, but these differences don't matter from the perspective of factuality.
  choice_strings: ABCDE
"""