课程链接:https://learn.deeplearning.ai/courses/chatgpt-building-system/lesson/1/introduction

github: https://github.com/MSzgy/Building-Systems-with-the-ChatGPT-API

Introduction

This course covers the basics of how LLMs work, common LLM use cases, and safety checks on LLM input. It also introduces two prompting techniques, "Chain of Thought Reasoning" and "Chaining Prompts", and finally shows how to evaluate the quality of a model's output.

Language Models, the Chat Format and Tokens

A quick review of how LLMs are trained: they repeatedly predict the next word (token) from the words that came before.
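The "predict the next word" loop can be sketched with a toy bigram model. This is purely illustrative: a real LLM uses a trained neural network over tokens, not word-pair counts, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus,
# then repeatedly emit the most likely next word.
corpus = "the cat sat on the mat the cat ate the fish".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Most frequent continuation seen in the training data.
    return following[word].most_common(1)[0][0]

def generate(start, n_words):
    words = [start]
    for _ in range(n_words):
        words.append(predict_next(words[-1]))
    return " ".join(words)
```

Here `generate("the", 3)` walks the chain "the" -> "cat" -> "sat" -> "on", the same greedy next-token loop an LLM runs at a vastly larger scale.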

Two models are involved in training: first a base model, which does plain text completion; then an instruction-tuned model, trained on question/answer pairs.

How to go from a base model to an instruction-tuned model:

1. Prepare an instruction dataset (question/answer pairs) and fine-tune the base model on it.

2. Score the LLM's outputs and keep the helpful, high-quality answers for further tuning (human feedback).

LLMs process text as tokens rather than whole words or sentences.
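Tokenization can be illustrated with a toy greedy longest-match tokenizer over a hand-made vocabulary (the vocabulary here is invented for the example; OpenAI models use learned BPE tokenizers such as those in the `tiktoken` library):

```python
# Greedy longest-match tokenization over a tiny hand-made vocabulary.
# Real tokenizers are learned from data, but the effect is the same:
# text is split into subword units, not characters or whole words.
VOCAB = {"lol", "li", "pop", "un", "happi", "ness", "l", "i", "p", "o"}

def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"cannot tokenize at position {i}")
    return tokens
```

For example, "lollipop" splits into the subwords lol / li / pop, which is why character-level tasks (like spelling a word backwards) can trip up an LLM.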

Classification

This lesson gives an example of classifying a customer's message into a support category:

delimiter = "####"
system_message = f"""
You will be provided with customer service queries. \
The customer service query will be delimited with \
{delimiter} characters.
Classify each query into a primary category \
and a secondary category. 
Provide your output in json format with the \
keys: primary and secondary.

Primary categories: Billing, Technical Support, \
Account Management, or General Inquiry.

Billing secondary categories:
Unsubscribe or upgrade
Add a payment method
Explanation for charge
Dispute a charge

Technical Support secondary categories:
General troubleshooting
Device compatibility
Software updates

Account Management secondary categories:
Password reset
Update personal information
Close account
Account security

General Inquiry secondary categories:
Product information
Pricing
Feedback
Speak to a human

"""
user_message = f"""\
I want you to delete my profile and all of my user data"""
messages =  [  
{'role':'system', 
 'content': system_message},    
{'role':'user', 
 'content': f"{delimiter}{user_message}{delimiter}"},  
] 
response = get_completion_from_messages(messages)
print(response)
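Since the prompt asks for JSON output, the response can be parsed and validated before routing the query downstream. A minimal sketch (the helper name is hypothetical; the allowed categories mirror the system prompt above):

```python
import json

# Primary categories allowed by the system prompt.
PRIMARY_CATEGORIES = {"Billing", "Technical Support",
                      "Account Management", "General Inquiry"}

def parse_classification(response_text):
    """Parse the model's JSON reply and validate the primary category."""
    result = json.loads(response_text)
    if result.get("primary") not in PRIMARY_CATEGORIES:
        raise ValueError(f"unexpected primary category: {result.get('primary')!r}")
    return result
```

For the "delete my profile" query above, a typical reply would parse to `{"primary": "Account Management", "secondary": "Close account"}`.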

Moderation

User prompts may contain inappropriate content such as sexual or violent material, and such requests should not be answered. OpenAI provides a Moderation endpoint for detecting this kind of input.

When building the prompt, the user's message should be isolated from the rest of the prompt, e.g. by wrapping it in a delimiter such as #### so the model can distinguish the user's text from the pre-set instructions.
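The isolation step can be sketched as follows: strip any delimiter sequence from the user input before wrapping it, so a user cannot forge the delimiter to break out of their section (function name is illustrative):

```python
delimiter = "####"

def wrap_user_message(user_input):
    # Remove any delimiter sequence the user may have injected, then wrap
    # the cleaned message so the model can tell user text apart from the
    # system instructions.
    cleaned = user_input.replace(delimiter, "")
    return f"{delimiter}{cleaned}{delimiter}"
```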

Chain of Thought Reasoning

Sometimes asking the LLM a question directly produces a wrong answer. Breaking the complex problem apart, spelling out the steps in the prompt, and having the LLM answer step by step greatly improves accuracy.

Follow these steps to answer the customer queries.
The customer query will be delimited with four hashtags,\
i.e. {delimiter}. 

Step 1:{delimiter} First decide whether the user is \
asking a question about a specific product or products. \
Product category doesn't count. 

Step 2:{delimiter} If the user is asking about \
specific products, identify whether \
the products are in the following list.
All available products: 
1. Product: TechPro Ultrabook
   Category: Computers and Laptops
   Brand: TechPro
   Model Number: TP-UB100
   Warranty: 1 year
   Rating: 4.5
   Features: 13.3-inch display, 8GB RAM, 256GB SSD, Intel Core i5 processor
   Description: A sleek and lightweight ultrabook for everyday use.
   Price: $799.99

2. Product: BlueWave Gaming Laptop
   Category: Computers and Laptops
   Brand: BlueWave
   Model Number: BW-GL200
   Warranty: 2 years
   Rating: 4.7
   Features: 15.6-inch display, 16GB RAM, 512GB SSD, NVIDIA GeForce RTX 3060
   Description: A high-performance gaming laptop for an immersive experience.
   Price: $1199.99

3. Product: PowerLite Convertible
   Category: Computers and Laptops
   Brand: PowerLite
   Model Number: PL-CV300
   Warranty: 1 year
   Rating: 4.3
   Features: 14-inch touchscreen, 8GB RAM, 256GB SSD, 360-degree hinge
   Description: A versatile convertible laptop with a responsive touchscreen.
   Price: $699.99

4. Product: TechPro Desktop
   Category: Computers and Laptops
   Brand: TechPro
   Model Number: TP-DT500
   Warranty: 1 year
   Rating: 4.4
   Features: Intel Core i7 processor, 16GB RAM, 1TB HDD, NVIDIA GeForce GTX 1660
   Description: A powerful desktop computer for work and play.
   Price: $999.99

5. Product: BlueWave Chromebook
   Category: Computers and Laptops
   Brand: BlueWave
   Model Number: BW-CB100
   Warranty: 1 year
   Rating: 4.1
   Features: 11.6-inch display, 4GB RAM, 32GB eMMC, Chrome OS
   Description: A compact and affordable Chromebook for everyday tasks.
   Price: $249.99

Step 3:{delimiter} If the message contains products \
in the list above, list any assumptions that the \
user is making in their \
message e.g. that Laptop X is bigger than \
Laptop Y, or that Laptop Z has a 2 year warranty.

Step 4:{delimiter} If the user made any assumptions, \
figure out whether the assumption is true based on your \
product information. 

Step 5:{delimiter} First, politely correct the \
customer's incorrect assumptions if applicable. \
Only mention or reference products in the list of \
5 available products, as these are the only 5 \
products that the store sells. \
Answer the customer in a friendly tone.

Use the following format:
Step 1:{delimiter} <step 1 reasoning>
Step 2:{delimiter} <step 2 reasoning>
Step 3:{delimiter} <step 3 reasoning>
Step 4:{delimiter} <step 4 reasoning>
Response to user:{delimiter} <response to customer>

Make sure to include {delimiter} to separate every step.
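Since the intermediate steps are for the model's benefit, only the text after the last delimiter needs to be shown to the customer. This post-processing mirrors the output format requested above:

```python
delimiter = "####"

def extract_final_response(model_output):
    # The prompt separates every step with the delimiter;
    # only the last segment is meant for the customer.
    return model_output.split(delimiter)[-1].strip()
```

Usage: `extract_final_response(response)` on the raw model output returns just the customer-facing answer, hiding the reasoning steps.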

Chaining Prompts

How it compares with the previous section's Chain of Thought Reasoning:

Chaining Prompts and Chain of Thought Reasoning are two commonly used LLM techniques, one for handling complex tasks and one for improving the model's reasoning ability. Despite the similar names, their core ideas and use cases differ.

1. Chaining Prompts

Chaining Prompts means splitting a complex task into smaller steps and linking multiple prompts together, with each prompt's output feeding into the next prompt's input. It is especially suited to decomposing complex tasks and handling them stage by stage.

Core idea: task decomposition. Break one big problem into several small ones and solve them in order.

Typical scenarios:

1. Data processing: clean the data -> structure it -> extract key information.

2. Content generation: write an outline -> write each section -> merge.

3. Multi-step logic: solve a complex logical problem in stages.

Implementation (a sketch using LangChain's legacy chain API; assumes an OpenAI key is configured):

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: Define prompts
prompt1 = PromptTemplate(input_variables=["topic"], template="Write an outline about {topic}.")
prompt2 = PromptTemplate(input_variables=["outline"], template="Expand on this outline: {outline}.")

# Step 2: Create chains; chain1's output_key feeds prompt2's input variable
chain1 = LLMChain(prompt=prompt1, llm=llm, output_key="outline")
chain2 = LLMChain(prompt=prompt2, llm=llm, output_key="article")

# Step 3: Chain them sequentially
overall_chain = SequentialChain(
    chains=[chain1, chain2],
    input_variables=["topic"],
    output_variables=["article"],
)

Advantages:

• Easier to control model behavior.

• Each step's output is transparent and can be inspected and adjusted by hand.

Disadvantages:

• Lower efficiency: the model must be called multiple times.

• Errors can accumulate from step to step.

2. Chain of Thought Reasoning

Chain of Thought Reasoning (CoT) is a technique that helps the model reach its final conclusion through an explicit chain of intermediate reasoning. It is mainly used for problems requiring multi-step reasoning or complex logic, such as math problems and commonsense questions.

Core idea: make the reasoning explicit. Let the model "think its way to" the answer rather than produce it directly.

Typical scenarios:

1. Mathematical reasoning: work through a complex calculation step by step.

2. Logical reasoning: analyze causes and effects incrementally.

3. Question answering: combine several pieces of information to reach a conclusion.

Example prompt:

Q: A train leaves the station at 2 PM and travels at a speed of 60 miles per hour. Another train leaves the same station at 3 PM and travels at a speed of 80 miles per hour. At what time will the second train catch up to the first train?

A: Let's think step by step.

- The first train has a head start of 1 hour at 60 miles per hour, so it is 60 miles ahead.

- The relative speed of the second train compared to the first train is 80 - 60 = 20 miles per hour.

- Time taken to close the gap is 60 miles / 20 miles per hour = 3 hours.

- The second train will catch up at 3 PM + 3 hours = 6 PM.

Answer: 6 PM.
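The head-start calculation in the worked example checks out in code:

```python
# Catch-up time: head-start distance divided by relative speed.
head_start_hours = 1            # first train departs 2 PM, second 3 PM
speed_first, speed_second = 60, 80

head_start_miles = head_start_hours * speed_first       # 60 miles
relative_speed = speed_second - speed_first             # 20 mph
hours_to_catch_up = head_start_miles / relative_speed   # 3 hours
catch_up_hour = 3 + hours_to_catch_up                   # 3 PM + 3 h = 6 PM
```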

Advantages:

• Higher accuracy on complex reasoning.

• The model's reasoning process is more interpretable.

Disadvantages:

• Sometimes produces verbose, unnecessary reasoning.

• Requires careful prompt engineering.

Implementation (sketch; `llm` instantiated as in the previous example):

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Define a CoT prompt
prompt = PromptTemplate(
    input_variables=["question"],
    template="You are a helpful assistant. Solve this problem step by step:\nQuestion: {question}\nAnswer:"
)

chain = LLMChain(prompt=prompt, llm=llm)

Comparison summary

| Aspect | Chaining Prompts | Chain of Thought Reasoning |
| --- | --- | --- |
| Goal | Decompose a task and complete it step by step | Make reasoning explicit to improve logical accuracy |
| Typical use | Workflow decomposition, multi-step data processing | Math, logic, question answering |
| Implementation complexity | High: multiple steps must be designed | Medium: needs a well-designed prompt |
| Efficiency | Lower: multiple model calls | Higher: a single call suffices |
| Transparency / interpretability | High: each step can be inspected independently | High: the reasoning is shown explicitly |

The two are not mutually exclusive and can be combined: for example, use Chaining Prompts to decompose a task, and apply CoT within each step to improve accuracy.

Evaluation

To evaluate LLM outputs, you can test against a prepared dataset of ideal answers. When no fixed ideal answer exists and you only want to check how well a single response holds up, you can ask an LLM itself to judge whether the answer meets the requirements.
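When a fixed test set with ideal answers does exist, a simple string-based metric can run without any extra model call. A minimal sketch (function names are illustrative; the LLM-as-judge prompts below handle the open-ended case):

```python
def normalize(text):
    # Lowercase and collapse whitespace so trivial formatting
    # differences don't count as errors.
    return " ".join(text.lower().split())

def exact_match_accuracy(pairs):
    """pairs: list of (model_answer, ideal_answer) tuples."""
    hits = sum(normalize(got) == normalize(ideal) for got, ideal in pairs)
    return hits / len(pairs)
```

Exact match is a blunt instrument for free-form answers, which is exactly why the course falls back to an LLM grader for anything longer than a short factual reply.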

user_message = f"""\
You are evaluating a submitted answer to a question based on the context \
that the agent uses to answer the question.
Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {cust_msg}
    ************
    [Context]: {context}
    ************
    [Submission]: {completion}
    ************
    [END DATA]

Compare the factual content of the submitted answer with the context. \
Ignore any differences in style, grammar, or punctuation.
Answer the following questions:
    - Is the Assistant response based only on the context provided? (Y or N)
    - Does the answer include information that is not provided in the context? (Y or N)
    - Is there any disagreement between the response and the context? (Y or N)
    - Count how many questions the user asked. (output a number)
    - For each question that the user asked, is there a corresponding answer to it?
      Question 1: (Y or N)
      Question 2: (Y or N)
      ...
      Question N: (Y or N)
    - Of the number of questions asked, how many of these questions were addressed by the answer? (output a number)
"""
When an expert (ideal) answer is available, the grader can instead compare the submission against it:

user_message = f"""\
You are comparing a submitted answer to an expert answer on a given question. Here is the data:
    [BEGIN DATA]
    ************
    [Question]: {cust_msg}
    ************
    [Expert]: {ideal}
    ************
    [Submission]: {completion}
    ************
    [END DATA]

Compare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.
    The submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
    (A) The submitted answer is a subset of the expert answer and is fully consistent with it.
    (B) The submitted answer is a superset of the expert answer and is fully consistent with it.
    (C) The submitted answer contains all the same details as the expert answer.
    (D) There is a disagreement between the submitted answer and the expert answer.
    (E) The answers differ, but these differences don't matter from the perspective of factuality.
  choice_strings: ABCDE
"""