Large Multimodal Model Prompting with Gemini

AI LLM MultiModal

Introduction

https://learn.deeplearning.ai/courses/large-multimodal-model-prompting-with-gemini/lesson/1/introduction

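Gemini is a family of LLMs released by Google.

The lesson introduces the term LMM, Large Multimodal Model: a model trained not only on text but also on images, audio, and video.

Because this course does not provide a hosted development environment on DeepLearning.AI, I downloaded the files directly from the GitHub repository.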

sc-gc-c4-gemini-public-main.zip

Introduction to Gemini models

Gemini is built from the ground up for multimodality - reasoning across text, images, video and code.

Gemini models come in several sizes.

Ultra
- For highly complex tasks such as reasoning and multimodal tasks.
Pro
- A performance-optimized model that balances capability and speed.
Flash
- A lightweight model with low latency and cost.
Nano
- Can run on user devices such as phones.
- Uses model distillation to transfer knowledge from a larger model to a smaller one.

When choosing a model, weigh its capability and performance, its cost, and its efficiency together.

Multimodal Prompting and Parameter Control

Parameters:

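Top-K
- When predicting the next token, top-k sampling considers only the k tokens with the highest probability and ignores the rest, then samples among those k according to their probabilities. This reduces the chance of generating low-probability, implausible tokens.
Top-P
- Unlike top-k, which keeps a fixed number of tokens, top-p sampling dynamically selects the smallest set of tokens whose cumulative probability reaches a threshold p, then samples among them. This gives more flexible control over the diversity of the output.
Temperature
- Definition: `temperature` controls the randomness of the generated text by rescaling the model's output probability distribution. Lower values make the output more deterministic; higher values make it more varied.
- Range: usually between 0 and 1, but values greater than 1 are also possible.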
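
To make the three parameters concrete, here is a minimal pure-Python sketch of how they could be applied to a toy next-token distribution. It is an illustration under stated assumptions, not how Gemini implements sampling internally: the function names are hypothetical, and the order in which the filters are applied (temperature, then top-k, then top-p) is an assumption of this sketch.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample a token index after applying temperature, then top-k, then top-p."""
    probs = softmax_with_temperature(logits, temperature)
    if top_k is not None:
        probs = top_k_filter(probs, top_k)
    if top_p is not None:
        probs = top_p_filter(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `top_k=1` the sketch always returns the most probable token, which matches the intuition that tighter filters and lower temperatures give more deterministic output.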

Best practices for Multimodal Prompting

Advice:

Be clear and concise
Assign a role to the model
Structure Prompts
Order can matter
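
As a rough illustration of the advice above, the following sketch assembles a prompt from labelled sections so that the role, context, task, and output format are explicit and always appear in the same order. The helper name `build_prompt`, the section labels, and their order are assumptions made for this example; the course does not prescribe this exact layout.

```python
def build_prompt(role, task, context=None, output_format=None):
    """Assemble a prompt from labelled sections, skipping any that are empty."""
    sections = [
        ("Role", role),
        ("Context", context),
        ("Task", task),
        ("Output format", output_format),
    ]
    lines = []
    for label, text in sections:
        if text:
            lines.append(f"{label}: {text.strip()}")
    # Blank lines between sections keep the structure easy to see.
    return "\n\n".join(lines)


prompt = build_prompt(
    role="You are an interior designer.",
    task="Say whether each chair suits the living room.",
    output_format="One short sentence per chair.",
)
```

Keeping the section order in one place also makes it easy to experiment with reordering, since order can affect the result.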

Creating Use Case with Images

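The lesson gives two simple examples:

1 Given four pictures of chairs and one picture of a living room, ask whether each chair suits the style of the living room.

2 Given an employee's travel expense report and the company's policy, ask whether the employee violated the policy.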
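
For the first example, a multimodal prompt is typically a list that interleaves text labels with images so the model can tell the pictures apart. The sketch below only builds that list; `ImagePart` is a hypothetical stand-in for whatever image object the SDK provides, the file names are placeholders, and the chosen order (room first, question last) is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagePart:
    """Hypothetical stand-in for an SDK image object; holds only a file path."""
    path: str


def build_contents(room, chairs, question):
    """Return prompt parts in order: room image, labelled chair images, then the question."""
    contents = ["Living room:", room]
    for number, chair in enumerate(chairs, start=1):
        contents += [f"Chair {number}:", chair]
    contents.append(question)
    return contents


contents = build_contents(
    room=ImagePart("room.jpg"),  # placeholder file names
    chairs=[ImagePart("chair_1.jpg"), ImagePart("chair_2.jpg")],
    question="For each chair, say whether it fits the style of the living room.",
)
```

Labelling each image ("Chair 1:", "Chair 2:") lets the answer refer to the pictures by name.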

Developing Use Case with Videos

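The lesson gives two examples, and the second one left the stronger impression: given several videos (about 15 minutes of learning material in total), the prompt asks what each video covers, then supplies a short piece of Python code and asks which video it appears in and at what timestamp. Gemini answered accurately.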

Integrating Real-Time Data with Function Calling

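Like ChatGPT, Gemini supports function calling: through function calling, an external API can be called, and the model then processes the information in the response. Looking at the code, the function-calling format in Gemini is similar to ChatGPT's: you need to supply a function name and function parameters.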
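
The flow described above can be sketched without any SDK. The declaration below is a plain dict in an OpenAPI-like style; in the Vertex AI SDK the equivalent is wrapped in classes such as `FunctionDeclaration` and `Tool`. Everything here is hypothetical and for illustration only: the function `get_exchange_rate`, its parameters, the fixed rate it returns, and the shape of the `function_call` dict are assumptions, not the course's code.

```python
import json

# Hypothetical declaration: a name, a description, and typed parameters.
GET_EXCHANGE_RATE = {
    "name": "get_exchange_rate",
    "description": "Get the exchange rate between two currencies.",
    "parameters": {
        "type": "object",
        "properties": {
            "currency_from": {"type": "string", "description": "Source currency code"},
            "currency_to": {"type": "string", "description": "Target currency code"},
        },
        "required": ["currency_from", "currency_to"],
    },
}


def get_exchange_rate(currency_from, currency_to):
    """Stub standing in for an external API call; returns fixed made-up data."""
    return {"from": currency_from, "to": currency_to, "rate": 1.0}


REGISTRY = {"get_exchange_rate": get_exchange_rate}


def dispatch(function_call, declarations, registry):
    """Check a model-proposed call against its declaration, then run it locally."""
    name = function_call["name"]
    args = function_call.get("args", {})
    declaration = next((d for d in declarations if d["name"] == name), None)
    if declaration is None or name not in registry:
        raise KeyError(f"unknown function: {name}")
    required = declaration["parameters"].get("required", [])
    missing = [p for p in required if p not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return registry[name](**args)


# Pretend the model responded with this structured call instead of text.
call = {"name": "get_exchange_rate", "args": {"currency_from": "USD", "currency_to": "EUR"}}
result = dispatch(call, [GET_EXCHANGE_RATE], REGISTRY)
payload = json.dumps(result)  # this is what would be sent back to the model
```

The model never runs the function itself: it only proposes a name and arguments, the application executes the call, and the serialized result goes back to the model so it can write the final answer.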

https://halo.mosuyang.org/archives/large-multimodal-model-prompting-with-gemini

Author: Administrator

Published: 2024-09-16

Updated: 2024-09-16

License: CC BY 4.0
