Evaluating AI Agents

Agent Evaluate

Course link: https://learn.deeplearning.ai/courses/evaluating-ai-agents/lesson/sqkza/introduction?courseName=evaluating-ai-agents

GitHub link: https://github.com/MSzgy/Evaluating-AI-Agents

Introduction

Evaluation in the time of LLMs

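Agents can make errors such as the following:

Just as traditional software needs unit tests and integration tests, an agent also needs tests that evaluate its effectiveness and quality: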

Decomposing agents

Building your agent

This section implements an agent that analyzes a sales report; see the code for details.

Tracing agents

Tracing your agent

The code for this section shows how to trace an agent with the Phoenix framework.
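The actual Phoenix setup lives in the linked repository. As a rough, library-free illustration of what a trace records (nested spans, each with a name, a parent, and a duration), here is a minimal sketch. The `traced` decorator, the `SPANS` list, and the functions `run_agent` and `lookup_sales_data` are hypothetical names invented for this example; none of them are part of the Phoenix API.

```python
import time
from contextvars import ContextVar
from functools import wraps

# Hypothetical in-memory span store; a real tracer would export spans to a collector.
SPANS = []
_current_span = ContextVar("current_span", default=None)


def traced(name):
    """Record one span per call: its name, its parent span name, and its duration."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            parent = _current_span.get()
            token = _current_span.set(name)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Spans are appended when they finish, so children appear before parents.
                SPANS.append({
                    "name": name,
                    "parent": parent,
                    "duration_s": time.perf_counter() - start,
                })
                _current_span.reset(token)
        return wrapper
    return decorator


@traced("tool:lookup_sales_data")
def lookup_sales_data(query):
    # Stand-in for a real tool call (assumption: the agent has a data lookup step).
    return [{"store": 1, "units": 42}]


@traced("agent")
def run_agent(question):
    rows = lookup_sales_data(question)
    return f"Found {len(rows)} row(s)"


if __name__ == "__main__":
    print(run_agent("How many units did store 1 sell?"))
    for span in SPANS:
        print(span["name"], "<-", span["parent"])
```

A tracing framework does the same bookkeeping automatically and adds inputs, outputs, and a UI on top, which is why the parent/child structure is the part worth understanding first.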

Adding router and skill evaluations

Code-Based
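A code-based evaluation compares the agent's output against an expected value with ordinary code, with no model in the loop. The sketch below checks whether a router picked the expected tool with the expected arguments. The dictionary shape (`name`, `arguments`) and the tool names are assumptions made for illustration, not the format used in the course code.

```python
def evaluate_tool_call(actual, expected):
    """Return True when the tool name and every expected argument match."""
    if actual.get("name") != expected.get("name"):
        return False
    actual_args = actual.get("arguments", {})
    expected_args = expected.get("arguments", {})
    return all(actual_args.get(key) == value for key, value in expected_args.items())


def accuracy(cases):
    """Fraction of test cases where the actual call matches the expected call."""
    if not cases:
        return 0.0
    passed = sum(evaluate_tool_call(c["actual"], c["expected"]) for c in cases)
    return passed / len(cases)


if __name__ == "__main__":
    # Hypothetical test cases: one correct routing decision, one wrong tool.
    cases = [
        {
            "actual": {"name": "lookup_sales_data", "arguments": {"store": 1}},
            "expected": {"name": "lookup_sales_data", "arguments": {"store": 1}},
        },
        {
            "actual": {"name": "generate_chart", "arguments": {}},
            "expected": {"name": "lookup_sales_data", "arguments": {"store": 2}},
        },
    ]
    print(accuracy(cases))
```

Checks like this are cheap and deterministic, so they suit outputs with a single right answer, such as which function was called.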

LLM-Judge
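An LLM-as-a-judge evaluation sends the agent's output to a second model along with a grading prompt, then parses a label from the reply. The sketch below keeps the model call abstract: `call_llm` is a hypothetical callable that takes a prompt string and returns the model's text, and `fake_llm` is a stub used only so the example runs without an API key. The prompt wording and labels are assumptions, not the course's template.

```python
JUDGE_TEMPLATE = """You are evaluating whether a tool call is appropriate for a user question.
[Question]: {question}
[Tool call]: {tool_call}
Respond with a single word: "correct" or "incorrect"."""

ALLOWED_LABELS = ("correct", "incorrect")


def classify(question, tool_call, call_llm):
    """Ask the judge model for a label; return None if the reply cannot be parsed."""
    prompt = JUDGE_TEMPLATE.format(question=question, tool_call=tool_call)
    raw = call_llm(prompt)
    # Normalize case and strip surrounding whitespace, quotes, and periods.
    label = raw.strip().lower().strip(" .\"'")
    return label if label in ALLOWED_LABELS else None


def fake_llm(prompt):
    # Stub judge for demonstration only: approves any call to lookup_sales_data.
    return "Correct." if "lookup_sales_data" in prompt else "incorrect"


if __name__ == "__main__":
    print(classify("How many units did store 1 sell?", "lookup_sales_data(store=1)", fake_llm))
```

Restricting the judge to a small, fixed label set and treating anything else as unparseable makes the results easy to aggregate and keeps malformed replies from being silently counted.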

Annotations

Evaluating

Adding trajectory evaluations

Convergence
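One way to quantify convergence, and my understanding of the idea behind this section, is to run the agent several times on similar queries, count the steps each run takes, and compare every run to the shortest observed path. The formula below (the mean of optimal length divided by actual length) is an assumption stated for illustration; the function name and parameters are hypothetical.

```python
def convergence_score(path_lengths, optimal_length=None):
    """Average ratio of the optimal path length to each run's path length.

    A score of 1.0 means every run took the optimal number of steps;
    lower scores mean some runs took detours.
    """
    if not path_lengths:
        raise ValueError("path_lengths must not be empty")
    if any(n <= 0 for n in path_lengths):
        raise ValueError("every path length must be positive")
    # Assumption: when no known optimum is given, use the shortest observed run.
    optimal = min(path_lengths) if optimal_length is None else optimal_length
    return sum(optimal / n for n in path_lengths) / len(path_lengths)


if __name__ == "__main__":
    # Two hypothetical runs: one took 2 steps, the other took 4.
    print(convergence_score([2, 4]))  # prints 0.75
```

Because the default optimum is the shortest observed run, the score only says how consistent the agent is relative to its own best run; passing a known `optimal_length` gives an absolute comparison.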

Adding structure to your evaluations

Improving your LLM-as-a-judge

Monitoring agents

Problems in production environments and solutions

https://halo.mosuyang.org/archives/evaluating-ai-agents

Author: Administrator

Published: 2025-02-27

Updated: 2025-02-27

License: CC BY 4.0
