LangChain 71 String Evaluators: measuring performance and integrity on diverse data

LangChain series articles

  1. LangChain 60: LangChain Expression Language (LCEL) in depth, part 23 - passing parameters through multiple chains
  2. LangChain 61: LangChain Expression Language (LCEL) in depth, part 24 - passing parameters through multiple chains
  3. LangChain 62: LangChain Expression Language (LCEL) in depth, part 25 - agents
  4. LangChain 63: LangChain Expression Language (LCEL) in depth, part 26 - generating and executing code
  5. LangChain 64: LangChain Expression Language (LCEL) in depth, part 27 - adding Moderation
  6. LangChain 65: LangChain Expression Language (LCEL) in depth, part 28 - cosine similarity Router
  7. LangChain 66: LangChain Expression Language (LCEL) in depth, part 29 - managing prompt window size
  8. LangChain 67: LangChain Expression Language (LCEL) in depth, part 30 - calling tools (search engine)
  9. LangChain 68: LLM Deployment - deployment options for large language models
  10. LangChain 69: Getting started with the Pinecone vector database
  11. LangChain 70: Evaluation - measuring performance and integrity on diverse data


1. String Evaluators

A string evaluator is a component within LangChain designed to assess the performance of a language model by comparing its generated output (the prediction) to a reference string or to the input. This comparison is a key step in evaluating language models and provides a measure of the accuracy or quality of the generated text.

In practice, string evaluators are typically used to assess how well a predicted string responds to a given input, such as a question or prompt. A reference label or context string is often provided to define what a correct or ideal response looks like. These evaluators can be customized to the specific needs of your application.

To create a custom string evaluator, inherit from the StringEvaluator class and implement the _evaluate_strings method. If you need asynchronous support, also implement the _aevaluate_strings method. A minimal sketch of such a subclass appears after the attribute and method summary below.

Here is a summary of the key attributes and methods associated with string evaluators:

  • evaluation_name: specifies the name of the evaluation.
  • requires_input: a boolean attribute indicating whether the evaluator requires an input string. If True, the evaluator raises an error when no input is provided; if False, a warning is logged when an input is supplied, indicating that it will not be considered in the evaluation.
  • requires_reference: a boolean attribute specifying whether the evaluator requires a reference label. If True, the evaluator raises an error when no reference is provided; if False, a warning is logged when a reference is supplied, indicating that it will not be considered in the evaluation.

String evaluators also implement the following methods:

  • aevaluate_strings: asynchronously evaluate the output of a chain or language model, with support for optional input and label.
  • evaluate_strings: synchronously evaluate the output of a chain or language model, with support for optional input and label.
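
To make the attributes and methods above concrete, here is a minimal sketch of a custom evaluator. The class name EchoInputEvaluator and its scoring rule are invented purely for illustration; the sketch assumes the StringEvaluator base class exported by langchain.evaluation.

from typing import Any, Optional
from langchain.evaluation import StringEvaluator

class EchoInputEvaluator(StringEvaluator):
    """Hypothetical example: penalize predictions that repeat the input question verbatim."""

    @property
    def requires_input(self) -> bool:
        return True  # an error is raised if no input string is supplied

    @property
    def requires_reference(self) -> bool:
        return False  # a reference label is not used by this evaluator

    @property
    def evaluation_name(self) -> str:
        return "echo_input"

    def _evaluate_strings(
        self,
        *,
        prediction: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,
        **kwargs: Any,
    ) -> dict:
        # Score 0 when the prediction echoes the question verbatim, 1 otherwise.
        echoes = input is not None and input.strip() in prediction
        return {"score": 0 if echoes else 1, "value": "N" if echoes else "Y"}

evaluator = EchoInputEvaluator()
print(evaluator.evaluate_strings(
    prediction="What's 2+2? The answer is four.",
    input="What's 2+2?",
))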

The following sections provide details on the available string evaluator implementations and on how to create a custom string evaluator.

2. Criteria Evaluation

The criteria evaluator is a very practical tool in scenarios where you want to assess a model's output against a specific rubric or set of criteria. It helps you check whether the output of an LLM or Chain complies with a defined set of criteria.

For a deeper look at its capabilities and configuration options, refer to the reference documentation for the CriteriaEvalChain class.
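
As a hedged aside (not part of the original walkthrough): recent langchain releases expose the built-in criteria as the Criteria enum in langchain.evaluation, so you can list what is available before choosing one. The exact set of members depends on your installed version.

from langchain.evaluation import Criteria

# Print the names of the built-in criteria; the set depends on the installed langchain version.
print([c.value for c in Criteria])
# Typically includes entries such as "conciseness", "relevance", "correctness", "coherence", ...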

3. Using CriteriaEvalChain without references

In this example, you will use CriteriaEvalChain to check whether an output is concise. First, create the evaluation chain that will predict whether the output is "concise".

from langchain.evaluation import load_evaluator
from dotenv import load_dotenv  # import the helper that loads environment variables from a .env file
load_dotenv()  # actually load the environment variables

from langchain.globals import set_debug  # import the function that toggles debug mode in langchain
set_debug(True)  # enable langchain debug mode

# from langchain.evaluation import load_evaluator
# evaluator = load_evaluator("criteria", criteria="conciseness")
# This is equivalent to loading using the enum
from langchain.evaluation import EvaluatorType

evaluator = load_evaluator(EvaluatorType.CRITERIA, criteria="conciseness")
eval_result = evaluator.evaluate_strings(
    prediction="What's 2+2? That's an elementary question. The answer you're looking for is that two and two is four.",
    input="What's 2+2?",
)
print('eval_result >> ', eval_result)

3.1 Output format

All string evaluators expose an evaluate_strings (or async aevaluate_strings) method, which accepts:

  • input (str): the input sent to the agent.
  • prediction (str): the predicted response.

The evaluator returns a dictionary with the following values:

  • score: a binary integer, 0 or 1, where 1 means the output meets the criteria and 0 means it does not
  • value: a "Y" or "N" corresponding to the score
  • reasoning: a chain-of-thought reasoning string generated by the LLM before the score is produced

Output

(.venv)  ~/Workspace/LLM/langchain-llm-app/ [develop*] python Evaluate/criteria.py                                                       ⏎
[chain/start] [1:chain:CriteriaEvalChain] Entering Chain run with input:
{"input": "What's 2+2?","output": "What's 2+2? That's an elementary question. The answer you're looking for is that two and two is four."
}
[llm/start] [1:chain:CriteriaEvalChain > 2:llm:ChatOpenAI] Entering LLM run with input:
{"prompts": ["Human: You are assessing a submitted answer on a given task or input based on a set of criteria. Here is the data:\n[BEGIN DATA]\n***\n[Input]: What's 2+2?\n***\n[Submission]: What's 2+2? That's an elementary question. The answer you're looking for is that two and two is four.\n***\n[Criteria]: conciseness: Is the submission concise and to the point?\n***\n[END DATA]\nDoes the submission meet the Criteria? First, write out in a step by step manner your reasoning about each criterion to be sure that your conclusion is correct. Avoid simply stating the correct answers at the outset. Then print only the single character \"Y\" or \"N\" (without quotes or punctuation) on its own line corresponding to the correct answer of whether the submission meets all criteria. At the end, repeat just the letter again by itself on a new line."]
}
[llm/end] [1:chain:CriteriaEvalChain > 2:llm:ChatOpenAI] [7.17s] Exiting LLM run with output:
{"generations": [[{"text": "The criterion to evaluate the submission is \"conciseness\". This requires the answer to be brief, to the point, and without unnecessary information or explanation.\n\nAssessing the submission, the responder did not solely provide the answer. The submission included additional commentary: \"That's an elementary question.\" This part of the response is not integral to answering the question and thus adds unnecessary length and detail.\n\nFurthermore, the phrase, \"The answer you're looking for is\" also adds unneeded length to the answer. A more concise response would simply state the answer: \"four\".\n\nConsidering these points, the submission does not meet the criterion of conciseness, as it contains unnecessary extraneous detail and is not as brief as it could be.\n\nN\nN","generation_info": {"finish_reason": "stop","logprobs": null},"type": "ChatGeneration","message": {"lc": 1,"type": "constructor","id": ["langchain","schema","messages","AIMessage"],"kwargs": {"content": "The criterion to evaluate the submission is \"conciseness\". This requires the answer to be brief, to the point, and without unnecessary information or explanation.\n\nAssessing the submission, the responder did not solely provide the answer. The submission included additional commentary: \"That's an elementary question.\" This part of the response is not integral to answering the question and thus adds unnecessary length and detail.\n\nFurthermore, the phrase, \"The answer you're looking for is\" also adds unneeded length to the answer. A more concise response would simply state the answer: \"four\".\n\nConsidering these points, the submission does not meet the criterion of conciseness, as it contains unnecessary extraneous detail and is not as brief as it could be.\n\nN\nN","additional_kwargs": {}}}}]],"llm_output": {"token_usage": {"completion_tokens": 151,"prompt_tokens": 192,"total_tokens": 343},"model_name": "gpt-4","system_fingerprint": null},"run": null
}
[chain/end] [1:chain:CriteriaEvalChain] [7.18s] Exiting Chain run with output:
{"results": {"reasoning": "The criterion to evaluate the submission is \"conciseness\". This requires the answer to be brief, to the point, and without unnecessary information or explanation.\n\nAssessing the submission, the responder did not solely provide the answer. The submission included additional commentary: \"That's an elementary question.\" This part of the response is not integral to answering the question and thus adds unnecessary length and detail.\n\nFurthermore, the phrase, \"The answer you're looking for is\" also adds unneeded length to the answer. A more concise response would simply state the answer: \"four\".\n\nConsidering these points, the submission does not meet the criterion of conciseness, as it contains unnecessary extraneous detail and is not as brief as it could be.\n\nN","value": "N","score": 0}
}
eval_result >>  {'reasoning': 'The criterion to evaluate the submission is "conciseness". This requires the answer to be brief, to the point, and without unnecessary information or explanation.\n\nAssessing the submission, the responder did not solely provide the answer. The submission included additional commentary: "That\'s an elementary question." This part of the response is not integral to answering the question and thus adds unnecessary length and detail.\n\nFurthermore, the phrase, "The answer you\'re looking for is" also adds unneeded length to the answer. A more concise response would simply state the answer: "four".\n\nConsidering these points, the submission does not meet the criterion of conciseness, as it contains unnecessary extraneous detail and is not as brief as it could be.\n\nN', 'value': 'N', 'score': 0}
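
As a small follow-up sketch (field names taken from the dict printed above), the result can be consumed programmatically, for example:

# eval_result is the dict returned by evaluator.evaluate_strings(...) above
if eval_result["score"] == 1:
    print("Submission meets the criterion:", eval_result["value"])
else:
    print("Submission does not meet the criterion. Reasoning:")
    print(eval_result["reasoning"])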

Code

https://github.com/zgpeace/pets-name-langchain/tree/develop

References

https://python.langchain.com/docs/guides/evaluation/string/criteria_eval_chain

