Paper recommendation: Can Large Language Models Explain Themselves?

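The main contribution of this paper is a study of the strengths and weaknesses of LLM-generated explanations. It examines two approaches in detail: one makes a prediction and then explains it, the other produces an explanation first and then uses it to make the prediction.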

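Recent research has found that even when an LLM has been trained on particular data, it does not necessarily recognize the connection between that trained knowledge and the context it is reasoning over.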

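This is why some consider the family of methods known as "Chain-of-X" so important: when an LLM is asked to decompose a task into a chain of thought, it is better at retrieving the knowledge it was trained on while solving the task.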

Can an LLM answer a question and also explain how it reached its conclusion? Or, when prompted, can it break down its own answer?

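The paper tests these abilities in two ways:

Making a prediction and then explaining it, or generating an explanation and then using it to make the prediction.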

Comparing the approaches

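Chain-of-thought generation has proven to be a flexible prompt engineering technique, and it has many optimized variants. Chain-of-thought is effective at producing accurate answers, particularly on complex reasoning tasks such as solving math problems.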
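As a minimal illustration of chain-of-thought prompting, the sketch below appends the widely used zero-shot trigger "Let's think step by step." to the user's question in a chat-style message list. The helper name `build_cot_messages` and the system instruction wording are hypothetical; this is not the prompt used in the paper.

```python
def build_cot_messages(question: str) -> list[dict[str, str]]:
    """Build chat messages that ask the model to reason step by step before answering.

    The instruction wording is illustrative (hypothetical); any chat API that
    accepts role/content dictionaries could consume this list.
    """
    system = (
        "You are a careful problem solver. Break the task into numbered steps, "
        "then give the final answer on its own line starting with 'Answer:'."
    )
    return [
        {"role": "system", "content": system},
        # Zero-shot chain-of-thought trigger appended to the user's question.
        {"role": "user", "content": f"{question}\nLet's think step by step."},
    ]


if __name__ == "__main__":
    question = "A train travels 120 km in 1.5 hours. What is its average speed?"
    for message in build_cot_messages(question):
        print(message["role"], "->", message["content"])
```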

The LLM-Generated Self-Explanations paper systematically analyzes self-explanations generated by LLMs in the domain of sentiment analysis.

1. Explain then predict: Explanation To Prediction (E-P)

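The prompt below is an example of the E-P configuration. On the left is the SYSTEM description, together with the USER request and the ASSISTANT response generated by the LLM. This example uses gpt-3.5-turbo; the standard model settings are shown on the right.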

The corresponding Python code is as follows:

    from openai import OpenAI

    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

    # E-P configuration: the system prompt asks for per-token importance scores first,
    # then the (classification, confidence) tuple.
    SYSTEM_PROMPT = (
        "You are a creative and intelligent movie review analyst, whose purpose is to aid in sentiment analysis of movie reviews. "
        "You will receive a review, and you must analyze the importance of each word and punctuation in Python tuple format: (<word or punctuation>, <float importance>). "
        "Each word or punctuation is separated by a space. "
        "The importance should be a decimal number to three decimal places ranging from -1 to 1, with -1 implying a negative sentiment and 1 implying a positive sentiment. "
        "Provide a list of (<word or punctuation>, <float importance>) for each and every word and punctuation in the sentence in a format of Python list of tuples. "
        "Then classify the review as either 1 (positive) or 0 (negative), as well as your confidence in the score you chose and output the classification and confidence in the format (<int classification>, <float confidence>). "
        "The confidence should be a decimal number between 0 and 1, with 0 being the lowest confidence and 1 being the highest confidence.\n"
        "It does not matter whether or not the sentence makes sense. Do your best given the sentence.\n"
        "The movie review will be encapsulated within <review> tags. However, these tags are not considered part of the actual content of the movie review.\n"
        "Example output:\n"
        "[(<word or punctuation>, <float importance>), (<word or punctuation>, <float importance>), ... ] (<int classification>, <float confidence>)"
    )

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            # User turn: the review to analyze, wrapped in <review> tags.
            {"role": "user", "content": "<review> Offers that rare combination of entertainment and education . <review>"},
            # Assistant turn: the response the model generated for this review in the example.
            {
                "role": "assistant",
                "content": "[('Offers', 0.500), ('that', 0.000), ('rare', 0.300), ('combination', 0.300), "
                "('of', 0.000), ('entertainment', 0.800), ('and', 0.000), ('education', 0.500), "
                "('.', 0.000)] (1, 0.800)",
            },
        ],
        temperature=1,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )

    # Print the text of the newly generated reply.
    print(response.choices[0].message.content)
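The E-P answer is plain text, so an application has to parse it before the scores can be used. A minimal sketch, assuming the model follows the requested layout exactly (a Python list of `(token, score)` tuples followed by a `(label, confidence)` tuple) and that no token is itself a square bracket; `parse_ep_output` is a hypothetical helper. With the `openai` v1 Python client, the text would come from `response.choices[0].message.content`.

```python
import ast
import re

# Trailing "(label, confidence)" tuple, e.g. "(1, 0.800)".
CLASSIFICATION = re.compile(r"\(\s*([01])\s*,\s*(\d*\.?\d+)\s*\)\s*$")


def parse_ep_output(text: str) -> tuple[list[tuple[str, float]], int, float]:
    """Split an E-P answer into (importance list, label, confidence)."""
    # The importance list runs from the first '[' to the last ']'.
    start, end = text.index("["), text.rindex("]")
    importances = ast.literal_eval(text[start : end + 1])
    match = CLASSIFICATION.search(text[end + 1 :])
    if match is None:
        raise ValueError("missing (<int classification>, <float confidence>) tuple")
    return importances, int(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    answer = (
        "[('Offers', 0.500), ('that', 0.000), ('rare', 0.300), ('combination', 0.300), "
        "('of', 0.000), ('entertainment', 0.800), ('and', 0.000), ('education', 0.500), "
        "('.', 0.000)] (1, 0.800)"
    )
    scores, label, confidence = parse_ep_output(answer)
    top_token = max(scores, key=lambda pair: pair[1])[0]
    print(label, confidence, top_token)  # 1 0.8 entertainment
```

Using `ast.literal_eval` rather than `eval` means a malformed or malicious answer raises an error instead of executing code.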

2. Predict then explain: Prediction To Explanation (P-E)

The prompt below is an example of the P-E configuration: the prediction comes first, followed by the explanation.

The Python code is as follows:

    from openai import OpenAI

    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

    # P-E configuration: the system prompt asks for the (classification, confidence) tuple
    # first, then the per-token importance scores.
    SYSTEM_PROMPT = (
        "You are a creative and intelligent movie review analyst, whose purpose is to aid in sentiment analysis of movie reviews. "
        "A review will be provided to you, and you must classify the review as either 1 (positive) or 0 (negative), as well as your confidence in the score you chose. "
        "The confidence should be a decimal number between 0 and 1, with 0 being the lowest confidence and 1 being the highest confidence. "
        "Output this in the Python tuple format (<int classification>, <float confidence>).\n"
        "Then, analyze how important every single word and punctuation token in the review was to your classification. "
        "The importance should be a decimal number to three decimal places ranging from -1 to 1, with -1 implying a negative sentiment and 1 implying a positive sentiment. "
        "Provide a list of (<word or punctuation>, <float importance>) for each and every word and punctuation token in the sentence in a format of Python list of tuples. "
        "Each word or punctuation is separated by a space.\n"
        "It does not matter whether or not the sentence makes sense. Do your best given the sentence.\n"
        "The movie review will be encapsulated within <review> tags. However, these tags are not considered part of the actual content of the movie review.\n"
        "Example output:\n"
        "(<int classification>, <float confidence>)\n"
        "[(<word or punctuation>, <float importance>), (<word or punctuation>, <float importance>), ... ]"
    )

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            # User turn: the review to classify, as a single string wrapped in <review> tags.
            {"role": "user", "content": "<review> A film that takes you inside the rhythms of its subject : You experience it as you watch . <review>"},
            # Assistant turn: the response the model generated for this review in the example.
            {
                "role": "assistant",
                "content": "(1, 0.8)\n[('A', 0.2), ('film', 0.3), ('that', 0.2), ('takes', 0.2), ('you', 0.4), "
                "('inside', 0.2), ('the', 0.1), ('rhythms', 0.2), ('of', 0.2), ('its', 0.1), ('subject', 0.1), "
                "(':', 0.1), ('You', 0.4), ('experience', 0.3), ('it', 0.2), ('as', 0.2), ('watch', 0.2), ('.', 0.1)]",
            },
        ],
        temperature=1,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )

    # Print the text of the newly generated reply.
    print(response.choices[0].message.content)
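The example P-E answer scores 18 tokens, while the review has 19 space-separated tokens: the second `you` (in "as you watch") is missing, even though the prompt asks for every word and punctuation token. A minimal sketch of a coverage check that catches this, assuming the answer follows the requested two-part layout; `parse_pe_output` and `missing_tokens` are hypothetical helpers.

```python
import ast


def parse_pe_output(text: str) -> tuple[int, float, list[tuple[str, float]]]:
    """Split a P-E answer into (label, confidence, importance list)."""
    first_line, _, rest = text.strip().partition("\n")
    label, confidence = ast.literal_eval(first_line.strip())
    importances = ast.literal_eval(rest.strip())
    return int(label), float(confidence), importances


def missing_tokens(review: str, importances: list[tuple[str, float]]) -> list[str]:
    """Review tokens (split on spaces, as the prompt specifies) that received no score."""
    remaining = [token for token, _ in importances]
    missing = []
    for token in review.split():
        if token in remaining:
            remaining.remove(token)  # consume one occurrence so repeated words match one-to-one
        else:
            missing.append(token)
    return missing


if __name__ == "__main__":
    review = "A film that takes you inside the rhythms of its subject : You experience it as you watch ."
    answer = (
        "(1, 0.8)\n[('A', 0.2), ('film', 0.3), ('that', 0.2), ('takes', 0.2), ('you', 0.4), "
        "('inside', 0.2), ('the', 0.1), ('rhythms', 0.2), ('of', 0.2), ('its', 0.1), "
        "('subject', 0.1), (':', 0.1), ('You', 0.4), ('experience', 0.3), ('it', 0.2), "
        "('as', 0.2), ('watch', 0.2), ('.', 0.1)]"
    )
    label, confidence, scores = parse_pe_output(answer)
    print(label, confidence, missing_tokens(review, scores))  # 1 0.8 ['you']
```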

Prompt input:

    System: You are a creative and intelligent movie review analyst, whose purpose is to aid in sentiment analysis of movie reviews. A review will be provided to you, and you must classify the review as either 1 (positive) or 0 (negative), as well as your confidence in the score you chose. The confidence should be a decimal number between 0 and 1, with 0 being the lowest confidence and 1 being the highest confidence. Output this in the Python tuple format (<int classification>, <float confidence>).
    Then, analyze how important every single word and punctuation token in the review was to your classification. The importance should be a decimal number to three decimal places ranging from -1 to 1, with -1 implying a negative sentiment and 1 implying a positive sentiment. Provide a list of (<word or punctuation>, <float importance>) for each and every word and punctuation token in the sentence in a format of Python list of tuples. Each word or punctuation is separated by a space.
    It does not matter whether or not the sentence makes sense. Do your best given the sentence.
    The movie review will be encapsulated within <review> tags. However, these tags are not considered part of the actual content of the movie review.
    Example output:
    (<int classification>, <float confidence>)
    [(<word or punctuation>, <float importance>), (<word or punctuation>, <float importance>), ... ]
    User: <review> A film that takes you inside the rhythms of its subject : You experience it as you watch . <review>
    Assistant:
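The prompt input above appears to be the P-E chat messages rendered as a single plain-text prompt with `System:`, `User:` and `Assistant:` prefixes, presumably for a model that takes one text prompt. A minimal sketch of that flattening, assuming this simple role-prefix template (it is not a documented format of any particular model); `flatten_messages` is a hypothetical helper.

```python
# Hypothetical role-prefix template mirroring the "System: / User: / Assistant:" prompt above.
ROLE_PREFIXES = {"system": "System", "user": "User", "assistant": "Assistant"}


def flatten_messages(messages: list[dict[str, str]]) -> str:
    """Render chat messages as 'Role: content' lines, ending with an open 'Assistant:' turn."""
    lines = [f"{ROLE_PREFIXES[m['role']]}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # the model is expected to continue the text from here
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = flatten_messages(
        [
            {"role": "system", "content": "Classify the review as 1 (positive) or 0 (negative)."},
            {"role": "user", "content": "<review> Offers that rare combination of entertainment and education . <review>"},
        ]
    )
    print(prompt)
```

Ending with an open `Assistant:` line leaves the model to complete the assistant turn, which is how a plain completion model can be steered into answering.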

Result:

    Understood! I'm ready to analyze the movie review you provided. Here are the results:
    (<int classification>, <float confidence>) = (0, 0.8) - Classification: Negative
    Here are the importance scores for each word and punctuation token in the review:
    [(<word or punctuation>, <float importance>), (the, -0.6), (film, 0.4), (that, -0.5), (takes, 0.3), (you, -0.2), (inside, 0.1), (the, -0.6), (rhythms, 0.3)]
    Explanation:
    The word "film" has a low importance score of -0.6 because it does not convey much sentiment towards either positive or negative emotions in the context of the review.
    The word "that" has a low importance score of -0.5 because it is a transitional word that does not significantly impact the overall sentiment of the review negatively or positively . Its presence does not change how we classify this review as entirely negative . However , its absence might affect our classification; therefore we give it some weight . This implies that if more instances like this appear throughout a text , their impact will be less than if they were part of an entirely different kind of text . For example , "the film takes you inside" would have an opposite meaning than just saying "it takes you inside".) [1][3] )] )] ) ]].
    * The phrase "You experience it as you watch" has an important score of 1 because It conveys positive emotion and could be interpreted as evidence that viewers will find watching this motion picture pleasurable and engaging .[2][4]) [5][6]; hence why we rank important words higher than unimportant phrases when deciding on their overall sentimental value within given texts). As such , we consider these particular elements crucial when evaluating whether something falls into one category versus another based solely upon how others might interpret them without additional context outside those given here.]
    Finally – although there may be other factors at play–by focusing solely on these particular pieces here allows us to better understand what makes certain sentences/phrases stand out amongst others!
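The result above does not follow the requested format: it opens with conversational text, repeats the placeholder tuple from the prompt, leaves tokens unquoted, and scores only part of the review. An application can reject such output before using it. A minimal sketch of a format check, assuming the P-E contract is exactly a `(label, confidence)` line followed by a Python list of `(token, score)` tuples with scores in [-1, 1]; `is_valid_pe_output` is a hypothetical helper.

```python
import ast
import re

# First line of a P-E answer, e.g. "(1, 0.8)": label 0 or 1, confidence in [0, 1].
HEADER = re.compile(r"^\(\s*[01]\s*,\s*(0(\.\d+)?|1(\.0+)?)\s*\)$")


def is_valid_pe_output(text: str) -> bool:
    """True only if the text is a (label, confidence) line followed by a list of (str, score) tuples."""
    header, _, body = text.strip().partition("\n")
    if not HEADER.match(header.strip()):
        return False
    try:
        items = ast.literal_eval(body.strip())
    except (ValueError, SyntaxError, TypeError):
        return False  # e.g. unquoted tokens, unbalanced brackets, or an empty body
    return isinstance(items, list) and all(
        isinstance(item, tuple)
        and len(item) == 2
        and isinstance(item[0], str)
        and isinstance(item[1], (int, float))
        and -1 <= item[1] <= 1
        for item in items
    )


if __name__ == "__main__":
    well_formed = "(1, 0.8)\n[('A', 0.2), ('film', 0.3)]"
    # Abridged from the result above.
    chatty = (
        "Understood! I'm ready to analyze the movie review you provided. Here are the results:\n"
        "(<int classification>, <float confidence>) = (0, 0.8) - Classification: Negative"
    )
    print(is_valid_pe_output(well_formed), is_valid_pe_output(chatty))  # True False
```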

Observability and inspectability

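A large part of what LLM-based applications require is the ability to observe and inspect how the generative AI application behaves. This is especially important for managing the cost of input and output tokens, and, to mirror human behavior, explainability needs to be added on top.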
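Token cost is the easiest of these to observe. With the `openai` v1 Python client, each chat completion response exposes `response.usage.prompt_tokens` and `response.usage.completion_tokens`; the sketch below only assumes those two counts have been logged per request. The `Usage` class, the helper names and the per-1,000-token prices are hypothetical placeholders, not actual API rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts reported for one request."""

    prompt_tokens: int
    completion_tokens: int


def request_cost(usage: Usage, input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one request; prices are per 1,000 tokens and must be supplied by the caller."""
    return (
        usage.prompt_tokens * input_price_per_1k
        + usage.completion_tokens * output_price_per_1k
    ) / 1000


def summarize(usages: list[Usage], input_price_per_1k: float, output_price_per_1k: float) -> dict:
    """Aggregate token counts and cost over a batch of logged requests."""
    return {
        "requests": len(usages),
        "prompt_tokens": sum(u.prompt_tokens for u in usages),
        "completion_tokens": sum(u.completion_tokens for u in usages),
        "cost": sum(request_cost(u, input_price_per_1k, output_price_per_1k) for u in usages),
    }


if __name__ == "__main__":
    # Placeholder prices (hypothetical), not actual API rates.
    log = [Usage(prompt_tokens=420, completion_tokens=180), Usage(prompt_tokens=395, completion_tokens=256)]
    print(summarize(log, input_price_per_1k=0.001, output_price_per_1k=0.002))
```

Input and output tokens are priced separately because long system prompts, like the ones above, can dominate the input side of every request.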

This research sits at the intersection of three areas of explainability research.

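For example, when asked to solve a math problem, models often include detailed steps even without being explicitly instructed to. Similarly, when asked to analyze the sentiment of a movie or book review, LLMs often spontaneously justify their decision with supporting evidence, such as highlighting emotionally charged words or pointing out sarcasm in the text.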

Below is a practical example of the few-shot approach: several prompts are written by hand and used to give the model instructions.
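With the chat message format used in the code above, a few-shot prompt can be expressed as hand-written user/assistant pairs placed before the real query. A minimal sketch; the helper name and the demonstration answers are hypothetical, and the `<review> ... <review>` wrapping simply mirrors the earlier prompts.

```python
def build_few_shot_messages(
    system_prompt: str,
    examples: list[tuple[str, str]],
    review: str,
) -> list[dict[str, str]]:
    """Place hand-written (review, answer) demonstrations before the review to classify."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_review, example_answer in examples:
        # Each demonstration is a user turn plus the answer we want the model to imitate.
        messages.append({"role": "user", "content": f"<review> {example_review} <review>"})
        messages.append({"role": "assistant", "content": example_answer})
    messages.append({"role": "user", "content": f"<review> {review} <review>"})
    return messages


if __name__ == "__main__":
    # Hypothetical demonstrations in the P-E answer format.
    demos = [
        ("A joy to watch .", "(1, 0.9)\n[('A', 0.0), ('joy', 0.8), ('to', 0.0), ('watch', 0.1), ('.', 0.0)]"),
        ("Dull and slow .", "(0, 0.9)\n[('Dull', -0.7), ('and', 0.0), ('slow', -0.5), ('.', 0.0)]"),
    ]
    for message in build_few_shot_messages("You are a movie review analyst.", demos, "A charming film ."):
        print(message["role"], "|", message["content"].splitlines()[0])
```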

Summary

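The paper studies the ability of LLMs such as ChatGPT to generate self-explanations, specifically on a sentiment analysis task, and compares them with traditional explanation methods such as occlusion and LIME. LLMs can spontaneously generate explanations for their decisions, for example by identifying the key words in a sentiment analysis task.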
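Occlusion, one of the traditional baselines named above, scores each word by how much the model's prediction changes when that word is removed. A minimal leave-one-out sketch; the toy lexicon classifier `positive_probability` is a hypothetical stand-in for a real model. LIME, the other baseline, instead fits a local surrogate model on many random perturbations and is not shown here.

```python
import math

# Hypothetical toy lexicon standing in for a real sentiment model.
LEXICON = {"rare": 0.5, "entertainment": 1.0, "education": 0.5, "boring": -1.5}


def positive_probability(tokens: list[str]) -> float:
    """Toy classifier: logistic function of the summed lexicon weights."""
    score = sum(LEXICON.get(token.lower(), 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def occlusion_importance(tokens: list[str]) -> list[tuple[str, float]]:
    """Leave-one-out occlusion: P(positive | all tokens) - P(positive | token i removed)."""
    full = positive_probability(tokens)
    return [
        (token, round(full - positive_probability(tokens[:i] + tokens[i + 1 :]), 3))
        for i, token in enumerate(tokens)
    ]


if __name__ == "__main__":
    review = "Offers that rare combination of entertainment and education ."
    for token, importance in occlusion_importance(review.split()):
        print(f"{token:>14} {importance:+.3f}")
```

Positive scores mark words that pushed the prediction toward the positive class; the output has the same `(token, score)` shape as the self-explanations, which makes the two directly comparable.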

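Prediction accuracy varies with the self-explanation method. Generating the explanation first (Explain-then-Predict) lowers performance, which suggests a trade-off between accuracy and interpretability.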

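No single explanation method consistently outperforms the others across the different metrics. Self-explanations perform on par with traditional methods, but differ markedly from them on agreement metrics.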
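One simple way to quantify agreement between two explanations of the same input is the overlap of their top-k most important tokens. The paper's specific agreement metrics are not reproduced here; this top-k overlap is an illustrative stand-in, and the second score list in the example is hypothetical.

```python
def top_k_positions(scores: list[float], k: int) -> set[int]:
    """Positions of the k tokens with the largest absolute importance (stable sort keeps ties in order)."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return set(ranked[:k])


def top_k_agreement(scores_a: list[float], scores_b: list[float], k: int) -> float:
    """Share of top-k positions two explanations have in common: 1.0 = same top-k, 0.0 = disjoint."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both explanations must score the same tokens")
    return len(top_k_positions(scores_a, k) & top_k_positions(scores_b, k)) / k


if __name__ == "__main__":
    # Self-explanation scores from the E-P example answer above.
    self_explanation = [0.5, 0.0, 0.3, 0.3, 0.0, 0.8, 0.0, 0.5, 0.0]
    # Hypothetical scores from another explainer (for example occlusion) on the same tokens.
    other = [0.1, 0.0, 0.2, 0.0, 0.0, 0.9, 0.0, 0.4, 0.0]
    print(top_k_agreement(self_explanation, other, k=2))  # 0.5
```

Positions rather than token strings are compared so that repeated words in a review are not conflated.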

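ChatGPT's explanations and predictions tend to use rounded values and are less sensitive to word deletion, reflecting a human-like reasoning process that may lack fine-grained precision.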
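Sensitivity to word removal is commonly probed by deleting the tokens an explanation ranks highest and measuring how far the prediction moves, a measure often called comprehensiveness. A minimal sketch, with a hypothetical `toy_predict` standing in for the LLM and the scores taken from the E-P example answer above.

```python
from typing import Callable


def comprehensiveness(
    tokens: list[str],
    scores: list[float],
    predict: Callable[[list[str]], float],
    k: int,
) -> float:
    """Drop in P(positive) after deleting the k tokens with the highest (most positive) scores."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Positions of the k highest-scoring tokens; sorted() is stable, so ties keep their order.
    top = set(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    kept = [token for i, token in enumerate(tokens) if i not in top]
    return predict(tokens) - predict(kept)


def toy_predict(tokens: list[str]) -> float:
    """Hypothetical stand-in for the LLM: positive probability hinges on one word."""
    return 0.9 if "entertainment" in tokens else 0.4


if __name__ == "__main__":
    tokens = "Offers that rare combination of entertainment and education .".split()
    self_explanation = [0.5, 0.0, 0.3, 0.3, 0.0, 0.8, 0.0, 0.5, 0.0]
    print(comprehensiveness(tokens, self_explanation, toy_predict, k=1))
```

A large drop means the explanation pointed at words the model really relies on; a model whose predictions barely move when words are deleted yields small values regardless of which words the explanation ranks highest.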

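The results point to the need for better ways of eliciting self-explanations and for rethinking evaluation practices. Comparative studies with other LLMs and other types of explanations could provide further insight.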

Paper link:

https://avoid.overfit.cn/post/aff43e4336b5487fa6abd01357fc51b6

