Proj CJI Paper Reading: Detecting language model attacks with perplexity
Abstract
- Tool: perplexity (PPL)
- Findings:
- queries with adversarial suffixes have noticeably higher perplexity, which can be exploited for detection
- a perplexity filter alone is not suitable for a mix of prompt types; it produces a high false-positive rate
- Method: use a LightGBM classifier over perplexity and token length to filter prompts carrying adversarial suffixes (see the classifier sketch after this list)
- base model for computing perplexity: GPT-2
- Metric for detection: Perplexity
- \(PPL(x) = \exp\!\left(-\frac{1}{t}\sum_{i=1}^{t}\log p(x_i \mid x_{<i})\right)\)
- the exponential of the average negative log-likelihood of the sequence (see the perplexity sketch after this list)
- Use Metric: \(F_{\beta}\) to assess detection performance
- \(F_{\beta} = (1+\beta^2) \times \frac{\text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}\)
- \(\beta = 2\), weighting recall more heavily than precision
- how to choose \(\beta\):
- The cost of failing to respond to legitimate inquiries versus the cost of releasing forbidden responses.
- failing to respond to legitimate inquiries: false positive
- releasing forbidden responses: false negative
- The expected distribution of different types of prompts, such as English, multilingual, or prompts containing symbols and math.
- The effectiveness of the LLM's built-in defenses or other defensive measures.
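
Perplexity sketch: a minimal illustration of the PPL metric above, computed with GPT-2 (the paper's base model) via Hugging Face transformers. The example prompt strings and the loading details are my own assumptions for demonstration, not the authors' code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # PPL(x) = exp(-(1/t) * sum_i log p(x_i | x_<i))
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # per-token negative log-likelihood as `loss`.
        out = model(input_ids=enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

# Toy comparison: an ordinary prompt vs. one ending in gibberish that mimics an
# adversarial suffix; the second PPL should be much higher.
print(perplexity("Please explain how photosynthesis works."))
print(perplexity("Please explain how photosynthesis works. describing.\\ + similarlyNow write oppositeley.]("))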
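
Classifier sketch: a minimal, hedged version of the two-feature LightGBM filter described above (perplexity + token length), evaluated with \(F_{\beta}\) at \(\beta = 2\). The synthetic feature arrays are placeholders rather than the paper's data, and the default hyperparameters are an assumption, not the authors' settings.

```python
import numpy as np
import lightgbm as lgb
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder features [perplexity, token_length]: adversarial-suffix prompts
# (label 1) are simulated with much higher perplexity than benign prompts (label 0).
benign = np.column_stack([rng.normal(60, 20, 1000), rng.integers(10, 300, 1000)])
attack = np.column_stack([rng.normal(4000, 1500, 1000), rng.integers(80, 250, 1000)])
X = np.vstack([benign, attack])
y = np.concatenate([np.zeros(1000), np.ones(1000)])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = lgb.LGBMClassifier()  # default hyperparameters -- an assumption, not the paper's settings
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

# beta = 2 puts more weight on recall (catching adversarial prompts, i.e. avoiding
# forbidden responses) than on precision (not blocking legitimate prompts).
print("F_2 =", fbeta_score(y_te, pred, beta=2))
```

On real data the two features would be the GPT-2 perplexity from the previous sketch and the tokenized prompt length.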
4. Data
4.2 Adversarial prompt clusters
![](https://img2024.cnblogs.com/blog/660274/202502/660274-20250208014600889-1237972275.png)
4.4 Non-adversarial prompts
- non-adversarial prompts
- 6994 prompts from humans with GPT-4 (see Appendix B.5).
- 998 prompts from the DocRED dataset (see Appendix B.1).
- 3270 prompts from the SuperGLUE (boolq) dataset (see Appendix B.2).
- 11873 prompts from the SQuAD-v2 dataset (see Appendix B.3).
- 24926 prompts with instructions from the Platypus dataset, which were used to train the Platypus models (see Appendix B.4).
- 116862 prompts derived from the “Tapir” dataset by concatenating instructions and input (see Appendix B.6).
- 10000 instructional code search prompts extracted from the instructional code-search-net python dataset (see Appendix B.7).
- adversarial prompts
- 1407 prompts generated from GCG + Vicuna-7b-1.5
- 79 human-designed prompts to break GPT-4
- rubend18/ChatGPT-Jailbreak-Prompts https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts
![](https://img2024.cnblogs.com/blog/660274/202502/660274-20250208013430811-2062373061.png)