LLM Watermarking: Chapter 4 of the "Dive into LLMs" (《动手学大模型》) Hands-On Tutorial

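1 Preface

I have picked up some theory from papers on large language models, but I still lack hands-on experience. This series of posts summarizes what I learned while working through the open-source project by Prof. Zhuosheng Zhang (张倬胜) of Shanghai Jiao Tong University, with the aim of improving my practical LLM skills. Project repository: dive-into-llms
Note: most of the resources used in this project can only be reached through a VPN or proxy.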

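Disclaimer
All techniques in this series are for reference only and are not guaranteed to be fully correct. If you find any problems, feel free to contact me.
All resources referenced in this series come from the internet. If anything infringes your copyright, please contact me and I will remove it.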

2 LLM Watermarking

Brief introduction, based on the paper:
A Watermark for Large Language Models

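Basic method:

1. First, run the language model on the known prompt to obtain the probability vector over the vocabulary for the next token.

2. Take the last token of the known prompt and compute its hash (the hash serves to identify the token, since different tokens get very different hash values), then derive a random seed from that hash.

3. Use the seed to randomly split the vocabulary into a red list and a green list that contain the same number of tokens. Roughly, the split can be pictured like this: suppose the probability vector has 10000 dimensions, i.e. 10000 tokens. With a seed of 0.4 we split at position 4000, putting tokens 4000-8999 on the red list and tokens 9000-10000 plus 1-3999 on the green list. (This contiguous split is only a simplified picture; in the paper, the seed drives a random number generator that randomly partitions the whole vocabulary.)

4. When choosing the next token, we may only pick from the green list and never from the red list.

5. Repeat steps 1-4 until the whole text has been generated.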
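
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the paper or of any library: the function names `green_list` and `hard_watermark_pick`, the use of SHA-256 as the hash, and greedy selection of the most likely green token are all choices made here for clarity.

```python
import hashlib
import random


def green_list(prev_token_id: int, vocab_size: int, gamma: float = 0.5) -> set:
    """Return the green token ids derived from the previous token.

    Assumption: SHA-256 of the token id is used as the hash; any stable
    hash would serve the same purpose.
    """
    digest = hashlib.sha256(str(prev_token_id).encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")   # hash -> RNG seed
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)                           # seeded random partition
    return set(ids[: int(gamma * vocab_size)]) # first part is the green list


def hard_watermark_pick(probs: list, prev_token_id: int) -> int:
    """Pick the most likely token, restricted to the green list."""
    green = green_list(prev_token_id, len(probs))
    # Red-list tokens are never considered, however likely they are.
    return max(sorted(green), key=lambda t: probs[t])
```

Because the green list depends only on the previous token and the hash, a detector that knows the hash can rebuild the same list without access to the model.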

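As shown in the figure below:

[Figure: hard watermark]
Although the watermarking method above satisfies the five conditions set by the authors, it makes the problem of watermarking low-entropy sequences more prominent. Because the red/green split is random, some tokens that are almost fully determined by the context will inevitably land on the red list, and the language model then makes simple mistakes. For example, we all know that "1+1=2", but after watermarking, the "2" that follows "=" has a 1/2 chance of being on the red list, in which case the model can no longer output "1+1=2". An error like this is fatal.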

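A more sophisticated watermarking method:

1. First, run the language model on the known prompt to obtain the probability vector over the vocabulary for the next token.

2. Take the last token of the known prompt and compute its hash (the hash serves to identify the token, since different tokens get very different hash values), then derive a random seed from that hash.

3. Use the seed to randomly split the vocabulary, with one difference: the authors introduce a hyperparameter (γ in the paper) that controls the fraction of the vocabulary assigned to the green list. The role of the seed is unchanged, it still determines where the split starts, while the hyperparameter determines where the split ends.

4. Add a second hyperparameter (the bias δ in the paper) to the logits of the green-list tokens, then apply softmax again.

5. Append the newly generated token to the output sequence and compute the next probability vector.

6. Repeat the steps above until the complete output text is obtained.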
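
A minimal sketch of steps 2-4, to make the difference from the basic method concrete. The names `green_list` and `soft_watermark_probs`, the SHA-256 hash, and the default values `gamma=0.5` and `delta=2.0` are illustrative assumptions made here, not the reference implementation.

```python
import hashlib
import math
import random


def green_list(prev_token_id: int, vocab_size: int, gamma: float = 0.5) -> set:
    """Seeded random partition; gamma is the green fraction of the vocabulary."""
    digest = hashlib.sha256(str(prev_token_id).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def soft_watermark_probs(logits: list, prev_token_id: int,
                         gamma: float = 0.5, delta: float = 2.0) -> list:
    """Add delta to green-list logits, then re-apply softmax."""
    green = green_list(prev_token_id, len(logits), gamma)
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    top = max(biased)                          # subtract max for stability
    exps = [math.exp(x - top) for x in biased]
    total = sum(exps)
    return [e / total for e in exps]
```

With this soft rule, a red-list token whose logit is far above all others (the "2" in "1+1=2") still keeps the highest probability, because a bias of `delta` on the green tokens cannot close a large gap. When many tokens are about equally likely, the bias tilts the choice toward the green list.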

As shown in the figure below:
[Figure: soft watermark]

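3 Hands-On Code

Goals

  1. Watermark embedding: embed a watermark in the content generated by a language model
  2. Watermark detection: measure the watermark strength of a given text
  3. Watermark evaluation: evaluate the detection performance of a watermarking method
  4. Evaluate the robustness of the watermark (optional)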

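3.1 Preparation

3.1.1 Getting to know the X-SIR repository

https://github.com/zwhe99/X-SIR

The X-SIR repository contains implementations of the following:

  • Three text watermarking algorithms: X-SIR, SIR, and KGW
  • Two watermark-removal attacks: paraphrase and translation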

3.1.2 Environment setup

git clone https://github.com/zwhe99/X-SIR && cd X-SIR
conda create -n xsir python==3.10.10
conda activate xsir
pip3 install -r requirements.txt
# [optional] pip3 install flash-attn==2.3.3

The versions listed in requirements.txt are recommendations, not hard requirements.

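3.2 Worked example

Use the KGW algorithm to embed a watermark in content generated by a language model

3.2.1 Data preparation

Organize the prompts to be fed to the language model into a jsonl file: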

{"prompt": "Ghost of Emmett Till: Based on Real Life Events "}
{"prompt": "Antique Cambridge Glass Pink Decagon Console Bowl Engraved Gold Highlights\n"}
{"prompt": "2009 > Information And Communication Technology Index statistics - Countries "}
......
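  • Each line is a JSON object that contains at least a key named "prompt"
  • The rest of this tutorial uses the file data/dataset/mc4/mc4.en.jsonl as the example. It contains 500 records; if the model takes too long to process them, consider trimming the data yourself.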
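
If you do want to trim the data, a small helper along the following lines may be enough. It is a sketch, not part of the X-SIR repository: the function name `trim_jsonl` is made up here, and it only checks the one requirement stated above, namely that every record has a "prompt" key.

```python
import json


def trim_jsonl(src_path: str, dst_path: str, max_records: int) -> int:
    """Copy the first max_records records of a jsonl file, validating each one.

    Returns the number of records written.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line_no, line in enumerate(src, start=1):
            if kept >= max_records:
                break
            if not line.strip():
                continue                       # skip blank lines
            record = json.loads(line)
            if "prompt" not in record:
                raise ValueError(f"line {line_no} has no 'prompt' key")
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            kept += 1
    return kept
```

For example, `trim_jsonl("data/dataset/mc4/mc4.en.jsonl", "data/dataset/mc4/mc4.en.small.jsonl", 50)` would keep the first 50 prompts. The output file name is only an example, and the later commands would then need to point at it through `--input_file`.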

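3.2.2 Watermark embedding

  • Choose the model and the watermarking algorithm. Here we use the baichuan-inc/Baichuan-7B model together with the KGW watermarking algorithm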

    • MODEL_NAME=baichuan-inc/Baichuan-7B
      MODEL_ABBR=baichuan-7b
      WATERMARK_METHOD_FLAG="--watermark_method kgw"
      
  • Generate content with the watermark embedded

    • python3 gen.py \
          --base_model $MODEL_NAME \
          --fp16 \
          --batch_size 32 \
          --input_file data/dataset/mc4/mc4.en.jsonl \
          --output_file gen/$MODEL_ABBR/kgw/mc4.en.mod.jsonl \
          $WATERMARK_METHOD_FLAG
      
    • This command saves the model's generations to the output file: gen/$MODEL_ABBR/kgw/mc4.en.mod.jsonl

    • The output file has the following format, where response is the model's output:

      • {"prompt": "Ghost of Emmett Till: Based on Real Life Events ", "response": ".In August if 1955 African American Emmett Louis Till (21)\nThe second part of The Man From Waco, about Dan Millers trial for murdering his friend Michael Capps in a Texas wiener wrastle as I believe the statute says called it then; back at that time that would have surely occurred since Dan kept his pistol in one of those watery doggy bags he keeps around to clean himself with after emptying can into a nearby lake just minutes before committing his crime. If what we read is true thats exactly where Dan left his stolen gun and later used it in the robbery gone wrong which killed two innocent boys when his own accomplice got into an argument over not being paid enough therefore wanting out. This angered Miller whos history of mental instability could be taken one way or another but this criminal act was unavoidable once they entered FBIs hands and some other very powerful law officers who were involved either directly"}
        {"prompt": "Antique Cambridge Glass Pink Decagon Console Bowl Engraved Gold Highlights\n", "response": "An exceptionally fine decorative antique pink decagonal glass side bowl is the perfect example of early art and innovation . Fully engraved, this beautiful English vintage tableware piece exhibits a great degree on craftsmanship! Made in England during the mid 1800's it features three sets of concentric ribbons on the exterior to elegantly highlight an intricate, deep reddish color which evokes warmth and comfort for years to come! This historically significant vase has been featured within numerous museum exhibitions including \"Glass at The Corning Museum\" ; \"The First Half Century\" & a special travelling exhibit called:\" Sight Of Glass: British Cut Glass\" by ibex limited (retailer) as well as \"SIGNALS - Celebrating History In American Silver Through The Articulated Bottle Vessel\" presented at the Corning Museum of Glass 2012 ASA national symposium! We provide our customers with quality phot"}
        {"prompt": "2009 > Information And Communication Technology Index statistics - Countries ", "response": "5/22/2016\nAnnual change of mobile telephone subscriptions in Armenia (per 1 population). 2.2% increase is equivalent to 38 subscriptions per 100 people. Density rank: 121 out of 222.\nCyclist(s)/month(S). Likes bike riding? Take advantage of discount and cheap rental bikes at Rimon Bike Rentals in Yerevan! No advance payments or additional deposits are required. They have a good range of bicycles, including mountainbikes. More on their Facebook page \nYou must know about electric cars. The Renault Fluence KZERO gets it right in the city but I'm not sure what mileage you can expect from it. Still its fun project http://www.renault-kzen.com\nFor more on this and related issues : Armenian Institute for Electronic Governance reports |"}
        ......
        

3.2.3 Watermark detection

Watermark detection means: given a piece of text, compute its watermark strength (z-score).
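
For intuition, the KGW paper defines the z-score from the number of green tokens in the text: with T scored tokens, a green fraction γ, and a green-token count |s|_G, the statistic is z = (|s|_G − γT) / sqrt(Tγ(1 − γ)). The sketch below implements just that formula. The function name `kgw_z_score` is hypothetical, and whether detect.py computes exactly this quantity for every method is an assumption.

```python
import math


def kgw_z_score(green_count: int, total_tokens: int, gamma: float = 0.5) -> float:
    """One-proportion z-test: how far the green count is above chance."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    expected = gamma * total_tokens                        # mean under "no watermark"
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))  # binomial std deviation
    return (green_count - expected) / std
```

Unwatermarked text should land near z = 0, since about a fraction γ of its tokens fall on the green list by chance. Watermarked text contains far more green tokens than chance, which pushes z well above 0.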

  • Compute the watermark strength of the watermarked text

    • python3 detect.py \
          --base_model $MODEL_NAME \
          --detect_file gen/$MODEL_ABBR/kgw/mc4.en.mod.jsonl \
          --output_file gen/$MODEL_ABBR/kgw/mc4.en.mod.z_score.jsonl \
          $WATERMARK_METHOD_FLAG
      
  • Compute the watermark strength of the unwatermarked text

    • python3 detect.py \
          --base_model $MODEL_NAME \
          --detect_file data/dataset/mc4/mc4.en.jsonl \
          --output_file gen/$MODEL_ABBR/kgw/mc4.en.hum.z_score.jsonl \
          $WATERMARK_METHOD_FLAG
      
  • The output file has the following format:

    • {"z_score": 12.105422509165574, "prompt": "Ghost of Emmett Till: Based on Real Life Events ", "response": ".In August if 1955 African American Emmett Louis Till (21)\nThe second part of The Man From Waco, about Dan Millers trial for murdering his friend Michael Capps in a Texas wiener wrastle as I believe the statute says called it then; back at that time that would have surely occurred since Dan kept his pistol in one of those watery doggy bags he keeps around to clean himself with after emptying can into a nearby lake just minutes before committing his crime. If what we read is true thats exactly where Dan left his stolen gun and later used it in the robbery gone wrong which killed two innocent boys when his own accomplice got into an argument over not being paid enough therefore wanting out. This angered Miller whos history of mental instability could be taken one way or another but this criminal act was unavoidable once they entered FBIs hands and some other very powerful law officers who were involved either directly", "biases": null}
      {"z_score": 12.990684249887122, "prompt": "Antique Cambridge Glass Pink Decagon Console Bowl Engraved Gold Highlights\n", "response": "An exceptionally fine decorative antique pink decagonal glass side bowl is the perfect example of early art and innovation . Fully engraved, this beautiful English vintage tableware piece exhibits a great degree on craftsmanship! Made in England during the mid 1800's it features three sets of concentric ribbons on the exterior to elegantly highlight an intricate, deep reddish color which evokes warmth and comfort for years to come! This historically significant vase has been featured within numerous museum exhibitions including \"Glass at The Corning Museum\" ; \"The First Half Century\" & a special travelling exhibit called:\" Sight Of Glass: British Cut Glass\" by ibex limited (retailer) as well as \"SIGNALS - Celebrating History In American Silver Through The Articulated Bottle Vessel\" presented at the Corning Museum of Glass 2012 ASA national symposium! We provide our customers with quality phot", "biases": null}
      {"z_score": 11.455466938203664, "prompt": "2009 > Information And Communication Technology Index statistics - Countries ", "response": "5/22/2016\nAnnual change of mobile telephone subscriptions in Armenia (per 1 population). 2.2% increase is equivalent to 38 subscriptions per 100 people. Density rank: 121 out of 222.\nCyclist(s)/month(S). Likes bike riding? Take advantage of discount and cheap rental bikes at Rimon Bike Rentals in Yerevan! No advance payments or additional deposits are required. They have a good range of bicycles, including mountainbikes. More on their Facebook page \nYou must know about electric cars. The Renault Fluence KZERO gets it right in the city but I'm not sure what mileage you can expect from it. Still its fun project http://www.renault-kzen.com\nFor more on this and related issues : Armenian Institute for Electronic Governance reports |", "biases": null}
      ......
      
  • Eyeball the difference in watermark strength between the two files
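
Rather than scrolling through the files, a short summary can make the gap easier to see. This sketch relies only on the "z_score" key shown in the output format above; the helper name `summarize_z_scores` is made up here.

```python
import json
from statistics import mean


def summarize_z_scores(path: str) -> dict:
    """Return count, mean, min, and max of the z_score field in a jsonl file."""
    scores = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue                       # skip blank lines
            z = json.loads(line).get("z_score")
            if z is not None:                  # ignore records without a score
                scores.append(float(z))
    if not scores:
        raise ValueError(f"no z_score values found in {path}")
    return {
        "count": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
    }
```

Calling it once on gen/baichuan-7b/kgw/mc4.en.mod.z_score.jsonl and once on gen/baichuan-7b/kgw/mc4.en.hum.z_score.jsonl puts the two distributions side by side.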

3.2.4 Watermark evaluation

  • Feed in the z-score files from watermark detection, compute the detection accuracy, and plot the ROC curve

    • python3 eval_detection.py \
              --hm_zscore gen/$MODEL_ABBR/kgw/mc4.en.hum.z_score.jsonl \
              --wm_zscore gen/$MODEL_ABBR/kgw/mc4.en.mod.z_score.jsonl \
              --roc_curve roc
      
      AUC: 1.000
      
      TPR@FPR=0.1: 0.998
      TPR@FPR=0.01: 0.998
      
      F1@FPR=0.1: 0.999
      F1@FPR=0.01: 0.999
      

[Figure: ROC curve]
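
To see what a metric such as TPR@FPR=0.01 means, here is a minimal sketch that computes it from two lists of z-scores. It is not the code of eval_detection.py, which may use a different thresholding convention; the function name `tpr_at_fpr` and the strict "greater than" decision rule are assumptions made here.

```python
def tpr_at_fpr(human_scores: list, watermarked_scores: list, target_fpr: float) -> float:
    """True positive rate at the lowest threshold whose FPR stays within target_fpr.

    A text is flagged as watermarked when its z-score is strictly above the threshold.
    """
    if not human_scores or not watermarked_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts we are allowed to flag by mistake.
    allowed = int(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        threshold = float("-inf")              # everything may be flagged
    else:
        threshold = ranked[allowed]            # (allowed + 1)-th largest human score
    hits = sum(1 for z in watermarked_scores if z > threshold)
    return hits / len(watermarked_scores)
```

Read this way, a TPR of 0.998 at FPR=0.01 says that with a threshold strict enough to misflag at most 1% of the human texts, 99.8% of the watermarked texts are still caught.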

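3.3 Evaluating watermark robustness

Apply paraphrase and translation attacks to the watermarked text, then re-evaluate how well the watermark is detected

3.3.1 Preparation

We use the gpt-3.5-turbo-1106 model to paraphrase and translate the watermarked text. You may also choose other tools.

  • Set the OpenAI API key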

    • export OPENAI_API_KEY=xxxx
      
  • Adjust RPM (requests per min) and TPM (tokens per min) in attack/const.py

3.3.2 Running the attack (using translation as the example)

  • Translate the watermarked text into Chinese

    • python3 attack/translate.py \
          --input_file gen/$MODEL_ABBR/kgw/mc4.en.mod.jsonl \
          --output_file gen/$MODEL_ABBR/kgw/mc4.en-zh.mod.jsonl \
          --model gpt-3.5-turbo-1106 \
          --src_lang en \
          --tgt_lang zh
      
  • Re-evaluate

  • Compare the watermark's performance before and after the attack
