https://arxiv.org/pdf/2307.02483
https://proceedings.neurips.cc/paper_files/paper/2023/file/fd6613131889a4b656206c50a8bd7790-Paper-Conference.pdf
Jailbroken: How Does LLM Safety Training Fail?
Table of Contents
https://proceedings.neurips.cc/paper_files/paper/2023/hash/fd6613131889a4b656206c50a8bd7790-Abstract-Conference.html
Abstract
Large language models (LLMs) trained for safety and harmlessness remain vulnerable to adversarial misuse