Detoxifying Large Language Models via Knowledge Editing

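This paper proposes DINM, a new method that improves the safety of large language models (LLMs) through knowledge editing. The accompanying SafeEdit benchmark evaluates LLMs across nine unsafe categories. Experiments show that knowledge editing can effectively detoxify LLMs and that DINM can reduce a model's toxicity within a few tuning steps.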

This post is part of an LLM series and is a translation of "Detoxifying Large Language Models via Knowledge Editing".


Abstract

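This paper investigates using knowledge editing techniques to detoxify large language models (LLMs). We construct a benchmark, SafeEdit, which covers nine unsafe categories with various powerful attack prompts and provides comprehensive metrics for systematic evaluation. We conduct experiments with several knowledge editing approaches, showing that knowledge editing has the potential to efficiently detoxify LLMs with limited impact on general performance. We then propose a simple yet effective baseline, dubbed Detoxifying with Intraoperative Neural Monitoring (DINM), which reduces the toxicity of LLMs within a few tuning steps using only one instance. We further provide an in-depth analysis of the internal mechanisms of various detoxifying approaches, showing that previous methods such as SFT and DPO may merely suppress the activations of toxic parameters, whereas DINM mitigates the toxicity of the toxic parameters themselves to a certain extent, making permanent adjustments. We hope these insights can shed light on future work on developing detoxifying approaches and on the underlying knowledge mechanisms of LLMs.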
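The abstract states that DINM lowers toxicity within a few tuning steps from a single instance. The toy sketch below illustrates that general idea under loudly stated assumptions; it is not the authors' implementation. It assumes a locate-then-edit procedure: choose the layer whose hidden states differ most (by Euclidean distance) between a safe and an unsafe response, then run a few gradient-descent steps on one (input, safe target) pair while updating only that layer's weights. The pure-Python "layers", the squared-error loss, and every function name (`locate_layer`, `edit_layer`, `single_instance_loss`) are hypothetical, and no mechanism for preserving general performance is modeled.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two hidden-state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def locate_layer(safe_states: List[Vector], unsafe_states: List[Vector]) -> int:
    """Hypothetical criterion: return the index of the layer whose hidden
    states differ most between the safe and the unsafe response."""
    distances = [l2_distance(s, u) for s, u in zip(safe_states, unsafe_states)]
    return max(range(len(distances)), key=distances.__getitem__)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Compute W @ x for a toy linear layer."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def single_instance_loss(w: Matrix, x: Vector, target: Vector) -> float:
    """Squared-error loss 0.5 * ||W x - target||^2 on one instance."""
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(matvec(w, x), target))


def edit_layer(w: Matrix, x: Vector, target: Vector,
               steps: int = 5, lr: float = 0.1) -> Matrix:
    """Run a few gradient-descent steps on a single (x, target) pair,
    updating only this layer's weights. Works on a copy of `w`."""
    w = [row[:] for row in w]  # copy so the caller's weights stay untouched
    for _ in range(steps):
        # Gradient of 0.5 * ||W x - target||^2 w.r.t. W is (W x - target) x^T.
        err = [yi - ti for yi, ti in zip(matvec(w, x), target)]
        for i, ei in enumerate(err):
            for j, xj in enumerate(x):
                w[i][j] -= lr * ei * xj
    return w
```

For example, with per-layer hidden states `safe = [[0.1, 0.2], [0.5, 0.5], [0.3, 0.0]]` and `unsafe = [[0.1, 0.25], [1.5, -0.5], [0.35, 0.0]]`, `locate_layer(safe, unsafe)` returns `1`. Starting from the 2x2 identity matrix with `x = [1.0, 0.0]` and `target = [0.0, 1.0]`, five steps of `edit_layer` at `lr=0.1` shrink the single-instance loss from 1.0 to about 0.35 while leaving the original matrix unchanged.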
