
Off-Policy Sequence Masking and Unbiased KL Estimate

Off-Policy Sequence Masking



Unbiased KL Estimate

Keep Routing and Keep Sampling Mask

Reposted from
- https://www.xiaohongshu.com/explore/692e7787000000001f0067fd?app_platform=android&ignoreEngage=true&app_version=9.11.0&share_from_user_hidden=true&xsec_source=app_share&type=normal&xsec_token=CBYLSwOH9z48mqc3nfSDJupG99xpbPRs7Qu5ePUWI2mjM=&author_share=1&xhsshare=WeixinSession&shareRedId=Nz03QkZKRUA9TUg3S0A1SUg4QElGR0g9&apptime=1764825018&share_id=a4f657722e5a4df98d01cfe0c94199ac&share_channel=wechat&wechatWid=4b25c7eb82ac25153b779361d18a9366&wechatOrigin=menu
- https://arxiv.org/pdf/2512.02556



