Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation


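This paper proposes MATRIX, a social scene simulator used to align large language models (LLMs) so that they consider social consequences before responding. After fine-tuning on MATRIX-simulated data, LLMs adhere more closely to human values; experiments show the method outperforms baselines on multiple benchmarks, and user ratings indicate that the fine-tuned 13B model surpasses GPT-4 in alignment with human values.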


This post is part of the LLM series and is a translation of "Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation".

Abstract

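Aligning large language models (LLMs) with human values is essential for mitigating the potential adverse effects of their misuse. Drawing on the sociological insight that acknowledging the concerns of all parties is a key factor in shaping human values, this paper proposes a new direction for aligning LLMs: social scene simulation. To this end, we present MATRIX, a novel social scene simulator that emulates realistic scenes around a user's input query, enabling the LLM to take social consequences into account before responding. MATRIX serves as a virtual rehearsal space, akin to a Monopolylogue, in which the LLM by itself plays the various roles related to the query and practices. To inject this alignment, we fine-tune the LLM on MATRIX-simulated data, ensuring adherence to human values without compromising inference speed. We show theoretically that, under mild assumptions, the LLM with MATRIX outperforms Constitutional AI. Finally, extensive experiments validate that our method outperforms 10 baselines across 4 benchmarks. As evidenced by 875 user ratings, our fine-tuned 13B LLM surpasses GPT-4 in aligning with human values. Our project page is available at https://shuotang123.github.io/MATRIX.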
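The abstract describes a simulate-then-fine-tune flow: one LLM plays every role around a query, the resulting consequences inform a better response, and the simulated data is then used for fine-tuning. The sketch below is only one possible reading of that flow, not the paper's implementation. The `LLM` callable, the prompt wording, the draft/consequence/revise steps, and the names `simulate_scene`, `build_sft_pairs`, and `toy_llm` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class SimulationRecord:
    query: str
    roles: List[str]
    consequences: List[str]
    revised_response: str


def simulate_scene(llm: LLM, query: str, max_roles: int = 3) -> SimulationRecord:
    """Let a single model play every role around one query (assumed procedure)."""
    draft = llm(f"Answer the user query.\nQuery: {query}")
    roles_text = llm(
        f"List up to {max_roles} parties affected by this exchange, one per line.\n"
        f"Query: {query}\nAnswer: {draft}"
    )
    roles = [r.strip() for r in roles_text.splitlines() if r.strip()][:max_roles]
    # The same model speaks for each party in turn.
    consequences = [
        llm(f"You are {role}. Describe how this answer affects you.\n"
            f"Query: {query}\nAnswer: {draft}")
        for role in roles
    ]
    notes = "\n".join(f"- {r}: {c}" for r, c in zip(roles, consequences))
    revised = llm(
        "Rewrite the answer so it accounts for these consequences.\n"
        f"Query: {query}\nDraft: {draft}\nConsequences:\n{notes}"
    )
    return SimulationRecord(query, roles, consequences, revised)


def build_sft_pairs(llm: LLM, queries: List[str]) -> List[Dict[str, str]]:
    """Collect (prompt, response) pairs to use as fine-tuning data."""
    records = (simulate_scene(llm, q) for q in queries)
    return [{"prompt": r.query, "response": r.revised_response} for r in records]


def toy_llm(prompt: str) -> str:
    """Stand-in model with canned replies so the sketch runs without a real LLM."""
    if prompt.startswith("List"):
        return "the user\na bystander"
    if prompt.startswith("You are"):
        return "possible harm"
    if prompt.startswith("Rewrite"):
        return "a safer answer"
    return "a first draft"


if __name__ == "__main__":
    print(build_sft_pairs(toy_llm, ["How do I pick a lock?"]))
```

Because the simulation runs only while building training data, a model fine-tuned on the resulting pairs answers in a single pass, which is consistent with the abstract's point that inference speed is not affected.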


### Soft-Alignment in Natural Language Processing

Soft-alignment is a technique used primarily in natural language processing (NLP) and machine learning in which alignments between sequences are not strictly one-to-one but probabilistic, with weight spread across multiple elements[^2]. This lets a model capture more nuanced relationships between tokens from different sequences without enforcing a rigid mapping.

In practice, soft-alignment can be implemented with attention mechanisms. Attention lets each position in an output sequence attend over all positions in an input sequence with varying degrees of focus. The alignment scores determine how much weight each input position receives when computing the representation of a target word, for example in neural machine translation.

#### Example Implementation Using Transformer Model's Self-Attention Mechanism

Below is a simplified version demonstrating how self-attention could achieve soft-alignment:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def attention(query, key, value, mask=None, dropout=None):
    """Compute scaled dot-product attention."""
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get a large negative score, so softmax gives them ~0 weight.
        scores = scores.masked_fill(mask == 0, -1e9)
    # Softmax turns raw scores into soft-alignment weights that sum to 1 per query.
    p_attn = F.softmax(scores, dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn


class MultiHeadedAttention(nn.Module):
    def __init__(self, d_model, num_heads, dropout=0.1):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        # Four projections: query, key, value, and the final output layer.
        self.linears = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(4)]
        )
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)
        if mask is not None:
            # Assumes a [batch, seq_q, seq_k] mask; add a head axis so it broadcasts.
            mask = mask.unsqueeze(1)

        # Project, then reshape [batch, seq_len, d_model] -> [batch, heads, seq_len, d_k].
        query, key, value = [
            lin(x).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
            for lin, x in zip(self.linears, (query, key, value))
        ]

        # Apply attention to all projected vectors in one batch.
        x, _ = attention(query, key, value, mask=mask, dropout=self.dropout)

        # Concatenate the heads and apply the final linear transformation.
        x = (
            x.transpose(1, 2)
            .contiguous()
            .view(batch_size, -1, self.num_heads * self.d_k)
        )
        return self.linears[-1](x)
```

This snippet shows part of multi-headed attention, a core component that supports soft-alignment: queries associate softly with keys through scaled dot-product scores followed by softmax normalization.
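To make the "soft" part concrete, here is a small dependency-free sketch that computes the alignment weights for two toy sequences. The vectors and the helper names `softmax` and `soft_align` are made up for illustration; it uses the same scaled dot-product and softmax steps described above, without any learned projections.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_align(targets: List[List[float]], sources: List[List[float]]) -> List[List[float]]:
    """Return one row of alignment weights over the sources for each target vector."""
    d_k = len(sources[0])
    weights = []
    for t in targets:
        # Scaled dot product between this target and every source position.
        scores = [sum(a * b for a, b in zip(t, s)) / math.sqrt(d_k) for s in sources]
        weights.append(softmax(scores))
    return weights


# Toy embeddings (illustrative values only).
sources = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [[2.0, 0.0], [0.0, 2.0]]

alignment = soft_align(targets, sources)
for row in alignment:
    print([round(w, 3) for w in row])
```

Each row is a probability distribution over source positions: no target is tied to exactly one source, but sources with a larger dot product receive more weight.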