Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models


This article examines the position bias problem of large language models (LLMs) in zero-shot abstractive summarization, revealing the tendency of models to unfairly prioritize certain parts of the input text. Through experiments on models such as GPT 3.5-Turbo and Llama-2, the study shows that position bias affects summarization performance, offering a new perspective for understanding and improving LLMs on summarization tasks.

This article is part of the LLM series and is a translation of "Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias".

Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias

Abstract

We characterize and study zero-shot abstractive summarization in large language models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead-bias phenomenon studied previously in the literature. Position bias reflects a model's tendency to unfairly prioritize information from certain parts of the input text over others, leading to undesirable behavior. Through extensive experiments on four diverse real-world datasets, we study position bias in multiple LLMs, such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as in state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings yield new insights and discussion on the performance and position bias of models for zero-shot summarization tasks.
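As a concrete illustration of the idea (not the paper's exact protocol), position bias can be approximated by aligning each summary sentence to its most-overlapping source sentence and tallying which region of the document those matches fall in. The helper below is a hypothetical sketch using simple unigram overlap; the function names and the choice of four position bins are assumptions for illustration:

```python
from collections import Counter

def sentence_overlap(a, b):
    """Unigram Dice overlap between two sentences (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))

def position_histogram(source_sents, summary_sents, num_bins=4):
    """For each summary sentence, find the most-overlapping source
    sentence and record which region (bin) of the document it came
    from. A strongly skewed histogram suggests position bias."""
    counts = Counter()
    for s in summary_sents:
        best = max(range(len(source_sents)),
                   key=lambda i: sentence_overlap(s, source_sents[i]))
        bin_idx = min(best * num_bins // len(source_sents), num_bins - 1)
        counts[bin_idx] += 1
    total = sum(counts.values())
    return [counts[b] / total for b in range(num_bins)]
```

For example, a summary whose sentences all align to the opening of the article would produce a histogram concentrated in the first bin, i.e. the classic lead bias as a special case of position bias.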

1 Introduction

2 Related Work

3 Proposed Method

4 Results

5 Discussion

6 Conclusion

We analyze zero-shot abstractive summarization in LLMs through a novel formulation of position bias. Position bias measures a model's tendency to generate summaries that overtly and unfairly draw on certain parts of the input text over others. Through experiments on the CNN/DM, XSum, Reddit, and News datasets with a variety of models (GPT 3.5-Turbo, Llama-2, …


### Skeleton-Based Action Recognition Research and Techniques

In the field of skeleton-based action recognition, researchers have developed various methods to interpret human actions from skeletal data. These approaches leverage deep learning models that capture the spatial-temporal features inherent in sequences of joint positions over time.

One prominent technique uses recurrent neural networks (RNNs), particularly long short-term memory (LSTM) units or gated recurrent units (GRUs). Such architectures handle sequential information well because they maintain a form of memory across timesteps[^1], which makes them suitable for modeling the temporal dependencies present in motion-capture datasets.

Convolutional neural networks (CNNs) also play an essential role when applied to graphs that represent skeletons as nodes connected by edges denoting limb segments between joints. Graph Convolutional Networks (GCNs) extend traditional convolution operations to non-Euclidean domains such as point clouds or meshes formed around articulated bodies in motion[^2].

Furthermore, some studies combine RNN variants with GCN layers in hybrid frameworks designed specifically for this task. These combined structures aim to exploit local appearance cues alongside the global structural patterns exhibited by entire pose configurations, captured frame by frame with sensors such as Microsoft Kinect or other depth cameras that can track multiple people performing diverse activities indoors, under varying lighting, without requiring wearable markers attached to participants' limbs.
```python
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv


class ST_GCN(torch.nn.Module):
    def __init__(self, num_features, hidden_channels, class_num):
        super(ST_GCN, self).__init__()
        # Graph convolution over the skeleton graph, followed by a
        # linear classifier over the action classes.
        self.conv1 = GCNConv(num_features, hidden_channels)
        self.fc1 = Linear(hidden_channels, class_num)

    def forward(self, x, edge_index):
        # x: (num_joints, num_features); edge_index: (2, num_edges)
        h = self.conv1(x, edge_index)  # aggregate joint features along limbs
        h = F.relu(h)
        h = F.dropout(h, training=self.training)
        z = self.fc1(h)
        return F.log_softmax(z, dim=1)
```
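As a dependency-free sketch of what a single graph-convolution step such as `GCNConv` computes, the same aggregation can be written with plain tensor operations. This is a simplified illustration, assuming a dense adjacency matrix with self-loops already added; the function name and shapes are ours, not part of any library API:

```python
import torch

def gcn_layer(x, adj, weight):
    """One graph-convolution step: aggregate neighbor features with a
    symmetrically normalized adjacency, then apply a linear transform.
    x: (N, F) node features; adj: (N, N) adjacency with self-loops;
    weight: (F, F_out) learnable projection."""
    deg = adj.sum(dim=1)                      # node degrees
    d_inv_sqrt = deg.pow(-0.5)                # D^{-1/2}
    norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
    return norm_adj @ x @ weight              # D^{-1/2} A D^{-1/2} X W
```

In the skeleton setting, `adj` encodes which joints are connected by limbs, so each step mixes a joint's features with those of its anatomical neighbors before the nonlinearity is applied.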
UnknownBody