Sequence Models (Part 1)

This post introduces a method for recognizing person names using a recurrent neural network (RNN): the input word sequence is mapped to an output sequence indicating whether each word is part of a person's name. It discusses the problems a standard neural network runs into when handling sequences of different lengths, and explains in detail how an RNN overcomes these limitations.


Task: identify person names.
Given an input sequence x, output y indicating whether each word is part of a person's name.

Here the input has 9 words, so the output also has length 9: one label per word, marking whether that word is part of a person's name.

Notation:
x: x<1>, x<2>, …, x<t>, …, x<Tx>
x<t>: the t-th position in the input sequence
x(i): the i-th input sequence
Tx: the length of the input sequence
Tx(i): the input sequence length for training example i

y: y<1>, y<2>, …, y<t>, …, y<Ty>
y<t>: the t-th position in the output sequence
y(i): the i-th output sequence
Ty: the length of the output sequence
Ty(i): the output sequence length for training example i

So how do we represent each word in the sentence?
To represent a word in the sentence, the first thing you do is come up with a vocabulary, sometimes also called a dictionary: a list of the words that you will use in your representations.


So the first word in the vocabulary is a; that will be the first word in the dictionary. The second word is Aaron, and a little bit further down is the word and. Eventually you get to the word Harry, then the word Potter, and all the way down, maybe the last word in the dictionary is Zulu. So a will be word one, Aaron is word two, and in my dictionary the word and appears at positional index 367. Harry appears in position 4075, Potter in position 6830, and Zulu, the last word in the dictionary, is maybe word 10,000.

So in this example, I'm going to use a dictionary of size 10,000 words. One way to build this dictionary is to look through your training set and find the top 10,000 most frequently occurring words; you can also look through online dictionaries that tell you the most common 10,000 words in the English language. You can then use one-hot representations to represent each of these words. For example, x<1>, which represents the word Harry, would be a vector of all zeros except for a 1 in position 4075, because that is the position of Harry in the dictionary. Similarly, x<2> will be a vector of all zeros except for a 1 in position 6830. Each of these would be a 10,000-dimensional vector if your vocabulary has 10,000 words.

So in this representation, x<t> for each value of t in a sentence will be a one-hot vector, one-hot because it has exactly one 1 and zeros everywhere else, and you will have nine of them to represent the nine words in this sentence. The goal is, given this representation for x, to learn a mapping using a sequence model to the target output y, posed as a supervised learning problem.

For words not in the vocabulary, you can add a special token <UNK> (unknown).
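As a concrete illustration, here is a minimal numpy sketch of this one-hot encoding. The toy vocabulary, the `word_to_index` mapping, and the example sentence below are hypothetical stand-ins for the 10,000-word dictionary described above:

```python
import numpy as np

# Toy stand-in for the 10,000-word dictionary (hypothetical).
vocab = ["a", "aaron", "and", "harry", "potter", "zulu", "<UNK>"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = np.zeros(len(word_to_index))
    idx = word_to_index.get(word.lower(), word_to_index["<UNK>"])
    vec[idx] = 1.0
    return vec

sentence = "Harry Potter and Hermione Granger invented a new spell".split()
x = [one_hot(w) for w in sentence]  # nine one-hot vectors, x<1> ... x<9>
# "Hermione" is not in the toy vocabulary, so it falls back to <UNK>.
```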


For the task described above, we could first try a standard neural network.
Now, one thing you could do is try to use a standard neural network for this task. In our previous example, we had nine input words. So you could imagine taking these nine input words, maybe the nine one-hot vectors, feeding them into a standard neural network with a few hidden layers, and eventually having it output nine values, zero or one, that tell you whether each word is part of a person's name.

But this turns out not to work well, and there are really two main problems. The first is that the inputs and outputs can be different lengths in different examples: each example may have its own Tx(i) and Ty(i). You could perhaps zero-pad every sequence up to some maximum length, but that is still not a good representation.

And a second, maybe more serious problem is that a naive neural network architecture like this doesn't share features learned across different positions of text. Each position in the sequence is fed in independently, but in sequential data the previous position x<t−1> strongly influences how the next position x<t> should be interpreted.

So, what is a recurrent neural network?

If you are reading the sentence from left to right, the first word you read is x<1>, and what we're going to do is take this first word and feed it into a neural network layer. So there's a hidden layer of the first neural network, and we can have the network try to predict the output: is this word part of a person's name or not?

What a recurrent neural network does is, when it goes on to read the second word in the sentence, say x<2>, instead of predicting y<2> using only x<2>, it also gets to use some information from what it computed at time step one. In particular, the activation value from time step one is passed on to time step two. At the next time step, the recurrent neural network inputs the third word x<3> and tries to output some prediction ŷ<3>, and so on, up until the last time step, where it inputs x<Tx> and outputs ŷ<Ty>.

In this example Tx = Ty. If they are not equal, the network architecture needs some modification.
So at each time step, the recurrent neural network passes on its activation to the next time step for it to use. To kick off the whole thing, we also need some made-up activation at time zero, a<0>; this is usually the vector of zeros. Some researchers initialize a<0> randomly, and there are other ways to initialize it, but a vector of zeros as the fake time-zero activation is the most common choice. So that gets input to the neural network.

I'll tend to draw the unrolled diagram like the one on the left, but if you see something like the diagram on the right in a textbook or a research paper, the way I tend to think about it is to mentally unroll it into the diagram on the left. The recurrent neural network scans through the data from left to right, and the parameters it uses are shared across time steps. The parameters governing the connection from x<1> to the hidden layer will be some set of parameters we'll write as Wax, and it is the same Wax that is used at every time step.

The parameters at every time step are shared.
The activations, the horizontal connections, are governed by some set of parameters Waa, and the same Waa is used at every time step; similarly, Wya governs the output predictions. I'll describe below exactly how these parameters work.


What the RNN does when predicting ŷ<3> is use not only the information from the current input x<3> but also the information from the earlier inputs x<1> and x<2>. One limitation of this particular structure is that the prediction at a given time uses information from inputs earlier in the sequence, but not from inputs later in the sequence. We will address this later when we discuss bidirectional recurrent neural networks (BRNNs).


a<0> = 0⃗
a<t> = g1(Waa a<t−1> + Wax x<t> + ba)
ŷ<t> = g2(Wya a<t> + by)
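To make these equations concrete, here is a minimal numpy sketch of a single forward step. The activation choices are assumptions for illustration: tanh for g1 (a common choice for the hidden activation) and sigmoid for g2, which suits the binary per-word label in this task.

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """One RNN time step, following the equations above."""
    # a<t> = g1(Waa a<t-1> + Wax x<t> + ba), with g1 = tanh (assumed)
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    # y_hat<t> = g2(Wya a<t> + by), with g2 = sigmoid for a binary label (assumed)
    y_hat_t = 1.0 / (1.0 + np.exp(-(Wya @ a_t + by)))
    return a_t, y_hat_t
```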

For forward prop, you compute these activations from left to right through the network, outputting all of the predictions. For backprop, as you might already have guessed, you carry out the backpropagation calculations in essentially the opposite direction of the forward-prop arrows.
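Building on the `rnn_cell_forward` sketch above, the left-to-right scan of forward prop can be written as a simple loop that reuses the same parameters at every time step (again a sketch, not a definitive implementation):

```python
def rnn_forward(x_seq, a0, Waa, Wax, Wya, ba, by):
    """Forward prop through time: scan the sequence left to right."""
    a_t = a0                  # usually the zero vector
    y_hats = []
    for x_t in x_seq:         # t = 1, ..., Tx
        a_t, y_hat_t = rnn_cell_forward(x_t, a_t, Waa, Wax, Wya, ba, by)
        y_hats.append(y_hat_t)  # one prediction per word
    return y_hats
```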

In this backpropagation procedure, the most significant message, the most significant recursive calculation, is the one that goes from right to left, and that's what gives this algorithm its rather grand name: backpropagation through time. The motivation for this name is that in forward prop you scan from left to right, with increasing indices of time t, whereas in backpropagation you go from right to left, kind of going backwards in time.
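For reference, the quantity that backpropagation through time differentiates is an overall loss summed over the time steps. For the binary per-word labels in this task, the usual per-time-step choice is the cross-entropy loss:

$$\mathcal{L}^{\langle t\rangle}\bigl(\hat{y}^{\langle t\rangle}, y^{\langle t\rangle}\bigr) = -\,y^{\langle t\rangle}\log\hat{y}^{\langle t\rangle} - \bigl(1-y^{\langle t\rangle}\bigr)\log\bigl(1-\hat{y}^{\langle t\rangle}\bigr)$$

$$\mathcal{L} = \sum_{t=1}^{T_y}\mathcal{L}^{\langle t\rangle}\bigl(\hat{y}^{\langle t\rangle}, y^{\langle t\rangle}\bigr)$$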


### Sequence Classification in Computer Science

In computer science, sequence classification refers to a type of problem where models are trained to assign labels or categories to sequences of data points. This task is crucial across various domains including natural language processing (NLP), bioinformatics, speech recognition, and more.

#### Definition and Importance

Sequence classification involves predicting discrete class labels for entire sequences rather than for individual elements within them. For instance, identifying whether an email message should be classified as spam based on its content constitutes one application area[^2]. The ability to automate such classifications significantly impacts how efficiently systems handle large volumes of sequential data.

#### Techniques Used

Several techniques have been developed specifically for this challenge:

- **Recurrent Neural Networks (RNNs)**: These networks maintain internal states that allow information from previous steps to influence future predictions, making RNNs suitable candidates when dealing with the temporal dependencies present in many types of sequences.
- **Convolutional Neural Networks (CNNs)**: Applied over time-series-like structures, CNNs also prove effective due to their capacity to capture local patterns while being computationally efficient compared to traditional methods such as the Hidden Markov Models used earlier in fields like computational biology[^3].
- **Transformers**: Introduced initially for NLP tasks but now widely adopted beyond text-based applications, because they excel at handling long-range dependencies without suffering the vanishing-gradient problems associated with deep recurrent architectures.

```python
import torch.nn as nn

class SimpleLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleLSTM, self).__init__()
        # LSTM over inputs of shape (batch, seq_len, input_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # Linear layer maps the final hidden state to class scores
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        lstm_out, _ = self.lstm(x)
        # Use the last time step's output to classify the whole sequence
        out = self.fc(lstm_out[:, -1, :])
        return out
```

This code snippet demonstrates a simple LSTM model that could serve as a starting point for sequence classification tasks.