Contents
- Problem of N-gram Language Model
- Recurrent Neural Network (RNN)
- RNN Language Model
- Long Short-Term Memory Model (LSTM)
- Gating Vector
- Forget Gate
- Input Gate
- Update Memory Cell
- Output Gate
- Disadvantages of LSTM
- Example Applications
- Variants of LSTM
Recurrent Networks
Problem of N-gram Language Model
- Can be implemented using counts with smoothing
- Can be implemented using feed-forward neural networks
- Problem: limited context
- E.g. generating sentences using a trigram model, where each word is conditioned only on the two preceding words
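The limited-context problem can be illustrated with a small sketch. The toy corpus and the helper names `train_trigram` and `generate` below are made up for illustration, and smoothing is left out for brevity, so this is not a full count-based model. The point is only that each new word is chosen by looking at the last two words and nothing earlier.

```python
import random
from collections import defaultdict

def train_trigram(tokens):
    """Map each pair of consecutive words to the words observed after it."""
    followers = defaultdict(list)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        followers[(w1, w2)].append(w3)
    return followers

def generate(followers, start, max_words=10, seed=0):
    """Sample words one at a time, conditioning only on the last two words."""
    rng = random.Random(seed)
    words = list(start)
    while len(words) < max_words:
        context = (words[-2], words[-1])   # everything earlier is ignored
        if context not in followers:
            break
        words.append(rng.choice(followers[context]))
    return words

# Toy corpus (hypothetical, for illustration only).
corpus = "the cow eats grass . the cat eats fish . the cow drinks water .".split()
model = train_trigram(corpus)
sentence = generate(model, start=("the", "cow"))
print(" ".join(sentence))
```

Every three-word window of the output is locally plausible because it was seen in the corpus, but nothing ties the end of a generated sentence to its beginning.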
Recurrent Neural Network (RNN)
- Allows representation of arbitrarily sized inputs
- Core idea: process the input sequence one element at a time by applying a recurrence formula
- Uses a state vector to represent the context that has been processed so far
- RNN Neuron:
- RNN States:
  Activation function:
- RNN Unrolled:
  - The same parameters are used across all time steps
- Training RNN:
  - An unrolled RNN is a very deep neural network, but its parameters are shared across all time steps
  - To train an RNN, create the unrolled computation graph for the given input sequence and use the backpropagation algorithm to compute gradients as usual
  - This procedure is called backpropagation through time
E.g. of unrolled equation:
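The original equations did not survive in these notes, so the sketch below assumes the common recurrence s_i = tanh(W_s s_{i-1} + W_x x_i + b). Under that assumption, three steps unroll to s_3 = tanh(W_s tanh(W_s tanh(W_s s_0 + W_x x_1 + b) + W_x x_2 + b) + W_x x_3 + b). The parameter names and toy values are hypothetical. The code computes the state once with a loop and once as a nested expression to show that both reuse the same parameters at every step.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(W_s, W_x, b, s_prev, x):
    """One application of the assumed recurrence s_i = tanh(W_s s_{i-1} + W_x x_i + b)."""
    pre = [a + c + d for a, c, d in zip(matvec(W_s, s_prev), matvec(W_x, x), b)]
    return [math.tanh(p) for p in pre]

# Hypothetical toy parameters: 2-dimensional state, 2-dimensional inputs.
W_s = [[0.5, -0.3], [0.1, 0.8]]
W_x = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.1]
s0 = [0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Loop form: the same W_s, W_x and b are reused at every time step.
s = s0
for x in xs:
    s = rnn_step(W_s, W_x, b, s, x)

# Unrolled form: the same computation written as one nested expression.
s3_unrolled = rnn_step(W_s, W_x, b,
                       rnn_step(W_s, W_x, b,
                                rnn_step(W_s, W_x, b, s0, xs[0]),
                                xs[1]),
                       xs[2])
```

Backpropagation through time differentiates the nested form, which is why a long input sequence behaves like a very deep network with tied weights.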
RNN Language Model
- The input is the current word (e.g. "eats") mapped to an embedding
- The previous state contains information about the previous words (e.g. "a" and "cow")
- The output is the next word (e.g. "grass")
- Training:
  - Vocabulary: [a, cow, eats, grass]
  - Training example: a cow eats grass
  - Training process:
  - Losses:
  - Total loss:
- Generation:
- Problems of RNN:
  - Error propagation: unable to recover from errors in intermediate steps
  - Low diversity in generated language
  - Tends to generate bland or generic language
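The training setup above can be sketched on the example sentence. The following assumes a softmax output layer with a cross-entropy loss per step, one-hot inputs in place of learned embeddings, no start or end symbols, and random untrained weights. None of these details are confirmed by the notes, and the names `W_x`, `W_s` and `W_y` are hypothetical. At each step the model reads the current word, is scored on the probability it assigns to the next word, and the total loss is the sum of the per-step losses.

```python
import math
import random

vocab = ["a", "cow", "eats", "grass"]
index = {w: i for i, w in enumerate(vocab)}
sentence = ["a", "cow", "eats", "grass"]

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(z):
    """Convert scores into a probability distribution."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

rng = random.Random(0)

def random_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

V, H = len(vocab), 3   # vocabulary size and hidden size (hidden size is arbitrary)
W_x, W_s, W_y = random_matrix(H, V), random_matrix(H, H), random_matrix(V, H)

state = [0.0] * H
losses = []
# Each step reads one word and is scored on predicting the word after it.
for current, target in zip(sentence, sentence[1:]):
    x = one_hot(index[current], V)
    state = [math.tanh(a + c) for a, c in zip(matvec(W_x, x), matvec(W_s, state))]
    probs = softmax(matvec(W_y, state))
    losses.append(-math.log(probs[index[target]]))

total_loss = sum(losses)
```

Training would then backpropagate `total_loss` through the unrolled steps and update the shared weights; that part is omitted here.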
Long Short-Term Memory Networks
Long Short-Term Memory Model (LSTM)
- In principle an RNN can model unbounded context, but in practice it cannot capture long-range dependencies because of vanishing gradients
- Vanishing gradient: gradients from later steps diminish quickly as they are backpropagated, so earlier inputs receive little update
- LSTM was introduced to address vanishing gradients
- Core idea: have memory cells that preserve gradients across time. Access to the memory cells is controlled by gates.
- Gates: for each input, a gate decides:
  - How much of the new input should be written to the memory cell
  - How much of the current memory cell's content should be forgotten
- Comparison between simple RNN and LSTM:
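One way to see the difference between the two models is a scalar sketch of how much gradient survives backpropagation over many steps. It assumes a one-dimensional simple RNN s_i = tanh(w s_{i-1}) and, for the LSTM, follows only the memory-cell path C_t = f C_{t-1} + (new input) with a constant forget gate f, ignoring the gradient that flows through the gates themselves. The function names and the values 0.9 and 0.99 are chosen for illustration.

```python
import math

def rnn_gradient(w, steps, s0=0.5):
    """d s_T / d s_0 for the scalar recurrence s_i = tanh(w * s_{i-1})."""
    s, grad = s0, 1.0
    for _ in range(steps):
        s = math.tanh(w * s)
        grad *= w * (1.0 - s * s)   # chain rule: one factor per time step
    return grad

def lstm_cell_gradient(forget_gate, steps):
    """d C_T / d C_0 along the cell path C_t = f * C_{t-1} + (new input)."""
    return forget_gate ** steps

for steps in (1, 10, 50):
    print(steps, rnn_gradient(0.9, steps), lstm_cell_gradient(0.99, steps))
```

In the simple RNN every step multiplies the gradient by a factor below 1, so it decays geometrically. Along the cell path the factor is the forget gate, which the network can keep close to 1 when it needs to remember.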
Gating Vector
- A gate is a vector whose elements each take a value between 0 and 1. A sigmoid function is used to produce the gate.
- The gate is multiplied component-wise with another vector to determine how much of that vector's information to keep
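A minimal sketch of the gating mechanism follows; the scores and the vector are made-up values, and `apply_gate` is a hypothetical helper name. A strongly negative score yields a gate value near 0, which drops that component, and a strongly positive score yields a value near 1, which keeps it.

```python
import math

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def apply_gate(scores, vector):
    """Turn raw scores into a gate and scale `vector` component-wise by it."""
    gate = [sigmoid(z) for z in scores]
    return gate, [g * v for g, v in zip(gate, vector)]

# Hypothetical values: one component to drop, one to halve, one to keep.
scores = [-6.0, 0.0, 6.0]
vector = [2.0, 2.0, 2.0]
gate, kept = apply_gate(scores, vector)
```

In an LSTM the scores are not fixed: they are computed from the previous state and the current input, so the network learns when to open or close each gate.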
Forget Gate
- Controls how much information to forget in the memory cell
- E.g. given "The cats that the boy", predict the next word "likes"
  - The memory cell was storing the noun information "cats"
  - The cell should now forget "cats" and store "boy" to correctly predict the singular verb "likes"
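The forget gate equation did not survive in these notes. In the widely used formulation, assumed here, the gate is computed from the previous state and the current input. The symbols h_{t-1}, x_t, W_f and b_f are that formulation's notation, not recovered from the source.

```latex
f_t = \sigma\left(W_f\,[h_{t-1};\, x_t] + b_f\right)
```

An element of f_t near 0 erases the matching element of the memory cell, and an element near 1 keeps it. In the example, the components that encode "cats" would be driven toward 0 once "boy" is read.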
Input Gate
- The input gate controls how much new information to put into the memory cell
- A candidate vector holds the new distilled information to be added
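The two quantities described above can be written out in the widely used formulation, which is assumed here because the original equations were lost. The symbols i_t for the input gate and \tilde{C}_t for the new distilled information, along with the weights and biases, come from that formulation.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[h_{t-1};\, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1};\, x_t] + b_C\right)
\end{aligned}
```

The sigmoid keeps i_t between 0 and 1 so that it acts as a gate, while tanh lets the new content take values between -1 and 1.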
Update Memory Cell
- Use the forget and input gates to update the memory cell
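In the widely used formulation, assumed here since the notes' own equation is missing, the update combines the old cell content scaled by the forget gate f_t with the new content \tilde{C}_t scaled by the input gate i_t. The symbol \odot denotes component-wise multiplication.

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```

Because the old content enters through a sum rather than through a squashing function, the gradient with respect to C_{t-1} along this path is simply f_t, which is what lets the cell preserve gradients across time.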
Output Gate
- The output gate controls how much of the memory cell's content is distilled to create the next state
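Putting the gates together, one LSTM step can be sketched as below. This assumes the widely used formulation, in which the output gate is o_t = sigmoid(W_o [h_{t-1}; x_t] + b_o) and the next state is h_t = o_t * tanh(C_t); the notes' own equations were not recovered. The parameter layout, the helper names and the random toy weights are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(params, h_prev, c_prev, x):
    """One LSTM step in the assumed standard formulation."""
    z = h_prev + x                                           # concatenation [h_{t-1}; x_t]
    f = [sigmoid(a) for a in affine(*params["f"], z)]        # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]        # input gate
    cand = [math.tanh(a) for a in affine(*params["c"], z)]   # new distilled information
    o = [sigmoid(a) for a in affine(*params["o"], z)]        # output gate
    # Update the memory cell, then distill it into the next state.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, cand)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

def init_params(hidden, inputs, seed=0):
    """Random toy parameters (hypothetical): one (W, b) pair per gate."""
    rng = random.Random(seed)
    def pair():
        W = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + inputs)]
             for _ in range(hidden)]
        return W, [0.0] * hidden
    return {name: pair() for name in ("f", "i", "c", "o")}

params = init_params(hidden=3, inputs=2)
h, c = [0.0] * 3, [0.0] * 3
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h, c = lstm_step(params, h, c, x)
```

The four weight matrices in `params` account for the extra parameters and extra computation relative to a simple RNN, which has a single such transformation per step.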
Disadvantages of LSTM
- Introduces some, but not many, additional parameters
- Still unable to capture very long-range dependencies
- Slower than a simple RNN, but not by much
Applications of RNN
Example Applications
- Shakespeare Generator:
  - Training data: all works of Shakespeare
  - Model: character RNN, hidden dimension = 512
- Wikipedia Generator:
  - Training data: 100MB of raw Wikipedia data
- Code Generator
- Text Classification
  - RNNs can be used in a variety of NLP tasks. They are particularly suited to tasks where word order matters, e.g. sentiment analysis
- Sequence Labeling: e.g. POS tagging
Variants of LSTM
- Peephole connections: allow gates to look at the cell state
- Gated recurrent unit (GRU): simplified variant with only 2 gates and no memory cell
- Multi-layer LSTM
- Bidirectional LSTM
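To make the GRU entry above concrete, here is a sketch of one GRU step under a commonly used formulation, which the notes do not spell out: an update gate z and a reset gate r, with a single state h and no separate memory cell. Conventions differ on whether z or 1 - z weights the old state, so the interpolation below is one choice among several. The helper names and toy weights are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def gru_step(params, h_prev, x):
    """One GRU step: two gates (update z, reset r) and a single state h."""
    hx = h_prev + x                                        # concatenation [h_{t-1}; x_t]
    z = [sigmoid(a) for a in affine(*params["z"], hx)]     # update gate
    r = [sigmoid(a) for a in affine(*params["r"], hx)]     # reset gate
    reset_h = [rj * hj for rj, hj in zip(r, h_prev)]
    cand = [math.tanh(a) for a in affine(*params["h"], reset_h + x)]
    # Interpolate between the old state and the candidate state.
    return [(1.0 - zj) * hj + zj * cj for zj, hj, cj in zip(z, h_prev, cand)]

def init_params(hidden, inputs, seed=0):
    """Random toy parameters (hypothetical): one (W, b) pair per transformation."""
    rng = random.Random(seed)
    def pair():
        W = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + inputs)]
             for _ in range(hidden)]
        return W, [0.0] * hidden
    return {name: pair() for name in ("z", "r", "h")}

params = init_params(hidden=3, inputs=2)
h = [0.0] * 3
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h = gru_step(params, h, x)
```

Compared with the LSTM, this uses three transformations instead of four and carries one vector between steps instead of two.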