Positional Encoding
flyfish
The Transformer does not use a recurrent network, so it cannot learn position information from the order in which a sequence is processed; its structure is parallel and does not handle the sequence position by position. To compensate, a positional encoding is added to the input: each token's position is encoded and added to its word-embedding vector.
If we simply used the natural numbers as positional encodings, the encoding would be linear and the difference between adjacent positions would stay constant across the whole sequence. Positional embeddings generated from sine and cosine functions, by contrast, are periodic and close to orthogonal across dimensions, so they produce position embeddings that are distinguishable at many different scales, which works a little better for capturing long-distance dependencies.
$$
\begin{aligned}
\text{PE}(pos, 2i)   &= \sin\!\left(pos / 10000^{2i/d_\text{model}}\right) \\
\text{PE}(pos, 2i+1) &= \cos\!\left(pos / 10000^{2i/d_\text{model}}\right)
\end{aligned}
$$
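As a quick sanity check (my own example, not part of the original post): with d_model = 3, dimensions 0 and 1 share the exponent 0 and dimension 2 uses the exponent 2/3, so for pos = 1 the three values already match the pos = 1 row of the table printed further below.

import math

d_model = 3
pos = 1
print(math.sin(pos / 10000 ** (0 / d_model)))  # dim 0 (sin): 0.8415
print(math.cos(pos / 10000 ** (0 / d_model)))  # dim 1 (cos): 0.5403
print(math.sin(pos / 10000 ** (2 / d_model)))  # dim 2 (sin): 0.0022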
from collections import Counter
import torch
import torch.nn as nn
import numpy as np

# Build the sinusoidal position-encoding table used to inject
# position information into a Transformer.
def get_sin_enc_table(n_position, embedding_dim):
    #------------------------- Dimensions -------------------------
    # n_position: maximum length of the input sequence
    # embedding_dim: dimension of the word embeddings
    #---------------------------------------------------------------
    # Initialize the table of angle values, one row per position
    sinusoid_table = np.zeros((n_position, embedding_dim))
    # Walk over every position and dimension and compute the angle
    for pos_i in range(n_position):
        for hid_j in range(embedding_dim):
            angle = pos_i / np.power(10000, 2 * (hid_j // 2) / embedding_dim)
            sinusoid_table[pos_i, hid_j] = angle
    # Apply sine to the even dimensions and cosine to the odd ones
    sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2])  # dim 2i (even)
    sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2])  # dim 2i+1 (odd)
    #------------------------- Dimensions -------------------------
    # sinusoid_table has shape [n_position, embedding_dim]
    #---------------------------------------------------------------
    return torch.FloatTensor(sinusoid_table)  # return the encoding table
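For longer sequences the double Python loop gets slow; the same table can be built with vectorized NumPy operations. The following equivalent is my own sketch, not part of the original code:

import numpy as np
import torch

def get_sin_enc_table_vec(n_position, embedding_dim):
    # Broadcasting divides every position by the per-dimension scale at once
    pos = np.arange(n_position)[:, None]    # shape [n_position, 1]
    dim = np.arange(embedding_dim)[None, :] # shape [1, embedding_dim]
    angle = pos / np.power(10000, 2 * (dim // 2) / embedding_dim)
    table = np.zeros((n_position, embedding_dim))
    table[:, 0::2] = np.sin(angle[:, 0::2])  # even dims: sine
    table[:, 1::2] = np.cos(angle[:, 1::2])  # odd dims: cosine
    return torch.FloatTensor(table)

torch.allclose(get_sin_enc_table(8, 4), get_sin_enc_table_vec(8, 4)) should hold.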
sentences = [
    ['like tree like fruit', '羊毛 出在 羊身上'],
    ['East west home is best', '金窝 银窝 不如 自己的 草窝'],
]
for sentence in sentences:
    r = sentence[0].split()
    print(r)
# Compute the maximum source-sentence length, plus 1 for the padding token <pad>
src_len = max(len(sentence[0].split()) for sentence in sentences) + 1
print(src_len)
d_embedding = 3  # dimension of the embeddings
r = get_sin_enc_table(src_len + 1, d_embedding)
print(r)
Output:
['like', 'tree', 'like', 'fruit']
['East', 'west', 'home', 'is', 'best']
6
tensor([[ 0.0000,  1.0000,  0.0000],
        [ 0.8415,  0.5403,  0.0022],
        [ 0.9093, -0.4161,  0.0043],
        [ 0.1411, -0.9900,  0.0065],
        [-0.7568, -0.6536,  0.0086],
        [-0.9589,  0.2837,  0.0108],
        [-0.2794,  0.9602,  0.0129]])
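One way to see the multi-scale claim from the start of this section in action: for a sinusoidal table with an even embedding dimension, the dot product between two rows depends only on their offset, not on the absolute positions, because sin(wp)sin(w(p+k)) + cos(wp)cos(w(p+k)) = cos(wk). A quick check with the function above (my own addition, with assumed sizes 50 and 16):

pe = get_sin_enc_table(50, 16).numpy()
# Both pairs are 5 positions apart, so the two dot products are equal
print(pe[0] @ pe[5], pe[20] @ pe[25])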
If the embedding dimension were d_embedding = 512, the table would contain 256 sine/cosine pairs:
$$
\begin{aligned}
\text{PE}(\text{pos}, 0)   &= \sin\!\left(\frac{\text{pos}}{10000^{0/512}}\right) \\
\text{PE}(\text{pos}, 1)   &= \cos\!\left(\frac{\text{pos}}{10000^{0/512}}\right) \\
\text{PE}(\text{pos}, 2)   &= \sin\!\left(\frac{\text{pos}}{10000^{2/512}}\right) \\
\text{PE}(\text{pos}, 3)   &= \cos\!\left(\frac{\text{pos}}{10000^{2/512}}\right) \\
\text{PE}(\text{pos}, 4)   &= \sin\!\left(\frac{\text{pos}}{10000^{4/512}}\right) \\
\text{PE}(\text{pos}, 5)   &= \cos\!\left(\frac{\text{pos}}{10000^{4/512}}\right) \\
&\;\;\vdots \\
\text{PE}(\text{pos}, 510) &= \sin\!\left(\frac{\text{pos}}{10000^{510/512}}\right) \\
\text{PE}(\text{pos}, 511) &= \cos\!\left(\frac{\text{pos}}{10000^{510/512}}\right)
\end{aligned}
$$
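In practice the table is usually wrapped in a frozen embedding layer and its rows are added to the token embeddings, which is the role described in the opening paragraph. The snippet below is a common pattern, not code from the original post; the vocabulary size and sequence sizes are assumptions for the sketch.

import torch
import torch.nn as nn

max_len, d_embedding = 64, 512                    # assumed sizes for this sketch
tok_emb = nn.Embedding(1000, d_embedding)         # hypothetical vocabulary of 1000 tokens
pos_emb = nn.Embedding.from_pretrained(
    get_sin_enc_table(max_len, d_embedding), freeze=True)  # fixed, not trained

tokens = torch.randint(0, 1000, (2, 10))          # [batch, seq_len]
positions = torch.arange(10).unsqueeze(0)         # [1, seq_len], broadcasts over batch
x = tok_emb(tokens) + pos_emb(positions)          # position-aware input, [2, 10, 512]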