1. The Principle of Multi-Head Attention
Let the input sequence be represented as a tensor $X\in\mathbb{R}^{B\times L\times d_{\text{model}}}$, where $B$ is the batch size, …
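The passage breaks off before the attention computation itself. The sketch below assumes the standard formulation from the original Transformer paper: project $X$ with $W^Q, W^K, W^V$, split $d_{\text{model}}$ into $h$ heads of width $d_k = d_{\text{model}}/h$, compute $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$ per head, concatenate the heads, and project with $W^O$. It uses plain nested lists so it runs without dependencies. All function and weight names are illustrative, not taken from the source, and biases and masking are omitted.

```python
import math
import random

# Assumption: standard Transformer multi-head attention (no biases, no masking).
# Tensors are nested Python lists; X has shape B x L x d_model.

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(u * w for u, w in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for one head; q, k, v are L x d_k."""
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: B x L x d_model; each w_*: d_model x d_model. Returns B x L x d_model."""
    d_model = len(x[0][0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_k = d_model // num_heads
    out = []
    for seq in x:  # attention is computed independently for each batch element
        q, k, v = matmul(seq, w_q), matmul(seq, w_k), matmul(seq, w_v)
        heads = []
        for head_idx in range(num_heads):
            cols = slice(head_idx * d_k, (head_idx + 1) * d_k)  # this head's d_k columns
            heads.append(scaled_dot_product_attention(
                [r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v]))
        # Concatenate the heads back to width d_model, then apply W^O.
        concat = [sum((head[i] for head in heads), []) for i in range(len(seq))]
        out.append(matmul(concat, w_o))
    return out

def rand_mat(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
B, L, d_model, h = 2, 3, 4, 2
x = [rand_mat(L, d_model) for _ in range(B)]
w_q, w_k, w_v, w_o = (rand_mat(d_model, d_model) for _ in range(4))
y = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=h)
print(len(y), len(y[0]), len(y[0][0]))  # 2 3 4
```

The output keeps the input shape $B\times L\times d_{\text{model}}$: each head works in a $d_k$-dimensional subspace, and concatenating the $h$ heads restores width $d_{\text{model}}$ before the final projection.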
Time limit per test: 1 second
Memory limit per test: 256 megabytes
We call an array $a$, consisting of $k$ positive integers, palindromic if $[a_1, a_2, \dots, a_k] = [a_k, a_{k-1}, \dots, a_1]$. For example, the arrays $[1, 2, 1]$ and $[5, 1, 1, 5]$ …
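The definition amounts to requiring $a_i = a_{k+1-i}$ for every $i$, so only the first $\lfloor k/2 \rfloor$ pairs need to be compared. A minimal sketch of that check follows. The function name `is_palindromic` is hypothetical, and it only tests the definition; it does not solve whatever task the truncated statement goes on to pose.

```python
from typing import Sequence

def is_palindromic(a: Sequence[int]) -> bool:
    """Check the definition [a_1, ..., a_k] == [a_k, ..., a_1]."""
    k = len(a)
    # a_i must equal a_{k+1-i} (1-indexed); 0-indexed that is a[i] vs a[k-1-i].
    # Comparing the first k // 2 positions covers every pair exactly once.
    return all(a[i] == a[k - 1 - i] for i in range(k // 2))

print(is_palindromic([1, 2, 1]))     # True
print(is_palindromic([5, 1, 1, 5]))  # True
print(is_palindromic([1, 2]))        # False
```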