Week 35 Study Report

Abstract

In the paper read this week, the authors propose a hydrological time series forecasting model that combines particle swarm optimization (PSO) with Bi-LSTM and Bi-GRU networks and adds feature fusion and a self-attention layer to improve forecasting performance. PSO tunes hyperparameters such as the number of hidden units, searching the hyperparameter space to fit different time series characteristics. The bidirectional structure of Bi-LSTM and Bi-GRU processes each input sequence in both the forward and backward directions, so every time step sees context from both earlier and later steps, which helps model complex temporal patterns. The self-attention layer dynamically weights the time steps to emphasize the most informative ones. The multimodal fusion strategy combines the complementary strengths of Bi-LSTM and Bi-GRU and, together with attention, extracts weighted features, which the paper reports gives higher accuracy and robustness than traditional unidirectional models. The authors present the architecture as computationally efficient and adaptable, making it a flexible option for hydrological forecasting tasks.

Literature Reading

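This week I read the paper Multimodal Fusion of Optimized GRU–LSTM with Self‑Attention Layer for Hydrological Time Series Forecasting.
Paper link: Multimodal Fusion of Optimized GRU–LSTM with Self‑Attention Layer for Hydrological Time Series Forecasting
In the paper, the authors propose a new fusion model. It combines particle swarm optimization (PSO) with Bi-LSTM and Bi-GRU, uses feature fusion and an attention layer to improve the accuracy of hydrological time series forecasting, and outperforms traditional methods on multiple datasets.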

1.1 Background knowledge

1.1.1 PSO

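PSO was originally inspired by the regular patterns of bird flocking and builds a simplified model based on swarm intelligence. Its core idea is to search the solution space through the cooperative movement of a group of "particles". Each particle represents a candidate solution to the problem and has two attributes: position and velocity. During the search, each particle adjusts its direction and speed according to its own best position so far (individual experience) and the best position found by the whole swarm (collective experience), so that the swarm gradually converges toward a good solution, ideally the global optimum.
Suppose there are N particles in a D-dimensional search space, and the state of particle i is described by its position vector x_i and its velocity vector v_i. Two concepts are needed here: the personal best position and the global best position.
Personal best position (pbest): the best position each particle has found during its own search, denoted pbest_i. It is the particle's best solution as evaluated by the objective (fitness) function.
Global best position (gbest): the best position found by any particle in the swarm, denoted gbest. It is the best solution the swarm has found so far.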
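For reference, the standard PSO update that this description leads to can be written as below. The symbols w (inertia weight), c_1 and c_2 (acceleration coefficients), and r_1 and r_2 (random numbers drawn uniformly from [0, 1]) follow the usual convention and match the variable names in the code example later in this section. This is the textbook form, not necessarily the exact variant used in the paper.

```latex
\begin{aligned}
v_i^{t+1} &= w\,v_i^{t} + c_1 r_1\,(pbest_i - x_i^{t}) + c_2 r_2\,(gbest - x_i^{t}) \\
x_i^{t+1} &= x_i^{t} + v_i^{t+1}
\end{aligned}
```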

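As shown in the figure below, suppose a particle is currently at position C, its personal best is at B, and the global best is at A. The yellow vector is the current velocity direction, the green vector is the step toward the personal best, and the red vector is the step toward the global best. The particle's next move is the weighted sum of these three vectors.
[Figure: velocity update of a particle at C with personal best B and global best A]
The update is repeated until the maximum number of iterations is reached or the global best position stops changing.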

The following code uses PSO to minimize the objective function f(x, y) = x² + y², that is, to find the (x, y) that gives the smallest function value:

import numpy as np


# Objective function
def objective_function(x):
    return np.sum(x**2)


# PSO parameters
n_particles = 2
dim = 2
w, c1, c2 = 0.7, 1.5, 1.5
bounds = [-5, 5]
max_iter = 2

# Initialization
positions = np.array([[2.0, 1.0], [-1.0, 3.0]])
velocities = np.array([[0.5, 0.3], [-0.2, 0.1]])
pbest_positions = positions.copy()
pbest_scores = np.array([objective_function(p) for p in pbest_positions])
gbest_idx = np.argmin(pbest_scores)
gbest_position = pbest_positions[gbest_idx].copy()
gbest_score = pbest_scores[gbest_idx]

# Manually specified random numbers (fixed so the trace is reproducible)
rand_values = [[0.4, 0.6], [0.3, 0.7]]  # [r1, r2] for each iteration

# PSO iterations
for t in range(max_iter):
    print(f"\nIteration {t + 1}:")
    r1, r2 = rand_values[t]
    for i in range(n_particles):
        # Update velocity: inertia + pull toward pbest + pull toward gbest
        velocities[i] = (w * velocities[i]
                         + c1 * r1 * (pbest_positions[i] - positions[i])
                         + c2 * r2 * (gbest_position - positions[i]))
        # Update position and keep it inside the search bounds
        positions[i] += velocities[i]
        positions[i] = np.clip(positions[i], bounds[0], bounds[1])
        # Evaluate fitness
        score = objective_function(positions[i])
        # Update personal best
        if score < pbest_scores[i]:
            pbest_scores[i] = score
            pbest_positions[i] = positions[i].copy()
        # Update global best
        if score < gbest_score:
            gbest_score = score
            gbest_position = positions[i].copy()
        print(f"Particle {i + 1}: Position={positions[i]}, Score={score}")

print(f"\nFinal gbest: {gbest_position}, Score={gbest_score}")

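After 2 PSO iterations the code reports gbest, the best position found so far. With only two particles and two iterations the swarm is still far from the true minimum at (0, 0). The output is as follows: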

Iteration 1:
Particle 1: Position=[2.35 1.21], Score=6.986600000000001
Particle 2: Position=[1.56 1.27], Score=4.0465

Iteration 2:
Particle 1: Position=[1.608  1.3255], Score=4.3426142500000005
Particle 2: Position=[3.352 0.059], Score=11.239384999999997

Final gbest: [1.56 1.27], Score=4.0465
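
The two-iteration trace above uses fixed random numbers to keep the arithmetic easy to follow. As a rough sketch of how the same update rule behaves over a longer run, the pure-Python version below draws r1 and r2 at random for every dimension and uses 20 particles and 100 iterations. The function name pso, the swarm size, the iteration count, and the seed are illustrative choices, not values from the paper.

```python
import random


def sphere(p):
    """Objective f(x, y) = x^2 + y^2, minimized at the origin."""
    return sum(c * c for c in p)


def pso(objective, dim=2, n_particles=20, max_iter=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimal PSO sketch; returns (best_position, best_score)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the bounds, zero initial velocities
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clip to the search bounds
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            score = objective(pos[i])
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score < gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score


best_position, best_score = pso(sphere)
print(best_position, best_score)
```

With these settings the best score should end up very close to 0, the minimum of the objective at (0, 0).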

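1.1.2 Bi-LSTM

A standard LSTM passes information in one direction only and cannot encode information from later positions back to earlier ones. When the key information in a sentence comes first, this works naturally. But when the order is reversed and the key words appear later, a unidirectional LSTM struggles. In finer-grained classification, for example a five-class task with strongly positive, weakly positive, neutral, weakly negative, and strongly negative labels, the model needs to capture the interaction between sentiment words, degree words, and negation words. For example, in "这个餐厅脏得不行，没有隔壁好" ("this restaurant is terribly dirty, not as good as the one next door"), "不行" modifies the degree of "脏" (dirty). A BiLSTM can better capture such bidirectional semantic dependencies.

A bidirectional LSTM has two LSTM layers: one processes the sequence from front to back and the other from back to front. In this way the model can use both the preceding and the following context. Each time step's input is passed to both LSTM layers, and their outputs are then merged. A bidirectional LSTM therefore obtains more complete information about the sequence, which helps performance on sequence tasks.

The cell computation of a bidirectional network is the same as that of a unidirectional one. However, the hidden layer keeps two values per time step, one for the forward pass and one for the backward pass. Once processing is finished, the outputs of the two LSTMs are concatenated.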
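The per-step computation can be written out explicitly. The equations below are the standard LSTM gate equations in the notation of the code listing that follows (input weights W, recurrent weights U, biases b), followed by the concatenation of the forward and backward hidden states. They're given as a reference sketch, not as the paper's exact formulation. The backward pass uses the same equations with h_{t+1} and c_{t+1} in place of h_{t-1} and c_{t-1}.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
h_t^{\mathrm{bi}} &= [\,\overrightarrow{h}_t \, ; \, \overleftarrow{h}_t\,]
\end{aligned}
```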
The Bi-LSTM code implementation is as follows:

import tensorflow as tf
from tensorflow.keras.layers import Layer


class CustomBiLSTM(Layer):
    def __init__(self, units):
        super(CustomBiLSTM, self).__init__()
        self.units = units
        # Forward LSTM parameters: forget (f), input (i), candidate (c), output (o)
        self.Wf_f, self.Uf_f, self.bf_f = self._gate_weights()
        self.Wi_f, self.Ui_f, self.bi_f = self._gate_weights()
        self.Wc_f, self.Uc_f, self.bc_f = self._gate_weights()
        self.Wo_f, self.Uo_f, self.bo_f = self._gate_weights()
        # Backward LSTM parameters
        self.Wf_b, self.Uf_b, self.bf_b = self._gate_weights()
        self.Wi_b, self.Ui_b, self.bi_b = self._gate_weights()
        self.Wc_b, self.Uc_b, self.bc_b = self._gate_weights()
        self.Wo_b, self.Uo_b, self.bo_b = self._gate_weights()

    def _gate_weights(self):
        # Input weights (a single input feature is assumed), recurrent weights, bias
        W = self.add_weight(shape=(1, self.units), initializer='glorot_uniform', trainable=True)
        U = self.add_weight(shape=(self.units, self.units), initializer='glorot_uniform', trainable=True)
        b = self.add_weight(shape=(self.units,), initializer='zeros', trainable=True)
        return W, U, b

    def call(self, inputs):
        # inputs: [batch_size, timesteps, features]
        batch_size = tf.shape(inputs)[0]
        timesteps = inputs.shape[1]  # static sequence length, so the Python loops can unroll
        # Initialize forward and backward states
        h_f = tf.zeros((batch_size, self.units))
        c_f = tf.zeros((batch_size, self.units))
        h_b = tf.zeros((batch_size, self.units))
        c_b = tf.zeros((batch_size, self.units))
        outputs_f, outputs_b = [], []
        # Forward LSTM
        for t in range(timesteps):
            x_t = inputs[:, t, :]  # [batch_size, features]
            ft = tf.sigmoid(tf.matmul(x_t, self.Wf_f) + tf.matmul(h_f, self.Uf_f) + self.bf_f)
            it = tf.sigmoid(tf.matmul(x_t, self.Wi_f) + tf.matmul(h_f, self.Ui_f) + self.bi_f)
            ct_tilde = tf.tanh(tf.matmul(x_t, self.Wc_f) + tf.matmul(h_f, self.Uc_f) + self.bc_f)
            ot = tf.sigmoid(tf.matmul(x_t, self.Wo_f) + tf.matmul(h_f, self.Uo_f) + self.bo_f)
            c_f = ft * c_f + it * ct_tilde
            h_f = ot * tf.tanh(c_f)
            outputs_f.append(h_f)
        # Backward LSTM (iterate from the last time step to the first)
        for t in range(timesteps - 1, -1, -1):
            x_t = inputs[:, t, :]
            ft = tf.sigmoid(tf.matmul(x_t, self.Wf_b) + tf.matmul(h_b, self.Uf_b) + self.bf_b)
            it = tf.sigmoid(tf.matmul(x_t, self.Wi_b) + tf.matmul(h_b, self.Ui_b) + self.bi_b)
            ct_tilde = tf.tanh(tf.matmul(x_t, self.Wc_b) + tf.matmul(h_b, self.Uc_b) + self.bc_b)
            ot = tf.sigmoid(tf.matmul(x_t, self.Wo_b) + tf.matmul(h_b, self.Uo_b) + self.bo_b)
            c_b = ft * c_b + it * ct_tilde
            h_b = ot * tf.tanh(c_b)
            outputs_b.insert(0, h_b)  # keep backward outputs aligned with time order
        # Concatenate forward and backward states along the feature axis
        outputs = tf.concat([tf.stack(outputs_f, axis=1), tf.stack(outputs_b, axis=1)], axis=-1)
        return outputs  # [batch_size, timesteps, 2 * units]

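1.1.3 Bi-GRU

Although LSTM mitigates the vanishing gradient problem, it does so at the cost of higher time and space complexity. GRU simplifies LSTM by merging the forget gate and the input gate into a single update gate, so a GRU contains two gates: an update gate and a reset gate.
The reset gate controls how much of the previous state h_{t-1} is ignored; the smaller its value, the more is ignored. The update gate determines how much of the previous memory is carried over to the current time step; the larger its value, the more of h_{t-1} is retained (this reading assumes the convention h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t; some formulations swap the roles of z_t and 1 − z_t). These two gate vectors decide what information ends up in the output of the gated recurrent unit. They can preserve information over long sequences, so that important information is carried across many time steps and is neither washed out over time nor discarded as irrelevant to the prediction.
The internal structure and update equations of the GRU are as follows:
[Figure: GRU internal structure and update equations]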
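Since the original figure is not available here, the standard GRU equations are reproduced below as a reference sketch. They are written to match the gate description above, where a larger update gate z_t retains more of h_{t-1}. The Bi-GRU code listing in this section uses the equivalent form in which the roles of z_t and 1 − z_t are exchanged.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```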
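Bi-GRU follows the same logic as Bi-LSTM: the internal structure is unchanged, the model is applied twice in opposite directions, and the two results are concatenated as the final output.
The Bi-GRU code implementation is as follows: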

import tensorflow as tf
from tensorflow.keras.layers import Layer


class CustomBiGRU(Layer):
    def __init__(self, units):
        super(CustomBiGRU, self).__init__()
        self.units = units
        # Forward GRU parameters: update (z), reset (r), candidate (h)
        self.Wz_f, self.Uz_f, self.bz_f = self._gate_weights()
        self.Wr_f, self.Ur_f, self.br_f = self._gate_weights()
        self.Wh_f, self.Uh_f, self.bh_f = self._gate_weights()
        # Backward GRU parameters
        self.Wz_b, self.Uz_b, self.bz_b = self._gate_weights()
        self.Wr_b, self.Ur_b, self.br_b = self._gate_weights()
        self.Wh_b, self.Uh_b, self.bh_b = self._gate_weights()

    def _gate_weights(self):
        # Input weights (a single input feature is assumed), recurrent weights, bias
        W = self.add_weight(shape=(1, self.units), initializer='glorot_uniform', trainable=True)
        U = self.add_weight(shape=(self.units, self.units), initializer='glorot_uniform', trainable=True)
        b = self.add_weight(shape=(self.units,), initializer='zeros', trainable=True)
        return W, U, b

    def call(self, inputs):
        # inputs: [batch_size, timesteps, features]
        batch_size = tf.shape(inputs)[0]
        timesteps = inputs.shape[1]  # static sequence length, so the Python loops can unroll
        h_f = tf.zeros((batch_size, self.units))
        h_b = tf.zeros((batch_size, self.units))
        outputs_f, outputs_b = [], []
        # Forward GRU
        for t in range(timesteps):
            x_t = inputs[:, t, :]
            zt = tf.sigmoid(tf.matmul(x_t, self.Wz_f) + tf.matmul(h_f, self.Uz_f) + self.bz_f)
            rt = tf.sigmoid(tf.matmul(x_t, self.Wr_f) + tf.matmul(h_f, self.Ur_f) + self.br_f)
            ht_tilde = tf.tanh(tf.matmul(x_t, self.Wh_f) + tf.matmul(rt * h_f, self.Uh_f) + self.bh_f)
            # Here zt weights the candidate state, so a larger zt keeps less of the previous state
            h_f = (1 - zt) * h_f + zt * ht_tilde
            outputs_f.append(h_f)
        # Backward GRU (iterate from the last time step to the first)
        for t in range(timesteps - 1, -1, -1):
            x_t = inputs[:, t, :]
            zt = tf.sigmoid(tf.matmul(x_t, self.Wz_b) + tf.matmul(h_b, self.Uz_b) + self.bz_b)
            rt = tf.sigmoid(tf.matmul(x_t, self.Wr_b) + tf.matmul(h_b, self.Ur_b) + self.br_b)
            ht_tilde = tf.tanh(tf.matmul(x_t, self.Wh_b) + tf.matmul(rt * h_b, self.Uh_b) + self.bh_b)
            h_b = (1 - zt) * h_b + zt * ht_tilde
            outputs_b.insert(0, h_b)  # keep backward outputs aligned with time order
        # Concatenate forward and backward states along the feature axis
        outputs = tf.concat([tf.stack(outputs_f, axis=1), tf.stack(outputs_b, axis=1)], axis=-1)
        return outputs  # [batch_size, timesteps, 2 * units]

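1.2 Overall framework

The model proposed in the paper combines particle swarm optimization (PSO) with a bidirectional long short-term memory network (Bi-LSTM) and a bidirectional gated recurrent unit (Bi-GRU), and refines the result with a self-attention layer. The model architecture is shown in the figure below:
[Figure: architecture of the proposed PSO-optimized Bi-LSTM and Bi-GRU model with self-attention]
Roughly, it works as follows. PSO first selects the hyperparameters (such as the number of hidden units). The input sequence is then fed to the Bi-LSTM and the Bi-GRU in parallel, and their outputs are concatenated. The concatenated output is passed to the self-attention layer for weighting, and the fused features are finally used to produce the prediction.
The implementation of the main model is shown below. It mainly calls the Bi-LSTM, Bi-GRU, and attention components and joins them together: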

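import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def build_model(timesteps, features, units_lstm, units_gru):
    inputs = Input(shape=(timesteps, features))
    # Custom Bi-LSTM branch
    bilstm_out = CustomBiLSTM(units_lstm)(inputs)
    # Custom Bi-GRU branch
    bigru_out = CustomBiGRU(units_gru)(inputs)
    # Concatenate the two branches along the feature axis
    concat_out = tf.keras.layers.Concatenate(axis=-1)([bilstm_out, bigru_out])
    # Self-attention layer (SelfAttention is a custom layer whose definition is not shown in this post)
    attention_out = SelfAttention(units=units_lstm + units_gru)(concat_out)
    # Fully connected layers
    dense_out = Dense(32, activation='sigmoid')(attention_out)
    outputs = Dense(1)(dense_out)
    model = Model(inputs, outputs)
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
    return model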
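
The build_model code relies on a SelfAttention layer that isn't defined in the post. As a rough illustration of the kind of weighting such a layer performs, the sketch below implements scaled dot-product self-attention in plain Python with no learned parameters. The function names softmax and self_attention and the toy feature values are assumptions for illustration; the paper's actual attention layer may differ, for example by using learned query, key, and value projections.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention without learned projections.

    seq: list of T feature vectors, each a list of d floats.
    Returns (context, weights): context has shape T x d, weights T x T.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    weights, context = [], []
    for q in seq:
        # Similarity of this time step to every time step
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all time steps' features
        context.append([sum(wt * v[j] for wt, v in zip(row, seq)) for j in range(d)])
    return context, weights


# Toy sequence of 3 time steps with 2 features each (made-up values)
features = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
context, weights = self_attention(features)
for row in weights:
    print([round(x, 3) for x in row])
```

Each row of weights sums to 1 and shows how strongly one time step attends to the others. In this toy input the second time step has the largest feature values, so it gets the largest weight in its own row.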

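1.3 Experimental analysis

(1) Dataset
The study area is the Kizilirmak basin in Turkey, located in the eastern part of Central Anatolia and draining into the Black Sea. The data come from local observation stations. Daily flow data covering 3652 days from 2002 to 2011 are used, with 80% for training and 20% for testing.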
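As a small sketch of what the 80/20 split amounts to, the snippet below splits a series of 3652 values in time order. Both the chronological ordering (no shuffling) and rounding the cut point down are assumptions, since the post doesn't say how the split was made. The list daily_flow holds placeholder values, not real data.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into train and test parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Placeholder values standing in for the 3652 daily flow observations
daily_flow = list(range(3652))
train, test = chronological_split(daily_flow)
print(len(train), len(test))  # 2921 731
```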
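(2) Evaluation criteria
Root mean square error (RMSE): the square root of the mean squared difference between predicted and observed values; lower is better.
Mean absolute error (MAE): the mean absolute difference between predicted and observed values; lower is better.
Coefficient of determination (R²): the proportion of variance in the data explained by the model; the closer to 1, the better.
Kling-Gupta efficiency (KGE): a composite metric combining correlation, bias, and variability; higher is better.
Brier score (BF): the accuracy of predicted probabilities; lower is better.
Nash-Sutcliffe efficiency (NSE): measures how well the predictions perform relative to using the observed mean as the predictor; higher is better, with 1 indicating a perfect match.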
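The deterministic metrics above can be sketched in plain Python as follows. The formulas are the commonly used definitions, not formulas taken from the paper: R² is computed here as the squared Pearson correlation, and KGE uses the form 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²) with α the ratio of standard deviations and β the ratio of means. The Brier score is left out because it applies to probabilistic forecasts. The observed and simulated values are made-up numbers.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def rmse(obs, sim):
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    return _mean([abs(o - s) for o, s in zip(obs, sim)])


def pearson_r(obs, sim):
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def r_squared(obs, sim):
    # Assumption: R^2 taken as the squared Pearson correlation
    return pearson_r(obs, sim) ** 2


def nse(obs, sim):
    # 1 - (sum of squared errors) / (sum of squared deviations from the observed mean)
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    mo, ms = _mean(obs), _mean(sim)
    sd_o = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    sd_s = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    r = pearson_r(obs, sim)
    alpha = sd_s / sd_o   # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Made-up example values
observed = [3.0, 5.0, 4.0, 6.0, 8.0]
simulated = [2.5, 5.5, 4.0, 6.5, 7.0]
for name, fn in [("RMSE", rmse), ("MAE", mae), ("R2", r_squared), ("NSE", nse), ("KGE", kge)]:
    print(name, round(fn(observed, simulated), 4))
```

A perfect prediction gives RMSE = MAE = 0 and R² = NSE = KGE = 1, which is a quick sanity check for any implementation of these metrics.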
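(3) Experimental results
The figure below summarizes the PSO results and lists the hyperparameter values selected for each model architecture.
[Figure: hyperparameter values selected by PSO for each model architecture]
According to the experimental results, the PSO-optimized bidirectional methods outperform the traditional unidirectional methods (such as GRU and LSTM). Across all datasets, the bidirectional models are consistently better than the unidirectional models on the performance metrics, including RMSE, MAE, and R².
The figure below compares predicted and observed values using NSE analysis:
[Figure: comparison of predicted and observed values using NSE analysis]

These results show that models with the attention layer have lower RMSE and MAE and higher R² and NSE on all datasets, and the box plots show a narrower interquartile range (IQR) for the proposed method, indicating more stable predictions. This points to the importance of the attention layer in helping the forecasting model capture the relevant features and dependencies in the data, which improves overall performance.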