Predicting Stock Closing Prices with LSTM

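In financial forecasting, LSTM (Long Short-Term Memory) networks are a popular choice for analyzing stock price trends because they are designed to model dependencies in time-series data. This post walks through a complete code implementation of LSTM-based closing-price prediction, covering every step from data processing through model training and evaluation.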

I. Data Preprocessing and Visualization

1. Loading and Sorting the Data

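First, we load the stock data from a CSV file and sort it by date in ascending order, as time-series analysis requires. The snippet prints the first few rows and the shape of the data: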

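import pandas as pd

filepath = './rlData.csv'
data = pd.read_csv(filepath)
# Sort chronologically and renumber the rows so the index follows time order
data = data.sort_values('Date').reset_index(drop=True)
print(data.head())
print(data.shape)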
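Sorting by Date and then renumbering the rows with reset_index(drop=True) makes row position and time order agree. The sketch below checks this on a tiny in-memory CSV; it assumes, like the original, a Date column in ISO YYYY-MM-DD format (so plain string order is chronological order), and the three rows are made-up toy values, not real quotes.

```python
import io

import pandas as pd

# Toy CSV stored newest-first, as many price downloads are (values are made up)
csv_text = """Date,Close
2020-01-03,12.0
2020-01-02,11.0
2020-01-01,10.0
"""

data = pd.read_csv(io.StringIO(csv_text))
# Sort oldest-first; ISO date strings sort correctly as plain text
data = data.sort_values('Date')
# Renumber rows 0..n-1 so positional access follows time order
data = data.reset_index(drop=True)

print(data)
```

Without the reset_index call, the index would still read 2, 1, 0 after sorting, so anything that plots or slices by index label would run backwards in time.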

2. Visualizing the Stock Price

To get a more intuitive sense of the trend, we plot the closing price (Close):

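import matplotlib.pyplot as plt

plt.figure(figsize=(15, 9))
# Plot against row position 0..n-1 so the x-axis matches the tick positions below
plt.plot(data['Close'].to_numpy())
# Label every 20th point on the x-axis with its date
plt.xticks(range(0, data.shape[0], 20), data['Date'].iloc[::20], rotation=45)
plt.title("Stock Price Trend", fontsize=18, fontweight='bold')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price (USD)', fontsize=18)
plt.savefig('StockPrice.jpg')
plt.show()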

This plot shows how the price fluctuates over time, which informs the modeling choices that follow.

II. Feature Engineering and Dataset Construction

1. Normalization

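LSTMs are sensitive to the scale of their inputs (large raw values can push the sigmoid and tanh gate activations into saturation and slow training), so we use MinMaxScaler to rescale the closing price to the range [-1, 1]. For simplicity the scaler is fit on the whole series here; strictly, it should be fit on the training portion only, so that the minimum and maximum of the test period do not leak into training.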

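from sklearn.preprocessing import MinMaxScaler

# Keep only the closing price as a single-column DataFrame
price = data[['Close']].copy()
scaler = MinMaxScaler(feature_range=(-1, 1))
price['Close'] = scaler.fit_transform(price['Close'].values.reshape(-1, 1))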
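With feature_range=(-1, 1), MinMaxScaler maps each value x to x' = 2 * (x - min) / (max - min) - 1, using the column's minimum and maximum. The quick check below runs on a made-up toy price array. It confirms that the formula matches the scaler and that inverse_transform recovers the original prices, which is what lets predictions be converted back to price units later.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy closing prices (made-up values), shaped (n_samples, 1) as the scaler expects
prices = np.array([[10.0], [12.0], [15.0], [20.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(prices)

# The same mapping written out by hand: x' = 2 * (x - min) / (max - min) - 1
lo, hi = prices.min(), prices.max()
manual = 2 * (prices - lo) / (hi - lo) - 1

# Undo the scaling to get back to price units
restored = scaler.inverse_transform(scaled)

print(scaled.ravel())
print(restored.ravel())
```

The minimum maps to -1 and the maximum to 1. Because the mapping is linear, the relative spacing between prices is preserved.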

2. Building the Time-Series Dataset

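To predict each day's closing price from the days before it, we slide a window of lookback consecutive days over the series. Within each window, the first lookback - 1 days are the input and the last day is the target. The function below builds these windows and holds out the last 20% of them, in time order, as the test set: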

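import numpy as np


def split_data(stock, lookback):
    data_raw = stock.to_numpy()  # shape (n_days, n_features)
    data = []
    # Slide a window of `lookback` consecutive days over the series
    for index in range(len(data_raw) - lookback):
        data.append(data_raw[index: index + lookback])
    data = np.array(data)  # shape (n_windows, lookback, n_features)

    # Chronological split: the last 20% of windows form the test set
    test_set_size = int(np.round(0.2 * data.shape[0]))
    train_set_size = data.shape[0] - test_set_size

    # Inputs are the first lookback - 1 days; the target is the last day
    x_train = data[:train_set_size, :-1, :]
    y_train = data[:train_set_size, -1, :]
    x_test = data[train_set_size:, :-1, :]
    y_test = data[train_set_size:, -1, :]
    return [x_train, y_train, x_test, y_test]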
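To see what split_data produces, the sketch below runs the same windowing logic on a toy series 0, 1, ..., 99 with lookback=20. Both the series and the lookback value are hypothetical choices for illustration. The series length of 100 gives 80 windows, of which round(0.2 * 80) = 16 go to the test set.

```python
import numpy as np
import pandas as pd


def split_data(stock, lookback):
    # Same windowing logic as in the post, repeated so this sketch runs on its own
    data_raw = stock.to_numpy()
    data = np.array([data_raw[i: i + lookback] for i in range(len(data_raw) - lookback)])
    test_set_size = int(np.round(0.2 * data.shape[0]))
    train_set_size = data.shape[0] - test_set_size
    x_train = data[:train_set_size, :-1, :]
    y_train = data[:train_set_size, -1, :]
    x_test = data[train_set_size:, :-1, :]
    y_test = data[train_set_size:, -1, :]
    return [x_train, y_train, x_test, y_test]


# Toy series 0, 1, ..., 99 standing in for the scaled closing prices
price = pd.DataFrame({'Close': np.arange(100, dtype=float)})
x_train, y_train, x_test, y_test = split_data(price, lookback=20)

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
# (64, 19, 1) (64, 1) (16, 19, 1) (16, 1)

# The target of each window is the day right after its 19 input days
print(x_train[0, :, 0], y_train[0, 0])
```

The first training window uses days 0 to 18 to predict day 19. Because the split is by position rather than random, every test window lies after every training window in time.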

III. Building and Training the LSTM Model

1. Model Definition

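The model stacks num_layers LSTM layers and feeds the output at the last time step into a fully connected layer that produces the prediction. The input dimension, hidden size, number of layers, and output dimension are all configurable: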

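import torch
import torch.nn as nn


class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        # batch_first=True: inputs are shaped (batch, seq_len, input_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        # Maps the last time step's output to the predicted value
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Zero initial hidden and cell states for every batch (no gradient needed)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        out, (hn, cn) = self.lstm(x, (h0, c0))
        # Keep only the output at the final time step: (batch, hidden_dim)
        out = self.fc(out[:, -1, :])
        return out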
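A quick shape check helps confirm the model accepts the windows produced by split_data. The hyperparameters below (hidden_dim=32, num_layers=2) and the random batch of 64 windows of 19 steps are hypothetical values chosen only for illustration; the original does not state the values it used.

```python
import torch
import torch.nn as nn


class LSTM(nn.Module):
    # Same architecture as above, repeated so this sketch runs on its own
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])


# Hypothetical hyperparameters for illustration only
model = LSTM(input_dim=1, hidden_dim=32, num_layers=2, output_dim=1)

# A batch of 64 windows, each 19 time steps of 1 feature (batch_first layout)
x = torch.randn(64, 19, 1)
y = model(x)
print(y.shape)  # torch.Size([64, 1])
```

The sequence length (19 here) does not appear in the model's parameters, so the same model works for any lookback. Only input_dim must match the number of features per time step.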

2. Model Training

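We train with mean squared error (MSE) as the loss and optimize with Adam. Each epoch feeds all training windows through the model at once (full-batch training). The loop assumes that model, num_epochs, and the training tensors x_train and y_train_lstm (the split_data outputs converted with torch.from_numpy(...).float()) already exist: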

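import torch
import torch.nn as nn

criterion = nn.MSELoss()
# Learning rate not specified in the original; PyTorch's default is used
optimiser = torch.optim.Adam(model.parameters())

for t in range(num_epochs):
    y_train_pred = model(x_train)                 # forward pass on all training windows
    loss = criterion(y_train_pred, y_train_lstm)  # MSE against the target prices
    optimiser.zero_grad()                         # clear gradients from the previous step
    loss.backward()                               # backpropagate
    optimiser.step()                              # update the weights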
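The loop above can be exercised end to end on synthetic data. In this sketch a sine wave stands in for the normalized closing prices. The lookback of 20, hidden size 32, two layers, learning rate 0.01, and 100 epochs are all assumed values, not the author's settings. The sketch tracks the loss per epoch so you can confirm training is actually reducing it.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the run repeatable


class LSTM(nn.Module):
    # Same architecture as in the post, repeated so this sketch runs on its own
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])


# Synthetic stand-in for scaled prices: a sine wave, already inside [-1, 1]
series = np.sin(np.linspace(0, 20, 300)).astype(np.float32).reshape(-1, 1)

lookback = 20  # assumed window length
windows = np.array([series[i: i + lookback] for i in range(len(series) - lookback)])
x_train = torch.tensor(windows[:, :-1, :])      # (280, 19, 1) inputs
y_train_lstm = torch.tensor(windows[:, -1, :])  # (280, 1) targets

model = LSTM(input_dim=1, hidden_dim=32, num_layers=2, output_dim=1)
criterion = nn.MSELoss()
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed learning rate
num_epochs = 100  # assumed

losses = []
for t in range(num_epochs):
    y_train_pred = model(x_train)
    loss = criterion(y_train_pred, y_train_lstm)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    losses.append(loss.item())  # record the scalar loss for this epoch

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Full-batch training is fine at this size. For long price histories, wrapping the tensors in a torch.utils.data.DataLoader and iterating over mini-batches is the usual alternative.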

IV. Results and Visualization

1. Training Results

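After training, we plot the model's predictions against the actual values on the training set. Here original and predict are assumed to be DataFrames holding the actual training closing prices and the training predictions, both mapped back to price units with scaler.inverse_transform: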

import seaborn as sns

sns.lineplot(x=original.index, y=original[0], label="Actual Value")
sns.lineplot(x=predict.index, y=predict[0], label="Training Prediction")
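The plotting lines need original and predict to exist. One way to build them is to invert the normalization and wrap the results in DataFrames, so that column 0 holds the prices. This sketch uses a toy scaler and made-up arrays in place of the real y_train and y_train_pred. It also swaps in plain matplotlib with the non-interactive Agg backend instead of seaborn, so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from sklearn.preprocessing import MinMaxScaler

# Toy stand-ins: a scaler fitted on made-up prices, plus scaled targets and predictions
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(np.array([[100.0], [200.0]]))
y_train = np.array([[-1.0], [0.0], [1.0]])           # scaled targets (NumPy)
y_train_pred = torch.tensor([[-0.9], [0.1], [0.8]])  # model output (tensor)

# Back to price units; detach() drops the autograd graph before converting to NumPy
original = pd.DataFrame(scaler.inverse_transform(y_train))
predict = pd.DataFrame(scaler.inverse_transform(y_train_pred.detach().numpy()))

fig, ax = plt.subplots()
ax.plot(original.index, original[0], label="Actual Value")
ax.plot(predict.index, predict[0], label="Training Prediction")
ax.legend()
plt.close(fig)  # call fig.savefig(...) before this to keep the image
```

The detach() call matters when y_train_pred comes straight out of the training loop: a tensor that still requires gradients cannot be converted with .numpy().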

2. Test-Set Performance

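We measure predictive accuracy with the root-mean-square error (RMSE) on both the training and test sets. Because y_train and y_test come straight from split_data, the scores computed here are in normalized units. To report them in price units, pass predictions and targets through scaler.inverse_transform first: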

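import math

import torch
from sklearn.metrics import mean_squared_error

# Predict on the test windows, and convert both prediction sets to NumPy for scikit-learn
y_test_pred = model(torch.from_numpy(x_test).float()).detach().numpy()
y_train_pred = y_train_pred.detach().numpy()
trainScore = math.sqrt(mean_squared_error(y_train[:, 0], y_train_pred[:, 0]))
testScore = math.sqrt(mean_squared_error(y_test[:, 0], y_test_pred[:, 0]))
print("Train RMSE: %.2f" % trainScore)
print("Test RMSE: %.2f" % testScore)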
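If the RMSE is computed on normalized values, it can be converted to price units without recomputing. The mapping x' = 2 * (x - min) / (max - min) - 1 is linear, so every error is multiplied by (max - min) / 2 when mapped back. The price-unit RMSE is therefore the normalized RMSE times (max - min) / 2. The check below confirms this against an RMSE computed after scaler.inverse_transform, using made-up prices and predictions.

```python
import math

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Made-up prices to fit the scaler: min 50, max 250, so (max - min) / 2 = 100
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(np.array([[50.0], [250.0]]))

# Made-up normalized targets and predictions (each off by 0.1)
y_true = np.array([[-0.5], [0.0], [0.4]])
y_pred = np.array([[-0.4], [0.1], [0.3]])

# RMSE in normalized units
rmse_scaled = math.sqrt(mean_squared_error(y_true[:, 0], y_pred[:, 0]))

# RMSE in price units, after undoing the scaling
rmse_price = math.sqrt(mean_squared_error(
    scaler.inverse_transform(y_true)[:, 0],
    scaler.inverse_transform(y_pred)[:, 0],
))

half_range = (scaler.data_max_[0] - scaler.data_min_[0]) / 2
print(rmse_scaled, rmse_price, rmse_scaled * half_range)
```

The additive -1 offset cancels when predictions and targets are subtracted, so only the scale factor survives. This is why the conversion is a single multiplication.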