PyTorch Tensor Operations

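In deep learning, how data is represented and processed is crucial. PyTorch is a powerful deep learning framework whose core data structure is the tensor (Tensor). A tensor is a multi-dimensional array, similar to a NumPy array, but with additional capabilities, most notably efficient computation on GPUs. This article walks through tensor operations in PyTorch, including tensor creation, dimension manipulation, indexing and slicing, and mathematical operations.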

1. Basic Operations

1.1 Creating Tensors

import torch

# Create a tensor from data
tensor_from_data = torch.tensor([[1, 2, 3], [4, 5, 6]])
print("Tensor created from data:")
print(tensor_from_data)

# Create an all-zeros tensor
zeros_tensor = torch.zeros((2, 3))
print("\nAll-zeros tensor:")
print(zeros_tensor)

# Create an all-ones tensor
ones_tensor = torch.ones((2, 3))
print("\nAll-ones tensor:")
print(ones_tensor)

# Create a random tensor (uniform samples from [0, 1))
random_tensor = torch.rand((2, 3))
print("\nRandom tensor:")
print(random_tensor)

Output (the random values differ between runs):

Tensor created from data:
tensor([[1, 2, 3],
        [4, 5, 6]])

All-zeros tensor:
tensor([[0., 0., 0.],
        [0., 0., 0.]])

All-ones tensor:
tensor([[1., 1., 1.],
        [1., 1., 1.]])

Random tensor:
tensor([[0.0066, 0.5273, 0.7934],
        [0.8753, 0.8566, 0.3123]])

1.2 Tensor Attributes

import torch

# Create a tensor from data
tensor_from_data = torch.tensor([[1, 2, 3], [4, 5, 6]])

# .shape and .size() return the same torch.Size object
print("Tensor shape:", tensor_from_data.shape)
print("Tensor size:", tensor_from_data.size())
print("Tensor dtype:", tensor_from_data.dtype)

Output:

Tensor shape: torch.Size([2, 3])
Tensor size: torch.Size([2, 3])
Tensor dtype: torch.int64
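
Beyond shape and dtype, a few other attributes are often useful when inspecting a tensor. The following is a minimal sketch using `ndim`, `numel()`, `device`, and `to()`; the variable names are illustrative, and the GPU branch only runs if CUDA happens to be available on the machine.

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(t.ndim)     # number of dimensions
print(t.numel())  # total number of elements
print(t.device)   # device that holds the data

# Convert the integer tensor to 32-bit floats (returns a new tensor).
t_float = t.to(torch.float32)

# Move to the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_on_device = t_float.to(device)
print(t_float.dtype, t_on_device.device)
```

The original tensor `t` keeps its `torch.int64` dtype, because `to()` does not modify it in place.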

1.3 Tensor Indexing and Slicing

Tensors support NumPy-style indexing and slicing, which makes it easy to read and modify their elements.

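import torch

# Create a tensor from data
tensor_from_data = torch.tensor([[1, 2, 3], [4, 5, 6]])

# Indexing: get the value in the second row, third column
value = tensor_from_data[1, 2]
print("Value of the selected element:", value)

# Slicing: get the second column of every row
sliced_tensor = tensor_from_data[:, 1]
print("\nSliced tensor:")
print(sliced_tensor)

Output:

Value of the selected element: tensor(6)

Sliced tensor:
tensor([2, 5])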
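
The text above notes that indexing can also modify elements, while the example only reads them. The sketch below shows a few common ways to write through an index; the values assigned are arbitrary and only serve as illustration.

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]])

# Assign to a single element (second row, third column).
t[1, 2] = 60

# Assign to a whole slice: set the first column to zero.
t[:, 0] = 0

# Boolean-mask assignment: replace every value greater than 4 with -1.
t[t > 4] = -1

# Basic indexing returns a view, so writing to the view changes t as well.
row = t[0]
row[1] = 99

print(t)
```

Because `row` shares memory with `t`, use `t[0].clone()` when an independent copy is needed.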

1.4 Tensor Dimension Operations

import torch

# Create a 3-D tensor
tensor_3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("3D tensor:")
print(tensor_3d)

# Insert a new dimension at position 0
unsqueezed_tensor = torch.unsqueeze(tensor_3d, 0)
print("\nTensor after inserting a dimension at position 0:")
print(unsqueezed_tensor)

# Remove all dimensions of size 1
squeezed_tensor = torch.squeeze(unsqueezed_tensor)
print("\nTensor after removing dimensions of size 1:")
print(squeezed_tensor)

# Flatten the tensor into one dimension
flat_tensor = torch.flatten(tensor_3d)
print("\nFlattened tensor:")
print(flat_tensor)

Output:

3D tensor:
tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])

Tensor after inserting a dimension at position 0:
tensor([[[[1, 2],
          [3, 4]],

         [[5, 6],
          [7, 8]]]])

Tensor after removing dimensions of size 1:
tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])

Flattened tensor:
tensor([1, 2, 3, 4, 5, 6, 7, 8])
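
Printing full tensors can hide what these functions do to the shape, so a shape-only sketch may be easier to follow. It also shows the optional `dim` and `start_dim` arguments, which the example above does not use; the tensor `x` is a hypothetical input chosen to contain two size-1 dimensions.

```python
import torch

x = torch.zeros(1, 3, 1, 4)

# Without dim, squeeze removes every dimension of size 1.
squeezed_all = torch.squeeze(x)

# With dim, only that dimension is removed (and only if its size is 1).
squeezed_first = torch.squeeze(x, dim=0)

# unsqueeze accepts negative positions; -1 appends a trailing dimension.
unsqueezed_last = torch.unsqueeze(x, dim=-1)

# flatten can keep leading dimensions, e.g. the batch dimension.
flat_per_sample = torch.flatten(x, start_dim=1)

print(squeezed_all.shape)     # torch.Size([3, 4])
print(squeezed_first.shape)   # torch.Size([3, 1, 4])
print(unsqueezed_last.shape)  # torch.Size([1, 3, 1, 4, 1])
print(flat_per_sample.shape)  # torch.Size([1, 12])
```

Passing `dim` explicitly is the safer habit when a batch dimension may legitimately have size 1, since a bare `squeeze` would remove it too.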

1.5 Concatenating and Splitting Tensors

import torch

# Create two tensors
tensor_a = torch.tensor([[1, 2], [3, 4]])
tensor_b = torch.tensor([[5, 6], [7, 8]])

# Concatenate along dimension 0 (existing dimension)
concatenated_tensor = torch.cat((tensor_a, tensor_b), dim=0)
print("Tensor concatenated along dimension 0:")
print(concatenated_tensor)

# Stack along a new dimension
stacked_tensor = torch.stack((tensor_a, tensor_b), dim=0)
print("\nTensor stacked along a new dimension:")
print(stacked_tensor)

# Vertical stacking
vstacked_tensor = torch.vstack((tensor_a, tensor_b))
print("\nVertically stacked tensor:")
print(vstacked_tensor)

# Horizontal stacking
hstacked_tensor = torch.hstack((tensor_a, tensor_b))
print("\nHorizontally stacked tensor:")
print(hstacked_tensor)

# Split along dimension 0 into pieces of 2 rows each
split_tensors = torch.split(concatenated_tensor, 2, dim=0)
print("\nTensors after split:")
for i, t in enumerate(split_tensors):
    print(f"Sub-tensor {i}:")
    print(t)

# Split along dimension 0 into 2 chunks
chunked_tensors = torch.chunk(concatenated_tensor, 2, dim=0)
print("\nTensors after chunk:")
for i, t in enumerate(chunked_tensors):
    print(f"Sub-tensor {i}:")
    print(t)

Output:

Tensor concatenated along dimension 0:
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])

Tensor stacked along a new dimension:
tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])

Vertically stacked tensor:
tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])

Horizontally stacked tensor:
tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])

Tensors after split:
Sub-tensor 0:
tensor([[1, 2],
        [3, 4]])
Sub-tensor 1:
tensor([[5, 6],
        [7, 8]])

Tensors after chunk:
Sub-tensor 0:
tensor([[1, 2],
        [3, 4]])
Sub-tensor 1:
tensor([[5, 6],
        [7, 8]])
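
In the example above `torch.split` and `torch.chunk` give identical results, because 4 rows divide evenly. The difference shows up with sizes that do not divide evenly: the second argument of `split` is the size of each piece, while the second argument of `chunk` is the number of pieces. A small sketch with a hypothetical 5-row tensor:

```python
import torch

t = torch.arange(10).reshape(5, 2)  # 5 rows, 2 columns

# split: every piece has 2 rows, except the last one, which takes the remainder.
by_size = torch.split(t, 2, dim=0)

# chunk: 2 pieces are requested, so each has ceil(5 / 2) = 3 rows at most.
by_count = torch.chunk(t, 2, dim=0)

# split also accepts an explicit list of piece sizes.
by_list = torch.split(t, [1, 4], dim=0)

print([p.shape[0] for p in by_size])   # [2, 2, 1]
print([p.shape[0] for p in by_count])  # [3, 2]
print([p.shape[0] for p in by_list])   # [1, 4]
```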

1.6 Mathematical Operations

import torch

# Create two tensors
tensor_a = torch.tensor([[1, 2], [3, 4]])
tensor_b = torch.tensor([[5, 6], [7, 8]])

# Element-wise addition
result_add = tensor_a + tensor_b
print("Tensor addition result:")
print(result_add)

# Matrix multiplication of tensor_a with the transpose of tensor_b
result_matmul = torch.matmul(tensor_a, tensor_b.T)
print("\nMatrix multiplication result:")
print(result_matmul)

Output:

Tensor addition result:
tensor([[ 6,  8],
        [10, 12]])

Matrix multiplication result:
tensor([[17, 23],
        [39, 53]])
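
A frequent source of confusion is that `*` is element-wise multiplication, not matrix multiplication. The sketch below contrasts the two using the same `tensor_a` and `tensor_b` values as above, and adds a small broadcasting example; `bias` is an illustrative name, not part of the original code.

```python
import torch

tensor_a = torch.tensor([[1, 2], [3, 4]])
tensor_b = torch.tensor([[5, 6], [7, 8]])

# Element-wise product: each entry is multiplied by the matching entry.
elementwise = tensor_a * tensor_b

# Matrix product: the @ operator is shorthand for torch.matmul.
matrix_product = tensor_a @ tensor_b

# Broadcasting: the 1-D tensor is added to every row of tensor_a.
bias = torch.tensor([10, 20])
broadcast_sum = tensor_a + bias

print(elementwise)     # tensor([[ 5, 12], [21, 32]])
print(matrix_product)  # tensor([[19, 22], [43, 50]])
print(broadcast_sum)   # tensor([[11, 22], [13, 24]])
```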

1.7 Reshaping Tensors

import torch

# Create a 1-D tensor containing 0 to 11
tensor = torch.arange(12)
print("Original tensor:")
print(tensor)

# Use reshape to change the shape to (3, 4)
reshaped_tensor = tensor.reshape(3, 4)
print("\nReshaped tensor (3, 4):")
print(reshaped_tensor)

# Use view to reinterpret the same data as 4x3 (no copy is made)
viewed_tensor = reshaped_tensor.view(4, 3)
print("\nTensor after view (4, 3):")
print(viewed_tensor)

# Create a 3-D tensor of shape (2, 2, 2)
tensor_3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Use permute to reorder the dimensions: swap dimensions 0 and 1
permuted_tensor = tensor_3d.permute(1, 0, 2)
print("\nTensor after permute:")
print(permuted_tensor)

Output:

Original tensor:
tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Reshaped tensor (3, 4):
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

Tensor after view (4, 3):
tensor([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]])

Tensor after permute:
tensor([[[1, 2],
         [5, 6]],

        [[3, 4],
         [7, 8]]])
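
The example uses `reshape`, `view`, and `permute` without showing how they interact. One practical difference is that `view` never copies data and therefore fails on a tensor whose memory layout is no longer contiguous (for instance after `permute`), whereas `reshape` falls back to copying. A minimal sketch of that behavior, with illustrative variable names:

```python
import torch

t = torch.arange(6).reshape(2, 3)
p = t.permute(1, 0)  # shape (3, 2); same storage as t, different strides

print(p.is_contiguous())  # False

# view cannot express this layout without a copy, so it raises an error.
view_failed = False
try:
    p.view(6)
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

# reshape copies the data when it has to.
flat = p.reshape(6)

# Equivalent explicit form: make a contiguous copy first, then view.
flat_via_view = p.contiguous().view(6)

print(flat)  # tensor([0, 3, 1, 4, 2, 5])
```

This is the reason a `permute` is usually followed by `reshape` (or `contiguous().view(...)`) rather than by a bare `view`.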

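1.8 Tensor Memory Layout

In PyTorch, the elements of a newly created (contiguous) four-dimensional tensor are stored in row-major order, also known as C-style order. In memory, the index of the last dimension varies fastest and the index of the first dimension varies slowest.

Suppose we have a four-dimensional tensor of shape (D1, D2, D3, D4), where: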

D1: size of the first dimension
D2: size of the second dimension
D3: size of the third dimension
D4: size of the fourth dimension

import torch

# Create a 4-D tensor: 120 elements arranged in shape (2, 3, 4, 5)
tensor_4d = torch.arange(120).reshape(2, 3, 4, 5)
print("4D tensor:")
print(tensor_4d)

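In this example, torch.arange(120) produces a 1-D tensor containing the integers 0 to 119, and reshape(2, 3, 4, 5) turns it into a 4-D tensor.

First dimension (D1): varies slowest; there are 2 blocks.
Second dimension (D2): each block has 3 layers.
Third dimension (D3): each layer has 4 rows.
Fourth dimension (D4): varies fastest; each row has 5 columns.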

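In memory, the elements are laid out in the following order (each layer holds 4 x 5 = 20 elements, and each block holds 3 x 20 = 60):

tensor_4d[0, :, :, :]  # first block
[[[  0,   1,   2,   3,   4],   # layer 1, row 1
  [  5,   6,   7,   8,   9],   # layer 1, row 2
  [ 10,  11,  12,  13,  14],   # layer 1, row 3
  [ 15,  16,  17,  18,  19]],  # layer 1, row 4
 [[ 20,  21,  22,  23,  24],   # layer 2, row 1
  [ 25,  26,  27,  28,  29],   # layer 2, row 2
  [ 30,  31,  32,  33,  34],   # layer 2, row 3
  [ 35,  36,  37,  38,  39]],  # layer 2, row 4
 [[ 40,  41,  42,  43,  44],   # layer 3, row 1
  [ 45,  46,  47,  48,  49],   # layer 3, row 2
  [ 50,  51,  52,  53,  54],   # layer 3, row 3
  [ 55,  56,  57,  58,  59]]]  # layer 3, row 4

tensor_4d[1, :, :, :]  # second block
[[[ 60,  61,  62,  63,  64],   # layer 1, row 1
  [ 65,  66,  67,  68,  69],   # layer 1, row 2
  [ 70,  71,  72,  73,  74],   # layer 1, row 3
  [ 75,  76,  77,  78,  79]],  # layer 1, row 4
 [[ 80,  81,  82,  83,  84],   # layer 2, row 1
  [ 85,  86,  87,  88,  89],   # layer 2, row 2
  [ 90,  91,  92,  93,  94],   # layer 2, row 3
  [ 95,  96,  97,  98,  99]],  # layer 2, row 4
 [[100, 101, 102, 103, 104],   # layer 3, row 1
  [105, 106, 107, 108, 109],   # layer 3, row 2
  [110, 111, 112, 113, 114],   # layer 3, row 3
  [115, 116, 117, 118, 119]]]  # layer 3, row 4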

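To summarize, the elements of a contiguous 4-D tensor are stored in row-major order: the last dimension varies fastest and the first dimension varies slowest. Understanding this ordering matters because operations such as reshape and view reinterpret the same sequence of elements under a new shape.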
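
Row-major order can be checked directly with `Tensor.stride()`, which reports how many elements to skip in memory when an index along each dimension increases by one. For a contiguous tensor of shape (D1, D2, D3, D4), the flat position of element [i, j, k, l] is i*D2*D3*D4 + j*D3*D4 + k*D4 + l. The sketch below verifies this on the same (2, 3, 4, 5) tensor; the index (1, 2, 3, 4) is an arbitrary choice.

```python
import torch

tensor_4d = torch.arange(120).reshape(2, 3, 4, 5)

# Strides of a contiguous (2, 3, 4, 5) tensor: (3*4*5, 4*5, 5, 1)
strides = tensor_4d.stride()
print(strides)  # (60, 20, 5, 1)

# Flat memory offset of an element = sum of index * stride over all dimensions.
index = (1, 2, 3, 4)
offset = sum(i * s for i, s in zip(index, strides))
print(offset)  # 119

# Because the tensor was built from arange, each value equals its flat offset.
print(tensor_4d[index].item())  # 119
```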

2. Hands-on: Passing Audio Spectra Through a Convolutional Layer and a GRU Layer

import torch
import torch.nn as nn

# Two example spectra, each of shape (batch_size, bins, frames) = (1, 3, 4)
spectrum1 = torch.tensor([[[1.0, 1.0, 1.0, 1.0],
                           [2.0, 2.0, 2.0, 2.0],
                           [3.0, 3.0, 3.0, 3.0]]], dtype=torch.float32)
spectrum2 = torch.tensor([[[4.0, 4.0, 4.0, 4.0],
                           [5.0, 5.0, 5.0, 5.0],
                           [6.0, 6.0, 6.0, 6.0]]], dtype=torch.float32)

print("Spectrum 1:")
print(spectrum1)
print("Shape of spectrum 1 (batch_size, bins, frames):", spectrum1.shape)

print("\nSpectrum 2:")
print(spectrum2)
print("Shape of spectrum 2 (batch_size, bins, frames):", spectrum2.shape)

# Stack the spectra along a new channel dimension (dim=1)
stacked_spectra = torch.stack((spectrum1, spectrum2), dim=1)
print("\nStacked spectra (dimensions: batch_size, channels, bins, frames):")
print(stacked_spectra)
print("Shape of stacked spectra (batch_size, channels, bins, frames):", stacked_spectra.shape)

# Define a simple CNN
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # 1x1 convolution: maps the 2 input channels to 5 output channels
        self.conv1 = nn.Conv2d(in_channels=2, out_channels=5, kernel_size=(1, 1))

    def forward(self, x):
        return self.conv1(x)

# Create a CNN instance
cnn = SimpleCNN()

# Feed the stacked spectra into the CNN
cnn_output = cnn(stacked_spectra)
print("\nCNN output:")
print(cnn_output)
print("Shape of CNN output (batch_size, out_channels, bins, frames):", cnn_output.shape)

# Move frames in front of the feature dimensions, then merge out_channels and
# bins so that each time frame becomes one feature vector for the GRU
batch_size, out_channels, bins, frames = cnn_output.shape
cnn_output_permute = cnn_output.permute(0, 3, 1, 2)  # (batch_size, frames, out_channels, bins)
gru_input = cnn_output_permute.reshape(batch_size, frames, -1)
print("\nShape of GRU input (batch_size, frames, out_channels * bins):")
print(gru_input.shape)
print(gru_input)

Output (the CNN values depend on the randomly initialized weights and differ between runs):

Spectrum 1:
tensor([[[1., 1., 1., 1.],
         [2., 2., 2., 2.],
         [3., 3., 3., 3.]]])
Shape of spectrum 1 (batch_size, bins, frames): torch.Size([1, 3, 4])

Spectrum 2:
tensor([[[4., 4., 4., 4.],
         [5., 5., 5., 5.],
         [6., 6., 6., 6.]]])
Shape of spectrum 2 (batch_size, bins, frames): torch.Size([1, 3, 4])

Stacked spectra (dimensions: batch_size, channels, bins, frames):
tensor([[[[1., 1., 1., 1.],
          [2., 2., 2., 2.],
          [3., 3., 3., 3.]],

         [[4., 4., 4., 4.],
          [5., 5., 5., 5.],
          [6., 6., 6., 6.]]]])
Shape of stacked spectra (batch_size, channels, bins, frames): torch.Size([1, 2, 3, 4])

CNN output:
tensor([[[[-2.5064, -2.5064, -2.5064, -2.5064],
          [-3.3889, -3.3889, -3.3889, -3.3889],
          [-4.2714, -4.2714, -4.2714, -4.2714]],

         [[ 0.6582,  0.6582,  0.6582,  0.6582],
          [ 1.3287,  1.3287,  1.3287,  1.3287],
          [ 1.9992,  1.9992,  1.9992,  1.9992]],

         [[ 0.6646,  0.6646,  0.6646,  0.6646],
          [ 0.2705,  0.2705,  0.2705,  0.2705],
          [-0.1235, -0.1235, -0.1235, -0.1235]],

         [[ 1.5735,  1.5735,  1.5735,  1.5735],
          [ 1.8892,  1.8892,  1.8892,  1.8892],
          [ 2.2049,  2.2049,  2.2049,  2.2049]],

         [[-1.1208, -1.1208, -1.1208, -1.1208],
          [-0.9246, -0.9246, -0.9246, -0.9246],
          [-0.7284, -0.7284, -0.7284, -0.7284]]]],
       grad_fn=<ConvolutionBackward0>)
Shape of CNN output (batch_size, out_channels, bins, frames): torch.Size([1, 5, 3, 4])

Shape of GRU input (batch_size, frames, out_channels * bins):
torch.Size([1, 4, 15])
tensor([[[-2.5064, -3.3889, -4.2714,  0.6582,  1.3287,  1.9992,  0.6646,
           0.2705, -0.1235,  1.5735,  1.8892,  2.2049, -1.1208, -0.9246,
          -0.7284],
         [-2.5064, -3.3889, -4.2714,  0.6582,  1.3287,  1.9992,  0.6646,
           0.2705, -0.1235,  1.5735,  1.8892,  2.2049, -1.1208, -0.9246,
          -0.7284],
         [-2.5064, -3.3889, -4.2714,  0.6582,  1.3287,  1.9992,  0.6646,
           0.2705, -0.1235,  1.5735,  1.8892,  2.2049, -1.1208, -0.9246,
          -0.7284],
         [-2.5064, -3.3889, -4.2714,  0.6582,  1.3287,  1.9992,  0.6646,
           0.2705, -0.1235,  1.5735,  1.8892,  2.2049, -1.1208, -0.9246,
          -0.7284]]], grad_fn=<ReshapeAliasBackward0>)


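In this part, we show how to use PyTorch to build a simple model that processes audio spectrum data. The overall workflow is as follows:

Create the spectrum data: generate two example spectra, spectrum1 and spectrum2, each of shape (1, 3, 4).
- 1 (batch_size): number of samples.
- 3 (bins): number of frequency bins.
- 4 (frames): number of time frames.
Stack the spectra: use torch.stack to stack the two spectra along a new channel dimension, producing a tensor of shape (1, 2, 3, 4).
- 1 (batch_size): number of samples.
- 2 (channels): number of channels (the two spectra).
- 3 (bins): number of frequency bins.
- 4 (frames): number of time frames.
Define the convolutional neural network (CNN): build a simple CNN with a single convolutional layer that extracts features from the spectra.
Forward pass: feed the stacked spectra into the CNN to obtain output features of shape (1, 5, 3, 4).
- 1 (batch_size): number of samples.
- 5 (out_channels): number of output channels of the convolutional layer.
- 3 (bins): number of frequency bins.
- 4 (frames): number of time frames.
Process the CNN output: rearrange the dimensions of the CNN output to match the input format of a GRU layer. permute and reshape convert the output to shape (1, 4, 15).
- 1 (batch_size): number of samples.
- 4 (frames): number of time frames.
- 15 (out_channels * bins): number of features per time step (5 * 3).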
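
The walkthrough stops once the tensor has the shape a GRU expects. As a hedged sketch of the remaining step, the code below passes a tensor of that shape through `nn.GRU`. The random stand-in input, the `hidden_size` of 8, and the fixed seed are assumptions made for illustration; they are not part of the original example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make this sketch reproducible

# Stand-in for gru_input from the walkthrough: (batch_size, frames, features)
gru_input = torch.randn(1, 4, 15)

# batch_first=True tells the GRU that the batch dimension comes first,
# matching the (batch_size, frames, features) layout produced by the reshape.
gru = nn.GRU(input_size=15, hidden_size=8, batch_first=True)

# output holds the hidden state at every time step;
# h_n holds only the final hidden state of each layer.
output, h_n = gru(gru_input)

print(output.shape)  # torch.Size([1, 4, 8])
print(h_n.shape)     # torch.Size([1, 1, 8])
```

Without `batch_first=True`, `nn.GRU` would interpret the input as (frames, batch_size, features), so the layout produced by the permute and reshape step and this flag need to agree.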