This article is a set of notes from studying this book.
The book's website: https://web.stanford.edu/~boyd/vmls/
Matrix-Vector Multiplication: Row and Column Interpretations
Matrix-vector multiplication is a fundamental operation in linear algebra and a mathematical tool used throughout machine learning, data science, and engineering. This article explains in detail the two perspectives on matrix-vector multiplication, rows and columns, and uses concrete examples to clarify the underlying mathematics.
1. What is Matrix-Vector Multiplication?
The product of a matrix $A$ and a vector $x$ can be written as:
$$y = Ax$$
where:
- the matrix $A \in \mathbb{R}^{m \times n}$ is an $m \times n$ matrix;
- the vector $x \in \mathbb{R}^n$ is an $n$-dimensional column vector;
- the result $y \in \mathbb{R}^m$ is an $m$-dimensional column vector.
Matrix-vector multiplication can be understood from two perspectives, rows and columns. The two interpretations are introduced in turn below.
2. The Row Perspective
Matrix-vector multiplication can be viewed as "taking the inner product of the vector $x$ with each row of the matrix". Specifically:
Write the $i$-th row of the matrix $A$ as $b_i^T$:
$$A = \begin{bmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_m^T \end{bmatrix},$$
where each $b_i^T \in \mathbb{R}^n$ is a row of $A$ (the transpose marks it as a row vector).
For $y = Ax$, the $i$-th entry $y_i$ of the result vector $y$ is the inner product of the $i$-th row of the matrix with the vector $x$:
$$y_i = b_i^T x, \quad i = 1, 2, \dots, m.$$
Reading the formula:
- $b_i^T x$ is the inner product of the $i$-th row of the matrix with the vector $x$;
- each row's inner product yields one entry of $y$.
Example:
Suppose:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$
Computing $y = Ax$ from the row perspective:
- Take row 1, $b_1^T = [1, 2, 3]$, and compute the inner product $y_1 = b_1^T x = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 = 14$;
- Take row 2, $b_2^T = [4, 5, 6]$, and compute the inner product $y_2 = b_2^T x = 4 \cdot 1 + 5 \cdot 2 + 6 \cdot 3 = 32$;
- Take row 3, $b_3^T = [7, 8, 9]$, and compute the inner product $y_3 = b_3^T x = 7 \cdot 1 + 8 \cdot 2 + 9 \cdot 3 = 50$.
The final result:
$$y = \begin{bmatrix} 14 \\ 32 \\ 50 \end{bmatrix}.$$
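To make the row view concrete, here is a minimal Python sketch (plain lists, no libraries; the function and variable names are only illustrative) that computes $y = Ax$ as one inner product per row:

```python
def matvec_row_view(A, x):
    """Compute y = A x by taking one inner product per row of A."""
    m = len(A)          # number of rows of A
    n = len(x)          # length of x, which must match each row length
    y = []
    for i in range(m):  # y_i = b_i^T x
        assert len(A[i]) == n
        y.append(sum(A[i][k] * x[k] for k in range(n)))
    return y

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 2, 3]
print(matvec_row_view(A, x))  # [14, 32, 50]
```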
3. The Column Perspective
Matrix-vector multiplication can also be viewed as "weighting the columns of the matrix $A$ by the entries of the vector $x$ and summing them into a linear combination". Specifically:
Write the $k$-th column of the matrix $A$ as $a_k$:
$$A = [a_1, a_2, \dots, a_n],$$
where each $a_k \in \mathbb{R}^m$ is a column of $A$.
The matrix-vector product $y = Ax$ can then be written as:
$$y = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n,$$
that is, the result vector $y$ is a linear combination of the matrix's columns, with the coefficients given by the entries of $x$.
Reading the formula:
- $x_k a_k$ weights the $k$-th column of the matrix by the $k$-th entry $x_k$ of $x$;
- adding up all the weighted columns gives the result vector $y$.
Example:
Using the same matrix and vector:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$
From the column perspective:
- The 1st column of $A$ is $a_1 = [1, 4, 7]^T$, with weight $x_1 = 1$, so its contribution is $1 \cdot a_1 = [1, 4, 7]^T$;
- The 2nd column of $A$ is $a_2 = [2, 5, 8]^T$, with weight $x_2 = 2$, so its contribution is $2 \cdot a_2 = [4, 10, 16]^T$;
- The 3rd column of $A$ is $a_3 = [3, 6, 9]^T$, with weight $x_3 = 3$, so its contribution is $3 \cdot a_3 = [9, 18, 27]^T$.
The final result is the linear combination of all the weighted columns:
$$y = 1 \cdot a_1 + 2 \cdot a_2 + 3 \cdot a_3 = \begin{bmatrix} 1 + 4 + 9 \\ 4 + 10 + 18 \\ 7 + 16 + 27 \end{bmatrix} = \begin{bmatrix} 14 \\ 32 \\ 50 \end{bmatrix}.$$
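The column view translates just as directly into code. Below is a minimal Python sketch (names are illustrative, not from the book) that accumulates the weighted columns $x_k a_k$ one at a time:

```python
def matvec_col_view(A, x):
    """Compute y = A x as a linear combination of the columns of A."""
    m, n = len(A), len(x)
    y = [0] * m                     # start from the zero vector
    for k in range(n):              # add x_k * a_k, column by column
        for i in range(m):
            y[i] += x[k] * A[i][k]  # A[i][k] is the i-th entry of column a_k
    return y

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 2, 3]
print(matvec_col_view(A, x))  # [14, 32, 50]
```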
4. How the Row and Column Views Relate, and Which to Use
- Row view: better suited to understanding how each output entry $y_i$ is computed, namely as an inner product of a row with the vector.
- Column view: better suited to understanding that the result vector $y$ is obtained as a linear combination of the matrix's columns.
In practice, choose the perspective that fits the problem at hand:
- the row view is commonly used when computing or implementing algorithms;
- the column view is commonly used when analyzing results or interpreting the geometry.
5. Conclusion
Matrix-vector multiplication is an extremely important operation in linear algebra, and understanding it from both the row and column perspectives gives a deeper grasp of how it is computed and what it means in practice. Whether one starts from inner products with the rows or from a linear combination of the columns, the two views reveal the many facets of this matrix operation in mathematics and its applications.
English Version
Matrix-Vector Multiplication: Row and Column Interpretations
Matrix-vector multiplication is a fundamental operation in linear algebra and is widely used in fields like machine learning, data science, and engineering. This article explains the row and column perspectives of matrix-vector multiplication in detail, with practical examples to help clarify the underlying mathematics.
1. What is Matrix-Vector Multiplication?
The product of a matrix $A$ and a vector $x$ is written as:
$$y = Ax$$
where:
- $A \in \mathbb{R}^{m \times n}$ is an $m \times n$ matrix;
- $x \in \mathbb{R}^n$ is an $n$-dimensional column vector;
- $y \in \mathbb{R}^m$ is the resulting $m$-dimensional column vector.
Matrix-vector multiplication can be understood from two perspectives:
- The row view: treat the result as the dot products of $x$ with each row of $A$.
- The column view: treat the result as a linear combination of the columns of $A$.
2. Row Perspective
In the row perspective, matrix-vector multiplication involves taking the dot product of the vector $x$ with each row of the matrix $A$. Let $b_i^T$ represent the $i$-th row of $A$, so:
$$A = \begin{bmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_m^T \end{bmatrix},$$
where each $b_i^T \in \mathbb{R}^n$ is a row vector.
The $i$-th entry of the result vector $y$ is:
$$y_i = b_i^T x, \quad i = 1, 2, \dots, m.$$
This means:
- Each entry $y_i$ is the dot product of $x$ with the $i$-th row of $A$.
- The result vector $y$ consists of $m$ such dot products, one for each row.
Example (Row Perspective)
Suppose:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$
To compute $y = Ax$:
- Take the first row $b_1^T = [1, 2, 3]$ and compute the dot product with $x$:
  $$y_1 = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 = 14.$$
- Take the second row $b_2^T = [4, 5, 6]$ and compute the dot product:
  $$y_2 = 4 \cdot 1 + 5 \cdot 2 + 6 \cdot 3 = 32.$$
- Take the third row $b_3^T = [7, 8, 9]$ and compute the dot product:
  $$y_3 = 7 \cdot 1 + 8 \cdot 2 + 9 \cdot 3 = 50.$$
Thus:
$$y = \begin{bmatrix} 14 \\ 32 \\ 50 \end{bmatrix}.$$
3. Column Perspective
In the column perspective, matrix-vector multiplication can be interpreted as a linear combination of the columns of $A$, with the entries of $x$ serving as the coefficients of the combination. Let $a_k$ represent the $k$-th column of $A$, so:
$$A = [a_1, a_2, \dots, a_n],$$
where each $a_k \in \mathbb{R}^m$ is a column vector.
The matrix-vector product $y = Ax$ can be written as:
$$y = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n.$$
This means:
- $x_k a_k$ scales the $k$-th column $a_k$ of $A$ by the $k$-th entry $x_k$ of the vector $x$.
- The result $y$ is the sum of these scaled columns.
Example (Column Perspective)
Using the same matrix and vector:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$
From the column perspective:
- The first column of $A$ is $a_1 = [1, 4, 7]^T$, scaled by $x_1 = 1$:
  $$1 \cdot a_1 = [1, 4, 7]^T.$$
- The second column of $A$ is $a_2 = [2, 5, 8]^T$, scaled by $x_2 = 2$:
  $$2 \cdot a_2 = [4, 10, 16]^T.$$
- The third column of $A$ is $a_3 = [3, 6, 9]^T$, scaled by $x_3 = 3$:
  $$3 \cdot a_3 = [9, 18, 27]^T.$$
Add the scaled columns:
$$y = 1 \cdot a_1 + 2 \cdot a_2 + 3 \cdot a_3 = \begin{bmatrix} 1 + 4 + 9 \\ 4 + 10 + 18 \\ 7 + 16 + 27 \end{bmatrix} = \begin{bmatrix} 14 \\ 32 \\ 50 \end{bmatrix}.$$
4. Connection Between the Two Perspectives
- Row Perspective: computes the entries of the result vector $y$ one at a time, using dot products between the rows of $A$ and the vector $x$. This perspective is often used in implementation and numerical computation.
- Column Perspective: views the result vector $y$ as a linear combination of the columns of $A$, scaled by the entries of $x$. This perspective is useful for understanding geometric interpretations and applications like data transformations.
The two perspectives are mathematically equivalent and simply offer different ways to interpret the same operation.
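As a quick numerical check of this equivalence, the sketch below (using NumPy, with illustrative variable names) computes $y$ with a row-view loop, a column-view sum, and NumPy's built-in product, and confirms that all three agree:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
x = np.array([1, 2, 3])

# Row view: y_i = b_i^T x for each row b_i^T of A.
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column view: y = sum_k x_k * a_k over the columns a_k of A.
y_cols = sum(x[k] * A[:, k] for k in range(A.shape[1]))

# Built-in matrix-vector product.
y = A @ x

print(y_rows, y_cols, y)  # all print [14 32 50]
assert np.array_equal(y_rows, y) and np.array_equal(y_cols, y)
```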
5. Practical Applications
- Row Perspective in Machine Learning:
  - Useful when processing datasets where rows represent individual samples and columns represent features.
  - Example: In a neural network, $A$ can represent the weights and $x$ the input features; the output $y$ contains the weighted sums for each neuron.
- Column Perspective in Data Transformation:
  - Common in computer graphics and signal processing, where each column represents a basis vector, and the result is a transformation of $x$ into a new coordinate system.
  - Example: Principal Component Analysis (PCA) involves projecting data onto principal components (columns).
6. Conclusion
Matrix-vector multiplication is a versatile operation that can be understood from two complementary perspectives:
- The row perspective, which emphasizes the dot product computation for each entry of the result vector.
- The column perspective, which highlights the linear combination of matrix columns.
By understanding both views, you can better analyze and apply this operation in various fields like machine learning, data analysis, and linear systems.
Postscript
Completed in Shanghai at 13:01 on December 20, 2024, with the assistance of GPT-4o.