Mathematical Foundations of Machine Learning: Vectors

This article summarizes the basics of vectors in linear algebra, combining theory with practice by working through the examples in Python.

Preface

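In the previous article I briefly summarized the basic matrix operations and gave two application examples. In this article we continue with vectors.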

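Vectors are a fundamental concept in linear algebra and the basic form of data representation in machine learning. For example, when a computer reads text, it first segments the text into words and then represents them as vectors^[1]. Vectors are used because they are well suited to representing and processing data in high-dimensional spaces. Machine learning concepts such as projection and dimensionality reduction are all built on vectors.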

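A vector \(\vec{\mathbf{V}}\) defined in the space \(\mathbb{R}^{n}\)^[2] can be represented by an ordered list of n real numbers, i.e. \(\vec{\mathbf{V}} = \begin{bmatrix}v_1 \\ v_2 \\ \ldots \\ v_n\end{bmatrix}\). Each element of this ordered list is called a component of the vector. For example, the vector \(\begin{bmatrix}2 \\ 1\end{bmatrix}\) in \(\mathbb{R}^{2}\) is sometimes also written as \((2, 1)\) or \(\langle 2, 1 \rangle\).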

Plotted, this vector looks like this:

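The length of a vector is defined as \[\left\| \vec{\mathbf{v}} \right\| = \sqrt{v_{1}^{2} + v_{2}^{2} + \ldots + v_{n}^{2}}\] This is exactly the familiar Euclidean distance formula, applied to the distance from the origin to the point \((v_1, \ldots, v_n)\). A vector of length 1 is called a unit vector.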
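The length formula and the idea of a unit vector translate directly into NumPy. The sketch below does three things. It computes the length of \(\begin{bmatrix}2 \\ 1\end{bmatrix}\) from the definition. It compares that result with np.linalg.norm, which returns this Euclidean length by default for a 1-D array. Finally, it divides the vector by its length to get a unit vector. The variable names are only illustrative.

```python
import numpy as np

v = np.array([2.0, 1.0])

# Length from the definition: square root of the sum of squared components
length = np.sqrt(np.sum(v ** 2))
# np.linalg.norm computes the same Euclidean length by default
print(length, np.linalg.norm(v))  # both are sqrt(5), about 2.236

# Dividing a nonzero vector by its length gives a unit vector in the same direction
u = v / length
print(np.linalg.norm(u))  # 1.0, up to floating-point rounding
```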

Basic Operations

Addition

The sum of vectors \(\mathbf{a}\) and \(\mathbf{b}\) is defined as:

\[ \mathbf{a} + \mathbf{b} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \ldots \\ a_n + b_n \end{bmatrix} \]

The figure below illustrates adding \(\mathbf{a} = \begin{bmatrix}-1 \\ 2\end{bmatrix}\) and \(\mathbf{b} = \begin{bmatrix}3 \\ 1\end{bmatrix}\). The result is \(\begin{bmatrix}2 \\ 3\end{bmatrix}\):

In Python, a vector can be represented directly by a NumPy ndarray.

import numpy as np
a = np.array([-1, 2])
b = np.array([3, 1])
print(a + b)  # [2 3]

Subtraction

\[ \mathbf{a} - \mathbf{b} = \begin{bmatrix} a_1 - b_1 \\ a_2 - b_2 \\ \ldots \\ a_n - b_n \end{bmatrix} \]

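Geometrically, subtracting a vector is the same as adding its opposite, the vector of equal length pointing in the reverse direction: \(\mathbf{a} - \mathbf{b} = \mathbf{a} + (-\mathbf{b})\).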

import numpy as np
a = np.array([-1, 2])
b = np.array([3, 1])
print(a - b)  # [-4  1]
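The geometric statement that subtracting \(\mathbf{b}\) is the same as adding the reversed vector \(-\mathbf{b}\) can be checked directly. This minimal sketch reuses the same two vectors:

```python
import numpy as np

a = np.array([-1, 2])
b = np.array([3, 1])

diff = a - b
# -b is b reversed; adding it to a gives the same result as a - b
print(diff, a + (-b))  # [-4  1] [-4  1]
```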

Multiplication

Scalar multiplication

Multiplying a vector \(\mathbf{a}\) by a scalar \(c\) is defined as:

\[ c \cdot \mathbf{a} = \begin{bmatrix} c \cdot a_1 \\ c \cdot a_2 \\ \ldots \\ c \cdot a_n \end{bmatrix} = \begin{bmatrix} a_1 \cdot c \\ a_2 \cdot c \\ \ldots \\ a_n \cdot c \end{bmatrix} \]

The figure below illustrates multiplying the vector \(\mathbf{a} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}\) by the scalar 3, which gives \(\begin{bmatrix} -3 \\ 6 \end{bmatrix}\):

Python implementation:

import numpy as np
a = np.array([-1, 2])
print(a * 3)  # [-3  6]

Dot product

The dot product of two vectors (also called the scalar product) is defined as follows:

\[\vec{\mathbf{a}}\cdot \vec{\mathbf{b}} = \begin{bmatrix} a_1 \\ a_2 \\ \ldots \\ a_n\end{bmatrix} \cdot \begin{bmatrix} b_1 \\ b_2 \\ \ldots \\ b_n \end{bmatrix} = a_{1}b_{1} + a_{2}b_{2} + \ldots + a_{n}b_{n}\]

Note that the dot product produces a scalar.

For example:

\[\begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} = 3 \cdot 1 + 5 \cdot 4 + 2 \cdot 7 = 37\]

Python example:

import numpy as np
a = np.array([3, 5, 2])
b = np.array([1, 4, 7])
print(a.dot(b))      # 37
print(np.dot(a, b))  # 37 (an equivalent alternative)

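It is easy to show that the dot product is commutative (\(\vec{\mathbf{a}}\cdot\vec{\mathbf{b}} = \vec{\mathbf{b}}\cdot\vec{\mathbf{a}}\)), distributive over vector addition (\(\vec{\mathbf{a}}\cdot(\vec{\mathbf{b}}+\vec{\mathbf{c}}) = \vec{\mathbf{a}}\cdot\vec{\mathbf{b}} + \vec{\mathbf{a}}\cdot\vec{\mathbf{c}}\)), and compatible with scalar multiplication (\((c\vec{\mathbf{a}})\cdot\vec{\mathbf{b}} = c(\vec{\mathbf{a}}\cdot\vec{\mathbf{b}})\)). It is not associative in the ordinary sense, because \(\vec{\mathbf{a}}\cdot\vec{\mathbf{b}}\) is already a scalar.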
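These algebraic properties of the dot product can be spot-checked numerically. The sketch below tests three identities on random integer vectors: commutativity, distributivity over addition, and compatibility with scalar multiplication. The seed, the vector length 4 and the scalar 3 are arbitrary choices. Integer arithmetic keeps the comparisons exact. A numerical check illustrates the identities but is not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)               # arbitrary seed
a, b, c = rng.integers(-5, 6, size=(3, 4))   # three random integer vectors in R^4
k = 3                                        # an arbitrary scalar

print(a.dot(b) == b.dot(a))                  # commutativity: True
print(a.dot(b + c) == a.dot(b) + a.dot(c))   # distributivity: True
print((k * a).dot(b) == k * a.dot(b))        # scalar multiplication: True
```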

Recall that the length of a vector is defined as \(\left\| \vec{\mathbf{v}} \right\| = \sqrt{v_{1}^{2} + v_{2}^{2} + \ldots + v_{n}^{2}}\). Combining this with the definition of the dot product gives:

\[\left\| \vec{\mathbf{v}} \right\| = \sqrt{v_{1}^{2} + v_{2}^{2} + \ldots + v_{n}^{2}} = \sqrt{\vec{\mathbf{v}} \cdot \vec{\mathbf{v}}} \tag{1}\]
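Equation (1) gives a convenient way to compute a vector's length from its dot product with itself. This small sketch compares it with NumPy's built-in Euclidean norm on the vector \(\begin{bmatrix}3 \\ 4\end{bmatrix}\):

```python
import numpy as np

v = np.array([3.0, 4.0])
print(np.sqrt(v.dot(v)))   # 5.0, length via the dot product
print(np.linalg.norm(v))   # 5.0, NumPy's built-in Euclidean norm
```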

Another very important property of the dot product is the Cauchy inequality (usually called the Cauchy-Schwarz inequality)^[3]:

For two nonzero vectors \(\vec{\mathbf{x}}, \vec{\mathbf{y}} \in \mathbb{R}^{n}\), \(|\vec{\mathbf{x}} \cdot \vec{\mathbf{y}}| \le \left\|\vec{\mathbf{x}}\right\|\left\|\vec{\mathbf{y}}\right\|\).
Equality holds if and only if \(\vec{\mathbf{x}} = c\vec{\mathbf{y}}\) for some scalar \(c\).

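For reasons of space we will not prove it here. The property is nonetheless very important, and much of the vector theory that follows builds on it. For example, expand \(\left\|\vec{\mathbf{x}} + \vec{\mathbf{y}}\right\|^2\) using Equation (1), then apply this inequality in the form \(\vec{\mathbf{x}}\cdot\vec{\mathbf{y}} \le |\vec{\mathbf{x}}\cdot\vec{\mathbf{y}}| \le \left\|\vec{\mathbf{x}}\right\|\left\|\vec{\mathbf{y}}\right\|\). This gives: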

\[\begin{aligned} \left\|\vec{\mathbf{x}} + \vec{\mathbf{y}}\right\|^2 & = (\vec{\mathbf{x}} + \vec{\mathbf{y}})\cdot (\vec{\mathbf{x}} + \vec{\mathbf{y}}) \\ & = \left\|\vec{\mathbf{x}}\right\|^2 + 2\,\vec{\mathbf{x}}\cdot\vec{\mathbf{y}} + \left\|\vec{\mathbf{y}}\right\|^2 \\ & \le \left\|\vec{\mathbf{x}}\right\|^2 + 2\left\|\vec{\mathbf{x}}\right\|\left\|\vec{\mathbf{y}}\right\| + \left\|\vec{\mathbf{y}}\right\|^2 \end{aligned}\]

Therefore:

\[ \left\|\vec{\mathbf{x}} + \vec{\mathbf{y}}\right\|^2 \le (\left\|\vec{\mathbf{x}}\right\| + \left\|\vec{\mathbf{y}}\right\|)^2 \]

Both sides are nonnegative, so taking square roots gives:

\[ \left\|\vec{\mathbf{x}} + \vec{\mathbf{y}}\right\| \le \left\|\vec{\mathbf{x}}\right\| + \left\|\vec{\mathbf{y}}\right\| \]

This is the triangle inequality.
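Both inequalities can be spot-checked numerically. The sketch below does this on random vectors drawn from a normal distribution. The seed, the dimension 5 and the 1000 trials are arbitrary choices, and a tiny tolerance absorbs floating-point rounding. It then shows the equality case of the Cauchy-Schwarz inequality, where one vector is a scalar multiple of the other. As with any numerical test, this illustrates the results rather than proving them.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
tol = 1e-12                      # slack for floating-point rounding

for _ in range(1000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    # Cauchy-Schwarz: |x . y| <= ||x|| ||y||
    assert abs(x.dot(y)) <= np.linalg.norm(x) * np.linalg.norm(y) + tol
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + tol

# Equality case of Cauchy-Schwarz: y is a scalar multiple of x
x = np.array([1.0, 2.0, 3.0])
y = -2.5 * x
print(abs(x.dot(y)), np.linalg.norm(x) * np.linalg.norm(y))  # equal up to rounding
```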

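Geometrically, the dot product is related to the cosine of the angle \(\theta\) between the two vectors: \[\vec{\mathbf{a}}\cdot\vec{\mathbf{b}} = \left\|\vec{\mathbf{a}}\right\|\left\|\vec{\mathbf{b}}\right\|\cos\theta\] Equivalently, \(\vec{\mathbf{a}}\cdot\vec{\mathbf{b}}\) equals \(\left\|\vec{\mathbf{b}}\right\|\) times the signed length of the projection of \(\vec{\mathbf{a}}\) onto \(\vec{\mathbf{b}}\). It therefore measures how far the two vectors point in the same direction. When the two vectors are orthogonal, \(\cos\theta = 0\), so the dot product is 0 and the projection vanishes. When they are parallel and point the same way, \(\cos\theta = 1\) and the dot product reaches its maximum, \(\left\|\vec{\mathbf{a}}\right\|\left\|\vec{\mathbf{b}}\right\|\). When they point in opposite directions, \(\cos\theta = -1\) and the dot product is most negative.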
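Rearranging the formula gives \(\theta = \arccos\left(\frac{\vec{\mathbf{a}}\cdot\vec{\mathbf{b}}}{\left\|\vec{\mathbf{a}}\right\|\left\|\vec{\mathbf{b}}\right\|}\right)\) for nonzero vectors. The sketch below uses it on an orthogonal pair, a same-direction pair and an opposite-direction pair. The helper name angle_between is hypothetical, introduced only for this illustration.

```python
import numpy as np

def angle_between(a, b):
    """Hypothetical helper: angle in radians between nonzero vectors a and b,
    from a . b = |a| |b| cos(theta)."""
    cos_theta = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Rounding can push cos_theta slightly outside [-1, 1]; clip before arccos
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

orthogonal = np.degrees(angle_between(np.array([1, 0]), np.array([0, 2])))
same_dir = np.degrees(angle_between(np.array([1, 1]), np.array([2, 2])))
opposite = np.degrees(angle_between(np.array([1, 1]), np.array([-1, -1])))
print(orthogonal, same_dir, opposite)  # approximately 90, 0 and 180
```

Because arccos is very sensitive near \(\pm 1\), nearly parallel vectors can report a tiny nonzero angle (on the order of \(10^{-6}\) degrees) instead of exactly 0 or 180.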

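Looking at the figure above, \(L\) is the line through \(\vec{\mathbf{v}}\) extended in both directions, i.e. \(L = \{c\vec{\mathbf{v}} \mid c\in \mathbb{R}\}\). Denote the projection of \(\vec{\mathbf{x}}\) onto \(L\) by \(Proj_L(\vec{\mathbf{x}})\). The projection is a multiple \(c\vec{\mathbf{v}}\) of \(\vec{\mathbf{v}}\), and the difference \(\vec{\mathbf{x}} - c\vec{\mathbf{v}}\) is orthogonal to \(\vec{\mathbf{v}}\). Their dot product is therefore 0, which gives: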

\[ \begin{aligned} (\vec{\mathbf{x}}-\underbrace { c\vec{\mathbf{v}}}_{ Proj_L({\vec{\mathbf{x}}}) } )\cdot \vec{\mathbf{v}} &= 0 \\ \vec{\mathbf{x}}\cdot \vec{\mathbf{v}} -c\,\vec{\mathbf{v}}\cdot \vec{\mathbf{v}} &= 0\\ c\,\vec{\mathbf{v}} \cdot \vec{\mathbf{v}} &= \vec{\mathbf{x}}\cdot \vec{\mathbf{v}}\\ c &= \frac{\vec{\mathbf{x}}\cdot \vec{\mathbf{v}}}{\vec{\mathbf{v}}\cdot \vec{\mathbf{v}}} \end{aligned} \]

With \(c\) known, the projection \(Proj_L({\vec{\mathbf{x}}})\) is:

\[Proj_L({\vec{\mathbf{x}}}) = c\vec{\mathbf{v}} = (\frac{\vec{\mathbf{x}}\cdot \vec{\mathbf{v}}}{\vec{\mathbf{v}}\cdot \vec{\mathbf{v}}})\vec{\mathbf{v}}\]

For example, take \(\vec{\mathbf{a}} = \begin{bmatrix}1 \\ 2\end{bmatrix}\) and \(\vec{\mathbf{b}} = \begin{bmatrix}1 \\ 1\end{bmatrix}\). The projection of \(\vec{\mathbf{a}}\) onto the line \(L\) along \(\vec{\mathbf{b}}\) is:

\[Proj_L({\vec{\mathbf{a}}}) = c\vec{\mathbf{b}} = (\frac{\vec{\mathbf{a}}\cdot \vec{\mathbf{b}}}{\vec{\mathbf{b}}\cdot \vec{\mathbf{b}}})\vec{\mathbf{b}} = \frac{3}{2}\vec{\mathbf{b}}\]

Python example:

import numpy as np

def get_projection(a, b):
    # Projection of a onto the line spanned by b: (a.b / b.b) * b
    return a.dot(b) / b.dot(b) * b

a = np.array([1, 2])
b = np.array([1, 1])
print(get_projection(a, b))  # [1.5 1.5]
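The derivation of \(c\) rests on one fact: the residual \(\vec{\mathbf{a}} - Proj_L(\vec{\mathbf{a}})\) is orthogonal to \(\vec{\mathbf{b}}\). This short check confirms that fact for the example above. The projection is recomputed inline so the snippet stands alone.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([1.0, 1.0])

proj = a.dot(b) / b.dot(b) * b   # [1.5 1.5]
residual = a - proj              # [-0.5  0.5]
print(residual.dot(b))           # 0.0: the residual is orthogonal to b
```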

Cross product

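The cross product of two vectors (also called the vector product) is defined here only for \(\mathbb{R}^{2}\) and \(\mathbb{R}^{3}\). The standard cross product lives in \(\mathbb{R}^{3}\). The \(\mathbb{R}^{2}\) version below is a commonly used analogue whose result is a single number: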

Cross product in \(\mathbb{R}^{2}\):

\[\begin{bmatrix} a_1 \\ a_2\end{bmatrix} \times \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 \cdot b_2 - a_2 \cdot b_1\end{bmatrix}\]

For example:

\[\begin{bmatrix} 1 \\ 2 \end{bmatrix} \times \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \cdot 4 - 2 \cdot 3 \end{bmatrix} = \begin{bmatrix}-2\end{bmatrix}\]
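The \(\mathbb{R}^{2}\) formula is simple enough to write by hand. NumPy's np.cross has historically accepted 2-element vectors, but recent NumPy releases (2.0 and later) deprecate that usage. The sketch below therefore uses a small helper. The name cross2d is hypothetical and chosen only for this illustration.

```python
import numpy as np

def cross2d(a, b):
    """Hypothetical helper: the scalar 2D cross product a1*b2 - a2*b1."""
    return a[0] * b[1] - a[1] * b[0]

print(cross2d(np.array([1, 2]), np.array([3, 4])))  # -2
```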

Cross product in \(\mathbb{R}^{3}\):

\[\begin{bmatrix} a_1 \\ a_2 \\ a_3\end{bmatrix} \times \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} a_2 \cdot b_3 - a_3 \cdot b_2 \\ a_3 \cdot b_1 - a_1 \cdot b_3 \\ a_1 \cdot b_2 - a_2 \cdot b_1\end{bmatrix}\]

For example:

\[\begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix} \times \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} = \begin{bmatrix} 5 \cdot 7 - 2 \cdot 4 \\ 2 \cdot 1 - 3 \cdot 7 \\ 3 \cdot 4 - 5 \cdot 1\end{bmatrix} = \begin{bmatrix} 27 \\ -19 \\ 7\end{bmatrix} \]

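So in \(\mathbb{R}^{3}\) the cross product of two vectors is a new vector. In \(\mathbb{R}^{2}\) it reduces to a single number.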

Python example:

import numpy as np
a = np.array([3, 5, 2])
b = np.array([1, 4, 7])
print(np.cross(a, b))  # [ 27 -19   7]

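An important use of the cross product is that it produces a new vector \(\vec{\mathbf{c}}\) orthogonal to both original vectors \(\vec{\mathbf{a}}\) and \(\vec{\mathbf{b}}\), with its direction given by the right-hand rule. Here is a simple way to apply the rule in a right-handed coordinate system. Curl the four fingers of your right hand from \(\vec{\mathbf{a}}\) toward \(\vec{\mathbf{b}}\) through the angle between them, which is at most 180 degrees. Your raised thumb then points in the direction of \(\vec{\mathbf{c}}\).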
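Both claims can be checked in a few lines. The first check dots the cross product of the earlier example vectors with each of them. Both results should be zero, which confirms orthogonality. The second check applies the right-hand rule to the standard basis vectors. The x-axis crossed with the y-axis should point along the positive z-axis.

```python
import numpy as np

a = np.array([3, 5, 2])
b = np.array([1, 4, 7])
c = np.cross(a, b)            # [27, -19, 7]

# c is orthogonal to both inputs, so both dot products are zero
print(c.dot(a), c.dot(b))     # 0 0

# Right-hand rule on the standard basis: x-axis cross y-axis points along +z
print(np.cross([1, 0, 0], [0, 1, 0]))  # [0 0 1]
```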

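Geometrically, the magnitude of the cross product is related to the sine of the angle \(\theta\) (\(0 \le \theta \le \pi\)) between the vectors: \[\left\|\vec{\mathbf{a}}\times\vec{\mathbf{b}}\right\| = \left\|\vec{\mathbf{a}}\right\|\left\|\vec{\mathbf{b}}\right\|\sin\theta\] It therefore reflects how close to orthogonal \(\vec{\mathbf{a}}\) and \(\vec{\mathbf{b}}\) are. When the two vectors are parallel, \(\sin\theta = 0\) and the cross product is the zero vector, so the degree of orthogonality is at its minimum. When they are orthogonal, \(\sin\theta = 1\) and the magnitude reaches its maximum, \(\left\|\vec{\mathbf{a}}\right\|\left\|\vec{\mathbf{b}}\right\|\).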
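The magnitude formula can be verified numerically. The sketch below takes the earlier example vectors and computes \(\cos\theta\) from the dot product. Since \(0 \le \theta \le \pi\), it then takes \(\sin\theta = \sqrt{1 - \cos^2\theta}\). Finally, it compares both sides of the formula and checks that parallel vectors give the zero vector.

```python
import numpy as np

a = np.array([3.0, 5.0, 2.0])
b = np.array([1.0, 4.0, 7.0])

cos_theta = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
sin_theta = np.sqrt(1.0 - cos_theta ** 2)   # theta is in [0, pi], so sin(theta) >= 0

lhs = np.linalg.norm(np.cross(a, b))
rhs = np.linalg.norm(a) * np.linalg.norm(b) * sin_theta
print(lhs, rhs)  # equal up to rounding

# Parallel vectors (here b = 2a) give the zero vector
print(np.cross(a, 2 * a))  # [0. 0. 0.]
```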

Matrix-vector product

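The product of a matrix \(\mathbf{A}\) and a vector \(\vec{\mathbf{x}}\) is defined when the number of columns of \(\mathbf{A}\) equals the number of components of \(\vec{\mathbf{x}}\):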

\[\underset{m\times n}{A}\vec{\mathbf{x}}=\begin{bmatrix}a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \ldots \\ a_{m1} & a_{m2} & \ldots & a_{mn}\end{bmatrix}\begin{bmatrix}x_1 \\ x_2 \\ \ldots \\ x_n \end{bmatrix} = \begin{bmatrix}a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n \\ \ldots \\ a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n \\ \end{bmatrix} \]

For example, multiplying the matrix \(\mathbf{A} = \begin{bmatrix}4 & 3 & 1 \\ 1 & 2 & 5\end{bmatrix}\) by the vector \(\vec{\mathbf{x}} = \begin{bmatrix}5 \\ 2 \\ 7\end{bmatrix}\) gives:

\[\begin{bmatrix}4\cdot 5 + 3\cdot 2 + 1\cdot 7 \\ 1 \cdot 5 + 2 \cdot 2 + 5 \cdot 7\end{bmatrix} = \begin{bmatrix}33 \\ 44\end{bmatrix}\]

Python example:

import numpy as np
A = np.array([[4, 3, 1], [1, 2, 5]])
x = np.array([[5], [2], [7]])  # 3x1 column vector
print(A @ x)  # 2x1 column: [[33] [44]]

A matrix-vector product can be viewed as a linear combination of all the column vectors of the matrix:

\[\underset { m\times n }{ \mathbf{A} } \vec { \mathbf{x} } =\begin{bmatrix} \underbrace { \begin{bmatrix} a_{ 11 } \\ a_{ 21 } \\ \ldots \\ a_{ m1 } \end{bmatrix} }_{ \vec { \mathbf{ V }_{ 1 } } } & \underbrace { \begin{bmatrix} a_{ 12 } \\ a_{ 22 } \\\ldots \\ a_{ m2 } \end{bmatrix} }_{ \vec { \mathbf{ V_{ 2 } } } } & \ldots & \underbrace { \begin{bmatrix} a_{ 1n } \\ a_{ 2n } \\ \ldots \\ a_{ mn } \end{bmatrix} }_{ \vec { \mathbf{ V_{ n } } } } \end{bmatrix}\begin{bmatrix} x_{ 1 } \\ x_{ 2 } \\ \ldots \\ x_{ n } \end{bmatrix}=x_1\vec{\mathbf{V}_1}+x_2\vec{\mathbf{V}_2}+\ldots+x_n\vec{\mathbf{V}_n}\]

Each component \(x_i\) of \(\vec{\mathbf{x}}\) acts as the weight of the corresponding column vector \(\vec{\mathbf{V}_i}\) of \(\mathbf{A}\).
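The column view can be checked against the earlier example. The sketch below builds the weighted sum of the columns of \(\mathbf{A}\) by hand, using the components of \(\vec{\mathbf{x}}\) as weights, and compares it with the ordinary product A @ x. Here x is a 1-D array, so the results are 1-D as well.

```python
import numpy as np

A = np.array([[4, 3, 1],
              [1, 2, 5]])
x = np.array([5, 2, 7])

# Weighted sum of the columns A[:, j], with weights x[j]
combo = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]
print(combo)   # [33 44]
print(A @ x)   # [33 44]
```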

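In fact, a matrix represents a linear transformation. Multiplying a vector by a matrix applies that linear transformation to the vector: an \(m \times n\) matrix maps vectors in \(\mathbb{R}^{n}\) to vectors in \(\mathbb{R}^{m}\).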
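To make the "matrix as linear transformation" idea concrete, the sketch below uses a 90-degree counterclockwise rotation of the plane, an illustrative choice not taken from the text. It applies the rotation to a vector, then checks linearity: transforming a linear combination gives the same result as combining the transformed vectors.

```python
import numpy as np

# Illustrative choice: rotate the plane 90 degrees counterclockwise
R = np.array([[0, -1],
              [1,  0]])

v = np.array([2, 1])
w = np.array([-1, 3])
print(R @ v)  # [-1  2], v rotated by 90 degrees

# Linearity: R(3v + w) equals 3(Rv) + Rw
print(np.array_equal(R @ (3 * v + w), 3 * (R @ v) + R @ w))  # True
```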

Transpose of a vector

The transpose of the vector \(\vec{\mathbf{V}} = \underbrace{\begin{bmatrix}v_1 \\ v_2 \\ \ldots \\ v_n \end{bmatrix}}_{n\times 1}\) is defined as \(\vec{\mathbf{V}}^T = \underbrace{\begin{bmatrix}v_1 & v_2 & \ldots & v_n \end{bmatrix}}_{1 \times n}\).

For example, the transpose of the vector \(\vec{\mathbf{A}} = \begin{bmatrix} 2 & 4 \end{bmatrix}\) is \(\vec{\mathbf{A}}^T = \begin{bmatrix} 2 \\ 4\end{bmatrix}\).

Python example:

>>> import numpy as np
>>> a = np.array([[2, 4]])
>>> a.T
array([[2],
       [4]])

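Note that a was declared with two pairs of brackets [] so that it is a two-dimensional array (a 1×2 row vector). Transposing a one-dimensional array leaves it unchanged: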

>>> b = np.array([2, 4])
>>> b.T
array([2, 4])

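The transpose has a useful property. The dot product of a vector \(\vec{\mathbf{v}}\) with another vector \(\vec{\mathbf{w}}\) equals the matrix product of the transpose of \(\vec{\mathbf{v}}\) with \(\vec{\mathbf{w}}\), i.e. \(\vec{\mathbf{v}} \cdot \vec{\mathbf{w}} = \vec{\mathbf{v}}^T \vec{\mathbf{w}}\). Strictly speaking, the right-hand side is a \(1 \times 1\) matrix whose single entry is the dot product.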
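In NumPy this identity shows up when vectors are stored as 2-D column arrays. The sketch below compares the matrix product of the transposed column with the plain dot product of the flattened 1-D arrays. The example vectors are arbitrary.

```python
import numpy as np

v = np.array([[1], [2], [3]])   # 3x1 column vector
w = np.array([[4], [5], [6]])   # 3x1 column vector

print(v.T @ w)                  # [[32]], a 1x1 matrix
print(v.ravel().dot(w.ravel())) # 32, the ordinary dot product
```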

Linear independence

Span

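Simply put, the span of a set of vectors is the set of all vectors that can be obtained as linear combinations of them. For two vectors \(\vec{\mathbf{a}}\) and \(\vec{\mathbf{b}}\) it is written \(span(\vec{\mathbf{a}}, \vec{\mathbf{b}})\).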

For example, take two nonzero, non-parallel vectors in \(\mathbb{R}^{2}\): \(\vec{\mathbf{a}} = \begin{bmatrix}2 \\ 1\end{bmatrix}\) and \(\vec{\mathbf{b}} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}\). It is not hard to see that these two vectors can represent any other vector in the plane, i.e. \(span(\vec{\mathbf{a}}, \vec{\mathbf{b}}) = \mathbb{R}^{2}\). The proof goes as follows.

Suppose an arbitrary vector \(\begin{bmatrix}x \\y \end{bmatrix}\) in \(\mathbb{R}^{2}\) is a linear combination of \(\vec{\mathbf{a}}\) and \(\vec{\mathbf{b}}\). Then:

\[ c_1 \begin{bmatrix}2 \\ 1\end{bmatrix} + c_2 \begin{bmatrix} 0 \\ 3 \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} \]

That is:

\[ \left\{ \begin{aligned} c_1 \cdot 2 + c_2 \cdot 0 &= x\\ c_1 \cdot 1 + c_2 \cdot 3 &= y \end{aligned} \right. \]

Solving this system gives:

\[ \left\{ \begin{aligned} c_1 &= \frac{x}{2}\\ c_2 &= \frac{y}{3} - \frac{x}{6} \end{aligned} \right. \]

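For any given values of \(x\) and \(y\), these formulas always produce values of \(c_1\) and \(c_2\), and those values are unique. Every vector in \(\mathbb{R}^{2}\) is therefore a linear combination of \(\vec{\mathbf{a}}\) and \(\vec{\mathbf{b}}\).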
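Finding the coefficients amounts to solving a 2×2 linear system whose coefficient matrix has \(\vec{\mathbf{a}}\) and \(\vec{\mathbf{b}}\) as its columns. The sketch below solves that system with np.linalg.solve for an arbitrarily chosen target vector \(\begin{bmatrix}4 \\ 5\end{bmatrix}\). It then checks the answer against the closed-form formulas above.

```python
import numpy as np

# Columns are the spanning vectors a = [2, 1] and b = [0, 3]
M = np.array([[2.0, 0.0],
              [1.0, 3.0]])
target = np.array([4.0, 5.0])   # an arbitrary vector [x, y]

c = np.linalg.solve(M, target)
print(c)                         # [2. 1.]

# Compare with the closed-form solution c1 = x/2, c2 = y/3 - x/6
x, y = target
print(np.allclose(c, [x / 2, y / 3 - x / 6]))  # True
```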

Linear dependence and independence

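When every vector in a set contributes to the span, the set is called linearly independent; otherwise it is called linearly dependent. A smallest set of vectors that spans a space is called a basis of that space.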

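This may sound abstract, but the idea is simple. Suppose some vector in a set can be written as a linear combination of the other vectors in the set. Then the set contains a redundant vector as far as the span is concerned, and the set is linearly dependent. Conversely, suppose no element can be built from the others. Then every element contributes to the span, and the set is linearly independent.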

For example, suppose we add a third vector \(\vec{\mathbf{c}} = \begin{bmatrix} 5 \\ 2\end{bmatrix}\) to the example above. Because \(\vec{\mathbf{c}}\) is a linear combination of \(\vec{\mathbf{a}}\) and \(\vec{\mathbf{b}}\), adding it does not change the span: \(\mathbf{a}\), \(\mathbf{b}\) and \(\mathbf{c}\) together still span \(\mathbb{R}^{2}\). The set \(\left\{\vec{\mathbf{a}}, \vec{\mathbf{b}}, \vec{\mathbf{c}}\right \}\) is therefore linearly dependent.
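Plugging \(x = 5\) and \(y = 2\) into the formulas from the span proof gives the coefficients explicitly: \(c_1 = \frac{5}{2}\) and \(c_2 = \frac{2}{3} - \frac{5}{6} = -\frac{1}{6}\). The short check below confirms that these coefficients reproduce \(\vec{\mathbf{c}}\).

```python
import numpy as np

a = np.array([2.0, 1.0])
b = np.array([0.0, 3.0])
c_vec = np.array([5.0, 2.0])

# Coefficients from c1 = x/2, c2 = y/3 - x/6 with x = 5, y = 2
c1, c2 = 5 / 2, 2 / 3 - 5 / 6
print(np.allclose(c1 * a + c2 * b, c_vec))  # True: c is a combination of a and b
```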

Testing for linear dependence

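A set of vectors \(s = \{v_1, v_2, \ldots, v_n\}\) is linearly dependent if and only if there exist coefficients \(c_1, c_2, \ldots, c_n\), not all zero, such that \(c_1 v_1 + c_2 v_2 + \ldots + c_n v_n = \mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ \ldots \\ 0\end{bmatrix}\). Conversely, if the only solution is \(c_1 = c_2 = \ldots = c_n = 0\), the set is linearly independent.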

For example, given the vectors \(\begin{bmatrix}2 \\ 1\end{bmatrix}\) and \(\begin{bmatrix}3 \\ 2\end{bmatrix}\), first write down the equation:

\[c_1 \begin{bmatrix}2 \\ 1\end{bmatrix} + c_2 \begin{bmatrix}3 \\ 2\end{bmatrix} = \begin{bmatrix}0 \\ 0\end{bmatrix}\]

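Solving it is straightforward. The only solution is \(\begin{bmatrix}c_1 \\ c_2\end{bmatrix} = \begin{bmatrix}0 \\ 0\end{bmatrix}\), so the two vectors are linearly independent. Since they are two linearly independent vectors in \(\mathbb{R}^{2}\), they also span \(\mathbb{R}^{2}\).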

Similarly, consider the three vectors \(\begin{bmatrix}2 \\ 0 \\ 0\end{bmatrix}\), \(\begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix}\) and \(\begin{bmatrix}0 \\ 0 \\ 7\end{bmatrix}\) in \(\mathbb{R}^{3}\). It is easy to see that they are linearly independent and together span \(\mathbb{R}^3\).

The set \(\left\{\begin{bmatrix}2 \\ 1\end{bmatrix}, \begin{bmatrix}3 \\ 2\end{bmatrix}, \begin{bmatrix}1 \\ 2 \end{bmatrix}\right\}\) behaves differently. One can readily find nonzero coefficients \(\begin{bmatrix}c_1 \\ c_2 \\ c_3\end{bmatrix} = \begin{bmatrix}-4 \\ 3 \\ -1\end{bmatrix}\) such that \(c_1 \begin{bmatrix}2 \\ 1\end{bmatrix} + c_2 \begin{bmatrix}3 \\ 2\end{bmatrix} + c_3 \begin{bmatrix}1 \\ 2 \end{bmatrix} = \begin{bmatrix}0 \\ 0\end{bmatrix}\). This set is therefore linearly dependent.
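For larger sets, solving the homogeneous system by hand becomes tedious. An equivalent test, swapped in here instead of the hand calculation, uses matrix rank. Stack the vectors as the columns of a matrix. The homogeneous system has only the trivial solution exactly when the rank equals the number of vectors. The helper name is_linearly_independent is hypothetical. The sketch applies it to the three examples above and also verifies the nonzero coefficients from the text.

```python
import numpy as np

def is_linearly_independent(vectors):
    """Hypothetical helper: vectors are independent iff rank == number of vectors."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

print(is_linearly_independent([np.array([2, 1]), np.array([3, 2])]))   # True
print(is_linearly_independent([np.array([2, 0, 0]), np.array([0, 1, 0]),
                               np.array([0, 0, 7])]))                  # True
print(is_linearly_independent([np.array([2, 1]), np.array([3, 2]),
                               np.array([1, 2])]))                     # False

# The nonzero coefficients from the text combine the three vectors into zero
coeffs = np.array([-4, 3, -1])
M = np.column_stack([[2, 1], [3, 2], [1, 2]])
print(M @ coeffs)  # [0 0]
```

matrix_rank relies on a numerical tolerance, so for nearly dependent floating-point data its answer depends on that tolerance.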

The next article will go further and discuss linear subspaces and eigenvectors.

[1] TextRank: Bring Order Into Texts
[2] \(\mathbb{R}^{n}\): the space of ordered n-tuples of real numbers. For example, \(\mathbb{R}^2\) is the space of ordered pairs of real numbers \((x_1, x_2)\). In general, \(\mathbb{R}^n = \left\{ (x_1, \ldots, x_n) \mid x_1, \ldots, x_n \in \mathbb{R} \right\}\).
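[3] Historically, this inequality should be called the Cauchy-Bunyakovsky-Schwarz inequality. The latter two mathematicians independently generalized it to integral calculus, and in doing so brought its applications to a nearly complete form.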
