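Course link and notes
Linear algebra implementation, p4
This series is my set of study notes for Li Mu's deep learning course. It may add material that Li Mu did not cover in the lectures.
This is part four of this section; because of CSDN's length limit, I had to split it up.
Matrix calculus
Derivatives of matrices
Basic rules for differentiating a vector with respect to a vector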
Given a vector function $\overrightarrow{y}=\overrightarrow{f}(\overrightarrow{x})$ and a vector $\overrightarrow{x}=\begin{bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{m} \end{bmatrix}_{m\times 1}$:
- When $\overrightarrow{y}=\overrightarrow{a}$ and $\overrightarrow{a}$ is not a function of $\overrightarrow{x}$ (that is, no component of $\overrightarrow{a}$ depends on $\overrightarrow{x}$), we have:
$$
\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=
\begin{bmatrix}
\frac{\partial \overrightarrow{f}(\overrightarrow{x})}{\partial x_{1}}\\
\frac{\partial \overrightarrow{f}(\overrightarrow{x})}{\partial x_{2}}\\
\vdots \\
\frac{\partial \overrightarrow{f}(\overrightarrow{x})}{\partial x_{m}}
\end{bmatrix}
=\begin{bmatrix} 0\\ 0\\ \vdots \\ 0 \end{bmatrix}
=\overrightarrow{0}
$$
(each entry $\frac{\partial \overrightarrow{f}(\overrightarrow{x})}{\partial x_{i}}$ is itself a row of zeros, so written out in full the result is an all-zero matrix)
- When $\overrightarrow{y}=\overrightarrow{x}$, that is, $\overrightarrow{y}=\begin{bmatrix} f_{1}(\overrightarrow{x}) \\ f_{2}(\overrightarrow{x}) \\ \vdots \\ f_{m}(\overrightarrow{x}) \end{bmatrix}=\begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{m} \end{bmatrix}$, we have:
$$
\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=
\begin{bmatrix}
\frac{\partial \overrightarrow{f}(\overrightarrow{x})}{\partial x_{1}}\\
\frac{\partial \overrightarrow{f}(\overrightarrow{x})}{\partial x_{2}}\\
\vdots \\
\frac{\partial \overrightarrow{f}(\overrightarrow{x})}{\partial x_{m}}
\end{bmatrix}
=\begin{bmatrix}
\frac{\partial f_{1}(\overrightarrow{x})}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow{x})}{\partial x_{1}} & \dots & \frac{\partial f_{m}(\overrightarrow{x})}{\partial x_{1}} \\
\frac{\partial f_{1}(\overrightarrow{x})}{\partial x_{2}} & \frac{\partial f_{2}(\overrightarrow{x})}{\partial x_{2}} & \dots & \frac{\partial f_{m}(\overrightarrow{x})}{\partial x_{2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_{1}(\overrightarrow{x})}{\partial x_{m}} & \frac{\partial f_{2}(\overrightarrow{x})}{\partial x_{m}} & \dots & \frac{\partial f_{m}(\overrightarrow{x})}{\partial x_{m}}
\end{bmatrix}_{m\times m}
=\begin{bmatrix}
1 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & 1
\end{bmatrix}
=\bm{I}
$$
($\bm{I}$ and $\bm{E}$ are two notations for the identity matrix and mean the same thing. Here $\overrightarrow{y}$ has as many components as $\overrightarrow{x}$, so the matrix is $m\times m$.)
- When $\overrightarrow{y}=\bm{A}\overrightarrow{x}$, with $\bm{A}=\begin{bmatrix} a_{11}&a_{12} & \cdots & a_{1m}\\ a_{21}&a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots &\vdots \\ a_{m1}&a_{m2} & \cdots & a_{mm} \end{bmatrix}$, we have:
$$\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=\frac{\partial \bm{A}\overrightarrow{x}}{\partial \overrightarrow{x}}=\bm{A}^{T} \quad \text{(denominator layout)}$$
$$\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=\frac{\partial \bm{A}\overrightarrow{x}}{\partial \overrightarrow{x}}=\bm{A} \quad \text{(numerator layout)}$$
(For the proof, see part three of this section.)
- When $\overrightarrow{y}=\overrightarrow{x}^{T}\bm{A}$, with $\bm{A}=\begin{bmatrix} a_{11}&a_{12} & \cdots & a_{1m}\\ a_{21}&a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots &\vdots \\ a_{m1}&a_{m2} & \cdots & a_{mm} \end{bmatrix}$,
expanding the product gives
$$
\overrightarrow{y}=\overrightarrow{x}^{T}\bm{A}
=\begin{bmatrix} x_{1}, & x_{2}, & \dots, & x_{m} \end{bmatrix}\cdot
\begin{bmatrix} a_{11}&a_{12} & \cdots & a_{1m}\\ a_{21}&a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots &\vdots \\ a_{m1}&a_{m2} & \cdots & a_{mm} \end{bmatrix}
=\begin{bmatrix} a_{11}x_{1}+a_{21}x_{2}+\dots +a_{m1}x_{m}, & a_{12}x_{1}+a_{22}x_{2}+\dots +a_{m2}x_{m}, & \dots, & a_{1m}x_{1}+a_{2m}x_{2}+\dots +a_{mm}x_{m} \end{bmatrix}
$$
so, matching components one to one, this can only be read as follows (row vectors and column vectors get mixed here; there is no way around it):
$$
\overrightarrow{y}=\begin{bmatrix} f_{1}(\overrightarrow{x}) \\ f_{2}(\overrightarrow{x}) \\ \vdots \\ f_{m}(\overrightarrow{x}) \end{bmatrix}
=\begin{bmatrix} a_{11}x_{1}+a_{21}x_{2}+\dots +a_{m1}x_{m}\\ a_{12}x_{1}+a_{22}x_{2}+\dots +a_{m2}x_{m}\\ \vdots \\ a_{1m}x_{1}+a_{2m}x_{2}+\dots +a_{mm}x_{m} \end{bmatrix}
$$
and then we have:
$$
\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=\frac{\partial \overrightarrow{x}^{T}\bm{A}}{\partial \overrightarrow{x}}
=\begin{bmatrix}
\frac{\partial f_{1}(\overrightarrow{x})}{\partial x_{1}} & \frac{\partial f_{2}(\overrightarrow{x})}{\partial x_{1}} & \dots & \frac{\partial f_{m}(\overrightarrow{x})}{\partial x_{1}} \\
\frac{\partial f_{1}(\overrightarrow{x})}{\partial x_{2}} & \frac{\partial f_{2}(\overrightarrow{x})}{\partial x_{2}} & \dots & \frac{\partial f_{m}(\overrightarrow{x})}{\partial x_{2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_{1}(\overrightarrow{x})}{\partial x_{m}} & \frac{\partial f_{2}(\overrightarrow{x})}{\partial x_{m}} & \dots & \frac{\partial f_{m}(\overrightarrow{x})}{\partial x_{m}}
\end{bmatrix}
=\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1m} \\
a_{21} & a_{22} & \dots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mm}
\end{bmatrix}
=\bm{A} \quad \text{(denominator layout)}
$$
Here $f_{j}(\overrightarrow{x})=a_{1j}x_{1}+a_{2j}x_{2}+\dots+a_{mj}x_{m}$, so $\frac{\partial f_{j}}{\partial x_{i}}=a_{ij}$, and the entry in row $i$, column $j$ is $a_{ij}$. In numerator layout the result is the transpose, $\bm{A}^{T}$.
- When $\overrightarrow{y}=a\overrightarrow{u}$, where $a$ is an arbitrary constant and $\overrightarrow{u}=\overrightarrow{u}(\overrightarrow{x})$, we have:
$$\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=a\frac{\partial \overrightarrow{u}}{\partial \overrightarrow{x}}$$
- When $\overrightarrow{y}=\bm{A}\overrightarrow{u}$, where $\overrightarrow{u}=\overrightarrow{u}(\overrightarrow{x})$ and the entries of $\bm{A}$ do not depend on the entries of $\overrightarrow{x}$, we have:
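$$\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=\bm{A}\frac{\partial \overrightarrow{u}}{\partial \overrightarrow{x}} \quad \text{(numerator layout)}$$
$$\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=\frac{\partial \overrightarrow{u}}{\partial \overrightarrow{x}}\bm{A}^{T} \quad \text{(denominator layout)}$$
(Setting $\overrightarrow{u}=\overrightarrow{x}$ recovers $\bm{A}$ and $\bm{A}^{T}$ for $\bm{A}\overrightarrow{x}$ in the two layouts.)
- When $\overrightarrow{y}=\overrightarrow{u}+\overrightarrow{v}$, where $\overrightarrow{u}=\overrightarrow{u}(\overrightarrow{x})$ and $\overrightarrow{v}=\overrightarrow{v}(\overrightarrow{x})$, we have: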
$$\frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}=\frac{\partial \overrightarrow{u}}{\partial \overrightarrow{x}}+\frac{\partial \overrightarrow{v}}{\partial \overrightarrow{x}}$$
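As a quick sanity check on the rules above, here is a minimal sketch that estimates the derivative matrix by central differences and compares it with the closed forms. It assumes the denominator layout (row $i$ holds the derivatives with respect to $x_i$, so entry $(i, j)$ is $\partial f_j / \partial x_i$). The helper names (`jacobian_denominator_layout`, `close`, and so on), the sample matrix `A`, the test point `x0`, and the choice $u_k(x) = x_k^2$ are all made up for illustration; none of them come from the course.

```python
def jacobian_denominator_layout(f, x, eps=1e-6):
    """Central-difference estimate of J[i][j] = d f_j / d x_i (row i <-> x_i)."""
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        rows.append([(p - q) / (2 * eps) for p, q in zip(f(x_plus), f(x_minus))])
    return rows


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q))) for c in range(len(Q[0]))]
            for r in range(len(P))]


def close(M, N, tol=1e-5):
    """True when two equally shaped matrices agree entry by entry."""
    return all(abs(a - b) < tol for r, s in zip(M, N) for a, b in zip(r, s))


# Hypothetical sample data: a non-symmetric 3x3 matrix and an arbitrary point.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
x0 = [0.5, -1.0, 2.0]
m = len(x0)
I = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


def identity_map(x):   # y = x
    return list(x)


def A_times_x(x):      # y = A x
    return [sum(A[j][k] * x[k] for k in range(m)) for j in range(m)]


def xT_times_A(x):     # y = x^T A, read as a vector with components f_j
    return [sum(x[i] * A[i][j] for i in range(m)) for j in range(m)]


def A_times_u(x):      # y = A u(x) with the assumed choice u_k(x) = x_k^2
    return A_times_x([v * v for v in x])


# Denominator-layout derivative of u(x) at x0: a diagonal matrix with 2 * x_i.
du_dx = [[2 * x0[i] if i == k else 0.0 for k in range(m)] for i in range(m)]

print(close(jacobian_denominator_layout(identity_map, x0), I))
print(close(jacobian_denominator_layout(A_times_x, x0), transpose(A)))
print(close(jacobian_denominator_layout(xT_times_A, x0), A))
print(close(jacobian_denominator_layout(A_times_u, x0), matmul(du_dx, transpose(A))))
```

All four comparisons should print `True`. Transposing the estimated matrix gives the numerator-layout versions of the same results.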
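Extending to matrices
This just adds dimensions: differentiating a matrix with respect to a matrix gives a four-dimensional array, with a matrix playing the role the vector played above. It is pretty hard to follow anyway, so I only skimmed it for fun, hhhh.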
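To make the "four dimensions" concrete, here is a small sketch that differentiates a matrix-valued function with respect to a matrix by central differences and stores the result as a nested list with four indices. The index order `D[i][j][k][l]` (input indices first, output indices last) is my own convention for this example, and the function name, the matrix `A`, and the point `X0` are hypothetical.

```python
def matmul(P, Q):
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q))) for c in range(len(Q[0]))]
            for r in range(len(P))]


def matrix_by_matrix_derivative(F, X, eps=1e-6):
    """Estimate D[i][j][k][l] = d F(X)[k][l] / d X[i][j] by central differences."""
    D = []
    for i in range(len(X)):
        D_i = []
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += eps
            X_minus[i][j] -= eps
            Y_plus, Y_minus = F(X_plus), F(X_minus)
            D_i.append([[(p - q) / (2 * eps) for p, q in zip(rp, rm)]
                        for rp, rm in zip(Y_plus, Y_minus)])
        D.append(D_i)
    return D


# Hypothetical example: Y = A X with A of shape 3x2 and X of shape 2x2.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X0 = [[0.5, -1.0], [2.0, 0.0]]

D = matrix_by_matrix_derivative(lambda X: matmul(A, X), X0)
shape = (len(D), len(D[0]), len(D[0][0]), len(D[0][0][0]))
print(shape)  # (2, 2, 3, 2): two indices for X, two for Y
```

For this particular function, $Y_{kl}=\sum_i A_{ki}X_{il}$, so each entry should equal $A_{ki}$ when $j=l$ and $0$ otherwise.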