Calculus Notes 04: Common Matrix Derivative Operations
4.1 Basic matrix derivative examples
4.1.1 Derivative example 1: \(f(x)=A_{m\times n}\cdot x_{n \times 1}\) \(\Rightarrow f'_{x^T}(x)=A_{m\times n}\)
For example:
\[A=
\begin{bmatrix}
a_1&a_2&a_3\\
b_1&b_2&b_3
\end{bmatrix},
x=
\begin{bmatrix}
x_1\\
x_2\\
x_3
\end{bmatrix}
\Rightarrow
f(x)=
\begin{bmatrix}
a_1x_1+a_2x_2+a_3x_3\\
b_1x_1+b_2x_2+b_3x_3
\end{bmatrix}
\]
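Each component of \(f\) is linear in \(x\), so differentiating row \(i\) with respect to \(x_j\) leaves only the coefficient in position \((i,j)\); the coefficients therefore stay in place in the matrix: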
\[\tag{1}
f'_{x^T}(x)=
\begin{bmatrix}
a_1&a_2&a_3\\
b_1&b_2&b_3
\end{bmatrix}=A
\]
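The result in equation (1) can be sanity-checked numerically. The following is a minimal sketch in plain Python, assuming a central-difference approximation is acceptable; the helper names `matvec` and `numerical_jacobian` and the sample values are hypothetical and chosen only for illustration.

```python
def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-lists A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d f_i / d x_j."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Hypothetical sample values (a 2x3 matrix, as in the example above).
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x0 = [0.5, -1.0, 2.0]

J = numerical_jacobian(lambda x: matvec(A, x), x0)
for row in J:
    print([round(v, 6) for v in row])
```

Up to rounding, the printed rows should reproduce the rows of `A`, whatever point `x0` is chosen, because \(f\) is linear in \(x\).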
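4.1.2 Derivative example 2: \(f(x)= x_{1 \times m}\cdot A_{mm} \cdot x^T_{1 \times m} \Rightarrow f'_x(x)=x_{1 \times m}\cdot(A_{mm}+A_{mm}^T)\), or \((A_{mm}+A_{mm}^T)\cdot x^T_{1 \times m}\) when written as a column vector
For example: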
\[x=
\begin{bmatrix}
x_1&x_2
\end{bmatrix},
A=
\begin{bmatrix}
a&b\\
c&d
\end{bmatrix},
x^T=
\begin{bmatrix}
x_1\\
x_2
\end{bmatrix}
\]
\[\Rightarrow f(x)=
\begin{bmatrix}
ax_1+cx_2&bx_1+dx_2
\end{bmatrix}
\cdot
\begin{bmatrix}
x_1\\
x_2
\end{bmatrix}
\]
\[\qquad\quad
=
\begin{bmatrix}
a{x_1}^2+bx_1x_2+cx_1x_2+dx_2^2
\end{bmatrix}
\]
Differentiating with respect to \(x_1\) and \(x_2\) gives:
\[f'_x(x)=
\begin{bmatrix}
2ax_1+bx_2+cx_2&2dx_2+bx_1+cx_1
\end{bmatrix}
\]
\[\tag{2}
=
\begin{bmatrix}
x_1&x_2
\end{bmatrix}
\cdot
\begin{bmatrix}
a&b\\
c&d
\end{bmatrix}
+
\begin{bmatrix}
x_1&x_2
\end{bmatrix}
\cdot
\begin{bmatrix}
a&c\\
b&d
\end{bmatrix}
=x\cdot(A+A^T)
\]
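The component-wise derivative \([2ax_1+(b+c)x_2,\ (b+c)x_1+2dx_2]\) can be checked against a finite-difference gradient. This is a minimal sketch in plain Python; the function names and the sample matrix are hypothetical, and the matrix is deliberately non-symmetric so that both \(A\) and \(A^T\) contribute.

```python
def quad_form(A, x):
    """Return the scalar x A x^T for a square list-of-lists A and a list x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient: entry k approximates d f / d x_k."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def closed_form_gradient(A, x):
    """Entry k is sum_i (a_ik + a_ki) x_i, i.e. (A + A^T) applied to x."""
    n = len(x)
    return [sum((A[i][k] + A[k][i]) * x[i] for i in range(n)) for k in range(n)]


# Hypothetical sample values: a = 1, b = 2, c = 3, d = 4.
A = [[1.0, 2.0],
     [3.0, 4.0]]
x0 = [0.5, -1.5]

numeric = numerical_gradient(lambda x: quad_form(A, x), x0)
closed = closed_form_gradient(A, x0)
print(numeric, closed)
```

With these values the closed form evaluates to \([-6.5,\ -9.5]\), and the numerical gradient should agree to within rounding error. Using \(2Ax\) instead would only be correct when \(A\) is symmetric.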
4.1.3 Derivative example 3: \(f(x)=x_{1\times n}^T\cdot a_{n \times 1} \Rightarrow f_x'(x)=(a_{1\times n}^T\cdot x_{n \times 1})'_x=a\)
For example:
\[x^T=
\begin{bmatrix}
x_1&x_2
\end{bmatrix},
a=
\begin{bmatrix}
a_1\\
a_2
\end{bmatrix}
\]
\[\Rightarrow
f(x)=
x^T\cdot a=
\begin{bmatrix}
x_1a_1+x_2a_2
\end{bmatrix}
=a^T\cdot x
\]
Here \(x\) itself is the column vector:
\[x=
\begin{bmatrix}
x_1\\
x_2
\end{bmatrix}
\]
Since \(f\) is linear in \(x\), differentiating with respect to each \(x_i\) leaves only the coefficient \(a_i\), so:
\[\tag{3}
f'_x(x)=
(a^T\cdot x)_x'
=
\begin{bmatrix}
a_1\\
a_2
\end{bmatrix}
=a
\]
4.1.4 Derivative example 4: \(f(x)=x_{m\times 1}^T\cdot A_{m \times n}\cdot y_{n \times 1} \Rightarrow f_x'(x)=Ay,f'_A(x)=xy^T\)
For example:
\[x^T=
\begin{bmatrix}
x_1&x_2&x_3
\end{bmatrix},
A=
\begin{bmatrix}
a_1&a_2\\
a_3&a_4\\
a_5&a_6
\end{bmatrix},
y=
\begin{bmatrix}
y_1\\
y_2\\
\end{bmatrix}
\]
\[\Rightarrow
f(x) =x^T\cdot A\cdot y=
\begin{bmatrix}
a_1x_1+a_3x_2+a_5x_3&a_2x_1+a_4x_2+a_6x_3\\
\end{bmatrix}
\cdot
\begin{bmatrix}
y_1\\
y_2\\
\end{bmatrix}
\]
\[\qquad\qquad\qquad\qquad\qquad\quad
=
\begin{bmatrix}
(a_1x_1+a_3x_2+a_5x_3)\cdot y_1+(a_2x_1+a_4x_2+a_6x_3)\cdot y_2
\end{bmatrix}
\]
Differentiating with respect to each \(x_i\), and then with respect to each entry of \(A\), gives:
\[f'_x(x)=
\begin{bmatrix}
a_1y_1+a_2y_2\\
a_3y_1+a_4y_2\\
a_5y_1+a_6y_2
\end{bmatrix}
=A \cdot y
\]
\[\tag{4}
f'_A(x)=
\begin{bmatrix}
x_1y_1&x_1y_2\\
x_2y_1&x_2y_2\\
x_3y_1&x_3y_2
\end{bmatrix}
=x\cdot y^T
\]
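Both results for the bilinear form \(x^T A y\) have a definite shape: the derivative with respect to \(x\) has one entry per \(x_i\) (a 3-vector here), and the derivative with respect to \(A\) has one entry per \(a_{ij}\) (a \(3\times 2\) matrix here). The sketch below checks this numerically in plain Python; all names and sample values are hypothetical.

```python
def bilinear(x, A, y):
    """Return the scalar x^T A y for lists x, y and a list-of-lists A."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def central_diff(f, h=1e-6):
    """Central-difference slope at 0 of a one-variable function f(t)."""
    return (f(h) - f(-h)) / (2 * h)


# Hypothetical sample values with the shapes used in the example.
x0 = [1.0, -2.0, 0.5]
A0 = [[1.0, 2.0],
      [3.0, 4.0],
      [5.0, 6.0]]
y0 = [2.0, -1.0]

# Numerical derivative with respect to each x_i (perturb one entry at a time).
grad_x = []
for i in range(len(x0)):
    def f_of_t(t, i=i):
        x = list(x0)
        x[i] += t
        return bilinear(x, A0, y0)
    grad_x.append(central_diff(f_of_t))

# Numerical derivative with respect to each a_ij.
grad_A = [[0.0] * len(y0) for _ in x0]
for i in range(len(x0)):
    for j in range(len(y0)):
        def g_of_t(t, i=i, j=j):
            A = [row[:] for row in A0]
            A[i][j] += t
            return bilinear(x0, A, y0)
        grad_A[i][j] = central_diff(g_of_t)

# Closed forms: A y (a 3-vector) and the outer product x y^T (a 3x2 matrix).
Ay = [sum(A0[i][j] * y0[j] for j in range(len(y0))) for i in range(len(x0))]
xyT = [[x_i * y_j for y_j in y0] for x_i in x0]
print(grad_x, Ay)
print(grad_A, xyT)
```

For these values \(Ay=[0,\ 2,\ 4]\), and `grad_x` and `grad_A` should match `Ay` and `xyT` entry by entry up to rounding.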
4.2 Norm derivative example
Let \(X_{N \times n}\) be a matrix and \(a_{n \times 1}\), \(y_{N \times 1}\) be vectors.
Let \(f(a)=||X\cdot a-y||^2\); then \(f'_a(a)\) is obtained as follows.
Because the squared norm of a column vector \(v\) is \(v^T\cdot v\):
\[f(a)=(X\cdot a-y)^T\cdot (X\cdot a-y)
\]
\[\qquad \qquad
=(a^T\cdot X^T -y^T)\cdot (X\cdot a-y)
\]
\[\tag{5}
\qquad \qquad\qquad\qquad\qquad\quad
=a^T\cdot X^TX \cdot a -a^T\cdot X^T\cdot y-y^T\cdot X\cdot a + y^Ty
\]
In equation (5):
For the term \(a^T\cdot X^TX \cdot a\), equation (2), written in column form and applied to the symmetric matrix \(X^TX\), gives:
\[(a^T\cdot X^TX \cdot a)'_a=(X^TX+(X^TX)^T)\cdot a=2X^TX\cdot a
\]
For the term \(a^T\cdot X^T\cdot y\), equation (3) gives:
\[(a^T\cdot X^T\cdot y)_a'=X^T\cdot y
\]
For the term \(y^T\cdot X\cdot a\), which is a scalar and therefore equal to its own transpose:
\[(y^T\cdot X\cdot a)'_a=(a^T\cdot X^T\cdot y)'_a=X^T\cdot y
\]
The term \(y^Ty\) does not depend on \(a\), so its derivative is zero. Combining these results:
\[f'_a(a)=(||X\cdot a-y||^2)_a'=2(X^TX\cdot a-X^T\cdot y)=2X^T\cdot(X\cdot a-y)
\]
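The least-squares gradient can be sanity-checked numerically. The sketch below, in plain Python with hypothetical names and data, compares a central-difference gradient with \(2X^T(Xa-y)\), the form in which every product is dimensionally valid for \(X\) of size \(N\times n\).

```python
def residual(X, a, y):
    """Return the vector X a - y."""
    return [sum(X[r][c] * a[c] for c in range(len(a))) - y[r]
            for r in range(len(y))]


def loss(X, a, y):
    """Return the squared Euclidean norm ||X a - y||^2."""
    return sum(r * r for r in residual(X, a, y))


def closed_form_gradient(X, a, y):
    """Return 2 X^T (X a - y), one entry per component of a."""
    res = residual(X, a, y)
    return [2 * sum(X[r][c] * res[r] for r in range(len(y)))
            for c in range(len(a))]


def numerical_gradient(f, a, h=1e-6):
    """Central-difference gradient: entry k approximates d f / d a_k."""
    grad = []
    for k in range(len(a)):
        a_plus, a_minus = list(a), list(a)
        a_plus[k] += h
        a_minus[k] -= h
        grad.append((f(a_plus) - f(a_minus)) / (2 * h))
    return grad


# Hypothetical data: N = 3 samples, n = 2 coefficients.
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
y = [1.0, 0.0, -1.0]
a0 = [0.5, -0.25]

numeric = numerical_gradient(lambda a: loss(X, a, y), a0)
closed = closed_form_gradient(X, a0, y)
print(numeric, closed)
```

Here the residual is \([-1,\ 0.5,\ 2]\), so the closed form gives \([21,\ 24]\). Setting this gradient to zero yields the normal equations \(X^TX\cdot a=X^T\cdot y\).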
4.3 Trace derivative examples
4.3.1 Trace derivative example 1: \(tr'_A(A)=I\)
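Let \(A_{mm}\) be a square matrix and \(tr(A)\) its trace; then:
\[tr(A)=\sum_{i=1}^m a_{ii}
\]
Only the diagonal entries appear in the sum, so the derivative with respect to \(a_{ij}\) is 1 when \(i=j\) and 0 otherwise: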
\[\tag{6}
\Rightarrow
tr(A)'_A=I=
\begin{bmatrix}
1&&&\\
&1&&\\
&&\ddots&\\
&&&1
\end{bmatrix}
\]
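4.3.2 Trace derivative example 2: \(tr'_A(A\cdot B)=B^T\)
Let \(A_{mm}\) and \(B_{mm}\) be square matrices and \(tr(A\cdot B)\) the trace of \(A\cdot B\); then:
\[tr(A\cdot B)=\sum_{i=1}^m\sum_{j=1}^m a_{ij}b_{ji}
\]
Each \(a_{ij}\) appears in exactly one term, with coefficient \(b_{ji}\), so entry \((i,j)\) of the derivative is \(b_{ji}\):
\[\tag{7}
tr'_A(A\cdot B)=(\sum_{i=1}^m\sum_{j=1}^m a_{ij}b_{ji})'_A=B^T
\]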
4.3.3 Trace derivative example 3: \(tr'_A(A\cdot A^T)=2\cdot A\)
Let \(A_{mm}\) be a square matrix and \(tr(A\cdot A^T)\) the trace of \(A\cdot A^T\). Since \((A^T)_{ji}=a_{ij}\):
\[tr(A\cdot A^T)=\sum_{i=1}^m\sum_{j=1}^m a_{ij}a_{ij}=\sum_{i=1}^m\sum_{j=1}^m a^2_{ij}
\]
Each \(a_{ij}\) appears in exactly one squared term, so the derivative with respect to \(a_{ij}\) is \(2a_{ij}\):
\[\tag{8}
tr'_A(A\cdot A^T)=(\sum_{i=1}^m\sum_{j=1}^m a^2_{ij})'_A=2\cdot A
\]
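The three trace rules (6), (7) and (8) can be checked with one entry-wise finite-difference routine. This is a minimal sketch in plain Python; the helper names and the sample matrices are hypothetical.

```python
def matmul(A, B):
    """Return the product A B of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Return the sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def numerical_matrix_gradient(f, A, h=1e-6):
    """Entry (i, j) approximates d f / d a_ij by central differences."""
    G = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[i][j] += h
            A_minus[i][j] -= h
            G[i][j] = (f(A_plus) - f(A_minus)) / (2 * h)
    return G


# Hypothetical sample matrices.
A0 = [[1.0, 2.0],
      [3.0, 4.0]]
B0 = [[0.5, -1.0],
      [2.0, 1.5]]

grad_tr_A = numerical_matrix_gradient(trace, A0)
grad_tr_AB = numerical_matrix_gradient(lambda A: trace(matmul(A, B0)), A0)
grad_tr_AAT = numerical_matrix_gradient(
    lambda A: trace(matmul(A, transpose(A))), A0)

print(grad_tr_A)    # compare with the identity matrix
print(grad_tr_AB)   # compare with transpose(B0)
print(grad_tr_AAT)  # compare with 2 * A0
```

Because `B0` is not symmetric, the second check distinguishes \(B^T\) from \(B\).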
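4.4 Determinant derivative example: \(|A|'_A=|A|\cdot (A^{-1})^T\)
Let \(A_{mm}\) be an invertible matrix, \(|A|\) its determinant, \(a_{ij}\) any entry of \(A\), and \(A_{ij}\) the cofactor of \(a_{ij}\).
Expanding \(|A|\) along row \(i\):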
\[|A|=a_{i1}A_{i1}+a_{i2}A_{i2}+\cdots+a_{im}A_{im}
\]
The cofactors \(A_{i1},\ldots,A_{im}\) contain no entry of row \(i\), so \(\partial|A|/\partial a_{ij}=A_{ij}\). Writing \((\cdot)'_{a_{i\cdot}}\) for the row of derivatives with respect to \(a_{i1},\ldots,a_{im}\) and stacking the rows:
\[\Rightarrow |A|'_A=
\begin{bmatrix}
(a_{11}A_{11}+a_{12}A_{12}+\cdots+a_{1m}A_{1m})'_{a_{1\cdot}}\\
(a_{21}A_{21}+a_{22}A_{22}+\cdots+a_{2m}A_{2m})'_{a_{2\cdot}}\\
\vdots\\
(a_{m1}A_{m1}+a_{m2}A_{m2}+\cdots+a_{mm}A_{mm})'_{a_{m\cdot}}
\end{bmatrix}
\]
\[\tag{9}
\qquad\qquad\quad
=
\begin{bmatrix}
A_{11}&A_{12}&\cdots&A_{1m}\\
A_{21}&A_{22}&\cdots&A_{2m}\\
\vdots&\vdots&\ddots&\vdots\\
A_{m1}&A_{m2}&\cdots&A_{mm}
\end{bmatrix}
=(A^*)^T
\]
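Here \(A^*\) is the adjugate matrix (the transpose of the cofactor matrix). From the inverse property \(A^{-1}=\frac{A^*}{|A|}\) we have \((A^*)^T=|A|\cdot (A^{-1})^T\), so: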
\[\tag{10}
|A|'_A=|A|\cdot (A^{-1})^T
\]
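As a rough numerical check, the sketch below computes the cofactor matrix of a small matrix in plain Python and compares it with an entry-wise finite-difference derivative of the determinant. It also verifies \(A\cdot C^T=|A|\cdot I\) for the cofactor matrix \(C\), which is the identity behind \(A^{-1}=\frac{A^*}{|A|}\). The helper names and the sample matrix are hypothetical, and the recursive expansion is only practical for small \(m\).

```python
def minor(A, i, j):
    """Return A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))


def cofactor_matrix(A):
    """Entry (i, j) is the cofactor A_ij = (-1)^(i+j) * det(minor(A, i, j))."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
            for i in range(n)]


def numerical_matrix_gradient(f, A, h=1e-6):
    """Entry (i, j) approximates d f / d a_ij by central differences."""
    G = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            A_plus = [row[:] for row in A]
            A_minus = [row[:] for row in A]
            A_plus[i][j] += h
            A_minus[i][j] -= h
            G[i][j] = (f(A_plus) - f(A_minus)) / (2 * h)
    return G


# Hypothetical non-symmetric, invertible sample matrix (determinant 25).
A0 = [[2.0, 1.0, 0.0],
      [0.0, 3.0, 1.0],
      [1.0, 0.0, 4.0]]

C = cofactor_matrix(A0)
G = numerical_matrix_gradient(det, A0)

# A * C^T should equal |A| * I, which is equivalent to A^{-1} = A^* / |A|.
n = len(A0)
A_times_CT = [[sum(A0[i][j] * C[k][j] for j in range(n)) for k in range(n)]
              for i in range(n)]
print(G)
print(C)
print(A_times_CT)
```

If the check passes, `G` agrees with `C` entry by entry, and `A_times_CT` is 25 times the identity matrix, so \(C=|A|\cdot (A^{-1})^T\) as in equation (10).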