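Table of Contents
- 1. Why ML needs matrix derivatives
- 2. Vector functions and a first look at matrix derivatives
- 3. The YX stretching rule
- 3.1 f(x) is a scalar, X is a column vector
- 3.2 f(x) is a column vector, X is a scalar
- 3.3 f(x) is a column vector, X is a column vector
- 4. Common matrix derivative formulas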
1. Why ML needs matrix derivatives
- Concise
Matrix notation is compact. Written out as separate equations:
$$y_1=w_1x_{11}+w_2x_{12}\tag{1}$$
$$y_2=w_1x_{21}+w_2x_{22}\tag{2}$$
In matrix form the same system becomes:
$$Y=XW\tag{3}$$
$$Y=\begin{bmatrix}y_1\\y_2\end{bmatrix},\quad X=\begin{bmatrix}x_{11}&x_{12}\\x_{21}&x_{22}\end{bmatrix},\quad W=\begin{bmatrix}w_1\\w_2\end{bmatrix}\tag{4}$$
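As a quick illustration, this NumPy sketch computes equations (1) and (2) term by term and compares them with the single matrix product of equation (3). The numbers in X and W are arbitrary example values chosen for the sketch, not data from the text.

```python
import numpy as np

# Hypothetical example values for the 2x2 system in equations (1)-(4)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
W = np.array([[0.5],
              [-1.0]])

# Equations (1) and (2), written out term by term
y1 = W[0, 0] * X[0, 0] + W[1, 0] * X[0, 1]
y2 = W[0, 0] * X[1, 0] + W[1, 0] * X[1, 1]

# Equation (3): the same computation as one matrix product
Y = X @ W
print(Y)  # column vector with entries -1.5 and -2.5
```

One matrix product replaces the hand-written equations, and it scales unchanged to any number of rows and weights.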
- Fast
With Python's NumPy library, vectorized operations run much faster than an equivalent explicit for loop.
- Source code
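```python
import time

import numpy as np

if __name__ == "__main__":
    N = 1000000
    a = np.random.rand(N)
    b = np.random.rand(N)

    # Vectorized dot product; perf_counter is a higher-resolution timer than time.time
    start = time.perf_counter()
    c = np.dot(a, b)
    stop = time.perf_counter()
    print(f"c={c}")
    print("vectorized version: " + str(1000 * (stop - start)) + "ms")

    # The same dot product computed element by element in a Python loop
    c = 0
    start1 = time.perf_counter()
    for i in range(N):
        c += a[i] * b[i]
    stop1 = time.perf_counter()
    print(f"c={c}")
    print("for loop: " + str(1000 * (stop1 - start1)) + "ms")

    # How many times faster the vectorized version is
    times1 = (stop1 - start1) / (stop - start)
    print(f"times1={times1}")
```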
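- Results
```
c=250071.8870070607
vectorized version: 6.549358367919922ms
c=250071.88700706122
for loop: 265.43641090393066ms
times1=40.52861303239898
```
In this run the vectorized version was about 40 times faster than the plain for loop; exact timings depend on the machine. The two values of c differ only in the last digits because the floating-point additions are carried out in a different order.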
2. Vector functions and a first look at matrix derivatives
- Scalar function: a function whose output is a scalar
$$f(x)=x^2\Rightarrow x\in\mathbb{R}\rightarrow x^2\in\mathbb{R}$$
$$f(x)=x_1^2+x_2^2\Rightarrow\begin{bmatrix}x_1\\x_2\end{bmatrix}\in\mathbb{R}^2\rightarrow x_1^2+x_2^2\in\mathbb{R}$$
- Vector function: a function whose output is a vector or a matrix
<1> Scalar input, vector output
$$f(x)=\begin{bmatrix}f_1(x)=x\\f_2(x)=x^2\end{bmatrix}\Rightarrow x\in\mathbb{R},\ \begin{bmatrix}x\\x^2\end{bmatrix}\in\mathbb{R}^2$$
<2> Scalar input, matrix output
$$f(x)=\begin{bmatrix}f_{11}(x)=x&f_{12}(x)=x^2\\f_{21}(x)=x^3&f_{22}(x)=x^4\end{bmatrix}\Rightarrow x\in\mathbb{R},\ \begin{bmatrix}x&x^2\\x^3&x^4\end{bmatrix}\in\mathbb{R}^{2\times2}$$
<3> Vector input, matrix output
$$f(x)=\begin{bmatrix}f_{11}(x)=x_1+x_2&f_{12}(x)=x_1^2+x_2^2\\f_{21}(x)=x_1^3+x_2^3&f_{22}(x)=x_1^4+x_2^4\end{bmatrix}\Rightarrow\begin{bmatrix}x_1\\x_2\end{bmatrix}\in\mathbb{R}^2,\ \begin{bmatrix}x_1+x_2&x_1^2+x_2^2\\x_1^3+x_2^3&x_1^4+x_2^4\end{bmatrix}\in\mathbb{R}^{2\times2}$$
- Summary
The essence of matrix differentiation:
$$\frac{\mathrm{d}A}{\mathrm{d}B}=\text{the derivative of every element of matrix }A\text{ with respect to every element of matrix }B$$
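To make "every element of A with respect to every element of B" concrete, here is a minimal NumPy sketch for case <3> above (a 2-vector input, a 2×2 matrix output). It approximates each partial derivative with a central finite difference (step h = 1e-6, an assumption of this sketch), so the result holds 2 × 2 × 2 = 8 numbers. Storing the partials with respect to x_k in slice k is just one possible layout; the helper names `f3` and `elementwise_derivatives` are hypothetical.

```python
import numpy as np


def f3(x):
    """Case <3>: maps x = [x1, x2] to a 2x2 matrix (hypothetical helper name)."""
    x1, x2 = x
    return np.array([[x1 + x2, x1**2 + x2**2],
                     [x1**3 + x2**3, x1**4 + x2**4]])


def elementwise_derivatives(f, x, h=1e-6):
    """Approximate d f_ij / d x_k for every output element f_ij and every input x_k.

    Uses central differences; result[k] is the matrix of partials with respect to x_k.
    """
    x = np.asarray(x, dtype=float)
    out = []
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        out.append((f(x + step) - f(x - step)) / (2 * h))
    return np.stack(out)


if __name__ == "__main__":
    d = elementwise_derivatives(f3, np.array([1.0, 2.0]))
    print(d.shape)  # (2, 2, 2)
    # Analytic partials w.r.t. x1 at x1 = 1: [[1, 2*x1], [3*x1^2, 4*x1^3]] = [[1, 2], [3, 4]]
    print(np.round(d[0], 4))
```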
3. The YX stretching rule
3.1 f(x) is a scalar, X is a column vector
- Scalars stay as they are; vectors are stretched.
- In "YX", Y comes first and is stretched horizontally; X comes second and is stretched vertically.
$$\frac{\mathrm{d}f(x)}{\mathrm{d}x},\quad Y=f(x)\text{ is a scalar},\quad X=\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}\text{ is a column vector}$$
$$f(x)=f(x_1,x_2,\dots,x_n)\text{ is a scalar}$$
- The scalar $f(x)$ stays unchanged. Because the vector $X$ comes after $Y$ in the YX rule, $X$ is stretched vertically; in effect, the partial derivatives of the multivariate function are written into a column vector:
$$\frac{\mathrm{d}f(x)}{\mathrm{d}x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}$$
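A minimal sketch of this case, assuming a central finite-difference approximation and a hypothetical helper `numerical_gradient`: the estimated derivative has the same column shape as X, one partial derivative per row. The test function is $f(x)=x_1^2+x_2^2$ from Section 2, whose exact result is $[2x_1, 2x_2]^T$.

```python
import numpy as np


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx for a scalar f and a column vector x.

    The result has shape (n, 1), matching the "stretch X vertically" rule.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    grad = np.zeros_like(x)
    for i in range(x.shape[0]):
        step = np.zeros_like(x)
        step[i, 0] = h
        grad[i, 0] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


def f_sq(x):
    # f(x) = x1^2 + x2^2 from Section 2; x is a column vector
    return float(np.sum(x**2))


if __name__ == "__main__":
    x = np.array([[1.0], [3.0]])
    print(numerical_gradient(f_sq, x))  # approximately [2*x1, 2*x2]^T = [2, 6]^T
```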
3.2 f(x) is a column vector, X is a scalar
$$f(x)=\begin{bmatrix}f_1(x)\\f_2(x)\\\vdots\\f_n(x)\end{bmatrix};\quad X\text{ is a scalar}$$
- Scalars stay as they are; vectors are stretched.
- In "YX", Y comes first and is stretched horizontally; X comes second and is stretched vertically.
Here the scalar $X$ stays unchanged, while the column vector $Y=f(x)$ comes first, so it is stretched horizontally into a row:
$$\frac{\mathrm{d}f(x)}{\mathrm{d}x}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x}&\frac{\partial f_2(x)}{\partial x}&\dots&\frac{\partial f_n(x)}{\partial x}\end{bmatrix}$$
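The same idea for a vector-valued function of a scalar, again as a finite-difference sketch with a hypothetical helper `derivative_wrt_scalar`: the partials of all components are laid out as one row. The test function is case <1> from Section 2, $f(x)=[x, x^2]^T$, whose exact derivative in this layout is $[1, 2x]$.

```python
import numpy as np


def derivative_wrt_scalar(f, x, h=1e-6):
    """Central-difference estimate of df/dx for a column-vector f and a scalar x.

    Following the YX rule, Y = f(x) is stretched horizontally, so the result is a
    row vector of shape (1, n).
    """
    col = (f(x + h) - f(x - h)) / (2 * h)  # shape (n, 1): one partial per component
    return col.reshape(1, -1)


def f_case1(x):
    # Case <1> from Section 2: f(x) = [x, x^2]^T as a column vector
    return np.array([[x], [x**2]])


if __name__ == "__main__":
    print(derivative_wrt_scalar(f_case1, 3.0))  # approximately [1, 2*3] = [1, 6]
```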
3.3 f(x) is a column vector, X is a column vector
$$f(x)=\begin{bmatrix}f_1(x)\\f_2(x)\\\vdots\\f_n(x)\end{bmatrix};\quad X=\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}\text{ is a column vector}$$
- Step 1: hold Y fixed and stretch X vertically:
$$\frac{\mathrm{d}f(x)}{\mathrm{d}x}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}$$
- Step 2: look at each entry, e.g. $\frac{\partial f(x)}{\partial x_1}$. Here $f(x)$ is a column vector and $x_1$ is a scalar, which is the case of Section 3.2, so Y is stretched horizontally:
$$\frac{\partial f(x)}{\partial x_1}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x_1}&\frac{\partial f_2(x)}{\partial x_1}&\dots&\frac{\partial f_n(x)}{\partial x_1}\end{bmatrix}$$
- Step 3: assemble all the rows:
$$\frac{\mathrm{d}f(x)}{\mathrm{d}x}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x_1}&\frac{\partial f_2(x)}{\partial x_1}&\dots&\frac{\partial f_n(x)}{\partial x_1}\\\frac{\partial f_1(x)}{\partial x_2}&\frac{\partial f_2(x)}{\partial x_2}&\dots&\frac{\partial f_n(x)}{\partial x_2}\\\vdots&\vdots&\ddots&\vdots\\\frac{\partial f_1(x)}{\partial x_n}&\frac{\partial f_2(x)}{\partial x_n}&\dots&\frac{\partial f_n(x)}{\partial x_n}\end{bmatrix}$$
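Steps 1 to 3 can be sketched numerically with central differences (`yx_derivative` is a hypothetical name). Entry $[i, j]$ of the result is $\partial f_j/\partial x_i$, so this matrix is the transpose of the Jacobian as it is usually written with rows indexed by $f_j$; the YX convention used here is commonly called the denominator layout. The sketch also allows f and x to have different lengths (m and n). For a linear map $f(x)=Bx$ with an arbitrary example matrix B, the result should be $B^T$.

```python
import numpy as np


def yx_derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx for column vectors f (m x 1) and x (n x 1).

    Row i holds the partials of every f_j with respect to x_i, so the result has
    shape (n, m) and entry [i, j] = d f_j / d x_i.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    rows = []
    for i in range(x.shape[0]):
        step = np.zeros_like(x)
        step[i, 0] = h
        # Step 2: d f / d x_i, stretched horizontally into a row
        rows.append(((f(x + step) - f(x - step)) / (2 * h)).reshape(1, -1))
    # Step 3: stack the rows
    return np.vstack(rows)


if __name__ == "__main__":
    B = np.array([[1.0, 2.0], [3.0, 4.0]])  # hypothetical example matrix
    x = np.array([[0.5], [-1.0]])
    D = yx_derivative(lambda v: B @ v, x)
    print(D)                    # approximately B.T for the linear map f(x) = Bx
    print(np.allclose(D, B.T))  # True
```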
4. Common matrix derivative formulas
4.1 $Y=A^TX$
$$f(x)=A^TX;\quad A=[a_1,a_2,\dots,a_n]^T;\quad X=[x_1,x_2,\dots,x_n]^T;\quad\text{find }\frac{\mathrm{d}f(x)}{\mathrm{d}X}$$
- Since $A^T$ is $1\times n$ and $X$ is $n\times1$, $f(x)$ is a scalar, i.e. a single number.
- Scalars stay as they are; vectors are stretched.
- In "YX", Y comes first and is stretched horizontally; X comes second and is stretched vertically.
Written out, the product is
$$f(x)=\sum_{i=1}^{n}a_ix_i$$
$$\frac{\mathrm{d}f(x)}{\mathrm{d}X}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}$$
- Compute each $\frac{\partial f(x)}{\partial x_i}$; only the term $a_ix_i$ depends on $x_i$, so
$$\frac{\partial f(x)}{\partial x_i}=a_i$$
- This gives:
$$\frac{\mathrm{d}f(x)}{\mathrm{d}X}=\begin{bmatrix}a_1\\a_2\\\vdots\\a_n\end{bmatrix}=A$$
- Conclusion:
When $f(x)=A^TX$,
$$\frac{\mathrm{d}f(x)}{\mathrm{d}X}=A$$
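A quick numerical check of this result, as a sketch: seeded random A and X (n = 4, chosen arbitrarily) and a central-difference gradient, which for a linear function is exact up to rounding error. `numerical_gradient` is a hypothetical helper, not a NumPy function.

```python
import numpy as np


def numerical_gradient(f, x, h=1e-6):
    """Central-difference df/dX for a scalar f and a column vector x, returned as a column."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.shape[0]):
        step = np.zeros_like(x)
        step[i, 0] = h
        grad[i, 0] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, 1))  # column vector A (random test data)
    X = rng.standard_normal((n, 1))  # column vector X (random test data)

    f = lambda v: (A.T @ v).item()  # A^T X is 1x1; .item() turns it into a scalar
    g = numerical_gradient(f, X)
    print(np.allclose(g, A))  # True: d(A^T X)/dX = A
```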
4.2 $Y=X^TAX$
$$f(x)=X^TAX;\quad A=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix};\quad X=[x_1,x_2,\dots,x_n]^T;\quad\text{find }\frac{\mathrm{d}f(x)}{\mathrm{d}X}$$
Written out element by element:
$$f(x)=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_ix_j$$
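The double-sum expansion can be checked numerically. In this sketch x is stored as a 1-D NumPy array rather than an explicit column, A is a seeded random (generally non-symmetric) test matrix, and `np.einsum` evaluates the same double sum in vectorized form.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))  # random test matrix, not necessarily symmetric
x = rng.standard_normal(n)

quad_form = x @ A @ x  # X^T A X
double_sum = sum(A[i, j] * x[i] * x[j] for i in range(n) for j in range(n))
einsum_sum = np.einsum("i,ij,j->", x, A, x)  # the same double sum, vectorized

print(np.isclose(quad_form, double_sum), np.isclose(quad_form, einsum_sum))  # True True
```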
- The scalar stays unchanged; by the YX rule, X is stretched vertically:
$$\frac{\mathrm{d}f(x)}{\mathrm{d}X}=\begin{bmatrix}\frac{\partial f(x)}{\partial x_1}\\\frac{\partial f(x)}{\partial x_2}\\\vdots\\\frac{\partial f(x)}{\partial x_n}\end{bmatrix}$$
- For one entry, keep only the terms of the double sum that contain $x_i$: those whose first index is $i$ give $\sum_j a_{ij}x_j$, and those whose second index is $i$ give $\sum_j a_{ji}x_j$ (the diagonal term $a_{ii}x_i^2$ contributes $2a_{ii}x_i$, one $a_{ii}x_i$ to each sum). Hence
$$\frac{\partial f(x)}{\partial x_i}=\sum_{j=1}^{n}a_{ij}x_j+\sum_{j=1}^{n}a_{ji}x_j=\begin{bmatrix}a_{i1}&a_{i2}&\dots&a_{in}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}+\begin{bmatrix}a_{1i}&a_{2i}&\dots&a_{ni}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}$$
- Stacking these entries for $i=1,\dots,n$:
$$\frac{\mathrm{d}f(x)}{\mathrm{d}X}=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}+\begin{bmatrix}a_{11}&a_{21}&\dots&a_{n1}\\a_{12}&a_{22}&\dots&a_{n2}\\\vdots&\vdots&\ddots&\vdots\\a_{1n}&a_{2n}&\dots&a_{nn}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}$$
- Recall that $A$ and $A^T$ are:
$$A=\begin{bmatrix}a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}&a_{n2}&\dots&a_{nn}\end{bmatrix};\quad A^T=\begin{bmatrix}a_{11}&a_{21}&\dots&a_{n1}\\a_{12}&a_{22}&\dots&a_{n2}\\\vdots&\vdots&\ddots&\vdots\\a_{1n}&a_{2n}&\dots&a_{nn}\end{bmatrix}$$
- Putting it all together:
When $f(x)=X^TAX$,
$$\frac{\mathrm{d}f(x)}{\mathrm{d}X}=AX+A^TX=(A+A^T)X$$
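Finally, a sketch that checks $(A+A^T)X$ against a central-difference gradient, using a seeded random non-symmetric A as test data (`numerical_gradient` is again a hypothetical helper). Because f is quadratic, central differences are exact apart from rounding error, so a tight tolerance works.

```python
import numpy as np


def numerical_gradient(f, x, h=1e-6):
    """Central-difference df/dX for a scalar f and a column vector x, returned as a column."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.shape[0]):
        step = np.zeros_like(x)
        step[i, 0] = h
        grad[i, 0] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))  # random, generally non-symmetric test matrix
    X = rng.standard_normal((n, 1))

    f = lambda v: (v.T @ A @ v).item()  # f(X) = X^T A X as a scalar
    g = numerical_gradient(f, X)
    print(np.allclose(g, (A + A.T) @ X, atol=1e-6))  # True
```

When A is symmetric, $A^T=A$ and the formula reduces to $2AX$.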