Softmax
Softmax is an activation function commonly used in neural networks for multi-class classification. It converts unnormalized logits into a probability distribution:
$$P(y_i) = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}}$$
where $z_i$ is the logit of class $i$ and $N$ is the total number of classes.
With a large vocabulary, softmax must exponentiate the logit of every word and normalize over all of them, so the cost grows linearly with the vocabulary size. For very large vocabularies, computing softmax becomes expensive.
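The formula can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation; the function name `softmax` and the sample logits are assumptions made for the example. Subtracting the maximum logit before exponentiating is a common trick that leaves the result unchanged while avoiding overflow.

```python
import math

def softmax(logits):
    """Return softmax probabilities for a list of logits."""
    # Shifting by the max logit does not change the result
    # but keeps exp() from overflowing on large inputs.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)  # normalization touches every class: O(N)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])    # illustrative logits
big = softmax([1000.0, 999.0])      # exp(1000) alone would overflow
print(probs, big)
```

Both the exponentiation and the sum run over all N classes, which is the linear cost described above.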
Hierarchical Softmax
Hierarchical softmax is a method that speeds up softmax computation using a tree. It organizes the vocabulary into a tree in which each leaf represents a word and each internal node is a binary classifier that chooses a branch. With a balanced tree, this reduces the cost of computing one word's probability from O(N) to O(log(N)).
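To get a feel for the gap between O(N) and O(log(N)), the sketch below compares the number of terms in a full softmax with the number of binary decisions needed to reach one leaf. It assumes a perfectly balanced binary tree; `balanced_tree_depth` is a hypothetical helper, and the vocabulary sizes are illustrative.

```python
import math

def balanced_tree_depth(vocab_size):
    """Binary decisions needed to reach one leaf of a balanced binary tree."""
    return math.ceil(math.log2(vocab_size))

for n in (4, 10_000, 100_000, 1_000_000):
    print(f"N={n}: {n} softmax terms vs {balanced_tree_depth(n)} binary decisions")
```

The saving applies when only one word's probability is needed, as in training. Producing the full distribution still requires visiting every leaf.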
Hierarchical Softmax in detail
- Build the hierarchy:
  - Organize the vocabulary into a binary tree or a Huffman tree. A Huffman tree is built from word frequencies so that frequent words get shorter paths, which speeds up computation further.
- Represent paths:
  - Each word is represented by the path from the root to its leaf. For example, suppose the word "banana" has the path [root -> right -> left].
- Compute path probabilities:
  - Each internal node has a binary classifier that gives the probability of moving to its left or right child.
  - The probability of the target word is the product of the branch probabilities at all internal nodes on the path from the root to that word.
  For a target word $w$, the probability is:
  $$P(w \mid context) = \prod_{n \in path(w)} P(n \mid context)$$
  where $path(w)$ denotes all internal nodes on the path from the root to word $w$.
- Training:
  - Optimize the negative log-likelihood loss.
  - For each training sample, compute the probabilities at all internal nodes on the path from the root to the target word, and update the model parameters according to the actual path.
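The training step can be sketched as follows. This is a minimal illustration under stated assumptions: each internal node is a logistic classifier whose sigmoid output is read as the probability of going left, the node names `n0` and `n1`, their parameters, the feature vector, and the learning rate are all hypothetical, and plain gradient descent stands in for whatever optimizer a real model would use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def path_nll_and_grads(x, path, params):
    """path: list of (node_id, go_left) pairs from the root to the target leaf.
    params: dict node_id -> (weights, bias). Returns (loss, grads)."""
    loss, grads = 0.0, {}
    for node, go_left in path:
        w, b = params[node]
        p_left = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = p_left if go_left else 1.0 - p_left
        loss -= math.log(p)  # negative log-likelihood of the taken branch
        # d(-log p)/dz = p_left - target, with target 1 for left, 0 for right
        dz = p_left - (1.0 if go_left else 0.0)
        grads[node] = ([dz * xi for xi in x], dz)
    return loss, grads

# Hypothetical two-node path: go right at "n0", then left at "n1".
params = {"n0": ([0.1, -0.1], 0.0), "n1": ([0.0, 0.2], 0.0)}
x = [1.0, 2.0]
path = [("n0", False), ("n1", True)]

loss_before, grads = path_nll_and_grads(x, path, params)
lr = 0.1  # assumed learning rate
for node, (gw, gb) in grads.items():
    w, b = params[node]
    params[node] = ([wi - lr * gi for wi, gi in zip(w, gw)], b - lr * gb)
loss_after, _ = path_nll_and_grads(x, path, params)
print(loss_before, loss_after)
```

Only the nodes on the target word's path receive gradients, so the cost of one update scales with the path length rather than with the vocabulary size.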
Comparison
Feature | Softmax | Hierarchical Softmax |
---|---|---|
Computational complexity | O(N) | O(log(N)) |
Use case | Small vocabularies | Large vocabularies |
Implementation complexity | Simple | Complex; requires building a tree |
Computational cost | Grows linearly with vocabulary size | Grows slowly (logarithmically) with vocabulary size |
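To compare hierarchical softmax with standard softmax in more detail, including concrete numbers and the full calculation, we work through a simplified example below.
Worked example: vocabulary and its hierarchy
Suppose we have the following vocabulary (the frequencies are assumed):
Word | Frequency |
---|---|
apple | 7 |
banana | 2 |
cherry | 4 |
date | 1 |
Based on the word frequencies, we build the following Huffman tree: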
            (*)
           /   \
     (apple)   (*)
              /   \
       (cherry)   (*)
                 /   \
          (banana)   (date)
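The construction can be reproduced with a small priority-queue sketch. The function name `huffman_codes` and the `L`/`R` code strings are assumptions made for illustration. Which child ends up on the left or right depends on tie-breaking, so the orientation may differ from the drawing; only the depth of each word is expected to match.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {word: code}, where code is a string of 'L'/'R' moves from the root."""
    tiebreak = count()  # unique counter so tied frequencies never compare nodes
    heap = [(f, next(tiebreak), word) for word, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "L")
            walk(node[1], prefix + "R")
        else:                        # leaf: a word
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

freqs = {"apple": 7, "banana": 2, "cherry": 4, "date": 1}
codes = huffman_codes(freqs)
depths = {w: len(c) for w, c in codes.items()}
avg_depth = sum(freqs[w] * depths[w] for w in freqs) / sum(freqs.values())
print(depths, avg_depth)
```

Here apple sits at depth 1, cherry at depth 2, and banana and date at depth 3, so the frequency-weighted average path length is 24/14, shorter than the depth of 2 that a balanced tree over four words would give every word.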
Computing softmax probabilities
Suppose that, for some context, the model outputs the following logits:
Word | Logit $z$ |
---|---|
apple | 1.5 |
banana | 0.5 |
cherry | 1.0 |
date | 0.2 |
Softmax steps:
- Compute the exponential of each logit:
  $e^{1.5} = 4.4817$
  $e^{0.5} = 1.6487$
  $e^{1.0} = 2.7183$
  $e^{0.2} = 1.2214$
- Sum all the exponentials:
  $Z = 4.4817 + 1.6487 + 2.7183 + 1.2214 = 10.0701$
- Compute the probability of each word:
  $P(apple) = \frac{4.4817}{10.0701} \approx 0.445$
  $P(banana) = \frac{1.6487}{10.0701} \approx 0.164$
  $P(cherry) = \frac{2.7183}{10.0701} \approx 0.270$
  $P(date) = \frac{1.2214}{10.0701} \approx 0.121$
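The arithmetic above can be checked with a few lines of Python. The variable names are chosen for illustration; the logits are the ones from the table.

```python
import math

logits = {"apple": 1.5, "banana": 0.5, "cherry": 1.0, "date": 0.2}

exps = {w: math.exp(z) for w, z in logits.items()}  # step 1: exponentials
Z = sum(exps.values())                              # step 2: normalizer
probs = {w: e / Z for w, e in exps.items()}         # step 3: probabilities

for w, p in probs.items():
    print(f"{w}: {p:.3f}")
```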
Computing hierarchical softmax probabilities
We use the following assumed feature vector and model parameters to compute the probability at each internal node:
Model parameters:
- Root node classifier:
  - Weights $w_{root} = [0.5, -0.2]$
  - Bias $b_{root} = 0$
- Right-child classifier (the internal node that separates cherry from banana and date):
  - Weights $w_{right} = [0.3, 0.4]$
  - Bias $b_{right} = -0.1$
- Subtree-root classifier (the internal node that separates banana from date):
  - Weights $w_{subtree} = [-0.4, 0.2]$
  - Bias $b_{subtree} = 0.2$
Context feature vector:
- $x_{context} = [1, 2]$
1. Root node probability
$$z_{root} = w_{root} \cdot x_{context} + b_{root}$$
$$z_{root} = 0.5 \times 1 + (-0.2) \times 2 + 0 = 0.5 - 0.4 = 0.1$$
Apply the sigmoid function to get the probability:
$$P(left \mid context)_{root} = \sigma(z_{root}) = \frac{1}{1 + e^{-0.1}} \approx \frac{1}{1 + 0.9048} \approx 0.525$$
$$P(right \mid context)_{root} = 1 - P(left \mid context)_{root} = 1 - 0.525 \approx 0.475$$
2. Right-child node probability
$$z_{right} = w_{right} \cdot x_{context} + b_{right}$$
$$z_{right} = 0.3 \times 1 + 0.4 \times 2 - 0.1 = 0.3 + 0.8 - 0.1 = 1.0$$
Apply the sigmoid function to get the probability:
$$P(left \mid context)_{right} = \sigma(z_{right}) = \frac{1}{1 + e^{-1.0}} \approx \frac{1}{1 + 0.3679} \approx 0.731$$
$$P(right \mid context)_{right} = 1 - P(left \mid context)_{right} = 1 - 0.731 \approx 0.269$$
3. Subtree-root node probability
$$z_{subtree} = w_{subtree} \cdot x_{context} + b_{subtree}$$
$$z_{subtree} = -0.4 \times 1 + 0.2 \times 2 + 0.2 = -0.4 + 0.4 + 0.2 = 0.2$$
Apply the sigmoid function to get the probability:
$$P(left \mid context)_{subtree} = \sigma(z_{subtree}) = \frac{1}{1 + e^{-0.2}} \approx \frac{1}{1 + 0.8187} \approx 0.55$$
$$P(right \mid context)_{subtree} = 1 - P(left \mid context)_{subtree} = 1 - 0.55 \approx 0.45$$
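The three node calculations follow the same pattern, so they can be checked in one loop. This sketch uses the weights, biases, and context vector given above; the dictionary layout and names are assumptions made for the example, and the sigmoid output is read as the probability of going left, as in the text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x_context = [1.0, 2.0]
nodes = {  # node name -> (weights, bias), values from the example
    "root": ([0.5, -0.2], 0.0),
    "right": ([0.3, 0.4], -0.1),
    "subtree": ([-0.4, 0.2], 0.2),
}

p_left = {}
for name, (w, b) in nodes.items():
    z = sum(wi * xi for wi, xi in zip(w, x_context)) + b  # w . x + b
    p_left[name] = sigmoid(z)
    print(f"{name}: z={z:.1f}, P(left)={p_left[name]:.3f}, "
          f"P(right)={1 - p_left[name]:.3f}")
```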
Hierarchical softmax probability of each word
1. apple
The path is [root -> left].
$$P(apple) = P(left \mid context)_{root} \approx 0.525$$
2. banana
The path is [root -> right -> right -> left].
$$P(banana) = P(right \mid context)_{root} \times P(right \mid context)_{right} \times P(left \mid context)_{subtree}$$
$$P(banana) \approx 0.475 \times 0.269 \times 0.55 \approx 0.0702$$
3. cherry
The path is [root -> right -> left].
$$P(cherry) = P(right \mid context)_{root} \times P(left \mid context)_{right}$$
$$P(cherry) \approx 0.475 \times 0.731 \approx 0.3472$$
4. date
The path is [root -> right -> right -> right].
$$P(date) = P(right \mid context)_{root} \times P(right \mid context)_{right} \times P(right \mid context)_{subtree}$$
$$P(date) \approx 0.475 \times 0.269 \times 0.45 \approx 0.0575$$
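The path products can be verified with the sketch below, which multiplies the branch probabilities along each word's path using unrounded values. The `paths` table and the helper `word_prob` are assumptions made for illustration; the parameters and paths are the ones from the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 2.0]
params = {  # node name -> (weights, bias)
    "root": ([0.5, -0.2], 0.0),
    "right": ([0.3, 0.4], -0.1),
    "subtree": ([-0.4, 0.2], 0.2),
}
paths = {  # word -> list of (internal node, branch taken)
    "apple": [("root", "L")],
    "banana": [("root", "R"), ("right", "R"), ("subtree", "L")],
    "cherry": [("root", "R"), ("right", "L")],
    "date": [("root", "R"), ("right", "R"), ("subtree", "R")],
}

def word_prob(word):
    """Product of branch probabilities along the word's path."""
    p = 1.0
    for node, direction in paths[word]:
        w, b = params[node]
        p_left = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p *= p_left if direction == "L" else 1.0 - p_left
    return p

probs = {w: word_prob(w) for w in paths}
total = sum(probs.values())
print(probs, total)
```

Because the two branches at every internal node sum to 1, the leaf probabilities sum to 1 without any normalization over the vocabulary.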
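Probability summary
Word | Softmax probability | Hierarchical softmax probability |
---|---|---|
apple | 0.445 | 0.525 |
banana | 0.164 | 0.0702 |
cherry | 0.270 | 0.3472 |
date | 0.121 | 0.0575 |
The results above show how standard softmax and hierarchical softmax compute probabilities and what each produces. The two columns differ because the node classifiers use their own assumed parameters, which were not derived from the logits in the softmax example. By organizing the vocabulary as a Huffman tree, hierarchical softmax substantially reduces the amount of computation, which makes it well suited to tasks with large vocabularies.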
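Softmax vs. hierarchical softmax: summary
Feature | Softmax | Hierarchical Softmax |
---|---|---|
Computational complexity | O(N) | O(log(N)) |
Advantages | Simple and direct; suited to small vocabularies | Computationally efficient; suited to large vocabularies |
Disadvantages | Expensive; cost grows linearly with vocabulary size | Requires building and maintaining the hierarchy; adds model complexity |
Use case | Multi-class problems with small vocabularies | NLP tasks with very large vocabularies, such as language modeling and machine translation |
In summary, hierarchical softmax uses a tree to make probability computation cheaper for large vocabularies, which gives it a clear advantage on such tasks, while standard softmax is the better fit when the vocabulary is small.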