
news 2025/7/11 5:58:15


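Contents

🍀Introduction
🍀Tuning the eta parameter
🍀Gradient descent in sklearn

🍀Introduction

Picking up from the previous post, this one has two main topics: tuning the eta parameter, and running gradient descent in sklearn.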

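In gradient descent, the choice of learning rate (usually written η, also called the step size) is very important because it directly affects the algorithm's performance and convergence speed. The learning rate controls how far the model parameters move in each iteration. Here is why the learning rate (η) matters:

Convergence speed: The learning rate determines how far the model moves in each iteration. If it is too large, the model may swing back and forth in parameter space, leading to unstable convergence or even divergence. If it is too small, the model converges slowly and needs many more iterations to reach the optimum. A well-chosen learning rate therefore speeds up convergence.
Stability: A learning rate that is too large can make gradient descent unstable or prevent it from converging at all. A smaller learning rate makes the algorithm more stable but may need more iterations to reach the optimum. A suitable learning rate balances stability against convergence speed.
Avoiding local minima: Different learning rates may lead the model to settle in different local minima. Trying several learning rates makes it more likely, though not guaranteed, that you find the global minimum rather than getting stuck in a local one.
Tuning: The learning rate usually has to be tuned. You can try different values and monitor how the loss function converges. A common approach is a learning rate decay schedule, which gradually lowers the learning rate to improve convergence.
Batch size: The choice of learning rate also depends on the batch size. Mini-batch gradient descent typically uses a smaller learning rate than large-batch gradient descent, because a small batch gives a noisier gradient estimate.

In short, the learning rate is one of the key hyperparameters of gradient descent and has to be chosen and adjusted carefully to get good performance and convergence during training. Different problems and datasets may need different learning rates, so in practice finding a good value usually takes experimentation and tuning.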
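The decay strategy mentioned in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the inverse-time schedule, the names `decayed_eta` and `descend_with_decay`, and the `decay` parameter are hypothetical choices made here, and the objective is the simple quadratic used later in this post.

```python
def decayed_eta(eta0, step, decay=0.01):
    """Inverse-time decay (an illustrative schedule): eta shrinks as step grows."""
    return eta0 / (1.0 + decay * step)


def descend_with_decay(eta0, theta0, n_steps=200, decay=0.01):
    """Gradient descent on J(theta) = (theta - 2.5)**2 - 1 with a decaying eta."""
    theta = theta0
    for step in range(n_steps):
        gradient = 2.0 * (theta - 2.5)  # dJ/dtheta
        theta -= decayed_eta(eta0, step, decay) * gradient
    return theta


final_theta = descend_with_decay(eta0=0.9, theta0=0.0)
print(final_theta)  # ends close to the minimizer 2.5
```

Starting with a fairly large eta lets the early steps cover distance quickly, while the shrinking eta damps the back-and-forth swings later on. Other schedules (step decay, exponential decay) follow the same pattern.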

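🍀Tuning the eta parameter

Before looking at the code, we need to know what happens when eta is too small:

[Figure: gradient descent with eta too small]
And conversely, what if it is too large?

[Figure: gradient descent with eta too large]
As these show, an eta that is too large or too small hurts efficiency either way, so a suitable eta is crucial for finding the optimum.

In the previous post we completed a first version of the code; here we wrap it up in functions.
First we define two functions: one returns the history of theta values as a list, and the other plots it.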

import numpy as np
import matplotlib.pyplot as plt

# plt_x and plt_y are the sample points of the curve, defined in the previous post.

def gradient_descent(eta, initial_theta, epsilon=1e-8):
    theta = initial_theta
    theta_history = [initial_theta]

    def dj(theta):
        # Derivative of the objective at theta
        return 2 * (theta - 2.5)

    def j(theta):
        # Value of the objective at theta
        return (theta - 2.5) ** 2 - 1

    while True:
        gradient = dj(theta)
        last_theta = theta
        theta = theta - gradient * eta
        theta_history.append(theta)
        # Stop once the objective changes by less than epsilon
        if np.abs(j(theta) - j(last_theta)) < epsilon:
            break
    return theta_history


def plot_gradient(theta_history):
    plt.plot(plt_x, plt_y)
    plt.plot(theta_history, [(i - 2.5) ** 2 - 1 for i in theta_history], color='r', marker='+')
    plt.show()

This is really just the code from the previous post gathered into one place.
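Since the whole procedure relies on `dj` being the correct derivative of `j`, a quick numerical check can be reassuring. The sketch below compares the analytic derivative with a central finite difference; the helper `numerical_derivative` and its step `h` are hypothetical names introduced here for illustration.

```python
def j(theta):
    # Objective from the post
    return (theta - 2.5) ** 2 - 1


def dj(theta):
    # Analytic derivative from the post
    return 2 * (theta - 2.5)


def numerical_derivative(f, x, h=1e-6):
    # Central difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


for theta in (0.0, 2.5, 5.0):
    print(theta, dj(theta), numerical_derivative(j, theta))
```

The two derivative columns should agree to several decimal places; a large gap would point to a mistake in `dj`.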
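Next comes some simple tuning. We try three values of eta: 0.1, 0.01 and 0.9.

eta = 0.1
theta = 0.0
theta_history = gradient_descent(eta, theta)
plot_gradient(theta_history)
len(theta_history)

The result is as follows:
[Figure: descent path for eta = 0.1]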

eta = 0.01
theta = 0.0
theta_history = gradient_descent(eta, theta)
plot_gradient(theta_history)
len(theta_history)

The result is as follows:
[Figure: descent path for eta = 0.01]

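eta = 0.9
theta = 0.0
theta_history = gradient_descent(eta, theta)
plot_gradient(theta_history)
len(theta_history)

The result is as follows:
[Figure: descent path for eta = 0.9]
These three plots closely resemble the earlier illustrations, which shows how much tuning matters.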
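To compare the three settings without looking at plots, one can simply count the updates each run needs. The sketch below repeats the post's update rule and stopping test in plain Python; `count_steps` and its `max_steps` safety cap are hypothetical additions made for this illustration.

```python
def count_steps(eta, initial_theta=0.0, epsilon=1e-8, max_steps=10_000):
    """Return (number of updates, final theta) for J(theta) = (theta - 2.5)**2 - 1."""
    def j(theta):
        return (theta - 2.5) ** 2 - 1

    theta = initial_theta
    for step in range(1, max_steps + 1):
        last_theta = theta
        theta = theta - eta * 2 * (theta - 2.5)  # theta - eta * dJ/dtheta
        if abs(j(theta) - j(last_theta)) < epsilon:
            return step, theta
    return max_steps, theta  # cap reached without meeting the stopping test


results = {eta: count_steps(eta) for eta in (0.01, 0.1, 0.9)}
for eta, (steps, theta) in results.items():
    print(f"eta={eta}: {steps} updates, final theta={theta:.5f}")
```

The smallest eta needs by far the most updates, while all three runs end near theta = 2.5.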
What happens if we change eta to 1.0?

eta = 1.0
theta = 0.0
theta_history = gradient_descent(eta, theta)
plot_gradient(theta_history)
len(theta_history)

The result is as follows:
[Figure: descent path for eta = 1.0]
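The behaviour at eta = 1.0 can be read off the update rule for this particular objective. The short derivation below uses only J(θ) = (θ − 2.5)² − 1 and its derivative 2(θ − 2.5) from the code; the iteration index k is notation introduced here.

```latex
\begin{aligned}
\theta_{k+1} &= \theta_k - \eta \cdot 2(\theta_k - 2.5) \\
\theta_{k+1} - 2.5 &= (1 - 2\eta)\,(\theta_k - 2.5) \\
\theta_k - 2.5 &= (1 - 2\eta)^k\,(\theta_0 - 2.5)
\end{aligned}
```

With η = 1.0 the factor 1 − 2η equals −1, so starting from θ = 0 the first update lands on θ = 5, the mirror image of 0 about 2.5. Both points have the same objective value, so the stopping test on the change in J is satisfied immediately even though θ is nowhere near the minimum. More generally, for this objective the distance to 2.5 shrinks only when |1 − 2η| < 1, that is, when 0 < η < 1.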
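And what about 1.1?

eta = 1.1
theta = 0.0
theta_history = gradient_descent(eta, theta)
plot_gradient(theta_history)
len(theta_history)

The result is as follows:
[Figure: descent path for eta = 1.1]
The figure clearly shows that with eta = 1.1 theta grows explosively. To handle this case we cap the number of iterations, and we use exception handling to avoid an error by returning inf (positive infinity) in the except clause.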

def gradient_descent(eta, initial_theta, n_iters=1e3, epsilon=1e-8):
    theta = initial_theta
    theta_history = [initial_theta]
    i_iter = 1

    def dj(theta):
        # Derivative of the objective at theta
        return 2 * (theta - 2.5)

    def j(theta):
        # Value of the objective at theta. Squaring a huge float raises
        # OverflowError, so fall back to positive infinity in that case.
        try:
            return (theta - 2.5) ** 2 - 1
        except OverflowError:
            return float('inf')

    # Stop after at most n_iters updates, even if the run never converges
    while i_iter <= n_iters:
        gradient = dj(theta)
        last_theta = theta
        theta = theta - gradient * eta
        theta_history.append(theta)
        if np.abs(j(theta) - j(last_theta)) < epsilon:
            break
        i_iter += 1
    return theta_history


def plot_gradient(theta_history):
    plt.plot(plt_x, plt_y)
    plt.plot(theta_history, [(i - 2.5) ** 2 - 1 for i in theta_history], color='r', marker='+')
    plt.show()

Note: inf denotes positive infinity.
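A small run makes the effect of the iteration cap visible. The sketch below is a self-contained, plot-free variant of the capped function (it uses the built-in `abs` instead of NumPy and guards the squaring step in `j`); the choice of `n_iters=10` is just an illustrative value.

```python
def gradient_descent(eta, initial_theta, n_iters=1e3, epsilon=1e-8):
    def dj(theta):
        return 2 * (theta - 2.5)

    def j(theta):
        try:
            return (theta - 2.5) ** 2 - 1
        except OverflowError:
            # Squaring a huge float overflows; treat the value as infinite
            return float('inf')

    theta = initial_theta
    theta_history = [initial_theta]
    i_iter = 1
    while i_iter <= n_iters:
        last_theta = theta
        theta = theta - eta * dj(theta)
        theta_history.append(theta)
        if abs(j(theta) - j(last_theta)) < epsilon:
            break
        i_iter += 1
    return theta_history


theta_history = gradient_descent(1.1, 0.0, n_iters=10)
distances = [abs(t - 2.5) for t in theta_history]
print(len(theta_history))  # 11: the starting point plus 10 updates
print(distances[0], distances[-1])
```

Each update moves theta further from 2.5 than the one before, so the run only ends because the cap is reached, not because the stopping test is met.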

🍀Gradient descent in sklearn

Here we again use the Boston housing prices as the example.
First, import the libraries we need:

# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version.
from sklearn.datasets import load_boston
from sklearn.linear_model import SGDRegressor

Then take a subset of the data:

boston = load_boston()
X = boston.data
y = boston.target
# Keep only the samples whose target is below 50
X = X[y < 50]
y = y[y < 50]

Then standardize the data:

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit the scaler on the training split only, then apply it to both splits
std = StandardScaler()
std.fit(X_train)
X_train_std = std.transform(X_train)
X_test_std = std.transform(X_test)

# Linear regression trained with stochastic gradient descent
sgd_reg = SGDRegressor()
sgd_reg.fit(X_train_std, y_train)
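For readers on a scikit-learn version that no longer ships `load_boston`, the same workflow can be tried on synthetic data. The sketch below is an assumption-laden stand-in, not the post's experiment: the `make_regression` data, its sizes, and the fixed `random_state` values are illustrative choices. It also spells out `eta0` and `learning_rate`, the `SGDRegressor` parameters that play the role of eta.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the Boston dataset used above)
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# eta0 is the initial learning rate; "invscaling" shrinks it as training proceeds
sgd_reg = SGDRegressor(learning_rate="invscaling", eta0=0.01,
                       max_iter=1000, tol=1e-3, random_state=0)
sgd_reg.fit(X_train_std, y_train)

r2 = sgd_reg.score(X_test_std, y_test)
print(f"R^2 on the test split: {r2:.3f}")
```

Calling `score` on the standardized test split gives the R² of the fitted model, which is a convenient way to compare different `eta0` values.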