

news 2025/7/11 4:08:54


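Implementing K-means Clustering in Python

How K-means works

K-means is the most classic and most basic clustering algorithm for unlabeled data. In essence, it decides whether two data points belong to the same group by the distance between them. For a whole set of points, it is like drawing a few circles around groups of points (the clusters); the center of each circle is the cluster center.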
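The idea above can be sketched in a few lines of NumPy. This is a minimal illustration of the usual assign-then-update iteration, not the scikit-learn implementation; the function name kmeans_sketch and its parameters are made up for this example, and the initial centers are simply k data points picked at random.

```python
import numpy as np

def nearest_center(X, centers):
    """Return, for every row of X, the index of the closest center."""
    # Squared Euclidean distances, shape (n_samples, n_centers).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain assign/update loop for illustration only."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        # New center = mean of the assigned points; keep the old center
        # if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return centers, nearest_center(X, centers)

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
centers, labels = kmeans_sketch(X, k=2)
print(centers)
print(labels)
```

Which grouping the loop ends in depends on the random starting centers, which is why library implementations usually repeat the procedure from several starts and keep the best result.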

Example 1

Source code

from sklearn.cluster import KMeans
import numpy as np

# Build the sample point set X and run K-means clustering
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

# Print the label (cluster index) of every sample, then predict the cluster of new samples
print(kmeans.labels_)
print(kmeans.predict([[0, 0], [4, 4], [2, 1]]))

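The KMeans arguments relevant to this example are:

n_clusters: the number of clusters to create. It is set to 2 here, so the data is split into two clusters.
n_init: the number of times the algorithm is run, each time from different initial centroids; the run with the lowest total error (inertia) is kept as the final model. It is not passed explicitly in this example, so the library default applies: 10 in older scikit-learn releases, 'auto' from version 1.4 on (a single run when init='k-means++').
random_state: the seed of the random number generator that controls centroid initialization. Using the same seed gives the same result on every run.
.fit(X): runs K-means on the data X and trains the model.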
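To make the parameter descriptions concrete, the following sketch fits the same data twice with the same seed and inspects the fitted attributes. Passing n_init=10 explicitly is a choice made for this illustration so that the behaviour does not depend on the installed scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

# Two models with identical settings and the same seed.
model_a = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
model_b = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Same seed and same data: the two fits should agree.
print(np.array_equal(model_a.labels_, model_b.labels_))
print(model_a.cluster_centers_)  # one row per cluster
print(model_a.inertia_)          # sum of squared distances to the nearest center
```

The cluster indices themselves (0 or 1) are arbitrary; only the grouping is meaningful.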

Output

[Figure: printed labels and predictions]

Example 2

Source code

import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import pairwise_distances_argmin
from sklearn.datasets import make_blobs  # public import path

# ######################################
# Generate sample data
np.random.seed(0)
batch_size = 45  # not used below
centers = [[1, 1], [-1, -1], [1, -1]]
n_clusters = len(centers)
X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)

# plot result
fig = plt.figure(figsize=(8, 3))
fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9)
colors = ['#4EACC5', '#FF9C34', '#4E9A06']

# original data
ax = fig.add_subplot(1, 2, 1)
row, _ = np.shape(X)
for i in range(row):
    ax.plot(X[i, 0], X[i, 1], '#4EACC5', marker='.')
ax.set_title('Original Data')
ax.set_xticks(())
ax.set_yticks(())

# compute clustering with K-Means
k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
t0 = time.time()
k_means.fit(X)
t_batch = time.time() - t0
# note: axis=0 sorts each column independently
k_means_cluster_centers = np.sort(k_means.cluster_centers_, axis=0)
k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers)

# K-means
ax = fig.add_subplot(1, 2, 2)
for k, col in zip(range(n_clusters), colors):
    # my_members is a boolean array that selects the points of cluster k (one color per cluster)
    my_members = k_means_labels == k
    cluster_center = k_means_cluster_centers[k]
    # draw the points that belong to this cluster
    ax.plot(X[my_members, 0], X[my_members, 1], 'w', markerfacecolor=col, marker='.')
    # draw the cluster center separately ('o' already sets the marker)
    ax.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col, markeredgecolor='k')
ax.set_title('KMeans')
ax.set_xticks(())
ax.set_yticks(())
plt.text(-3.5, 1.8, 'train time: %.2fs\ninertia: %f' % (t_batch, k_means.inertia_))
plt.show()

Output

[Figure: 'Original Data' panel and 'KMeans' panel]

Code notes

1. The make_blobs function from scikit-learn generates a random dataset of Gaussian blobs. Here n_samples is 3000, centers is the list of blob center coordinates, and cluster_std is 0.7. The call returns the data points and the corresponding list of labels.
Data point list
[Figure: data point list]
Label list
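As a quick way to see what make_blobs returns, the sketch below prints the shapes and label values. It passes random_state=0 directly instead of calling np.random.seed(0) as the example does; that substitution is an assumption made here to keep the snippet self-contained.

```python
import numpy as np
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=3000, centers=centers,
                            cluster_std=0.7, random_state=0)

print(X.shape)                   # (3000, 2): one row per point, columns are x and y
print(labels_true.shape)         # (3000,): index of the generating center per point
print(np.unique(labels_true))    # [0 1 2]
print(X[:5])                     # first few data points
print(labels_true[:5])           # their labels
```

labels_true records which center each point was drawn around; K-means never sees it, so it is only useful for checking the result afterwards.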

2. fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9) adjusts where the subplots sit on the figure. The four values are the left, right, bottom and top edges of the subplot area, given as fractions of the figure size. Here the subplot area spans 96% of the figure width (from 0.02 to 0.98) and runs vertically from 5% to 90% of the figure height. Changing these values changes the position and size of the subplots on the canvas.
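The arithmetic behind the 96% figure can be checked from the figure's stored subplot parameters. This is a small sketch; the Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 3))
fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9)

pars = fig.subplotpars
width_fraction = pars.right - pars.left    # about 0.96 of the figure width
height_fraction = pars.top - pars.bottom   # about 0.85 of the figure height
print(width_fraction, height_fraction)
```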

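3. ax = fig.add_subplot(1,2,1) uses matplotlib's fig.add_subplot() method to add a subplot to the figure. The three arguments are the number of rows, the number of columns and the index of the subplot, so this call lays the figure out as 1 row by 2 columns and returns the first (left) subplot. The variable ax holds that subplot object, which is then used to set plot properties and add plot elements. The later call fig.add_subplot(1, 2, 2) returns the right subplot.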
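A short sketch of the 1-by-2 layout follows. The variable names ax_left and ax_right are chosen for this illustration; the example itself reuses the single name ax for both subplots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 3))
ax_left = fig.add_subplot(1, 2, 1)   # row 1 of 1, column 1 of 2
ax_right = fig.add_subplot(1, 2, 2)  # row 1 of 1, column 2 of 2

left_box = ax_left.get_position()    # bounding box in figure fractions
right_box = ax_right.get_position()
print(left_box.x0, left_box.x1)
print(right_box.x0, right_box.x1)
print(len(fig.axes))                 # 2
```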

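4. k_means = KMeans(init='k-means++', n_clusters=3, n_init=10). K-means is a widely used unsupervised learning algorithm that partitions data into a preset number of clusters. The argument init='k-means++' selects how the cluster centers are initialized. The options covered here are 'k-means++', 'random', or an array of starting centers.
k-means++ picks the initial centers so that they tend to be spread out across the data, which speeds up convergence. Note that the list [[1, 1], [-1, -1], [1, -1]] in this example is the set of centers used by make_blobs to generate the data, not the initial cluster centers; those are chosen by k-means++. 'random' picks k observations from the data at random as the initial centers.
n_clusters=3 sets the number of clusters to 3, and n_init=10 sets the number of runs from different initial centers, of which the best result is kept.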
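Comparison
Using the k-means++ method
[Figure: result with init='k-means++']

Using the random method, 3 cluster centers
[Figure: result with init='random']
Running time: 0.14 s
On this dataset the two methods gave the same inertia (total within-cluster sum of squares) and the same running time. With a larger dataset of 30000 samples, k-means++ reached a lower inertia in the same running time, i.e. it converged better.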
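A comparison of this kind can be reproduced along the following lines. This is a sketch under assumptions: random_state=0 is fixed for repeatability, and the timings will differ from machine to machine, so no particular numbers are implied.

```python
import time
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7, random_state=0)

results = {}
for init in ("k-means++", "random"):
    model = KMeans(init=init, n_clusters=3, n_init=10, random_state=0)
    t0 = time.perf_counter()
    model.fit(X)
    elapsed = time.perf_counter() - t0
    # Keep time, inertia and iteration count of the best run for each method.
    results[init] = (elapsed, model.inertia_, model.n_iter_)
    print(f"{init:10s} time={elapsed:.3f}s "
          f"inertia={model.inertia_:.2f} iterations={model.n_iter_}")
```

Changing n_samples to 30000 gives the larger experiment mentioned above.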

5. k_means.fit(X) calls the model's fit() method to fit K-means to the dataset X.

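6. k_means_cluster_centers = np.sort(k_means.cluster_centers_, axis=0). Here k_means is the clustering model, k_means.cluster_centers_ is the attribute that holds the cluster centers, and np.sort sorts them. axis=0 sorts along the first axis, so each column is sorted independently. Because the x and y columns are sorted separately, a row of the result is not guaranteed to be one of the original centers; with the well-separated blobs used here the recombined rows stay close to the real centers. k_means_cluster_centers stores the sorted array.
The cluster centers attribute looks like this:
[Figure: k_means.cluster_centers_]
After sorting: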
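The column-wise behaviour of axis=0 is easiest to see on a small array. The centers below are hypothetical values chosen only to make the effect visible; np.lexsort is shown as one possible row-preserving alternative, not as what the example uses.

```python
import numpy as np

# Hypothetical centers, one (x, y) pair per row.
centers = np.array([[ 1.0,  5.0],
                    [-1.0,  9.0],
                    [ 3.0, -2.0]])

# Each column is sorted on its own, so x and y values get recombined.
column_sorted = np.sort(centers, axis=0)
print(column_sorted)   # rows: [-1, -2], [1, 5], [3, 9]

# Row-preserving alternative: order whole rows by x, breaking ties by y.
order = np.lexsort((centers[:, 1], centers[:, 0]))
row_sorted = centers[order]
print(row_sorted)      # rows: [-1, 9], [1, 5], [3, -2]
```

In the first result the rows [-1, -2] and [3, 9] do not exist in the input, which is the risk when sorting real cluster centers this way.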

7. k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers). pairwise_distances_argmin() takes the data points X and the center array k_means_cluster_centers, finds the nearest center for each data point, and returns the index of that center. In other words, it assigns every data point to its nearest cluster and returns the cluster label of each point.
[Figure: k_means_labels]
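The nearest-center assignment can be written out by hand to see what the function computes. The sketch below uses the small dataset from Example 1 and the unsorted centers of a fitted model; n_init=10 and random_state=0 are choices made for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import pairwise_distances_argmin

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Index of the nearest center for every row of X.
nearest = pairwise_distances_argmin(X, model.cluster_centers_)

# The same computation with NumPy broadcasting: squared distances, then argmin.
d2 = ((X[:, None, :] - model.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
manual = d2.argmin(axis=1)

print(nearest)
print(np.array_equal(nearest, manual))            # the two should agree
print(np.array_equal(nearest, model.predict(X)))  # predict() does the same assignment
```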
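8. my_members = k_means_labels == k
This yields a boolean array that is True where a point's label equals k. It is used as an index in the following lines to select the points of each cluster.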
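A tiny example of this boolean indexing, with hypothetical points and labels made up for the illustration:

```python
import numpy as np

X = np.array([[1, 2], [1, 4], [4, 2], [4, 4]])
labels = np.array([0, 0, 1, 1])   # hypothetical cluster labels

k = 1
my_members = labels == k          # [False False  True  True]
print(my_members)
print(X[my_members])              # rows of X whose label equals k: [[4 2] [4 4]]
print(X[my_members, 0])           # x-coordinates of those rows: [4 4]
```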