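This example uses the C-MAPSS dataset, a turbofan engine dataset released by NASA. It contains multi-source performance degradation data for turbofan engines under different operating conditions and fault modes. There are 4 sub-datasets, and each one is split into a training set, a test set, and RUL labels. The training set contains all state parameters of each engine from the start of operation until failure. The test set contains all state parameters of a number of engines from the start of operation up to some point in time before failure. The RUL labels record the RUL value of each engine in the test set and can be used to evaluate a model's RUL prediction ability.
[Figure: basic information on the C-MAPSS dataset]
This example uses only the FD001 sub-dataset.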
As for integrated Python environments, I use both Anaconda and WinPython. On Windows I mainly use WinPython, with Spyder as the IDE (it has a MATLAB-like interface).
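As peng wang puts it in his Zhihu answer to the question "winpython, anaconda 哪個更好?" (WinPython or Anaconda: which is better?):
WinPython grew out of Python(x,y) and targets scientific computing, while also covering data analysis and mining. Anaconda mainly targets data analysis and mining, and has some distinctive packages for big-data processing. WinPython emphasizes portability: it is built as portable software that does not write to the registry, installing it simply means unpacking it into a folder, and you can move that folder, or even put it on a USB drive, and use it on another computer. Anaconda follows the traditional installed-software model. WinPython is maintained by an individual, while Anaconda is maintained by a data analytics services company, which means WinPython keeps many things minimal whereas Anaconda offers some user-friendly settings. WinPython runs only on Windows; Anaconda also has a Linux version.
Package differences aside, I also personally recommend WinPython to beginners: precisely because it is simple, it causes fewer problems, and thanks to its portability it can be used again right away after a system crash and reinstall.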
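Please install and use WinPython directly (WinPython download), because many modules already come integrated with it.
You can choose which version to download. It does not have to be the latest one, which may cause incompatibility problems.
After downloading and unpacking, open Spyder and you are ready to go.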
Eight machine learning methods are used to predict the remaining useful life (RUL) of the NASA turbofan engines: Linear Regression, SVM regression, Decision Tree regression, KNN model, Random Forest, Gradient Boosting Regressor, Voting Regressor, and ANN Model.
First, import the required modules:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

import tensorflow as tf
from tensorflow.keras.layers import Dense
The versions used are:
tensorflow 2.8.0
keras 2.8.0
sklearn (scikit-learn) 1.0.2
Load the data:
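path = ''

# define column names: unit id, cycle counter, 3 operational settings, 21 sensors
col_names = ["unit_nb", "time_cycle"] + ["set_1", "set_2", "set_3"] + [f's_{i}' for i in range(1, 22)]

# read data (raw string avoids the invalid-escape warning for "\s")
df_train = pd.read_csv(path + "train_FD001.txt", index_col=False,
                       sep=r"\s+", header=None, names=col_names)

df_test and y_test are loaded in the same way (the later code accesses y_test.RUL, so the label file needs a column named RUL). Take a look at the training samples: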
df_train.head()
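The post does not show how `df_test` and `y_test` are loaded. The following is a minimal sketch under stated assumptions: the helper `load_cmapss` is hypothetical, the file names `test_FD001.txt` and `RUL_FD001.txt` mentioned in the comments are assumed to be the companion files of `train_FD001.txt`, and the label column is named `RUL` only because the later code accesses `y_test.RUL`. To keep the example self-contained it reads from small in-memory strings with made-up numbers instead of the real files.

```python
import io
import pandas as pd

col_names = (["unit_nb", "time_cycle"] + ["set_1", "set_2", "set_3"]
             + [f"s_{i}" for i in range(1, 22)])

def load_cmapss(source, names):
    """Read a whitespace-separated text file without a header row (hypothetical helper)."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=names, index_col=False)

# Synthetic stand-ins for test_FD001.txt and RUL_FD001.txt (all values are made up).
row = " ".join(["0.5"] * 24)  # 3 settings + 21 sensors
fake_test = io.StringIO(f"1 1 {row}\n1 2 {row}\n2 1 {row}\n")
fake_rul = io.StringIO("112\n98\n")

df_test_demo = load_cmapss(fake_test, col_names)
y_test_demo = load_cmapss(fake_rul, ["RUL"])  # one RUL value per test engine

print(df_test_demo.shape, y_test_demo.shape)
```

With the real files, the same calls would presumably take `path + "test_FD001.txt"` and `path + "RUL_FD001.txt"` in place of the in-memory strings.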
Perform exploratory data analysis:
df_train[col_names[1:]].describe().T
Visualize the data:

sns.set_style("darkgrid")
plt.figure(figsize=(16, 10))
k = 1
# one histogram per operational setting and sensor (24 panels)
for col in col_names[2:]:
    plt.subplot(6, 4, k)
    sns.histplot(df_train[col], color='Green')
    k += 1
plt.tight_layout()
plt.show()

Add the RUL label to every row of the training set:

def remaining_useful_life(df):
    # Get the total number of cycles for each unit
    grouped_by_unit = df.groupby(by="unit_nb")
    max_cycle = grouped_by_unit["time_cycle"].max()

    # Merge the max cycle back into the original frame
    result_frame = df.merge(max_cycle.to_frame(name='max_cycle'),
                            left_on='unit_nb', right_index=True)

    # Calculate remaining useful life for each row
    result_frame["RUL"] = result_frame["max_cycle"] - result_frame["time_cycle"]

    # drop max_cycle as it's no longer needed
    result_frame = result_frame.drop("max_cycle", axis=1)
    return result_frame

df_train = remaining_useful_life(df_train)
df_train.head()
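The labelling rule used above is, for each row, RUL = (last cycle observed for that engine) - (current cycle). A toy illustration with made-up data follows. It uses `groupby(...).transform("max")` as a compact alternative to the merge in the function above, which is a stylistic assumption rather than the post's own code.

```python
import pandas as pd

# Two hypothetical engines: unit 1 runs 3 cycles, unit 2 runs 2 cycles.
toy = pd.DataFrame({
    "unit_nb":    [1, 1, 1, 2, 2],
    "time_cycle": [1, 2, 3, 1, 2],
})

# Broadcast each unit's last cycle to all of its rows, then subtract the current cycle.
last_cycle = toy.groupby("unit_nb")["time_cycle"].transform("max")
toy["RUL"] = last_cycle - toy["time_cycle"]

print(toy["RUL"].tolist())  # [2, 1, 0, 1, 0]
```

The last recorded cycle of every engine gets RUL 0, which matches the training set's run-to-failure definition.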
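Plot the histogram of the maximum RUL per engine:

# max_ruls is not defined in the original post; it is assumed here to be
# the per-engine maximum of every column, including RUL
max_ruls = df_train.groupby('unit_nb').max().reset_index()

plt.figure(figsize=(10, 5))
sns.histplot(max_ruls.RUL, color='r')
plt.xlabel('RUL')
plt.ylabel('Frequency')
plt.axvline(x=max_ruls.RUL.mean(), ls='--', color='k', label=f'mean={max_ruls.RUL.mean()}')
plt.axvline(x=max_ruls.RUL.median(), color='b', label=f'median={max_ruls.RUL.median()}')
plt.legend()
plt.show()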
plt.figure(figsize=(20, 8))
cor_matrix = df_train.corr()
heatmap = sns.heatmap(cor_matrix, vmin=-1, vmax=1, annot=True)
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 12}, pad=10)
col = df_train.describe().columns
# drop columns whose standard deviation is less than 0.001, plus s_14
sensors_to_drop = list(col[df_train.describe().loc['std'] < 0.001]) + ['s_14']
print(sensors_to_drop)

df_train.drop(sensors_to_drop, axis=1, inplace=True)
df_test.drop(sensors_to_drop, axis=1, inplace=True)

sns.set_style("darkgrid")
fig, axs = plt.subplots(4, 4, figsize=(25, 18), facecolor='w', edgecolor='k')
fig.subplots_adjust(hspace=.22, wspace=.2)
axs = axs.ravel()
index = list(df_train.unit_nb.unique())

# one panel per remaining column, one curve per sampled engine (every 15th unit)
for ax, sensor in zip(axs, df_train.columns[1:-1]):
    for idx in index[1:-1:15]:
        ax.plot('RUL', sensor, data=df_train[df_train.unit_nb == idx])
    ax.set_xlim(350, 0)  # reversed axis: RUL counts down towards failure
    ax.set(xticks=np.arange(0, 350, 25))
    ax.set_ylabel(sensor)
    ax.set_xlabel('Remaining Use Life')
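The column filter above keeps only signals that actually vary. A small sketch of the same idea on a made-up frame follows; the column names `s_a`, `s_b`, `s_c` are hypothetical, and the 0.001 threshold is the one used in the code above.

```python
import pandas as pd

toy = pd.DataFrame({
    "s_a": [1.0, 1.0, 1.0, 1.0],        # constant: std is 0
    "s_b": [0.1, 0.4, 0.2, 0.9],        # clearly varying
    "s_c": [5.0, 5.0, 5.0, 5.0002],     # almost constant: std is about 1e-4
})

stds = toy.std()
to_drop = stds[stds < 0.001].index.tolist()
toy_reduced = toy.drop(columns=to_drop)

print(to_drop)                       # ['s_a', 's_c']
print(list(toy_reduced.columns))     # ['s_b']
```

A (near-)constant column carries no information about degradation, and a zero-variance column would also be mapped to all zeros by `StandardScaler`, so removing it before modelling loses nothing.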
X_train = df_train[df_train.columns[3:-1]]
y_train = df_train.RUL
# for the test set, only the last recorded cycle of each engine is used
X_test = df_test.groupby('unit_nb').last().reset_index()[df_train.columns[3:-1]]
# cap the training target at 155 cycles
y_train = y_train.clip(upper=155)

# create evaluate function for train and test data
def evaluate(y_true, y_hat):
    RMSE = np.sqrt(mean_squared_error(y_true, y_hat))
    R2_score = r2_score(y_true, y_hat)
    return [RMSE, R2_score]

# Make Dataframe which will contain results
Results = pd.DataFrame(columns=['RMSE-Train', 'R2-Train', 'RMSE-Test', 'R2-Test', 'time-train (s)'])
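Two steps in the block above are easy to check by hand: capping the target with `clip(upper=155)` and the RMSE / R2 pair returned by `evaluate`. The sketch below uses made-up numbers only. Computing RMSE as the square root of `mean_squared_error` mirrors the post's code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score

# Capping: every RUL above 155 is replaced by 155, smaller values are untouched.
rul = pd.Series([200, 160, 155, 100, 0])
clipped = rul.clip(upper=155)
print(clipped.tolist())  # [155, 155, 155, 100, 0]

# Metrics on three hypothetical predictions (errors: +2, -2, +3).
y_true = np.array([10.0, 20.0, 30.0])
y_hat = np.array([12.0, 18.0, 33.0])

rmse = np.sqrt(mean_squared_error(y_true, y_hat))  # sqrt((4 + 4 + 9) / 3)
r2 = r2_score(y_true, y_hat)                       # 1 - 17 / 200

print(round(rmse, 2), round(r2, 3))  # 2.38 0.915
```

Note that in the post only `y_train` is clipped, while `y_test` keeps its original values, so test-set metrics are computed against uncapped labels.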
Train the linear regression model:

import time

# standardize features: fit on the training set, reuse the same scaling for the test set
Sc = StandardScaler()
X_train1 = Sc.fit_transform(X_train)
X_test1 = Sc.transform(X_test)

# create and fit model
start = time.time()
lm = LinearRegression()
lm.fit(X_train1, y_train)
end_fit = time.time() - start

# predict and evaluate
y_pred_train = lm.predict(X_train1)
y_pred_test = lm.predict(X_test1)
Results.loc['LR'] = evaluate(y_train, y_pred_train) + evaluate(y_test, y_pred_test) + [end_fit]
Results

def plot_prediction(y_test, y_pred_test, score):
    plt.style.use("ggplot")
    fig, ax = plt.subplots(1, 2, figsize=(17, 4), gridspec_kw={'width_ratios': [1.2, 3]})
    fig.subplots_adjust(wspace=.12)

    # left panel: predicted vs actual, with the ideal y = x line in red
    ax[0].plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], lw=3, c='r')
    ax[0].scatter(y_test, y_pred_test, lw=3, c='g')
    ax[0].annotate(text='RMSE: {:.2f}\nR2: {:.2%}'.format(score[0], score[1]),
                   xy=(0, 140), size='large')
    ax[0].set_title('Actual vs predicted RUL')
    ax[0].set_xlabel('Actual')
    ax[0].set_ylabel('Predicted')

    # right panel: actual and predicted RUL per test engine
    engines = range(len(y_test))
    ax[1].plot(engines, y_test, lw=2, c='r', label='actual')
    ax[1].plot(engines, y_pred_test, lw=1, ls='--', c='b', label='prediction')
    ax[1].legend()
    ax[1].set_title('Actual vs predicted RUL')
    ax[1].set_xlabel('Engine num')
    ax[1].set_ylabel('RUL')

plot_prediction(y_test.RUL, y_pred_test, evaluate(y_test, y_pred_test))
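The scaling step above fits `StandardScaler` on the training features only and then applies the stored mean and standard deviation to the test features. A tiny made-up example shows what that means numerically:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_tr = np.array([[1.0], [2.0], [3.0]])  # hypothetical training feature
X_te = np.array([[4.0]])                # hypothetical test feature

sc = StandardScaler().fit(X_tr)         # learns mean=2 and population std=sqrt(2/3)
X_tr_scaled = sc.transform(X_tr)
X_te_scaled = sc.transform(X_te)        # reuses the training statistics

print(sc.mean_[0], round(sc.scale_[0], 4))   # 2.0 0.8165
print(round(X_te_scaled[0, 0], 3))           # 2.449
```

The test value is expressed in units of the training distribution, so no information from the test set leaks into preprocessing.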
Train the support vector machine model:

# create and fit model
start = time.time()
svr = SVR(kernel="rbf", gamma=0.25, epsilon=0.05)
svr.fit(X_train1, y_train)
end_fit = time.time() - start

# predict and evaluate
y_pred_train = svr.predict(X_train1)
y_pred_test = svr.predict(X_test1)
Results.loc['SVM'] = evaluate(y_train, y_pred_train) + evaluate(y_test, y_pred_test) + [end_fit]
Results

plot_prediction(y_test.RUL, y_pred_test, evaluate(y_test, y_pred_test))
Train the decision tree model:

# create and fit model
start = time.time()
dtr = DecisionTreeRegressor(random_state=42, max_features='sqrt', max_depth=10, min_samples_split=10)
dtr.fit(X_train1, y_train)
end_fit = time.time() - start

# predict and evaluate
y_pred_train = dtr.predict(X_train1)
y_pred_test = dtr.predict(X_test1)
Results.loc['DTree'] = evaluate(y_train, y_pred_train) + evaluate(y_test, y_pred_test) + [end_fit]
Results

plot_prediction(y_test.RUL, y_pred_test, evaluate(y_test, y_pred_test))
Train the KNN model:

from sklearn.neighbors import KNeighborsRegressor

# create and fit model
start = time.time()
Kneigh = KNeighborsRegressor(n_neighbors=7)
Kneigh.fit(X_train1, y_train)
end_fit = time.time() - start

# predict and evaluate
y_pred_train = Kneigh.predict(X_train1)
y_pred_test = Kneigh.predict(X_test1)
Results.loc['KNeigh'] = evaluate(y_train, y_pred_train) + evaluate(y_test, y_pred_test) + [end_fit]
Results

plot_prediction(y_test.RUL, y_pred_test, evaluate(y_test, y_pred_test))
Train the random forest model:

# create and fit model (only fitting is timed; the stray predict call
# inside the timed section of the original has been removed)
start = time.time()
rf = RandomForestRegressor(n_jobs=-1, n_estimators=130, max_features='sqrt',
                           min_samples_split=2, max_depth=10, random_state=42)
rf.fit(X_train1, y_train)
end_fit = time.time() - start

# predict and evaluate
y_pred_train = rf.predict(X_train1)
y_pred_test = rf.predict(X_test1)
Results.loc['RF'] = evaluate(y_train, y_pred_train) + evaluate(y_test, y_pred_test) + [end_fit]
Results

plot_prediction(y_test.RUL, y_pred_test, evaluate(y_test, y_pred_test))
Train the Gradient Boosting Regressor model:

from sklearn.ensemble import GradientBoostingRegressor

# create and fit model
# note: despite the names xgb_r and 'XGboost', this is scikit-learn's
# GradientBoostingRegressor, not the separate XGBoost library
start = time.time()
xgb_r = GradientBoostingRegressor(n_estimators=45, max_depth=10, min_samples_leaf=7,
                                  max_features='sqrt', random_state=42, learning_rate=0.11)
xgb_r.fit(X_train1, y_train)
end_fit = time.time() - start

# predict and evaluate
y_pred_train = xgb_r.predict(X_train1)
y_pred_test = xgb_r.predict(X_test1)
Results.loc['XGboost'] = evaluate(y_train, y_pred_train) + evaluate(y_test, y_pred_test) + [end_fit]
Results

plot_prediction(y_test.RUL, y_pred_test, evaluate(y_test, y_pred_test))
Train the Voting Regressor model:

from sklearn.ensemble import VotingRegressor

# create and fit model: weighted average of the random forest and gradient boosting models
start = time.time()
Vot_R = VotingRegressor([("rf", rf), ("xgb", xgb_r)], weights=[1.5, 1], n_jobs=-1)
Vot_R.fit(X_train1, y_train)
end_fit = time.time() - start

# predict and evaluate
y_pred_train = Vot_R.predict(X_train1)
y_pred_test = Vot_R.predict(X_test1)
Results.loc['VotingR'] = evaluate(y_train, y_pred_train) + evaluate(y_test, y_pred_test) + [end_fit]
Results

plot_prediction(y_test.RUL, y_pred_test, evaluate(y_test, y_pred_test))
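`VotingRegressor` fits clones of the estimators it is given and predicts the weighted average of their individual predictions. The sketch below checks this on synthetic data. The two small member models and the data from `make_regression` are stand-ins chosen only to keep the example fast; they are not the models used in the post. The weights `[1.5, 1]` are the ones from the code above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem (made-up data, for illustration only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

weights = [1.5, 1]
vot = VotingRegressor(
    [("lin", LinearRegression()),
     ("tree", DecisionTreeRegressor(max_depth=3, random_state=0))],
    weights=weights,
)
vot.fit(X, y)

# Predictions of the fitted member clones, shape (n_estimators, n_samples).
member_preds = np.array([est.predict(X) for est in vot.estimators_])
manual = np.average(member_preds, axis=0, weights=weights)

print(np.allclose(manual, vot.predict(X)))  # True
```

Because the members are cloned and refitted, the timed `fit` call on the ensemble includes retraining both base models from scratch.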
Train the ANN model:

# create and fit model: fully connected network with a single linear output
start = time.time()
model = tf.keras.models.Sequential()
model.add(Dense(32, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='msle', optimizer='adam', metrics=['msle'])
history = model.fit(x=X_train1, y=y_train, epochs=40, batch_size=64)
end_fit = time.time() - start

# predict and evaluate (ravel flattens the (n, 1) Keras output to 1-D)
y_pred_train = model.predict(X_train1).ravel()
y_pred_test = model.predict(X_test1).ravel()
Results.loc['ANN'] = evaluate(y_train, y_pred_train) + evaluate(y_test, y_pred_test) + [end_fit]
Results
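The network above is trained with `loss='msle'`, the mean squared logarithmic error: the mean of (log(1 + y) - log(1 + ŷ))² over all samples. The NumPy sketch below, on made-up non-negative numbers, shows how this loss is computed; the function name `msle` is hypothetical. Keras additionally guards against invalid predictions internally, which this sketch does not reproduce.

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean squared logarithmic error for non-negative inputs (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

# Only the last sample is wrong: log(1000) - log(100) = log(10).
loss = msle([0, 9, 99], [0, 9, 999])
print(np.isclose(loss, np.log(10) ** 2 / 3))  # True

# The same absolute error of 10 cycles is penalized far more at a low RUL.
low = msle([10], [20])
high = msle([140], [150])
print(low > high)  # True
```

Because the loss works on log-scaled values, errors close to failure (small RUL) dominate training, while equally large errors early in an engine's life are penalized much less.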
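About the author: holds a doctorate in engineering. Reviewer for Mechanical Systems and Signal Processing; outstanding reviewer for Proceedings of the CSEE (《中國電機工程學報》); reviewer for EI-indexed journals including Control and Decision (《控制與決策》), Systems Engineering and Electronics (《系統(tǒng)工程與電子技術(shù)》), Power System Protection and Control (《電力系統(tǒng)保護與控制》), and Journal of Astronautics (《宇航學報》); reviewer for Chinese core journals including 《計算機科學》, 《電子器件》, 《現(xiàn)代制造過程》, 《電源學報》, 《船舶工程》, 《軸承》, 《工礦自動化》, 《重慶理工大學學報》, 《噪聲與振動控制》, 《機械傳動》, 《機械強度》, 《機械科學與技術(shù)》, 《機床與液壓》, 《聲學技術(shù)》, 《應用聲學》, 《石油機械》, and 《西安工業(yè)大學學報》.
Areas of expertise: modern signal processing, machine learning, deep learning, digital twins, time series analysis, equipment defect detection, equipment anomaly detection, and intelligent equipment fault diagnosis and prognostics and health management (PHM).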