I. The Attention Mechanism
Attention models have seen wide use across deep learning in recent years; whether in image processing, speech recognition, or natural language processing, they turn up in tasks of every kind. As the name suggests, the idea borrows from the human attention mechanism. Consider the image described below.
The figure illustrates how a human viewer efficiently allocates limited attentional resources when looking at an image: the red regions mark where the visual system focuses most. In a scene like Figure 1, people direct their attention mainly to the face, the title of the text, the first sentence of the article, and similar locations.
Visual attention is a signal-processing mechanism specific to human vision. The visual system rapidly scans the global image to find the region worth focusing on, the so-called focus of attention, then devotes more attentional resources to that region to extract finer detail about the target while suppressing other, useless information. The core of attention in deep learning is the same: let the network concentrate on the places that matter more. An attention mechanism is one way to make a network adaptive.
In essence, an attention mechanism locates the information of interest and suppresses what is useless, and the result is usually expressed as a probability map or a probability feature vector. By design, attention models fall into three classes: spatial attention, channel attention, and mixed spatial-channel attention; a shape-level sketch follows below. This post focuses on channel attention.
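In tensor terms (an illustrative sketch, not taken from the post): for an H×W×C feature map, a channel attention map has shape 1×1×C, a spatial attention map has shape H×W×1, and a mixed model applies both.

% Shapes of the three attention flavors for an HxWxC feature tensor
H = 8; W = 8; C = 32;
U   = rand(H,W,C);       % feature maps
ch  = rand(1,1,C);       % channel attention: one weight per channel
sp  = rand(H,W,1);       % spatial attention: one weight per location
out = U .* ch .* sp;     % mixed attention: both, via implicit expansion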
1. The Channel Attention Mechanism
The classic realization of channel attention is SENet (Squeeze-and-Excitation Network). It models the importance of each feature channel and then, depending on the task, strengthens or suppresses individual channels. The schematic is shown below.
[Figure: SE block schematic]
A side branch splits off after the normal convolution. First comes the Squeeze operation (F_sq(·) in the figure), which compresses the features along the spatial dimensions: each two-dimensional feature map is reduced to a single real number, which amounts to a pooling operation with a global receptive field, while the number of channels stays unchanged. Next comes the Excitation operation (F_ex(·) in the figure), which generates a weight for each feature channel through parameters W; W is learned to model the correlations between channels explicitly. In the paper this is implemented as a two-layer fully connected bottleneck (reduce the dimensionality, then restore it) followed by a sigmoid. Once the weight of every feature channel has been obtained, the weights are applied back to the original channels, so the network learns each channel's importance for the task at hand. Because this is a generic design idea, an SE block can be inserted into any existing network, which gives it real practical value.
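To make the Squeeze and Excitation steps concrete before the full example in Part II, here is a minimal sketch of a standalone SE block in MATLAB's Deep Learning Toolbox. The input size, channel count C = 64, reduction ratio r = 16, and all layer names are illustrative assumptions, and the sketch relies on sigmoidLayer and the singleton-broadcasting behavior of multiplicationLayer (R2020b or later):

% Minimal SE block sketch (illustrative sizes and names, not the paper's code)
C = 64;  r = 16;                                % channels, reduction ratio
lgraphSE = layerGraph([
    imageInputLayer([32 32 C],"Name","in","Normalization","none")
    convolution2dLayer(3,C,"Padding","same","Name","features")]);

% Squeeze: global average pooling turns each HxWxC tensor into 1x1xC
seBranch = [
    globalAveragePooling2dLayer("Name","squeeze")
    fullyConnectedLayer(C/r,"Name","fc_down")   % bottleneck: C -> C/r
    reluLayer("Name","relu_se")
    fullyConnectedLayer(C,"Name","fc_up")       % restore: C/r -> C
    sigmoidLayer("Name","weights")];            % per-channel weights in (0,1)
lgraphSE = addLayers(lgraphSE,seBranch);

% Scale: broadcast-multiply the 1x1xC weights onto the HxWxC features
lgraphSE = addLayers(lgraphSE,multiplicationLayer(2,"Name","scale"));
lgraphSE = connectLayers(lgraphSE,"features","squeeze");
lgraphSE = connectLayers(lgraphSE,"features","scale/in1");
lgraphSE = connectLayers(lgraphSE,"weights","scale/in2");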
Putting this together, the channel attention computation can be summarized as:

    z_c = F_sq(u_c) = 1/(H×W) · Σ_{i=1..H} Σ_{j=1..W} u_c(i,j)    (Squeeze)
    s = F_ex(z, W) = σ(W_2 · δ(W_1 · z))                           (Excitation)
    x̃_c = F_scale(u_c, s_c) = s_c · u_c                            (Scale)

where u_c is the c-th H×W feature map, δ is the ReLU activation, σ is the sigmoid, and W_1 ∈ ℝ^{(C/r)×C}, W_2 ∈ ℝ^{C×(C/r)} are the bottleneck weights with reduction ratio r.
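As a quick numeric walk-through of the three steps in plain MATLAB (tiny random tensors and made-up weights, purely illustrative):

% Numeric walk-through of the SE formula (all values made up)
H = 2; W = 2; C = 4; r = 2;
U  = rand(H,W,C);                         % input feature maps u_c
z  = squeeze(mean(mean(U,1),2));          % Squeeze: C-by-1 channel means
W1 = randn(C/r,C);  W2 = randn(C,C/r);    % bottleneck weights
s  = 1./(1 + exp(-(W2*max(W1*z,0))));     % Excitation: sigma(W2*relu(W1*z))
Xt = U .* reshape(s,1,1,C);               % Scale: x~_c = s_c * u_c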
That is all for the principle of channel attention; for the full details, see the paper Squeeze-and-Excitation Networks.
II. Code in Practice
clc
clear
close all

load Train.mat
% load Test.mat

% One-hot encode the categorical calendar variables
Train.weekend = dummyvar(Train.weekend);
Train.month   = dummyvar(Train.month);
Train = movevars(Train,{'weekend','month'},'After','demandLag');
Train.ts = [];

Train(1,:) = [];
y = Train.demand;
x = Train{:,2:5};

% Normalize inputs and targets to [0,1]
[xnorm,xopt] = mapminmax(x',0,1);
[ynorm,yopt] = mapminmax(y',0,1);

xnorm = xnorm(:,1:1000);
ynorm = ynorm(1:1000);

k = 24;  % lag length (window width)

% Slice the series into 4-by-24 "images": each sample is a k-step window
for i = 1:length(ynorm)-k
    Train_xNorm{i} = xnorm(:,i:i+k-1);
    Train_yNorm(i) = ynorm(i+k-1);
    Train_y{i}     = y(i+k-1);
end
Train_x = Train_xNorm';

% Hold out samples 1001-1170 as the test set
ytest = Train.demand(1001:1170);
xtest = Train{1001:1170,2:5};
[xtestnorm] = mapminmax('apply',xtest',xopt);
[ytestnorm] = mapminmax('apply',ytest',yopt);
% xtestnorm = [xtestnorm; Train.weekend(1001:1170,:)'; Train.month(1001:1170,:)'];
xtest = xtest';
for i = 1:length(ytestnorm)-k
    Test_xNorm{i} = xtestnorm(:,i:i+k-1);
    Test_yNorm(i) = ytestnorm(i+k-1);
    Test_y(i)     = ytest(i+k-1);
end
Test_x = Test_xNorm';

x_train = table(Train_x,Train_y');
x_test  = table(Test_x);

%% Train/validation split (optional, left commented out as in the original)
% TrainSampleLength = length(Train_yNorm);
% validatasize = floor(TrainSampleLength * 0.1);
% Validata_xNorm = Train_xNorm(:,end-validatasize:end,:);
% Validata_yNorm = Train_yNorm(:,TrainSampleLength-validatasize:end);
% Validata_y = Train_y(TrainSampleLength-validatasize:end);
%
% Train_xNorm = Train_xNorm(:,1:end-validatasize,:);
% Train_yNorm = Train_yNorm(:,1:end-validatasize);
% Train_y = Train_y(1:end-validatasize);

%% Build the residual network with a sigmoid-gated attention branch
lgraph = layerGraph();

tempLayers = [
    imageInputLayer([4 24 1],"Name","imageinput")
    convolution2dLayer([3 3],32,"Name","conv","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    batchNormalizationLayer("Name","batchnorm")
    reluLayer("Name","relu")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","addition")
    convolution2dLayer([3 3],32,"Name","conv_1","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    batchNormalizationLayer("Name","batchnorm_1")
    reluLayer("Name","relu_1")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","addition_1")
    convolution2dLayer([3 3],32,"Name","conv_2","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    batchNormalizationLayer("Name","batchnorm_2")
    reluLayer("Name","relu_2")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","addition_2")
    convolution2dLayer([3 3],32,"Name","conv_3","Padding","same")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    batchNormalizationLayer("Name","batchnorm_3")
    reluLayer("Name","relu_3")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","addition_4")
    sigmoidLayer("Name","sigmoid")];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = multiplicationLayer(2,"Name","multiplication");
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(3,"Name","addition_3")
    fullyConnectedLayer(32,"Name","fc1")
    fullyConnectedLayer(16,"Name","fc2")
    fullyConnectedLayer(1,"Name","fc3")
    regressionLayer("Name","regressionoutput")];
lgraph = addLayers(lgraph,tempLayers);

% NOTE: this excerpt contains no connectLayers calls, so the parallel
% branches above are not yet wired together; without them, analyzeNetwork
% and trainNetwork will error. See the wiring sketch after this listing.

% Clean up the helper variable
clear tempLayers;

plot(lgraph);
analyzeNetwork(lgraph);

%% Training options
maxEpochs = 100;
miniBatchSize = 32;
options = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',0.005, ...
    'GradientThreshold',1, ...
    'Shuffle','never', ...
    'Plots','training-progress', ...
    'Verbose',0);

net = trainNetwork(x_train,lgraph,options);
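As noted in the listing, the excerpt never connects the branches. The wiring below is a hypothetical reconstruction, chosen only so that every addition and multiplication port is connected and all shapes line up at 4×24×32; the author's original topology may differ. Under this wiring, each convolution output feeds its batch-norm/ReLU pair and also skips ahead to the next addition layer, the sigmoid acts as an element-wise gate on the conv_3 features (not the pooled per-channel gate of the SE block, since the listing contains no pooling layer), and addition_3 merges the gated features with two long skip connections before the fully connected head.

% Hypothetical wiring -- NOT the author's original connectLayers calls
lgraph = connectLayers(lgraph,"conv","batchnorm");
lgraph = connectLayers(lgraph,"relu","addition/in1");
lgraph = connectLayers(lgraph,"conv","addition/in2");           % residual skip 1
lgraph = connectLayers(lgraph,"conv_1","batchnorm_1");
lgraph = connectLayers(lgraph,"relu_1","addition_1/in1");
lgraph = connectLayers(lgraph,"conv_1","addition_1/in2");       % residual skip 2
lgraph = connectLayers(lgraph,"conv_2","batchnorm_2");
lgraph = connectLayers(lgraph,"relu_2","addition_2/in1");
lgraph = connectLayers(lgraph,"conv_2","addition_2/in2");       % residual skip 3
lgraph = connectLayers(lgraph,"conv_3","batchnorm_3");
lgraph = connectLayers(lgraph,"relu_3","addition_4/in1");
lgraph = connectLayers(lgraph,"conv_3","addition_4/in2");       % skip into the gate
lgraph = connectLayers(lgraph,"sigmoid","multiplication/in1");  % attention weights
lgraph = connectLayers(lgraph,"conv_3","multiplication/in2");   % features to be gated
lgraph = connectLayers(lgraph,"addition","addition_3/in1");     % long skips into the head
lgraph = connectLayers(lgraph,"addition_1","addition_3/in2");
lgraph = connectLayers(lgraph,"multiplication","addition_3/in3");

With these connections added, analyzeNetwork(lgraph) should report a valid graph with no unconnected ports.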
Predict_yNorm = predict(net,x_test);
Predict_y = double(Predict_yNorm);   % targets were trained unnormalized, so no inverse mapping is needed

plot(Test_y)
hold on
plot(Predict_y)
legend('Ground truth','Prediction')
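The post stops at the comparison plot; if desired, the fit could also be quantified with a couple of standard error metrics (a hypothetical addition, not in the original listing):

% Simple error metrics (not in the original post)
rmse = sqrt(mean((Predict_y - Test_y').^2));
mape = mean(abs(Predict_y - Test_y')./abs(Test_y'))*100;
fprintf('RMSE = %.3f, MAPE = %.2f%%\n',rmse,mape);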
Training progress plot:

Prediction curve on the test set: