
news 2025/7/3 9:33:44


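Handwritten BP

Preface

手寫AI has released Algo C++, a new C++ course aimed at AI algorithms (link). These are my personal study notes, kept for my own reference.

This lesson is about writing the BP (backpropagation) code by hand.

The course outline is shown in the mind map below.

[Image: mind map of the course outline]

1. Data loading

We start by implementing the loading of the MNIST dataset.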

#include <iostream>
#include <tuple>
#include <string>
#include <fstream>
#include <vector>
#include <utility>      // std::swap
#include "matrix.hpp"   // Matrix class provided by the course

using namespace std;

namespace Application{

namespace io{

    // No alignment padding, so the struct matches the file layout
    struct __attribute__((packed)) mnist_labels_header_t{
        unsigned int magic_number;
        unsigned int number_of_items;
    };

    struct __attribute__((packed)) mnist_images_header_t{
        unsigned int magic_number;
        unsigned int number_of_images;
        unsigned int number_of_rows;
        unsigned int number_of_cols;
    };

    // Reverse the four bytes of v (big endian <-> little endian)
    unsigned int inverse_byte(unsigned int v){
        unsigned char* p = (unsigned char*)&v;
        std::swap(p[0], p[3]);
        std::swap(p[1], p[2]);
        return v;
    }

    /* Load the MNIST dataset */
    tuple<Matrix, Matrix> load_data(const string& image_file, const string& label_file){

        // file, mode = rb/wb/a+
        Matrix images, labels;
        fstream fimage(image_file, ios::binary | ios::in);
        fstream flabel(label_file, ios::binary | ios::in);

        mnist_images_header_t images_header;
        mnist_labels_header_t labels_header;
        fimage.read((char*)&images_header, sizeof(images_header));
        flabel.read((char*)&labels_header, sizeof(labels_header));

        // Endianness conversion
        images_header.number_of_images = inverse_byte(images_header.number_of_images);
        labels_header.number_of_items  = inverse_byte(labels_header.number_of_items);

        images.resize(images_header.number_of_images, 28 * 28);
        // one hot, float
        labels.resize(labels_header.number_of_items, 10);

        // Temporary storage, as large as the image data in the file
        std::vector<unsigned char> buffer(images.rows() * images.cols());
        fimage.read((char*)buffer.data(), buffer.size());

        // buffer holds the pixels as unsigned char.
        // Copy them into the images matrix and standardize along the way.
        for(size_t i = 0; i < buffer.size(); ++i)
            images.ptr()[i] = (buffer[i] / 255.0f - 0.1307f) / 0.3081f;

        // Labels: one-hot encoding.
        // labels is an all-zero matrix at this point.
        buffer.resize(labels.rows());
        flabel.read((char*)buffer.data(), buffer.size());
        for(size_t i = 0; i < buffer.size(); ++i)
            labels.ptr(i)[buffer[i]] = 1;   // one-hot
        return make_tuple(images, labels);
    }
}

int run(){
    // Check
    Matrix images, labels;
    tie(images, labels) = io::load_data("mnist/train-images.idx3-ubyte", "mnist/train-labels.idx1-ubyte");
    cout << labels;
    return 0;
}

}

int main(){
    return Application::run();
}

The example above shows how the MNIST dataset is loaded:

- Two structs, mnist_labels_header_t and mnist_images_header_t, describe the header of the MNIST label file and image file respectively.
- __attribute__((packed)) is a GCC/Clang extension that tells the compiler not to insert alignment padding between members, so the in-memory layout of the struct matches the byte layout of the file header and the header can be filled with a single read.
- inverse_byte converts endianness. The integers in the MNIST file headers are stored in big-endian order, so on a little-endian machine they have to be byte-swapped after reading. inverse_byte does this by reversing the four bytes of an unsigned int.
- load_data loads the MNIST dataset from an image file name and a label file name. It opens both files with fstream, reads the headers, byte-swaps the item counts, and sizes the matrices accordingly. It then reads the pixel data and standardizes it, reads the labels and one-hot encodes them, and returns the two matrices.
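As a complement to the inverse_byte approach, here is a minimal sketch of decoding the big-endian header fields arithmetically, which gives the same result regardless of the host's byte order. The names read_be32, ImagesHeader and parse_images_header are hypothetical and are not part of the course code.

```cpp
#include <cstdint>

// Decode a 32-bit big-endian integer from 4 raw bytes.
// It only combines byte values arithmetically, so the result does not
// depend on whether the host is little- or big-endian.
inline std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical plain struct for illustration (not the packed struct above).
struct ImagesHeader {
    std::uint32_t magic, count, rows, cols;
};

// Parse the 16-byte header at the start of an MNIST image file.
inline ImagesHeader parse_images_header(const unsigned char* bytes) {
    return ImagesHeader{read_be32(bytes), read_be32(bytes + 4),
                        read_be32(bytes + 8), read_be32(bytes + 12)};
}
```

With this style no packed struct is needed, at the cost of reading the header into a byte buffer first. For an MNIST image file the magic field should decode to 2051.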

The label-handling code is a little hard to follow, so let's look at it more closely.

buffer.resize(labels.rows());
flabel.read((char*)buffer.data(), buffer.size());
for(int i = 0; i < buffer.size(); ++i)
    labels.ptr(i)[buffer[i]] = 1;

In the MNIST dataset each label is an integer giving the true value of the handwritten digit. For neural network training, however, the label is usually converted to a one-hot encoding, which makes the error easier to compute. For example, the one-hot encoding of the digit 5 is [0,0,0,0,0,1,0,0,0,0]: the 6th position is 1 and all other positions are 0.

The code first calls resize to set the size of buffer to the number of label rows, which is the number of samples. It then reads buffer.size() bytes from flabel into buffer. Each label is a one-byte integer, so buffer now holds a sequence of integer labels. The for loop walks over all elements of buffer and sets the position matching each label to 1. The other positions are never written, so they keep the 0 they had when labels was created as an all-zero matrix.

labels.ptr(i) returns the address of row i of labels, and [buffer[i]] then accesses the element in column buffer[i] of that row and sets it to 1. This completes the conversion of the label data into a form suitable for training the network.
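The same idea can be sketched without the course's Matrix class. The version below is only an illustration: it assumes a flat row-major std::vector<float> in place of Matrix, and the function name one_hot is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Build a row-major (labels.size() x num_classes) one-hot matrix.
std::vector<float> one_hot(const std::vector<unsigned char>& labels,
                           std::size_t num_classes) {
    // Every entry starts at 0; only one entry per row is overwritten.
    std::vector<float> out(labels.size() * num_classes, 0.0f);
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (labels[i] < num_classes)  // skip labels outside the class range
            out[i * num_classes + labels[i]] = 1.0f;
    }
    return out;
}
```

Here out[i * num_classes + labels[i]] plays the role of labels.ptr(i)[buffer[i]] in the original code: a row offset followed by a column offset.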

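2. Forward propagation

Now for the whole forward pass: initializing the hyperparameters, the weight matrices and the bias matrices, multiplying the matrices to propagate forward, and computing the loss.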

#include <iostream>
#include <tuple>
#include <string>
#include <vector>
#include <fstream>
#include <random>
#include <cmath>
#include <string.h>     // memcpy
#include <algorithm>    // shuffle
#include "matrix.hpp"

using namespace std;

namespace Application{

// io::load_data from section 1 is assumed to be defined here as well

namespace tools{

    vector<int> range(int end){
        vector<int> output(end);
        for(int i = 0; i < end; ++i)
            output[i] = i;
        return output;
    }

    // Given an index array plus a start offset and a size,
    // gather the corresponding rows of images into a new matrix
    Matrix slice(const Matrix& images, const vector<int>& indexs, int start, int size){
        Matrix output(size, images.cols());
        for(int i = 0; i < size; ++i){
            int image_row_index = indexs[start + i];
            // copy row image_row_index of images into row i of output
            memcpy(output.ptr(i), images.ptr(image_row_index), sizeof(float) * images.cols());
        }
        return output;
    }
}

namespace random{

    // plays the role of the seed
    static default_random_engine global_random_engine;

    // stddev defaults to 1: std::normal_distribution requires stddev > 0
    Matrix normal_distribution_matrix(int rows, int cols, float mean = 0.0f, float stddev = 1.0f){
        normal_distribution<float> norm(mean, stddev);
        Matrix output(rows, cols);
        for(int i = 0; i < rows * cols; ++i)
            output.ptr()[i] = norm(global_random_engine);
        return output;
    }

    // in place: the array itself is modified
    void shuffle_array(vector<int>& indexs){
        std::shuffle(indexs.begin(), indexs.end(), global_random_engine);
    }
}

namespace nn{

    Matrix relu(const Matrix& x){
        Matrix output = x;
        auto p = output.ptr();
        for(int i = 0; i < x.numel(); ++i, ++p){
            // std::max needs two floats, hence 0.0f rather than 0
            *p = std::max(*p, 0.0f);
        }
        return output;
    }

    float sigmoid_cross_entropy_loss(const Matrix& output, const Matrix& y){
        static auto sigmoid = [](float x){return 1.0f / (1.0f + exp(-x));};

        // output => n x 10
        // y      => n x 10
        auto po = output.ptr();
        auto py = y.ptr();
        float loss = 0;
        float eps = 1e-5;
        for(int i = 0; i < output.numel(); ++i, ++po, ++py){
            float prob = sigmoid(*po);
            prob = max(min(prob, 1 - eps), eps);   // keep log() finite
            // loss += -(y * log(p) + (1-y) * log(1-p));
            loss += (*py) * log(prob) + (1 - *py) * log(1 - prob);
        }
        return -loss / output.rows();
    }
}

int run(){
    Matrix trainimages, trainlabels;
    Matrix valimages, vallabels;
    tie(trainimages, trainlabels) = io::load_data("mnist/train-images.idx3-ubyte", "mnist/train-labels.idx1-ubyte");
    tie(valimages, vallabels)     = io::load_data("mnist/t10k-images.idx3-ubyte",  "mnist/t10k-labels.idx1-ubyte");

    // hidden = images @ w1 + b1
    // output = relu(hidden) @ w2 + b2
    // loss   = lossfn(output, onehot_label)
    // loss.backward()
    int num_images = trainimages.rows();  // 60000
    int num_input  = trainimages.cols();  // 784
    int num_hidden = 1024;
    int num_output = 10;
    int num_epoch  = 10;
    float lr       = 1e-1;
    int batch_size = 256;
    float momentum = 0.9f;

    // drop last -> pytorch dataloader
    int num_batch_per_epoch = num_images / batch_size;

    // Training procedure
    // 1. Fetch a random batch, shuffle=True
    //    How to make it random? The key requirement is that
    //    every epoch draws in a different random order.
    //    Fetch by index: index[0,1,2,3,4,5,6,7]
    //    shuffle(indexs) -> [6,5,4,0,1,3,2,7]
    auto image_indexs = tools::range(num_images);

    // Fix the matrix sizes and initialize them (Kaiming initialization, fan_in, fan_out)
    float w1_gain = 2.0f / std::sqrt((float)num_input + num_hidden);   // matches the activation
    Matrix w1 = random::normal_distribution_matrix(num_input, num_hidden, 0, w1_gain);
    Matrix b1 = random::normal_distribution_matrix(1, num_hidden);

    float w2_gain = 1.0f / std::sqrt((float)num_hidden + num_output);  // matches the activation
    Matrix w2 = random::normal_distribution_matrix(num_hidden, num_output, 0, w2_gain);
    Matrix b2 = random::normal_distribution_matrix(1, num_output);

    for(int epoch = 0; epoch < num_epoch; ++epoch){
        random::shuffle_array(image_indexs);

        for(int ibatch = 0; ibatch < num_batch_per_epoch; ++ibatch){
            // image_indexs[ibatch * batch_size : (ibatch + 1) * batch_size]
            auto x = tools::slice(trainimages, image_indexs, ibatch * batch_size, batch_size);
            cout << x.rows() << ", " << x.cols() << endl;
            auto y = tools::slice(trainlabels, image_indexs, ibatch * batch_size, batch_size);

            // n x m + 1 x m
            auto hidden     = x.gemm(w1) + b1;
            auto hidden_act = nn::relu(hidden);
            auto output     = hidden_act.gemm(w2) + b2;
            float loss      = nn::sigmoid_cross_entropy_loss(output, y);
            cout << "Loss: " << loss << endl;
        }
    }

    // Check
    // Matrix t = random::normal_distribution_matrix(3, 3);
    // Matrix s = tools::slice(t, {0, 1, 2}, 1, 2);
    // cout << t;
    // cout << s;
    return 0;
}

}

int main(){
    return Application::run();
}

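The example above shows the forward pass: initializing the network parameters, running the forward computation, and evaluating the loss function.

First, the network parameters are initialized: the weight matrix w1 and bias vector b1 from the input layer to the hidden layer, and the weight matrix w2 and bias vector b2 from the hidden layer to the output layer. All of them are generated by random::normal_distribution_matrix(), which samples from a Gaussian distribution. For w1 and w2 the standard deviation is additionally derived from the layer's fan-in and fan-out, which the course refers to as Kaiming initialization.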
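To make the initialization concrete, here is a minimal sketch that uses a flat std::vector<float> instead of the course's Matrix class. The names normal_matrix and init_scale are hypothetical, and the scale formula gain / sqrt(fan_in + fan_out) is simply copied from the code above, not taken from a reference definition of Kaiming initialization.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Fill a rows x cols row-major buffer with samples from N(mean, stddev^2).
std::vector<float> normal_matrix(int rows, int cols, float mean, float stddev,
                                 std::default_random_engine& engine) {
    std::normal_distribution<float> dist(mean, stddev);  // requires stddev > 0
    std::vector<float> out(static_cast<std::size_t>(rows) * cols);
    for (float& v : out) v = dist(engine);
    return out;
}

// Standard deviation as computed in the text: gain / sqrt(fan_in + fan_out).
float init_scale(float gain, int fan_in, int fan_out) {
    return gain / std::sqrt(static_cast<float>(fan_in + fan_out));
}
```

For w1 in the text (784 inputs, 1024 hidden units, gain 2) this scale comes out at roughly 0.047, so the initial weights are small compared with the standardized pixel values.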
The code then enters the training loop, which starts by drawing the data in random order. This is done by generating an index array with tools::range() and shuffling it with random::shuffle_array() at the start of every epoch, so that each epoch visits the samples in a different order.
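The index-shuffling scheme can be sketched on its own as follows. This is an illustration under the assumption that only full batches are used (drop last, as in the code above); the function name epoch_batches is hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Return the sample indices of every full mini-batch for one epoch.
// A trailing partial batch is dropped.
std::vector<std::vector<int>> epoch_batches(int num_samples, int batch_size,
                                            std::default_random_engine& engine) {
    std::vector<int> indices(num_samples);
    std::iota(indices.begin(), indices.end(), 0);          // 0, 1, ..., n-1
    std::shuffle(indices.begin(), indices.end(), engine);  // engine state advances per call
    std::vector<std::vector<int>> batches;
    for (int start = 0; start + batch_size <= num_samples; start += batch_size)
        batches.emplace_back(indices.begin() + start,
                             indices.begin() + start + batch_size);
    return batches;
}
```

Because the engine is passed by reference and keeps its state, calling the function once per epoch yields a different permutation each time. With 60000 samples and a batch size of 256 it produces 234 batches, matching num_batch_per_epoch in the text.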
For each batch, the code first uses tools::slice() to take batch_size samples from the training set and then runs the forward computation:
- hidden = images @ w1 + b1
- output = relu(hidden) @ w2 + b2
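The two lines above can be traced on a tiny example. The sketch below assumes row-major std::vector<float> storage in place of the course's Matrix class; affine is a hypothetical helper that combines the gemm call and the broadcast bias addition.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Mat = std::vector<float>;  // row-major storage

// out (n x m) = a (n x k) @ b (k x m) + bias (1 x m), bias broadcast over rows.
Mat affine(const Mat& a, const Mat& b, const Mat& bias, int n, int k, int m) {
    Mat out(static_cast<std::size_t>(n) * m);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            float acc = bias[j];
            for (int t = 0; t < k; ++t) acc += a[i * k + t] * b[t * m + j];
            out[i * m + j] = acc;
        }
    return out;
}

// Element-wise max(x, 0).
Mat relu(Mat x) {
    for (float& v : x) v = std::max(v, 0.0f);
    return x;
}
```

A forward pass is then relu(affine(x, w1, b1, n, in, hidden)) followed by a second affine call with w2 and b2. The triple loop is the naive matrix product and is only meant to show the shapes involved.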
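Once the output layer values are available, the code computes the loss for the batch. This is implemented in nn::sigmoid_cross_entropy_loss() as a cross-entropy loss: each element of output is first passed through the sigmoid function, the binary cross-entropy terms are summed, and the sum is divided by the number of rows in the batch.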
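The loss can be checked by hand on a small input. The following is a sketch with plain vectors, assuming the same eps clamp of 1e-5 as the code above; the function name sigmoid_bce is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Binary cross-entropy on raw scores (logits), summed over all elements
// and divided by the number of rows (samples) in the batch.
float sigmoid_bce(const std::vector<float>& logits,
                  const std::vector<float>& targets, int rows) {
    const float eps = 1e-5f;
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        float p = 1.0f / (1.0f + std::exp(-logits[i]));
        p = std::max(std::min(p, 1.0f - eps), eps);  // keep log() finite
        sum += targets[i] * std::log(p) + (1.0f - targets[i]) * std::log(1.0f - p);
    }
    return -sum / rows;
}
```

For a single row with logits {0, 0} and targets {1, 0}, both probabilities are 0.5, so the loss is 2 ln 2, about 1.386. The clamp matters for very confident predictions, where the sigmoid rounds to exactly 0 or 1 in float.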

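3. Backpropagation

Let's first recall the backpropagation formulas.

[Image: backpropagation formulas]

Based on the figure above, we compute the gradients of $Loss$ with respect to $W_1$, $B_1$, $W_2$ and $B_2$.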

1. The hidden layer output is: $H = relu(XW_1 + B_1)$
2. The predicted probability at the output layer is: $P = sigmoid(HW_2 + B_2)$
3. The loss is: $L = BinaryCrossEntropyLoss(P, Y)$
4. The gradients of $L$ with respect to $W_2$ and $B_2$ are: $\frac{\partial L}{\partial W_2}=H^{T}(P-Y)$ $\frac{\partial L}{\partial B_{2}}= reduce\_sum(P-Y)$
5. The gradients of $L$ with respect to $W_1$ and $B_1$ are: $\frac{\partial L}{\partial W_{1}}=X^{T}\frac{\partial L}{\partial(X W_{1}+B_{1})}$ $\frac{\partial L}{\partial B_{1}}=reduce\_sum\frac{\partial L}{\partial(X W_{1}+B_{1})}$
6. Once the gradients are available, the optimizer is applied to update each parameter.

In the code the loss is averaged over the $n$ samples of the batch, so $P - Y$ is additionally divided by $n$ there.
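Steps 4 and 5 rely on two intermediate results that the list does not spell out: why the gradient at the output layer is $P - Y$, and how $\frac{\partial L}{\partial(XW_1+B_1)}$ is obtained. The derivation below is a sketch that fills them in. The symbols $Z_1 = XW_1 + B_1$ and $Z_2 = HW_2 + B_2$ are introduced here for brevity and do not appear in the original notes; any batch-averaging factor is left out, as in the list.

```latex
\begin{align*}
% Per-element loss with p = \sigma(z) and target y
\ell &= -\bigl[\,y \log p + (1-y)\log(1-p)\,\bigr], \qquad \frac{dp}{dz} = p(1-p) \\
\frac{\partial \ell}{\partial z}
  &= \Bigl(-\frac{y}{p} + \frac{1-y}{1-p}\Bigr)\, p(1-p)
   = -y(1-p) + (1-y)\,p = p - y \\
% Matrix form, propagated back through the second layer and the ReLU
\frac{\partial L}{\partial Z_2} &= P - Y, \qquad
\frac{\partial L}{\partial H} = (P - Y)\, W_2^{T}, \qquad
\frac{\partial L}{\partial Z_1} = \frac{\partial L}{\partial H} \odot \mathbf{1}[Z_1 > 0]
\end{align*}
```

The last expression is the quantity written as $\frac{\partial L}{\partial(XW_1+B_1)}$ in step 5: the gradient arriving at the hidden activations, with the entries zeroed where the ReLU input was not positive.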

Now for the whole backward pass: computing the gradients of the loss with respect to w1, b1, w2 and b2, and updating the parameters.

#include <iostream>
#include <cstdio>       // printf
#include <tuple>
#include <string>
#include <vector>
#include <fstream>
#include <random>
#include <cmath>
#include <string.h>
#include <algorithm>    // shuffle
#include "matrix.hpp"

using namespace std;

namespace Application{

// namespaces io, tools and random from the previous sections are assumed to be defined here

namespace nn{

    // Gradient of relu: pass g through where the forward input x was not negative
    Matrix drelu(const Matrix& g, const Matrix& x){
        Matrix output = g;
        auto px = x.ptr();
        auto pg = output.ptr();
        for(int i = 0; i < x.numel(); ++i, ++px, ++pg){
            if(*px < 0)
                *pg = 0;
        }
        return output;
    }

    Matrix relu(const Matrix& x){
        Matrix output = x;
        auto p = output.ptr();
        for(int i = 0; i < x.numel(); ++i, ++p){
            // std::max needs two floats, hence 0.0f rather than 0
            *p = std::max(*p, 0.0f);
        }
        return output;
    }

    Matrix sigmoid(const Matrix& x){
        Matrix output = x;
        auto p = output.ptr();
        for(int i = 0; i < x.numel(); ++i, ++p){
            float t = *p;
            *p = 1.0f / (1.0f + exp(-t));
        }
        return output;
    }

    // prob is expected to be sigmoid(output) already, see run()
    float sigmoid_cross_entropy_loss(const Matrix& prob, const Matrix& y){
        // prob => n x 10
        // y    => n x 10
        auto po = prob.ptr();
        auto py = y.ptr();
        float loss = 0;
        float eps = 1e-5;
        for(int i = 0; i < prob.numel(); ++i, ++po, ++py){
            float p = *po;
            p = max(min(p, 1 - eps), eps);   // keep log() finite
            // loss += -(y * log(p) + (1-y) * log(1-p));
            loss += (*py) * log(p) + (1 - *py) * log(1 - p);
        }
        return -loss / prob.rows();
    }

    float eval_accuracy(const Matrix& testset, const Matrix& y,
                        const Matrix& w1, const Matrix& b1, const Matrix& w2, const Matrix& b2){
        auto hidden = nn::relu(testset.gemm(w1) + b1);
        auto prob   = nn::sigmoid(hidden.gemm(w2) + b2);
        int correct = 0;
        for(int i = 0; i < prob.rows(); ++i){
            // argmax gives the predicted label of each row
            int predict_label = prob.argmax(i);
            // y is one-hot, so the prediction is correct if y is 1 at that label
            if(y(i, predict_label) == 1)
                correct++;
        }
        return correct / (float)prob.rows();
    }
}

int run(){
    Matrix trainimages, trainlabels;
    Matrix valimages, vallabels;
    tie(trainimages, trainlabels) = io::load_data("mnist/train-images.idx3-ubyte", "mnist/train-labels.idx1-ubyte");
    tie(valimages, vallabels)     = io::load_data("mnist/t10k-images.idx3-ubyte",  "mnist/t10k-labels.idx1-ubyte");

    // hidden = images @ w1 + b1
    // output = relu(hidden) @ w2 + b2
    // loss   = lossfn(output, onehot_label)
    // loss.backward()
    int num_images = trainimages.rows();  // 60000
    int num_input  = trainimages.cols();  // 784
    int num_hidden = 1024;
    int num_output = 10;
    int num_epoch  = 10;
    float lr       = 1e-1;
    int batch_size = 256;
    float momentum = 0.9f;                // declared but not used by the update below

    // drop last -> pytorch dataloader
    int num_batch_per_epoch = num_images / batch_size;  // 60000 / 256 = 234 (iter)

    // Training procedure
    // 1. Fetch a random batch, shuffle=True
    //    Every epoch must draw in a different random order.
    //    Fetch by index: index[0,1,2,3,4,5,6,7]
    //    shuffle(indexs) -> [6,5,4,0,1,3,2,7]
    auto image_indexs = tools::range(num_images);

    // Fix the matrix sizes and initialize them (Kaiming initialization, fan_in, fan_out)
    float w1_gain = 2.0f / std::sqrt((float)num_input + num_hidden);   // matches the activation
    Matrix w1 = random::normal_distribution_matrix(num_input, num_hidden, 0, w1_gain);
    Matrix b1 = random::normal_distribution_matrix(1, num_hidden);

    float w2_gain = 1.0f / std::sqrt((float)num_hidden + num_output);  // matches the activation
    Matrix w2 = random::normal_distribution_matrix(num_hidden, num_output, 0, w2_gain);
    Matrix b2 = random::normal_distribution_matrix(1, num_output);

    for(int epoch = 0; epoch < num_epoch; ++epoch){
        random::shuffle_array(image_indexs);

        for(int ibatch = 0; ibatch < num_batch_per_epoch; ++ibatch){    // one epoch
            // image_indexs[ibatch * batch_size : (ibatch + 1) * batch_size]
            auto x = tools::slice(trainimages, image_indexs, ibatch * batch_size, batch_size);
            cout << x.rows() << ", " << x.cols() << endl;
            auto y = tools::slice(trainlabels, image_indexs, ibatch * batch_size, batch_size);

            // n x m + 1 x m
            auto hidden      = x.gemm(w1) + b1;
            auto hidden_act  = nn::relu(hidden);
            auto output      = hidden_act.gemm(w2) + b2;
            auto probability = nn::sigmoid(output);
            float loss       = nn::sigmoid_cross_entropy_loss(probability, y);
            cout << "Loss: " << loss << endl;

            // backward pass
            // dloss / doutput = (sigmoid(output) - y) / output.rows
            // C = AB
            // G = dloss / dC
            // dloss / dA = G @ B^T
            // dloss / dB = A^T @ G
            // output = hidden_act @ w2
            auto doutput     = (probability - y) / (float)output.rows();
            auto db2         = doutput.reduce_sum_by_row();
            auto dhidden_act = doutput.gemm(w2, false, true);
            auto dw2         = hidden_act.gemm(doutput, true);

            // dloss / dhidden
            auto dhidden     = nn::drelu(dhidden_act, hidden);

            // x @ w1 + b1
            auto db1         = dhidden.reduce_sum_by_row();
            auto dw1         = x.gemm(dhidden, true);

            // update all four parameters
            w1 = w1 - lr * dw1;
            b1 = b1 - lr * db1;
            w2 = w2 - lr * dw2;
            b2 = b2 - lr * db2;
        }

        float accuracy = nn::eval_accuracy(valimages, vallabels, w1, b1, w2, b2);
        printf("%d. accuracy: %.2f %%\n", epoch, accuracy * 100);
    }

    // Check
    // Matrix t = random::normal_distribution_matrix(3, 3);
    // Matrix s = tools::slice(t, {0, 1, 2}, 1, 2);
    // cout << t;
    // cout << s;
    return 0;
}

}

int main(){
    return Application::run();
}

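In the example above, the tedious part is computing the gradients: the derivatives of $L$ with respect to $W_1$, $B_1$, $W_2$ and $B_2$ each have to be worked out explicitly. After that, the parameters are updated with the SGD algorithm. Finally, at the end of each epoch, the accuracy on the test set is computed and printed.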
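The update and evaluation steps are small enough to sketch in isolation. The version below assumes flat row-major std::vector<float> buffers instead of the course's Matrix class; the names sgd_step and accuracy are hypothetical, and the update is plain SGD without momentum.

```cpp
#include <cstddef>
#include <vector>

// In-place SGD step: param <- param - lr * grad.
void sgd_step(std::vector<float>& param, const std::vector<float>& grad, float lr) {
    for (std::size_t i = 0; i < param.size(); ++i) param[i] -= lr * grad[i];
}

// Fraction of rows whose largest score sits at the one-hot target position.
float accuracy(const std::vector<float>& prob, const std::vector<float>& onehot,
               int rows, int cols) {
    int correct = 0;
    for (int i = 0; i < rows; ++i) {
        int best = 0;  // argmax over the columns of row i
        for (int j = 1; j < cols; ++j)
            if (prob[i * cols + j] > prob[i * cols + best]) best = j;
        if (onehot[i * cols + best] == 1.0f) ++correct;
    }
    return correct / static_cast<float>(rows);
}
```

Each parameter has to be paired with its own gradient (w1 with dw1, b1 with db1, and so on), and the same step is applied to all four parameters after every batch.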

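Summary

In this lesson I followed Teacher Du in writing the BP backpropagation algorithm by hand, which deepened my understanding of BP, gave me hands-on practice, and improved my grasp of both C++ and neural networks. The overall BP procedure is fairly clear: forward pass => compute the loss => backward pass => update. It still demands attention to many details, such as how to optimize the code for performance and how to reduce the number of matrix multiplications as far as possible, so the answer is to think more and write more. Looking forward to the next lesson, handwritten AutoGrad 😄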
