news 2025/7/5 14:31:11

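Originally, my predecessors built a SqueezeNet project with Caffe, and it worked reasonably well. However, Caffe is no longer maintained, and converting the model to TRT for production later might run into version problems, so I built a PyTorch version.
I won't go into the environment setup here. It is mainly PyTorch; for anything else that is missing, just pip install it.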

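Building the network

The SqueezeNet architecture and its parameters are as follows:
[Figure: SqueezeNet architecture and parameter table]
Later, while checking each layer's output against this table, I happened to notice a problem with it: a 224×224 image passed through a 7×7 convolution with stride 2 should give a 109×109 output, not the 111×111 shown. So my guess was that either the kernel parameters or the listed output size is wrong. I checked the rows below, and they all follow from the 111×111 result, so that value is consistent with the rest of the table. But I also checked the original Caffe version, and its first convolution layer uses exactly these 7×7/2 parameters, so the kernel is not wrong either. That is a bit contradictory. The table comes from the authors' original paper, which was published at a top conference, so in principle it should not be wrong. My knowledge here is limited, so if anyone knows what is going on, please let me know. For now I keep using these kernel parameters.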
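The arithmetic behind this observation can be checked with the standard output-size formula for a convolution, out = floor((n + 2p - k) / s) + 1. The sketch below is only an illustration; the helper name `conv_out_size` is made up here and does not come from the paper or the Caffe project. It also shows that, with the same unpadded 7×7/2 kernel, an input of 227 would give 111. That is just what the formula yields, one possible reading of the table, not something this post has confirmed.

```python
def conv_out_size(n, kernel, stride, padding=0):
    """Spatial output size of a convolution (floor mode, as in nn.Conv2d)."""
    return (n + 2 * padding - kernel) // stride + 1


# 7x7 kernel, stride 2, no padding, as in the first conv layer of the table
print(conv_out_size(224, kernel=7, stride=2))  # 109
print(conv_out_size(227, kernel=7, stride=2))  # 111
# With the 100x100 input used for training later in this post
print(conv_out_size(100, kernel=7, stride=2))  # 47
```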
[Figure: the three SqueezeNet variants]
SqueezeNet comes in the three variants above. I checked and found that my predecessors used the middle one, with simple bypass (residual) connections, so for comparison I build the same structure here.
The network code is as follows:

import torch
import torch.nn as nn


class Fire(nn.Module):

    def __init__(self, in_channel, squeeze_channel, out_channel):
        super().__init__()
        # 1x1 squeeze layer reduces the channel count
        self.squeeze = nn.Sequential(
            nn.Conv2d(in_channel, squeeze_channel, 1),
            nn.ReLU(inplace=True)
        )
        # Two parallel expand branches: 1x1 and 3x3
        self.expand_1x1 = nn.Sequential(
            nn.Conv2d(squeeze_channel, out_channel, 1),
            nn.ReLU(inplace=True)
        )
        self.expand_3x3 = nn.Sequential(
            nn.Conv2d(squeeze_channel, out_channel, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.squeeze(x)
        # Concatenate along channels, so the output has 2 * out_channel channels
        x = torch.cat([
            self.expand_1x1(x),
            self.expand_3x3(x)
        ], 1)
        return x


class SqueezeNet_caffe(nn.Module):
    """SqueezeNet with simple bypass"""

    def __init__(self, class_num=5):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=7, stride=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, 2, ceil_mode=True)
        )
        self.fire2 = Fire(96, 16, 64)
        self.fire3 = Fire(128, 16, 64)
        self.fire4 = Fire(128, 32, 128)
        self.fire5 = Fire(256, 32, 128)
        self.fire6 = Fire(256, 48, 192)
        self.fire7 = Fire(384, 48, 192)
        self.fire8 = Fire(384, 64, 256)
        self.fire9 = Fire(512, 64, 256)
        self.maxpool = nn.MaxPool2d(3, 2, ceil_mode=True)
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Conv2d(512, class_num, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1))
        )

    def forward(self, x):
        x = self.stem(x)
        f2 = self.fire2(x)
        # Bypass connections are added only where input and output channels match
        f3 = self.fire3(f2) + f2
        f4 = self.fire4(f3)
        f4 = self.maxpool(f4)
        f5 = self.fire5(f4) + f4
        f6 = self.fire6(f5)
        f7 = self.fire7(f6) + f6
        f8 = self.fire8(f7)
        f8 = self.maxpool(f8)
        f9 = self.fire9(f8) + f8
        x = self.classifier(f9)
        x = x.view(x.size(0), -1)
        return x


def squeezenet_caffe(class_num=5):
    return SqueezeNet_caffe(class_num=class_num)

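The rest of the project code is the usual PyTorch routine: building the dataset and dataloader, then the forward pass, loss computation, and backpropagation for each epoch. I won't write it all out here and will just post the link, so you can download it if you need it (it also integrates other classification networks).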

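Data processing

For the dataset I use torchvision.datasets.ImageFolder, so the directory names serve as the dataset labels. The directory structure is as follows:
[Figure: dataset directory structure]
Put the images of each class into the corresponding directory. The validation and test sets follow the same layout.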
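To make the "directory name as label" idea concrete, here is a small sketch that mimics, without torchvision, how I understand ImageFolder to derive labels: class folders are sorted by name and numbered from 0. The helper `find_classes` below is written for this illustration and the folder names are hypothetical; check the `class_to_idx` attribute of your own dataset for the real mapping.

```python
import os
import tempfile


def find_classes(root):
    """Map each sub-directory name to an integer label, in sorted order."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


# Hypothetical class folders, created in a temporary directory for the demo
with tempfile.TemporaryDirectory() as root:
    for name in ["dog", "cat", "bird"]:
        os.makedirs(os.path.join(root, name))
    class_to_idx = find_classes(root)

print(class_to_idx)  # {'bird': 0, 'cat': 1, 'dog': 2}
```

The practical consequence is that the index predicted by the model has to be mapped back through this same sorted order at inference time.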

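Running the commands

Training command:

python train.py -net squeezenet_caffe -gpu -b 64 -t_data <training set path> -v_data <validation set path> -imgsz 100

-net is followed by the network type. The following classification networks are integrated:
[Figure: list of integrated classification networks]
If you have an NVIDIA GPU, -gpu trains on the GPU. -b is the batch size, and -imgsz is the input size, i.e. the size the images are resized to.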
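For readers who want to see how such flags are typically wired up, here is a hypothetical argparse sketch. Only the flag names are taken from the command above; the defaults, help strings, and the function name `build_parser` are assumptions, and the real train.py in the project may define them differently.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction: flag names come from the training command,
    # defaults and help strings are assumptions.
    parser = argparse.ArgumentParser(description="train a classifier")
    parser.add_argument("-net", type=str, required=True, help="network type")
    parser.add_argument("-gpu", action="store_true", help="train on the GPU")
    parser.add_argument("-b", type=int, default=64, help="batch size")
    parser.add_argument("-t_data", type=str, help="training set path")
    parser.add_argument("-v_data", type=str, help="validation set path")
    parser.add_argument("-imgsz", type=int, default=100, help="resize size")
    return parser


# Parse an explicit argument list instead of sys.argv for the demo
args = build_parser().parse_args(
    ["-net", "squeezenet_caffe", "-gpu", "-b", "64",
     "-t_data", "train/", "-v_data", "val/", "-imgsz", "100"]
)
print(args.net, args.gpu, args.b, args.imgsz)
```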
Test command:

python test.py -net squeezenet_caffe -weights <trained model path> -gpu -b 64 -data <test set path> -imgsz 100

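Problems encountered

Training went fine at first, but later things changed abruptly:
[Figure: training curves with the sudden jump]
The loss suddenly rose sharply, the accuracy dropped sharply at the same moment, and then both stayed constant as a flat line. My guess was exploding gradients. (This is also the point where I first looked for the cause in the network structure, checked the parameters and outputs of the first table in this post layer by layer, and found the problem in the table.) The network structure checked out fine, so I printed the gradients for every batch and, while I was at it, used gradient clipping to cap them at a maximum threshold.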

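optimizer.zero_grad()
outputs = net(images)
loss = loss_function(outputs, labels)
loss.backward()

grad_max = 0
grad_min = 10
for p in net.parameters():
    # Printing the norm of every gradient flooded the console,
    # so only the max and min are printed below
    # print(p.grad.norm())
    gvalue = p.grad.norm()
    if gvalue > grad_max:
        grad_max = gvalue
    if gvalue < grad_min:
        grad_min = gvalue
print("grad_max:")
print(grad_max)
print("grad_min:")
print(grad_min)
# Clip the gradient norm to at most 10.
# The original call was clip_grad_norm(p, 10), which only touched the last
# parameter left over from the loop and used the deprecated function name;
# clip_grad_norm_ over net.parameters() clips the norm of all gradients together.
torch.nn.utils.clip_grad_norm_(net.parameters(), 10)
optimizer.step()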

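In theory this should have helped, but the same problem still appeared after a few epochs of training.
However, the gradient values were indeed abnormal wherever the curves were abnormal:
[Figure: printed gradient values during the abnormal phase]
While training was progressing normally, the gradient values were mostly on the order of 1e-1. During the abnormal phase they were extremely close to 0, as shown, so it is no wonder the network stopped learning.
Now let's look at TensorBoard. I wrote the maximum and minimum gradient values into it to make them easier to track:
[Figure: TensorBoard curves of the gradient max and min]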

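At the jump, the gradient value suddenly explodes. The likely cause is a learning rate that is too large: the momentum-driven oscillation becomes so large that the optimizer jumps out and cannot find its way back. Checking the learning-rate hyperparameter, the initial value was indeed too large (0.1), so I changed it to 0.01. After running again, TensorBoard shows:
[Figure: TensorBoard curves after lowering the learning rate]
This time everything is normal.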
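The effect of the learning rate can be reproduced on a toy problem. The sketch below runs momentum SGD on a one-dimensional quadratic; the curvature of 50, the momentum of 0.9, and the function name `run_momentum_sgd` are assumptions chosen purely for illustration and have nothing to do with the real network. With the same momentum, a step size of 0.1 makes the iterate blow up while 0.01 converges.

```python
def run_momentum_sgd(lr, curvature=50.0, momentum=0.9, steps=200):
    """Minimize f(x) = 0.5 * curvature * x**2 with momentum SGD, from x = 1."""
    x, velocity = 1.0, 0.0
    for _ in range(steps):
        grad = curvature * x                    # derivative of the quadratic
        velocity = momentum * velocity + grad   # accumulate momentum
        x = x - lr * velocity                   # parameter update
    return x


x_large_lr = run_momentum_sgd(lr=0.1)
x_small_lr = run_momentum_sgd(lr=0.01)
print(abs(x_large_lr))  # enormous: the iterate has diverged
print(abs(x_small_lr))  # close to 0: the iterate has converged
```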
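But I also zoomed in on the gradient values at the point of the explosion

and found that the maximum never exceeded 10, so the clip above had no effect. If I change the threshold to 2, the result is as follows:
[Figure: result with the clipping threshold set to 2]

It did have an effect, but the curve is not that smooth. Changing the threshold to 1 or even smaller might work better. Still, I think simply changing the learning rate is the easier, once-and-for-all fix.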
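Why a threshold of 10 does nothing while 2 does can be seen from how norm clipping works. The pure-Python sketch below follows my understanding of `torch.nn.utils.clip_grad_norm_`: it computes one L2 norm over all gradients together and rescales them only when that norm exceeds the threshold. The function name `clip_by_global_norm` and the gradient values are made up for the example.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    grads: list of lists of floats, one inner list per parameter tensor.
    Returns (clipped_grads, total_norm_before_clipping).
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, total_norm


# Made-up gradients for two parameter tensors; the combined norm is 5.0
grads = [[3.0, 0.0], [0.0, 4.0]]

unchanged, norm = clip_by_global_norm(grads, max_norm=10)
print(norm)  # 5.0, below the threshold of 10, so nothing is rescaled

clipped, _ = clip_by_global_norm(grads, max_norm=2)
new_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(new_norm)  # approximately 2.0
```

Note that the norm being compared is the combined one over all parameters, which is at least as large as the largest per-parameter norm printed in the training loop.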

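Converting the PyTorch model to a TensorRT model

After a neural network has been trained, TensorRT can compress and optimize it and deploy it at runtime without the overhead of a framework. TensorRT combines layers, optimizes kernel selection, and, depending on the specified precision, performs normalization and conversion to optimized matrix math, improving the network's latency, throughput, and efficiency.
In short, put plainly, a trained model converted to TRT can run inference efficiently on NVIDIA GPUs, which is an advantage in real engineering applications.

First convert the pth file to onnx: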

# pth->onnx->trtexec
# (optional) Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime
import torch,os
from models.squeezenet_caffe import squeezenet_caffe

batch_size = 1    # just a random number

current_dir=os.path.dirname(os.path.abspath(__file__)) # get the current directory
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Use the detected device instead of a hard-coded .cuda()
model = squeezenet_caffe().to(device)

model_path='/data/cch/pytorch-cifar100-master/checkpoint/squeezenet_caffe/Monday_04_September_2023_11h_48m_33s/squeezenet_caffe-297-best.pth'  # cloth
state_dict = torch.load(model_path, map_location=device)
print(1)
# mew_state_dict = OrderedDict()
model_dict = model.state_dict()
# Keep only the weights whose names exist in the model
pretrained_dict = {k: v for k, v in state_dict.items() if (k in model_dict and 'fc' not in k)}
model_dict.update(pretrained_dict)
print(2)
model.load_state_dict(model_dict)
model.eval()
print(3)
# output = model(data)

# Input to the model
x = torch.randn(batch_size, 3, 100, 100, requires_grad=True)
x = x.to(device)
torch_out = model(x)

# Export the model
torch.onnx.export(model,               # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "/data/cch/pytorch-cifar100-master/checkpoint/squeezenet_caffe/Monday_04_September_2023_11h_48m_33s/squeezenet_caffe-297-best.onnx",   # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                'output' : {0 : 'batch_size'}})

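You only need to change the input and output paths and the input size.
Next comes onnx to trt. For this you first need to install and set up the TensorRT environment yourself (the setup can be a bit involved and requires compiling; I will write a separate detailed guide when I have time). Then run this command in the bin directory of the TensorRT project: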

./trtexec --onnx=/data/.../best.onnx --saveEngine=/data.../best.trt --workspace=6000

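TensorRT can provide a workspace as temporary storage for each layer while it executes. This space is shared between layers to reduce GPU memory usage (the unit is MiB). For the underlying principle, see this article.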

Forward inference

The code is as follows:

# Dynamic-shape inference
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import torchvision.transforms as transforms
from PIL import Image


def load_engine(engine_path):
    # TRT_LOGGER = trt.Logger(trt.Logger.WARNING)  # INFO
    TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
    print('---')
    print(trt.Runtime(TRT_LOGGER))
    print('---')
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


# 2. Read the data and process it into a shape that matches the network input;
#    extra preprocessing can be added here
def get_test_transform():
    return transforms.Compose([
        transforms.Resize([100, 100]),
        transforms.ToTensor(),
        # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        transforms.Normalize(mean=[0.4796262, 0.4549252, 0.43396652], std=[0.27888104, 0.28492442, 0.27168077])
    ])


image = Image.open('/data/.../dog.jpg')
image = get_test_transform()(image)
image = image.unsqueeze_(0) # -> NCHW, 1,3,100,100
print("input img mean {} and std {}".format(image.mean(), image.std()))
image =  np.array(image)

path = '/data/.../squeezenet_caffe-297-best.trt'
# 1. Load the model and create the execution context
engine = load_engine(path)
print(engine)
context = engine.create_execution_context()
context.active_optimization_profile = 0

# 3. Allocate memory and copy the data from CPU to GPU
# With dynamic shapes the input shape must be set on every run. Binding 0 is the input;
# the output index depends on the network and can be one of 0,1,2,3... for a given head.
context.set_binding_shape(0, image.shape)
d_input = cuda.mem_alloc(image.nbytes)  # allocate device memory for the input
output_shape = context.get_binding_shape(1)
buffer = np.empty(output_shape, dtype=np.float32)
d_output = cuda.mem_alloc(buffer.nbytes)  # allocate device memory for the output
cuda.memcpy_htod(d_input, image)
bindings = [d_input, d_output]

# 4. Run inference and copy the result from GPU back to CPU
context.execute_v2(bindings)  # can be run asynchronously or synchronously
cuda.memcpy_dtoh(buffer, d_output)
output = buffer.reshape(output_shape)
y_pred_binary = np.argmax(output, axis=1)
print(y_pred_binary[0])
