💡💡💡 Exclusive improvements in this article: we first reproduce the integration of EMA into RT-DETR, then combine it with different modules in three ways: 1) combined with RepC3; 2) used directly as an attention layer at different positions in the network; 3) combined with HGBlock.
At least one of these improvements is likely to suit your dataset, help you gain accuracy points, and support your own innovation.
Recommendation: five stars
RT-DETR Magician column:
https://blog.csdn.net/m0_63774211/category_12497375.html
Creative modifications of RT-DETR
🚀🚀🚀 Bringing innovations from cutting-edge top-conference papers into RT-DETR
🍉🍉🍉 Built on ultralytics, integrating seamlessly with YOLO
1. RT-DETR Introduction
Paper: https://arxiv.org/pdf/2304.08069.pdf
RT-DETR (Real-Time DEtection TRansformer) is a real-time end-to-end detector based on the DETR architecture that achieves SOTA performance in both speed and accuracy.
Why it was needed:
A major weakness of YOLO detectors is the NMS post-processing step, which is hard to optimize and not robust, so it adds latency to the detector. To avoid this, the authors turned to DETR, a Transformer-based end-to-end detector that needs no NMS. However, DETR-family detectors are much slower than YOLO-family ones, so being "NMS-free" had not yet translated into a speed advantage. This motivated the exploration of a real-time end-to-end detector: a brand-new real-time detector built on DETR's architecture that removes, at the root, the latency NMS imposes on real-time detectors.
RT-DETR is the first real-time end-to-end object detector. Specifically, the authors design an efficient hybrid encoder that processes multi-scale features by decoupling intra-scale interaction from cross-scale fusion, and propose IoU-aware query selection to improve the initialization of decoder queries. In addition, RT-DETR supports flexibly adjusting inference speed by using different numbers of decoder layers, without retraining, which helps the practical deployment of real-time detectors. RT-DETR-L achieves 53.0% AP on COCO val2017 at 114 FPS on a T4 GPU, and RT-DETR-X achieves 54.8% AP at 74 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. RT-DETR-R50 achieves 53.1% AP at 108 FPS and RT-DETR-R101 achieves 54.3% AP at 74 FPS, surpassing all DETR detectors with the same backbone in accuracy.
2. EMA Introduction
Paper: https://arxiv.org/abs/2305.13563v1
Accepted: ICASSP 2023
Modeling cross-channel relationships via channel dimensionality reduction can have side effects on the extraction of deep visual representations. The paper proposes a novel Efficient Multi-scale Attention (EMA) module. To preserve the information in each channel while reducing computational overhead, it reshapes part of the channels into the batch dimension and groups the channel dimension into multiple sub-features, so that spatial semantic features are evenly distributed within each feature group.
The paper further proposes a cross-spatial learning method and designs a multi-scale parallel sub-network to model both short- and long-range dependencies:
1) A general approach that reshapes part of the channel dimension into the batch dimension, avoiding any form of dimensionality reduction through generic convolutions.
2) Beyond building local cross-channel interactions within each parallel sub-network without channel reduction, the output feature maps of the two parallel sub-networks are fused through cross-spatial learning.
3) Compared with CBAM, NAM [16], SA, ECA, and CA, EMA not only achieves better results but is also more efficient in terms of required parameters.
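The channel-to-batch reshaping in 1) can be illustrated in a few lines (a minimal demo, assuming PyTorch; the group count of 8 is an arbitrary example, matching EMA's default factor):

```python
import torch

x = torch.randn(2, 64, 32, 32)           # (B, C, H, W) feature map
g = 8                                     # number of channel groups
# fold the groups into the batch dimension: (B*G, C//G, H, W)
xg = x.reshape(2 * g, 64 // g, 32, 32)
print(xg.shape)                           # torch.Size([16, 8, 32, 32])
```

Each group is then processed as if it were an independent sample, so no 1×1 reduction convolution is needed to keep the per-group channel cost low.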
3. Adding EMA to RT-DETR
3.1 Create ultralytics/nn/attention/EMA.py
For the full code, see:
RT-DETR step-by-step tutorial: how to add attention mechanisms at different positions in the network for innovative optimization, with EMA attention as a case study - CSDN blog
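The linked post contains the exact file. For reference, here is a self-contained sketch of the EMA module written from the paper's description (grouped channels, a 1×1 directional-pooling branch and a 3×3 branch fused by cross-spatial learning); treat it as an illustration that may differ in details from the tutorial's code:

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Efficient Multi-scale Attention sketch; `factor` = channel groups folded into batch."""
    def __init__(self, channels, factor=8):
        super().__init__()
        assert channels % factor == 0, "channels must be divisible by factor"
        self.groups = factor
        c = channels // factor
        self.softmax = nn.Softmax(dim=-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))        # global descriptor
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height
        self.gn = nn.GroupNorm(c, c)
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        g = self.groups
        gx = x.reshape(b * g, c // g, h, w)            # channels -> batch, no reduction conv
        # 1x1 branch: directional pooling + shared 1x1 conv, then gate the group features
        x_h = self.pool_h(gx)                          # (b*g, c/g, h, 1)
        x_w = self.pool_w(gx).permute(0, 1, 3, 2)      # (b*g, c/g, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(gx * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch captures a larger local context
        x2 = self.conv3x3(gx)
        # cross-spatial learning: each branch's global descriptor attends to the other branch
        x11 = self.softmax(self.agp(x1).reshape(b * g, -1, 1).permute(0, 2, 1))
        x12 = x2.reshape(b * g, c // g, -1)
        x21 = self.softmax(self.agp(x2).reshape(b * g, -1, 1).permute(0, 2, 1))
        x22 = x1.reshape(b * g, c // g, -1)
        weights = (torch.matmul(x11, x12) + torch.matmul(x21, x22)).reshape(b * g, 1, h, w)
        return (gx * weights.sigmoid()).reshape(b, c, h, w)
```

The module is shape-preserving, which is what allows it to be dropped into the YAML configurations below without disturbing the surrounding layers.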
3.3 Combining EMA_attention with RT-DETR
3.3.1 Combining with RepC3
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1

  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2

  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3

  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, EMA_attentionC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, EMA_attentionC3, [256]] # X3 (21), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, EMA_attentionC3, [256]] # F4 (24), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, EMA_attentionC3, [256]] # F5 (27), pan_blocks.1

  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
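Here `EMA_attentionC3` replaces `RepC3` in every fpn/pan block. Its exact definition is in the linked post; one plausible construction, assumed here, is a C3-style block whose output is refined by an attention gate. For brevity this sketch uses a simple SE-style channel gate as a stand-in for the full EMA module (swap in the real EMA from the file created in Section 3.1 in practice):

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """SE-style stand-in for EMA: replace with the real EMA module in practice."""
    def __init__(self, c):
        super().__init__()
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)

class EMA_attentionC3(nn.Module):
    """Hypothetical RepC3 replacement: 3-conv C3-style block with attention on the output."""
    def __init__(self, c1, c2, n=3):
        super().__init__()
        self.cv1 = nn.Conv2d(c1, c2, 1, bias=False)   # main path projection
        self.cv2 = nn.Conv2d(c1, c2, 1, bias=False)   # shortcut projection
        self.m = nn.Sequential(
            *(nn.Conv2d(c2, c2, 3, padding=1, bias=False) for _ in range(n)))
        self.att = ChannelGate(c2)

    def forward(self, x):
        # residual-style fusion as in RepC3, then attention refinement
        return self.att(self.m(self.cv1(x)) + self.cv2(x))
```

Because the attention sits inside the block, the block's interface (`c1` in, `c2` out) matches RepC3's, so the YAML only needs the module name changed.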
3.3.2 Using EMA directly as an attention layer at different positions in the network
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1

  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2

  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3

  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [-1, 1, EMA_attention, [256]] # 13

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 15 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 17, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 18, Y4, lateral_convs.1
  - [-1, 1, EMA_attention, [256]] # 19

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 21 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (23), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]] # 24, downsample_convs.0
  - [[-1, 19], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (26), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]] # 27, downsample_convs.1
  - [[-1, 13], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (29), pan_blocks.1

  - [[23, 26, 29], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
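When `EMA_attention` is inserted as a standalone layer (layers 13 and 19 in this configuration), it must preserve the feature-map shape so that the surrounding `nn.Upsample` and `Concat` layers are unaffected; note also that the later `Concat` references (`[-1, 19]` and `[-1, 13]`) now point at the attention outputs rather than the lateral convs. A quick sanity check of that shape contract, using a placeholder gate as a stand-in for the actual EMA_attention module:

```python
import torch
import torch.nn as nn

class PlaceholderAttention(nn.Module):
    """Stand-in for EMA_attention: any drop-in attention layer must map
    (B, C, H, W) -> (B, C, H, W) with the shape unchanged."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(1, channels, 1, 1))  # learnable per-channel gate

    def forward(self, x):
        return x * self.gate.sigmoid()

x = torch.randn(1, 256, 40, 40)        # e.g. the 256-channel map entering layer 13
y = PlaceholderAttention(256)(x)
assert y.shape == x.shape              # safe to place before Upsample/Concat
```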
3.3.3 Combining EMA with HGBlock
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1

  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2

  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3

  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock_EMA_attention, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1

  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
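`HGBlock_EMA_attention` replaces the last HGBlock of the backbone (stage 4). Its definition is in the linked post; a plausible sketch, assumed here, follows the HGBlock pattern (n stacked convolutions whose outputs are concatenated, then squeeze/excitation 1×1 convs) with an attention gate on the result. The constructor argument order `(cm, c2, k)` mirrors the YAML args above, and again a simple channel gate stands in for the full EMA module:

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """SE-style stand-in for EMA: replace with the real EMA module in practice."""
    def __init__(self, c):
        super().__init__()
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)

class HGBlock_EMA_attention(nn.Module):
    """Hypothetical HGBlock variant: concat of n conv outputs, squeeze/excite, then attention."""
    def __init__(self, c1, cm, c2, k=5, n=6):
        super().__init__()
        self.m = nn.ModuleList(
            nn.Conv2d(c1 if i == 0 else cm, cm, k, padding=k // 2, bias=False)
            for i in range(n))
        self.sc = nn.Conv2d(c1 + n * cm, c2 // 2, 1)  # squeeze conv
        self.ec = nn.Conv2d(c2 // 2, c2, 1)           # excitation conv
        self.att = ChannelGate(c2)                    # attention on the block output

    def forward(self, x):
        y = [x]
        for m in self.m:
            y.append(m(y[-1]))                        # chain convs, keep every output
        return self.att(self.ec(self.sc(torch.cat(y, 1))))
```

With stage-4 arguments (`c1=1024`, `cm=384`, `c2=2048`, `k=5`), the block consumes the 1024-channel P5 map and emits the 2048-channel map expected by the head.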
4. Summary
This article reproduced the integration of EMA into RT-DETR and combined it with different modules in three ways:
1) combined with RepC3;
2) used directly as an attention layer at different positions in the network;
3) combined with HGBlock.