💡💡💡 Exclusive improvements in this post: we first reproduce the integration of EMA into RT-DETR and then combine it with different modules for novel improvements: 1) fused with RepC3; 2) inserted directly as an attention layer at different positions in the network; 3) efficiently fused with HGBlock.
One of these improvements is bound to suit your dataset, helping you gain accuracy points while adding novelty.
Recommendation: five stars
About the RT-DETR Magician column:
https://blog.csdn.net/m0_63774211/category_12497375.html
Creative modifications and innovations for RT-DETR
🚀🚀🚀 Bringing cutting-edge top-conference innovations into RT-DETR
🍉🍉🍉 Built on ultralytics, seamlessly integrated with YOLO
1. Introduction to RT-DETR
Paper: https://arxiv.org/pdf/2304.08069.pdf
RT-DETR (Real-Time DEtection TRansformer) is a real-time end-to-end detector built on the DETR architecture that achieves state-of-the-art performance in both speed and accuracy.
Why it was proposed:
A major weakness of YOLO detectors is their reliance on NMS post-processing, which is usually hard to optimize and not robust enough, so it adds latency to the detection pipeline. To avoid this, the authors turned to DETR, a Transformer-based end-to-end object detector that requires no NMS. However, DETR-family detectors are much slower than YOLO-family detectors, so being NMS-free did not translate into a speed advantage. This motivated the exploration of a real-time end-to-end detector: designing a brand-new real-time detector on top of the strong DETR architecture and removing, at the root, the latency that NMS imposes on real-time detectors.
RT-DETR is the first real-time end-to-end object detector. Specifically, the authors design an efficient hybrid encoder that processes multi-scale features efficiently by decoupling intra-scale interaction from cross-scale fusion, and propose an IoU-aware query selection mechanism to improve the initialization of decoder queries. In addition, RT-DETR allows the inference speed to be adjusted flexibly by using a different number of decoder layers without retraining, which makes real-time detectors easier to deploy in practice. RT-DETR-L achieves 53.0% AP on COCO val2017 at 114 FPS on a T4 GPU, and RT-DETR-X achieves 54.8% AP at 74 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. RT-DETR-R50 achieves 53.1% AP at 108 FPS and RT-DETR-R101 achieves 54.3% AP at 74 FPS, surpassing all DETR detectors with the same backbone in accuracy.
2. Introduction to EMA
Paper: https://arxiv.org/abs/2305.13563v1
Accepted at: ICASSP 2023
Modeling cross-channel relationships through channel dimensionality reduction can hurt the extraction of deep visual representations. The paper proposes a novel Efficient Multi-scale Attention (EMA) module. To retain the information in every channel while reducing computational overhead, it reshapes part of the channels into the batch dimension and groups the channel dimension into multiple sub-features, so that spatial semantic features are evenly distributed within each feature group.
The paper further proposes a new cross-spatial learning method and designs a multi-scale parallel sub-network to capture both short- and long-range dependencies (see the code sketch after the list below):
1) A general approach reshapes part of the channel dimension into the batch dimension, avoiding any form of dimensionality reduction via generic convolutions.
2) Besides building local cross-channel interactions within each parallel sub-network without channel reduction, the output feature maps of the two parallel sub-networks are fused through a cross-spatial learning method.
3) Compared with CBAM, NAM [16], SA, ECA, and CA, EMA not only achieves better results but is also more efficient in terms of required parameters.
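To make the structure concrete, here is a minimal PyTorch sketch of an EMA block following the description above (channel groups folded into the batch dimension, a 1x1 coordinate-attention-style branch and a 3x3 branch in parallel, then cross-spatial fusion). It is an illustrative re-implementation with the grouping factor as a tunable hyperparameter, not the exact code shipped with the column; the class is named EMA_attention to match the module name used in the yaml configs later on.

import torch
from torch import nn


class EMA_attention(nn.Module):
    """Efficient Multi-scale Attention (illustrative sketch). Output shape equals input shape."""

    def __init__(self, channels, factor=8):
        super().__init__()
        self.groups = factor
        assert channels % self.groups == 0
        c = channels // self.groups
        self.softmax = nn.Softmax(dim=-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))        # global descriptor for cross-spatial weights
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> (H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (1, W)
        self.gn = nn.GroupNorm(c, c)
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        g = self.groups
        gx = x.reshape(b * g, -1, h, w)                           # fold groups into the batch dim: (b*g, c/g, h, w)
        # 1x1 branch: directional pooling + shared 1x1 conv + sigmoid gates, no channel reduction
        xh = self.pool_h(gx)                                      # (b*g, c/g, h, 1)
        xw = self.pool_w(gx).permute(0, 1, 3, 2)                  # (b*g, c/g, w, 1)
        hw = self.conv1x1(torch.cat([xh, xw], dim=2))
        xh, xw = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(gx * xh.sigmoid() * xw.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch: local multi-scale context
        x2 = self.conv3x3(gx)
        # cross-spatial learning: each branch's global descriptor attends over the other branch
        x11 = self.softmax(self.agp(x1).reshape(b * g, -1, 1).permute(0, 2, 1))  # (b*g, 1, c/g)
        x12 = x2.reshape(b * g, c // g, -1)                                      # (b*g, c/g, h*w)
        x21 = self.softmax(self.agp(x2).reshape(b * g, -1, 1).permute(0, 2, 1))
        x22 = x1.reshape(b * g, c // g, -1)
        weights = (torch.matmul(x11, x12) + torch.matmul(x21, x22)).reshape(b * g, 1, h, w)
        return (gx * weights.sigmoid()).reshape(b, c, h, w)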
3. Adding EMA to RT-DETR
3.1 Create ultralytics/nn/attention/EMA.py
For the full code, see:
RT-DETR hands-on tutorial: how to add an attention mechanism at different positions in the network for creative optimization, with EMA attention as the example - CSDN blog
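If you write the file yourself, it can simply contain the EMA_attention class sketched in Section 2. As a quick sanity check (assuming ultralytics/nn/attention/ also gets an __init__.py so the new package is importable), the module builds and preserves the input shape, which is why it can be dropped in at arbitrary points of the network:

import torch
from ultralytics.nn.attention.EMA import EMA_attention  # file created in step 3.1 (path as assumed above)

x = torch.randn(2, 256, 40, 40)     # dummy P4-sized feature map
print(EMA_attention(256)(x).shape)  # torch.Size([2, 256, 40, 40]) -- same shape in, same shape out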
3.3 How to combine EMA_attention with RT-DETR for novel improvements
3.3.1 Combining with RepC3
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1
  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2
  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, EMA_attentionC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, EMA_attentionC3, [256]] # X3 (21), fpn_blocks.1
  - [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, EMA_attentionC3, [256]] # F4 (24), pan_blocks.0
  - [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, EMA_attentionC3, [256]] # F5 (27), pan_blocks.1
  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
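The config above swaps every RepC3 in the neck for EMA_attentionC3. That class is not reproduced here; one straightforward way to define it (my assumption, the column's exact definition may differ) is to subclass ultralytics' RepC3 and refine its fused output with EMA:

from ultralytics.nn.modules.block import RepC3
from ultralytics.nn.attention.EMA import EMA_attention  # path assumed in step 3.1


class EMA_attentionC3(RepC3):
    """RepC3 whose fused output is refined by EMA attention (illustrative sketch)."""

    def __init__(self, c1, c2, n=3, e=1.0):
        super().__init__(c1, c2, n, e)
        self.ema = EMA_attention(c2)  # channel count matches the block's output

    def forward(self, x):
        return self.ema(super().forward(x))

For the name EMA_attentionC3 to be resolvable from the yaml, it also has to be imported and handled in ultralytics/nn/tasks.py (the channel bookkeeping in parse_model), which is what the linked step-by-step post walks through.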
3.3.2 Inserting it directly as an attention layer at different positions in the network
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1
  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2
  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [-1, 1, EMA_attention, [256]] # 13
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 15 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 17, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 18, Y4, lateral_convs.1
  - [-1, 1, EMA_attention, [256]] # 19
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 21 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (23), fpn_blocks.1
  - [-1, 1, Conv, [256, 3, 2]] # 24, downsample_convs.0
  - [[-1, 19], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (26), pan_blocks.0
  - [-1, 1, Conv, [256, 3, 2]] # 27, downsample_convs.1
  - [[-1, 13], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (29), pan_blocks.1
  - [[23, 26, 29], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
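Here the attention layer sits right after each lateral conv (layers 13 and 19), and the later Concat layers point at the EMA outputs instead of the raw lateral convs. Whichever variant you pick, training follows the usual ultralytics workflow; the yaml filename below is just a placeholder for wherever you save the modified config, and it assumes EMA_attention has already been registered in ultralytics/nn/tasks.py as described in the linked post:

from ultralytics import RTDETR

# placeholder filename for the config above
model = RTDETR("rtdetr-l-EMA_attention.yaml")
model.info()  # model summary (layers, parameters, GFLOPs)
model.train(data="coco128.yaml", epochs=100, imgsz=640, batch=8)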
3.3.3 Efficient combination with HGBlock
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1
  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2
  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock_EMA_attention, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
  - [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
  - [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
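This variant only touches the last backbone stage, replacing the final HGBlock with HGBlock_EMA_attention. Again the class itself is not listed here; a simple way to build it (my assumption, matching the name used in the yaml and the HGBlock internals of current ultralytics releases) is to subclass HGBlock and gate its output with EMA:

import torch
from torch import nn
from ultralytics.nn.modules.block import HGBlock
from ultralytics.nn.attention.EMA import EMA_attention  # path assumed in step 3.1


class HGBlock_EMA_attention(HGBlock):
    """HGBlock whose output is refined by EMA attention (illustrative sketch)."""

    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        super().__init__(c1, cm, c2, k, n, lightconv, shortcut, act)
        self.ema = EMA_attention(c2)

    def forward(self, x):
        y = [x]
        y.extend(m(y[-1]) for m in self.m)               # stacked (Light)Conv features, as in the parent HGBlock
        y = self.ema(self.ec(self.sc(torch.cat(y, 1))))  # squeeze/excitation convs, then EMA refinement
        return y + x if self.add else y

Keeping HGBlock's constructor signature means the yaml arguments ([384, 2048, 5, True, False]) can be forwarded the same way parse_model already handles HGBlock, provided the new class is added to that branch in ultralytics/nn/tasks.py.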
4. Summary
This post reproduced the integration of EMA into RT-DETR and combined it with different modules for novel improvements:
1) combined with RepC3;
2) inserted directly as an attention layer at different positions in the network;
3) efficiently combined with HGBlock.