The code in this series is very tightly encapsulated, which makes it not very friendly for secondary development, although some adjustments are still possible.
Model export
I have tried three approaches; all of them can export an ONNX model.
1. Using YOLOv8
Source code: ultralytics\yolo\engine\exporter.py
(dynamic, non-fixed input size)
yolo export model=path/to/best.pt format=onnx dynamic=True
2. Using export.py from YOLOv5
However, the attempt_load_weights step must use the YOLOv8 version.
3. Using torch directly
```python
from ultralytics.nn.tasks import attempt_load_weights

class Demo(nn.Module):
    def __init__(self, model=None):
        super(Demo, self).__init__()
        self.model = YOUR_PROCESS(model, 0, 255, False)

    def forward(self, img):
        return self.model(img)[0]

model = attempt_load_weights(weights, device=0, inplace=True, fuse=True)
model = Demo(model)
model.to(device).eval()
# ...... (steps omitted)
torch.onnx.export(self.model, img, output_path, verbose=False, opset_version=11,
                  input_names=['images'], output_names=['output'],
                  dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'}})
```
- Both Demo and YOUR_PROCESS here need to be based on nn.Module; YOUR_PROCESS wraps any extra processing around the model.
- If the content of YOUR_PROCESS is used for preprocessing, remember not to compute gradients for it: disable gradient tracking for the operations and put the whole layer into evaluation mode, as follows:
```python
with torch.no_grad():
    # Operations executed in this block are not recorded for autograd
    output = model(input)

self.conv_xx.eval()
# self.conv_xx (and the layers tied to it) will now use evaluation-mode behavior
output = self.conv_xx(input)
```
- As above, the attempt_load_weights step must use YOLOv8's version. A quick sanity check of the exported dynamic-shape model is sketched below.
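To verify that the dynamic axes actually work, here is a minimal sketch using onnxruntime. It assumes onnxruntime is installed, the model was saved as best.onnx with dynamic=True, and the 'images'/'output' names match the export call above:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('best.onnx', providers=['CPUExecutionProvider'])
# Thanks to dynamic axes, a non-default size works as long as
# height and width are multiples of the model stride (32 here)
img = np.random.rand(1, 3, 480, 640).astype(np.float32)
outputs = session.run(['output'], {'images': img})
print(outputs[0].shape)
```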
multi-scale
Multi-scale training can be brought back by overriding the trainer's preprocess_batch hook, which resizes each batch to a random size snapped to a multiple of the grid stride gs:

```python
def preprocess_batch(self, batch, imgsz_train, gs):
    """Custom preprocessing of model inputs for multi-scale training."""
    batch['img'] = batch['img'].to(self.device, non_blocking=True).float() / 255
    # random size in [0.5*imgsz, 1.5*imgsz], snapped down to a multiple of gs
    sz = random.randrange(int(imgsz_train * 0.5), int(imgsz_train * 1.5) + gs) // gs * gs
    sf = sz / max(batch['img'].shape[2:])  # scale factor
    if sf != 1:
        # new shape (stretched to a gs-multiple)
        ns = [math.ceil(x * sf / gs) * gs for x in batch['img'].shape[2:]]
        batch['img'] = nn.functional.interpolate(batch['img'], size=ns, mode='bilinear', align_corners=False)
    return batch
```
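A quick illustration of the size math, assuming imgsz_train=640 and gs=32 (so sz is drawn from roughly [320, 960] and always lands on a multiple of 32):

```python
import math
import random

imgsz, gs = 640, 32
sz = random.randrange(int(imgsz * 0.5), int(imgsz * 1.5) + gs) // gs * gs
sf = sz / 640  # scale factor for a 640x640 batch
ns = [math.ceil(x * sf / gs) * gs for x in (640, 640)]  # new height/width
print(sz, sf, ns)
```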
Loss
YOLO v8 has only two kinds of loss: because it is anchor-free, no objectness loss is needed, leaving just the classification loss and the loss on the predicted boxes.
- YOLO v5 computes its box loss with CIoU, while YOLO v8 adds a loss that better suits anchor-free detection; this step therefore uses two losses together to optimize the boxes: CIoU loss + DFL. DFL comes from the paper Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection (the DFL paper).
- VFL is also implemented in YOLO v8 but is not used. This loss likewise targets anchor-free detectors and comes from VarifocalNet: An IoU-aware Dense Object Detector (the VFL paper).
- For the classification loss, YOLO v5 uses focal loss (built on top of BCE), while YOLO v8 keeps using plain BCEWithLogitsLoss:
```python
# cls loss
# loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum  # VFL way
loss[1] = self.bce(pred_scores, target_scores.to(dtype)).sum() / target_scores_sum  # BCE

# bbox loss
if fg_mask.sum():
    loss[0], loss[2] = self.bbox_loss(pred_distri, pred_bboxes, anchor_points, target_bboxes,
                                      target_scores, target_scores_sum, fg_mask)

loss[0] *= self.hyp.box  # box gain
loss[1] *= self.hyp.cls  # cls gain
loss[2] *= self.hyp.dfl  # dfl gain

return loss.sum() * batch_size, loss.detach()  # loss(box, cls, dfl)
```
VFL
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VarifocalLoss(nn.Module):
    """Varifocal loss by Zhang et al. https://arxiv.org/abs/2008.13367."""

    def __init__(self):
        """Initialize the VarifocalLoss class."""
        super().__init__()

    def forward(self, pred_score, gt_score, label, alpha=0.75, gamma=2.0):
        """Computes varifocal loss."""
        weight = alpha * pred_score.sigmoid().pow(gamma) * (1 - label) + gt_score * label
        with torch.cuda.amp.autocast(enabled=False):
            loss = (F.binary_cross_entropy_with_logits(pred_score.float(), gt_score.float(), reduction='none') *
                    weight).sum()
        return loss
```
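A hypothetical usage sketch (the shapes and the 0.95 threshold are made up for illustration): pred_score holds raw logits, label marks the positive classes, and gt_score carries the IoU-aware soft target on positives only:

```python
import torch

vfl = VarifocalLoss()
pred_score = torch.randn(8, 80)              # raw logits, [num_anchors, num_classes]
label = (torch.rand(8, 80) > 0.95).float()   # sparse positive mask
gt_score = torch.rand(8, 80) * label         # IoU as soft target, zero on negatives
print(vfl(pred_score, gt_score, label))
```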
DFL
(Thanks to the plain-language write-up 大白話 Generalized Focal Loss, linked below.)
DFL comes from GFL (Generalized Focal Loss).
GFL mainly solves two big problems:
- the classification score and the IoU/centerness score are trained and tested inconsistently;
- the representation used for bbox regression is not flexible enough to model the uncertainty of complex scenes.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# helper imports, following the old ultralytics.yolo layout this post references
from ultralytics.yolo.utils.metrics import bbox_iou
from ultralytics.yolo.utils.tal import bbox2dist

class BboxLoss(nn.Module):

    def __init__(self, reg_max, use_dfl=False):
        """Initialize the BboxLoss module with regularization maximum and DFL settings."""
        super().__init__()
        self.reg_max = reg_max
        self.use_dfl = use_dfl

    def forward(self, pred_dist, pred_bboxes, anchor_points, target_bboxes, target_scores,
                target_scores_sum, fg_mask):
        """IoU loss."""
        weight = torch.masked_select(target_scores.sum(-1), fg_mask).unsqueeze(-1)
        iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, CIoU=True)
        loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum

        # DFL loss
        if self.use_dfl:
            target_ltrb = bbox2dist(anchor_points, target_bboxes, self.reg_max)
            loss_dfl = self._df_loss(pred_dist[fg_mask].view(-1, self.reg_max + 1), target_ltrb[fg_mask]) * weight
            loss_dfl = loss_dfl.sum() / target_scores_sum
        else:
            loss_dfl = torch.tensor(0.0).to(pred_dist.device)

        return loss_iou, loss_dfl

    @staticmethod
    def _df_loss(pred_dist, target):
        """Return sum of left and right DFL losses.

        Distribution Focal Loss (DFL) proposed in Generalized Focal Loss
        https://ieeexplore.ieee.org/document/9792391
        """
        tl = target.long()  # target left
        tr = tl + 1         # target right
        wl = tr - target    # weight left
        wr = 1 - wl         # weight right
        return (F.cross_entropy(pred_dist, tl.view(-1), reduction='none').view(tl.shape) * wl +
                F.cross_entropy(pred_dist, tr.view(-1), reduction='none').view(tl.shape) * wr).mean(-1, keepdim=True)
```
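As a quick sanity check of the interpolation weights in _df_loss: a continuous target such as 2.7 is split between its two neighbouring integer bins, with the closer bin receiving the larger weight:

```python
import torch

target = torch.tensor([2.7])
tl = target.long()   # left bin: 2
tr = tl + 1          # right bin: 3
wl = tr - target     # weight on the left bin: 0.3
wr = 1 - wl          # weight on the right bin: 0.7
print(tl.item(), tr.item(), wl.item(), wr.item())
```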
Problem 1
The inconsistency has two aspects:
Aspect 1: classification, objectness, and IoU are each trained separately, e.g. in FCOS (mentioned in the paper). Looking at how FCOS computes its loss, the approach is similar to YOLO's.
The GFL authors do not endorse computing the loss this way; they argue it is not end-to-end enough, since the branches are trained independently but their scores are combined at test time.
Aspect 2: thanks to focal loss, the classification branch can cope with class imbalance among the samples, but the IoU branch does not account for sample imbalance at all. Multiplying an "unfair" IoU score with a reasonably "fair" classification score can therefore inflate the result (and we want samples that score well on both IoU and classification to rank first as positives).
Problem 2
The authors argue that modeling the regression target as a Dirac delta distribution (as in ordinary training), or assuming a Gaussian prior, is not enough for general-purpose scenarios. The distribution they propose instead can exhibit clear characteristics in its shape.
In the paper's figure, the sharp, confident regions are marked with purple arrows and the ambiguous, smooth regions with red arrows; in the ambiguous cases, the prediction ends up quite far from the ground truth.
Problem 1: solution
For aspect 1, QFL uses a classification-IoU joint representation (used directly as the NMS score, rather than multiplying the two as before). For aspect 2, focal loss is folded into the loss formula, as written below.
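For reference, the QFL formula from the GFL paper, where $\sigma$ is the predicted score, $y \in [0, 1]$ is the IoU-aware soft label, and $\beta$ (2 in the paper) controls the focal down-weighting:

$$\mathrm{QFL}(\sigma) = -\left|y-\sigma\right|^{\beta}\,\big[(1-y)\log(1-\sigma)+y\log(\sigma)\big]$$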
Problem 2: solution
The second problem is solved with a general distribution. The data still suffers from positive/negative imbalance, but since this is object detection we only care about the IoU of positive samples, so the authors decided to consider positives only; that is why DFL uses nothing more than cross entropy.
It is presumably also called focal because it raises the probabilities of the two labels nearest to the target, $y_i$ and $y_{i+1}$ with $y_i \le y \le y_{i+1}$, in order to focus quickly on the correct label $y$.
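The resulting formula from the paper, where $S_i$ and $S_{i+1}$ are the softmax probabilities of the two neighbouring bins:

$$\mathrm{DFL}(S_i, S_{i+1}) = -\big[(y_{i+1}-y)\log(S_i)+(y-y_i)\log(S_{i+1})\big]$$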
Finally
Overall, the authors solve all the problems above with QFL & DFL. But what about the GFL that keeps being emphasized?
In the paper, the authors unify QFL and DFL: GFL merges the ideas of both into a single formula, given below:
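The unified form from the paper, where $p_{y_l}$ and $p_{y_r}$ are the predicted probabilities of the two values $y_l$ and $y_r$ bracketing the continuous target $y$:

$$\mathrm{GFL}(p_{y_l}, p_{y_r}) = -\left|y-(y_l p_{y_l}+y_r p_{y_r})\right|^{\beta}\,\big[(y_r-y)\log(p_{y_l})+(y-y_l)\log(p_{y_r})\big]$$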
https://crossminds.ai/video/generalized-focal-loss-learning-qualified-and-distributed-bounding-boxes-for-dense-object-detection-606fdcaef43a7f2f827bf6f1/
https://paperswithcode.com/method/generalized-focal-loss
https://github.com/implus/GFocal
https://zhuanlan.zhihu.com/p/147691786
OTA
Distillation
DAMO-YOLO uses KD (knowledge distillation), while YOLO v6 uses self-distillation; a generic sketch of the idea follows.
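This is a minimal Hinton-style knowledge-distillation sketch, not the DAMO-YOLO or YOLO v6 implementation; the kd_loss helper and the temperature T=4.0 are illustrative assumptions. In YOLO v6's self-distillation, the teacher is simply a pretrained copy of the student itself.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation: match the teacher's softened distribution."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * T * T

# toy usage
student_logits = torch.randn(4, 80)
teacher_logits = torch.randn(4, 80)
print(kd_loss(student_logits, teacher_logits))
```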
https://www.youtube.com/watch?v=MvM9J1lj1a8
https://openaccess.thecvf.com/content_ICCV_2019/papers/Zhang_Be_Your_Own_Teacher_Improve_the_Performance_of_Convolutional_Neural_ICCV_2019_paper.pdf