[YOLO Series] YOLOv3 Code Walkthrough: Model and Loss Functions
Preface
The following is taken from notes I kept while studying artificial intelligence. I have organized my write-ups of the YOLO family of object-detection algorithms and am sharing them here for reference.
This post only annotates the key parts of the YOLOv3 code; readers who are not yet familiar with the basic code can look it up on their own.
If anything here is wrong, corrections are very welcome.
Downloads
YOLOv3 paper: YOLOv3: An Incremental Improvement
Recap
YOLO V1: [YOLO Series] YOLO V1 Paper Explained
YOLO V2: [YOLO Series] YOLO V2 Paper Explained
YOLO V3: [YOLO Series] YOLOv3 Paper Explained
Project Links
YOLOV3 Keras version: download link
YOLOV3 TensorFlow version: download link
YOLOV3 PyTorch version: download link
Gitee Repository
YOLOV3, all versions: yolov3 (all versions)
YOLO V3 Code Walkthrough
YOLO V3 Code Walkthrough (I): [YOLO Series] YOLOv3 Code Walkthrough (I): main script yolo_video.py
YOLO V3 Code Walkthrough (II): [YOLO Series] YOLOv3 Code Walkthrough (II): detection script yolo.py
YOLO V3 Code Walkthrough (III): [YOLO Series] YOLOv3 Code Walkthrough (III): training script train.py
This post is based on the Keras version.
Without further ado, let's get into the code.
I. Code Walkthrough
1. Defining the convolution functions
# Module-level imports from yolo3/model.py that the snippets below rely on
from functools import wraps

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.regularizers import l2

from yolo3.utils import compose


@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    """Wrapper to set Darknet parameters for Convolution2D."""
    # Build a darknet_conv_kwargs dict carrying the 'kernel_regularizer' and 'padding' arguments
    darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
    # If strides=(2, 2) was passed in, use 'valid' padding; otherwise use 'same'
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2, 2) else 'same'
    # Merge the caller's kwargs into darknet_conv_kwargs
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)


def DarknetConv2D_BN_Leaky(*args, **kwargs):
    """Darknet Convolution2D followed by BatchNormalization and LeakyReLU."""
    # Force use_bias=False, since the convolution is followed by BatchNormalization
    no_bias_kwargs = {'use_bias': False}
    # Merge the caller's kwargs into no_bias_kwargs
    no_bias_kwargs.update(kwargs)
    # Return a composed block of DarknetConv2D, BN, and LeakyReLU with alpha=0.1,
    # i.e. the output is 0.1x the input for negative inputs and the input itself otherwise
    return compose(
        # The Conv2D layer
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        LeakyReLU(alpha=0.1))
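Both wrappers lean on the compose() helper from the repo's yolo3/utils.py, which chains layers left to right. It is not shown in this post, so here is a minimal sketch of what it looks like, for reference:

from functools import reduce

def compose(*funcs):
    """Compose arbitrarily many functions, evaluated left to right."""
    if funcs:
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
    else:
        raise ValueError('Composition of empty sequence not supported.')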
2. Defining the residual block function
def resblock_body(x, num_filters, num_blocks):
    """A series of resblocks starting with a downsampling Convolution2D"""
    # Darknet uses left and top padding instead of 'same' mode
    # Zero-pad the input:
    # the first tuple (1, 0) pads the vertical (height) axis: 1 row of zeros on top, none on the bottom;
    # the second tuple (1, 0) pads the horizontal (width) axis: 1 column of zeros on the left, none on the right
    x = ZeroPadding2D(((1, 0), (1, 0)))(x)
    # A DarknetConv2D_BN_Leaky block: Conv2D(filters=num_filters, kernel_size=(3, 3), strides=(2, 2), padding='valid') + BN + LeakyReLU
    # strides=(2, 2) downsamples the feature map here, taking the place of a pooling layer
    x = DarknetConv2D_BN_Leaky(num_filters, (3, 3), strides=(2, 2))(x)
    # Residual blocks
    for i in range(num_blocks):
        y = compose(
            DarknetConv2D_BN_Leaky(num_filters // 2, (1, 1)),
            DarknetConv2D_BN_Leaky(num_filters, (3, 3)))(x)
        x = Add()([x, y])
    return x
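A quick sanity check on the downsampling: with a 416×416 input, ZeroPadding2D(((1, 0), (1, 0))) gives 417×417, and the 3×3 'valid' convolution with stride 2 then produces floor((417 − 3) / 2) + 1 = 208, exactly half the width and height, which is why no pooling layer is needed.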
3. Defining the darknet_body() function
def darknet_body(x):
    '''Darknet body having 52 Convolution2D layers'''
    # A DarknetConv2D_BN_Leaky block: Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same') + BN + LeakyReLU
    x = DarknetConv2D_BN_Leaky(32, (3, 3))(x)
    # Residual stages: (input, number of filters, number of residual blocks)
    x = resblock_body(x, 64, 1)
    x = resblock_body(x, 128, 2)
    x = resblock_body(x, 256, 8)
    x = resblock_body(x, 512, 8)
    x = resblock_body(x, 1024, 4)
    return x
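The "52 Convolution2D layers" in the docstring can be verified by counting: 1 initial conv, plus each resblock_body contributing 1 downsampling conv and 2 convs per block, so 1 + (1 + 2·1) + (1 + 2·2) + (1 + 2·8) + (1 + 2·8) + (1 + 2·4) = 1 + 3 + 5 + 17 + 17 + 9 = 52. (Together with the final connected layer of the classification network, this is where the name Darknet-53 comes from.)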
4. Defining the output-head network function
def make_last_layers(x, num_filters, out_filters):
    '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer'''
    x = compose(
        DarknetConv2D_BN_Leaky(num_filters, (1, 1)),
        DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
        DarknetConv2D_BN_Leaky(num_filters, (1, 1)),
        DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
        DarknetConv2D_BN_Leaky(num_filters, (1, 1)))(x)
    y = compose(
        DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
        DarknetConv2D(out_filters, (1, 1)))(x)
    return x, y
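Note that the two return values play different roles: x keeps num_filters channels and feeds the upsampling branch toward the next scale, while y is the actual detection output with out_filters channels. For COCO with 3 anchors per scale, out_filters = 3 × (80 + 5) = 255.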
5. Defining the function that produces the three output feature maps
def yolo_body(inputs, num_anchors, num_classes):
    """Create YOLO_V3 model CNN body in Keras."""
    darknet = Model(inputs, darknet_body(inputs))
    # Produce three output feature maps.
    # The final output computation: 6 Conv2D_BN_Leaky layers plus 1 linear Conv2D layer
    x, y1 = make_last_layers(darknet.output, 512, num_anchors * (num_classes + 5))
    # Apply a Conv2D_BN_Leaky layer and 2x upsampling to the deepest head,
    # then concatenate with the output of layer 152
    x = compose(
        DarknetConv2D_BN_Leaky(256, (1, 1)),
        UpSampling2D(2))(x)
    x = Concatenate()([x, darknet.layers[152].output])
    x, y2 = make_last_layers(x, 256, num_anchors * (num_classes + 5))
    # Apply a Conv2D_BN_Leaky layer and 2x upsampling again,
    # then concatenate with the output of layer 92
    x = compose(
        DarknetConv2D_BN_Leaky(128, (1, 1)),
        UpSampling2D(2))(x)
    x = Concatenate()([x, darknet.layers[92].output])
    x, y3 = make_last_layers(x, 128, num_anchors * (num_classes + 5))
    return Model(inputs, [y1, y2, y3])
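For reference, a minimal build sketch, assuming a 416×416 RGB input and COCO's 80 classes with 3 anchors per scale:

from keras.layers import Input

image_input = Input(shape=(416, 416, 3))
model = yolo_body(image_input, num_anchors=3, num_classes=80)
# The three outputs then have shapes
# y1: (None, 13, 13, 255), y2: (None, 26, 26, 255), y3: (None, 52, 52, 255)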
6. Defining the tiny model's output feature maps
def tiny_yolo_body(inputs, num_anchors, num_classes):
    """Create a Tiny YOLOv3 model in Keras, built from 8 conv blocks + 6 pooling layers
    + an upsampling branch (conv + upsampling) + 2 heads (2 conv blocks + 2 output convs),
    about 20 layers in total"""
    # Convolutional stack x1, applied to `inputs`: 5 DarknetConv2D_BN_Leaky blocks and 4 pooling layers
    x1 = compose(
        # DarknetConv2D_BN_Leaky: Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1), padding='same') + BN + LeakyReLU
        DarknetConv2D_BN_Leaky(16, (3, 3)),
        # Max pooling with a (2, 2) window and (2, 2) strides: the feature map shrinks 4x in area,
        # i.e. width and height each halve. padding='same' zero-pads the edges so the output size
        # is ceil(input_size / stride); with strides=(1, 1) it keeps the size unchanged
        MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
        DarknetConv2D_BN_Leaky(32, (3, 3)),
        MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
        DarknetConv2D_BN_Leaky(64, (3, 3)),
        MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
        DarknetConv2D_BN_Leaky(128, (3, 3)),
        MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
        DarknetConv2D_BN_Leaky(256, (3, 3)))(inputs)
    # Convolutional stack x2, applied to x1: 3 DarknetConv2D_BN_Leaky blocks and 2 pooling layers
    # (the second pooling layer uses strides=(1, 1), so it does not downsample)
    x2 = compose(
        MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
        DarknetConv2D_BN_Leaky(512, (3, 3)),
        MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='same'),
        DarknetConv2D_BN_Leaky(1024, (3, 3)),
        DarknetConv2D_BN_Leaky(256, (1, 1)))(x1)
    # First prediction head on x2: 1 DarknetConv2D_BN_Leaky block and 1 conv layer,
    # producing an N*N*(num_anchors*(num_classes+5)) tensor
    y1 = compose(
        DarknetConv2D_BN_Leaky(512, (3, 3)),
        DarknetConv2D(num_anchors * (num_classes + 5), (1, 1)))(x2)
    # Upsampling branch on x2: 1 DarknetConv2D_BN_Leaky block and 1 upsampling layer
    x2 = compose(
        DarknetConv2D_BN_Leaky(128, (1, 1)),
        UpSampling2D(2))(x2)
    # Concatenate the upsampled x2 with x1, then apply 1 DarknetConv2D_BN_Leaky block and 1 conv layer,
    # producing another N*N*(num_anchors*(num_classes+5)) tensor
    y2 = compose(
        Concatenate(),
        DarknetConv2D_BN_Leaky(256, (3, 3)),
        DarknetConv2D(num_anchors * (num_classes + 5), (1, 1)))([x2, x1])
    return Model(inputs, [y1, y2])
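Under the same assumptions as above (416×416 input, 80 classes, 3 anchors per scale, and the same Input import), the two heads come out at strides 32 and 16; Tiny YOLOv3 therefore uses 6 anchors in total, 3 per output layer:

tiny_model = tiny_yolo_body(Input(shape=(416, 416, 3)), num_anchors=3, num_classes=80)
# y1: (None, 13, 13, 255), y2: (None, 26, 26, 255)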
7. Computing bbox coordinates, confidence, and class probabilities
def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    """Convert final layer features to bounding box parameters.
    Predicts the box coordinates, confidence, and class probabilities."""
    num_anchors = len(anchors)
    # Reshape the anchors into a tensor of shape (batch, height, width, num_anchors, box_params)
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])
    # Get the height and width of the output feature map
    grid_shape = K.shape(feats)[1:3]
    # Build the grid of cell coordinates: y runs over height, x over width.
    # K.arange(0, stop=grid_shape[0]) produces a tensor of 0 .. grid_shape[0]-1
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
                    [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
                    [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))

    feats = K.reshape(feats, [-1, grid_shape[0], grid_shape[1], num_anchors, 5 + num_classes])

    # This implements the paper's bounding-box prediction, with normalization applied as well
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    # Confidence and class probabilities
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])

    # When computing the loss, also return the grid and the raw features
    if calc_loss == True:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs
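This is the paper's bounding-box parameterization. Writing the raw network outputs as (t_x, t_y, t_w, t_h), the grid-cell offset as (c_x, c_y), and the anchor size as (p_w, p_h):

b_x = sigmoid(t_x) + c_x
b_y = sigmoid(t_y) + c_y
b_w = p_w * exp(t_w)
b_h = p_h * exp(t_h)

The code additionally divides (b_x, b_y) by the grid size and (b_w, b_h) by the input size, so box_xy and box_wh come out normalized to [0, 1].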
8. Correcting bbox coordinates
def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
    """Get corrected boxes.
    Compares the letterboxed model input against the original image, computes the
    offset and scale, and corrects the box coordinates accordingly."""
    # Flip to (y, x) / (h, w) order
    box_yx = box_xy[..., ::-1]
    box_hw = box_wh[..., ::-1]
    input_shape = K.cast(input_shape, K.dtype(box_yx))
    image_shape = K.cast(image_shape, K.dtype(box_yx))
    # The size the image was resized to inside the model input,
    # using the smaller of the two (input_shape / image_shape) ratios
    new_shape = K.round(image_shape * K.min(input_shape / image_shape))
    # Relative offset of the resized image inside the padded model input
    offset = (input_shape - new_shape) / 2. / input_shape
    # Ratio between the model input and the resized image
    scale = input_shape / new_shape
    # Correct the box coordinates
    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes = K.concatenate([
        box_mins[..., 0:1],   # y_min
        box_mins[..., 1:2],   # x_min
        box_maxes[..., 0:1],  # y_max
        box_maxes[..., 1:2]   # x_max
    ])

    # Scale boxes back to original image shape.
    # Undo the normalization to get the boxes' actual pixel coordinates in the input image
    boxes *= K.concatenate([image_shape, image_shape])
    return boxes
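A worked example of the letterbox correction: for a 640×480 image fed to a 416×416 model, image_shape = (480, 640) and min(416/480, 416/640) = 0.65, so new_shape = (312, 416). The offset is ((416 − 312)/2/416, 0) = (0.125, 0) and the scale is (416/312, 1) ≈ (1.333, 1): predictions are shifted past the top padding band and stretched vertically before being multiplied back to pixel coordinates.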
9. Predicting box coordinates (x, y, w, h), confidence, and classes (for inference)
def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    """Process Conv layer output.
    feats: one output layer, shape=(m, N, N, 3, 5+80)
    anchors: the anchors assigned to this output layer
    num_classes: number of classes
    input_shape: the feature-map size scaled up 32x (the model input size)
    image_shape: size of the input image
    """
    # Predict box coordinates (x, y, w, h), confidence, and class probabilities
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats,
        anchors, num_classes, input_shape)
    # Correct the boxes on each feature map back onto the original image
    boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
    boxes = K.reshape(boxes, [-1, 4])
    # Per-class score of each box: confidence times class probability
    box_scores = box_confidence * box_class_probs
    box_scores = K.reshape(box_scores, [-1, num_classes])
    return boxes, box_scores
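The reshape to (-1, 4) flattens all predictions of one scale into a single list: the 13×13 layer contributes 13 · 13 · 3 = 507 boxes, and across all three scales of a 416×416 input there are (13² + 26² + 52²) · 3 = 10647 candidate boxes before thresholding and NMS.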
10. The evaluation function
def yolo_eval(yolo_outputs,
              anchors,
              num_classes,
              image_shape,
              max_boxes=20,
              score_threshold=.6,
              iou_threshold=.5):
    """Evaluate YOLO model on given input and return filtered boxes.
    yolo_outputs: the output layers, shape=(m, N, N, 3, 5+80)
    anchors: the anchor boxes
    num_classes: number of classes
    image_shape: size of the input image
    max_boxes: maximum number of boxes
    score_threshold: threshold on the predicted score
    iou_threshold: IOU threshold for NMS
    """
    # Map the anchor boxes to the output layers
    num_layers = len(yolo_outputs)
    anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]  # default setting
    # The model input size is 32x the first feature map's size
    input_shape = K.shape(yolo_outputs[0])[1:3] * 32
    boxes = []
    box_scores = []
    for l in range(num_layers):
        # Compute the boxes and scores of this output layer
        _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l],
            anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = K.concatenate(boxes, axis=0)
    box_scores = K.concatenate(box_scores, axis=0)

    # Keep only boxes whose score exceeds the threshold
    mask = box_scores >= score_threshold
    max_boxes_tensor = K.constant(max_boxes, dtype='int32')
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(num_classes):
        # TODO: use keras backend instead of tf.
        # Gather the boxes and box scores with box_scores >= score_threshold for class c
        class_boxes = tf.boolean_mask(boxes, mask[:, c])
        class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
        # Non-maximum suppression: drop boxes overlapping a higher-scoring box with IOU > iou_threshold
        nms_index = tf.image.non_max_suppression(
            class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
        # Gather the surviving boxes, scores, and class ids
        class_boxes = K.gather(class_boxes, nms_index)
        class_box_scores = K.gather(class_box_scores, nms_index)
        classes = K.ones_like(class_box_scores, 'int32') * c
        boxes_.append(class_boxes)
        scores_.append(class_box_scores)
        classes_.append(classes)
    boxes_ = K.concatenate(boxes_, axis=0)
    scores_ = K.concatenate(scores_, axis=0)
    classes_ = K.concatenate(classes_, axis=0)

    return boxes_, scores_, classes_
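As covered in part (II), the detection script yolo.py is the caller of this function; its generate() method wires it up roughly like this (a sketch following the keras-yolo3 repo, not a verbatim copy):

self.input_image_shape = K.placeholder(shape=(2,))
boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
                                   len(self.class_names), self.input_image_shape,
                                   score_threshold=self.score, iou_threshold=self.iou)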
11. Preprocessing the ground-truth (GT) boxes
def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    '''Preprocess true boxes to training input format

    Parameters
    ----------
    true_boxes: array, shape=(m, T, 5)
        Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape.
    input_shape: array-like, hw, multiples of 32
    anchors: array, shape=(N, 2), wh
    num_classes: integer

    Returns
    -------
    y_true: list of array, shape like yolo_outputs, xywh are relative values
    '''
    # First make sure no GT class_id exceeds the total number of classes
    assert (true_boxes[..., 4] < num_classes).all(), 'class id must be less than num_classes'
    # Split the anchor boxes into 3 groups and assign each group's indices to an output layer.
    # This matches the paper: 9 anchor boxes are chosen to predict at 3 different scales;
    # the larger feature maps use the smaller anchors ([0, 1, 2]),
    # the smaller feature maps use the larger anchors ([6, 7, 8])
    num_layers = len(anchors) // 3  # default setting
    anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]

    true_boxes = np.array(true_boxes, dtype='float32')
    input_shape = np.array(input_shape, dtype='int32')
    # Compute the GT box centers and widths/heights; boxes_xy.shape = (m, T, 2) = boxes_wh.shape
    boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
    boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
    # Normalize by the input size (input_shape is hw, so reverse it to wh)
    true_boxes[..., 0:2] = boxes_xy / input_shape[::-1]
    true_boxes[..., 2:4] = boxes_wh / input_shape[::-1]

    m = true_boxes.shape[0]
    # Feature-map sizes of the three output layers: (13, 13), (26, 26), (52, 52)
    grid_shapes = [input_shape // {0: 32, 1: 16, 2: 8}[l] for l in range(num_layers)]
    # Zero arrays for the three y_true layers: (m,13,13,3,5+80), (m,26,26,3,5+80), (m,52,52,3,5+80)
    y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5 + num_classes),
                       dtype='float32') for l in range(num_layers)]

    # Expand dim to apply broadcasting.
    # Add a leading dimension to the anchor array: shape=(1, N, 2)
    anchors = np.expand_dims(anchors, 0)
    anchor_maxes = anchors / 2.
    anchor_mins = -anchor_maxes
    # Keep only boxes whose width is > 0; returns a boolean array of shape (m, T)
    valid_mask = boxes_wh[..., 0] > 0

    for b in range(m):
        # Discard zero rows.
        # Skip this image if it has no valid boxes
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh) == 0: continue
        # Expand dim to apply broadcasting: wh.shape becomes (T_valid, 1, 2)
        wh = np.expand_dims(wh, -2)
        box_maxes = wh / 2.
        box_mins = -box_maxes

        # IOU between each GT box and each anchor, with both centered at the origin
        intersect_mins = np.maximum(box_mins, anchor_mins)
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)

        # Find best anchor for each true box:
        # the anchor box with the highest IOU against the GT box
        best_anchor = np.argmax(iou, axis=-1)

        # Mark the GT entry of the best anchor in y_true
        for t, n in enumerate(best_anchor):
            for l in range(num_layers):
                if n in anchor_mask[l]:
                    i = np.floor(true_boxes[b, t, 0] * grid_shapes[l][1]).astype('int32')
                    j = np.floor(true_boxes[b, t, 1] * grid_shapes[l][0]).astype('int32')
                    k = anchor_mask[l].index(n)
                    c = true_boxes[b, t, 4].astype('int32')
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b, t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1
                    y_true[l][b, j, i, k, 5 + c] = 1

    return y_true
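A worked example, assuming input_shape = (416, 416): take a GT box (x_min, y_min, x_max, y_max) = (150, 100, 250, 200) of class 0. Its normalized center is (200/416, 150/416) ≈ (0.48, 0.36). If its best-matching anchor happens to be index 7, the box belongs to the 13×13 layer (l = 0, k = anchor_mask[0].index(7) = 1), and it lands in cell i = floor(0.48 · 13) = 6, j = floor(0.36 · 13) = 4, so y_true[0][b, 4, 6, 1] receives the box coordinates, objectness 1, and a one-hot class vector.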
12. IOU computation
def box_iou(b1, b2):
    '''Return iou tensor

    Parameters
    ----------
    b1: tensor, shape=(i1,...,iN, 4), xywh
    b2: tensor, shape=(j, 4), xywh

    Returns
    -------
    iou: tensor, shape=(i1,...,iN, j)
    '''
    # Expand dim to apply broadcasting.
    b1 = K.expand_dims(b1, -2)
    b1_xy = b1[..., :2]
    b1_wh = b1[..., 2:4]
    b1_wh_half = b1_wh / 2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half

    # Expand dim to apply broadcasting.
    b2 = K.expand_dims(b2, 0)
    b2_xy = b2[..., :2]
    b2_wh = b2[..., 2:4]
    b2_wh_half = b2_wh / 2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half

    intersect_mins = K.maximum(b1_mins, b2_mins)
    intersect_maxes = K.minimum(b1_maxes, b2_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    iou = intersect_area / (b1_area + b2_area - intersect_area)

    return iou
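This implements the usual formula IOU = area(A ∩ B) / (area(A) + area(B) − area(A ∩ B)), with the corners derived from the xywh representation. For example, two 1×1 boxes centered at (0.5, 0.5) and (1.0, 0.5) intersect in a 0.5×1 strip, so IOU = 0.5 / (1 + 1 − 0.5) = 1/3.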
13. Loss computation
def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False):
    """Return yolo_loss tensor

    Parameters
    ----------
    yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body
    y_true: list of array, the output of preprocess_true_boxes
    anchors: array, shape=(N, 2), wh
    num_classes: integer
    ignore_thresh: float, the iou threshold whether to ignore object confidence loss

    Returns
    -------
    loss: tensor, shape=(1,)
    """
    # By default the anchor boxes are split into 3 groups
    num_layers = len(anchors) // 3  # default setting
    # The first num_layers entries of args are the output layers;
    # each tensor in yolo_outputs has shape (batch_size, height, width, channels)
    yolo_outputs = args[:num_layers]
    # The remaining num_layers entries are the y_true layers
    y_true = args[num_layers:]
    # Assign each anchor group's indices to an output layer, as in the paper:
    # 9 anchor boxes predict at 3 different scales; larger feature maps use the smaller
    # anchors ([0, 1, 2]), smaller feature maps use the larger anchors ([6, 7, 8])
    anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]
    # K.cast() converts a tensor from one dtype to another.
    # K.shape(yolo_outputs[0])[1:3] is the (height, width) of the first output layer;
    # scaling it up 32x recovers the model input size
    input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))
    grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)]
    loss = 0
    # Batch size
    m = K.shape(yolo_outputs[0])[0]
    mf = K.cast(m, K.dtype(yolo_outputs[0]))

    for l in range(num_layers):
        # [...] stands in for any number of leading dimensions; index 4 is the objectness flag
        object_mask = y_true[l][..., 4:5]
        # The true class assignments
        true_class_probs = y_true[l][..., 5:]

        # Compute the grid of cell coordinates, the raw output raw_pred (shape=(m, N, N, 3, 5+80)),
        # and the predicted box coordinates (x, y, w, h)
        grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l],
            anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True)
        pred_box = K.concatenate([pred_xy, pred_wh])

        # Darknet raw box to calculate loss.
        # The xy targets, expressed as offsets within each grid cell
        raw_true_xy = y_true[l][..., :2] * grid_shapes[l][::-1] - grid
        # The wh targets, expressed as log-ratios against the anchors
        raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1])
        # Where object_mask marks an object, keep raw_true_wh; elsewhere substitute zeros
        raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh))  # avoid log(0)=-inf
        # Weight small boxes more heavily: 2 - w*h
        box_loss_scale = 2 - y_true[l][..., 2:3] * y_true[l][..., 3:4]

        # Find ignore mask, iterate over each of batch.
        # A dynamically sized tensor array with the same dtype as y_true[0], initial size 1
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        # Cast object_mask to bool
        object_mask_bool = K.cast(object_mask, 'bool')

        # If a predicted box overlaps some ground truth with IOU above ignore_thresh,
        # it is excluded from the no-object confidence loss
        def loop_body(b, ignore_mask):
            # Take the coordinates of the true boxes in image b
            true_box = tf.boolean_mask(y_true[l][b, ..., 0:4], object_mask_bool[b, ..., 0])
            # IOU between every predicted box and every GT box
            iou = box_iou(pred_box[b], true_box)
            # best_iou < ignore_thresh yields a 0/1 mask
            best_iou = K.max(iou, axis=-1)
            ignore_mask = ignore_mask.write(b, K.cast(best_iou < ignore_thresh, K.dtype(true_box)))
            return b + 1, ignore_mask

        # (On newer TF/Keras versions, K.control_flow_ops.while_loop may need to be replaced with tf.while_loop)
        _, ignore_mask = K.control_flow_ops.while_loop(lambda b, *args: b < m, loop_body, [0, ignore_mask])
        ignore_mask = ignore_mask.stack()
        ignore_mask = K.expand_dims(ignore_mask, -1)

        # K.binary_crossentropy is helpful to avoid exp overflow.
        # Coordinate loss, confidence loss, and classification loss
        xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[..., 0:2], from_logits=True)
        wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
        confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) + \
            (1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) * ignore_mask
        class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[..., 5:], from_logits=True)

        xy_loss = K.sum(xy_loss) / mf
        wh_loss = K.sum(wh_loss) / mf
        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += xy_loss + wh_loss + confidence_loss + class_loss
        if print_loss:
            loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)], message='loss: ')
    return loss
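As shown in part (III), train.py does not call this function directly; it attaches the loss to the model through a Keras Lambda layer, so the loss itself becomes the model output. Roughly (a sketch following the keras-yolo3 repo, with Lambda and Model imported from keras.layers and keras.models):

model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
                    arguments={'anchors': anchors, 'num_classes': num_classes,
                               'ignore_thresh': 0.5})([*model_body.output, *y_true])
model = Model([model_body.input, *y_true], model_loss)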