How to Perform Object Detection With YOLOv3 in Keras

By Jason Brownlee on October 8, 2019 in Deep Learning for Computer Vision

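Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given photograph.

It is a challenging problem that builds upon methods for object recognition (e.g. where are they), object localization (e.g. what is their extent), and object classification (e.g. what are they).

In recent years, deep learning techniques have achieved state-of-the-art results for object detection, such as on standard benchmark datasets and in computer vision competitions. Notable is the "You Only Look Once," or YOLO, family of convolutional neural networks, which achieve near state-of-the-art results with a single end-to-end model that can perform object detection in real time.

In this tutorial, you will discover how to develop a YOLOv3 model for object detection on new photographs.

After completing this tutorial, you will know:

The YOLO family of convolutional neural network models for object detection, and the most recent variation at the time of writing, YOLOv3.
The best-of-breed open source library implementation of YOLOv3 for the Keras deep learning library.
How to use a pre-trained YOLOv3 model to perform object localization and detection on new photographs.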

Let's get started.

Updated Oct/2019: Updated and tested for Keras 2.3.0 API and TensorFlow 2.0.0.

How to Perform Object Detection With YOLOv3 in Keras
Photo by David Berkowitz, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

YOLO for Object Detection
Experiencor YOLO3 Project
Object Detection With YOLOv3

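YOLO for Object Detection

Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.

It is a challenging computer vision task that requires both successful object localization, in order to locate and draw a bounding box around each object in an image, and object classification, to predict the correct class of the object that was localized.

The "You Only Look Once," or YOLO, family of models is a series of end-to-end deep learning models designed for fast object detection, developed by Joseph Redmon, et al. and first described in the 2015 paper titled "You Only Look Once: Unified, Real-Time Object Detection."

The approach involves a single deep convolutional neural network (originally a version of GoogLeNet, later updated and called DarkNet based on VGG) that splits the input into a grid of cells, where each cell directly predicts a bounding box and object classification. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.

There are three main variations of the approach at the time of writing: YOLOv1, YOLOv2, and YOLOv3. The first version proposed the general architecture, the second version refined the design and made use of predefined anchor boxes to improve bounding box proposals, and the third version further refined the model architecture and training process.

Although the accuracy of the models is close to, but not as good as, that of Region-Based Convolutional Neural Networks (R-CNNs), they are popular for object detection because of their detection speed, often demonstrated in real time on video or camera feed input.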

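A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Source: You Only Look Once: Unified, Real-Time Object Detection, 2015.

In this tutorial, we will focus on using YOLOv3.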

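Experiencor Keras-YOLO3 Project

Source code for each version of YOLO is available, as well as pre-trained models.

The official DarkNet GitHub repository contains the source code for the YOLO versions mentioned in the papers, written in C. The repository provides a step-by-step tutorial on how to use the code for object detection.

It is a challenging model to implement from scratch, especially for beginners, as it requires the development of many customized model elements for training and for prediction. For example, even using a pre-trained model directly requires sophisticated code to distill and interpret the predicted bounding boxes output by the model.

Instead of developing this code from scratch, we can use a third-party implementation. There are many third-party implementations designed for using YOLO with Keras, but none appear to be standardized and designed to be used as a library.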

The YAD2K project was a de facto standard for YOLOv2. It provided scripts to convert the pre-trained weights into Keras format and to use the pre-trained model to make predictions, along with the code required to distill and interpret the predicted bounding boxes. Many other third-party developers have used this code as a starting point and updated it to support YOLOv3.

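Perhaps the most widely used project for working with pre-trained YOLO models is "keras-yolo3: Training and Detecting Objects with YOLO3" by Huynh Ngoc Anh, or experiencor. The code in the project has been made available under a permissive MIT open source license. Like YAD2K, it provides scripts to load and use pre-trained YOLO models, as well as transfer learning for developing YOLOv3 models on new datasets.

He also has a keras-yolo2 project that provides similar code for YOLOv2, as well as detailed tutorials on how to use the code in the repository. The keras-yolo3 project appears to be an updated version of that project.

Interestingly, experiencor has used the model as the basis for some experiments and trained versions of YOLOv3 on standard object detection problems such as a kangaroo dataset, a raccoon dataset, red blood cell detection, and others. He has listed model performance, provided the model weights for download, and provided YouTube videos of model behavior. For example:

Raccoon Detection using YOLO 3

We will use experiencor's keras-yolo3 project as the basis for performing object detection with a YOLOv3 model in Keras in this tutorial.

In case the repository changes or is removed (which can happen with third-party open source projects), a fork of the code at the time of writing is provided.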

Object Detection With YOLOv3

The keras-yolo3 project provides a lot of capability for using YOLOv3 models, including object detection, transfer learning, and training new models from scratch.

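In this section, we will use a pre-trained model to perform object detection on an unseen photograph. This capability is available in a single Python file in the repository called "yolo3_one_file_to_detect_them_all.py" that has about 435 lines. This script is, in fact, a program that will use pre-trained weights to prepare a model, use that model to perform object detection, and output a model. It also depends upon OpenCV.

Instead of using this program directly, we will reuse elements from it and develop our own scripts to first prepare and save a Keras YOLOv3 model, and then load the model to make a prediction for a new photograph.

Create and Save Model

The first step is to download the pre-trained model weights.

These were trained using the DarkNet code base on the MSCOCO dataset. Download the model weights and place them into your current working directory with the filename "yolov3.weights". It is a large file and may take a moment to download depending on the speed of your internet connection.

YOLOv3 Pre-trained Model Weights (yolov3.weights) (237 MB)

Next, we need to define a Keras model that has the right number and type of layers to match the downloaded model weights. The model architecture is called "DarkNet" and was originally loosely based on the VGG-16 model.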

The “yolo3_one_file_to_detect_them_all.py” script provides the make_yolov3_model() function to create the model for us, and the helper function _conv_block() that is used to create blocks of layers. These two functions can be copied directly from the script.

We can now define the Keras model for YOLOv3.

# define the model
model = make_yolov3_model()

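Next, we need to load the model weights. The model weights are stored in whatever format was used by DarkNet. Rather than trying to decode the file manually, we can use the WeightReader class provided in the script.

To use the WeightReader, it is instantiated with the path to our weights file (e.g. 'yolov3.weights'). This will parse the file and load the model weights into memory in a format that we can set into our Keras model.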

# load the model weights
weight_reader = WeightReader('yolov3.weights')
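
As a rough illustration of what a reader such as WeightReader has to do, the sketch below parses a small stand-in file with the layout that the WeightReader code in this tutorial assumes: three 32-bit integers (major, minor, revision), a counter of 8 or 4 bytes depending on the version, then a flat run of 32-bit floats. The function name read_darknet_header() and the demo file contents are hypothetical; this is not the script's implementation, and it uses only the Python standard library.

```python
import os
import struct
import tempfile

def read_darknet_header(path):
    """Return the (major, minor, revision) header and the float32 payload."""
    with open(path, 'rb') as f:
        major, minor, revision = struct.unpack('<3i', f.read(12))
        # same version test as the tutorial's WeightReader: skip 8 or 4 bytes
        if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000:
            f.read(8)
        else:
            f.read(4)
        payload = f.read()
    count = len(payload) // 4
    weights = struct.unpack('<%df' % count, payload[:count * 4])
    return (major, minor, revision), weights

# build a tiny stand-in file (made-up values, not real YOLOv3 weights)
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.weights')
with open(demo_path, 'wb') as f:
    f.write(struct.pack('<3i', 0, 2, 0))               # major, minor, revision
    f.write(struct.pack('<q', 12345))                  # 8-byte counter
    f.write(struct.pack('<4f', 0.5, -1.0, 2.0, 0.25))  # four "weights"

version, weights = read_darknet_header(demo_path)
print(version, weights)
```

The real class keeps the payload as one flat array and hands out consecutive slices of it, layer by layer, which is why the layers must be visited in the same order in which DarkNet wrote them.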

We can then call the load_weights() function of the WeightReader instance, passing in our defined Keras model to set the weights into the layers.

# set the model weights into the model
weight_reader.load_weights(model)

That's it; we now have a YOLOv3 model ready for use.

We can save this model to a Keras-compatible .h5 model file for later use.

# save the model to file
model.save('model.h5')

We can tie all of this together; the complete code example, including functions copied directly from the "yolo3_one_file_to_detect_them_all.py" script, is listed below.

# create a YOLOv3 Keras model and save it to file
# based on https://github.com/experiencor/keras-yolo3
import struct
import numpy as np
from keras.layers import Conv2D
from keras.layers import Input
from keras.layers import BatchNormalization
from keras.layers import LeakyReLU
from keras.layers import ZeroPadding2D
from keras.layers import UpSampling2D
from keras.layers import add, concatenate
from keras.models import Model

def _conv_block(inp, convs, skip=True):
	x = inp
	count = 0
	for conv in convs:
		if count == (len(convs) - 2) and skip:
			skip_connection = x
		count += 1
		if conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # peculiar padding as darknet prefer left and top
		x = Conv2D(conv['filter'],
				   conv['kernel'],
				   strides=conv['stride'],
				   padding='valid' if conv['stride'] > 1 else 'same', # peculiar padding as darknet prefer left and top
				   name='conv_' + str(conv['layer_idx']),
				   use_bias=False if conv['bnorm'] else True)(x)
		if conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)
		if conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)
	return add([skip_connection, x]) if skip else x

def make_yolov3_model():
	input_image = Input(shape=(None, None, 3))
	# Layer  0 => 4
	x = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},
								  {'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},
								  {'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},
								  {'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])
	# Layer  5 => 8
	x = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},
						{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},
						{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 7}])
	# Layer  9 => 11
	x = _conv_block(x, [{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},
						{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])
	# Layer 12 => 15
	x = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},
						{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},
						{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])
	# Layer 16 => 36
	for i in range(7):
		x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},
							{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])
	skip_36 = x
	# Layer 37 => 40
	x = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])
	# Layer 41 => 61
	for i in range(7):
		x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},
							{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])
	skip_61 = x
	# Layer 62 => 65
	x = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])
	# Layer 66 => 74
	for i in range(3):
		x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},
							{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])
	# Layer 75 => 79
	x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 79}], skip=False)
	# Layer 80 => 82
	yolo_82 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 80},
							  {'filter':  255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], skip=False)
	# Layer 83 => 86
	x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], skip=False)
	x = UpSampling2D(2)(x)
	x = concatenate([x, skip_61])
	# Layer 87 => 91
	x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], skip=False)
	# Layer 92 => 94
	yolo_94 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 92},
							  {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], skip=False)
	# Layer 95 => 98
	x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True,   'layer_idx': 96}], skip=False)
	x = UpSampling2D(2)(x)
	x = concatenate([x, skip_36])
	# Layer 99 => 106
	yolo_106 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 99},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 100},
							   {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 101},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 102},
							   {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 103},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 104},
							   {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], skip=False)
	model = Model(input_image, [yolo_82, yolo_94, yolo_106])
	return model

class WeightReader:
	def __init__(self, weight_file):
		with open(weight_file, 'rb') as w_f:
			major,	= struct.unpack('i', w_f.read(4))
			minor,	= struct.unpack('i', w_f.read(4))
			revision, = struct.unpack('i', w_f.read(4))
			if (major*10 + minor) >= 2 and major < 1000 and minor < 1000:
				w_f.read(8)
			else:
				w_f.read(4)
			transpose = (major > 1000) or (minor > 1000)
			binary = w_f.read()
		self.offset = 0
		self.all_weights = np.frombuffer(binary, dtype='float32')

	def read_bytes(self, size):
		self.offset = self.offset + size
		return self.all_weights[self.offset-size:self.offset]

	def load_weights(self, model):
		for i in range(106):
			try:
				conv_layer = model.get_layer('conv_' + str(i))
				print("loading weights of convolution #" + str(i))
				if i not in [81, 93, 105]:
					norm_layer = model.get_layer('bnorm_' + str(i))
					size = np.prod(norm_layer.get_weights()[0].shape)
					beta  = self.read_bytes(size) # bias
					gamma = self.read_bytes(size) # scale
					mean  = self.read_bytes(size) # mean
					var   = self.read_bytes(size) # variance
					norm_layer.set_weights([gamma, beta, mean, var])
				if len(conv_layer.get_weights()) > 1:
					bias   = self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))
					kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
					kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
					kernel = kernel.transpose([2,3,1,0])
					conv_layer.set_weights([kernel, bias])
				else:
					kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
					kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
					kernel = kernel.transpose([2,3,1,0])
					conv_layer.set_weights([kernel])
			except ValueError:
				print("no convolution #" + str(i))

	def reset(self):
		self.offset = 0

# define the model
model = make_yolov3_model()
# load the model weights
weight_reader = WeightReader('yolov3.weights')
# set the model weights into the model
weight_reader.load_weights(model)
# save the model to file
model.save('model.h5')

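Running the example may take a little less than one minute to execute on modern hardware.

As the weight file is loaded, you will see debug information about what was loaded, output by the WeightReader class.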

...
loading weights of convolution #99
loading weights of convolution #100
loading weights of convolution #101
loading weights of convolution #102
loading weights of convolution #103
loading weights of convolution #104
loading weights of convolution #105

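At the end of the run, the model.h5 file is saved in your current working directory with approximately the same size as the original weight file (237 MB), but ready to be loaded and used directly as a Keras model.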

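Make a Prediction

We need a new photo for object detection, ideally with objects that we know the model knows about from the MSCOCO dataset.

We will use a photograph of three zebras taken by Boegh on safari and released under a permissive license.

Photograph of Three Zebras
Taken by Boegh, some rights reserved.

Photograph of Three Zebras (zebra.jpg)

Download the photograph, name it 'zebra.jpg', and place it in your current working directory.

Making a prediction is straightforward, although interpreting the prediction requires some work.

The first step is to load the Keras model. This might be the slowest part of making a prediction.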

# load yolov3 model
model = load_model('model.h5')

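Next, we need to load our new photograph and prepare it as suitable input to the model. The model expects inputs to be color images with a square shape of 416×416 pixels.

We can use the load_img() Keras function to load the image and the target_size argument to resize the image after loading. We can also use the img_to_array() function to convert the loaded PIL image object into a NumPy array, and then rescale the pixel values from 0-255 to 32-bit floating point values in the range 0-1.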

# load the image with the required size
image = load_img('zebra.jpg', target_size=(416, 416))
# convert to numpy array
image = img_to_array(image)
# scale pixel values to [0, 1]
image = image.astype('float32')
image /= 255.0

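We will want to show the original photo again later, which means we will need to scale the bounding boxes of all detected objects from the square shape back to the original shape. As such, we can load the image and retrieve the original shape.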

# load the image to get its shape
image = load_img('zebra.jpg')
width, height = image.size

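We can tie all of this together into a convenience function named load_image_pixels() that takes the filename and target size and returns the scaled pixel data ready to provide as input to the Keras model, together with the original width and height of the image.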

# load and prepare an image
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = expand_dims(image, 0)
    return image, width, height

We can then call this function to load our photo of zebras.

# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))

We can now feed the photo into the Keras model and make a prediction.

# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])

That's it, at least for making a prediction. The complete example is listed below.

# load yolov3 model and perform object detection
# based on https://github.com/experiencor/keras-yolo3
from numpy import expand_dims
from keras.models import load_model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array

# load and prepare an image
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = expand_dims(image, 0)
    return image, width, height

# load yolov3 model
model = load_model('model.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])

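Running the example returns a list of three NumPy arrays, the shapes of which are displayed as output.

These arrays predict both the bounding boxes and class labels, but they are encoded and must be interpreted.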

[(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]

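Make a Prediction and Interpret Result

The output of the model is, in fact, encoded candidate bounding boxes from three different grid sizes. The boxes are defined in the context of anchor boxes, carefully chosen based on an analysis of the size of objects in the MSCOCO dataset.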
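
The three array shapes printed earlier can be reproduced with a little arithmetic. The sketch below assumes the usual YOLOv3 layout: the three output scales downsample the input by factors of 32, 16, and 8, each grid cell predicts 3 anchor boxes, and each box carries 4 coordinates, 1 objectness score, and one score per class (80 for MSCOCO). The helper name yolo_output_shapes() is hypothetical and is not part of the experiencor script.

```python
def yolo_output_shapes(input_h, input_w, n_classes=80, n_anchors=3, strides=(32, 16, 8)):
    """Return the expected (batch, grid_h, grid_w, depth) shape of each output."""
    # each anchor box stores x, y, w, h, objectness plus one score per class
    depth = n_anchors * (5 + n_classes)
    return [(1, input_h // s, input_w // s, depth) for s in strides]

shapes = yolo_output_shapes(416, 416)
print(shapes)

# total number of candidate boxes across the three grids
n_boxes = sum(grid_h * grid_w * 3 for _, grid_h, grid_w, _ in shapes)
print(n_boxes)
```

Under these assumptions a single 416×416 photo yields 10,647 candidate boxes, which is why the thresholding and suppression steps that follow are needed.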

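The script provided by experiencor includes a function called decode_netout() that takes each of the NumPy arrays, one at a time, and decodes the candidate bounding boxes and class predictions. Further, any bounding boxes that don't confidently describe an object (e.g. all class probabilities are below a threshold) are ignored. We will use a probability of 60% or 0.6. The function returns a list of BoundBox instances that define the corners of each bounding box in the context of the input image shape, along with the class probabilities.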

# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
	# decode the output of the network
	boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
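
To give a feel for what decoding involves, here is a minimal sketch of the standard YOLOv3 box parameterization applied to a single grid cell and a single anchor. It assumes the raw values are (tx, ty, tw, th, to), that the center offsets and objectness pass through a sigmoid, and that width and height scale the anchor exponentially. The function decode_cell() is a hypothetical illustration rather than a copy of decode_netout(), and it omits the class scores.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def decode_cell(raw, row, col, grid_h, grid_w, anchor_w, anchor_h, net_h, net_w):
    """Decode one raw prediction into (xmin, ymin, xmax, ymax, objectness).

    Coordinates are returned as fractions of the network input size.
    """
    tx, ty, tw, th, to = raw
    # box center: offset within the cell, then normalized by the grid size
    cx = (col + sigmoid(tx)) / grid_w
    cy = (row + sigmoid(ty)) / grid_h
    # box size: anchor dimensions (in input pixels) scaled by exp of the raw value
    w = anchor_w * exp(tw) / net_w
    h = anchor_h * exp(th) / net_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, sigmoid(to))

# all-zero raw output in the middle cell of the 13x13 grid, first anchor (116, 90)
box = decode_cell((0.0, 0.0, 0.0, 0.0, 0.0), row=6, col=6, grid_h=13, grid_w=13,
                  anchor_w=116, anchor_h=90, net_h=416, net_w=416)
print(box)
```

With all-zero inputs the decoded box is centered on the image and has exactly the anchor's size, which shows how the anchors act as a prior that the network only has to adjust.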

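Next, the bounding boxes can be stretched back into the shape of the original image. This is helpful as it means that later we can plot the original image and draw the bounding boxes, hopefully detecting real objects.

The experiencor script provides the correct_yolo_boxes() function to perform this translation of bounding box coordinates, taking the list of bounding boxes, the original shape of our loaded photograph, and the shape of the input to the network as arguments. The coordinates of the bounding boxes are updated directly.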

# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
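
The idea behind this correction can be sketched in a few lines. The example assumes that decoded boxes are expressed as fractions of the network input and that the photo was simply stretched to the square input shape (as load_img() with target_size does), so mapping back is one multiplication per coordinate. The function scale_box() and the 640×386 photo size are hypothetical; the script's own function may handle more cases.

```python
def scale_box(box, image_w, image_h):
    """Map a fractional (xmin, ymin, xmax, ymax) box onto pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    return (int(xmin * image_w), int(ymin * image_h),
            int(xmax * image_w), int(ymax * image_h))

# a box covering the lower-middle part of a made-up 640x386 photo
scaled = scale_box((0.25, 0.5, 0.75, 1.0), image_w=640, image_h=386)
print(scaled)
```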

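The model has predicted a lot of candidate bounding boxes, and most of the boxes will refer to the same objects. The list of bounding boxes can be filtered, and those boxes that overlap and refer to the same object can be merged. We can define the amount of overlap as a configuration parameter, in this case 50% or 0.5. This filtering of bounding box regions is generally referred to as non-maximal suppression and is a required post-processing step.

The experiencor script provides this via the do_nms() function, which takes the list of bounding boxes and a threshold parameter. Rather than purging the overlapping boxes, it clears their predicted probability for the overlapping class. This allows the boxes to remain and be used if they also detect another object type.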

# suppress non-maximal boxes
do_nms(boxes, 0.5)
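
The following is a minimal sketch of non-maximal suppression for a single class, written in the same spirit as described above: overlapping boxes are kept, but the score of the weaker box is set to zero. Overlap is measured with intersection over union (IoU). The functions iou() and suppress() and the example boxes are hypothetical and are not the implementation of do_nms().

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def suppress(boxes, scores, thresh):
    """Zero the score of any box that overlaps a higher-scoring box by >= thresh."""
    scores = list(scores)
    # visit boxes from the highest score to the lowest
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    for pos, i in enumerate(order):
        if scores[i] == 0:
            continue  # already suppressed, so it cannot suppress others
        for j in order[pos + 1:]:
            if iou(boxes[i], boxes[j]) >= thresh:
                scores[j] = 0.0
    return scores

demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
demo_scores = [0.9, 0.8, 0.7]
kept = suppress(demo_boxes, demo_scores, 0.5)
print(kept)
```

The first two boxes overlap heavily (IoU of about 0.68), so only the stronger one keeps its score; the third box does not overlap either and is untouched. For a multi-class detector, the same procedure would be repeated once per class.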

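This will leave us with the same number of boxes, but only very few of interest. We can retrieve just those boxes that strongly predict the presence of an object: that is, those that are more than 60% confident. This can be achieved by enumerating over all boxes and checking the class prediction values. We can then look up the corresponding class label for the box and add it to the list. Each box must be considered for each class label, just in case the same box strongly predicts more than one object.

We can develop a get_boxes() function that does this, taking the list of boxes, known labels, and our classification threshold as arguments and returning parallel lists of boxes, labels, and scores.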

# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
	v_boxes, v_labels, v_scores = list(), list(), list()
	# enumerate all boxes
	for box in boxes:
		# enumerate all possible labels
		for i in range(len(labels)):
			# check if the threshold for this label is high enough
			if box.classes[i] > thresh:
				v_boxes.append(box)
				v_labels.append(labels[i])
				v_scores.append(box.classes[i]*100)
				# don't break, many labels may trigger for one box
	return v_boxes, v_labels, v_scores

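We can call this function with our list of boxes.

We also need a list of strings containing the class labels known to the model in the correct order used during training, specifically those class labels from the MSCOCO dataset. Thankfully, this is provided in the experiencor script.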

# define the labels
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
    "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
    "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
    "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
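
The number and order of labels matter because each class score is looked up by position. Assuming the standard MSCOCO configuration of YOLOv3 (80 classes and 3 anchor boxes per grid cell), each output array should have 3 × (4 + 1 + 80) = 255 channels. The helpers below are a hypothetical sanity check, not part of the experiencor script.

```python
# Hypothetical sanity check relating the label count to the model's output depth.

def expected_channels(num_labels, boxes_per_cell=3):
    # each box predicts 4 coordinates, 1 objectness score and one score per class
    return boxes_per_cell * (4 + 1 + num_labels)

def labels_match_output(num_labels, output_depth, boxes_per_cell=3):
    return expected_channels(num_labels, boxes_per_cell) == output_depth

print(expected_channels(80))         # channels implied by 80 class labels
print(labels_match_output(80, 255))  # whether the label list fits the output depth
```

In the tutorial's setting, this could be called as `labels_match_output(len(labels), yhat[0].shape[-1])` once a prediction has been made.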

We can now summarize what we found by printing the label and score of each detected object.

# summarize what we found
for i in range(len(v_boxes)):
    print(v_labels[i], v_scores[i])

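We can also plot the original photograph and draw a bounding box around each detected object. This can be achieved by retrieving the coordinates from each bounding box and creating a Rectangle object.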

box = v_boxes[i]
# get coordinates
y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
# calculate width and height of the box
width, height = x2 - x1, y2 - y1
# create the shape
rect = Rectangle((x1, y1), width, height, fill=False, color='white')
# draw the box
ax.add_patch(rect)
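
Predicted boxes can extend slightly past the edges of the photo, so it may be useful to clip the coordinates before drawing. The sketch below converts corner coordinates into the anchor point, width and height form that Rectangle expects. `to_rectangle_args()` is a hypothetical helper, and the clipping step is an optional addition rather than something the tutorial code does; the numbers are made up.

```python
def to_rectangle_args(xmin, ymin, xmax, ymax, image_w, image_h):
    # clip the corners to the image bounds
    x1 = min(max(xmin, 0), image_w)
    y1 = min(max(ymin, 0), image_h)
    x2 = min(max(xmax, 0), image_w)
    y2 = min(max(ymax, 0), image_h)
    # Rectangle takes one anchor corner plus a width and a height
    return (x1, y1), x2 - x1, y2 - y1

anchor, width, height = to_rectangle_args(-12, 40, 630, 500, 640, 360)
print(anchor, width, height)
```

The returned values can be passed directly as `Rectangle(anchor, width, height, fill=False, color='white')`.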

We can also draw a string with the class label and confidence.

# draw text and score in top left corner
label = "%s (%.3f)" % (v_labels[i], v_scores[i])
pyplot.text(x1, y1, label, color='white')
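
Because get_boxes() already multiplies each score by 100, the number in the label is a percentage rather than a probability. A small sketch with a made-up score shows the resulting text; `format_label()` is an illustrative name.

```python
def format_label(name, score_percent):
    # score_percent is expected on a 0-100 scale, as returned by get_boxes()
    return "%s (%.3f)" % (name, score_percent)

example = format_label("zebra", 94.91061)
print(example)
```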

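The draw_boxes() function below implements this, taking the filename of the original photograph and the parallel lists of bounding boxes, labels, and scores, and creating a plot showing all detected objects.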

# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
	# load the image
	data = pyplot.imread(filename)
	# plot the image
	pyplot.imshow(data)
	# get the context for drawing boxes
	ax = pyplot.gca()
	# plot each box
	for i in range(len(v_boxes)):
		box = v_boxes[i]
		# get coordinates
		y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
		# calculate width and height of the box
		width, height = x2 - x1, y2 - y1
		# create the shape
		rect = Rectangle((x1, y1), width, height, fill=False, color='white')
		# draw the box
		ax.add_patch(rect)
		# draw text and score in top left corner
		label = "%s (%.3f)" % (v_labels[i], v_scores[i])
		pyplot.text(x1, y1, label, color='white')
	# show the plot
	pyplot.show()

We can then call this function to plot our final result.

# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

We now have all of the elements required to make a prediction using the YOLOv3 model, interpret the results, and plot them for review.

The full code listing, including the original and modified functions taken from the experiencor script, is listed below for completeness.

# load yolov3 model and perform object detection
# based on https://github.com/experiencor/keras-yolo3
import numpy as np
from numpy import expand_dims
from keras.models import load_model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from matplotlib import pyplot
from matplotlib.patches import Rectangle

class BoundBox:
	def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
		self.xmin = xmin
		self.ymin = ymin
		self.xmax = xmax
		self.ymax = ymax
		self.objness = objness
		self.classes = classes
		self.label = -1
		self.score = -1

	def get_label(self):
		if self.label == -1:
			self.label = np.argmax(self.classes)

		return self.label

	def get_score(self):
		if self.score == -1:
			self.score = self.classes[self.get_label()]

		return self.score

def _sigmoid(x):
	return 1. / (1. + np.exp(-x))

def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
	grid_h, grid_w = netout.shape[:2]
	nb_box = 3
	netout = netout.reshape((grid_h, grid_w, nb_box, -1))
	nb_class = netout.shape[-1] - 5
	boxes = []
	netout[..., :2]  = _sigmoid(netout[..., :2])
	netout[..., 4:]  = _sigmoid(netout[..., 4:])
	netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
	netout[..., 5:] *= netout[..., 5:] > obj_thresh

	for i in range(grid_h*grid_w):
		row = i // grid_w # integer division so that row is a whole grid index
		col = i % grid_w
		for b in range(nb_box):
			# 4th element is objectness score
			objectness = netout[int(row)][int(col)][b][4]
			if objectness <= obj_thresh: continue
			# first 4 elements are x, y, w, and h
			x, y, w, h = netout[int(row)][int(col)][b][:4]
			x = (col + x) / grid_w # center position, unit: image width
			y = (row + y) / grid_h # center position, unit: image height
			w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
			h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
			# last elements are class probabilities
			classes = netout[int(row)][col][b][5:]
			box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
			boxes.append(box)
	return boxes

def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
	new_w, new_h = net_w, net_h
	for i in range(len(boxes)):
		x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
		y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
		boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
		boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
		boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
		boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)

def _interval_overlap(interval_a, interval_b):
	x1, x2 = interval_a
	x3, x4 = interval_b
	if x3 < x1:
		if x4 < x1:
			return 0
		else:
			return min(x2,x4) - x1
	else:
		if x2 < x3:
			 return 0
		else:
			return min(x2,x4) - x3

def bbox_iou(box1, box2):
	intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
	intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
	intersect = intersect_w * intersect_h
	w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
	w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
	union = w1*h1 + w2*h2 - intersect
	return float(intersect) / union

def do_nms(boxes, nms_thresh):
	if len(boxes) > 0:
		nb_class = len(boxes[0].classes)
	else:
		return
	for c in range(nb_class):
		sorted_indices = np.argsort([-box.classes[c] for box in boxes])
		for i in range(len(sorted_indices)):
			index_i = sorted_indices[i]
			if boxes[index_i].classes[c] == 0: continue
			for j in range(i+1, len(sorted_indices)):
				index_j = sorted_indices[j]
				if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
					boxes[index_j].classes[c] = 0

# load and prepare an image
def load_image_pixels(filename, shape):
	# load the image to get its shape
	image = load_img(filename)
	width, height = image.size
	# load the image with the required size
	image = load_img(filename, target_size=shape)
	# convert to numpy array
	image = img_to_array(image)
	# scale pixel values to [0, 1]
	image = image.astype('float32')
	image /= 255.0
	# add a dimension so that we have one sample
	image = expand_dims(image, 0)
	return image, width, height

# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
	v_boxes, v_labels, v_scores = list(), list(), list()
	# enumerate all boxes
	for box in boxes:
		# enumerate all possible labels
		for i in range(len(labels)):
			# check if the threshold for this label is high enough
			if box.classes[i] > thresh:
				v_boxes.append(box)
				v_labels.append(labels[i])
				v_scores.append(box.classes[i]*100)
				# don't break, many labels may trigger for one box
	return v_boxes, v_labels, v_scores

# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
	# load the image
	data = pyplot.imread(filename)
	# plot the image
	pyplot.imshow(data)
	# get the context for drawing boxes
	ax = pyplot.gca()
	# plot each box
	for i in range(len(v_boxes)):
		box = v_boxes[i]
		# get coordinates
		y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
		# calculate width and height of the box
		width, height = x2 - x1, y2 - y1
		# create the shape
		rect = Rectangle((x1, y1), width, height, fill=False, color='white')
		# draw the box
		ax.add_patch(rect)
		# draw text and score in top left corner
		label = "%s (%.3f)" % (v_labels[i], v_scores[i])
		pyplot.text(x1, y1, label, color='white')
	# show the plot
	pyplot.show()

# load yolov3 model
model = load_model('model.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])
# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
	# decode the output of the network
	boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
# suppress non-maximal boxes
do_nms(boxes, 0.5)
# define the labels
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
	"boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
	"bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
	"backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
	"sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
	"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
	"apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
	"chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
	"remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
	"book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
# summarize what we found
for i in range(len(v_boxes)):
	print(v_labels[i], v_scores[i])
# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)
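
The decoding step in decode_netout() can be followed by hand for a single grid cell. The sketch below applies the same formulas to made-up raw outputs (`tx`, `ty`, `tw`, `th`) for one cell of a 13×13 grid with the 116×90 anchor, then scales the result to a hypothetical 640×386 photo in the same way as correct_yolo_boxes(). `decode_cell()` is an illustrative name and the photo size is an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_cell(tx, ty, tw, th, row, col, grid_w, grid_h, anchor_w, anchor_h, net_w, net_h):
    # box centre as a fraction of the image: cell index plus a sigmoid-squashed shift
    cx = (col + sigmoid(tx)) / grid_w
    cy = (row + sigmoid(ty)) / grid_h
    # box size as a fraction of the image: anchor scaled by exp of the raw output
    w = anchor_w * math.exp(tw) / net_w
    h = anchor_h * math.exp(th) / net_h
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

# made-up raw outputs: zero shift puts the centre in the middle of cell (row 6, col 6)
xmin, ymin, xmax, ymax = decode_cell(0.0, 0.0, 0.0, 0.0, 6, 6, 13, 13, 116, 90, 416, 416)

# scale the fractional coordinates to a hypothetical 640x386 photo
image_w, image_h = 640, 386
pixels = (int(xmin * image_w), int(ymin * image_h), int(xmax * image_w), int(ymax * image_h))
print(pixels)
```

With all raw outputs at zero, the box is centred in the image and has exactly the anchor's proportions, which makes it easy to see how each term contributes.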
