How to Manually Optimize Neural Network Models

By Jason Brownlee on October 12, 2021 in Optimization

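Deep learning neural network models are fit on training data using the stochastic gradient descent optimization algorithm.

Updates to the weights of the model are made using the backpropagation of error algorithm. The combination of the optimization algorithm and the weight update algorithm was carefully chosen and is the most efficient approach known for fitting neural networks.

Nevertheless, it is possible to use alternate optimization algorithms to fit a neural network model to a training dataset. This can be a useful exercise for learning how neural networks function and for seeing the central role of optimization in applied machine learning. It may also be required for neural networks with unconventional model architectures and non-differentiable transfer functions.

In this tutorial, you will discover how to manually optimize the weights of neural network models.

After completing this tutorial, you will know:

How to develop the forward inference pass for neural network models from scratch.
How to optimize the weights of a Perceptron model for binary classification.
How to optimize the weights of a Multilayer Perceptron model using stochastic hill climbing.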

Let's get started.

How to Manually Optimize Neural Network Models
Photo by Bureau of Land Management, some rights reserved.

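Tutorial Overview

This tutorial is divided into three parts; they are:

Optimize Neural Networks
Optimize a Perceptron Model
Optimize a Multilayer Perceptron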

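Optimize Neural Networks

Deep learning, or neural networks, is a flexible type of machine learning.

Neural networks are models composed of nodes and layers, inspired by the structure and function of the brain. A neural network model works by propagating a given input vector through one or more layers to produce a numeric output that can be interpreted for classification or regression predictive modeling.

Models are trained by repeatedly exposing the model to examples of input and output and adjusting the weights to minimize the error between the model's output and the expected output. This is called the stochastic gradient descent optimization algorithm. The weights of the model are adjusted using a specific rule from calculus (the chain rule) that assigns error proportionally to each weight in the network. This is called the backpropagation algorithm.

The stochastic gradient descent optimization algorithm, with weight updates made using backpropagation, is generally the best way to train neural network models. However, it is not the only way to train a neural network.

It is possible to use any arbitrary optimization algorithm to train a neural network model.

That is, we can define a neural network model architecture and use a given optimization algorithm to find a set of weights for the model that results in a minimum of prediction error or a maximum of classification accuracy.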
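
To make the idea concrete, here is a minimal sketch of treating a model as a black box that an optimizer scores. It uses pure random search, which is not the algorithm used in the rest of this tutorial, on a tiny hand-made dataset. The toy data and the names `predict`, `accuracy`, and `random_search` are assumptions made for illustration only.

```python
# A minimal sketch: pure random search over the weights of a one-node model.
# The toy data and all function names are illustrative assumptions.
from random import Random

def predict(row, weights):
    # weighted sum of the inputs plus the bias (stored as the last weight)
    activation = weights[-1]
    for x, w in zip(row, weights):
        activation += w * x
    # step transfer function
    return 1 if activation >= 0.0 else 0

def accuracy(X, y, weights):
    # fraction of rows whose predicted label matches the expected label
    correct = sum(1 for row, label in zip(X, y) if predict(row, weights) == label)
    return correct / len(y)

def random_search(X, y, n_weights, n_iter, rng):
    best, best_score = None, -1.0
    for _ in range(n_iter):
        # sample a completely new candidate; no gradient information is used
        candidate = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
        score = accuracy(X, y, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# toy problem: class 1 when the single input is positive
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
weights, score = random_search(X, y, n_weights=2, n_iter=200, rng=Random(1))
print(weights, score)
```

The optimizer only ever calls `accuracy()`, so the model inside it could use any architecture or transfer function, differentiable or not.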

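Using alternate optimization algorithms is expected to be less efficient on average than using stochastic gradient descent with backpropagation. Nevertheless, it may be more efficient in some specific cases, such as non-standard network architectures or non-differentiable transfer functions.

It can also be an interesting exercise to demonstrate the central role of optimization in training machine learning algorithms, and specifically neural networks.

Next, let's explore how to train a simple one-node neural network called a Perceptron model using stochastic hill climbing.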

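Optimize a Perceptron Model

The Perceptron algorithm is the simplest type of artificial neural network.

It is a model of a single neuron that can be used for two-class classification problems and provides the foundation for later developing much larger networks.

In this section, we will optimize the weights of a Perceptron neural network model.

First, let's define a synthetic binary classification problem that we can use as the focus of optimizing the model.

We can use the make_classification() function to define a binary classification problem with 1,000 rows and five input variables.

The example below creates the dataset and summarizes the shape of the data.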

# define a binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# summarize the shape of the dataset
print(X.shape, y.shape)

Running the example prints the shape of the created dataset, confirming our expectations.

(1000, 5) (1000,)

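Next, we need to define a Perceptron model.

The Perceptron model has a single node that has one input weight for each column in the dataset.

Each input is multiplied by its corresponding weight to give a weighted sum, and a bias weight is then added, like an intercept coefficient in a regression model. This weighted sum is called the activation. Finally, the activation is interpreted and used to predict the class label: 1 for a positive activation and 0 for a negative activation.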
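
The computation described above can be written compactly. The symbols are assumptions introduced here for illustration: x_i are the inputs in a row, w_i the matching weights, b the bias weight, a the activation, and ŷ the predicted label. An activation of exactly zero is assigned to class 1, matching the step transfer function defined in this tutorial.

```latex
a = \sum_{i=1}^{n} w_i x_i + b,
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } a \ge 0 \\
0 & \text{if } a < 0
\end{cases}
```

As a worked example with made-up numbers, take x = (1.0, 2.0), w = (0.5, -0.25), and b = 0.1. Then a = 0.5 × 1.0 + (-0.25) × 2.0 + 0.1 = 0.1, which is not negative, so ŷ = 1.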

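Before we optimize the model weights, we must develop the model and gain confidence in how it works.

Let's start by defining a function for interpreting the activation of the model.

This is called the activation function, or the transfer function; the latter name is more traditional and is my preference.

The `transfer()` function below takes the activation of the model and returns a class label: class=1 for a positive or zero activation and class=0 for a negative activation. This is called a step transfer function.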

# transfer function
def transfer(activation):
	if activation >= 0.0:
		return 1
	return 0

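Next, we can develop a function that calculates the activation of the model for a given input row of data from the dataset.

This function takes the row of data and the weights for the model and calculates the weighted sum of the inputs plus the bias weight. The `activate()` function below implements this.

Note: We are intentionally using simple Python lists and an imperative programming style instead of NumPy arrays or list comprehensions, to make the code more readable for Python beginners. Feel free to optimize it and post your code in the comments below.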

# activation function
def activate(row, weights):
	# add the bias, the last weight
	activation = weights[-1]
	# add the weighted input
	for i in range(len(row)):
		activation += weights[i] * row[i]
	return activation

Next, we can use the `activate()` and `transfer()` functions together to generate a prediction for a given row of data. The `predict_row()` function below implements this.

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
	# activate for input
	activation = activate(row, weights)
	# transfer for activation
	return transfer(activation)

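Next, we can call the `predict_row()` function for each row in a given dataset. The `predict_dataset()` function below implements this.

Again, we are intentionally using a simple imperative coding style for readability instead of list comprehensions.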

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
	yhats = list()
	for row in X:
		yhat = predict_row(row, weights)
		yhats.append(yhat)
	return yhats
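
For readers who want to take up the earlier invitation to optimize the code, one possible vectorized version of the perceptron forward pass is sketched below. It is an illustrative alternative, not the tutorial's own code; the function name `predict_dataset_vectorized` and the demo values are assumptions.

```python
# A vectorized sketch of the perceptron forward pass using NumPy.
# The function name and the demo data are illustrative assumptions.
import numpy as np

def predict_dataset_vectorized(X, weights):
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # weighted sum of every row at once, plus the bias (the last weight)
    activations = X @ weights[:-1] + weights[-1]
    # step transfer function: 1 for activation >= 0, otherwise 0
    return (activations >= 0.0).astype(int)

X_demo = [[1.0, 2.0], [-1.0, -2.0], [0.0, 0.0]]
w_demo = [0.5, 0.25, -0.1]
print(predict_dataset_vectorized(X_demo, w_demo))  # [1 0 0]
```

The three activations are 0.9, -1.1, and -0.1, so only the first row is assigned to class 1. The result is a NumPy array rather than a list, which `accuracy_score()` also accepts.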

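Finally, we can use the model to make predictions on our synthetic dataset to confirm it is all working correctly.

We can generate a random set of model weights using the rand() function.

Recall that we need one weight for each input (five inputs in this dataset) plus an extra weight for the bias.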

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of weights
n_weights = X.shape[1] + 1
# generate random weights
weights = rand(n_weights)

We can then use these weights with the dataset to make predictions.

...
# generate predictions for dataset
yhat = predict_dataset(X, weights)

We can evaluate the classification accuracy of these predictions.

...
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)

That's it.

We can tie all of this together and demonstrate our simple Perceptron model for classification. The complete example is listed below.

# simple perceptron model for binary classification
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
	if activation >= 0.0:
		return 1
	return 0

# activation function
def activate(row, weights):
	# add the bias, the last weight
	activation = weights[-1]
	# add the weighted input
	for i in range(len(row)):
		activation += weights[i] * row[i]
	return activation

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
	# activate for input
	activation = activate(row, weights)
	# transfer for activation
	return transfer(activation)

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
	yhats = list()
	for row in X:
		yhat = predict_row(row, weights)
		yhats.append(yhat)
	return yhats

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of weights
n_weights = X.shape[1] + 1
# generate random weights
weights = rand(n_weights)
# generate predictions for dataset
yhat = predict_dataset(X, weights)
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)

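Running the example generates a prediction for each example in the training dataset, then prints the classification accuracy for the predictions.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We would expect about 50 percent accuracy given a set of random weights and a dataset with an equal number of examples in each class, and that is approximately what we see in this case.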

0.548

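We can now optimize the weights of the model to achieve good accuracy on this dataset.

First, we need to split the dataset into train and test sets. It is important to hold back some data that is not used in optimizing the model, so that we can prepare a reasonable estimate of the model's performance when it is used to make predictions on new data.

We will use 67 percent of the data for training and the remaining 33 percent as a test set for evaluating the performance of the model.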

...
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

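Next, we can develop a stochastic hill climbing algorithm.

The optimization algorithm requires an objective function to optimize. It must take a set of weights and return a score that is to be minimized or maximized, corresponding to a better model.

In this case, we will evaluate the accuracy of the model with a given set of weights and return the classification accuracy, which must be maximized.

The `objective()` function below implements this: given the dataset and a set of weights, it returns the accuracy of the model.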

# objective function
def objective(X, y, weights):
	# generate predictions for dataset
	yhat = predict_dataset(X, weights)
	# calculate accuracy
	score = accuracy_score(y, yhat)
	return score

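Next, we can define the stochastic hill climbing algorithm.

The algorithm requires an initial solution (e.g. random weights) and iteratively makes small changes to the solution, checking whether each change results in a better performing model. The amount of change made to the current solution is controlled by a `step_size` hyperparameter. This process continues for a fixed number of iterations, also provided as a hyperparameter.

The `hillclimbing()` function below implements this, taking the dataset, objective function, initial solution, and hyperparameters as arguments and returning the best set of weights found and the estimated performance.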

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = solution + randn(len(solution)) * step_size
		# evaluate candidate point
		candidate_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidate_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidate_eval
			# report progress
			print('>%d %.5f' % (i, solution_eval))
	return [solution, solution_eval]

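We can then call this function, passing in a set of weights as the initial solution and the training dataset as the dataset to optimize the model against.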

...
# define the total iterations
n_iter = 1000
# define the step size (standard deviation of the Gaussian perturbation)
step_size = 0.05
# determine the number of weights
n_weights = X.shape[1] + 1
# define the initial solution
solution = rand(n_weights)
# perform the hill climbing search
weights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('Done!')
print('f(%s) = %f' % (weights, score))

Finally, we can evaluate the best model on the test dataset and report the performance.

...
# generate predictions for the test dataset
yhat = predict_dataset(X_test, weights)
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

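Tying this together, the complete example of optimizing the weights of a Perceptron model on the synthetic binary classification dataset is listed below.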

# hill climbing to optimize weights of a perceptron model for classification
from numpy import asarray
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
	if activation >= 0.0:
		return 1
	return 0

# activation function
def activate(row, weights):
	# add the bias, the last weight
	activation = weights[-1]
	# add the weighted input
	for i in range(len(row)):
		activation += weights[i] * row[i]
	return activation

# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
	# activate for input
	activation = activate(row, weights)
	# transfer for activation
	return transfer(activation)

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
	yhats = list()
	for row in X:
		yhat = predict_row(row, weights)
		yhats.append(yhat)
	return yhats

# objective function
def objective(X, y, weights):
	# generate predictions for dataset
	yhat = predict_dataset(X, weights)
	# calculate accuracy
	score = accuracy_score(y, yhat)
	return score

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = solution + randn(len(solution)) * step_size
		# evaluate candidate point
		candidate_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidate_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidate_eval
			# report progress
			print('>%d %.5f' % (i, solution_eval))
	return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 1000
# define the step size (standard deviation of the Gaussian perturbation)
step_size = 0.05
# determine the number of weights
n_weights = X.shape[1] + 1
# define the initial solution
solution = rand(n_weights)
# perform the hill climbing search
weights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('Done!')
print('f(%s) = %f' % (weights, score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, weights)
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

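Running the example reports the iteration number and classification accuracy each time a candidate is accepted, that is, whenever it matches or improves on the current accuracy.

At the end of the search, the performance of the best set of weights on the training dataset is reported, and the performance of the same model on the test dataset is calculated and reported.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the optimization algorithm found a set of weights that achieved about 88.5 percent accuracy on the training dataset and about 81.8 percent accuracy on the test dataset.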

...
>111 0.88060
>119 0.88060
>126 0.88209
>134 0.88209
>205 0.88209
>262 0.88209
>280 0.88209
>293 0.88209
>297 0.88209
>336 0.88209
>373 0.88209
>437 0.88358
>463 0.88507
>630 0.88507
>701 0.88507
Done!
f([ 0.0097317 0.13818088 1.17634326 -0.04296336 0.00485813 -0.14767616]) = 0.885075
Test Accuracy: 81.81818

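Now that we are familiar with how to manually optimize the weights of a Perceptron model, let's look at how we can extend the example to optimize the weights of a Multilayer Perceptron (MLP) model.

Optimize a Multilayer Perceptron

A Multilayer Perceptron (MLP) model is a neural network with one or more layers, where each layer has one or more nodes.

It is an extension of a Perceptron model and is perhaps the most widely used neural network (deep learning) model.

In this section, we will build on what we learned in the previous section to optimize the weights of MLP models with an arbitrary number of layers and nodes per layer.

First, we will develop the model and test it with random weights, then use stochastic hill climbing to optimize the model weights.

When using MLPs for binary classification, it is common to use a sigmoid transfer function (also called the logistic function) instead of the step transfer function used in the Perceptron.

This function outputs a real value between 0 and 1 that represents a binomial probability distribution, e.g. the probability that an example belongs to class=1. The `transfer()` function below implements this.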

# transfer function
def transfer(activation):
	# sigmoid transfer function
	return 1.0 / (1.0 + exp(-activation))

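We can use the same `activate()` function from the previous section. Here, we will use it to calculate the activation for each node in a given layer.

The `predict_row()` function must be replaced with a more elaborate version.

The function takes a row of data and the network and returns the output of the network.

We will define our network as a list of lists. Each layer will be a list of nodes, and each node will be a list or array of weights.

To calculate the prediction of the network, we simply enumerate the layers, then enumerate the nodes, then calculate the activation and transfer output for each node. In this case, we will use the same transfer function for all nodes in the network, although this does not have to be the case.

For networks with more than one layer, the output from the previous layer is used as input to each node in the next layer. The output from the final layer in the network is then returned.

The `predict_row()` function below implements this.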

# activation function for a network
def predict_row(row, network):
	inputs = row
	# enumerate the layers in the network from input to output
	for layer in network:
		new_inputs = list()
		# enumerate nodes in the layer
		for node in layer:
			# activate the node
			activation = activate(inputs, node)
			# transfer activation
			output = transfer(activation)
			# store output
			new_inputs.append(output)
		# output from this layer is input to the next layer
		inputs = new_inputs
	return inputs[0]
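
To build confidence in the forward pass, it can help to run it on a tiny network whose output can be checked by hand. The sketch below restates the three functions in compact form so that it runs on its own. The hand-picked weights and the input row are assumptions chosen only to make the arithmetic easy.

```python
# A hand-checkable forward pass through a tiny network.
# The weights and input row are made-up values for illustration.
from math import exp

def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

def activate(row, weights):
    # bias is the last weight, followed by the weighted inputs
    activation = weights[-1]
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

def predict_row(row, network):
    inputs = row
    for layer in network:
        # the outputs of this layer become the inputs to the next layer
        inputs = [transfer(activate(inputs, node)) for node in layer]
    return inputs[0]

# two hidden nodes with all-zero weights: each outputs sigmoid(0) = 0.5
hidden = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# output node: 2.0 * 0.5 + 2.0 * 0.5 - 1.0 = 1.0, so the result is sigmoid(1.0)
output = [[2.0, 2.0, -1.0]]
network = [hidden, output]
result = predict_row([3.0, -4.0], network)
print('%.4f' % result)  # 0.7311
```

Because the hidden weights are all zero, the input row has no effect on the result, which makes this a convenient check on the layer-to-layer plumbing rather than on the weights themselves.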

That's about it.

Finally, we need to define a network to use.

For example, we can define an MLP with a single hidden layer with a single node as follows:

...
# create a one node network
node = rand(n_inputs + 1)
layer = [node]
network = [layer]

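This is practically a Perceptron, although with a sigmoid transfer function. Quite boring.

Let's define an MLP with one hidden layer and one output layer. The first hidden layer will have 10 nodes, and each node will take the input pattern from the dataset (e.g. five inputs). The output layer will have a single node that takes inputs from the outputs of the first hidden layer and then outputs a prediction.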

...
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
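
The same list-of-lists pattern extends to any number of layers. As a sketch, a small helper could build a network from a list of layer sizes. The helper name `build_network` and its parameters are hypothetical and are not part of the tutorial's code.

```python
# A hypothetical helper that builds a list-of-lists network of random weights.
from numpy.random import rand

def build_network(n_inputs, layer_sizes):
    network = []
    n_prev = n_inputs
    for n_nodes in layer_sizes:
        # each node holds one weight per incoming value plus a bias weight
        layer = [rand(n_prev + 1) for _ in range(n_nodes)]
        network.append(layer)
        # the next layer receives one input per node in this layer
        n_prev = n_nodes
    return network

# five inputs, two hidden layers (10 and 4 nodes), one output node
network = build_network(n_inputs=5, layer_sizes=[10, 4, 1])
# report (number of nodes, weights per node) for each layer
print([(len(layer), len(layer[0])) for layer in network])  # [(10, 6), (4, 11), (1, 5)]
```

Calling `build_network(5, [10, 1])` would give a network with the same shape as the one defined above.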

We can then use the model to make predictions on the dataset.

...
# generate predictions for dataset
yhat = predict_dataset(X, network)

Before we can calculate the classification accuracy, we must round the predictions to the class labels 0 and 1.

...
# round the predictions
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)
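
One detail worth knowing: Python's built-in `round()` rounds exact halves to the nearest even integer, so a prediction of exactly 0.5 becomes 0. This rarely matters with real-valued sigmoid outputs, but if you prefer an explicit rule, a threshold can be used instead. The helper name `to_label` and the default threshold of 0.5 below are assumptions for illustration.

```python
# A sketch of an explicit threshold as an alternative to round().
# The helper name and the default threshold are illustrative assumptions.
def to_label(probability, threshold=0.5):
    # class 1 when the predicted probability reaches the threshold
    return 1 if probability >= threshold else 0

predictions = [0.2, 0.5, 0.7, 0.99]
labels = [to_label(p) for p in predictions]
print(labels)      # [0, 1, 1, 1]
print(round(0.5))  # 0, because exact halves round to the nearest even integer
```

With this rule, a sigmoid output of 0.5 (an activation of zero) maps to class 1, which is consistent with the step transfer function used for the Perceptron.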

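Tying all of this together, the complete example of evaluating an MLP with random initial weights on our synthetic binary classification dataset is listed below.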

# develop an mlp model for classification
from math import exp
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
	# sigmoid transfer function
	return 1.0 / (1.0 + exp(-activation))

# activation function
def activate(row, weights):
	# add the bias, the last weight
	activation = weights[-1]
	# add the weighted input
	for i in range(len(row)):
		activation += weights[i] * row[i]
	return activation

# activation function for a network
def predict_row(row, network):
	inputs = row
	# enumerate the layers in the network from input to output
	for layer in network:
		new_inputs = list()
		# enumerate nodes in the layer
		for node in layer:
			# activate the node
			activation = activate(inputs, node)
			# transfer activation
			output = transfer(activation)
			# store output
			new_inputs.append(output)
		# output from this layer is input to the next layer
		inputs = new_inputs
	return inputs[0]

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, network):
	yhats = list()
	for row in X:
		yhat = predict_row(row, network)
		yhats.append(yhat)
	return yhats

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of inputs
n_inputs = X.shape[1]
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
# generate predictions for dataset
yhat = predict_dataset(X, network)
# round the predictions
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)

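Running the example generates a prediction for each example in the training dataset, then prints the classification accuracy for the predictions.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Again, we would expect about 50 percent accuracy given a set of random weights and a dataset with an equal number of examples in each class, and that is approximately what we see in this case.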

0.499

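Next, we can apply the stochastic hill climbing algorithm to the dataset.

It is very much the same as applying hill climbing to the Perceptron model, except that in this case a step requires a modification to all weights in the network.

For this, we will develop a new function that creates a copy of the network and mutates each weight in the network while making the copy.

The `step()` function below implements this.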

# take a step in the search space
def step(network, step_size):
	new_net = list()
	# enumerate layers in the network
	for layer in network:
		new_layer = list()
		# enumerate nodes in this layer
		for node in layer:
			# mutate the node
			new_node = node.copy() + randn(len(node)) * step_size
			# store node in layer
			new_layer.append(new_node)
		# store layer in network
		new_net.append(new_layer)
	return new_net

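Modifying all weights in the network is aggressive.

A less aggressive step in the search space might be to make a small change to a subset of the weights in the model, perhaps controlled by a hyperparameter. This is left as an extension.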
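
One possible way to approach that extension is sketched below: each weight is perturbed only with some probability. The function name `step_subset` and the `p_mutate` hyperparameter are hypothetical, and this is only one of several reasonable designs. It assumes each node is a NumPy array, as in the networks built with `rand()` in this tutorial.

```python
# A sketch of a less aggressive step: perturb only a random subset of weights.
# The function name and the p_mutate hyperparameter are illustrative assumptions.
from numpy.random import rand, randn

def step_subset(network, step_size, p_mutate=0.2):
    new_net = list()
    for layer in network:
        new_layer = list()
        for node in layer:
            # choose which weights in this node to perturb
            mask = rand(len(node)) < p_mutate
            # Gaussian noise is applied only where the mask is True
            new_node = node.copy() + randn(len(node)) * step_size * mask
            new_layer.append(new_node)
        new_net.append(new_layer)
    return new_net

# small demo network: a hidden layer of four nodes and one output node
demo_net = [[rand(3) for _ in range(4)], [rand(5)]]
candidate = step_subset(demo_net, step_size=0.1, p_mutate=0.2)
print([len(layer) for layer in candidate])  # [4, 1]
```

Setting `p_mutate=1.0` recovers the behavior of changing every weight, while smaller values change fewer weights per step. With a small value, a step may occasionally leave the network unchanged.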

We can then call this new `step()` function from the `hillclimbing()` function.

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = step(solution, step_size)
		# evaluate candidate point
		candidate_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidate_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidate_eval
			# report progress
			print('>%d %f' % (i, solution_eval))
	return [solution, solution_eval]

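Tying this together, the complete example of applying stochastic hill climbing to optimize the weights of an MLP model for binary classification is listed below.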

# stochastic hill climbing to optimize a multilayer perceptron for classification
from math import exp
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
	# sigmoid transfer function
	return 1.0 / (1.0 + exp(-activation))

# activation function
def activate(row, weights):
	# add the bias, the last weight
	activation = weights[-1]
	# add the weighted input
	for i in range(len(row)):
		activation += weights[i] * row[i]
	return activation

# activation function for a network
def predict_row(row, network):
	inputs = row
	# enumerate the layers in the network from input to output
	for layer in network:
		new_inputs = list()
		# enumerate nodes in the layer
		for node in layer:
			# activate the node
			activation = activate(inputs, node)
			# transfer activation
			output = transfer(activation)
			# store output
			new_inputs.append(output)
		# output from this layer is input to the next layer
		inputs = new_inputs
	return inputs[0]

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, network):
	yhats = list()
	for row in X:
		yhat = predict_row(row, network)
		yhats.append(yhat)
	return yhats

# objective function
def objective(X, y, network):
	# generate predictions for dataset
	yhat = predict_dataset(X, network)
	# round the predictions
	yhat = [round(y) for y in yhat]
	# calculate accuracy
	score = accuracy_score(y, yhat)
	return score

# take a step in the search space
def step(network, step_size):
	new_net = list()
	# enumerate layers in the network
	for layer in network:
		new_layer = list()
		# enumerate nodes in this layer
		for node in layer:
			# mutate the node
			new_node = node.copy() + randn(len(node)) * step_size
			# store node in layer
			new_layer.append(new_node)
		# store layer in network
		new_net.append(new_layer)
	return new_net

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = step(solution, step_size)
		# evaluate candidate point
		candidate_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidate_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidate_eval
			# report progress
			print('>%d %f' % (i, solution_eval))
	return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 1000
# define the step size (standard deviation of the Gaussian perturbation)
step_size = 0.1
# determine the number of inputs
n_inputs = X.shape[1]
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
# perform the hill climbing search
network, score = hillclimbing(X_train, y_train, objective, network, n_iter, step_size)
print('Done!')
print('Best: %f' % (score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, network)
# round the predictions
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

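Running the example reports the iteration number and classification accuracy each time a candidate is accepted, that is, whenever it matches or improves on the current accuracy.

At the end of the search, the performance of the best set of weights on the training dataset is reported, and the performance of the same model on the test dataset is calculated and reported.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the optimization algorithm found a set of weights that achieved about 87.3 percent accuracy on the training dataset and about 85.1 percent accuracy on the test dataset.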

...
>55 0.755224
>56 0.765672
>59 0.794030
>66 0.805970
>77 0.835821
>120 0.838806
>165 0.840299
>188 0.841791
>218 0.846269
>232 0.852239
>237 0.852239
>239 0.855224
>292 0.867164
>368 0.868657
>823 0.868657
>852 0.871642
>889 0.871642
>892 0.871642
>992 0.873134
Done!
Best: 0.873134
Test Accuracy: 85.15152

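Summary

In this tutorial, you discovered how to manually optimize the weights of neural network models.

Specifically, you learned:

How to develop the forward inference pass for neural network models from scratch.
How to optimize the weights of a Perceptron model for binary classification.
How to optimize the weights of a Multilayer Perceptron model using stochastic hill climbing.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.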

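13 Responses to How to Manually Optimize Neural Network Models

Rupesh Mahamune December 4, 2020 at 11:55 am #

Helpful code. Thanks

- Jason Brownlee December 4, 2020 at 1:22 pm #

  Thanks!

Suraj December 5, 2020 at 6:52 pm #

Nice article, it gave me good insight into neural networks. Thanks!!

- Jason Brownlee December 6, 2020 at 6:58 am #

  Thanks.

Dexter December 7, 2020 at 10:28 am #

Hi Jason, great article!!

Can you do the same for LSTM networks?

- Jason Brownlee December 7, 2020 at 1:35 pm #

  Sure!

  - Bibhuti December 18, 2020 at 12:45 am #

    Can you share some knowledge on how to optimize a neural network using PSO or GA, specifically for a regression prediction task? I have searched everywhere but have not found a good or proper explanation.

    - Jason Brownlee December 18, 2020 at 7:17 am #

      You can adapt the above example to use any optimization algorithm you like, such as GA or PSO.

Andrew August 4, 2021 at 6:47 am #

Hi Jason,

Thanks a lot. Can this kind of optimization be extended to a CNN for a simple test case such as the MNIST dataset?

- Jason Brownlee August 5, 2021 at 5:14 am #

  Sure.

CHAYMAE MAKRI October 21, 2021 at 8:45 pm #

How can I use an artificial neural network to solve an optimization problem?
Thank you in advance for your answer.

- Adrian Tam October 22, 2021 at 4:11 am #

  Can you give an example of an optimization problem?

Eshwar February 15, 2022 at 3:35 am #

Good job, sir. Thank you.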
