How to Update Neural Network Models With More Data

By Jason Brownlee on February 22, 2021 in Deep Learning

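Deep learning neural network models used for predictive modeling may need to be updated.

This may be because the data has changed since the model was developed and deployed, or because additional labeled data has become available since the model was developed and the extra data is expected to improve the model's performance.

It is important to experiment with and evaluate a range of approaches when updating neural network models for new data, especially if model updating will be automated, such as on a periodic schedule.

There are many ways to update neural network models. The two main approaches are to use the existing model as a starting point and retrain it, or to leave the existing model unchanged and combine its predictions with those of a new model.

In this tutorial, you will discover how to update deep learning neural network models in response to new data.

After completing this tutorial, you will know:

Neural network models may need to be updated when the underlying data changes or when new labeled data is made available.
How to update trained neural network models with just new data or with combinations of old and new data.
How to create an ensemble of existing and new models trained on just new data or on combinations of old and new data.

Let's get started.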

How to Update Neural Network Models With More Data
Photo by Judy Gallagher, some rights reserved.

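Tutorial Overview

This tutorial is divided into three parts; they are:

Updating Neural Network Models
Retraining Update Strategies
1. Update Model on New Data Only
2. Update Model on Old and New Data
Ensemble Update Strategies
1. Ensemble Model With Model on New Data Only
2. Ensemble Model With Model on Old and New Data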

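Updating Neural Network Models

Selecting and finalizing a deep learning neural network model for a predictive modeling project is just the beginning.

You can then start using the model to make predictions on new data.

One possible problem that you may encounter is that the nature of the prediction problem may change over time.

You may notice this when the quality of predictions starts to decline over time. This may be because the assumptions made and captured in the model are changing or no longer hold.

Generally, this is referred to as the problem of “concept drift,” where the underlying probability distributions of variables and the relationships between variables change over time, which can negatively impact a model built from the data.

For more on concept drift, see the tutorial:

A Gentle Introduction to Concept Drift in Machine Learning

Concept drift may affect your model at different times, depending on the prediction problem you are solving and the model chosen to address it.

It can be helpful to monitor the performance of a model over time and use a clear degradation in model performance as a trigger to make a change to your model, such as retraining it on new data.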
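One simple way to act on such monitoring is to compare recent performance against the score recorded at deployment. The sketch below is illustrative only: the function name `needs_update`, the baseline score, and the tolerance are assumptions introduced here, not part of the tutorial.

```python
# minimal sketch: flag a model for updating when recent accuracy degrades
# (needs_update, the baseline and the tolerance are hypothetical names and values)

def needs_update(recent_scores, baseline, tolerance=0.05):
    """Return True if the mean of recent scores falls below baseline - tolerance."""
    if not recent_scores:
        # no evidence yet, so do not trigger an update
        return False
    recent_mean = sum(recent_scores) / len(recent_scores)
    return recent_mean < baseline - tolerance

# accuracy recorded when the model was deployed (made-up value)
baseline_accuracy = 0.90
# accuracy measured on recent batches with known outcomes (made-up values)
stable = [0.91, 0.89, 0.90]
degraded = [0.84, 0.82, 0.80]
print(needs_update(stable, baseline_accuracy))
print(needs_update(degraded, baseline_accuracy))
```

The tolerance controls how sensitive the trigger is; a suitable value would depend on how noisy the performance estimates are for your problem.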

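Alternately, you may know that data in your domain changes frequently enough that a change to the model is required periodically, such as weekly, monthly, or annually.

Finally, you may operate your model for a while and accumulate additional data with known outcomes that you wish to use to update your model, with the hope of improving predictive performance.

Importantly, you have a lot of flexibility when it comes to responding to a change in the problem or the availability of new data.

For example, you can take the trained neural network model and update the model weights using the new data. Or you might leave the existing model untouched and combine its predictions with those of a new model fit on the newly available data.

These approaches might represent two general themes in updating neural network models in response to new data; they are:

Retraining Update Strategies.
Ensemble Update Strategies.

Let's take a closer look at each in turn.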

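Retraining Update Strategies

A benefit of neural network models is that their weights can be updated at any time with continued training.

When responding to changes in the underlying data or the availability of new data, there are a few different strategies to choose from when updating a neural network model, such as:

Continue training the model on the new data only.
Continue training the model on the old and new data.

We might also imagine variations on the above strategies, such as using a sample of the new data or a sample of the new and old data instead of all available data, as well as possible instance-based weightings on the sampled data.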
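As a rough illustration of the sampling and weighting variation, the sketch below builds an update set from a sample of the old data plus all of the new data and attaches a weight to each instance. The arrays are random placeholders, and the one-fifth sample size and the 0.5 and 1.0 weights are arbitrary assumptions.

```python
# minimal sketch: build an update set from a sample of old data plus all new data,
# with per-instance weights that favor the new data (all values are illustrative)
import numpy as np

rng = np.random.default_rng(1)
# stand-ins for the old and new datasets (random placeholders, not real data)
X_old, y_old = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)
X_new, y_new = rng.normal(size=(500, 20)), rng.integers(0, 2, size=500)

# draw one fifth of the old rows without replacement
n_sample = len(X_old) // 5
ix = rng.choice(len(X_old), size=n_sample, replace=False)

# stack the sampled old rows on top of all new rows
X_update = np.vstack((X_old[ix], X_new))
y_update = np.hstack((y_old[ix], y_new))

# weight old instances at 0.5 and new instances at 1.0
weights = np.hstack((np.full(n_sample, 0.5), np.ones(len(X_new))))
print(X_update.shape, y_update.shape, weights.shape)
```

With Keras, weights like these could be supplied through the `sample_weight` argument of `fit()`.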

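We might also consider extensions of the model that freeze the layers of the existing model (e.g., so that model weights cannot change during training), then add new layers with trainable weights, grafting on extensions to the model to handle any change in the data. Perhaps this is a variation of the retraining approach and the ensemble approach in the next section, and we'll leave it for now.

Nevertheless, these are the two main strategies to consider.

Let's make these approaches concrete with a worked example.

Update Model on New Data Only

We can update the model on the new data only.

One extreme version of this approach is to not use any new data and simply retrain the model on the old data. This might be the same as “do nothing” in response to the new data. At the other extreme, a model could be fit on the new data only, discarding the old data and old model.

Ignore the new data, do nothing.
Update the existing model on the new data.
Fit a new model on the new data, discarding the old model and data.

We will focus on the middle ground in this example, but it might be interesting to test all three approaches on your problem and see what works best.

First, we can define a synthetic binary classification dataset and split it in half, then use one portion as “old data” and the other portion as “new data.”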

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)

We can then define a Multilayer Perceptron (MLP) model and fit it on the old data only.

...
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

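We can then save the model and use it for some time.

Time passes, and we wish to update it on new data that has become available.

This would involve using a much smaller learning rate than normal so that we do not wash away the weights learned on the old data.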
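To get an intuition for why the learning rate matters here, consider a toy case with a single weight. This is only an illustration under simplifying assumptions: a one-weight quadratic loss stands in for a real network, plain gradient descent is used without momentum, and the function name `continue_training` is made up for this sketch.

```python
# toy illustration: a smaller learning rate moves a weight less far from its old value
# (a one-weight quadratic loss stands in for a real network; this is an assumption)

def continue_training(w, target, learning_rate, n_steps):
    """Run plain gradient descent on the loss (w - target)^2 starting from w."""
    for _ in range(n_steps):
        gradient = 2.0 * (w - target)
        w = w - learning_rate * gradient
    return w

w_old = 1.0        # weight that was optimal for the old data
new_target = 3.0   # weight that would be optimal for the new data alone
for lr in (0.01, 0.001):
    w = continue_training(w_old, new_target, lr, n_steps=100)
    print('lr=%.3f moved the weight by %.3f' % (lr, abs(w - w_old)))
```

For the same number of steps, the smaller rate leaves the weight closer to what was learned on the old data, while the larger rate moves it most of the way toward the new optimum.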

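Note: you will need to discover a learning rate that is appropriate for your model and dataset and that achieves better performance than simply fitting a new model from scratch.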

...
# update model on new data only with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')

We can then fit the model on the new data only with this smaller learning rate.

...
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
model.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)

Tying this together, the complete example of updating a neural network model on new data only is listed below.

# update neural network with new data only
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

# save model...

# load model...

# update model on new data only with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
model.fit(X_new, y_new, epochs=100, batch_size=32, verbose=0)

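Next, let's look at updating the model on new and old data.

Update Model on Old and New Data

We can update the model on a combination of both old and new data.

An extreme version of this approach is to discard the model and simply fit a new model on all available data, new and old. A less extreme version would be to use the existing model as a starting point and update it based on the combined dataset.

Again, it is a good idea to test both strategies and see what works well for your dataset.

We will focus on the less extreme update strategy in this case.

As before, the synthetic dataset can be prepared and the model fit on the old data.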

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

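New data becomes available, and we wish to update the model on a combination of both old and new data.

First, we must use a much smaller learning rate in an attempt to use the current weights as a starting point for the search.

Note: you will need to discover a learning rate that is appropriate for your model and dataset and that achieves better performance than simply fitting a new model from scratch.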
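The comparison described in the note can be framed as a small selection step. The sketch below assumes that each candidate learning rate has already been evaluated on a holdout set, and that a model fit from scratch has been scored the same way. The function name `select_learning_rate` and all scores are hypothetical.

```python
# minimal sketch: keep an update learning rate only if it beats fitting from scratch
# (select_learning_rate and the score values are hypothetical)

def select_learning_rate(scores_by_rate, scratch_score):
    """Return the best-scoring learning rate, or None if none beats the from-scratch model."""
    best_rate, best_score = None, scratch_score
    for rate, score in scores_by_rate.items():
        if score > best_score:
            best_rate, best_score = rate, score
    return best_rate

# holdout accuracy after updating with each candidate rate (made-up numbers)
update_scores = {0.01: 0.86, 0.001: 0.91, 0.0001: 0.88}
# holdout accuracy of a new model fit from scratch (made-up number)
scratch_accuracy = 0.89
print(select_learning_rate(update_scores, scratch_accuracy))
```

If the function returns `None`, no candidate rate outperformed the from-scratch model, which would suggest refitting rather than updating.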

...
# update model with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')

We can then create a composite dataset composed of old and new data.

...
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
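
It can help to confirm the shapes that this stacking produces. The sketch below uses placeholder arrays of zeros and ones, sized like the two halves of the 1,000-row dataset, rather than the real data.

```python
# minimal sketch: check the shapes produced by stacking old and new data
from numpy import vstack, hstack, zeros, ones

# placeholder arrays with the same shapes as the tutorial's split (500 rows each)
X_old, y_old = zeros((500, 20)), zeros(500)
X_new, y_new = ones((500, 20)), ones(500)

# vstack joins the feature rows; hstack joins the 1D label vectors end to end
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
print(X_both.shape, y_both.shape)
```

The first 500 rows of the composite are the old data and the last 500 are the new data. Keras shuffles the training rows before each epoch by default, so this ordering should not matter for `fit()`.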

Finally, we can update the model on this composite dataset.

...
# fit the model on old and new data
model.fit(X_both, y_both, epochs=100, batch_size=32, verbose=0)

Tying this together, the complete example of updating a neural network model on both old and new data is listed below.

# update neural network with both old and new data
from numpy import vstack
from numpy import hstack
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the model
model = Sequential()
model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

# save model...

# load model...

# update model with a smaller learning rate
opt = SGD(learning_rate=0.001, momentum=0.9)
# compile the model
model.compile(optimizer=opt, loss='binary_crossentropy')
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the model on old and new data
model.fit(X_both, y_both, epochs=100, batch_size=32, verbose=0)

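Next, let's look at how to use ensemble models to respond to new data.

Ensemble Update Strategies

An ensemble is a predictive model that is composed of multiple other models.

There are many different types of ensemble models, although perhaps the simplest approach is to average the predictions from multiple different models.

For more on ensemble algorithms for deep learning neural networks, see the tutorial:

Ensemble Learning Methods for Deep Learning Neural Networks

We can use an ensemble model as a strategy when responding to changes in the underlying data or the availability of new data.

Mirroring the approaches in the previous section, we might consider two approaches to ensemble learning algorithms as strategies for responding to new data; they are:

Ensemble of the existing model and a new model fit on new data only.
Ensemble of the existing model and a new model fit on old and new data.

Again, we might consider variations on these approaches, such as samples of old and new data, and more than one existing or additional model included in the ensemble.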
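As one possible variation, an ensemble with more than two members could combine predictions with a weighted average instead of a plain mean. The sketch below is an assumption-laden illustration: the function name `weighted_ensemble`, the placeholder probabilities, and the choice of weights are all hypothetical.

```python
# minimal sketch: weighted average of predictions from several ensemble members
import numpy as np

def weighted_ensemble(predictions, weights):
    """Combine a list of (n_samples, 1) probability arrays using normalized weights."""
    stacked = np.hstack(predictions)            # shape (n_samples, n_models)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()           # normalize so the weights sum to 1
    return stacked @ weights                    # shape (n_samples,)

# placeholder probabilities from two old models and one new model (made-up values)
p_old1 = np.array([[0.2], [0.9]])
p_old2 = np.array([[0.4], [0.7]])
p_new = np.array([[0.6], [0.8]])
# give the newest model twice the weight of each older model (arbitrary choice)
yhat = weighted_ensemble([p_old1, p_old2, p_new], weights=[1, 1, 2])
print(yhat)
```

With equal weights this reduces to the simple average used in the examples in this tutorial.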

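Nevertheless, these are the two main strategies to consider.

Let's make these approaches concrete with a worked example.

Ensemble Model With Model on New Data Only

We can create an ensemble of the existing model and a new model fit on only the new data.

The expectation is that the ensemble predictions perform better or are more stable (lower variance) than using either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.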
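A check like this might compare each model and the ensemble on labels that were held back from training. The sketch below uses small, contrived arrays purely to show the mechanics. The helper name `accuracy` and every value are assumptions, and real results may look quite different.

```python
# minimal sketch: compare the old model, the new model and their average on holdout labels
import numpy as np

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    return float(np.mean((y_prob >= threshold).astype(int) == y_true))

# placeholder holdout labels and predicted probabilities (made-up values)
y_holdout = np.array([0, 1, 1, 0, 1])
p_old = np.array([0.3, 0.4, 0.8, 0.6, 0.9])
p_new = np.array([0.6, 0.7, 0.45, 0.2, 0.7])
p_ens = (p_old + p_new) / 2.0

for name, p in (('old', p_old), ('new', p_new), ('ensemble', p_ens)):
    print(name, accuracy(y_holdout, p))
```

The ensemble comes out ahead here only because the values were chosen to illustrate the idea; on a real dataset, the comparison could go either way.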

First, we can prepare the dataset and fit the old model, as we did in the previous sections.

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

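Some time passes and new data becomes available.

We can then fit a new model on the new data, naturally discovering a model and configuration that works well or best on the new dataset only.

In this case, we'll simply use the same model architecture and configuration as the old model.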

...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')

We can then fit this new model on the new data only.

...
# fit the model on new data
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)

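Now that we have the two models, we can make predictions with each model and calculate the average of the predictions as the “ensemble prediction.”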

...
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)
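
The averaged values are probabilities, so a further step is needed if crisp class labels are required. The sketch below uses made-up arrays shaped like the `(n_samples, 1)` output that `predict()` returns for a single sigmoid unit. The 0.5 threshold is an assumption.

```python
# minimal sketch: turn averaged probabilities into crisp class labels
from numpy import array, hstack, mean

# placeholder outputs shaped like Keras predict() results, (n_samples, 1) (made-up values)
yhat1 = array([[0.2], [0.7], [0.6]])
yhat2 = array([[0.4], [0.9], [0.3]])

# hstack gives one column per model, so the mean over the last axis averages the models
combined = hstack((yhat1, yhat2))      # shape (3, 2)
yhat = mean(combined, axis=-1)         # shape (3,)

# threshold the averaged probability at 0.5 to get 0/1 labels
labels = (yhat >= 0.5).astype('int32')
print(combined.shape, yhat.shape, labels)
```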

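Tying this together, the complete example of updating using an ensemble of the existing model and a new model fit on new data only is listed below.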

# ensemble old neural network with new model fit on new data only
from numpy import hstack
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

# save model...

# load model...

# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on new data
new_model.fit(X_new, y_new, epochs=150, batch_size=32, verbose=0)

# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)

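Ensemble Model With Model on Old and New Data

We can create an ensemble of the existing model and a new model fit on both the old and the new data.

The expectation is that the ensemble predictions perform better or are more stable (lower variance) than using either the old model or the new model alone. This should be checked on your dataset before adopting the ensemble.

First, we can prepare the dataset and fit the old model, as we did in the previous sections.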

...
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

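Some time passes and new data becomes available.

We can then fit a new model on a composite of the old and new data, naturally discovering a model and configuration that works well or best on the new dataset.

In this case, we'll simply use the same model architecture and configuration as the old model.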

...
# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')

We can create a composite dataset from the old and new data, then fit the new model on this dataset.

...
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the model on old and new data
new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)

Finally, we can use both models together to make ensemble predictions.

...
# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)

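Tying this together, the complete example of updating using an ensemble of the existing model and a new model fit on the old and new data is listed below.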

# ensemble old neural network with new model fit on old and new data
from numpy import hstack
from numpy import vstack
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# record the number of input features in the data
n_features = X.shape[1]
# split into old and new data
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.50, random_state=1)
# define the old model
old_model = Sequential()
old_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
old_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
old_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
old_model.compile(optimizer=opt, loss='binary_crossentropy')
# fit the model on old data
old_model.fit(X_old, y_old, epochs=150, batch_size=32, verbose=0)

# save model...

# load model...

# define the new model
new_model = Sequential()
new_model.add(Dense(20, kernel_initializer='he_normal', activation='relu', input_dim=n_features))
new_model.add(Dense(10, kernel_initializer='he_normal', activation='relu'))
new_model.add(Dense(1, activation='sigmoid'))
# define the optimization algorithm
opt = SGD(learning_rate=0.01, momentum=0.9)
# compile the model
new_model.compile(optimizer=opt, loss='binary_crossentropy')
# create a composite dataset of old and new data
X_both, y_both = vstack((X_old, X_new)), hstack((y_old, y_new))
# fit the model on old and new data
new_model.fit(X_both, y_both, epochs=150, batch_size=32, verbose=0)

# make predictions with both models
yhat1 = old_model.predict(X_new)
yhat2 = new_model.predict(X_new)
# combine predictions into single array
combined = hstack((yhat1, yhat2))
# calculate outcome as mean of predictions
yhat = mean(combined, axis=-1)

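Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Tutorials

Summary

In this tutorial, you discovered how to update deep learning neural network models in response to new data.

Specifically, you learned:

Neural network models may need to be updated when the underlying data changes or when new labeled data is made available.
How to update trained neural network models with just new data or with combinations of old and new data.
How to create an ensemble of existing and new models trained on just new data or on combinations of old and new data.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.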


42 Responses to How to Update Neural Network Models With More Data

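Jack March 5, 2021 at 6:59 pm #

Hello Dr. Jason Brownlee,

Very nice example.
But I cannot see the complete code on the website because I cannot move the screenshot and see all of the code. Can I see all of the code?

- Jason Brownlee March 6, 2021 at 5:15 am #

  Sorry, I don't follow. Can you please elaborate?

  Do you mean that you cannot see the complete code example?

- Cpc March 6, 2021 at 6:49 am #

  Try pressing and holding the code box for a few seconds. I did that and a horizontal bar appeared. Now I can read all of the code.

  - Jason Brownlee March 6, 2021 at 9:02 am #

    The copy-paste feature is now fixed, sorry.
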
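Vidya March 8, 2021 at 1:18 pm #

Hi Jason,

What does kernel_initializer='he_normal' mean? I tried to understand it from the tensorflow.keras website, but it was not very clear.

Thanks a lot!

- Jason Brownlee March 8, 2021 at 1:35 pm #

  It defines the way that we set the random weights before training, called weight initialization.

  You can learn more here:
  https://machinelearning.org.cn/weight-initialization-for-deep-learning-neural-networks/

  - Vidya March 8, 2021 at 9:56 pm #

    Thanks!
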
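Vidya March 8, 2021 at 1:24 pm #

Hi Jason,

Thanks for this post.
When we get new data, isn't it always better to add the new data to the old data and retrain, since more historical data gives better inference? Or does this vary from case to case?

- Jason Brownlee March 8, 2021 at 1:36 pm #

  You're welcome.

  There is no good general rule. It really depends on how much new data there is and how different it is.
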
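Niranjan March 12, 2021 at 11:30 pm #

Hi Jason,

Thanks for the article.

I was just wondering what your thoughts are on adding new labels to an existing model.
For example, I have trained a model A with 10 classes, and after some time I want to add data for two more classes to the existing data. One option is to retrain the whole model on all 12 classes (10 old + 2 new), starting from the previous weights.
But my concern is: do you think we can retrain the model with 12 classes [10 old + 2 new] without using the old data, or using only part of the old data together with all of the new data? We could use the previously trained model weights.

Thanks

- Jason Brownlee March 13, 2021 at 5:33 am #

  This feels more like transfer learning than a model update. I would probably remove the output layer and then fit a new output layer.

  Nevertheless, please experiment and discover what works best for your dataset.

  - Nishant Kumar February 3, 2022 at 4:23 pm #

    Hi Jason,

    This is for neural networks.

    Suppose I have a robust sklearn model, but it does not support partial fit.

    How should we proceed in that case?

    - James Carmichael February 4, 2022 at 10:21 am #

      Hi Nishant…Please clarify what you are trying to accomplish with your model so that I may better assist you.
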
yan April 28, 2021 at 10:06 am #

Thank you very much for this work.

- Jason Brownlee April 29, 2021 at 6:21 am #

  You're welcome.

Mostafa Amin RIZK July 14, 2021 at 12:26 am #

Hello,
Are there any published research articles on this topic?

- Jason Brownlee July 14, 2021 at 5:30 am #

  Perhaps try searching on scholar.google.com.

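ishaque July 14, 2021 at 8:38 pm #

Hello sir, my question is: to “update the model on new data,” I need to change the input_shape of the model in order to train on the new data, so please tell me how to change the input_shape of an existing model.

- Jason Brownlee July 15, 2021 at 5:28 am #

  You can remove the input layer from the model and define a new input layer with the required shape. This is easy to do with the functional API and happens often in transfer learning (see some of the examples on the blog).

  - ishaque July 16, 2021 at 11:00 pm #

    Could you provide code for “removing the input layer of a model in the functional API and defining a new input layer with the required shape”? I have been looking for a solution but cannot find one.

    FYI, I have read the blog post on the functional API; it is well written, thanks.
    Please help.

    - Jason Brownlee July 17, 2021 at 5:23 am #

      There are many examples of this on the blog. Please use the search feature and look for examples of feature selection on image classification datasets.

      - ishaque July 17, 2021 at 4:32 pm #

        Please share code for changing the input of an existing model; I am very confused. Please help.
      - Jason Brownlee July 18, 2021 at 5:21 am #

        Sorry, I don't have the capacity to prepare custom code for you.
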
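ww July 26, 2021 at 11:23 am #

Hello, sir. Thanks for your post.
I am using a NN model for time series forecasting. When I have new time series data, can I use it to update my NN model?

- Jason Brownlee July 27, 2021 at 5:04 am #

  The tutorial above shows you how to update your model.
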
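recohut October 26, 2021 at 4:07 am #

Won't this retraining strategy lead to catastrophic forgetting? Could you also do a tutorial on retraining with incremental learning?

- Adrian Tam October 27, 2021 at 2:51 am #

  Yes, it can lead to catastrophic forgetting, which is why, for example, we use a smaller learning rate when retraining.
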
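Saurav Pandey February 14, 2022 at 5:56 pm #

Sir,
First of all, thank you very much for these blog posts. I have a few questions about retraining a model on new data.
1. In the strategy where we update the old model with both the old and new datasets, won't this easily lead to overfitting on the old dataset?
2. What if we update the model using a sample of the old data and all of the new data? Will this strategy work?
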
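Shruti February 23, 2022 at 7:17 am #

Hi Jason,

Is it possible to change the cost function when updating a trained model? Suppose I started training with MSE as the cost function, but now that I have a new dataset, I want to update the model with MAE as the cost function.

- James Carmichael February 23, 2022 at 12:18 pm #

  Hi Shruti…You may find the following of interest:

  https://stackoverflow.com/questions/60996892/how-to-replace-loss-function-during-training-tensorflow-keras
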
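Mara March 25, 2022 at 8:50 pm #

Hello,
First of all, thank you very much for your blog posts! They are very valuable.
Now, I would like to know how to handle normalized data. Specifically: I trained my model with normalized data, where the training, validation, and test sets were all scaled using the minimum and maximum values of the training set. My question is: should I scale the new data using the minimum and maximum of the original training data, or using its own minimum and maximum?

- James Carmichael March 28, 2022 at 8:03 am #

  Hi Mara…New validation data does not need to be scaled. Normalization and scaling are used to improve training.
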
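Peter Allan April 5, 2022 at 10:55 pm #

I have some new data that I would like to combine with my old data and use to update my pretrained network. My starting point will be the previous best weights and the learning rate in use when I previously stopped training (I am using a learning rate scheduler).

First, what is the best way to do the train-test split? It looks like you add all of the new data to the training set, that is, nothing is held back for the test set. Wouldn't it be better to split the new data in the same proportion as the old data (e.g., 80:20) and add it to the old training and test sets?

Also, I applied various transformers (min-max, Box-Cox, etc.) to the old data before training. What is the best strategy here? Refit the transformers on the combined old and new data before continuing to update the network? Or apply the transformers fit on the old data to the new data? I can see pros and cons to both approaches.

- James Carmichael April 6, 2022 at 8:38 am #

  Hi Peter…You may find the following beneficial:

  https://machinelearning.org.cn/train-test-split-for-evaluating-machine-learning-algorithms/
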
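Emily Hunter April 6, 2022 at 1:35 pm #

Hi Jason,

Thank you for this great article.

I have a question about updating models. I developed an XGBoost model for classification, and I used 24 months of data to train and tune it. I want to run the model in production. However, I would like to retrain the model with the latest month of data (when it becomes available), using the same set of hyperparameters that I tuned. The approach is to drop the oldest month of data and add the new month, so that I always have a 24-month window.
Is this reasonable? Or do I need to re-tune the hyperparameters for every run?

- James Carmichael April 7, 2022 at 9:50 am #

  Hi Emily…The following resource may help clarify how to update models with new data.

  https://machinelearning.org.cn/update-neural-network-models-with-more-data/
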
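Peter Allan April 8, 2022 at 6:12 pm #

I know how to split data; my question was more about how to update the old data with the new data and apply the transforms, but thanks for your help.

- James Carmichael April 9, 2022 at 8:45 am #

  You're welcome, Peter!
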
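Darshan S.P September 1, 2022 at 2:22 pm #

Hello sir,

I am new to model training.

After loading a pretrained model, should we always call model.compile?

That would start training on the new data with completely fresh weights and biases, right?

I need to know when to use compile=True or False when retraining a pretrained model.

- James Carmichael September 2, 2022 at 9:25 am #

  Hi Darshan…The following discussion should add clarity.

  https://stackoverflow.com/questions/47995324/does-model-compile-initialize-all-the-weights-and-biases-in-keras-tensorflow
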
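Ziming Wang October 11, 2022 at 12:11 pm #

Hello sir, thank you for your blog post!
I have two questions.
First, what is the essential difference, in terms of the training process, between training on new data only and training on a combination of old and new data?
Second, can we directly modify the weights of a trained model through tf? That is, can we get the weights of a trained model and modify them?

- James Carmichael October 12, 2022 at 7:57 am #

  Hi Ziming…Please clarify your first question so that we may better assist you. Regarding your second question, the following resource may be of interest.

  https://machinelearning.org.cn/save-load-machine-learning-models-python-scikit-learn/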
