HyperOpt for Automated Machine Learning With Scikit-Learn

By Jason Brownlee on September 7, 2020 in Python Machine Learning

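Automated Machine Learning (AutoML) refers to techniques for automatically discovering well-performing models for predictive modeling tasks with very little user involvement.

HyperOpt is an open-source library for large-scale AutoML, and HyperOpt-Sklearn is a wrapper for HyperOpt that supports AutoML for the popular Scikit-Learn machine learning library, including its suite of data preparation transforms and its classification and regression algorithms.

In this tutorial, you will discover how to use HyperOpt for automated machine learning with Scikit-Learn in Python.

After completing this tutorial, you will know:

Hyperopt-Sklearn is an open-source library for AutoML with scikit-learn data preparation and machine learning models.
How to use Hyperopt-Sklearn to automatically discover top-performing models for classification tasks.
How to use Hyperopt-Sklearn to automatically discover top-performing models for regression tasks.

Let's get started.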

HyperOpt for Automated Machine Learning With Scikit-Learn
Photo by Neil Williamson, some rights reserved.

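Tutorial Overview

This tutorial is divided into four parts; they are:

HyperOpt and HyperOpt-Sklearn
How to Install and Use HyperOpt-Sklearn
HyperOpt-Sklearn for Classification
HyperOpt-Sklearn for Regression

HyperOpt and HyperOpt-Sklearn

HyperOpt is an open-source Python library for Bayesian optimization developed by James Bergstra.

It is designed for large-scale optimization of models with hundreds of parameters and allows the optimization procedure to be scaled across multiple cores and multiple machines.

The library was explicitly used to optimize machine learning pipelines, including data preparation, model selection, and model hyperparameters.

Our approach is to expose the underlying expression graph of how a performance metric (e.g. classification accuracy on validation examples) is computed from hyperparameters that govern not only how individual processing steps are applied, but even which processing steps are included.

- Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, 2013.

HyperOpt is challenging to use directly, requiring the optimization procedure and search space to be carefully specified.

An extension to HyperOpt called HyperOpt-Sklearn allows the HyperOpt procedure to be applied to the data preparation and machine learning models provided by the popular Scikit-Learn open-source machine learning library.

HyperOpt-Sklearn wraps the HyperOpt library and allows for the automatic search of data preparation methods, machine learning algorithms, and model hyperparameters for classification and regression tasks.

... we introduce Hyperopt-Sklearn: a project that brings the benefits of automatic algorithm configuration to users of Python and scikit-learn. Hyperopt-Sklearn uses Hyperopt to describe a search space over possible configurations of Scikit-Learn components, including preprocessing and classification modules.

- Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn, 2014.

Now that we are familiar with HyperOpt and HyperOpt-Sklearn, let's look at how to use HyperOpt-Sklearn.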

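How to Install and Use HyperOpt-Sklearn

The first step is to install the HyperOpt library.

This can be achieved using the pip package manager as follows:

sudo pip install hyperopt

Once installed, we can confirm that the installation was successful and check the version of the library by typing the following command:

sudo pip show hyperopt

This will summarize the installed version of HyperOpt, confirming that a modern version is being used.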

Name: hyperopt
Version: 0.2.3
Summary: Distributed Asynchronous Hyperparameter Optimization
Home-page: http://hyperopt.github.com/hyperopt/
Author: James Bergstra
Author-email: james.bergstra@gmail.com
License: BSD
Location: ...
Requires: tqdm, six, networkx, future, scipy, cloudpickle, numpy
Required-by:

Next, we must install the HyperOpt-Sklearn library.

This too can be installed using pip, although we must perform the operation manually by cloning the repository and running the installation from the local files, as follows:

git clone git@github.com:hyperopt/hyperopt-sklearn.git
cd hyperopt-sklearn
sudo pip install .
cd ..

Again, we can confirm that the installation was successful by checking the version number with the following command:

sudo pip show hpsklearn

This will summarize the installed version of HyperOpt-Sklearn, confirming that a modern version is being used.

Name: hpsklearn
Version: 0.0.3
Summary: Hyperparameter Optimization for sklearn
Home-page: http://hyperopt.github.com/hyperopt-sklearn/
Author: James Bergstra
Author-email: anon@anon.com
License: BSD
Location: ...
Requires: nose, scikit-learn, numpy, scipy, hyperopt
Required-by:

Now that the required libraries are installed, we can review the HyperOpt-Sklearn API.

Using HyperOpt-Sklearn is straightforward. The search process is defined by creating and configuring an instance of the HyperoptEstimator class.

The algorithm used for the search can be specified via the "algo" argument, the number of evaluations performed in the search is specified via the "max_evals" argument, and a time limit in seconds can be imposed on evaluating each pipeline via the "trial_timeout" argument.

...
# define search
model = HyperoptEstimator(..., algo=tpe.suggest, max_evals=50, trial_timeout=120)

Many different optimization algorithms are available, including:

Random Search
Tree of Parzen Estimators
Annealing
Tree
Gaussian Process Tree

The "Tree of Parzen Estimators" is a good default, and you can learn more about the types of algorithms in the paper "Algorithms for Hyper-Parameter Optimization."
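To make the idea of a search algorithm concrete, here is a minimal sketch of random search, the simplest option in the list above, written in plain Python. It is not HyperOpt's implementation; the toy objective, the search range, and the helper name `random_search` are assumptions chosen for illustration.

```python
import random

def objective(x):
    # Toy loss to minimize; the minimum is at x = 3.
    return (x - 3.0) ** 2

def random_search(loss, low, high, max_evals, seed=1):
    # Sample candidates uniformly and keep the one with the lowest loss.
    rng = random.Random(seed)
    best_x, best_loss = None, float("inf")
    for _ in range(max_evals):
        x = rng.uniform(low, high)
        value = loss(x)
        if value < best_loss:
            best_x, best_loss = x, value
    return best_x, best_loss

best_x, best_loss = random_search(objective, -10.0, 10.0, max_evals=2000)
print("best x: %.3f, loss: %.5f" % (best_x, best_loss))
```

Random search samples blindly. The Tree of Parzen Estimators instead uses the losses observed so far to decide where to sample next, which is why it tends to need fewer evaluations.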

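For classification tasks, the "classifier" argument specifies the search space of models, and for regression, the "regressor" argument specifies the search space of models. Both can be set to use predefined lists of models provided by the library, e.g. "any_classifier" and "any_regressor".

Similarly, the search space of data preparation is specified via the "preprocessing" argument and can also use a predefined list of preprocessing steps via "any_preprocessing".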

...
# define search
model = HyperoptEstimator(classifier=any_classifier('cla'), preprocessing=any_preprocessing('pre'), ...)

For more on the other arguments to the search, you can review the source code for the class directly:

Arguments to the HyperoptEstimator Class

Once the search is defined, it can be executed by calling the fit() function.

...
# perform the search
model.fit(X_train, y_train)

At the end of the run, the best-performing model can be evaluated on new data by calling the score() function.

...
# summarize performance
acc = model.score(X_test, y_test)
print("Accuracy: %.3f" % acc)

Finally, we can retrieve the pipeline of transforms, models, and model configurations that performed best on the training dataset via the best_model() function.

...
# summarize the best model
print(model.best_model())

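Now that we are familiar with the API, let's look at some worked examples.

HyperOpt-Sklearn for Classification

In this section, we will use HyperOpt-Sklearn to discover a model for the sonar dataset.

The sonar dataset is a standard machine learning dataset comprised of 208 rows of data with 60 numerical input variables and a target variable with two class values, i.e. binary classification.

Using a test harness of repeated stratified 10-fold cross-validation with three repeats, a naive model can achieve an accuracy of about 53 percent. A top-performing model can achieve an accuracy of about 88 percent on this same test harness. This provides the bounds of expected performance on this dataset.

The dataset involves predicting whether sonar returns indicate a rock or a simulated mine.

There is no need to download the dataset; we will download it automatically as part of our worked examples.

The example below downloads the dataset and summarizes its shape.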

# summarize the sonar dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

Running the example downloads the dataset and splits it into input and output elements. As expected, we can see that there are 208 rows of data with 60 input variables.

(208, 60) (208,)
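The naive-model baseline quoted earlier comes from a test harness that can be sketched in a few lines. The example below is an illustration under stated assumptions: it uses a synthetic stand-in dataset from `make_classification` with the same shape as the sonar data (so it runs offline) and a `DummyClassifier` that always predicts the most frequent class. On the real sonar data the same harness should give a figure near the 53 percent mentioned above.

```python
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in with the sonar dataset's shape (208 rows, 60 inputs).
X, y = make_classification(n_samples=208, n_features=60, random_state=1)
# Naive model: always predict the most frequent class.
naive = DummyClassifier(strategy='most_frequent')
# Repeated stratified 10-fold cross-validation with three repeats.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(naive, X, y, scoring='accuracy', cv=cv)
baseline = mean(scores)
print('Baseline accuracy: %.3f' % baseline)
```

Any pipeline found by the search should beat this number comfortably; if it does not, the search budget or the data preparation is worth revisiting.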

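Next, let's use HyperOpt-Sklearn to find a good model for the sonar dataset.

We can perform some basic data preparation, including converting the target strings to class labels, then split the dataset into train and test sets.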

...
# minimally prepare dataset
X = X.astype('float32')
y = LabelEncoder().fit_transform(y.astype('str'))
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
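To see what these two preparation steps do, here is a small sketch on toy data. The six-row arrays are made up for illustration; only the `LabelEncoder` and `train_test_split` calls mirror the snippet above.

```python
from numpy import array
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the sonar target column: 'R' (rock) and 'M' (mine).
y_raw = array(['R', 'M', 'M', 'R', 'M', 'R'], dtype=object)
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_raw.astype('str'))
# Classes are sorted, so 'M' maps to 0 and 'R' maps to 1.
print(encoder.classes_)
print(y_encoded)
# Toy inputs, one row per label, split the same way as in the tutorial.
X_toy = array([[float(i)] for i in range(6)], dtype='float32')
X_train, X_test, y_train, y_test = train_test_split(X_toy, y_encoded, test_size=0.33, random_state=1)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split repeatable, so different runs of the search are compared on the same holdout rows.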

Next, we can define the search procedure. We will explore all classification algorithms and all data transforms available to the library and use the TPE, or Tree of Parzen Estimators, search algorithm, described in "Algorithms for Hyper-Parameter Optimization."

The search will evaluate 50 pipelines and limit each evaluation to 30 seconds.

...
# define search
model = HyperoptEstimator(classifier=any_classifier('cla'), preprocessing=any_preprocessing('pre'), algo=tpe.suggest, max_evals=50, trial_timeout=30)

We then start the search.

...
# perform the search
model.fit(X_train, y_train)

At the end of the run, we will report the performance of the model on the holdout dataset and summarize the best-performing pipeline.

...
# summarize performance
acc = model.score(X_test, y_test)
print("Accuracy: %.3f" % acc)
# summarize the best model
print(model.best_model())

Tying this together, the complete example is listed below.

# example of hyperopt-sklearn for the sonar classification dataset
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from hpsklearn import HyperoptEstimator
from hpsklearn import any_classifier
from hpsklearn import any_preprocessing
from hyperopt import tpe
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# minimally prepare dataset
X = X.astype('float32')
y = LabelEncoder().fit_transform(y.astype('str'))
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# define search
model = HyperoptEstimator(classifier=any_classifier('cla'), preprocessing=any_preprocessing('pre'), algo=tpe.suggest, max_evals=50, trial_timeout=30)
# perform the search
model.fit(X_train, y_train)
# summarize performance
acc = model.score(X_test, y_test)
print("Accuracy: %.3f" % acc)
# summarize the best model
print(model.best_model())

Running the example may take a few minutes.

The progress of the search will be reported and you will see some warnings that you can safely ignore.

At the end of the run, the best-performing model is evaluated on the holdout dataset and the discovered pipeline is printed for later use.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the chosen model achieved an accuracy of about 85.5 percent on the holdout test set. The pipeline involves a gradient boosting model with no pre-processing.

Accuracy: 0.855
{'learner': GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.009132299586303643, loss='deviance',
                           max_depth=None, max_features='sqrt',
                           max_leaf_nodes=None, min_impurity_decrease=0.0,
                           min_impurity_split=None, min_samples_leaf=1,
                           min_samples_split=2, min_weight_fraction_leaf=0.0,
                           n_estimators=342, n_iter_no_change=None,
                           presort='auto', random_state=2,
                           subsample=0.6844206624548879, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False), 'preprocs': (), 'ex_preprocs': ()}

The printed model can then be used directly, e.g. the code can be copy-pasted into another project.
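As a sketch of that reuse, the printed configuration can be rebuilt as a plain scikit-learn model. The example below makes two assumptions: it fits on a synthetic stand-in dataset so it runs offline, and it copies only the tuned hyperparameters, because some of the printed arguments (for example `presort` and `min_impurity_split`) may not be accepted by newer scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with the sonar dataset's shape.
X, y = make_classification(n_samples=208, n_features=60, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# Hyperparameters copied from the printed configuration; arguments that
# newer scikit-learn releases may no longer accept are left out.
model = GradientBoostingClassifier(
    learning_rate=0.009132299586303643,
    max_depth=None,
    max_features='sqrt',
    n_estimators=342,
    random_state=2,
    subsample=0.6844206624548879,
)
model.fit(X_train, y_train)
yhat = model.predict(X_test)
print('Predictions:', yhat.shape)
```

Rebuilding the model this way removes the dependency on HyperOpt-Sklearn in the project that consumes it.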

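Next, let's take a look at using HyperOpt-Sklearn for a regression predictive modeling problem.

HyperOpt-Sklearn for Regression

In this section, we will use HyperOpt-Sklearn to discover a model for the housing dataset.

The housing dataset is a standard machine learning dataset comprised of 506 rows of data with 13 numerical input variables and a numerical target variable.

Using a test harness of repeated 10-fold cross-validation with three repeats, a naive model can achieve a mean absolute error (MAE) of about 6.6. A top-performing model can achieve a MAE of about 1.9 on this same test harness. This provides the bounds of expected performance on this dataset.

The dataset involves predicting the house price given details of the house's suburb in the American city of Boston.

There is no need to download the dataset; we will download it automatically as part of our worked examples.

The example below downloads the dataset and summarizes its shape.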

# summarize the housing dataset
from pandas import read_csv
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

Running the example downloads the dataset and splits it into input and output elements. As expected, we can see that there are 506 rows of data with 13 input variables.

(506, 13) (506,)

Next, we can use HyperOpt-Sklearn to find a good model for the housing dataset.

Using HyperOpt-Sklearn for regression is the same as using it for classification, except that the "regressor" argument must be specified.

In this case, we want to optimize the MAE, so we will set the "loss_fn" argument to the mean_absolute_error() function provided by the scikit-learn library.

...
# define search
model = HyperoptEstimator(regressor=any_regressor('reg'), preprocessing=any_preprocessing('pre'), loss_fn=mean_absolute_error, algo=tpe.suggest, max_evals=50, trial_timeout=30)
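The function passed as "loss_fn" above takes true and predicted target values and returns a single number to minimize. The sketch below shows what mean_absolute_error() computes on made-up values; `mae_by_hand` is a hypothetical helper written only for comparison.

```python
from sklearn.metrics import mean_absolute_error

def mae_by_hand(y_true, y_pred):
    # Mean of the absolute differences between true and predicted values.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
# Both lines print 0.5: (0.5 + 0.5 + 0.0 + 1.0) / 4
print(mae_by_hand(y_true, y_pred))
print(mean_absolute_error(y_true, y_pred))
```

Because MAE is already an error, where lower is better, it can be handed to the search unchanged.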

Tying this together, the complete example is listed below.

# example of hyperopt-sklearn for the housing regression dataset
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from hpsklearn import HyperoptEstimator
from hpsklearn import any_regressor
from hpsklearn import any_preprocessing
from hyperopt import tpe
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
data = data.astype('float32')
X, y = data[:, :-1], data[:, -1]
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# define search
model = HyperoptEstimator(regressor=any_regressor('reg'), preprocessing=any_preprocessing('pre'), loss_fn=mean_absolute_error, algo=tpe.suggest, max_evals=50, trial_timeout=30)
# perform the search
model.fit(X_train, y_train)
# summarize performance
mae = model.score(X_test, y_test)
print("MAE: %.3f" % mae)
# summarize the best model
print(model.best_model())

Running the example may take a few minutes.

The progress of the search will be reported and you will see some warnings that you can safely ignore.

At the end of the run, the best-performing model is evaluated on the holdout dataset and the discovered pipeline is printed for later use.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the chosen model achieved a MAE of about 0.883 on the holdout test set, which appears skillful. The pipeline involves an XGBRegressor model with no pre-processing.

Note: for the search to use XGBoost, you must have the XGBoost library installed.

MAE: 0.883
{'learner': XGBRegressor(base_score=0.5, booster='gbtree',
             colsample_bylevel=0.5843250948679669, colsample_bynode=1,
             colsample_bytree=0.6635160670570662, gamma=6.923399395303031e-05,
             importance_type='gain', learning_rate=0.07021104887683309,
             max_delta_step=0, max_depth=3, min_child_weight=5, missing=nan,
             n_estimators=4000, n_jobs=1, nthread=None, objective='reg:linear',
             random_state=0, reg_alpha=0.5690202874759704,
             reg_lambda=3.3098341637038, scale_pos_weight=1, seed=1,
             silent=None, subsample=0.7194797262656784, verbosity=1), 'preprocs': (), 'ex_preprocs': ()}
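The best MAE quoted earlier for this dataset is about 1.9, so a reported value of 0.883 is worth double-checking. scikit-learn regressors return the coefficient of determination (R^2) from score(), not MAE. If HyperoptEstimator.score() simply delegates to the best learner, which is an assumption to verify against your installed version, the figure above would be R^2. The sketch below uses synthetic data and a LinearRegression stand-in, both assumptions, to show how to compute MAE explicitly from predictions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with the housing dataset's shape (506 rows, 13 inputs).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# Stand-in for the discovered pipeline.
model = LinearRegression().fit(X_train, y_train)
yhat = model.predict(X_test)
# score() on a scikit-learn regressor is the coefficient of determination (R^2).
print('score(): %.3f' % model.score(X_test, y_test))
print('R^2:     %.3f' % r2_score(y_test, yhat))
# MAE is computed from the predictions and is in the units of the target.
print('MAE:     %.3f' % mean_absolute_error(y_test, yhat))
```

With the tutorial's model, the equivalent check would be to call model.predict(X_test) and pass the result to mean_absolute_error() together with y_test.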

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered how to use HyperOpt for automated machine learning with Scikit-Learn in Python.

Specifically, you learned:

Hyperopt-Sklearn is an open-source library for AutoML with scikit-learn data preparation and machine learning models.
How to use Hyperopt-Sklearn to automatically discover top-performing models for classification tasks.
How to use Hyperopt-Sklearn to automatically discover top-performing models for regression tasks.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


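23 Responses to HyperOpt for Automated Machine Learning With Scikit-Learn

Harrison September 11, 2020 at 7:31 pm

Very educational blog, thanks for sharing.

- Jason Brownlee September 12, 2020 at 6:07 am
  
  Thanks!

Zach September 12, 2020 at 8:59 am

I've come across your work often over the years. It's usually the best. A few years ago in grad school, I used your work on building a neural network from scratch as a boilerplate. I was able to add quite a few features, but it was a very clear tutorial that really helped me and my classmates learn the whole process. Anyway, this post is especially meaningful because I'm looking into how to do HyperOpt at work. I decided to post just to let you know how grateful I am, and I've even started a blog! Thank you very much, and please keep up the excellent work!

- Jason Brownlee September 12, 2020 at 1:18 pm
  
  Thanks!
  
  Well done on your progress!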

Anthony The Koala September 17, 2020 at 10:57 pm

Dear Dr Jason,
When I ran your first program, I got a runtime error.
* I pip installed hyperopt
* I could not use the following code because I didn't have access.

git clone git@github.com:hyperopt/hyperopt-sklearn.git
cd hyperopt-sklearn
sudo pip install .
cd ..

* Instead, I installed GitHub for Windows and cloned hyperopt-sklearn with git.
* Went into my C:\Users\A\Documents\GitHub\hyperopt-sklearn and ran
* pip install .
* The installation succeeded. No problems.
* Despite the successful installation, running your first program still produced a runtime error
* This is despite including the following in the code

import os
os.environ['OMP_NUM_THREADS'] = "1"


WARN: OMP_NUM_THREADS=None =>
... If you are using openblas if you are using openblas set OMP_NUM_THREAD
 risk subprocess calls hanging indefinitely
  0%|                                    | 0/1 [00:00
... If you are using openblas if you are using openblas set OMP_NUM_THREAD
 risk subprocess calls hanging indefinitely
Error.  nthreads cannot be larger than environment variable "NUMEXPR_MAX_T
  0%|                                    | 0/1 [00:00<?, ?trial/s, best lo
ob exception:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

  0%|                                    | 0/1 [00:00<?, ?trial/s, best lo
Traceback (most recent call last):
  File "", line 1, in 
  File "c:\python38\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "c:\python38\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "c:\python38\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "c:\python38\lib\multiprocessing\spawn.py", line 287, in _fixup_mai
_path
    main_content = runpy.run_path(main_path,
  File "c:\python38\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\python38\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python38\test3.py", line 25, in 
    model.fit(X_train, y_train)
  File "c:\python38\lib\site-packages\hpsklearn\estimator.py", line 787, i
    fit_iter.send(increment)
  File "c:\python38\lib\site-packages\hpsklearn\estimator.py", line 688, i
iter
    hyperopt.fmin(fn_with_timeout,
  File "c:\python38\lib\site-packages\hyperopt\fmin.py", line 469, in fmin
    return trials.fmin(
  File "c:\python38\lib\site-packages\hyperopt\base.py", line 671, in fmin
    return fmin(
  File "c:\python38\lib\site-packages\hyperopt\fmin.py", line 509, in fmin
    rval.exhaust()
  File "c:\python38\lib\site-packages\hyperopt\fmin.py", line 330, in exha
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "c:\python38\lib\site-packages\hyperopt\fmin.py", line 286, in run
    self.serial_evaluate()
  File "c:\python38\lib\site-packages\hyperopt\fmin.py", line 165, in seri
luate
    result = self.domain.evaluate(spec, ctrl)
  File "c:\python38\lib\site-packages\hyperopt\base.py", line 894, in eval
    rval = self.fn(pyll_rval)
  File "c:\python38\lib\site-packages\hpsklearn\estimator.py", line 645, i
ith_timeout
    th.start()
  File "c:\python38\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "c:\python38\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "c:\python38\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "c:\python38\lib\multiprocessing\popen_spawn_win32.py", line 45, in
t__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "c:\python38\lib\multiprocessing\spawn.py", line 154, in get_prepar
data
    _check_not_importing_main()
  File "c:\python38\lib\multiprocessing\spawn.py", line 134, in _check_not
ting_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I am not sure why the program produces a runtime error.

Note again: the pip and git installations were successful.

Thank you,
Anthony of Sydney

Jason Brownlee September 18, 2020 at 6:47 am

Sorry to hear that, Anthony. I have not worked on Windows.

Perhaps try posting/searching on stackoverflow?
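The traceback above points to a likely cause itself: on Windows, child processes are started by re-importing the main script, so code that launches them needs to sit behind an `if __name__ == '__main__':` guard. Below is a minimal sketch of that script layout. `run_search` is a hypothetical placeholder for the tutorial code, and the layout has not been tested against hpsklearn on Windows here.

```python
from multiprocessing import freeze_support

def run_search():
    # Placeholder for the tutorial's steps: load the data, define the
    # HyperoptEstimator, call fit(), then report the score and best model.
    return 'search finished'

if __name__ == '__main__':
    # Only the parent process runs this block; child processes that
    # re-import the script skip it, which avoids the bootstrapping error.
    freeze_support()
    print(run_search())
```

Moving the tutorial's load, fit, and score lines into the body of `run_search` keeps them from running again each time a worker process imports the file.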

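Anthony The Koala September 17, 2020 at 11:25 pm

Dear Dr Jason,
I typed the same program line by line in the Python IDE IDLE, and there was no runtime error!!
I don't know why there was no runtime error. On the previous attempt, I simply copied the code from this page and ran it as a script.
This time I typed the code in line by line.
Here is a sample of the output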

..........
from hpsklearn import HyperoptEstimator, any_classifier
WARN: OMP_NUM_THREADS=None =>
... If you are using openblas if you are using openblas set OMP_NUM_THREADS=1 or risk subprocess calls hanging indefinitely
model.fit(X_train,y_train)

  0%|          | 0/1 [00:00<?, ?trial/s, best loss=?]
100%|██████████| 1/1 [00:07<00:00,  7.45s/trial, best loss: 0.2142857142857143]
100%|██████████| 1/1 [00:07<00:00,  7.46s/trial, best loss: 0.2142857142857143]

 50%|█████     | 1/2 [00:00<?, ?trial/s, best loss=?]
100%|██████████| 2/2 [00:07<00:00,  7.59s/trial, best loss: 0.1785714285714286]
100%|██████████| 2/2 [00:07<00:00,  7.65s/trial, best loss: 0.1785714285714286]

 67%|██████▋   | 2/3 [00:00<?, ?trial/s, best loss=?]
100%|██████████| 3/3 [00:07<00:00,  7.44s/trial, best loss: 0.1428571428571429]
100%|██████████| 3/3 [00:07<00:00,  7.47s/trial, best loss: 0.1428571428571429]
......................
.....................
 98%|█████████▊| 49/50 [00:00<?, ?trial/s, best loss=?]
100%|██████████| 50/50 [00:07<00:00,  7.77s/trial, best loss: 0.0714285714285714]
100%|██████████| 50/50 [00:07
>>> acc=model.score(X_test, y_test)
>>> acc
0.8405797101449275
>>> print(model.best_model())
{'learner': ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=None,
                     max_features=0.4474405338213263, max_leaf_nodes=None,
                     max_samples=None, min_impurity_decrease=0.0,
                     min_impurity_split=None, min_samples_leaf=2,
                     min_samples_split=2, min_weight_fraction_leaf=0.0,
                     n_estimators=388, n_jobs=1, oob_score=False,
                     random_state=4, verbose=False, warm_start=False), 'preprocs': (Normalizer(copy=True, norm='l1'),), 'ex_preprocs': ()}

Strange that it worked.
But......
Why did you get a GradientBoostingClassifier with accuracy = 0.855, while I got an ExtraTreesClassifier with accuracy = 0.84?

And I still don't know why I got a runtime error on my first attempt but not on this one.

Thank you,
Anthony of Sydney

Jason Brownlee September 18, 2020 at 6:48 am

Well done!

Anthony The Koala September 18, 2020 at 12:14 pm

Dear Dr Jason,
I extended the last few lines of the program:

# summarize performance
mae = model.score(X_test, y_test)
print("MAE: %.3f" % mae)
# summarize the best model
print(model.best_model())
#Now find the predicted values of the test_x
predict_test = model.predict(X_test)
#Now find the correlation between predict_test and y_test
from scipy.stats import pearsonr
corr, p_value = pearsonr(predict_test,y_test)
corr, p_value
(0.9334779887384739, 2.1861642011348944e-75)
#Notice that R^2 = corr^2 = mae
mae
0.8710212877137651
mae**0.5
0.9332852124156715
#Now plot the data
import matplotlib.pyplot as plt
plt.scatter(y_test,predict_test)
plt.show()
#Note the strong correlation between the predicted y and test y

Conclusion
The correlation between the predicted y and the test y is 0.933, with a p-value << the significance level, where the significance level = 0.05.
There is a strong linear relationship between the predicted y and the test y, with mae = R^2 = 0.871

Thank you,
Anthony of Sydney

Jason Brownlee September 18, 2020 at 2:49 pm

I'd expect so. Nice test!

Cecile S October 21, 2020 at 8:58 pm

Dear Jason,

Thank you for this very useful post! I am testing hyperopt sklearn for a classification problem, and I would like to optimize balanced accuracy. How should I define loss_fn in that case? Or more generally, how do I define loss_fn given a specific sklearn metric score (balanced accuracy, f1, recall, etc.)?

Thanks in advance!

- Jason Brownlee October 22, 2020 at 6:42 am
  
  You specify the name of the function to minimize to the loss_fn argument.
  
  You can use the complement or inverse of a score that is maximized, e.g. 1 / score
  
  More details here:
  https://github.com/hyperopt/hyperopt-sklearn/blob/master/hpsklearn/estimator.py#L429
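The complement approach from the reply above could look like the following sketch. `balanced_accuracy_loss` is a hypothetical helper, and it assumes that loss_fn is called with the true and predicted labels, in that order, and should return a number to minimize.

```python
from sklearn.metrics import balanced_accuracy_score

def balanced_accuracy_loss(y_true, y_pred):
    # Turn a score to maximize (higher is better) into a loss to minimize.
    return 1.0 - balanced_accuracy_score(y_true, y_pred)

y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
# Recall is 2/3 for class 0 and 1 for class 1, so the loss is 1 - 5/6.
print(balanced_accuracy_loss(y_true, y_pred))
```

Under that assumption, the function would be passed as loss_fn=balanced_accuracy_loss when constructing the HyperoptEstimator. The same pattern applies to f1_score or recall_score.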
  
  回复
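Michael December 4, 2020 at 10:38 am #

Great post. I have been using the base hyperopt library for various parts of my machine learning workflow (data processing, hyperparameter selection) for about a year, but did not realize it could also be used for model selection through sklearn. I am going to resurrect some old projects and see how they fare with it. Thanks!

Reply
- Jason Brownlee December 4, 2020 at 1:21 pm #
  
  Thanks!
  
  You're welcome, good luck.
  
  Reply
Shrey Jain July 9, 2021 at 3:34 pm #

Hi Jason, nice blog! One question, though. You used Hyperopt to select the best model. Can we use Hyperopt for hyperparameter tuning and for selecting the best model at the same time? That would greatly reduce the lines of code and give better overall performance.

Reply
- Jason Brownlee July 10, 2021 at 6:06 am #
  
  Yes, I believe so.
  
  Reply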
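Marco Cerliani December 28, 2021 at 1:33 am #

I suggest shap-hypetune for industrializing parameter tuning (and feature selection) with xgboost and hyperopt (https://github.com/cerlymarco/shap-hypetune)

Reply
- James Carmichael January 10, 2022 at 11:21 am #
  
  Thank you for the feedback Marco!
  
  Reply
Sam April 20, 2022 at 11:57 pm #

Hi Jason,

I am looking for the "best" hyperparameters for my SARIMA statistical model: p, d, q, P, D, Q and s.

I started with a "brute force" grid search, using nested for loops to try a range of parameters, recording the mean absolute percentage error of each model, and finally printing the model with the smallest error. However, given the ranges I set for the hyperparameters, this takes a very long time.

So I would like to use an optimization algorithm such as Hyperopt.

My questions are:
- Can this library be used for SARIMA hyperparameter optimization?
- Have you done this before, or do you have a blog post on it?
- If Hyperopt cannot be used for this purpose, would you recommend another library?

Thanks for your efforts!

Reply
- James Carmichael April 21, 2022 at 8:56 am #
  
  Hi Sam... Hyperopt can be used for this purpose
  
  https://medium.com/district-data-labs/parameter-tuning-with-hyperopt-faa86acdfdce
  
  Reply
Ari January 6, 2023 at 5:50 am #

Dear Jason, in a Conda environment (based on Python 3.9 and TensorFlow 2.11), I get a "base_estimator" error when I try the classification example above. I am a bit puzzled and would appreciate your thoughts. Thank you.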

100%|██████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.48s/trial, best loss: 0.3928571428571429]
100%|██████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.50s/trial, best loss: 0.3214285714285714]
100%|██████████████████████████████████████████████████| 3/3 [00:01<00:00, 1.65s/trial, best loss: 0.2142857142857143]
100%|██████████████████████████████████████████████████| 4/4 [00:01<00:00, 1.41s/trial, best loss: 0.2142857142857143]
100%|██████████████████████████████████████████████████| 5/5 [00:01<00:00, 1.52s/trial, best loss: 0.2142857142857143]
83%|███████████████████████████████████████████████████████████████▎ | 5/6 [00:00<?, ?trial/s, best loss=?]
job exception: The 'base_estimator' parameter of AdaBoostClassifier must be an object implementing 'fit' and 'predict' or a str among {'deprecated'}. Got None instead.

83%|███████████████████████████████████████████████████████████████▎ | 5/6 [00:01 24 model.fit(X_train, y_train)
25 # summarize performance
26 acc = model.score(X_test, y_test)

File ~\hyperopt-sklearn\hpsklearn\estimator\estimator.py:464, in hyperopt_estimator.fit(self, X, y, EX_list, valid_size, n_folds, cv_shuffle, warm_start, random_state)
461 try:
462 increment = min(self.fit_increment,
463 adjusted_max_evals - len(self.trials.trials))
--> 464 fit_iter.send(increment)
466 if self.fit_increment_dump_filename is not None:
467 with open(self.fit_increment_dump_filename, "wb") as dump_file:

File ~\hyperopt-sklearn\hpsklearn\estimator\estimator.py:339, in hyperopt_estimator.fit_iter(self, X, y, EX_list, valid_size, n_folds, cv_shuffle, warm_start, random_state)
337 # Workaround for rstate issue #35
338 if "rstate" in inspect.getfullargspec(hyperopt.fmin).args:
--> 339 hyperopt.fmin(_fn_with_timeout,
340 space=self.space,
341 algo=self.algo,
342 trials=self.trials,
343 max_evals=len(self.trials.trials) + increment,
344 # -- let exceptions crash the program, so we notice them.
345 catch_eval_exceptions=False,
346 return_argmin=False) # -- in case no success so far
347 else:
348 if self.seed is None:

File ~\.conda\envs\py39tf211\lib\site-packages\hyperopt\fmin.py:540, in fmin(fn, space, algo, max_evals, timeout, loss_threshold, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin, points_to_evaluate, max_queue_len, show_progressbar, early_stop_fn, trials_save_file)
537 fn = __objective_fmin_wrapper(fn)
539 if allow_trials_fmin and hasattr(trials, "fmin"):
--> 540 return trials.fmin(
541 fn,
542 space,
543 algo=algo,
544 max_evals=max_evals,
545 timeout=timeout,
546 loss_threshold=loss_threshold,
547 max_queue_len=max_queue_len,
548 rstate=rstate,
549 pass_expr_memo_ctrl=pass_expr_memo_ctrl,
550 verbose=verbose,
551 catch_eval_exceptions=catch_eval_exceptions,
552 return_argmin=return_argmin,
553 show_progressbar=show_progressbar,
554 early_stop_fn=early_stop_fn,
555 trials_save_file=trials_save_file,
556 )
558 if trials is None:
559 if os.path.exists(trials_save_file):

File ~\.conda\envs\py39tf211\lib\site-packages\hyperopt\base.py:671, in Trials.fmin(self, fn, space, algo, max_evals, timeout, loss_threshold, max_queue_len, rstate, verbose, pass_expr_memo_ctrl, catch_eval_exceptions, return_argmin, show_progressbar, early_stop_fn, trials_save_file)
666 # -- Stop-gap implementation!
667 # fmin should have been a Trials method in the first place
668 # but for now it's still sitting in another file.
669 from .fmin import fmin
--> 671 return fmin(
672 fn,
673 space,
674 algo=algo,
675 max_evals=max_evals,
676 timeout=timeout,
677 loss_threshold=loss_threshold,
678 trials=self,
679 rstate=rstate,
680 verbose=verbose,
681 max_queue_len=max_queue_len,
682 allow_trials_fmin=False, # -- prevent recursion
683 pass_expr_memo_ctrl=pass_expr_memo_ctrl,
684 catch_eval_exceptions=catch_eval_exceptions,
685 return_argmin=return_argmin,
686 show_progressbar=show_progressbar,
687 early_stop_fn=early_stop_fn,
688 trials_save_file=trials_save_file,
689 )

File ~\.conda\envs\py39tf211\lib\site-packages\hyperopt\fmin.py:586, in fmin(fn, space, algo, max_evals, timeout, loss_threshold, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin, points_to_evaluate, max_queue_len, show_progressbar, early_stop_fn, trials_save_file)
583 rval.catch_eval_exceptions = catch_eval_exceptions
585 # next line is where the fmin is actually executed
--> 586 rval.exhaust()
588 if return_argmin:
589 if len(trials.trials) == 0:

File ~\.conda\envs\py39tf211\lib\site-packages\hyperopt\fmin.py:364, in FMinIter.exhaust(self)
362 def exhaust(self):
363 n_done = len(self.trials)
--> 364 self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
365 self.trials.refresh()
366 return self

File ~\.conda\envs\py39tf211\lib\site-packages\hyperopt\fmin.py:300, in FMinIter.run(self, N, block_until_done)
297 time.sleep(self.poll_interval_secs)
298 else:
299 # -- loop over trials and do the jobs directly
--> 300 self.serial_evaluate()
302 self.trials.refresh()
303 if self.trials_save_file != "":

File ~\.conda\envs\py39tf211\lib\site-packages\hyperopt\fmin.py:178, in FMinIter.serial_evaluate(self, N)
176 ctrl = base.Ctrl(self.trials, current_trial=trial)
177 try:
--> 178 result = self.domain.evaluate(spec, ctrl)
179 except Exception as e:
180 logger.error("job exception: %s" % str(e))

File ~\.conda\envs\py39tf211\lib\site-packages\hyperopt\base.py:892, in Domain.evaluate(self, config, ctrl, attach_attachments)
883 else:
884 # -- the "work" of evaluating config can be written
885 # either into the pyll part (self.expr)
886 # or the normal Python part (self.fn)
887 pyll_rval = pyll.rec_eval(
888 self.expr,
889 memo=memo,
890 print_node_on_error=self.rec_eval_print_node_on_error,
891 )
--> 892 rval = self.fn(pyll_rval)
894 if isinstance(rval, (float, int, np.number)):
895 dict_rval = {"loss": float(rval), "status": STATUS_OK}

File ~\hyperopt-sklearn\hpsklearn\estimator\estimator.py:311, in hyperopt_estimator.fit_iter.<locals>._fn_with_timeout(*args, **kwargs)
309 assert fn_rval[0] in ("raise", "return")
310 if fn_rval[0] == "raise":
--> 311 raise fn_rval[1]
313 # -- remove potentially large objects from the rval
314 # so that the Trials() object below stays small
315 # We can recompute them if necessary, and it's usually
316 # not necessary at all.
317 if fn_rval[1]["status"] == hyperopt.STATUS_OK:

InvalidParameterError: The 'base_estimator' parameter of AdaBoostClassifier must be an object implementing 'fit' and 'predict' or a str among {'deprecated'}. Got None instead.

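Reply
- James Carmichael January 6, 2023 at 8:10 am #
  
  Hi Ari... Please narrow your query down to a single question so that we can better assist you.
  
  Reply
  - Ari January 6, 2023 at 3:01 pm #
    
    Hi James, sorry for the confusion. There is really only one question: I am getting the "base_estimator" error I posted. Thank you.
    
    Reply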
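
The error message in this thread lists 'deprecated' as the only accepted string for `base_estimator`. That is consistent with scikit-learn 1.2, which renamed the AdaBoost `base_estimator` parameter to `estimator`; a Hyperopt-Sklearn checkout that still passes `base_estimator=None` would then fail parameter validation. This is an inference from the traceback, not a confirmed diagnosis. The sketch below only checks which side of that change the installed scikit-learn falls on; the helper names `parse_major_minor` and `base_estimator_is_deprecated` are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_major_minor(version_string):
    """Return (major, minor) as integers from strings like '1.2.0' or '1.3rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def base_estimator_is_deprecated(version_string):
    """True when the given scikit-learn version is 1.2 or newer."""
    return parse_major_minor(version_string) >= (1, 2)


try:
    installed = version("scikit-learn")
    print("scikit-learn", installed,
          "| base_estimator deprecated:", base_estimator_is_deprecated(installed))
except PackageNotFoundError:
    print("scikit-learn is not installed in this environment")
```

If the check reports a version of 1.2 or newer, two things that might help are updating Hyperopt-Sklearn to a more recent commit or installing a scikit-learn release older than 1.2 in the same environment.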

