Tune Hyperparameters for Classification Machine Learning Algorithms

By Jason Brownlee on August 28, 2020 in Python Machine Learning

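Machine learning algorithms have hyperparameters that allow you to tailor the behavior of the algorithm to your specific dataset.

Hyperparameters are different from parameters, which are the internal coefficients or weights for a model found by the learning algorithm. Unlike parameters, hyperparameters are specified by the practitioner when configuring the model.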
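
To make the distinction concrete, here is a minimal sketch, assuming scikit-learn is installed. The dataset sizes and the value of C are illustrative assumptions, not recommendations. It sets one hyperparameter by hand, then reads back the parameters that the learning algorithm found.

```python
# hyperparameters are chosen by the practitioner; parameters are learned from data
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# small synthetic dataset (sizes are illustrative assumptions)
X, y = make_blobs(n_samples=100, centers=2, n_features=5, random_state=1)

# C is a hyperparameter: it is fixed before the model is fit
model = LogisticRegression(C=0.1)
hyperparameters = model.get_params()
print("Hyperparameter C:", hyperparameters["C"])

# coef_ and intercept_ are parameters: they only exist after fitting
model.fit(X, y)
print("Shape of learned coefficients:", model.coef_.shape)
print("Shape of learned intercept:", model.intercept_.shape)
```

Calling get_params() on any scikit-learn estimator is a quick way to list the hyperparameters that are available for tuning.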

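Typically, it is hard to know what values to use for the hyperparameters of a given algorithm on a given dataset, so it is common to use random search or grid search strategies to try different hyperparameter values.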
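
The examples later in this tutorial all use grid search. For comparison, a random search might look like the sketch below. It assumes scikit-learn and SciPy are installed; the dataset sizes, the number of iterations, and the bounds on C are illustrative assumptions.

```python
# random search over C for logistic regression (illustrative sketch)
from scipy.stats import loguniform
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# small synthetic dataset (sizes are assumptions chosen to run quickly)
X, y = make_blobs(n_samples=200, centers=2, n_features=10, cluster_std=5, random_state=1)

model = LogisticRegression(max_iter=1000)
# sample C from a log-uniform distribution instead of a fixed list of values
space = dict(C=loguniform(1e-2, 1e2))
search = RandomizedSearchCV(model, space, n_iter=10, cv=5, scoring='accuracy', random_state=1)
result = search.fit(X, y)
print("Best: %f using %s" % (result.best_score_, result.best_params_))
```

Random search evaluates a fixed number of sampled configurations (n_iter), so its cost does not grow with the number of hyperparameters in the way a full grid does.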

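The more hyperparameters of an algorithm you need to tune, the slower the tuning process. Therefore, it is desirable to select a minimum subset of model hyperparameters to search or tune.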
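
As a rough illustration of how quickly the cost grows, the sketch below counts the model fits required by a full grid search. The grid values and the cross-validation settings are taken from the gradient boosting example later in this tutorial; the variable names are my own.

```python
# count how many model fits a full grid search requires
from itertools import product

# four hyperparameters with three candidate values each
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "n_estimators": [10, 100, 1000],
    "subsample": [0.5, 0.7, 1.0],
    "max_depth": [3, 7, 9],
}
# repeated stratified k-fold settings used throughout this tutorial
n_splits, n_repeats = 10, 3

# every combination of values is evaluated on every fold of every repeat
combinations = list(product(*grid.values()))
total_fits = len(combinations) * n_splits * n_repeats
print("Combinations: %d" % len(combinations))  # prints: Combinations: 81
print("Total fits: %d" % total_fits)  # prints: Total fits: 2430
```

Adding a fifth hyperparameter with three values would triple the total again, which is why it pays to tune only the hyperparameters that matter most.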

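Not all model hyperparameters are equally important. Some hyperparameters have an outsized effect on the behavior, and in turn the performance, of a machine learning algorithm.

As a machine learning practitioner, you must know which hyperparameters to focus on to get a good result quickly.

In this tutorial, you will discover the hyperparameters that are most important for some of the top machine learning algorithms.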

Let's get started.

**Update Jan/2020**: Updated for changes in scikit-learn v0.22 API.

Figure: Hyperparameters for Classification Machine Learning Algorithms. Photo by shuttermonkey, some rights reserved.

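Classification Algorithms Overview

We will take a closer look at the important hyperparameters of the top machine learning algorithms that you may use for classification.

We will look at the hyperparameters you need to focus on and suggested values to try when tuning the model on your dataset.

The suggestions are based on advice from textbooks on the algorithms, practical advice from practitioners, and a little of my own experience.

The seven classification algorithms we will look at are as follows:

Logistic Regression
Ridge Classifier
K-Nearest Neighbors (KNN)
Support Vector Machine (SVM)
Bagged Decision Trees (Bagging)
Random Forest
Stochastic Gradient Boosting

We will consider these algorithms in the context of their scikit-learn implementation (Python); nevertheless, you can use the same hyperparameter suggestions with other platforms, such as Weka and R.

A simple grid search example is also given for each algorithm that you can use as a starting point for your own classification predictive modeling project.

Note: if you have had success with different hyperparameter values, or even different hyperparameters than those suggested in this tutorial, let me know in the comments below. I would love to hear about it.

Let's dive in.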

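Logistic Regression

Logistic regression does not really have any critical hyperparameters to tune.

Sometimes, you can see useful differences in performance or convergence with different solvers (solver).

solver in ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']

Regularization (penalty) can sometimes be helpful.

penalty in ['none', 'l1', 'l2', 'elasticnet']

Note: not all solvers support all regularization terms.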
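
One way to search only compatible pairings is to pass a list of dictionaries as the grid, with one dictionary per group of solvers. The sketch below only enumerates the combinations and does not fit any models. The solver and penalty pairings reflect the scikit-learn documentation as I understand it, so please check them against your installed version.

```python
# enumerate only solver/penalty pairs that are expected to be compatible
from sklearn.model_selection import ParameterGrid

c_values = [100, 10, 1.0, 0.1, 0.01]
param_grid = [
    # these solvers support the l2 penalty
    dict(solver=['newton-cg', 'lbfgs', 'sag'], penalty=['l2'], C=c_values),
    # liblinear supports both l1 and l2
    dict(solver=['liblinear'], penalty=['l1', 'l2'], C=c_values),
    # saga also supports l1 (elasticnet is left out because it needs l1_ratio too)
    dict(solver=['saga'], penalty=['l1', 'l2'], C=c_values),
]
# ParameterGrid expands each dictionary separately, then concatenates the results
combos = list(ParameterGrid(param_grid))
print("Combinations to evaluate: %d" % len(combos))  # prints: Combinations to evaluate: 35
```

The same list of dictionaries can be passed as param_grid to GridSearchCV, which avoids spending fits on combinations that would only raise errors.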

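The C parameter controls the penalty strength, which can also be effective. In scikit-learn, C is the inverse of the regularization strength, so smaller values mean stronger regularization.

C in [100, 10, 1.0, 0.1, 0.01]

For the full list of hyperparameters, see:

sklearn.linear_model.LogisticRegression API.

The example below demonstrates grid searching the key hyperparameters for LogisticRegression on a synthetic binary classification dataset.

Some combinations were omitted to cut back on the warnings/errors.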

# example of grid searching key hyperparameters for logistic regression
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = LogisticRegression()
solvers = ['newton-cg', 'lbfgs', 'liblinear']
penalty = ['l2']
c_values = [100, 10, 1.0, 0.1, 0.01]
# define grid search
grid = dict(solver=solvers,penalty=penalty,C=c_values)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

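Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prints the best result as well as the results from all combinations evaluated.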

Best: 0.945333 using {'C': 0.01, 'penalty': 'l2', 'solver': 'liblinear'}
0.936333 (0.016829) with: {'C': 100, 'penalty': 'l2', 'solver': 'newton-cg'}
0.937667 (0.017259) with: {'C': 100, 'penalty': 'l2', 'solver': 'lbfgs'}
0.938667 (0.015861) with: {'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}
0.936333 (0.017413) with: {'C': 10, 'penalty': 'l2', 'solver': 'newton-cg'}
0.938333 (0.017904) with: {'C': 10, 'penalty': 'l2', 'solver': 'lbfgs'}
0.939000 (0.016401) with: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}
0.937333 (0.017114) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'newton-cg'}
0.939000 (0.017195) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'lbfgs'}
0.939000 (0.015780) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'liblinear'}
0.940000 (0.015706) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'newton-cg'}
0.940333 (0.014941) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'lbfgs'}
0.941000 (0.017000) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}
0.943000 (0.016763) with: {'C': 0.01, 'penalty': 'l2', 'solver': 'newton-cg'}
0.943000 (0.016763) with: {'C': 0.01, 'penalty': 'l2', 'solver': 'lbfgs'}
0.945333 (0.017651) with: {'C': 0.01, 'penalty': 'l2', 'solver': 'liblinear'}
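
When reading a table like this, it can help to compare the spread between configurations with the spread across folds. The sketch below does that using the (mean, standard deviation) pairs copied from the output above. It is a rough sanity check under my own framing, not a statistical test.

```python
# (mean accuracy, standard deviation) pairs copied from the output above
results = [
    (0.936333, 0.016829), (0.937667, 0.017259), (0.938667, 0.015861),
    (0.936333, 0.017413), (0.938333, 0.017904), (0.939000, 0.016401),
    (0.937333, 0.017114), (0.939000, 0.017195), (0.939000, 0.015780),
    (0.940000, 0.015706), (0.940333, 0.014941), (0.941000, 0.017000),
    (0.943000, 0.016763), (0.943000, 0.016763), (0.945333, 0.017651),
]
means = [mean for mean, _ in results]
stds = [std for _, std in results]

# difference between the best and worst configuration
gap = max(means) - min(means)
# smallest fold-to-fold standard deviation of any configuration
smallest_std = min(stds)
print("Gap between best and worst: %.6f" % gap)
print("Smallest standard deviation: %.6f" % smallest_std)
print("Gap is within one standard deviation: %s" % (gap < smallest_std))
```

Here the gap between the best and worst configuration is smaller than the variation across folds, which is consistent with the earlier remark that logistic regression has no critical hyperparameters to tune.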

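Ridge Classifier

Ridge regression is a penalized linear regression model for predicting a numerical value.

Nevertheless, it can be very effective when applied to classification.

Perhaps the most important parameter to tune is the regularization strength (alpha). A good starting point might be values in the range [0.1 to 1.0].

alpha in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

For the full list of hyperparameters, see:

sklearn.linear_model.RidgeClassifier API.

The example below demonstrates grid searching the key hyperparameters for RidgeClassifier on a synthetic binary classification dataset.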

# example of grid searching key hyperparameters for ridge classifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import RidgeClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = RidgeClassifier()
alpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
# define grid search
grid = dict(alpha=alpha)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

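Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prints the best result as well as the results from all combinations evaluated.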

Best: 0.974667 using {'alpha': 0.1}
0.974667 (0.014545) with: {'alpha': 0.1}
0.974667 (0.014545) with: {'alpha': 0.2}
0.974667 (0.014545) with: {'alpha': 0.3}
0.974667 (0.014545) with: {'alpha': 0.4}
0.974667 (0.014545) with: {'alpha': 0.5}
0.974667 (0.014545) with: {'alpha': 0.6}
0.974667 (0.014545) with: {'alpha': 0.7}
0.974667 (0.014545) with: {'alpha': 0.8}
0.974667 (0.014545) with: {'alpha': 0.9}
0.974667 (0.014545) with: {'alpha': 1.0}
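
In the run above, every alpha between 0.1 and 1.0 produced the same score, which suggests the grid may be too narrow to show any effect on this dataset. One option, offered as an assumption on my part rather than as a recommendation from the results above, is to search alpha on a log scale over several orders of magnitude. The dataset sizes and bounds below are illustrative and chosen to run quickly.

```python
# grid search alpha for the ridge classifier on a log scale (illustrative sketch)
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold

# small synthetic dataset (sizes are assumptions chosen to run quickly)
X, y = make_blobs(n_samples=200, centers=2, n_features=20, cluster_std=20, random_state=1)

# seven values spaced evenly on a log scale from 0.001 to 1000
alpha = np.logspace(-3, 3, 7).tolist()
grid = dict(alpha=alpha)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)
grid_search = GridSearchCV(estimator=RidgeClassifier(), param_grid=grid, cv=cv, scoring='accuracy')
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for mean, param in zip(grid_result.cv_results_['mean_test_score'], grid_result.cv_results_['params']):
    print("%f with: %r" % (mean, param))
```

If the scores are still flat across this wider range, that is a reasonable sign that alpha is not worth tuning further for the dataset at hand.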

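K-Nearest Neighbors (KNN)

The most important hyperparameter for KNN is the number of neighbors (n_neighbors).

Test values between at least 1 and 21, perhaps just the odd numbers.

n_neighbors in [1 to 21]

It may also be interesting to test different distance metrics (metric) for choosing the composition of the neighborhood. Note that with the default p=2, 'minkowski' is equivalent to 'euclidean'.

metric in ['euclidean', 'manhattan', 'minkowski']

For a fuller list, see:

sklearn.neighbors.DistanceMetric API

It may also be interesting to test the contribution of members of the neighborhood via different weightings (weights).

weights in ['uniform', 'distance']

For the full list of hyperparameters, see:

sklearn.neighbors.KNeighborsClassifier API.

The example below demonstrates grid searching the key hyperparameters for KNeighborsClassifier on a synthetic binary classification dataset.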

# example of grid searching key hyperparameters for KNeighborsClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = KNeighborsClassifier()
n_neighbors = range(1, 21, 2)
weights = ['uniform', 'distance']
metric = ['euclidean', 'manhattan', 'minkowski']
# define grid search
grid = dict(n_neighbors=n_neighbors,weights=weights,metric=metric)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

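Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prints the best result as well as the results from all combinations evaluated.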

Best: 0.937667 using {'metric': 'manhattan', 'n_neighbors': 13, 'weights': 'uniform'}
0.833667 (0.031674) with: {'metric': 'euclidean', 'n_neighbors': 1, 'weights': 'uniform'}
0.833667 (0.031674) with: {'metric': 'euclidean', 'n_neighbors': 1, 'weights': 'distance'}
0.895333 (0.030081) with: {'metric': 'euclidean', 'n_neighbors': 3, 'weights': 'uniform'}
0.895333 (0.030081) with: {'metric': 'euclidean', 'n_neighbors': 3, 'weights': 'distance'}
0.909000 (0.021810) with: {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.909000 (0.021810) with: {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.925333 (0.020774) with: {'metric': 'euclidean', 'n_neighbors': 7, 'weights': 'uniform'}
0.925333 (0.020774) with: {'metric': 'euclidean', 'n_neighbors': 7, 'weights': 'distance'}
0.929000 (0.027368) with: {'metric': 'euclidean', 'n_neighbors': 9, 'weights': 'uniform'}
0.929000 (0.027368) with: {'metric': 'euclidean', 'n_neighbors': 9, 'weights': 'distance'}
...
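
One detail worth knowing about the grid in the example above: Python's range() excludes its stop value, so range(1, 21, 2) ends at 19 rather than 21. The short sketch below shows the values that were actually searched, along with a variant (the second variable name is my own) that includes 21.

```python
# the odd values searched by the example above
n_neighbors = list(range(1, 21, 2))
print(n_neighbors)  # prints: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

# raise the stop value by one to include 21
n_neighbors_inclusive = list(range(1, 22, 2))
print(n_neighbors_inclusive[-1])  # prints: 21
```

Odd values are a common choice for binary classification because they avoid tied votes between the two classes when uniform weights are used.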

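Support Vector Machine (SVM)

The SVM algorithm, like gradient boosting, is very popular, very effective, and provides a large number of hyperparameters to tune.

Perhaps the first important parameter is the choice of kernel, which controls the manner in which the input variables are projected. There are many to choose from, but linear, polynomial, and RBF are the most common, and in practice perhaps just linear and RBF.

kernel in ['linear', 'poly', 'rbf', 'sigmoid']

If the polynomial kernel works out, then it is a good idea to dive into the degree hyperparameter (degree).

Another critical parameter is the penalty (C), which can take on a range of values and has a dramatic effect on the shape of the resulting regions for each class. A log scale might be a good starting point.

C in [100, 10, 1.0, 0.1, 0.001]

For the full list of hyperparameters, see:

sklearn.svm.SVC API.

The example below demonstrates grid searching the key hyperparameters for SVC on a synthetic binary classification dataset.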

# example of grid searching key hyperparameters for SVC
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define model and parameters
model = SVC()
kernel = ['poly', 'rbf', 'sigmoid']
C = [50, 10, 1.0, 0.1, 0.01]
gamma = ['scale']
# define grid search
grid = dict(kernel=kernel,C=C,gamma=gamma)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

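Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prints the best result as well as the results from all combinations evaluated.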

Best: 0.974333 using {'C': 1.0, 'gamma': 'scale', 'kernel': 'poly'}
0.973667 (0.012512) with: {'C': 50, 'gamma': 'scale', 'kernel': 'poly'}
0.970667 (0.018062) with: {'C': 50, 'gamma': 'scale', 'kernel': 'rbf'}
0.945333 (0.024594) with: {'C': 50, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.973667 (0.012512) with: {'C': 10, 'gamma': 'scale', 'kernel': 'poly'}
0.970667 (0.018062) with: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
0.957000 (0.016763) with: {'C': 10, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.974333 (0.012565) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'poly'}
0.971667 (0.016948) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'rbf'}
0.966333 (0.016224) with: {'C': 1.0, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.972333 (0.013585) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'poly'}
0.974000 (0.013317) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'rbf'}
0.971667 (0.015934) with: {'C': 0.1, 'gamma': 'scale', 'kernel': 'sigmoid'}
0.972333 (0.013585) with: {'C': 0.01, 'gamma': 'scale', 'kernel': 'poly'}
0.973667 (0.014716) with: {'C': 0.01, 'gamma': 'scale', 'kernel': 'rbf'}
0.974333 (0.013828) with: {'C': 0.01, 'gamma': 'scale', 'kernel': 'sigmoid'}
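
Since the polynomial kernel did well in the run above, the earlier suggestion to explore degree applies. The sketch below is one hedged way to do it: a list of dictionaries keeps degree paired only with the 'poly' kernel, because other kernels ignore it. The degree values, dataset sizes, and cross-validation settings are illustrative assumptions.

```python
# grid search the degree of the polynomial kernel alongside an rbf baseline
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# small synthetic dataset (sizes are assumptions chosen to run quickly)
X, y = make_blobs(n_samples=200, centers=2, n_features=10, cluster_std=5, random_state=1)

param_grid = [
    # degree is only used by the polynomial kernel
    dict(kernel=['poly'], degree=[2, 3, 4], C=[10, 1.0, 0.1], gamma=['scale']),
    # rbf baseline without a degree entry, so no fits are wasted
    dict(kernel=['rbf'], C=[10, 1.0, 0.1], gamma=['scale']),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
grid_search = GridSearchCV(estimator=SVC(), param_grid=param_grid, cv=cv, scoring='accuracy')
grid_result = grid_search.fit(X, y)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
print("Candidates evaluated: %d" % len(grid_result.cv_results_['params']))
```

Placing degree in a single flat grid would instead repeat every rbf configuration once per degree value with identical results.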

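Bagged Decision Trees (Bagging)

The most important parameter for bagged decision trees is the number of trees (n_estimators).

Ideally, this should be increased until no further improvement is seen in the model.

Good values might be a log scale from 10 to 1,000.

n_estimators in [10, 100, 1000]

For the full list of hyperparameters, see:

sklearn.ensemble.BaggingClassifier API

The example below demonstrates grid searching the key hyperparameters for BaggingClassifier on a synthetic binary classification dataset.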

# example of grid searching key hyperparameters for BaggingClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = BaggingClassifier()
n_estimators = [10, 100, 1000]
# define grid search
grid = dict(n_estimators=n_estimators)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

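Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prints the best result as well as the results from all combinations evaluated.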

Best: 0.873667 using {'n_estimators': 1000}
0.839000 (0.038588) with: {'n_estimators': 10}
0.869333 (0.030434) with: {'n_estimators': 100}
0.873667 (0.035070) with: {'n_estimators': 1000}
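
The advice to increase the number of trees until no further improvement is seen can also be followed with a simple loop instead of a fixed grid. The sketch below is a minimal version under stated assumptions: the candidate tree counts, the tolerance, and the small dataset are all hypothetical choices made so that it runs quickly.

```python
# increase the number of trees until the cross-validated score stops improving
from sklearn.datasets import make_blobs
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# small synthetic dataset (sizes are assumptions chosen to run quickly)
X, y = make_blobs(n_samples=200, centers=2, n_features=20, cluster_std=20, random_state=1)

candidates = [10, 20, 40, 80]  # hypothetical doubling schedule
tolerance = 0.005  # hypothetical minimum gain worth paying for more trees
best_score, best_n = 0.0, None
for n in candidates:
    model = BaggingClassifier(n_estimators=n, random_state=1)
    score = cross_val_score(model, X, y, scoring='accuracy', cv=5).mean()
    print("%d trees: %.3f" % (n, score))
    # stop once the gain over the best score so far falls below the tolerance
    if best_n is not None and score - best_score < tolerance:
        break
    best_score, best_n = score, n
print("Selected n_estimators=%d with accuracy %.3f" % (best_n, best_score))
```

Because cross-validated scores are noisy, a tolerance that is too small can stop the loop early or late by chance; repeating the evaluation, as the grid search examples do, reduces that risk.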

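Random Forest

The most important parameter is the number of random features to sample at each split point (max_features).

You could try a range of integer values, such as 1 to 20, or 1 to half the number of input features.

max_features in [1 to 20]

Alternatively, you could try a suite of different default value calculators.

max_features in ['sqrt', 'log2']

Another important parameter for random forest is the number of trees (n_estimators).

Ideally, this should be increased until no further improvement is seen in the model.

Good values might be a log scale from 10 to 1,000.

n_estimators in [10, 100, 1000]

For the full list of hyperparameters, see:

sklearn.ensemble.RandomForestClassifier API.

The example below demonstrates grid searching the key hyperparameters for RandomForestClassifier on a synthetic binary classification dataset.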

# example of grid searching key hyperparameters for RandomForestClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = RandomForestClassifier()
n_estimators = [10, 100, 1000]
max_features = ['sqrt', 'log2']
# define grid search
grid = dict(n_estimators=n_estimators,max_features=max_features)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

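Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prints the best result as well as the results from all combinations evaluated.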

Best: 0.952000 using {'max_features': 'log2', 'n_estimators': 1000}
0.841000 (0.032078) with: {'max_features': 'sqrt', 'n_estimators': 10}
0.938333 (0.020830) with: {'max_features': 'sqrt', 'n_estimators': 100}
0.944667 (0.024998) with: {'max_features': 'sqrt', 'n_estimators': 1000}
0.817667 (0.033235) with: {'max_features': 'log2', 'n_estimators': 10}
0.940667 (0.021592) with: {'max_features': 'log2', 'n_estimators': 100}
0.952000 (0.019562) with: {'max_features': 'log2', 'n_estimators': 1000}
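
The example above only tries the 'sqrt' and 'log2' calculators. The integer ranges suggested earlier could be built as in the sketch below, which only constructs the candidate lists and fits nothing. The variable names are my own, and n_features matches the synthetic dataset used in this tutorial.

```python
# build integer candidates for max_features
n_features = 100  # number of input features in the synthetic dataset above

# option 1: integers from 1 to 20 (range excludes its stop value, hence 21)
max_features_fixed = list(range(1, 21))
# option 2: integers from 1 to half the number of input features
max_features_half = list(range(1, n_features // 2 + 1))

print("Option 1: %d values, up to %d" % (len(max_features_fixed), max_features_fixed[-1]))
print("Option 2: %d values, up to %d" % (len(max_features_half), max_features_half[-1]))

# either list can be used in place of ['sqrt', 'log2'] in the grid
grid = dict(n_estimators=[10, 100, 1000], max_features=max_features_fixed)
```

With 20 values of max_features and 3 values of n_estimators, this grid has 60 combinations, so a coarser step (for example every second or fifth value) may be a sensible first pass.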

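Stochastic Gradient Boosting

Also called Gradient Boosting Machine (GBM) or named for a specific implementation, such as XGBoost.

The gradient boosting algorithm has many parameters to tune.

There are some parameter pairings that are important to consider. The first is the learning rate, also called shrinkage or eta (learning_rate), and the number of trees in the model (n_estimators). Both could be considered on a log scale, although in different directions.

learning_rate in [0.001, 0.01, 0.1]
n_estimators in [10, 100, 1000]

Another pairing is the number of rows or subset of the data to consider for each tree (subsample) and the depth of each tree (max_depth). These could be grid searched at intervals of 0.1 and 1 respectively, although common values can be tested directly.

subsample in [0.5, 0.7, 1.0]
max_depth in [3, 7, 9]

For more detailed advice on tuning the XGBoost implementation, see:

How to Configure the Gradient Boosting Algorithm

For the full list of hyperparameters, see:

sklearn.ensemble.GradientBoostingClassifier API.

The example below demonstrates grid searching the key hyperparameters for GradientBoostingClassifier on a synthetic binary classification dataset.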

# example of grid searching key hyperparameters for GradientBoostingClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier
# define dataset
X, y = make_blobs(n_samples=1000, centers=2, n_features=100, cluster_std=20)
# define models and parameters
model = GradientBoostingClassifier()
n_estimators = [10, 100, 1000]
learning_rate = [0.001, 0.01, 0.1]
subsample = [0.5, 0.7, 1.0]
max_depth = [3, 7, 9]
# define grid search
grid = dict(learning_rate=learning_rate, n_estimators=n_estimators, subsample=subsample, max_depth=max_depth)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=model, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

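Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example prints the best result as well as the results from all combinations evaluated.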

Best: 0.936667 using {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 1000, 'subsample': 0.5}
0.803333 (0.042058) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 0.5}
0.783667 (0.042386) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 0.7}
0.711667 (0.041157) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 10, 'subsample': 1.0}
0.832667 (0.040244) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 0.5}
0.809667 (0.040040) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 0.7}
0.741333 (0.043261) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 100, 'subsample': 1.0}
0.881333 (0.034130) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 1000, 'subsample': 0.5}
0.866667 (0.035150) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 1000, 'subsample': 0.7}
0.838333 (0.037424) with: {'learning_rate': 0.001, 'max_depth': 3, 'n_estimators': 1000, 'subsample': 1.0}
0.838333 (0.036614) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 10, 'subsample': 0.5}
0.821667 (0.040586) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 10, 'subsample': 0.7}
0.729000 (0.035903) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 10, 'subsample': 1.0}
0.884667 (0.036854) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 100, 'subsample': 0.5}
0.871333 (0.035094) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 100, 'subsample': 0.7}
0.729000 (0.037625) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 100, 'subsample': 1.0}
0.905667 (0.033134) with: {'learning_rate': 0.001, 'max_depth': 7, 'n_estimators': 1000, 'subsample': 0.5}
...
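
A full grid like the one above is expensive, largely because each value of n_estimators is fit separately. As a cheaper first look, the sketch below fits one model and uses staged_predict() to score it after every boosting stage. This is a swapped-in technique, not the method used in the example above: it relies on a single hold-out split, which is noisier than repeated cross-validation, and the dataset sizes and hyperparameter values are illustrative assumptions.

```python
# evaluate every number of trees from a single fit using staged predictions
from sklearn.datasets import make_blobs
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# small synthetic dataset (sizes are assumptions chosen to run quickly)
X, y = make_blobs(n_samples=300, centers=2, n_features=20, cluster_std=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, subsample=0.7, max_depth=3, random_state=1)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting stage
scores = [accuracy_score(y_test, y_pred) for y_pred in model.staged_predict(X_test)]
# stages are numbered from 1, so add one to the list index
best_n = scores.index(max(scores)) + 1
print("Best hold-out accuracy %.3f at n_estimators=%d" % (max(scores), best_n))
```

The tree count found this way can then be used to narrow the n_estimators values included in a full grid search.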

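Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered the key hyperparameters of the top machine learning algorithms and how to configure them.

Do you have other hyperparameter suggestions? Let me know in the comments below.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.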
