R 中的机器学习数据集（10 个您现在就可以使用的数据集）

作者： Jason Brownlee 于 2020 年 8 月 15 日发布于 R 机器学习 21

你需要标准数据集来练习机器学习。

在这篇短文中，你将了解如何在 R 中加载标准分类和回归数据集。

本文将向你介绍 3 个可用于加载标准数据集的 R 库，以及 10 个可用于 R 中机器学习的特定数据集。

在 R 中加载标准数据集非常有价值，因为你可以测试、练习和实验机器学习技术，并提高你对该平台的技能。

使用我的新书《R 语言机器学习精通》来开启你的项目，书中包含分步教程和所有示例的R 源代码文件。

让我们开始吧。

在小巧且易于理解的数据集上练习

有数百个标准测试数据集可供你练习和提高机器学习能力。

其中大部分都免费托管在 UCI 机器学习数据库中。这些数据集很有用，因为它们易于理解、行为良好且规模小。

最后这一点在练习机器学习时至关重要，因为

你可以快速下载它们。
你可以轻松地将它们加载到内存中。
你可以快速地在它们上面运行算法。

在以下文章中了解更多关于使用 UCI 机器学习数据库中的数据集练习机器学习的信息：

使用 UCI 机器学习数据库中的小型内存数据集练习机器学习

在 R 中访问标准数据集

你可以将标准数据集加载到 R 中作为 CSV 文件。

还有一种更便捷的加载标准数据集的方法。它们已被打包，并可在第三方 R 库中使用，你可以从综合 R 存档网络 (CRAN) 下载。

你应该使用哪些库，以及哪些数据集适合开始。

需要更多关于R机器学习的帮助吗？

参加我为期14天的免费电子邮件课程，了解如何在您的项目中使用R（附带示例代码）。

点击注册，同时获得该课程的免费PDF电子书版本。

如何在 R 中加载标准数据集

在本节中，你将了解可用于访问标准机器学习数据集的库。

你还将了解可以加载和使用的特定分类和回归数据集，以便在 R 中练习机器学习。

库：datasets

鸢尾花数据集
照片由 Rick Ligthelm 提供，部分权利保留。

datasets 库随附于基础 R，这意味着你无需显式加载该库。它包含大量可用的数据集。

你可以通过输入以下命令来加载此库中的数据集：

data(DataSetName)

1	data(DataSetName)

例如，要加载非常常用的 iris 数据集。

data(iris)

1	data(iris)

要查看此库中可用数据集的列表，可以输入：

# list all datasets in the package
library(help = "datasets")

1 2	# 列出包中的所有数据集 library(help = "datasets")

下面是一些你可以使用的该包中的数据集亮点。

鸢尾花数据集

描述：根据花测量值预测鸢尾花种类。
类型：多类分类
维度：150 个实例，5 个属性
输入：数值
输出：类别，3 个类别标签
UCI 机器学习数据库：描述
已发布的准确率结果：摘要

# iris flowers datasets
data(iris)
dim(iris)
levels(iris$Species)
head(iris)

# 鸢尾花数据集

data(iris)

dim(iris)

levels(iris$Species)

head(iris)

你会看到：

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

4 4.6 3.1 1.5 0.2 setosa

5 5.0 3.6 1.4 0.2 setosa

6 5.4 3.9 1.7 0.4 setosa

Longley 的经济回归数据

描述：根据经济变量预测就业人数
类型：回归
维度：16 个实例，7 个属性
输入：数值
输出：数值

# Longley's Economic Regression Data
data(longley)
dim(longley)
head(longley)

# Longley 的经济回归数据

data(longley)

dim(longley)

head(longley)

你会看到：

     GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
1948         88.5 259.426      232.5        145.6    108.632 1948   61.122
1949         88.2 258.054      368.2        161.6    109.773 1949   60.171
1950         89.5 284.599      335.1        165.0    110.929 1950   61.187
1951         96.2 328.975      209.9        309.9    112.075 1951   63.221
1952         98.1 346.999      193.2        359.4    113.270 1952   63.639

GNP.deflator GNP Unemployed Armed.Forces Population Year Employed

1947 83.0 234.289 235.6 159.0 107.608 1947 60.323

1948 88.5 259.426 232.5 145.6 108.632 1948 61.122

1949 88.2 258.054 368.2 161.6 109.773 1949 60.171

1950 89.5 284.599 335.1 165.0 110.929 1950 61.187

1951 96.2 328.975 209.9 309.9 112.075 1951 63.221

1952 98.1 346.999 193.2 359.4 113.270 1952 63.639

库：mlbench

大豆数据集
照片由 United Soybean Board 提供，部分权利保留。

直接来自库的手册

一系列人工和真实世界的机器学习基准问题，包括 UCI 数据库的多个数据集。

你可以在 mlbench CRAN 页面上了解更多关于 mlbench 库的信息。

如果未安装，可以按以下方式安装此库：

install.packages("mlbench")

1	install.packages("mlbench")

你可以按以下方式加载库：

# load the library
library(mlbench)

1 2	# 加载库 library(mlbench)

要查看此库中可用数据集的列表，可以输入：

# list the contents of the library
library(help = "mlbench")

1 2	# 列出库的内容 library(help = "mlbench")

下面是你可以使用的该库的一些数据集亮点：

波士顿住房数据

描述：根据房屋细节预测波士顿的房价
类型：回归
维度：506 个实例，14 个属性
输入：数值
输出：数值
UCI 机器学习数据库：描述

# Boston Housing Data
data(BostonHousing)
dim(BostonHousing)
head(BostonHousing)

# 波士顿住房数据

data(BostonHousing)

dim(BostonHousing)

head(BostonHousing)

你会看到：

     crim zn indus chas   nox    rm  age    dis rad tax ptratio      b lstat medv
1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98 24.0
2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14 21.6
3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03 34.7
4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94 33.4
5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33 36.2
6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21 28.7

crim zn indus chas nox rm age dis rad tax ptratio b lstat medv

1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0

2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6

3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7

4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4

5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2

6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21 28.7

威斯康星乳腺癌数据库

描述：根据活检细节预测癌症是恶性还是良性。
类型：二元分类
维度：699 个实例，11 个属性
输入：整数（名义）
输出：类别，2 个类别标签
UCI 机器学习数据库：描述
已发布的准确率结果：摘要

# Wisconsin Breast Cancer Database
data(BreastCancer)
dim(BreastCancer)
levels(BreastCancer$Class)
head(BreastCancer)

# 威斯康星乳腺癌数据库

data(BreastCancer)

dim(BreastCancer)

levels(BreastCancer$Class)

head(BreastCancer)

你会看到：

       Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses     Class
1 1000025            5         1          1             1            2           1           3               1       1    benign
2 1002945            5         4          4             5            7          10           3               2       1    benign
3 1015425            3         1          1             1            2           2           3               1       1    benign
4 1016277            6         8          8             1            3           4           3               7       1    benign
5 1017023            4         1          1             3            2           1           3               1       1    benign
6 1017122            8        10         10             8            7          10           9               7       1 malignant

Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses Class

1 1000025 5 1 1 1 2 1 3 1 1 benign

2 1002945 5 4 4 5 7 10 3 2 1 benign

3 1015425 3 1 1 1 2 2 3 1 1 benign

4 1016277 6 8 8 1 3 4 3 7 1 benign

5 1017023 4 1 1 3 2 1 3 1 1 benign

6 1017122 8 10 10 8 7 10 9 7 1 malignant

玻璃识别数据库

描述：根据化学性质预测玻璃类型。
类型：分类
维度：214 个实例，10 个属性
输入：数值
输出：类别，7 个类别标签
UCI 机器学习数据库：描述
已发布的准确率结果：摘要

# Glass Identification Database
data(Glass)
dim(Glass)
levels(Glass$Type)
head(Glass)

# 玻璃识别数据库

data(Glass)

dim(Glass)

levels(Glass$Type)

head(Glass)

你会看到：

       RI    Na   Mg   Al    Si    K   Ca Ba   Fe Type
1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75  0 0.00    1
2 1.51761 13.89 3.60 1.36 72.73 0.48 7.83  0 0.00    1
3 1.51618 13.53 3.55 1.54 72.99 0.39 7.78  0 0.00    1
4 1.51766 13.21 3.69 1.29 72.61 0.57 8.22  0 0.00    1
5 1.51742 13.27 3.62 1.24 73.08 0.55 8.07  0 0.00    1
6 1.51596 12.79 3.61 1.62 72.97 0.64 8.07  0 0.26    1

RI Na Mg Al Si K Ca Ba Fe Type

1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75 0 0.00 1

2 1.51761 13.89 3.60 1.36 72.73 0.48 7.83 0 0.00 1

3 1.51618 13.53 3.55 1.54 72.99 0.39 7.78 0 0.00 1

4 1.51766 13.21 3.69 1.29 72.61 0.57 8.22 0 0.00 1

5 1.51742 13.27 3.62 1.24 73.08 0.55 8.07 0 0.00 1

6 1.51596 12.79 3.61 1.62 72.97 0.64 8.07 0 0.26 1

约翰霍普金斯大学电离层数据库

描述：根据天线数据预测大气中的高能结构。
类型：分类
维度：351 个实例，35 个属性
输入：数值
输出：类别，2 个类别标签
UCI 机器学习数据库：描述
已发布的准确率结果：摘要

# Johns Hopkins University Ionosphere database
data(Ionosphere)
dim(Ionosphere)
levels(Ionosphere$Class)
head(Ionosphere)

# 约翰霍普金斯大学电离层数据库

data(Ionosphere)

dim(Ionosphere)

levels(Ionosphere$Class)

head(Ionosphere)

你会看到：

  V1 V2      V3       V4       V5       V6       V7       V8      V9      V10     V11      V12     V13      V14      V15      V16      V17      V18      V19
1  1  0 0.99539 -0.05889  0.85243  0.02306  0.83398 -0.37708 1.00000  0.03760 0.85243 -0.17755 0.59755 -0.44945  0.60536 -0.38223  0.84356 -0.38542  0.58212
2  1  0 1.00000 -0.18829  0.93035 -0.36156 -0.10868 -0.93597 1.00000 -0.04549 0.50874 -0.67743 0.34432 -0.69707 -0.51685 -0.97515  0.05499 -0.62237  0.33109
3  1  0 1.00000 -0.03365  1.00000  0.00485  1.00000 -0.12062 0.88965  0.01198 0.73082  0.05346 0.85443  0.00827  0.54591  0.00299  0.83775 -0.13644  0.75535
4  1  0 1.00000 -0.45161  1.00000  1.00000  0.71216 -1.00000 0.00000  0.00000 0.00000  0.00000 0.00000  0.00000 -1.00000  0.14516  0.54094 -0.39330 -1.00000
5  1  0 1.00000 -0.02401  0.94140  0.06531  0.92106 -0.23255 0.77152 -0.16399 0.52798 -0.20275 0.56409 -0.00712  0.34395 -0.27457  0.52940 -0.21780  0.45107
6  1  0 0.02337 -0.00592 -0.09924 -0.11949 -0.00763 -0.11824 0.14706  0.06637 0.03786 -0.06302 0.00000  0.00000 -0.04572 -0.15540 -0.00343 -0.10196 -0.11575
       V20      V21      V22      V23      V24      V25      V26      V27      V28      V29      V30      V31      V32      V33      V34 Class
1 -0.32192  0.56971 -0.29674  0.36946 -0.47357  0.56811 -0.51171  0.41078 -0.46168  0.21266 -0.34090  0.42267 -0.54487  0.18641 -0.45300  good
2 -1.00000 -0.13151 -0.45300 -0.18056 -0.35734 -0.20332 -0.26569 -0.20468 -0.18401 -0.19040 -0.11593 -0.16626 -0.06288 -0.13738 -0.02447   bad
3 -0.08540  0.70887 -0.27502  0.43385 -0.12062  0.57528 -0.40220  0.58984 -0.22145  0.43100 -0.17365  0.60436 -0.24180  0.56045 -0.38238  good
4 -0.54467 -0.69975  1.00000  0.00000  0.00000  1.00000  0.90695  0.51613  1.00000  1.00000 -0.20099  0.25682  1.00000 -0.32382  1.00000   bad
5 -0.17813  0.05982 -0.35575  0.02309 -0.52879  0.03286 -0.65158  0.13290 -0.53206  0.02431 -0.62197 -0.05707 -0.59573 -0.04608 -0.65697  good
6 -0.05414  0.01838  0.03669  0.01519  0.00888  0.03513 -0.01535 -0.03240  0.09223 -0.07859  0.00732  0.00000  0.00000 -0.00039  0.12011   bad

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19

1 1 0 0.99539 -0.05889 0.85243 0.02306 0.83398 -0.37708 1.00000 0.03760 0.85243 -0.17755 0.59755 -0.44945 0.60536 -0.38223 0.84356 -0.38542 0.58212

2 1 0 1.00000 -0.18829 0.93035 -0.36156 -0.10868 -0.93597 1.00000 -0.04549 0.50874 -0.67743 0.34432 -0.69707 -0.51685 -0.97515 0.05499 -0.62237 0.33109

3 1 0 1.00000 -0.03365 1.00000 0.00485 1.00000 -0.12062 0.88965 0.01198 0.73082 0.05346 0.85443 0.00827 0.54591 0.00299 0.83775 -0.13644 0.75535

4 1 0 1.00000 -0.45161 1.00000 1.00000 0.71216 -1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -1.00000 0.14516 0.54094 -0.39330 -1.00000

5 1 0 1.00000 -0.02401 0.94140 0.06531 0.92106 -0.23255 0.77152 -0.16399 0.52798 -0.20275 0.56409 -0.00712 0.34395 -0.27457 0.52940 -0.21780 0.45107

6 1 0 0.02337 -0.00592 -0.09924 -0.11949 -0.00763 -0.11824 0.14706 0.06637 0.03786 -0.06302 0.00000 0.00000 -0.04572 -0.15540 -0.00343 -0.10196 -0.11575

V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 Class

1 -0.32192 0.56971 -0.29674 0.36946 -0.47357 0.56811 -0.51171 0.41078 -0.46168 0.21266 -0.34090 0.42267 -0.54487 0.18641 -0.45300 good

2 -1.00000 -0.13151 -0.45300 -0.18056 -0.35734 -0.20332 -0.26569 -0.20468 -0.18401 -0.19040 -0.11593 -0.16626 -0.06288 -0.13738 -0.02447 bad

3 -0.08540 0.70887 -0.27502 0.43385 -0.12062 0.57528 -0.40220 0.58984 -0.22145 0.43100 -0.17365 0.60436 -0.24180 0.56045 -0.38238 good

4 -0.54467 -0.69975 1.00000 0.00000 0.00000 1.00000 0.90695 0.51613 1.00000 1.00000 -0.20099 0.25682 1.00000 -0.32382 1.00000 bad

5 -0.17813 0.05982 -0.35575 0.02309 -0.52879 0.03286 -0.65158 0.13290 -0.53206 0.02431 -0.62197 -0.05707 -0.59573 -0.04608 -0.65697 good

6 -0.05414 0.01838 0.03669 0.01519 0.00888 0.03513 -0.01535 -0.03240 0.09223 -0.07859 0.00732 0.00000 0.00000 -0.00039 0.12011 bad

皮马印第安人糖尿病数据库

描述：根据医疗记录数据预测女性皮马印第安人糖尿病的发病。
类型：二元分类
维度：768 个实例，9 个属性
输入：数值
输出：类别，2 个类别标签
数据集详情：描述
已发布的准确率结果：摘要

# Pima Indians Diabetes Database
data(PimaIndiansDiabetes)
dim(PimaIndiansDiabetes)
levels(PimaIndiansDiabetes$diabetes)
head(PimaIndiansDiabetes)

# 皮马印第安人糖尿病数据库

data(PimaIndiansDiabetes)

dim(PimaIndiansDiabetes)

levels(PimaIndiansDiabetes$diabetes)

head(PimaIndiansDiabetes)

你会看到：

  pregnant glucose pressure triceps insulin mass pedigree age diabetes
1        6     148       72      35       0 33.6    0.627  50      pos
2        1      85       66      29       0 26.6    0.351  31      neg
3        8     183       64       0       0 23.3    0.672  32      pos
4        1      89       66      23      94 28.1    0.167  21      neg
5        0     137       40      35     168 43.1    2.288  33      pos
6        5     116       74       0       0 25.6    0.201  30      neg

pregnant glucose pressure triceps insulin mass pedigree age diabetes

1 6 148 72 35 0 33.6 0.627 50 pos

2 1 85 66 29 0 26.6 0.351 31 neg

3 8 183 64 0 0 23.3 0.672 32 pos

4 1 89 66 23 94 28.1 0.167 21 neg

5 0 137 40 35 168 43.1 2.288 33 pos

6 5 116 74 0 0 25.6 0.201 30 neg

声纳，地雷与岩石

描述：根据声纳回波数据预测金属或岩石回波。
类型：二元分类
维度：208 个实例，61 个属性
输入：数值
输出：类别，2 个类别标签
UCI 机器学习数据库：描述
已发布的准确率结果：摘要

# Sonar, Mines vs. Rocks
data(Sonar)
dim(Sonar)
levels(Sonar$Class)
head(Sonar)

# 声纳，地雷与岩石

data(Sonar)

dim(Sonar)

levels(Sonar$Class)

head(Sonar)

你会看到：

      V1     V2     V3     V4     V5     V6     V7     V8     V9    V10    V11    V12    V13    V14    V15    V16    V17    V18    V19    V20    V21    V22
1 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 0.2111 0.1609 0.1582 0.2238 0.0645 0.0660 0.2273 0.3100 0.2999 0.5078 0.4797 0.5783 0.5071
2 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 0.2872 0.4918 0.6552 0.6919 0.7797 0.7464 0.9444 1.0000 0.8874 0.8024 0.7818 0.5212 0.4052
3 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 0.6194 0.6333 0.7060 0.5544 0.5320 0.6479 0.6931 0.6759 0.7551 0.8929 0.8619 0.7974 0.6737
4 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 0.1264 0.0881 0.1992 0.0184 0.2261 0.1729 0.2131 0.0693 0.2281 0.4060 0.3973 0.2741 0.3690
5 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 0.4459 0.4152 0.3952 0.4256 0.4135 0.4528 0.5326 0.7306 0.6193 0.2032 0.4636 0.4148 0.4292
6 0.0286 0.0453 0.0277 0.0174 0.0384 0.0990 0.1201 0.1833 0.2105 0.3039 0.2988 0.4250 0.6343 0.8198 1.0000 0.9988 0.9508 0.9025 0.7234 0.5122 0.2074 0.3985
     V23    V24    V25    V26    V27    V28    V29    V30    V31    V32    V33    V34    V35    V36    V37    V38    V39    V40    V41    V42    V43    V44
1 0.4328 0.5550 0.6711 0.6415 0.7104 0.8080 0.6791 0.3857 0.1307 0.2604 0.5121 0.7547 0.8537 0.8507 0.6692 0.6097 0.4943 0.2744 0.0510 0.2834 0.2825 0.4256
2 0.3957 0.3914 0.3250 0.3200 0.3271 0.2767 0.4423 0.2028 0.3788 0.2947 0.1984 0.2341 0.1306 0.4182 0.3835 0.1057 0.1840 0.1970 0.1674 0.0583 0.1401 0.1628
3 0.4293 0.3648 0.5331 0.2413 0.5070 0.8533 0.6036 0.8514 0.8512 0.5045 0.1862 0.2709 0.4232 0.3043 0.6116 0.6756 0.5375 0.4719 0.4647 0.2587 0.2129 0.2222
4 0.5556 0.4846 0.3140 0.5334 0.5256 0.2520 0.2090 0.3559 0.6260 0.7340 0.6120 0.3497 0.3953 0.3012 0.5408 0.8814 0.9857 0.9167 0.6121 0.5006 0.3210 0.3202
5 0.5730 0.5399 0.3161 0.2285 0.6995 1.0000 0.7262 0.4724 0.5103 0.5459 0.2881 0.0981 0.1951 0.4181 0.4604 0.3217 0.2828 0.2430 0.1979 0.2444 0.1847 0.0841
6 0.5890 0.2872 0.2043 0.5782 0.5389 0.3750 0.3411 0.5067 0.5580 0.4778 0.3299 0.2198 0.1407 0.2856 0.3807 0.4158 0.4054 0.3296 0.2707 0.2650 0.0723 0.1238
     V45    V46    V47    V48    V49    V50    V51    V52    V53    V54    V55    V56    V57    V58    V59    V60 Class
1 0.2641 0.1386 0.1051 0.1343 0.0383 0.0324 0.0232 0.0027 0.0065 0.0159 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032     R
2 0.0621 0.0203 0.0530 0.0742 0.0409 0.0061 0.0125 0.0084 0.0089 0.0048 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044     R
3 0.2111 0.0176 0.1348 0.0744 0.0130 0.0106 0.0033 0.0232 0.0166 0.0095 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078     R
4 0.4295 0.3654 0.2655 0.1576 0.0681 0.0294 0.0241 0.0121 0.0036 0.0150 0.0085 0.0073 0.0050 0.0044 0.0040 0.0117     R
5 0.0692 0.0528 0.0357 0.0085 0.0230 0.0046 0.0156 0.0031 0.0054 0.0105 0.0110 0.0015 0.0072 0.0048 0.0107 0.0094     R
6 0.1192 0.1089 0.0623 0.0494 0.0264 0.0081 0.0104 0.0045 0.0014 0.0038 0.0013 0.0089 0.0057 0.0027 0.0051 0.0062     R

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22

1 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 0.2111 0.1609 0.1582 0.2238 0.0645 0.0660 0.2273 0.3100 0.2999 0.5078 0.4797 0.5783 0.5071

2 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 0.2872 0.4918 0.6552 0.6919 0.7797 0.7464 0.9444 1.0000 0.8874 0.8024 0.7818 0.5212 0.4052

3 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 0.6194 0.6333 0.7060 0.5544 0.5320 0.6479 0.6931 0.6759 0.7551 0.8929 0.8619 0.7974 0.6737

4 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 0.1264 0.0881 0.1992 0.0184 0.2261 0.1729 0.2131 0.0693 0.2281 0.4060 0.3973 0.2741 0.3690

5 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 0.4459 0.4152 0.3952 0.4256 0.4135 0.4528 0.5326 0.7306 0.6193 0.2032 0.4636 0.4148 0.4292

6 0.0286 0.0453 0.0277 0.0174 0.0384 0.0990 0.1201 0.1833 0.2105 0.3039 0.2988 0.4250 0.6343 0.8198 1.0000 0.9988 0.9508 0.9025 0.7234 0.5122 0.2074 0.3985

V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 V44

1 0.4328 0.5550 0.6711 0.6415 0.7104 0.8080 0.6791 0.3857 0.1307 0.2604 0.5121 0.7547 0.8537 0.8507 0.6692 0.6097 0.4943 0.2744 0.0510 0.2834 0.2825 0.4256

2 0.3957 0.3914 0.3250 0.3200 0.3271 0.2767 0.4423 0.2028 0.3788 0.2947 0.1984 0.2341 0.1306 0.4182 0.3835 0.1057 0.1840 0.1970 0.1674 0.0583 0.1401 0.1628

3 0.4293 0.3648 0.5331 0.2413 0.5070 0.8533 0.6036 0.8514 0.8512 0.5045 0.1862 0.2709 0.4232 0.3043 0.6116 0.6756 0.5375 0.4719 0.4647 0.2587 0.2129 0.2222

4 0.5556 0.4846 0.3140 0.5334 0.5256 0.2520 0.2090 0.3559 0.6260 0.7340 0.6120 0.3497 0.3953 0.3012 0.5408 0.8814 0.9857 0.9167 0.6121 0.5006 0.3210 0.3202

5 0.5730 0.5399 0.3161 0.2285 0.6995 1.0000 0.7262 0.4724 0.5103 0.5459 0.2881 0.0981 0.1951 0.4181 0.4604 0.3217 0.2828 0.2430 0.1979 0.2444 0.1847 0.0841

6 0.5890 0.2872 0.2043 0.5782 0.5389 0.3750 0.3411 0.5067 0.5580 0.4778 0.3299 0.2198 0.1407 0.2856 0.3807 0.4158 0.4054 0.3296 0.2707 0.2650 0.0723 0.1238

V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55 V56 V57 V58 V59 V60 Class

1 0.2641 0.1386 0.1051 0.1343 0.0383 0.0324 0.0232 0.0027 0.0065 0.0159 0.0072 0.0167 0.0180 0.0084 0.0090 0.0032 R

2 0.0621 0.0203 0.0530 0.0742 0.0409 0.0061 0.0125 0.0084 0.0089 0.0048 0.0094 0.0191 0.0140 0.0049 0.0052 0.0044 R

3 0.2111 0.0176 0.1348 0.0744 0.0130 0.0106 0.0033 0.0232 0.0166 0.0095 0.0180 0.0244 0.0316 0.0164 0.0095 0.0078 R

4 0.4295 0.3654 0.2655 0.1576 0.0681 0.0294 0.0241 0.0121 0.0036 0.0150 0.0085 0.0073 0.0050 0.0044 0.0040 0.0117 R

5 0.0692 0.0528 0.0357 0.0085 0.0230 0.0046 0.0156 0.0031 0.0054 0.0105 0.0110 0.0015 0.0072 0.0048 0.0107 0.0094 R

6 0.1192 0.1089 0.0623 0.0494 0.0264 0.0081 0.0104 0.0045 0.0014 0.0038 0.0013 0.0089 0.0057 0.0027 0.0051 0.0062 R

大豆数据库

描述：根据作物数据预测大豆作物的病虫害。
类型：多类分类
维度：683 个实例，26 个属性
输入：整数（名义）
输出：类别，19 个类别标签
UCI 机器学习数据库：描述

# Soybean Database
data(Soybean)
dim(Soybean)
levels(Soybean$Class)
head(Soybean)

# 大豆数据库

data(Soybean)

dim(Soybean)

levels(Soybean$Class)

head(Soybean)

你会看到：

                  Class date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaves leaf.halo leaf.marg leaf.size leaf.shread
1 diaporthe-stem-canker    6           0      2    1    0         1        1     1        0    0            1      1         0         2         2           0
2 diaporthe-stem-canker    4           0      2    1    0         2        0     2        1    1            1      1         0         2         2           0
3 diaporthe-stem-canker    3           0      2    1    0         1        0     2        1    2            1      1         0         2         2           0
4 diaporthe-stem-canker    3           0      2    1    0         1        0     2        0    1            1      1         0         2         2           0
5 diaporthe-stem-canker    6           0      2    1    0         2        0     1        0    2            1      1         0         2         2           0
6 diaporthe-stem-canker    5           0      2    1    0         3        0     1        0    1            1      1         0         2         2           0
  leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth
1         0         0    1       1            3             1               1         1        0            0         0          0           4    0           0
2         0         0    1       0            3             1               1         1        0            0         0          0           4    0           0
3         0         0    1       0            3             0               1         1        0            0         0          0           4    0           0
4         0         0    1       0            3             0               1         1        0            0         0          0           4    0           0
5         0         0    1       0            3             1               1         1        0            0         0          0           4    0           0
6         0         0    1       0            3             0               1         1        0            0         0          0           4    0           0
  seed.discolor seed.size shriveling roots
1             0         0          0     0
2             0         0          0     0
3             0         0          0     0
4             0         0          0     0
5             0         0          0     0
6             0         0          0     0

Class date plant.stand precip temp hail crop.hist area.dam sever seed.tmt germ plant.growth leaves leaf.halo leaf.marg leaf.size leaf.shread

1 diaporthe-stem-canker 6 0 2 1 0 1 1 1 0 0 1 1 0 2 2 0

2 diaporthe-stem-canker 4 0 2 1 0 2 0 2 1 1 1 1 0 2 2 0

3 diaporthe-stem-canker 3 0 2 1 0 1 0 2 1 2 1 1 0 2 2 0

4 diaporthe-stem-canker 3 0 2 1 0 1 0 2 0 1 1 1 0 2 2 0

5 diaporthe-stem-canker 6 0 2 1 0 2 0 1 0 2 1 1 0 2 2 0

6 diaporthe-stem-canker 5 0 2 1 0 3 0 1 0 1 1 1 0 2 2 0

leaf.malf leaf.mild stem lodging stem.cankers canker.lesion fruiting.bodies ext.decay mycelium int.discolor sclerotia fruit.pods fruit.spots seed mold.growth

1 0 0 1 1 3 1 1 1 0 0 0 0 4 0 0

2 0 0 1 0 3 1 1 1 0 0 0 0 4 0 0

3 0 0 1 0 3 0 1 1 0 0 0 0 4 0 0

4 0 0 1 0 3 0 1 1 0 0 0 0 4 0 0

5 0 0 1 0 3 1 1 1 0 0 0 0 4 0 0

6 0 0 1 0 3 0 1 1 0 0 0 0 4 0 0

seed.discolor seed.size shriveling roots

1 0 0 0 0

2 0 0 0 0

3 0 0 0 0

4 0 0 0 0

5 0 0 0 0

6 0 0 0 0

库：AppliedPredictiveModeling

鲍鱼数据集
照片由 MAURO CATEB 提供，部分权利保留。

许多使用 R 的书籍还包含它们自己的 R 库，该库提供了书中使用的所有代码和数据集。

优秀的图书《应用预测建模》有自己的库，名为 AppliedPredictiveModeling。

如果未安装，可以按以下方式安装此库：

install.packages("AppliedPredictiveModeling")

1	install.packages("AppliedPredictiveModeling")

你可以按以下方式加载库：

# load the library
library(AppliedPredictiveModeling)

1 2	# 加载库 library(AppliedPredictiveModeling)

要查看此库中可用数据集的列表，可以输入：

# list the contents of the library
library(help = "AppliedPredictiveModeling")

1 2	# 列出库的内容 library(help = "AppliedPredictiveModeling")

该库中的一个数据集亮点是：

鲍鱼数据

描述：根据鲍鱼测量数据预测鲍鱼年龄。
类型：回归或分类
维度：4177 个实例，9 个属性
输入：数值和类别
输出：整数
UCI 机器学习数据库：描述

# Abalone Data
data(abalone)
dim(abalone)
head(abalone)

# 鲍鱼数据

data(abalone)

dim(abalone)

head(abalone)

你会看到：

  Type LongestShell Diameter Height WholeWeight ShuckedWeight VisceraWeight ShellWeight Rings
1    M        0.455    0.365  0.095      0.5140        0.2245        0.1010       0.150    15
2    M        0.350    0.265  0.090      0.2255        0.0995        0.0485       0.070     7
3    F        0.530    0.420  0.135      0.6770        0.2565        0.1415       0.210     9
4    M        0.440    0.365  0.125      0.5160        0.2155        0.1140       0.155    10
5    I        0.330    0.255  0.080      0.2050        0.0895        0.0395       0.055     7
6    I        0.425    0.300  0.095      0.3515        0.1410        0.0775       0.120     8

Type LongestShell Diameter Height WholeWeight ShuckedWeight VisceraWeight ShellWeight Rings

1 M 0.455 0.365 0.095 0.5140 0.2245 0.1010 0.150 15

2 M 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.070 7

3 F 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.210 9

4 M 0.440 0.365 0.125 0.5160 0.2155 0.1140 0.155 10

5 I 0.330 0.255 0.080 0.2050 0.0895 0.0395 0.055 7

6 I 0.425 0.300 0.095 0.3515 0.1410 0.0775 0.120 8

总结

在这篇文章中，你了解到你不需要收集或加载自己的数据就可以在 R 中练习机器学习。

你了解了 3 个提供样本机器学习数据集的不同库，你可以使用这些库：

datasets 库
mlbench 库
AppliedPredictiveModeling 库

你还发现了 10 个可以用来练习分类和回归机器学习技术的标准机器学习数据集。

鸢尾花数据集（多类分类）
Longley 的经济回归数据（回归）
波士顿住房数据（回归）
威斯康星乳腺癌数据库（二元分类）
玻璃识别数据库（多类分类）
约翰霍普金斯大学电离层数据库（二元分类）
皮马印第安人糖尿病数据库（二元分类）
声纳，地雷与岩石（二元分类）
大豆数据库（多类分类）
鲍鱼数据（回归或分类）

下一步

您是否尝试过这些食谱？

启动您的 R 交互式环境。
输入或复制粘贴上面的代码示例并尝试一下。
使用 R 中的内置帮助来了解有关所用函数的更多信息。

您有问题吗？在评论中提问，我将尽力回答。

关于此主题的更多信息

如何将机器学习数据加载到 R 中

在 R 中抽查机器学习算法（可在下一个项目尝试的算法）

21 条对 R 中的机器学习数据集（10 个立即可用的数据集）的回复

Rotimi 2016 年 2 月 16 日上午 6:11 #

谢谢，一如既往的精彩帖子

回复
Rotimi 2016 年 2 月 16 日上午 6:11 #

精彩的帖子！非常感谢您，先生

回复
Asia 2016 年 12 月 2 日下午 12:35 #

谢谢！我正在寻找回归数据集，但我不知道其中一个您写过的。但在一个地方您写的是用于回归，而在另一个地方您写的是用于分类。这有点误导。

回复
- Jason Brownlee 2016 年 12 月 3 日上午 8:26 #
  
  谢谢 Asia，我已经修正了。
  
  回复
V Malsoru 2017 年 5 月 13 日上午 7:06 #

我安装了 R 并练习了一些算法，例如使用“arules”包的 Apriori，但如何安装“mlbench”包来运行以下数据集？

“波士顿住房数据（回归）
威斯康星乳腺癌数据库（二元分类）
玻璃识别数据库（多类分类）
约翰霍普金斯大学电离层数据库（二元分类）
皮马印第安人糖尿病数据库（二元分类）
声纳，地雷与岩石（二元分类）
大豆数据库（多类分类）
鲍鱼数据（回归或分类）”。请提建议。

回复
- Jason Brownlee 2017年5月14日上午7:21 #
  
  您可以按照以下方式安装mlbench包
  
  install.packages(mlbench)
  
  1
  
  install.packages(mlbench)
  
  回复
Malsoru 2017年5月13日上午7:49 #

要运行癌症数据集，需要哪些包，您能推荐一下吗？

回复
- Jason Brownlee 2017年5月14日上午7:21 #
  
  mlbench包。
  
  回复
Malsoru 2017年6月14日下午5:52 #

Pima Indians Diabetes Database（二元分类）。
您能否推荐另一个“糖尿病”数据集，其属性比Pima Indians Diabetes Database（二元分类）多一个或两个属性，或者少一个或两个属性？不是相同的Pima Indians Diabetes Database（二元分类）。但我只需要“糖尿病”的。

回复
- Jason Brownlee 2017年6月15日上午8:44 #
  
  也许您可以在Kaggle或UCI机器学习仓库中搜索？
  
  回复
Nate George 2018年1月27日下午2:31 #

这一定是您所有帖子中图片与主题最匹配的一篇了。其他的似乎都是随机的。

回复
- Jason Brownlee 2018年1月28日上午8:21 #
  
  谢谢Nate。
  
  回复
  - Nate George 2018年3月9日上午9:16 #
    
    哈哈！我只是觉得很有趣。
    
    回复
Nate George 2018年1月27日下午2:33 #

另外，是install.packages()，不是install.library()

回复
- Jason Brownlee 2018年1月28日上午8:21 #
  
  谢谢，已修正。
  
  回复
GEORGE MASON UNIVERSITY 2018年9月18日上午6:35 #

请问如何将中心极限定理应用于大型多变量数据集？

回复
- Jason Brownlee 2018年9月18日下午2:14 #
  
  你具体指的是什么？
  
  回复
Rakesh Patel 2018年10月8日上午4:42 #

先生，我想使用年龄数据集来查找年龄。

但我找不到年龄数据集的csv文件。

先生，您能给我一个链接，以便我可以在我的项目中使用吗？

回复
- Jason Brownlee 2018年10月8日上午9:28 #
  
  这是我在这里回答的一个常见问题
  https://machinelearning.org.cn/faq/single-faq/where-can-i-get-a-dataset-on-___
  
  回复
Machine learning 2021年7月10日下午8:23 #

机器学习是科技行业的革命。

回复
- Jason Brownlee 2021年7月11日上午5:38 #
  
  确实如此！
  
  回复

导航

R 中的机器学习数据集（10 个您现在就可以使用的数据集）

在小巧且易于理解的数据集上练习

在 R 中访问标准数据集

需要更多关于R机器学习的帮助吗？

如何在 R 中加载标准数据集

库：datasets

鸢尾花数据集

Longley 的经济回归数据

库：mlbench

波士顿住房数据

威斯康星乳腺癌数据库

玻璃识别数据库

约翰霍普金斯大学电离层数据库

皮马印第安人糖尿病数据库

声纳，地雷与岩石

大豆数据库

库：AppliedPredictiveModeling

鲍鱼数据

总结

下一步

在R中发现更快的机器学习！

在几分钟内开发您自己的模型

最终将机器学习应用到您自己的项目中

关于此主题的更多信息

21 条对 R 中的机器学习数据集（10 个立即可用的数据集）的回复

留下回复点击此处取消回复。

导航

在小巧且易于理解的数据集上练习

在 R 中访问标准数据集

需要更多关于R机器学习的帮助吗？

如何在 R 中加载标准数据集

库：datasets

鸢尾花数据集

Longley 的经济回归数据

库：mlbench

波士顿住房数据

威斯康星乳腺癌数据库

玻璃识别数据库

约翰霍普金斯大学电离层数据库

皮马印第安人糖尿病数据库

声纳，地雷与岩石

大豆数据库

库：AppliedPredictiveModeling

鲍鱼数据

总结

下一步

在R中发现更快的机器学习！

在几分钟内开发您自己的模型

最终将机器学习应用到您自己的项目中

关于此主题的更多信息

21 条对 *R 中的机器学习数据集（10 个立即可用的数据集）* 的回复

留下回复 点击此处取消回复。

21 条对 R 中的机器学习数据集（10 个立即可用的数据集）的回复

留下回复点击此处取消回复。