Profiling Python Code

By Adrian Tam on June 21, 2022, in Python for Machine Learning

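Profiling is a technique for finding out where a program spends its time. With these statistics, we can find the "hot spots" of a program and think about ways to improve them. Sometimes a hot spot in an unexpected location also hints at a bug in the program.

In this tutorial, we will see how to use the profiling facilities in Python. Specifically, you will see:

How to compare small code fragments using the `timeit` module
How to profile an entire program using the `cProfile` module
How to invoke a profiler inside an existing program
What the profiler cannot do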

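Let's get started.

Profiling Python Code. Photo by Prashant Saini. Some rights reserved.

Tutorial Overview

This tutorial is in four parts; they are:

Profiling Small Fragments
The Profile Module
Using the Profiler Inside Code
Caveats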

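Profiling Small Fragments

When you are asked about the different ways of doing the same thing in Python, one perspective is to check which one is the most efficient. In Python's standard library, we have the `timeit` module, which allows us to do some simple profiling.

For example, to concatenate many short strings, we can use the `join()` method of strings or the `+` operator. So how do we know which is faster? Consider the following Python code: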

longstr = ""
for x in range(1000):
  longstr += str(x)

This will produce a long string `012345....` in the variable `longstr`. An alternative way to write this is:

longstr = "".join([str(x) for x in range(1000)])
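
As a quick sanity check before timing anything, the sketch below confirms that the two approaches build the same string. The variable names `loop_result` and `join_result` are chosen here for illustration and are not part of the original example.

```python
# Build the string with the += operator in a loop
loop_result = ""
for x in range(1000):
    loop_result += str(x)

# Build the same string with str.join()
join_result = "".join([str(x) for x in range(1000)])

# Both approaches should yield identical output
print(loop_result == join_result)  # True
print(join_result[:10])            # 0123456789
print(len(join_result))            # 2890
```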

To compare the two, we can run the following at the command line:

python -m timeit 'longstr=""' 'for x in range(1000): longstr += str(x)'
python -m timeit '"".join([str(x) for x in range(1000)])'

These two commands will produce the following output:

1000 loops, best of 5: 265 usec per loop
2000 loops, best of 5: 160 usec per loop

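The above commands load the `timeit` module and pass it the code to measure, one line per argument. In the first command, we have two lines of statements, and they are passed to the `timeit` module as two separate arguments. By the same logic, the first command can also be written as three lines of statements (by breaking the for-loop into two lines), but the indentation of each line needs to be quoted correctly: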

python -m timeit 'longstr=""' 'for x in range(1000):' ' longstr += str(x)'

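The output of `timeit` reports the best performance among multiple runs (five by default). Each run executes the provided statements a number of times, which is determined dynamically. The reported time is the average time to execute the statements once in the best run.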
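
The reporting scheme described above can be approximated from Python code. The following is a minimal sketch, not the exact logic of the command-line tool: it uses `Timer.autorange()` to choose the loop count, repeats the measurement five times, and reports the best run divided by the loop count. The statement being timed and the variable names are illustrative.

```python
import timeit

# Statement to measure (the join() example from above)
timer = timeit.Timer('"".join([str(x) for x in range(1000)])')

# Let timeit pick a loop count large enough for a stable measurement
number, _ = timer.autorange()

# Run five timed repetitions, each executing the statement `number` times
totals = timer.repeat(repeat=5, number=number)

# Report the average time per loop in the best (fastest) repetition
best_usec = min(totals) / number * 1e6
print(f"{number} loops, best of 5: {best_usec:.3g} usec per loop")
```

Taking the minimum rather than the mean is the usual choice here, since slower runs mostly reflect interference from other processes rather than the code itself.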

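While it is true that the `join()` method is faster than the `+` operator for string concatenation, the timing above is not a fair comparison. This is because we create the short strings with `str(x)` on the fly during the loop. A better way of doing this is the following: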

python -m timeit -s 'strings = [str(x) for x in range(1000)]' 'longstr=""' 'for x in strings:' ' longstr += str(x)'
python -m timeit -s 'strings = [str(x) for x in range(1000)]' '"".join(strings)'

This produces:

2000 loops, best of 5: 173 usec per loop
50000 loops, best of 5: 6.91 usec per loop

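The `-s` option allows us to provide the "setup" code, which is executed before the timing and is not itself timed. In the above, we create the list of short strings before starting the loop. Hence the time to create those strings is not measured in the "per loop" timing. The result above shows that the `join()` method is about 25 times faster than the `+` operator, which is more than one order of magnitude. The more common use of the `-s` option is to import libraries. For example, we can compare the square root function from Python's `math` module with the one from NumPy, and with the exponentiation operator `**`, as follows: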

python -m timeit '[x**0.5 for x in range(1000)]'
python -m timeit -s 'from math import sqrt' '[sqrt(x) for x in range(1000)]'
python -m timeit -s 'from numpy import sqrt' '[sqrt(x) for x in range(1000)]'

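The above commands produce the following measurements, in which we see that `math.sqrt()` is the fastest and `numpy.sqrt()` is the slowest in this particular example: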

5000 loops, best of 5: 93.2 usec per loop
5000 loops, best of 5: 72.3 usec per loop
200 loops, best of 5: 974 usec per loop

If you wonder why NumPy is the slowest, it is because NumPy is optimized for arrays. You will see its exceptional speed in the following alternative:

python -m timeit -s 'import numpy as np; x=np.array(range(1000))' 'np.sqrt(x)'

The result is:

100000 loops, best of 5: 2.08 usec per loop

If you prefer, you can also run `timeit` from Python code. For example, the following is similar to the above but gives you the total raw timing for each run:

import timeit
measurements = timeit.repeat('[x**0.5 for x in range(1000)]', number=10000)
print(measurements)

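In the above, each run executes the statement 10,000 times; the result is as follows. You can see that it works out to roughly 98 usec per loop in the best run: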

[1.0888952040000106, 0.9799715450000122, 1.0921516899999801, 1.0946189250000202, 1.2792069260000005]
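
To relate the raw totals to the per-loop figures printed by the command-line tool, divide the smallest total by the number of executions. The sketch below does this for the measurements listed above, then shows that `timeit.repeat()` also accepts a `setup` argument that plays the same role as the `-s` option. The values passed to `repeat` and `number` in the second half are arbitrary choices for illustration.

```python
import timeit

# Totals reported above: five runs of 10,000 executions each
measurements = [1.0888952040000106, 0.9799715450000122, 1.0921516899999801,
                1.0946189250000202, 1.2792069260000005]
number = 10000

# Best run, converted to microseconds per execution
best_usec = min(measurements) / number * 1e6
print(f"best of {len(measurements)}: {best_usec:.1f} usec per loop")
# best of 5: 98.0 usec per loop

# The setup argument mirrors -s: it runs once per repetition and is not timed
join_totals = timeit.repeat(
    '"".join(strings)',
    setup='strings = [str(x) for x in range(1000)]',
    repeat=5,
    number=1000,
)
print(f"join: {min(join_totals) / 1000 * 1e6:.2f} usec per loop")
```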

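The Profile Module

Focusing on the performance of a statement or two is a microscopic view. More likely, we have a long program and want to see what is making it slow. That has to happen before we can consider alternative statements or algorithms.

A program generally runs slowly for one of two reasons: a part of it is slow, or a part of it runs too many times and adds up to too much time. We call these "performance hogs" the hot spots. Let's look at an example. Consider the following program that uses a hill-climbing algorithm to find hyperparameters for a perceptron model: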

# manually search perceptron hyperparameters for binary classification
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
	# unpack config
	eta, alpha = cfg
	# define model
	model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
	# define evaluation procedure
	cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
	# evaluate model
	scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
	# calculate mean accuracy
	result = mean(scores)
	return result

# take a step in the search space
def step(cfg, step_size):
	# unpack the configuration
	eta, alpha = cfg
	# step eta
	new_eta = eta + randn() * step_size
	# check the bounds of eta
	if new_eta <= 0.0:
		new_eta = 1e-8
	if new_eta > 1.0:
		new_eta = 1.0
	# step alpha
	new_alpha = alpha + randn() * step_size
	# check the bounds of alpha
	if new_alpha < 0.0:
		new_alpha = 0.0
	# return the new configuration
	return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
	# starting point for the search
	solution = [rand(), rand()]
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = step(solution, step_size)
		# evaluate candidate point
		candidate_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidate_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidate_eval
			# report progress
			print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
	return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))

Assuming we saved this program in the file `hillclimb.py`, we can run the profiler at the command line as follows:

python -m cProfile hillclimb.py

The output will be the following:

>10, cfg=[0.3792455490265847, 0.21589566352848377] 0.78400
>17, cfg=[0.49105438202347707, 0.1342150084854657] 0.79833
>26, cfg=[0.5737524712834843, 0.016749795596210315] 0.80033
>47, cfg=[0.5067828976025809, 0.05280380038497864] 0.80133
>48, cfg=[0.5427345321546029, 0.0049895870979695875] 0.81167
Done!
cfg=[0.5427345321546029, 0.0049895870979695875]: Mean Accuracy: 0.811667
         2686451 function calls (2638255 primitive calls) in 5.500 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.001    0.000    4.892    0.048 hillclimb.py:11(objective)
        1    0.000    0.000    5.501    5.501 hillclimb.py:2(<module>)
      100    0.000    0.000    0.001    0.000 hillclimb.py:25(step)
        1    0.001    0.001    4.894    4.894 hillclimb.py:44(hillclimbing)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(<module>)
      303    0.000    0.000    0.008    0.000 <__array_function__ internals>:2(all)
      303    0.000    0.000    0.005    0.000 <__array_function__ internals>:2(amin)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(any)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(atleast_1d)
     3333    0.003    0.000    0.018    0.000 <__array_function__ internals>:2(bincount)
      103    0.000    0.000    0.001    0.000 <__array_function__ internals>:2(concatenate)
        3    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(copyto)
      606    0.001    0.000    0.010    0.000 <__array_function__ internals>:2(cumsum)
        6    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(dot)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(empty_like)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(inv)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(linspace)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(lstsq)
      101    0.000    0.000    0.005    0.000 <__array_function__ internals>:2(mean)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ndim)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(outer)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(polyfit)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(polyval)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(prod)
      303    0.000    0.000    0.002    0.000 <__array_function__ internals>:2(ravel)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(result_type)
      303    0.001    0.000    0.001    0.000 <__array_function__ internals>:2(shape)
      303    0.000    0.000    0.035    0.000 <__array_function__ internals>:2(sort)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(trim_zeros)
     1617    0.002    0.000    0.112    0.000 <__array_function__ internals>:2(unique)
...

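The normal output of the program is printed first, and then the profiler's statistics are printed. From the first row, we see that the function `objective()` in our program ran 101 times and took 4.89 seconds. But these 4.89 seconds (the `cumtime` column) were mostly spent in the functions it called; the function itself took only 0.001 seconds (the `tottime` column). Functions from dependent modules are also profiled. Hence you see a lot of NumPy functions, too.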
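
The difference between the two time columns can be seen on a toy program. In the minimal sketch below, `outer()` and `inner()` are hypothetical functions: `outer()` does almost nothing itself and delegates the work to `inner()`, so its `tottime` should be small while its `cumtime` includes the time spent in `inner()`. The sketch reads the numbers through `Stats.get_stats_profile()`, which assumes Python 3.9 or later.

```python
import cProfile
import pstats

def inner():
    # The actual work: a pure-Python loop
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def outer():
    # Does little itself; almost all the time is spent in inner()
    results = []
    for _ in range(5):
        results.append(inner())
    return results

prof = cProfile.Profile()
prof.enable()
outer()
prof.disable()

# Look up the per-function numbers by function name
profiles = pstats.Stats(prof).get_stats_profile().func_profiles
for name in ("outer", "inner"):
    fp = profiles[name]
    print(f"{name}: ncalls={fp.ncalls} tottime={fp.tottime:.3f} cumtime={fp.cumtime:.3f}")
```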

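The profiler output above is long and may not be useful to you, as it can be difficult to tell which function is the hot spot. Indeed, we can sort the output. For example, to see which function is called the greatest number of times, we can sort by `ncalls`: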

python -m cProfile -s ncalls hillclimb.py

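Its output is as follows. It says the `get()` method of Python dicts is the most-called function (but it consumed only 0.03 seconds of the 5.6 seconds the program took to finish):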

         2685349 function calls (2637153 primitive calls) in 5.609 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   247588    0.029    0.000    0.029    0.000 {method 'get' of 'dict' objects}
   246196    0.028    0.000    0.028    0.000 inspect.py:2548(name)
   168057    0.018    0.000    0.018    0.000 {method 'append' of 'list' objects}
   161738    0.018    0.000    0.018    0.000 inspect.py:2560(kind)
   144431    0.021    0.000    0.029    0.000 {built-in method builtins.isinstance}
   142213    0.030    0.000    0.031    0.000 {built-in method builtins.getattr}
...

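The other sort options are as follows:

Sort string	Meaning
calls	Call count
cumulative	Cumulative time
cumtime	Cumulative time
file	File name
filename	File name
module	File name
ncalls	Call count
pcalls	Primitive call count
line	Line number
name	Function name
nfl	Name/file/line
stdname	Standard name
time	Internal time
tottime	Internal time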
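
The same sort keys can be used from Python through `Stats.sort_stats()`, either as the strings in the table above or as members of the `pstats.SortKey` enum (available since Python 3.7). The sketch below profiles a hypothetical `busy()` function and prints the top rows ordered by call count; the output is captured in a `StringIO` buffer only to make it easy to inspect.

```python
import cProfile
import io
import pstats
from pstats import SortKey

def busy():
    # Hypothetical workload: many cheap calls to list.append
    items = []
    for i in range(10_000):
        items.append(i)
    return len(items)

prof = cProfile.Profile()
prof.enable()
busy()
prof.disable()

# Sort by call count, equivalent to "-s ncalls" on the command line
buffer = io.StringIO()
stats = pstats.Stats(prof, stream=buffer)
stats.sort_stats(SortKey.CALLS).print_stats(5)  # top 5 rows
report = buffer.getvalue()
print(report)
```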

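If the program takes some time to finish, it is not reasonable to run it many times just to see the profiling result in a different sort order. Indeed, we can save the profiler's statistics for further processing, as follows: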

python -m cProfile -o hillclimb.stats hillclimb.py

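Similar to the above, it will run the program. But this will not print the statistics to the screen; it saves them into a file instead. Afterward, we can use the `pstats` module as follows to open the statistics file, which gives us a prompt for manipulating the data: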

python -m pstats hillclimb.stats

For instance, we can use the `sort` command to change the sort order and use `stats` to print what we saw above:

Welcome to the profile statistics browser.
hillclimb.stat% help

Documented commands (type help <topic>):
========================================
EOF  add  callees  callers  help  quit  read  reverse  sort  stats  strip

hillclimb.stat% sort ncall
hillclimb.stat% stats hillclimb
Thu Jan 13 16:44:10 2022    hillclimb.stat

         2686227 function calls (2638031 primitive calls) in 5.582 seconds

   Ordered by: call count
   List reduced from 3456 to 4 due to restriction <'hillclimb'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.001    0.000    4.951    0.049 hillclimb.py:11(objective)
      100    0.000    0.000    0.001    0.000 hillclimb.py:25(step)
        1    0.000    0.000    5.583    5.583 hillclimb.py:2(<module>)
        1    0.000    0.000    4.952    4.952 hillclimb.py:44(hillclimbing)

hillclimb.stat%

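You will notice that the `stats` command above allows us to provide an extra argument. The argument can be a regular expression to search for functions, so that only those matched are printed. Hence it is a way to provide a search string for filtering.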
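
The save-then-inspect workflow can also be scripted without the interactive browser. The following sketch, built around hypothetical `workload()` and `helper()` functions and a temporary file name, saves the statistics with `Profile.dump_stats()`, loads them back with `pstats.Stats`, and passes a regular expression to `print_stats()` so that only matching functions are listed.

```python
import cProfile
import io
import os
import pstats
import tempfile

def helper(n):
    # Hypothetical helper doing a little arithmetic
    return sum(i * i for i in range(n))

def workload():
    # Hypothetical function standing in for a longer program
    results = []
    for _ in range(50):
        results.append(helper(1000))
    return results

prof = cProfile.Profile()
prof.enable()
workload()
prof.disable()

# Save the statistics to a file, as "-o" does on the command line
stats_path = os.path.join(tempfile.mkdtemp(), "workload.stats")
prof.dump_stats(stats_path)

# Load the file later and print only the rows matching the regex "helper"
buffer = io.StringIO()
stats = pstats.Stats(stats_path, stream=buffer)
stats.strip_dirs().sort_stats("cumtime").print_stats("helper")
report = buffer.getvalue()
print(report)
```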

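This `pstats` browser allows us to see more than just the table above. The `callers` and `callees` commands show us which function calls which function, how many times it is called, and how much time is spent. Hence we can consider them a breakdown of the function-level statistics. This is useful if you have a lot of functions that call each other and want to know how the time is spent in different scenarios. For example, this shows that the `objective()` function is called only by the `hillclimbing()` function, while the `hillclimbing()` function calls several other functions: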

hillclimb.stat% callers objective
   Ordered by: call count
   List reduced from 3456 to 1 due to restriction <'objective'>

Function                    was called by...
                                ncalls  tottime  cumtime
hillclimb.py:11(objective)  <-     101    0.001    4.951  hillclimb.py:44(hillclimbing)


hillclimb.stat% callees hillclimbing
   Ordered by: call count
   List reduced from 3456 to 1 due to restriction <'hillclimbing'>

Function                       called...
                                   ncalls  tottime  cumtime
hillclimb.py:44(hillclimbing)  ->     101    0.001    4.951  hillclimb.py:11(objective)
                                      100    0.000    0.001  hillclimb.py:25(step)
                                        4    0.000    0.000  {built-in method builtins.print}
                                        2    0.000    0.000  {method 'rand' of 'numpy.random.mtrand.RandomState' objects}


hillclimb.stat%
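
The `callers` and `callees` commands have direct counterparts on the `Stats` object: `print_callers()` and `print_callees()`, which accept the same kind of restriction. A minimal sketch, using hypothetical functions `parent()` and `child()`:

```python
import cProfile
import io
import pstats

def child():
    # Hypothetical leaf function
    return sum(range(1000))

def parent():
    # Hypothetical caller that invokes child() several times
    total = 0
    for _ in range(10):
        total += child()
    return total

prof = cProfile.Profile()
prof.enable()
parent()
prof.disable()

buffer = io.StringIO()
stats = pstats.Stats(prof, stream=buffer).strip_dirs()
stats.print_callers("child")   # who called child()?
stats.print_callees("parent")  # what did parent() call?
report = buffer.getvalue()
print(report)
```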

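Using the Profiler Inside Code

The above example assumes you have the complete program saved in a file and profile the entire program. Sometimes, we focus on only a part of the entire program. For example, if we load a large module, it takes time to bootstrap, and we want to exclude this from the profiler. In this case, we can invoke the profiler only for certain lines. An example is as follows, modified from the program above: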

# manually search perceptron hyperparameters for binary classification
import cProfile as profile
import pstats
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
	# unpack config
	eta, alpha = cfg
	# define model
	model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
	# define evaluation procedure
	cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
	# evaluate model
	scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
	# calculate mean accuracy
	result = mean(scores)
	return result

# take a step in the search space
def step(cfg, step_size):
	# unpack the configuration
	eta, alpha = cfg
	# step eta
	new_eta = eta + randn() * step_size
	# check the bounds of eta
	if new_eta <= 0.0:
		new_eta = 1e-8
	if new_eta > 1.0:
		new_eta = 1.0
	# step alpha
	new_alpha = alpha + randn() * step_size
	# check the bounds of alpha
	if new_alpha < 0.0:
		new_alpha = 0.0
	# return the new configuration
	return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
	# starting point for the search
	solution = [rand(), rand()]
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = step(solution, step_size)
		# evaluate candidate point
		candidate_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidate_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidate_eval
			# report progress
			print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
	return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search with profiling
prof = profile.Profile()
prof.enable()
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
prof.disable()
# print program output
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))
# print profiling output
stats = pstats.Stats(prof).strip_dirs().sort_stats("cumtime")
stats.print_stats(10) # top 10 rows

It will output the following:

>0, cfg=[0.3776271076534661, 0.2308364063203663] 0.75700
>3, cfg=[0.35803234662466354, 0.03204434939660264] 0.77567
>8, cfg=[0.3001050823005957, 0.0] 0.78633
>10, cfg=[0.39518618870158934, 0.0] 0.78633
>12, cfg=[0.4291267905390187, 0.0] 0.78633
>13, cfg=[0.4403131521968569, 0.0] 0.78633
>16, cfg=[0.38865272555918756, 0.0] 0.78633
>17, cfg=[0.38871654921891885, 0.0] 0.78633
>18, cfg=[0.4542440671724224, 0.0] 0.78633
>19, cfg=[0.44899743344802734, 0.0] 0.78633
>20, cfg=[0.5855375509507891, 0.0] 0.78633
>21, cfg=[0.5935318064858227, 0.0] 0.78633
>23, cfg=[0.7606367310048543, 0.0] 0.78633
>24, cfg=[0.855444293727846, 0.0] 0.78633
>25, cfg=[0.9505501566826242, 0.0] 0.78633
>26, cfg=[1.0, 0.0244821888204496] 0.79800
Done!
cfg=[1.0, 0.0244821888204496]: Mean Accuracy: 0.798000
         2179559 function calls (2140124 primitive calls) in 4.941 seconds

   Ordered by: cumulative time
   List reduced from 581 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    4.941    4.941 hillclimb.py:46(hillclimbing)
      101    0.001    0.000    4.939    0.049 hillclimb.py:13(objective)
      101    0.001    0.000    4.931    0.049 _validation.py:375(cross_val_score)
      101    0.002    0.000    4.930    0.049 _validation.py:48(cross_validate)
      101    0.005    0.000    4.903    0.049 parallel.py:960(__call__)
      101    0.235    0.002    3.089    0.031 parallel.py:920(retrieve)
     3030    0.004    0.000    2.849    0.001 _parallel_backends.py:537(wrap_future_result)
     3030    0.020    0.000    2.845    0.001 _base.py:417(result)
     2602    0.016    0.000    2.819    0.001 threading.py:280(wait)
    12447    2.796    0.000    2.796    0.000 {method 'acquire' of '_thread.lock' objects}
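
The `enable()` and `disable()` pair can also be written as a `with` block, since `cProfile.Profile` works as a context manager in Python 3.8 and later. The sketch below uses a hypothetical `search()` function in place of `hillclimbing()` so that it runs without scikit-learn; imports and setup outside the `with` block are not profiled.

```python
import cProfile
import io
import pstats
import random

def search(n_iter):
    # Hypothetical stand-in for hillclimbing(): keep the best random draw
    best = 0.0
    for _ in range(n_iter):
        best = max(best, random.random())
    return best

random.seed(1)  # setup outside the with block is not profiled

# Only the code inside the with block is profiled
with cProfile.Profile() as prof:
    result = search(10_000)

buffer = io.StringIO()
stats = pstats.Stats(prof, stream=buffer).strip_dirs().sort_stats("cumtime")
stats.print_stats(10)  # top 10 rows
report = buffer.getvalue()
print(report)
```

The `with` form has the practical advantage that profiling is switched off even if the profiled code raises an exception.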

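Caveats

Using the profiler with TensorFlow models may not produce what you would expect, especially if you have written your own custom layer or custom function for the model. If you did it correctly, TensorFlow is supposed to build the computation graph before your model is executed, and hence the logic will be changed. The profiler output will therefore not show your custom classes.

The same goes for some advanced modules that involve binary code. The profiler can see that you called some functions and marks them as "built-in" methods, but it cannot go any further into the compiled code.

Below is a short program for the LeNet5 model for the MNIST classification problem. If you try to profile it and print the top 15 rows, you will see that a wrapper occupies the majority of the time, and nothing can be shown beyond that: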

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, AveragePooling2D, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

# Load and reshape data to shape of (n_sample, height, width, n_channel)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = np.expand_dims(X_train, axis=3).astype('float32')
X_test = np.expand_dims(X_test, axis=3).astype('float32')

# One-hot encode the output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# LeNet5 model
model = Sequential([
    Conv2D(6, (5,5), input_shape=(28,28,1), padding="same", activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(16, (5,5), activation="tanh"),
    AveragePooling2D((2,2), strides=2),
    Conv2D(120, (5,5), activation="tanh"),
    Flatten(),
    Dense(84, activation="tanh"),
    Dense(10, activation="softmax")
])
model.summary(line_length=100)

# Training
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
earlystopping = EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=32, callbacks=[earlystopping])

# Evaluate
print(model.evaluate(X_test, y_test, verbose=0))

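In the result below, `TFE_Py_Execute` is marked as a "built-in" method, and it consumes 30.1 seconds out of the total run time of 39.6 seconds. Note that its `tottime` is the same as its `cumtime`, meaning that from the profiler's perspective, all the time appears to be spent in this function, and it does not call any other functions. This illustrates the limitation of Python's profiler.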

         5962698 function calls (5728324 primitive calls) in 39.674 seconds

   Ordered by: cumulative time
   List reduced from 12295 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   3212/1    0.013    0.000   39.699   39.699 {built-in method builtins.exec}
        1    0.003    0.003   39.699   39.699 mnist.py:4(<module>)
     52/4    0.005    0.000   35.470    8.868 /usr/local/lib/python3.9/site-packages/keras/utils/traceback_utils.py:58(error_handler)
        1    0.089    0.089   34.334   34.334 /usr/local/lib/python3.9/site-packages/keras/engine/training.py:901(fit)
11075/9531    0.032    0.000   33.406    0.004 /usr/local/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py:138(error_handler)
     4689    0.089    0.000   33.017    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py:882(__call__)
     4689    0.023    0.000   32.771    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py:929(_call)
     4688    0.042    0.000   32.134    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:3125(__call__)
     4689    0.075    0.000   30.941    0.007 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:1888(_call_flat)
     4689    0.158    0.000   30.472    0.006 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/function.py:553(call)
     4689    0.034    0.000   30.152    0.006 /usr/local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:33(quick_execute)
     4689   30.105    0.006   30.105    0.006 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
  3185/24    0.021    0.000    3.902    0.163 <frozen importlib._bootstrap>:1002(_find_and_load)
  3169/10    0.014    0.000    3.901    0.390 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
  2885/12    0.009    0.000    3.901    0.325 <frozen importlib._bootstrap_external>:844(exec_module)
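
The same effect can be reproduced on a small scale with any function implemented in C. In the sketch below, the hypothetical `work()` function does all its arithmetic inside the built-in `sum()`; the profiler reports that call as a single "built-in method" row and cannot show anything beneath it.

```python
import cProfile
import io
import pstats

def work():
    # All the arithmetic happens inside the C implementation of sum()
    return sum(range(5_000_000))

prof = cProfile.Profile()
prof.enable()
work()
prof.disable()

buffer = io.StringIO()
stats = pstats.Stats(prof, stream=buffer).strip_dirs().sort_stats("cumtime")
stats.print_stats(5)  # top 5 rows
report = buffer.getvalue()
print(report)
```

In the printed table, the `sum` row is expected to show the same value for `tottime` and `cumtime`, just as `TFE_Py_Execute` does above.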

Finally, Python's profiler gives you only timing statistics, not memory usage. You may need to look for another library or tool for that purpose.
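
One option that ships with Python is the `tracemalloc` module, which tracks memory allocated by Python code. It is a different tool from the profilers covered in this tutorial; the sketch below only illustrates the idea, and the `build_table()` function is hypothetical.

```python
import tracemalloc

def build_table(n):
    # Hypothetical function that allocates a noticeable amount of memory
    return [str(i) * 10 for i in range(n)]

tracemalloc.start()
table = build_table(100_000)
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```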

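Further Reading

The standard library modules `timeit`, `cProfile`, and `pstats` are documented in the Python documentation:

`timeit` module: https://docs.pythonlang.cn/3/library/timeit.html
`cProfile` module and `pstats` module: https://docs.pythonlang.cn/3/library/profile.html

The standard library's profiler is very powerful, but it is not the only one. If you want something more visual, you can try the Python Call Graph module. It can produce a picture of how functions call each other using the GraphViz tool:

Python Call Graph: https://pycallgraph.readthedocs.io/en/master/

The limitation of not being able to dig into compiled code can be addressed by not using Python's profiler but using one for compiled programs instead. My favorite is Valgrind:

Valgrind: https://valgrind.org/

But to use it, you may need to recompile your Python interpreter to turn on debugging support.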

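Summary

In this tutorial, we learned what a profiler is and what it can do. Specifically:

We know how to compare small code fragments with the `timeit` module
We saw that Python's `cProfile` module can give us detailed statistics on how time is spent
We learned to use the `pstats` module to sort or filter the output of `cProfile`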
