使用 Diffusers 进行图像修复 (Inpainting) 和扩展 (Outpainting)

作者： Kanwal Mehreen 于 2024年7月18日分类于 Stable Diffusion 0

图像修复（inpainting）和图像外绘（outpainting）是流行的图像编辑技术。您已经了解了如何使用 WebUI 执行图像修复和外绘。您也可以通过代码来实现相同的功能。

在本帖中，您将看到如何使用 Hugging Face 的 diffusers 库运行 Stable Diffusion 管道来执行图像修复和外绘。

完成本教程后，你将学到：

如何使用 diffusers 的相应管道执行图像修复
如何将图像外绘问题理解为一种特殊的图像修复问题

开始您的项目，阅读我的书《掌握 Stable Diffusion 数字艺术》。它提供了包含可用代码的自学教程。

让我们开始吧。

使用 Diffusers 进行图像修复 (Inpainting) 和扩展 (Outpainting)
照片来源：Anna Kolosyuk。部分权利保留。

概述

本教程分为两部分；它们是：

使用 Diffusers 库进行图像修复
使用 Diffusers 库进行图像外绘

使用 Diffusers 库进行图像修复

我们在上一篇帖子中讨论了图像修复的概念，并展示了如何使用 WebUI 进行图像修复。在本节中，您将看到如何使用 Python 代码实现相同的功能。

在本文中，您将使用 Google Colab，因为这样您就不需要拥有 GPU。如果您决定在本地运行代码，可能需要进行一些小的修改。例如，您可以直接调用 cv2.imshow() 函数，而不是使用 Google 的补丁版本 cv2_imshow() 函数。

图像修复需要您对需要重建的图像区域进行蒙版处理，并使用一个能够填充缺失像素的强大模型。您将利用

Meta AI 的 SAM（Segment Anything Model）——一个非常强大的图像分割模型，您将利用它为输入图像生成蒙版。
Hugging Face 库中的 StableDiffusionInpaintPipeline，用于进行文本引导的 Stable Diffusion 图像修复。

首先，您应该在 Google Colab 上创建一个 Notebook，并设置为使用 T4 GPU。在 Notebook 的开头，您应该安装所有依赖项，并加载 SAM 的检查点 ViT-B（URL：https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth）。

以下代码应首先执行：

import numpy as np
import torch
import cv2
from PIL import Image
from google.colab.patches import cv2_imshow

!pip install 'git+https://github.com/facebookresearch/segment-anything.git'
from segment_anything import sam_model_registry, SamPredictor

!pip install diffusers accelerate
from diffusers import StableDiffusionInpaintPipeline

!wget -q -nc https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
CHECKPOINT_PATH='/content/sam_vit_b_01ec64.pth'

DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
MODEL_TYPE = "vit_b"

import numpy as np

import torch

import cv2

from PIL import Image

from google.colab.patches import cv2_imshow

!pip install 'git+https://github.com/facebookresearch/segment-anything.git'

from segment_anything import sam_model_registry, SamPredictor

!pip install diffusers accelerate

from diffusers import StableDiffusionInpaintPipeline

!wget -q -nc https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth

CHECKPOINT_PATH='/content/sam_vit_b_01ec64.pth'

DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

MODEL_TYPE = "vit_b"

接下来，您可以将一张图片上传到 Colab 以进行重建。您可以通过单击左侧工具栏上的“文件”图标，然后从本地计算机上传文件来方便地完成此操作。

Google Colab 左侧面板允许您上传文件。

您上传的文件位于 /content/ 目录下。您可以通过提供完整路径来加载图像，并将其转换为 RGB 格式。

# Give the path of your image
IMAGE_PATH = '/content/Dog.png'
# Read the image from the path
image = cv2.imread(IMAGE_PATH)
cv2_imshow(image)
# Convert to RGB format
image_rgb = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

# 提供图像路径

IMAGE_PATH = '/content/Dog.png'

# 从路径读取图像

image = cv2.imread(IMAGE_PATH)

cv2_imshow(image)

# 转换为 RGB 格式

image_rgb = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

这是开始的示例图像。

用于图像修复的示例图片。

现在加载您之前下载的检查点处的 SAM 模型。在这里，您将使用 SamPredictor 类来分割图像。您提供要蒙版处理的对象图像坐标，模型将自动分割图像。

sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH)
sam.to(device=DEVICE)
mask_predictor = SamPredictor(sam)
mask_predictor.set_image(image_rgb)

# Provide points as input prompt [X, Y]-coordinates
input_point = np.array([[250, 250]])
input_label = np.array([1])

# Predicting Segmentation mask
masks, scores, logits = mask_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=False,
)

sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH)

sam.to(device=DEVICE)

mask_predictor = SamPredictor(sam)

mask_predictor.set_image(image_rgb)

# 提供点作为输入提示 [X, Y] 坐标

input_point = np.array([[250, 250]])

input_label = np.array([1])

# 预测分割蒙版

masks, scores, logits = mask_predictor.predict(

point_coords=input_point,

point_labels=input_label,

multimask_output=False,

)

选择的对象是像素位于（250,250）处的那个，即图像中心。数组 mask 是一个布尔数组（用于二值图像），我们将将其转换为像素值，将形状从 (1,512,512) 更改为 (512,512,1)，并将其转换为黑白版本。

mask = masks.astype(float) * 255
mask = np.transpose(mask, (1, 2, 0))
_ , bw_image = cv2.threshold(mask, 100, 255, cv2.THRESH_BINARY)
cv2_imshow(bw_image)
cv2.imwrite('mask.png', bw_image)
del sam, mask_predictor   # delete models to conserve GPU memory

mask = masks.astype(float) * 255

mask = np.transpose(mask, (1, 2, 0))

_ , bw_image = cv2.threshold(mask, 100, 255, cv2.THRESH_BINARY)

cv2_imshow(bw_image)

cv2.imwrite('mask.png', bw_image)

del sam, mask_predictor # 删除模型以节省 GPU 内存

创建的蒙版如下：

SAM 为图像修复创建的蒙版。白色像素将被更改，黑色像素将保留。

SAM 已完成其工作，帮助我们生成了蒙版，现在我们可以使用 Stable Diffusion 进行图像修复了。

使用 Hugging Face 存储库中的 Stable Diffusion 模型创建管道。

# Load images using PIL
init_image = Image.open(IMAGE_PATH)
mask_image = Image.open('mask.png')

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
)
pipe = pipe.to(DEVICE)

# 使用 PIL 加载图像

init_image = Image.open(IMAGE_PATH)

mask_image = Image.open('mask.png')

pipe = StableDiffusionInpaintPipeline.from_pretrained(

"runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16

)

pipe = pipe.to(DEVICE)

在上面，您使用了 StableDiffusionInpaintPipeline，它仅适用于 Stable Diffusion 1.x 图像修复模型。如果您不确定您的模型属于哪种类型，也可以尝试使用 AutoPipelineForInpainting，看看是否能自动识别正确的架构。

现在提供一个用于重建的提示，然后等待奇迹发生！

prompt = "a grey cat sitting on a bench, high resolution"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save('output.png')

prompt = "a grey cat sitting on a bench, high resolution"

image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

image.save('output.png')

这张图片也会在 Colab 的 /content 目录下创建。您现在可以像之前一样显示图片。

image = cv2.imread('/content/output.png')
cv2_imshow(image)

1 2	image = cv2.imread('/content/output.png') cv2_imshow(image)

您可能会看到这样的结果：

图像修复结果。

恭喜您完成了这个快速教程！现在，真正的乐趣才刚刚开始。本次简短教程到此结束。请注意，在示例图像中，只有一个主要的物体（狗），但如果存在多个物体，或者您想尝试不同的蒙版技术，可以尝试探索 SamAutomaticMaskGenerator，或者使用相同的 SamPredictor，但使用边界框来处理不同的物体。

使用 Diffusers 库进行图像外绘

与图像修复不同，diffusers 库没有专门用于图像外绘（outpainting）的管道。但实际上，图像外绘只是对蒙版和图像进行一些修改。让我们看看如何做到这一点。

与之前一样，您需要相同的先决条件，例如设置带 GPU 的 Notebook 并安装 diffusers 库。但是，您不会使用 SAM 作为分割模型来创建图像内部物体的蒙版，而是应该创建蒙版来突出图像外部边框的像素。

# Give the path of your image
IMAGE_PATH = '/content/Dog.png'
# Read the image from the path
image = cv2.imread(IMAGE_PATH)
height, width = image.shape[:2]
padding = 100 # num pixels to outpaint
mask = np.ones((height+2*padding, width+2*padding), dtype=np.uint8) * 255
mask[padding:-padding, padding:-padding] = 0
cv2_imshow(mask)
cv2.imwrite("mask.png", mask)

# 提供图像路径

IMAGE_PATH = '/content/Dog.png'

# 从路径读取图像

image = cv2.imread(IMAGE_PATH)

height, width = image.shape[:2]

padding = 100 # 外绘像素数

mask = np.ones((height+2*padding, width+2*padding), dtype=np.uint8) * 255

mask[padding:-padding, padding:-padding] = 0

cv2_imshow(mask)

cv2.imwrite("mask.png", mask)

上面的代码检查原始图像的大小（并将其保存到变量 height 和 width 中）。然后创建一个具有 100 像素边框的外绘蒙版，使其成为一个整数值为 255 的数组，匹配外绘图像的尺寸，然后将中心（不包括边框）设置为零值。请记住，蒙版中的零值表示像素不会改变。

接下来，您可以创建一个“扩展图像”来匹配外绘图像的尺寸。与创建的蒙版一起，您将一个外绘问题转换成一个图像修复问题，其中蒙版位于边框处。

您可以通过 numpy 轻松地用灰色填充原始边框外部的像素。

# extend the original image
image_extended = np.pad(image, ((padding, padding), (padding, padding), (0, 0)), mode='constant', constant_values=128)
cv2_imshow(image_extended)
cv2.imwrite("image_extended.png", image_extended)

# 扩展原始图像

image_extended = np.pad(image, ((padding, padding), (padding, padding), (0, 0)), mode='constant', constant_values=128)

cv2_imshow(image_extended)

cv2.imwrite("image_extended.png", image_extended)

扩展后的图像看起来是这样的：

用于外绘的扩展图像。

现在您可以像上一节一样运行图像修复。

# Load images using PIL
init_image = Image.open('image_extended.png')
mask_image = Image.open('mask.png')

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

inpaint_image = pipe(prompt="a dog on a bench in a park", image=init_image, mask_image=mask_image).images[0]
inpaint_image.save('output.png')

# 使用 PIL 加载图像

init_image = Image.open('image_extended.png')

mask_image = Image.open('mask.png')

pipe = StableDiffusionInpaintPipeline.from_pretrained(

"runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16

)

pipe = pipe.to("cuda")

inpaint_image = pipe(prompt="a dog on a bench in a park", image=init_image, mask_image=mask_image).images[0]

inpaint_image.save('output.png')

您可以按以下方式检查输出：

image = cv2.imread('/content/output.png')
cv2_imshow(image)

1 2	image = cv2.imread('/content/output.png') cv2_imshow(image)

结果如下：

外绘结果。请注意，两侧添加了树木。

您可能想知道为什么在外绘中仍然需要提供提示。这是管道 API 所必需的，但您可以提供一个空字符串作为提示。但描述原始图片确实是必要的。您可以尝试使用不同的提示，例如“a framed picture of a dog on a bench”，然后观察结果。

进一步阅读

如果您想深入了解此主题，本节提供了更多资源。

总结

在本帖中，您学习了使用 diffusers 库通过 Stable Diffusion 进行图像修复和外绘的构建块。特别是，您学习了如何使用 StablediffusionInpaintPipeline 和 SAM 进行图像分割和蒙版创建以进行图像修复。您还学习了如何将图像外绘问题转换为图像修复问题，以便在 Python 代码中执行相同操作。

立即开始用 Stable Diffusion 精通数字艺术！

Mastering Digital Art with Stable Diffusion

学习如何让 Stable Diffusion 为您服务

……通过学习图像生成过程中的一些关键要素

在我的新电子书中探索如何实现
使用 Stable Diffusion 精通数字艺术

这本书提供了自学教程，包含所有可用 Python 代码，指导您从新手成长为图像生成专家。它教您如何设置 Stable Diffusion、微调模型、自动化工作流程、调整关键参数等等……所有这些都将帮助您创作出令人惊叹的数字艺术。

通过实践练习，开启您的数字艺术之旅

查看内容

导航

使用 Diffusers 进行图像修复 (Inpainting) 和扩展 (Outpainting)

概述

使用 Diffusers 库进行图像修复

使用 Diffusers 库进行图像外绘

进一步阅读

总结

立即开始用 Stable Diffusion 精通数字艺术！

学习如何让 Stable Diffusion 为您服务

通过实践练习，开启您的数字艺术之旅

关于此主题的更多信息

暂无评论。

留下回复点击此处取消回复。

导航

概述

使用 Diffusers 库进行图像修复

使用 Diffusers 库进行图像外绘

进一步阅读

总结

立即开始用 Stable Diffusion 精通数字艺术！

学习如何让 Stable Diffusion 为您服务

通过实践练习，开启您的数字艺术之旅

关于此主题的更多信息

暂无评论。

留下回复 点击此处取消回复。

留下回复点击此处取消回复。