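ControlNet

ControlNet is a neural network architecture designed to improve the precision and control of image generation by conditioning on both text and image prompts. It lets you influence image composition, adjust specific elements, and ensure spatial consistency. ControlNet can be used for a variety of creative and precise image generation tasks, such as defining a specific pose for a human figure or replicating the composition or layout of one image in a new image.

This document demonstrates how to use ControlNet and Stable Diffusion XL to create an image generation application that meets specific user requirements.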

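You can call the ControlNet inference API with parameters that specify the desired image characteristics. For example, send the following query to generate a new scene that replicates the pose of the provided reference image: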

```
{
   "prompt": "A young man walking in a park, wearing jeans.",
   "negative_prompt": "ugly, disfigured, ill-structured, low resolution",
   "controlnet_conditioning_scale": 0.5,
   "num_inference_steps": 25,
   "image": "example-image.png"
}
```

Input reference image:

Reference image showing a person in a specific pose that will be used as input for the ControlNet model

This is the generated output image, which replicates the pose in a new environment:

Generated output image showing a person in the same pose as the reference image but in a park setting with different clothing as specified in the prompt

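This example is ready for quick deployment and scaling on BentoCloud. With a single command, you get a production-grade application with fast autoscaling, secure deployment in the cloud, and comprehensive observability.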

Screenshot of ControlNet application deployed on BentoCloud showing the image generation interface with prompt inputs and controls

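Code explanations

You can find the source code in GitHub. Below is a breakdown of the key code implementations in this project.

Set the model IDs used by the ControlNet and SDXL pipeline. You can switch to any other diffusion model as needed.
- diffusers/controlnet-canny-sdxl-1.0: Provides additional control during image generation. It allows precise modifications based on text and image inputs, so that the generated images align more closely with specific user requirements (for example, replicating certain compositions).
- madebyollin/sdxl-vae-fp16-fix: This Variational Autoencoder (VAE) is responsible for encoding and decoding images within the pipeline.
- stabilityai/stable-diffusion-xl-base-1.0: Takes the text prompt and image input, processes them through the two integrated models above, and generates an image that reflects the given prompt.

service.py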
```
CONTROLNET_MODEL_ID = "diffusers/controlnet-canny-sdxl-1.0"
VAE_MODEL_ID = "madebyollin/sdxl-vae-fp16-fix"
BASE_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
```
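
As a rough illustration of how these three model IDs might fit together, the sketch below wires them into a `StableDiffusionXLControlNetPipeline` from the `diffusers` library. The `build_pipeline` helper, the `float16` dtype, and the `cuda` default are assumptions made for this example and are not taken from the project's source. Actually calling the function requires `torch`, `diffusers`, and a GPU.

```python
CONTROLNET_MODEL_ID = "diffusers/controlnet-canny-sdxl-1.0"
VAE_MODEL_ID = "madebyollin/sdxl-vae-fp16-fix"
BASE_MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"


def build_pipeline(device: str = "cuda"):
    """Hypothetical helper: assemble an SDXL + ControlNet pipeline from the three IDs."""
    # Imported lazily so that this module can be imported without the GPU stack installed.
    import torch
    from diffusers import (
        AutoencoderKL,
        ControlNetModel,
        StableDiffusionXLControlNetPipeline,
    )

    dtype = torch.float16  # assumption: half precision to reduce GPU memory use

    # The ControlNet and the VAE are loaded separately ...
    controlnet = ControlNetModel.from_pretrained(CONTROLNET_MODEL_ID, torch_dtype=dtype)
    vae = AutoencoderKL.from_pretrained(VAE_MODEL_ID, torch_dtype=dtype)

    # ... and then plugged into the base SDXL pipeline.
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        BASE_MODEL_ID,
        controlnet=controlnet,
        vae=vae,
        torch_dtype=dtype,
    )
    return pipe.to(device)
```

Keeping the heavy imports inside the function is only a convenience for this sketch; a real Service would normally load the models once during initialization.
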
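Use the @bentoml.service decorator to define a BentoML Service, where you can customize how the model will be served. The decorator lets you set configurations such as the timeout and the GPU resources to use on BentoCloud. Note that these models require at least an NVIDIA L4 GPU for optimal performance.

service.py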
```
@bentoml.service(
    traffic={"timeout": 600},
    resources={
        "gpu": 1,
        "gpu_type": "nvidia-l4",
    },
)
class ControlNet:
    controlnet_path = bentoml.models.HuggingFaceModel(CONTROLNET_MODEL_ID)
    vae_path = bentoml.models.HuggingFaceModel(VAE_MODEL_ID)
    base_path = bentoml.models.HuggingFaceModel(BASE_MODEL_ID)
    ...
```
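Within the class, load the models from Hugging Face and define them as class variables. The HuggingFaceModel class provides an efficient mechanism for loading AI models, which accelerates model deployment on BentoCloud by reducing image build time and cold start time.

The @bentoml.service decorator also lets you define the runtime environment for a Bento, the unified distribution format in BentoML. A Bento packages all the source code, Python dependencies, model references, and environment setup, making it easy to deploy consistently across different environments.

Here is an example:

service.py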

```
my_image = bentoml.images.Image(python_version="3.11", distro="debian") \
            .system_packages("ffmpeg") \
            .requirements_file("requirements.txt")

@bentoml.service(
    image=my_image,  # Apply the specifications
    ...
)
class ControlNet:
    ...
```

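Use the @bentoml.api decorator to define an API endpoint named generate. It takes an image and a set of parameters as input, and returns the generated image by calling the pipeline with the image and text prompts.

service.py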

```
import typing as t

import bentoml
from PIL.Image import Image as PIL_Image

class ControlNet:
    ...

    def __init__(self) -> None:
        import torch
        from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, AutoencoderKL
        # Logic to initialize models here
        ...

    @bentoml.api
    def generate(
        self,
        image: PIL_Image,
        prompt: str,
        negative_prompt: t.Optional[str] = None,
        controlnet_conditioning_scale: t.Optional[float] = 1.0,
        num_inference_steps: t.Optional[int] = 50,
        guidance_scale: t.Optional[float] = 5.0,
    ) -> PIL_Image:
        ...
        # The pipeline output converts to a tuple whose first item is the
        # list of generated images; return the first image in that list.
        return self.pipe(
            prompt,
            image=image,
            negative_prompt=negative_prompt,
            controlnet_conditioning_scale=controlnet_conditioning_scale,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
        ).to_tuple()[0][0]
```
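
To make the defaults in the generate signature easier to see from the caller's side, here is a minimal client-side sketch. `GenerateParams` is a hypothetical helper, not part of the project, and its validation rules are assumptions about sensible values rather than checks the Service is known to perform. It mirrors the default values shown above and builds a plain dictionary of keyword arguments.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerateParams:
    """Hypothetical container for the non-image parameters of `generate`."""

    prompt: str
    negative_prompt: Optional[str] = None
    controlnet_conditioning_scale: float = 1.0  # same defaults as the endpoint signature
    num_inference_steps: int = 50
    guidance_scale: float = 5.0

    def __post_init__(self) -> None:
        # Assumed sanity checks; the Service itself may accept other values.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.num_inference_steps < 1:
            raise ValueError("num_inference_steps must be at least 1")
        if self.controlnet_conditioning_scale < 0:
            raise ValueError("controlnet_conditioning_scale must be non-negative")


# Only the fields that differ from the defaults need to be set.
params = GenerateParams(
    prompt="A young man walking in a park, wearing jeans.",
    controlnet_conditioning_scale=0.5,
    num_inference_steps=25,
)
payload = asdict(params)  # plain dict, usable as keyword arguments
```

With a BentoML client, such a payload could then be passed along with the image, for example `client.generate(image=Path("./example-image.png"), **payload)`.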

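Try it out

You can run this example project on BentoCloud, or serve it locally, containerize it as an OCI-compliant image, and deploy it anywhere.

BentoCloud

BentoCloud provides fast and scalable infrastructure for building and scaling AI applications with BentoML in the cloud.

Install BentoML and log in to BentoCloud through the BentoML CLI. If you don't have a BentoCloud account, sign up for free.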
```
pip install bentoml
bentoml cloud login
```

Clone the BentoDiffusion repository and deploy the project.

```
git clone https://github.com/bentoml/BentoDiffusion.git
cd BentoDiffusion/controlnet
bentoml deploy
```

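Once it is up and running on BentoCloud, you can call the endpoint in the following ways:

BentoCloud Playground

Python client

Create a BentoML client to call the endpoint. Make sure you replace the Deployment URL with your own on BentoCloud. See Obtain the endpoint URL for details.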

```
import bentoml
from pathlib import Path

# Define the path to save the generated image
output_path = Path("generated_image.png")

# Replace the URL below with your own Deployment URL
with bentoml.SyncHTTPClient("https://controlnet-new-testt-e3c1c7db.mt-guc1.bentoml.ai") as client:
    result = client.generate(
        controlnet_conditioning_scale=0.5,
        guidance_scale=5,
        image=Path("./example-image.png"),
        negative_prompt="ugly, disfigured, ill-structured, low resolution",
        num_inference_steps=25,
        prompt="A young man walking in a park, wearing jeans.",
    )

# The result should be a PIL.Image object
result.save(output_path)

print(f"Image saved at {output_path}")
```

CURL

Make sure you replace the Deployment URL with your own on BentoCloud. See Obtain the endpoint URL for details.

```
curl -s -X POST \
      'https://controlnet-new-testt-e3c1c7db.mt-guc1.bentoml.ai/generate' \
      -F controlnet_conditioning_scale='0.5' \
      -F guidance_scale='5' \
      -F negative_prompt='"ugly, disfigured, ill-structured, low resolution"' \
      -F num_inference_steps='25' \
      -F prompt='"A young man walking in a park, wearing jeans."' \
      -F 'image=@example-image.png' \
      -o output.jpg
```

To make sure the Deployment scales automatically within a certain replica range, add the scaling flags:
```
bentoml deploy --scaling-min 0 --scaling-max 3 # Set your desired count
```
If it's already deployed, update its allowed replicas as follows:
```
bentoml deployment update <deployment-name> --scaling-min 0 --scaling-max 3 # Set your desired count
```
For more information, see how to configure concurrency and autoscaling.
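
To illustrate what a replica range means in practice, the following sketch clamps a desired replica count between a minimum and a maximum. It is not BentoCloud's scaling algorithm: the `desired_replicas` function, the `target_concurrency` parameter, and the ceiling formula are all hypothetical and only show how bounds such as `--scaling-min 0 --scaling-max 3` constrain the result.

```python
import math


def desired_replicas(
    in_flight_requests: int,
    target_concurrency: int,
    scaling_min: int,
    scaling_max: int,
) -> int:
    """Illustrative only: pick a replica count and clamp it to [scaling_min, scaling_max]."""
    if target_concurrency < 1:
        raise ValueError("target_concurrency must be at least 1")
    if not 0 <= scaling_min <= scaling_max:
        raise ValueError("expected 0 <= scaling_min <= scaling_max")

    # Assumed policy: one replica per `target_concurrency` in-flight requests.
    wanted = math.ceil(in_flight_requests / target_concurrency)

    # Whatever the policy suggests, the result never leaves the allowed range.
    return max(scaling_min, min(scaling_max, wanted))
```

With a minimum of 0, an idle Deployment in this sketch drops to zero replicas, while a burst of traffic is capped at the maximum of 3.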

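Local serving

BentoML allows you to run and test your code locally, so that you can quickly validate it with local compute resources.

Clone the repository and choose your desired project.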

```
git clone https://github.com/bentoml/BentoDiffusion.git
cd BentoDiffusion/controlnet

# Python 3.11 is recommended
pip install -r requirements.txt
```

Serve it locally.
```
bentoml serve
```
Note

To run this project locally, you need an NVIDIA GPU with at least 12 GB of VRAM.

Visit or send API requests to http://localhost:3000.
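
Because loading the models can take a while, it may help to wait until the local server is ready before sending requests. The sketch below polls a readiness URL using only the Python standard library. It assumes the server exposes BentoML's `/readyz` endpoint on port 3000; the `wait_until_ready` helper and its timeout values are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `<base_url>/readyz` until it answers 200 or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/readyz"  # assumed readiness endpoint
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not up yet, or not ready (non-2xx responses raise HTTPError).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For a server started with `bentoml serve`, this could be called as `wait_until_ready("http://localhost:3000", timeout=300)` before sending the first request.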

For custom deployment in your own infrastructure, use BentoML to generate an OCI-compliant image.