Run distributed Services

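BentoML provides a flexible framework for deploying machine learning models as Services. A single Service is often enough for most use cases, but in more complex scenarios it is useful to run multiple Services in a distributed way.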

This document provides guidance on creating and deploying a BentoML project with distributed Services.

Single and distributed Services

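Using a single BentoML Service in service.py is typically sufficient for most use cases. This approach is straightforward and easy to manage, and it works well when you only need to deploy a single model and the API logic is simple.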

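In deployment, a BentoML Service runs as multiple processes in one container. If you define multiple Services, they run as processes in separate containers. This distributed approach is useful for more complex scenarios, such as: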

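Pipelining CPU and GPU processing for better throughput: Splitting tasks between CPU and GPU can improve throughput. Some preprocessing or postprocessing tasks are handled more efficiently on the CPU, leaving the GPU free to focus on model inference.
Optimizing resource utilization and scalability: Distributed Services can run on different instances, so each one scales independently and uses resources efficiently. This flexibility matters when handling varying loads and optimizing for specific resource requirements.
Asymmetrical GPU requirements: Different models might have different GPU requirements. Distributing these models across separate Services helps you allocate resources more efficiently and reduce costs.
Handling complex workflows: For applications with complex workflows, such as sequential processing, parallel processing, or the composition of multiple models, you can create multiple Services to modularize these processes as needed, which improves maintainability and efficiency.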
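As a rough illustration of the first scenario (all stage timings here are hypothetical): if each request spends p ms in CPU preprocessing and g ms in GPU inference, running both stages back to back in one worker yields at most 1000 / (p + g) requests per second. When the stages run as separate Services that overlap on different requests, steady-state throughput is instead bounded by the slowest stage, 1000 / max(p, g). The sketch ignores network and serialization overhead between Services.

```python
def sequential_throughput(stage_ms: list[float]) -> float:
    """Requests/second when one worker runs every stage back to back."""
    return 1000.0 / sum(stage_ms)


def pipelined_throughput(stage_ms: list[float]) -> float:
    """Steady-state requests/second when each stage runs concurrently in its
    own Service: the slowest stage is the bottleneck.
    Ignores inter-Service network and serialization overhead."""
    return 1000.0 / max(stage_ms)


# Hypothetical timings: 20 ms CPU preprocessing, 30 ms GPU inference
stages = [20.0, 30.0]
print(f"sequential: {sequential_throughput(stages):.1f} req/s")  # 20.0
print(f"pipelined:  {pipelined_throughput(stages):.1f} req/s")   # 33.3
```

In practice, the gain shrinks as the cost of passing data between Services grows, so this is an upper bound rather than a prediction.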

Interservice communication

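Distributed Services support complex, modular architectures through interservice communication. Services interact with each other through the bentoml.depends() function, which lets one Service call another Service's methods directly, as if they were local class functions. Key features of interservice communication: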

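Automatic service discovery and routing: When Services are deployed, BentoML handles Service discovery, routes requests appropriately, and manages payload serialization and deserialization.
Arbitrary dependency chains: Services can form dependency chains of any length, enabling complex Service orchestration.
Diamond-shaped dependencies: BentoML supports diamond dependency patterns, where multiple Services depend on a single downstream Service, to maximize Service reuse.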
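To picture these dependency shapes, the following sketch models a hypothetical diamond: a Router Service depends on ModelA and ModelB, and both of those depend on one SharedPreprocessing Service. It uses Python's standard graphlib module only to show that the shared downstream Service is a single node that both branches reuse; all Service names are made up, and this is not how BentoML resolves dependencies internally.

```python
from graphlib import TopologicalSorter

# Hypothetical Services: each key maps to the Services it depends on,
# i.e. what bentoml.depends() would declare inside that class.
dependencies = {
    "Router": {"ModelA", "ModelB"},
    "ModelA": {"SharedPreprocessing"},
    "ModelB": {"SharedPreprocessing"},
    "SharedPreprocessing": set(),
}

# static_order() yields every node exactly once, dependencies before
# dependents, so the shared downstream Service appears only once.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

ModelA and ModelB can come out in either order, since neither depends on the other; a longer chain would simply add more levels to the graph.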

Basic usage

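The following service.py file contains two Services with different hardware requirements. To declare a dependency, use the bentoml.depends() function and pass the Service class you depend on as an argument. This creates a direct link between the Services so that you can easily call their methods.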

service.py

import bentoml
import numpy as np


@bentoml.service(resources={"cpu": "200m", "memory": "512Mi"})
class Preprocessing:
    # A dummy preprocessing Service
    @bentoml.api
    def preprocess(self, input_series: np.ndarray) -> np.ndarray:
        return input_series

@bentoml.service(resources={"cpu": "1", "memory": "2Gi"})
class IrisClassifier:
    # Load the model from the Model Store
    iris_model = bentoml.models.BentoModel("iris_sklearn:latest")
    # Declare the preprocessing Service as a dependency
    preprocessing = bentoml.depends(Preprocessing)

    def __init__(self):
        import joblib

        self.model = joblib.load(self.iris_model.path_of("model.pkl"))

    @bentoml.api
    def classify(self, input_series: np.ndarray) -> np.ndarray:
        input_series = self.preprocessing.preprocess(input_series)
        return self.model.predict(input_series)

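Once the dependency is declared, calling methods on the dependent Service is similar to calling local methods. In other words, Service A can call Service B as if Service A were invoking a class-level function on Service B. This abstracts away the complexities of network communication, serialization, and deserialization.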
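For comparison, the following sketch builds by hand a request to the Preprocessing Service's preprocess endpoint, which is the kind of plumbing that bentoml.depends() hides. It assumes BentoML's usual HTTP convention of a POST to /<api_name> with a JSON body keyed by parameter name, a hypothetical address of http://localhost:3000, and that the NumPy array is sent as a nested JSON list. It only constructs the request without sending it, and it is not how BentoML implements interservice calls.

```python
import json
import urllib.request


def build_api_request(base_url: str, api_name: str, **params) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a BentoML-style endpoint.

    Assumption: each @bentoml.api method is served at POST /<api_name> and
    accepts a JSON object whose keys are the method's parameter names.
    """
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{api_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address; input_series is sent as a nested list, not an ndarray
req = build_api_request(
    "http://localhost:3000", "preprocess", input_series=[[5.1, 3.5, 1.4, 0.2]]
)
print(req.get_method(), req.full_url)  # POST http://localhost:3000/preprocess
# Sending it would be urllib.request.urlopen(req), then json.load() on the response
```

With bentoml.depends(), none of this appears in your code: self.preprocessing.preprocess(input_series) takes care of addressing, encoding, and decoding.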

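Using bentoml.depends() is the recommended way to create a BentoML project with distributed Services. It improves modularity, because you can develop reusable, loosely coupled Services that are maintained and scaled independently.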

Depend on an external deployment

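BentoML also allows you to set an external deployment as a dependency of a Service. This means the Service can call a remote model and the API endpoints it exposes. To specify an external deployment, use the bentoml.depends() function and provide either the Deployment name on BentoCloud or, if the deployment is already running, its URL.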

Specify the Deployment name on BentoCloud

You can also pass the cluster parameter to specify the cluster where your Deployment is running.

import bentoml
import numpy as np

@bentoml.service
class MyService:
    # `cluster` is optional; set it only if your Deployment runs in a non-default cluster
    iris = bentoml.depends(deployment="iris-classifier-x6dewa", cluster="my_cluster_name")

    @bentoml.api
    def predict(self, input: np.ndarray) -> int:
        # Call the predict function from the remote Deployment
        return int(self.iris.predict(input)[0][0])

Specify a URL

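If the external deployment is already running and its API is exposed through a public URL, you can reference it by specifying the url parameter. Note that url is mutually exclusive with deployment and cluster.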

import bentoml
import numpy as np

@bentoml.service
class MyService:
    # Call the model deployed on BentoCloud by specifying its URL
    iris = bentoml.depends(url="https://<iris.example-url.bentoml.ai>")

    # Call the model served elsewhere
    # iris = bentoml.depends(url="http://192.168.1.1:3000")

    @bentoml.api
    def predict(self, input: np.ndarray) -> int:
        # Make a request to the external service hosted at the specified URL
        return int(self.iris.predict(input)[0][0])

Tip

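We recommend that you specify the class of the external Service when using bentoml.depends(). This makes it easier to validate the types and methods available on the remote Service.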

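import bentoml

# Assumption: the IrisClassifier Service class (as in the Basic usage example) is defined in service.py
from service import IrisClassifier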

@bentoml.service
class MyService:
    # Specify the external Service class for type-safe integration
    iris = bentoml.depends(IrisClassifier, deployment="iris-classifier-x6dewa", cluster="my_cluster")

Deploy distributed Services

To deploy a project with distributed Services to BentoCloud, we recommend that you use a separate configuration file and reference it in the BentoML CLI command or the Python API.

Here is an example:

config-file.yaml

name: "deployment-name"
bento: .
description: "This project creates an AI agent application"
envs: # Optional. If you specify environment variables here, they will be applied to all Services
  - name: "GLOBAL_ENV_VAR_NAME"
    value: "env_var_value"
services: # Add the configs of each Service under this field
  Preprocessing: # Service one
    instance_type: "gpu.l4.1"
    scaling:
      max_replicas: 2
      min_replicas: 1
    envs: # Environment variables specific to Service one
      - name: "ENV_VAR_NAME"
        value: "env_var_value"
    deployment_strategy: "RollingUpdate"
    config_overrides:
      traffic:
        # float in seconds
        timeout: 700
        max_concurrency: 20
        external_queue: true
      resources:
        cpu: "400m"
        memory: "1Gi"
      workers:
        - gpu: 1
  Inference: # Service two
    instance_type: "cpu.1"
    scaling:
      max_replicas: 5
      min_replicas: 1
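Each key under services should name a Service in your project. As a pre-deployment sanity check, the following sketch parses service.py with Python's standard ast module, collects the names of classes decorated with @bentoml.service or @bentoml.service(...), and reports config keys that match no such class. It assumes each Service's name equals its class name, as in the service.py above, and that the YAML file has already been loaded into a dict (for example with the third-party PyYAML package's yaml.safe_load). Both helper functions are hypothetical and not part of BentoML.

```python
import ast


def declared_service_names(source: str) -> set[str]:
    """Return names of classes decorated with @bentoml.service or @bentoml.service(...)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for deco in node.decorator_list:
            # Unwrap @bentoml.service(...) to its callee, bentoml.service
            target = deco.func if isinstance(deco, ast.Call) else deco
            if (
                isinstance(target, ast.Attribute)
                and target.attr == "service"
                and isinstance(target.value, ast.Name)
                and target.value.id == "bentoml"
            ):
                names.add(node.name)  # assumption: Service name == class name
    return names


def unknown_config_services(config: dict, source: str) -> set[str]:
    """Keys under `services` that match no Service class found in `source`."""
    return set(config.get("services", {})) - declared_service_names(source)


# Hypothetical inputs: a trimmed service.py and an already-parsed config dict
service_py = '''
import bentoml

@bentoml.service(resources={"cpu": "200m"})
class Preprocessing: ...

@bentoml.service
class IrisClassifier: ...
'''
config = {"services": {"Preprocessing": {}, "IrisClassifier": {}}}
print(unknown_config_services(config, service_py))  # set()
```

A non-empty result points to a config key that would not apply to any Service in the project.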

To deploy these Services to BentoCloud, you can use either the BentoML CLI or the Python API.

BentoML CLI

bentoml deploy -f config-file.yaml

Python API

import bentoml
bentoml.deployment.create(config_file="config-file.yaml")

Refer to Configure Deployments for the available configuration fields.