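Configurations

BentoML provides a configuration interface that lets you customize the runtime behavior of each Service in a Bento. This document explains the available configuration fields and offers best-practice recommendations for configuring BentoML Services.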

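How do configurations work

BentoML's default configurations suit a wide range of use cases. For finer control over runtime behavior such as resource allocation and timeouts, customize each Service through its @bentoml.service decorator in the service.py file.

Note

If you are using a BentoML version earlier than 1.2, you need to set these runtime configurations in a separate configuration.yaml file.

You only need to specify the configurations you want to customize. BentoML automatically fills in default values for any configuration you leave unspecified. Here is an example: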

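import bentoml

@bentoml.service(
    resources={"cpu": "2"},
    traffic={"timeout": 60},
)
class Summarization:
    # Service implementation
    ...

"cpu": "2" allocates 2 CPU cores to the Service, so that it has enough processing capacity for its workload.
"timeout": 60 sets the maximum time, in seconds, that the Service waits for a response before timing out. In this example, the limit is 60 seconds.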

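The fill-in-defaults behavior described above can be pictured as a recursive dictionary merge. The sketch below is only an illustration, not BentoML's actual implementation: DEFAULTS and merge_config are hypothetical names, and the default values are limited to the ones this page mentions (one worker, a 60-second timeout, port 3000).

```python
from copy import deepcopy

# Hypothetical defaults, limited to values mentioned in this document.
DEFAULTS = {
    "workers": 1,
    "traffic": {"timeout": 60},
    "http": {"port": 3000},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge field by field.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the user-supplied value wins.
            merged[key] = deepcopy(value)
    return merged


user_config = {"resources": {"cpu": "2"}, "traffic": {"timeout": 120}}
effective = merge_config(DEFAULTS, user_config)
print(effective)
```

Only the timeout is overridden here; workers and the HTTP port keep their default values, and DEFAULTS itself is left unmodified.
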
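Configuration fields

BentoML offers a comprehensive set of configuration fields for detailed customization of a Service. Use them to meet the specific needs of different deployment environments and use cases.

`resources`

The resources field lets you specify the resources allocated to a Service, including CPU, memory, and GPU. It is useful for managing and optimizing resource usage, particularly for resource-intensive tasks. Note that this field only takes effect on BentoCloud. The available fields are:

cpu: The number (or amount) of CPUs that the Service should use. This is a string, for example "200m" or "1".
memory: The amount of memory that the Service should use, for example "500Mi".
gpu: The number of GPUs allocated to the Service.
gpu_type: The specific GPU type to use on BentoCloud, following the naming conventions of cloud providers such as AWS EC2 or GCP GPU instances. To list the available GPU types, run bentoml deployment list-instance-types.

We recommend that you specify at least one field of resources, so that resources can be automatically allocated to the Service on BentoCloud.

Here is an example: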

import bentoml

@bentoml.service(
    resources={
        "cpu": "1",
        "memory": "500Mi",
        "gpu": 4,
        "gpu_type": "nvidia-tesla-a100"
    }
)
class MyService:
    # Service implementation
    ...
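
The cpu and memory strings look like Kubernetes-style resource quantities, where the m suffix means thousandths of a core and Mi means mebibytes. Assuming that convention holds, the hypothetical helpers below (parse_cpu and parse_memory are not part of BentoML) show what such strings amount to numerically.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "200m" or "1" into a number of cores."""
    if quantity.endswith("m"):
        # "200m" means 200 millicores, i.e. 0.2 of a core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


# Binary suffixes only; decimal suffixes (k, M, G) are left out of this sketch.
_BINARY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "500Mi" into a number of bytes."""
    for suffix, factor in _BINARY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # A bare number is taken as bytes.


print(parse_cpu("200m"), parse_cpu("1"), parse_memory("500Mi"))
```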

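`workers`

Workers are the processes that run the code logic within a Service instance. Use workers in the @bentoml.service decorator to define the process-level parallelism within a Service. This configuration is useful for optimizing performance, particularly for high-throughput or compute-intensive Services. workers defaults to 1.

To specify the number of workers within a Service (for example, 3):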

import bentoml

@bentoml.service(workers=3)
class MyService:
    # Service implementation
    ...

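workers also accepts the cpu_count option for dynamic CPU allocation. This is particularly useful in environments where you want the number of worker processes to scale automatically with the available CPU cores.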

import bentoml

@bentoml.service(workers="cpu_count")
class MyService:
    # Service implementation
    ...
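
To make the two accepted forms of workers concrete, here is a minimal sketch of how a value could be turned into a process count. resolve_workers is a hypothetical helper rather than a BentoML function, and it assumes that "cpu_count" maps to the number of CPU cores the operating system reports.

```python
import os
from typing import Union


def resolve_workers(workers: Union[int, str] = 1) -> int:
    """Turn a `workers` setting into a concrete number of processes."""
    if workers == "cpu_count":
        # os.cpu_count() can return None, so fall back to a single worker.
        return os.cpu_count() or 1
    if isinstance(workers, int) and workers >= 1:
        return workers
    raise ValueError(f"unsupported workers value: {workers!r}")


print(resolve_workers(3))
print(resolve_workers("cpu_count"))
```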

`image`

image allows you to configure the runtime specifications used to build a Bento.

import bentoml

my_image = bentoml.images.Image(python_version='3.11') \
    .python_packages("torch", "transformers")

@bentoml.service(image=my_image)
class MyService:
    # Service implementation
    ...

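For more information, see Define the runtime environment.

`envs`

envs allows you to set the environment variables that a Service requires. Each environment variable is defined with the name and value keys. To avoid exposing sensitive information, you can omit the value and set it when you deploy the Service to BentoCloud.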

import bentoml

@bentoml.service(
    envs=[
        {"name": "HF_TOKEN"},  # Omit the value
        {"name": "DB_HOST", "value": "localhost"}
    ]
)
class MyService:
    # Service implementation
    ...
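
The split between inline values and values supplied at deployment time can be sketched as follows. resolve_envs is a hypothetical helper, not a BentoML API, and the provided mapping stands in for whatever values are set at deployment; how BentoCloud actually injects them is not described here.

```python
from typing import Mapping


def resolve_envs(envs: list, provided: Mapping[str, str]) -> dict:
    """Build the final environment from `envs` entries and deploy-time values."""
    resolved = {}
    for entry in envs:
        name = entry["name"]
        if "value" in entry:
            # The value was written inline in the Service definition.
            resolved[name] = entry["value"]
        elif name in provided:
            # The value was omitted in code and supplied separately.
            resolved[name] = provided[name]
        else:
            raise KeyError(f"no value supplied for environment variable {name!r}")
    return resolved


envs = [
    {"name": "HF_TOKEN"},
    {"name": "DB_HOST", "value": "localhost"},
]
print(resolve_envs(envs, {"HF_TOKEN": "hf_example"}))
```

Keeping HF_TOKEN out of the source file means the secret never lands in version control, while non-sensitive settings such as DB_HOST stay visible next to the code that uses them.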

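`traffic`

traffic allows you to manage how a Service handles requests. It includes settings for managing request concurrency and ensuring timely responses, which helps optimize the Service's responsiveness and load management. The following fields are available:

timeout: Determines the maximum time the Service waits for a response to be sent back to the client. The default timeout is 60 seconds.
max_concurrency: Specifies a hard limit on the number of requests that a single Service instance can process simultaneously. It helps you control the load and prevents the Service from being overwhelmed by too many concurrent requests.
concurrency: A BentoCloud-specific field that represents the ideal number of simultaneous requests a Service is designed to handle. Concurrency helps optimize resource utilization and influences how BentoCloud autoscales your Service. By default, concurrency allows unlimited requests to avoid system bottlenecks. For details, see Concurrency and autoscaling.
external_queue: A BentoCloud-specific field. When a Service with this field enabled is deployed on BentoCloud, an external request queue is used to manage incoming traffic more effectively. It does so by queuing requests that exceed the concurrency limit until they can be processed.

Here is an example of configuring these settings in your Service definition: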

import bentoml

@bentoml.service(
    traffic={
        "timeout": 120,
        "max_concurrency": 50,
        "concurrency": 32,  # BentoCloud only
        "external_queue": True,  # BentoCloud only
    }
)
class MyService:
    # Service implementation
    ...
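
A hard limit such as max_concurrency is easiest to see in code. The sketch below is a hypothetical, thread-safe limiter (the class and method names are made up for illustration and are not BentoML APIs) that admits at most max_concurrency requests at a time. What BentoML does with a request that exceeds the limit is not covered on this page, so the sketch only reports that the request was not admitted.

```python
import threading


class ConcurrencyLimiter:
    """Admit at most `max_concurrency` requests at the same time."""

    def __init__(self, max_concurrency: int) -> None:
        self._max = max_concurrency
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Return True if the request may proceed, False if the limit is hit."""
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Mark one in-flight request as finished."""
        with self._lock:
            self._in_flight = max(0, self._in_flight - 1)


limiter = ConcurrencyLimiter(max_concurrency=2)
first = limiter.try_acquire()
second = limiter.try_acquire()
third = limiter.try_acquire()   # Exceeds the hard limit of 2.
limiter.release()               # One request finishes...
fourth = limiter.try_acquire()  # ...so a new one can be admitted.
print(first, second, third, fourth)
```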

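`runner_probe`

Configure health check settings for a Service on BentoCloud, using the readyz, livez, and healthz endpoints. The available fields are:

enabled: Determines whether the health checks are enabled.
timeout: The maximum time, in seconds, that a health check probe waits to complete before it is considered failed.
period: The frequency, in seconds, at which the health check probes are performed.

Here is an example: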

import bentoml

@bentoml.service(runner_probe={"enabled": True, "timeout": 1, "period": 10})
class MyService:
    # Service implementation
    ...

`logging`

Customize server-side logging, including the content type and length of requests and responses, and the trace ID format.

Here is an example:

import bentoml

@bentoml.service(logging={
    "access": {
        "enabled": True,
        "request_content_length": True,
        "request_content_type": True,
        "response_content_length": True,
        "response_content_type": True,
        "format": {
            "trace_id": "032x",
            "span_id": "016x"
        }
    }
})
class MyService:
    # Service implementation
    ...

For more information, see Logging.
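
The "032x" and "016x" values in the logging example read like Python format specifications: zero-padded lowercase hexadecimal, 32 and 16 digits wide. Assuming that interpretation, the sketch below shows what they produce for a 128-bit trace ID and a 64-bit span ID. The ID values and the format_ids helper are made up for illustration.

```python
def format_ids(trace_id: int, span_id: int,
               trace_fmt: str = "032x", span_fmt: str = "016x") -> tuple:
    """Render integer trace and span IDs using Python format specs."""
    return format(trace_id, trace_fmt), format(span_id, span_fmt)


# Example IDs, invented for this sketch.
trace_id = 0x5B8EFFF798038103D269B633813FC60C
span_id = 0xABC

trace_text, span_text = format_ids(trace_id, span_id)
print(trace_text)  # 32 hexadecimal digits
print(span_text)   # 16 hexadecimal digits, zero-padded on the left
```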

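`ssl`

ssl enables SSL/TLS for secure communication over HTTP requests. It helps protect sensitive data in transit and ensures secure connections between clients and your Service.

BentoML passes all available fields directly to Uvicorn. Here is an example: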

import bentoml

@bentoml.service(ssl={
    "enabled": True,
    "certfile": "/path/to/certfile",
    "keyfile": "/path/to/keyfile",
    "ca_certs": "/path/to/ca_certs",
    "keyfile_password": "",
    "version": 17,
    "cert_reqs": 0,
    "ciphers": "TLSv1"
})
class MyService:
    # Service implementation
    ...
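
The bare integers for version and cert_reqs are easier to read once they are mapped back to the constants in Python's standard ssl module, which is what these values appear to correspond to. The snippet below prints those constants; the ssl_settings dictionary is just a local variable for illustration.

```python
import ssl

# Named constants behind the integers used in the example above.
ssl_settings = {
    "version": int(ssl.PROTOCOL_TLS_SERVER),  # Server-side TLS, version negotiated
    "cert_reqs": int(ssl.CERT_NONE),          # Client certificates are not requested
}
print(ssl_settings)

# Stricter alternatives for cert_reqs, if client certificates are needed.
print(int(ssl.CERT_OPTIONAL), int(ssl.CERT_REQUIRED))
```

Writing the constant names in your own code, rather than the raw numbers, makes the intent of the configuration clearer to readers.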

`http`

http allows you to customize the settings of the HTTP server that serves your BentoML Service.

By default, BentoML starts an HTTP server on port 3000. To change the port:

import bentoml

@bentoml.service(http={"port": 5000})
class MyService:
    # Service implementation
    ...

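If your Service needs to accept cross-origin requests, you can configure CORS settings. CORS is disabled by default. When it is enabled, all fields under http.cors are passed to CORSMiddleware. Here is an example: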

import bentoml

@bentoml.service(http={
    "cors": {
        "enabled": True,
        "access_control_allow_origins": ["http://myorg.com:8080", "https://myorg.com:8080"],
        "access_control_allow_methods": ["GET", "OPTIONS", "POST", "HEAD", "PUT"],
        "access_control_allow_credentials": True,
        "access_control_allow_headers": ["*"],
        # Raw string, so that "\." reaches the regex engine unchanged
        "access_control_allow_origin_regex": r"https://.*\.my_org\.com",
        "access_control_max_age": 1200,
        "access_control_expose_headers": ["Content-Length"]
    }
})
class MyService:
    # Service implementation
    ...

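Configuring CORS is important when your Service is accessed from web applications hosted on a different domain. Proper CORS settings ensure that your Service can safely handle requests from allowed origins, which improves both security and usability.

By customizing the http configuration, you can fine-tune how your BentoML Service interacts over HTTP, including adapting to specific network environments, securing cross-origin interactions, and ensuring compatibility with various client applications.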
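
How the origin list and the origin regex combine can be sketched in a few lines. The helper below is hypothetical and is not the middleware's code; it assumes an origin is allowed when it appears in the explicit list or when the whole origin string matches the regular expression.

```python
import re

ALLOWED_ORIGINS = {"http://myorg.com:8080", "https://myorg.com:8080"}
# Raw string, so that "\." is a literal dot in the pattern.
ORIGIN_REGEX = re.compile(r"https://.*\.my_org\.com")


def is_origin_allowed(origin: str) -> bool:
    """Check an Origin header value against the list and the regex."""
    if origin in ALLOWED_ORIGINS:
        return True
    # fullmatch requires the entire origin to match, not just a prefix.
    return ORIGIN_REGEX.fullmatch(origin) is not None


for origin in (
    "http://myorg.com:8080",
    "https://api.my_org.com",
    "https://api.my_org.com.evil.example",
    "http://api.my_org.com",
):
    print(origin, is_origin_allowed(origin))
```

Matching the full string matters: a prefix match would accept https://api.my_org.com.evil.example, which belongs to a different domain.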

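`monitoring`

monitoring allows you to collect logs and track the performance and health of a Service, to maintain its reliability and efficiency. By default, BentoML provides a built-in monitoring mechanism, which you can customize by providing a configuration file in YAML format.

Here is an example: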

import bentoml

@bentoml.service(monitoring={
    "enabled": True,
    "type": "default",
    "options": {
        "log_config_file": "path/to/log_config.yaml",  # A configuration file for customizing monitoring behavior, using Python's logging module
        "log_path": "monitoring"  # The directory where logs will be exported
    }
})
class MyService:
    # Service implementation
    ...

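For more information, see Monitoring.

`metrics`

metrics allows you to collect and customize metrics of the Counter, Histogram, Summary, and Gauge types. This feature is enabled by default.

Here is an example: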

import bentoml

@bentoml.service(metrics={
    "enabled": True,
    "namespace": "bentoml_service",
    "duration": {
        "buckets": [0.1, 0.2, 0.5, 1, 2, 5, 10]
    }
})
class MyService:
    # Service implementation
    ...

For more information, see Metrics.
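
The buckets list in the metrics example defines the upper bounds of a duration histogram. Assuming Prometheus-style cumulative buckets with durations measured in seconds, the sketch below counts a handful of made-up request durations into those buckets. cumulative_counts is a hypothetical helper, not a BentoML function.

```python
from bisect import bisect_left

BUCKETS = [0.1, 0.2, 0.5, 1, 2, 5, 10]


def cumulative_counts(durations, buckets=BUCKETS):
    """Count observations per bucket, cumulatively, with a final +Inf bucket."""
    per_bucket = [0] * (len(buckets) + 1)  # Last slot is the +Inf bucket.
    for duration in durations:
        # Index of the first upper bound that is >= the observed duration.
        per_bucket[bisect_left(buckets, duration)] += 1
    totals, running = [], 0
    for count in per_bucket:
        running += count
        totals.append(running)
    return totals


durations = [0.05, 0.15, 0.3, 0.3, 3, 20]  # Example data, in seconds
counts = cumulative_counts(durations)
for bound, count in zip(BUCKETS + [float("inf")], counts):
    print(f"le={bound}: {count}")
```

Each count includes every observation at or below that bound, so the last bucket always equals the total number of observations. Choosing bounds close to your expected latencies gives finer resolution where it matters.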

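`tracing`

You can configure tracing with different exporters, such as Zipkin, Jaeger, and OTLP. The specific configurations may vary depending on the exporter type you define.

Here is an example: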

import bentoml

@bentoml.service(
    resources={"cpu": "2"},
    traffic={"timeout": 10},
    tracing={
        # Common configurations
        "exporter_type": "jaeger",
        "sample_rate": 1.0,
        "timeout": 5,
        "max_tag_value_length": 256,
        "excluded_urls": "readyz",
        "jaeger": {
            # Specific configurations of the exporter
        },
    },
)
class MyService:
    # Service implementation code
    ...

For more information, see Tracing.
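
sample_rate in the tracing example is the fraction of requests to trace, with 1.0 meaning every request. One common way to apply such a rate, used by ratio-based samplers in OpenTelemetry, is to compare the low 64 bits of the trace ID against a threshold, so that the decision is the same wherever the trace ID is seen. Whether BentoML samples in exactly this way is an assumption; should_sample is a hypothetical helper.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # Mask for the low 64 bits of a trace ID


def should_sample(trace_id: int, sample_rate: float) -> bool:
    """Decide deterministically whether a trace is kept at the given rate."""
    bound = round(sample_rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


example_trace_id = 0x5B8EFFF798038103D269B633813FC60C  # Invented for this sketch
print(should_sample(example_trace_id, 1.0))  # Rate 1.0 keeps every trace
print(should_sample(example_trace_id, 0.0))  # Rate 0.0 keeps none
```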

For the complete configuration schema, see this file.
