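Configuring lifecycle hooks

Lifecycle hooks in BentoML provide a mechanism for running custom logic at different stages of a Service's lifecycle. With these hooks, you can perform setup operations at startup, clean up resources before shutdown, and more.

This document gives an overview of lifecycle hooks and explains how to use them in BentoML Services.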

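Understanding the server lifecycle

The BentoML server lifecycle consists of several stages. Each stage offers a distinct opportunity to perform specific tasks:

  1. Deployment hooks. These hooks run before any worker processes are spawned, which makes them suitable for one-time, global setup tasks. They are essential for operations that must happen exactly once, regardless of the number of workers.

  2. Spawn workers. BentoML then spawns worker processes according to the workers configuration specified in the @bentoml.service decorator.

  3. Service initialization and ASGI application startup. As each worker starts, any integrated ASGI application begins its lifespan. This is when the __init__ method of your Service class runs, allowing for instance-specific initialization.

  4. ASGI application cleanup. Finally, when the server shuts down (including any integrated ASGI applications), the shutdown hooks run. This stage is ideal for cleanup tasks that ensure a graceful shutdown.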
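To make the ordering of these stages concrete, here is a toy, single-process model of it. It is a sketch under stated assumptions, not BentoML's implementation: `ToyService` and `run_lifecycle` are hypothetical names, and workers are simulated one after another in a loop rather than spawned as separate processes.

```python
from typing import Callable, List


class ToyService:
    """Hypothetical stand-in for a Service class; each worker builds its own instance."""

    def __init__(self, worker_id: int, log: List[str]) -> None:
        self.worker_id = worker_id
        self.log = log
        self.log.append(f"init worker {worker_id}")

    def shutdown(self) -> None:
        self.log.append(f"shutdown worker {self.worker_id}")


def run_lifecycle(deployment_hooks: List[Callable[[], str]], workers: int) -> List[str]:
    log: List[str] = []
    # Stage 1: deployment hooks run exactly once, before any worker exists.
    for hook in deployment_hooks:
        log.append(hook())
    # Stages 2 and 3: "spawn" each worker and initialize its own Service instance.
    # (Simulated sequentially here; a real server uses separate worker processes.)
    instances = [ToyService(worker_id, log) for worker_id in range(workers)]
    # Stage 4: on server shutdown, every worker runs its cleanup logic.
    for instance in instances:
        instance.shutdown()
    return log


if __name__ == "__main__":
    for event in run_lifecycle([lambda: "prepare", lambda: "additional_setup"], workers=4):
        print(event)
```

Running the script prints the two deployment hook results first, then one `init` line per worker, then one `shutdown` line per worker.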

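Configuring hooks in a BentoML Service

This section provides code examples for configuring the different BentoML hooks.

Deployment hooks

Deployment hooks are similar to static methods in that they do not receive a self argument. You can define multiple deployment hooks in a Service. Use the @bentoml.on_deployment decorator to designate a method as a deployment hook. For example: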

import bentoml

@bentoml.service(workers=4)
class HookService:
    # Deployment hook does not receive `self` argument. It acts similarly to a static method.
    @bentoml.on_deployment
    def prepare():
        print("Do some preparation work, running only once.")

    # Multiple deployment hooks can be defined
    @bentoml.on_deployment
    def additional_setup():
        print("Do more preparation work if needed, also running only once.")

    def __init__(self) -> None:
        # Startup logic and initialization code
        print("This runs on Service startup, once for each worker, so it runs 4 times.")

    @bentoml.api
    def predict(self, text: str) -> str:
        # Endpoint implementation logic (placeholder: echo the input back)
        return text

After the Service starts, you will see the following output on the server side, in this order:

$ bentoml serve

Do some preparation work, running only once. # First on_deployment hook
Do more preparation work if needed, also running only once. # Second on_deployment hook
2024-03-13T03:12:33+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:HookService" listening on http://localhost:3000 (Press CTRL+C to quit)
This runs on Service startup, once for each worker, so it runs 4 times.
This runs on Service startup, once for each worker, so it runs 4 times.
This runs on Service startup, once for each worker, so it runs 4 times.
This runs on Service startup, once for each worker, so it runs 4 times.
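Why can a deployment hook omit self? One way to picture it: the hook is looked up on the class and called directly, before any instance exists. The plain-Python sketch below illustrates that idea with a hypothetical `deployment_hook` marker decorator and a hypothetical `run_deployment_hooks` collector. It is not how `@bentoml.on_deployment` is actually implemented.

```python
from typing import Callable, List


def deployment_hook(fn: Callable[[], None]) -> Callable[[], None]:
    """Hypothetical marker decorator (not BentoML's): tags a function as a hook."""
    setattr(fn, "_is_deployment_hook", True)
    return fn


class HookService:
    @deployment_hook
    def prepare():
        print("Do some preparation work, running only once.")

    @deployment_hook
    def additional_setup():
        print("Do more preparation work if needed, also running only once.")

    def __init__(self) -> None:
        # Not marked, so the collector below never calls it.
        print("Per-worker initialization.")


def run_deployment_hooks(cls: type) -> List[str]:
    """Call every marked hook once, in definition order, without creating an instance."""
    called: List[str] = []
    # A class namespace preserves definition order (Python 3.7+).
    for name, attr in vars(cls).items():
        if getattr(attr, "_is_deployment_hook", False):
            attr()  # no `self`: the raw function is called with no arguments
            called.append(name)
    return called


if __name__ == "__main__":
    print(run_deployment_hooks(HookService))
```

In this sketch the hooks run in the order they are declared because the collector walks the class namespace, which keeps definition order.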

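Startup hooks

Startup hooks run during Service initialization, after the deployment hooks but before any API endpoint becomes available. They run once per worker, which makes them well suited to worker-specific initialization tasks such as establishing database connections or loading resources.

Use the @bentoml.on_startup decorator to designate a method as a startup hook. For example: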

import bentoml

@bentoml.service(workers=4)
class HookService:
    @bentoml.on_deployment
    def prepare():
        print("Global preparation, runs once before workers start.")

    @bentoml.on_startup
    def init_resources(self):
        # This runs once per worker.
        # setup_database() is a placeholder for your own connection helper.
        print("Initializing resources for worker.")
        self.db_connection = setup_database()

    @bentoml.on_startup
    async def init_async_resources(self):
        # For async initialization tasks.
        # setup_cache() is a placeholder for your own async helper.
        print("Async resource initialization for worker.")
        self.cache = await setup_cache()

    @bentoml.api
    def predict(self, text: str) -> str:
        # Use the resources initialized by the startup hooks in API endpoints
        return self.db_connection.query(text)

When you start this Service, you will see the following output:

$ bentoml serve service:HookService

Global preparation, runs once before workers start. # on_deployment hook
2024-03-13T03:12:33+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:HookService" listening on http://localhost:3000
Initializing resources for worker. # First worker's startup hooks
Async resource initialization for worker.
Initializing resources for worker. # Second worker's startup hooks
Async resource initialization for worker.
Initializing resources for worker. # Third worker's startup hooks
Async resource initialization for worker.
Initializing resources for worker. # Fourth worker's startup hooks
Async resource initialization for worker.
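The example mixes a synchronous and an asynchronous startup hook. A runner that supports both has to call the first and await the second. The sketch below shows one way to do that with `inspect.iscoroutinefunction`. `Worker` and `run_startup_hooks` are hypothetical names, the resources are placeholders, and this is not BentoML's own scheduling code.

```python
import asyncio
import inspect
from typing import List


class Worker:
    """Hypothetical stand-in for one worker's Service instance."""

    def init_resources(self) -> None:
        # Synchronous startup work, e.g. opening a database connection.
        self.db_connection = "db-connection"  # placeholder resource

    async def init_async_resources(self) -> None:
        # Asynchronous startup work, e.g. warming a cache over the network.
        await asyncio.sleep(0)  # stands in for real async I/O
        self.cache = {"warm": True}  # placeholder resource


async def run_startup_hooks(instance: object, hook_names: List[str]) -> None:
    """Run hooks in order: await coroutine functions, call plain functions directly."""
    for name in hook_names:
        hook = getattr(instance, name)
        if inspect.iscoroutinefunction(hook):
            await hook()
        else:
            hook()


if __name__ == "__main__":
    workers = [Worker() for _ in range(4)]
    for worker in workers:  # each worker runs its own startup hooks once
        asyncio.run(run_startup_hooks(worker, ["init_resources", "init_async_resources"]))
    print(all(w.cache["warm"] for w in workers))  # True
```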

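Shutdown hooks

Shutdown hooks run while the BentoML Service is shutting down. They let you run cleanup logic such as closing connections, releasing resources, or any other necessary cleanup tasks. You can define multiple shutdown hooks in a Service.

Use the @bentoml.on_shutdown decorator to designate a method as a shutdown hook. For example: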

import bentoml

@bentoml.service(workers=4)
class HookService:
    @bentoml.on_deployment
    def prepare():
        print("Do some preparation work, running only once.")

    def __init__(self) -> None:
        # Startup logic and initialization code
        print("This runs on Service startup, once for each worker, so it runs 4 times.")

    @bentoml.api
    def predict(self, text: str) -> str:
        # Endpoint implementation logic (placeholder: echo the input back)
        return text

    @bentoml.on_shutdown
    def shutdown(self):
        # Logic on shutdown
        print("Cleanup actions on Service shutdown.")

    @bentoml.on_shutdown
    async def async_shutdown(self):
        print("Async cleanup actions on Service shutdown.")
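A common pattern is for a shutdown hook to release what a startup hook acquired. The plain-Python sketch below uses an in-memory sqlite3 connection as a stand-in for a real database client. In a Service, `startup` and `shutdown` would carry `@bentoml.on_startup` and `@bentoml.on_shutdown`; `ResourceHolder` and its method names are hypothetical.

```python
import sqlite3
from typing import Optional


class ResourceHolder:
    """Hypothetical plain-Python analogue of a Service that owns one connection per worker."""

    def __init__(self) -> None:
        self.conn: Optional[sqlite3.Connection] = None

    def startup(self) -> None:
        # Acquire the resource once per worker (the role of a startup hook).
        self.conn = sqlite3.connect(":memory:")

    def query(self, text: str) -> str:
        if self.conn is None:
            raise RuntimeError("startup() has not run, or shutdown() already ran")
        (result,) = self.conn.execute("SELECT upper(?)", (text,)).fetchone()
        return result

    def shutdown(self) -> None:
        # Release the resource (the role of a shutdown hook); safe to call twice.
        if self.conn is not None:
            self.conn.close()
            self.conn = None


if __name__ == "__main__":
    holder = ResourceHolder()
    holder.startup()
    print(holder.query("hello"))  # HELLO
    holder.shutdown()
```

Making the cleanup idempotent means a second shutdown call, for example from a repeated signal, does no harm.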

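Health check hooks

Health check hooks let you define custom logic that determines when a Service is healthy and ready to handle requests. This is especially useful when your Service depends on external resources that must be checked before the Service can be considered operational.

You can define the following methods in your Service class to implement health checks:

  • __is_alive__: Checks whether the Service is alive. It should return a boolean indicating the Service's health status. It backs the /livez endpoint.

  • __is_ready__: Checks whether the Service is ready to handle requests. It should return a boolean indicating the Service's readiness. It backs the /readyz endpoint.

Both methods can be async functions.

For example: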

import bentoml

@bentoml.service(workers=4)
class HookService:
    def __init__(self) -> None:
        self.db_connection = None
        self.cache = None

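    @bentoml.on_startup
    def init_resources(self):
        # setup_database() and setup_cache() are placeholders for your own helpers
        self.db_connection = setup_database()
        self.cache = setup_cache()

    def __is_ready__(self) -> bool:
        # Not ready until the startup hook has created both resources
        if self.db_connection is None or self.cache is None:
            return False
        # is_connected() and is_available() stand in for your clients' own status checks
        return self.db_connection.is_connected() and self.cache.is_available()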

When you call the /readyz endpoint, it returns:

  • HTTP 200 if the Service is ready (the hook returns True)

  • HTTP 503 if the Service is not ready (the hook returns False)
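The mapping from a readiness check to a status code, where the check may be a plain or an async function, can be modeled as follows. `readyz_status` and `DemoService` are hypothetical, and treating a timeout or a raised exception as "not ready" is an assumption of this sketch, not documented BentoML behavior.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Union

# A readiness check may be a plain function or an async function returning bool.
ReadyCheck = Callable[[], Union[bool, Awaitable[bool]]]


async def readyz_status(check: ReadyCheck, timeout: float = 1.0) -> int:
    """Return the status code a /readyz-style probe would send (sketch only)."""
    try:
        result = check()
        if inspect.isawaitable(result):
            # Async checks are awaited, bounded by a timeout (an assumption).
            result = await asyncio.wait_for(result, timeout)
    except Exception:
        return 503  # a check that raises or times out counts as "not ready"
    return 200 if result else 503


class DemoService:
    """Hypothetical Service stand-in with an async readiness check."""

    def __init__(self) -> None:
        self.cache_loaded = False

    async def __is_ready__(self) -> bool:
        return self.cache_loaded


if __name__ == "__main__":
    svc = DemoService()
    print(asyncio.run(readyz_status(svc.__is_ready__)))  # 503
    svc.cache_loaded = True
    print(asyncio.run(readyz_status(svc.__is_ready__)))  # 200
```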