Batch Inference

Batch Inference Backend

bentoml.batch.run_in_spark(bento: Bento, df: pyspark.sql.dataframe.DataFrame, spark: pyspark.sql.session.SparkSession, api_name: str | None = None, output_schema: StructType | None = None) → pyspark.sql.dataframe.DataFrame

Run a BentoService inference API in Spark.

The API to run must accept batches as input and return batches as output.
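
To make the batch requirement concrete, here is a conceptual sketch in plain Python. It is not BentoML code: `predict_batch` and its age-threshold logic are hypothetical stand-ins, and the only point is the shape of the contract, where a whole batch of records goes in and one output per input record comes back, in the same order.

```python
from typing import Dict, List


def predict_batch(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Hypothetical batch function: one output record per input record, in order."""
    outputs = []
    for row in rows:
        # Placeholder logic standing in for a real model call (assumption).
        outputs.append({"name": row["name"], "is_adult": int(row["age"]) >= 18})
    return outputs


batch = [{"name": "John", "age": 30}, {"name": "Mike", "age": 25}]
results = predict_batch(batch)
```

An API that handled only a single record per call would not fit this contract, since Spark hands over many rows at a time.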

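Parameters:

bento – The bento containing the inference API to run.
df – The input DataFrame to run the inference API on.
spark – The Spark session used to run the inference API.
api_name – The name of the inference API to run. If not provided, the bento must contain exactly one API, and that API is run.
output_schema – The Spark schema of the output DataFrame. If not provided, BentoML attempts to infer the schema from the output descriptor of the inference API.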

Returns:

The result of running the inference API on the input df.
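
The two optional parameters can also be passed explicitly instead of relying on the defaults. The following is a minimal sketch under stated assumptions: the helper name `run_named_api`, the API name `"predict"`, and the two-column output schema are all hypothetical and would need to match the actual service. Only `bentoml.get` and the `run_in_spark` signature documented above are taken as given.

```python
def run_named_api(bento_tag, df, spark, api_name="predict"):
    """Run one named inference API over `df` with an explicit output schema."""
    # Imports live inside the function so this sketch can be loaded
    # even where Spark or BentoML is not installed.
    import bentoml
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    # Hypothetical output schema; adjust the fields to what the API returns.
    output_schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])

    bento = bentoml.get(bento_tag)
    # api_name selects the API; output_schema skips schema inference.
    return bentoml.batch.run_in_spark(
        bento, df, spark, api_name=api_name, output_schema=output_schema
    )
```

Supplying `api_name` matters when the bento exposes more than one API, and supplying `output_schema` is useful when the schema cannot be inferred from the API's output descriptor.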

Examples

>>> import bentoml
>>> import pyspark
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType

>>> spark = SparkSession.builder.getOrCreate()
>>> # "age" holds integers, so it is declared as IntegerType rather than StringType.
>>> schema = StructType([
...     StructField("name", StringType(), True),
...     StructField("age", IntegerType(), True),
... ])
>>> df = spark.createDataFrame([("John", 30), ("Mike", 25), ("Sally", 40)], schema)

>>> bento = bentoml.get("my_service:latest")
>>> results = bentoml.batch.run_in_spark(bento, df, spark)
>>> results.show()
+-----+---+
| name|age|
+-----+---+
|John |30 |
+-----+---+