SparkComponent class

Reference

Spark component version, used to define a Spark component or job.

Inheritance (SparkComponent derives from each of the following):

azure.ai.ml.entities._component.component.Component

azure.ai.ml.entities._job.parameterized_spark.ParameterizedSpark

azure.ai.ml.entities._job.spark_job_entry_mixin.SparkJobEntryMixin

azure.ai.ml.entities._component.code.ComponentCodeMixin

Constructor

SparkComponent(*, code: PathLike | str | None = '.', entry: Dict[str, str] | SparkJobEntry | None = None, py_files: List[str] | None = None, jars: List[str] | None = None, files: List[str] | None = None, archives: List[str] | None = None, driver_cores: int | str | None = None, driver_memory: str | None = None, executor_cores: int | str | None = None, executor_memory: str | None = None, executor_instances: int | str | None = None, dynamic_allocation_enabled: bool | str | None = None, dynamic_allocation_min_executors: int | str | None = None, dynamic_allocation_max_executors: int | str | None = None, conf: Dict[str, str] | None = None, environment: Environment | str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, args: str | None = None, **kwargs: Any)

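Keyword-only parameters

Name	Description
code	The source code to run the job. Can be a local path, or an "http:", "https:", or "azureml:" URL pointing to a remote location. Defaults to ".", indicating the current directory. Default value: .
entry	Optional[Union[dict[str, str], SparkJobEntry]] The file or class entry point.
py_files	Optional[List[str]] The list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps. Defaults to None.
jars	Optional[List[str]] The list of JAR files to include on the driver and executor classpaths. Defaults to None.
files	Optional[List[str]] The list of files to be placed in the working directory of each executor. Defaults to None.
archives	Optional[List[str]] The list of archives to be extracted into the working directory of each executor. Defaults to None.
driver_cores	Optional[int] The number of cores to use for the driver process, only in cluster mode.
driver_memory	Optional[str] The amount of memory to use for the driver process, formatted as a string with a size unit suffix ("k", "m", "g", or "t"), for example "512m" or "2g".
executor_cores	Optional[int] The number of cores to use on each executor.
executor_memory	Optional[str] The amount of memory to use per executor process, formatted as a string with a size unit suffix ("k", "m", "g", or "t"), for example "512m" or "2g".
executor_instances	Optional[int] The initial number of executors.
dynamic_allocation_enabled	Optional[bool] Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. Defaults to False.
dynamic_allocation_min_executors	Optional[int] The lower bound for the number of executors if dynamic allocation is enabled.
dynamic_allocation_max_executors	Optional[int] The upper bound for the number of executors if dynamic allocation is enabled.
conf	Optional[dict[str, str]] A dictionary with pre-defined Spark configuration keys and values. Defaults to None.
environment	Optional[Union[str, Environment]] The Azure ML environment to run the job in.
inputs	Optional[dict[str, Union[azure.ai.ml.entities._job.pipeline._io.NodeOutput, Input, str, bool, int, float, Enum]]] A mapping of input names to input data sources used in the job. Defaults to None.
outputs	Optional[dict[str, Union[str, Output]]] A mapping of output names to output data sources used in the job. Defaults to None.
args	Optional[str] The arguments for the job. Defaults to None.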
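The memory parameters above (driver_memory, executor_memory) take a number followed by a size unit suffix. The following is a minimal sketch of how such a string could be checked before it is passed to the constructor. The helper name parse_spark_memory is hypothetical and not part of azure-ai-ml, and the conversion assumes binary multiples (1k = 1024 bytes), which is how Spark itself documents these suffixes.

```python
import re

# Hypothetical helper, not part of azure-ai-ml. It only checks the
# "<number><unit>" shape described for driver_memory and executor_memory.
_MEMORY_RE = re.compile(r"(\d+)([kmgt])", re.IGNORECASE)

# Assumption: suffixes are binary multiples (1k = 1024 bytes).
_UNIT_BYTES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_spark_memory(value: str) -> int:
    """Return the number of bytes described by a string such as '512m'."""
    match = _MEMORY_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a valid memory string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit.lower()]


print(parse_spark_memory("512m"))
print(parse_spark_memory("2g"))
```

A check like this can catch values such as "2 gigs" early, before the component is submitted.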

Examples

Creating a SparkComponent.


   from azure.ai.ml.entities import SparkComponent

   component = SparkComponent(
       name="add_greeting_column_spark_component",
       display_name="Aml Spark add greeting column test module",
       description="Aml Spark add greeting column test module",
       version="1",
       inputs={
           "file_input": {"type": "uri_file", "mode": "direct"},
       },
       driver_cores=2,
       driver_memory="1g",
       executor_cores=1,
       executor_memory="1g",
       executor_instances=1,
       code="./src",
       entry={"file": "add_greeting_column.py"},
       py_files=["utils.zip"],
       files=["my_files.txt"],
       args="--file_input ${{inputs.file_input}}",
       base_path="./sdk/ml/azure-ai-ml/tests/test_configs/dsl_pipeline/spark_job_in_pipeline",
   )

Methods

dump	Dumps the component content into a file in YAML format.

dump

Dumps the component content into a file in YAML format.

dump(dest: str | PathLike | IO, **kwargs: Any) -> None

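Parameters

Name	Description
dest Required	Union[PathLike, str, IO[AnyStr]] The destination to receive this component's content. Must be either a path to a local file or an already-open file stream. If dest is a file path, a new file is created, and an exception is raised if the file already exists. If dest is an open file, the file is written to directly, and an exception is raised if the file is not writable.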
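The two destination modes described for dest can be illustrated with a standard-library sketch. This is not the SDK's implementation: the function dump_text is hypothetical, and the exception types it raises (FileExistsError for an existing path, ValueError for a non-writable stream) are assumptions chosen for the illustration, since this reference does not name the exceptions that dump raises.

```python
import io
import os
import tempfile
from typing import IO, Union


def dump_text(dest: Union[str, os.PathLike, IO[str]], text: str) -> None:
    """Hypothetical stand-in that mimics the documented dest behavior."""
    if isinstance(dest, (str, os.PathLike)):
        # Mode "x" creates a new file and raises FileExistsError if it exists.
        with open(dest, "x", encoding="utf-8") as handle:
            handle.write(text)
    else:
        # An already-open stream is written to directly.
        if not dest.writable():
            raise ValueError("destination stream is not writable")
        dest.write(text)


# Open stream: the content is written straight into it.
buffer = io.StringIO()
dump_text(buffer, "name: example\n")

# File path: the first write creates the file, the second one fails.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "component.yaml")
    dump_text(target, "name: example\n")
    try:
        dump_text(target, "name: example\n")
        second_write_failed = False
    except FileExistsError:
        second_write_failed = True
```

With the real class, the equivalent call would be component.dump(dest) on a SparkComponent instance, using either a path or an open file.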

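Attributes

base_path

The base path of the resource.

Returns

Type	Description
str	The base path of the resource.

creation_context

The creation context of the resource.

Returns

Type	Description
Optional[SystemData]	The creation metadata for the resource.

display_name

The display name of the component.

Returns

Type	Description
str	The display name of the component.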

entry

environment

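The Azure ML environment to run the Spark component or job in.

Returns

Type	Description
Optional[Union[str, Environment]]	The Azure ML environment to run the Spark component or job in.

id

The resource ID.

Returns

Type	Description
Optional[str]	The global ID of the resource, an Azure Resource Manager (ARM) ID.

inputs

The inputs of the component.

Returns

Type	Description
dict	The inputs of the component.

is_deterministic

Whether the component is deterministic.

Returns

Type	Description
bool	Whether the component is deterministic.

outputs

The outputs of the component.

Returns

Type	Description
dict	The outputs of the component.

type

The type of the component. The default value is 'command'.

Returns

Type	Description
str	The type of the component.

version

The version of the component.

Returns

Type	Description
str	The version of the component.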

CODE_ID_RE_PATTERN

CODE_ID_RE_PATTERN = re.compile('\\/subscriptions\\/(?P<subscription>[\\w,-]+)\\/resourceGroups\\/(?P<resource_group>[\\w,-]+)\\/providers\\/Microsoft\\.MachineLearningServices\\/workspaces\\/(?P<workspace>[\\w,-]+)\\/codes\\/(?P<co)
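The pattern above is cut off in this reference, so only its visible prefix can be demonstrated. The sketch below compiles just that visible portion (up to "/codes/") and shows how the named groups pull the subscription, resource group, and workspace out of an ARM-style ID. The name VISIBLE_PREFIX and the sample ID are made up for illustration, and the truncated remainder of the pattern is deliberately not reconstructed.

```python
import re

# Only the portion of the pattern visible in this reference. The rest is
# truncated in the source and is intentionally left out.
VISIBLE_PREFIX = re.compile(
    r"\/subscriptions\/(?P<subscription>[\w,-]+)"
    r"\/resourceGroups\/(?P<resource_group>[\w,-]+)"
    r"\/providers\/Microsoft\.MachineLearningServices"
    r"\/workspaces\/(?P<workspace>[\w,-]+)"
    r"\/codes\/"
)

# Hypothetical ARM-style ID, invented for this example.
sample_id = (
    "/subscriptions/0000-1111/resourceGroups/my-rg"
    "/providers/Microsoft.MachineLearningServices"
    "/workspaces/my-ws/codes/my-code/versions/1"
)

# match() anchors at the start only, so the unmatched tail is ignored.
match = VISIBLE_PREFIX.match(sample_id)
parts = match.groupdict() if match else {}
print(parts)
```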
