Build your first SynapseML model

01/22/2024

This article shows how to build your first machine learning model with SynapseML and how SynapseML simplifies complex machine learning tasks. You use SynapseML to create a small ML training pipeline that includes a featurization stage and a LightGBM regression stage. The pipeline predicts ratings from the review text in a dataset of Amazon book reviews. Finally, the article shows how SynapseML simplifies the use of prebuilt models to solve ML problems.

Prerequisites

Get a Microsoft Fabric subscription. Or, sign up for a free Microsoft Fabric trial.
Sign in to Microsoft Fabric.
Use the experience switcher on the left side of your home page to switch to the Synapse Data Science experience.
Create a new notebook.
Attach your notebook to a lakehouse. On the left side of your notebook, select Add to add an existing lakehouse or create a new one.
Obtain an Azure AI services key by following Quickstart: Create a multi-service resource for Azure AI services. You need this key for the "Use Azure AI services to transform data in one step" section of this article.

Set up the environment

Import SynapseML libraries and initialize your Spark session.

from pyspark.sql import SparkSession
from synapse.ml.core.platform import *

spark = SparkSession.builder.getOrCreate()

Load a dataset

Load the book review dataset, limit it to 1,000 rows, and split it into training (80%) and test (20%) sets.

train, test = (
    spark.read.parquet(
        "wasbs://publicwasb@mmlspark.blob.core.windows.net/BookReviewsFromAmazon10K.parquet"
    )
    .limit(1000)
    .cache()
    .randomSplit([0.8, 0.2])
)

display(train)
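
Because randomSplit assigns rows at random, the training and test sets can differ from run to run unless you pass a seed. The following sketch wraps the split in a small helper that passes a fixed seed. The helper name split_train_test and the default seed of 42 are illustrative assumptions, not part of the original example. The helper relies only on the DataFrame randomSplit(weights, seed) method.

```python
def split_train_test(df, train_fraction=0.8, seed=42):
    """Split a Spark DataFrame into (train, test) using a fixed seed.

    The function name and the default seed are illustrative choices.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # randomSplit takes a list of weights and an optional seed.
    train_df, test_df = df.randomSplit(
        [train_fraction, 1.0 - train_fraction], seed=seed
    )
    return train_df, test_df
```

In the notebook, you could call train, test = split_train_test(reviews), where reviews is the cached DataFrame. A seed makes the split repeatable only while the underlying rows and their partitioning stay the same.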

Create the training pipeline

Create a pipeline that featurizes the review text with TextFeaturizer from the synapse.ml.featurize.text module and predicts the rating with a LightGBMRegressor estimator. Calling fit on the pipeline trains both stages on the training set.

from pyspark.ml import Pipeline
from synapse.ml.featurize.text import TextFeaturizer
from synapse.ml.lightgbm import LightGBMRegressor

model = Pipeline(
    stages=[
        TextFeaturizer(inputCol="text", outputCol="features"),
        LightGBMRegressor(featuresCol="features", labelCol="rating"),
    ]
).fit(train)

Predict the output of the test data

Call the model's transform method on the test data to generate predictions, and display the result as a DataFrame.

display(model.transform(test))
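
Displaying the predictions shows individual rows, but it doesn't summarize how far the predicted ratings fall from the actual ones. One common summary is the root mean squared error (RMSE). The following sketch computes it in plain Python from (rating, prediction) pairs. The function name rmse is an assumption for illustration. The commented usage assumes the regressor writes its output to a column named prediction, which is the usual Spark ML default.

```python
import math


def rmse(pairs):
    """Root mean squared error over an iterable of (label, prediction) pairs."""
    squared_errors = [(float(label) - float(pred)) ** 2 for label, pred in pairs]
    if not squared_errors:
        raise ValueError("rmse needs at least one (label, prediction) pair")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical usage in the notebook (the test set is small, so collect() is acceptable):
# rows = model.transform(test).select("rating", "prediction").collect()
# print(rmse((row["rating"], row["prediction"]) for row in rows))
```

For larger datasets, RegressionEvaluator from pyspark.ml.evaluation with metricName="rmse" computes the same metric without collecting rows to the driver.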

Use Azure AI services to transform data in one step

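For tasks like this one that already have a prebuilt solution, you can instead use SynapseML's integration with Azure AI services to transform your data in one step. Here, the prebuilt TextSentiment transformer scores the sentiment of each review without any training.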

from synapse.ml.cognitive import TextSentiment
from synapse.ml.core.platform import find_secret

model = TextSentiment(
    textCol="text",
    outputCol="sentiment",
    subscriptionKey=find_secret("cognitive-api-key"),  # Replace with your Azure AI services key; see Prerequisites
).setLocation("eastus")

display(model.transform(test))
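
If you run the example somewhere the find_secret lookup isn't available, you could read the key from an environment variable instead of pasting it into the notebook. The following is a minimal sketch of that idea. The helper name resolve_ai_services_key and the variable name AZURE_AI_SERVICES_KEY are hypothetical and aren't defined by SynapseML or Azure.

```python
import os


def resolve_ai_services_key(env_var="AZURE_AI_SERVICES_KEY"):
    """Return the key stored in an environment variable, or raise a clear error.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Azure AI services key"
        )
    return key
```

You would then pass subscriptionKey=resolve_ai_services_key() when you construct TextSentiment. A missing key then fails with a clear message rather than an authentication error from the service.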
