
Recipe: Intelligent Art Exploration with the Cognitive Services for Big Data

In this example, we'll use the Cognitive Services for Big Data to add intelligent annotations to the Open Access collection from the Metropolitan Museum of Art (MET). This lets us build an intelligent search engine with Azure Search without any manual annotation.

Prerequisites

Import Libraries

Run the following command to import the libraries for this recipe.

import os, sys, time, json, requests
from pyspark.ml import Transformer, Estimator, Pipeline
from pyspark.ml.feature import SQLTransformer
from pyspark.sql.functions import lit, udf, col, split

Set up Subscription Keys

Run the following command to set up variables for the service keys. Insert your subscription keys for Computer Vision and Azure Cognitive Search.

VISION_API_KEY = 'INSERT_COMPUTER_VISION_SUBSCRIPTION_KEY'
AZURE_SEARCH_KEY = 'INSERT_AZURE_COGNITIVE_SEARCH_SUBSCRIPTION_KEY'
search_service = "mmlspark-azure-search"
search_index = "test"
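As an alternative to pasting keys into the notebook, the same variables could be read from environment variables. The following is a minimal sketch, not part of the original recipe: the helper `get_key` and the environment variable names `VISION_API_KEY` and `AZURE_SEARCH_KEY` are assumptions chosen for illustration.

```python
import os

def get_key(env_name, placeholder):
    """Return the value of environment variable `env_name`, or `placeholder` if it is unset.

    `get_key` is a hypothetical helper for this sketch, not part of mmlspark.
    """
    return os.environ.get(env_name, placeholder)

# Assumed variable names; fall back to the recipe's placeholders when unset.
VISION_API_KEY = get_key("VISION_API_KEY", "INSERT_COMPUTER_VISION_SUBSCRIPTION_KEY")
AZURE_SEARCH_KEY = get_key("AZURE_SEARCH_KEY", "INSERT_AZURE_COGNITIVE_SEARCH_SUBSCRIPTION_KEY")
```

With this approach the keys stay out of the notebook source, which helps when the notebook is shared or checked into version control.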

Read the Data

Run the following command to load data from the MET's Open Access collection.

data = spark.read\
  .format("csv")\
  .option("header", True)\
  .load("wasbs://publicwasb@mmlspark.blob.core.windows.net/metartworks_sample.csv")\
  .withColumn("searchAction", lit("upload"))\
  .withColumn("Neighbors", split(col("Neighbors"), ",").cast("array<string>"))\
  .withColumn("Tags", split(col("Tags"), ",").cast("array<string>"))\
  .limit(25)
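The `split(col(...), ",")` calls above turn the comma-separated `Neighbors` and `Tags` strings into arrays. The plain-Python sketch below illustrates the per-row effect for a literal comma separator. The helper `split_csv_field` and the sample value are hypothetical; Spark's `split` treats its pattern as a regular expression, which for a bare comma behaves the same as this literal split.

```python
def split_csv_field(value, sep=","):
    """Split a delimited string into a list of strings; pass None through unchanged.

    Hypothetical helper that mimics, for one row, what the Spark column
    expression does for the whole column.
    """
    if value is None:
        return None
    return value.split(sep)

# Made-up sample value, for illustration only.
print(split_csv_field("12,34,56"))
```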

Analyze the Images

Run the following command to apply Computer Vision to the MET's Open Access artworks collection. As a result, you'll get visual features for each artwork.

from mmlspark.cognitive import AnalyzeImage
from mmlspark.stages import SelectColumns

# Define the image-analysis transformer
describeImage = (AnalyzeImage()
  .setSubscriptionKey(VISION_API_KEY)
  .setLocation("eastus")
  .setImageUrlCol("PrimaryImageUrl")
  .setOutputCol("RawImageDescription")
  .setErrorCol("Errors")
  .setVisualFeatures(["Categories", "Tags", "Description", "Faces", "ImageType", "Color", "Adult"])
  .setConcurrency(5))

df2 = describeImage.transform(data)\
  .select("*", "RawImageDescription.*").drop("Errors", "RawImageDescription")
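To give a sense of what the flattened analysis columns contain, here is a small sketch that summarizes one analysis result represented as a plain Python dictionary. It assumes the result follows the JSON shape of the Computer Vision Analyze Image response (a `tags` list of `name`/`confidence` entries and a `description.captions` list of `text`/`confidence` entries). The helper `summarize_analysis`, the `min_confidence` threshold, and the sample values are made up for illustration and are not part of the recipe.

```python
def summarize_analysis(analysis, min_confidence=0.5):
    """Keep confident tag names and the highest-confidence caption from one result.

    `analysis` is assumed to be a dict shaped like an Analyze Image JSON response.
    """
    tags = [
        t["name"]
        for t in analysis.get("tags", [])
        if t.get("confidence", 0.0) >= min_confidence
    ]
    captions = analysis.get("description", {}).get("captions", [])
    best = max(captions, key=lambda c: c.get("confidence", 0.0), default=None)
    return {"tags": tags, "caption": best["text"] if best else None}

# Made-up sample result, for illustration only.
sample = {
    "tags": [
        {"name": "vase", "confidence": 0.98},
        {"name": "indoor", "confidence": 0.30},
    ],
    "description": {
        "captions": [
            {"text": "a glass vase", "confidence": 0.90},
            {"text": "a bottle", "confidence": 0.40},
        ]
    },
}
print(summarize_analysis(sample))
```

Filtering on confidence before indexing is one possible design choice: it keeps low-confidence tags out of the search index at the cost of some recall.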

Create the Search Index

Run the following command to write the results to Azure Search, creating a search engine for the artworks that uses the enriched metadata from Computer Vision.

from mmlspark.cognitive import *
df2.writeToAzureSearch(
  subscriptionKey=AZURE_SEARCH_KEY,
  actionCol="searchAction",
  serviceName=search_service,
  indexName=search_index,
  keyCol="ObjectID"
)

Query the Search Index

Run the following command to query the Azure Search index.

url = 'https://{}.search.windows.net/indexes/{}/docs/search?api-version=2019-05-06'.format(search_service, search_index)
requests.post(url, json={"search": "Glass"}, headers = {"api-key": AZURE_SEARCH_KEY}).json()
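The query above can be wrapped in small helpers so that the search text and result handling are easier to reuse. This is a sketch under stated assumptions: `build_search_request` and `top_hits` are hypothetical helpers, the `top` parameter limits the number of returned documents, and the response is assumed to carry its documents in a `value` list with an `@search.score` per document. `ObjectID` is the key column used when the index was written.

```python
def build_search_request(service, index, query, api_key, top=5, api_version="2019-05-06"):
    """Build the URL, JSON body, and headers for a search query (no network call)."""
    url = "https://{}.search.windows.net/indexes/{}/docs/search?api-version={}".format(
        service, index, api_version
    )
    body = {"search": query, "top": top}
    headers = {"api-key": api_key}
    return url, body, headers

def top_hits(response_json, key_field="ObjectID"):
    """Return (key, score) pairs from a search response, assuming a `value` list."""
    return [
        (doc.get(key_field), doc.get("@search.score"))
        for doc in response_json.get("value", [])
    ]

# Usage with the variables defined earlier in this recipe:
# url, body, headers = build_search_request(search_service, search_index, "Glass", AZURE_SEARCH_KEY)
# hits = top_hits(requests.post(url, json=body, headers=headers).json())
```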

Next steps

Learn how to use Cognitive Services for Big Data for Anomaly Detection.