2 - Create and load Search Index with Python

Article
07/20/2023

Continue to build your search-enabled website by following these steps:

Create a search resource
Create a new index
Import data with Python using the sample script and Azure SDK azure-search-documents.

Create an Azure AI Search resource

Create a new search resource from the command line using either the Azure CLI or Azure PowerShell. You also retrieve a query key, which grants read access to the index, and the built-in admin key, which grants the write access needed to create the index and add documents.

You must have Azure CLI or Azure PowerShell installed on your device. If you aren't a local admin on your device, choose Azure PowerShell and use the Scope parameter to run as the current user.

Note

This task doesn't require the Visual Studio Code extensions for Azure CLI and Azure PowerShell. Visual Studio Code recognizes the command line tools without the extensions.

The Azure CLI steps are listed first, followed by the equivalent Azure PowerShell steps. Follow one set or the other.

In Visual Studio Code, under Terminal, select New Terminal.
Connect to Azure:
```
az login
```
Before creating a new search service, list the existing services for your subscription:
```
az resource list --resource-type Microsoft.Search/searchServices --output table
```
If you have a service that you want to use, note the name, and then skip ahead to the next section.
Create a new search service. Use the following command as a template, substituting valid values for the resource group, service name, tier, region, partitions, and replicas. The following statement uses the "cognitive-search-demo-rg" resource group created in a previous step and specifies the "free" tier. If your Azure subscription already has a free search service, specify a billable tier such as "basic" instead.
```
az search service create --name my-cog-search-demo-svc --resource-group cognitive-search-demo-rg --sku free --partition-count 1 --replica-count 1
```
Get a query key that grants read access to a search service. A search service is provisioned with two admin keys and one query key. Substitute valid names for the resource group and search service. Copy the query key to Notepad so that you can paste it into the client code in a later step:
```
az search query-key list --resource-group cognitive-search-demo-rg --service-name my-cog-search-demo-svc
```
Get a search service admin API key. An admin API key provides write access to the search service. Copy either one of the admin keys to Notepad so that you can use it in the bulk import step that creates and loads an index:
```
az search admin-key show --resource-group cognitive-search-demo-rg --service-name my-cog-search-demo-svc
```

In Visual Studio Code, under Terminal, select New Terminal.
Connect to Azure:
```
Connect-AzAccount
```
If you have multiple tenants and subscriptions, add the TenantID and SubscriptionID parameters to the Connect-AzAccount cmdlet.
Before creating a new search service, you can list existing search services for your subscription to see if there's one you want to use:
```
Get-AzResource -ResourceType Microsoft.Search/searchServices | ft
```
If you have a service that you want to use, note the name, and then skip ahead to the next section.
Install the Az.Search module (you can omit the Scope parameter if you're a local administrator):
```
Install-Module -Name Az.Search -Scope CurrentUser
```
Create a new search service. Use the following cmdlet as a template, substituting valid values for the resource group, service name, tier, region, partitions, and replicas. The following statement uses the "cognitive-search-demo-rg" resource group created in a previous step and specifies the "free" tier. If your Azure subscription already has a free search service, specify a billable tier such as "basic" instead. For more information about this cmdlet, see Manage your Azure AI Search service with PowerShell.
```
New-AzSearchService -ResourceGroupName "cognitive-search-demo-rg" -Name "my-cog-search-demo-svc" -Sku "free" -Location "West US" -PartitionCount 1 -ReplicaCount 1 -HostingMode Default
```

Get a query key and copy it to Notepad for a future step:
```
Get-AzSearchQueryKey -ResourceGroupName "cognitive-search-demo-rg" -ServiceName "my-cog-search-demo-svc"
```
Get the admin keys and copy either one to Notepad for a future step:
```
Get-AzSearchAdminKeyPair -ResourceGroupName "cognitive-search-demo-rg" -ServiceName "my-cog-search-demo-svc"
```

You now have an Azure AI Search resource and keys used for authenticating requests on connections to the endpoint.
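
If you prefer to capture the keys programmatically rather than copying them by hand, the JSON printed by the two Azure CLI key commands can be parsed with a few lines of Python. The following is a minimal sketch, assuming the CLI's default JSON output has the shape shown in the sample strings (`primaryKey`/`secondaryKey` for admin keys, a list of objects with a `key` property for query keys). The function names and sample values are hypothetical and are not part of the tutorial's sample code.

```python
import json


def parse_admin_key(cli_output: str) -> str:
    """Return the primary admin key from `az search admin-key show` JSON output."""
    return json.loads(cli_output)["primaryKey"]


def parse_query_key(cli_output: str) -> str:
    """Return the first query key from `az search query-key list` JSON output."""
    keys = json.loads(cli_output)
    if not keys:
        raise ValueError("No query keys returned for this search service")
    return keys[0]["key"]


# Made-up sample output (assumed shape); real keys are long random strings.
sample_admin = '{"primaryKey": "ADMIN-KEY-1", "secondaryKey": "ADMIN-KEY-2"}'
sample_query = '[{"key": "QUERY-KEY-1", "name": null}]'

print(parse_admin_key(sample_admin))
print(parse_query_key(sample_query))
```

Either admin key works for the bulk import step, so the sketch simply picks the primary one.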

Prepare the bulk import script for Search

The script uses the Azure SDK for Azure AI Search:

In Visual Studio Code, open the bulk-upload.py file in the subdirectory, search-website-functions-v4/bulk-upload, and replace the following variables with your own values to authenticate with the Azure Search SDK:

YOUR-SEARCH-SERVICE-NAME
YOUR-SEARCH-SERVICE-ADMIN-API-KEY

```
import sys
import json
import requests
import pandas as pd
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    ComplexField,
    CorsOptions,
    SearchIndex,
    ScoringProfile,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
)

# Set the service name (short name) and admin API key
service_name = "YOUR-SEARCH-SERVICE-NAME"
key = "YOUR-SEARCH-SERVICE-ADMIN-API-KEY"
endpoint = "https://{}.search.windows.net/".format(service_name)

# Give your index a name
# You can also supply this at runtime in __main__
index_name = "good-books"

# Search Index Schema definition
index_schema = "./good-books-index.json"

# Books catalog
books_url = "https://raw.githubusercontent.com/Azure-Samples/azure-search-sample-data/main/good-books/books.csv"
batch_size = 1000

# Instantiate a client
class CreateClient(object):
    def __init__(self, endpoint, key, index_name):
        self.endpoint = endpoint
        self.index_name = index_name
        self.key = key
        self.credentials = AzureKeyCredential(key)

    # Create a SearchClient
    # Use this to upload docs to the Index
    def create_search_client(self):
        return SearchClient(
            endpoint=self.endpoint,
            index_name=self.index_name,
            credential=self.credentials,
        )

    # Create a SearchIndexClient
    # This is used to create, manage, and delete an index
    def create_admin_client(self):
        # Use the endpoint stored on the instance, not the module-level global
        return SearchIndexClient(endpoint=self.endpoint, credential=self.credentials)


# Get Schema from File or URL
def get_schema_data(schema, url=False):
    if not url:
        with open(schema) as json_file:
            schema_data = json.load(json_file)
            return schema_data
    else:
        data_from_url = requests.get(schema)
        schema_data = json.loads(data_from_url.content)
        return schema_data


# Create Search Index from the schema
# If reading the schema from a URL, set url=True
def create_schema_from_json_and_upload(schema, index_name, admin_client, url=False):

    cors_options = CorsOptions(allowed_origins=["*"], max_age_in_seconds=60)
    scoring_profiles = []
    schema_data = get_schema_data(schema, url)

    index = SearchIndex(
        name=index_name,
        fields=schema_data["fields"],
        scoring_profiles=scoring_profiles,
        suggesters=schema_data["suggesters"],
        cors_options=cors_options,
    )

    try:
        upload_schema = admin_client.create_index(index)
        if upload_schema:
            print(f"Schema uploaded; Index created for {index_name}.")
        else:
            # No index was returned, so stop with a non-zero exit code
            sys.exit(1)
    except Exception as ex:
        # Catch Exception (not a bare except) so that sys.exit() isn't swallowed
        print("Unexpected error:", ex)


# Convert CSV data to JSON
def convert_csv_to_json(url):
    df = pd.read_csv(url)
    convert = df.to_json(orient="records")
    return json.loads(convert)


# Batch your uploads to Azure Search
def batch_upload_json_data_to_index(json_file, client):
    batch_array = []
    count = 0
    batch_counter = 0
    for i in json_file:
        count += 1
        batch_array.append(
            {
                "id": str(i["book_id"]),
                "goodreads_book_id": int(i["goodreads_book_id"]),
                "best_book_id": int(i["best_book_id"]),
                "work_id": int(i["work_id"]),
                "books_count": i["books_count"] if i["books_count"] else 0,
                "isbn": str(i["isbn"]),
                "isbn13": str(i["isbn13"]),
                "authors": i["authors"].split(",") if i["authors"] else None,
                "original_publication_year": int(i["original_publication_year"])
                if i["original_publication_year"]
                else 0,
                "original_title": i["original_title"],
                "title": i["title"],
                "language_code": i["language_code"],
                "average_rating": int(i["average_rating"])
                if i["average_rating"]
                else 0,
                "ratings_count": int(i["ratings_count"]) if i["ratings_count"] else 0,
                "work_ratings_count": int(i["work_ratings_count"])
                if i["work_ratings_count"]
                else 0,
                "work_text_reviews_count": i["work_text_reviews_count"]
                if i["work_text_reviews_count"]
                else 0,
                "ratings_1": int(i["ratings_1"]) if i["ratings_1"] else 0,
                "ratings_2": int(i["ratings_2"]) if i["ratings_2"] else 0,
                "ratings_3": int(i["ratings_3"]) if i["ratings_3"] else 0,
                "ratings_4": int(i["ratings_4"]) if i["ratings_4"] else 0,
                "ratings_5": int(i["ratings_5"]) if i["ratings_5"] else 0,
                "image_url": i["image_url"],
                "small_image_url": i["small_image_url"],
            }
        )

        # In this sample, we limit batches to 1000 records.
        # When the counter hits a number divisible by 1000, the batch is sent.
        if count % batch_size == 0:
            client.upload_documents(documents=batch_array)
            batch_counter += 1
            print(f"Batch sent! - #{batch_counter}")
            batch_array = []

    # This will catch any records left over, when not divisible by 1000
    if len(batch_array) > 0:
        client.upload_documents(documents=batch_array)
        batch_counter += 1
        print(f"Final batch sent! - #{batch_counter}")

    print("Done!")


if __name__ == "__main__":
    start_client = CreateClient(endpoint, key, index_name)
    admin_client = start_client.create_admin_client()
    search_client = start_client.create_search_client()
    # Create the index from the local schema file
    create_schema_from_json_and_upload(
        index_schema, index_name, admin_client, url=False
    )
    # Download the catalog and upload it in batches
    books_data = convert_csv_to_json(books_url)
    batch_upload_json_data_to_index(books_data, search_client)
    print("Upload complete")
```
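
The script reads the "fields" and "suggesters" sections of the schema file directly, so a schema that lacks either one fails with a KeyError before the index is created. A quick sanity check on the parsed JSON can surface that earlier. The following is a minimal sketch, not part of the sample: the function name `check_index_schema` and the trimmed schema are hypothetical, and the check for exactly one key field reflects the general Azure AI Search requirement that every index has a single document key.

```python
def check_index_schema(schema_data: dict) -> list:
    """Return a list of problems found in a parsed index schema (empty if none)."""
    problems = []
    # The bulk upload script indexes these two sections directly.
    for section in ("fields", "suggesters"):
        if section not in schema_data:
            problems.append(f"missing '{section}' section")
    # An index needs exactly one field marked as the document key.
    key_fields = [f["name"] for f in schema_data.get("fields", []) if f.get("key")]
    if len(key_fields) != 1:
        problems.append(f"expected exactly one key field, found {len(key_fields)}")
    return problems


# Hypothetical, heavily trimmed schema for illustration only.
sample_schema = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
    ],
    "suggesters": [
        {"name": "sg", "searchMode": "analyzingInfixMatching", "sourceFields": ["title"]}
    ],
}

print(check_index_schema(sample_schema))
print(check_index_schema({"fields": []}))
```

In the script, a check like this could run on the result of `get_schema_data` before the `SearchIndex` object is built.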

Open an integrated terminal in Visual Studio Code for the project directory's subdirectory, search-website-functions-v4/bulk-upload, and run the following command to install the dependencies.
macOS/Linux:
```
python3 -m pip install -r requirements.txt
```
Windows:
```
py -m pip install -r requirements.txt
```

Run the bulk import script for Search

Continue using the integrated terminal in Visual Studio Code, in the project directory's subdirectory search-website-functions-v4/bulk-upload, and run the following command to start the bulk-upload.py script:
macOS/Linux:
```
python3 bulk-upload.py
```
Windows:
```
py bulk-upload.py
```
As the code runs, the console displays progress.
When the upload is complete, the last two lines printed to the console are "Done!" followed by "Upload complete".
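
The progress messages come from the batching loop: the script prints one "Batch sent!" line for each full batch of 1000 records and a "Final batch sent!" line for any remainder. The following sketch shows the same chunking idea in isolation, using slicing instead of the script's counter. The helper name `split_into_batches` and the 2,500 placeholder records are hypothetical and chosen only to make the arithmetic visible.

```python
import math


def split_into_batches(records, batch_size=1000):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# 2,500 placeholder records stand in for the parsed CSV rows.
records = [{"id": str(n)} for n in range(2500)]
batches = list(split_into_batches(records))

# Two full batches and one partial batch.
print([len(b) for b in batches])
# The number of upload calls is the record count divided by the batch size, rounded up.
print(math.ceil(len(records) / 1000))
```

With these placeholder records, the script's loop would print two "Batch sent!" lines and one "Final batch sent!" line.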

Review the new Search Index

Once the upload completes, the search index is ready to use. Review your new index in the Azure portal.

In the Azure portal, find the search service you created in the previous step.
On the left, select Indexes, and then select the good-books index.
By default, the index opens in the Search explorer tab. Select Search to return documents from the index.
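
The same check can be done from Python instead of the portal. The following is a minimal sketch, assuming the azure-search-documents package is installed and that documents carry a "title" field as in the bulk upload script. Because counting and searching are read-only operations, the query key is enough here. The helper names `summarize_index` and `build_search_client` are hypothetical, and `FakeClient` is only a stand-in so the call pattern can be shown without a live service.

```python
def summarize_index(client, sample_size=3):
    """Return the document count and a few titles using a SearchClient-like object."""
    count = client.get_document_count()
    results = client.search(search_text="*", top=sample_size)
    titles = [doc["title"] for doc in results]
    return {"count": count, "sample_titles": titles}


def build_search_client(service_name, query_key, index_name="good-books"):
    """Create a real SearchClient; requires the azure-search-documents package."""
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    endpoint = f"https://{service_name}.search.windows.net/"
    return SearchClient(endpoint, index_name, AzureKeyCredential(query_key))


class FakeClient:
    """Stand-in used only to demonstrate the call pattern without a live service."""

    def get_document_count(self):
        return 2

    def search(self, search_text, top):
        return [{"title": "Book A"}, {"title": "Book B"}][:top]


print(summarize_index(FakeClient()))
```

To run it against your own service, pass `build_search_client("<your-service-name>", "<your-query-key>")` to `summarize_index` in place of the fake client.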

Roll back bulk import file changes

Use the following git command in the Visual Studio Code integrated terminal, in the bulk-upload directory, to roll back the changes. They aren't needed to continue the tutorial, and you shouldn't save or push these secrets to your repo.
```
git checkout .
```
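
One way to avoid writing secrets into the file in the first place is to read them from environment variables. The following is a minimal sketch of that approach, not what the sample script does: the variable names `SEARCH_SERVICE_NAME` and `SEARCH_ADMIN_KEY` and the function `load_search_settings` are hypothetical, and the endpoint format is the one the script already uses.

```python
import os


def load_search_settings(env=None):
    """Build the endpoint and admin key from environment variables."""
    env = os.environ if env is None else env
    required = ("SEARCH_SERVICE_NAME", "SEARCH_ADMIN_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    endpoint = "https://{}.search.windows.net/".format(env["SEARCH_SERVICE_NAME"])
    return endpoint, env["SEARCH_ADMIN_KEY"]


# Demonstration with an in-memory mapping instead of the real environment.
endpoint, key = load_search_settings(
    {"SEARCH_SERVICE_NAME": "my-cog-search-demo-svc", "SEARCH_ADMIN_KEY": "PLACEHOLDER"}
)
print(endpoint)
```

Calling `load_search_settings()` with no argument reads from the process environment, so the key never appears in a tracked file.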

Copy your Search resource name

Note your Search resource name. You'll need this to connect the Azure Function app to your search resource.

Caution

Although you might be tempted to use your search admin key in the Azure Function, doing so violates the principle of least privilege. The Azure Function uses the query key instead, which grants only read access.

Next steps

Deploy your Static Web App