February 2017

Volume 32 Number 2

[Data Points]

First Look at Azure Search—a Handheld Walk-Through

By Julie Lerman | February 2017

I’ve been curious about Azure Search since I first heard about it from Pablo Castro, whom I knew as one of the originators of Entity Framework and OData. He’s currently a director of engineering at Microsoft and has always been a true innovator and big-time data geek. His Medium article on how Azure Search came about as a startup project at Microsoft is a fascinating read (bit.ly/2gzVFTQ).

Azure Search is a service that lets you use Microsoft’s processing and intelligence to add sophisticated search capabilities to your own data. The service builds its own search indexes from your data, keeping its own “copy” of your data on which it can perform its searches. You can automate Azure Search’s process of updating the indexes as your data changes and this is even easier if your data source is an Azure data store such as DocumentDB.

I found it easier to understand the basics of Azure Search by diving in, so that’s what I’ll do in this article, rather than just trying to explain. You can create the service and indexes in code using REST or other APIs such as the .NET Client. But you can also set things up visually in the Azure Portal, which is where I’ll begin: creating and querying a free search service in the Azure Portal with a sample data store. I’ll handhold you through this first exploration and share with you some of my discoveries. With the comprehension that this first step provides, you’ll be prepared to dig further into the service’s capabilities and begin interacting with it using the REST API or the .NET Client API.

If you have an MSDN subscription, you already have an Azure subscription you can use to follow along. We’ll use a free service so as to not eat up your credits. If you don’t currently have an Azure subscription, you can quickly set up a free trial at azure.com/free.

Create a New Search Service

After choosing to add a new resource in the Azure Portal, you’ll see Search within the “Web + Mobile” category, although I find the easiest path is to just type search in the search box. (That’s awfully meta, isn’t it?) Setting the service up requires that you provide a service name, which becomes the first part of its URL. I’ll use thedatafarm, which gives me the URL thedatafarm.search.windows.net. As with any resource, you have to choose to add this into an existing resource group or to create a new one. I don’t have anything else related to my service yet, so I created a new resource group that I named datafarmsearchgroup in East US. The last bit of info for setting up the resource is to select a pricing tier. Azure Search has a free pricing tier that’s great for testing out the service. It allows for 10,000 documents, three indexes, and 50MB of storage with no scaling. You’re allowed to set up one free Search Service in your subscription. Be sure you click the Select button after choosing the Free tier or you’ll get the default Standard. Guess how I found out about that mistake.

When you click Create, Azure will verify your settings and if everything checks out, it should take only a few seconds for the service to be live.

So now you have a service, but what will you search? On my first go-round with Azure Search I spent a lot of time looking for a search-worthy data set and then figuring out how to share that data with my service. It was fun, but in hindsight, I wish I’d begun with one of the sample data sets that Azure Search provides just to get my feet wet. So that’s what I’ll do here.

Search is performed not on data, but on indexes of data, and the indexes are in the form of documents. For us relational database people, an index is roughly like a table while a document is akin to an individual row in a table—a single unit of data. So the first thing to do is create an index of the data you want to search. The data can come from a variety of resources. Creating indexes from data that lives in another Azure service is the quickest path, although this is simplest to do with services in the same subscription as the Search. You can point to an existing Azure DocumentDB, an Azure SQL database or a SQL Server that lives on an Azure VM. As I’m writing this, Azure Table Storage and Blob Storage support are currently in preview. Azure Search gives you access to some pre-created sample data. So rather than begin by clicking the Add index option, choose Import data, which will also create the first index for you.

In the Import data blade, shown in Figure 1, choose Samples and then the realestate-us-sample data. Today as I write this article, that’s the only option, but there are more samples coming. You can see by its icon that this is an Azure SQL Database, so you’ll be creating an index of documents from the relational data. Search will transform the data into the structure it needs for indexing. It’s a three-step process. First, you select the data source, then you define the index and, finally, you import the data into your index.

Figure 1 Azure Search Already Knows How to Connect to Data Stored in Azure

Define an Index

After you select the sample, the portal will present you with a grid listing all of the fields it discovered in the sample data source. This is the step where you define your index and, in this case, the service’s wizard has pre-defined the index for you (Figure 2). This is your only opportunity to fine-tune the index before importing the data. Redefining an index means re-importing your data, which may not be desirable in a production environment.

Figure 2 First Few Rows of the Default Index Created from the realestate-us-sample Data

An index against Azure SQL Database or SQL Server can target only a single table or view. If you’re using a non-relational data store as your source, Search comprehends the full graphs that are stored within a particular document or blob. So what you see here is from the single table in the database. There are 26 fields, and the columns across the grid let you define how each field will be involved in the search, using the options Retrievable, Filterable, Sortable, Facetable and Searchable. So, as you can already see, search provides some flexibility beyond just typing in a search term. The index wizard selected some defaults for you.

You might not need to be able to search on all of the fields. For example, there are seven description fields, one in English and each of the others in a different language. If the application that’s using this resource will be used only by people in Quebec, and you plan to support only French and English searches, you could delete the other five fields from the search index. In the portal, you do this by right-clicking on the field and choosing Delete.

Defining indexes is something that takes some thought and planning, but I’ll just go with the selections the importing tool made. But do take some time to look over the grid to understand what’s being defined. Notice, for example, that the description fields are searchable and retrievable, but not filterable or sortable. In contrast, simple, scalar fields like square feet (sqft) and price are both sortable and filterable.
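If you later define an index in code rather than in the portal grid, the same choices show up as per-field attributes. Here is a minimal sketch of what such a definition might look like as a JSON payload built in Python. The attribute names (searchable, filterable, sortable, facetable, retrievable, key) match the grid columns, but the index name, the listingId key field, the description_fr field name and all of the Edm types are my assumptions for illustration, not something copied from the sample.

```python
import json


def field(name, edm_type, **flags):
    """Build one index field. Search attributes default to off and
    retrievable defaults to on, unless a flag overrides them."""
    attrs = {
        "key": False,
        "searchable": False,
        "filterable": False,
        "sortable": False,
        "facetable": False,
        "retrievable": True,
    }
    attrs.update(flags)
    return {"name": name, "type": edm_type, **attrs}


index_definition = {
    "name": "realestate-us-sample",  # assumed index name
    "fields": [
        # Assumed key field; every index needs exactly one key.
        field("listingId", "Edm.String", key=True, filterable=True),
        # Simple scalar fields: good candidates for filtering and sorting.
        field("beds", "Edm.Int32", filterable=True, sortable=True, facetable=True),
        field("sqft", "Edm.Int32", filterable=True, sortable=True, facetable=True),
        field("price", "Edm.Int64", filterable=True, sortable=True, facetable=True),
        # Free-text fields: searchable and retrievable, not filterable or sortable.
        field("description", "Edm.String", searchable=True),
        field("description_fr", "Edm.String", searchable=True),  # assumed name
    ],
}

print(json.dumps(index_definition, indent=2))
```

Keeping only the English and French description fields mirrors the Quebec example: fewer searchable text fields means less to index.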

Using an Indexer to Import the Data

Now that the index is defined, you can go ahead and pull in the data. The index definition will have a huge impact on how the data is imported. Remember that the data is being pulled into the index as opposed to an index being applied to the data. The amount of time this takes will depend on the size, structure and location of the data, as well as the index definition. For example, indexing seven different description fields will take more time than indexing only two of them.

You may already be wondering what happens when the data in your data source changes. Azure Search has a few mechanisms for updating the index and in many cases this can be automated as part of your Azure Search service definition. The rules and parameters around this change detection and data movement differ depending on the data source. You can learn more about this in the documentation.

Next, the wizard will ask you to create an indexer. An indexer is a crawler that reads data from the data source and populates the target index. It’s what performs the initial indexing, as well as updating the index either based on a defined schedule or on demand. Because I’ve chosen an Azure SQL Database data source, the wizard will use a specially defined indexer that knows how to crawl an Azure SQL Database. Like every resource, the indexer needs a name. I’ve called mine defaultindexer, which might not be a best practice in naming, but got the job done for this demonstration.
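To make the indexer idea a little more concrete, here is a rough sketch of the kind of JSON body an indexer definition might have if you created it through the REST API instead of the wizard. It ties a data source to a target index and optionally adds a schedule. The data source and index names are hypothetical, and the one-hour interval (an ISO 8601 duration) is just an example I picked; the wizard’s actual settings may differ.

```python
import json


def build_indexer(name, data_source, target_index, interval=None):
    """Return an indexer definition as a plain dict.

    interval is an ISO 8601 duration such as "PT1H" (every hour).
    Leaving it out describes an indexer that only runs on demand.
    """
    body = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
    }
    if interval is not None:
        body["schedule"] = {"interval": interval}
    return body


# Hypothetical names for the data source and the index.
indexer = build_indexer(
    "defaultindexer",
    data_source="realestate-us-sample",
    target_index="realestate-us-sample",
    interval="PT1H",
)

print(json.dumps(indexer, indent=2))
```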

When you click OK, the indexing will begin. The portal will pop up a notification to let you know that it’s started and that you can watch its progress in the indexer blade. My import happened so quickly that by the time I looked, it had already completed. In Figure 3, I’ve scrolled down the page in the blade for thedatafarm search service. As you can see, the Indexers box is highlighted and, because I clicked that, the Indexer blade is open to the right, showing that defaultindexer finished its job, creating 4,959 search docs from the data source.

Figure 3 The Indexer Blade in My Search Service Shows the Status of the Data Import

Checking Out the Searchable Documents

Now the index is ready for searching. The search explorer is a good way to get a feel for searching, as well as to test searches directly before implementing them in your code. If you’ve ever worked with OData, you might notice that the way you express filtering, sorting and paging uses OData syntax. Searching itself can be done using one of two query syntaxes. The default syntax is referred to as a simple syntax and the other is the Lucene query syntax.

For your first search, I recommend just clicking the Search button without putting anything into the Query string field. This returns all the data, although it will arrive in pages of 50 at a time. It’s worth a look because this is an unfamiliar data source. It also gives you a feel for how the data is structured.

The results begin with a header that describes the index. The entire array of documents is wrapped in a “value” tag and each document begins with a search score value, and then each field is listed with its value.
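Once you move from the portal to code, that shape is easy to walk through. The sketch below parses a tiny hand-written response that imitates the structure just described: a header, then a “value” array in which each document carries a search score followed by its fields. The document contents and the exact header text are invented for illustration; only the overall layout is taken from what the search explorer shows.

```python
import json

# Hand-written stand-in for a real response; all values are made up.
sample_response = """
{
  "@odata.context": "https://thedatafarm.search.windows.net/indexes('realestate-us-sample')/$metadata#docs",
  "value": [
    {"@search.score": 1.0, "listingId": "1", "beds": 3, "description": "A bright condominium."},
    {"@search.score": 0.5, "listingId": "2", "beds": 2, "description": "A quiet house."}
  ]
}
"""


def parse_results(response_text):
    """Split each document into (score, fields-without-the-score)."""
    payload = json.loads(response_text)
    results = []
    for doc in payload.get("value", []):
        fields = {k: v for k, v in doc.items() if k != "@search.score"}
        results.append((doc.get("@search.score"), fields))
    return results


for score, fields in parse_results(sample_response):
    print(score, fields["listingId"], fields["description"])
```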

Combining Search with OData Filter and Count

For the second search, let’s stick to pure searching. Enter condominium into the query string field and hit Search. Notice the URI ends with &search=condominium.

By default, Azure Search will return paged data with 50 documents at a time, but it’s pretty hard to see what you’re getting here. Let’s add in an OData parameter to ask for the count. Because you’re now using multiple parameters, you’ll need to specify that condominium is part of a search. Here’s the query string:

$count=true&search=condominium

Now in the results, shown in Figure 4, after the index description and before the first value is listed, you can see the count of the results is 399 documents. The search looked in every field that the index defined for searching and returned a count of every document where it found the string “condominium” and then the first 50 of those 399 matching documents. If you scroll to the bottom, you’ll see that there’s a URI to assist you in retrieving the next 50 documents. This is a pattern you may recognize if you’ve worked with OData before.

Figure 4 The Beginning of a Result Set from a Query with OData $count in the Request; the Count Is Part of the Results Root
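In an app, you would typically follow that next-page URI until it stops appearing. Here is a small sketch of that loop. It assumes the count arrives in a property named @odata.count and the next-page URI in @odata.nextLink, which is the usual OData convention. The fetch_json callable is a placeholder of my own so the example runs without a network connection; the fake pages and their contents are invented.

```python
def iter_documents(first_url, fetch_json):
    """Yield every document, following next-page links until none is left.

    fetch_json is any callable that takes a URL and returns the parsed
    JSON response as a dict.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")


# Invented in-memory pages standing in for real HTTP responses.
fake_pages = {
    "page1": {
        "@odata.count": 3,
        "value": [{"listingId": "1"}, {"listingId": "2"}],
        "@odata.nextLink": "page2",
    },
    "page2": {
        "value": [{"listingId": "3"}],
    },
}

total = fake_pages["page1"]["@odata.count"]
ids = [doc["listingId"] for doc in iter_documents("page1", fake_pages.__getitem__)]
print(total, ids)
```

A real fetch_json would issue an HTTP GET against the search service and include the service’s API key in the request headers.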

To search the results pane, first make sure your cursor is somewhere in the results and then use the browser find command (for example Ctrl+F). A special search box pops up that’s clearly different from the browser’s search box. Type condominium and click through the find results and you’ll see that word pop up in the description and tag fields, and perhaps in others, as well, but I only looked at a few.

Remember that some fields were defined in the index to be filterable. Modify the query to add in a filter for three-bedroom listings; in other words, where the beds field is three. Filter is an OData parameter and requires a dollar sign in front of it. Here’s the new query string using the OData syntax for filtering:

$count=true&search=condominium&$filter=beds eq 3

The count shows we’re down to 73 documents.
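When you leave the search explorer behind, you’ll have to assemble that query string yourself, including the encoding of the spaces in the filter. The following is a minimal sketch using only the Python standard library. The index name and the api-version value are assumptions on my part, so check what your own service expects; the function only builds the URL and doesn’t call the service.

```python
from urllib.parse import quote, urlencode


def build_search_url(service, index, search=None, count=False,
                     odata_filter=None, api_version="2016-09-01"):
    """Assemble a document-search URL for the given service and index.

    The api_version default is an assumed value, not one taken from the portal.
    """
    params = [("api-version", api_version)]
    if count:
        params.append(("$count", "true"))
    if search is not None:
        params.append(("search", search))
    if odata_filter:
        params.append(("$filter", odata_filter))
    # Keep "$" readable and encode spaces as %20 rather than "+".
    query = urlencode(params, safe="$", quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


url = build_search_url(
    "thedatafarm",
    "realestate-us-sample",  # assumed index name
    search="condominium",
    count=True,
    odata_filter="beds eq 3",
)
print(url)
```

Building the parameters as a list keeps them in the same order as the query string typed into the explorer, which makes the two easy to compare.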

Exploring Azure Search Even Further

At this point, I began to have a better idea of how I might want to define my index, a decision that should be strongly influenced by what you want your app to accomplish. You may have various apps that access the same data store but require different capabilities or different data in the results. With Azure Search you can build separate indexes to satisfy those different needs, and because each index has its own set of data, the apps won’t compete with one another for resources, which could otherwise degrade performance.

I like to be able to visualize what I’m working with. Starting my Azure Search education through the portal made it possible for me to visualize the index, and visualize the results. It also allowed me to test out queries. With that experience under my belt, I have more confidence to try Azure Search in an app using the .NET Client or simply executing queries directly with REST. I can use my own databases or other sample data. One experiment I’ve already done was to import data from the public data set shared by the Cooper-Hewitt Museum at bit.ly/2hrOCej. I uploaded a set of JSON documents from their objects data set into an Azure DocumentDB, then built a search service and an index from that. After this project, I discovered that one of the sample applications from the Azure team had gone in a similar direction using publicly available data from the Tate Gallery. You can play with their demo at bit.ly/2gxoHQL to witness more of the power of Azure Search, such as the use of faceted navigation. (Facets are another option to set up when you define an index.) The sample uses JavaScript and interacts with Azure Search directly through the URIs, so looking at the code on GitHub (bit.ly/2gI1ej9) isn’t only educational, but also an impressive view of how simple your own logic can be because Azure Search does all of the heavy lifting for you.

I hope this first look at Azure Search will get you over any initial fear that this is a big, daunting service meant for advanced Azure experts only. Azure exists to take on the hard stuff so that you don’t have to, and Azure Search is a great example of this because it takes care of the hard part of adding search capabilities into your apps.

Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at juliel.me/PS-Videos.

Thanks to the following Microsoft technical expert for reviewing this article: Pablo Castro
Pablo is a Software Architect in the Data Group at Microsoft. He’s currently the director of engineering for Azure Search, a global scale search-as-a-service product that’s part of the Azure cloud platform.
