Searching Blob storage with Azure Search
Searching across the variety of content types stored in Azure Blob storage can be difficult. With Azure Search, however, you can index and search the content of your blobs in just a few clicks. Searching over Blob storage requires provisioning an Azure Search service. The service limits and pricing tiers of Azure Search are listed on the pricing page.
What is Azure Search?
Azure Search is a search service that makes it easy for developers to add robust full-text search experiences to web and mobile applications. As a service, Azure Search removes the need to manage any search infrastructure while offering a 99.9% uptime SLA.
Index and search enterprise document formats
With support for document extraction in Azure Blob storage, you can index the following content:
- Microsoft Office formats: DOCX/DOC, XLSX/XLS, PPTX/PPT, MSG (Outlook emails)
- Plain text files (see also Indexing plain text)
- JSON (see Indexing JSON blobs)
- CSV (see Indexing CSV blobs preview feature)
Support for CSV and JSON arrays is currently in preview. These formats are available only through version 2016-09-01-Preview of the REST API or version 2.x-preview of the .NET SDK. Preview APIs are intended for testing and evaluation and should not be used in production environments.
By extracting text and metadata from these file types, you can search across multiple file formats with a single query.
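As a rough illustration of what "a single query" looks like against the REST API, the sketch below only builds the query URL; it does not send anything. The service name `my-search-service`, the index name `blob-index`, and the helper `build_search_url` are hypothetical, and the API version is an assumption. A real request would also need an `api-key` header.

```python
from urllib.parse import urlencode

API_VERSION = "2016-09-01"  # assumption: adjust to the API version you target


def build_search_url(service_name, index_name, search_text, top=10):
    """Return the GET URL for a simple full-text query against one index."""
    query = urlencode(
        {
            "api-version": API_VERSION,
            "search": search_text,
            "$top": top,
        },
        safe="$",  # keep the "$" of OData-style parameters unescaped
    )
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs?{query}"
    )


# Hypothetical service and index names, for illustration only.
url = build_search_url("my-search-service", "blob-index", "quarterly report")
print(url)
```

Because every blob is extracted into the same index, the same URL returns matches whether the text came from a DOCX file, an email, or a plain text file.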
Search through your blob metadata
A common way to make blobs of any content type easy to sort through is to index both the custom metadata and the system properties of each blob. This way, information for all blobs is indexed regardless of document type. You can then sort, filter, and facet across all Blob storage content.
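A minimal sketch of an index definition for this scenario follows. It assumes the blob indexer's `metadata_storage_*` system property names and that custom metadata is mapped to string fields of the same name. The key field `id`, the helper `build_metadata_index`, and the custom metadata names `author` and `department` are hypothetical; populating the key from the blob path is left to the indexer configuration.

```python
import json


def build_metadata_index(index_name, custom_metadata=()):
    """Build an index definition for blob system properties and custom metadata."""
    fields = [
        # Hypothetical key field; the indexer must be configured to fill it.
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "metadata_storage_name", "type": "Edm.String",
         "searchable": True, "sortable": True},
        {"name": "metadata_storage_content_type", "type": "Edm.String",
         "filterable": True, "facetable": True},
        {"name": "metadata_storage_size", "type": "Edm.Int64",
         "filterable": True, "sortable": True},
        {"name": "metadata_storage_last_modified", "type": "Edm.DateTimeOffset",
         "filterable": True, "sortable": True},
    ]
    # Assumption: each custom metadata property is stored as a string.
    for name in custom_metadata:
        fields.append({"name": name, "type": "Edm.String",
                       "searchable": True, "filterable": True, "facetable": True})
    return {"name": index_name, "fields": fields}


index = build_metadata_index("blob-metadata", custom_metadata=["author", "department"])
print(json.dumps(index, indent=2))
```

Marking a field as filterable, sortable, or facetable in the index definition is what allows the corresponding operation at query time, so the attributes above should be chosen to match the filters and facets you plan to expose.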
Azure Search’s full-text search, faceted navigation, and sorting capabilities can now be applied to the metadata of images stored in blobs.
If these images are pre-processed using the Computer Vision API from Microsoft’s Cognitive Services, it is possible to index the visual content found in each image, including text recognized through OCR and handwriting recognition. We are working on adding OCR and other image processing capabilities directly to Azure Search. If you are interested in these capabilities, submit a request on our UserVoice or email us.
Index and search through JSON blobs
Azure Search can be configured to extract structured content found in blobs that contain JSON. Azure Search can read JSON blobs and parse the structured content into the appropriate fields of an Azure Search document. Azure Search can also take blobs that contain an array of JSON objects and map each element to a separate Azure Search document.
JSON parsing is not currently configurable through the portal. Learn more about JSON parsing in Azure Search.
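Since JSON parsing is configured outside the portal, an indexer definition submitted through the REST API might look like the sketch below. It assumes the `parsingMode` configuration setting with the values `json` (one document per blob) and `jsonArray` (one document per array element); check these against the API version you use. The helper `build_json_indexer` and the resource names are hypothetical.

```python
import json

# Per the preview note earlier in this article, JSON arrays need the preview API.
PREVIEW_API_VERSION = "2016-09-01-Preview"


def build_json_indexer(name, data_source, target_index, split_arrays=False):
    """Build an indexer definition that parses blobs as JSON.

    split_arrays=False: each blob becomes one search document.
    split_arrays=True: each element of a JSON array becomes its own document.
    """
    parsing_mode = "jsonArray" if split_arrays else "json"  # assumed setting values
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "parameters": {"configuration": {"parsingMode": parsing_mode}},
    }


# Hypothetical resource names, for illustration only.
indexer = build_json_indexer("json-indexer", "blob-datasource", "json-index",
                             split_arrays=True)
print(json.dumps(indexer, indent=2))
```

The resulting dictionary is the request body only; it would be posted to the service's `indexers` collection together with an admin `api-key` header.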
Azure Search can be added to blobs directly from the Blob storage portal page.
Click Add Azure Search to launch a flow where you can select an existing Azure Search service or create a new one. Creating a new service takes you out of your Storage account's portal experience. Once the service is created, navigate back to the Storage portal page and select Add Azure Search again, then choose the service you just created.
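The same connection can also be scripted instead of clicked through. The sketch below prepares, but does not send, a REST request that registers a blob container as a data source. It assumes the `azureblob` data source type and the `datasources` collection of the REST API; the helpers `build_datasource` and `build_create_request`, the placeholder credentials, and all resource names are hypothetical.

```python
import json
import urllib.request


def build_datasource(name, connection_string, container):
    """Build a data source definition pointing at one blob container."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def build_create_request(service_name, api_key, collection, body,
                         api_version="2016-09-01"):
    """Prepare (but do not send) a POST request that creates a resource."""
    url = (f"https://{service_name}.search.windows.net/"
           f"{collection}?api-version={api_version}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values; substitute your own service, key, and storage account.
datasource = build_datasource("blob-datasource",
                              "<storage-connection-string>", "my-container")
request = build_create_request("my-search-service", "<admin-api-key>",
                               "datasources", datasource)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would submit it; an index and an indexer that reference the data source would still need to be created before any content becomes searchable.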
Learn more about the Azure Search Blob Indexer in the full documentation.