Predefined skills for content enrichment (Azure Search)
In this article, you learn about the cognitive skills provided with Azure Search. A cognitive skill is an operation that transforms content in some way. Often, it is a component that extracts data or infers structure, and therefore augments our understanding of the input data. The output is almost always text-based. A skillset is a collection of skills that defines the enrichment pipeline.
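As a purely conceptual illustration of the paragraph above, the sketch below models a skill as a function that adds fields to a document and a skillset as an ordered list of such functions. It is not the Azure Search API: the names `extract_candidate_entities`, `count_entities`, and `run_skillset` are hypothetical, and the toy logic stands in for the pretrained models that real skills use.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Skill = Callable[[Document], Document]


def extract_candidate_entities(doc: Document) -> Document:
    """Toy skill: treat capitalized words as candidate entities."""
    enriched = dict(doc)
    words = str(doc["content"]).split()
    enriched["candidate_entities"] = [
        w.strip(".,") for w in words if w[:1].isupper()
    ]
    return enriched


def count_entities(doc: Document) -> Document:
    """Toy skill: consume the previous skill's output and add a count."""
    enriched = dict(doc)
    enriched["entity_count"] = len(doc["candidate_entities"])
    return enriched


def run_skillset(skills: List[Skill], doc: Document) -> Document:
    """Apply each skill in order; each one augments the document."""
    for skill in skills:
        doc = skill(doc)
    return doc


result = run_skillset(
    [extract_candidate_entities, count_entities],
    {"content": "Contoso opened an office in Seattle."},
)
print(result)
```

The second skill reads a field that the first one produced, which is the sense in which a skillset defines a pipeline rather than an unordered bag of operations.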
Starting December 21, 2018, you will be able to associate a Cognitive Services resource with an Azure Search skillset. This will allow us to start charging for skillset execution. On this date, we will also begin charging for image extraction as part of the document-cracking stage. Text extraction from documents will continue to be offered at no additional cost.
The execution of built-in skills will be charged at the existing Cognitive Services pay-as-you-go price. Image extraction will be charged at preview pricing, which is described on the Azure Search pricing page.
Several skills are flexible in what they consume or produce. Most skills are based on pretrained models, which means you cannot train the model using your own training data. For guidance on creating a custom skill, see How to define a custom interface and Example: creating a custom skill. The following table enumerates and describes the skills provided by Microsoft.
| Skill | Description |
|---|---|
| Microsoft.Skills.Text.KeyPhraseSkill | This skill uses a pretrained model to detect important phrases based on term placement, linguistic rules, proximity to other terms, and how unusual the term is within the source data. |
| Microsoft.Skills.Text.LanguageDetectionSkill | This skill uses a pretrained model to detect which language is used (one language ID per document). When multiple languages are used within the same text segment, the output is the LCID of the predominantly used language. |
| Microsoft.Skills.Text.MergerSkill | Consolidates text from a collection of fields into a single field. |
| Microsoft.Skills.Text.NamedEntityRecognitionSkill | This skill uses a pretrained model to identify entities in a fixed set of categories: people, locations, and organizations. |
| Microsoft.Skills.Text.SentimentSkill | This skill uses a pretrained model to score positive or negative sentiment on a record-by-record basis. The score is between 0 and 1. A neutral score is returned both in the null case, when sentiment cannot be detected, and for text that is considered neutral. |
| Microsoft.Skills.Text.SplitSkill | Splits text into pages so that you can enrich or augment content incrementally. |
| Microsoft.Skills.Vision.ImageAnalysisSkill | This skill uses an image detection algorithm to identify the content of an image and generate a text description. |
| Microsoft.Skills.Vision.OcrSkill | Performs optical character recognition (OCR) to extract text from images. |
| Microsoft.Skills.Util.ShaperSkill | Maps output to a complex type (a multi-part data type, which might be used for a full name, a multi-line address, or a combination of last name and a personal identifier). |
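To show how skills from the table might be combined, here is a minimal sketch that builds a skillset definition as a Python dictionary and prints it as JSON. The skill type names come from the table; everything else is an assumption to verify against the skillset REST reference: the `#` prefix on `@odata.type`, the `context`, `inputs`, and `outputs` layout, the input and output names (`text`, `languageCode`, `keyPhrases`), and the `/document/...` source paths. The helper `make_skill` is hypothetical and exists only to keep the example short.

```python
import json


def make_skill(odata_type, context, inputs, outputs):
    """Hypothetical helper: assemble one skill entry for a skillset body."""
    return {
        "@odata.type": odata_type,
        "context": context,
        # Each input maps a skill parameter to a path in the enriched document.
        "inputs": [{"name": n, "source": s} for n, s in inputs.items()],
        # Each output names the field where the skill's result is stored.
        "outputs": [{"name": n, "targetName": t} for n, t in outputs.items()],
    }


skillset = {
    "description": "Detect the language, then extract key phrases.",
    "skills": [
        make_skill(
            "#Microsoft.Skills.Text.LanguageDetectionSkill",
            "/document",
            {"text": "/document/content"},
            {"languageCode": "languageCode"},
        ),
        make_skill(
            "#Microsoft.Skills.Text.KeyPhraseSkill",
            "/document",
            {
                "text": "/document/content",
                # Assumed path: the output of the language detection skill.
                "languageCode": "/document/languageCode",
            },
            {"keyPhrases": "keyPhrases"},
        ),
    ],
}

print(json.dumps(skillset, indent=2))
```

In this sketch the key phrase skill consumes the language code produced by the language detection skill, so the order of enrichment follows from the input and output wiring rather than from the position of each entry in the list.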