Built-in skills for text and image processing during indexing (Azure Cognitive Search)

This article describes the skills provided with Azure Cognitive Search that you can include in a skillset to extract content and structure from raw unstructured text and image files. A skill is an atomic operation that transforms content in some way. Often, it is an operation that recognizes or extracts text, but it can also be a utility skill that reshapes the enrichments that are already created. Typically, the output is text-based so that it can be used in full text queries.

Built-in skills

Built-in skills are based on pretrained models from Microsoft, which means you cannot train the model using your own training data. Skills that call the Cognitive Services APIs have a dependency on those services and are billed at the Cognitive Services pay-as-you-go price when you attach a resource. Other skills are metered by Azure Cognitive Search, or are utility skills that are available at no charge.

The following table enumerates and describes the built-in skills.

OData type | Description | Metered by
Microsoft.Skills.Text.CustomEntityLookupSkill | Looks for text from a custom, user-defined list of words and phrases. | Azure Cognitive Search (pricing)
Microsoft.Skills.Text.KeyPhraseExtractionSkill | This skill uses a pretrained model to detect important phrases based on term placement, linguistic rules, proximity to other terms, and how unusual the term is within the source data. | Cognitive Services (pricing)
Microsoft.Skills.Text.LanguageDetectionSkill | This skill uses a pretrained model to detect which language is used (one language ID per document). When multiple languages are used within the same text segments, the output is the LCID of the predominantly used language. | Cognitive Services (pricing)
Microsoft.Skills.Text.MergeSkill | Consolidates text from a collection of fields into a single field. | Not applicable
Microsoft.Skills.Text.V3.EntityLinkingSkill | This skill uses a pretrained model to generate links for recognized entities to articles in Wikipedia. | Cognitive Services (pricing)
Microsoft.Skills.Text.V3.EntityRecognitionSkill | This skill uses a pretrained model to establish entities for a fixed set of categories: "Person", "Location", "Organization", "Quantity", "DateTime", "URL", "Email", "PersonType", "Event", "Product", "Skill", "Address", "Phone Number" and "IP Address" fields. | Cognitive Services (pricing)
Microsoft.Skills.Text.PIIDetectionSkill | This skill uses a pretrained model to extract personal information from a given text. The skill also gives various options for masking the detected personal information entities in the text. | Cognitive Services (pricing)
Microsoft.Skills.Text.V3.SentimentSkill | This skill uses a pretrained model to assign sentiment labels (such as "negative", "neutral" and "positive") based on the highest confidence score found by the service at a sentence and document-level on a record by record basis. | Cognitive Services (pricing)
Microsoft.Skills.Text.SplitSkill | Splits text into pages so that you can enrich or augment content incrementally. | Not applicable
Microsoft.Skills.Text.TranslationSkill | This skill uses a pretrained model to translate the input text into a variety of languages for normalization or localization use cases. | Cognitive Services (pricing)
Microsoft.Skills.Vision.ImageAnalysisSkill | This skill uses an image detection algorithm to identify the content of an image and generate a text description. | Cognitive Services (pricing)
Microsoft.Skills.Vision.OcrSkill | Optical character recognition. | Cognitive Services (pricing)
Microsoft.Skills.Util.ConditionalSkill | Allows filtering, assigning a default value, and merging data based on a condition. | Not applicable
Microsoft.Skills.Util.DocumentExtractionSkill | Extracts content from a file within the enrichment pipeline. | Azure Cognitive Search (pricing)
Microsoft.Skills.Util.ShaperSkill | Maps output to a complex type (a multi-part data type, which might be used for a full name, a multi-line address, or a combination of last name and a personal identifier.) | Not applicable
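
To show how the OData types in the table are referenced from a skillset, the following Python sketch builds a skillset definition as a plain dictionary and prints it as JSON. It makes no service calls. The skillset name, the `make_skill` helper, and the `/document/content` source path are illustrative assumptions, and the input and output names (`text`, `languageCode`, `keyPhrases`) should be checked against each skill's reference page before use.

```python
import json


def make_skill(odata_type, inputs, outputs, context="/document"):
    """Build one skill definition as a dict (hypothetical helper, not an SDK call)."""
    return {
        # In a skillset definition, the OData type is prefixed with '#'.
        "@odata.type": "#" + odata_type,
        "context": context,
        "inputs": [{"name": name, "source": source} for name, source in inputs.items()],
        "outputs": [{"name": name, "targetName": target} for name, target in outputs.items()],
    }


skillset = {
    "name": "demo-skillset",  # assumed name, for illustration only
    "description": "Detect the language, then extract key phrases.",
    "skills": [
        make_skill(
            "Microsoft.Skills.Text.LanguageDetectionSkill",
            inputs={"text": "/document/content"},
            outputs={"languageCode": "languageCode"},
        ),
        make_skill(
            "Microsoft.Skills.Text.KeyPhraseExtractionSkill",
            # The second skill consumes the output of the first.
            inputs={"text": "/document/content", "languageCode": "/document/languageCode"},
            outputs={"keyPhrases": "keyPhrases"},
        ),
    ],
}

print(json.dumps(skillset, indent=2))
```

Both skills in this sketch are metered by Cognitive Services according to the table, so a skillset like this one would need an attached Cognitive Services resource for billing.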

Custom skills

Custom skills are modules that you design, develop, and deploy to the web. You can then call the module from within a skillset as a custom skill.

Type | Description | Metered by
Microsoft.Skills.Custom.WebApiSkill | Allows extensibility of an AI enrichment pipeline by making an HTTP call into a custom Web API | None unless your solution uses a metered Azure service
Microsoft.Skills.Custom.AmlSkill | Allows extensibility of an AI enrichment pipeline with an Azure Machine Learning model | None unless your solution uses a metered Azure service

For guidance on creating a custom skill, see Define a custom interface and Example: Creating a custom skill for AI enrichment.
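
As a rough illustration of what a module behind Microsoft.Skills.Custom.WebApiSkill might do, the following Python sketch processes a batch of records and returns one result per record. It assumes the request and response both carry a `values` array whose items have a `recordId` and a `data` object, which is how the custom skill interface is commonly described; confirm the exact contract in the linked guidance. The `handle_skill_request` function, the `text` input, and the `wordCount` output are hypothetical names chosen for this example, and the HTTP hosting layer is omitted.

```python
def handle_skill_request(payload):
    """Return a response body for a batch of records (illustrative sketch only)."""
    results = []
    for record in payload.get("values", []):
        text = record.get("data", {}).get("text")
        # Each output record echoes the recordId of its input record.
        result = {
            "recordId": record["recordId"],
            "data": {},
            "errors": [],
            "warnings": [],
        }
        if isinstance(text, str):
            # Hypothetical enrichment: count whitespace-separated words.
            result["data"]["wordCount"] = len(text.split())
        else:
            result["errors"].append({"message": "Missing or non-string 'text' input."})
        results.append(result)
    return {"values": results}


request_body = {
    "values": [
        {"recordId": "a1", "data": {"text": "hello enriched world"}},
        {"recordId": "a2", "data": {}},
    ]
}
response_body = handle_skill_request(request_body)
print(response_body)
```

Reporting a per-record error instead of raising an exception lets the remaining records in the batch still be processed.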

See also