What is the Web Language Model API? (Preview)
The Microsoft Web Language Model API is a REST-based cloud service providing state-of-the-art tools for natural language processing. Using this API, your application can query language models trained on web-scale corpora collected by Bing in the en-US market.
These smoothed backoff N-gram language models, supporting up to fifth-order Markov chains, are trained on the following corpora:
- Web page body text
- Web page title text
- Web page anchor text
- Web search query text
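As a reminder of what an N-gram (Markov) language model computes, the standard chain-rule approximation can be written as follows. The symbols $w_i$ (the i-th word) and $N$ (the N-gram length) are notation introduced here for illustration; they are not taken from the service documentation, and the exact smoothing and backoff scheme is not shown.

```latex
\log_{10} P(w_1, \dots, w_n) \;\approx\; \sum_{i=1}^{n} \log_{10} P\bigl(w_i \mid w_{i-N+1}, \dots, w_{i-1}\bigr)
```

Each word is conditioned on at most the $N-1$ words that precede it, so the joint log-probability of a sequence is a sum of conditional log-probabilities.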
The Web Language Model API supports four lookup operations:
- Joint (log10) probability of a sequence of words.
- Conditional (log10) probability of one word given a sequence of preceding words.
- List of words (completions) most likely to follow a given sequence of words.
- Word breaking of strings that contain no spaces.
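To make the four operations concrete, here is a minimal local sketch using a toy bigram model. It does not call the service. The counts, the function names (`joint_log10`, `conditional_log10`, `completions`, `break_into_words`), and the use of simple linear interpolation in place of the service's backoff smoothing are all assumptions made for illustration.

```python
import math

# Hypothetical toy counts for illustration only; the real service uses
# web-scale models. Nothing here reflects the service's actual data.
UNIGRAMS = {"new": 4, "york": 3, "city": 2, "times": 1}
BIGRAMS = {("new", "york"): 3, ("new", "city"): 1,
           ("york", "city"): 2, ("york", "times"): 1}
TOTAL = sum(UNIGRAMS.values())
LAMBDA = 0.8  # assumed interpolation weight between bigram and unigram estimates

# How many times each word was observed as the first word of a bigram.
FOLLOW_TOTALS = {}
for (prev, _), count in BIGRAMS.items():
    FOLLOW_TOTALS[prev] = FOLLOW_TOTALS.get(prev, 0) + count


def unigram_prob(word):
    """Maximum-likelihood unigram probability (0 for unknown words)."""
    return UNIGRAMS.get(word, 0) / TOTAL


def conditional_log10(word, history):
    """log10 P(word | history); a bigram model only looks at the last word."""
    p_uni = unigram_prob(word)
    prev = history[-1] if history else None
    if prev in FOLLOW_TOTALS:
        p_ml = BIGRAMS.get((prev, word), 0) / FOLLOW_TOTALS[prev]
        p = LAMBDA * p_ml + (1 - LAMBDA) * p_uni
    else:
        p = p_uni  # no usable context: fall back to the unigram estimate
    return math.log10(p) if p > 0 else float("-inf")


def joint_log10(words):
    """log10 P(words), computed as a chain-rule sum of conditionals."""
    return sum(conditional_log10(w, words[:i]) for i, w in enumerate(words))


def completions(history, k=3):
    """The k vocabulary words most likely to follow history, with scores."""
    scored = [(w, conditional_log10(w, history)) for w in UNIGRAMS]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def break_into_words(text):
    """Best segmentation of a space-free string, scored with unigrams only.

    Dynamic programming over end positions; returns None if the text
    cannot be split entirely into vocabulary words.
    """
    best = [(0.0, [])] + [(float("-inf"), None)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(end):
            segment = text[start:end]
            prev_score, prev_words = best[start]
            if segment in UNIGRAMS and prev_words is not None:
                score = prev_score + math.log10(unigram_prob(segment))
                if score > best[end][0]:
                    best[end] = (score, prev_words + [segment])
    return best[-1][1]
```

With these toy counts, `completions(["new"], 2)` ranks `york` ahead of `city`, and `break_into_words("newyorkcity")` returns `["new", "york", "city"]`. The hosted models differ in scale, N-gram order, and smoothing, so the numbers from a sketch like this are not comparable to the service's results.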
To get started with the Web Language Model API:
- Subscribe to the service.
- Download the SDK.
- Run the SDK sample code.
- Refer to the API Reference for full details of the endpoints, including code snippets in a variety of languages.
The following paper provides details on the development of these language models, and should be cited in research publications that use this service:
- An Overview of Microsoft Web N-gram Corpus and Applications, NAACL-HLT 2010