Welcome to the Microsoft Web Language Model API, a REST-based cloud service providing state-of-the-art tools for natural language processing. Using this API, your application can leverage the power of big data through language models trained on web-scale corpora collected by Bing in the EN-US market.
These smoothed backoff N-gram language models, supporting Markov order up to 5, are trained on the following corpora:
- Web page body text
- Web page title text
- Web page anchor text
- Web search query text
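As a rough illustration of what "backoff" means for an N-gram model, the sketch below looks up a conditional log10 probability and falls back to a shorter context when the full N-gram was never observed. It follows the generic ARPA-style backoff convention and is not the service's actual smoothing method. The tables `LOGPROB` and `BACKOFF`, the function name `backoff_log10`, and all numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical toy tables in the style of an ARPA language model file.
# LOGPROB maps an n-gram (tuple of words) to its log10 probability.
# BACKOFF maps a context (tuple of words) to its log10 backoff weight.
# All values are made up for illustration.
LOGPROB = {
    ("the",): -1.0,
    ("cat",): -2.0,
    ("dog",): -2.5,
    ("the", "cat"): -0.5,
}
BACKOFF = {
    ("the",): -0.3,
}


def backoff_log10(word, context):
    """Return log10 P(word | context), backing off to shorter contexts."""
    context = tuple(context)
    ngram = context + (word,)
    if ngram in LOGPROB:
        # The full n-gram was observed: use its stored probability.
        return LOGPROB[ngram]
    if not context:
        # Nothing left to back off to: the word is out of vocabulary.
        raise KeyError(f"unknown word: {word!r}")
    # Pay the backoff weight of the full context (0.0 if none is stored),
    # then retry with the oldest context word dropped.
    penalty = BACKOFF.get(context, 0.0)
    return penalty + backoff_log10(word, context[1:])


if __name__ == "__main__":
    print(backoff_log10("cat", ("the",)))  # bigram found directly
    print(backoff_log10("dog", ("the",)))  # backs off to the unigram "dog"
```

With a model of order 5, the same lookup would start from a context of up to four preceding words and shorten it one word at a time.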
The Web LM REST API supports four lookup operations:
- Joint (log10) probability of a sequence of words.
- Conditional (log10) probability of one word given a sequence of preceding words.
- List of words (completions) most likely to follow a given sequence of words.
- Word breaking of strings that contain no spaces.
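The four operations can be pictured with a small local sketch. It does not call the Web LM service. It uses a hypothetical toy bigram table (`COND`) and unigram table (`UNIGRAM`) with invented probabilities, and every function name here is an assumption made for illustration. It shows how a joint log10 probability is the sum of conditional log10 probabilities (the chain rule), how completions are a ranking of next words, and how word breaking can be framed as a search for the most probable segmentation.

```python
import math

# Hypothetical toy bigram model: COND[prev][word] = log10 P(word | prev).
# "<s>" marks the start of a sequence. Probabilities are invented.
COND = {
    "<s>": {"hello": math.log10(0.5), "world": math.log10(0.5)},
    "hello": {"world": math.log10(0.8), "there": math.log10(0.2)},
    "world": {"hello": math.log10(1.0)},
    "there": {"world": math.log10(1.0)},
}

# Hypothetical toy unigram model used only for word breaking.
UNIGRAM = {
    "hello": math.log10(0.4),
    "world": math.log10(0.3),
    "there": math.log10(0.15),
    "hell": math.log10(0.1),
    "o": math.log10(0.05),
}


def conditional_log10(word, preceding):
    """log10 P(word | preceding words); this toy model uses only the last one."""
    prev = preceding[-1] if preceding else "<s>"
    return COND[prev][word]


def joint_log10(words):
    """Chain rule: sum the conditional log10 probabilities of each word."""
    return sum(conditional_log10(w, words[:i]) for i, w in enumerate(words))


def completions(preceding, max_results=3):
    """Words most likely to follow `preceding`, most probable first."""
    prev = preceding[-1] if preceding else "<s>"
    ranked = sorted(COND[prev].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:max_results]


def break_words(text):
    """Most probable segmentation of `text` under UNIGRAM, or None."""
    # best[i] holds (log10 score, words) for the best split of text[:i].
    best = [(0.0, [])] + [(-math.inf, None)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            prev_score, prev_words = best[start]
            if piece in UNIGRAM and prev_words is not None:
                score = prev_score + UNIGRAM[piece]
                if score > best[end][0]:
                    best[end] = (score, prev_words + [piece])
    return best[len(text)][1]


if __name__ == "__main__":
    print(joint_log10(["hello", "world"]))
    print(conditional_log10("world", ["hello"]))
    print(completions(["hello"]))
    print(break_words("helloworld"))
```

Because the values are log10 probabilities, raising 10 to the returned score recovers the ordinary probability, and adding scores corresponds to multiplying probabilities.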
To get started with the Web LM API:
- Subscribe to the service.
- Download the SDK.
- Run the SDK sample code.
- Consult the API Reference for further details, including code snippets in a variety of languages.
The following paper provides details on the development of these language models, and should be cited in research publications that utilize this service:
- An Overview of Microsoft Web N-gram Corpus and Applications, NAACL-HLT 2010
A current list of papers citing this work is also available.