How Personalizer works

Personalizer uses machine learning to discover what action to use in a context. Each learning loop has a model that is trained exclusively on data that you have sent to it via Rank and Reward calls. Learning loops are completely independent of each other. Create a learning loop for each part or behavior of your application you want to personalize.

For each loop, call the Rank API based on the current context, with:

  • List of possible actions: content items from which to select the top action.
  • List of context features: contextually relevant data such as user, content, and context.
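
A minimal sketch of a Rank request body, assuming the field names of the Personalizer v1.0 REST API as we understand them (`contextFeatures`, `actions` with `id` and `features`, `excludedActions`, `eventId`, `deferActivation`); verify them against the current API reference. The helper `build_rank_request` and the coffee features are hypothetical.

```python
import json
import uuid
from typing import Any, Optional


def build_rank_request(
    actions: list[dict[str, Any]],
    context_features: list[dict[str, Any]],
    event_id: Optional[str] = None,
    defer_activation: bool = False,
) -> dict[str, Any]:
    """Assemble a Rank request body (field names assumed from the v1.0 REST API)."""
    if not actions:
        raise ValueError("Rank needs at least one action to choose from")
    ids = [a["id"] for a in actions]
    if len(ids) != len(set(ids)):
        raise ValueError("action ids must be unique")
    return {
        # Features describing the current situation: user, device, time, ...
        "contextFeatures": context_features,
        # Candidate content items; each carries its own feature list.
        "actions": actions,
        "excludedActions": [],
        # The eventId correlates this Rank call with the later Reward call.
        "eventId": event_id or str(uuid.uuid4()),
        # True keeps the event inactive until Activate is called.
        "deferActivation": defer_activation,
    }


actions = [
    {"id": "espresso", "features": [{"strength": "strong", "milk": False}]},
    {"id": "latte", "features": [{"strength": "mild", "milk": True}]},
]
context = [{"device": {"type": "mobile"}}, {"time": {"partOfDay": "morning"}}]
body = build_rank_request(actions, context, event_id="evt-001")
print(json.dumps(body, indent=2))
```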

The Rank API decides to use either:

  • Exploit: The current model to decide the best action based on past data.
  • Explore: Select a different action instead of the top action.
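
The exploit/explore choice can be illustrated with epsilon-greedy selection, one simple way to trade off the two; this toy version is not the service's internal code. The `scores` dictionary stands in for the current model's estimates, and `epsilon` plays the role of an exploration percentage (both hypothetical names).

```python
import random


def choose_action(scores: dict[str, float], epsilon: float, rng: random.Random) -> tuple[str, bool]:
    """Return (chosen action id, explored?) using epsilon-greedy selection."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be between 0 and 1")
    # Exploit: the action the current model scores highest.
    top = max(scores, key=scores.get)
    others = [a for a in scores if a != top]
    # Explore with probability epsilon, picking uniformly among the non-top actions.
    if others and rng.random() < epsilon:
        return rng.choice(others), True
    return top, False


rng = random.Random(42)
scores = {"espresso": 0.7, "latte": 0.2, "tea": 0.1}
picks = [choose_action(scores, epsilon=0.2, rng=rng) for _ in range(1000)]
explore_rate = sum(explored for _, explored in picks) / len(picks)
print(f"explored on {explore_rate:.0%} of calls")
```

Exploring on only a fraction of calls keeps most users on the best-known action while still collecting data about the alternatives.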

The Reward API:

  • Collects data to train the model by recording the features and reward scores of each rank call.
  • Uses that data to update the model based on configuration specified in the Learning Policy.


The following image shows the architectural flow of the Rank and Reward calls:

  1. Personalizer uses an internal AI model to determine the rank of the action.
  2. The service decides whether to exploit the current model or explore new choices for the model.
  3. The ranking result is sent to EventHub.
  4. When Personalizer receives the reward, the reward is sent to EventHub.
  5. The rank and reward are correlated.
  6. The AI model is updated based on the correlation results.
  7. The inference engine is updated with the new model.
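
Steps 3 to 6 can be mimicked in memory to show how rank and reward events join on the event ID. This sketch assumes that a default reward is applied when no reward arrives within the wait window (the Experiment Duration); the class and field names are hypothetical, and the service performs this correlation internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RankEvent:
    event_id: str
    chosen_action: str
    ranked_at: datetime


@dataclass
class RewardEvent:
    event_id: str
    value: float
    received_at: datetime


def correlate(
    ranks: list[RankEvent],
    rewards: list[RewardEvent],
    wait: timedelta,
    default_reward: float = 0.0,
) -> dict[str, float]:
    """Join rank and reward events on event_id; late or missing rewards get the default."""
    # For simplicity, keep only the last reward received per event.
    by_id = {r.event_id: r for r in rewards}
    training: dict[str, float] = {}
    for rank in ranks:
        reward = by_id.get(rank.event_id)
        on_time = reward is not None and reward.received_at - rank.ranked_at <= wait
        training[rank.event_id] = reward.value if on_time else default_reward
    return training


t0 = datetime(2024, 1, 1, 9, 0)
ranks = [RankEvent("e1", "espresso", t0), RankEvent("e2", "latte", t0), RankEvent("e3", "tea", t0)]
rewards = [
    RewardEvent("e1", 1.0, t0 + timedelta(seconds=30)),  # arrives in time
    RewardEvent("e2", 1.0, t0 + timedelta(minutes=20)),  # arrives too late
]
print(correlate(ranks, rewards, wait=timedelta(minutes=10)))
# {'e1': 1.0, 'e2': 0.0, 'e3': 0.0}
```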

Research behind Personalizer

Personalizer is based on cutting-edge science and research in the area of reinforcement learning, including papers, research activities, and ongoing areas of exploration in Microsoft Research.

Terminology

  • Learning Loop: You can create a learning loop for every part of your application that can benefit from personalization. If you have more than one experience to personalize, create a loop for each.

  • Actions: Actions are the content items, such as products or promotions, to choose from. Personalizer chooses the top action to show to your users, known as the Reward action, via the Rank API. Each action can have features submitted with the Rank request.

  • Context: To provide a more accurate rank, provide information about your context, for example:

    • Your user.
    • The device they are on.
    • The current time.
    • Other data about the current situation.
    • Historical data about the user or context.

    Your specific application may have different context information.

  • Features: A unit of information about a content item or a user context.

  • Reward: A measure of how the user responded to the action returned by the Rank API, as a score between 0 and 1. Your business logic sets this value, based on how well the choice helped achieve your business goals for personalization.

  • Exploration: The Personalizer service is exploring when, instead of returning the best action, it chooses a different action for the user. By exploring, the Personalizer service avoids drift and stagnation and can adapt to ongoing user behavior.

  • Experiment Duration: The amount of time the Personalizer service waits for a reward, starting from the moment the Rank call happened for that event.

  • Inactive Events: An inactive event is one where you called Rank, but you're not sure the user will ever see the result, due to client application decisions. Inactive events allow you to create and store personalization results, then decide to discard them later without impacting the machine learning model.

  • Model: A Personalizer model captures all data learned about user behavior, getting training data from the combination of the arguments you send to Rank and Reward calls, and with a training behavior determined by the Learning Policy.

  • Learning Policy: How Personalizer trains a model on every event is determined by meta-parameters that affect how the machine learning algorithms work. New Personalizer loops start with a default Learning Policy, which can yield moderate performance. When running Evaluations, Personalizer can create new Learning Policies optimized specifically for the use cases of your loop. Personalizer performs significantly better with policies optimized for each specific loop, generated during Evaluation.
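
To make Context, Features, and Reward concrete, here is a sketch that derives context features from the current request and turns an observed outcome into a reward between 0 and 1. The time-of-day buckets, the 60-second click window, and the partial credit for a slow click are hypothetical business rules; your own logic decides the reward.

```python
from datetime import datetime
from typing import Optional


def part_of_day(now: datetime) -> str:
    # Bucketing the hour gives the model a categorical feature instead of 24 sparse values.
    if 5 <= now.hour < 12:
        return "morning"
    if 12 <= now.hour < 17:
        return "afternoon"
    if 17 <= now.hour < 22:
        return "evening"
    return "night"


def context_features(device_type: str, is_returning_user: bool, now: datetime) -> list[dict]:
    """Group context features by topic (user, device, time)."""
    return [
        {"device": {"type": device_type}},
        {"user": {"returning": is_returning_user}},
        {"time": {"partOfDay": part_of_day(now), "isWeekend": now.weekday() >= 5}},
    ]


def reward_from_outcome(clicked: bool, seconds_to_click: Optional[float] = None) -> float:
    """Hypothetical business rule mapping user behavior to a reward in [0, 1]."""
    if not clicked:
        return 0.0
    if seconds_to_click is not None and seconds_to_click > 60:
        return 0.5  # clicked, but only after a long hesitation
    return 1.0


features = context_features("mobile", True, datetime(2024, 1, 1, 8, 30))
print(features, reward_from_outcome(True, 12.0))
```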

Example use cases for Personalizer

  • Intent clarification & disambiguation: help your users have a better experience when their intent is not clear by providing an option that is personalized to each user.
  • Default suggestions for menus & options: have the bot suggest the most likely item in a personalized way as a first step, instead of presenting an impersonal menu or list of alternatives.
  • Bot traits & tone: for bots that can vary tone, verbosity, and writing style, consider varying these traits in personalized ways.
  • Notification & alert content: decide what text to use for alerts in order to engage users more.
  • Notification & alert timing: have personalized learning of when to send notifications to users to engage them more.

How to use Personalizer in a web application

Adding a loop to a web application includes:

  1. Determine which experience to personalize, what actions and features you have, what context features to use, and what reward you'll set.
  2. Add a reference to the Personalizer SDK in your application.
  3. Call the Rank API when you are ready to personalize.
  4. Store the eventId, because you send a reward with the Reward API later.
  5. Call Activate for the event once you're sure the user has seen your personalized page.
  6. Wait for the user to select ranked content.
  7. Call the Reward API to specify how well the output of the Rank API did.
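
The web application steps above can be sketched with the HTTP transport injected as a plain function, so the flow can be exercised offline. The paths (`/personalizer/v1.0/rank`, `/personalizer/v1.0/events/{eventId}/activate`, `/personalizer/v1.0/events/{eventId}/reward`) and the `rewardActionId` response field reflect the v1.0 REST API as we understand it; confirm them in the API reference. `PersonalizerFlow` and `post` are hypothetical names, not the SDK's API.

```python
from typing import Any, Callable, Optional

# Hypothetical transport: takes a URL path and JSON body, returns the parsed JSON response.
Post = Callable[[str, Optional[dict[str, Any]]], Optional[dict[str, Any]]]


class PersonalizerFlow:
    def __init__(self, post: Post):
        self.post = post
        self.pending: dict[str, str] = {}  # eventId -> action shown to the user

    def rank(self, actions: list[dict], context: list[dict], event_id: str) -> str:
        # Steps 3-4: defer activation until we know the page was actually seen.
        body = {"contextFeatures": context, "actions": actions,
                "eventId": event_id, "deferActivation": True}
        response = self.post("/personalizer/v1.0/rank", body)
        self.pending[event_id] = response["rewardActionId"]
        return response["rewardActionId"]

    def page_seen(self, event_id: str) -> None:
        # Step 5: activate only after the user has seen the personalized page.
        self.post(f"/personalizer/v1.0/events/{event_id}/activate", None)

    def user_selected(self, event_id: str, selected_action: str) -> float:
        # Steps 6-7: reward 1.0 if the user picked the top-ranked action, else 0.0.
        value = 1.0 if selected_action == self.pending.pop(event_id) else 0.0
        self.post(f"/personalizer/v1.0/events/{event_id}/reward", {"value": value})
        return value


calls = []


def fake_post(path, body):
    # Offline stand-in for an HTTP client; records each call.
    calls.append((path, body))
    if path.endswith("/rank"):
        return {"eventId": body["eventId"], "rewardActionId": "latte"}
    return None


flow = PersonalizerFlow(fake_post)
shown = flow.rank([{"id": "espresso", "features": []}, {"id": "latte", "features": []}],
                  [{"device": {"type": "mobile"}}], "evt-42")
flow.page_seen("evt-42")
reward = flow.user_selected("evt-42", "latte")
```

In production, `fake_post` would be replaced by a real HTTP client that adds your endpoint and authentication.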

How to use Personalizer with a chat bot

In this example, you will see how to use Personalizer to make a default suggestion instead of sending the user down a series of menus or choices every time.

  • Get the code for this sample.
  • Set up your bot solution. Make sure to publish your LUIS application.
  • Manage Rank and Reward API calls for bot.
    • Add code to manage LUIS intent processing. If None is returned as the top intent, or the top intent's score is below your business logic threshold, send the intents list to Personalizer to rank the intents.
    • Show the intent list to the user as selectable links, with the first intent being the top-ranked intent from the Rank API response.
    • Capture the user's selection and send this in the Reward API call.
  • Make Personalizer Rank API calls every time a disambiguation is needed, as opposed to caching results for each user. The result of disambiguating intent may change over time for one person, and allowing the Rank API to explore variances will accelerate overall learning.
  • Choose an interaction that is common with many users so that you have enough data to personalize. For example, introductory questions may be better fits than smaller clarifications deep in the conversation graph that only a few users may get to.
  • Use Rank API calls to enable "first suggestion is right" conversations, where the user is asked "Would you like X?" or "Did you mean X?" and can just confirm, as opposed to giving the user options they must choose from a menu. For example, User: "I'd like to order a coffee." Bot: "Would you like a double espresso?" This way, the reward signal is also strong, because it pertains directly to the one suggestion.
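
The disambiguation logic from the bot steps might look like the following sketch. The intent dictionary mimics a LUIS-style prediction (intent name to score), but the 0.5 threshold, the function names, and the stubbed ranker are hypothetical; in a real bot, the ranker would call the Rank API with the intents as actions.

```python
from typing import Callable


def needs_disambiguation(intents: dict[str, float], threshold: float = 0.5) -> bool:
    """True when the top intent is None or scores below the business threshold."""
    top = max(intents, key=intents.get)
    return top == "None" or intents[top] < threshold


def suggestion_order(intents: dict[str, float], rank: Callable[[list[str]], str]) -> list[str]:
    """Put the ranker's top intent first, followed by the remaining candidates."""
    candidates = [name for name in intents if name != "None"]
    top = rank(candidates)
    return [top] + [name for name in candidates if name != top]


def stand_in_rank(names: list[str]) -> str:
    # Placeholder for a Rank API call; always prefers tea in this demo.
    return "OrderTea"


intents = {"OrderCoffee": 0.41, "OrderTea": 0.38, "None": 0.21}
if needs_disambiguation(intents):
    print(suggestion_order(intents, stand_in_rank))
```

Calling the ranker on every disambiguation, rather than caching a result per user, lets the service keep exploring as individual preferences change.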

How to use Personalizer with a recommendation solution

Use your recommendation engine to filter a large catalog down to a short list of items, for example 30, which can then be sent to the Rank API as possible actions.

To use a recommendation engine with Personalizer:

  • Set up the recommendation solution.
  • When displaying a page, invoke the recommendation model to get a short list of recommendations.
  • Call Personalizer to rank the output of the recommendation solution.
  • Send feedback about your user's action with the Reward API call.
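
A sketch of the recommendation-then-rank pattern: a hypothetical dictionary of recommender scores narrows the catalog to at most 30 items, and each survivor becomes a Rank action carrying its catalog attributes as features. The catalog schema and the function name are assumptions.

```python
import heapq
from typing import Any


def shortlist_actions(
    recommender_scores: dict[str, float],
    catalog: dict[str, dict[str, Any]],
    limit: int = 30,
) -> list[dict[str, Any]]:
    """Keep the recommender's top `limit` items and shape them as Rank actions."""
    # nlargest returns the ids sorted from highest to lowest score.
    top_ids = heapq.nlargest(limit, recommender_scores, key=recommender_scores.get)
    return [{"id": item_id, "features": [catalog[item_id]]} for item_id in top_ids]


catalog = {f"sku{i}": {"category": "mug" if i % 2 else "beans", "priceTier": i % 3} for i in range(100)}
scores = {f"sku{i}": i / 100 for i in range(100)}
actions = shortlist_actions(scores, catalog)
print(len(actions), actions[0])
```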

Pitfalls to avoid

  • Don't use Personalizer where the personalized behavior can't be discovered across all users but should instead be remembered for specific users, or where it comes from a user-specific list of alternatives. For example, using Personalizer to suggest a first pizza order from a list of 20 possible menu items is useful, but choosing which contact to call from a user's contact list when they need help with childcare (such as "Grandma") is not something that is personalizable across your user base.

Adding content safeguards to your application

If your application allows for large variances in content shown to users, and some of that content may be unsafe or inappropriate for some users, you should plan ahead to make sure that the right safeguards are in place to prevent your users from seeing unacceptable content. The best pattern to implement safeguards is:

  • Obtain the list of actions to rank.
  • Filter out the ones that are not viable for the audience.
  • Only rank these viable actions.
  • Display the top-ranked action to the user.
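
The filter-before-rank pattern is the simplest to express in code. In this sketch, the `policy` metadata and its `minAge` field are hypothetical stand-ins for whatever content policy your application enforces; only the viable actions, stripped of that metadata, would be sent to Rank.

```python
from typing import Any


def viable_actions(actions: list[dict[str, Any]], user_age: int) -> list[dict[str, Any]]:
    """Drop actions the current audience must not see, before calling Rank."""
    allowed = []
    for action in actions:
        # Hypothetical policy: each action may declare a minimum audience age.
        min_age = action.get("policy", {}).get("minAge", 0)
        if user_age >= min_age:
            # Send only id and features to Rank; keep policy metadata local.
            allowed.append({"id": action["id"], "features": action["features"]})
    return allowed


catalog = [
    {"id": "kids-game", "features": [{"genre": "puzzle"}], "policy": {"minAge": 0}},
    {"id": "horror-film", "features": [{"genre": "horror"}], "policy": {"minAge": 18}},
]
print([a["id"] for a in viable_actions(catalog, user_age=12)])
```

If the filtered list comes back empty, skip the Rank call and show your own fallback content instead.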

In some architectures, the above sequence may be hard to implement. In that case, there is an alternative approach to implementing safeguards after ranking, but a provision needs to be made so that actions that fall outside the safeguard are not used to train the Personalizer model.

  • Obtain the list of actions to rank, with learning deactivated.
  • Rank actions.
  • Check if the top action is viable.
    • If the top action is viable, activate learning for this rank, then show it to the user.
    • If the top action is not viable, do not activate learning for this ranking, and decide through your own logic or alternative approaches what to show to the user. Even if you use the second-best ranked option, do not activate learning for this ranking.
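
When filtering first isn't possible, the decision after ranking can be isolated in a small function. The sketch assumes Rank was called with learning deactivated (deferred activation) and returns action IDs ordered best first; `is_viable` and `fallback` are hypothetical hooks for your own policy and fallback logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SafeguardDecision:
    action_to_show: str
    activate_learning: bool


def apply_safeguard(
    ranked_ids: list[str],
    is_viable: Callable[[str], bool],
    fallback: Callable[[list[str]], str],
) -> SafeguardDecision:
    """Activate learning only when the top-ranked action passes the safeguard."""
    top = ranked_ids[0]
    if is_viable(top):
        return SafeguardDecision(action_to_show=top, activate_learning=True)
    # Not viable: choose something else, and never activate this event,
    # even if the replacement is the second-best ranked action.
    return SafeguardDecision(action_to_show=fallback(ranked_ids[1:]), activate_learning=False)


BLOCKED = {"horror-film"}


def is_allowed(action_id: str) -> bool:
    return action_id not in BLOCKED


def first_allowed(rest: list[str]) -> str:
    return next((a for a in rest if is_allowed(a)), "default-page")


decision = apply_safeguard(["horror-film", "kids-game"], is_allowed, first_allowed)
print(decision)
```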

Verifying adequate effectiveness of Personalizer

You can monitor the effectiveness of Personalizer periodically by performing offline evaluations.

Next steps

  • Understand where you can use Personalizer.
  • Perform Offline Evaluations.