Metadata tagging and user bucketing

Intelligent Recommendations can improve personalization for end users, even when they're anonymous. Customers can integrate a personalized metadata tagging experience into their storefront. This experience works by identifying metadata tags for content (such as written articles, podcasts, videos, and retail products) and recommending similar tags or content based on the tastes and preferences of each user. User metadata can be powerful for recommending relevant content to all users. Typical uses include:

  • Reaching new or infrequent customers (also known as “cold users”).
  • Connecting users to other users based on unique metadata tagging.
  • Connecting users to both relevant and short-lead-time content.

When metadata tagging is enabled, users can create new recommendation scenarios such as:

  • Metadata Categories we picked for you
  • Other People also look at these categories
  • Recent Events based on your recent activity
  • Similar products/content based on their attributed metadata tags
  • Picks for you based on user behavior buckets

What is a tag?

A tag is a descriptor for something of interest within the items or content that users gravitate to. Tags must be specific to the activity of the end user. For example, in the world of movies, genre, cast members, and mood may all be considered tags for a movie, as well as anything else that end users have a special fondness or dislike for. Tags can also include celebrity players or users, article titles, product categories, events, and other content terminology. The goal is to ensure that end users are recommended relevant content that fits their interests, tastes, and preferences based on available metadata.

Architecture overview

To configure metadata tagging as shown in the architecture diagram, the prerequisites are as follows:

  1. Authoritative storage for content with rich metadata tags – Catalog.
  2. User interaction behavior (clicks on content/Usage). End-user profile information may also be available to be used.
  3. A separate Intelligent Recommendations Account and modeling instance for understanding user interests presented as tags.
  4. A component to rank content based on personalized tags with a real-time API query.

This image shows the architecture outline for configuring metadata tagging on a separate Intelligent Recommendations account.

When enabled, the service produces a model of personalized “tags” for users, based on:

  1. Historical User interactions
  2. Metadata-rich content with “tags”
    1. The assumption here is that the tags are cleaned: they contain no spelling errors, and they are a predetermined, rationalized set from experts rather than randomly created or attached.
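
As a minimal sketch of the cleaning assumption above, the following Python checks each item's tags against a predetermined vocabulary before the catalog is uploaded. The function name, the `ALLOWED_TAGS` set, and the sample items are hypothetical and aren't part of the Intelligent Recommendations service.

```python
# Hypothetical, expert-curated tag vocabulary (an assumption for illustration).
ALLOWED_TAGS = {"Comedy", "Drama", "Soccer", "Podcast"}


def find_unknown_tags(item_tags, allowed=ALLOWED_TAGS):
    """Return {item_id: sorted list of tags that are not in the vocabulary}."""
    problems = {}
    for item_id, tags in item_tags.items():
        unknown = sorted(set(tags) - allowed)
        if unknown:
            problems[item_id] = unknown
    return problems


catalog_tags = {
    "item-1": ["Comedy", "Drama"],
    "item-2": ["Socer"],  # misspelled tag that should be caught
}
print(find_unknown_tags(catalog_tags))
```

Items that show up in the result would need their tags corrected before the data is handed to the service.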

Data Contract Configuration

To configure a Data Contract to support metadata tagging, map the data entities as described in this section. Take note of the differences between the ItemId, TagId, and InteractionGroupingId.

The Applications and examples section shows how introducing a TagId or BucketId changes the configuration of the Data Contract. We suggest having a separate Intelligent Recommendations account and modeling instance when testing metadata tagging.

| IR Capability Name | CATALOG Data Entity | CATALOG Data Entity Fields | INTERACTIONS Data Entity | INTERACTIONS Data Entity Fields |
|---|---|---|---|---|
| (Required for all responses) | Reco_ItemsAndVariants | ItemId as the TagId; Title | | |
| Filtering ability (Applies to all lists) | Reco_ItemCategories | ItemCategories: ItemId (or TagId), Category | | |
| People also view | | | Reco_Interactions | InteractionGroupingId as the UserId; ItemId as the TagId; UserId; InteractionType: Purchase == viewed; Timestamp |
| Picks for you | | | Reco_Interactions | (Same as previous) |

Applications and examples

The following sections walk through two common use cases that benefit from metadata tagging and provide some examples with demo data for each.

  1. To get "most popular items for you" for cold users. To see an example, see the section titled "Get Most Popular Items for you for Cold Users".
  2. To create a machine learned map of Users' metadata-values. To see an example, see the section titled "Create an ML map of users' metadata values".

Application 1: Get most popular items for you for cold users

A common problem in AI and ML is how to provide relevant recommendations to users who are new or infrequent customers (also known as “cold users”). The aim here is to create distinct buckets based on meaningful categories and available demographic information (such as Age and Gender). All Interactions are then used to connect users to their corresponding demographic buckets, which in turn enables the buckets to be connected to items during the model training phase. During the serving phase, a cold user can be assigned a demographic bucket, which is then used to recommend items, for example “most popular items by user bucket”.

The steps are as follows:

  1. Prepare a bucketing of Users with their metadata information.
  2. Create the connections for the model in the “Reco_Interactions.csv” data storage file.
  3. Query the model by calling the “most popular items by user bucket” API.

Step 1: Prepare a bucketing of the Users with their metadata information

A few best practices for creating your buckets are as follows:

  • User Metadata can be represented as ranged buckets. Consider using the metadata that makes sense for your business domain and use case. For example, if you wanted to create a bucket for age data, then you could use these values: Age5To11, Age30To40, etc.
  • Some User metadata can even be combined in buckets together. Consider using the metadata and combinations that make sense for your business domain and use case. For example, you could combine both Age and Gender data to create buckets like this: Age20To30Male, Age20To30Female, Age30To40Male, Age30To40Female, etc.
  • Once buckets are created, you need to assign each bucket a unique BucketId.
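
The bucketing practices above can be sketched in Python as follows. This is only an illustration: the function name, the age ranges, and the choice to treat the upper bound of each range as exclusive are assumptions, and the resulting string is used directly as the BucketId.

```python
# Hypothetical age ranges as (low, high); high is exclusive in this sketch.
AGE_RANGES = [(20, 30), (30, 40), (40, 50)]


def assign_bucket_id(age, gender):
    """Combine an age range and a gender into a BucketId such as 'Age30To40Female'."""
    for low, high in AGE_RANGES:
        if low <= age < high:
            return f"Age{low}To{high}{gender}"
    # Fallback bucket for ages outside every configured range.
    return f"AgeOther{gender}"


print(assign_bucket_id(35, "Female"))
print(assign_bucket_id(30, "Male"))
```

Whatever ranges you choose, each user should land in exactly one bucket so that the BucketId stays unambiguous.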

Step 2: Create the connections for the model in the “Reco_Interactions.csv” data storage file

How the data is configured in the Data Contract depends on whether the number of buckets is over or under 1000.

If there are FEWER than 1000 buckets

For each Interaction row, set the ChannelId to the BucketId that corresponds to (or best fits) the user. The Interaction CSV row then contains the InteractionGroupingId, ItemId, UserId, and the BucketId as the ChannelId. An example of the Interactions CSV follows:

Sample CSV for LESS than 1000 buckets

Interactions CSV Headers appear for convenience only and shouldn't be part of the actual data.

| InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| InteractionGroupingID | ItemId | | UserId | | | | | BucketId | | | |
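
To make the column positions concrete, here is a hedged Python sketch that writes one such row with the standard `csv` module. The column order is copied from the sample header above; the names `FutureAttribute1` and `FutureAttribute2` are placeholders for the two "Future Attribute" columns, and `bucketed_row` is a hypothetical helper.

```python
import csv
import io

# Column order copied from the sample header; the header itself is not written.
COLUMNS = [
    "InteractionGroupingId", "ItemId", "ItemVariantId", "UserId",
    "InteractionType", "Timestamp", "FutureAttribute1", "FutureAttribute2",
    "Channel", "Catalog", "Strength", "IsPositive",
]


def bucketed_row(grouping_id, item_id, user_id, bucket_id):
    """Build one interaction row with the BucketId in the Channel column."""
    values = {
        "InteractionGroupingId": grouping_id,
        "ItemId": item_id,
        "UserId": user_id,
        "Channel": bucket_id,
    }
    # Columns that are not set stay empty.
    return [str(values.get(column, "")) for column in COLUMNS]


buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(
    bucketed_row(1, 98005, 100, "Age30To40")
)
print(buffer.getvalue())
```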
If there are MORE than 1000 buckets

If there are more than 1000 buckets, create more interaction rows using the BucketId. Turn each original interaction row between a User and an Item into two distinct rows that share an InteractionGroupingId that is unique to those two rows. The example shows:

  1. The original interaction row using UserId, ItemId, and the InteractionGroupingId as UNIQUE_ID.
  2. The additional interaction row with the BucketId as the ItemId.

Sample CSV for MORE than 1000 buckets:

Interactions CSV Headers appear for convenience only and shouldn't be part of the actual data.

| InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UNIQUE_ID | ItemId | | UserId | | | | | | | | |
| UNIQUE_ID | BucketId | | UserId | | | | | | | | |
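
The two-row expansion can be sketched in Python like this. Generating the shared InteractionGroupingId with `uuid.uuid4()` is an assumption made for illustration; any identifier that is unique to the pair of rows would serve. Only the three populated columns are shown.

```python
import uuid


def expand_interaction(user_id, item_id, bucket_id):
    """Return two rows that share one InteractionGroupingId."""
    grouping_id = uuid.uuid4().hex  # unique to this pair of rows
    # Each row is (InteractionGroupingId, ItemId, UserId).
    original_row = (grouping_id, str(item_id), str(user_id))
    bucket_row = (grouping_id, str(bucket_id), str(user_id))  # BucketId as the ItemId
    return [original_row, bucket_row]


for row in expand_interaction(100, 98005, "Age30To40Female"):
    print(row)
```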

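Step 3: Query the model to get “most popular items by user bucket”

Take into consideration the model construction outlined previously. After a cold user's demographic bucket has been determined, query the Serving Endpoint with the demographic-focused BucketId to recommend the most popular Items for that bucket. With fewer than 1000 buckets, query the “Popular” list type with the BucketId as the ChannelId. With more than 1000 buckets, query the “Next Best Action” (formerly CART) list type with the BucketId in place of the ItemId.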
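
As a rough sketch of how a client might pick between the two query shapes in this step (Popular filtered by channel, and Cart with the BucketId in place of the ItemId), consider the following Python. The function name and the example host are hypothetical, treating exactly 1000 buckets as the "more than" case is an assumption, and the trailing `?` with an empty query string is omitted.

```python
from urllib.parse import quote, urlencode

BUCKET_THRESHOLD = 1000  # bucket count at which the data layout changes


def cold_user_query(serving_endpoint, bucket_id, bucket_count):
    """Build the request URL for a cold user's demographic bucket."""
    base = serving_endpoint.rstrip("/")
    if bucket_count < BUCKET_THRESHOLD:
        # BucketId was stored as the ChannelId, so filter Popular by channel.
        query = urlencode({"channelID": bucket_id})
        return f"{base}/Reco/V1.0/Popular?{query}"
    # BucketId was stored as an ItemId, so it takes the ItemId slot in Cart.
    return f"{base}/Reco/V1.0/Cart/{quote(bucket_id, safe='')}"


print(cold_user_query("https://example.invalid", "Age30To40", 12))
print(cold_user_query("https://example.invalid", "Age30To40Female", 5000))
```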

When there are FEWER than 1000 buckets

A sample API Query, where the ChannelId parameter is set to the BucketId value, looks like this:

<serving-endpoint>/Reco/V1.0/Popular?channelID=<BucketId>
Example 1: Less than 1000 buckets

Assume a User with UserId=100, with a custom assigned BucketId=Age30To40, who recently purchased an item with ItemId=98005. This example creates a row in the “Reco_Interactions.csv” file, which uses a BucketId (in the ChannelId field of the IR schema) which best matches the User (represented by UserId in the IR schema):

  • Original Interaction info is: InteractionGroupingId=1, UserId=100, ItemId=98005
  • Notice in the CSV example that the ChannelId that best matches the UserId is appended. In the example, the UserId was matched to BucketId=Age30To40, so the modified Interaction row is:

| InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 98005 | | 100 | | | | | Age30To40 | | | |

  • The API Query and Response return a list of ItemIds, including ItemId=43218 in the third position, which is a popular item for users of this category.
API Query
GET <serving-endpoint>/reco/v1.0/Popular?ChannelId=Age30To40
Response
{
    "id": "Lists",
    "name": "Lists",
    "version": "v1.0",
    "interactionsVersion": "20220104115104",
    "items": [
        {
            "id": "65106",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "62604",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "43218",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "63503",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "62452",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        }
    ],
    "title": "Popular",
    "longTitle": "Popular",
    "titleId": 5,
    "pagingInfo": {
        "totalItems": 200
    },
    "status": "Success"
}
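
A client would typically pull the ranked ItemIds out of a response like the one above. The following Python is a minimal sketch using a trimmed copy of that sample; the helper name is hypothetical, and treating any status other than "Success" as an error is an assumption.

```python
import json

# Trimmed copy of the sample response (first three items only).
sample_response = """
{
  "items": [
    {"id": "65106", "trackingId": "00000000-0000-0000-0000-000000000003"},
    {"id": "62604", "trackingId": "00000000-0000-0000-0000-000000000003"},
    {"id": "43218", "trackingId": "00000000-0000-0000-0000-000000000003"}
  ],
  "status": "Success"
}
"""


def recommended_item_ids(response_text):
    """Return item ids in ranked order, or raise if the call did not succeed."""
    payload = json.loads(response_text)
    if payload.get("status") != "Success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return [item["id"] for item in payload.get("items", [])]


ids = recommended_item_ids(sample_response)
print(ids.index("43218") + 1)  # 1-based position of ItemId 43218
```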
When there are MORE than 1000 buckets

A sample API Query link where the ItemId is replaced with the BucketId for a cold user would look like this:

<serving-endpoint>/Reco/V1.0/Cart/<BucketId>?
Example 2: More than 1000 buckets

Assume a User with UserId=100, with a custom assigned BucketId=Age30To40Female, who recently purchased an item with ItemId=98005. Now you can use the original interaction data and construct rows in the “Reco_Interactions.csv” file:

  • Original Interaction info is: InteractionGroupingId=NEW_UNIQUE_ID, UserId=100, ItemId=98005
  • The two constructed rows of Interaction info that should be in the “Reco_Interactions.csv” file read by the Intelligent Recommendations service are:

| InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UNIQUE_ID | 98005 | | 100 | | | | | | | | |
| UNIQUE_ID | Age30To40Female | | 100 | | | | | | | | |

  • The API Query and Response return a list of ItemIds, including ItemId=43218 in the third position, which is a popular product for users in this category.
API Query
GET <serving-endpoint>/reco/v1.0/Cart/Age30To40Female? 
Response
{
    "id": "Lists",
    "name": "Lists",
    "version": "v1.0",
    "interactionsVersion": "20220104115104",
    "items": [
        {
            "id": "65106",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "62604",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "43218",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "63503",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "62452",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        }
    ],
    "title": "Cart",
    "longTitle": "FrequentlyBoughtTogether",
    "titleId": 5,
    "pagingInfo": {
        "totalItems": 200
    },
    "status": "Success"
}

Application 2: Create an ML Map of users' metadata-values

Modeling user metadata “Tags” in place of direct user interactions can be a powerful modification when the goal is to show how connected users are with those tags and which tags are truly similar by behavior. Assign each meaningful and available tag (for example, demographics like Age and Gender, or other metadata) a unique identifier, which the service refers to as the TagId. During the model training phase, all interactions data is used to build connections between UserIds and TagIds.

During the serving phase, the system can provide a personalized list of Tags by calling “Picks for you” with UserId, and similar tags by calling “People also like” with TagId.

How to use TagIds for recommendations:

  1. Prepare a list of user metadata values (tags) and assign each of them a unique TagId.
  2. Create the connections for the model in the Reco_Interactions.CSV data storage file.
  3. Query the model by calling the “personalized tags by user” or “similar tags” API.

Step 1: Prepare a list of user metadata values (tags) and assign each of them a unique TagId

When constructing values for Age data, bucketing is still a good approach: Age5To11, Age12To18, etc.

For other metadata values, create a separate TagId for each value. For example, a Family Status category could use the TagIds Single, Couple, CoupleWithKids, etc.
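
A short Python sketch of this step follows. It derives a list of TagIds from one user profile, bucketing the age and passing the family status through as its own TagId. The profile keys, the age ranges (inclusive on both ends), and the function name are all hypothetical.

```python
# Hypothetical inclusive age ranges; only the first two come from the text above.
AGE_TAG_RANGES = [(5, 11), (12, 18), (19, 29), (30, 40)]
FAMILY_STATUS_TAGS = {"Single", "Couple", "CoupleWithKids"}


def tags_for_user(profile):
    """Return the TagIds that describe one user profile dictionary."""
    tags = []
    age = profile.get("age")
    if age is not None:
        for low, high in AGE_TAG_RANGES:
            if low <= age <= high:
                tags.append(f"Age{low}To{high}")
                break
    family_status = profile.get("family_status")
    if family_status in FAMILY_STATUS_TAGS:
        tags.append(family_status)  # one TagId per distinct value
    return tags


print(tags_for_user({"age": 15, "family_status": "Couple"}))
```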

Step 2: Create the connections for the model in the Reco_Interactions.CSV data storage file

Use each original Interaction between a User and an Item to construct a row of Interaction data with the TagId.

Note: Some important reminders with this approach:

  1. Only the newly constructed data is used in the Interactions data entity for the model.
  2. An Interaction row that connects Users to TagIds doesn't need to be based on an actual interaction. The example here only illustrates how to create an Interaction that connects Users to Tags in the model.
  3. For the InteractionGroupingId, it might make sense to reuse the one from the original Interaction, if available. Otherwise, try grouping by UserId. During the model training phase, all interactions data is used to build connections between the different TagIds, and between UserIds and TagIds. Because scenarios and usage patterns differ, we suggest trying different ways to group and seeing which yields the most relevant results.
    1. Original Interaction row: with UserId, ItemId, InteractionGroupingId. Unlike the earlier example with BucketId, DO NOT INCLUDE this row in the input dataset.
    2. NEW Interaction row: with UserId, TagId as the ItemId, UserId as the InteractionGroupingId.

An example Data Contract would look like this:

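| InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UserId | TagId | | UserId | | | | | | | | |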
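
The row construction above can be sketched in Python as follows, assuming the grouping-by-UserId option. The function name and the input shape (a dictionary from UserId to a list of TagIds) are hypothetical, and only the three populated columns are produced.

```python
def tag_interaction_rows(user_tags):
    """Yield (InteractionGroupingId, ItemId, UserId) rows, one per user tag."""
    for user_id, tag_ids in user_tags.items():
        for tag_id in tag_ids:
            # Group by user: the UserId doubles as the InteractionGroupingId,
            # and the TagId is written where an ItemId would normally go.
            yield (str(user_id), str(tag_id), str(user_id))


rows = list(tag_interaction_rows({100: ["123", "Age30To40Female", "FamilyWithKids"]}))
for row in rows:
    print(row)
```

The original User-to-Item rows are intentionally not emitted, in line with the reminder that only the newly constructed data goes into the Interactions data entity.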

Step 3: Query the model to get personalized tags by user or similar tags

With careful model construction, querying the Serving Endpoint using the “Picks for you” and “People also like” list types yields the desired outcomes.

A "Picks for you" API Query, which returns the recommended TagIds for a given UserId, looks like this:

<serving-endpoint>/Reco/v1.0/picks?userId=<UserId>

A "People also like" API Query, where the seed-item parameter is replaced by the corresponding TagId, looks like this:

<serving-endpoint>/Reco/V1.0/Similar/<TagID-value>?
Sample response output
{
    "id": "Picks",
    "name": "Picks",
    "version": "v1.0",
    "items": [
        {
            "id": "68100",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "62500",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "61504",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "65103",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "61401",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        }
    ],
    "title": "Picks for you",
    "longTitle": "Picks for you",
    "titleId": 6,
    "personalizationConfidence": 1.0,
    "pagingInfo": {
        "totalItems": 139
    },
    "status": "Success"
}
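
The two request shapes in this step can be assembled with Python's standard `urllib.parse` helpers, as in this hedged sketch. The function names and the example host are hypothetical; the paths and the `userId` parameter are copied from the query templates above, and the trailing `?` on the Similar template is omitted.

```python
from urllib.parse import quote, urlencode


def picks_url(serving_endpoint, user_id):
    """URL for "Picks for you": personalized TagIds for one user."""
    query = urlencode({"userId": user_id})
    return f"{serving_endpoint.rstrip('/')}/Reco/v1.0/picks?{query}"


def similar_url(serving_endpoint, tag_id):
    """URL for "People also like": TagIds similar to the seed TagId."""
    seed = quote(str(tag_id), safe="")  # the TagId takes the seed-item slot
    return f"{serving_endpoint.rstrip('/')}/Reco/V1.0/Similar/{seed}"


print(picks_url("https://example.invalid", 100))
print(similar_url("https://example.invalid", "FamilyWithKids"))
```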

Example 3: Query for tagIds with demo data

Assume a User with UserId=100 has indicated that they're aligned with the following tags: 123 (which represents “Soccer fan”), Age30To40Female, and FamilyWithKids.

You can use the original interaction row to construct three new rows of Interaction info in the “Reco_Interactions.csv” file that is read by the Intelligent Recommendations service, one for each Tag for that User.

Note: In this example, we’ve chosen to group by UserId and have set the InteractionGroupingId equal to the UserId. Also note that the ItemId represents the TagId.

| InteractionGroupingId | ItemId | ItemVariantId | UserId | InteractionType | Timestamp | Future Attribute | Future Attribute | Channel | Catalog | Strength | IsPositive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 123 | | 100 | | | | | | | | |
| 100 | Age30To40Female | | 100 | | | | | | | | |
| 100 | FamilyWithKids | | 100 | | | | | | | | |

Query and responses for picks

Here's what the constructed "Picks for you" request looks like:

GET <serving-endpoint>/reco/v1.0/picks?UserId=100

The Picks Response returns a List of 200 ItemIds (for tags), including TagId=FamilyWithKids in the first position.

{
    "id": "Picks",
    "name": "Picks",
    "version": "v1.0",
    "items": [
        {
            "id": "FamilyWithKids",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "625",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "Sports",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "651",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "611",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        }
    ],
    "title": "Picks for you",
    "longTitle": "Picks for you",
    "titleId": 6,
    "personalizationConfidence": 1.0,
    "pagingInfo": {
        "totalItems": 139
    },
    "status": "Success"
}

Query and response for similar

Here's what the constructed "People also like" request using the Similar API looks like:

GET <serving-endpoint>/Reco/V1.0/Similar/FamilyWithKids?

The "People also like" Response returns a List of 200 ItemIds (for tags), including Age30To40Female in the first position and FamilyWithKids in the second position.

{
    "id": "Similar",
    "name": "Similar",
    "version": "v1.0",
    "items": [
        {
            "id": "Age30To40Female",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "FamilyWithKids",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "SportsParent",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "651",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        },
        {
            "id": "123",
            "trackingId": "00000000-0000-0000-0000-000000000003"
        }
    ],
    "title": "People also like",
    "longTitle": "People also like",
    "titleId": 6,
    "pagingInfo": {
        "totalItems": 200
    },
    "status": "Success"
}

To learn more about our service and the models we support, check out our Modeling Guide.

See Also

Quickstart Guide: Create an IR Account
Modeling Q&A
Data Contract Guide
Sample API Requests