Data contract overview

04/24/2024

This article explains how to share data with Intelligent Recommendations so you can enable it and provide meaningful recommendations.

The corresponding Intelligent Recommendations API for the described data contracts is Intelligent Recommendations API.

Download the latest model.json file for Intelligent Recommendations data contracts: model.json.

Prerequisites

For data integration, Intelligent Recommendations uses Microsoft Azure Data Lake Storage. This article describes the logical structure of the data that Intelligent Recommendations expects to consume from your Azure Data Lake Storage account.

So that Intelligent Recommendations can find your data, create a dedicated folder within the Azure Data Lake Storage account and provide the folder path (the Intelligent Recommendations root folder) to Intelligent Recommendations.

For information about onboarding and creating your Data Lake Storage account, go to Deploy Intelligent Recommendations or visit our Quickstart Guide.

Data contracts

Data contracts are a set of definitions and constraints for the structure of the data that Intelligent Recommendations consumes. To allow Intelligent Recommendations to ingest the data shared with it and provide recommendations, you need to adhere to the data contracts as described in this article.

Model JSON file

Intelligent Recommendations data contracts are logically divided into a set of data entities. Each data entity comprises zero or more input CSV files, which are also called partitions. A separate JSON text file called model.json describes the set of data entities. The model JSON file is preconfigured and can be immediately added to the Intelligent Recommendations root folder.

Download the default model

Download the latest default model JSON file for Intelligent Recommendations data contracts: model.json.

Note

The model.json file must be included in the Intelligent Recommendations root folder together with the data entity files. To learn about adjusting the model.json file, see the Modify the default file section of this article.

Modify the default file

Modifying the provided model JSON file isn't recommended until you're familiar with the Intelligent Recommendations service, and then only when you use one of the following features:

Numeric inputs format. The culture attribute specifies the input format that Intelligent Recommendations uses for numeric values. The decimal separator can be a period (.) or a comma (,), depending on the culture. To use a decimal separator other than a period (.), specify the appropriate culture in the culture attribute.

Note

If you use a comma (,) as the decimal separator, you need to properly escape each decimal value in the input CSV file. For more information about how to escape characters in CSV input files, go to the Data format section.

Explicit partition locations. To specify explicit locations of the data entity partition files, use the partitions attribute. By default, the partitions attribute value is null, which means that Intelligent Recommendations automatically searches for the relevant data entity partition files. For more information, go to Data format.

The partitions attribute is an array of partitions. Each partition contains the following attributes:
- name: A string representation of the partition. Intelligent Recommendations doesn't use it for any specific logic.
- location: The full URI of the partition data file (CSV). The URI must be accessible to Intelligent Recommendations (read-only), which might require you to grant Intelligent Recommendations the proper permissions. For more information on how to give Intelligent Recommendations access to data, go to Set up Azure Data Lake Storage.
- fileFormatSettings: Contains the following attribute:
  - columnHeaders: A Boolean value that specifies whether the partition data contains a header line. Intelligent Recommendations automatically discards header lines when it ingests input data. The default value is false, meaning no headers.

Here's a sample of the partitions attribute:

"partitions": [
    {
        "name": "Partition1",
        "location": "https://myStorageAccount.blob.core.windows.net/intelligent-recommendations-container/intelligent-recommendations-root-folder/partition1.csv",
        "fileFormatSettings": {
            "columnHeaders": true
        }
    }
]
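
A partition entry like the one in the sample can also be generated programmatically. The following is a minimal Python sketch, not an official tool: the helper name make_partition and the example URL are hypothetical, and the URI check is only a loose sanity test. The attribute names (name, location, fileFormatSettings, columnHeaders) are the ones used in the sample.

```python
import json
from urllib.parse import urlparse


def make_partition(name: str, location: str, has_headers: bool = False) -> dict:
    """Build one entry for the partitions array (hypothetical helper)."""
    parsed = urlparse(location)
    # Loose check that the location is a full URI rather than a relative path.
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"location is not a full URI: {location!r}")
    return {
        "name": name,
        "location": location,
        # has_headers defaults to False, matching the documented default.
        "fileFormatSettings": {"columnHeaders": has_headers},
    }


# Placeholder URL for illustration only.
partitions = [
    make_partition(
        "Partition1",
        "https://example.blob.core.windows.net/container/root-folder/partition1.csv",
        has_headers=True,
    )
]
print(json.dumps({"partitions": partitions}, indent=4))
```

Building the entry through one function keeps the attribute spelling consistent across entities, which matters because a misspelled attribute name would likely be ignored rather than reported.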

Best practices for updating your input data

Avoid a situation in which data is being modeled and updated at the same time, as it can lead to modeling of data from mixed dataset versions and undesired recommendation results. Some best practices for updating your input data are as follows:

1. Write all of the data entities to a different folder. This folder doesn't have to be in the same container or storage account as your current input data. Make sure to give Intelligent Recommendations permission to read data from the container that holds your updated input data. For more information, go to Set up Azure Data Lake Storage.
2. For each data entity that you're using, add the partitions attribute to your model JSON file. For each partition, update the location attribute so that it points to the new data location. For an explanation of how to add and edit the partitions attribute, see the Modify the default file section.
3. You can delete the old data when it's no longer in use. We recommend deleting old data only after the estimated modeling cycle duration (at least 36 hours), plus some buffer, so that data isn't deleted while it's being modeled.
4. Repeat steps 1-3 every time you want to update your input data.
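
The step that repoints partitions at the new data location can be scripted. The sketch below is illustrative only and rests on an assumption this article doesn't state: that model.json has a top-level "entities" array whose elements carry a "name" attribute. Check that against the model.json file you downloaded. The function name repoint_partitions and the example URL are hypothetical.

```python
import copy


def repoint_partitions(model: dict, new_locations: dict) -> dict:
    """Return a copy of the model with partitions set for the listed entities.

    new_locations maps a data entity name to the list of new CSV URIs.
    """
    updated = copy.deepcopy(model)  # leave the caller's model untouched
    for entity in updated.get("entities", []):  # assumed top-level key
        uris = new_locations.get(entity.get("name"))
        if uris is None:
            continue  # entity isn't being updated
        entity["partitions"] = [
            {
                "name": f"Partition{i}",
                "location": uri,
                "fileFormatSettings": {"columnHeaders": False},
            }
            for i, uri in enumerate(uris, start=1)
        ]
    return updated


# Example with a made-up, minimal model and a placeholder URL.
model = {"entities": [{"name": "Reco_Interactions", "partitions": None}]}
new_model = repoint_partitions(
    model,
    {"Reco_Interactions": ["https://example.blob.core.windows.net/c/v2/interactions.csv"]},
)
print(new_model["entities"][0]["partitions"][0]["location"])
```

Writing the new data first and switching the locations in a single model.json update afterward is what keeps a modeling cycle from reading a mix of old and new files.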

Data entities

A data entity is a set of one or more data text files, each having a list of columns (also called attributes) and rows containing the actual data values.

Intelligent Recommendations defines logical groups of data entities, each with its own purpose. Data entities are considered optional (unless explicitly stated otherwise), which means that their data can be empty (or entirely missing).

Intelligent Recommendations defines the following data entity groups:

Group	Data entities
Catalog data entities	Items and variants; Item categories; Item and variant images; Item and variant filters; Item and variant availabilities
Interactions data entities	Interactions
Reco configuration data entities	Reco configuration
Opted-out users data entities	Opted-out users
Recommendations enrichment data entities	Seeded recommendations enrichment; Recommendations enrichment
Image to item mapping data entities	Images inventory; Image to item mappings
External lists data entities	External recommendations lists; External recommendations items

Data format

Intelligent Recommendations expects all data entity partition input files to conform to the following format:

The content within the partition input file should be in comma-delimited text files (CSV) format, using UTF-8 encoded text only.
Each CSV file should include all the fields specified in the data contract of the relevant data entity. In addition, the fields should appear in the order described in that contract.
CSV files should hold only data entries, according to RFC 4180.

Here are some common examples for CSV data format behavior in different cases:

Each field may or may not be enclosed in double quotation marks.

For example: aaa,"bbb",ccc

Fields containing line breaks (CRLF), double quotation marks, or commas must be enclosed in double quotation marks.

For example: aaa,"bbCRLFb","c, cc"

A double quotation mark appearing inside a field must be escaped by preceding it with another double quotation mark.

For example: aaa,"b""bb",ccc
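
These quoting rules match what Python's standard csv module produces with its default settings, so one way to generate conforming rows is to let a CSV library do the escaping. The sketch below is only an illustration (the helper name to_csv_line is hypothetical). Its last case shows a decimal value written with a comma as the decimal separator, which gets quoted automatically.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row; the csv module quotes only the fields that need it."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\r\n").writerow(fields)
    return buf.getvalue()


# Plain fields are written without quotation marks.
assert to_csv_line(["aaa", "bbb", "ccc"]) == "aaa,bbb,ccc\r\n"

# Fields containing a line break or a comma are enclosed in quotation marks.
assert to_csv_line(["aaa", "bb\r\nb", "c, cc"]) == 'aaa,"bb\r\nb","c, cc"\r\n'

# An embedded quotation mark is doubled and the field is enclosed.
assert to_csv_line(["aaa", 'b"bb', "ccc"]) == 'aaa,"b""bb",ccc\r\n'

# A decimal value that uses a comma as its separator is enclosed as well.
assert to_csv_line(["Item1", "3,14"]) == 'Item1,"3,14"\r\n'
```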

If you don't explicitly specify the partitions attribute (in the model JSON file) for a data entity, Intelligent Recommendations searches for the data entity partition files within a subfolder (under the Intelligent Recommendations root folder) that has the same name as the data entity.

In this case, all partition input files within the data entity subfolder should have a CSV file extension, such as MyData.csv, and shouldn't contain a header line.

Intelligent Recommendations searches and aggregates data from all files that use the CSV extension, while ignoring the filename itself.
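
To preview which files would be picked up for a data entity, you can mimic this lookup against a local copy of the root folder. The following is a rough sketch, not the service's implementation: the function name find_partition_files is hypothetical, and matching the extension case-insensitively and looking only at the top level of the subfolder are assumptions.

```python
import tempfile
from pathlib import Path


def find_partition_files(root: Path, entity_name: str) -> list[Path]:
    """List the CSV files in <root>/<entity_name>, whatever their file names."""
    entity_dir = root / entity_name
    if not entity_dir.is_dir():
        return []  # no subfolder: the entity has no data
    return sorted(
        p for p in entity_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"  # assumption: case-insensitive
    )


# Demonstration against a temporary local folder that stands in for the root folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Reco_Interactions").mkdir()
    (root / "Reco_Interactions" / "MyData.csv").write_text("row\r\n")
    (root / "Reco_Interactions" / "notes.txt").write_text("not a partition")
    found = [p.name for p in find_partition_files(root, "Reco_Interactions")]
    print(found)
```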

Intelligent Recommendations folder structure example

The following example shows the structure of an Intelligent Recommendations root folder. CSV file names aren't required to match the folder names.

[Figure: Example structure of an Intelligent Recommendations root folder.]

Required data entities for each recommendations scenario

Recommendation scenarios might rely on different data entities to function properly. For a complete table that maps scenarios to data entities, see the Data Entities Mapping Table.

Data content requirements and limitations

The contents of all data entities must meet the following requirements and limitations.

Any data row that doesn't meet these requirements is treated as specified in the Invalid Value Behavior column for the relevant data entity and attribute:

All item and item variant IDs must comply with exactly one of these restrictions (you can't mix the two ID formats):
- Length must be 16 characters or less and contain only the following characters: A-Z, 0-9, _, -, ~,.
- In GUID format—a string of exactly 36 characters containing hexadecimal characters and dashes; for example, 12345678-1234-5678-90ab-1234567890ab. If you want to use this GUID format, add an entry to the Reco_Config data entity with the following data: Key=ItemIdAsGuid, Value=True (that is, ItemIdAsGuid,True). Otherwise, Intelligent Recommendations fails to generate recommendations.
Item variant IDs should be globally unique (across all items and item variants).
Item variant ID should be left empty for data rows that represent data about an item-master or a standalone item.
Item IDs and item variant IDs are case-insensitive, which means that:
- IDs ABCD1234, abcd1234, and AbCd1234 are all considered the same.
- In recommendations API responses, the returned IDs are all in uppercase.
String attributes have a length limit. String values that exceed their limit are trimmed (excess characters are removed), or the entire data row is dropped (the exact behavior is listed in the data entity table for each attribute).
All DateTime values should be in UTC, in the following format: yyyy-MM-ddTHH:mm:ss.fffZ.
All strings, except for item title and item description, are case-insensitive. For example, filter names Color and color are considered the same, and filter value Red is the same as filter value red.
For any nonmandatory attribute that is empty, the default value is used (if a default value is specified).
Boolean values should be either true or false and are case-insensitive (meaning that true is considered the same as True).
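
Several of these rules can be checked before you upload data. The sketch below is one possible pre-validation, not the service's own logic, and all function names are hypothetical. It makes three assumptions: lowercase letters are accepted because IDs are case-insensitive, the period is treated as an allowed ID character, and GUIDs use the usual 8-4-4-4-12 grouping.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Short format: 1 to 16 characters. Accepting lowercase and "." is an assumption.
SHORT_ID = re.compile(r"[A-Za-z0-9_\-~.]{1,16}")
# GUID format: 36 characters of hexadecimal digits and dashes (grouping assumed).
GUID_ID = re.compile(r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def classify_item_id(item_id: str) -> Optional[str]:
    """Return "guid", "short", or None if the ID matches neither format."""
    if GUID_ID.fullmatch(item_id):
        return "guid"
    if SHORT_ID.fullmatch(item_id):
        return "short"
    return None


def ids_use_single_format(item_ids) -> bool:
    """True if every ID is valid and all IDs share one format (no mixing)."""
    kinds = {classify_item_id(i) for i in item_ids}
    return None not in kinds and len(kinds) <= 1


def normalize_id(item_id: str) -> str:
    """IDs are case-insensitive; API responses return them in uppercase."""
    return item_id.upper()


def format_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as yyyy-MM-ddTHH:mm:ss.fffZ in UTC."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"


def parse_bool(text: str) -> bool:
    """Accept true/false in any letter case; reject everything else."""
    lowered = text.lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a Boolean value: {text!r}")
    return lowered == "true"


print(classify_item_id("12345678-1234-5678-90ab-1234567890ab"))
print(format_utc(datetime(2024, 4, 24, 13, 5, 9, 123456, tzinfo=timezone.utc)))
```

If ids_use_single_format reports the GUID format for your catalog, remember the ItemIdAsGuid,True entry in the Reco_Config data entity that the GUID rule requires.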

Changes from previous version

Here's the list of data contract changes between version 1.3 and version 1.4:

Data Entity	Change Summary
Reco_ItemCategories	Data entity is now supported and can be nonempty.
Reco_ItemAndVariantFilters	FilterName supports custom filter name. Filters now support multi-value filtering (more than one filter value).
Reco_ItemAndVariantAvailabilities	Channel and Catalog now support any string value (not just 0).
Reco_Interactions	Channel and Catalog now support any string value (not just 0).
Reco_ImagesInventory	New data entity.
Reco_ImageToItemMappings	New data entity.