Working with Data in Customer Insights

[This topic is pre-release documentation and is subject to change.]

Customer Insights is essentially a data collection, preparation, modeling, analysis, and presentation service. It provides a solution to the problem of how organizations can best use the increasing volume and types of "big data" they are either amassing or have access to.

Customer Insights Domain Modeling

As explained in the Overview topic, Customer Insights models any business domain using the following primary platform modeling types. These types typically contain a collection of custom properties that refine the type or provide additional context for its use. The supported fundamental data types for properties are bool, byte, datetimeoffset, decimal, double, guid, int, long, short and string.

  • A Profile represents a real-world entity or concept in the business domain, such as an organization, customer, asset, or an email message. Each such domain entity type is modeled as an instance of a corresponding profile type. Profiles describe their associated entities through a collection of properties. Each profile type contains a key, composed of a single property or a combination (tuple) of properties, that enables the lookup of specific profile instances.
  • A Relationship represents an explicit connection between two different profile types. For example, a new customer might be assigned to a specific salesperson. A profile may have any number of associated relationships. Relationships always have a primary (forward) direction, although a relationship can always be navigated forward (from source to target) or backward (from target to source). A model may or may not contain a mirror relationship that explicitly denotes the reverse connection. For example, SalespersonCustomer and CustomerSalesperson relationships would mirror each other.
    Note that Customer Insights can also discover implicit relationships, as described in the section Relationship analysis below.
  • An Interaction represents a business event, process or activity. Interactions are usually performed by or target a primary profiled entity, although they may also have secondary participants or targets. Interactions are commonly used to model transactions and interchanges across line-of-business (LOB) systems or communication channels. By default, interaction instances are immutable—once created they cannot be modified or deleted. Another interaction within the same business process, usually at a later time, is represented by a new interaction instance. In this way, time-series representations of processes are supported. (Interactions have an associated system property called Timestamp that typically indicates when the interaction was created or last updated in the Customer Insights service.)
    An Activity is a special category of Interaction that represents an ongoing business process. An activity is mutable and is often periodically enriched with data from multiple data sources (for example, from connectors or interactions using links). An Interaction is marked as an Activity by setting the isActivity property to true during creation.

  • A Link associates an Interaction with a Profile. Links often represent a single action or activity, for example a deposit being made at a bank branch. Links come in two varieties: reference and non-reference links.

    • Reference links refer to the source of their data in another entity instance. As a result they "contain" the most recent value for that field.
    • Non-reference links contain a copy of the data from the entity or field with which they are associated. Non-reference links play a vital role in upsert operations, where Profile or Interaction instances are populated from a data source (using an appropriate connector).
  • A RelationshipLink associates a Relationship with an Interaction. A relationship link represents one mechanism to either create or update a relationship instance from an interaction.

Other entities supplement this primary model. For example, Segments enable subgrouping and KPIs define metrics.
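
As a rough illustration of the modeling types described above, the following Python sketch models an immutable Interaction and a Relationship that can be navigated forward or backward. The class, field, and function names used here (Relationship, Interaction, navigate, and the sample identifiers) are hypothetical and are not part of the Customer Insights API.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

# Hypothetical, simplified stand-ins for the platform modeling types.

@dataclass(frozen=True)
class Relationship:
    """Explicit connection between two profile instances (source -> target)."""
    name: str
    source_id: str
    target_id: str

@dataclass(frozen=True)
class Interaction:
    """Immutable by default: a later event is recorded as a new instance."""
    interaction_type: str
    profile_id: str
    timestamp: datetime

def navigate(relationships, profile_id, forward=True):
    """Return the profile ids reachable from profile_id in one hop."""
    if forward:
        return [r.target_id for r in relationships if r.source_id == profile_id]
    return [r.source_id for r in relationships if r.target_id == profile_id]

rels = [Relationship("SalespersonCustomer", "sales-1", "cust-7"),
        Relationship("SalespersonCustomer", "sales-1", "cust-9")]

customers = navigate(rels, "sales-1")                   # forward: source to target
salespeople = navigate(rels, "cust-7", forward=False)   # backward: target to source

deposit = Interaction("Deposit", "cust-7", datetime.now(timezone.utc))
try:
    deposit.interaction_type = "Withdrawal"   # rejected by the frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False
```

In this sketch a single stored direction is enough to answer both forward and backward queries, which is why a mirror relationship is optional rather than required.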

Model Ontology and Implementation

Customer Insights implements an extensible entity framework based upon the ontology. Customer Insights implements its business models as custom graph databases. Each Customer Insights tenant has its own instance of a graph database with its own customized ontology. The following table lists the equivalent entity types between these two platforms as well as the graph implementation type for that entity. Customer Insights uses the Graph Extensions in Microsoft SQL Server 2017 to store instances of these entity-relationships.

| Customer Insights Entity | Ontology Entity | Implementation |
|---|---|---|
| Profile | Thing | node |
| Relationship | | edge |
| Interaction | Action | edge (but sometimes node) |
| Link (reference) | | edge |
| RelationshipLink | | edge |

If you are using or plan to incorporate a schema into your solution's data model, Customer Insights provides support through the SchemaItemTypeLink and SchemaItemPropLink properties found in the various entity types, such as Profile, Interaction, and Field.

Custom Type Structure and Implementation

Developers may find it beneficial to understand how the primary custom types are constructed in Customer Insights.

Each Profile type defined on the hub becomes an open entity type on the OData API for the hub, and each has an entity set of its own (defined by the ApiEntitySetName parameter when the Profile type is defined). Each Profile entity type has a key field called ProfileId (edm string type) that acts as the key for the Profile entity set feed. The ProfileId is generated from the IdPropertyNames specified in the Profile type model when the Profile type is created: it is a concatenation of the values of the Id properties, in lexical order, using '_' as the separator. Each Profile entity type also has a navigation link to the Interactions associated with the Profile.
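
The ProfileId rule described above can be sketched in a few lines of Python. This is only an illustration: the function name build_profile_id and the sample property names are hypothetical, and the sketch assumes that "lexical order" refers to the names of the Id properties (the text does not say whether names or values are ordered).

```python
def build_profile_id(properties, id_property_names, separator="_"):
    """Concatenate the values of the Id properties into a single key string."""
    # Assumption: the Id properties are ordered lexically by property name.
    ordered_names = sorted(id_property_names)
    return separator.join(str(properties[name]) for name in ordered_names)

# Hypothetical profile instance with a two-property (tuple) key.
customer = {"LastName": "Smith", "CustomerNumber": 1042, "Region": "EMEA"}
profile_id = build_profile_id(customer, ["Region", "CustomerNumber"])
```

Sorting the property names makes the generated key independent of the order in which the Id properties happen to be listed.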

Interactions are exposed by the Interactions entity set, and are backed by an open entity type, 'Interaction'. The Interaction entity type has three declared fields: InteractionId (edm string), InteractionType (edm string), and Timestamp (edm datetimeoffset). InteractionId is generated in a similar way to ProfileId, as described above, using the IdPropertyNames for the modeled Interaction type. If the IdPropertyNames are not specified for an Interaction type, then the system generates the InteractionId. InteractionType is the name of the modeled Interaction type. Timestamp is system generated if the TimeStampPropertyName is not used when modeling the Interaction type.
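
The following Python sketch shows one way the three declared Interaction fields could be populated according to the rules above. It is not the service implementation: build_interaction and its parameters are hypothetical, a random UUID stands in for the system-generated InteractionId, the current UTC time stands in for the system-generated Timestamp, and the Id properties are assumed to be ordered by property name.

```python
import uuid
from datetime import datetime, timezone

def build_interaction(interaction_type, properties,
                      id_property_names=None, timestamp_property_name=None):
    """Return a dict with the three declared fields plus the open properties."""
    if id_property_names:
        # Assumption: same rule as ProfileId, ordered by property name.
        interaction_id = "_".join(
            str(properties[name]) for name in sorted(id_property_names))
    else:
        # Stand-in for a system-generated identifier.
        interaction_id = uuid.uuid4().hex
    if timestamp_property_name:
        timestamp = properties[timestamp_property_name]
    else:
        # Stand-in for a system-generated timestamp.
        timestamp = datetime.now(timezone.utc)
    record = dict(properties)  # open type: custom properties are kept as-is
    record.update(InteractionId=interaction_id,
                  InteractionType=interaction_type,
                  Timestamp=timestamp)
    return record

placed_at = datetime(2017, 5, 1, 9, 30, tzinfo=timezone.utc)
order = build_interaction(
    "Order",
    {"OrderNumber": 2001, "Branch": "B-17", "PlacedAt": placed_at},
    id_property_names=["OrderNumber", "Branch"],
    timestamp_property_name="PlacedAt")
page_view = build_interaction("PageView", {"Url": "/pricing"})
```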

Key Performance Indicators (KPIs)

A KPI is a quantifiable measure of progress or success against a business objective. As such, KPIs represent a form of simple data analysis. Customer Insights supports the definition of KPIs based upon properties of the Profile or Interaction types. Depending upon the connector and data source used, KPIs may be imported from the data source. More likely, KPIs are defined either through the user interface (see Data Modeling) or through the KPI Management (ARM or Hub) APIs.

A KPI can be used to modify a property of a Profile. Such a KPI is called an enriching KPI. Enriching KPIs for a specific profile type can be obtained through the Get Enriching KPIs operation, available in both the ARM and Hub APIs.
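
To make the idea of an enriching KPI more concrete, the sketch below computes a simple KPI (total purchase amount per profile) from interaction properties and writes the result onto a profile property. All names here (total_by_profile, enrich_profiles, TotalSpend, and the sample data) are hypothetical; in practice, KPIs are defined through the user interface or the KPI Management APIs rather than in client code.

```python
from collections import defaultdict

def total_by_profile(interactions, value_property):
    """KPI: sum a numeric interaction property per associated profile."""
    totals = defaultdict(float)
    for interaction in interactions:
        totals[interaction["ProfileId"]] += interaction[value_property]
    return dict(totals)

def enrich_profiles(profiles, kpi_values, target_property):
    """Return copies of the profiles with the KPI value stored as a property."""
    return {pid: {**props, target_property: kpi_values.get(pid, 0.0)}
            for pid, props in profiles.items()}

purchases = [
    {"ProfileId": "c1", "Amount": 20.0},
    {"ProfileId": "c1", "Amount": 5.5},
    {"ProfileId": "c2", "Amount": 12.0},
]
profiles = {"c1": {"Name": "Ana"}, "c2": {"Name": "Raj"}, "c3": {"Name": "Li"}}

totals = total_by_profile(purchases, "Amount")
enriched = enrich_profiles(profiles, totals, "TotalSpend")
```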

Data Sources and Ingestion

Data sources are used to populate or enrich type instances within a Customer Insights solution. There are three primary sources of data:

  • External data sources via connectors
  • Internally generated data via Link definitions or enriching KPIs
  • Direct data injection using the Hub Data APIs (external and/or internally generated data)

For more information about adding data sources through the user interface, see Add a Data Source.

The typical application will see the majority of its data imported from external sources—including other Dynamics 365 applications—in a process called data ingestion. Customer Insights can work with a wide variety of data with the following characteristics:

  • Structured, semi-structured, or unstructured data
  • Fixed or dynamic schema
  • One-time static, periodic or streaming (dynamic) content

Customer Insights automatically updates dynamic content on a periodic basis. By default, the refresh rate is set to 15 minutes. Data ingestion involves several general steps:

  1. Reading the external data
  2. Mapping the data onto the appropriate (new or existing) Customer Insights data model
  3. Creating the corresponding instances of Customer Insights entities
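
The three general steps above can be sketched as a small pipeline. This is an illustration only, assuming a CSV source and a hand-written field map; the names read_source, map_row, create_profiles, and FIELD_MAP are hypothetical, and real ingestion is performed by connectors rather than by code like this.

```python
import csv
import io

# Hypothetical external data, inlined so that the example is self-contained.
SOURCE_CSV = """cust_no,full_name,email
1042,Ana Smith,ana@example.com
1043,Raj Patel,raj@example.com
"""

# Hypothetical mapping from source columns to target profile properties.
FIELD_MAP = {"cust_no": "CustomerNumber", "full_name": "Name", "email": "Email"}

def read_source(text):
    """Step 1: read the external data into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def map_row(row, field_map):
    """Step 2: map source fields onto the target data model."""
    return {target: row[source] for source, target in field_map.items()}

def create_profiles(rows, key_property):
    """Step 3: create one instance per key (a later row replaces an earlier one)."""
    return {row[key_property]: row for row in rows}

rows = read_source(SOURCE_CSV)
mapped = [map_row(row, FIELD_MAP) for row in rows]
profiles = create_profiles(mapped, "CustomerNumber")
```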

Customer Insights provides a set of standard inbound connectors to enable ingestion from popular data sources, including the Dynamics 365 Customer Engagement Connector and the Azure Storage (blob) Connector. Additional third-party connectors will be offered in the AppSource marketplace. For more information about programmatic access, see the Connector entity type. For more information about the user experience (UX) around selecting data sources, see Add a data source.

Standard Connectors

The following table compares the standard connectors with respect to some common connector characteristics. In the table, "source" refers to the data source, whereas "target" refers to the associated Customer Insights data model (or instances thereof).

| Connector Characteristic | Customer Engagement Connector | Azure Storage Connector |
|---|---|---|
| Compatible data sources | Dynamics 365 Customer Engagement solution | Azure blob |
| Data modeling and mapping | Connector understands and automatically maps source types to target types, creating target entity types where required. | Requires interactive user input to map source and target types. |
| Data preparation | Connector automatically prepares and transforms source data | User can choose from a limited set of data formatting transformations |
| Data sync policy and frequency | Automatically performs full sync initially and re-syncs whenever source data changes | Users specify sync policy and frequency |
| Data metadata changes | Automatically updates target data model | Users must manage changes manually |

Profile Predictive Matching

Data that is ingested from multiple data sources often contains information about the same entity. Unfortunately, different sources rarely use a shared unique entity identifier, and data quality often varies between sources, so matching the same entity instance across data sources is non-trivial. Predictive matching, sometimes called conflation, enables this matching based upon contextual information. This feature also optionally enables merging these separate references based upon specified criteria. By combining information in this way, data is enriched. (Conflation is also useful in duplicate record detection and removal.)

Customer Insights offers optional conflation processing that initially supports:

  • Profile-to-profile matching, where the profiles represent individuals or organizations. Matching typically focuses on common properties such as name, title, address, email address, phone number, organization name, and web domain.
  • Profile-to-interaction matching, useful where the original source for the interaction data does not contain separate, well-defined profile information.

Predictive matching works across common name spellings, nicknames, partial data (e.g. partial phone numbers or addresses), and organization membership. This process is controlled by an associated conflation policy, which also contains a match threshold that indicates the minimum strength required for acceptance of a match.

For more information about the conflation UX, see Predictive matching. For more information about the programming interface, see Predictive Matching Policy APIs.
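
The role of the match threshold can be illustrated with a deliberately simple stand-in. The sketch below does not reproduce the matching technique used by Customer Insights; it assumes plain string similarity (Python's difflib.SequenceMatcher) averaged over the compared properties, and the names similarity, match_strength, and is_match are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_strength(profile_a, profile_b, properties):
    """Average similarity over the properties that both profiles provide."""
    scores = [similarity(profile_a[name], profile_b[name])
              for name in properties
              if profile_a.get(name) and profile_b.get(name)]
    return sum(scores) / len(scores) if scores else 0.0

def is_match(profile_a, profile_b, properties, threshold):
    """Accept the match only if its strength reaches the policy threshold."""
    return match_strength(profile_a, profile_b, properties) >= threshold

crm = {"Name": "Jonathan Smith", "Email": "jon.smith@contoso.com"}
web = {"Name": "Jon Smith", "Email": "jon.smith@contoso.com"}
other = {"Name": "Maria Garcia", "Email": "mgarcia@fabrikam.com"}

same_person = is_match(crm, web, ["Name", "Email"], threshold=0.8)
different_person = is_match(crm, other, ["Name", "Email"], threshold=0.8)
```

Raising the threshold trades fewer false matches for more missed matches, which is why it is part of the policy rather than a fixed constant.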

Relationship Analysis

In addition to relationships explicitly specified by the data model, Customer Insights can discover "hidden relationships" in data sets. These come in two varieties:

  • Indirect relationships occur when entities are involved in the same interaction, share the same resource or characteristic, or have an order-removed relationship. Examples include suggestions of items frequently purchased together, people who share the same home phone number, and friend-of-friend networking.
  • Inferred relationships can only be deduced through contextual or statistical analysis. For example, political affiliation might be inferred based upon characteristics such as home zip code, occupation, club memberships, and so on.

For more information, see the Suggest Relationships for Interaction Type ARM or Hub operations.
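
As a minimal sketch of how one kind of indirect relationship could be surfaced, the following Python code pairs up profiles that share the same value for a property, such as a home phone number. The function name shared_property_pairs and the sample data are hypothetical, and the discovery performed by the service is not limited to this approach.

```python
from collections import defaultdict
from itertools import combinations

def shared_property_pairs(profiles, property_name):
    """Return (profile_a, profile_b, shared_value) for each pair sharing a value."""
    groups = defaultdict(list)
    for profile_id, props in profiles.items():
        value = props.get(property_name)
        if value:
            groups[value].append(profile_id)
    pairs = []
    for value, members in groups.items():
        # Every pair within a group is a candidate indirect relationship.
        for a, b in combinations(sorted(members), 2):
            pairs.append((a, b, value))
    return pairs

people = {
    "p1": {"Name": "Ana", "HomePhone": "555-0100"},
    "p2": {"Name": "Ben", "HomePhone": "555-0100"},
    "p3": {"Name": "Cho", "HomePhone": "555-0199"},
    "p4": {"Name": "Dee"},
}
candidates = shared_property_pairs(people, "HomePhone")
```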

Data Segmentation

Even with advanced analysis and metrics, it is often difficult to perceive trends within a large entity set. Customer Insights supports the process of segmentation, subdividing the set into multiple subsets based upon specified criteria. Although segmentation is commonly used for marketing investigations of and campaigns with current or potential customers, it is a powerful general-purpose tool for understanding subgroups in a larger population. Designing appropriate segments is critical to such efforts, as segments can depend on a wide range of profile properties, interactions, and relationships with other entities.

Segments can have static membership, dynamic membership based upon a query expression, or a combination of the two. Segments are native types in Customer Insights, and the full range of CRUD operations is supported, as described in the topic Segmentation Management APIs (Hub). Segmentation is also supported extensively through the Segment Exploration and Segment Builder user experiences; for more information, see Segment your insights.
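
The difference between static and dynamic membership can be sketched as follows. This is an illustration under stated assumptions: a Python predicate stands in for the query expression, a combined segment is assumed to be the union of its static and dynamic members, and the names dynamic_members and segment_members are hypothetical.

```python
def dynamic_members(profiles, predicate):
    """Dynamic membership: every profile that satisfies the query predicate."""
    return {pid for pid, props in profiles.items() if predicate(props)}

def segment_members(profiles, static_ids=frozenset(), predicate=None):
    """Combine a fixed member list with an optional dynamic query."""
    members = set(static_ids) & set(profiles)   # ignore ids that no longer exist
    if predicate is not None:
        # Assumption: a combined segment is the union of both memberships.
        members |= dynamic_members(profiles, predicate)
    return members

customers = {
    "c1": {"TotalSpend": 500.0},
    "c2": {"TotalSpend": 50.0},
    "c3": {"TotalSpend": 20.0},
}
high_value = segment_members(
    customers,
    static_ids={"c3"},                               # hand-picked member
    predicate=lambda p: p["TotalSpend"] >= 100.0)    # query-based members
```

Because the dynamic part is re-evaluated against the current profiles, its membership changes as profile data changes, whereas the static part changes only when it is edited.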

Predictive Scoring

While predictive matching assists in intelligently conflating ingested records, predictive scoring applies machine learning (ML) to analyze modeled data against a business objective specified by the user. The result of this analysis is a relative weighting for each profile instance against the objective. The user can then apply thresholds and relative gradings against these results. Once configured, predictive scoring is automatically applied against relevant incoming data. The following table outlines the general process followed when using predictive scoring.

| Processing Step | Responsible Party | Description/Notes |
|---|---|---|
| Specification of business objective (outcome) | User/Programmer | Outcome data must already exist in the model or be readily calculated. |
| Sample data analysis | Prediction engine | Applies machine learning techniques to the Customer Insights solution model: (1) analyzes and prepares model data; (2) selects best positive and negative features (descriptive model); (3) creates predictive model (best-fit learning algorithm); (4) validates and iteratively improves model |
| Score relevant profiles | Prediction engine | Applies predictive model with identified factors to score associated profiles |
| Grading of profiles | User/Programmer | Application of thresholds to "bucketize" profiles by expected outcomes |
| Prediction insights | User | Examination of predictive results leads to better business management (resource allocation and process optimization) |
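
The grading step in the table above amounts to applying thresholds to the scores. A minimal sketch, assuming scores between 0 and 1 and three hypothetical grades (the threshold values, labels, and the grade function are illustrative and are not defined by Customer Insights):

```python
from bisect import bisect_right

# Hypothetical thresholds: below 0.4 is "low", from 0.4 up to 0.7 is "medium",
# and 0.7 or above is "high".
GRADE_THRESHOLDS = [0.4, 0.7]
GRADE_LABELS = ["low", "medium", "high"]

def grade(score):
    """Map a predictive score onto a grade bucket."""
    return GRADE_LABELS[bisect_right(GRADE_THRESHOLDS, score)]

scores = {"c1": 0.91, "c2": 0.40, "c3": 0.12}
grades = {profile_id: grade(score) for profile_id, score in scores.items()}
```

Using bisect_right places a score that is exactly equal to a threshold in the higher bucket; choosing bisect_left instead would place it in the lower one.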

Much of the complexity of using traditional machine learning techniques (such as feature selection, learning algorithm selection, and training-testing cycles) is handled automatically and is thus mostly hidden from the user.