Refresh data sources

As mentioned in the previous unit, data sources aren't refreshed automatically, whether you import data using Power Query or ingest it from an Azure Data Lake or Microsoft Dataverse. To make new data from your data sources available in Customer Insights - Data, you need to refresh manually. Once your data is refreshed, the new information is reflected in the tables. After you complete the data unification process, a refresh also updates any segments, measures, activities, enrichment, search, data preparation, and insights that are configured in the application.

Refreshing data manually isn't an ideal scenario for most organizations. For Customer Insights - Data to be as effective as possible, the data should be as current as possible. Current data helps ensure that the organization has the clearest possible picture of its customers. To assist with this, Customer Insights - Data provides multiple options for refreshing data sources.

These options include:

  • Scheduled data refreshes - Refreshes all of your data sources on a predefined schedule.

  • Incremental refresh - Refreshes smaller subsets of data for a data source based on the configured incremental refresh settings.

  • Near real-time data ingestion - Can ingest profile and activity-related information in near real time.

Scheduled refresh

The easiest way to automate refreshing of your data is to set up scheduled refreshes. Scheduled refreshes are configured by selecting System under the Settings heading in the application navigation. You can modify the schedule from the Schedule tab. When you enable scheduled refreshes, it impacts all your data sources. Not only does it refresh the data, but it also refreshes all other configured items such as segments, measures, activities, enrichment, search, data preparation, and insights.

You need to provide the following options:

  • Repeat (Frequency) - Defines how often to refresh the data. Data can be refreshed daily or weekly. Selecting weekly allows you to define the specific days the schedule should run.

  • Time Zone - Defines the time zone the job is running in.

  • Time - Specifies the time that you want the refresh to occur. If you want refreshes to happen multiple times a day, you can add more times.

Screenshot of where to turn on scheduled refreshes and select a time.
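To make the interaction between the three options concrete, the following minimal sketch models a schedule and computes the next run time. The `RefreshSchedule` class and its field names are hypothetical and only mirror the Repeat, Time Zone, and Time options described above. They aren't part of any Customer Insights - Data API, and the application might compute run times differently.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo


@dataclass(frozen=True)
class RefreshSchedule:
    """Hypothetical model of the schedule options (not a product API)."""
    frequency: str            # "daily" or "weekly"
    time_zone: tzinfo         # e.g. timezone.utc or zoneinfo.ZoneInfo("Europe/Paris")
    times: tuple[time, ...]   # one or more run times per day
    weekdays: frozenset[int] = frozenset(range(7))  # 0 = Monday; used when weekly

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled run strictly after `now` (timezone-aware)."""
        local_now = now.astimezone(self.time_zone)
        # Look ahead 8 days so a weekly schedule always reaches its next slot.
        for offset in range(8):
            day = (local_now + timedelta(days=offset)).date()
            if self.frequency == "weekly" and day.weekday() not in self.weekdays:
                continue
            for run_time in sorted(self.times):
                candidate = datetime.combine(day, run_time, tzinfo=self.time_zone)
                if candidate > local_now:
                    return candidate
        raise ValueError("schedule has no upcoming run")


if __name__ == "__main__":
    # Daily refresh at 02:00 and 14:00 UTC.
    schedule = RefreshSchedule("daily", timezone.utc, (time(2, 0), time(14, 0)))
    print(schedule.next_run(datetime.now(timezone.utc)))
```

Adding a second entry to `times` corresponds to adding more times in the Time option. Setting `frequency` to `"weekly"` with a subset of `weekdays` corresponds to selecting specific days.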

Important

Refresh schedules perform a complete data refresh on ALL configured data sources. It is not possible to configure different schedules for different data sources. Scheduled refreshes typically include large numbers of records and several complex operations.

When a scheduled refresh runs, the following occurs:

  1. Data is pulled from the data source.

  2. Data is unified.

  3. Data is enriched with additional information.

Depending on factors such as the number of data sources and the volume of data, scheduled runs can range from minutes to hours.

Incremental refresh

Incremental refreshes are available for Power Query and Azure Data Lake Storage data sources.

Incrementally refreshing data provides the following advantages:

  • Faster refreshes - Only data that changes gets refreshed.

  • Increased reliability - Due to smaller refreshes, connections to volatile source systems don't need to be maintained for as long, reducing the risk of connection issues.

  • Reduced resource consumption - Refreshing only a subset of your total data leads to more efficient use of computing resources and decreases the environmental footprint.

Power Query

You can incrementally refresh data sources that were imported using Power Query. Not all data sources imported through Power Query support incremental refresh. You need to verify that the data source you work with allows it. When you select a data source that allows incremental refresh such as an Azure SQL DB, you're provided with incremental refresh settings to configure after you transform your data. You can modify the refresh settings for all entities selected when you create the data source.

For example, a historical data set stored in an Azure SQL database could contain thousands or even millions of records that span multiple years. With incremental refreshes, you can choose to refresh only records that were created or modified in the last five days.

For each table, you need to provide the following details:

  • Primary key - Select a primary key for the table.

  • Last updated field - Specifies the attribute that indicates when the records were last updated. It's used to identify records that fall within the incremental refresh time frame.

  • Check for updates every - Specifies how long the incremental refresh time frame is.

Screenshot showing how to schedule incremental refreshes by setting the primary key, last updated field, and check for updates.
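The way the three settings work together can be illustrated with a small conceptual sketch. It assumes rows are plain dictionaries and that the refresh upserts every source row whose last updated value falls inside the time frame. The function name and parameters are hypothetical, and the source doesn't describe how the application implements incremental refresh internally.

```python
from datetime import datetime, timedelta, timezone


def incremental_refresh(existing, source_rows, primary_key, last_updated_field,
                        window, now):
    """Upsert into `existing` only the source rows changed within `window`.

    existing: dict mapping primary-key value -> row (dict); modified in place
    source_rows: iterable of row dicts read from the source system
    window: timedelta, the "Check for updates every" time frame
    Returns the number of rows that were upserted.
    """
    cutoff = now - window
    refreshed = 0
    for row in source_rows:
        # The last updated field decides whether a row is inside the time frame.
        if row[last_updated_field] >= cutoff:
            # The primary key decides whether this is an update or a new row.
            existing[row[primary_key]] = row
            refreshed += 1
    return refreshed


if __name__ == "__main__":
    now = datetime(2024, 6, 10, tzinfo=timezone.utc)
    table = {1: {"id": 1, "name": "Old", "modified": now - timedelta(days=400)}}
    source = [
        {"id": 1, "name": "Old", "modified": now - timedelta(days=400)},
        {"id": 2, "name": "New", "modified": now - timedelta(days=2)},
    ]
    count = incremental_refresh(table, source, "id", "modified",
                                timedelta(days=5), now)
    print(count, sorted(table))
```

With a five-day time frame, only the row modified two days ago is processed. The 400-day-old row is skipped, which is where the faster refresh comes from.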

Azure Data Lake Storage

Incremental ingestion and refresh can be configured for Azure Data Lake Storage data sources. You can configure them for a table when you add the Azure Data Lake Storage data source, or later by editing the data source.

In order to use incremental refreshes, the table data folder must contain the following folders:

  • FullData: Folder with data files containing initial records.

  • IncrementalData: Folder with date/time hierarchy folders in yyyy/mm/dd/hh format containing the incremental updates.

    • hh represents the UTC hour of the updates and contains the Upserts and Deletes folders.

      • Upserts: Data files with updates to existing records or new records.

      • Deletes: Data files with records to remove.
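The folder layout above can be sketched in code. The first function builds the hourly folder path for a given timestamp. The second shows, conceptually, how one hour of Upserts and Deletes could be merged into the full data. The function names are hypothetical, the example table folder name `Customers` is made up, and applying upserts before deletes is an assumption of this sketch because the order isn't stated in this unit.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def incremental_folder(table_root: str, when: datetime, kind: str) -> PurePosixPath:
    """Build <table_root>/IncrementalData/yyyy/mm/dd/hh/<kind> for a timestamp."""
    if kind not in ("Upserts", "Deletes"):
        raise ValueError("kind must be 'Upserts' or 'Deletes'")
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # The hh folder represents the UTC hour, so convert before formatting.
    utc = when.astimezone(timezone.utc)
    return PurePosixPath(table_root, "IncrementalData",
                         utc.strftime("%Y/%m/%d/%H"), kind)


def apply_increment(full_data: dict, upserts: list, deletes: list, key: str) -> dict:
    """Return a copy of `full_data` with one hour of changes applied.

    Assumption: upserts are applied first, then deletes.
    """
    merged = dict(full_data)
    for row in upserts:
        merged[row[key]] = row        # update an existing record or add a new one
    for row in deletes:
        merged.pop(row[key], None)    # remove the record if it's present
    return merged


if __name__ == "__main__":
    stamp = datetime(2024, 3, 5, 7, 30, tzinfo=timezone.utc)
    print(incremental_folder("Customers", stamp, "Upserts"))
```

Converting to UTC before formatting matters when the source system produces local timestamps, because an update written at 01:30 in a UTC+2 zone belongs in the 23 folder of the previous day.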

For more information on incremental refreshes for Azure Data Lake Storage data sources, and step-by-step instructions, see: Configure incremental refresh for Azure Data Lake Storage.

Real-time data ingestion

Customer Insights - Data's near real-time functionality allows you to see, within seconds, the latest interactions that your customers make with your products or services. The purpose of real-time updates is to keep customer profile and activity information current in near real time so that it can be consumed immediately, until a scheduled refresh job updates the data from the data source.

Real-time operations take place after data unification and apply only to unified customer profiles. For this reason, existing customer profiles can be updated, but new profiles can't be created and existing profiles can't be deleted. Real-time profile changes also don't update measures, segment membership, or enrichments. Once the normal scheduled refresh occurs, items like measures, segments, and enrichments update as normal.
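These rules can be illustrated with a toy in-memory model. The `ProfileStore` class, its method names, and the segment rule (`city == "Seattle"`) are all hypothetical and exist only to show the behavior described above: updates to existing profiles apply immediately, unknown profiles are rejected, and segment membership changes only at the next scheduled refresh. It isn't the real-time API.

```python
class ProfileStore:
    """Toy stand-in for unified customer profiles (not a product API)."""

    def __init__(self, profiles):
        # Copy the input so callers' dictionaries aren't modified.
        self.profiles = {cid: dict(attrs) for cid, attrs in profiles.items()}
        self.segment = set()
        self.scheduled_refresh()

    def realtime_update(self, customer_id, changes):
        """Update an existing profile; never create one. Returns True if applied."""
        if customer_id not in self.profiles:
            return False  # new profiles can't be created in real time
        self.profiles[customer_id].update(changes)
        return True

    def scheduled_refresh(self):
        """Recompute segment membership using a hypothetical rule."""
        self.segment = {cid for cid, profile in self.profiles.items()
                        if profile.get("city") == "Seattle"}


if __name__ == "__main__":
    store = ProfileStore({"c1": {"city": "Boston"}})
    store.realtime_update("c1", {"city": "Seattle"})
    print(store.profiles["c1"], store.segment)   # profile is current, segment is stale
    store.scheduled_refresh()
    print(store.segment)                          # segment catches up after the refresh
```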

Note

Exporting real-time updates to external systems, like Power BI, isn't currently possible.

Using the real-time API allows you to publish a new activity from your source system (an individual source record) to a unified customer profile. The new activity is available as a unified activity in that unified customer profile's timeline within seconds. You can see the timeline in the customer card view or any other timeline integration you configured. Like customer profile information that is updated in real time, new activities aren't reflected in segments or measures until a scheduled refresh occurs. Activities added only through the real-time API aren't part of exports and don't show up in Power BI.

There are two ways to connect to the real-time API:

  • Indirectly - Uses the Dynamics 365 Customer Insights - Data connector.

  • Directly - Programmatically through code.

Both methods require the following:

  • A Dynamics 365 Customer Insights - Data environment.

  • Unified customer profiles that have been created.

  • Activities that have been configured and run.

  • An account with either contributor or administrator permissions, used to configure and authenticate the connection.

The information discussed in this unit is designed to provide a starting point for working with real-time ingestion. For more detailed information about real-time data ingestion, refer to the following links: