Retail product recommendations

Solution Idea

A deep understanding of the relationship between customer interests and purchasing patterns is a critical component of any retail business intelligence operation. This solution aggregates customer data into a complete profile and uses machine learning models running on Azure to provide predictive insights on simulated customers.

Architecture

Architecture diagram. An SVG of this architecture is available for download.

Description

For more details on how this solution is built, see the solution guide on GitHub.

A typical retail business collects customer data through a variety of channels, including web-browsing patterns, purchase behaviors, demographics, and other session-based web data. Some of the data originates from core business operations, but other data must be pulled and joined from external sources such as partners, manufacturers, and the public domain.

Many businesses use only a small portion of the available data, but to maximize ROI, a business must integrate relevant data from all sources. Traditionally, integrating external, heterogeneous data sources into a shared data processing engine has required significant effort and resources to set up. This solution describes a simple, scalable approach to integrating analytics and machine learning to predict customer purchasing activity.

This solution addresses the above problems by:

  • Uniformly accessing data from multiple data sources while minimizing data movement and system complexity in order to boost performance.
  • Performing the ETL and feature engineering needed to use a predictive machine learning model.
  • Creating a comprehensive customer 360 profile enriched by predictive analytics running across a distributed system backed by Microsoft R Server and Azure HDInsight.
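
As a rough illustration of the profile-building idea in the list above, the following sketch joins demographic, purchase, and browsing records on a customer ID and derives a few simple features. It is plain Python rather than the Spark, Microsoft R Server, and HDInsight stack the solution uses, and every record, field name, and the `build_profiles` helper are hypothetical, chosen only for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records; all field names are assumptions for illustration.
demographics = {
    "c1": {"age": 34, "region": "west"},
    "c2": {"age": 51, "region": "east"},
}
purchases = [
    {"customer_id": "c1", "category": "shoes", "amount": 80.0, "date": date(2024, 1, 5)},
    {"customer_id": "c1", "category": "shoes", "amount": 40.0, "date": date(2024, 2, 1)},
    {"customer_id": "c2", "category": "books", "amount": 15.0, "date": date(2023, 11, 20)},
]
browsing = [
    {"customer_id": "c1", "category": "shoes", "views": 7},
    {"customer_id": "c1", "category": "bags", "views": 2},
    {"customer_id": "c2", "category": "books", "views": 1},
]


def build_profiles(demographics, purchases, browsing, as_of):
    """Join the three sources on customer_id and derive simple features."""
    profiles = {}
    for cid, demo in demographics.items():
        profiles[cid] = {
            **demo,
            "total_spend": 0.0,
            "purchase_count": 0,
            "days_since_last_purchase": None,
            "views_by_category": defaultdict(int),
        }
    for p in purchases:
        prof = profiles.get(p["customer_id"])
        if prof is None:
            continue  # skip purchases with no matching demographic record
        prof["total_spend"] += p["amount"]
        prof["purchase_count"] += 1
        age_days = (as_of - p["date"]).days
        last = prof["days_since_last_purchase"]
        # Keep the smallest age, i.e. the most recent purchase.
        prof["days_since_last_purchase"] = age_days if last is None else min(last, age_days)
    for b in browsing:
        prof = profiles.get(b["customer_id"])
        if prof is not None:
            prof["views_by_category"][b["category"]] += b["views"]
    return profiles


profiles = build_profiles(demographics, purchases, browsing, as_of=date(2024, 2, 11))
```

In a distributed setting the same joins and aggregations would be expressed as operations over partitioned datasets instead of in-memory dictionaries, but the shape of the resulting profile would be similar.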

Data Flow

  1. A Data Generator pipes simulated customer events to an Event Hub.
  2. A Stream Analytics job reads from the Event Hub and performs aggregations.
  3. Stream Analytics persists time-grouped data to an Azure Storage Blob.
  4. A Spark job running in HDInsight merges the latest customer browsing data with historical purchase and demographic data to build a consolidated user profile.
  5. A second Spark job scores each customer profile against a machine learning model to predict future purchasing patterns: whether a given customer is likely to make a purchase in the next 30 days and, if so, in which product category.
  6. Predictions and other profile data are visualized and shared as charts and tables in Power BI Online.
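
The time-grouped aggregation and scoring steps in this flow can be sketched in a few lines of plain Python. This is only a stand-in under stated assumptions: the real pipeline uses Stream Analytics and Spark, the event tuples are invented, and the logistic `score` function with its `weight` and `bias` values is a made-up placeholder for the trained model, which the source does not describe.

```python
import math
from collections import Counter

# Hypothetical simulated events: (customer_id, category, epoch_seconds).
events = [
    ("c1", "shoes", 0),
    ("c1", "shoes", 30),
    ("c1", "bags", 65),
    ("c2", "books", 70),
]


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, customer_id, category), as in steps 2 and 3."""
    counts = Counter()
    for customer_id, category, ts in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, customer_id, category)] += 1
    return counts


def score(views_by_category, weight=0.8, bias=-2.0):
    """Stand-in for step 5: a logistic score per category.

    The weight and bias are arbitrary illustrative values, not a trained model.
    Returns the highest-scoring category and its score.
    """
    probs = {
        cat: 1.0 / (1.0 + math.exp(-(bias + weight * views)))
        for cat, views in views_by_category.items()
    }
    best = max(probs, key=probs.get)
    return best, probs[best]


windows = tumbling_window_counts(events, window_seconds=60)

# Roll the windowed counts up per customer (a much-simplified step 4).
views = {}
for (_, customer_id, category), n in windows.items():
    views.setdefault(customer_id, Counter())[category] += n

predictions = {cid: score(v) for cid, v in views.items()}
```

Each entry in `predictions` pairs a customer with the category that scores highest and the corresponding score, which is roughly the shape of output that step 6 would chart.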