
Product recommendations for retail using Azure

Blob Storage
Event Hubs
HDInsight
Stream Analytics
Power BI

Solution Idea

If you'd like to see us expand this article with more information, implementation details, pricing guidance, or code examples, let us know with GitHub Feedback!

A deep understanding of the relationship between customer interests and purchasing patterns is a critical component of any retail business intelligence operation. This solution implements a process for aggregating customer data into a complete profile, and uses advanced machine learning models backed by the reliability and processing power of Azure to provide predictive insights on simulated customers.

Architecture

Architecture diagram. Download an SVG of this architecture.

Description

For more details on how this solution is built, visit the solution guide in GitHub.

A typical retail business collects customer data through a variety of channels, including web-browsing patterns, purchase behaviors, demographics, and other session-based web data. Some of the data originates from core business operations, but other data must be pulled and joined from external sources such as partners, manufacturers, and the public domain.

Many businesses use only a small portion of the available data, but to maximize ROI, a business must integrate relevant data from all sources. Traditionally, integrating external, heterogeneous data sources into a shared data processing engine has required significant effort and resources to set up. This solution describes a simple, scalable approach to integrating analytics and machine learning to predict customer purchasing activity.

This solution addresses the above problems by:

  • Uniformly accessing data from multiple data sources while minimizing data movement and system complexity in order to boost performance.
  • Performing the ETL and feature engineering needed to use a predictive machine learning model.
  • Creating a comprehensive customer 360 profile enriched by predictive analytics running across a distributed system backed by Microsoft R Server and Azure HDInsight.
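As a rough illustration of the ETL and feature engineering step, the following minimal sketch joins three hypothetical sources (demographics, purchase history, and browsing events) into one flat profile per customer. It is plain Python rather than the solution's actual Spark or R Server code, and the function name `build_profile`, the field names, and the sample records are all assumptions made for this example.

```python
from collections import Counter


def build_profile(customer_id, demographics, purchases, browsing_events):
    """Join three hypothetical sources into one flat profile for a customer."""
    demo = demographics.get(customer_id, {})
    bought = [p for p in purchases if p["customer_id"] == customer_id]
    views = [e for e in browsing_events if e["customer_id"] == customer_id]

    # Simple engineered features: counts, totals, and the most-viewed category.
    views_by_category = Counter(e["category"] for e in views)
    top = views_by_category.most_common(1)

    return {
        "customer_id": customer_id,
        "age": demo.get("age"),
        "purchase_count": len(bought),
        "total_spend": sum(p["amount"] for p in bought),
        "view_count": len(views),
        "top_viewed_category": top[0][0] if top else None,
    }


# Illustrative sample data (not taken from the solution).
demographics = {"c1": {"age": 34}}
purchases = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c1", "amount": 5.5},
    {"customer_id": "c2", "amount": 9.0},
]
browsing_events = [
    {"customer_id": "c1", "category": "shoes"},
    {"customer_id": "c1", "category": "shoes"},
    {"customer_id": "c1", "category": "hats"},
]

profile = build_profile("c1", demographics, purchases, browsing_events)
print(profile)
```

In a distributed setting the same joins and aggregations would typically be expressed as DataFrame operations so that the data stays where it is stored; the in-memory version above only shows the shape of the resulting profile.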

Data Flow

  1. A Data Generator pipes simulated customer events to an Event Hub.
  2. A Stream Analytics job reads from the Event Hub and performs aggregations.
  3. Stream Analytics persists time-grouped data to an Azure Storage Blob.
  4. A Spark job running in HDInsight merges the latest customer browsing data with historical purchase and demographic data to build a consolidated user profile.
  5. A second Spark job scores each customer profile against a machine learning model to predict future purchasing patterns (in other words, is a given customer likely to make a purchase in the next 30 days, and if so, in which product category?).
  6. Predictions and other profile data are visualized and shared as charts and tables in Power BI Online.
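The time-grouped aggregation and scoring steps above can be sketched in plain Python. This is only a stand-in for illustration: the real pipeline uses Stream Analytics for the aggregation and a trained machine learning model on Spark for scoring. The 60-second window, the event fields, the function names, and the threshold rule in `score_profile` are all assumptions, not part of the solution.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window size; the source does not state one


def tumbling_window_counts(events, window_seconds=WINDOW_SECONDS):
    """Count events per (window start, customer, category) in fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for event in events:
        # Align each timestamp to the start of its window.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event["customer_id"], event["category"])] += 1
    return dict(counts)


def score_profile(view_counts, threshold=2):
    """Toy stand-in for model scoring: predict a purchase in the most-viewed
    category when it has at least `threshold` views. Not a real model."""
    if not view_counts:
        return {"will_purchase": False, "category": None}
    category, views = max(view_counts.items(), key=lambda kv: kv[1])
    if views < threshold:
        return {"will_purchase": False, "category": None}
    return {"will_purchase": True, "category": category}


# Simulated customer events (timestamps in seconds), as a data generator might emit.
events = [
    {"ts": 5, "customer_id": "c1", "category": "shoes"},
    {"ts": 42, "customer_id": "c1", "category": "shoes"},
    {"ts": 61, "customer_id": "c1", "category": "hats"},
]

windows = tumbling_window_counts(events)

# Roll the windowed counts up into per-category view totals for one customer.
views_c1 = defaultdict(int)
for (_, customer_id, category), n in windows.items():
    if customer_id == "c1":
        views_c1[category] += n

prediction = score_profile(dict(views_c1))
print(windows)
print(prediction)
```

The first two events fall in the window starting at 0 and the third in the window starting at 60, so the aggregation yields one row per window, customer, and category, which is the kind of time-grouped output the Stream Analytics job persists to Blob storage.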