Interactive querying with HDInsight

Data Factory
Data Lake Storage
HDInsight
Power BI

Solution Idea

Perform fast, interactive, SQL-like queries at scale over structured or unstructured data with Apache Hive LLAP (Low Latency Analytical Processing) on Azure HDInsight.

Architecture

Architecture diagram (an SVG of this architecture is available for download).

Data Flow

  1. Use Azure Data Factory to move data between Azure and other, non-Azure clouds.
  2. Create a data landing zone in Azure Data Lake Storage Gen2, which also serves as the primary storage account for the Azure HDInsight Hadoop cluster.
  3. Run ELT procedures with Azure Data Factory or Hive to transform the incoming data in HDFS.
  4. Create external tables in Hive over this data in HDFS.
  5. Use Power BI to interpret this data and create new visualizations.
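
As an illustration of step 4, the following minimal Python sketch builds the HiveQL statement for an external table over files that already sit in the landing zone. It is not taken from the original solution: the container, storage account, folder, table, and column names are hypothetical, and the ORC file format is an assumption. The only fixed conventions it relies on are the `abfss://` URI scheme for Data Lake Storage Gen2 and Hive's `CREATE EXTERNAL TABLE ... LOCATION` syntax.

```python
def abfs_uri(container: str, account: str, path: str) -> str:
    """Build an ABFS URI for a folder in a Data Lake Storage Gen2 account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"


def external_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Return HiveQL defining an external table over files already at `location`.

    The table is external, so dropping it in Hive leaves the underlying files
    in the landing zone untouched.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "STORED AS ORC\n"  # assumption: the ELT step wrote ORC files
        f"LOCATION '{location}'"
    )


# Hypothetical names, for illustration only.
location = abfs_uri("landing", "examplelake", "/sales/curated/")
ddl = external_table_ddl(
    "sales_curated",
    {"order_id": "BIGINT", "region": "STRING", "amount": "DECIMAL(18,2)"},
    location,
)
print(ddl)
```

The generated statement could then be run against the cluster through whichever Hive client is in use, such as Beeline. Because only the table definition lives in Hive, the same files remain available to other engines that read from the storage account.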

Components

  • Azure Data Factory is a hybrid data integration service that allows you to create, schedule, and orchestrate your ETL/ELT workflows.
  • Azure Data Lake Storage is a set of capabilities dedicated to big data analytics, such as file system semantics and file-level security, built on Azure Blob storage.
  • Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more.
  • Power BI is a suite of business analytics tools that deliver insights throughout your organization. Connect to hundreds of data sources, simplify data prep, and drive ad hoc analysis.

See Also