Solution Idea
Perform fast, interactive, SQL-like queries at scale over structured or unstructured data with Apache Hive LLAP on Azure HDInsight.
Architecture
Download an SVG of this architecture.
Data Flow
- Move data between Azure and any non-Azure cloud by using Azure Data Factory.
- Create a data landing zone in Azure Data Lake Storage Gen2, which is also the primary storage account for the Azure HDInsight Hadoop cluster.
- Run ELT procedures by using Azure Data Factory or Hive to transform the incoming data in HDFS.
- Create external tables in Hive over this data in HDFS.
- Use Power BI to interpret this data and create new visualizations.
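The external-table and ELT steps above can be sketched as HiveQL statements. The Python helpers below only build the statement text; they do not connect to a cluster. This is a minimal sketch, not a definitive implementation: the helper names, the container `landing`, the storage account `examplestore`, the table names, and the comma-delimited text layout are all hypothetical and are not taken from this article.

```python
from typing import Mapping


def abfs_uri(container: str, account: str, path: str) -> str:
    """Build an ABFS URI for a Data Lake Storage Gen2 location."""
    return f"abfs://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"


def external_table_ddl(table: str, columns: Mapping[str, str], location: str) -> str:
    """Return HiveQL that declares an external table over files already in storage.

    Assumes comma-delimited text files; adjust ROW FORMAT and STORED AS
    for other layouts.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def orc_ctas(target: str, source: str) -> str:
    """Return a CREATE TABLE AS SELECT that copies the source table into ORC."""
    return f"CREATE TABLE {target} STORED AS ORC AS SELECT * FROM {source}"


if __name__ == "__main__":
    # Hypothetical landing-zone location and schema, for illustration only.
    location = abfs_uri("landing", "examplestore", "sales/raw")
    print(external_table_ddl("sales_raw", {"order_id": "BIGINT", "amount": "DOUBLE"}, location))
    print(orc_ctas("sales_orc", "sales_raw"))
```

The generated statements can be run from any Hive client, such as Beeline. Copying the raw text data into an ORC table is one possible ELT step, chosen here because columnar formats generally suit interactive queries; the article itself does not prescribe a target format.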
Components
- Azure Data Factory is a hybrid data integration service that allows you to create, schedule, and orchestrate your ETL/ELT workflows.
- Azure Data Lake Storage is a set of capabilities, such as file system semantics and file-level security, that is dedicated to big data analytics and built on Azure Blob storage.
- Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more.
- Power BI is a suite of business analytics tools that deliver insights throughout your organization. Connect to hundreds of data sources, simplify data prep, and drive ad hoc analysis.
See Also
- Create a data pipeline to derive sales insights in Azure HDInsight: build an end-to-end data pipeline that performs extract, transform, and load (ETL) operations.
- Visualize Apache Hive data with Microsoft Power BI: learn how to connect Microsoft Power BI Desktop to Azure HDInsight by using ODBC and visualize Apache Hive data.
- Apache Hive and HiveQL on Azure HDInsight: Apache Hive is a data warehouse system for Apache Hadoop that enables data summarization, querying, and analysis. Hive queries are written in HiveQL, a query language similar to SQL.