What is Azure Databricks Workspace?

Azure Databricks Workspace is an analytics platform based on Apache Spark. It is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data engineers, data scientists, and machine learning engineers.

For a big data pipeline, data (raw or structured) is ingested into Azure in batches through Azure Data Factory, or streamed in near real time using Apache Kafka, Event Hubs, or IoT Hub. This data lands in a data lake for long-term persisted storage, in Azure Blob Storage or Azure Data Lake Storage. As part of your analytics workflow, use Azure Databricks to read data from multiple data sources such as Azure Blob Storage, Azure Data Lake Storage, Azure Cosmos DB, or Azure SQL Data Warehouse, and turn it into breakthrough insights using Spark.

Figure: Databricks pipeline
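As a rough illustration of the read step in this pipeline, the sketch below builds the storage URIs that Spark readers commonly accept for Azure Data Lake Storage Gen2 (`abfss://`) and Azure Blob Storage (`wasbs://`). The helper names, container names, and account names are hypothetical and not part of any Databricks API. The sketch also assumes that access to the storage account has already been configured on the cluster.

```python
def adls_gen2_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for Azure Data Lake Storage Gen2 (hypothetical helper)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def blob_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasbs:// URI for Azure Blob Storage (hypothetical helper)."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Placeholder container, account, and path names, for illustration only.
raw_events = adls_gen2_uri("raw", "examplelake", "/events/2020/")
archive = blob_uri("archive", "examplestore", "clicks.csv")

print(raw_events)
print(archive)

# In a notebook attached to a cluster, a URI like this would typically be
# passed to a Spark reader, for example:
#   df = spark.read.json(raw_events)
```

Keeping URI construction in small functions makes it easier to switch a notebook between storage accounts without editing every read call.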

Apache Spark analytics platform

Azure Databricks Workspace comprises the complete open-source Apache Spark cluster technologies and capabilities. Spark in Azure Databricks Workspace includes the following components:

Figure: Apache Spark in Azure Databricks

  • Spark SQL and DataFrames: Spark SQL is the Spark module for working with structured data. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python.

  • Streaming: Real-time data processing and analysis for analytical and interactive applications. Integrates with HDFS, Flume, and Kafka.

  • MLlib: Machine learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as underlying optimization primitives.

  • GraphX: Graphs and graph computation for a broad scope of use cases, from cognitive analytics to data exploration.

  • Spark Core API: Includes support for R, SQL, Python, Scala, and Java.

Apache Spark in Azure Databricks Workspace

Azure Databricks Workspace builds on the capabilities of Spark by providing a zero-management cloud platform that includes:

  • Fully managed Spark clusters
  • An interactive workspace for exploration and visualization
  • A platform for powering your favorite Spark applications

Fully managed Apache Spark clusters in the cloud

Azure Databricks has a secure and reliable production environment in the cloud, managed and supported by Spark experts. You can:

  • Create clusters in seconds.
  • Dynamically autoscale clusters up and down and share them across teams.
  • Use clusters programmatically by invoking REST APIs.
  • Use secure data integration capabilities built on top of Spark that enable you to unify your data without centralization.
  • Get instant access to the latest Apache Spark features with each release.
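To make the REST API and autoscaling points more concrete, here is a minimal sketch that prepares (but does not send) a request to the Clusters API `clusters/create` endpoint with an `autoscale` range. The function name, workspace URL, token, cluster name, runtime version, and node type below are placeholders chosen for illustration. Check the Clusters API reference for the fields your workspace actually supports.

```python
import json
import urllib.request


def build_create_cluster_request(workspace_url, token, name,
                                 min_workers, max_workers,
                                 spark_version, node_type_id):
    """Prepare a POST request that asks for an autoscaling cluster.

    Hypothetical helper: it only builds the request object and does not
    contact the workspace.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    payload = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # The cluster may scale between these two worker counts.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders, not real credentials or endpoints.
request = build_create_cluster_request(
    workspace_url="https://adb-1234567890123456.7.azuredatabricks.net",
    token="dapi-EXAMPLE-TOKEN",
    name="shared-autoscaling",
    min_workers=2,
    max_workers=8,
    spark_version="7.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending it would be: urllib.request.urlopen(request)
```

Separating request construction from sending keeps the payload easy to inspect and test before any cluster is actually created.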

Databricks Runtime

Databricks Runtime is built on top of Apache Spark and is natively built for the Azure cloud.

Azure Databricks abstracts away the infrastructure complexity and the need for specialized expertise to set up and configure your data infrastructure.

For data engineers, who care about the performance of production jobs, Azure Databricks provides a Spark engine that is faster and more performant through various optimizations at the I/O layer and processing layer (Databricks I/O).

Workspace for collaboration

Through a collaborative and integrated environment, Azure Databricks streamlines the process of exploring data, prototyping, and running data-driven applications in Spark.

  • Determine how to use data with easy data exploration.
  • Document your progress in notebooks in R, Python, Scala, or SQL.
  • Visualize data in a few clicks, and use familiar tools like Matplotlib, ggplot, or d3.
  • Use interactive dashboards to create dynamic reports.
  • Use Spark and interact with the data simultaneously.

Enterprise security

Azure Databricks Workspace provides enterprise-grade Azure security, including Azure Active Directory integration, role-based controls, and SLAs that protect your data and your business.

  • Integration with Azure Active Directory enables you to run complete Azure-based solutions using Azure Databricks.
  • Azure Databricks role-based access enables fine-grained user permissions for notebooks, clusters, jobs, and data.
  • Enterprise-grade SLAs.

Important

Azure Databricks Workspace is a Microsoft Azure first-party service that is deployed on the Global Azure Public Cloud infrastructure. All communications between components of the service, including between the public IPs in the control plane and the customer data plane, remain within the Microsoft Azure network backbone. See also Microsoft global network.

Integration with Azure services

Azure Databricks Workspace integrates deeply with Azure databases and stores: Synapse Analytics, Cosmos DB, Data Lake Store, and Blob storage.

Integration with Power BI

Through rich integration with Power BI, Azure Databricks Workspace lets you discover and share impactful insights quickly and easily. You can use other BI tools as well, such as Tableau Software.

Next steps