Pricing scenario using Dataflow Gen2 to load 2 GB of on-premises CSV data to a Lakehouse table

In this scenario, Dataflow Gen2 was used to load 2 GB of on-premises CSV data to a Lakehouse table in Microsoft Fabric.

The prices used in the following example are hypothetical and aren't intended to reflect exact actual pricing. They only demonstrate how you can estimate, plan, and manage cost for Data Factory projects in Microsoft Fabric. Also, because Fabric capacity prices vary by region, this example uses the pay-as-you-go price for a Fabric capacity in US West 2 (a typical Azure region), which is $0.18 per CU per hour. See Microsoft Fabric - Pricing to explore other Fabric capacity pricing options.

Configuration

To accomplish this scenario, you need to create a dataflow with the following steps:

  1. Initialize Dataflow: Start by uploading 2 GB of CSV files from your on-premises environment into the dataflow.
  2. Configure Power Query:
    1. Navigate to Power Query.
    2. Disable the option for staging the query.
    3. Proceed to combine the CSV files.
  3. Data Transformation:
    1. Promote headers for clarity.
    2. Remove unnecessary columns.
    3. Adjust column data types as needed.
  4. Define Output Data Destination:
    1. Configure Lakehouse as the data output destination.
    2. In this example, a Lakehouse was created in Fabric and used as the destination.

Cost estimation using the Fabric Metrics App

Screenshot showing the duration and CU consumption of the job in the Fabric Metrics App.

Screenshot showing details of the Dataflow Gen2 Refresh cost.

Screenshot showing details of the first Dataflow Gen2 High Scale Dataflows Compute operation used in the run.

Screenshot showing details of the second Dataflow Gen2 High Scale Dataflows Compute operation used in the run.

The Dataflow Gen2 Refresh operation consumed 4749.42 CU seconds, and the two High Scale Dataflows Compute operations consumed 7.78 CU seconds and 7.85 CU seconds, respectively.

Note

Although the Fabric Metrics App reports the duration of the run, you don't need it to calculate the effective CU hours, because the CU seconds metric it reports already accounts for the duration.

| Metric | Compute consumption |
|---|---|
| Dataflow Gen2 Refresh CU seconds | 4749.42 CU seconds |
| High Scale Dataflows Compute CU seconds | (7.78 + 7.85) = 15.63 CU seconds |
| Effective CU hours billed | (4749.42 + 15.63) / (60*60) = 1.32 CU hours |

Total run cost at $0.18/CU hour = (1.32 CU hours) * ($0.18/CU hour) ≈ $0.24
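
The arithmetic in this scenario can be reproduced with a short script, which may be useful when you estimate other runs. The following Python code is a minimal sketch, not part of Fabric or the Fabric Metrics App: the function names `effective_cu_hours` and `run_cost` and the `operations` dictionary are hypothetical, and the CU seconds values are copied by hand from the figures reported above. The $0.18 per CU hour price is the hypothetical rate used in this article.

```python
SECONDS_PER_HOUR = 60 * 60


def effective_cu_hours(cu_seconds_by_operation):
    """Sum the CU seconds of every operation and convert the total to CU hours."""
    return sum(cu_seconds_by_operation.values()) / SECONDS_PER_HOUR


def run_cost(cu_hours, price_per_cu_hour):
    """Multiply the effective CU hours by the capacity price per CU hour."""
    return cu_hours * price_per_cu_hour


# CU seconds copied manually from the Fabric Metrics App for this run.
# The dictionary keys are illustrative labels only.
operations = {
    "Dataflow Gen2 Refresh": 4749.42,
    "High Scale Dataflows Compute (first)": 7.78,
    "High Scale Dataflows Compute (second)": 7.85,
}

# Hypothetical pay-as-you-go price used in this article (US West 2).
PRICE_PER_CU_HOUR = 0.18

cu_hours = effective_cu_hours(operations)
cost = run_cost(cu_hours, PRICE_PER_CU_HOUR)

print(f"Effective CU hours: {cu_hours:.2f}")  # Effective CU hours: 1.32
print(f"Estimated run cost: ${cost:.2f}")     # Estimated run cost: $0.24
```

To estimate a run in a different region, replace `PRICE_PER_CU_HOUR` with the price for that region's capacity. The run duration doesn't appear anywhere in the calculation, because the CU seconds metric already accounts for it.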