Retrieve large cost datasets recurringly with exports

This article helps you regularly export large amounts of data with exports from Cost Management. Exporting is the recommended way to retrieve unaggregated cost data, especially when usage files are too large to reliably call and download using the Cost Details API. Exported data is placed in the Azure Storage account that you choose. From there, you can load it into your own systems and analyze it as needed. To configure exports in the Azure portal, see Export data.

If you want to automate exports at various scopes, the sample API request in the next section is a good starting point. You can use the Exports API to create automatic exports as part of your general environment configuration. Automatic exports help ensure that you have the data that you need and can use in your own organization's systems as you expand your Azure use.

Common export configurations

Before you create your first export, consider your scenario and the configuration options needed to enable it. Consider the following export options:

  • Recurrence - Determines how frequently the export job runs and when a file is placed in your Azure Storage account. Choose from Daily, Weekly, and Monthly. Try to configure your recurrence to match the data import jobs used by your organization's internal system.
  • Recurrence Period - Determines how long the export remains valid. Files are only exported during the recurrence period.
  • Time Frame - Determines the amount of data that's generated by the export on a given run. Common options are MonthToDate and WeekToDate.
  • StartDate - Configures when you want the export schedule to begin. An export is created on the StartDate and then later based on your Recurrence.
  • Type - There are three export types:
    • ActualCost - Shows the total usage and costs for the period specified, as they're accrued and show on your bill.
    • AmortizedCost - Shows the total usage and costs for the period specified, with amortization applied to the reservation purchase costs that are applicable.
    • Usage - All exports created before July 20, 2020 are of type Usage. Update all your scheduled exports to either ActualCost or AmortizedCost.
  • Columns - Defines the data fields you want included in your export file. They correspond to the fields available in the Cost Details API.
  • Partitioning - Set the option to true if you have a large dataset and want it broken up into multiple files, which makes data ingestion much faster and easier. For more information about partitioning, see File partitioning for large datasets.

Create a daily month-to-date export for a subscription

Request URL: PUT https://management.azure.com/{scope}/providers/Microsoft.CostManagement/exports/{exportName}?api-version=2020-06-01

{
  "properties": {
    "schedule": {
      "status": "Active",
      "recurrence": "Daily",
      "recurrencePeriod": {
        "from": "2020-06-01T00:00:00Z",
        "to": "2020-10-31T00:00:00Z"
      }
    },
    "format": "Csv",
    "deliveryInfo": {
      "destination": {
        "resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/MYDEVTESTRG/providers/Microsoft.Storage/storageAccounts/{yourStorageAccount} ",
        "container": "{yourContainer}",
        "rootFolderPath": "{yourDirectory}"
      }
    },
    "definition": {
      "type": "ActualCost",
      "timeframe": "MonthToDate",
      "dataSet": {
        "granularity": "Daily",
        "configuration": {
          "columns": [
            "Date",
            "MeterId",
            "ResourceId",
            "ResourceLocation",
            "Quantity"
          ]
        }
      }
    }
  }
}
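
After you create the export, you can also trigger a run on demand rather than waiting for the next scheduled run. The following request shape is based on the Exports API's run operation for the same api-version; verify the exact shape against the API reference for your version.

Request URL: POST https://management.azure.com/{scope}/providers/Microsoft.CostManagement/exports/{exportName}/run?api-version=2020-06-01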

Copy large Azure storage blobs

You can use Cost Management to schedule exports of your Azure usage details into your Azure Storage accounts as blobs. The resulting blobs could be multiple gigabytes in size. The Cost Management team worked with the Azure Storage team to test copying large Azure storage blobs. The results are documented in the following sections. You can expect similar results as you copy storage blobs from one Azure region to another.

To test copy performance, the team transferred blobs from storage accounts in the West US region to the same and other regions. The team measured speeds that ranged from 2 GB per second within the same region to 150 MB per second to storage accounts in the Southeast Asia region.

Test configuration

To measure blob transfer speeds, the team created a simple .NET console application that referenced the latest version (v2.0.1) of the Azure Storage Data Movement Library (DML) via NuGet. DML is an SDK provided by the Azure Storage team that enables programmatic access to its transfer services. The team then created Standard V2 storage accounts in multiple regions, using West US as the source region. They populated the storage accounts there with containers, each holding ten 2-GB block blobs. They copied the containers to other storage accounts by using DML's TransferManager.CopyDirectoryAsync() method with the CopyMethod.ServiceSideSyncCopy option. Tests were conducted on a computer running Windows 10 with 12 cores and a 1-GbE network.

Application settings used:

  • TransferManager.Configurations.ParallelOperations = Environment.ProcessorCount * 32. The team found the setting to have the most effect on overall throughput. A value of 32 times the number of cores provided the best throughput for the test client.
  • ServicePointManager.DefaultConnectionLimit = int.MaxValue. Setting it to a maximum value effectively passes full control of transfer parallelism to the ParallelOperations setting above.
  • TransferManager.Configurations.BlockSize = 4,194,304. Block size had some effect on transfer rates, with 4 MB proving to be the best value in testing.

For more information and sample code, see links in the Next steps section.
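
To give a concrete sense of the setup, here's a minimal sketch of such a test client, assuming the Microsoft.Azure.Storage.DataMovement NuGet package. The connection strings, container name, and progress handling are illustrative placeholders, not the team's actual test code.

using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

class CopyTest
{
    static async Task Main()
    {
        // Tuning values from the test configuration above.
        TransferManager.Configurations.ParallelOperations = Environment.ProcessorCount * 32;
        ServicePointManager.DefaultConnectionLimit = int.MaxValue;
        TransferManager.Configurations.BlockSize = 4 * 1024 * 1024; // 4,194,304 bytes

        // Placeholder accounts and container name; "" refers to the container root.
        CloudBlobDirectory source = CloudStorageAccount.Parse("<source-connection-string>")
            .CreateCloudBlobClient().GetContainerReference("blobs").GetDirectoryReference("");
        CloudBlobDirectory dest = CloudStorageAccount.Parse("<destination-connection-string>")
            .CreateCloudBlobClient().GetContainerReference("blobs").GetDirectoryReference("");

        var context = new DirectoryTransferContext
        {
            // Periodic status callback during the transfer.
            ProgressHandler = new Progress<TransferStatus>(
                s => Console.WriteLine($"Bytes transferred: {s.BytesTransferred}"))
        };

        // Service-side sync copy of the whole container.
        TransferStatus status = await TransferManager.CopyDirectoryAsync(
            source, dest, CopyMethod.ServiceSideSyncCopy,
            new CopyDirectoryOptions { Recursive = true }, context);

        Console.WriteLine($"Files transferred: {status.NumberOfFilesTransferred}");
    }
}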

Test results

Test number   To region   Blobs           Time (secs)   MB/s    Comments
1             WestUS      2 GB x 10       10            2,000
2             WestUS2     2 GB x 10       33            600
3             EastUS      2 GB x 10       67            300
4             EastUS      2 GB x 10 x 4   99            200     4 parallel transfers using 8 storage accounts: 4 West to 4 East, average per transfer
5             EastUS      2 GB x 10 x 8   148           135     8 parallel transfers using 8 storage accounts: 4 West to 4x2 East, average per transfer
6             EastUS      2 GB x 10 x 4   92            870     4 parallel transfers from 1 storage account to another
7             SE Asia     2 GB x 10       133           150
8             SE Asia     2 GB x 10 x 4   444           180     4 parallel transfers from 1 storage account to another

Sync transfer characteristics

Here are some of the characteristics of the service-side sync transfer used with DML that are relevant to its use:

  • DML can transfer a single blob or a directory. For directory transfer, you can use a search pattern to match on blob prefix.
  • Block blob transfers happen in parallel, all completing towards the end of the transfer process. Individual blob blocks are also transferred in parallel.
  • The transfer is executed asynchronously on the client. The transfer status is available periodically via a callback to a method that can be defined in a TransferContext object.
  • The transfer creates checkpoints during its progress and exposes a TransferCheckpoint object. The object represents the latest checkpoint via the TransferContext object. If the TransferCheckpoint is saved before a transfer is cancelled/aborted, the transfer can be resumed from the checkpoint for up to seven days. The transfer can be resumed from any checkpoint, not just the latest.
  • If the transfer client process is killed and restarted without implementing the checkpoint feature:
    • Before any blob transfers have completed, the transfer restarts.
    • After some of the blobs have completed, the transfer restarts for only the incomplete blobs.
  • Pausing the client execution pauses the transfers.
  • The blob transfer feature abstracts the client from transient failures. For instance, storage account throttling won't normally cause a transfer to fail but will slow the transfer.
  • Service-side transfers use few client resources: CPU and memory usage is low, with some network bandwidth and connections used.
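
To illustrate the checkpoint behavior described above, here's a hedged sketch of cancelling a copy and later resuming it. It assumes the same usings and placeholder sourceDir and destDir directories as the earlier sketch, plus System.Threading; persisting the checkpoint (for example, serializing it to disk) is left out for brevity.

static async Task CopyWithResumeAsync(CloudBlobDirectory sourceDir, CloudBlobDirectory destDir)
{
    var cts = new CancellationTokenSource();
    var context = new DirectoryTransferContext();
    TransferCheckpoint checkpoint = null;

    try
    {
        await TransferManager.CopyDirectoryAsync(
            sourceDir, destDir, CopyMethod.ServiceSideSyncCopy,
            new CopyDirectoryOptions { Recursive = true }, context, cts.Token);
    }
    catch (OperationCanceledException)
    {
        // Capture the latest checkpoint; persist it so a restarted
        // process can resume where this one left off (up to seven days).
        checkpoint = context.LastCheckpoint;
    }

    if (checkpoint != null)
    {
        // Later, possibly in a new process after reloading the checkpoint:
        var resumeContext = new DirectoryTransferContext(checkpoint);
        await TransferManager.CopyDirectoryAsync(
            sourceDir, destDir, CopyMethod.ServiceSideSyncCopy,
            new CopyDirectoryOptions { Recursive = true }, resumeContext);
    }
}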

Async transfer characteristics

You can invoke the TransferManager.CopyDirectoryAsync() method with the CopyMethod.ServiceSideAsyncCopy option. It operates similarly to the sync transfer mechanism from the client perspective but with the following differences in operation:

  • Transfer rates are much slower than the equivalent sync transfer (typically 10 MB/s or less).
  • The transfer continues even if the client process terminates.
  • Although checkpoints are supported, resuming a transfer using a TransferCheckpoint won't resume at the checkpoint time but at the current state of the transfer.
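
From the client, the call shape matches the sync sketch shown earlier; only the CopyMethod value changes. For example:

// Same placeholder directories as in the earlier sketches.
await TransferManager.CopyDirectoryAsync(
    sourceDir, destDir, CopyMethod.ServiceSideAsyncCopy,
    new CopyDirectoryOptions { Recursive = true },
    new DirectoryTransferContext());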

Test summary

Azure Blob storage supports high global transfer rates with its service-side sync transfer feature. Using the feature in .NET applications is straightforward with the Data Movement Library. It's possible for Cost Management exports to reliably copy hundreds of gigabytes of data to a storage account anywhere in less than an hour.

Next steps