Guidance for personal data stored in Log Analytics and Application Insights

Log Analytics is a data store where personal data is likely to be found. Application Insights stores its data in a Log Analytics partition. This article will discuss where in Log Analytics and Application Insights such data is typically found, as well as the capabilities available to you to handle such data.

Note

For the purposes of this article log data refers to data sent to a Log Analytics workspace, while application data refers to data collected by Application Insights.

Note

If you’re interested in viewing or deleting personal data, please see the Azure Data Subject Requests for the GDPR article. If you’re looking for general info about GDPR, see the GDPR section of the Service Trust portal.

Strategy for personal data handling

While it is ultimately up to you and your company to determine the strategy with which you will handle your private data (if at all), the following are some possible approaches. They are listed in order of technical preference, from most to least preferable:

  • Where possible, stop collection of, obfuscate, anonymize, or otherwise adjust the data being collected to exclude it from being considered "private". This is by far the preferred approach, saving you the need to create a very costly and impactful data handling strategy.
  • Where not possible, attempt to normalize the data to reduce the impact on the data platform and performance. For example, instead of logging an explicit User ID, create a lookup table that correlates the username and their details to an internal ID that can then be logged elsewhere. That way, should one of your users ask you to delete their personal information, deleting only the row in the lookup table corresponding to that user may be sufficient.
  • Finally, if private data must be collected, build a process around the purge API path and the existing query API path to meet any obligations you may have around exporting and deleting any private data associated with a user.
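
The lookup-table approach in the second bullet can be sketched as follows. This is a minimal in-memory illustration, not a prescribed design: the `PseudonymLookup` class and its method names are hypothetical, and a real deployment would keep the mapping in a durable store outside Log Analytics.

```python
import uuid


class PseudonymLookup:
    """Hypothetical in-memory lookup table mapping usernames to internal IDs."""

    def __init__(self):
        self._ids = {}  # username -> internal ID

    def internal_id(self, username):
        # Reuse the existing ID so all log records for one user still correlate.
        if username not in self._ids:
            self._ids[username] = uuid.uuid4().hex
        return self._ids[username]

    def forget(self, username):
        # Deleting the row breaks the link between logged IDs and the person.
        return self._ids.pop(username, None) is not None


lookup = PseudonymLookup()
# Only the internal ID is written to the log record, never the username.
record = {"event": "login", "user": lookup.internal_id("alice@contoso.com")}
```

Calling `lookup.forget("alice@contoso.com")` removes the only link between the logged ID and the user, which is the property the bullet relies on. Whether that alone satisfies a deletion request is a compliance decision, not a technical one.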

Where to look for private data in Log Analytics?

Log Analytics is a flexible store which, while prescribing a schema for your data, allows you to override every field with custom values. Additionally, any custom schema can be ingested. As such, it is impossible to say exactly where private data will be found in your specific workspace. The following locations, however, are good starting points for your inventory:

Log data

  • IP addresses: Log Analytics collects a variety of IP information across many different tables. For example, the following query shows all tables where IPv4 addresses have been collected over the last 24 hours:

    search *
    | where * matches regex @'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b' //RegEx originally provided on https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp
    | summarize count() by $table
  • User IDs: User IDs are found in a large variety of solutions and tables. You can look for a particular username across your entire dataset using the search command:

    search "[username goes here]"

    Remember to look not only for human-readable user names but also for GUIDs that can be traced directly back to a particular user!
  • Device IDs: Like user IDs, device IDs are sometimes considered "private". Use the same approach as listed above for user IDs to identify tables where this might be a concern.
  • Custom data: Log Analytics allows the collection of custom data through a variety of methods: custom logs and custom fields, the HTTP Data Collector API, and custom data collected as part of system event logs. All of these are susceptible to containing private data and should be examined to verify whether any such data exists.
  • Solution-captured data: Because the solution mechanism is an open-ended one, we recommend reviewing all tables generated by solutions to ensure compliance.
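
The same kind of IPv4 check can be applied on the client side, for example to sample custom log lines before they are sent to a workspace. The sketch below is an illustration only: `find_ipv4` is a hypothetical helper, and the pattern is a stricter Python rewrite of the idea behind the query above rather than the same expression.

```python
import re

# One octet in the range 0-255.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
# Four dotted octets that are not part of a longer run of digits or dots.
IPV4 = re.compile(rf"(?<![\d.]){_OCTET}(?:\.{_OCTET}){{3}}(?![\d.])")


def find_ipv4(text):
    """Return every IPv4-looking substring found in text."""
    return IPV4.findall(text)


hits = find_ipv4("client 10.1.2.3 connected to 192.168.0.255:443")
```

A non-empty result for a custom log line suggests the line needs review before ingestion. It does not prove that the value is personal data.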

Application data

  • IP addresses: While Application Insights will by default obfuscate all IP address fields to "0.0.0.0", it is a fairly common pattern to override this value with the actual user IP to maintain session information. The Analytics query below can be used to find any table that contains values in the IP address column other than "0.0.0.0" over the last 24 hours:

    search client_IP != "0.0.0.0"
    | where timestamp > ago(1d)
    | summarize numNonObfuscatedIPs_24h = count() by $table
  • User IDs: By default, Application Insights will use randomly generated IDs for user and session tracking. However, it is common to see these fields overridden to store an ID more relevant to the application, such as a username or an AAD GUID. These IDs are often considered to be in scope as personal data and should therefore be handled appropriately. Our recommendation is always to attempt to obfuscate or anonymize these IDs. Fields where these values are commonly found include session_Id, user_Id, user_AuthenticatedId, user_AccountId, as well as customDimensions.
  • Custom data: Application Insights allows you to append a set of custom dimensions to any data type. These dimensions can be any data. Use the following query to identify any custom dimensions collected over the last 24 hours:

    search *
    | where isnotempty(customDimensions)
    | where timestamp > ago(1d)
    | project $table, timestamp, name, customDimensions
  • In-memory and in-transit data: Application Insights will track exceptions, requests, dependency calls, and traces. Private data can often be collected at the code and HTTP call level. Review the exceptions, requests, dependencies, and traces tables to identify any such data. Use telemetry initializers where possible to obfuscate this data.
  • Snapshot Debugger captures: The Snapshot Debugger feature in Application Insights allows you to collect debug snapshots whenever an exception is caught on the production instance of your application. Snapshots will expose the full stack trace leading to the exceptions as well as the values for local variables at every step in the stack. Unfortunately, this feature does not allow for selective deletion of snap points, or programmatic access to data within the snapshot. Therefore, if the default snapshot retention rate does not satisfy your compliance requirements, the recommendation is to turn off the feature.
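
The obfuscation step that a telemetry initializer might perform can be sketched independently of any SDK. The example below is an assumption-laden illustration: `scrub`, `SENSITIVE_KEYS`, and the email pattern are hypothetical, and wiring such a function into the Application Insights SDK is not shown. It hashes the identifier fields named above and masks email addresses found in any other property.

```python
import hashlib
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Field names taken from the User IDs bullet above.
SENSITIVE_KEYS = {"session_Id", "user_Id", "user_AuthenticatedId", "user_AccountId"}


def scrub(properties, salt):
    """Return a copy of properties with identifiers hashed and emails masked."""
    cleaned = {}
    for key, value in properties.items():
        text = str(value)
        if key in SENSITIVE_KEYS:
            # A salted hash keeps records correlatable without storing the raw ID.
            digest = hashlib.sha256((salt + text).encode("utf-8")).hexdigest()
            cleaned[key] = digest[:16]
        else:
            cleaned[key] = EMAIL.sub("[redacted-email]", text)
    return cleaned


out = scrub({"user_Id": "alice", "message": "mail bob@example.com now"}, "s")
```

A salted hash is a form of pseudonymization rather than anonymization, so whether the hashed value still counts as personal data depends on who can access the salt.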

How to export and delete private data

As mentioned in the Strategy for personal data handling section earlier, it is strongly recommended that, if at all possible, you restructure your data collection policy to stop collecting private data, obfuscate or anonymize it, or otherwise modify it so that it is no longer considered "private". Handling the data will first and foremost result in costs to you and your team to define and automate a strategy, build an interface through which your customers can interact with their data, and carry ongoing maintenance. Further, it is computationally costly for Log Analytics and Application Insights, and a large volume of concurrent query or purge API calls has the potential to negatively impact all other interaction with Log Analytics functionality. That said, there are some valid scenarios where private data must be collected. For these cases, data should be handled as described in this section.

Note

This article provides steps for how to delete personal data from the device or service and can be used to support your obligations under the GDPR. If you’re looking for general info about GDPR, see the GDPR section of the Service Trust portal.

View and export

For both view and export data requests, the Log Analytics query API or the Application Insights query API should be used. Logic to convert the shape of the data to an appropriate one to deliver to your users will be up to you to implement. Azure Functions makes a great place to host such logic.
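
As a rough sketch of such logic, the example below builds a query request and reshapes a response into per-row records. Several details are assumptions to verify against the API reference: the endpoint URL, the bearer-token header, and the `tables` / `columns` / `rows` response shape. The function names are hypothetical, and the request is only constructed, never sent.

```python
import json
import urllib.request

# Assumed endpoint for the Log Analytics query API.
QUERY_URL = "https://api.loganalytics.io/v1/workspaces/{workspace_id}/query"


def build_query_request(workspace_id, token, kql):
    """Build (but do not send) a POST request that runs a query."""
    body = json.dumps({"query": kql}).encode("utf-8")
    return urllib.request.Request(
        QUERY_URL.format(workspace_id=workspace_id),
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rows_as_dicts(response):
    """Flatten the assumed tables/columns/rows shape into one dict per row."""
    records = []
    for table in response.get("tables", []):
        names = [column["name"] for column in table["columns"]]
        records.extend(dict(zip(names, row)) for row in table["rows"])
    return records
```

The list returned by `rows_as_dicts` can then be serialized to whatever export format you deliver to your users.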

Delete

Warning

Deletes in Log Analytics are destructive and non-reversible! Please use extreme caution in their execution.

A purge API path is available as part of privacy handling. Use this path sparingly because of the risk involved, the potential performance impact, and the potential to skew all-up aggregations, measurements, and other aspects of your Log Analytics data. See the Strategy for personal data handling section for alternative approaches to handling private data.

Purge is a highly privileged operation that no app or user in Azure (including even the resource owner) will have permissions to execute without explicitly being granted a role in Azure Resource Manager. This role is Data Purger and should be cautiously delegated due to the potential for data loss.

Once the Azure Resource Manager role has been assigned, two new API paths are available:

Log data

  • POST purge - takes an object specifying parameters of data to delete and returns a reference GUID
  • GET purge status - the POST purge call returns an 'x-ms-status-location' header containing a URL that you can call to determine the status of your purge operation. For example:

    x-ms-status-location: https://management.azure.com/subscriptions/[SubscriptionId]/resourceGroups/[ResourceGroupName]/providers/Microsoft.OperationalInsights/workspaces/[WorkspaceName]/operations/purge-[PurgeOperationId]?api-version=2015-03-20
    

Important

While we expect the vast majority of purge operations to complete much quicker than our SLA, due to their heavy impact on the data platform used by Log Analytics, the formal SLA for the completion of purge operations is set at 30 days.
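
The POST purge call for log data might be assembled as in the sketch below. Treat the details as assumptions to check against the REST API reference: the `/purge` path segment is inferred from the status URL shown above, and the `table` / `filters` body shape with `column`, `operator`, and `value` keys is not confirmed by this article. `build_purge` is a hypothetical helper that only builds the URL and body and does not send anything.

```python
import json

# Path inferred from the status URL above; "/purge" is an assumption.
PURGE_URL = (
    "https://management.azure.com/subscriptions/{subscription_id}"
    "/resourceGroups/{resource_group}/providers/Microsoft.OperationalInsights"
    "/workspaces/{workspace}/purge?api-version=2015-03-20"
)


def build_purge(subscription_id, resource_group, workspace, table, column, value):
    """Return the URL and JSON body for a purge of rows where column == value."""
    url = PURGE_URL.format(
        subscription_id=subscription_id,
        resource_group=resource_group,
        workspace=workspace,
    )
    # Assumed body shape: one table plus a list of column filters.
    body = {
        "table": table,
        "filters": [{"column": column, "operator": "==", "value": value}],
    }
    return url, json.dumps(body)


url, body = build_purge("sub", "rg", "ws", "Heartbeat", "Computer", "vm1")
```

Because purges are irreversible, it is worth running the equivalent filter through the query API first and reviewing exactly which rows it matches.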

Application data

  • POST purge - takes an object specifying parameters of data to delete and returns a reference GUID
  • GET purge status - the POST purge call returns an 'x-ms-status-location' header containing a URL that you can call to determine the status of your purge operation. For example:

    x-ms-status-location: https://management.azure.com/subscriptions/[SubscriptionId]/resourceGroups/[ResourceGroupName]/providers/microsoft.insights/components/[ComponentName]/operations/purge-[PurgeOperationId]?api-version=2015-05-01
    

Important

While the vast majority of purge operations may complete much quicker than the SLA, due to their heavy impact on the data platform used by Application Insights, the formal SLA for the completion of purge operations is set at 30 days.
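
Because a purge can take anywhere from minutes to days, a caller typically polls the status URL at a relaxed interval. The sketch below shows one way to do that. `wait_for_purge` and the caller-supplied `fetch_status` function are hypothetical, and the `"completed"` status string is an assumption about the response rather than something this article specifies.

```python
import time


def wait_for_purge(status_url, fetch_status, interval_seconds=60, max_polls=10,
                   sleep=time.sleep):
    """Poll status_url until the purge reports completion or polls run out.

    fetch_status is a caller-supplied function that performs the authenticated
    GET and returns the status string from the response.
    """
    for attempt in range(max_polls):
        status = fetch_status(status_url)
        if status.lower() == "completed":  # assumed terminal status value
            return True
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # no sleep after the final attempt
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays. In production, an interval of hours rather than seconds is likely more appropriate given the 30-day SLA.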

Next steps