Investigating partially indexed items in Office 365 eDiscovery

A Content Search that you run from the Security & Compliance Center automatically includes partially indexed items in the estimated search results. Partially indexed items are Exchange mailbox items and documents on SharePoint and OneDrive for Business sites that for some reason weren't completely indexed for search. Most email messages and site documents are successfully indexed because they fall within the indexing limits for email messages. However, some items exceed these indexing limits and are partially indexed. Here are other reasons why items can't be indexed for search and are returned as partially indexed items when you run a Content Search:

  • Email messages have an attached file of a file type that can't be indexed; in most cases, the file type is unrecognized or unsupported for indexing

  • Email messages have an attached file without a valid handler, such as image files; this is the most common cause of partially indexed email items

  • Too many files attached to an email message

  • A file attached to an email message is too large

  • The file type is supported for indexing but an indexing error occurred for a specific file

Although it varies, most Office 365 organizations have less than 1% of content by volume and less than 12% of content by size that is partially indexed. The volume and size figures differ because larger files have a higher probability of containing content that can't be completely indexed.

After you run a Content Search in the Security & Compliance Center, the total number and size of partially indexed items in the searched locations are listed in the detailed statistics for the search. Note that these are called unindexed items in the search statistics. Here are a few things that affect the number of partially indexed items that are returned in the search results:

  • If an item is partially indexed and matches the search query, it's included in both the count (and size) of search result items and partially indexed items. However, when the results of that same search are exported, the item is included only with the set of search results; it's not included as a partially indexed item.

  • If you specify a date range for a search query (by including it in the keyword query or by using a condition), any partially indexed item that doesn't match the date range isn't included in the count of partially indexed items. Only the partially indexed items that fall within the date range are counted.

Note: Partially indexed items located in SharePoint and OneDrive sites are not included in the estimate of partially indexed items that's displayed in the detailed statistics for the search. However, these items can be exported when you export the results of a Content Search. For example, if you only search sites in a Content Search, the estimated number of partially indexed items will be zero.

Calculating the ratio of partially indexed items in your organization

To understand your organization's exposure to partially indexed items, you can run a search for all content in all mailboxes (by using a blank keyword query). In the following example, there are 56,208 (4,830 MB) fully indexed items and 470 (316 MB) partially indexed items.

Example of search statistics showing partially indexed items

You can determine the percentage of partially indexed items by using the following calculations.

To calculate the ratio of partially indexed items in your organization:

(Total number of partially indexed items/Total number of items) x 100

(470/56,208) x 100 = 0.84%

By using the search results from the previous example, 0.84% of all mailbox items are partially indexed.

To calculate the percentage of the size of partially indexed items in your organization:

(Size of all partially indexed items/Size of all items) x 100

(316 MB/4,830 MB) x 100 = 6.54%

So in the previous example, 6.54% of the total size of mailbox items is from partially indexed items. As previously stated, most Office 365 organizations have less than 1% of content by volume and less than 12% of content by size that is partially indexed.
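
The two calculations above can be sketched in PowerShell as follows. This is a minimal illustration that uses the figures from the example; the function name Get-PartiallyIndexedPercent is a hypothetical helper, not a built-in cmdlet.

```powershell
# Sketch: percentage of partially indexed items, by count or by size.
# Get-PartiallyIndexedPercent is a hypothetical helper name, not a built-in cmdlet.
function Get-PartiallyIndexedPercent {
    param(
        [double]$PartialValue,   # count or size of partially indexed items
        [double]$TotalValue      # count or size of all items
    )
    if ($TotalValue -eq 0) { return 0 }   # avoid dividing by zero for an empty search
    return [math]::Round(($PartialValue / $TotalValue) * 100, 2)
}

# Figures taken from the example search statistics
$countPercent = Get-PartiallyIndexedPercent -PartialValue 470 -TotalValue 56208
$sizePercent  = Get-PartiallyIndexedPercent -PartialValue 316 -TotalValue 4830

"By count: $countPercent%"   # 0.84%
"By size:  $sizePercent%"    # 6.54%
```

Rounding to two decimal places matches the precision used in the worked example.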

Working with partially indexed items

In cases when you need to examine partially indexed items to validate that they don't contain relevant information, you can export a content search report that contains information about partially indexed items. When you export a content search report, be sure to choose one of the export options that includes partially indexed items.

Choose the second or third option to export partially indexed items

When you export content search results or a content search report using one of these options, the export includes a report named Unindexed Items.csv. This report includes most of the same information as the ResultsLog.csv file; however, the Unindexed Items.csv file also includes two fields related to partially indexed items: Error Tags and Error Properties. These fields contain information about the indexing error for each partially indexed item. The information in these two fields can help you determine whether the indexing error for a particular item impacts your investigation. If it does, you can perform a targeted content search to retrieve and export specific email messages and SharePoint or OneDrive documents, and then examine them to determine whether they're relevant to your investigation. For step-by-step instructions, see Prepare a CSV file for a targeted Content Search in Office 365.

Note: The Unindexed Items.csv file also contains fields named Error Type and Error Message. These are legacy fields that contain information that is similar to the information in the Error Tags and Error Properties fields, but with less detailed information. You can safely ignore these legacy fields.
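
As a rough sketch of how the report can be summarized, the following PowerShell groups rows by the Error Tags field and counts the items for each tag. The sample rows, the Subject column, and the values shown in Error Properties are invented for illustration. In practice you would load the exported file instead, for example with Import-Csv -Path '.\Unindexed Items.csv', and the actual column layout may differ.

```powershell
# Sketch: count partially indexed items per error tag in an exported report.
# The rows below are illustrative sample data, not real export output.
$sampleCsv = @'
"Subject","Error Tags","Error Properties"
"Q3 budget","parseroutputsize_xls","attachments"
"Site photos","parserunknowntype_noformat","attachments"
"Q4 forecast","parseroutputsize_xls","attachments"
'@
$rows = $sampleCsv | ConvertFrom-Csv

# Group by the Error Tags column and list the most frequent tags first
$summary = $rows |
    Group-Object -Property 'Error Tags' |
    Sort-Object -Property Count -Descending |
    Select-Object -Property Name, Count

$summary | Format-Table -AutoSize
```

A summary like this can show which indexing errors dominate before you decide which items need a targeted search.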

Error tags are made up of two pieces of information: the error and the file type. For example, in this error/filetype pair:

parseroutputsize_xls

parseroutputsize is the error and xls is the file type of the file on which the error occurred. In cases where the file type wasn't recognized or the file type doesn't apply to the error, you will see the value noformat in place of the file type.
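
The following sketch splits an error tag into its two parts. It assumes that the error and the file type are joined by an underscore, which is what the Split call in the script later in this article suggests. Split-ErrorTag is a hypothetical helper name.

```powershell
# Sketch: split an error tag into its error and file type parts.
# Split-ErrorTag is a hypothetical helper; the underscore separator is an assumption.
function Split-ErrorTag {
    param([string]$Tag)
    # Split on the first underscore only, in case the file type itself contains one
    $parts = $Tag -split '_', 2
    [pscustomobject]@{
        Error    = $parts[0]
        FileType = if ($parts.Count -gt 1) { $parts[1] } else { $null }
    }
}

$parsed = Split-ErrorTag -Tag 'parseroutputsize_xls'
$parsed   # Error = parseroutputsize, FileType = xls
```

A tag that carries noformat in place of the file type is split the same way, with FileType set to noformat.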

The following is a list of indexing errors and a description of the possible cause of the error.

Error tag                Description
attachmentcount          An email message had too many attachments, and some of these attachments weren't processed.
attachmentdepth          The content retriever and document parser found too many levels of attachments nested inside other attachments. Some of these attachments weren't processed.
attachmentrms            An attachment failed decoding because it was RMS-protected.
attachmentsize           A file attached to an email message was too large and couldn't be processed.
indexingtruncated        When writing the processed email message to the index, one of the indexable properties was too large and was truncated. The truncated properties are listed in the Error Properties field.
invalidunicode           An email message contained text that couldn't be processed as valid Unicode. Indexing for this item may be incomplete.
parserencrypted          The content of an attachment or email message is encrypted, and Office 365 couldn't decode the content.
parsererror              An unknown error occurred during parsing. This typically results from a software bug or a service crash.
parserinputsize          An attachment was too large for the parser to handle, and the parsing of that attachment didn't happen or wasn't completed.
parsermalformed          An attachment was malformed and couldn't be handled by the parser. This can result from old file formats, files created by incompatible software, or viruses disguised as a different file type.
parseroutputsize         The output from the parsing of an attachment was too large and had to be truncated.
parserunknowntype        An attachment had a file type that Office 365 couldn't detect.
parserunsupportedtype    An attachment had a file type that Office 365 could detect, but parsing that file type isn't supported.
propertytoobig           The value of an email property in Exchange Store was too large to be retrieved and the message couldn't be processed. This typically only happens to the body property of an email message.
retrieverrms             The content retriever failed to decode an RMS-protected message.
wordbreakertruncated     Too many words were identified in the document during indexing. Processing of the property stopped when reaching the limit, and the property is truncated.

The Error Properties field describes which fields are affected by the processing error listed in the Error Tags field. If you're searching a property such as subject or participants, errors in the body of the message won't impact the results of your search. This can be useful when determining exactly which partially indexed items you might need to investigate further.
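
To illustrate that reasoning, the following sketch keeps only the report rows whose indexing error affected a property that the search actually queried. The sample rows, the property names, and the separators used inside Error Properties are assumptions made for illustration; check the values in your own Unindexed Items.csv export before relying on this.

```powershell
# Sketch: find partially indexed items whose error touched a searched property.
# Sample rows and the "Error Properties" value format are illustrative assumptions.
$rows = @'
"Subject","Error Tags","Error Properties"
"Q3 budget","parseroutputsize_xls","attachments"
"Long thread","indexingtruncated_noformat","body"
"Odd encoding","invalidunicode_noformat","subject;body"
'@ | ConvertFrom-Csv

# Properties that the search query relied on
$searchedProperties = @('subject', 'participants')

$needsReview = $rows | Where-Object {
    # Break the field into individual property names (comma, semicolon or space separated)
    $affected = $_.'Error Properties' -split '[,;\s]+' | Where-Object { $_ }
    # Keep the row when at least one affected property was part of the query
    @($affected | Where-Object { $searchedProperties -contains $_ }).Count -gt 0
}

$needsReview | Select-Object -Property Subject, 'Error Tags', 'Error Properties'
```

With the sample data, only the "Odd encoding" row is kept, because its error affected the subject property.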

Using a PowerShell script to determine your organization's exposure to partially indexed email items

The following steps show you how to run a PowerShell script that searches for all items in all Exchange mailboxes, and then generates a report about your organization's ratio of partially indexed email items (by count and by size) and displays the number of items (and their file type) for each indexing error that occurs. Use the error tag descriptions in the previous section to identify the indexing error.

  1. Save the following text to a Windows PowerShell script file by using a filename suffix of .ps1; for example, PartiallyIndexedItems.ps1.

  Write-Host "**************************************************"
  Write-Host "     Security & Compliance Center      " -ForegroundColor Yellow -BackgroundColor DarkGreen
  Write-Host "   eDiscovery Partially Indexed Item Statistics   " -ForegroundColor Yellow -BackgroundColor DarkGreen
  Write-Host "**************************************************"
  " "
  # Create a search with the ErrorTags refiner enabled
  Remove-ComplianceSearch "RefinerTest" -Confirm:$false -ErrorAction 'SilentlyContinue'
  New-ComplianceSearch -Name "RefinerTest" -ContentMatchQuery "size>0" -RefinerNames ErrorTags -ExchangeLocation ALL
  Start-ComplianceSearch "RefinerTest"
  # Poll every 5 seconds until the search completes
  do {
      Write-Host "Waiting for search to complete..."
      Start-Sleep -s 5
      $complianceSearch = Get-ComplianceSearch "RefinerTest"
  } while ($complianceSearch.Status -ne 'Completed')
  # Parse the refiner data and compute the partially indexed ratios
  $refiners = $complianceSearch.Refiners | ConvertFrom-Json
  $errorTagProperties = $refiners.Entries | Get-Member -MemberType NoteProperty
  $partiallyIndexedRatio = $complianceSearch.UnindexedItems / $complianceSearch.Items
  $partiallyIndexedSizeRatio = $complianceSearch.UnindexedSize / $complianceSearch.Size
  " "
  "===== Partially indexed items ====="
  "         Total          Ratio"
  "Count    {0:N0}{1:P2}" -f $complianceSearch.Items.ToString("N0").PadRight(15, " "), $partiallyIndexedRatio
  "Size(GB) {0:N2}{1:P2}" -f ($complianceSearch.Size / 1GB).ToString("N2").PadRight(15, " "), $partiallyIndexedSizeRatio
  " "
  Write-Host "===== Reasons for partially indexed items ====="
  $lastErrorTag = $null
  foreach ($errorTagProperty in $errorTagProperties)
  {
      $name = $refiners.Entries.($errorTagProperty.Name).Name
      $count = $refiners.Entries.($errorTagProperty.Name).TotalCount
      # Each refiner name is an error/filetype pair separated by an underscore
      $frag = $name.Split('_')
      $errorTag = $frag[0]
      $fileType = $frag[1]
      # Print the error tag once, followed by one line per file type
      if ($errorTag -ne $lastErrorTag)
      {
          $errorTag
      }
      "    " + $fileType + " => " + $count
      $lastErrorTag = $errorTag
  }

  2. Connect to Security & Compliance Center PowerShell.

  3. In Security & Compliance Center PowerShell, go to the folder where you saved the script in step 1, and then run the script; for example:

  .\PartiallyIndexedItems.ps1

Here's an example of the output returned by the script.

Example of output from script that generates a report on your organization's exposure to partially indexed email items

Note the following:

  1. The total number and size of email items, and your organization's ratio of partially indexed email items (by count and by size).

  2. A list of error tags and the corresponding file types for which the error occurred.

See also

Partially indexed items in Content Search in Office 365