Searching Private App Data in Windows 10

Up until this point I’ve walked through how you can use the indexer to search through public data on a device. But what if your data cannot be stored in a public location, or isn’t resident on the device, and you’d still like local search? In this case the indexer provides two different methods to help you add fast, internationalized search to your app. The methods go by a couple of names, but we will refer to them as the ContentIndexer and the indexed folder.

What is the ContentIndexer?

The ContentIndexer class enables apps to provide a property bag full of metadata that the system will then index and make searchable. The property bag can contain any properties that are present in the shell property system (see note below) and will be searchable as soon as the API call returns.

The ContentIndexer was initially designed to be used by Edge for searching its history. When a user visits a webpage, the metadata is pushed into the index. Then, when the user starts typing in the address box, anything in the history that matches is included in the drop-down.

What is the Indexed Folder?

The indexed folder is a special folder that can be created in the local storage of a UWA. When the folder is created the indexer will add that location to its scope and index it as if the location was in a library. The contents will still stay private to your app, but you will have the same search, sorting, and filtering capabilities as the public libraries.

An example of this being used in the system is the settings app in Windows 10. All of the settings entries in the app have a special file in the app’s indexed folder. When a user types a search query, the app passes it to the indexer to match against the files in the indexed folder. It then maps the results back to the settings entries that they represent.

Which option to use?

The general rule of thumb is that if your data is going to be frequently updated, the ContentIndexer will probably make more sense. But as always there are nuances that need to be considered. The following chart outlines the differences between the two options.

| Feature | ContentIndexer | Indexed Folder |
| --- | --- | --- |
| How data gets into the index | App pushes in the data, must listen to make sure indexing was successful | App creates files on the disk. Indexer manages the indexing process automatically |
| Type of data provided by app | Property bags | Any file type; metadata-only appcontent-ms files can be used |
| Indexing Priority Control | High priority and app controls indexing order | No control over order, can force indexing priority bump by holding a query open over the entire folder |
| Behaviour on reset | App required to re-push data | Indexer automatically reindexes all the files |

The shell properties system is the list of metadata fields that can be used to describe an item on Windows. The entire list of properties includes hundreds of properties that can be used to describe everything from faxes to video streams. Any of the properties can be used to describe any IShellItem, but in practice only a subset of the most relevant properties are ever used for a given item.

In our case we only need to know how to read a small part of the system in order to use the search APIs. We need to know if the property is going to be indexed, if it will be stored intact, and what data type the property is expecting.
Let’s take a look at the documentation for System.Music.AlbumArtist and make sure that we can use it for searching. I’ve copied the relevant section below:

   name = System.Music.AlbumArtist
   shellPKey = PKEY_Music_AlbumArtist
   formatID = 56A3372E-CE9C-11D2-9F0E-006097C686F6
   propID = 13
      inInvertedIndex = true
      isColumn = true
      isColumnSparse = true
      columnIndexType = OnDisk
      maxSize = 128
      mnemonics = album artist
      type = String
      groupingRange = Discrete
      isInnate = false
      multipleValues = false
      isGroup = false
      aggregationType = Default
      isTreeProperty = false
      isViewable = true
      isQueryable = true
      includeInFullTextQuery = false
      conditionType = String
      defaultOperation = Equal

Walking through the properties we can see what each of them means:

| Property | Meaning | Values (for illustration purposes only, not complete) | Notes |
| --- | --- | --- | --- |
| inInvertedIndex | Is the property contained in the inverted index, and therefore searchable? | True = property is searchable; False = property cannot be searched with the indexer | If the property isn’t in the inverted index, then searching against it isn’t going to work. Try to find another property instead. |
| isColumn | Is the property stored in the index for fast retrieval? Data stored in the inverted index cannot be recovered and returned in search results; instead it must be written in a separate location in the database. | True = property can be retrieved from the indexer as a part of the query results; False = the indexer is going to have to try to recover this data from the original item if it is requested as a part of search results | Requesting query results to contain data not held in a column results in a >100x slowdown in query performance. |
| maxSize | The number of bytes reserved in the index for a given property to be stored in. | | Trying to store anything larger than maxSize isn’t supported. |
| isColumnSparse, columnIndexType | These describe how the indexer is going to set up the columns on disk. | | These can be safely ignored assuming that you have isColumn = true. |
| type | The datatype of the property. Datatype mismatches are going to result in the indexer ignoring that field. | String, FILETIME (DateTimeOffset in C#), UINT32, custom enumeration | Make sure you are using the international-ready values for shell enumerations using the # sign. |
| isInnate | Indicates if the property can be set by the metadata in the file or comes from the system. | True = the property value will be derived from file system metadata; False = the property can be set by information in the file | Setting innate properties with the ContentIndexer will sometimes fail silently. That makes for exciting-to-find bugs, so try to avoid them. |

The other properties describe parts of the property system that we aren’t going to need for searching. And in the interest of not turning this into a rant about IShellItems they are ignored in the above chart.

This means if we are going to use a property in a query, we’ll need to make sure it is a column and in the inverted index. As well, if you are planning on pushing data you’ll need to ensure that the data being pushed in matches the type the index is expecting.
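As a rough illustration of that check, the sketch below parses a property description pasted from the documentation (the "key = value" text shown earlier) and reports whether the property is searchable and retrievable. The `PropertyDescription` class and its method names are made up for this example; they are not part of any Windows API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not a Windows API: reads the "key = value" text
// copied from the shell property documentation.
public static class PropertyDescription
{
    // Turns "key = value" lines into a case-insensitive lookup table.
    public static Dictionary<string, string> Parse(string text)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in text.Split('\n'))
        {
            int separator = line.IndexOf('=');
            if (separator < 0) continue; // not a "key = value" line
            string key = line.Substring(0, separator).Trim();
            string value = line.Substring(separator + 1).Trim();
            fields[key] = value;
        }
        return fields;
    }

    // Searchable means the property is in the inverted index.
    public static bool IsSearchable(IDictionary<string, string> fields)
    {
        return IsTrue(fields, "inInvertedIndex");
    }

    // Retrievable means the property is stored as a column.
    public static bool IsRetrievable(IDictionary<string, string> fields)
    {
        return IsTrue(fields, "isColumn");
    }

    private static bool IsTrue(IDictionary<string, string> fields, string key)
    {
        string value;
        return fields.TryGetValue(key, out value)
            && string.Equals(value, "true", StringComparison.OrdinalIgnoreCase);
    }
}
```

Feeding it the System.Music.AlbumArtist description above should report the property as both searchable and retrievable, with `fields["type"]` giving the data type to match when pushing values.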

Indexed Folder

The indexed folder was designed to be as simple to use as possible. The app only has to create a folder named “indexed” in its app data directory:

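    // Needs "using Windows.Storage;" at the top of the file.
    StorageFolder localFolder = ApplicationData.Current.LocalFolder;
    // Don't internationalize the folder name.
    // OpenIfExists keeps this from throwing when the folder already exists from a previous run.
    StorageFolder indexedFolder = await localFolder.CreateFolderAsync("indexed", CreationCollisionOption.OpenIfExists);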

And that is it. Any file that you create in the indexed folder is going to be automatically indexed. All the fancy AQS queries that work in public libraries are now available to your app’s private data.

Appcontent-ms Files

There are a few rare cases where your app may have data that it wants to search, but doesn’t want to deal with the complexity of the ContentIndexer, for example data that isn’t going to change over time, such as the settings on a PC. In this case the appcontent-ms files are a great way to provide rich local search with very little work in your app.

The documentation gives a very complete example of how to use these files, so I won’t rehash any of that information here. Instead, I’ll highlight a few important things that might be helpful when using the indexed folder.

All the information from my previous post about reading the shell property system documentation holds. There are no restrictions on which properties you can populate with an appcontent-ms file other than the data type restrictions from the shell schema. The indexer will very happily let you put email addresses in System.Music.AlbumTitle, but if you try to store a string in System.Music.IsCompilation it will ignore the value and index the rest of the file.

The indexer isn’t instant; there will be some lag between when you create the appcontent-ms files and when indexing completes. On an otherwise idle machine the delay should be only a few milliseconds, but that is still a lot slower than the very next line of code in your app. Give it some time before you start querying, and if you need to query right away use the IndexerOption.UseIndexerWhenAvailable option. That way the system will make the smart choice about using the indexer vs. the slower option of just scanning the disk.
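Here is a minimal sketch of what such a query over the indexed folder might look like. The `IndexedFolderSearch` class and its method names are hypothetical; `QueryOptions`, `IndexerOption`, and `CreateFileQueryWithOptions` come from Windows.Storage.Search. It assumes the AQS filter string is supplied by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Search;

// Hypothetical helper for querying the app's private "indexed" folder.
public static class IndexedFolderSearch
{
    // Builds query options that prefer the indexer but fall back to
    // scanning the disk if the indexer is not ready yet.
    public static QueryOptions BuildOptions(string aqsFilter)
    {
        return new QueryOptions
        {
            FolderDepth = FolderDepth.Deep, // include subfolders of "indexed"
            IndexerOption = IndexerOption.UseIndexerWhenAvailable,
            UserSearchFilter = aqsFilter
        };
    }

    // Runs the AQS filter over everything in the indexed folder.
    public static async Task<IReadOnlyList<StorageFile>> SearchAsync(string aqsFilter)
    {
        StorageFolder indexedFolder = await ApplicationData.Current.LocalFolder
            .CreateFolderAsync("indexed", CreationCollisionOption.OpenIfExists);

        StorageFileQueryResult query =
            indexedFolder.CreateFileQueryWithOptions(BuildOptions(aqsFilter));
        return await query.GetFilesAsync();
    }
}
```

Keeping the options in their own method makes it easy to switch to a different `IndexerOption` value if you would rather control the fallback behaviour yourself.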

All subfolders of the indexed folder will be indexed as well. The settings app has a neat setup with a subfolder for each language installed on the machine. It makes their code for adding and removing languages really clean and easy to maintain.

Content Indexer

The ContentIndexer class and its related APIs were added to the system in Windows 8.1 as a new way for apps to push data into the indexer. They were created as a direct replacement for the protocol handler model that was previously used and have been used by a number of apps internally. The biggest customers by far are Edge, pushing in the user’s history, and Groove Music, pushing in the user’s cloud music collection.

Both of these cases highlight what these APIs are great for: data that can be recreated from another source but needs to be easy to search. The indexer is especially powerful in the case of Edge, since URLs often contain bunched-together words in a number of languages combined with unusual punctuation. The indexer is able to separate out the words from the other text tokens and make the URLs easier to search.

Using the APIs

I’m not going to rehash the basic samples here; the basic code works great on Windows 10. Instead I’d like to walk through a few important things to note about using the ContentIndexer APIs.

Read the shell property system documentation

Make sure you understand the previous section about reading the documentation for which properties can be used. The indexer doesn’t have any way of knowing if you want a property to be searchable (inInvertedIndex), retrievable from a query result (isColumn), or both.

Check your data types

Before storing data make sure that you have the right datatypes and that you aren’t going to be overflowing text fields. In C# it is really easy to end up with a massive string that isn’t going to fit into the particular field you’re using.
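One way to guard against oversized strings is to trim them before they go into the property bag. The sketch below is only an illustration: `PropertySizeGuard` is a made-up name, and it assumes the maxSize budget is counted in UTF-16 bytes (two per `char`), which you should verify against the property you are actually using.

```csharp
using System;

// Hypothetical helper, not a Windows API.
public static class PropertySizeGuard
{
    // Assumption: maxSize is measured in UTF-16 bytes, so each char costs 2 bytes.
    public static string TruncateToMaxSize(string value, int maxSizeInBytes)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (maxSizeInBytes < 0) throw new ArgumentOutOfRangeException(nameof(maxSizeInBytes));

        int maxChars = maxSizeInBytes / sizeof(char);
        if (value.Length <= maxChars) return value; // already fits

        // Don't cut a surrogate pair in half; drop the dangling high surrogate.
        if (maxChars > 0 && char.IsHighSurrogate(value[maxChars - 1])) maxChars--;

        return value.Substring(0, maxChars);
    }
}
```

For System.Music.AlbumArtist, with maxSize = 128, this would keep at most 64 characters under the stated assumption.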

Implementing IIndexableContent is a Good Thing

Every successful app I’ve seen use the ContentIndexer has had a domain-specific class that implements IIndexableContent. Not only does it make your data model easier, but it isolates the nasty pattern of retrieve property → check for null → cast to correct data type → make sure that doesn’t fail → sanity check resulting value to one location in your code.
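To make that concrete, here is a small sketch of such a class for a browser-history style entry. `HistoryEntry` and `PageTitle` are hypothetical names, and storing the title in System.ItemNameDisplay is just a choice made for this example; the interface members are the ones IIndexableContent requires.

```csharp
using System.Collections.Generic;
using Windows.Storage;
using Windows.Storage.Search;
using Windows.Storage.Streams;

// Hypothetical domain class; only the IIndexableContent members are dictated by the API.
public sealed class HistoryEntry : IIndexableContent
{
    private readonly Dictionary<string, object> properties = new Dictionary<string, object>();

    public HistoryEntry(string url, string pageTitle)
    {
        Id = url; // the URL doubles as the content id in this sketch
        PageTitle = pageTitle;
    }

    // IIndexableContent members.
    public string Id { get; set; }
    public IDictionary<string, object> Properties { get { return properties; } }
    public IRandomAccessStream Stream { get; set; }
    public string StreamContentType { get; set; }

    // Strongly typed view over the property bag.
    public string PageTitle
    {
        get { return ReadString(SystemProperties.ItemNameDisplay); }
        set { WriteString(SystemProperties.ItemNameDisplay, value); }
    }

    // The retrieve, null check, and cast pattern lives in exactly one place.
    private string ReadString(string propertyName)
    {
        object raw;
        if (!properties.TryGetValue(propertyName, out raw) || raw == null) return null;
        return raw as string; // null when the stored value is not a string
    }

    private void WriteString(string propertyName, string value)
    {
        if (string.IsNullOrEmpty(value)) properties.Remove(propertyName);
        else properties[propertyName] = value;
    }
}
```

An instance can be handed straight to `ContentIndexer.AddAsync`, and the rest of the app only ever touches `PageTitle` rather than raw property names.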

Check for indexer resets

The indexer can be reset, which will cause all the data in it to be lost. On a Windows upgrade (not just a patch, it has to be a significant diff) the indexer will detect that the shell schema has changed and will rebuild to match the new schema. An app simply needs to check the ContentIndexer.Revision property to see if there has been a rebuild. If there has been a rebuild, all the previous data will be gone and the app has to re-push any data it would like to be searchable.
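One possible shape for that check is sketched below. It assumes the app remembers the last revision it saw in its local settings; the `IndexResetGuard` class, the settings key, and the `repushAllAsync` callback are all hypothetical names introduced for this example.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Search;

// Hypothetical helper: re-pushes app data when the indexer revision changes.
public static class IndexResetGuard
{
    // Hypothetical settings key used to remember the last revision seen.
    private const string RevisionKey = "LastSeenIndexerRevision";

    // True when there is no stored revision or it differs from the current one.
    public static bool NeedsRepush(object storedRevision, ulong currentRevision)
    {
        return !(storedRevision is ulong) || (ulong)storedRevision != currentRevision;
    }

    // Call at startup; repushAllAsync is the app's own code for re-adding its content.
    public static async Task EnsurePopulatedAsync(Func<ContentIndexer, Task> repushAllAsync)
    {
        ContentIndexer indexer = ContentIndexer.GetIndexer();
        ulong currentRevision = indexer.Revision;
        var settings = ApplicationData.Current.LocalSettings.Values;

        object storedRevision;
        settings.TryGetValue(RevisionKey, out storedRevision);
        if (!NeedsRepush(storedRevision, currentRevision)) return;

        await repushAllAsync(indexer);
        // Only record the revision once the data is back in the index.
        settings[RevisionKey] = currentRevision;
    }
}
```

Recording the revision after the re-push means an interrupted re-push is simply retried on the next launch.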

Unit test storage and retrieval until you’re sick of it

This is for my own wellbeing as well as yours. A couple of unit tests that hammer storing, searching, and retrieving a value go a long way to making sure your code works. It’s embarrassing the number of hours I’ve wasted debugging apps only to realize the property being used doesn’t support retrieval. It takes a minute to write the test and will save both of us a lot of hassle.
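A round-trip check along those lines could look roughly like this. The `IndexerRoundTrip` class and its method names are invented for the example, and the AQS string it builds assumes the value contains no double quotes. It returns a bool so you can wrap it in whichever test framework you use.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Windows.Storage.Search;

// Hypothetical test helper: stores a value, searches for it, then retrieves it.
public static class IndexerRoundTrip
{
    // Builds an AQS filter of the form  System.Some.Property:"value".
    public static string BuildPropertyQuery(string propertyName, string value)
    {
        if (value.Contains("\""))
            throw new ArgumentException("Quotes are not handled by this sketch.", nameof(value));
        return propertyName + ":\"" + value + "\"";
    }

    public static async Task<bool> StoresSearchesAndRetrievesAsync(string propertyName, string value)
    {
        ContentIndexer indexer = ContentIndexer.GetIndexer();
        string id = "roundtrip-" + Guid.NewGuid();

        var content = new IndexableContent { Id = id };
        content.Properties[propertyName] = value;
        await indexer.AddAsync(content);

        try
        {
            // Search: fails if the property is not in the inverted index.
            var query = indexer.CreateQuery(
                BuildPropertyQuery(propertyName, value), new[] { propertyName });
            var hits = await query.GetAsync();
            bool searchable = hits.Any(hit => hit.Id == id);

            // Retrieve: fails if the value cannot be read back unchanged.
            var stored = await indexer.RetrievePropertiesAsync(id, new[] { propertyName });
            object raw;
            bool retrievable = stored.TryGetValue(propertyName, out raw)
                && value.Equals(raw as string);

            return searchable && retrievable;
        }
        finally
        {
            // Clean up so test data never shows up in real results.
            await indexer.DeleteAsync(id);
        }
    }
}
```

Running it once for every property your app stores is cheap, and it catches a property that can be searched but not retrieved (or the reverse) before it turns into a debugging session.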

And that’s it. Let me know if you have any more questions about the ContentIndexer, indexed folder or reading the shell properties in the comments below. I look forward to seeing how you use it in your apps.