Greg Shields

I remember the day I accidentally got a U.S. Government clearance.

Even crazier, it was an Above Top Secret clearance for a government program so classified I can’t even tell you its name. I was fresh out of college and had just landed my first real job with a real company. On my first day, I found myself filling out an inordinate amount of paperwork (stacks of it) and wondering, “I never knew that a real job would need all this information about me!” Every place I’d lived, every person I’d met, every job I’d had, incredible amounts of Greg’s Life, documented for posterity in what I later discovered was a Federal Personnel Security Questionnaire.

About six months later, they called me in to announce I’d been “cleared.” Surprise!  Walking back through the special “closed area” for the first time, I found myself immediately surrounded by levels of process and security, documentation and special markings that would make the pages of your IT project plan weep.

It’s exactly that documentation and the ways its classified contents were handled that come to mind as I think about Microsoft’s new File Classification Infrastructure (FCI), introduced with Windows Server 2008 R2.

Content Administration: How Our Jobs Have Changed

Think for a minute about how our jobs as IT administrators have evolved over the years. In the beginning, we spent much of our time simply keeping the desktops running. As desktops grew more stable and users more educated, our focus eventually migrated toward the server room. There, we found ourselves extending our server infrastructure, making applications highly available to clients on the LAN and eventually on the Internet.

We’re now at the point where many of our line-of-business applications are ubiquitously available. Users can access their data from virtually any connection, enabling them to work from wherever they need. Yet with that ubiquity also comes a greater risk of data exposure, a potentially very expensive situation we all want to avoid. And there’s a further problem: the sheer growth of data under management, large quantities of which are created, never accessed again and expensively stored to no good purpose.

To deal with these problems, our job role shifts once again. Today, industry and regulatory compliance mandates that we actively manage the data that is housed on our systems, not just the systems themselves. This means that we IT administrators must take an active role in ensuring that our different classes of data are given proper levels of due diligence. Greater care must be given to data that is proprietary or sensitive, has a higher impact on our business or deals with personally identifiable information. Data that is no longer relevant or useful must be disposed of properly if we are to keep costs in line.

The problem with this new job role is that the tools to manage all this data appropriately—with respect to its content—have historically been non-existent, or expensive and difficult to use. In your environment today, how do you know if Stan’s accounting spreadsheet counts the number of Oreos he’s eaten this week, or if it’s actually the budget for a new and highly sensitive project? How do you know if Stan created that spreadsheet years ago and hasn’t looked at it since? Lacking expensive third-party tools, you probably don’t.

Solving these kinds of problems is why Microsoft developed its new File Classification Infrastructure, released as a no-added-cost feature in Windows Server 2008 R2. FCI arrives as an augmentation to the File Server Resource Manager (FSRM), a tool we first saw and probably ignored in Windows Server 2003 R2. FSRM’s early abilities to manage quotas, create file-screening rules and generate storage reports were all interesting, but still dealt with data as files, not as content.

Classified is Easy. Sensitive is Hard.

Thinking back to my time in the classified world, I’m amazed at how mature the government was in classifying its data compared to the business world. Protecting that data is easy enough: they lock it behind closed doors. With the U.S. Government, classified data must be kept inside locked facilities whose networks are air-gapped from the rest of the world. At no point should a government classified document ever leave its protected Sensitive Compartmented Information Facility (SCIF) without very special protections. Simply put, if you head home with the wrong document in your briefcase, you might get a visit from the people with guns.

Although “people with guns” visiting your house always makes for an interesting evening, it really isn’t the interesting part of this story. What is interesting is how those documents are marked. You see, with the government, every created document is marked with special codes. What the codes actually mean is, of course, classified, but suffice it to say that every classified document always knows its level of classification (Top Secret, Secret and so on), the program to which it relates, as well as other information about how it should be handled.

This means that a lost document can be traced back to its proper owner. It also means that an individual always knows what to do with any document.

The business world doesn’t typically have these required-by-law markings on each document on its file servers. Yet even without a central taxonomy, every document can be scanned for known characteristics to identify and mark its importance to the business. That’s part of what FCI does.

Using FCI’s automatic classification rules, you can specially mark documents in your infrastructure in any way you see fit. Those with personally identifiable information can be labeled in one way. Those with a high business impact can be labeled in another. Even images such as TIFFs can be scanned via OCR for notable words. The net result is that documents that need special handling can be given the appropriate care as a function of the markings you assign.

The first step is to identify what markings you need.

Marking Documents with FCI

Microsoft’s FCI stores each document’s markings in an NTFS Alternate Data Stream (ADS). This means that markings remain with documents throughout their travels as long as they never leave NTFS storage. It also means that any file type can be classified, extending FCI to any of the files in your infrastructure. Microsoft Office files get special handling, with FCI markings being stored within the file itself as well as within the ADS. This dual marking enables Microsoft Office documents to maintain classifications both on an NTFS share and within SharePoint.

The actual categories as well as assigned properties are left to the individual implementation, so your business is free to create whatever markings make sense. Examples could include a secrecy level, a personal-information flag and a business-impact rating:

  • Secrecy = Top Secret, Classified or Unclassified
  • Personal Information = Yes or No
  • Business Impact = High, Medium or Low
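To make the idea of a taxonomy concrete, here is a minimal Python sketch of those three example properties as plain data, with a check that rejects values outside the taxonomy. This is purely illustrative: the class and function names are hypothetical, and it does not reflect how FCI itself stores or validates markings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationProperty:
    """One marking in a hypothetical taxonomy: a name plus its allowed values."""
    name: str
    allowed_values: tuple

    def validate(self, value: str) -> str:
        # Reject any value the taxonomy does not define.
        if value not in self.allowed_values:
            raise ValueError(f"{value!r} is not a valid value for {self.name}")
        return value


# The three example properties from the list above.
TAXONOMY = {
    prop.name: prop
    for prop in (
        ClassificationProperty("Secrecy", ("Top Secret", "Classified", "Unclassified")),
        ClassificationProperty("Personal Information", ("Yes", "No")),
        ClassificationProperty("Business Impact", ("High", "Medium", "Low")),
    )
}


def mark(markings: dict, prop: str, value: str) -> dict:
    """Return a copy of a document's markings with one validated property set."""
    updated = dict(markings)
    updated[prop] = TAXONOMY[prop].validate(value)
    return updated
```

Calling mark({}, "Business Impact", "High") returns a one-entry dictionary, while an undefined value such as "Eyes Only" for Secrecy raises a ValueError. Agreeing on the allowed values up front is what keeps markings consistent across file servers.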

Actually applying markings to a document can occur through any of four possible mechanisms:

  • Manual classification. Microsoft Office templates can be created that already include the necessary level of classification. As with my experience in the classified world, documents that are classified Top Secret can be required to begin their lives from your business’s “Top Secret” .DOTX file.
  • Application classification. Built into FCI are a number of extensibility points for third-party hooks. This means that other software companies can write their own tools for analyzing documents and applying classification properties as they’re created or manipulated.
  • Custom classification via scripts. Using the add-on Windows PowerShell classifier module, you can write your own scripts to scan and apply markings to documents.
  • Automatic classification. Lastly, and most easily, you can use the automatic classification engine built into FSRM to do the work for you.

For us Jack-of-all-Trades administrators, the most practical of these is FCI’s automatic classification engine. Using a simple GUI console, you can configure FSRM to automatically scan the file servers in your infrastructure and apply markings based on a document’s location or on its content.

For example, let’s assume you have two sets of very important documents. The first group contains personally identifiable information and the documents are always initially created in a particular folder on a special file share. While those documents might move to other locations during their lifecycle, you know that they’re always created in this single location. These documents must be specially handled because of compliance regulations.

Your second set of very important documents relates to a special venture being worked on by your company called “Project X.” These documents may be housed anywhere on your file servers, making them difficult to find. However, you can safely assume that every document relating to the project includes some permutation of “Project X” somewhere in its text. While no outside regulation requires these documents to have special handling, they are sensitive to your business’s operations and thus need extra care.

You need to provide some special protections for these documents. Your first step is to add the FSRM role service to the File Services role on your file server. This action adds the Share and Storage Management console under File Services in Server Manager, as shown in Figure 1. As you can see, three shares are currently configured on \\server1:  \Home, for users’ home drives; \Shared, for shared documents; and \Shared – PII Restricted, for the compliance-regulated documents with personally identifiable information.


Figure 1  FSRM’s Share and Storage Management Console

Two classification properties must be created for these two document types, which you can do by right-clicking Classification Properties in the left pane and choosing Create Property. The first will have a property name of “PII” with a property type of Yes/No. This configuration, shown in Figure 2, enables documents to be identified as containing personally identifiable information.

The second property will have a property name of “Special Project” with a property type of String. This configuration will allow any sensitive document to be optionally marked with its special project name.


Figure 2  Creating a Yes/No Classification Property

Creating these classification properties establishes the taxonomy of markings you want to apply to your documents. The next step is to actually apply those properties to the right documents. Using FCI’s automatic classification services, you can apply these properties to your documents based on their presence in a specific folder.  You can also instruct FCI to read through each document on your file servers searching for specific text strings. In this case, let’s do both.

Right-click on Classification Rules and choose to Create a New Rule. The first rule to be created will be a file-path rule for your PII share. Give the rule a name and description and apply the rule to the folder path that contains the PII documents. Then, under the Classification tab, configure the settings as shown in Figure 3. You’ll see that this rule will use the Folder Classifier mechanism to assign the PII property value of Yes to the documents in this folder.


Figure 3  Setting the Definitions for a Classification Rule
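The logic behind a folder-based rule is simple enough to sketch in a few lines of Python. The example below is an illustration of the idea only, not the Folder Classifier itself: the local path D:\Shares\Shared - PII Restricted and the function name are assumptions made up for this sketch.

```python
from pathlib import PureWindowsPath

# Hypothetical local folder behind the PII share; substitute your own path.
PII_FOLDER = PureWindowsPath(r"D:\Shares\Shared - PII Restricted")


def classify_by_folder(file_path: str, markings: dict) -> dict:
    """Return a copy of the markings with PII = Yes if the file is under the PII folder."""
    path = PureWindowsPath(file_path)
    updated = dict(markings)
    # PureWindowsPath compares case-insensitively, as Windows paths do.
    if PII_FOLDER in path.parents:
        updated["PII"] = "Yes"
    return updated
```

A file anywhere beneath the PII folder, at any depth, picks up the marking; a file in the ordinary shared folder comes back unchanged. Because the marking is stored with the file, it is still there after the document later moves elsewhere on NTFS storage.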

Configuring the second rule involves slightly more effort. Here, the scope for the second rule will be the \Shared and \Shared – PII Restricted shares (in reality, this rule would include any location where users might store Special Project documents). Also, you will use the Content Classifier classification mechanism with the Special Project property name. Because you’re searching for “Project X” documents with this rule, set “Project X” as its property value.

The next step is to tell the rule what to search for in the documents it finds. Clicking the Advanced button and navigating to the tab marked Additional Classification Parameters presents a dialog box where you enter the rule’s search string. You can configure both case-sensitive and case-insensitive strings, and you can use regular expressions for more complex needs. Figure 4 shows how you can search for four different case-insensitive permutations of the words “Project X.”


Figure 4  Searching for Permutations of the Text “Project X”
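If you would rather collapse several permutations into a single regular expression, the Python sketch below shows one way to think about it. The variants it covers (a space, hyphen, underscore or nothing between the two words) are my own assumption, not necessarily the four strings in Figure 4, and the pattern is written for Python's re module, so treat it as a starting point and test it in FSRM before relying on it.

```python
import re

# Matches "Project X", "Project-X", "Project_X" and "ProjectX" in any letter case.
# The trailing \b keeps longer words such as "Project Xylophone" from matching.
PROJECT_X = re.compile(r"\bproject[\s_-]?x\b", re.IGNORECASE)


def classify_by_content(text: str, markings: dict) -> dict:
    """Return a copy of the markings with Special Project set when the text mentions Project X."""
    updated = dict(markings)
    if PROJECT_X.search(text):
        updated["Special Project"] = "Project X"
    return updated
```

One pattern is easier to maintain than a growing list of literal strings, but it is also easier to get subtly wrong, so it pays to run it against a handful of real file contents first.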

The final step is to configure a schedule for the automatic classification engine. This is done by right-clicking Classification Rules and choosing to Configure Classification Schedule. You can create multiple schedules for scanning all folders with the relevant rules applied, and you can create reports in multiple formats (DHTML, HTML, XML, CSV and Text) and optionally e-mail them to administrators or auditors.

Actually Doing Things

This entire effort only marks your documents. Your next step is to actually accomplish tasks with those marked documents. What’s exciting here is that your options are limited only by your scripting prowess and your imagination.

Now FCI-aware in Windows Server 2008 R2, FSRM’s File Management Tasks engine eliminates virtually all the pain in this process by automatically applying your selected action to any marked document in any location:

  • They can be set to expire after a certain period of time, relegating them to a special expiration directory for final archiving before deletion.
  • They can be collected into reports for auditors, proving that you know where all your compliance-relevant documents are in your infrastructure.
  • They can be backed up to special locations to ensure regular backup tapes aren’t “corrupted” by sensitive information.

Essentially any action that is accomplished through an executable or script can be applied to marked documents once or on a scheduled basis. All actions are created within the FSRM console by right-clicking File Management Tasks and choosing to Create File Management Task. The resulting multi-tab window provides a location for defining the task and the command or script to use, as well as notification, reporting and scheduling options.

You can also set a condition for the occurrence of a task, as shown in Figure 5. As you can see, a task has been created to accomplish some action on all documents that are marked as containing personally identifiable information and that haven’t been modified or accessed in the past year. Perhaps this task is configured to automatically delete such documents to prevent their stale data from falling into the wrong hands. Maybe it relocates those documents to an archival location for preservation prior to deletion. The power here is that once the condition is met, the actions you need to accomplish will take place automatically.


Figure 5  Setting the Conditions for an FSRM File Management Task
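The condition in Figure 5 boils down to a three-part test: the document is marked PII, it hasn't been modified in a year and it hasn't been accessed in a year. Here is a rough Python sketch of that selection logic. The record layout, the function names and the 365-day cutoff are all assumptions made for illustration; FSRM evaluates the real condition for you.

```python
from datetime import datetime, timedelta


def is_stale_pii(markings: dict, modified: datetime, accessed: datetime,
                 now: datetime, max_age_days: int = 365) -> bool:
    """True when a file is marked PII = Yes and is untouched for max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return (
        markings.get("PII") == "Yes"
        and modified < cutoff
        and accessed < cutoff
    )


def select_for_task(files: list, now: datetime) -> list:
    """Return the paths of all files that meet the task's condition."""
    return [
        f["path"]
        for f in files
        if is_stale_pii(f["markings"], f["modified"], f["accessed"], now)
    ]
```

All three tests must pass before a file is selected, so a PII document that someone opened last month stays put. Whatever command or script you attach to the task then runs only against the files this kind of test returns.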

FCI Gives You Content Control

I’ll admit, getting that clearance was the start of a wild first year in my professional career. I may not fit the profile of your typical “cleared” individual: I wear my hair long, my reading interests lean toward pop sociology and mountain bike magazines over military history, and I call Colorado my home rather than that Federal locus known as Maryland. But that job taught me most of what I know today about formal systems of security.

It also brought me a great deal of appreciation for the level of effort required to manage content in a high-security environment. With today’s technologies in Windows Server 2008 R2, you can enjoy that same level of control in yours with a lot more built-in automation.

Greg Shields, MVP, is a partner at Concentrated Technology. Get more of Greg’s Jack-of-all-Trades tips and tricks at