Filtering Metadirectory Dataflow

Applies To: Windows Server 2003 with SP1

The dataflow model defines the data and how it flows into and out of the metadirectory. Objects in various data sources are identified as either providing data that is imported into the metadirectory or as using data exported from the metadirectory. The goal of the dataflow model is to define these objects and list any special formatting or data validation requirements those objects place on the data being imported and exported.

These requirements are implemented through the use of filtering and general policies. Although the dataflow designer is not responsible for creating the actual rules and filters, part of the dataflow design process is to record the special requirements so that the metaverse and rules planners can create the actual rules and policies.

The dataflow designer can implement the logical design by using different strategies, resulting in solutions that do the same job but that have different characteristics for storage, performance, and coding. For example, suppose one of the goals listed in the solution proposal is “All telephone-related information for users in Active Directory will be maintained by the SQL Server telephone system database.” This tells you that you need to use data in Active Directory to create and maintain telephone information for users on your network and that information will be stored in a SQL Server database. In this case, data from user objects in Active Directory will be used to populate employee object types in a SQL Server telephone system database. This is an example of data, in this case user objects, flowing from a connected data source, Active Directory, to the metadirectory, which then flows the data to employee objects maintained by another connected data source, the SQL Server telephone system database.

Consider a user name such as Mike Danseglio. In Active Directory, this information is stored in a User object. The first name, Mike, is stored in the givenName attribute of the User object, and the last name, Danseglio, is stored in the surname attribute. The User object also has a displayName attribute that contains the full user name in the form Mike Danseglio. While creating the dataflow model, the designer must decide which of these attributes to store in the metadirectory.

In this example, one of the data stores is Active Directory, which is being used to send user-related data to the metadirectory — in this case, user data for Mike Danseglio. To expand on this example, there is an additional data store used to maintain an enterprise-wide phone directory. This data store is kept up-to-date through exports from the metadirectory. In this case, the users’ first and last names are exported from the metadirectory into the SQL Server phone directory.

During the dataflow design, the designer needs to determine which attributes of the user object to import into the metadirectory so that the appropriate data can be exported to the phone system. In this example, two choices solve the problem: you can import the displayName attribute, or you can import both the givenName and surname attributes.

If you import the displayName attribute, you get both the first and last name, Mike Danseglio, as a single value. Although this gives you the data in a single attribute, it might require additional processing to be in the right format for the phone system.

Typically, phone directories maintain the first and last name separately, or they maintain them in the form of “Lastname, Firstname.” Both options make it easier to perform sort and search operations. Either way, the design decision is the same: if you use the displayName attribute for import purposes, some type of data manipulation is required to get the first and last name into a format that can be exported to the phone directory. You must either split the data into two parts, first name and last name, or change the data from the displayName format of “Firstname Lastname” to the export format required by the phone directory, “Lastname, Firstname.”
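The splitting and reformatting described above can be sketched in a few lines. This is an illustrative example, not actual MIIS rule-extension code, and it assumes the display name is exactly in the form "Firstname Lastname"; real-world names with middle names or suffixes would need more careful parsing.

```python
def split_display_name(display_name):
    """Split a 'Firstname Lastname' value into its two parts."""
    first, last = display_name.split(" ", 1)
    return first, last

def to_phone_directory_format(display_name):
    """Rewrite 'Firstname Lastname' as 'Lastname, Firstname'."""
    first, last = split_display_name(display_name)
    return f"{last}, {first}"
```

For example, `to_phone_directory_format("Mike Danseglio")` produces "Danseglio, Mike", the sort-friendly form the phone directory expects.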

An alternative solution is to use the givenName and surname attributes for import into the metadirectory, because the first and last names are maintained as separate attributes and each can be managed independently. This way, the first and last names can be exported to the phone directory, and the phone directory can use the last name for sorting and searching without needing to separate the two values, as it would if they were stored in a single attribute.

If the phone directory needs the first and last names for sorting and searching, and also needs the full name for display purposes, you do not need to import all three of the givenName, surname, and displayName attributes. Instead, you can configure the metadirectory to use the givenName and surname attributes to create a display name that can be exported to the phone directory in the required format. In this case, the metadirectory combines the givenName value, Mike, and the surname value, Danseglio, into the value Mike Danseglio, which is exported to the phone directory as a display name.
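The reverse operation, composing a display name from the two separately imported attributes, is even simpler. Again, this is an illustrative sketch rather than actual metadirectory rule code:

```python
def build_display_name(given_name, surname):
    """Combine the two name attributes into 'Firstname Lastname'."""
    return f"{given_name} {surname}"
```

With the example values, `build_display_name("Mike", "Danseglio")` yields "Mike Danseglio", so the displayName attribute itself never needs to be imported.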

This is what designing the dataflow is all about:

  • Determining what data needs to be imported into the metadirectory from a data source so that data can be exported to other data sources.

  • Determining what changes need to be made to the data inside the metadirectory to meet your business goals.

There are two main strategies for the dataflow design process: a minimalist approach, where only the data needed to solve the current business problem is imported and stored in the metaverse, or a more robust approach, where additional data is imported so that the metaverse can accommodate future growth of your identity management environment without extensive changes to your import and export rules. These strategies depend on how you filter the data flowing into and out of the metaverse.

The Minimalist Approach

Using the minimalist approach, all data imported into the metaverse is heavily filtered, and the only data actually stored in the metaverse is the data needed to meet the established business rules. The advantages of this approach are reduced resource requirements and less planning, because the metadirectory stores only the necessary data and considers only the current business rules. The disadvantage is that if you decide to implement more business rules and expand your identity management platform, you will need to alter your filtering policy to get any additional data required by the new rules into the metaverse.
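As an illustration, an import filter under the minimalist approach can be thought of as an attribute whitelist. The function and attribute set below are hypothetical, not an MIIS API; they sketch the idea of admitting into the metaverse only the attributes the current business rules require:

```python
# Only the attributes needed by the current business rules
# (here, the two name attributes for the phone directory).
REQUIRED_ATTRIBUTES = {"givenName", "surname"}

def filter_import(source_object):
    """Drop every attribute the current business rules do not need."""
    return {name: value for name, value in source_object.items()
            if name in REQUIRED_ATTRIBUTES}
```

If a new business rule later needs, say, the telephone number, this whitelist itself must change, which is exactly the kind of post-deployment filter change the text warns about.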

If your metaverse is already established you may have many filtering rules in place and implementing new changes requires careful planning and execution to avoid unexpected results from incorrect changes. The metadirectory represents information gathered from multiple data sources and can contain millions of objects. Attempting to troubleshoot and undo incorrect changes to the metadirectory can be time consuming and expensive.

Planning for Growth

The alternative approach is to import as much data as possible into the metaverse and then filter the data that is exported. This approach uses more storage resources and is not as efficient because you might end up storing information that you do not need. However, this method allows for growth and the addition of new business rules without necessarily requiring any changes to the import filtering policies. The data is already in the metaverse. All that is required are new export filters to meet the needs of the new rules.
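In contrast, the planning-for-growth approach keeps the full object in the metaverse and applies filtering on the export side, with one filter per connected data source. The sketch below is hypothetical; the target names and attribute sets are invented for illustration, but they show how adding a new business rule becomes a matter of adding a new export filter rather than changing the import side:

```python
# One export filter per connected data source. Adding a new target
# means adding an entry here; the import side is left untouched.
EXPORT_FILTERS = {
    "phone_directory": {"givenName", "surname"},
    "badge_system": {"displayName", "employeeID"},  # hypothetical future target
}

def export_view(metaverse_object, target):
    """Project a metaverse object onto the attributes a target needs."""
    wanted = EXPORT_FILTERS[target]
    return {name: value for name, value in metaverse_object.items()
            if name in wanted}
```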


Note: Do not automatically assume that you should include all attributes. Some objects have a large number of attributes, many of which are not needed in your current or future solutions.

In a real-world deployment, changing the import filters and rules can have unforeseen results unless you carefully plan and test the changes. Import rules affect the information stored in the metaverse; as a result, they affect any import or export operations that use the objects those rules touch. Depending on the connected data sources involved, those unforeseen results can be far-reaching within your organization.

You might find it preferable to import more information into your metaverse at the beginning of your implementation and then control the data flow through export filters to the individual data sources. This avoids the need to modify the import filters after the metaverse is populated and reduces the risk of introducing unforeseen changes into the metaverse.

It might be more cost effective for you to invest more time in planning for future growth at the beginning of the project and creating a design that is more robust, rather than focusing on the bare minimum requirements of your environment in the immediate timeframe.


See Also


Overview of Designing a System Dataflow Model
Dataflow Design Concepts
Design Constraints
Authority and Precedence
Process Steps for Designing the System Dataflow