Should SharePoint Replace File Servers?
Many in the SharePoint market claim that SharePoint can and should replace your file servers. Is this a best practice? Should you plan to move all of your files that currently exist into SharePoint? There are pros and cons to this debate, each of which we’ll outline below. However, if you don’t want to read this entire section, then you should know that our simple answer is no.
The reasons offered for replacing your file servers with SharePoint can be summarized in one thought: Collaborative file shares are on their way out because shared drives that are used for document collaboration are being replaced with SharePoint document libraries. SharePoint Server 2007 provides a better collaborative environment because of the built-in collaboration features, such as check out, check in, versioning, publishing, and single instance storage.
While SharePoint Server 2007 does provide many compelling collaboration features, there are also solid reasons to retain your file servers. First, document storage in SharePoint is generally more expensive than an NTFS file system. This cost can be mitigated to a point with the use of expiration information policies, but not completely equalized. Second, file storage is not the same thing as file collaboration. SharePoint is not a good file storage solution for all scenarios. Hence, file servers are a better storage solution in the following scenarios:
- File servers are preferred for large document storage. SharePoint best handles documents in the 50- to 300-MB range and can handle documents up to 2 GB with configuration modifications, but documents over 2 GB must be stored on a file server.
- Systems Management Server (SMS) distribution points for hotfixes, updates, and application distribution are handled much better from a file server.
- File servers are better suited for My Documents redirection and backups. Many companies use group policies to redirect the location of users’ My Documents so that they can back up their content each night. Creating mapped drives to document libraries and then using policies to redirect users’ My Documents to those libraries is an untested and unsupported scenario in SharePoint. File servers should be used for this purpose and are supported.
- Storing databases in a SharePoint list is the same as storing a database within a database and is not recommended. If your data need triggers or stored procedures, you may look at the workflows and events as mechanisms for this process, but creating triggers or stored procedures inside the SharePoint databases is not supported. Database type files such as .mdb, .pst, and .ost are best stored on a file server.
- Developer source control of emerging assemblies and new code files is better managed in Visual Studio Team Services, which requires a file server.
- Archive files that will not change and will not be included in future collaboration are best stored on file servers.
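The rules in the list above can be sketched as a small triage function. This is a minimal sketch, not a migration tool: the function name, the return labels, and the `is_archive` flag are our own assumptions, and only the 2 GB ceiling and the three file extensions come from the text.

```python
from pathlib import PurePath

GB = 1024 ** 3
MAX_SHAREPOINT_BYTES = 2 * GB  # ceiling cited in the text (needs configuration changes)
DATABASE_EXTENSIONS = {".mdb", ".pst", ".ost"}  # database-type files named in the text


def recommend_store(filename: str, size_bytes: int, is_archive: bool = False) -> str:
    """Return 'file server' or 'sharepoint candidate' for a single file.

    Hypothetical helper: the labels and the is_archive flag are illustrative.
    """
    extension = PurePath(filename).suffix.lower()
    if size_bytes > MAX_SHAREPOINT_BYTES:
        return "file server"  # too large for a document library
    if extension in DATABASE_EXTENSIONS:
        return "file server"  # avoid storing a database within a database
    if is_archive:
        return "file server"  # unchanging content with no future collaboration
    return "sharepoint candidate"


if __name__ == "__main__":
    print(recommend_store("mailbox.PST", 500 * 1024 ** 2))
    print(recommend_store("proposal.docx", 2 * 1024 ** 2))
```

A file that comes back as a "sharepoint candidate" still needs a positive business reason to move; the function only screens out files that the list says belong on a file server.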
The final reason for retaining file servers in your server topology for the foreseeable future is that, for those documents that require long-term storage, file servers are usually a cheaper solution than SharePoint.
So when should you place files in SharePoint? In our opinion, these scenarios point to SharePoint as a great file storage solution:
- When the files need to be accessed over HTTP or HTTPS
- When the files need to be managed in a DMS
- When the files need to be engaged in a collaboration process
- When your document life cycle and governance plans are completed and reflected in technical requirements for SharePoint
After reading these lists, we hope you agree that not all file servers should be replaced with SharePoint and that you should plan to retain certain files on file servers for the foreseeable future. So what do you say to the folks who want you to migrate all of your existing content from your file servers into SharePoint?
The short answer is that unless there is a positive reason to migrate those documents into SharePoint, there is no need to spend the money, effort, and time to do so. Moving documents from one storage location to another simply because someone says you should is not a reason to move those documents. You should be able to demonstrate a value-add that moving the documents into SharePoint brings to your organization that will outweigh the costs of such a move. Admittedly, SharePoint is the popular technology, but that is precisely why cost justifications need to accompany the migration recommendations. It’s easy to get on the hot technology bandwagon and find that unforeseen effects burn you down the road. We prefer a more thoughtful, steady pace toward a migration that allows time for hard questions to be answered and cost justifications to be developed.
Notes from the Field: Replacing Your Current DMS with SharePoint Server 2007
Many customers with whom I work ask me if they should replace their current DMS with SharePoint Server 2007. My answer always comes back to a clear description of the business requirements that formed the foundation for the decision to purchase their current document management solution. Many of these DMSs are expensive, and the decision to jettison them in favor of SharePoint Server 2007 should not be a rash decision.
I tend to ask my customers questions along the following lines:
- Does your current DMS meet all of your needs?
- Does your current DMS do some things that SharePoint Server 2007 doesn’t do?
- Does SharePoint Server 2007 do something that your current DMS doesn’t do?
- Is your current DMS paid for?
- Are users familiar with your current DMS’s interface and functionality?
- Is there grassroots demand to move from your current DMS to SharePoint Server 2007?
- Is there any loss of functionality by moving to SharePoint Server 2007?
In my opinion, if the functionality offered by your current DMS is not surpassed by the functionality of SharePoint Server 2007, it is paid for, and it meets all of your business requirements, then I see little reason to change DMSs.
Bill English, Microsoft MVP, Mindsharp
Having said all this, we must point out that working in a SharePoint interface often takes more clicks than working through a mapped drive to a file share on a file server. At the foundation of this discussion is the effect on the daily life of a non-technical knowledge worker who works with documents on an hourly basis. Frankly, in many cases, it is just plain easier to work with files in a file share than it is to work with them in a SharePoint site, and the file share interface is already familiar to the end-user. This is especially true if the individual is working on the document alone and is not collaborating on it with others.
Notes from the Field: Using SharePoint Server 2007 and File Servers Together
Before SharePoint was released, I was one of the system administrators, or a jack-of-all-trades type of worker. Because many of you have a large selection of technologies that you need to administer, some technologies tend to get overlooked. One such technology is file shares, including the directory structure and the setting of permissions on the shares. For most of you who are reading this sidebar, it was and probably is a mess. File shares are notorious for being dumping grounds for everything from Microsoft Office documents to applications, movie files, music files, and code; the list goes on and on. Often, these file shares started out as a good idea and had a governance plan that quickly faded due to lack of enforcement. The end result is an environment with a lot of duplicate files and highly sensitive documents that are not secured properly: generally, what I like to call “space wasters.”
Now that I’m on the other side of the fence specializing in SharePoint, I have found that many companies I work with look at SharePoint Server 2007 and think a migration would be a good alternative to file shares. So they spin up large, expensive projects to accomplish a file migration from file shares to Windows SharePoint Services or SharePoint Server 2007.
When I work with clients who bring up this subject, I try to help them understand that they first need to properly analyze their file share environments and spend time fixing the security problems in these shares. File shares are not a dead concept, but like any technology, they need to be given some tender loving care and regular management. Numerous third-party utilities can be purchased to help maintain these shares. Once the information is in a useful state, then I ask, “What do you see now as fitting into your SharePoint environment?” It’s a tough question, but generally the answer I’m pushing for is one that encompasses a thorough understanding of the points I’ve made above. For example, this is not where you want to hold your music files for sharing or as a code source repository for your developers.
Both technologies (SharePoint Server 2007 and file shares) can easily coexist, in part by adding your file shares to your search scopes. The only effect this will have in your SharePoint environment is in the size of your index. Assuming that documents are properly tagged with metadata, they will be just as findable on a file server as they are in SharePoint.
Bob Fox, Microsoft MVP, B&R Business Solutions
When multiple people are working together to develop a document, it should be developed in a document library within a SharePoint site; SharePoint Server 2007 is an ideal environment for this kind of work. Either a document library or a file share is acceptable if the document is being developed by a single user. However, as soon as people need to collaborate on the document, it should be uploaded into SharePoint. The document library has several features that we believe should be configured as a matter of best practice.
The first feature is the Require Check Out feature in a document library (Figure 8-1). When selected, a document cannot be modified without first being checked out. The second is the document versioning feature (refer to Figure 8-1). When used together, there is a full history of who checked out the document, when the document was checked out, and what modifications were made to the document before it was checked back in. Now, this history can be truncated with the Version Pruning options, but in the absence of these options, you can keep track of the document’s pedigree and pinpoint the user who made each change to the document.
Figure 8-1 Document library settings for requiring content approval and version settings
Note The check out, check in, and versioning features are applied globally to the document. These features do not replace the Track Changes and Comments features that are so useful in tracking the micro changes that are made to a document. We strongly suggest that your users learn to use the Track Changes and Comments features in each document being developed while using the SharePoint check out, check in, and versioning features that will apply to the documents overall. In the absence of the Track Changes and other in-document Reviewer features, you’ll know that the document has changed based on a new version number, but you won’t know the exact change that occurred in the document. That granular level of tracking changes is a Word activity, not a SharePoint activity.
Third, we recommend that content approval be turned on at the document library when necessary to fulfill the purposes of the library (Figure 8-2). When this is enabled, approval workflows can be triggered when a document is configured to be published as the next major version. Because most of these documents will need to be approved, requiring content approval is the most logical way to allow the workflow trigger to work effectively.
Figure 8-2 Require Content Approval feature in the document library settings
Notes from the Field: Naming Document Libraries Can Create Confusion or Findability: Two Best Practices You Should Implement
When a site is first created, if a document library is created with the site template, the document library will be named “Shared Documents.” This default naming of the document library lends itself to ease of use, but it also creates considerable confusion if it is not managed properly.
When a user first uses the document library, a Web folder client will be created automatically in her My Network Places on her desktop. That Web folder client will use the default naming convention of <document library> on <root site>. If there is a Web folder client naming conflict, the code will iterate the names by appending a number at the end of the name, as illustrated in Figure 8-3. If the user isn’t careful (and in the absence of solid governance and best practice recommendations), she will find that her Web folder client names will be useless because a long list of Web folder client names with the same name across different sites will be created.
Figure 8-3 My Network Places with multiple Web folder client connections. Note the numeric suffix iteration when there are naming conflicts.
While it is possible that users could use the URLs listed in the Comments column, it is highly unlikely that the majority will be willing to use this differentiator or that many would understand how to read the URLs well enough to tell the libraries apart. While this is elementary to us technologists, many users will wonder why they have to go to such lengths just to find the right document library when they need to find their documents. And I think they are correct on this point.
This is why I recommend that all of your site definitions and site templates do not include the creation of a Shared Documents document library when the site is created. Instead, instruct your users to create their own document libraries with a naming convention that you propagate. This means that you’ll need to go through and modify all of your site definitions and site templates to reflect this best practice.
Bill English, Microsoft MVP, Mindsharp
Traditional, paper-based documents are filed in filing cabinets. Drawers are labeled, usually with some type of category name, and then documents are placed in the filing cabinets in folders that represent some type of subcategorization of the drawer’s category. The names of the folder and drawer really become metadata that is used to help sort, describe, and retrieve the documents.
In SharePoint, the “filing” of the document is accomplished through the creation of a content type, where the metadata fields are created for that type of content. When the document is created using that content type, the metadata fields are inherited (for lack of a better term) from the content type and applied to the document. The first time the document is checked into a document library, the metadata fields are populated with values that will persist for the life of the document unless they are manually modified or removed.
More Info To gain an in-depth understanding of content types, please see Chapter 10, “Configuring Office SharePoint Server Enterprise Content Management,” in Microsoft SharePoint Products and Technologies Administrator’s Pocket Consultant (Microsoft Press, 2007) and Chapter 15, “Managing Content Types,” in Microsoft Office SharePoint Server 2007 Administrator’s Companion (Microsoft Press, 2007).
In a sense, the filing of a document in SharePoint Products and Technologies occurs both before and after the document is created, but for sure, the filing of a document is mostly completed once the first check-in of the first draft of the document is performed in SharePoint.
A number of best practices present themselves as part of this discussion. First, the content types need to be created with forethought and planning. The metadata field names should be unique not only within each content type but also across all other content types. The more robust your content type deployment, the more necessary it will be to have a naming convention for metadata fields as well as (perhaps) a full-time person dedicated to managing and creating content types.
Notes from the Field: Understanding Data Elements in Your Organization
I recently worked with a customer who was probably the best prepared customer I’ve ever worked with in terms of understanding and developing data elements for the organization. This customer had 1,300 users, most of whom worked at the company’s main office in Southern California. Because this customer’s business is highly process driven, the CEO had mandated an information review that identified 56 primary data elements with 258 supporting data elements. Each element had its own set of metadata, and each metadata field was unique across all 314 data elements. The customer had already done this work when I arrived to do a two-day design and architecture session.
During the session, we discussed how content types would be used to help the company formulate its document management implementation. Their response was one that I’ve never heard before or since: “We’re ready for that.” All the customer had to do was turn the 314 data elements into content types.
Not to scare you, but it took the customer two years to develop these 314 data elements. The company had 1.5 FTE (full-time equivalent) positions dedicated to this effort, plus the support of management, starting with the CEO. The company was doing this to support an industry-specific application that tracked its raw materials and manufacturing processes. The fact that the company could easily use this information in SharePoint was a major, unexpected benefit and increased the return on investment on the dollars that had been invested in that project.
You should not believe that developing content types is just for SharePoint. The identification of the different data types and their metadata elements should be generic, because the information that is developed will likely translate into content types without too much trouble.
One unexpected decision the company made was to not use SharePoint for the conceptual development of its content types. The company found SharePoint itself to be an inadequate system for hosting the ongoing development of its data elements and the mapping of these elements to content types. That work was performed outside of SharePoint using a different management information system with relational databases and relational lists. Essentially, the company developed a metadata warehouse with a data element and content type mapping scheme such that it was able to quickly pivot on metadata elements, major/minor data elements, or content types to look at its data elements from any of the three angles and know that it was viewing accurate information. The ability to move metadata between elements in a relational way as the content types were developed was essential to the company’s success.
Bill English, Microsoft MVP, Mindsharp
Second, you need to know that every time a new content type is created, a new GUID is attached to that content type. Because content types are managed at the site collection level, it is possible to create exactly the same content type in two different site collections with the same name and metadata fields, yet each one will have a different GUID. If you need to programmatically work with the same content types across site collections via their GUIDs, then you’ll need to deploy the content types as features wrapped up in one or more solution deployments. Deploying content types in this manner will allow your developers to write code against the content types’ GUIDs across the enterprise. Third, if you create a DDoc for a new document, be sure that the metadata in the DDoc matches the metadata that you’ll create for the new content type.
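The GUID point can be illustrated with a toy model. This is only a sketch of the idea, not the SharePoint object model: the `SiteCollection` class, its method, and the packaged identifier below are all hypothetical, and Python's `uuid` module stands in for whatever identifier SharePoint assigns.

```python
import uuid
from typing import Dict, Optional


class SiteCollection:
    """Toy stand-in for a site collection that owns its own content types."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.content_types: Dict[str, uuid.UUID] = {}

    def create_content_type(self, type_name: str,
                            fixed_id: Optional[uuid.UUID] = None) -> uuid.UUID:
        # Created by hand: a fresh identifier is generated every time.
        # Deployed from a shared package: the packaged identifier is reused.
        type_id = fixed_id if fixed_id is not None else uuid.uuid4()
        self.content_types[type_name] = type_id
        return type_id


hr = SiteCollection("hr")
legal = SiteCollection("legal")

# Same name, created separately in each site collection: identifiers differ.
manual_a = hr.create_content_type("Contract")
manual_b = legal.create_content_type("Contract")

# Same definition deployed to both from one package: identifiers match.
packaged_id = uuid.UUID("11111111-2222-3333-4444-555555555555")  # hypothetical ID
deployed_a = hr.create_content_type("Invoice", fixed_id=packaged_id)
deployed_b = legal.create_content_type("Invoice", fixed_id=packaged_id)

if __name__ == "__main__":
    print(manual_a == manual_b)      # identifiers generated independently
    print(deployed_a == deployed_b)  # identifier carried by the package
```

Code that looks up a content type by identifier only works across site collections in the second case, which is the reason for packaging content types as features.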
Fourth, the values assigned to the metadata fields need to be communicated to your users so that they will know which keywords and other values to use to help them quickly and easily find their documents using the search and indexing feature in SharePoint Server 2007. Connecting the metadata from content type through value assignment to end-user education is one of the key workflows that deserves attention. It is a best practice to ensure there is consistency in the use of content types, metadata assignments, and end-user education about the metadata and the values assigned to the documents.
The last best practice for this topic is that the development of the metadata fields and their value assignments should be carried out with the aid and input of those who will use those fields and values to retrieve the documents. IT people are the last people on earth who should be deciding the names of metadata fields and the values that will be used to populate them. A robust content type deployment will involve constant chatter with those who will develop, use, and retrieve the documents.
The preferred methods of document retrieval will vary from user to user. If users know exactly where a document is, they will probably use one or more of the following methods to retrieve the document (this is not an exhaustive list):
- A favorites shortcut to the document (or the document library) in Internet Explorer
- A desktop shortcut to the document (or the document library)
- Shared links in My Sites
- Links that are created in the Summary Links Web part
- If accessed recently, their list of recently accessed documents in Windows XP or Vista
- Physically typing in the URL to the site or document library
- Customized navigation in a SharePoint Server 2007 portal or site
- RSS feeds
As you can see, the methods of retrieval for a document when users know its location are varied and numerous. Because SharePoint Server 2007 provides so many findability tools for this scenario, it is a best practice to cover the various findability methods in SharePoint Server 2007 training.
But what if users don’t know where a document is and they need to find it quickly and easily? Well, the most common answer to that question is to use search and indexing—to execute a query against the index that will allow users to find the document. Normally, an advanced search is required if one is to combine multiple metadata queries into a single query. In other words, users will need to be trained on how to enter the metadata values they are looking for in the Advanced Search Web part (Figure 8-4) in order to pinpoint the document they need to retrieve from their query of the index.
Most often, users will need to know how to use the following features in concert with one another in order to execute a highly discriminatory query:
- Keywords and Boolean operators
- Metadata queries
- Search scopes
The more skilled users are at using these three features together, plus the more robust the metadata assignments on documents and the scopes topology, the more likely it is that users will be able to pinpoint a document from your index and find the information they are seeking.
Figure 8-4 Default Advanced Search Web part without the language selection boxes
Moreover, you need to work with your taxonomist or librarian (whoever is responsible for implementing information architecture in your organization) to ensure that certain metadata reserved for enterprise use is exposed properly in the advanced search Web part.
Mapping the Features of SharePoint Server 2007 to Your Information Architecture
Most companies that we work with do not have an enterprise-wide taxonomy in which the various types of content are described and codified. While SharePoint Server 2007 gives you the ability to meticulously create metadata for each type of content, it is a time-consuming task, to say the least, and most companies simply don’t allocate the budget to get this job done.
The findability of individual content items depends directly on the level of discrimination that the metadata assignments provide your end users. This is why it is so important to set forth a set of primary data elements that represent the core of your business documents and to reserve those documents’ metadata fields for enterprise use. Let’s consider a somewhat absurd illustration to make this point.
Let’s assume that we’re in the business of developing tracts of land for new housing projects. As part of our business processes, we go out and find new land to develop. As we consider larger tracts of land, we decide to call those tracts “fields.” Inside the fields, we’ll have “neighborhoods.” Inside neighborhoods, we’ll have “lots” upon which individual homes are built. So we develop a land document that describes potential “fields” to buy, and it has a metadata assignment called “field name.”
Let’s further assume that our company is growing and is in need of some fun company-sponsored events that help our employees get to know each other better. So we assign one of our staff to develop some intramural sports programs. That staff member develops a soccer league for our growing company, and, on the game assignment document, he creates a metadata assignment called “field name” to differentiate between the different soccer fields that our company will use for our games.
Do you see the problem? The metadata assignment “field name” has two entirely different meanings because there was no metadata control at the enterprise level. A “field name” for one department means something entirely different than its meaning for the other department.
This is why it is a best practice to ensure that certain metadata assignments are not available for general use within your company. These assignments need to be reserved company-wide and can’t be used to create new metadata assignments in content types or be the names of new site columns in a site collection. To the extent that you allow users to create content types and site columns that allow for duplicate metadata names, you injure the findability of those content items in the enterprise and you hurt the discriminatory power of those metadata assignments.
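One way to picture the reservation of metadata names is a simple registry that refuses a second owner for the same name. This is a minimal sketch under our own assumptions: the `MetadataRegistry` class and the department names are hypothetical, and SharePoint Server 2007 does not enforce this rule for you.

```python
class MetadataRegistry:
    """Hypothetical enterprise list of reserved metadata field names."""

    def __init__(self) -> None:
        self._owners = {}  # normalized field name -> owning department

    @staticmethod
    def _normalize(name: str) -> str:
        # Treat "Field Name", "field name", and "field  name" as the same name.
        return " ".join(name.lower().split())

    def register(self, name: str, owner: str) -> None:
        key = self._normalize(name)
        existing = self._owners.get(key)
        if existing is not None and existing != owner:
            raise ValueError(f"'{name}' is already reserved by {existing}")
        self._owners[key] = owner


registry = MetadataRegistry()
registry.register("Field Name", "Land Development")

try:
    # The soccer league tries to reuse the same name for a different meaning.
    registry.register("field name", "Intramural Sports")
    conflict = None
except ValueError as err:
    conflict = str(err)

if __name__ == "__main__":
    print(conflict)
```

In practice the check would happen during content type and site column review; the point of the sketch is only that name comparison should ignore case and spacing.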
This is not inconsequential. Retrieval of information is one of the key reasons that customers implement SharePoint Server 2007 in the first place. Contrary to popular opinion, placing content into the SharePoint Server 2007 environment does not automatically make that content more findable. You must also be ready to assign metadata to those content items and to ensure that the metadata assignments are unique across the enterprise and that the values they hold are highly discriminatory.
However, when it comes to document retrieval, we humans are as likely to ask other humans where a document is as we are to query the index. So being able to find the right people is as important as finding the right document. Sometimes, it’s just plain faster (and perhaps more fun) to ask a person where a document is than to query for it.
In terms of retrieval best practices, we recommend taking the more important and popular metadata keyword terms and making the metadata fields that host those terms available for querying in the Advanced Search Web part. Moreover, be sure to teach your end-users that they can query metadata directly in standard search boxes without having to find those fields exposed in the Advanced Search Web part by simply entering a customized query using the following syntax:
<managed property>: <keyword>
Hence, if we wanted to find all of the documents written by Bill English and the Author field wasn’t exposed in the Advanced Search Web part, the user could use any standard search box and enter the query string
author: “Bill English”
Any managed property can be used in this fashion, so another best practice would be to list your managed properties and the included crawled properties if you’re going to train your end-users on how to enter metadata queries. This list would need to exist in the Search Center and either be listed below the search box or contain a link to a Web page that would include this information. Using a managed property to filter search results only works in Web parts that use SQL syntax, and the managed property must be exposed in the Web part rather than entered manually.
Metadata Results Depend on How You Create Them
What we have learned about metadata creation after working in the field for many years can be encapsulated in the following bullet points. Be sure to understand that how you create metadata will impact how the result set displays it. Obviously, a more concise result set results in a happier end-user. And we administrators live for their happiness, right?
- The ability to execute a query on a managed property really does work, but you need a space between the colon “:” and the start of the query keyword. “DocNum:12345” is interpreted as a single keyword. “DocNum: 12345” is interpreted as a macro+keyword combination by the search Web part. You cannot directly query crawled properties in the search Web part. They must be made part of a managed property first.
- If the metadata is held in a site column that is created from within the document library interface, then the library name will also appear in the result set along with the document when the property query is executed.
- If the metadata is held in a site column that is created from within the content type, then only the document will appear in the result set.
- If the metadata is held in a custom property in the document itself, then only the document will appear in the result set.
- The creation of new managed and crawled properties requires only an incremental crawl to expose them.
- Site columns are added to the SharePoint category of crawled properties whether they are created within the document library or within the content type.
- Document custom properties are added to the Office category of crawled properties.
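The property query form described in this section (a managed property name, a colon, a space, and then the keyword) can be sketched as a small string builder for training material or a custom search page. This is a hedged sketch: `build_property_query` is a hypothetical helper, it uses straight double quotes for phrases, and it follows the spacing described above, so verify the result against your own search Web part.

```python
def build_property_query(managed_property: str, value: str) -> str:
    """Build a '<managed property>: <keyword>' query string.

    Hypothetical helper; multi-word values are wrapped in double quotes.
    """
    prop = managed_property.strip()
    if not prop or any(ch.isspace() for ch in prop) or ":" in prop:
        raise ValueError("managed property must be a single word without ':'")
    text = value.strip()
    if not text:
        raise ValueError("value must not be empty")
    if any(ch.isspace() for ch in text):
        text = f'"{text}"'  # keep a multi-word phrase together
    # Space after the colon, per the field note in this section.
    return f"{prop}: {text}"


if __name__ == "__main__":
    print(build_property_query("author", "Bill English"))
    print(build_property_query("DocNum", "12345"))
```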
When it comes to securing the document, your two main choices are to explicitly assign permissions to the document or inherit permissions from the document library in which the document resides. Item-level permissions can be set on each document if you need to explicitly assign permissions, but in most cases it will be sufficient for documents to inherit permissions from the document library if not the site itself. Setting item-level permissions greatly increases the administrative effort on the part of your end-users, and it increases the chances that some items will be incorrectly secured.
When it comes to securing information in SharePoint Server 2007, most of the security assignments will be executed by your site administrators who are (presumably) non-technical, non-IT end-users. To be sure, some site collections will be wholly managed by IT, but to have IT manage all site collections and sites in the entire farm will require a significant personnel investment to which most organizations are unwilling to commit. So, while the-end-user-is-now-the-security-administrator reality can scare most IT administrators into psychotherapy, it must be managed and addressed. The following best practices will help you in this regard.
First, the content owners of the document should be given site-level management of the site in which the document will reside. To be more precise, the content owners should be site administrators of the site in which the document will reside. This will ensure that the document’s security is managed by the content owner or the owner’s designee. Use the DDoc’s content owner specification to ensure that the content owner is the site administrator. Now, this brings up two related points:
- Because content owners will be site administrators, they will need training on how to manage and secure a site.
- The content owners, in some instances, will need direction on who can access the site and document and who cannot. The DDoc should specify any unique security needs.
A second best practice is to ensure that the site owners are not managing documents to which they should not have permissions. Remember that site administrators can grant themselves permissions to any content within their site, so if a document to which administrators should not have access is placed in that site, then security has been compromised. So for each unique set of exclusionary permissions to documents that exist in your deployment, you’ll need, at a minimum, different sites in which to develop those documents.
Third, we can’t forget about site collection administrators in this discussion. Recall that site collection administrators can grant themselves permissions to any content item hosted within the site collection. So part of your DDoc should specify the potential user accounts that can be listed as site collection administrators and should also specify any user accounts that should not be site collection administrators for sites in which the document will be hosted. For each unique set of exclusive permissions that are developed, you’ll need another site collection.
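The rule that each unique set of exclusions needs its own container can be sketched as a simple grouping exercise. The Python below is illustrative only: the document names, account names, and the `plan_site_collections` and `invalid_admins` helpers are hypothetical, and nothing here calls a SharePoint API. It groups documents by the set of accounts that must not gain access, then checks a proposed administrator list against one of those sets.

```python
from __future__ import annotations

from collections import defaultdict


def plan_site_collections(
    excluded_by_doc: dict[str, set[str]],
) -> dict[frozenset[str], list[str]]:
    """Group documents by exclusion set; each group needs its own site collection."""
    plan: dict[frozenset[str], list[str]] = defaultdict(list)
    for doc, excluded in excluded_by_doc.items():
        plan[frozenset(excluded)].append(doc)
    return dict(plan)


def invalid_admins(proposed_admins: set[str], excluded: frozenset[str]) -> set[str]:
    """Return proposed site collection administrators that the DDoc excludes."""
    return proposed_admins & excluded


# Hypothetical DDoc entries: document -> accounts that must never have access.
ddoc_exclusions = {
    "Merger Plan.docx": {"contoso\\jsmith"},
    "Salary Review.xlsx": {"contoso\\jsmith"},
    "Audit Findings.docx": {"contoso\\mlee", "contoso\\jsmith"},
}

plan = plan_site_collections(ddoc_exclusions)
for excluded, docs in plan.items():
    print(sorted(excluded), "->", docs)
```

In this sketch, the three documents produce two distinct exclusion sets, so the plan calls for two site collections rather than three.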
Note Active Directory directory services groups cannot be site collection administrators. Only Active Directory user accounts can be site collection owners.
Remember that site collection administrators do not appear in the groups interface because they are not a group, but an assignment to individual accounts. Figure 8-5 illustrates how the site collection administrators do not appear as a group in the list of SharePoint groups. Figure 8-6 illustrates the people interface with the Is Site Admin check box selected in the default view. Selecting this check box will inform you who is a site collection administrator. Note that while the account bcurry is an owner in the portal, his account is marked as “no,” indicating that he is not a site collection administrator.
Fourth, it is a reality that people change roles within organizations and sometimes change jobs between organizations. You will need third-party software to help you efficiently track content ownership changes, site owner changes, and site collection administrator changes in SharePoint Server 2007. If you don’t track these human resource changes efficiently in SharePoint, user accounts will be left with access to content that perhaps violates security rules and policies.
Note There is a third-party software package, DeliverPoint 2007, published by Barracuda, available for download from http://www.deliverpoint.com. DeliverPoint 2007 will help your organization work with user accounts in a number of ways, including permissions discovery as well as cloning, deleting, and transferring permissions and alerts between accounts at the farm, Web application, managed path, site collection, and site levels.
In addition, you’ll need to perform Permissions Discovery reporting for certain documents that you host in SharePoint to ensure that your implementation can meet compliance requirements. Because SharePoint does not offer this functionality, consider using one of the third-party tools on the CD to help you ensure that your security assignments are in compliance with industry standards and regulatory requirements.
Our last security best practice is to ensure that, if major/minor versioning is implemented, only those who can edit the document will be able to see the minor versions. This is a document library setting and is not turned on by default.
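The effect of this library setting can be expressed as a small decision rule. The following Python is a simplified model, not SharePoint code: the `DraftItemSecurity` enum, its option labels, and the `can_see_minor_version` function are assumptions made for illustration, and real permission checks involve more than four flags.

```python
from enum import Enum


class DraftItemSecurity(Enum):
    """Assumed labels for the three draft visibility choices in a library."""
    ANY_READER = "Any user who can read items"
    EDITORS_ONLY = "Only users who can edit items"
    APPROVERS_AND_AUTHOR = "Only users who can approve items (and the author of the item)"


def can_see_minor_version(
    setting: DraftItemSecurity,
    *,
    can_read: bool = False,
    can_edit: bool = False,
    can_approve: bool = False,
    is_author: bool = False,
) -> bool:
    """Decide whether a user sees minor (draft) versions under a given setting."""
    if setting is DraftItemSecurity.ANY_READER:
        return can_read
    if setting is DraftItemSecurity.EDITORS_ONLY:
        return can_edit
    # Remaining case: approvers plus the item's author.
    return can_approve or is_author


# A read-only consumer should not see drafts once the library is restricted to editors.
reader_sees_draft = can_see_minor_version(DraftItemSecurity.EDITORS_ONLY, can_read=True)
print(reader_sees_draft)
```

The point of the sketch is that the most permissive choice exposes drafts to every reader, which is why the setting deserves an explicit decision for each library that uses major/minor versioning.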
Figure 8-5 A listing of default groups in the collaboration portal in SharePoint Server 2007
Figure 8-6 Is Site Admin column showing that the Administrator is a site collection administrator, but that the bcurry account, which is in the portal owners group, is not a site collection administrator
Lessons Learned: Be Sure You Know Who Has Been Assigned Permissions Throughout Your SharePoint Deployment
One customer learned the hard way about not understanding the effects of the various permission assignments throughout a farm. We’ll explain in just a moment. To understand the background, you need to understand that the following groups and/or accounts have pervasive permissions in a SharePoint Server 2007 deployment that are not revealed or exposed in the user and group interfaces for sites and site collections.
First, the Application Pool account has uber-read permissions to every object within the Web applications that are using that application pool. Be sure to understand that if anyone logs on as the Application Pool account, he can read all of the SharePoint content in all of the sites and site collections in all of the Web applications that are associated with that application pool. Second, the farm administrators have the ability to grant to themselves access to any information in your SharePoint Server 2007 deployment. Be sure that those who are added to the farm administrators group can be trusted not to use their access improperly. If they do add themselves to any part of the farm, that is an audited event. Third, the Default Content Access account has default read permissions throughout the farm. Fourth, any accounts given permissions via a policy will have access to those areas granted, but will not appear in the users and groups interfaces at either the site or site collection levels. Last, site collection administrators can grant themselves access to any information within the site collection, but the granting action is audited by SharePoint.
One client learned that much of his information had been exposed to the wrong group of people because a consultant who didn’t understand how to implement SharePoint added the Domain Administrators and Enterprise Administrators accounts to the Farm Administrators group. This individual was under the mistaken assumption that by adding these groups, certain permission problems he was experiencing would go away. When he learned this wasn’t the case, the groups were not removed and, for an extended period of time, members of these two groups could have given themselves access to any sensitive information in the farm they desired. Because they also managed Exchange and other platforms, it turned out that they didn’t try to take advantage of their position because they were already trustworthy, but the end result could have been very different.
Bear in mind that there are two things that no developer can write code to guard against: an unwise administrator and an untrustworthy administrator. When hiring network administrators, understand that they will have pervasive access to nearly all, if not all, of your most sensitive information. Best practice is to perform thorough background security and personnel checks to be sure the person(s) being hired have the highest ethical integrity possible.
Workflow and Approval
Some documents go through an iterative creation-approval-improvement-approval cycle several times before they are ready to be consumed by the larger intended audience. Documents are not written in a vacuum. Every document written, including this chapter, has an intended audience. The main reason that documents are sent through an approval process before being distributed to the intended audience is that the creators need to ensure that the document has the right messaging and focus. It is important to note that workflows are simply electronic restatements of processes that are already occurring in the organization without the use of workflow technology.
The first best practice to creating workflows is to have a clear understanding of the current processes that are being used to publish a document. Following closely behind is a clear understanding of how to improve that process so that the workflow can be written to follow the most efficient path available. If there is not a clear set of policies and procedures in place that users can reference when they create their workflows, the chances are high that the workflows they do create will end up routing documents through the wrong people or leaving out individuals who should have been included. The order might be wrong, too. For workflows to be effective, they must match the written policies and procedures.
A second best practice concerns the way that SharePoint implements workflows. SharePoint Server 2007 causes some consternation among IT administrators because the approvers in the workflow are account based, not position based. To be more precise, the approvers are individual users identified by their Active Directory account instead of users who occupy a particular position in the organization. Usually, a document’s approval needs to be performed by a position. The inability to specify a position in a workflow means that every time a user changes positions, the workflows that are related both to the outgoing and incoming user need to be modified.
Hence, a best practice for working around this is to place those users who need to be in a particular workflow into global security groups in Active Directory. The groups should correspond to an organizational position and then the group can be used as an approver in the workflow. Implemented this way, a user can be changed in the group to represent user changes in the organizational positions and the workflow will continue to operate without any changes.
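The group-based workaround can be pictured with a small model. This Python sketch is only an illustration under stated assumptions: `Directory` is a hypothetical stand-in for Active Directory group membership, `Workflow` is a hypothetical approval definition, and the group and account names are invented. It shows that when approvers are resolved through a group at run time, a change in who holds the position requires no change to the workflow itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical stand-in for Active Directory global security groups."""
    groups: dict[str, set[str]] = field(default_factory=dict)

    def members(self, group: str) -> set[str]:
        # Return a copy so callers cannot alter the directory by accident.
        return set(self.groups.get(group, set()))


@dataclass
class Workflow:
    """Hypothetical approval workflow whose steps name groups, not people."""
    name: str
    approver_groups: list[str]  # one position-based group per step, in order

    def resolve_approvers(self, directory: Directory) -> list[set[str]]:
        # Membership is looked up when the workflow runs, not when it is defined.
        return [directory.members(group) for group in self.approver_groups]


directory = Directory({"HR-Director": {"contoso\\alice"}, "Legal-Counsel": {"contoso\\raj"}})
workflow = Workflow("Policy Approval", ["HR-Director", "Legal-Counsel"])
print(workflow.resolve_approvers(directory))

# The HR Director position changes hands; only the group membership is edited.
directory.groups["HR-Director"] = {"contoso\\bob"}
print(workflow.resolve_approvers(directory))
```

Had the workflow listed `contoso\alice` directly, the same personnel change would have required someone to find and edit every workflow that named her.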
If you choose to not use Active Directory groups to create position-based workflows, then you’ll need to notify site owners whenever a user leaves or is added to their site because of positional changes. Site owners may need to modify existing workflows to ensure the correct people are included in the workflow.
Another best practice is to make sure that the workflow names meet a particular naming convention. The reason is that the workflow names appear in the e-mail as pronouns for the workflow itself. For example, if the workflow is named “Final Approval,” then the e-mail verbiage will say “Final Approval Tasks have started <document>.” In Figure 8-7, the workflow name is “Final Approval” and the name of the document is “This is the apple document.” You can see how this text, put together, is rather confusing. Figure 8-8 shows that when the e-mail is opened, the arrangement of the verbiage in the e-mail makes more sense, but it is still not clear.
Figure 8-7 Appearance of the workflow e-mail that indicates there is a workflow task waiting for the Administrator account to perform
Figure 8-8 Workflow task e-mail opened in Microsoft Office Outlook 2007
This can be confusing to those at the receiving end of the workflow e-mail. So be sure to set up a workflow naming convention that your users are expected to follow, such as <workflow_name for document_name>. Create a naming convention that will make sense in the e-mail interface so that the users don’t confuse workflow task e-mails with alerts and other system-generated e-mails.
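A naming convention is easier to enforce when it can be checked mechanically. The sketch below assumes the "workflow_name for document_name" pattern suggested above; the `workflow_name` and `follows_convention` helpers are hypothetical and are not part of SharePoint.

```python
def workflow_name(workflow_purpose: str, document_name: str) -> str:
    """Build a name in the assumed '<purpose> for <document>' convention."""
    purpose = " ".join(workflow_purpose.split())   # collapse stray whitespace
    document = " ".join(document_name.split())
    if not purpose or not document:
        raise ValueError("both the workflow purpose and the document name are required")
    return f"{purpose} for {document}"


def follows_convention(name: str) -> bool:
    """Check that a name has non-empty text on both sides of ' for '."""
    purpose, separator, document = name.partition(" for ")
    return bool(separator and purpose.strip() and document.strip())


print(workflow_name("Final Approval", "HR Policy Manual"))
print(follows_convention("Final Approval"))
```

A check like this could be run against a list of existing workflow names during a site review to flag the ones that will read poorly in task e-mails.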
The distribution of documents in SharePoint is very different from distributing hard copy documents or routing the documents through a workflow. Distributing a document in SharePoint really means two essential things:
- Ensuring that the finished document is placed in the correct location
- Ensuring that those who need to consume the document have permissions to the document in its location
Location of the Finished Document
Wherever you want the finished document to reside, if that location is within SharePoint, it will be in a document library. Because team sites, Document Centers, and record repositories can host document libraries, the question becomes this: Where should the finished document reside for its active life while it is being consumed by the appropriate audience? There are an infinite number of specific answers to this question, but we can safely group most of those answers into the SharePoint objects illustrated in Figure 8-9.
Figure 8-9 Possible document hosting locations and breadth of consumption illustration
As a general rule of thumb, the wider the audience (those who will consume the document) for the document, the wider or broader the scope of the site should be in which the document is hosted. Stated another way, the breadth of consumption of the document should match the breadth of consumption of the site in which it is hosted. For example, the human resource policy manual is a document that is consumed by nearly everyone in an organization at one time or another. So it would only make sense that the finished (or currently published) version would be hosted in a site that has broad access by everyone in the company, such as a company portal or the HR portal.
From this perspective, SharePoint can be conceptualized as having a series of tools that have intended scopes of consumption. For example, referring back to Figure 8-9, when documents are hosted in a Records Center, they are intended to be official, compliant, (potentially) public, truthful records of communication. Interested parties from inside and outside the company will (potentially) consume this information. By the same token, when documents are hosted in a corporate portal (most often in the Document Center), then it is assumed that those who have access to the corporate portal should also be able to consume (read) the documents in the Document Center. While the breadth of consumption can vary from organization to organization, usually the most widely consumed documents will be placed in the Document Center of a portal. Official records will land in the Records Center. These documents may be consumed by a wide audience, but the more common scenario is that these documents are consumed primarily by the record librarian and selected, interested parties. As you go down the scale, the breadth of consumption narrows to the point where only a few members will consume data in lists.
Note Figure 8-9 is somewhat arbitrary because each of the SharePoint objects can be configured with very wide or very narrow permission sets. This discussion is meant to give you a roadmap on the tools SharePoint offers to host information that is intended for a given breadth of consumption.
We believe that it is a best practice to host data once and link to it from multiple locations if those in locations other than the hosting location need quick access to the information. When multiple copies of a finished document are hosted in multiple locations, there is a very real opportunity to have different “versions of the truth,” which is a deterrent to clear communication. Most individuals, teams, and organizations constantly struggle with message discipline, message clarity, and message consistency. We believe there is no sense in adding to that struggle by suggesting it is a best practice to host multiple copies of a finished document in multiple locations. Single instance hosting (SIH) should be the goal for your documents.
Permissions to the Document
Now, there is a line of thinking that says, “We can host a document in a team document library, give everyone read permissions, and place a link to the document in the portal so that we don’t move the document from its creation location to another location.” This is really a carry-over from the design of SharePoint Portal Server 2003. And with proper planning, this is a viable design. But if you intend to do this, then you need to understand the security implications and risks.
First, those who secure the document library will need to understand how to configure the library’s permissions so that only those who edit documents in the library can see the minor versions of the document. This setting is configured in the document library settings and is referred to as Draft Item Security (see Figure 8-10).
Second, your overall document plans will need to specify a plethora of team site locations for hosting finished documents. As the number and type of documents are added to your overall DMS planning matrix, you’ll find the number of finished hosting locations will grow and will likely become more difficult to manage.
Third, as you randomize a team site’s breadth of consumption, you introduce a randomized consumption pattern that will likely irritate most users. Most users would rather find sets of finished documents in one place as opposed to clicking on multiple links that tunnel through to multiple document libraries in which smaller sets of finished documents are exposed. This scenario detracts from an optimal findability solution and should be used only in exceptional circumstances. Best practice is to minimize the number of locations that host finished documents.
Figure 8-10 Draft Item Security configuration options in the document library settings
Note For a full discussion of findability, please see Chapter 15, “Implementing an Optimal Search and Findability Topology.”
When finished documents are moved to the locations specified in the DDoc, those locations need to be correctly secured so that the intended audience is the only audience that can consume that document. You should use a third-party tool that can inform you about the permissions that are assigned to an individual document or library when the security of that document or library is a high priority to the mission or existence of the organization.
Using the Send To Feature in SharePoint
For each document in a document library, you can have three levels of Send To functions:
- Other Location Send To field that is filled in by the user for an individual document
- Document Library custom Send To field that is configured by the document library administrator and appears in all document drop-down lists in the document library
- Official Records Repository that is configured by the farm administrator in Central Administration and appears in all document drop-down lists farm-wide
The Other Location method is illustrated in Figure 8-11. This method allows a user to set up a connection between the document in the local library and a copy of the document in the remote location. Then, when there is an update in the local copy, a prompt can be configured to remind the document’s author to send an updated copy to the remote location. This remote location can be any location within the SharePoint farm. The Other Location method is most often used to send a document from one team site to another or from a child site to a parent site within a site collection. It can be used to keep spreadsheets, reports, and other often-updated information current in two different locations. While this method can be used to publish an individual document from a team site to a portal, it cannot be used to publish a document from a team site to a Records Repository. This feature is implemented on a per-document basis and is best used when there is a one-off need to publish a document from its creation site to its consumption site.
Figure 8-11 Send To Location feature in a document’s shortcut menu
Note The connection in the Other Location method can be set up only in a 1:1 relationship. You can’t set up a connection in a Many:1 relationship, which would be helpful if you’re editing individual documents in the source location that are then aggregated into a single document at the destination, or what is commonly called a compound document. Some will know this as a thicket.
The Document Library custom Send To field is illustrated in Figure 8-12. This method allows a single remote location to be made available for copying all the documents hosted in the document library to the remote location. This method is best utilized when a document library becomes the source location for an entire set of documents that are related to one another in some manner and the entire set needs to be copied to a remote location for wider consumption. Because most documents in a set will not be created and finished at the same time, this method allows each individual document to be sent to the same remote location without having to send the entire set at one time or to repeatedly send the entire set to the remote location when there are changes in only one or two of the documents. This method also allows the same Send To field to appear for different authors in the same library, so each author is not forced to set up the same connection individually for each document.
Figure 8-12 Custom Send To method in the properties of a document library
Note Because each document is sent to the remote location individually, a connection will be created between the source and destination document as if the Send To location has been entered manually for each document.
The Official Records Repository method is illustrated in Figure 8-13. This method allows any finished document that is intended to be a matter of record to be sent to the Official Records Repository hosted at the farm level. Presumably, the repository will have a matching content type and document library into which the incoming document will be placed as an official record. In the absence of this matching content type, the document will be placed in the Unclassified records library, as illustrated in Figure 8-14. Note that, unlike the Send To method, in the Official Records Repository method, no link is maintained between the source document and the target document in the Records Center. Note also that the name of the document in the Records Center is appended with a unique string of characters and this cannot be modified.
You can create more than one Official Records Repository in your SharePoint Server 2007 implementation. (But only one—and the same one—can be configured at the farm level and appear in the document’s shortcut list. Refer to Figure 8-13.) If you do this, use the Document Library custom Send To method to create a Send To connection between documents in a designated source library and the Records Repository so that there is a clear understanding that the source document library is used to create and develop what will be official, compliant, truthful communication and that the same content type is used in both the source document library and the destination document library in the repository site.
When developing an Official Records Repository, it is a best practice to clearly define the source document libraries and locations from which the official records will arrive. The content types and their names must match in the Records Repository’s routing table, and documents must be sent to the repository after all copy edits, technical edits, and approval workflows have been completed. If there is an update to the source document after it is sent to the repository, another record is created when it is resent to the repository. The first record is neither updated nor overwritten. This is because each record is a permanent record with a unique six-character alphanumeric string attached to it. Any updates to the record constitute a new record, not an update to the current record.
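The routing and append-only behavior described above can be modeled in a few lines. This Python sketch is a conceptual model, not the Records Center implementation: the `RecordsRepository` class, the library names, the underscore separator, and the way the six-character string is generated are all assumptions made for illustration.

```python
from __future__ import annotations

import secrets
import string
from pathlib import PurePath

_ALPHABET = string.ascii_uppercase + string.digits
UNCLASSIFIED = "Unclassified Records"  # assumed name of the fallback library


class RecordsRepository:
    """Hypothetical model of content type routing into record libraries."""

    def __init__(self, routing_table: dict[str, str]) -> None:
        self.routing_table = routing_table  # content type name -> library name
        self.libraries: dict[str, dict[str, bytes]] = {UNCLASSIFIED: {}}

    def submit(self, filename: str, content_type: str, content: bytes) -> tuple[str, str]:
        """Store a new record and return (library, record name). Nothing is overwritten."""
        library = self.routing_table.get(content_type, UNCLASSIFIED)
        records = self.libraries.setdefault(library, {})
        path = PurePath(filename)
        while True:
            suffix = "".join(secrets.choice(_ALPHABET) for _ in range(6))
            record_name = f"{path.stem}_{suffix}{path.suffix}"
            if record_name not in records:  # regenerate on the rare collision
                break
        records[record_name] = content
        return library, record_name


repo = RecordsRepository({"Contract": "Contracts"})
first = repo.submit("lease.docx", "Contract", b"original text")
second = repo.submit("lease.docx", "Contract", b"revised text")
stray = repo.submit("memo.docx", "Memo", b"no matching content type")
print(first, second, stray)
```

Resending the revised lease yields a second record alongside the first, and the memo lands in the fallback library because no routing entry matches its content type.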
Figure 8-13 Send To menu shortcut for the Official Records Repository
Figure 8-14 Unclassified documents with their names appended with a random string of characters
Retention is the act of retaining a document in a specified location. The decision to retain a document and the length of time it is retained should be made by those who have developed the DDoc well before the document is created. The retention time is set using the expiration policy on the content type or at the site collection policy level.
When the expiration time is set at the site collection policy level, it is just an available setting that needs to be applied to the content type. Interestingly enough, when it is applied through the document library interface, it applies only to the instances of that content type in that library. If the setting is applied at the content type level, then the setting applies to all instances of the content type across the entire site collection. In the absence of site collection policy settings, when you use either the document library interface or the content type interface to set the expiration policy, you’re applying that policy at the content type level (there is no document library-level expiration policy) and it will apply to all instances of the content type within the site collection.
If you need to use the same content type across multiple document libraries and have different retention policies within those libraries, then assign the expiration policy to the content type at the document library level using the site collection expiration policy. However, if you need to assign the same retention policy to all instances of that content type within the site collection, then use the information policy setting on the content type.
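The two scopes can be compared with a short lookup sketch. Everything in this Python example is hypothetical: the `ExpirationPolicy` class, the policy tables, and the library and content type names are invented, and the rule that a library-level assignment takes precedence over a content type-level one is an assumption of the sketch, not documented SharePoint behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ExpirationPolicy:
    name: str
    retain_days: int


def effective_policy(
    content_type: str,
    library: str,
    content_type_policies: dict[str, ExpirationPolicy],
    library_policies: dict[tuple[str, str], ExpirationPolicy],
) -> ExpirationPolicy | None:
    """Pick the policy for one content type in one library (assumed precedence)."""
    override = library_policies.get((library, content_type))
    if override is not None:
        return override  # assigned through the document library interface
    return content_type_policies.get(content_type)  # site collection-wide


def expires_on(created: date, policy: ExpirationPolicy) -> date:
    """Compute the expiration date from the creation date."""
    return created + timedelta(days=policy.retain_days)


seven_years = ExpirationPolicy("Seven years", 7 * 365)
one_year = ExpirationPolicy("One year", 365)
by_content_type = {"Invoice": seven_years}
by_library = {("Drafts", "Invoice"): one_year}

for library in ("Drafts", "Finance"):
    policy = effective_policy("Invoice", library, by_content_type, by_library)
    print(library, policy.name, expires_on(date(2007, 1, 1), policy))
```

In this model, invoices in the Drafts library expire after one year while invoices everywhere else in the site collection keep the seven-year setting.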
Note that part of your retention policy is to decide what to do with the document after it has served its useful life. In some instances, you’ll decide to delete the document. In other instances, you’ll route it through a workflow that may require other individuals to decide what to do with the document—continue to retain it, delete it, or send it to long-term storage (a process commonly referred to as archiving). The customized workflow can be created in Visual Studio.NET and then applied to the content type. Whatever decision is made, the disposition process needs to be clearly outlined in the DDoc. It is to the topic of archiving documents that we now turn our attention.
Archiving a document or record means to place that item in a long-term storage solution so that, if needed, it can be retrieved at some point in the future. Usually, records in an archive are useful only for reference and never for ongoing collaboration.
The main enemy of archiving is technological change. As our word processors, security methods, and storage technologies evolve, older information can become inaccessible, defeating the original purpose of hosting the record in an archive. For example, if a document is secured through an Information Rights Management (IRM) program, the certificate and its chain of authority certificates may become outdated or obsolete, rendering the document both inaccessible and unusable. Moreover, a document secured by a password may not be accessible if the correct password cannot be found to open the document. Another example would be if a user tries to open a very old document, such as a WordStar 2000 document or an Ashton-Tate dBASE III database. While current technologies might still be able to open very old documents like this, the day may come when such documents will no longer be accessible in their original form.
Another enemy of archiving in SharePoint is the enumeration of large lists. SQL Server is certainly able to handle the storage of millions of records in a single database, but the enumeration of that database can be very difficult once the number of items in the list grows very large, such as a list of over 10,000 records. In many instances, a list this large is simply not renderable in the browser. Over time, your organization can have hundreds of thousands or millions of records in a long-term archive. Enumerating large lists will quickly become a factor in your archive architecture decisions.
Several best practices present themselves and should be seriously followed if you’re going to use SharePoint as a long-term archiving solution. First, plan where records will land in the long-term storage and the technologies that will be used to display them to the user.
One of the drawbacks to SharePoint is its inability to enumerate large lists in the browser. Recommendations for best practices vary based on whom you ask, but we suggest that your lists in a single view not exceed 1,000 items. When a list is viewed via the browser, the entire list is loaded into memory on the Web front-end server, then presented in chunks to the end-user via the browser. For smaller lists, this is acceptable, but for larger lists (usually over 2,000 items), the list enumeration may time out because the server can’t load the entire list fast enough or lacks sufficient memory to perform the function.
If you need to exceed 1,000 items in a list, then a fall-back best practice is to use the Data Form Web Part (DFWP) to view the list rather than the browser or a filtered view in the browser. The reason for this best practice recommendation is two-fold: First, the DFWP can be configured to have sorting, grouping, and filtering available for end-users to employ when they initially view the list. This allows for a faster, better findability solution across a large set of records. Second, the DFWP does not retrieve the entire list in one call. Instead, it retrieves only the number of records it is configured to retrieve for any given display. For example, the DFWP can be used to view a list of 10,000 records but be configured to present only 100 records at a time. Unlike the browser that will first attempt to retrieve all 10,000 records, when the DFWP displays the first 100 records, it retrieves only the first 100 records. It will not retrieve any more records until asked to do so, and then it will retrieve only the next 100 records. This results in a faster, more performant viewing of the records in a large list.
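The difference between loading a whole list and retrieving one page at a time can be demonstrated with a toy model. The Python below is not how the DFWP or the browser view is implemented: `RecordStore`, `load_entire_list`, and `load_page` are hypothetical names, and the store simply counts how many rows each approach reads.

```python
from __future__ import annotations


class RecordStore:
    """Hypothetical archive list that counts how many rows have been read."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.rows_read = 0

    def read(self, offset: int, limit: int) -> list[int]:
        # Rows are represented by their index; only the requested slice is read.
        rows = list(range(offset, min(offset + limit, self.total)))
        self.rows_read += len(rows)
        return rows


def load_entire_list(store: RecordStore) -> list[int]:
    """Model of a view that loads every row before displaying anything."""
    return store.read(0, store.total)


def load_page(store: RecordStore, page: int, page_size: int = 100) -> list[int]:
    """Model of paged retrieval: fetch only the rows for the requested page."""
    return store.read(page * page_size, page_size)


full = RecordStore(10_000)
load_entire_list(full)

paged = RecordStore(10_000)
load_page(paged, 0)   # first 100 records
load_page(paged, 1)   # next 100 records, fetched only when requested

print(full.rows_read, paged.rows_read)
```

After two pages, the paged store has read 200 rows against 10,000 for the full load, which is the saving the paged approach is meant to deliver.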
A second best practice that relates to long-term archiving of records is to implement a process that updates the base records to new technologies on a regular basis. For example, if the original document was created in Microsoft Office Word 97, when the latest edition of Office Word is released, make it a point to upgrade the document to the latest version so that the content stays current with technology changes. We realize this practice is costly and somewhat cumbersome, but we assume that if your organization values the information enough to retain it in long-term storage, then the cost of upgrading the format of the data elements will be approved.
Note In some instances, the organization will want to retain the information but will not want the burden and cost of upgrading the base formatting of the information every seven to ten years. If this is the case, the records can be printed, scanned into .tiff files, and then placed back into SharePoint where the Optical Character Recognition (OCR) technologies can be used to index the content for fast retrieval. The need to upgrade .tiff files and OCR technologies will not be nearly as important as upgrading other platforms, such as Word, Excel, PowerPoint, and other word processing and spreadsheet programs.
A third best practice concerns records that are secured using certificates. If the records in the archive are secured through a certificate-based technology, then you’ll need to ensure that those certificates are updated on a timely basis or the records will become unusable because they will become inaccessible. If there are passwords on the record, then the passwords will need to be recorded and securely stored. If your organization does not want to bear the costs associated with retaining the security on the records in the archive, then the security can be removed and the record retained in the archive using SharePoint security. Alternately, the record could be printed out to hard copy and retained in a secure location.
© Microsoft. All Rights Reserved.