On email archiving
Keeping control of your messaging data can be a frustrating experience. Administrators often spend much of their time trying to balance the competing requirements of the user community, IT management and industry regulation. Users need bigger mailboxes and better performance; IT management need published recovery SLAs to be met and the use of expensive disk space to be minimised; and industry regulation requires administrators to meet strict data retention policies. Archiving email offers a number of options that can help administrators manage these issues.
What are the choices?
Email can be archived in a number of ways. These range from a simple Exchange online backup to a solution where every message sent to or from your Exchange Organisation is ‘intercepted’ out on the internet and archived in transit. Alternatively there is the Exchange Server “message journaling” feature, where a copy of every email sent from, or received by, a specific Exchange store is delivered to a chosen mailbox. Finally, there is the option of moving email out of the Exchange Information Store and into an archive: a database stored outside the Exchange Organisation but within the direct control of your Exchange administrators. This post focuses on this last option.
There are a number of well-established archive solutions for Exchange on the market; some of them are described here: http://www.microsoft.com/exchange/partners/default.mspx. Email archived in this way is generally archived automatically, according to policies set by the administrator (and/or in response to a client action), usually based on message size and/or age. The email may be archived in its entirety, to be retrieved by an administrator or user from a separate console; or a ‘stub’ may be left in the Information Store, which can be viewed through the Outlook client. This ‘stub’ is a pointer to the location of the archived message and may include the subject and even the first few lines of the email. When the user opens the stub from within Outlook, the message is retrieved from the archive into the Information Store and presented to the client. The archive is generally an Oracle or SQL Server database, and some solutions let you choose whichever database best suits your requirements.
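To make the stub mechanism concrete, here is a minimal Python sketch of a size-or-age archiving policy. Everything in it is hypothetical: the `Message` and `Stub` types, the 1 MB and 90-day thresholds, the `arc-N` identifiers and the in-memory dictionary standing in for the archive database are illustrative assumptions, not the behaviour of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Union


@dataclass
class Message:
    subject: str
    body: str
    size_kb: int
    received: datetime


@dataclass
class Stub:
    """What stays in the mailbox once a message has been archived."""
    subject: str
    preview: str     # first few lines of the original body
    archive_id: str  # pointer to the archived copy


Item = Union[Message, Stub]


def should_archive(msg: Message, now: datetime,
                   min_size_kb: int = 1024, min_age_days: int = 90) -> bool:
    # Hypothetical policy: archive anything that is large OR old.
    is_large = msg.size_kb >= min_size_kb
    is_old = now - msg.received >= timedelta(days=min_age_days)
    return is_large or is_old


def archive_folder(items: List[Item], archive: Dict[str, Message],
                   now: datetime, preview_lines: int = 2) -> List[Item]:
    """Move qualifying messages into `archive`, leaving a stub in their place."""
    result: List[Item] = []
    for item in items:
        if isinstance(item, Message) and should_archive(item, now):
            archive_id = f"arc-{len(archive) + 1}"
            archive[archive_id] = item
            preview = "\n".join(item.body.splitlines()[:preview_lines])
            result.append(Stub(item.subject, preview, archive_id))
        else:
            result.append(item)
    return result


def open_item(item: Item, archive: Dict[str, Message]) -> Message:
    # Opening a stub in the client pulls the full message back from the archive.
    return archive[item.archive_id] if isinstance(item, Stub) else item


# Usage: one large message is stubbed, one small recent one is left alone.
now = datetime(2007, 1, 31)
archive: Dict[str, Message] = {}
inbox: List[Item] = [
    Message("Q4 report", "Figures attached.\nSee tab 2.\nRegards", 4096, datetime(2007, 1, 30)),
    Message("Lunch?", "Noon?", 2, datetime(2007, 1, 30)),
]
inbox = archive_folder(inbox, archive, now)
print(len(inbox), type(inbox[0]).__name__)  # 2 Stub
print(open_item(inbox[0], archive).subject)  # Q4 report
```

Note that the Inbox still holds two items after archiving, because the stub replaces the message rather than removing it.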
Why implement an archive solution?
The first question that administrators need to ask is why an archive solution is being considered. In my experience the answer to this question is often misunderstood, and in many cases Exchange administrators are not given enough of a voice in deciding whether the business requirements can be met within the current infrastructure or only by implementing an archive solution.
Generally it is an industry’s retention policy that is the deciding factor in approving an archive solution project. For regulated financial institutions, for example, the UK Financial Services Authority requires members to retain all pertinent client records (paper and electronic) for a period of 10 years. If the email data held by your company is governed by such a retention policy, it is important to understand in what form the data must be kept and how quickly it must be recoverable. This will dictate whether these demands are best met by an archive solution.
Despite the retention policy often being the deciding factor, in my experience the biggest driver for implementing an archive solution is the demand from users who need larger mailboxes and don’t have the time to manage them. On the face of it, introducing an archive solution allows administrators to prove that they can meet retention policies, lets users work within their mailbox limits, and drastically reduces the work that administrators face every day in managing mailbox limits and restoring data. However, introducing an archiving solution for this reason often just shifts the problem elsewhere and creates new issues for administrators to face at the same time.
What are some of the issues to be aware of?
There are a couple of great posts on the subject of mailbox size limits, message size limits and Outlook performance. Nicole Allen wrote an excellent post about item counts and their impact on performance, which is recommended reading for all Exchange administrators (Recommended Mailbox Size Limits: http://msexchangeteam.com/archive/2005/03/14/395229.aspx). There is also ‘Thinking about Mailbox and Message Size Limits’ by Ross Smith (http://msexchangeteam.com/archive/2006/07/06/428213.aspx). The issues described in these posts are particularly significant when an archive solution is implemented: although mailbox sizes may fall, in most cases mailbox item counts will continue to rise, because when a large message is archived a message ‘stub’ is left in the mailbox and still counts towards the folder item count. As a result, the performance of Outlook clients will suffer for certain operations, for example accessing a shared calendar with a high item count. It is therefore a mistake to implement an archive solution as a replacement for users managing their own mailboxes. It is recommended that you keep no more than about 2,500 to 5,000 items in any of the Outlook critical path folders: Calendar, Contacts, Inbox and Sent Items. A user education program may therefore be required to put across Outlook best practices. Administrators may spend as much time on this, and on placating users over perceived poor Outlook performance, as they did managing mailbox limits before the archive solution was introduced.
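The point about stubs and item counts can be illustrated in a few lines of Python. This is a hypothetical sketch: the 2,500 and 5,000 thresholds come from the guidance above, while the folder contents, the 1 MB archiving threshold and the 2 KB stub size are made-up assumptions, chosen only to show that archiving shrinks a folder without reducing its item count.

```python
from typing import Dict, List

# The Outlook critical path folders named above.
CRITICAL_FOLDERS = ("Calendar", "Contacts", "Inbox", "Sent Items")


def folder_status(count: int, warn_at: int = 2500, limit: int = 5000) -> str:
    """Classify a folder's item count against the 2,500 to 5,000 guidance."""
    if count > limit:
        return "over"
    if count > warn_at:
        return "warning"
    return "ok"


def stub_large_items(sizes_kb: List[int], threshold_kb: int, stub_kb: int = 2) -> List[int]:
    """Replace every item larger than threshold_kb with a small stub.

    The folder gets smaller, but it holds exactly as many items as before.
    """
    return [stub_kb if size > threshold_kb else size for size in sizes_kb]


def report(folders: Dict[str, List[int]]) -> Dict[str, str]:
    """Status of each critical path folder; missing folders count as empty."""
    return {name: folder_status(len(folders.get(name, []))) for name in CRITICAL_FOLDERS}


# Hypothetical Inbox: 6,000 items, one in ten a 2 MB item, the rest 20 KB.
inbox_sizes = [2048 if i % 10 == 0 else 20 for i in range(6000)]
archived = stub_large_items(inbox_sizes, threshold_kb=1024)

# 1305 MB before, 106 MB after
print(sum(inbox_sizes) // 1024, "MB before,", sum(archived) // 1024, "MB after")
# The Inbox is still reported as 'over': all 6,000 items remain.
print(report({"Inbox": archived}))
```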
There are other potential issues too. Information Store maintenance can be interrupted by archive operations. If the online defragmentation process is not allowed to complete at least twice a month on every Exchange database, ESE will not operate as efficiently as it should against those stores. Archive operations can also take a long time to complete and may not fit within your service requirements: if you operate a 24-hour email service, there may be periods when performance suffers while the archive software works against the mailbox stores. Another issue the administrator may face is the supportability of a new database technology in the environment. Who manages the new Oracle database that has been introduced, and are there sufficient skills in-house to manage it effectively in the long term?
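One way to keep an eye on this is to record when online defragmentation completes a full pass on each database and flag any month with fewer than two completions. The sketch below assumes you already have those completion dates (for example, gathered from the server's event logs); the function name and the list-of-dates input are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple

Month = Tuple[int, int]  # (year, month)


def months_short_of_defrag(completions: Iterable[date], first: Month, last: Month,
                           required: int = 2) -> List[Month]:
    """Return the months in [first, last] with fewer than `required` completed passes."""
    per_month = Counter((d.year, d.month) for d in completions)
    short: List[Month] = []
    year, month = first
    while (year, month) <= last:
        if per_month[(year, month)] < required:  # Counter returns 0 for missing months
            short.append((year, month))
        # Step to the next calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return short


# Hypothetical completion dates for one mailbox store.
completions = [date(2007, 1, 6), date(2007, 1, 20), date(2007, 2, 17),
               date(2007, 3, 3), date(2007, 3, 17)]
print(months_short_of_defrag(completions, (2007, 1), (2007, 3)))  # [(2007, 2)]
```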
What should administrators consider before implementing an archive solution?
The first thing administrators need to do is build a framework around their current infrastructure so as to gain as much control as they can over their environment. With this in place it is much easier to judge whether an archive solution will provide the benefits that are required. Some suggested steps are as follows:
- Define, document and publish SLAs to govern server and database recovery times. These will dictate maximum database sizes and so enable you to better predict disk usage.
- Gain support from IT management and the business for your SLAs.
- Set suitable mailbox limits across the company and, wherever possible, do not allow exceptions. (Also set message size limits globally to introduce additional control over data in the Organisation.)
- Document and publish a user education program defining email retention policies, mailbox limits, the benefits and dangers of Outlook personal folders, etc., and incorporate it into the induction program for every new joiner to the company.
- Gather performance baselines for all Exchange Servers and gather performance data on a routine basis from then on.
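The SLA and mailbox limit steps are linked by simple arithmetic: the recovery SLA caps how large a database can be, and the mailbox limit then caps how many users fit in it. The sketch below assumes a constant restore rate, a fixed allowance for non-copy work such as replaying logs, and a headroom factor for database overhead; all three values, and the function names, are illustrative assumptions rather than Exchange sizing guidance.

```python
import math


def max_database_size_gb(sla_hours: float, restore_gb_per_hour: float,
                         overhead_hours: float = 1.0) -> float:
    """Largest database that can be restored within the recovery SLA.

    overhead_hours is an assumed allowance for work other than copying
    data (locating media, replaying logs, mounting the store).
    """
    copy_window = sla_hours - overhead_hours
    if copy_window <= 0:
        raise ValueError("the SLA leaves no time to copy the database back")
    return copy_window * restore_gb_per_hour


def mailboxes_per_database(max_db_gb: float, mailbox_limit_mb: int,
                           headroom: float = 1.25) -> int:
    """Number of full mailboxes that fit, with an assumed headroom factor
    for overhead such as deleted item retention and white space."""
    per_mailbox_gb = mailbox_limit_mb / 1024 * headroom
    return math.floor(max_db_gb / per_mailbox_gb)


# Example: a 4-hour SLA and an assumed restore rate of 20 GB/hour.
limit_gb = max_database_size_gb(sla_hours=4, restore_gb_per_hour=20)
print(limit_gb)                               # 60.0
print(mailboxes_per_database(limit_gb, 200))  # 245 mailboxes at a 200 MB limit
```

Doubling the mailbox limit to 400 MB roughly halves the figure (to 122 mailboxes), which is exactly the kind of trade-off that published SLAs make visible to the business.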
If administrators are given the time to follow the above steps to gain more control over their environment, the reasons for an archive solution are often reduced. For example, administrators will now have a better understanding of disk usage and will be better able to predict how it may change over time. This may eliminate the need to move data out of the Exchange databases onto less expensive storage. Once administrators have the control they need over Exchange data, it is important that they understand the requirements of any retention policy and whether those requirements can be met within the current infrastructure. If they can’t, the administrator is in a much better position to implement the chosen archive solution with a clear understanding of exactly what the implications will be. (A test environment is a huge advantage when making decisions and assessing the impact of the archive solution before it is deployed in production.)
How will Exchange 2007 improve this situation?
Exchange 2007 introduces a number of features designed to help companies meet their regulatory responsibilities. ‘Message Journaling’, for example, gives you the same option as Exchange 2000/2003 but adds ‘Premium Journaling’, which gives the administrator much more control over which messages are ‘archived’. Administrators can now journal at the transport layer, creating journal rules for individual mailbox recipients or for entire groups within the organization. It is also possible to journal messages destined for particular external addresses. Message Classification is another new feature in Exchange 2007, allowing a message to be classified and then handled in a particular way by a transport rule. For example, if all attorneys in your organization are grouped into an organizational unit called "Legal", you can configure a transport rule that returns messages classified as ‘A/C Privileged’ to the sender if the sender, or at least one recipient on the To or Cc line, is not in the Legal group. ‘Messaging Records Management’ and ‘Managed Folders’ are also introduced in Exchange 2007. These allow an administrator to define Outlook folders that are created in chosen mailboxes. These ‘managed folders’ cannot be deleted by the user, and retention policies can be applied to the email they contain.
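The logic of the ‘A/C Privileged’ example can be expressed as a small predicate. This is not how Exchange 2007 transport rules are actually configured (they are built from conditions, exceptions and actions in the management tools); it is a hypothetical Python model of the decision the rule makes, with invented `Envelope` and `rule_action` names and a plain set of addresses standing in for the Legal group.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Envelope:
    sender: str
    to: List[str]
    cc: List[str] = field(default_factory=list)
    classification: Optional[str] = None


def rule_action(msg: Envelope, legal_group: Set[str],
                classification: str = "A/C Privileged") -> str:
    """Return 'return_to_sender' if the rule fires, otherwise 'deliver'.

    The rule fires for messages carrying the given classification when the
    sender, or at least one To/Cc recipient, is outside the Legal group.
    Only the To and Cc lines are considered, as in the example.
    Addresses are compared case-insensitively.
    """
    if msg.classification != classification:
        return "deliver"
    legal = {address.lower() for address in legal_group}
    people = [msg.sender, *msg.to, *msg.cc]
    if any(person.lower() not in legal for person in people):
        return "return_to_sender"
    return "deliver"


legal = {"alice@contoso.com", "bob@contoso.com"}
print(rule_action(Envelope("alice@contoso.com", ["bob@contoso.com"],
                           classification="A/C Privileged"), legal))  # deliver
print(rule_action(Envelope("alice@contoso.com", ["bob@contoso.com"],
                           cc=["carol@contoso.com"],
                           classification="A/C Privileged"), legal))  # return_to_sender
```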
In addition to the new features described above, Exchange 2007 introduces significant performance improvements over previous versions of the product, which should allow administrators to introduce much larger mailboxes whilst providing the same level of performance to their clients. In my opinion, the combination of these performance improvements, cheaper disks and the improved retention features described above means that many companies can achieve what they want in terms of regulatory compliance and service to their user community without the need for an archive solution. However, whether your chosen solution is based on the features within Exchange 2007, on third-party archive software, or on a combination of both, it is vital that it is introduced alongside a solid framework for managing data across the entire messaging organisation.
- Doug Gowans