Exchange Server 2010: Exchange Server High Availability
E-mail is an essential business communication tool, so anything you can do to ensure high availability with Microsoft Exchange Server is a good thing.
Excerpted from “Exchange 2010 - A Practical Approach,” published by Red Gate Books (2009).
There are several layers in Microsoft Exchange Server 2010 that you can configure as a high-availability (HA) solution. The Database Availability Group (DAG) offers HA for the Mailbox Server role. For a complete HA solution, you can make the Client Access Server and Hub Transport Server roles highly available as well.
It’s worth bearing in mind that Exchange Server 2010 Standard Edition supports the same replication technologies as the Exchange Server 2010 Enterprise Edition. The only difference is that the Standard Edition supports up to five databases per server, while the Enterprise Edition supports up to 100. This is good news for organizations that don’t have thousands of mailboxes and don’t need to create a lot of Mailbox Databases.
Exchange Server Databases
Configuring HA for other server roles hasn’t changed significantly since Exchange 2007. Exchange Server 2010 uses a database to store the primary data—the messages you send and receive. This database technology is a transactional system, which is pretty common, but Exchange Server uses its own technology built on the Extensible Storage Engine (ESE), sometimes referred to as a JET database.
When installing an Exchange Server 2010 Mailbox Server, the initial mailbox database is, by default, stored on the local C:\ drive; more specifically on C:\Program Files\Microsoft\Exchange Server\V14\Mailbox\Mailbox Database <<random number>>\. This random number is generated by Exchange Server during the initial configuration because the database names on Exchange 2010 and higher servers must be unique within the Exchange organization.
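A number of files make up the Exchange 2010 database environment, all of which play a crucial role in the correct functioning of Exchange Server:
- “mailbox database 0242942819.edb”, the mailbox database itself
- E00.log, the log file currently in use
- E000000003A.log, E000000003B.log, E000000003C.log and so on, the log files that have already been filled
- E00.chk, the checkpoint file
- E00res00001.log and E00res00002.log, the reserved log files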
To understand Exchange database technology, you need to understand the flow of data between Exchange Server and the database itself. Data is processed in 32KB blocks, also called “pages.” When Exchange has finished processing a page and the page was updated, it is immediately written to a log file. The page itself stays in memory until Exchange needs that memory again; when the page hasn’t been used for some time, or when Exchange needs to force an update during a checkpoint, the page is written to the database file. As a result, the data in the log files is always ahead of the data in the database. This is an important point to remember when troubleshooting database issues.
As data is written to the database, a pointer called the checkpoint is updated to reflect the new or updated page that was written to the database. The checkpoint is stored in a special file called the checkpoint file. Exchange Server uses this to make sure it knows what data has been written to the database and what data is in the log files and not yet written to the database. So, in short:
- Mail data is initially processed in memory and separated into pages.
- Updated pages are written to the log file.
- When pages are no longer needed by Exchange, they are written to the database.
- The checkpoint file is updated to reflect the new location of the checkpoint.
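The four steps above can be modelled with a small in-memory simulation. This is a minimal Python sketch, not Exchange’s actual implementation: the class `ToyMailboxStore`, its methods, and the idea of one log record per page update are assumptions made for illustration. It shows why the log is always ahead of the database, and why the checkpoint only moves once dirty pages have been written from memory to the database.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMailboxStore:
    """Hypothetical toy model of write-ahead logging with a lazy writer.

    Not Exchange code: it only mirrors the four steps described in the text.
    """

    cache: dict = field(default_factory=dict)     # page_id -> data held in memory
    dirty: dict = field(default_factory=dict)     # page_id -> index of its oldest unflushed log record
    log: list = field(default_factory=list)       # append-only list of (page_id, data) records
    database: dict = field(default_factory=dict)  # page_id -> data in the database file

    def update_page(self, page_id, data):
        # Steps 1 and 2: change the page in memory, then log the change immediately.
        self.cache[page_id] = data
        self.log.append((page_id, data))
        # Keep the oldest log record this page still depends on.
        self.dirty.setdefault(page_id, len(self.log) - 1)

    def lazy_write(self, page_id):
        # Step 3: write the page from memory (not from the log) to the database.
        self.database[page_id] = self.cache[page_id]
        self.dirty.pop(page_id, None)

    @property
    def checkpoint(self):
        # Step 4: every log record before this index is already in the database.
        return min(self.dirty.values(), default=len(self.log))


store = ToyMailboxStore()
store.update_page("p1", "hello")
store.update_page("p2", "world")
store.update_page("p1", "hello again")
print(store.checkpoint)  # 0: nothing flushed yet, the log is ahead of the database
store.lazy_write("p1")
print(store.checkpoint)  # 1: p2's change is still only in memory and in the log
store.lazy_write("p2")
print(store.checkpoint)  # 3: all three log records are now reflected in the database
```

Note that the second update to `p1` does not move its entry in `dirty`: the page still depends on its oldest unflushed log record until the lazy writer saves it.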
Extensible Storage Engine
The database engine Exchange Server uses is built on the ESE. The ESE exists in several flavors:
- ESE97 for Exchange Server 5.5
- ESE98 for Exchange Server 2000/2003
- ESENT for Active Directory
- ESE for Exchange Server 2007 and Exchange Server 2010
The ESE is a low-level database engine. It knows all about “base types,” such as short, string, long, longlong, systime and so on, but it has no knowledge of any structure or schema. The schema is defined by the application that uses the ESE, in this case the Information Store. This is in contrast to a relational database such as Microsoft SQL Server, where the database structures are stored as metadata inside the database itself.
The ESE is optimized for handling large amounts of semi-structured data, as it’s impossible for an Exchange Server to predict what kind of data will be received, how large the data will be or what attachments messages will have.
When Exchange Server is working with a page and that page’s status changes from clean to dirty (that is, the page is modified), the page is written to the log file almost immediately. Data held in memory is fast to access, but volatile: all it takes is a minor hiccup in the server, and data in memory is lost. Once it’s saved in the log file, the whole server could burn down, and as long as you keep the disk, you also keep the data.
Thankfully, saving to the log file is normally a matter of milliseconds. The log files are numbered internally, and this number (referred to as the lGeneration number) is used to identify each log file and to name it on disk once it is completely filled with data.
The current log file, or the “log file in use,” is E00.log. While Exchange is filling this log file with data, a temporary E00tmp.log file is already created (or is in the process of being created) in the background. When E00.log is eventually full, it’s saved under another name, derived from the log file’s prefix (E00, E01, E02 and so on) and the lGeneration number, written as an eight-digit hexadecimal value. The E00tmp.log file then becomes the new E00.log.
For example, when the lGeneration number is 1, E00.log is saved as E0000000001.log. Likewise, if the lGeneration number was 3E the last time this process happened, the log file was saved as E000000003E.log. Because the lGeneration number is sequential, the next lGeneration number of E00.log must be 3F, so the next time the log file rolls over, it will be saved as E000000003F.log.
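The naming rule can be written as a single format string. This Python sketch assumes, as the examples above show, that the lGeneration number is written as eight uppercase hexadecimal digits after the three-character prefix; the function name `log_file_name` is hypothetical.

```python
def log_file_name(prefix: str, generation: int) -> str:
    """Return the name a filled log file gets on rollover (hypothetical helper).

    prefix:     the log file prefix, e.g. "E00" or "E01"
    generation: the lGeneration number as an integer
    """
    # Eight hex digits, zero-padded and uppercase: 0x3E -> "0000003E".
    return f"{prefix}{generation:08X}.log"


print(log_file_name("E00", 1))     # E0000000001.log
print(log_file_name("E00", 0x3E))  # E000000003E.log
print(log_file_name("E00", 0x3F))  # E000000003F.log, the next rollover
```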
Although it’s not directly visible, the lGeneration number is stored inside the log file and can be checked by dumping the log file’s header information with the ESEUTIL utility (ESEUTIL /ML E00.log). The first few lines of the header should read something like:
Base name: E00
Log file: E00.log
lGeneration: 63 (0x3F)
Checkpoint: (0x3F,8,16)
The lGeneration number is listed on the third line, in both decimal and hexadecimal notation. Unfortunately, this is very confusing, and sooner or later an Exchange administrator will mix up the two notations and start working with the wrong log file.
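One way to avoid mixing up the two notations is to let a script read the value and derive the file name from it. The Python sketch below parses header text in the format of the sample above; it assumes the `lGeneration: <decimal> (0x<hex>)` layout shown there, and the function name `parse_lgeneration` is hypothetical. Real ESEUTIL output contains more fields and may differ between versions.

```python
import re

SAMPLE_HEADER = """\
Base name: E00
Log file: E00.log
lGeneration: 63 (0x3F)
Checkpoint: (0x3F,8,16)
"""


def parse_lgeneration(header: str) -> int:
    """Return the lGeneration number from a log header dump (hypothetical helper)."""
    match = re.search(r"lGeneration:\s*(\d+)\s*\(0x([0-9A-Fa-f]+)\)", header)
    if match is None:
        raise ValueError("no lGeneration line found")
    decimal_value = int(match.group(1))
    hex_value = int(match.group(2), 16)
    # Both notations describe the same number; refuse to continue if they disagree.
    if decimal_value != hex_value:
        raise ValueError(f"decimal {decimal_value} and hex {hex_value:#x} disagree")
    return hex_value


generation = parse_lgeneration(SAMPLE_HEADER)
print(generation)                  # 63
print(f"E00{generation:08X}.log")  # E000000003F.log, not E0000000063.log
```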
After the pages are written to the log file, they’re kept in memory, which saves an expensive disk read when Exchange Server needs the page again. When the Mailbox Server needs that memory for other pages, or when the page has stayed in memory for a long time, the page is written to the database file. This is also known as the “lazy writer mechanism.”
A common misconception is that data is read from the log files and then written to the database file, but this is not the case. Data is written directly from memory to the database, and the log files are only read in recovery scenarios, such as after an improper shutdown of the server. Under normal circumstances, log file activity is 100 percent sequential writes, whereas database activity is a random mix of reads and writes.
The relationship between writing data to the log files and writing data to the database itself is managed by the checkpoint file, E00.chk. The checkpoint marks the point in the log files up to which all changes have been written to the database, and it is advanced as soon as Exchange writes another page from memory to the database.
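The recovery case can be sketched as a replay loop: after an improper shutdown, only log records at or after the checkpoint need to be applied, because everything before it is already in the database. This is a simplified Python model, not ESE’s actual recovery logic; representing the log as a list of `(page_id, data)` records and the checkpoint as an index into that list are assumptions, and `replay_from_checkpoint` is a hypothetical name.

```python
def replay_from_checkpoint(log, checkpoint, database):
    """Apply log records from the checkpoint onward to a copy of the database.

    log:        list of (page_id, data) records in the order they were written
    checkpoint: index of the first record not guaranteed to be in the database
    database:   dict of page_id -> data as found on disk after the crash
    """
    recovered = dict(database)
    for page_id, data in log[checkpoint:]:
        # Records are replayed in order, so the last change to a page wins.
        recovered[page_id] = data
    return recovered


log = [("p1", "v1"), ("p2", "v1"), ("p1", "v2"), ("p3", "v1")]
on_disk = {"p1": "v1", "p2": "v1"}  # the lazy writer had flushed the first two records
print(replay_from_checkpoint(log, 2, on_disk))
# {'p1': 'v2', 'p2': 'v1', 'p3': 'v1'}
```

Records before the checkpoint are never read, which is why the checkpoint file keeps recovery short.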
The difference between the data in the database and the data in the log files is referred to as the checkpoint depth. The checkpoint depth can span several log files; in fact, the default checkpoint depth is 20 log files. Because of this depth, Exchange can wait before writing to the database and combine several write actions, so that database write operations are performed more efficiently.
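The benefit of delaying database writes shows up when the same page is changed several times within the checkpoint depth: every change costs a log write, but the lazy writer only has to write the latest version of the page once. The Python sketch below simply counts both kinds of writes for a list of page updates; it is a simplified illustration, not a model of ESE’s real I/O scheduling, and `count_writes` is a hypothetical name.

```python
def count_writes(updates):
    """Count log writes and database writes for a batch of page updates.

    updates: sequence of page IDs, one entry per change, all assumed to fall
             within one checkpoint-depth window (an assumption of this sketch).
    """
    log_writes = len(updates)            # every change is logged immediately
    database_writes = len(set(updates))  # each dirty page is flushed once, latest version only
    return log_writes, database_writes


# A busy page changed five times and two other pages changed once each.
print(count_writes(["p1", "p1", "p2", "p1", "p3", "p1", "p1"]))  # (7, 3)
```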
Checkpoint depth is also a per-database setting. Because each log file is 1MB, a checkpoint depth of 20 log files means that up to 20MB of data can be kept in memory for that specific database. With 30 databases in Exchange Server 2010, each at its maximum checkpoint depth, approximately 600MB of Exchange data is kept in memory.
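The estimate above is a simple multiplication: databases × checkpoint depth (in log files) × log file size. The Python sketch below assumes the 1MB log file size used by Exchange 2010 and the default depth of 20 log files from the text; the function and constant names are hypothetical, and actual memory use depends on the workload.

```python
LOG_FILE_SIZE_MB = 1           # Exchange 2010 transaction log files are 1MB each
DEFAULT_CHECKPOINT_DEPTH = 20  # log files per database, as stated in the text


def checkpoint_memory_mb(databases, depth_logs=DEFAULT_CHECKPOINT_DEPTH,
                         log_size_mb=LOG_FILE_SIZE_MB):
    """Upper-bound estimate of logged data held in memory across all databases."""
    return databases * depth_logs * log_size_mb


print(checkpoint_memory_mb(30))   # 600, the figure from the text
print(checkpoint_memory_mb(5))    # 100, the Standard Edition maximum of five databases
print(checkpoint_memory_mb(100))  # 2000, the Enterprise Edition maximum of 100 databases
```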
The Mailbox Database
The “mailbox database 0242942819.edb” file is the primary repository of the Exchange Server 2010 Mailbox Server role. In Exchange Server 2007 this file was called “mailbox database.edb,” whereas in Exchange 2003 and Exchange 2000 the database consisted of two files: priv1.edb and priv1.stm. In Exchange Server 2010, a Mailbox Server running the Enterprise Edition can hold up to 100 databases.
The maximum size of an ESE database can be huge. The theoretical upper limit of a single file on NTFS is 16EB, which is generally considered more than sufficient to host large Mailbox Database files. The Microsoft-recommended maximum size of a Mailbox Database on Exchange Server 2010 is 2TB. Compared to the recommended maximum of 200GB in Exchange 2007 (using Cluster Continuous Replication), this is a tremendous increase. Bear in mind that if you use this sizing, you will have to configure multiple database copies to achieve an HA solution.
Jaap Wesselius is the founder of DM Consultants, a company with a strong focus on messaging and collaboration solutions. After working at Microsoft for eight years, Wesselius decided to commit more of his time to the Exchange community in the Netherlands, resulting in an Exchange Server MVP award in 2007. He is also a regular contributor at the Dutch Unified Communications User Group and a regular author for Simple-Talk.
Learn more about “Exchange 2010 - A Practical Approach” at red-gate.com/our-company/about/book-store.