I decided to check out the size of Eric's DIT ...
... take some time, measuring the exact dimensions of Eric's DIT
... and I must say, I've seen a fair number of DITs in my time, and I can say with a
fair amount of certainty that Eric's DIT is the biggest I've ever seen! I can't believe what a massively huge DIT Eric has.
Note: while this is about an Active Directory database, Exchange is
based on the same database technology, so it would (and does) have a similar space hierarchy.
Table of ESE Space usage:
The black entries in this table are just the output of 3 of the columns from esentutl /ms adamntds.dit (original report); the blue entries are columns/rows I've added to break out the space usage more clearly:
| Name | Friendly name | Owned (pages) | Available (pages) | Owned (GB) | Available (GB) |
|---|---|---|---|---|---|
| <calc: DB real> | | 268179344 | 6088 | 0.000 | 0.046 |
| <calc: datatable real> | | 268178751 | 5689 | 2046.041 | 0.043 |
| <calc: Row Data> | | 186707905 | 2260 | 1424.468 | 0.017 |
| <sum: Idx Totals> | | 81470804 | 3411 | 621.573 | 0.026 |
| PDNT_index | Int: PDNT + Name | 11482791 | 7 | 87.607 | 0.000 |
| nc_guid_Index | Int: NC + objGuid | 10870892 | 5 | 82.938 | 0.000 |
| DRA_USN_index | Int: Repl USN | 7083583 | 251 | 54.043 | 0.002 |
| DRA_USN_CREATED_index | Int: Repl Created USN | 4479144 | 34 | 34.173 | 0.000 |
... deleted about a dozen small indices ...
I'll walk through the calculations I performed on the esentutl /ms output, in the hope that it will be clear ...
First, I summed the owned space for all indices in the datatable; this comes out to 81470804 pages.
Note that the numbers above may not add up exactly because I deleted a dozen or
so super small indices. I summed all the indices because it
makes the next calculation easier, and also so we can get the "% of
Total Idxs" column.
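As a rough sketch of that "% of Total Idxs" calculation, here is how it could be done in Python. The page counts are copied from the table above; the variable and function names are my own, purely for illustration, and only the four indices shown in the table are included.

```python
# Owned page counts for the indices shown in the table above.
index_owned_pages = {
    "PDNT_index": 11482791,
    "nc_guid_Index": 10870892,
    "DRA_USN_index": 7083583,
    "DRA_USN_CREATED_index": 4479144,
}

# Owned pages from the "<sum: Idx Totals>" row of the table.
total_index_pages = 81470804


def pct_of_total_indices(pages, total=total_index_pages):
    """Return one index's share of all index-owned pages, in percent."""
    return 100.0 * pages / total


for name, pages in index_owned_pages.items():
    print(f"{name:24s} {pages:>12d} {pct_of_total_indices(pages):6.2f}%")
```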
Next, understand that ESE's "owned" space is hierarchical: the "datatable" owns all the space owned by each of
the indices and by the LV B-Tree in the datatable. But the primary B-Tree for the datatable
also contains (and thus owns) the normal row data. So the space actually used by
regular row data in the datatable is 268178751 (datatable) - 42 (owned by the datatable's LV B-Tree) - 81470804 (owned by the sum of all datatable indices) = 186707905 pages (i.e. the "<calc: Row Data>" line).
I then created a couple of columns to turn these page counts into a usable unit (GB), i.e. <# of pages> * 8 / 1024 / 1024, since each page is 8 KB.
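The row-data subtraction and the page-to-GB conversion can be sketched in a few lines of Python. The numbers come straight from the text; the names (`pages_to_gb`, `PAGE_SIZE_KB`, and so on) are hypothetical, and the 8 KB page size is the one implied by the "* 8" in the formula.

```python
PAGE_SIZE_KB = 8  # page size implied by the "* 8" in the conversion formula


def pages_to_gb(pages):
    """Convert a count of 8 KB pages to GB (KB -> MB -> GB)."""
    return pages * PAGE_SIZE_KB / 1024 / 1024


datatable_owned = 268178751  # owned by the datatable, children included
lv_owned = 42                # owned by the datatable's LV B-Tree
indices_owned = 81470804     # owned by all datatable indices combined

# Owned space is hierarchical, so subtract the children to get the row data.
row_data_pages = datatable_owned - lv_owned - indices_owned

print(row_data_pages)                        # 186707905
print(f"{pages_to_gb(row_data_pages):.3f}")  # 1424.468
print(f"{pages_to_gb(indices_owned):.3f}")   # 621.573
```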
Finally I added a friendly name column, so you'd know roughly what the index was indexing.
From the above table we can easily see that the row data is 1,424 GB and all
the indices combined are 621 GB. This breaks out like so:
Based on the table above, a full 30% of this
database is indices!!! That's a huge amount. This isn't a
common space breakdown for most AD objects; it shows up here because the objects making up
Eric's DIT are very, very small / lightweight. He was just
creating containers w/ minimal attributes (see Eric's initial post),
so just the base set of indices on a basic object adds up
to a significant portion of each object's overall "footprint" in the DB.
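For anyone who wants to check the 30% figure, here is a minimal sketch using the owned page counts from the table (the variable names are mine):

```python
datatable_owned = 268178751  # "<calc: datatable real>" owned pages
indices_owned = 81470804     # "<sum: Idx Totals>" owned pages
row_data_owned = 186707905   # "<calc: Row Data>" owned pages

# Each component's share of the datatable's owned space, in percent.
index_share = 100.0 * indices_owned / datatable_owned
row_data_share = 100.0 * row_data_owned / datatable_owned

print(f"indices:  {index_share:.1f}%")     # 30.4%
print(f"row data: {row_data_share:.1f}%")  # 69.6%
```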
As for the breakdown of the individual index usages, it looks something like this:
Of the secondary indices on the datatable,
10 are always updated! And another 2 (the very slender ones) are
only updated on delete. Since there are over 2 billion objects in
this database, each with an entry in the primary B-Tree plus one in each of those 10 indices,
that means we inserted about 22 billion B-Tree entries,
kind of neat.
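The arithmetic behind that 22 billion is quick to sketch. I'm assuming a round 2 billion objects (the real count is a bit higher) and counting one primary B-Tree entry plus one entry per always-updated secondary index for each object:

```python
objects = 2_000_000_000        # "over 2 billion"; rounded down for the estimate
always_updated_secondary = 10  # secondary indices updated on every insert
primary_btree = 1              # every object also gets a primary B-Tree entry

entries_per_object = primary_btree + always_updated_secondary
total_entries = objects * entries_per_object

print(f"{total_entries:,}")  # 22,000,000,000
```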
One last, somewhat technical thing that a few of you might find interesting: even the
largest B-Tree, the 1,424 GB primary B-Tree, is only 5 levels deep. This means that
locating a specific row (by DNT) will take only 5 disk seeks in the worst
case (cold cache). B-Trees have a very nice high fan-out that keeps disk seeks minimal.
Interestingly, I dumped the root page, and it only has 3 nodes (TAG 0
doesn't count). What this means is that we could add about 100x more
data to this B-Tree with no increase in the # of disk seeks needed to fetch a row
from this table.
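Here is a back-of-the-envelope sketch of why the headroom is so large. It is a simplified model, not how ESE actually lays out pages: I assume every level below the root has the same average fan-out, and I treat the row-data page count as if it were all leaf pages. Under those assumptions a 5-level tree whose root has 3 children has roughly 3 * fanout^3 leaf pages.

```python
leaf_pages = 186707905  # "<calc: Row Data>" owned pages, used as a stand-in for leaf pages
depth = 5               # levels in the primary B-Tree, root included
root_children = 3       # nodes found on the dumped root page

# Root -> 3 pages -> 3*f -> 3*f^2 -> 3*f^3 leaf pages, so solve for f.
internal_levels = depth - 2
avg_fanout = (leaf_pages / root_children) ** (1 / internal_levels)

# If the root could fill up to the same fan-out as the other levels,
# the tree could grow by this factor without gaining a level.
headroom = avg_fanout / root_children

print(f"estimated average fan-out: {avg_fanout:.0f}")
print(f"estimated growth headroom: {headroom:.0f}x")
```

Under this model the headroom comes out at a bit over 100x, the same ballpark as the figure above.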
Anyway that seems like enough for now ...