Back of the envelope

Everyone who writes software, and most people who use it, should be familiar with "back of the envelope" calculations. I first read about this style of thinking about problems in Jon Bentley's Programming Pearls, Chapter 7.

Part of the work we are doing in core is around rendering and performance. We are working to update our rendering abstraction and how we view rendering in general, as well as how we think about and measure performance.

As part of this we have started doing some calculations related to bus bandwidth and how much of the bus each subsystem can and should use. This is really our first attempt at these “back of the envelope” calculations and we should have done them long ago.

Anyway, enough with the recriminations and on to the data. First, here are the bus bandwidth rates we are using:

| Bus | Expected Perf Rate (GB/s) | Expected Rate (MB/frame at 60 FPS) | Date introduced |
| --- | --- | --- | --- |
| PCIe 16x 2.0 | 4 | 68 | 2007 |
| PCIe 16x 1.0 | 2 | 34 | 2004 |
| PCIe 8x 1.0 | 1 | 17 | 2004 |
| AGP 8x 1.0 | 1 | 17 | 2002 |
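The per-frame column follows from the bandwidth column by dividing by the frame rate. Here is a minimal Python sketch of that conversion. It assumes 1 GB = 1024 MB, since that is what reproduces the rounded figures in the table; the name `mb_per_frame` is made up for illustration.

```python
# Convert a sustained bus rate into a per-frame transfer budget.
# Assumption: 1 GB = 1024 MB, which matches the rounded table values.

def mb_per_frame(gb_per_second, fps=60):
    """Return the megabytes that can cross the bus during one frame."""
    return gb_per_second * 1024 / fps

# Expected rates (GB/s) copied from the table above.
BUSES = {
    "PCIe 16x 2.0": 4,
    "PCIe 16x 1.0": 2,
    "PCIe 8x 1.0": 1,
    "AGP 8x 1.0": 1,
}

for name, rate in BUSES.items():
    print(f"{name}: {mb_per_frame(rate):.1f} MB/frame")
```

Halving the frame rate doubles the per-frame budget, so passing `fps=30` is a quick way to see how much headroom a lower target frame rate would buy.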

OK, so let’s think about this for just the Autogen tree subsystem. In FSX, most auto-generated trees have 12 vertices, each containing 32 bytes of data[1], giving the model a size of (12 * 32 bytes) = 384 bytes per instance.

When using batching, and assuming 20% of the bus bandwidth is available for auto-generated trees, a PCIe 16x 1.0 bus can transfer (0.20 * 34 MB / 384 bytes per instance) = 18,568 batched trees per frame at 60 Hz.
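That arithmetic is easy to check in a few lines. The sketch below is only an illustration: the constant and function names are hypothetical, the vertex layout is the one described in footnote [1], and 1 MB is taken as 1024 * 1024 bytes, which is what reproduces the figure above.

```python
import struct

# Vertex layout from footnote [1]: 3 position + 3 normal + 2 texture floats.
VERTEX_BYTES = struct.calcsize("<8f")        # 8 floats of 4 bytes each
VERTICES_PER_TREE = 12
TREE_BYTES = VERTICES_PER_TREE * VERTEX_BYTES

# Assumption: 1 MB = 1024 * 1024 bytes.
FRAME_BUDGET_BYTES = 34 * 1024 * 1024        # PCIe 16x 1.0 at 60 FPS
TREE_SHARE_PERCENT = 20                      # share of the bus given to trees

def batched_trees_per_frame(budget_bytes=FRAME_BUDGET_BYTES,
                            share_percent=TREE_SHARE_PERCENT,
                            tree_bytes=TREE_BYTES):
    """Whole trees that fit in the given share of one frame's bus budget."""
    share_bytes = budget_bytes * share_percent // 100
    return share_bytes // tree_bytes

print(TREE_BYTES, "bytes per tree")
print(batched_trees_per_frame(), "batched trees per frame")
```

Integer arithmetic is used throughout so the result is a whole number of trees rather than a rounded float.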

Now, a typical scene contains 50 cells of 1 km x 1 km each. Autogen trees are capped at 4500 per cell (when the slider is all the way to the right). You can set the max as high as 6000, so let’s call it 5000 for easier math: 5000 * 50 = 250,000 trees.

Holy Autogen, Batman! Yes, this is why Autogen brings the system to its knees. Crysis doesn’t try to render 250k trees. If you are having a major perf problem with Autogen, try using the “max in cell” tweak to reduce the max to 500. Then your max is 500 * 50 = 25,000 trees, which is far closer to the roughly 18,500 the bus can carry per frame.
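To put numbers on the gap, here is a small sketch comparing what the scene asks for against the batching budget worked out earlier. The function name `overcommit` is hypothetical, and the budget is simply the 18,568 figure from the batching estimate.

```python
CELLS_IN_SCENE = 50          # typical scene, per the text
BATCHED_BUDGET = 18568       # trees per frame from the batching estimate

def overcommit(max_trees_per_cell, cells=CELLS_IN_SCENE, budget=BATCHED_BUDGET):
    """Return (total trees requested, ratio of request to budget)."""
    total = max_trees_per_cell * cells
    return total, total / budget

for per_cell in (5000, 500):
    total, ratio = overcommit(per_cell)
    print(f"{per_cell} per cell -> {total} trees, {ratio:.1f}x the budget")
```

At 5000 per cell the request is more than 13 times the budget. At 500 per cell it is still about 1.3 times the budget, so under these assumptions the tweak narrows the gap a great deal but does not fully close it.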

It should be obvious the same holds true for Autogen buildings.

Part of our performance work is to turn the engine from a batching engine into an instancing engine to help bridge this gap. When using instancing, and assuming 10% of the bus bandwidth is available for non-animated instance data, (0.10 * 34 MB / 48 bytes per instance) = 74,274 instances per frame at 60 Hz can be sent across a PCIe 16x 1.0 bus. If we give Autogen trees 20%, we get about 148,500 trees; call it 150,000. Spread across 50 cells, that is roughly 3000 trees per cell. So that change alone gets us close to "within bounds".
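The instancing estimate works the same way as the batching one, just with a much smaller per-tree payload. A minimal sketch, with hypothetical names, again treating 1 MB as 1024 * 1024 bytes:

```python
# Assumption: 1 MB = 1024 * 1024 bytes, as in the batching estimate.
FRAME_BUDGET_BYTES = 34 * 1024 * 1024   # PCIe 16x 1.0 at 60 FPS
INSTANCE_BYTES = 48                     # per-instance data, per the text
CELLS_IN_SCENE = 50

def instances_per_frame(share_percent,
                        budget_bytes=FRAME_BUDGET_BYTES,
                        instance_bytes=INSTANCE_BYTES):
    """Whole instances that fit in the given share of one frame's budget."""
    return budget_bytes * share_percent // 100 // instance_bytes

general = instances_per_frame(10)
trees = instances_per_frame(20)
print(general, "instances at 10% of the bus")
print(trees, "trees at 20% of the bus")
print(trees // CELLS_IN_SCENE, "trees per cell across", CELLS_IN_SCENE, "cells")
```

Unrounded, the 20% case comes to 148,548 trees, or 2970 per cell, which is where the round figures of 150,000 and 3000 come from.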

Given there are sliders and config entries, users can adjust their settings to suit their hardware and the style of flying they do. That is why this gap isn’t “tragic”. Still, it comes as a rude surprise to most people that FSX tries to do so much. That is a good part of the root of the problem, and of why the FSX engine is so different from other engines.

And there are some other things we are doing so the engine isn’t so overcommitted. But that is another post. :-)

PS.

http://support.microsoft.com/kb/555739 lists the Autogen max per cell tweak.

PPS

The slider stops correspond to the following percentages:

0, 10, 20, 45, 70, 100
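As a rough guide to what each stop means for the bus, the sketch below turns those percentages into tree counts. It rests on an assumption the post does not spell out: that each percentage scales the 4500-per-cell maximum linearly. The names are hypothetical, and the 50-cell scene is the typical case discussed earlier.

```python
SLIDER_PERCENTAGES = (0, 10, 20, 45, 70, 100)
MAX_TREES_PER_CELL = 4500    # cap with the slider all the way to the right
CELLS_IN_SCENE = 50          # typical scene

def trees_at_stop(percent, max_per_cell=MAX_TREES_PER_CELL, cells=CELLS_IN_SCENE):
    """Return (trees per cell, trees per scene) for one slider stop.

    Assumes the stop's percentage applies linearly to the per-cell cap.
    """
    per_cell = max_per_cell * percent // 100
    return per_cell, per_cell * cells

for percent in SLIDER_PERCENTAGES:
    per_cell, per_scene = trees_at_stop(percent)
    print(f"{percent:3d}% -> {per_cell} per cell, {per_scene} per scene")
```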


[1] 3 floating point numbers for position, 3 for normal, and 2 for texture coordinates: ((3 + 3 + 2) * 4 bytes) = 32 bytes.