Revisiting 64-bit-ness in Visual Studio and elsewhere

[Due to popular interest I also wrote a piece that is "pro" 64 bits here]

The topic of 64-bit Visual Studio came up again in a tweet and, as usual, I held my ground on why it is the way it is.  Pretty predictable.  But it’s not really possible to answer questions about your position in a tweet, hence this posting.

I’m going to make some generalizations to make a point and you should really not use those generalizations to make specific conclusions about specific situations.  This is as usual in the spirit of giving approximately correct advice rather than writing a novel.

Let’s say I convert some program to a 64-bit instruction set from a 32-bit instruction set.  Even without knowing anything about the program I can say with pretty good confidence that the most probable thing that will happen is that it will get bigger and slower.

“But Rico! More RAM better!  More bits better!”

In the immortal words of Sherman T. Potter: “Horse hucky!”

I’ve said this many times before: for the most part there is no space/speed trade-off.  Smaller IS Faster.  In fact, in a very real sense, Space is King.  Or, if you like, Bigger is Slower.  Part of the reason we study space/speed tradeoffs is that they are exotic beasts and it’s important to understand how it is that using more memory, inherently more expensive, can strangely give you a speedup, and under what conditions that speedup actually persists.

Let’s break it down to two cases:

1. Your code and data already fit into a 32-bit address space

Your pointers will get bigger; your alignment boundaries get bigger; your data is less dense; equivalent code is bigger.  You will fit less useful information into each cache line, code and data alike, and you will therefore take more cache misses.  Everything, but everything, will suffer.  Your processor's cache did not get bigger.  Even other programs on your system that have nothing to do with the code you’re running will suffer.  And you didn’t need the extra memory anyway.  So you got nothing.  Yay for speed-brakes.
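
To make that concrete, here’s a minimal sketch in C++; the Node type is made up and the 64-byte cache line is an assumption, but the arithmetic is the whole point:

```cpp
#include <cstdio>

// Hypothetical node type, purely for illustration; not from any real codebase.
struct Node {
    Node* next;   // 4 bytes in a 32-bit build, 8 bytes in a 64-bit build
    int   value;  // 4 bytes either way
};

int main() {
    // 32-bit build: sizeof(Node) == 8  -> 8 nodes per 64-byte cache line
    // 64-bit build: sizeof(Node) == 16 -> 4 nodes per 64-byte cache line
    //               (8-byte pointer + 4-byte int + 4 bytes of alignment padding)
    std::printf("sizeof(Node) = %zu\n", sizeof(Node));
    std::printf("nodes per 64-byte cache line = %zu\n", 64 / sizeof(Node));
    return 0;
}
```

Same data, half the cache density.  That’s where the slowdown comes from.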

2. Your code and data don’t fit into a 32-bit address space

So, you’re now out of address space.  There are two ways you could try to address this.

a) Think carefully about your data representation and encode it in a more compact fashion

b) Allow the program to just use more memory

I’m the performance guy so of course I’m going to recommend that first option. 

Why would I do this?

Because virtually invariably the reason that programs are running out of memory is that they have chosen a strategy that requires huge amounts of data to be resident in order for them to work properly.  Most of the time this is a fundamentally poor choice in the first place.  Remember, good locality gives you speed, and big data structures are slow.  They were slow even when they fit in memory, because less of them fits in cache.  They aren’t getting any faster by getting bigger; they’re getting slower.  Good data design includes affordances for the kinds of searches/updates that have to be done and makes it so that, in general, only a tiny fraction of the data actually needs to be resident to perform those operations.  This happens all the time in basically every scalable system you ever encounter.  Naturally I would want people to do this.

Note: This does NOT mean “store it in a file and read it all from there.”  It means “store *most* of it in a file and make it so that you don’t read the out-of-memory parts at all!”
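
Here’s a hedged sketch of that pattern; the SymbolRecord layout is invented for illustration and isn’t anything from VS or Excel.  The point is that you keep fixed-size records on disk and then seek to and read only the slice you actually need:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical fixed-size on-disk record; the fields are made up.
struct SymbolRecord {
    std::uint32_t nameOffset;  // offset into a separate string table
    std::uint32_t fileId;
    std::uint32_t line;
    std::uint32_t flags;
};

// Read just the records we need; the rest of the file never touches RAM.
std::vector<SymbolRecord> ReadRecords(std::FILE* f, long firstIndex, std::size_t count) {
    std::vector<SymbolRecord> out(count);
    std::fseek(f, firstIndex * (long)sizeof(SymbolRecord), SEEK_SET);
    std::size_t got = std::fread(out.data(), sizeof(SymbolRecord), count, f);
    out.resize(got);  // keep only what was actually read
    return out;
}
```

With a layout like that, the working set is whatever slice you asked for, not the whole database.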

This approach is better for customers; they can do more with less.  And it’s better for the overall scalability of whatever application is in question.  In 1989 the source browser database for Excel was about 24M.  The in-memory store for it was 12k.  The most I could justify on a 640k PC.  It was blazing fast because it had a great seek, read and cache story.

The big trouble with (b) is that Wirth’s Law, “software manages to outgrow hardware in size and sluggishness,” applies basically universally, and if you don’t push hard nothing ever gets better.  Even data that has no business being as big as it is will not be economized.  Remember, making it so that less data needs to be accessed to get the job done helps everyone in all cases, not just the big ones.

So what does this have to do with, say, Visual Studio? *

I wrote about converting VS to 64-bit in 2009 and I expect the reasons for not doing it then mostly still apply now.

Most of Visual Studio does not need and would not benefit from more than 4G of memory.  Any packages that really need that much memory could be built in their own 64-bit process and seamlessly integrated into VS without putting a tax on the rest.  This was possible in VS 2008, maybe earlier.  Dragging all of VS kicking and screaming into the 64-bit world just doesn’t make a lot of sense. **

Now if you have a package that needs >4G of data *and* you also have a data access model that requires a super chatty interface to that data going on at all times, such that, say, SendMessage isn’t going to do the job for you, then I think maybe rethinking your storage model could provide huge benefits.
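
For the cases where a message-based interface *is* enough, a rough sketch of the out-of-process idea might look like this.  The helper’s window class name and request format are invented for illustration, and WM_COPYDATA is just one of several ways to get bytes across the 32/64-bit process boundary:

```cpp
#include <windows.h>
#include <string>

// Hedged sketch: a 32-bit host asking a hypothetical 64-bit helper process to
// do a lookup on its behalf.  "MyHelper64WindowClass" is a made-up name.
bool QueryHelper64(const std::string& request) {
    HWND helper = FindWindowA("MyHelper64WindowClass", nullptr);
    if (!helper) return false;

    COPYDATASTRUCT cds = {};
    cds.dwData = 1;                          // app-defined request id
    cds.cbData = (DWORD)request.size() + 1;  // include the terminator
    cds.lpData = (PVOID)request.c_str();

    // SendMessage blocks until the helper handles the message; fine for the
    // occasional query, not for a super chatty interface.
    return SendMessageA(helper, WM_COPYDATA, 0, (LPARAM)&cds) != 0;
}
```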

In the VS space there are huge offenders.  My favorites to complain about are the language services, which notoriously load huge amounts of data about my whole solution so as to provide Intellisense about a tiny fraction of it.  That doesn’t seem to have changed since 2010.  I used to admonish people in the VS org to think about solutions with, say, 10k projects (which exist) or 50k files (which exist) and consider how the system was supposed to work in the face of that.  Loading it all into RAM seems not very appropriate to me.  But if you really, no kidding around, have storage that can’t be economized and must be resident, then put it in a 64-bit package that’s out of process.

That’s your best bet anyway.  But really, the likelihood that anyone will have enough RAM for those huge solutions even on a huge system is pretty low.  The all-RAM plan doesn’t scale well…  And you can forget about cache locality.

There are other problems with going 64-bit, too: the law of unintended consequences.  There’s no upper limit on the amount of memory you can leak.  Any badly behaved extension can use crazy amounts of memory, to the point where your whole system is unusable. ***

But, in general, using less memory is always better advice than using more.  Creating data structures with great density and locality is always better than “my representation is an n-way tree with pointers to everything everywhere.”

My admonition for many years has been this: think about how you would store your data if it were in a relational database.  Then do slices of that in RAM.  Chances are you’ll end up in a much better place than the forest of pointers you would have used had you gone with the usual practice.  Fewer pointers, more values.
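
A minimal sketch of what I mean, with made-up names: rows live in flat tables and are joined by integer ids, instead of a web of heap objects pointing at each other.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// The pointer-forest alternative would be something like:
//   struct Symbol { Symbol* parent; std::vector<Symbol*> children; std::string name; ... };

// Table style: small, dense rows joined by integer ids instead of pointers.
struct SymbolRow {
    std::uint32_t nameId;    // row in the string table below
    std::uint32_t fileId;    // row in some file table
    std::uint32_t parentId;  // row in this same table; 0 means "no parent"
    std::uint32_t line;
};

struct SymbolTable {
    std::vector<SymbolRow>   rows;   // dense, cache-friendly, easy to load in slices
    std::vector<std::string> names;  // string table indexed by nameId

    const SymbolRow& parentOf(std::uint32_t id) const { return rows[rows[id].parentId]; }
};
```

The rows serialize trivially, they page in and out in slices, and the ids mean the same thing whether the data is on disk or in RAM.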

This isn’t about not wanting a great experience for customers; nothing could be further from the truth.  It’s about advocating excellence in engineering rather than just rubber-stamping growth.  This is basically my “brand.”

* I don't work on Visual Studio anymore; don't read this as any indication of future plans or lack of plans, because I literally have no idea.

** There are significant security benefits to going 64-bit due to address space randomization, and you do get some code savings because you don’t need the WOW subsystem, but VS is so big compared to those libraries that this doesn’t really help much.  It was a big factor for MS Edge, though.

*** Also happens in MS Edge