Architecture by public community discussions?

Here at Twitter HQ, we're not blind to the flurry of discussion over the past weeks about our architecture. For many of our technically-minded users, Twitter downtime is an opportunity to muse about what the source of our problems might be, and to propose creative solutions. I sympathize, as I clearly find our problems interesting enough to work on them every day.
Part of the impetus for this public discussion extends from the sense that Twitter isn't addressing our architectural flaws. When users see downtime, slowness, and instability of the sort that we've exhibited this week, they assume that our engineering progress must be stagnant. With the Twitter team working on these issues on and off for over a year, surely downtime should be a thing of the past by now, right? Shouldn't we be able to just "throw more machines at it"?
To both rhetorical questions, the answer is "not quite yet". We've made progress, and we're more scalable than we were a year ago, but we're not yet reliably horizontally scalable. Why? Because there are significant portions of our system that need to be rewritten to meet that goal.
Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. Over the last year and a half we've tried to make our system behave like a messaging system as much as possible, but that's introduced a great deal of complexity and unpredictability. When we're in crisis mode, adding more instrumentation to help us navigate the web of interdependencies in our current architecture is often our primary recourse. This is, clearly, not optimal.
Our direction going forward is to replace our existing system, component-by-component, with parts that are designed from the ground up to meet the requirements that have emerged as Twitter has grown. First and foremost amongst those requirements is stability. We're planning for a gradual transition; our existing system will be maintained while new parts are built, and old parts swapped out for new as they're completed. The alternative - scrapping everything for "the big rewrite" - is untenable, particularly given our small (but growing!) engineering and operations team.
We keep an eye on the public discussions about what our architecture should be. Our favorite post from the community is by someone who's actually tried to build a service similar to Twitter. Many of the best practices in scalability are inapplicable to the peculiar problem space of social messaging. Many off-the-shelf technologies that seem like intuitive fits do not, on closer inspection, meet our needs. We appreciate the creativity that the technical community has offered up in thinking about our issues, but our issues won't be resolved in an afternoon's blogging.
We'd like people to know that we're motivated by the community discussion around our architecture. We're immersed in ideas about improving our system, and we have a clear direction forward that takes into account many of the bright suggestions that have emerged from the community.
To those taking the time to blog about our architecture, I encourage you to check out our jobs page. If you want to make Twitter better, there's no more direct way than getting involved in our engineering efforts. We love kicking around ideas, but code speaks louder than words.

Tags van Technorati: EA