…So Isn’t the Root Database a Bottleneck for Federations in SQL Azure?
Dan posted a great comment to my previous blog post and I think the question he is raising is worth a blog post! Thanks for the question Dan!
Here is Dan’s question:
Dan - Tue, Dec 13 2011 11:57 AM
Cihan, thanks for keeping us all posted all the way until the D-day the 12th.
Having followed your posts and some Azure documentation (so far limited), there is 1 simple but key question that keeps bugging me, my team and quite likely many other developers - so far I have not found any reasonable explanation and I fail to understand on HOW and WHY having ALL(!) federation member requests go ALWAYS(!) through ONE(!) Federation Root DB would eliminate the SQL performance bottleneck problem that we are trying to get away with???
This is a common confusion I think we create with the root database but root, besides holding onto some metadata, really does not do the processing of transactions when it comes to federations. In fact, unless you have specific processing requests that happen in the root, root stays pretty idle. This is the magic of the USE FEDERATION statement.
Before I go into the detail of USE FEDERATION, let me dive into some architectural details of SQL Azure. SQL Azure physically has a separate layer in front of the database nodes called the services layer (a.k.a the gateway). Services layer owns managing connectivity, translation of logical to physical server names, some TSQL processing and a few other services. Services layer does some complicated TSQL processing; In fact processing of USE FEDERATION statement and all other CREATE/ALTER/DROP FEDERATION statements are done by the services layer not the platform layer (or the database nodes) in SQL Azure. Services layer is a scaled out tier and SQL Azure configuration controls the number of gateway nodes per cluster in SQL Azure. You can see the detailed picture of the layers below. You can find more details on SQL Azure architecture here.
Connections from your applications are distributed over to the services layer nodes through a load balancer. When a connection from a client application comes in, one of the services layer nodes handle the connection, translate which physical server now contains the logical server and database name in the connection string and establishes the connection to the database node in the platform layer.
Ok, so lets go back to federations and how federation do their processing: With federations, applications use the name of the root database in their connection string. At this point, a connection from the app to the services layer node is established (white arrows) and a connection from the services layer node to the root database node is established or if a pooled connection already exists to the root database it is reused (blue arrow).
In federation model, before you do transactional processing we require that you issue a routing statement called USE FEDERATION to work with your scaled-out schema. USE FEDERATION takes a federation name, a federation key value and some attributes. When USE FEDERATION is received by the services layer, the connection from the app to the services layer stay intact but the connection from the services layer to the database node is switched over to the federation member that contain the federation key value provided in the USE FEDERATION statement. So if you execute;
“USE FEDERATION orders_federation(tenant_id=155) …”
Connection from the application to Node#1 in services layer stays intact but the the services layer switches the connection from the root database to the federation member that contain tenant_id value 155 (white arrow from gateway node#1 to orders_federation member database node)
After this point all transaction processing, aside from a few TSQL statement gateway nodes process, happen directly on the federation member node. All your stored procedure executions to queries directly work with the node that contain the federation member which contains tenant_id 155. Each federation member and root database is spread over to various nodes in the platform layer so as more processing hit other federation members that contain tenant_id 655 or 1055, other nodes in the platform layer is engaged in processing those transactions.
At this point you may say, fine the transactional processing is decentralized but how about processing of USE FEDERATION itself? The information in the root database is the only way to figure out which federation member has tenant_id=155 or 655 or 1055? SQL Azure services layer is smart about that as well: For processing this connection routing, gateway nodes do caching of federation metadata and pooling of connections to process the USE FEDERATION command itself. That means while the cache and pool is getting built the root database receive queries asking for the federation map but after things get warmed up, services layer nodes do the heavy lifting.
So with the magic of USE FEDERATION statement both the processing of connection routing and the transaction processing in federation is decentralized! that gives you the ability to do processing much beyond capacity limits of a single node!
Hope this helps explain your question Dan!