Improving Performance by Batching Azure Table Storage Inserts

June 7, 2012 update: The Microsoft Windows Azure team has released a new Windows Azure SDK for PHP. This release is part of an effort to keep PHP client libraries up to date with new Windows Azure features and to make PHP a first-class citizen in Windows Azure. The latest client libraries are on GitHub: https://github.com/WindowsAzure/azure-sdk-for-php. While the SDK hosted on CodePlex will continue to work for the foreseeable future, it is strongly recommended that new PHP/Windows Azure applications use the SDK hosted on GitHub.

The work done by Maarten Balliauw and other contributors in building the SDK hosted on CodePlex was critical in unifying the PHP developer experience for Windows Azure. The Windows Azure team is grateful to these contributors for their pioneering work and looks forward to their continued support (and yours!) in adding to the new SDK on GitHub.

Thanks,

The Windows Azure Team


This is a short post to share the results of a little investigation that was inspired by a comment on a post I wrote about using SQL Azure for handling session data. The commenter reported that SQL Azure seemed to be faster than Azure Table Storage for handling session data. My experiments show that SQL Azure and Table Storage have very similar performance for single writes (YMMV), so I can’t verify or refute that claim. However, it got me wondering which is faster for inserting and retrieving many “rows” of data. I knew that Table Storage was supposed to be faster, but I wondered how much faster. So I wrote a two-part PHP script. The first part does the following:

  1. Connects to SQL Azure.
  2. Inserts 100 rows into a table in an existing database.
  3. Retrieves the 100 rows.

Here’s the code:

 // Connect to SQL Azure using the SQL Server Driver for PHP.
 // SQLAZURE_SERVER_ID, SQLAZURE_USER, SQLAZURE_PASSWORD, and SQLAZURE_DB
 // are constants defined earlier in the script.
 $conn = sqlsrv_connect(SQLAZURE_SERVER_ID.".database.windows.net,1433",
                        array("UID" => SQLAZURE_USER."@".SQLAZURE_SERVER_ID,
                              "PWD" => SQLAZURE_PASSWORD,
                              "Database" => SQLAZURE_DB,
                              "ReturnDatesAsStrings" => true));
 if($conn === false)
     die(print_r(sqlsrv_errors()));

 // Insert 100 rows, one parameterized INSERT per row.
 for($i = 0; $i < 100; $i++)
 {
     $id = $i;
     $data = "GolferMessage".$i;

     $params = array($id, $data);
     $stmt1 = sqlsrv_query($conn, "INSERT INTO Table_1 (id, data) VALUES (?,?)", $params);
     if($stmt1 === false)
         die(print_r(sqlsrv_errors()));
 }

 // Retrieve the 100 rows. The loop body is intentionally empty;
 // we only care about how long fetching all of the rows takes.
 $stmt2 = sqlsrv_query($conn, "SELECT id, data, timestamp FROM Table_1");
 while($row = sqlsrv_fetch_array($stmt2, SQLSRV_FETCH_ASSOC))
 {
 }

Note: The code above uses the SQL Server Driver for PHP to connect to SQL Azure.
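
The script assumes that Table_1 already exists (the attached script handles the actual setup and teardown). One possible schema, shown here only as an illustration, could be created like this. Note that SQL Azure requires a clustered index on every table:

 // Illustrative setup only -- the column names match the INSERT/SELECT above,
 // but the actual schema lives in the attached script.
 $create = "CREATE TABLE Table_1 (
                id INT NOT NULL,
                data NVARCHAR(100),
                timestamp DATETIME DEFAULT GETDATE(),
                CONSTRAINT PK_Table_1 PRIMARY KEY CLUSTERED (id))";
 if(sqlsrv_query($conn, $create) === false)
     die(print_r(sqlsrv_errors()));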

The second part of the script does the equivalent for Table Storage:

  1. Connects to Azure Storage.
  2. Inserts 100 entities into an existing table.
  3. Retrieves the 100 entities.

Here’s the code:

  
 // Start a batch so that the 100 inserts below are queued up and sent
 // to Table Storage when the batch is committed.
 $batch = $tableStorageClient->startBatch();
 for($i = 0; $i < 100; $i++)
 {
     $name = $i;
     $message = "GolferMessage".$i;

     $mbEntry = new MessageBoardEntry();
     $mbEntry->golferName = $name;
     $mbEntry->golferMessage = $message;
     // Because a batch is active, this insert is queued rather than
     // written immediately.
     $tableStorageClient->insertEntity('MessageBoardEntry', $mbEntry);
 }
 // Send the queued inserts.
 $batch->commit();

 // Retrieve all entities from the MessageBoardEntry table. The loop body is
 // intentionally empty; we only care about how long retrieval takes.
 $messages = $tableStorageClient->retrieveEntities("MessageBoardEntry", null, "MessageBoardEntry");

 foreach($messages as $message)
 {
 }

Note: The code above uses the Windows Azure SDK for PHP to connect to Azure Storage.
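
The snippet above assumes that a MessageBoardEntry class and a $tableStorageClient were defined earlier in the script. With the CodePlex SDK, that setup would look roughly like the following sketch; the property names, partition key/row key choice, and the AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY constants are illustrative, not part of the SDK:

 // Rough sketch of the setup the snippet above relies on.
 require_once 'Microsoft/WindowsAzure/Storage/Table.php';

 class MessageBoardEntry extends Microsoft_WindowsAzure_Storage_TableEntity
 {
     /** @azure golferName */
     public $golferName;

     /** @azure golferMessage */
     public $golferMessage;

     // Every table entity needs a partition key and a row key; these
     // particular values are chosen arbitrarily for illustration.
     public function __construct()
     {
         parent::__construct('messages', uniqid());
     }
 }

 // AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY are placeholders for your
 // storage account name and access key.
 $tableStorageClient = new Microsoft_WindowsAzure_Storage_Table(
     'table.core.windows.net', AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_KEY);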

The result of the test was that Table Storage was consistently 4 to 5 times faster than SQL Azure (again, YMMV). The key, however, was to use the $tableStorageClient->startBatch() and $batch->commit() methods with Table Storage. Without batching, Table Storage opens and closes a new HTTP connection for each write, which makes it slower than SQL Azure (which keeps a connection open across writes). With batching, the connection is kept open for all of the writes.
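
If you want to time the two parts yourself, wrapping each one with microtime() is enough; here's a minimal sketch (the echo label is just for illustration):

 // Minimal timing wrapper: run it around the SQL Azure section and again
 // around the Table Storage section, then compare the two results.
 $start = microtime(true);

 // ... inserts and retrieval for one of the two approaches go here ...

 $elapsed = microtime(true) - $start;
 echo "Elapsed: ".round($elapsed, 3)." seconds\n";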

Note: Many thanks to Maarten Balliauw, who suggested batching when I was perplexed by my initial results. I expected Table Storage to be faster, but because I didn’t know about batches for Table Storage, I wasn’t getting the results I expected.

The complete script (with setup/teardown of the database and table) is attached in case you want to try it for yourself.

Thanks.

-Brian
