An Auto-Scaling Module for PHP Applications in Windows Azure

June 7, 2012 update: The Microsoft Windows Azure team has released a new Windows Azure SDK for PHP. This release is part of an effort to keep PHP client libraries up to date with new Windows Azure features and to make PHP a first-class citizen in Windows Azure. The latest client libraries are on GitHub: https://github.com/WindowsAzure/azure-sdk-for-php. While the SDK hosted on CodePlex will continue to work for the foreseeable future, it is strongly recommended that new PHP/Windows Azure applications use the SDK hosted on GitHub.

The work done by Maarten Balliauw and other contributors in building the SDK hosted on CodePlex was critical in unifying the PHP developer experience for Windows Azure. The Windows Azure team is grateful to these contributors for their pioneering work and looks forward to their continued support (and yours!) in adding to the new SDK on GitHub.

Thanks,

      
The Windows Azure Team


One of the core value propositions of Windows Azure is the ability to have automatic, elastic scalability for your applications (i.e. automatically increase or decrease the number of instances on which your application is running based on some criteria). In this post, I’ll introduce you to a customizable PHP module that automatically scales an application based on performance counters. The source code for the module is available on GitHub here: https://github.com/brian-swan/PHP-Auto-Scaler.

This module is still very much a proof-of-concept (but I’ve tested it and it works!). You can deploy this module in a Worker role and watch your application scale based on its performance. Of course, you can customize the logic for exactly when or how it scales. I can think of several improvements that can (and should) be made in order for this to be a production-ready tool. I hope that people who are interested will fork the code and customize it as they see fit. When it makes sense, I hope people will submit their code (or accept code) so that this module becomes widely usable.

Note: Much of the code in this module is based on a 4-part series on scaling PHP applications that starts here: Scaling PHP Applications on Windows Azure, Part 1. I recommend reading through that series if you want to fill in some of the details about scaling that I’m omitting here.

Overview

This auto-scaling module is designed to work with any application running on any number of Windows Azure Web role instances. You simply need to configure it and deploy it in a Worker role as part of your application. Here’s how it works:

  1. You deploy your application to Windows Azure with Windows Azure Diagnostics turned on and the auto-scaling module running in a Worker role. (For more about diagnostics, see How to Get Diagnostics Info for Azure/PHP Applications–Part 1 and How to Get Diagnostics Info for Azure/PHP Applications–Part 2.)
  2. The Windows Azure Diagnostics Monitor writes diagnostics data (such as CPU usage, available memory, etc.) at regular intervals (configured by you) to your storage account (Table storage, to be specific).
  3. The worker role periodically reads data from your storage account and crunches the numbers. The module includes some default “crunching” logic (details below), but you will undoubtedly want to customize this logic. (For more about worker roles, see Support for Worker Roles in the Windows Azure SDK for PHP.)
  4. Based on the number crunching, the Worker role uses the Windows Azure Management API to increase or decrease (or leave the same) the number of instances on which your application is running.
  5. Steps 2-4 are automatically repeated.

Step-by-Step

Here, I’ll walk you through the steps for using the auto-scaling module while also explaining a bit about how it works.

1. Prerequisites

In order to use the scaling module, you’ll need to take care of a few things first:

  1. Create a Windows Azure subscription. Make note of your subscription ID.
  2. Create a storage account. Make note of your storage account name and private key.
  3. Create a hosted service. Make note of the URL prefix for your service.
  4. Create your PHP application.

2. Configure the scaling module

In the storageConfig.php file of the scaling module, you will need to define several constants:

 define('SUBSCRIPTION_ID', 'your subscription id');
 define('STORAGE_ACCOUNT_NAME', 'your_storage_account_name');
 define('STORAGE_ACCOUNT_KEY', 'your_storage_account_key');
 define('DNS_PREFIX', 'the dns prefix for your hosted service');
 define('PROD_SITE', true);
 define('EXCEPTION_TABLE', 'ExceptionEntry');
 define('STATUS_TABLE', 'StatusTable');
 define('ROLE_NAME', 'your web role name');
 define('DEPLOYMENT_SLOT', 'production or staging');
 define('MIN_INSTANCES', 2);
 define('MAX_INSTANCES', 20);
 define('AVERAGE_INTERVAL', '-15 minutes');
 define('COLLECTION_FREQUENCY', 60); // in seconds
 $certificate = 'your_cert_name.pem';

  • SUBSCRIPTION_ID, STORAGE_ACCOUNT_NAME, STORAGE_ACCOUNT_KEY, and DNS_PREFIX should all come from your notes in step 1.
  • PROD_SITE should be set to true if your application will be run in the cloud. If you are testing the application locally, it should be set to false.
  • EXCEPTION_TABLE and STATUS_TABLE are the names of tables in your storage account that data will be written to. Do not change these.
  • ROLE_NAME is the name of your Web role. (More on this later.)
  • DEPLOYMENT_SLOT is production or staging depending on whether you are deploying to the staging or production slot of your hosted service.
  • MIN_INSTANCES and MAX_INSTANCES are the minimum and maximum number of instances on which you want your application running.
  • AVERAGE_INTERVAL is the period over which you want performance counters averaged. The default value is "-15 minutes", so when the module reads data from your storage account, it averages the data from the previous 15 minutes.
  • COLLECTION_FREQUENCY determines how often the module collects performance data from your storage account. The default is 60 seconds.
  • $certificate is the name of your .pem certificate, which is necessary when using the Windows Azure Management API. Your certificate needs to be included in the root directory of your Worker role. You can learn more about what you need to do in the Creating and uploading a service management API certificate section of this tutorial: Overview of Command Line Deployment and Management with the Windows Azure SDK for PHP.
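
Because a mistyped setting only shows up once the Worker role is running, it can help to check the values before deploying. The following is a minimal sketch of such a check, not part of the module: the function name example_validate_config is hypothetical, and it takes the settings as an array (rather than reading the constants) so that it can be tried on its own. The rule that MIN_INSTANCES must be at least 1 is my assumption.

```php
<?php
// Hypothetical sanity check (not part of the module). Takes the settings as
// an array so it can be tried without defining the real constants.
function example_validate_config($config) {
    $problems = array();
    if (!in_array($config['DEPLOYMENT_SLOT'], array('production', 'staging'), true)) {
        $problems[] = 'DEPLOYMENT_SLOT must be "production" or "staging".';
    }
    // Assumption: at least one instance should always be running.
    if ($config['MIN_INSTANCES'] < 1) {
        $problems[] = 'MIN_INSTANCES must be at least 1.';
    }
    if ($config['MAX_INSTANCES'] < $config['MIN_INSTANCES']) {
        $problems[] = 'MAX_INSTANCES must not be smaller than MIN_INSTANCES.';
    }
    if ($config['COLLECTION_FREQUENCY'] <= 0) {
        $problems[] = 'COLLECTION_FREQUENCY must be a positive number of seconds.';
    }
    return $problems; // an empty array means no problems were found
}

// Example: the default values from the listing above pass the check.
$problems = example_validate_config(array(
    'DEPLOYMENT_SLOT' => 'production',
    'MIN_INSTANCES' => 2,
    'MAX_INSTANCES' => 20,
    'COLLECTION_FREQUENCY' => 60,
));
```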

3. Customize the scaling logic

Two functions, get_metrics and scale_check, in the scaling_functions.php file do the work of getting performance data from your storage account and determining what action should be taken based on the data. You will probably want to customize the logic in these functions.

The get_metrics function queries your storage account for entries going back in time 15 minutes (by default), and averages each of the performance counters you are watching (configured in the diagnostics.wadcfg file of your Web role):

 function get_metrics($deployment_id, $ago = "-15 minutes") {    
     $table = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net', STORAGE_ACCOUNT_NAME, STORAGE_ACCOUNT_KEY);     
     
     // get DateTime.Ticks in past 
     $ago = str_to_ticks($ago); 
     
     // build query 
     $filter = "PartitionKey gt '0$ago' and DeploymentId eq '$deployment_id'"; 
  
     // run query 
     $metrics = $table->retrieveEntities('WADPerformanceCountersTable', $filter);     
     
     $arr = array();    
     foreach ($metrics AS $m) {
         // Global totals 
         $arr['totals'][$m->countername]['count'] = (!isset($arr['totals'][$m->countername]['count'])) ? 1 : $arr['totals'][$m->countername]['count'] + 1;        
         $arr['totals'][$m->countername]['total'] = (!isset($arr['totals'][$m->countername]['total'])) ? $m->countervalue : $arr['totals'][$m->countername]['total'] + $m->countervalue;        
         $arr['totals'][$m->countername]['average'] = (!isset($arr['totals'][$m->countername]['average'])) ? $m->countervalue : $arr['totals'][$m->countername]['total'] / $arr['totals'][$m->countername]['count'];         
         
         // Totals by instance 
         $arr[$m->roleinstance][$m->countername]['count'] = (!isset($arr[$m->roleinstance][$m->countername]['count'])) ? 1 : $arr[$m->roleinstance][$m->countername]['count'] + 1;        
         $arr[$m->roleinstance][$m->countername]['total'] = (!isset($arr[$m->roleinstance][$m->countername]['total'])) ? $m->countervalue : $arr[$m->roleinstance][$m->countername]['total'] + $m->countervalue;        
         $arr[$m->roleinstance][$m->countername]['average'] = (!isset($arr[$m->roleinstance][$m->countername]['average'])) ? $m->countervalue : ($arr[$m->roleinstance][$m->countername]['total'] / $arr[$m->roleinstance][$m->countername]['count']);    
     }    
     return $arr;
 }

If you want to collect and average other metrics, you may want to change the logic of this function.
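
The str_to_ticks helper that get_metrics calls is not shown in this post. As a rough idea of what such a conversion involves, here is a sketch under my own assumptions: the name example_str_to_ticks is hypothetical, it relies on PHP's strtotime to resolve the relative string, and it needs 64-bit PHP because tick values do not fit in 32 bits. The module's actual helper may differ.

```php
<?php
// Hypothetical stand-in for the module's str_to_ticks helper, which is not
// shown in this post. Assumes 64-bit PHP, since tick values exceed 32 bits.
function example_str_to_ticks($relative, $now = null) {
    if ($now === null) {
        $now = time();
    }
    // Resolve a relative string such as "-15 minutes" to a Unix timestamp.
    $timestamp = strtotime($relative, $now);
    if ($timestamp === false) {
        throw new InvalidArgumentException("Cannot parse time string: $relative");
    }
    // .NET ticks are 100-nanosecond intervals counted from 0001-01-01 UTC.
    // The Unix epoch (1970-01-01 UTC) falls at tick 621355968000000000.
    $ticks = $timestamp * 10000000 + 621355968000000000;
    return (string) $ticks;
}

// Example: build the same kind of filter that get_metrics uses.
$ago = example_str_to_ticks('-15 minutes');
$filter = "PartitionKey gt '0$ago'";
```

The second parameter exists only so the conversion can be checked against a fixed point in time instead of the current clock.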

The scale_check function takes the output of the get_metrics function and returns 1, 0, or -1 depending on whether an instance needs to be added, the instance count needs no adjustment, or an instance needs to be removed. The logic is simplistic (as you can see in the function): the counters are checked in order, and the function returns as soon as one of them crosses a threshold, so CPU usage takes precedence over available memory, and available memory over TCP connections. You will probably want to adjust the logic to suit your application. (Note that by default the scaling logic is based on 3 performance counters: percent CPU usage, available memory, and the number of TCPv4 connections.)

 function scale_check($metrics) {
     $percent_proc_usage = (isset($metrics['totals']['\Processor(_Total)\% Processor Time']['average'])) ? $metrics['totals']['\Processor(_Total)\% Processor Time']['average'] : null;
     $available_MB_memory = (isset($metrics['totals']['\Memory\Available Mbytes']['average'])) ? $metrics['totals']['\Memory\Available Mbytes']['average'] : null;
     $number_TCPv4_connections = (isset($metrics['totals']['\TCPv4\Connections Established']['average'])) ? $metrics['totals']['\TCPv4\Connections Established']['average'] : null;
     
     if(!is_null($percent_proc_usage)) {
         if( $percent_proc_usage > 75 )
             return 1;
         else if( $percent_proc_usage < 25)
             return -1;
     }
  
     if(!is_null($available_MB_memory)) {
         if( $available_MB_memory < 25 )
             return 1;
         else if( $available_MB_memory > 1000)
             return -1;
     }
     
     if(!is_null($number_TCPv4_connections)) {
         if( $number_TCPv4_connections > 120 )
             return 1;
         else if( $number_TCPv4_connections < 20)
             return -1;
     }
     
     return 0;
 }
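
As one example of a customization, the sketch below combines the counters instead of stopping at the first one that crosses a threshold: it adds an instance if any counter shows pressure, and removes one only if every counter that has data looks idle. The function name example_scale_check_combined is hypothetical, the thresholds are copied from the default scale_check, and the combining rule is just one possible choice.

```php
<?php
// Hypothetical alternative to scale_check (not part of the module).
function example_scale_check_combined($metrics) {
    // array(counter name, scale-out threshold, scale-in threshold).
    // For available memory, LOW values mean pressure, so its thresholds
    // run in the opposite direction to the other two counters.
    $rules = array(
        array('\Processor(_Total)\% Processor Time', 75, 25),
        array('\Memory\Available Mbytes', 25, 1000),
        array('\TCPv4\Connections Established', 120, 20),
    );
    $seen = 0; // counters that actually have data
    $idle = 0; // counters whose average suggests removing an instance
    foreach ($rules as $rule) {
        list($name, $out, $in) = $rule;
        if (!isset($metrics['totals'][$name]['average'])) {
            continue;
        }
        $value = $metrics['totals'][$name]['average'];
        $seen++;
        // If the scale-out threshold is the larger one, high values mean load.
        $high_is_load = ($out > $in);
        if ($high_is_load ? ($value > $out) : ($value < $out)) {
            return 1; // any counter under pressure: add an instance
        }
        if ($high_is_load ? ($value < $in) : ($value > $in)) {
            $idle++;
        }
    }
    // Remove an instance only if every counter with data looks idle.
    return ($seen > 0 && $idle === $seen) ? -1 : 0;
}

// Example: CPU is nearly idle, but memory is almost exhausted.
$decision = example_scale_check_combined(array('totals' => array(
    '\Processor(_Total)\% Processor Time' => array('average' => 10),
    '\Memory\Available Mbytes' => array('average' => 10),
)));
```

With these sample averages the combined check asks for another instance, whereas the default scale_check would return -1 because it looks at CPU usage first.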

4. Package and deploy your application

Now you are ready to package and deploy your application. Instructions for doing so are in this blog post: Support for Worker Roles in the Windows Azure SDK for PHP. However, there are a couple of things you’ll need to know that aren’t included in that post:

  • When you run the default scaffolder and specify a name for your Web role, that name will be the value of the ROLE_NAME constant in step 2 above. That is, when you run this command…
 scaffolder run -out="c:\path\to\output\directory" -web="WebRoleName" -workers="WorkerRoleName"

…WebRoleName will be the value of ROLE_NAME.

  • After you have run the default scaffolder and you have added your application source code to your Web role directory, you need to configure Windows Azure Diagnostics. Instructions for doing this are here: How to Get Diagnostics Info for Azure/PHP Applications – Part 1. As is shown in that post, the diagnostic information that the auto-scaling module relies on (by default) consists of three performance counters: CPU usage, available memory, and TCPv4 connections. You can collect different diagnostic information, but to leverage it for scaling decisions you will have to adjust the scaling logic in the scaling module (see step 3 above).
  • In the Worker role directory, open the run.bat file and change its contents to php scale_module.php. This tells the Worker role to run scale_module.php on startup.
  • You will need to include the Windows Azure SDK for PHP library with your Worker role.

The Main Loop

Boiled down, the main loop (in the scale_module.php file) looks like this:

 while(1) {
     
     //Calls to management API here
     //based on metrics.
         
     sleep(COLLECTION_FREQUENCY); 
 }

It essentially repeatedly calls get_metrics and scale_check and makes calls to the management API (based on the return value of scale_check) to increase or decrease the number of instances. To avoid doing this too often, it pauses for COLLECTION_FREQUENCY seconds in each loop.
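
To make the flow of a single pass concrete, here is a minimal sketch with the dependencies passed in as callables. Everything in it is hypothetical: example_scale_once is not a function in the module, the three callables stand in for get_metrics, scale_check, and the management API call that changes the instance count, and clamping the result to the configured minimum and maximum is my assumption about how MIN_INSTANCES and MAX_INSTANCES are applied.

```php
<?php
// Hypothetical sketch of one pass through the main loop. The three callables
// stand in for get_metrics, scale_check, and the management API call that
// changes the instance count; none of this is the module's actual code.
function example_scale_once($get_metrics, $scale_check, $set_instance_count,
                            $current, $min, $max) {
    $metrics = call_user_func($get_metrics);
    $decision = call_user_func($scale_check, $metrics); // 1, 0, or -1
    // Keep the new count within the configured bounds.
    $target = max($min, min($max, $current + $decision));
    if ($target !== $current) {
        call_user_func($set_instance_count, $target);
    }
    return $target;
}

// Usage with stubbed dependencies: the stubbed check asks for one more instance.
$calls = array();
$new_count = example_scale_once(
    function () { return array('totals' => array()); },
    function ($metrics) { return 1; },
    function ($count) use (&$calls) { $calls[] = $count; },
    2, 2, 20
);
```

Passing the dependencies in keeps the decision logic testable without a storage account or a deployed service.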

After the number of instances has been adjusted based on metrics, it takes some time for new instances to be up and running. Another loop checks to make sure all instances are in the ready state before checking performance metrics again:

 while($ready_count != $instance_count) {
     $ready_count = 0;
     foreach($deployment->roleinstancelist as $instance) {
         if ($instance['rolename'] == ROLE_NAME && $instance['instancestatus'] == 'Ready')
             $ready_count++;
     }
     sleep(10); // Avoid being too chatty.
     $scale_status = new ScaleStatus();
     $scale_status->statusmessage = "Checking instances. Ready count = " . $ready_count . ". Instance count = " . $instance_count;
     $scale_status->send();
 }
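
The loop above keeps polling until the counts match. If you want it to give up after a while (for example, when an instance never reaches the ready state), a bounded version might look like the sketch below. The function names and the $fetch_instances callable are hypothetical; the callable stands in for re-reading the deployment's role instance list through the management API on each poll, and the array keys are the ones used in the snippet above.

```php
<?php
// Hypothetical sketch (not the module's code): count the ready instances
// of one role in a list shaped like $deployment->roleinstancelist.
function example_count_ready($instances, $role_name) {
    $ready = 0;
    foreach ($instances as $instance) {
        if ($instance['rolename'] == $role_name && $instance['instancestatus'] == 'Ready') {
            $ready++;
        }
    }
    return $ready;
}

// Poll until $expected instances are ready, or give up after $max_polls polls.
function example_wait_until_ready($fetch_instances, $role_name, $expected,
                                  $max_polls, $pause_seconds) {
    for ($poll = 0; $poll < $max_polls; $poll++) {
        // Fetch a fresh instance list on every poll.
        $instances = call_user_func($fetch_instances);
        if (example_count_ready($instances, $role_name) == $expected) {
            return true;
        }
        sleep($pause_seconds); // avoid being too chatty
    }
    return false; // still not ready after $max_polls polls
}
```

A caller could log a status message and skip the current scaling pass when the function returns false.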

I won’t post the rest of the main loop here (you can see it on GitHub here: https://github.com/brian-swan/PHP-Auto-Scaler/blob/master/scale_module.php), but I will point out two things about the code as it is now:

  1. You’ll see several calls that write to a table called ScaleStatus (as in the snippet above). These are there so I could see (in a table) what my scaling module was doing. I think these calls can be removed at some point in the future.
  2. You’ll see try…catch blocks around some of my code. In some cases I found that an exception was thrown when a timeout occurred. This information is currently written to table storage. Clearly, this needs to be improved.
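
One possible improvement for the timeout case is to retry the failing call a few times before giving up. The sketch below is a generic retry wrapper with a growing delay between attempts; the name example_with_retries and its parameters are hypothetical, and it is not taken from the module.

```php
<?php
// Hypothetical retry wrapper (not part of the module). $operation is any
// callable that may throw, such as a storage or management API call.
function example_with_retries($operation, $max_attempts, $base_delay_seconds) {
    $attempt = 0;
    while (true) {
        $attempt++;
        try {
            return call_user_func($operation);
        } catch (Exception $e) {
            if ($attempt >= $max_attempts) {
                throw $e; // out of attempts: let the caller log or handle it
            }
            // Wait longer after each failure: base, 2x base, 4x base, ...
            sleep($base_delay_seconds * pow(2, $attempt - 1));
        }
    }
}
```

Only the final failure reaches the caller, so the existing code that writes exceptions to table storage would record persistent problems rather than every transient timeout.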

So, there it is…an auto-scaling module in PHP. As I said in the introduction, this is very much proof-of-concept code, but I’ve tested it and it works as I’d expect it to. Instances are added when I hammer my application with traffic and instances are removed when the traffic backs off. However, as I also pointed out earlier, I know there is lots of room to improve on this…I’m looking forward to input from others.

Thanks.

-Brian
