Azure Real World: Migrating a Drupal Site from LAMP to Windows Azure
Last month, the Interoperability team at Microsoft highlighted work done to move the Screen Actors Guild Awards Drupal website from a Linux-Apache-MySQL-PHP (LAMP) environment to the Windows Azure platform: SAG Awards Drupal Website Moves to Windows Azure. The move was the result of collaboration between SAG Awards engineers and engineers from Microsoft’s Interoperability Team and Customer Advisory Team (CAT). The move allowed the SAG Awards website to handle a sustained traffic spike during the SAG Awards show in January. Since then, I’ve had the opportunity to talk with some of the engineers who helped with the move. In this post I’ll describe the challenges and steps taken in moving the SAG Awards website from a LAMP environment to the Windows Azure platform.
Note: Rama Ramani, Senior Program Manager on the Customer Advisory Team at Microsoft, was co-author of this post.
The Screen Actors Guild (SAG) is the United States’ largest union representing working actors. In January of every year since 1995, SAG has hosted the Screen Actors Guild Awards (SAG Awards) to honor performers in motion pictures and TV series. In 2011, the SAG Awards Drupal website, deployed on a LAMP stack, was impacted by site outages and slow performance during peak-usage days, with SAG having to consistently upgrade their hardware to meet demand for those days. That upgraded hardware was then not optimally used during the rest of the year. In late 2011, SAG Awards engineers began working with Microsoft engineers to migrate its website to Windows Azure in anticipation of its 2012 show. In January of 2012, the SAG Website had over 350K unique visitors and 1.1M page views, with traffic spiking to over 160K visitors during the show.
Overview and Challenges
In many ways, the SAG Awards website was a perfect candidate for Windows Azure. The website has moderate traffic throughout most of the year, but has a sustained traffic spike shortly before, during, and after the awards show in January. The elastic scalability and fast storage services offered by the Azure platform were designed to handle this type of usage.
The main challenge that SAG Awards and Microsoft engineers faced in moving the SAG Awards website to Windows Azure was in architecting for a very high, sustained traffic spike while accommodating the need of SAG Awards administrators to frequently update media files during the awards show. Both intelligent use of Windows Azure Blob Storage and a custom module for invalidating cached pages when content was updated were key to delivering a positive user experience.
Note: In this post I will focus on how the Drupal website was moved to Windows Azure, as well as how content and data were moved to Windows Azure Blob Storage and SQL Azure. I won’t cover the details of the caching strategy.
The process for moving the SAG Awards website from a LAMP environment to the Windows Azure platform can be broken down into five high-level steps:
- Export data. A custom Drush command (portabledb-export) was used to create a database dump of MySQL data. A .zip archive of media files was created for later use.
- Install Drupal on Windows. The Drupal files that comprised the installation in the LAMP environment were copied to Windows Server/IIS as an initial step in discovering compatibility issues.
- Import data to SQL Azure. A custom Drush command (portabledb-import) was used together with the database dump created in step 1 to import data to SQL Azure.
- Copy media files to Azure Blob Storage. After unpacking the .zip archive in step 1, CloudXplorer was used to copy these files to Windows Azure Blob Storage.
- Package and deploy Drupal. The Azure packaging tool cspack was used to package Drupal for deployment. Deployment was done through the Windows Azure Portal.
Note: The portabledb commands mentioned above are authored and maintained by Damien Tournoud.
Details for each of these high-level steps are in the sections below.
Export Data
Microsoft and SAG engineers began investigating the best way to export MySQL data by looking at Damien Tournoud’s portabledb Drush commands. They found that this tool worked perfectly when moving Drupal to Windows and SQL Server, but they needed to make some modifications to the tool for exporting data to SQL Azure. (These modifications have since been incorporated into the portabledb commands, which are now available as part of the Windows Azure Integration Module.)
The names of media files stored in the file_managed table were of the form public://field/image/file_name.avi. In order for these files to be streamed from Windows Azure Blob Storage (as they would be by the Windows Azure Integration module when deployed in Azure), the file names needed to be modified to this form: azurepublic://field/image/file_name.avi. This was an easy change to make.
Because the SAG Awards website would be retrieving all data from the cloud, Windows Azure Storage connection information needed to be stored in the database. The portabledb tool was modified to create a new table, azure_storage, for containing this information.
Finally, to allow all media files to be retrieved from Blob Storage, Drupal’s file_default_scheme variable needed to be set to the stream wrapper name: azurepublic.
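To make the file-name change concrete, here is a minimal sketch of the scheme rewrite in PHP. It is not the portabledb implementation: the function name rewrite_stream_scheme is hypothetical, and the sketch only illustrates turning public:// URIs into azurepublic:// URIs while leaving other schemes alone.

```php
<?php
/**
 * Hypothetical helper (not part of portabledb): swap the stream wrapper
 * scheme of a Drupal file URI, leaving URIs with other schemes untouched.
 */
function rewrite_stream_scheme($uri, $from = 'public', $to = 'azurepublic') {
  $prefix = $from . '://';
  // Only rewrite when the URI starts with the source scheme.
  if (strncmp($uri, $prefix, strlen($prefix)) === 0) {
    return $to . '://' . substr($uri, strlen($prefix));
  }
  return $uri;
}

echo rewrite_stream_scheme('public://field/image/file_name.avi'), PHP_EOL;
```

Matching on the full "public://" prefix, rather than replacing the word "public" anywhere in the string, keeps a path such as public://public/logo.png from being rewritten twice.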
Using the modified portabledb tool, the following command produced the database dump:
drush portabledb-export --use-windows-azure-storage=true --windows-azure-stream-wrapper-name=azurepublic --windows-azure-storage-account-name=azure_storage_account_name --windows-azure-storage-account-key=azure_storage_account_key --windows-azure-blob-container-name=azure_blob_container_name --windows-azure-module-path=sites/all/modules --ctools-module-path=sites/all/modules > drupal.dump
Note that the portabledb-export command does not copy media files themselves. Instead, the local media files were compressed in a .zip archive for use in a later step.
Install Drupal on Windows
In order to use the portabledb-import command (the counterpart to the portabledb-export command above), a Drupal installation needed to be set up on Windows (with Drush for Windows installed). This was necessary, in part, because connectivity to SQL Azure was to be managed by the Commerce Guys’ SQL Server/SQL Azure module for Drupal, which relies on the SQL Server Drivers for PHP, a Windows-only PHP extension. Having a Windows installation of Drupal would also make it possible to package the application for deployment to Windows Azure. For this reason, Microsoft and SAG Awards engineers copied the Drupal files from the LAMP environment to a Windows Server machine. The team incrementally moved the rest of the application to an IIS/SQL Server Express stack before moving the backend to SQL Azure.
Note: The Windows Server machine was actually a virtual machine running in a Windows Azure Web Role in the same data center as SQL Azure. The Web Role was configured to allow RDP connections, which the team used to install and configure the SAG website installation. This was done to avoid timeouts that occurred when attempting to upload data from an on-premises machine to SQL Azure.
Several customizations were made to the Drupal installation before running the portabledb-import command:
- The SQL Server/SQL Azure module for Drupal was installed and enabled.
- The memcache module for Drupal was installed and enabled.
- The Windows Azure Integration module for Drupal was installed and enabled. Note that this module has a dependency on the CTools module. It also requires Damien Tournoud’s branch of the Windows Azure SDK for PHP, which must be unpacked and put into a folder called phpazure in the module’s main directory. (As of this writing, Damien Tournoud’s changes have not been merged with the Windows Azure SDK for PHP. However, they may be merged in the future.)
- Database connection information in the settings.php file was modified to connect to SQL Azure.
- A custom caching module was installed and enabled.
Some customizations to PHP were also necessary since this PHP installation would be packaged with the application itself:
- The php_pdo_sqlsrv.dll extension was installed and enabled. This extension provided connectivity to SQL Azure.
- The php_memcache.dll extension was installed and enabled. This would be used for caching purposes.
- The php_azure.dll extension was installed and enabled. This extension allowed configuration information to be retrieved from the Windows Azure service configuration file after the application was deployed. This allowed changes to be made without having to re-package and re-deploy the entire application. For example, database connection information could be retrieved in the settings.php file like this:
$databases['default']['default']['driver'] = 'sqlsrv';
$databases['default']['default']['username'] = azure_getconfig('sql_azure_username');
$databases['default']['default']['password'] = azure_getconfig('sql_azure_password');
$databases['default']['default']['host'] = azure_getconfig('sql_azure_host');
$databases['default']['default']['database'] = azure_getconfig('sql_azure_database');
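When the same settings.php also has to run outside Azure (for example, on the IIS/SQL Server Express stack used during the migration), azure_getconfig() is unavailable unless php_azure.dll is loaded. The sketch below shows one way to guard against that. It is an assumption for illustration, not what the SAG Awards team did: sag_config() is a hypothetical wrapper that falls back to an environment variable of the same name and then to a default value.

```php
<?php
/**
 * Hypothetical wrapper around azure_getconfig() (provided by php_azure.dll).
 * Falls back to an environment variable, then to $default, when the
 * extension is not loaded, e.g. on a local development machine.
 */
function sag_config($key, $default = NULL) {
  if (function_exists('azure_getconfig')) {
    return azure_getconfig($key);
  }
  $value = getenv($key);
  return ($value !== FALSE) ? $value : $default;
}

$databases['default']['default'] = array(
  'driver'   => 'sqlsrv',
  'username' => sag_config('sql_azure_username'),
  'password' => sag_config('sql_azure_password'),
  // 'localhost' is an assumed default for local testing only.
  'host'     => sag_config('sql_azure_host', 'localhost'),
  'database' => sag_config('sql_azure_database'),
);
```

Keeping the lookup in one function means the deployed site still reads every value from the service configuration file, so the benefit described above (changing settings without re-packaging) is preserved.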
With Drupal running on Windows, and with the customizations to Drupal and PHP outlined above, the importing of data could begin.
Import Data to SQL Azure
There were two phases to importing the SAG Awards website data: importing database data to SQL Azure and copying media files to Windows Azure Blob Storage. As alluded to above, importing data to SQL Azure was done with the portabledb-import Drush command. With SQL Azure connection information specified in Drupal’s settings.php file, the following command copied data from the drupal.dump file (which was copied to Drupal’s root directory on the Windows installation) to SQL Azure:
drush portabledb-import --delete-local-files=false --copy-files-blob-storage=false --use-production-storage=true drupal.dump
Note: The copy-files-blob-storage flag was set to false in the command above. While the portabledb-import command can copy media files to Blob Storage, Microsoft and SAG engineers had some work to do in modifying media file names (discussed in the next section). For this reason, they chose not to use this tool for uploading files to Blob Storage.
The next step was to create stored procedures on SQL Azure that are designed to handle some SQL that is specific to MySQL. The SQL Server/SQL Azure module for Drupal normally creates these stored procedures when the module is enabled, but since Drupal would be deployed with the module already enabled, these stored procedures needed to be created manually. Engineers executed the stored procedure creation DDL that is defined in the module by accessing SQL Azure through the management portal.
After the import was complete, the Windows installation of the SAG Awards website was now retrieving all database data from SQL Azure. However, recall that the portabledb-export command modified the names of media files in the file_managed table so that the Drupal Azure module would retrieve media files from Blob Storage. The final phase in importing data was to copy media files to Blob Storage.
Note: After this phase was complete, engineers cleared the cache through the Drupal admin panel.
Copy Media Files to Blob Storage
The main challenge in copying media files to Windows Azure Blob Storage was in handling Linux file name conventions that are not supported on Windows. While Linux supports a colon (:) as part of a file name, Windows does not. Consequently, when the .zip archive of media files was unpacked on Windows, file names were automatically changed: all colons were converted to underscores (_). However, colons are supported in Blob Storage as part of blob names. This meant that files could be uploaded to Blob Storage from Windows with underscores in the blob names, but the blob names would have to be modified manually to match the names stored in SQL Azure.
Engineers used WinRAR to unpack the .zip archive of media files. WinRAR provided a record of all file names that were changed in the unpacking process. Engineers then used CloudXplorer to upload the media files to Blob Storage and to change the modified file names, replacing underscores with colons.
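Because legitimate underscores also appear in file names, blindly replacing every underscore with a colon would corrupt some names. The sketch below shows one way to derive the renames programmatically. It assumes the intended blob names (the path after the scheme) are available, for example from the file_managed table, and matches the Windows-safe form of each one against what was uploaded. The function name and sample file names are hypothetical; the engineers themselves worked from WinRAR’s record of changed names and renamed the blobs in CloudXplorer.

```php
<?php
/**
 * Hypothetical helper: build a map of uploaded blob name => intended blob
 * name for files whose colons were turned into underscores on Windows.
 */
function build_blob_rename_map(array $intended_names, array $uploaded_names) {
  $uploaded = array_flip($uploaded_names);
  $renames = array();
  foreach ($intended_names as $intended) {
    if (strpos($intended, ':') === FALSE) {
      continue; // Name survived unpacking unchanged.
    }
    // Reproduce the colon-to-underscore substitution made on Windows.
    $windows_safe = str_replace(':', '_', $intended);
    if (isset($uploaded[$windows_safe])) {
      $renames[$windows_safe] = $intended;
    }
  }
  return $renames;
}

$map = build_blob_rename_map(
  array('field/image/red_carpet:arrivals.jpg', 'field/image/cast_photo.jpg'),
  array('field/image/red_carpet_arrivals.jpg', 'field/image/cast_photo.jpg')
);
print_r($map);
```

Only names that originally contained a colon end up in the map, so cast_photo.jpg is left alone. If two intended names collapse to the same Windows-safe name, the later one wins in this sketch, so such collisions would need to be checked by hand.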
At this point in the migration process, the SAG Awards website was fully functional on Windows and was retrieving all data (database data and media files) from the cloud.
Package and Deploy Drupal
There were two main challenges in packaging the SAG Awards website for deployment to Windows Azure: packaging a custom installation of PHP and creating the necessary startup tasks.
Because customizations were made to the PHP installation powering Drupal, engineers needed to package their custom PHP installation for deployment to Windows Azure. The other option was to rely on Microsoft’s Web Platform Installer to install a “vanilla” installation of PHP and then write scripts to modify it on startup. Since it is relatively easy to package a custom PHP installation for deployment to Azure, engineers chose to go that route. (For more information, see Packaging a Custom PHP Installation for Windows Azure.)
The startup tasks that needed to be performed were the following:
- Configure IIS to use the custom PHP installation.
- Register the Service Runtime COM Wrapper (which played a role in the caching strategy).
- Put Drush (which was packaged with the deployment) in the PATH environment variable.
The final project structure, prior to packaging, was the following:
(Drupal files and folders)
Finally, the Windows Azure Memcached Plugin was added to the Windows Azure SDK prior to packaging so that memcache would run on startup and restart if killed.
The work in moving the SAG Awards Drupal website to the Windows Azure platform was an excellent example of Microsoft’s commitment to supporting popular OSS applications on the Windows Azure platform. The collaboration between engineers from SAG and from Microsoft’s Interoperability and Customer Advisory Teams resulted in a win for SAG (the SAG Awards website was able to handle sustained spikes in traffic that it could not handle previously) and in valuable lessons learned for the Windows Azure team about supporting migration and scalability of OSS applications on the Azure Platform.