Head in the Cloud, Feet on the Ground
Eugenio Pace and Gianpaolo Carraro
Summary: Building on the Software as a Service (SaaS) momentum, distributed computing is evolving toward a model where cloud-based infrastructure becomes a key player, to the point where a new breed of companies deploys the entirety of its assets to the cloud. Is this reasonable? Only time will tell. In this article, we will take an enthusiastic yet pragmatic look at the cloud opportunities. In particular, we will propose a model for better deciding what should be pushed to the cloud and what should be kept in-house, explore a few examples of cloud-based infrastructure, and discuss the architectural trade-offs resulting from that usage.
LOtSS in the Enterprise: Big Pharma
Applying LOtSS to LitwareHR
Cloud Services at the UI
Cloud Services at the Data Layer
Authentication and Authorization in the Cloud
From a purely economic perspective, driving a car makes very little sense. It is one of the most expensive ways of moving a unit of mass over a unit of distance. If it is so expensive compared to virtually all other means of transportation (public transport, trains, or even planes), why do so many people drive cars? The answer is quite simple: control. Setting aside the status-symbol element of a car, a person who chooses the car as a means of transportation has full control over when to depart and which route to travel (scenic or highway). Maybe most appealing of all, the journey can run from the garage to the final destination without relying on external parties. Let’s contrast that with taking the train. Trains have strict schedules and finite routes, depart and arrive only at train stations, might have loud fellow passengers, and are prone to strikes. But, of course, the train is cheaper, and you don’t have to drive. So, which one is better? It depends on whether you are optimizing for cost or for control.
Let’s continue this transportation analogy and look at freight trains. Under the right circumstances (typically, long distances and bulk freight), transport by rail is more economical and energy efficient than pretty much anything else. For example, on June 21, 2001, a train more than seven kilometers long, comprising 682 ore cars, made the Newman-Port Hedland trip in Western Australia. It is hard to beat such economy of scale. It is important to note, however, that this amazing feat was possible only at the expense of two particular elements: choice of destination and type of goods transported. Newman and Port Hedland were the only two cities that this train was capable of transporting ore to and from. Transporting ore to any other city, or transporting anything other than bulk freight, would have required a different transportation method.
By restricting the cities that the train can serve and the type of content it can transport, this train was able to pull more than 650 wagons. If the same train were asked to transport both bulk freight and passengers, and to serve all the cities of the western coast of Australia, the wagon count would likely have been reduced by at least an order of magnitude.
The first key point we learn from the railroads is that high optimization can be achieved through specialization. Another way to think about it is that economy of scale is inversely proportional to the degree of freedom a system has. Restricting the degrees of freedom (specializing the system) achieves economy of scale (optimization).
- Lesson 1—The cloud can be seen as a specialized system with fewer degrees of freedom than the on-premise alternative, but can offer very high economy of scale.
Of course, moving goods from only two places is not very interesting, even if you can do it very efficiently. This is why less optimized but more agile means of transport such as trucks are often used to transport smaller quantities of goods to more destinations.
Post offices around the globe demonstrate this. Their delivery network is a hybrid model that includes high economy of scale (point-to-point transportation; for example, a train between two major cities); medium economy of scale (trucks dispatching mail from the train station to multiple regional centers); and very low economy of scale but very high flexibility (delivery people using car, bike, or foot who are capable of reaching the most remote dwelling).
- Lesson 2—By adopting a hybrid strategy, it is possible to tap into economy of scale where possible while maintaining flexibility and agility where necessary.
The final point we can learn from the railroads is the notion of efficient transloading. Transloading happens when goods are transferred from one means of transport to another; for example, from train to a truck as in the postal service scenario. Since a lot of cost can be involved in transloading when not done efficiently, it is commonly done in transloading hubs, where specialized equipment can facilitate the process. Another important innovation that lowered the transloading costs was the standard shipping container. By packaging all goods in a uniform shipping container, the friction between the different modes of transport was virtually removed and finer grained hybrid transportation networks could be built.
- Lesson 3—Lowering the transloading costs through techniques such as transloading hubs and containerization allows a much finer-grained optimization of a global system.
In a world of low transloading costs, decisions no longer have to be based on the constraints of the global system. Instead, decisions can be optimized at the local subsystem without worrying about the impact on other subsystems.
We call the application of these three key lessons in the context of software architecture localized optimization through selective specialization (or LOtSS):
- Optimization through specialization
- Hybrid strategy maximizing economy of scale where possible while maintaining flexibility and agility where necessary
- Lowering transloading cost
The rest of this article is about applying LOtSS in an enterprise scenario (Big Pharma) and an ISV scenario (LitwareHR).
LOtSS in the Enterprise: Big Pharma
How does LOtSS apply to the world of an enterprise? Let’s examine it through the example of Big Pharma, a fictitious pharmaceutical company.
Big Pharma believes it does two things better than its competition: clinical trials and molecular research. However, it found that 80 percent of its IT budget is allocated to less strategic assets such as e-mail, CRM, and ERP. Of course, these capabilities are required to run the business, but none of them really help differentiate Big Pharma from its competitors. As Dr. Thomas (Big Pharma’s fictitious CEO) often says, “We have never beaten the competition because we had a better ERP system… It is all about our molecules.”
Therefore, Dr. Thomas would be happy to get industry-standard services for a better price. This is why Big Pharma is looking into leveraging “cloud offerings” to further optimize Big Pharma IT. The intent is to embrace a hybrid architecture, where economy of scale can be tapped for commodity services while retaining full control of the core capabilities. The challenge is to smoothly run and manage IT assets that span corporate boundaries. In LOtSS terms, Big Pharma is interested in optimizing its IT by selectively using the cloud where economy of scale is attainable. For this to work, Big Pharma needs to minimize the “transloading” costs, which in this case are the costs of crossing the corporate firewall.
After analysis, Big Pharma is ready to push two capabilities out of its data center, because vendors offer both as a service at industry-standard quality: CRM and e-mail. Notice that “industry standard” doesn’t necessarily mean “low quality.” These are simply commoditized systems with features adequate for Big Pharma’s needs. Because these providers offer these capabilities to a large number of customers, Big Pharma can tap into the economies of scale of these service providers, lowering TCO for each capability.
Big Pharma’s ERP, on the other hand, was heavily customized, and the requirements are such that no SaaS ISV offering matches the need. However, Big Pharma chooses to optimize this capability at a different level, by hosting it outside its data center. It still owns the ERP software itself, but now operations, hardware, A/C, and power are all the responsibility of the hoster. (See Figure 1.)
Figure 1. Big Pharma on-premise vs. Big Pharma leveraging hosted “commodity” services for noncore business capabilities and leveraging cloud services for aspects of their critical software
- Lesson 4—Optimization can happen at different levels. Companies can selectively outsource entire capabilities to highly specialized domain-specific vendors (CRM and e-mail in this example) or just aspects of an application (for example, the operations and hardware as in the ERP).
The consequences of moving these systems outside of Big Pharma corporate boundaries cannot be ignored. Three common aspects of IT—security, management, and integration across these boundaries—potentially could introduce unacceptable transloading costs.
From an access control perspective, Big Pharma does not want to keep separate user names and passwords on each hosted service. Instead, it wants to retain centralized control of authorization rules for its employees by leveraging the existing hierarchy of roles in its Active Directory. In addition, Big Pharma wants a single sign-on experience for all its employees, regardless of who provides the service or where it is hosted. (See Figure 2.)
Figure 2. Single sign-on through identity federation and a cloud STS
One proven solution for decreasing the “transloading costs” of cross-domain authentication and authorization is federated identity and claims-based authorization. There are well-known, standards-based technologies to implement this efficiently.
- Lesson 5—Security systems today are often a patchwork of multiple ad-hoc solutions. It is common for companies to have multiple identity management systems in place. Companies like Big Pharma will favor solutions that implement standards-based authentication and authorization systems because they decrease transloading costs.
From a systems management perspective, Big Pharma employs a dedicated IT staff to make sure everything is working within agreed SLAs. Also, employees experiencing issues with the CRM, ERP, or e-mail will still call Big Pharma’s helpdesk to get them resolved. It is therefore important that the IT staff can manage the IT assets regardless of whether they are internal or external. Each hosted system therefore has to provide and expose a management API that Big Pharma IT can integrate into its existing management tools. For example, a ticket opened by an employee at Big Pharma for a problem with e-mail will automatically open another one with the e-mail service provider. When the issue is closed there, a notification is sent to Big Pharma, which in turn closes the issue with the employee. (See Figure 3.)
Figure 3. Big Pharma monitoring and controlling cloud services through management APIs
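The ticket flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `ProviderHelpdesk` and `InternalHelpdesk`, the ticket ID formats, and the callback mechanism are all hypothetical stand-ins, not the API of any real helpdesk product or of WS-Management.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Ticket:
    ticket_id: str
    summary: str
    status: str = "open"
    linked_id: Optional[str] = None  # ID of the mirrored ticket at the provider


class ProviderHelpdesk:
    """Hypothetical stand-in for the management API of a hosted service."""

    def __init__(self) -> None:
        self.tickets: Dict[str, Ticket] = {}
        self._listeners: List[Callable[[str], None]] = []

    def open_ticket(self, summary: str) -> str:
        ticket_id = f"P-{len(self.tickets) + 1}"
        self.tickets[ticket_id] = Ticket(ticket_id, summary)
        return ticket_id

    def on_closed(self, callback: Callable[[str], None]) -> None:
        # The customer registers a callback to be notified of closed tickets.
        self._listeners.append(callback)

    def close_ticket(self, ticket_id: str) -> None:
        self.tickets[ticket_id].status = "closed"
        for callback in self._listeners:
            callback(ticket_id)


class InternalHelpdesk:
    """Hypothetical in-house helpdesk that mirrors tickets to the provider."""

    def __init__(self, provider: ProviderHelpdesk) -> None:
        self.provider = provider
        self.tickets: Dict[str, Ticket] = {}
        provider.on_closed(self._provider_closed)

    def open_ticket(self, summary: str) -> str:
        ticket_id = f"BP-{len(self.tickets) + 1}"
        linked_id = self.provider.open_ticket(summary)
        self.tickets[ticket_id] = Ticket(ticket_id, summary, linked_id=linked_id)
        return ticket_id

    def _provider_closed(self, provider_ticket_id: str) -> None:
        # Close every internal ticket linked to the provider ticket.
        for ticket in self.tickets.values():
            if ticket.linked_id == provider_ticket_id:
                ticket.status = "closed"
```

The employee only ever sees the internal ticket; the link to the provider ticket is what keeps the crossing of the organizational boundary cheap.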
- Caveat—Standards around systems management, across organization boundaries, are emerging (for example, WS-Management), but they are not fully mainstream yet. In other words, “containerization” has not happened completely, but it is happening.
Now let’s examine the systems that Big Pharma wants to invest in the most: molecular research and clinical trials. Molecular research represents the core of Big Pharma’s business: successful drugs. Typical requirements of molecular research are modeling and simulation. Modeling requires high-end workstations with highly specialized software. Big Pharma chooses to run this software on-premise; because of its very complex visualization requirements and high interactivity with its scientists, it would be next to impossible to run that part of the software in the cloud. However, simulations of these abstract models are tasks that demand intensive computational resources. Not only are these demands high, but they are also very variable. Big Pharma’s scientists might come up with a model that requires the computational equivalent of two thousand machine-days and then nothing for a couple of weeks while the results are analyzed. Big Pharma could decide to invest at peak capacity but, because of the highly elastic demands, it would just acquire lots of CPUs that would, on average, have a very low utilization rate. It could decide to invest at median capacity, which would not be very helpful for peak requests. Instead, Big Pharma chooses to subscribe to a service for raw computing resources. Allocation and configuration of these machines is done on demand by Big Pharma’s modeling software, and the provider it uses guarantees state-of-the-art hardware and networking. Because each simulation generates an incredible amount of information, Big Pharma also subscribes to a storage service that will handle this unpredictably large amount of data at a much better cost than an internal high-end storage system. Part of the raw data generated by the cluster is then uploaded as needed into Big Pharma’s on-premise systems for analysis and feedback into the modeling tools.
This hybrid approach offers the best of both worlds: The simulations can run using all the computational resources needed, while costs are kept under control as only what is used is billed. (See Figure 4.)
Figure 4. Big Pharma scientist modeling molecules on the on-premise software and submitting a simulation job to an on-demand cloud compute cluster
Here, again, for this to be really attractive, transloading costs (in this example, the ability to bridge the cloud-based simulation platform and the in-house modeling tool) must be managed; otherwise, all the benefits of the utility platform would be lost in the crossing.
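The capacity trade-off described above (buy for the peak, buy less and miss the peak, or rent on demand) can be made concrete with a little arithmetic. The sketch below is illustrative only: the 14-day demand profile and the two prices are assumptions chosen to mirror the "two thousand machine-days, then nothing" pattern, not figures from any provider.

```python
from typing import Sequence


def owned_cost(capacity: int, days: int, cost_per_machine_day: float) -> float:
    """Cost of an owned fleet: every machine is paid for every day, used or not."""
    return capacity * days * cost_per_machine_day


def on_demand_cost(daily_demand: Sequence[int], price_per_machine_day: float) -> float:
    """Cost of renting: only the machine-days actually consumed are billed."""
    return sum(daily_demand) * price_per_machine_day


def utilization(daily_demand: Sequence[int], capacity: int) -> float:
    """Fraction of owned machine-days that did useful work."""
    used = sum(min(demand, capacity) for demand in daily_demand)
    return used / (capacity * len(daily_demand))


def unmet_demand(daily_demand: Sequence[int], capacity: int) -> int:
    """Machine-days of work that an undersized owned fleet could not serve."""
    return sum(max(demand - capacity, 0) for demand in daily_demand)


# Assumed profile: one burst of 2,000 machines, then 13 idle days of analysis.
demand = [2000] + [0] * 13

# Assumed prices: owning costs 1.0 per machine-day, renting costs 3.0.
peak_fleet_cost = owned_cost(capacity=2000, days=len(demand), cost_per_machine_day=1.0)
rented_cost = on_demand_cost(demand, price_per_machine_day=3.0)
```

Under these assumed numbers, renting wins even at three times the unit price, because the peak-sized fleet sits idle for 13 of the 14 days. With a flatter demand profile the comparison can easily go the other way.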
The other critical system for Big Pharma is clinical trials. After modeling and simulation, test drugs are tried with actual patients for effectiveness and side effects. Patients who participate in these trials must communicate back to Big Pharma all kinds of health indicators as well as various other pieces of information: what they are doing, where they are, how they feel, and so on. Patients submit the collected information to both the simulation cluster and the on-premise, specialized software that Big Pharma uses to track clinical trials.
Similar to the molecule modeling software, Big Pharma develops this system in-house, because of its highly specialized needs. One could argue that Big Pharma applied LOtSS many times already in the past. It is likely that the clinical-trial software communication subsystem allowing patients to share their trial results evolved from an automated fax system several years ago to a self-service Web portal nowadays, allowing the patients to submit their data directly.
Participating patients use a wide variety of devices to interact with Big Pharma: their phones, the Web, their personal health monitoring systems. Therefore, Big Pharma decides to leverage a highly specialized cloud-based messaging system to offer reliable, secure communications with high SLAs. Building this on its own would be costly for the company: Big Pharma doesn’t have the expertise required to develop and operate it, and it would not benefit from the economies of scale that the cloud service enjoys.
This cloud-based messaging system, an Internet Service Bus (ISB), allows Big Pharma to interact with patients using very sophisticated patterns. For example, it can broadcast updates for the software running on patients’ devices to selected groups of patients. If, for any reason, Big Pharma’s clinical-trial software becomes unavailable, the ISB will store pending messages and forward them when the system becomes available again. (See Figure 5.)
Figure 5. Big Pharma clinical-trial patients submitting data to clinical-trial system and simulation cluster
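The store-and-forward behavior attributed to the ISB above can be sketched as follows. This is a minimal in-memory illustration, assuming a hypothetical `StoreAndForwardBus` class; a real service bus would persist pending messages durably and detect backend availability itself.

```python
from collections import deque
from typing import Any, Callable, Deque


class StoreAndForwardBus:
    """Hypothetical sketch: buffer messages while the backend is unavailable."""

    def __init__(self, deliver: Callable[[Any], None]) -> None:
        self._deliver = deliver            # hands a message to the backend
        self._pending: Deque[Any] = deque()
        self._backend_available = True

    @property
    def pending_count(self) -> int:
        return len(self._pending)

    def backend_down(self) -> None:
        self._backend_available = False

    def backend_restored(self) -> None:
        # Flush buffered messages in arrival order before accepting new ones.
        self._backend_available = True
        while self._pending:
            self._deliver(self._pending.popleft())

    def send(self, message: Any) -> None:
        if self._backend_available:
            self._deliver(message)
        else:
            self._pending.append(message)
```

From the patient's device, `send` always succeeds; whether the clinical-trial system is up at that moment becomes the bus's concern rather than the sender's.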
Patients, on the other hand, are not Big Pharma employees and, therefore, are not part of Big Pharma’s directory. Big Pharma uses federation to integrate patients’ identities with its security systems. Patients use one of their existing Web identities, such as Microsoft Live ID, to authenticate themselves; the cloud-identity service translates these authentication tokens into tokens that are understood by the services with which patients interact.
In summary, this simplified scenario illustrates optimizations that Big Pharma can achieve in its IT environment by selectively leveraging economy-of-scale–prone specialized cloud services. This selection happens at different levels of abstraction, from finished services (for example, e-mail, CRM, and ERP) to building block services that are used only in the context of another capability (for example, cloud compute, cloud-identity services, and cloud storage).
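The token translation step can be pictured as a mapping from externally issued claims to claims the internal services understand. The sketch below is a simplification under loud assumptions: tokens are plain dictionaries, the issuer names, the `bigpharma-sts` label, and the enrollment table are hypothetical, and signature validation, which any real security token service must perform, is omitted entirely.

```python
from typing import Dict, Iterable, Mapping, Tuple


class ClaimsTransformer:
    """Hypothetical sketch of claim mapping; NOT a real STS (no cryptography)."""

    def __init__(
        self,
        trusted_issuers: Iterable[str],
        enrollment: Mapping[Tuple[str, str], str],
    ) -> None:
        self.trusted_issuers = set(trusted_issuers)
        # Maps (issuer, subject) to the clinical trial the patient is enrolled in.
        self.enrollment = dict(enrollment)

    def translate(self, token: Mapping[str, str]) -> Dict[str, str]:
        issuer = token["issuer"]
        if issuer not in self.trusted_issuers:
            raise PermissionError(f"untrusted issuer: {issuer}")
        key = (issuer, token["subject"])
        if key not in self.enrollment:
            raise PermissionError("subject is not enrolled in any trial")
        # Emit claims in the vocabulary of the internal services.
        return {
            "issuer": "bigpharma-sts",
            "role": "trial-patient",
            "trial_id": self.enrollment[key],
        }
```

The clinical-trial software then authorizes on `role` and `trial_id` alone and never needs to know which Web identity the patient used.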
Applying LOtSS to LitwareHR
As described in the previous scenario, optimization can happen at many levels of abstraction. After all, it’s about extracting a component from a larger system and replacing it with a cheaper substitute, while making sure that the substitution does not introduce high transloading cost (defeating the benefit of the substitution).
The fact is that ISVs have been doing this for a long time. Very few ISVs develop their own relational database nowadays. This is because acquiring a commercial, packaged RDBMS is cheaper, and building the equivalent functionality brings marginal value and cannot be justified.
Adopting a commercial RDBMS allowed ISVs to optimize their investments, because they no longer needed to develop and maintain the “plumbing” that is required to store, retrieve, and query data. ISVs could then focus those resources on higher-value components of their applications. But it does not come for free; there is legacy code to deal with, skills that must be acquired, new programming paradigms to be learned, and so on. These are all, of course, examples of “transloading” costs that result from adopting a specialized component.
The emergence and adoption of standards such as ANSI SQL and common relational models are the equivalent of “containerization” and have contributed to a decrease in these costs.
The market of visual components (for example, ActiveX controls) is just another example of LOtSS: external suppliers creating specialized components that resulted in optimizations in the larger solution. “Transloading costs” were decreased by de facto standards that all tools complied with. (See Figure 6.)
Figure 6. Window built upon three specialized visual parts, provided by three different vendors
Cloud services offer yet another opportunity for ISVs to optimize aspects of their applications.
Cloud Services at the UI
Figure 7. Web mashup using Microsoft Virtual Earth, ASP.NET, and GeoRSS feeds
Cloud Services at the Data Layer
Storage is a fundamental aspect of every nontrivial application. There are mainly two options today for storing information: a file system and a relational database. The former is appropriate for large datasets, unstructured information, documents, or proprietary file formats. The latter is normally used for managing structured information using well-defined data models. There are already hybrids even within these models. Many applications use XML as the format to encode data and then store these datasets in the file system. This approach provides database-like capabilities, such as retrieval and search without the cost of a fully fledged RDBMS that might require a separate server and extra maintenance.
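As an illustration of the XML-in-the-file-system hybrid mentioned above, the following sketch stores a small dataset as an XML file and searches it without any database server. The element and attribute names (`employees`, `employee`, `name`, `dept`) are hypothetical, and the approach only suits datasets small enough to parse on every query.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def save_records(path: str, records: Iterable[Dict[str, str]]) -> None:
    """Encode the records as XML and write them to a file."""
    root = ET.Element("employees")
    for record in records:
        ET.SubElement(root, "employee", name=record["name"], dept=record["dept"])
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def find_by_dept(path: str, dept: str) -> List[str]:
    """Database-like retrieval: names of all employees in one department."""
    root = ET.parse(path).getroot()
    return [e.get("name") for e in root.iter("employee") if e.get("dept") == dept]
```

Every call to `find_by_dept` re-reads and re-parses the whole file, which is the price paid for avoiding a separate server.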
The costs of running your own storage system include managing disk space and implementing resilience and fault tolerance.
Current cloud-storage systems come in various flavors, too. Generally speaking, there are two types: blob persistence services that are roughly equivalent to a file system, and simple, semistructured entity persistence services that can be thought of as analogous to an RDBMS. However, pushing these equivalences too far can be dangerous, because cloud-storage systems are essentially different from local ones. To begin with, cloud storage apparently violates one of the most common system-design principles: Place data close to compute. Because every interaction with the store is necessarily a request that goes across the network, applications that are too “chatty” with their store can be severely affected by the unavoidable latency.
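A back-of-the-envelope calculation shows why chattiness matters. The function below is a deliberately crude model, assuming a fixed round-trip time per request and ignoring transfer time and parallelism; the 50 ms figure in the example is an assumption, not a measurement of any service.

```python
def total_latency_ms(num_items: int, round_trip_ms: float, batch_size: int = 1) -> float:
    """Latency spent on network round trips when fetching items sequentially."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    round_trips = -(-num_items // batch_size)  # ceiling division
    return round_trips * round_trip_ms


# Assumed 50 ms round trip: one request per item versus batches of 100 items.
chatty = total_latency_ms(1000, round_trip_ms=50)
batched = total_latency_ms(1000, round_trip_ms=50, batch_size=100)
```

In this model, fetching 1,000 items one at a time spends 50 seconds waiting on the network, while batches of 100 spend half a second, which is why coarse-grained requests are usually the first adaptation when moving a store across the network.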
Fortunately, data comes in different types. Some pieces of information never or very seldom change (reference data, historical data, and so on) whereas some is very volatile (the current value of a stock, and so on). This differentiation creates an opportunity for optimization.
One such optimization opportunity for leveraging cloud storage is the archival scenario. In this scenario, the application keeps data that is frequently accessed or highly volatile in a local store. All information that is seldom used but cannot be deleted can be pushed to a cloud-storage service. (See Figure 8.)
Figure 8. Application storing reference and historical data on a cloud-storage service
With this optimization, the local store needs are reduced to the most active data, thus reducing disk capacity, server capacity, and maintenance.
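The archival scenario can be sketched as a two-tier store that moves idle entries from local storage to the cloud. Everything here is an assumption made for illustration: the `TieredStore` class, the 90-day idle threshold, and the use of a plain dictionary as a stand-in for the cloud-storage service.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, MutableMapping, Tuple


class TieredStore:
    """Hypothetical sketch: hot data stays local, idle data moves to the cloud."""

    def __init__(self, cloud: MutableMapping[str, Any], max_idle_days: int = 90) -> None:
        self.local: Dict[str, Tuple[Any, date]] = {}  # key -> (value, last access)
        self.cloud = cloud                            # stand-in for a cloud store
        self.max_idle = timedelta(days=max_idle_days)

    def put(self, key: str, value: Any, today: date) -> None:
        self.local[key] = (value, today)

    def get(self, key: str, today: date) -> Any:
        if key in self.local:
            value, _ = self.local[key]
            self.local[key] = (value, today)  # refresh the last-access date
            return value
        # Archived data: a slower read that crosses the network in a real system.
        return self.cloud[key]

    def archive_idle(self, today: date) -> List[str]:
        """Move every entry idle for longer than max_idle to the cloud store."""
        idle = [k for k, (_, last) in self.local.items() if today - last > self.max_idle]
        for key in idle:
            value, _ = self.local.pop(key)
            self.cloud[key] = value
        return idle
```

Callers use `get` the same way for both tiers; only the latency differs, which is acceptable precisely because archived data is seldom read.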
There are costs with this approach though. Most programming models of cloud-storage systems are considerably different from traditional ones. This is one transloading cost that an application architect will need to weigh in the context of the solution. Most cloud-storage services expose some kind of SOAP or REST interface with very basic operations. There are no such things as complex operators or query languages as of yet.
Authentication and Authorization in the Cloud
Traditional applications typically either have a user repository that they own (often, stored as rows in database tables) or rely on a corporate user directory such as Microsoft Active Directory.
Both of these very common ways of authenticating and authorizing users to a system have significant limitations in a hybrid in-house/cloud scenario. The most obvious one is the proliferation of identity and permissions databases as companies continue to apply LOtSS. The likelihood of crossing organization boundaries increases dramatically, and it is impractical and unrealistic to expect all these databases to be synchronized with ad-hoc mechanisms. Additionally, identity, authorization rules, and life cycle (user provisioning and decommissioning) have to be integrated.
Companies today can afford to deal with multiple identities within their organization only because they have pushed that cost to their employees, who are the ones having to remember 50 passwords for 50 different, nonintegrated systems.
- Consider this: When a company fires an employee, access to all the systems is traditionally revoked by simply preventing him or her from entering the building and, therefore, from reaching the corporate network. If the company uses a cloud service that manages its own user names and passwords, as opposed to federated identity, the (potentially annoyed and resentful) fired employee is theoretically capable of opening a browser at home and logging on to the system. What could happen then is left as an exercise to the reader.
As mentioned in the Big Pharma scenario, identity federation and claims-based authorization are two technologies that are both standards-based (fostering wide adoption) and easier to implement today thanks to platform-level frameworks built by major providers such as Microsoft. In addition to frameworks, ISVs can leverage cloud-identity services, which enable ISVs to optimize their infrastructure by pushing this entire capability to a specialized provider. (See Figure 9.)
Figure 9. Big Pharma consuming two services. CRM leverages cloud identity for claim mapping, and ERP uses a host-based STS.
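The fired-employee problem described earlier in this section can be shown in miniature. The sketch below contrasts a service that keeps its own credentials with one that relies on tokens from the corporate directory. All class names are hypothetical, tokens are plain dictionaries, and passwords are stored in clear text purely to keep the illustration short; none of this is how a production system should be built.

```python
from typing import Dict, Set


class CorporateDirectory:
    """Hypothetical identity provider: the single place where users are disabled."""

    def __init__(self) -> None:
        self.active: Set[str] = set()

    def hire(self, user: str) -> None:
        self.active.add(user)

    def fire(self, user: str) -> None:
        self.active.discard(user)

    def issue_token(self, user: str) -> Dict[str, str]:
        if user not in self.active:
            raise PermissionError(f"{user} is not an active employee")
        return {"issuer": "corporate-directory", "subject": user}


class FederatedService:
    """Cloud service that trusts directory tokens and stores no passwords."""

    def __init__(self, directory: CorporateDirectory) -> None:
        self.directory = directory

    def login(self, user: str) -> str:
        token = self.directory.issue_token(user)  # fails once the user is fired
        return f"session for {token['subject']}"


class StandaloneService:
    """Cloud service with its own credential store (clear text: illustration only)."""

    def __init__(self) -> None:
        self.passwords: Dict[str, str] = {}

    def register(self, user: str, password: str) -> None:
        self.passwords[user] = password

    def login(self, user: str, password: str) -> bool:
        return self.passwords.get(user) == password
```

Firing the employee in the directory immediately locks the federated service, while the standalone service keeps accepting the old password until someone remembers to remove the account.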
The concept introduced in this article is fairly straightforward and can be summarized in the following three steps: Decompose a system into smaller subsystems; identify optimization opportunities at the subsystem level; and recompose while minimizing the transloading cost introduced by the new optimized subsystem. Unfortunately, in typical IT systems, there are too many variables to allow a magic formula for discovering the candidates for optimization. Experience gained from previous optimizations, along with tinkering, will likely be the best weapons to approach LOtSS. That said, there are high-level heuristics that can be used. For example, not all data is equal. Read-only, public data that benefits from geo-distribution, such as blog entries, product catalogs, and videos, is likely to be very cloud friendly; on the other hand, volatile, regulated, personally identifying data that requires transactions will be more on-premise friendly. Similarly, not all computation is equal; constant, predictable computation loads are less attractive for the cloud than variable, unpredictable loads, which are better suited to an underlying utility-type platform.
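The data heuristic above can be written down as a toy classifier. This is not a formula proposed by the article, which explicitly says no magic formula exists; the trait names and the simple counting rule are assumptions meant only to show how such a rule of thumb might be encoded as a first-pass filter.

```python
from typing import Iterable

# Trait vocabulary taken from the heuristic in the text; the names are hypothetical.
CLOUD_FRIENDLY = {"read_only", "public", "geo_distributed"}
ON_PREMISE_FRIENDLY = {"volatile", "transactional", "regulated", "personally_identifying"}


def placement_hint(traits: Iterable[str]) -> str:
    """Return a rough placement hint by counting traits on each side."""
    trait_set = set(traits)
    score = len(trait_set & CLOUD_FRIENDLY) - len(trait_set & ON_PREMISE_FRIENDLY)
    if score > 0:
        return "cloud"
    if score < 0:
        return "on-premise"
    return "undecided"
```

A product catalog tagged `read_only`, `public`, and `geo_distributed` comes out as "cloud", while patient records tagged `regulated` and `personally_identifying` come out as "on-premise". Anything landing on "undecided" is where the experience and tinkering mentioned above take over.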
LOtSS uses the cloud as an optimization opportunity driving overall costs down. Although this “faster, cheaper, better” view of the cloud is very important and is likely to be a key driver of cloud utilization, it is important to realize that it is only a partial view of what the cloud can offer. The cloud also presents opportunities for “not previously possible” scenarios.
IT history has shown that over time, what was unique and strategic becomes commodity. The only thing that seems to stay constant is the need to identify new optimization opportunities continuously by selectively replacing subsystems with more cost-effective ones, while keeping the cost introduced by the substitute low. In other words, it seems that one of the few sustainable competitive advantages in IT is in fact the ability to master LOtSS.
“Data on the Outside vs. Data on the Inside,” by Pat Helland (http://msdn.microsoft.com/en-us/library/ms954587.aspx)
Identity Management in the Cloud (http://msdn.microsoft.com/en-us/arcjournal/cc836390.aspx)
Microsoft Identity Framework, code-named “Zermatt” (https://connect.microsoft.com/site/sitehome.aspx?SiteID=642)
About the authors
In his current role, Gianpaolo Carraro (Senior Director, Architecture Strategy) leads a team of architects driving thought leadership and architectural best practices in the area of Software + Services, SaaS, and cloud computing. Prior to Microsoft, Gianpaolo helped inflate and then burst the .com bubble as cofounder and chief architect of a SaaS startup. Gianpaolo started his career in research as a member of the technical staff at Bell Laboratories. You can learn more about him through his blog at http://blogs.msdn.com/gianpaolo/.
Eugenio Pace (Senior Architect, Architecture Strategy) is responsible for developing architecture guidance in the area of Software + Services, SaaS, and cloud computing. Before joining the Architecture Strategy group, he worked in the patterns & practices team at Microsoft, where he was responsible for delivering client-side architecture guidance, including Web clients, smart clients, and mobile clients. During that time, his team shipped the Composite UI Application Block, and three software factories for mobile and desktop smart clients and for Web development. Before joining Patterns and Practices, he was an architect at Microsoft Consulting Services. You can find his blog at http://blogs.msdn.com/eugeniop.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.