Simply SOAP

 

Roger Wolter
Microsoft Corporation

October 15, 2001

Greetings from Redmond, I hope you are enjoying the fall season. A lot has been written about SOAP in the last year or so, but most articles spend a lot more time talking about what a SOAP message looks like rather than why you would want to use SOAP in the first place. This article by Roger discusses SOAP at a high level, in terms of both what it is and how you can use it. – Eric Schmidt

What Is SOAP?

When SOAP is described as a communications protocol, most people think of Component Object Model (COM) or Common Object Request Broker Architecture (CORBA) and start asking questions like, "How does SOAP do object activation?" or "What naming service does SOAP use?" While a SOAP implementation will probably include these things, the SOAP standard doesn't specify them. SOAP is a specification that defines the XML format for messages, and that's about it for the required parts of the spec. If you have a well-formed XML fragment enclosed in a couple of SOAP elements, you have a SOAP message.
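As a minimal sketch of that last point, the following Python wraps an application-defined XML element in the two SOAP elements (Envelope and Body), using the SOAP 1.1 envelope namespace. The Order payload, its urn:example:orders namespace, and the wrap_in_envelope helper are made up for illustration; they are not part of the SOAP standard or of any toolkit.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)


def wrap_in_envelope(payload):
    """Enclose an application-defined element in SOAP Envelope and Body elements."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return envelope


# Hypothetical payload: the element names and namespace are assumptions.
order = ET.Element("{urn:example:orders}Order")
ET.SubElement(order, "{urn:example:orders}Item").text = "Widget"

message = wrap_in_envelope(order)
print(ET.tostring(message, encoding="unicode"))
```

Everything inside the Body is up to the application; the standard only requires the wrapper.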

While this is all that is required for SOAP, it may not be enough for applications to talk to each other. If your application doesn't understand XML, you will need to represent your program's datatypes (integers, floats, arrays, structs, and so on) as XML data in the SOAP message. Section 5 of the SOAP standard specifies an XML notation for representing programming language types, which is why this notation is often called Section 5 encoding. The alternative for defining the data types of the elements in a SOAP message is to use a schema as defined in the XML Schema (XSD) standard. Section 5 makes sense if you are using SOAP to support Remote Procedure Calls from a local program to a remote program because the mapping between XML and your programming language is simpler with Section 5 encoding. In XML message-based applications with complex data, it may make more sense to provide an XML schema for the SOAP message, rather than using SOAP encoding. A Section 5 encoded SOAP message is called an encoded message, and a message that contains an XML document is called a literal message. While all major SOAP implementations support Section 5 encoding, it is an optional part of the spec, and a conformant SOAP implementation doesn't have to support it. .NET XML Web services default to literal messages, but they do support encoded messages for interoperability.
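To make the encoded versus literal distinction concrete, here is a rough Python sketch. In the encoded style each element carries its own type in an xsi:type attribute; in the literal style the element is bare and a separate schema says what it holds. The params wrapper, the element names, and the small type map are assumptions for illustration, and the sketch covers only simple values, not the arrays and structs that Section 5 also describes.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xsi", XSI)

# Illustrative subset of a language-to-XSD type mapping.
XSD_TYPES = {int: "xsd:int", float: "xsd:double", str: "xsd:string"}


def encoded_value(name, value):
    """Encoded style: the element states its own type inline."""
    elem = ET.Element(name)
    elem.set(f"{{{XSI}}}type", XSD_TYPES[type(value)])
    elem.text = str(value)
    return elem


def literal_value(name, value):
    """Literal style: no inline type; an external schema defines it."""
    elem = ET.Element(name)
    elem.text = str(value)
    return elem


def build_params(style):
    params = ET.Element("params")  # hypothetical wrapper element
    # Declare the xsd prefix by hand because it appears only inside
    # attribute values, where ElementTree does not track namespaces.
    params.set("xmlns:xsd", XSD)
    make = encoded_value if style == "encoded" else literal_value
    params.append(make("customerID", "2837"))
    params.append(make("quantity", 3))
    return params


print(ET.tostring(build_params("encoded"), encoding="unicode"))
print(ET.tostring(build_params("literal"), encoding="unicode"))
```

The encoded output tells the receiver that customerID is a string and quantity is an integer; the literal output leaves that to a schema agreed on separately.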

Another optional part of the SOAP specification is section 7, which is titled Using SOAP with RPC. Most programmers are comfortable with Remote Procedure Call (RPC) programming, so it's natural for SOAP to include an RPC specification. Section 7 defines the XML element that represents a function call and the element that contains the return code in the response message. Again, section 7 is optional, so while many SOAP applications use this request-response style of interaction, one-way and asynchronous messaging are also allowed in SOAP.
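The RPC convention can be sketched roughly as follows: the request Body holds one element named after the method, with a child element per parameter, and the response Body holds one element whose first child carries the return value. The CheckCredit method, its parameters, the urn:example:credit namespace, and the canned response are all hypothetical; a real toolkit would generate and parse these messages for you.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "urn:example:credit"  # hypothetical service namespace


def rpc_request(method, **params):
    """Build a request whose Body contains one element named after the method."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return envelope


def rpc_result(response_envelope):
    """Return the text of the first child of the response element in the Body."""
    body = response_envelope.find(f"{{{SOAP_ENV}}}Body")
    response = body[0]
    return response[0].text


request = rpc_request("CheckCredit", customerID="2837", amount=150)

# A canned response standing in for what a server might send back.
RESPONSE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <m:CheckCreditResponse xmlns:m="urn:example:credit">
      <approved>true</approved>
    </m:CheckCreditResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

print(rpc_result(ET.fromstring(RESPONSE)))
```

Nothing here depends on the request and response travelling together; the same envelopes could be sent one way or queued.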

The last major part of the SOAP specification defines what an HTTP message containing a SOAP message must look like. This HTTP binding is important because HTTP is supported by almost all modern operating systems (and several not so modern operating systems). The HTTP binding is optional, but almost all SOAP implementations support it because it's the only transport binding that the specification standardizes. For this reason, there's a common misconception that SOAP requires HTTP. Some implementations support MSMQ, MQ Series, SMTP, or TCP/IP transports. Interoperating with another SOAP implementation through one of these other transports can be a challenge because there's no standard for how the SOAP message is carried, but SOAP doesn't care what the transport is. You could print out a SOAP message, fax it to a remote location, and type it into the remote machine without violating the SOAP standard.
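For a sense of what the HTTP binding asks for, this sketch assembles the raw bytes of a SOAP 1.1 request by hand: an HTTP POST with a text/xml content type, a SOAPAction header, and the envelope as the body. The host, path, and SOAPAction value are placeholders, and nothing is sent over the network; in practice an HTTP library or SOAP toolkit would do this work.

```python
def build_http_request(host, path, soap_action, envelope_xml):
    """Assemble the bytes of an HTTP POST that carries a SOAP 1.1 envelope."""
    body = envelope_xml.encode("utf-8")
    head = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        'Content-Type: text/xml; charset="utf-8"',
        f"Content-Length: {len(body)}",
        f'SOAPAction: "{soap_action}"',
    ]
    # A blank line separates the HTTP headers from the SOAP envelope.
    return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + body


# Placeholder envelope with an empty Body.
envelope = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    "<SOAP-ENV:Body/>"
    "</SOAP-ENV:Envelope>"
)

request = build_http_request(
    "www.example.com", "/credit", "urn:example:credit#CheckCredit", envelope
)
print(request.decode("utf-8"))
```

The envelope itself is unchanged by the binding, which is why the same message could just as well be carried by another transport.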

A major source of confusion when getting started with SOAP is the difference between the SOAP specification and the many implementations of the SOAP specification. Most people who use SOAP don't write SOAP messages directly, but instead use a SOAP toolkit to create and parse the SOAP messages. These toolkits generally translate function calls from some kind of language to a SOAP message. For example, the Microsoft SOAP Toolkit 2.0 translates COM function calls to SOAP, and the Apache toolkit translates Java function calls to SOAP. The types of function calls, and the datatypes of the parameters supported, vary with each SOAP implementation, so a function that works with one toolkit may not work with another. This isn't a limitation of SOAP, but of the particular implementation in use.

To summarize, the required part of the SOAP specification defines the structure of an XML document that can be used to exchange data between two applications. It defines a way to represent programming language specific datatypes in XML, although using it isn't required. The SOAP spec defines a way to use SOAP for request-response RPC style messaging, but doesn't prevent you from using other styles of messaging. It also defines a way to exchange SOAP messages over an HTTP transport, but doesn't limit you to using that transport.

This flexibility means that SOAP is widely applicable to a large number of communications requirements, but it also means that users have to choose carefully when selecting SOAP implementations. For example, if one implementation only offers HTTP as a transport and another only offers MSMQ as a transport, they both conform to the SOAP standard, but they won't be able to communicate with each other. In other words, SOAP promotes but doesn't ensure interoperability. In practice, this isn't a significant issue because most SOAP implementations support the same optional features and extensively test interoperability with other implementations.

Why Should I Use SOAP?

The most compelling feature of SOAP is that it has been implemented on many different hardware and software platforms. This means that SOAP can be used to link disparate systems within and external to your organization. Many attempts have been made in the past to come up with a common communications protocol that could be used for systems integration, but none of them has been adopted as widely as SOAP. Why is this? Because SOAP is smaller and easier to implement than most of the previous protocols. For example, distributed computing environment (DCE) and CORBA took years to implement, so only a few implementations were ever released. SOAP, however, can use existing XML parsers and HTTP libraries to do most of the hard work, so a SOAP implementation can be completed in a matter of months, which is why there are more than 30 SOAP implementations available. SOAP doesn't have all the features found in DCE or CORBA, but this streamlined approach is what makes SOAP so readily available.

The primary use of SOAP is for different programs, possibly written in different languages and running on different platforms, to communicate with each other. For example, an Order Entry system running on Microsoft Windows® can update an accounts receivable system running on a mainframe and an inventory system running on UNIX by using SOAP RPC calls, or a book retailer might send orders to its outsourced shipping department as XML documents over SOAP.

The HTTP transport binding for SOAP makes it attractive for some uses. Since most organizations are familiar with HTTP and already have it incorporated into their network infrastructure, SOAP fits right in without the complex changes to the network or firewalls that many other protocols require. Anyone who has tried to make DCOM run through a firewall will see the value of SOAP running over the same ports as other Web applications.

SOAP over HTTP can be managed with the same tools that manage other Web applications. For example, Application Center will replicate and load balance SOAP applications that use ISAPI or ASP and COM+ because SOAP applications look like normal Web applications.

Most people use SOAP because it supports interoperability among many different environments and it supports HTTP, which has led to SOAP becoming an industry standard. The biggest advantage of SOAP can also be a disadvantage. SOAP data is sent as XML text to enable standard message formats, standard data representation, and manipulation with standard XML tools, all of which are good things. However, converting all your data into text and parsing it back into data structures at the other end can use up quite a bit of processing power. The tags that make SOAP self-describing also make SOAP messages bigger than the equivalent message without the tags. These things add up to only a minor performance penalty, but it does mean that many other protocols perform better than SOAP. If you need SOAP for one of the reasons discussed earlier, then this minor performance penalty is a small price to pay, but if these things don't apply to your situation, using SOAP may not be the best choice. In other words, if it comes down to a choice between DCOM and SOAP, and DCOM will do everything you need it to do, then you should probably choose DCOM. As SOAP implementations mature, the performance gap between SOAP and other protocols will narrow, so if SOAP makes sense in your long-term strategy or if you want to standardize on a single protocol, going with SOAP is a safe move.
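To get a feel for the size overhead, the sketch below sends the same two values (an integer quantity and a floating-point price) first as a packed binary record and then as a SOAP message, and compares the byte counts. The binary layout and the AddItem message are invented for the comparison; the layout is not the wire format of DCOM or any other real protocol, and the sketch says nothing about parsing cost.

```python
import struct

quantity, price = 3, 19.95

# Binary: a 32-bit integer and a 64-bit float, network byte order, no padding.
binary = struct.pack("!id", quantity, price)

# The same two values as a self-describing SOAP message (hypothetical AddItem call).
soap = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    "<SOAP-ENV:Body>"
    '<m:AddItem xmlns:m="urn:example:orders">'
    f"<quantity>{quantity}</quantity><price>{price}</price>"
    "</m:AddItem>"
    "</SOAP-ENV:Body>"
    "</SOAP-ENV:Envelope>"
).encode("utf-8")

print(len(binary), "bytes as binary;", len(soap), "bytes as SOAP")
```

The extra bytes are the tags, which are exactly what lets a receiver on another platform make sense of the message without knowing the sender's binary layout.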

SOAP and Security

One of the first questions that newcomers to SOAP have usually revolves around how SOAP deals with security. Early on in the development of SOAP, SOAP was seen as an HTTP based protocol, so the assumption was made that HTTP security would be adequate for SOAP. After all, there were thousands of Web applications using HTTP security, so surely it would be adequate for SOAP. For this reason, the current SOAP standard assumes security is a transport issue and doesn't address security.

However, when SOAP expanded to become a more general-purpose protocol that runs on top of a number of transports, security became a bigger issue. For example, HTTP provides several ways to authenticate which user is making a SOAP call, but it doesn't provide a way for that identity to be propagated when the message is routed from HTTP to an SMTP transport. Fortunately, the W3C is already working on security for XML documents, so it's probably safe to assume that at some point in the near future, the security issues addressed by the W3C will be used to define a security implementation for SOAP. In the meantime, SOAP over HTTP can take advantage of the full range of security options available for Web applications.

What Is WSDL and Do I Need It?

WSDL (often pronounced whiz-dull) stands for Web Services Description Language. For our purposes, we can say that a WSDL file is an XML document that describes a set of SOAP messages, and how the messages are exchanged. In other words, WSDL is to SOAP as the interface definition language (IDL) is to CORBA or COM. Since WSDL is XML, it is readable and editable, but in most cases, it is generated and consumed by software.

To see the value of WSDL, imagine you want to start calling a SOAP method provided by one of your business partners. You could ask him for some sample SOAP messages and write your application to produce and consume messages that look like the samples, but this can be error-prone. For example, you might see a customer ID of 2837 and assume it's an integer when in fact it's a string. WSDL specifies what a request message must contain and what the response message will look like in unambiguous notation.
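The customer ID example can be sketched like this. A WSDL file typically describes message contents with XML Schema declarations, so a program (or a person) can look up the declared type instead of guessing from a sample. The schema fragment below is a made-up stand-in for what a partner's WSDL might contain, and declared_types is a hypothetical helper, not part of any toolkit.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment of the kind found in a WSDL file's types section.
SCHEMA_FRAGMENT = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="customerID" type="xsd:string"/>
  <xsd:element name="quantity" type="xsd:int"/>
</xsd:schema>
"""


def declared_types(schema_xml):
    """Map each top-level element name in the schema to its declared type."""
    root = ET.fromstring(schema_xml)
    return {
        elem.get("name"): elem.get("type")
        for elem in root.findall(f"{{{XSD}}}element")
    }


types = declared_types(SCHEMA_FRAGMENT)

# A sample value of 2837 looks like an integer, but the declaration says otherwise.
print("customerID is declared as", types["customerID"])
```

With the declaration in hand, a value such as 2837 is handled as a string from the start, and an ID with leading zeros or letters won't break the client later.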

Tool support for WSDL isn't as complete as it should be, but it shouldn't be long before tools to author WSDL files and then use them to generate proxies are part of most SOAP implementations. At that point, WSDL will become the preferred way to author SOAP interfaces.

So, if you are writing both the client and the server for a SOAP application, you can probably get along without WSDL because you define the SOAP messages at both ends, but even in this case, WSDL will make the interface easier to maintain. If you are exposing SOAP services for others to call or if you are consuming SOAP services that someone else has written, using WSDL can make your life much easier.

What Is an XML Web Service?

One of the most compelling uses of SOAP is to enable XML Web services. An XML Web service is a function that is exposed through a SOAP interface so that other SOAP-based applications on the Web can call it to take advantage of the service. For example, a credit authorization service might expose a SOAP interface so that applications that need to check a customer's credit can make a SOAP call to obtain the information.

There are two great advantages to using XML Web services:

  • An XML Web service is a standard way to expose services your company provides to a large number of other users who need those services.
  • XML Web services provide a way to combine services that other people provide and use them to build your own unique application.

By using XML Web services for common services such as credit verification, billing, and shipping, your application developers can concentrate on the unique value-added functions that your company provides.

There are significant issues involved in providing XML Web services, such as security, authentication, scalability, billing, and so on, but most of these issues are being resolved and XML Web services appear to be a part of almost every developer's future. For more information on XML Web services, please see https://msdn.microsoft.com/webservices. For additional information on the topics covered in this article, please visit the links below:

 

Roger Wolter is a Program Manager for the SOAP Toolkit.