"Deep Packet Inspection"; What Does it Mean, Really?
Published: July 2008
Jim Harrison - Program Manager, ISA SE
Contributors and Tech Reviewers
Yuri Diogenes - Security Support Engineer
Mohit Saxena - Security Tech Lead (ISA & IAG)
A Bit of History
In December 2000, application-layer awareness was brought forth when Microsoft Internet Security and Acceleration (ISA) Server 2000 flew boldly onto the scene. This functionality was relatively new to the firewall landscape and difficult to express accurately (much less succinctly) to prospective customers. The ISA Server marketing team coined the phrase “deep packet inspection” to summarize this functionality, and it seems that every other firewall marketing team on the planet (we haven’t finished surveying the galaxy yet) ran full out with this phrase tucked away in their pockets... I hope they only gave props to the ISA Server marketing team... not taking that bet? I don’t blame you. Still, while it served to summarize this new feature set reasonably well, the unfortunate fallout from the virus-like abuse of the phrase “deep packet inspection” by some well-meaning (and some not-so-well-meaning) folks was that it eventually served to confuse and confound ISA Server customers and, in many cases, those of our competitors (ok; so it’s not a totally bad thing, but I digress).
Is "Deep Packet Inspection" Wrong?
I think we can all agree that the point is not whether ISA Server only inspects deep packets and leaves the shallow ones alone, or what sort of inspection ISA Server may apply to those deep packets, or for that matter, what exactly defines deep vs. shallow packets. Most folks would agree that this term summarizes the evaluation ISA Server performs that goes beyond “simple” IP, TCP, or UDP evaluation.
You may well respond with, “but ISA Server does do ‘deep packet inspection’; that’s what all those filters are doing, isn’t it?”, and you’d be right, at least up to a point. It’s exactly this point that I want to examine today.
Many customers with whom I’ve discussed ISA Server functionality have understandably mistaken the statement “deep packet inspection” to mean ISA Server evaluates each packet payload individually as it crosses ISA Server boundaries. The conversation invariably gets interesting when I have to explain that once the traffic is being handled by one of the filters, this is almost always untrue. Fear not; our marketing team didn’t lie to you. It’s just that a single phrase intended to summarize a whole feature set can’t accurately express the fine details of the underlying process.
As previously noted, ISA Server employs various application and Web filters to evaluate HTTP, SMTP, PPTP, RTSP, etc. protocols in order to enable their safe passage as well as prevent their use or abuse across your network edge. In fact, a reasonably-sized sub-market for 3rd-party filters has grown up around ISA Server extensibility in both Web filters and application filters. What we’ll be examining today is the distinction between “packets” and “data”. In order to do that, we need to examine some traffic as it’s processed by ISA Server (or any other application-layer firewall, for that matter).
A look at HTTP
Aside from SMTP as used for SPAM and malware distribution (and a fine vehicle it is), HTTP is easily the most commonly used application protocol on the Internet. It allows the kids to play their Flash games at Nickelodeon.com, and Mom & Dad get to watch YouTube, shop for fishing tackle, or perform research for that upcoming Master’s thesis. Whatever the use, HTTP is the undisputed “heavy lifter” of the Internet for the present and foreseeable future. If you’re wondering what this has to do with deep packets or how they’re inspected, please bear with me; it’ll all come together soon.
HTTP, like many protocols, operates by using a request / response process in which the client makes requests of the Web server and the Web server responds accordingly. That’s basically it. A proxy server changes the protocol only slightly in order to indicate the presence of an intermediate entity. We can summarize the process through an ISA Server (or any Web proxy) as:
The client asks ISA Server to retrieve content from a Web server
ISA Server examines the request and, unless it violates HTTP rules or the policies defined by the admin, forwards it to the server
The server also examines the request and responds according to the Web site configuration
ISA Server examines the response and, unless it violates HTTP rules or the policies defined by the admin, forwards it to the client
The client receives the response and processes it
This cycle will continue until everything the client wants (this time) is delivered. For each Web site, there may be tens to hundreds of client request / response cycles before all of the content is acquired and rendered. The more “active” the site, the more content is being requested.
The really interesting part of HTTP is not just the data that the server sends back to the client, but the additional data that accompanies the request and response as well as the subtle changes that occur in the protocol when the client behaves as a CERN (Web) proxy client.
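When you type a URL such as http://www.contoso.com/default.asp?gimmestuff into your browser address bar and click GO or hit ENTER, the browser breaks this information down into multiple parts before sending it to the server or proxy. A “simple URL” actually breaks down into multiple components: the scheme (http), the host (www.contoso.com), the resource path (/default.asp), and the optional data passed to that resource (gimmestuff).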
In the case of a direct connection (or transparent proxy), it will send its request similarly to the following:
GET /default.asp?gimmestuff HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322)
The first part of the request contains the method (GET), the resource absolute path (/default.asp), and the protocol version desired by the client (HTTP/1.1). This collection of information tells the Web server what task the Web server should perform, the resource on which it should perform that action, and the protocol version it should use in the response. In this case, the client instructed the Web server to “get me the resource named “default.asp” from the root folder “/” of the Web site, pass the data (gimmestuff) to that resource for processing, and respond to me using HTTP/1.1 if you can”.
You may have noticed that there is an important part of the original URL missing from this instruction; the “host”. This information is placed into a data field known as a “header”; in this case, the “host” header (Host: www.contoso.com). Since many Web servers can host a number of Web sites, the client also indicates exactly which Web site the Web server should use (www.contoso.com) to satisfy this request by populating the host header with the fully-qualified name of the Web site where the client believes the resource to be held.
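As a rough illustration of the split described above, the following Python sketch breaks a URL apart with the standard-library urllib.parse.urlsplit and builds the request line plus the host header. The function name build_request and the via_proxy flag are hypothetical names introduced only for this example; a real browser adds many more headers, and the proxy form shown assumes the usual convention that a CERN proxy client places the full URL in the request line.

```python
from urllib.parse import urlsplit


def build_request(url: str, via_proxy: bool = False) -> str:
    """Build a minimal GET request for url (illustrative helper, not a browser)."""
    parts = urlsplit(url)
    # Path plus optional query data, e.g. "/default.asp?gimmestuff".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    if via_proxy:
        # Assumption: a proxy client sends the absolute URL in the request line.
        target = f"{parts.scheme}://{parts.netloc}{target}"
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",  # the host moves out of the URL into a header
    ]
    # Each line ends with CR/LF; one more CR/LF closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


direct = build_request("http://www.contoso.com/default.asp?gimmestuff")
proxied = build_request("http://www.contoso.com/default.asp?gimmestuff", via_proxy=True)
print(direct)
print(proxied)
```

The only difference between the two outputs is the request line; the host header is present in both.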
When the Web server responds to the client request, it will also populate some headers, which will instruct the client as to the actual disposition, content, size, and type of content the server is providing. In the case of the previous request, the Web server responded with the following:
HTTP/1.1 200 Ok
Last-Modified: Sat, 22 Feb 2003 01:48:30 GMT
Date: Thu, 07 Jul 2005 20:50:49 GMT
<Message Body – removed for clarity>
The first part of the response message is the protocol version and the server’s response code. The HTTP specification (published as an RFC) defines the response code “200” as “request completed successfully.” The server is also telling the client that it will use HTTP protocol version 1.1 in this response and further communication with the client. The other interesting part of this message in the context of this discussion is the content-length header, which tells the client how many bytes the client should expect to receive from the server in the message body. We’ll come back to this later, but first we need to establish some additional protocol baseline.
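To make the pieces of a response concrete, here is a small Python sketch that separates the status line, the headers, and the message body. The function name parse_response_head is hypothetical, and the content-length value and five-byte body in the sample are made up for the example, since the original message body was removed. This is only an approximation of what any HTTP-aware intermediary has to do, not a description of ISA Server internals.

```python
def parse_response_head(raw: bytes):
    """Split a raw HTTP response into status-line fields, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, numeric code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# The Content-Length value and the body below are made up for this example.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Last-Modified: Sat, 22 Feb 2003 01:48:30 GMT\r\n"
    b"Date: Thu, 07 Jul 2005 20:50:49 GMT\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
version, code, reason, headers, body = parse_response_head(sample)
print(version, code, reason, headers["content-length"], len(body))
```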
Without going into the deep, dark depths of the OSI networking model, we can accurately state that in order for the previous conversation to occur, HTTP must find its way between the client and the server. In order to satisfy this requirement, HTTP must depend on another protocol, which in turn depends on another protocol further down the stack and so on until we are looking at electrical signals flowing through a wire or perhaps even light waves zipping down a hair-thin reflective tube (we’ll save RFC 1149 and RFC 2549 for another discussion). Each one of these dependent protocols is actually the payload within its antecedent protocol (lower in the stack).
The primary point for purposes of this discussion is that HTTP uses the TCP protocol; TCP uses IP, which uses Ethernet. Since Ethernet is the “lowest protocol on the pile”, it has the greatest influence on how the traffic flows between endpoints. In modern networks, Ethernet frames are up to 1522 bytes long (when they carry an 802.1Q VLAN tag) and allow a maximum payload size of 1500 bytes. There are such things as “Jumbo Frames”, which allow much larger frame sizes, but these are not in common use within most networks.
Ethernet frame summary
Header (18 bytes)
IP packet (up to 1500 bytes)
Checksum (4 bytes)
The protocol carried within the Ethernet frame must place identifying and control information in a header of its own so that as the data is passed up the stack, that protocol has the information it needs to manage its part of the job. The most common protocol found in Ethernet frames today is IPv4; the “IP” part of TCP/IP.
The IP protocol creates a header of between 20 and 60 (typically 20) bytes, leaving between 1440 and 1480 bytes available for the next protocol, in this case, TCP.
IP packet summary
Header (20-60 bytes)
TCP segment (1440-1480 bytes)
TCP also uses a variable header size of between 20 and 60 (typically 20) bytes, leaving at most 1460 bytes available for HTTP. If your network or the Web site imposes encryption or additional protocols (such as PPTP) at any point between HTTP and Ethernet, this will add more overhead and further reduce the per-packet space available for HTTP. The amount of packet overhead imposed depends on the encryption protocol.
TCP segment summary
Header (20-60 bytes)
HTTP (max 1460 bytes)
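The per-packet budget can be worked out in a few lines. The sketch below assumes the standard 1500-byte Ethernet payload and the typical 20-byte IP and TCP headers mentioned above; the function name http_bytes_per_packet and the 40-byte extra_overhead figure are hypothetical, chosen only to show how any additional tunnel or encryption layer eats into the space left for HTTP.

```python
ETHERNET_PAYLOAD = 1500  # maximum payload of a standard Ethernet frame, in bytes


def http_bytes_per_packet(ip_header: int = 20, tcp_header: int = 20,
                          extra_overhead: int = 0) -> int:
    """Bytes left for HTTP after each lower-layer header takes its share."""
    return ETHERNET_PAYLOAD - ip_header - tcp_header - extra_overhead


typical = http_bytes_per_packet()
# Hypothetical figure: 40 bytes of tunnel or encryption overhead per packet.
tunneled = http_bytes_per_packet(extra_overhead=40)
print(typical, tunneled)  # 1460 1420
```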
Count the Beans
When HTTP 1.0 and HTTP 1.1 applications build their requests and responses, they must adhere to a specific format. Each line must be terminated by a single carriage-return-line-feed (CR/LF) pair, and the header section must be separated from the message body by another CR/LF pair. This means that each HTTP message contains (lines*2) + 2 + character_count bytes in the header section. If the HTTP message contains a body, then we have to add the total size of this to the combined total.
In our example above, the client request included 406 printable characters in a header section composed of 8 lines (the “Accept” and “User-Agent” lines wrap in the example), giving us a total byte count of (406 + 16 + 2) for a whopping 424 bytes, not even half of a TCP payload. This request was easily carried in a single packet, and you could accurately state that ISA Server performed “deep packet inspection” in this case.
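The bean-counting formula is easy to check in code. The following Python sketch applies (lines*2) + 2 + character_count to a list of header lines and cross-checks the result by actually building the header section with CR/LF terminators. The function names are hypothetical, and the two sample lines are taken from the request shown earlier rather than the full 8-line request.

```python
def header_section_bytes(lines):
    """Apply (lines * 2) + 2 + character_count to a list of header lines."""
    character_count = sum(len(line) for line in lines)
    return len(lines) * 2 + 2 + character_count


def serialized_length(lines):
    """Cross-check: build the header section and measure it."""
    return len(("\r\n".join(lines) + "\r\n\r\n").encode("ascii"))


# Two lines taken from the request shown earlier (not the full 8-line request).
sample_lines = [
    "GET /default.asp?gimmestuff HTTP/1.1",
    "Accept-Encoding: gzip, deflate",
]
print(header_section_bytes(sample_lines), serialized_length(sample_lines))
```

Both numbers printed are the same, because joining n lines and closing the section produces exactly n + 1 CR/LF pairs.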
On the other hand, the response totaled 241 printable characters in 10 lines and was followed by a message body which was specified in the content-length header as being 3,247 bytes long. All together, this indicates a total response byte-count of (241 + 20 + 2 + 3,247), or a total of 3,510 bytes, more than two Ethernet frames. If we assume a TCP payload size of 1460 bytes (most common), we can see that sending this response will require two full TCP packets and 590 bytes of a third. This is where the concept of “deep packet inspection” starts to lose its grip on reality.
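The same arithmetic, as a short Python sketch. It uses the figures from the sum in the text and assumes the 1460-byte TCP payload; segments_for is a hypothetical helper name.

```python
def segments_for(total_bytes: int, payload_size: int = 1460):
    """Return (full_segments, leftover_bytes) needed to carry total_bytes."""
    return divmod(total_bytes, payload_size)


# Figures from the sum in the text: printable characters, one CR/LF per line,
# the blank-line CR/LF, and the message body.
response_bytes = 241 + 20 + 2 + 3247
full, leftover = segments_for(response_bytes)
print(response_bytes, full, leftover)  # 3510 2 590
```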
In order for ISA Server to evaluate this response as a complete entity, it must have received all three packets and accumulated them into a single, coherent data stream. Once it does this, the data being evaluated ceases to be packets at all; in fact, there is no association whatsoever with any of the three packets that originally delivered the response; only a single data stream in “the mind of ISA Server” (memory) which represents the total payload portion of each of the TCP packets. Now that ISA Server has the complete data, it can apply its application-layer intelligence to the task of evaluating this response.
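The accumulation step might look something like the following Python sketch. It is not how ISA Server is implemented; the class name HttpMessageAssembler is hypothetical, and the sketch assumes a single response whose length is given by a content-length header (no chunked encoding, no pipelining). The sample message uses a made-up 3000-byte body.

```python
class HttpMessageAssembler:
    """Accumulates TCP payloads until one complete HTTP message is buffered."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, payload: bytes):
        """Add one segment's payload; return the whole message once complete."""
        self.buffer.extend(payload)
        head_end = self.buffer.find(b"\r\n\r\n")
        if head_end == -1:
            return None  # header section has not fully arrived yet
        body_start = head_end + 4
        content_length = 0
        # Skip the status line, then look for the content-length header.
        for line in bytes(self.buffer[:head_end]).split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                content_length = int(value.strip())
        if len(self.buffer) - body_start < content_length:
            return None  # body still incomplete; keep waiting
        # Packet boundaries are gone; only the data stream remains.
        return bytes(self.buffer[:body_start + content_length])


# Made-up message: a short header section and a 3000-byte body.
message = b"HTTP/1.1 200 OK\r\nContent-Length: 3000\r\n\r\n" + b"x" * 3000
assembler = HttpMessageAssembler()
# Simulate delivery in 1460-byte TCP payloads.
results = [assembler.feed(message[i:i + 1460]) for i in range(0, len(message), 1460)]
print([r is not None for r in results])  # [False, False, True]
```

Only after the third payload arrives is there anything for an application-layer filter to evaluate, and by then the three packets are indistinguishable within the buffer.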
When this data needs to be sent to the intended recipient, ISA Server will provide a pointer to this data buffer when it calls on Windows to send the data back down the network stack, through TCP, IP, and Ethernet, back to the wire between ISA Server and the client. It is the network components that will break this data stream into properly-sized packets and manage their flow through the stack and between the ISA Server and the client.
Although HTTP is considered a “stateless” protocol, meaning the protocol itself doesn’t maintain the current “state” of the communications from one point in time to another, the client, proxy and Web server must maintain state if the conversation is to be meaningful. This is especially true for a proxy, which must understand whether to expect a request or response message (and from whom) and what is considered an appropriate type of message in that case. To accomplish this task, it’s necessary for ISA Server to “cherry-pick” certain data fields from a message header and stash them away so that the following messages received as part of the conversation can be evaluated properly. This is another place where the “packet inspection” concept loses contextual integrity.
Bring It On Home
As you can see, the actions that ISA Server performs on and as a result of many common protocols are frequently not based on “deep packet inspection”, but “data stream” and “protocol behavior” evaluation. This is the same whether the data to be evaluated is seen in an HTTP or SMTP conversation, RTSP stream, or an FTP directory listing. Since ISA Server spends much of its time evaluating data larger than a single packet and will actually retain portions of packet data between messages, the concept of “packet-level” functionality is rarely accurate. The more accurate statement to describe ISA Server behavior is “application-layer inspection”, which encompasses the concepts of data stream and protocol behavior analysis.