HTTP connection and session behavior

One of the less good features of old-style HTTP is the way it handles TCP connections. In the original HTTP 1.0, things worked like this: you'd open a TCP connection, send a single request, get a response, and close the connection. As long as you don't know anything about TCP, this sounds simple and elegant, but it's actually a terrible idea.

The problem is that setting up a TCP connection takes time. First, there's the "3-way handshake" to set it up, so you've consumed a full round trip before you even get to send your request. To make matters worse, TCP congestion control uses an algorithm called slow start. The idea with slow start is that you don't start sending data at full rate right away. Instead, you start out with a low sending rate and only increase it as you learn, from the acknowledgments coming back, that data is being received at the current rate.[1]
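
To put rough numbers on this, here's a minimal Python sketch of idealized slow start. It assumes no packet loss, no receive-window limit, 1460-byte segments, and a congestion window that starts at one segment and doubles every round trip. Real stacks differ in the details (initial windows of 2 to 4 segments were common), so treat `slow_start_rounds` and `fresh_connection_rtts` as illustrative names, not anything from a real library.

```python
import math


def slow_start_rounds(total_bytes, mss=1460, initial_cwnd=1):
    """Round trips needed to deliver total_bytes under idealized slow start.

    Assumes no loss, no receiver-window limit, and a congestion window
    (counted in segments) that doubles every round trip.
    """
    segments = math.ceil(total_bytes / mss)
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd   # one window's worth of segments per round trip
        cwnd *= 2      # each ACKed segment grows the window by one segment
        rounds += 1
    return rounds


def fresh_connection_rtts(total_bytes, **kwargs):
    # One round trip for the 3-way handshake, then the slow-start data rounds.
    return 1 + slow_start_rounds(total_bytes, **kwargs)


print(fresh_connection_rtts(30_000))                  # 6
print(fresh_connection_rtts(30_000, initial_cwnd=4))  # 4
```

Under these assumptions a 30 KB response on a brand-new connection costs 6 round trips (1 for the handshake, 5 for slow start), where an already warmed-up connection could deliver it in one.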

Again, this doesn't sound so bad until you realize that your average Web page isn't the result of a single HTTP fetch but rather an HTML page with a bunch of inline images, each of which requires its own fetch. So, in the worst-case scenario, you fetch each image in sequence, paying the connection setup and slow-start cost every time, which gives poor performance and looks lousy in the UI. The problem is even worse with SSL/TLS because connection setup is considerably more expensive: the initial TLS handshake requires two additional round trips and costs the server an expensive private-key operation, typically RSA (your average server can do a few hundred RSA operations per second).
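
A back-of-the-envelope model of that worst case, again in Python: every object gets its own fresh connection and the fetches happen strictly one after another. To keep it simple, it assumes each object fits in the initial congestion window and ignores server compute time, so it undercounts in practice; `sequential_load_rtts` is a hypothetical helper.

```python
def sequential_load_rtts(num_objects, tls=False):
    """Round trips to fetch num_objects one at a time, each on a new connection.

    Assumes each object fits in the initial congestion window (no extra
    slow-start rounds) and ignores server compute time, including the
    RSA private-key operation.
    """
    per_fetch = 1           # TCP 3-way handshake
    if tls:
        per_fetch += 2      # full TLS handshake: two more round trips
    per_fetch += 1          # the HTTP request and its response
    return num_objects * per_fetch


# 1 HTML page plus 9 inline images
print(sequential_load_rtts(10))            # 20
print(sequential_load_rtts(10, tls=True))  # 40
```

TLS doubles the round trips here, and since each full handshake also costs one RSA private-key operation, a server that manages a few hundred RSA operations per second can only accept a few hundred new full handshakes per second.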

Modern Web implementations have several features designed to alleviate these problems. The first is persistent connections. Instead of setting up a new connection for each fetch, you leave the connection up and issue multiple fetches over it. That way you pay for the 3-way handshake and slow start only once, and later fetches can use the full bandwidth of the link.[2]
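
You can watch connection reuse happen with nothing but Python's standard library. This sketch starts a throwaway local server with `http.server` (declaring `protocol_version = "HTTP/1.1"` so it keeps connections open) and fetches a few paths with `http.client`, either over one connection or HTTP 1.0 style with a fresh connection per request. `CountingHandler` and `fetch_all` are names made up for this example; the server counts connections by counting handler instances, since `socketserver` creates one per accepted connection.

```python
import http.client
import http.server
import threading


class CountingHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 lets the server keep the connection open after a response,
    # instead of closing it as HTTP 1.0 does by default.
    protocol_version = "HTTP/1.1"
    connections = 0  # TCP connections accepted so far

    def setup(self):
        # socketserver creates one handler per accepted connection,
        # so this runs exactly once per TCP connection.
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        # Content-Length tells the client where this response ends, so the
        # same connection can carry the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_all(paths, persistent):
    """GET each path from a throwaway local server; return connections used."""
    CountingHandler.connections = 0
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for path in paths:
            if conn is None or not persistent:
                if conn is not None:
                    conn.close()  # HTTP 1.0 style: one request per connection
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body before reusing the connection
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    pages = ["/", "/feather.jpg", "/apache_pb.gif"]
    print(fetch_all(pages, persistent=True))   # 1
    print(fetch_all(pages, persistent=False))  # 3
```

With `persistent=True` the three fetches share one TCP connection; with `persistent=False` they take three. Reuse only works because the client reads each response body completely and the server says where the body ends (here with `Content-Length`).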

The second feature is parallel connections. Instead of just opening one connection to the server, the client opens several. It can then fetch multiple images (or anything else) in parallel. This has a number of advantages. The first is that the page looks snappier: since more than one image loads at once, people don't feel like they're waiting as long for something to happen. The second is that it works even if the server doesn't support persistent connections. Finally, if you're sharing the network connection with others, you get a bigger share of the bandwidth, because TCP congestion control divides capacity roughly per connection rather than per user. (This isn't fair, but there you have it.) This feature was originally introduced by Netscape, but everyone does it now.
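
Both the latency and the bandwidth advantages can be sketched with crude arithmetic. The hypothetical `parallel_load_rtts` assumes the HTML must arrive before any images can be requested, that images are spread evenly over the connections with no connection reuse, and that each fetch costs 2 round trips (handshake plus request). `bandwidth_share` assumes TCP splits a bottleneck link evenly per connection, which is only roughly true.

```python
import math


def parallel_load_rtts(num_images, connections, per_fetch_rtts=2):
    """Round trips for an HTML fetch followed by images over parallel connections.

    Each connection fetches ceil(num_images / connections) images in turn,
    and every fetch opens a new connection (no persistence assumed).
    """
    html = per_fetch_rtts
    images = math.ceil(num_images / connections) * per_fetch_rtts
    return html + images


def bandwidth_share(my_connections, other_connections):
    """Approximate share of a bottleneck link under per-connection fairness."""
    return my_connections / (my_connections + other_connections)


print(parallel_load_rtts(9, 1))  # 20
print(parallel_load_rtts(9, 4))  # 8
print(bandwidth_share(4, 1))     # 0.8
```

With 9 images over 4 parallel connections the load drops from 20 round trips to 8, and if you open 4 connections against a neighbor's 1, you get roughly 80% of the link.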

SSL implementations also include a feature called "session resumption". Instead of doing a new complete handshake on every TCP connection, the client and server can reuse the keying material established on connection N for connection N+1. This lets you skip the RSA computation on the server and saves a round trip in the handshake.
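
Here's a toy model of the client side of session resumption in Python. It isn't the interface of any real TLS library; `ToySessionCache` just remembers when it last did a full handshake with each server and reports which handshake the next connection would need. The 300-second default lifetime is an assumption borrowed from mod_ssl's default, and the round-trip counts follow the description above (two for a full handshake, one for a resumed one).

```python
import time


class ToySessionCache:
    """Hypothetical client-side session cache keyed by (host, port).

    A toy model only: it tracks which kind of handshake the next
    connection would need, not any real keying material.
    """

    def __init__(self, lifetime_seconds=300):
        self.lifetime = lifetime_seconds
        self.entries = {}  # (host, port) -> time the session was established

    def connect(self, host, port, now=None):
        now = time.monotonic() if now is None else now
        created = self.entries.get((host, port))
        if created is not None and now - created < self.lifetime:
            # Reuse keying material from an earlier connection:
            # one round trip and no server RSA operation.
            return {"handshake": "resumed", "rtts": 1, "rsa_ops": 0}
        # A full handshake establishes a fresh session for later connections.
        self.entries[(host, port)] = now
        return {"handshake": "full", "rtts": 2, "rsa_ops": 1}


cache = ToySessionCache()
print(cache.connect("localhost", 443, now=0)["handshake"])    # full
print(cache.connect("localhost", 443, now=1)["handshake"])    # resumed
print(cache.connect("localhost", 443, now=400)["handshake"])  # full
```

Two connections to the same server in quick succession cost one RSA operation between them; once the cached session expires, the next connection pays for a full handshake again.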

So, any given HTTPS page load tends to involve some combination of all of these techniques. By way of illustration, here's the sequence of events from my local client talking to my local server with Firefox, Apache, and a hacked-up version of the Apache default page with some extra images. Read X.Y as "Connection X, Request Y".

 
Connection 1: New handshake 
1.1  GET / HTTP/1.1 
1.2  GET /manual/images/feather.jpg HTTP/1.1 
1.3  GET /manual/images/apache_pb.gif HTTP/1.1 
1.5  GET /manual/images/openssl_ics.gif HTTP/1.1 
1.6  GET /manual/images/apache_header.gif HTTP/1.1 
Connection 2: Resumed handshake 
2.1  GET /manual/images/mod_ssl_sb.gif HTTP/1.1 
1.7  GET /manual/images/index.gif HTTP/1.1 
2.2  GET /manual/images/home.gif HTTP/1.1 
1.8  GET /favicon.ico HTTP/1.1 

So, as you can see, there are 10 requests and two connections, but only one full handshake and hence only one RSA operation.

Typically, the lifetime of HTTP persistent connections is very short. My Firefox is willing to keep them open for 300 seconds, but Apache seems to get bored after 15 seconds or so. Session caches last a lot longer: clients generally keep sessions for quite a while, though servers tend to keep them fairly short. Five minutes is the default with mod_ssl, but since it's a classic memory/CPU tradeoff, you can dial it up arbitrarily high. In general, though, all the connections associated with a given page will at least be in the same session.
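
The interaction between these timeouts fits in a small Python function. The defaults are the numbers above (Firefox's 300 seconds, Apache's roughly 15, mod_ssl's 5-minute session cache), except the one-day client session lifetime, which is a made-up stand-in for "quite long"; `next_fetch_cost` itself is hypothetical.

```python
def next_fetch_cost(idle_seconds,
                    client_keepalive=300, server_keepalive=15,
                    client_session_ttl=24 * 3600,  # assumed, not measured
                    server_session_ttl=300):
    """Classify what the next HTTPS fetch needs after idle_seconds of quiet.

    Either side can drop an idle connection or expire a cached session,
    so the effective limit is whichever timeout is shorter.
    """
    if idle_seconds < min(client_keepalive, server_keepalive):
        return "reuse persistent connection"
    if idle_seconds < min(client_session_ttl, server_session_ttl):
        return "new TCP connection, resumed TLS handshake"
    return "new TCP connection, full TLS handshake"


print(next_fetch_cost(5))    # reuse persistent connection
print(next_fetch_cost(60))   # new TCP connection, resumed TLS handshake
print(next_fetch_cost(600))  # new TCP connection, full TLS handshake
```

So under these settings, a click 5 seconds after the page loads reuses the open connection, a click a minute later gets a new connection with a resumed handshake, and coming back after 10 minutes means a full handshake again.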

1. The standard treatment of this problem is "The Case for Persistent-Connection HTTP".
2. There's also pipelining, which I won't talk about here.

1 Comment

So when the server wants to rehandshake, does this happen on both TCP connections simultaneously? Something sounds wrong here.
