Secure remote access to web servers with SSH

Telecommuting is super-convenient, but has the obvious problem, especially for developers, that you want to be able to access resources on the corporate network. The classic solution here is a VPN, but there are lots of situations where the overhead of setting one up is excessive, especially since a lot of services (e-mail, remote login, version control, etc.) can be remoted without one.

One hack that people often use for applications that aren't so easily remoted is SSH port forwarding. For concreteness, say that you're working on your laptop in your hotel room and you want to use a service on your corporate network. You can use SSH to set things up so that if you connect to port L on your laptop (operated by your SSH client) then the SSH server connects to port R on some specific machine M on your corporate network. (This gets a little confusing because the technical term here is that port L is called the "local" port and M:R is called the "remote" host and port, even though you're the one working remotely.) Any data you write to the client on L gets forwarded to M:R.
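To make the L, M, and R roles concrete, here is a small Python sketch that assembles the OpenSSH command line for a local forward. The host names (imap.corp.example, you@gateway.corp.example) and the port numbers are made up for illustration; only the -L and -N options are real OpenSSH flags.

```python
import shlex

def local_forward_command(local_port, remote_host, remote_port, gateway):
    """Build the argv for an OpenSSH local port forward (-L L:M:R)."""
    spec = f"{local_port}:{remote_host}:{remote_port}"
    # -N: do not run a remote command, just forward ports.
    return ["ssh", "-N", "-L", spec, gateway]

# Hypothetical values: L=8143, M=imap.corp.example, R=143.
cmd = local_forward_command(8143, "imap.corp.example", 143,
                            "you@gateway.corp.example")
print(shlex.join(cmd))
# prints: ssh -N -L 8143:imap.corp.example:143 you@gateway.corp.example
```

With that command running, a mail client pointed at localhost:8143 would end up talking to port 143 on M.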

Once you have port forwarding set up, you reconfigure your client to think that localhost:L is the server and your client just works--at least it does for many protocols, such as POP and IMAP. Unfortunately, it interacts badly with two features of the Web: absolute URLs and virtual hosts.

Absolute URLs
When you make an IMAP or POP connection, you configure the address of the server once and then your client uses it from then on (until you change it). But that's not how the Web works. Even if you type in the URL (address) of the home page, from then on you navigate by clicking on links. Each link, of course, is a pointer to some other page.

Now, links come in two basic flavors: relative and absolute. Absolute URLs specify both the server and the page on the server, like http://www.educatedguesswork.org/movabletype/archives/2006/04/2_billion_dolla.html. That means they can point anywhere. Relative URLs specify only the path, either the entire path, like /movabletype/archives/2006/04/2_billion_dolla.html or just a partial path, like 2_billion_dolla.html. All those links go to the same place, but because the second two are relative, they only go to the right place if you have the right context, which is to say if you're reading the right page to start with.
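If you want to see the resolution rules in action, Python's urllib.parse.urljoin does roughly what a browser does. This is just a sketch: the page name some_other_post.html and the reader.example site are invented for the example.

```python
from urllib.parse import urljoin

# The page we pretend to be reading (hypothetical file name).
page = ("http://www.educatedguesswork.org/movabletype/archives/"
        "2006/04/some_other_post.html")

links = [
    # absolute URL
    "http://www.educatedguesswork.org/movabletype/archives/2006/04/2_billion_dolla.html",
    # relative URL with the entire path
    "/movabletype/archives/2006/04/2_billion_dolla.html",
    # relative URL with a partial path
    "2_billion_dolla.html",
]

# Resolved against the right page, all three forms name the same document.
resolved = [urljoin(page, link) for link in links]

# Resolved against some other page (hypothetical reader site), the
# partial path lands somewhere else entirely.
elsewhere = urljoin("http://reader.example/feeds/view", links[2])

for url in resolved:
    print(url)
print(elsewhere)
```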

The great thing about relative URLs is that they allow portability. Say I decide I want to change the name of my web site to www.eg.org. If I'm using relative URLs then I can just make a copy of the site and everything works. If I'm using the second kind of relative URL, I can even change the directory my pages live in. The bad thing about relative URLs is that they're context dependent. One way in which this is bad is that if you're reading this page in a Web-based RSS reader like Bloglines then the URLs end up pointing to Bloglines' site, not mine, and so may not work properly. And, of course, you can't use relative URLs to point to other sites. In any case, people often use absolute URLs even for links that point inside their own site.

Absolute URLs cause big problems if you want to tunnel access to a Web site over SSH. To see why, think about how the tunnel works. Say I want to tunnel to www.example.com but it's behind my firewall. I set up a port forwarding association between port 8080 on my local machine and www.example.com:80. Then I type http://localhost:8080 into my browser and things just work--at least they do as long as all the URLs are relative. Relative URLs like /page.html get turned by my browser into absolute URLs like http://localhost:8080/page.html and everything works fine because I go back to my local server, which is just a tunnel to the remote server.

But consider what happens if there's an absolute URL on the page, like http://www.example.com/page.html. Your browser has no way of knowing that that's actually the same site--it thinks you're on http://localhost:8080, so it tries to connect to www.example.com, which is inaccessible (hence the need for the tunnel) and you're hosed.
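Here's a quick way to check which links survive the tunnel, again leaning on urljoin as a stand-in for the browser. The helper name stays_in_tunnel is mine; the tunnel address is the localhost:8080 forward from the example above.

```python
from urllib.parse import urljoin, urlsplit

TUNNEL = "http://localhost:8080/"  # the browser's idea of "the site"

def stays_in_tunnel(link, base=TUNNEL):
    """Resolve link as the browser would and report whether the
    result still points at the tunnel's host:port."""
    resolved = urljoin(base, link)
    same_origin = urlsplit(resolved).netloc == urlsplit(base).netloc
    return resolved, same_origin

print(stays_in_tunnel("/page.html"))
# prints: ('http://localhost:8080/page.html', True)
print(stays_in_tunnel("http://www.example.com/page.html"))
# prints: ('http://www.example.com/page.html', False)
```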

Virtual Hosts
The second problem is name-based virtual hosts. Remember that to do NBVH the client sends the Host: header which contains the domain name of the server it thinks it's connecting to. If the server is using NBVH and it gets localhost:8080 instead of www.example.com:80 you probably aren't going to get the result you expect.
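To see the mismatch, here's a sketch of the request a client would put on the wire for a given URL. The function origin_form_request is a simplified illustration, not what any particular browser does byte for byte (real clients add more headers and usually omit a default port such as :80 from Host).

```python
from urllib.parse import urlsplit

def origin_form_request(url):
    """Build a bare-bones HTTP/1.1 request for url: path in the
    request line, host (and any explicit port) in the Host: header."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"

# What the server's virtual host lookup expects to see:
print(origin_form_request("http://www.example.com/"))
# What it actually gets when the browser is pointed at the tunnel:
print(origin_form_request("http://localhost:8080/"))
```

The second request carries Host: localhost:8080, which a name-based virtual host configuration has no reason to map to www.example.com.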

Port Forwarding With HTTP
The fix here is to tell the client that your SSH port tunnel is actually your proxy. This is one of those things that seems totally natural if you know nothing about HTTP, is surprising if you know a bit about HTTP (especially if, like me, your opinions about HTTP were largely formed in the early 90s), but actually works. The reason why you'd think it might not work is this: Many HTTP proxies modify the request. When you talk to an HTTP server, the URL you give it to request a page is just the path part of the URL, like so:

GET /index.html HTTP/1.1
...

But in HTTP/1.0 the Host: header hadn't been invented and so you would send the absolute URL to the proxy, like so:

GET http://www.educatedguesswork.org/index.html HTTP/1.1
...

The proxy would then strip off the host part, connect to the right host, and send the request with only the path part of the URL, so the server would only ever see the path. Because SSH port forwarding doesn't change the request, you would expect the server to choke on the full URL.

In HTTP/1.1, however, servers are required to accept full URLs in the request line--presumably on the theory that in some hypothetical future version of HTTP we will send them all the time. So, it turns out that the server accepts this request just fine. Basically, the SSH port forwarder is acting as what HTTP calls a transparent proxy.
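Roughly, the server-side logic looks like the sketch below: if the request line carries a full URL, the host and path come from that URL, otherwise the host comes from the Host: header. The function name is mine and this is a simplification of what a real server does (it ignores things like the "*" target used by OPTIONS).

```python
from urllib.parse import urlsplit

def effective_host_and_path(request_target, host_header):
    """Work out which host and path a request is for, given the
    target from the request line and the Host: header value."""
    if request_target.startswith("/"):
        # Path only: the Host: header decides which site is meant.
        return host_header, request_target
    # Full URL: the host inside the URL wins.
    parts = urlsplit(request_target)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path

# Browser pointed straight at the tunnel: wrong virtual host.
print(effective_host_and_path("/index.html", "localhost:8080"))
# Browser using the tunnel as its proxy: right host, right path.
print(effective_host_and_path(
    "http://www.educatedguesswork.org/index.html",
    "www.educatedguesswork.org"))
```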

Port Forwarding With HTTPS
Unfortunately, this doesn't work properly for HTTP over SSL/TLS. The problem is that HTTPS's interaction with proxies is totally different. When you do HTTPS, you do the SSL/TLS handshake first and that doesn't contain any information that the proxy can use to know which server to connect to.

To get around this problem, HTTPS uses a special HTTP proxy method called CONNECT which basically means "make a direct connection and then get out of my way". So, the client sends:

CONNECT www.educatedguesswork.org:443 HTTP/1.1
...

And the proxy makes a connection to www.educatedguesswork.org:443 and then transparently forwards any data the client or server sends without inspecting or modifying it. This gives IT managers fits but it's not like you can tell your employees not to buy stuff at Amazon.com.

The problem here is that your average SSL server is not prepared to accept a CONNECT request. So, if you set your SSH port forward to be your SSL proxy, the server sees the CONNECT and throws an error, hangs up on you, or both. So, you need to somehow consume that CONNECT. The fix is to run a Web proxy on the remote machine (the one you're SSH-ing into). You then point your SSH tunnel at the Web proxy. When your client connects to the tunnel (and thus transitively to the proxy) and offers CONNECT the proxy says OK and makes the connection for you. Mission accomplished. A lot of work to get to your corporate Web servers, though.
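For the curious, the "consume the CONNECT" job is small enough to sketch in Python with asyncio. This is a toy under stated assumptions, not a replacement for a real Web proxy: it handles only CONNECT, does no access control or logging, and doesn't cope with bracketed IPv6 targets. The function names and the default port 3128 are my choices.

```python
import asyncio

async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except (ConnectionError, OSError):
        pass
    finally:
        writer.close()

async def reply_and_close(writer, status_line):
    """Send a bare status line and hang up."""
    writer.write(status_line + b"\r\n\r\n")
    await writer.drain()
    writer.close()

async def handle_client(reader, writer):
    """Consume one CONNECT request, then relay bytes blindly."""
    try:
        head = await reader.readuntil(b"\r\n\r\n")
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        writer.close()
        return
    parts = head.split(b"\r\n", 1)[0].decode("latin-1").split()
    if len(parts) != 3 or parts[0] != "CONNECT":
        await reply_and_close(writer, b"HTTP/1.1 405 Method Not Allowed")
        return
    host, _, port = parts[1].rpartition(":")
    try:
        up_reader, up_writer = await asyncio.open_connection(host, int(port))
    except (OSError, ValueError):
        await reply_and_close(writer, b"HTTP/1.1 502 Bad Gateway")
        return
    # Tell the client the pipe is open, then get out of the way.
    writer.write(b"HTTP/1.1 200 Connection established\r\n\r\n")
    await writer.drain()
    await asyncio.gather(pipe(reader, up_writer), pipe(up_reader, writer))

async def serve(port=3128):
    """Listen on the loopback interface only (port is an assumption)."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", port)
    async with server:
        await server.serve_forever()
```

You would start this on the machine you SSH into with asyncio.run(serve()), point the forward at it (something like -L 8080:localhost:3128, assuming that port), and set localhost:8080 as the browser's SSL proxy.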

5 Comments

...or use the -D flag on SSH and tell your browser to use localhost:8080 as a SOCKS proxy

Yep, the SOCKS approach is certainly easier; OpenSSH acts as a pretty good SOCKS4 proxy.

A recent interview with an OpenSSH developer indicated that they see the VPN-like features as a key featureset in the future, which is good news.

We can already see some good new features like the -M and -f switches, which *almost* remove the need for "babysitting" scripts wrapped around the ssh invocations. It's certainly a lot easier than alternatives like IPSec VPNs.

SOCKS.... Isn't that cheating? :)

You can automate the proxy configuration on the browser side using a proxy configuration file (proxy.pac). A good description is here.

'SOCKS.... Isn't that cheating? :)'

it sometimes feels like it. The damn OpenSSH dev team keep obsoleting large chunks of my daily script collection ;)
