April 26, 2006
Secure remote access to web servers with SSH
Telecommuting is super-convenient, but has the obvious problem, especially for developers, that you want to be able to access resources on the corporate network. The classic solution here is a VPN, but there are lots of situations where the overhead of setting one up is excessive, especially since a lot of services (e-mail, remote login, version control, etc.) can be remoted without one.
One hack that people often use for applications that aren't so easily remoted is SSH port forwarding. For concreteness, say that you're working on your laptop in your hotel room and you want to use a service on your corporate network. You can use SSH to set things up so that if you connect to port L on your laptop (operated by your SSH client) then the SSH server connects to port R on some specific machine M on your corporate network. (This gets a little confusing because the technical term here is that port L is called the "local" port and M:R is called the "remote" host and port, even though you're the one working remotely.) Any data you write to the client on L gets forwarded to M:R.
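For the record, the forwarding described above corresponds to OpenSSH's -L option, which takes a local_port:remote_host:remote_port argument. Here's a minimal Python sketch that only builds the command line. The helper name, the gateway host name, and the sample port and host values are all placeholders I made up for illustration.

```python
def ssh_forward_command(local_port, remote_host, remote_port, gateway):
    """Build an OpenSSH command forwarding localhost:local_port to remote_host:remote_port.

    gateway is the SSH server you log into on the corporate network
    (a hypothetical name is used in the example below).
    """
    spec = f"{local_port}:{remote_host}:{remote_port}"
    # -N: don't run a remote command, just forward ports.
    # -L: listen on the local port and forward connections via the SSH server.
    return ["ssh", "-N", "-L", spec, gateway]


if __name__ == "__main__":
    # Placeholder values: L=8080, M=www.example.com, R=80.
    print(" ".join(ssh_forward_command(8080, "www.example.com", 80, "gateway.example.com")))
```

Run as a script, this prints ssh -N -L 8080:www.example.com:80 gateway.example.com, which you could then paste into a shell.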
Once you have port forwarding set up, you reconfigure your client to think that localhost:L is the server and your client just works--at least it does for many protocols, such as POP and IMAP. Unfortunately, port forwarding interacts badly with two features of the Web: absolute URLs and virtual hosts.
Absolute URLs
When you make an IMAP or POP connection, you configure the
address of the server once and then your client uses it
from then on (until you change it). But that's not how
the Web works. Even if you type in the URL (address)
of the home page, from then on you navigate by clicking
on links. Each link, of course, is a pointer to some
other page.
Now, links come in two basic flavors: relative and
absolute. Absolute URLs specify both the
server and the page on the server, like
http://www.educatedguesswork.org/movabletype/archives/2006/04/2_billion_dolla.html. That means they can point anywhere.
Relative URLs specify only the path, either the entire
path, like /movabletype/archives/2006/04/2_billion_dolla.html
or just a partial path, like 2_billion_dolla.html.
All those links go to the same place, but because the second two
are relative, they only go to the right place if you have
the right context, which is to say if you're reading the
right page to start with.
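If you want to see the context dependence for yourself, Python's urllib.parse.urljoin does the same resolution a browser does. This is just a sketch: the base page (index.html in the same archive directory) and the reader.example URL are assumptions of mine, not real pages.

```python
from urllib.parse import urljoin

# The page the reader is assumed to be viewing (hypothetical neighbouring page).
BASE = "http://www.educatedguesswork.org/movabletype/archives/2006/04/index.html"

links = [
    # absolute: server and path
    "http://www.educatedguesswork.org/movabletype/archives/2006/04/2_billion_dolla.html",
    # relative: entire path
    "/movabletype/archives/2006/04/2_billion_dolla.html",
    # relative: partial path
    "2_billion_dolla.html",
]

# Resolved against the right page, all three forms name the same URL.
resolved = [urljoin(BASE, link) for link in links]

# Resolved against some other page (a made-up reader site), the partial
# path points somewhere else entirely.
elsewhere = urljoin("http://reader.example/feeds/view", links[2])
```

With the right base all three entries in resolved are identical. With the wrong base, the relative link silently points at the other site.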
The great thing about relative URLs is that they allow portability.
Say I decide I want to change the name of my web site to www.eg.org. If I'm using relative URLs then I can just make a copy of the site
and everything works. If I'm using the second kind of relative
URL, I can even change the directory my pages live in. The
bad thing about relative URLs is that they're context dependent.
One way in which this is bad is that if you're reading this
page in a Web-based RSS reader like Bloglines then the URLs end up pointing to Bloglines' site, not mine,
and so may not work properly. And, of course, you can't use
relative URLs to point to other sites. In any case, people often
use absolute URLs even for links that point inside their own site.
Absolute URLs cause big problems if you want to tunnel access to
a Web site over SSH. To see why, think about how the tunnel
works. Say I want to tunnel to www.example.com
but it's behind my firewall. I set up a port forwarding association
between port 8080 on my local machine
and www.example.com:80.
Then I type http://localhost:8080 into my
browser and things just work--at least they do as long
as all the URLs are relative. Relative URLs like /page.html get turned by my browser into
absolute URLs like http://localhost:8080/page.html and everything works fine because I go
back to my local server, which is just a tunnel to the remote
server.
But consider what happens if there's an absolute URL on the page,
like http://www.example.com/page.html. Your browser
has no way of knowing that that's actually the same site--it thinks
you're on http://localhost:8080, so it tries to
connect to www.example.com, which is inaccessible
(hence the need for the tunnel) and you're hosed.
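Here's the same failure as a small Python sketch. It resolves each link the way a browser would when the page was loaded from http://localhost:8080/ and reports whether the next request would still go through the forwarded port. The function name is mine.

```python
from urllib.parse import urljoin, urlsplit

TUNNEL = "http://localhost:8080/"  # local end of the SSH port forward


def stays_in_tunnel(link, page_url=TUNNEL):
    """Return True if following link from page_url still hits the forwarded port."""
    target = urlsplit(urljoin(page_url, link))
    return target.netloc == urlsplit(TUNNEL).netloc


# A relative link resolves against localhost:8080 and keeps working.
relative_ok = stays_in_tunnel("/page.html")

# An absolute link names the real server, so the browser bypasses the tunnel.
absolute_ok = stays_in_tunnel("http://www.example.com/page.html")
```

Here relative_ok is True and absolute_ok is False, which is exactly the "you're hosed" case.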
Virtual Hosts
The second problem is name-based virtual hosts (NBVH).
Remember that to do NBVH the client sends the Host:
header which contains the domain name of the server it thinks
it's connecting to. If the server is using NBVH and it gets
localhost:8080 instead of www.example.com:80
you probably aren't going to get the result you expect.
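A toy version of the server side makes the mismatch obvious. The virtual host table, the directory names, and the fall-back-to-a-default-site behavior are all assumptions for illustration. Real servers differ in what they do with a Host value they don't recognize.

```python
# Hypothetical name-based virtual host table on the server.
VHOSTS = {
    "www.example.com": "/srv/www/example",
    "intranet.example.com": "/srv/www/intranet",
}
DEFAULT_SITE = "/srv/www/default"


def host_header(raw_request):
    """Extract the Host header value from a raw HTTP request, or None if absent."""
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if not line:
            break  # blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip()
    return None


def select_site(raw_request):
    """Pick a site by Host header, ignoring any port (IPv6 literals not handled)."""
    host = host_header(raw_request) or ""
    hostname = host.split(":", 1)[0].lower()
    return VHOSTS.get(hostname, DEFAULT_SITE)


direct = "GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
tunneled = "GET / HTTP/1.1\r\nHost: localhost:8080\r\n\r\n"
```

select_site(direct) finds the example site, while select_site(tunneled) falls through to the default, because "localhost" isn't a name the server hosts.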
Port Forwarding With HTTP
The fix here is to tell the client
that your SSH port tunnel is actually your proxy. This is one
of those things that seems totally natural if you know nothing
about HTTP, is surprising if you know a bit
about HTTP (especially if, like me, your opinions about HTTP were
largely formed in the early 90s), but actually works. The reason why you'd think
it might not work is this: Many HTTP proxies modify the request before passing it on.
When you talk to an HTTP server, the URL you give it to request
a page is just the path part of the URL, like so:
GET /index.html HTTP/1.1
...
But in HTTP/1.0 the Host: header hadn't been invented
and so you would send the absolute URL to the proxy, like so:
GET http://www.educatedguesswork.org/index.html HTTP/1.1
...
The proxy would then strip off the host part, connect to the right host, and send the request with only the path part of the URL, so that's all the server would ever see. Because SSH port forwarding doesn't change the request, you would expect the server to choke on the absolute URL.
In HTTP/1.1, however, servers are required to accept full URLs in the request line--presumably on the theory that in some hypothetical future version of HTTP we will send them all the time. So, it turns out that the server accepts this request just fine. Basically, the SSH port forwarder is acting as what HTTP calls a transparent proxy.
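To make the two request forms concrete, here's a Python sketch that builds the bytes a client would put on the wire in each case. The function names are mine, and the requests are stripped down to the request line and Host: header. A real browser sends plenty more.

```python
from urllib.parse import urlsplit


def origin_form_request(url):
    """Request as sent directly to a server: only the path in the request line."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


def proxy_form_request(url):
    """Request as sent to a proxy: the absolute URL in the request line."""
    parts = urlsplit(url)
    return f"GET {url} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


# Browser pointed straight at the tunnel: the server sees localhost:8080.
via_tunnel = origin_form_request("http://localhost:8080/index.html")

# Browser told the tunnel is its proxy: the real name travels in the request.
via_proxy = proxy_form_request("http://www.example.com/index.html")
```

In the second case both the request line and the Host: header carry www.example.com, and the SSH forwarder passes them through untouched. That takes care of the absolute URL problem and the virtual host problem at once.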
Port Forwarding With HTTPS
Unfortunately, this doesn't work properly for HTTP over SSL/TLS.
The problem is that HTTPS's interaction with proxies is totally
different. When you do HTTPS, you do the SSL/TLS handshake first
and that doesn't contain any information that the proxy can use
to know which server to connect to.
To get around this problem, HTTPS uses a special HTTP proxy method
called CONNECT which basically means "make a
direct connection and then get out of my way". So, the client
sends:
CONNECT www.educatedguesswork.org:443 HTTP/1.1
...
And the proxy makes a connection to www.educatedguesswork.org:443
and then transparently forwards any data the client or server sends without
inspecting or modifying it. This gives IT managers fits but it's not
like you can tell your employees not to buy stuff at Amazon.com.
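The proxy's half of this is small. Here's a Python sketch of just the parsing step, assuming a well-formed host:port target. The function name is mine, and the actual socket relaying is left out.

```python
def parse_connect(request_line):
    """Parse 'CONNECT host:port HTTP/1.x' and return (host, port).

    Raises ValueError for anything that isn't a well-formed CONNECT line.
    """
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0] != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("CONNECT target must be host:port")
    return host, int(port)


# What a proxy conventionally sends back once its outbound connection is up.
# After this it just relays bytes in both directions without looking at them.
CONNECT_OK = b"HTTP/1.1 200 Connection established\r\n\r\n"

target = parse_connect("CONNECT www.educatedguesswork.org:443 HTTP/1.1")
```

Once the proxy has the (host, port) pair it opens a TCP connection there, replies with a 2xx status, and gets out of the way.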
The problem here is that your average SSL server is not prepared
to accept a CONNECT request. So, if you set your SSH port forward
to be your SSL proxy, the server sees the CONNECT and
throws an error, hangs up on you, or both. So, you need to somehow
consume that CONNECT. The fix is to run a Web proxy
on the remote machine (the one you're SSH-ing into). You then
point your SSH tunnel at the Web proxy. When your client
connects to the tunnel (and thus transitively to the proxy) and
offers CONNECT the proxy says OK and makes the connection
for you. Mission accomplished. A lot of work to get to your corporate
Web servers, though.
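For a client other than a browser, the final setup might look something like this in Python. The assumptions here are that the tunnel listens on local port 8080 and that the Web proxy on the remote machine listens on port 3128. Both numbers are placeholders, as is the gateway name in the comment.

```python
import urllib.request

# Assumed tunnel, started separately with something like:
#   ssh -N -L 8080:localhost:3128 gateway.example.com
# where 3128 is a hypothetical port for the Web proxy on the remote machine.
TUNNEL_PROXY = "http://localhost:8080"


def make_opener(proxy=TUNNEL_PROXY):
    """Return an opener that sends both HTTP and HTTPS requests via the tunnel."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


# Usage (needs the tunnel and the remote proxy to actually be running):
#   opener = make_opener()
#   page = opener.open("https://www.example.com/").read()
```

For https:// URLs urllib issues the CONNECT itself, so the remote proxy consumes it and the SSL server never sees it.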
Posted by ekr at April 26, 2006 8:35 PM
Comments
...or use the -D flag on SSH and tell your browser to use localhost:8080 as a SOCKS proxy
Posted by: Craig Hughes at April 27, 2006 12:52 AM
Yep, the SOCKS approach is certainly easier; OpenSSH acts as a pretty good SOCKS4 proxy.
A recent interview with an OpenSSH developer indicated that they see the VPN-like features as a key featureset in the future, which is good news.
We can already see some good new features like the -M and -f switches, which *almost* remove the need for "babysitting" scripts wrapped around the ssh invocations. It's certainly a lot easier than alternatives like IPSec VPNs.
Posted by: Justin Mason at April 27, 2006 2:32 AM
SOCKS.... Isn't that cheating? :)
Posted by: EKR at April 27, 2006 6:43 AM
You can automate the proxy configuration on the browser side using a proxy configuration file (proxy.pac). A good description is here.
Posted by: Dan Wing at April 27, 2006 8:09 PM
'SOCKS.... Isn't that cheating? :)'
it sometimes feels like it. The damn OpenSSH dev team keep obsoleting large chunks of my daily script collection ;)
Posted by: Justin Mason at April 28, 2006 5:19 AM