Software: February 2009 Archives

 

February 25, 2009

I've got some code that needs to convert an IP address into a string. This is one of those cases where there's a twisty maze of APIs, all slightly different. The traditional API here is:

    char *
    inet_ntoa(struct in_addr in);

inet_ntoa() has two deficiencies, one important and one trivial: it doesn't support IPv6 and it returns a pointer to a statically allocated buffer, so it's not thread safe (I'll let you figure out which is which). Luckily, there's another API: addr2ascii():

    char *
    addr2ascii(int af, const void *addrp, int len, char *buf);

If you pass buf=0, addr2ascii() will return a pointer to a static buffer like inet_ntoa(). However, if you pass it an allocated buffer it will return the result in buf. Unfortunately, if you actually try to use addr2ascii() in threaded code you will quickly discover something unpleasant, at least on FreeBSD: you occasionally get the result "[inet_ntoa error]" or some fraction thereof. The answer is hidden in the EXAMPLES section of the man page:

In actuality, this cannot be done because addr2ascii() and ascii2addr() are implemented in terms of the inet(3) functions, rather than the other way around.

More specifically, on FreeBSD, it looks like this:

    case AF_INET:
        if (len != sizeof(struct in_addr)) {
            errno = ENAMETOOLONG;
            return 0;
        }
        strcpy(buf, inet_ntoa(*(const struct in_addr *)addrp));
        break;

In other words, even though addr2ascii() doesn't explicitly use a static buffer, since it depends on inet_ntoa() it's still not thread safe. In order to get thread safety, you need to use yet another API:

    const char *
    inet_ntop(int af, const void *restrict src, char *restrict dst,
        socklen_t size);

Outstanding!

UPDATE: Clarified that this is a problem on FreeBSD. I don't know whether it's an issue on other platforms. Linux, for instance, doesn't have addr2ascii().
UPDATE2: Trivial vs. important.

 

February 16, 2009

I've got a reasonably large computation job, bigger than I can conveniently run on my own hardware, and so naturally I thought of EC2. For those of you who don't know, the basic idea behind EC2 is that you have Amazon Machine Images (AMIs), which represent the state of a machine which is off (e.g., the disk drive state). You can activate as many instances as you want, booting off the same AMI, which gives you a bunch of nearly identical machines (except for the IP address, etc.) which you can then log into and use for whatever you want. All the management is via a Web services interface which you drive with client-side Java apps. So, for instance, ec2-run-instances XXX brings up a single instance of image XXX.

After about 5 hours screwing around with it, I've figured out how to do what I want, but I have to say, they don't make it super-convenient.

  • Nothing has a mnemonic name. So, for instance, all the images are named ami-XXXXXXXX where the Xs are hex digits. Running instances are similar. Now, I can totally understand why it's convenient to use numeric identifiers, but since they make you download their toolchain, you'd think they could at least let you assign symbolic names of your choosing to the objects.
  • The tools are orthogonal but, uh, fine-grained. So, to bring up a new instance and log into it, you (1) start the instance with ec2-run-instances; (2) run ec2-get-console-output to see if it's booted and to get the SSH public key [repeat as necessary]; (3) run ec2-describe-instances to get the domain name for the machine so you can log in; (4) ssh in.
  • The default images are fairly minimal: no Emacs, no compiler, no debugger, etc. Now, they have yum, so you can install this stuff easily, but this brings us to...
  • The images don't have any persistent state. So, if you install Emacs and shut down the instance, it's back to the initial state when you start it again. And since you pay by the operating hour even if the machine is idle, you don't want to leave the machine running all the time. Amazon does provide a storage service (actually, two, S3 and EBS), but you still need to do some work on a machine-by-machine basis to make it connect automatically.
  • Amazon does let you take a running machine and make a new image out of it, but the process is pretty slow, so what ends up happening is you get the machine in the state you think you want it, pickle the image, and then next time you boot it you realize you forgot something. I repeated this a few times before I got an image I liked.

This probably all works OK as a replacement for your own data center, where you would need to absorb all the installation cost anyway, but if what you need is a temporary pile of computrons for a single compute job, EC2 isn't that great a match. It'll get the job done, but the overhead is awfully high.