November 2007 Archives

 

November 29, 2007

I've written before about Asimov's blind spot about hardware versus software. A related issue in more recent science fiction is the treatment of various kinds of input overload. The two cases that come most naturally to mind are:
  • In Haldeman's The Forever War, Mandella gets badly burned when he looks at a laser with his image intensifiers on, resulting in massive amplification:
    When the laser hit my image converter there was a red glare so intense it seemed to go right through my eyes and bounce off the back of my skull. It must have been only a few milliseconds before the converter overloaded and went blind, but the bright green afterimage hurt my eyes for several minutes.

    ...

    We knew enough not to groan or anything, but there were some pretty disgusted looks, especially on the faces that had singed eyebrows and a pink rectangle of sunburn framing their eyes.

    (This is actually from the original short story Hero because I can't find my copy of TFW, but the plot point is in TFW, as I recall).

  • In Gibson's Neuromancer, if you run afoul of sufficiently bad ICE you can actually get electrocuted:
    "Sure. I was crazy. Figured I'd try to cut it. Hit the first strata and that's all she wrote. My joeboy smelled the skin frying and pulled the trodes off me. Mean shit, that ice."
    "And your EEG was flat."
    "Well, that's the stuff of legend, ain't it?"

    (Transcription here).

Now, whole volumes could be written about Gibson's ignorance of how computers work; Haldeman, though, was scientifically trained, so it's a little more surprising coming from him. Either way, both of these examples are basically nuts. You'd have to be nuts to build a system that could potentially pump enough energy into the human body to actually burn your skin.

The Neuromancer case is particularly egregious because you've presumably got some digital system plugged into your brain-computer interface, so it's just a simple matter of never giving the BCI enough voltage that it could potentially damage you. Even if you can't do that for some reason, it's easy to add physical voltage-limiting devices (e.g., zeners) or current-limiting devices (e.g., fuses) to the leads so that they can't electrocute you. This is pretty basic electronics and not really subject to circumvention no matter how malware infested your computer gets.

The Forever War passage is more interesting because at least in the past image intensifiers were quasi-analog devices. However, it's pretty hard to believe that one would make an amplification stage that could actually emit enough power to burn your skin in milliseconds, especially since the amount of energy emitted by the displays in analog image intensifier systems is partially gated on the relaxation time of the phosphors— no matter how many electrons you pump into the phosphor, it only phosphoresces so fast, so once all the molecules are in the excited quantum state, the electrons simply aren't absorbed. As I understand it, when standard night vision systems are overloaded (e.g., someone shines a light on them) they just stop working, not burn your face off. And of course any system in which the amplifier stage is digitally read and then displayed on a screen can be easily set with a maximum emission power. So, ultimately, I don't think getting burned by your image intensifier is a plausible story either, but I guess in both cases having your face catch on fire is more exciting than saying "my computer crashed".

 

November 28, 2007

If you're an American with an iPhone or are just crazy enough to use AT&T as your wireless carrier, you may be wondering how much you're going to get gouged to use your phone in Canada. The answer is: a lot.

  • Basic per-minute rate is $.79.
  • For $3.99/month you can bring this down to $.59/minute.
  • There are two data plans available: 20MB for $29.99/month and 50MB for $59.99/month. I wasn't brave enough to ask what the roaming data rate was if you didn't do this, but I imagine it's insane.
  • You can turn off Edge data when roaming in Settings|General|Network. Actually, this seems to be the default.

Welcome to Canada!

 

November 27, 2007

I started running back in the late 1970s, so I well remember the days when just having a digital watch seemed pretty cool and actually having a stopwatch (or as it was labelled, a chronograph!) with a lap timer was ridiculously high tech. So, it was with some trepidation that I forked over for the Garmin Forerunner 305, a combined watch/HRM/GPS ($179 at Costco, pricematched via Zappos, with a $50 rebate). This is a pretty impressive technical achievement: big for a watch, but pretty compact for a GPS, especially considering that this is a technology that only went operational 15 years ago.

OK, so once you get over the sheer technical whizzbangery, how well does it work? Pretty well, actually. It's pretty much Garmin's standard GPS system crammed into a watch form factor along with a heart rate monitor receiver, so you can do all the standard GPS stuff: time, time in motion, lap time, speed, distance, altitude, heart rate, etc. As usual, Garmin's UI is a little cryptic, but it's extremely configurable—you can set each screen up with a variety of fields and configure what's displayed in each field. It's really surprisingly nice to have distance and instantaneous speed, and it's a lot less annoying than the foot pod pedometers that non-GPS systems use.

There's also a bunch of mapping type features. You can retrace your steps (useful for running in unfamiliar environments) as well as save workouts and then race your previous performance. I haven't tried that feature but I'm looking forward to trying it, since it seems a lot easier than memorizing the time at various waypoints.

As far as comfort goes, the watch is a bit of a brick, but it's actually surprisingly comfortable once you get it on. I actually noticed the HRM chest strap more—it's about average as these things go, which is to say not great but not terrible. It's probably the first sports watch that I've ever had that I wouldn't want to wear on a daily (non-workout) basis. It's not really practical for that anyway since it runs on rechargeable batteries and only lasts about 10 hours on a charge. On the other hand, it seems fine for running and ought to easily last your entire run unless you're doing some kind of ultra.

 

November 26, 2007

California law requires vehicles to be smog tested in order to be registered in the state. The smog testing is done by independent (but licensed) operators. This creates two major principal/agent problems:
  • The tester can falsely report a passing grade in return for a bribe—or just because you're a good customer.
  • The tester can falsely return a failing grade in order to get you to spend money on "fixing" whatever they say is wrong with your car.

There are a number of countermeasures used against these problems. The first is to have "check only" stations, which will test your car but not fix it. The state requires some vehicles to be tested at check only stations, both by random assignment and by preferentially selecting vehicles they expect to be high emitters. The idea here seems to be that check only stations have no incentive to give a false failing grade because they can't profit from it. Similarly, because they don't have a relationship with you—unlike your regular mechanic—it's harder for you to bribe them since they're less likely to trust you're not entrapping them (see below).

This program seems like it's likely to be of limited effectiveness. First, only a small fraction of vehicles are assigned to check only stations and so you're only decreasing the amount of fraud by that fraction. It doesn't serve as a deterrent to fraud in itself. Second, at least for the second kind of fraud, the check only station could at least potentially get a kickback from your mechanic, though it might be hard for them to get together.

Another countermeasure is that the smog testing machines transmit their results to the state directly before (at least according to my mechanic) they're displayed to the mechanic. This is actually pretty clever, since it reduces the opportunities for a fraudulent mechanic to see the result and offer to fix the results for you. Obviously, you can still bribe the mechanic in advance, but that requires you to know in advance what's wrong. I tend to think it's less useful for the second kind of fraud, since it's surely pretty easy to gimmick the sensors to produce a positive reading. I should also mention that the connection between the machine and the state appears to be via modem, and, unless cryptography is being used, is probably pretty easy to spoof, which would let you bypass the initial negative test.

Another countermeasure against the first kind of fraud is that only test only and "gold shield" test stations can certify a vehicle once it has failed its smog check. It's obvious why this makes sense for test only stations, and I assume that gold shield stations are subject to extra scrutiny by the state.

Finally, I assume that the State periodically sends people out with pre-calibrated vehicles to see if the testing stations are producing accurate results, if they offer to falsify the results for you, etc.

By the way, my car passed, though I needed a new gas cap.

 

November 25, 2007

OK, I totally understand why there is a race to the front for state primaries. The earlier primaries have a disproportionate effect on who gets selected as the final presidential nominee. Obviously, if you want to get pandered to the way Iowa and New Hampshire do, it would be to your advantage to move the primary up.

So, what I don't get is why the Democrats are pushing back on the states (e.g., refusing to seat Florida's delegates), since it's not clear that it's in their interest to have the current early primary states be so influential. In this case, since the party's interest is to have the most credible candidate for the general election, and Florida is often a determining state, it would be to their advantage to encourage Florida to go first. In any case, it's not clear to me why they are actively discouraging it.

 

November 24, 2007

For some reason, Peter Cox's SIPtap program is getting press. First, it's immediately obvious to anyone with even minimal knowledge of networking that if you have access to the packets of a VoIP flow (or for that matter any other unencrypted network flow), you can reconstruct the data. That's why people use encryption. So, this is hardly news. That's why the IETF and others have spent a lot of time building security protocols for VoIP. Many current VoIP phones come with some encryption now and the newer stuff will be more secure and easier to deploy.

OK, so it's common knowledge. On the other hand, Cox doesn't say he discovered it, just that this is a "proof of concept". Given that it's droolingly easy to write an RTP decoder and that VoIPong and Vomit and Wireshark already existed, it's hard to see exactly what concept is being proved, other than that with enough hype you can get your name in the paper.
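For calibration, here's roughly how little code an RTP "decoder" takes. This is my own minimal sketch in Python, not SIPtap's or Wireshark's code; it parses the fixed header from RFC 3550, ignores padding and header extensions, and assumes you've already captured the UDP payloads somehow (which is the only actual prerequisite):

    import struct

    def parse_rtp(packet: bytes) -> dict:
        """Pull apart the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
        if len(packet) < 12:
            raise ValueError("too short to be RTP")
        b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
        cc = b0 & 0x0F                      # number of CSRC entries
        header_len = 12 + 4 * cc
        return {
            "version": b0 >> 6,             # should be 2
            "marker": bool(b1 & 0x80),
            "payload_type": b1 & 0x7F,      # e.g., 0 = PCMU (G.711)
            "sequence": seq,
            "timestamp": ts,
            "ssrc": ssrc,
            "payload": packet[header_len:], # the media itself
        }

Sort packets by SSRC and sequence number, hand the payloads to the right codec, and you have the call. That's the whole "concept".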

UPDATE: Fixed typos

 

November 23, 2007

For obvious reasons, law enforcement and investigative agencies aren't incredibly fond of encrypted communications. The most popular responses to this difficulty have generally been one or more of:

  • Forbid strong crypto entirely.
  • Require "key escrow" where a copy of the keying material somehow goes to the LEA.
  • Get a copy of the keying material after the fact.
  • Use keyloggers or other invasive measures.

None of these have been particularly successful: the strong crypto cat is out of the bag, users have overwhelmingly rejected key escrow, and although people do sometimes have their keys subpoenaed (the UK has a law requiring compliance), there are standard cryptographic techniques that provide "perfect forward secrecy" so that even if your keys are disclosed after the fact your communications aren't readable. The government in the US has had some success with keyloggers, spyware, etc., but they either require physical access or compromise of the system in question.
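For concreteness, here's a minimal sketch of the forward secrecy idea. This is an illustration using the modern pyca/cryptography package, not any particular protocol's handshake, and it omits the step where the ephemeral public keys get authenticated by long-term identity keys:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    # Each side generates a throwaway key pair for this session only.
    alice_eph = X25519PrivateKey.generate()
    bob_eph = X25519PrivateKey.generate()

    # They swap public keys and compute the same shared secret.
    secret = alice_eph.exchange(bob_eph.public_key())
    assert secret == bob_eph.exchange(alice_eph.public_key())

    # Derive the session key, then throw the ephemeral private keys away.
    session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                       salt=None, info=b"example session").derive(secret)
    del alice_eph, bob_eph, secret

Once the ephemeral private keys are gone, handing over your long-term keys after the fact doesn't let anyone reconstruct the session key, which is the whole point.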

The popularity of combined software/service operations like Hushmail and Skype opens up a new avenue, however. It's recently come out that Hushmail has in the past handed over keys to the government for users who used their online encryption system. This was made easier by Hushmail's "software as a service" type architecture, where they do the encryption and decryption on their site. Hushmail also provides an option where you can download a Java applet, but it should be clear that under the right legal constraints, they could theoretically put a backdoor in the applet you downloaded, too.

Similarly, the German police have recently complained that they can't monitor Skype calls. They say they're not asking for the encryption keys, but because of Skype's architecture and the fact that Skype is involved in authenticating each call, it should be clear that Skype could mount a man-in-the-middle attack on your phone call and hand over the keys. They could also just give you an "upgraded" software version with a back door.

Combined software/service systems like Skype and Hushmail are uniquely susceptible to this kind of lawful intercept attack (or for that matter to cheating by the vendor of any kind.) If you use third party software then you don't have to worry about your ISP cheating you because they can't—they don't have the keys. And while your software vendor could potentially cheat you, they don't have the kind of constant contact with you that Skype or Hushmail does, so they would generally need to put a back door in every copy of the software, which carries a much higher risk of discovery and of users switching software. Who wants to run software with a deliberate back door?

 

November 21, 2007

When we were writing the most recent version of RELOAD, I found myself doing a lot of bit diagrams, which I hate doing: it takes forever and whenever you change anything (like add one field) you have to totally redo the diagram to get the alignment right. I sometimes wonder whether protocol designers' penchant for 32-bit alignment and padding doesn't have more to do with making the diagrams easier to modify than it does with processor architecture.

Anyway, rather than do a lot of hand drawing, I decided to do a tool, which I call s2b (this, by the way, is a classic demonstration of programmer thinking, since it would have taken me about 5 hrs to do the diagrams and it took me about 12 hrs to do the tool, mostly tuning the output format). The idea here is that you specify your diagrams in a language which looks a lot like C structures (I actually ripped it off from the specification language used in SSL/TLS, which was ripped off from C and XDR (which itself was ripped off from C)). You then run your language through s2b and it emits bit diagrams. Here's an example:

    struct {
      peer_id id; 
      uint32 uptime;
      opaque certificate<65000>;
      ip_address_and_port address;
    } peer_info_data;

    public struct {
      string version;
      uint8 transport;
      peer_info_data peer_info;
    } route_log_entry;

And here's the diagram it produces:

  STRUCTURE: route_log_entry
      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |          Version Len          |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
      |                            Version                            |
      /                                                               /
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |   Transport   |                                               |
      +-+-+-+-+-+-+-+-+                                               +
      |                               Id                              |
      +                                                               +
      |                                                               |
      +                                                               +
      |                                                               |
      +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               |                     Uptime                    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               |        Certificate Len        |               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
      |                          Certificate                          |
      /                                                               /
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      /                                                               /
      |                            Address                            |
      /                                                               /
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Of course, what you really want is to simply embed the language syntax into your document and then have the diagrams automatically embedded into the document at the declaration site.1 s2b also includes a Perl script called s2x.pl that will do this for xml2rfc documents. You just embed your PDU definition in an artwork declaration, like so:

    <figure>
      <!-- begin-pdu-->
    
      <artwork><![CDATA[
      public struct {
        route_log_entry entries<65000>;
      } route_log;	
        ]]></artwork>
    </figure>

One more point deserves mentioning. Because I was just generating diagrams and not bothering to automatically generate encoders or decoders, there is only one built-in primitive type: primitive, which is just an n-bit long object. You have to define all other types. Here's the prologue I use:

    primitive uint8 8;
    primitive uint16 16;
    primitive uint24 24;
    primitive uint32 32;
    primitive int32 32;
    primitive uint64 64;
    primitive uint128 128;
    primitive char 8;
    primitive opaque 8;
    primitive blob 0;
    typedef char string<65000>;

You tell s2x.pl about this with a begin-prologue comment. You can define types of any size you want. This means that there's nothing special about 8 bits, so you can do flags or whatever.
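If you're curious what the guts of such a tool look like, here's a rough sketch of the layout step in Python. This is an illustration I wrote for this post, not the actual s2b code; it only handles fixed-width fields that happen to pack neatly into 32-bit rows (no variable-length fields, no fields spanning a row boundary), and the field names in the example call are just sample input, not the real route_log_entry layout:

    def ruler():
        """The standard two-line bit ruler plus the top border."""
        tens = " " + "".join(f"{i // 10 if i % 10 == 0 else ' '} " for i in range(32))
        ones = " " + "".join(f"{i % 10} " for i in range(32))
        return [tens.rstrip(), ones.rstrip(), "+" + "-+" * 32]

    def diagram(fields):
        """fields: list of (name, width_in_bits) packing exactly into 32-bit rows."""
        lines = ruler()
        row, border = "|", "+"
        for name, width in fields:
            row += name.center(2 * width - 1) + "|"   # each bit is 2 chars wide
            border += "-+" * width
            if len(border) > 65:
                raise ValueError(f"{name} crosses a 32-bit boundary (not handled)")
            if len(border) == 65:                     # row is full: flush it
                lines += [row, border]
                row, border = "|", "+"
        if row != "|":
            raise ValueError("fields don't fill the last 32-bit row")
        return "\n".join(lines)

    print(diagram([("Version Len", 16), ("Transport", 8), ("Flags", 8),
                   ("Uptime", 32)]))

The real work in something like s2b is everything this sketch punts on: variable-length fields, the /.../ continuation rows, and fields that straddle row boundaries, which is exactly the bookkeeping you don't want to do by hand.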

Not the greatest tool ever, but a lot more fun than drawing ASCII art.

1. You might ask why I didn't simply work directly in a language and eschew the bit diagrams altogether. This is what, for instance, SSL/TLS does. Answer: a lot of people (me not one of them) prefer bit diagrams.

 

November 19, 2007

This page, representing information conveyed in rap songs in visual form (e.g., the relationship between money and problems) has been making the rounds. I didn't see a way to submit your own entries, but here's my attempt:

 

November 18, 2007

 
As draft season is once again upon us, I am once again spending a lot of time with xml2rfc, the unofficial official draft production tool of the IETF. Now, the party line at the IETF is that we use ASCII and you can prepare documents in any tool you like, but here on Planet Earth, the combination of nroff bit rot (or at least mind rot) and increasingly stringent formatting requirements has made it a real PITA to do documents in any tool other than xml2rfc. This does not mean that xml2rfc is a joy to use.

Before I go on with my litany of complaints, I want to head off at the pass the usual response one hears at this point. Two responses, actually: (1) nobody is making you use it and (2) it's open source software, so if you don't like how it behaves, you can fix it. The first objection is literally true but as a practical matter false. First, everyone else uses it so if you want to collaborate you pretty much have to. Second, as I said earlier, the fact that everyone else uses it means that the IETF has felt free to impose increasingly stringent tests on submitted documents to the point where if you use any other document production system, each time you want to submit a new document you end up spending a lot of time figuring out how to get it through whatever submission filters have been imposed this week. Finally, and most importantly, if you submit your draft to the RFC Editor in XML (you do want your document published as an RFC, right?) they will edit it in XML and so when you want to do a bis version, you have all their copy edits incorporated. On the other hand, if you give them plaintext, then you end up either having to edit their incredibly crufty nroff source or backport all their copy edit changes into your original source format, whatever that was.

The second response, of course, is insane. I just want to write documents and shouldn't have to be an XML hacker, let alone a tcl hacker (I did mention that xml2rfc is written in tcl, right?) to get that task done. "Go fix it yourself" is a fine mantra for tools that are truly optional, but not for those which are increasingly becoming the de facto standard.

OK, back to my theme. As the name suggests, to write something in xml2rfc you start with an XML document in a particular format and then run it through xml2rfc to produce ASCII or HTML or whatever (though ASCII is the normative format). The document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSPY v5 rel. 3 U (http://www.xmlspy.com)
     by Daniel M Kohn (private) -->

<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
    <!ENTITY rfc2119 PUBLIC '' 
      'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml'>
]>

<rfc category="std" ipr="full3978" docName="sample.txt">

<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes" ?>

    <front>
        <title>An Example</title>
        <author initials='A.Y' surname="Mous" fullname='Anon Y. Mous'>
            <organization/>
        </author>
        <date/>
        <abstract><t>An example.</t></abstract>
    </front>

    <middle>
        <section title="Requirements notation">
            <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
            "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
            and "OPTIONAL" in this document are to be interpreted as
            described in <xref target="RFC2119"/>.</t>
        </section>

        <section title="Security Considerations">
        <t>None.</t>
        </section>
    </middle>

    <back>
        <references title='Normative References'>&rfc2119;</references>
    </back>

</rfc>

Now, there's plenty of stuff to object to here, starting with the (false) notion that I want to be writing my document in XML in the first place. But what I want to talk about right now is how references/bibliographies are done.

Bibliography Locations
xml2rfc has three major reference handling modes:

  • Directly inserting the bibliographic information into the file.
  • Reading the bibliographic information off files on the disk.
  • Reading the bibliographic information off a site on the Internet (the example above).

You can mix and match these with some of the references being in each location.

Now, with RFCs and Internet-Drafts, as opposed to, say, scientific papers, Internet-based references are unusually attractive.

  • There's an extremely small set of about 10,000 documents that most of your citations come from.
  • Those documents have an unambiguous naming scheme that everyone agrees on (RFC-XXXX, draft-yyy). This sounds trivial, but it's actually a significant obstacle to reference sharing between collaborators in formats like LaTeX where you need to unambiguously specify the reference key—even in the face of tools like RefTeX to let you search.
  • The common documents have a lot of reference volatility: drafts get updated regularly, and you can feed xml2rfc the draft name without the version number and it will automatically pick up the latest version. This prevents bit-rot.

For all these reasons, you'd think any sane person would use Internet-based references all the time and just use file-based and/or included references (which, btw, are hideous) when they had to reference something that wasn't online. Unfortunately, if you are that sane person, you're about to get screwed: as soon as you go offline (like you want to work on your document on a plane) things go pear-shaped in a really serious kind of way and you get an error that looks like this:

xml2rfc: error: http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml: http package failed

Now, problem one (and a theme we'll come back to in a minute) is that you pretty much have to be a computer scientist to figure out what this means. HTTP package failed? Maybe I need a new HTTP package? No, you're not on the Internet. But that's sort of forgivable, because only a computer scientist would be able to tolerate writing a document of any length in XML in the first place. And if you think about it for a minute, you can probably figure out what this means—though it's worth noting that the web page where I got this example from is none too clear on the fact that you're actually getting this reference from the Internet, and though you'd think the http:// would be a bit of a giveaway, it turns out that XML people routinely use un-dereferenceable URLs to identify resources, so there's no guarantee that just because something starts with http://, you actually can retrieve it.

Problem two is that in most cases you've built this document before and have just made some trivial change and want to rebuild. Most of the references were present when you rebuilt the document two hours ago before you got on the plane. xml2rfc could have simply cached them at that time and used the local cached copy when disconnected, until it has a chance to check cache validity. Unfortunately, it doesn't, so all your references break as soon as you go offline.
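For reference, the behavior I'm asking for is maybe fifteen lines of code. Here's a sketch (mine, not anything xml2rfc actually does; the cache directory name is made up) of fetch-with-offline-fallback:

    import hashlib, os, urllib.request

    CACHE_DIR = os.path.expanduser("~/.bibxml-cache")   # hypothetical location

    def fetch_reference(url: str) -> bytes:
        os.makedirs(CACHE_DIR, exist_ok=True)
        path = os.path.join(CACHE_DIR, hashlib.sha1(url.encode()).hexdigest())
        try:
            data = urllib.request.urlopen(url, timeout=10).read()
            with open(path, "wb") as f:      # refresh the cache while online
                f.write(data)
            return data
        except OSError:                      # offline, DNS failure, timeout...
            if os.path.exists(path):
                return open(path, "rb").read()
            raise                            # never fetched before: genuinely stuck

    ref = fetch_reference(
        "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml")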

Now, this would all be just annoying except for the fact that that error I showed you above is all xml2rfc gives you when you try to build a document with unresolvable references. Even one unresolvable reference means that it won't process your document at all, so if you change one paragraph, leave the references alone, and want to see what it looks like, too bad! You're SOL! At this point your only choice is to go through and stub out all the unresolvable references so that xml2rfc doesn't freak out, and since they appear all over the document this is a lot of work, and even more work when you have to unstub them when you actually want to build the document. By contrast, in a system like LaTeX/bibtex, you just end up with

[?]
at the reference site in the text and empty biblio entries at the end.

The consequence of all this stuff is that people who want to work offline end up using one of the other two reference styles, where there's a local copy. And if you want to collaborate with anyone else, you all either have to have a copy of the entire bibliography directory, which gets pretty tedious (did I mention it's scattered across one file for each reference, though there may be some poorly documented or undocumented way to fix that), or you end up just cutting and pasting the bibliography information into the main working file, which, did I mention, is hideous? In the document I'm working on now, over 20% of the lines in the file are devoted to bibliography. But at least it's self-contained.

I can't help myself: here's a typical bibliography entry, cut right out of my document:

      <reference anchor="I-D.garcia-p2psip-dns-sd-bootstrapping">
        <front>
          <title>P2PSIP bootstrapping using DNS-SD</title>

          <author fullname="Gustavo Garcia" initials="G" surname="Garcia">
            <organization></organization>
          </author>

          <date day="25" month="October" year="2007" />

          <abstract>
            <t>This document describes a DNS-based bootstrap mechanism to
            discover the initial peer or peers needed to join a P2PSIP
            Overlay. The document specifies the use of DNS Service Discovery
            (DNS-SD) and the format of the required resource records to
            support the discovery of P2PSIP peers. This mechanism can be
            applied in scenarios with DNS servers or combined with multicast
            DNS to fulfill different proposed P2PSIP use cases.</t>
          </abstract>
        </front>

        <seriesInfo name="Internet-Draft"
                    value="draft-garcia-p2psip-dns-sd-bootstrapping-00" />

        <format target="http://www.ietf.org/internet-drafts/draft-garcia-p2psip-dns-sd-bootstrapping-00.txt"
                type="TXT" />
      </reference>

And here's the reference entry it actually produces:

   [I-D.garcia-p2psip-dns-sd-bootstrapping]
              Garcia, G., "P2PSIP bootstrapping using DNS-SD",
              draft-garcia-p2psip-dns-sd-bootstrapping-00 (work in
              progress), October 2007.

Now, ask yourself the following question: why, exactly, does this biblio entry need to contain the abstract?!?! The URL is also included, though not used here, but that's so xml2rfc can make clickable links in an HTML version. I guess putting the abstract in the reference would let some future JavaScript weenie pop up the abstract if you hover over the reference. That would sure be useful! The real answer, of course, is that that was what was in the file we sucked down from the Internet and we're sure as heck not going to edit it, lest we break the XML.

Bibliography Errors
So, what happens if you screw up stuffing in some reference? Since there are three places to do this, that happens depressingly often. Let's see what happens if we screw up one of these.

First, let's delete the reference from the body of the document. This produces the following result:

xml2rfc: warning: no <xref> in <rfc> targets
<reference anchor='RFC2119'> around input line 10

Now, this isn't so bad. Once you translate the xmlese, it says that there's a reference anchor (i.e., something you can reference) for RFC 2119 that isn't targeted by an xref (i.e., a reference in the text.) So, this is a superfluous bibliography entry. Also, the good news is that in this case it will still make the document.

Now, let's put that back and try removing the

&rfc2119;
marker at the end. That produces this error:
xml2rfc: error: can't read "xref(RFC2119)": no such element in array around input line 35

Uh... yeah.

So, what this means, literally, is that there's some array (xref?) that doesn't contain the element "RFC2119". If you think like a computer programmer as opposed to someone who just wants to produce documents, you might guess that your reference to RFC2119 doesn't point anywhere. But how do you populate that array? Well, if you go back to the example, you can probably figure out how to fix this, which is good, because you have to fix it if you want the document to build past the point of the first undefined reference!

Finally we come to the piece de resistance: what happens if you don't put in the entity declaration at the top? You get this:

xml2rfc: error: not expecting pcdata in <references> element around input line 41 in "internally-preprocessed XML"

Syntax:
    41:<references title="Normative References">
    40:<back>
    8:<rfc category="std" ipr="full3978" docName="sample.txt">

"Not expecting pcdata"? What the fuck does that mean?

Luckily, you have me to translate for you. What this means is that the string &rfc2119; in the references section is an entity reference, but because you haven't defined the entity, the parser treats it as character data (PCDATA), which isn't permitted at this location in the XML document by the DTD. Hence, "not expecting pcdata". Useful, right?

As if that weren't bad enough, even once you've decoded this error message it doesn't tell you which entity you've forgotten to define. Sure, there's a line number, 41, and here's line 41:

<references title='Normative References'>&rfc2119;</references>
So far so good, but unfortunately the line number here is that of the <references> element, not of the offending missing entity. Put as many valid references in there as you want and you still get the same line number. In order to figure out the offending entity, you either need to match up the front and the back of the document or progressively cut references out of the back till the error goes away.1

The basic reason you're getting this error instead of something useful like "Go include a <!ENTITY rfc2119 ... production at the top of the file, you dummy" is that this part of the references system is done purely using XML mechanisms, so you get an XML failure before some better error handling mechanism comes into play. This isn't the only time xml2rfc does this to you either; it's just the most offensive.

And that, children, is how the Internet standards sausage gets made. Outstanding!

1. Apparently you can use other tools to diagnose this too, but xml2rfc won't help you out.

 

November 12, 2007

Steve Burnett is giving an intro to crypto talk in which he explains that "cryptography is about turning sensitive data into gibberish in such a way that you can get the sensitive data back from the gibberish".

My observation: "This differs from standardization, where you can't get the sensitive data back from the gibberish."

 

November 5, 2007

I'll be speaking tomorrow at the Stanford Security Seminar:
Some Results From the California Top To Bottom Review

Eric Rescorla

In Spring of 2007, the California Secretary of State convened a team of security researchers to review the electronic voting systems certified for use in California. We were provided with the source code for the systems as well as with access to the hardware. Serious and exploitable vulnerabilities were found in all the systems analyzed: Diebold, Hart, and Sequoia. We'll be discussing the effort as a whole, providing an overview of the issues that all the teams found, and then discussing in detail the system we studied, Hart InterCivic.

Joint work with Srinivas Inguva, Hovav Shacham, and Dan Wallach

If you want to listen, heckle, whatever, it's at 4:30.

 

November 3, 2007

Ezra Klein points out that while the US death rate from prostate cancer is more or less the same as in other developed nations, the survival rate is a lot higher because the US screening program is so effective.

Figure from: Cancer Research UK

A natural question to ask at this point is: what's the point of a massive screening program if it doesn't improve the death rate? There's more to the issue here than the cost of the screening, since you need to follow up with other tests, eventually culminating in a biopsy, and then treatment isn't fun. And of course it's probably kind of stressful to find out you have prostate cancer, even if it's not eventually going to kill you.

 

November 2, 2007

Truepath objects to complaints about the way that women's bodies are portrayed in media:
So I've long been disgusted by the social approval of complaints about models being too skinny and demands that 'real' women, i.e., less skinny women, be depicted in the media. I've already skewered most of the arguments elsewhere but the long and short of it is that the people who complain about skinny models aren't demanding we show more ugly people on TV. Rather they are just complaining about which features are considered beautiful.

Sure, often these views are voiced as mostly meaningless gripes the same way men might gripe that it should be illegal for women to prefer the guy with the fancier car, full head of hair etc. So long as these complaints are taken no more seriously than this they are a harmless way to express frustration and worry about one's sexual desirability. However, some speakers take these complaints quite seriously and that amounts to an (unconsciously) selfish ploy to get ahead by denigrating the competition. After all some people will always be more beautiful than others so at best they are demanding we change the standards to put themselves closer to the top. In men we recognize the analogous unpleasant behavior (dismissing every guy who is popular with the ladies as an asshole or sissy) isn't praiseworthy and we should do the same in women.

This is simple human psychology. We all (men and women) resent those we fear are more attractive/more successful than us and we look for ways to bring them down so we don't feel so bad about ourselves. It would thus be unfair to assign more than a little blame to the men and women who look for excuses to dismiss their potential competitors. They are just groping for ways to feel better about themselves. The true culprit here is society which doesn't call out this behavior for what it is.

I agree with some of the later argument about anorexia versus obesity, but I don't find this argument very convincing. It's certainly likely true that many who complain about extremely thin body images would like society to accept a body image that makes them feel better about themselves, but that doesn't mean that that wouldn't be a net win as well, even if we're just looking at self-image.

Let's start with an unreasonable model and assume that we can characterize body size by a single metric M corresponding to body mass index. To simplify things, we'll say that the smallest possible M value is 0 and the largest is 1. Further, we'll assume there's some ideal M value M_I. If we assume that individual A has a size M_A, then we can write their happiness as F(M_A, M_I). We could posit a large number of different forms for F, but in the spirit of oversimplification, let's say that unhappiness is always positive and is linear in the distance between your size and the ideal size:
Unhappiness = |M_A - M_I|
So, is there an optimal value of M_I, i.e., one that maximizes average happiness? It turns out that the answer is probably yes. Let's assume (again unrealistically) that people's sizes are uniformly distributed between 0 and 1. As an example, let's assume that M_I is 0. Remember that the average size is .5, so the average unhappiness is also .5. Because the distribution is uniform, we can extend this to any value of M_I. We simply partition the space at M_I and note that the average distance from M_I (and hence the average unhappiness) on either side of the partition is half the width of that side. And since the fraction of people on either side is also proportional to the width of that side...
Avg-Unhappiness = 1/2 (M_I^2 + (1 - M_I)^2)
or, equivalently, Avg-Unhappiness = 1/2 (1 - 2 M_I + 2 M_I^2)

It's not hard to convince yourself that the minimum of this function is at M_I = .5, where the average unhappiness is .25. Moreover, unhappiness gets worse the further away M_I gets from 0.5, reaching a maximum of .5 at 0 and 1.
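If you don't feel like doing the calculus, a quick simulation (my sketch, not part of the original argument) makes the same point:

    import random

    random.seed(0)
    sizes = [random.random() for _ in range(200_000)]   # uniform on [0, 1]

    for m_ideal in (0.0, 0.25, 0.5, 0.75, 1.0):
        simulated = sum(abs(s - m_ideal) for s in sizes) / len(sizes)
        formula = 0.5 * (m_ideal ** 2 + (1 - m_ideal) ** 2)
        print(f"M_I = {m_ideal:.2f}  simulated = {simulated:.3f}  formula = {formula:.3f}")

Average unhappiness bottoms out at .25 when M_I = .5 and climbs to .5 at either extreme, which is the pull toward the center the argument relies on.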

Now, obviously, this model is unrealistic in a number of ways. For instance:

  • There's no single metric of body size that's useful.
  • Body sizes aren't anything like uniformly distributed (it's more like a bell curve, which actually would make the pull towards the center more powerful).
  • Body image happiness isn't a linear function of distance from some ideal body image.

The last objection is probably the most serious. In fact, it's not clear it's any kind of distance function. You could imagine instead that it's a function of how many people are between you and the top. However, I doubt that's completely true. For most practical purposes, being merely more attractive than almost everyone you meet is almost as good as being the most attractive person in the world. Most women are never going to meet Brad Pitt, so if I just look more attractive than nearly every man they ever meet (which is regrettably not true), that's nearly as good as being as attractive as Brad as far as getting dates is concerned. And yet, very attractive people—even those more attractive than almost everyone they know—seem to be reasonably likely to be unhappy with their bodies as well, at least in part because they're judging themselves against people with whom they're not really competing directly.

This brings me to what I think is the more important point: it's not clear that the current media-portrayed ideal body image is within the possible range at all. I've never seen any supermodel in person, so I'm operating pretty much on the basis of photography and video, which (1) are very heavily made up, (2) are posed and cherrypicked to be those people at their most attractive, and (3) at least in the case of still photography, are extremely heavily photoshopped. In other words, it's quite possible that they appear to be at (say) -.2, which isn't actually achievable. And since the further away from the center of gravity the ideal image gets the more unhappy people get, having the ideal image outside the possible range creates unnecessary unhappiness. Even if we moved it to just the outer limit of the possible range, i.e., M_I = 0, that wouldn't upset the orderings but would make people happier because they would feel closer to the perceived ideal (this is actually Pareto dominant).

As I said at the beginning, this is a ridiculously oversimplified model. I don't know if any of these properties would hold up in a more realistic model, but it certainly seems possible they would, and if so, then it's possible we would in fact benefit—at least in terms of happiness with one's appearance—from an ideal body image closer to the population norm, in which case wanting to change the ideal body image is potentially more than just a matter of rearranging the pecking order.