EKR: February 2009 Archives


February 28, 2009

This East Bay Express article claims that Yelp manipulates rankings in order to extract advertising revenue. For instance:
A San Francisco wedding photographer relayed a similar story. About two years ago, a Yelp sales rep contacted her to advertise. The photographer -- we'll call her "Mary" -- declined the offer. But the sales rep was pushy; Mary said she received about three phone calls and as many as ten e-mails per week asking her to advertise. Still, she declined. "All of a sudden my reviews started disappearing," she said. "I called them up and said, 'I'm a little curious why my reviews are disappearing.' They said, 'Well, people stop reviewing, we take them down.' ... I talked to the clients -- they're still actively reviewing."

"Ellen," who only agreed to be interviewed if not identified by name, owns an Oakland business with more than twenty Yelp reviews, and averages a 4.5-star rating. She says she began to receive solicitations to advertise soon after her business began receiving positive customer reviews. But she declined. "The prices were cost-prohibitive," she recalled telling the sales rep. "I can't pay $300 a month when I pay $90 for Google AdWords. After that, reviews started to disappear."

When Ellen questioned her sales rep as to why some reviews had disappeared, the rep told her reviews can be taken down based on the company's algorithm. Reviewers must follow certain guidelines to post a legitimate review, the rep replied. "They had to have pictures, friends, be part of the community," Ellen recalled the rep telling her. But Ellen says the reviews that were removed fit the profile of acceptable reviews. Ellen turned down the offer again, and more reviews disappeared. She says she's now down to 50 percent of her original reviews. "Just today I got three more e-mails from Yelp. They're aggressive. ... But it's blackmail."

Yelp denies this:

"There is irrefutable evidence that we do not do that," Stoppelman told CNET News on Thursday when asked whether the placement of some reviews is determined by advertising deals. "It's absolutely ridiculous that somebody would say we are going to write a review and call a business (to sell advertising). That's not how you build a sustainable business...Trust and integrity are key to staying in business."

The problem, according to Stoppelman, lies in the company's secret sauce for filtering out reviews.

Basically, merchants are at the mercy of a computer algorithm just like Web sites are at the mercy of what is known as the "Google Dance"--the monthly update of the Google search engine's index. One tweak of the Google index can potentially make or break a business.


Asked to explain why Estelle's negative reviews of the moving company were repeatedly removed, Ichinose said she could not go into specifics or risk revealing information that people could use to game the system.

Really, there are two issues here that you need to think of separately:

  • Creation and removal of reviews.
  • Ordering of reviews on the site.

Creation and removal of reviews are a real problem for any reputation system like Yelp. The basic problem is that it's trivial for people to generate fake reviews to benefit or damage a given merchant. The problem then becomes how to exclude (or at least downrate) such fake reviews. There are a bunch of possible techniques here (for instance: weighting by how many reviews the reviewer has done or how long they've been on the system; meta-reviews where you ask people to rate the usefulness of other reviews; forensics to try to detect reviews which look like they've all been generated by the same party; requiring real user authentication, etc.) but ultimately, if you're going to allow quasi-anonymous reviews, as Yelp does, then there's only a limited amount you can do, and it's likely to involve some heuristics and human judgement. In that case, it's not completely nuts for Yelp to want to keep their procedures secret to make gaming the system more difficult.

Of course, the flip side of such mechanisms is that they leave the system operator very open to charges of gaming the system by removing positive reviews as a form of extortion; reviews just disappear and you end up saying "we think this is fake", but you generally can't prove it. Of course, you can mount the opposite type of extortion ("pay up or I'll publish a bad review") without collusion from the site operator, but it helps if the site operator colludes since that makes it harder for the victim to get your negative reviews removed.

This brings us to the topic of reordering. One way to deal with concerns about unfair removal is never to remove posts but simply to attempt to prioritize them by whatever estimates of veracity you're using. This avoids making sharp distinctions between real and (suspected) fake posts: you just push the (suspected) fake posts down towards the end of the reviews for a given merchant. But now justifying the exact order gets even harder. One natural choice is to use deterministic orderings (most recent, highest first, etc.). That's pretty clearly undesirable for a system like Google where you have to sort an enormous number of candidate choices, but actually it seems pretty suitable for a review site like Yelp, where I doubt their ordering algorithms actually add that much value over simple orderings like these. Obviously, this isn't a perfect answer, since it doesn't really let you discriminate against fake reviews (though you can make posting fake reviews harder by, for instance, prioritizing frequent reviewers), but on the other hand if people don't trust your site's objectivity that diminishes the value of the site as well.
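To make "deterministic ordering" concrete, here is a minimal C sketch of a most-recent-first sort. The struct review layout and the function names are hypothetical, chosen only for illustration; nothing here reflects how Yelp actually stores or orders reviews.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical review record; the field names are illustrative only. */
struct review {
    long id;        /* unique identifier */
    long posted_at; /* posting time, seconds since the epoch */
    int stars;      /* rating, 1..5 */
};

/* Most recent first; ties are broken by ascending id, so the resulting
   order depends only on the data and never on an opaque score. */
static int
cmp_most_recent(const void *a, const void *b)
{
    const struct review *x = a;
    const struct review *y = b;

    if (x->posted_at != y->posted_at)
        return (x->posted_at < y->posted_at) ? 1 : -1;
    return (x->id > y->id) - (x->id < y->id);
}

/* Sort an array of n reviews in place, newest first. */
void
sort_most_recent(struct review *reviews, size_t n)
{
    assert(reviews != NULL || n == 0);
    if (n > 1)
        qsort(reviews, n, sizeof(reviews[0]), cmp_most_recent);
}
```

The point of the tie-break is that anyone who can see the reviews can reproduce the order, which is exactly the property that a secret ranking algorithm lacks.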


February 27, 2009

Read this article:
Eight different blood markers, including hemoglobin, are examined, said Robin Parisotto, a researcher from Australia. He is one of the nine scientists on an independent panel that reviews the abnormal blood profiles for the International Cycling Union, which is known as the U.C.I.

The markers are put into formulas and models that determine the statistical probabilities that an athlete is doping. Mr. Parisotto said the goal was to reach a 99.9 percent probability.

"The beauty with the blood passport is that you don't need to know each and every drug that is out there because you see the indication that something is being used," said Mr. Parisotto, who was the principal researcher in the creation of the first test for EPO used at the Olympic Games.

Now read this and ask yourself how well these tests were validated. The WADA writeup isn't very informative. Here's an overview of the research and here's a poster.

In this work, we estimated and integrated into a Bayesian network different components of variance of blood doping markers (hemoglobin, OFF-score, ABPS, tHb-mass) and steroid doping makers (T/E). The created network also included models of heterogeneous factors such as the influence of altitude on blood markers on the basis of a model proposed by the WHO. The Bayesian network has been validated and applied to more than 20,000 blood or steroid profiles. A software application, available upon demand, is capable of storing and interpreting an Athlete's Biological Passport.

These documents are pretty incomplete and it's a bit hard to figure out exactly how thorough the testing is. Intuitively, it seems like you'd need a pretty large baseline of samples to get a sufficiently high level of confidence. I wonder if 20,000 is the number of samples, athletes, or what? If it's 50 samples from 400 athletes, that's pretty different from 50 samples from 20,000 athletes.


February 26, 2009

OK, so everyone knows about Lolcats, but have you seen Rolcats:

The caption on this is "Cease your protests, the deal is done! You are to make a fine wife for uncouth American businessman!"

I don't speak Russian so I don't know for sure what the story is. My sources tell me that these are genuine Russian lolcats, but that the translations are totally bogus and that the sense of humor is generally different.


February 25, 2009

I've got some code that needs to convert an IP address into a string. This is one of those cases where there's a twisty maze of APIs, all slightly different. The traditional API here is:

    char *
    inet_ntoa(struct in_addr in);

inet_ntoa() has two deficiencies, one important and one trivial: it doesn't support IPv6 and it returns a pointer to a statically allocated buffer, so it's not thread safe (I'll let you figure out which is which). Luckily, there's another API: addr2ascii():

    char *
    addr2ascii(int af, const void *addrp, int len, char *buf);

If you pass buf=0, addr2ascii() will return a pointer to a static buffer like inet_ntoa(). However, if you pass it an allocated buffer it will return the result in buf. Unfortunately, if you actually try to use addr2ascii() in threaded code you will quickly discover something unpleasant, at least on FreeBSD: you occasionally get the result "[inet_ntoa error]" or some fraction thereof. The answer is hidden in the EXAMPLES section of the man page:

In actuality, this cannot be done because addr2ascii() and ascii2addr() are implemented in terms of the inet(3) functions, rather than the other way around.

More specifically, on FreeBSD, it looks like this:

    case AF_INET:
        if (len != sizeof(struct in_addr)) {
            errno = ENAMETOOLONG;
            return 0;
        }
        strcpy(buf, inet_ntoa(*(const struct in_addr *)addrp));

In other words, even though addr2ascii() doesn't explicitly use a static buffer, since it depends on inet_ntoa() it's still not thread safe. In order to get thread safety, you need to use yet another API:

    const char *
    inet_ntop(int af, const void *restrict src, char *restrict dst,
        socklen_t size);
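For what it's worth, here is a minimal sketch of what the thread-safe version looks like. The wrapper name addr_to_string() is hypothetical and exists only for illustration; the only system API involved is inet_ntop() itself, which writes into a buffer the caller owns.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Format an IPv4 (AF_INET) or IPv6 (AF_INET6) address into a buffer
 * supplied by the caller. addr_to_string() is a hypothetical helper name.
 * Returns buf on success, or NULL on failure with errno set by inet_ntop()
 * (for instance when the buffer is too small).
 */
const char *
addr_to_string(int af, const void *addr, char *buf, size_t buflen)
{
    assert(addr != NULL && buf != NULL);

    /* inet_ntop() writes only into buf, so no static storage is shared
       between threads. */
    return inet_ntop(af, addr, buf, (socklen_t)buflen);
}
```

Each thread passes its own buffer; INET6_ADDRSTRLEN bytes is enough for either address family.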


UPDATE: Clarified that this is a problem on FreeBSD. I don't know if it's an issue on all other platforms. Linux, for instance, doesn't have addr2ascii().
UPDATE2: Trivial vs. important.


February 24, 2009

I've been thinking lately about the security impact of using AMRD-style tables for audit unit selection. The concern here is that an attacker might be able to pre-analyze the table and gain an advantage. After some thought and burning a bunch of computrons, I think the answer is provisionally "yes":

On the Security of Election Audits with Low Entropy Randomness
Eric Rescorla

Secure election audits require some method of randomly selecting the units to be audited. Because physical methods such as dice rolling or lottery-style ping pong ball selection are inefficient when a large number of audit units must be selected, some authors have proposed the use of randomness tables or random number generators which can be seeded by a small amount of randomness generated by physical methods. We analyze the security of these methods when the amount of input entropy is low and the attacker can choose the audit units to attack and find that audits do not necessarily provide the level of confidence implied by the standard statistics. This effect is most pronounced for randomness tables, where significantly more units must be audited in order to achieve the same level of security that would be expected if the audit units were selected by a truly random process.

PDF version here.

Of course, it's still possible I'm totally wrong about all this. Alternately, it's possible everyone but me already knows. Hopefully not, though.


February 23, 2009

Lately I've had several people contact me to complain about bogus certificates with their email servers. Why are they contacting me? Well, the certificates are labelled RTFM, Inc.:
Version: 3 (0x2) 
Serial Number: 0 (0x0) 
Signature Algorithm: md5WithRSAEncryption 
Issuer: C=US, O=RTFM, Inc., OU=Widgets Division, CN=Test CA20010517 
    Not Before: May 17 16:01:14 2001 GMT 
    Not After : Dec 25 16:01:14 2006 GMT 
Subject: C=US, O=RTFM, Inc., OU=Widgets Division, CN=Test

This is happening for customers of Comcast, Charter, and Cox (at least).

So, these actually are my certificates, distributed with an article I wrote a few years back about how to program to OpenSSL, but I'm certainly not intercepting people's email. Obviously, this could be an attack, but you'd think an attacker competent enough to capture connections to an ISP's mail servers would manage to get a certificate that (1) isn't expired and (2) doesn't have localhost in the name.

That said, it's hard to see how this could be a simple misconfiguration problem. My guess is that some server is shipping with these certificates as a default and the ISPs are neglecting to change them after they install the software. Pretty amazing it's this widespread, though.

Acknowledgement: Thanks to Danny McPherson for helping me make contact with the ISPs. Anyone with more information please contact me at ekr@rtfm.com.


February 22, 2009

I generally hate On the Media, and today's piece on the DTV Transition didn't disappoint:
Brooke Gladstone: Could you tell me again why we're doing this?
Kim Hart [WaPo]: Sure. Well, this was all part of a plan that was designed to reclaim the spectrum that the over the air broadcasters and network broadcasters had been using for over half a century. The reason they wanted to do that is because they wanted to give them back to public safety organizations so they could use them for their first responders communication networks as well as make some money for the government and sell them at an auction to wireless companies like Verizon and AT&T.
BG: Now what we're talking about, obviously, is switching from over the air broadcast, once accessible to anyone who has a bent coathanger and a TV to digital reception, which most people get through cable or satellite receivers. How many people still rely on the rabbit ears?

KH: Well anywhere between 10 million and 20 million households, especially in rural areas. Internet doesn't reach all the way out to where they live and in some markets this is their main gateway to information, local news, and emergency.

To listen to Gladstone, you'd think that DTV was going to mean that you couldn't get TV over the air, but of course this is completely untrue. To recap, in order to transmit television signals over the air you need to encode them somehow. Digital and analog are two different methods of encoding the information (and of course there's more than one way to do each of them, with the US and Europe using different standards). Digital transmission has some advantages, including more efficient use of the channel bandwidth and more flexibility, which is why we're cutting over to it. Unsurprisingly, if you have an old analog set, you won't be able to receive digital signals. This would have been just as true if, for instance, the US had decided to switch from NTSC (the American standard) to PAL (the European one). Actually, it's sort of a minor miracle that the black and white to color transition was performed in a backward compatible way, due to some very clever engineering.

So, while it's true that people generally get digital signals through cable or satellite, that's because the cable and satellite providers have switched over already (and why? because it's better). There's absolutely no technical reason why digital can't be used for over the air transmission. It's just a matter of having a compatible receiver. In fact, if you'd just stop being cheap and fork over for a new TV with such a receiver, it would work for you already. The whole point of this elaborate converter distribution program is to let you receive digital signals without forking over for a new TV. So, it's hardly like all those poor people in rural areas are suddenly going to be cut off from all communications in some post-video-apocalyptic nightmare.

While I'm on the topic, I don't really think it's accurate to imply that the point of the switchover was to reclaim the spectrum for public safety applications. Obviously, that's a bonus, but it wasn't my impression that that was the primary driver.


February 21, 2009

A couple weeks ago I was on my way home and then this came on the radio and I about drove my car off the road. Here's BBC's summary:

In two editions of Heart and Soul, the BBC World Service explores the controversy in the United States between creation and evolution and investigates a spectrum of beliefs.

To gain insights into the minds of the personalities involved, the BBC gave microphones to two of the key players from very different viewpoints and asked them for their reactions through a series of encounters and interviews.

In this second programme we hear from Dr Henry Morris III. He is Executive vice President of the Institute for Creation Research, founded by his father. He believes a literal interpretation of the biblical book of Genesis, suggesting that the Earth, life and humans were created over six days less than 10,000 years ago.

Not to go all PZ Myers on you here, but this is nuts. As far as I can tell, Morris indeed believes that the earth is less than 10,000 years old, but, put simply, he's wrong. Yes, it's true that a bunch of other people agree with him, but they're wrong too. Yes, yes, it's of course possible that the entire universe was created with fake evidence of age, but there's no evidence for this whatsoever absent Morris's preexisting religious commitments. We might as well consider the possibility that the world sits on the back of an invisible turtle. So, while I don't dispute Morris's right to believe what he believes, it would be great if the media would stop acting like it's in any sense epistemically valid.


February 20, 2009

The Ninth Circuit has enjoined California's law banning the sale of video games to minors [ruling]. Maybe I'm just cynical, but arguably this is a good outcome for the authors of the law. After all, if it's upheld, it becomes a political non-issue, since there's not a lot of constituency for allowing children to buy violent video games. On the other hand, if it's struck down they get to run against the activist courts, pass revised versions of the law which will get struck down, etc. (Cf. the Communications Decency Act, flag burning, etc.)

February 19, 2009

In Slate this week we're treated to William Saletan's usual handwringing about how biotech, though of course inevitable, inevitably cheapens people's attitudes towards reproduction. This time the issue is preimplantation genetic diagnosis (PGD), where you do in vitro fertilization and biopsy the embryos to decide which, if any, to implant. Obviously, this is something you might want to do if you know you're a carrier for something nasty (e.g., Tay-Sachs) and want to avoid passing it on. Saletan's objection, predictably, is that people will use it for purposes he doesn't approve of (sex selection, height selection, etc.):
Two months ago, the Fertility Institutes, an assisted reproduction company headquartered in Los Angeles, began advertising the "pending availability" of genetic tests that would offer "a preselected choice of gender, eye color, hair color and complexion" in artificially conceived children. On Thursday, Gautam Naik of the Wall Street Journal reported that "half a dozen" potential clients had contacted the company to request such tests. As of today, the tests still aren't for sale. But several trends are converging to make aesthetic trait selection an impending business.


See how smooth the transition can be? You're already screening for diseases. Why not add one more factor while you're at it? So now you'll know which embryos are male and which are female, just in case two of them turn out to be healthy and you're lucky enough to be able to choose which one to put in the womb. And if you're checking sex, why not throw in eye color and complexion? You don't have to do anything with the information yet. Just run the test and find out what your options are.


This is how revolutions happen: Technology matures, trends converge, and cultural changes pave the way. By the time Steinberg opens his trait-selection business and does for that practice what he's already doing for sex selection, it'll be too late to stop him. In fact, before you know it, we'll be used to it.

Unfortunately (or fortunately, depending on your perspective), the statistics pretty severely limit the use of PGD for trait selection. The problem is that this is a screening procedure: you don't get to make the embryos you want, you just make a bunch of random embryos and then screen out the ones you don't like. And while any given trait may be reasonably probable, they're not that jointly probable.

In case you've forgotten your high school genetics, there are a bunch of common cases with different probabilities of the embryo expressing the trait:

  • Autosomal recessive traits (ones where you need to have two copies of the allele to show the trait) [blue eyes, Tay-Sachs] where both parents are heterozygous (each has one copy). In this case, 1/4 of the candidate embryos will express the trait.
  • Autosomal recessive traits where one parent is homozygous and one is heterozygous. In this case, 1/2 of the candidate embryos will express the trait.
  • Autosomal dominant traits [brown eyes, Huntington's] where both parents are heterozygous, in which case 3/4 of the embryos will express the trait.
  • Autosomal dominant traits where one parent is heterozygous, in which case 1/2 of the embryos will express the trait.
  • Sex-linked traits (e.g., hemophilia), where you express the trait if you have two X chromosomes with the allele (i.e., a homozygous female) or are XY with the X chromosome having the allele. The statistics vary here based on the father's status and whether the mother is homozygous.

Anyway, to a first order you can think of the chances of getting any particular set of alleles as statistically independent. So, if for instance the father is heterozygous for blue eyes and a Tay-Sachs carrier, then there's a 1/4 chance each of having: blue/Tay-Sachs, brown/Tay-Sachs, blue/normal, and brown/normal. This creates a real problem for a screening procedure: if you want to select for five traits, each of which has a 1/2 chance of being expressed in the embryo, then any individual embryo has only a 2^-5 (1/32) chance of exhibiting all five. A single egg harvesting yields somewhere in the order of 10-30 eggs, so even if all of these turn into good quality embryos, you have somewhere between a 30% and 60% chance of getting even one embryo with the mix of traits you want. Even if you're willing to go with only three traits, your chances of getting one or more matching embryos are only about 70%. And of course if you're doing IVF and PGD (did I mention it's horrifically expensive?) so you can screen out some trait, you're already down to somewhere between 1/4 and 3/4 of embryos before you start screening for height, eye color, etc.
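The arithmetic here is easy to check. Below is a small C sketch under the same simplifying assumption (every trait shows up independently, with the same probability, in every embryo). The function name is hypothetical.

```c
#include <assert.h>

/*
 * Probability that at least one of n_embryos shows all n_traits, assuming
 * each trait appears independently with probability p_trait in each embryo.
 * Computes 1 - (1 - p_trait^n_traits)^n_embryos.
 */
double
p_at_least_one_match(double p_trait, int n_traits, int n_embryos)
{
    assert(p_trait >= 0.0 && p_trait <= 1.0);
    assert(n_traits >= 0 && n_embryos >= 0);

    double p_match = 1.0; /* chance that a single embryo has every trait */
    for (int i = 0; i < n_traits; i++)
        p_match *= p_trait;

    double p_none = 1.0;  /* chance that no embryo at all matches */
    for (int i = 0; i < n_embryos; i++)
        p_none *= 1.0 - p_match;

    return 1.0 - p_none;
}
```

With p_trait = 0.5 this gives roughly 0.27 for five traits and 10 embryos, 0.61 for five traits and 30 embryos, and 0.74 for three traits and 10 embryos.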


February 18, 2009

When I initially read the NYT article on Boxee, a piece of software that aggregates content, including content from video streaming sites like Hulu, my first reaction was "how long till this gets shut down". Boxee's whole reason for being is to provide a unified interface for all your content, but that inherently disintermediates Hulu and their content providers, which want you to go through their interface, see their banner ads, ads for other shows, etc. So, it's not too surprising to see that Hulu is cutting off Boxee users:
Later this week, Hulu's content will no longer be available through Boxee. While we never had a formal relationship with Boxee, we are under no illusions about the likely Boxee user response from this move. This has weighed heavily on the Hulu team, and we know it will weigh even more so on Boxee users.

Our content providers requested that we turn off access to our content via the Boxee product, and we are respecting their wishes. While we stubbornly believe in this brave new world of media convergence -- bumps and all -- we are also steadfast in our belief that the best way to achieve our ambitious, never-ending mission of making media easier for users is to work hand in hand with content owners. Without their content, none of what Hulu does would be possible, including providing you content via Hulu.com and our many distribution partner websites.

It's unsurprising that the content providers want some sort of return for making their content available on demand through a seamless interface. Rather than something like Boxee which is sort of a hack on the existing web on demand services, I would expect one of the content providers to do a deal with Netflix, whose subscription service would let them compensate the content providers on an ongoing basis.


February 17, 2009

Joe Hall posts about TrapCall, a system for circumventing caller-id blocking (it also does call recording and voicemail transcription). I thought it might be worth explaining what's going on for those who aren't too familiar with the innards of telephony.

The important thing to know is that telephony systems are nearly all digital now (the specific protocol is called Signaling System 7 (SS7)). The only part of the system that's analog is the part between the handset in your house and the nearest central switch, where the A-D and D-A conversion happens, and not even then if we're talking cellular telephony. The way that caller-id works on analog phones is that the originating switch sends the caller's phone number (and potentially a name) in the SS7 setup message to the terminating switch, and the receiving switch encodes that information in the silent inter-ring interval, where it's decoded by the callee's phone and displayed to the callee. The situation is basically similar with cell phones except that the connection between the cell phone and the switch isn't analog.

One of the basic assumptions of SS7 is that any device which gets to connect to the telephony network and speak SS7 is trusted. In particular:

  • The originating switch can put any information in the setup message it wants, including advertising random numbers that aren't actually connected to the switch.
  • Caller-id blocking (when the caller doesn't want their id propagated), is implemented by having a bit in the setup message that tells the receiving switch not to encode the caller-id information onto the callee's line.

The first point implies that you can't really trust anything you see on caller-id. If you can get a digital connection to the phone network, such as in a call center, a PBX or a home ISDN line, you can generally put any information you want in your messages [Technical note: this protocol is Q.931, not SS7, but you can think of it as SS7 for now], including false caller ID information. Since it's not at all hard to get this kind of access, caller-id from the telephone network is basically unreliable. The second point implies that caller-id blocking isn't that trustworthy either. If you have a digital connection to the network, there is a reasonable chance that you will get the caller-id information for any caller even if they have turned on blocking: you may see the bit that tells your phone not to show you, but nothing makes it obey that. In principle the receiving switch could suppress this information in the Q.931 message but my understanding is that generally switches don't.
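A toy model in C may make the trust problem clearer. This is a deliberately simplified stand-in, not the real SS7 or Q.931 message encoding, and every name in it is hypothetical. The point is only that the number and the "don't display" bit travel together, and honoring the bit is up to whoever receives them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a call setup message; NOT the real wire format. */
struct setup_msg {
    const char *calling_number;   /* whatever the originating side chose to send */
    bool presentation_restricted; /* the "please don't display this" bit */
};

/* A well-behaved endpoint honors the bit and shows nothing (NULL). */
const char *
displayed_by_polite_endpoint(const struct setup_msg *m)
{
    assert(m != NULL);
    return m->presentation_restricted ? NULL : m->calling_number;
}

/* An endpoint with a digital connection still receives the number and
   can simply ignore the bit. */
const char *
seen_by_digital_endpoint(const struct setup_msg *m)
{
    assert(m != NULL);
    return m->calling_number;
}
```

Nothing in this model checks that calling_number is genuine either, which is the spoofing half of the problem.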

As far as I can tell TrapCall uses some combination of these features. You arrange to forward blocked calls to TrapCall, which acts as if it were your voicemail provider (I imagine that if someone really leaves a voicemail they just proxy it to your real voicemail box) which then reads the caller-id information and calls you back with spoofed caller-id matching the caller's (blocked) caller-id.

Acknowledgment: Jon Peterson filled in some of the details here. All mistakes are mine, etc.


February 16, 2009

I've got a reasonably large computation job—bigger than I can conveniently do on my own hardware—I need to do, and so naturally I thought EC2. For those of you who don't know, the basic idea behind EC2 is that you have Amazon Machine Images (AMIs), which represent the state of a machine which is off (e.g., the disk drive state). You can activate as many instances as you want, booting off the same AMI, which gives you a bunch of nearly identical machines (except for the IP address, etc.) which you can then log into and use for whatever you want. All the management is via this Web services interface which you drive with client-side Java apps. So, for instance ec2-run-instances XXX brings up a single instance of image XXX.

After about 5 hours screwing around with it, I've figured out how to do what I want, but I have to say, they don't make it super-convenient.

  • Nothing has a mnemonic name. So, for instance, all the images are named ami-XXXXXXXX where the Xs are hex digits. Running instances are similar. Now, I can totally understand why it's convenient to use numeric identifiers, but since they make you download their toolchain, you'd think they could at least let you assign symbolic names of your choosing to the objects.
  • The tools are orthogonal but uh, fine grained. So, to bring up a new instance and log into it, you do (1) start the instance with ec2-run-instances (2) run ec2-get-console-output to see if it's booted and to get the SSH public key [repeat as necessary] (3) run ec2-describe-instances to get the domain name for the machine so you can log in (4) ssh in.
  • The default images are fairly minimal: no Emacs, no compiler, no debugger, etc. Now, they have yum, so you can install this stuff easily, but this brings us to...
  • The images don't have any persistent state. So, if you install Emacs, and shut down the instance, it's back to the initial state when you start it again. And since you pay by the operating hour even if the machine is idle, you don't want to leave the machine running all the time. Amazon does provide a storage service (actually, two, S3 and EBS), but you still need to do some work on a machine-by-machine basis to make it connect automatically.
  • Amazon does let you take a running machine and make a new image out of it, but the process is pretty slow, so what ends up happening is you get the machine in the state you think you want it, pickle the image, and then next time you boot it you realize you forgot something. I repeated this a few times before I got an image I liked.

This probably all works OK as a replacement for your own data center where you would need to absorb all the installation cost anyway, but if what you need is a temporary pile of computrons for a single compute job, EC2 isn't that great a match. It'll get the job done but the overhead is awful high.


February 15, 2009

Usually I do my long runs at Rancho San Antonio (PG&E Trail or Black Mountain) but the past few weeks it's been incredibly crowded to the point that I had to park on the road leading up to the park and then run in. I thought it might be worth changing things up so this weekend I tried a different route, the Alambique Trail in Wunderlich County Park. At the entrance to the park is a free dirt parking lot with a portajohn. From there the trail goes from about 400 feet 5.5 miles up to Skyline (this is with the Skyline trail at the top rather than the Alambique trail all the way). It's basically one continuous climb to the top and then you turn around and come back down (there are also extensions along the Skyline Ridge Trail to Huddart park.) The trail is quite well marked--there was one place where I got off course because I saw a path that was well trodden but unmarked. Should have stuck to the signs.

Compared to Rancho, the footing is generally worse, with a lot more foliage, rocks and roots, as well as a fair amount of horse manure you have to dodge. The trail is a pretty good workout, but I don't really like the up then down format. The climb to the top is unrelenting and then you really pound your quads and your knees coming down that 5 mile descent. The PG&E Trail, for instance has about the same amount of climbing and even some long up or down stretches, but a fair amount of it is rolling. Today it was raining (pouring, really) and while footing was good, thanks to my Inov-8 Roclite 295s, a significant fraction of the trail was puddles, streams, and mud, with the results you see above. This actually understates the situation a bit, since I'm wearing green socks and so you can't see that my shoes and socks are covered in dirt, leaves, sticks, etc.


February 14, 2009

OK, this is incredibly cool: Shimano has what looks like a viable electronic shifting system. Most people don't think a lot about bicycle shifting, but this is a topic of real interest to people who do a lot of riding: mechanical shifting kind of sucks and a working electronic shifting system would be quite nice.

As background, the way bicycle gearing works is this: on the front you have either two or three gears ("chainrings") directly connected to the pedals ("cranks"). On the back, you have between 5 (if your bike is ancient) and 10 (if your bike is new and expensive) gears. They're connected to the rear wheel by a ratchet so that when you pedal faster than the wheel, it drives the wheel but if you stop pedalling while you're riding the gears spin freely with respect to the wheel so you can coast (this is called a "freewheel" or "freehub" depending on how it's put together). The front and back gears are connected with a chain. Anyway, the amount of mechanical advantage you get is determined by the ratio between the front and back gears. If they're the same size then every turn of the pedals turns the wheel once: the bigger the front gear the harder it is to pedal (but the faster you go with each pedal stroke); the smaller the front gear, the easier it is to pedal but the slower you go. The opposite is true for the gears on the back.

Roughly speaking people choose the general range of the gearing with the front gear and the fine-grained gear selection with the back gears, which are closely spaced. So, for instance, if you want to climb a hill you'd choose the small gear in the front. On the other hand if you're on flats or the downhill you would choose the big ring. Incidentally, while it's natural to think of the gearing as being sequential, it's not. Let's denote a given gear configuration X-Y as being ring X on the front and Y on the back, with "harder" numbers bigger. So, 1-1 is the easiest gear and 2-10 is the hardest. However, 1-10 is almost always harder than 2-1 [1] and there's actually quite a bit of overlap. This means that you can get a fair amount of gear flexibility without changing the front gear, which is good because front gear shifts are very clumsy compared to back gear shifts.

Why Electronic Shifting Is Promising
Mechanical shifting has a number of drawbacks, mostly connected with the front derailleur (the gizmo that does the shifting):

  • Because all the linkages are mechanical you can only really have one set of shifters. This is inconvenient for triathletes and time trialists who tend to use aerodynamic handlebars a lot of the time. You can put the levers so you can shift conveniently in the aerodynamic position or the upright position but not both.
  • The front derailleur doesn't shift well under load, so if you suddenly find yourself on an uphill (which means you're putting a lot of load on the chain) it can be hard to downshift, which is inconvenient because this is exactly when you need to shift. In the worst case, the chain can come off and then you're really hosed.
  • Because the rear has a lot of gears stacked on top of each other, there's a lot of displacement of the chain even as far as the front. This means that you need to "trim" the front derailleur to stop it from rubbing; a configuration at the front that works with the biggest gear in the back will cause rubbing with the smallest gear in the back and vice versa.

It's easy to see how electronic shifting can solve the first of these: you can run as many wires as you want so you can have shifters in any location you want. This has been obvious for quite some time and there have been a number of stabs at electronic shifting but they've never worked well. Early reviews suggest that Shimano's does.

Another advantage of Shimano's system is that the front derailleur is self-adjusting: this means that you don't have to trim and that allegedly it shifts well under load. The downside is that it's $4k for now, but this is the kind of thing that comes down in price.

[1] Technical note: gears are denominated in the number of teeth. For instance, a 20-speed bike might have two front chainrings with 53 and 39 teeth, and 10 back cogs ranging from 11 to 25 teeth. With our above notation: the ratio for 1-10 is 39/11, or about 3.5, and the ratio for 2-1 is 53/25, or 2.12, so there's a very substantial amount of overlap.
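To make the overlap concrete, here is a minimal sketch that computes the ratios for the example drivetrain in the technical note. The 53/39 chainrings and the 11 and 25 tooth end cogs come from the note; the eight intermediate cog sizes are my assumption (a plausible 11-25 cassette), and the names `CHAINRINGS`, `COGS`, and `ratio` are just illustrative.

```python
# Chainrings indexed in the post's notation: ring 1 is the small (easy) ring,
# ring 2 is the big (hard) ring.
CHAINRINGS = [39, 53]

# Back cogs indexed so that position 1 is the easiest (25 teeth) and
# position 10 is the hardest (11 teeth). Only the 25 and 11 come from the
# post; the intermediate sizes are an assumed, typical-looking spread.
COGS = [25, 23, 21, 19, 17, 15, 14, 13, 12, 11]


def ratio(front, back):
    """Wheel revolutions per pedal revolution for gear front-back."""
    return CHAINRINGS[front - 1] / COGS[back - 1]


# Small-ring gears that are harder than the easiest big-ring gear (2-1).
overlapping = [back for back in range(1, 11) if ratio(1, back) > ratio(2, 1)]

if __name__ == "__main__":
    for front in (1, 2):
        row = "  ".join(f"{ratio(front, back):.2f}" for back in range(1, 11))
        print(f"ring {front}: {row}")
    print("small-ring gears harder than 2-1:", overlapping)
```

With these assumed cogs, six of the ten small-ring gears are harder than 2-1, which is why you can stay on one chainring for quite a while before a front shift becomes necessary.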


February 13, 2009

NYT reports on Hughes Telematics' plans to provide networked access to various aspects of your vehicle's operations:
Hughes Telematics, which is behind the communications systems in Chrysler and Mercedes-Benz vehicles that are to make their debuts this summer, is headed in that direction. Its next-generation technology, expected to appear in 2010, would allow drivers to install software in their cars, just as iPhones let users download applications to their handsets.


Other applications proposed by Hughes include remotely starting a car, resetting its alarm or unlocking the doors with an iPhone. Unlike wireless key fobs, commands could be sent to the car over the Internet.

I hate to sound like the stereotypical computer security guy, but the risks here seem pretty obvious: it's one thing to have your car stereo Internet accessible; after all, if you're driving your car stereo from your iPhone, you already have that. It's quite another to have your engine be remotely controllable, which is obviously necessary for a remote start. One has to wonder what other parts of the car's operational electronics are accessible from the same computer. It's bad enough that someone could potentially steal your car remotely, though key fob to car protocols are often pretty insecure anyway; you really don't want someone turning off your car remotely. You might think that this problem could be solved with adequate comsec measures and firewalls to prevent remote penetration of the car computer. That's a hard problem in and of itself, but as soon as you start adding communications-style apps you need to worry about remote malware infection.

Obviously, what you really want here is to have the operational electronics airgap isolated from anything that you can install new software on. Ordinarily I would expect the people designing this kind of system to do that (No, really, I've met some of them and they're cautious), but if you're going to have remote start, you need some kind of integration, so I wonder how this is expected to work.


February 12, 2009

The US Vaccine court has ruled in three cases that autistic children (or rather their parents) aren't entitled to compensation. From a technical perspective, this is of course correct: there's just no evidence that vaccines cause autism except in exceptional cases. From a social perspective, I'm not sure it's such a great idea. As I understand it, the rationale for the Vaccine Injury Compensation Program is to provide stability in the vaccine system by providing a form of insurance for manufacturers. Since the parents of these children (and the thousands of other autistic children) don't show a lot of signs of giving up their beliefs about a vaccine-autism link, paying off these suits might be a cheap tradeoff to remove what's turning into a real (though imagined) disincentive for parents to vaccinate their children.

February 11, 2009

The NYT has a sort of odd article about the expected lifespan of LED lightbulbs:
When a manufacturer says that an LED lamp will last 25,000 or 50,000 hours, what the company actually means is that at that point, the light emanating from that product will be at 70 percent the level it was when new.

Why 70 percent? Turns out, it's fairly arbitrary. Lighting industry engineers believe that at that point, most people can sense that the brightness isn't what it was when the product was new. So they decided to make that the standard.


If nothing else in the lamp fails, like its electronics, the product will continue to work until it becomes really dim. But some engineers are proposing a way to get around even that.

Their idea is that once the LEDs start to emit less light, increase the power to each one to increase its brightness. Unfortunately, that will also diminish the life of the lamp.


Not only would contractors need to use thicker cables, but the utilities would need to create more power, partially negating the appeal of LED lighting in the first place.

I'm not saying that this won't work, but it seems there's a relatively obvious alternative: have the bulb stop emitting light entirely once it gets below some threshold (70% seems reasonable, I suppose). As far as the extra power goes, presumably the tradeoff here is straightforward: estimate how much energy is required to produce a new LED bulb if it's thrown out at the time it gets too dim and compare that to the additional energy that will be required to overdrive the LED once it starts to dim.
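The comparison is simple arithmetic once you have the two inputs, neither of which I actually know. As a rough sketch, with `embodied_energy_kwh` (energy to manufacture a replacement bulb) and `extra_power_watts` (additional draw when overdriving) both being placeholder parameters rather than measured values:

```python
def breakeven_hours(embodied_energy_kwh, extra_power_watts):
    """Hours of overdriving after which the extra electricity used
    exceeds the energy it would have taken to make a replacement bulb.

    Both arguments are hypothetical inputs; plug in real figures if you
    have them.
    """
    if extra_power_watts <= 0:
        raise ValueError("extra_power_watts must be positive")
    # Convert kWh to Wh, then divide by the extra watts drawn.
    return embodied_energy_kwh * 1000.0 / extra_power_watts


if __name__ == "__main__":
    # Purely illustrative numbers: a 10 kWh bulb and 2 W of extra draw.
    print(breakeven_hours(10.0, 2.0), "hours")
```

If the bulb is expected to keep running in its overdriven state for longer than the break-even figure, replacing it is the lower-energy choice; if not, overdriving wins. This ignores the shortened lamp life that the article mentions, which pushes things further toward replacement.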


February 10, 2009

I'm hoping to pick the brains of readers here. I've gotten interested in clockwork and mechanical logic. Can anyone recommend references on the topic? Optimally, I'd be looking for something that was about how to build mechanical computing devices, but I'd settle for a good book on mechanical clocks and watches.

February 9, 2009

This is an interesting development. The California SoS posts a list of donors to various political campaigns, but it's pretty un-user friendly. Someone has done a mashup with Google Maps so you can see everyone in your area who donated to Proposition Eight. Obviously, this is trivially extensible to any arbitrary political issue; it's just that Prop 8 has generated a huge amount of heat in a relatively tech savvy community. I wonder how long it is before a site like this is up for every issue or, perhaps more interestingly, before you can profile every donation for everyone who lives near you. For all I know you can do it now.

February 8, 2009

Most climbing at climbing gyms is done on top-rope: this means that the climber is supported by a rope that runs through an anchor at the top of the wall. The person holding the other end of the rope (the belayer) stands at the bottom of the wall and takes in the slack in the rope as the climber ascends. I suppose it's theoretically possible to belay just by holding the rope in your hands, but it's prohibitively difficult: most people can't easily apply 150+ pounds (600+ Newtons) of force to a rope continuously, and if the climber is falling, it's much harder to stop them. Instead, you use a belay device: a gizmo that attaches to the belayer's harness and lets the belayer apply friction to the rope.

Most modern belay devices are of the "tube" variety, such as the ATC, shown below.

The way that an ATC works is that the climber's rope goes through both the ATC, which is just a metal tube, and a carabiner attached to the belayer's harness. In order for the climber's rope to pay out, it has to feed through the carabiner and the ATC. To stop the rope from feeding, the belayer pulls on the free end of the rope with his brake hand, which pulls the rope tight against the edge of the ATC where friction stops it. Effectively, then, the climber is hanging from the belayer's harness (remember, the rope is going through an anchor above the climber), with the belay device coupling the rope to the carabiner.

The key point here is that an ATC is passive; if you ever let go of the free end of the rope, it will feed freely and if the climber is weighting the rope, they will fall, potentially to their death. This makes technique important. In particular, as the climber ascends you need to take out slack in the rope by pulling on the free end. However, eventually your arm will be fully extended and you need to move your brake hand up the rope. It's important to do this without letting go, because if the climber falls during that period there's nothing to stop them. The standard technique for this is to take your other hand and grab the rope below your brake hand and then slide it up, so there is always a hand arresting the rope. There are other techniques, but they are less safe.

You can also get active belay devices which autolock if tension is suddenly applied. The most popular of these is probably the Petzl Grigri. The Grigri has two main advantages for toproping: first, if you're not paying attention and the climber falls, the Grigri will autolock and catch them [there is a lever you pull to unlock the device so the climber can descend.] Second, if the climber needs to hang for a while, you don't need to apply tension to the free end of the rope to keep it from feeding through as you would with an ATC. [Lead climbing, where the climber pulls the rope up with them and sets their own anchors, is more complicated. Here, I'm just talking about toproping.] Grigris are popular in part because the autolock seems safer. Planet Granite, where I climb, has Grigris fixed to the line and won't even let you toprope with an ATC. PG also has another safety feature: the anchors at the top of the wall are large diameter (4-6") metal pipes with the rope wrapped around the pipe twice. The effect is that when tension is applied to the rope it cinches around the pipe, creating a lot of friction and thus arresting the climber's fall (though not necessarily to the point where they couldn't get hurt) even if the belayer does nothing. [Technical note: the instructors at PG regularly tell students that the double wrap halves the climber's effective weight, but this misunderstands the basic physics. There's no movable pulley; it's purely a friction effect.]
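The standard way to model rope friction around a fixed cylinder is the capstan equation, which says the load-side tension can exceed the hold-side tension by a factor of e^(mu * angle). Here is a minimal sketch of that calculation. The friction coefficient `MU` is a guess on my part (I haven't measured rope on pipe), and I'm treating the double wrap as exactly two full turns, which is also an approximation.

```python
import math


def capstan_ratio(mu, wrap_angle_rad):
    """Capstan equation: ratio of load-side to hold-side tension for a
    rope on the point of slipping around a fixed cylinder."""
    return math.exp(mu * wrap_angle_rad)


MU = 0.2                       # assumed rope-on-metal friction coefficient
TWO_WRAPS = 2 * 2 * math.pi    # two full turns, in radians (approximation)

climber_weight_n = 700.0       # illustrative climber weight in newtons
hold_force_n = climber_weight_n / capstan_ratio(MU, TWO_WRAPS)

if __name__ == "__main__":
    print(f"tension ratio: {capstan_ratio(MU, TWO_WRAPS):.1f}")
    print(f"force needed on the belayer's side: {hold_force_n:.0f} N")
```

The point is that the reduction depends exponentially on the friction coefficient and the wrap angle, so it can be far more than a factor of two (about twelve with these assumed numbers) and it changes as the rope and pipe wear. A movable pulley, by contrast, would give exactly a factor of two regardless of friction.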

I'm not sure that either of these is really a good idea for a climbing gym. While I suspect that Grigris really are safer for toproping if that's all you use, they don't give you any feedback for bad technique because the climber won't fall even if you totally fail to lock off the device yourself. It's quite common to see people using the "pinch" technique where they pull the free end of the rope up to the end going towards the climber, pinch them both together with their free hand [which is usually pulling down on the rope to feed it through the aforementioned high friction anchors], and then move their brake hand up towards the belay device. This is OK with a Grigri, but because the ATC's friction depends on the rope being pulled against the edge, it doesn't provide acceptable braking with an ATC. Yesterday, I heard an instructor at PG tell a belay class that if they lost control of the climber's descent (e.g., they were pulling the lever and having trouble braking with the brake hand) they could just let go and the Grigri would autolock. While this is true with a Grigri (though you're not supposed to rely on it), it would be disastrous if you were using an ATC. So, my concern here is that always using a Grigri encourages bad habits that could get you into real trouble if you ever used an ATC, either because you were lead belaying or were at a gym that had ATCs instead of Grigris. Perhaps what would be best would be if some small fraction of the ropes at PG were set up without belay devices and they encouraged you to use ATCs on those so you would learn how.

The pipe thing is even worse. Lots of gyms have anchors with much lower friction (e.g., carabiners at the top). If you're used to the slow descent provided by the pipe, you could drop your climber way too fast even if you were using a Grigri. Even if you don't ever climb at another gym, the variation in anchor friction is quite high with the anchor getting slower as the rope ages. I've seen anchors where the climber actually had to bounce to feed the rope through and descend and ones where there was barely more friction than a carabiner. So, this is a problem even if you only climb at the same gym. Again, I know why they do it: it seems safer, but it teaches bad habits which could get you, or rather the person at the other end of the rope, seriously hurt.


February 7, 2009

Jim Fleming no longer holds the record for the most irrelevant political spam I've received. This afternoon I got a message encouraging me to vote Likud on February 10th and explaining how much Tzipi Livni sucks. Now, not being an Israeli citizen, I suspect it's illegal for me to vote in this election. Better yet, this message was in Hebrew, which I can't read, making it even less likely it would influence me; I had to have it translated.

February 6, 2009

Dan Savage addresses the difficult ethical issue of the mutual obligations of the laptop user and the coffee shop in which he works:
Don't want people to sit in your cafe with their laptops? There's a simple solution: don't have WiFi. But if you're going to have WiFi then for fuck's sake have fucking WiFi. And if your WiFi isn't working, if it's down and it's gonna be down all day, you might wanna mention that to people before they wait in line, buy a coffee, leave a tip, sit down, and pull out their computers. Because then each and every one of those computer users is going to walk up to the counter and ask if you have WiFi. It's an asshole move to look at each laptop computer user/customer in turn like they've just asked you if you have herpes. And if it really kills you to sneer out, "Yeah, we have WiFi, but it's down," then put a little sign on the door that says the WiFi's out. Then laptop users won't bother you with their questions, their presence, or their patronage.

UPDATE: And laptop users? Tip based on the amount of time you intend to spend in the cafe, not on the price your beverage; buy your refills; share tables; and always remember that you're not actually in your office.

I occasionally work in coffee shops, so this is a topic I've given some thought to. I think it's pretty clear that there's some implicit obligation for patrons to fork over some money occasionally and not just sit at a table (yes, yes, I realize that there's no contract requiring you to do so, but think about the equilibrium issues here: if nobody ever paid for their drinks you can bet that coffee shops would start forcing you to rent tables.) But this doesn't tell you how much to spend or how to allocate your payments between the coffee shop and the staff.

If the shop is pretty full, I think it's reasonably clear: you're depriving the shop of space that could be used by paying customers so you should be buying a bit more than the average customer. The same logic holds for the staff, since presumably those customers would tip. If the shop is mostly empty, though, the situation seems a little more complicated. You're not costing the shop any money and WiFi is basically free for the shop to offer (the router is cheap and the Internet service is a fixed cost.) That doesn't mean you don't need to fork over any money, since, as I said, there's an implicit obligation, but I have no idea what the right amount is. I usually buy a drink when I come in and then maybe one every hour or two. It's not clear how much to tip the staff either: their work scales with the number of drinks you order, so my instinct is whatever fraction of your food and drinks you usually would tip.

As far as the shop's obligation to you, the flip side of the implicit contract is that they will offer you Wi-Fi ("Wait", I hear you object, "why should you even think they have Wi-Fi, let alone rely on it?" That seems simple: some coffee shops advertise it, and even in shops which don't, many if not most of the customers are regulars and so know it's provided and often went to the shop explicitly to work). Obviously, that doesn't mean it needs to work perfectly, but if they know it's hosed they should probably tell you before you've plonked down your money.


February 5, 2009

What's there to say about the whole idiotic Michael Phelps flap? The guy's 23. He smoked dope. Or not. What did you expect? Who cares? But then I read something like this:
U.S. swimming officials Thursday suspended Olympic hero Michael Phelps from competition for three months, the latest fallout from a photo that caught him puffing on a bong at a party.

USA Swimming, the sport's national governing body, also cut off its financial support to Phelps for the same three-month period, effective Thursday.

"This is not a situation where any anti-doping rule was violated, but we decided to send a strong message to Michael because he disappointed so many people, particularly the hundreds of thousands of USA Swimming member kids who look up to him as a role model and a hero," the federation said in a statement. "Michael has voluntarily accepted this reprimand and has committed to earn back our trust."

So, I've never been a USA Swimming kid, but I remember competing in high school sports and I don't think that I would have been disappointed to discover that some athlete I respected (for their physical skills, remember!) had smoked marijuana. It wasn't like my teammates weren't getting drunk at parties. This whole meme that kids need to be protected from the very notion that professional athletes aren't perfect has a pretty strong odor of "I'm shocked, shocked, to find that gambling is going on in this casino." Can people really have this little memory of what it was like to be kids themselves?

It's important to remember that from the perspective of the sport, smoking marijuana is really qualitatively different from using steroids. Marijuana doesn't confer any kind of performance advantage so it doesn't undermine the sport [I'm not taking a position on whether steroids should be allowed or not. However, as long as they're banned, using them is cheating. Fair competition depends on rules, no matter how arbitrary.] The grounds for punishing athletes for using marijuana are (1) it's illegal so it "contravenes the spirit of the sport" and (2) it's bad for you. You might or might not think those are legitimate grounds for WADA to be doing anything, but certainly they're a lot less legitimate than those for regulating steroids or EPO. There's no real connection to the sport; WADA is just punishing athletes for behaviors they disapprove of.

One more observation: the selection of marijuana is fairly arbitrary. Remember that alcohol is illegal in some jurisdictions, but it's not a prohibited substance for athletes to use outside of competition.


February 4, 2009

I may have been a little quick on the trigger last night when I wrote about EFF's suit against YouTube [*]. Fred von Lohmann responds in the comments section:
Just to clarify a few things here.

First, YouTube has licenses from all the major performing rights organizations (ASCAP, BMI, SESAC), so the public performance is licensed, whether it's sung by a teenager or a professional. That means Warner Music must think some other right is being infringed. Reproduction? Derivative work? They don't tell you when they send a DMCA takedown notice or submit a fingerprint for automated Content ID matching.

Second, if you look at the four factors, the video is plainly noncommercial. It certainly doesn't displace sales of any professional versions of the song. And it also doesn't threaten any plausible "licensing" market, since I don't believe that music publishers are in the business of granting licenses to teenagers making noncommercial videos. I think those are the two most important of the factors here, and both favor the YouTuber.

And, finally, do we really want a copyright system that *discourages* people from engaging in this kind of creativity, especially when it doesn't hurt any existing commercial markets for the copyright owners?

I've met Fred and he seems like a pretty sharp guy, so I probably should have assumed that he had a reasonable point.

Anyway, a few notes here. If YouTube has licenses from the performing rights organizations, then (as I understand it) yeah, this performance would be licensed. And since this is just a cover, it's not at all clear what about this video Warner has a problem with.

With that said, while the performance itself is noncommercial, YouTube certainly makes money from the video through the advertisements they show on the page. As I understand it, the publishers go after bar owners who have live music, even if the musicians themselves aren't paid, so I'm not sure the situation is that different for YouTube's position vis-a-vis the publisher, at least ethically, which is a different matter from legally (though again, as Fred says, they have a license.) As for what kind of copyright system we want to have, I think it's pretty clear that we do have the kind of copyright system that discourages people from engaging in creativity.


February 3, 2009

EFF complains about YouTube's automated copyright enforcement system:
This is what it's come to. Teenagers singing "Winter Wonderland" being censored off YouTube.

Fair use has always been at risk on YouTube, thanks to abusive DMCA takedown notices sent by copyright owners (sometimes carelessly, sometimes not). But in the past several weeks, two things have made things much worse for those who want to sing a song, post an a capella tribute, or set machinima to music.

First, it appears that more and more copyright owners are using YouTube's automated copyright filtering system (known as the Content ID system), which tests all videos looking for a "match" with "fingerprints" provided by copyright owners.

As I've no doubt mentioned before, I'm no lawyer, but isn't it actually copyright infringement to publicly perform covers of songs written by other people? Now, as I understand it, you would only need permission from the song writer's representative [typically Harry Fox Agency], not the copyright holder on the recording, so maybe Time Warner is out of line here, but that isn't the same as saying it's legal.


February 2, 2009

This afternoon Congressman John Fleming (R-LA) apparently decided he really wanted to update me on what he was doing for his constituents, so he sent me 25 copies of the same email:
I am honored to serve you as the new representative of the 4th Congressional District. My first priority is to stay in touch with everyone I represent, including understanding your interests and concerns and keeping you up to date on news and proposed legislation affecting you. As part of my effort to meet that goal, I will be periodically sending e-newsletters.

I invite you to contact me, anytime, with your opinions and ideas, as your feedback is invaluable to help me do the best job I can as your representative.

It has been a busy start to the 111th Congress and the economy remains my top priority. I hear from Northwest Louisiana residents every day who are asking for Congress to help stabilize the economy, but not to increase the burden for our families. I hear you and I agree. That is the reason I voted no on the Economic Stimulus Bill (H.R. 1) filed by the Speaker Pelosi led majority in the House.


Blah blah blah.

It's actually a little hard to tell if this is authentic: it looks to be a real newsletter, but I don't see any link off his main site at house.gov, and the site that's hosting the newsletter is different. Whois reveals that the domain is owned by this dude:

Name:		Gregory    Hildebrand
Address:	12121 Wilshire Blve
		Los Angeles, Ca  90025

Email Address:	gregory@politicalsystems.net
Phone Number:	(310)696-2250

But www.politicalsystems.net leads to a ColdFusion stack trace. So, this could be someone trying to make Fleming look bad, but per Hanlon's razor, I suspect that it's just super-incompetent political marketing.

From the NYT article on Obama's e-mail:
After all, Gov. Sarah Palin of Alaska found her e-mail account broken into and her messages posted online last year when she was running for vice president. Imagine a president's e-mail put on display for the whole world to see -- or perhaps just for the head of a hostile foreign intelligence service.

To minimize the risk, the government technology gurus have made it impossible to forward e-mail messages from the president or to send him attachments, people informed about the precautions say. His address is likely to be changed regularly as well. And the president's friends and staff members are being lectured about security.

So, it's trivial to stop people from sending him attachments. Your average email filtering system can do this no problem. Lecturing people about security is easy too (though probably futile). However, as far as anyone in the public computer security field knows, preventing you from forwarding e-mail that was sent to you is basically impossible. Once the email is available on a computer you control, you can do pretty much anything you want with it, including forward it. The only real exception to this is if the computer isn't really under your control, but is running software controlled by the government, which isn't really scalable. Even that's not enough: the government would need to replace your hardware with something that they control because otherwise you can modify the software to allow forwarding. That isn't to say one couldn't label mail with some "no forwarding" tag, it's just that your mail client wouldn't be required to obey it. Indeed, as far as I know there's no widely accepted tag like this, even for advisory purposes.
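To illustrate why a tag can only ever be advisory, here is a small sketch using Python's standard `email` package. The `X-No-Forward` header is something I made up for the example (as noted, there's no widely accepted tag), and the addresses are placeholders. The "forwarding" code never looks at the header, and nothing in the message can make it.

```python
from email.message import EmailMessage


def make_tagged_message():
    """Build a message carrying a hypothetical advisory no-forward tag."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "recipient@example.org"
    msg["Subject"] = "Confidential"
    msg["X-No-Forward"] = "yes"  # invented header; purely advisory
    msg.set_content("Please keep this to yourself.")
    return msg


def forward(original, new_recipient):
    """Forward a message. The recipient's software is free to ignore any
    tag on the original, and this function simply does."""
    fwd = EmailMessage()
    fwd["From"] = original["To"]
    fwd["To"] = new_recipient
    fwd["Subject"] = "Fwd: " + original["Subject"]
    fwd.set_content(original.get_content())
    return fwd


if __name__ == "__main__":
    print(forward(make_tagged_message(), "reporter@example.com"))
```

A well-behaved client could check the header and refuse, but that's a policy choice made by software the recipient controls, which is exactly the problem.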

Even if it were possible to prevent you from forwarding emails from the president, it's not clear how this would prevent the threat described in the first paragraph. OK, so you can't forward the message, but nothing stops you from just whipping out your camera and taking a picture of the screen and sending that to the New York Times, foreign intelligence service, etc. Remember that that's just digital information too, so it's pretty much equally easy to forward. Even if we imagine that a digital photo is problematic for some reason [technical note: sometimes people propose schemes designed to make it difficult to photograph or videotape movies, etc. Generally the idea is to exploit some misfeature of the recording sensor, that isn't an issue in ordinary recording scenarios.] there's nothing stopping you from having a second computer which you use to—and this might be too sophisticated for some attackers—retype the entire message and send it to someone else.

Neither you nor I is ever likely to receive an email from the president, so this isn't a very cosmic issue. However, a very similar delusion, namely that you can stop people from making copies of the music and videos you sell them, has been the cause of a very large amount of inconvenience for users, so getting this right still matters. I suspect that pretty much any computer security person (Alex Halderman, call your office) the reporters had talked to would have dumped cold water on this claim, but I also suspect that they didn't even know enough about computers or think about the threat model enough to be suspicious; they just wrote it down. I wonder what would have happened if someone had told these reporters that in the future Air Force One would be powered by perpetual motion machines?


February 1, 2009

No real content tonight, but you may want to check out the following:

Also, I'm not a huge football fan, but this year's Super Bowl was pretty exciting, what with a 100 yard(!) interception return, Pittsburgh holding the lead for most of the game, Arizona scoring 9 points in a minute with less than 2 minutes left to go up 23-20, and finally Pittsburgh scoring another touchdown with 35 seconds to go, in a play which required instant replay to determine whether the receiver's feet touched the ground in the endzone before he fell out of bounds.