May 2008 Archives

 

May 31, 2008

Predicting Human Brain Activity Associated with the Meanings of Nouns
Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, Marcel Adam Just

The question of how the human brain represents conceptual knowledge has been debated in many scientific fields. Brain imaging studies have shown that different spatial patterns of neural activation are associated with thinking about different semantic categories of pictures and words (for example, tools, buildings, and animals). We present a computational model that predicts the functional magnetic resonance imaging (fMRI) neural activation associated with words for which fMRI data are not yet available. This model is trained with a combination of data from a trillion-word text corpus and observed fMRI data associated with viewing several dozen concrete nouns. Once trained, the model predicts fMRI activation for thousands of other concrete nouns in the text corpus, with highly significant accuracies over the 60 nouns for which we currently have fMRI data.

This paper is pretty interesting. Basically, they have measured fMRI activation data for 60 words. So, voxel v has activation level A_v_w for word w. For each word, they have measurements of a bunch of linguistic parameters P_1, P_2, P_3..., etc. They then fit a predictive model for the effect of each parameter on the activation level of each voxel, so for instance you could say that if a word is associated with "sight" (i.e., it appears near "sight" in text corpora) that increases the activation of voxel v by .1 units. This is fairly straightforward regression modelling stuff.

Once you have the model fitted, you can then predict the activation of each voxel for a novel word by taking its linguistic parameter values and plugging them into the model. Their results are actually pretty good. They have a corpus of 60 word/fMRI pairs and they use 58 as a training set and 2 as a test set. They then try to differentiate the two test words by asking which predicted activation pattern is closer. The results are significantly better than chance: mean=.77 for what appears to be arbitrary words and mean=.62 when the words are from the same semantic category (e.g., "celery" and "corn"). Moreover, a significant amount of the error appears to come from head motion by the subjects.
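
To make the predict-then-match step concrete, here is a minimal sketch. It assumes the fitted model is just one weight per (voxel, parameter) pair and that "closer" means higher cosine similarity; the function names and all the numbers are invented for illustration and are not taken from the paper.

```python
import math

def predict(weights, features):
    """Predicted activation per voxel: weighted sum of the word's parameter values."""
    return [sum(c * f for c, f in zip(voxel_weights, features))
            for voxel_weights in weights]

def cosine(a, b):
    """Cosine similarity between two activation patterns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def matches_correctly(pred1, pred2, obs1, obs2):
    """True if the correct pairing is more similar than the swapped pairing."""
    correct = cosine(pred1, obs1) + cosine(pred2, obs2)
    swapped = cosine(pred1, obs2) + cosine(pred2, obs1)
    return correct > swapped

# Toy data: 3 voxels, 2 linguistic parameters (all values hypothetical).
weights = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.1]]
pred_a = predict(weights, [1.0, 0.0])   # held-out word A
pred_b = predict(weights, [0.0, 1.0])   # held-out word B
obs_a = [0.12, 0.01, 0.28]              # "observed" pattern for A
obs_b = [0.02, 0.19, 0.11]              # "observed" pattern for B
print(matches_correctly(pred_a, pred_b, obs_a, obs_b))
```

Repeating this over many held-out pairs and averaging the outcomes gives an accuracy figure of the kind quoted above, where chance would be .5.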

I'm not sure how to interpret this from a scientific perspective. It's a long way from knowing which brain cells are used in processing certain words to knowing how the brain actually processes those words. On the other hand, it's not clear you need that deep an understanding to build a brain-scanning fMRI gizmo that does something useful. Though we're a long way from that too. Even ignoring the fact that we don't understand the brain well enough, hanging out with your head in a noisy magnet probably isn't a lot of fun.

 

May 27, 2008

I'm in Boston today for the IETF P2P Infrastructure workshop. Anyway, we land and as usual, as soon as we land they tell us that we're allowed to use our cell phones but other electronic devices have to stay off until they open the aircraft doors. So, wait a second. I can't use an iPod, but if it's an iPhone that's totally different and I can use it? Do cell phones emit some sort of protective radiation that I'm blissfully unaware of? Is it different between CDMA and GSM? If I have a laptop and it's got a cell modem in it, is that OK? Does it have to be built in or does a USB cell modem provide the magic safetyfying effect? Would it work if I just duct taped my cell phone to my laptop? Outstanding!
 

May 25, 2008

AP reports that the FCC is considering significantly restricting cell phone service provider early termination fees:
Cell phone companies routinely charge customers $175 or more for quitting their service early. Under a proposal to the Federal Communications Commission, the wireless industry would give consumers the opportunity to cancel service without any penalty for up to 30 days after they sign a cell phone contract or until 10 days after they receive their first bill.

The proposal also would cap such fees and reduce them month by month over the course of a contract based on how long customers have left, according to people familiar with the offer speaking on condition of anonymity because the FCC has not accepted it. The plan would not abolish cancellation fees entirely.

In exchange for the government's approval, the agreement would let cell phone companies off the hook in state courts where they are being sued for billions of dollars by angry customers. If approved by the FCC, the proposal also would take away the authority of states to regulate the charges, known as early termination fees.

The nation's No. 2 wireless company, Verizon Wireless, offered the proposal to the FCC for its review after high-level meetings with senior FCC officials. It did so in consultation with other leading wireless companies, whose executives indicated they would not oppose its provisions, people familiar with the offer told the AP.

Hmm...

Cingular's current plan is:

  • Up-front $30 or so activation charge.
  • 30 days to cancel.
  • $175 cancellation fee, reducing $5 every month.

So, this new plan wouldn't be much of an improvement: even a fee that decayed all the way to 0 over the life of a 2-year contract would only drop by about $7/month (versus $5/month now), and this article doesn't say that the fees would actually decay to 0. Reading between the lines, this looks a lot more like a plan for the cell companies to preempt state action than it does like the FCC intervening to help you out.
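
For the arithmetic, here is a quick sketch. It assumes the $175 fee and $5/month reduction listed above and a 24-month contract; the helper name is made up for illustration.

```python
def remaining_fee(initial, monthly_reduction, months_elapsed):
    """Cancellation fee still owed after a number of months, never below zero."""
    return max(0, initial - monthly_reduction * months_elapsed)

# Current plan: $175 fee, reduced by $5 each month.
print(remaining_fee(175, 5, 12))   # fee halfway through a 2-year contract
print(remaining_fee(175, 5, 24))   # fee still owed at the very end

# Monthly reduction needed for a $175 fee to reach $0 at month 24.
print(round(175 / 24, 2))
```

So under the current terms there is still a residual fee at the end of the contract, and a plan that reached zero would need a reduction only a couple of dollars a month larger.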

 

May 24, 2008

For some reason I checked out Conservapedia today. Sort of an amazing artifact, if basically insane. It's like—well, it actually is—they want to create a whole alternate reality where the normal rules of intellectual discourse don't apply. Here's the (somewhat famous) article on the kangaroo:
According to the origins theory model used by young earth creation scientists, modern kangaroos are the descendants of the two founding members of the modern kangaroo baramin that were taken aboard Noah's Ark prior to the Great Flood. It has not yet been determined by baraminologists whether kangaroos form a holobaramin with the wallaby, tree-kangaroo, wallaroo, pademelon and quokka, or if all these species are in fact apobaraminic or polybaraminic.

After the Flood, these kangaroos bred from the Ark passengers migrated to Australia. There is debate whether this migration happened over land[6] with lower sea levels during the post-flood ice age, or before the supercontinent of Pangea broke apart[7] The idea that God simply generated kangaroos into existence there is considered by most creation researchers to be contra-Biblical.

Other views on kangaroo origins include the belief of some Australian Aborigines that kangaroos were sung into existence by their ancestors during the "Dreamtime" [8] and the evolutionary view that kangaroos and the other marsupials evolved from a common marsupial ancestor which lived hundreds of millions of years ago.[9] In accordance with their worldviews, a majority of biologists regard evolution as the most likely explanation for the origin of species including the kangaroo.

Uh, yeah. Incidentally, that passage contains links to Baraminology, the study of Biblical kinds. I almost expect there to be a page on the Turtles all the way down theory of cosmology. I was going to try to make a serious argument about this, but it's just laughable.

Incidentally, Mrs. G noted the weird juxtaposition of Pangea and flood theory. Unsurprisingly, there's a footnote pointing to this uh, explanation about how Pangea is compatible with flood theory. In case you're curious, it's that the rate of geologic activity was higher during the flood.

 

May 23, 2008

One of the weird tropes in the mortgage crunch is reports of people walking away from their houses when the value of their loan exceeds the value of the house. Obviously, this works better if your mortgage is non-recourse, meaning that the lender can't go after your personal assets once they foreclose. If you don't have any other real assets anyway, this doesn't make much difference, but if you're someone with substantial other assets who just made a bad investment, then this is pretty convenient (though it doesn't exactly do wonders for your credit score). As far as I can make out, in California, first mortgages are generally non-recourse, but refis are often recourse (see the FTB's discussion here). That could be a nasty surprise...
 

May 22, 2008

As you may have heard by now, Debian introduced a distribution level patch to OpenSSL that pretty much completely wiped out the PRNG, with the result that it generated predictable keys. Plenty has been written about this, but it's worth noting that this bug has been hanging around for two years and was far from hidden. On the contrary, there was an outstanding bug documenting the "problem" that resulted in the patch and it wasn't hard to find the corresponding fix in Debian SVN. So, here we have a fairly obvious (to a security expert) error in a section of code that is well known to be security critical, specifically called out in the bug database and yet it took two years for someone to notice. What does that say about how difficult it would be to insert and hide a backdoor in a piece of software?
 
The Times reports on a study (press release here) by the Center for Work Life Policy on women in science and technology fields. The study isn't available yet, but the press release and the NYT article seem to confuse a number of issues:
The 147-page report (which was sponsored by Alcoa, Johnson & Johnson, Microsoft, Pfizer and Cisco) is filled with tales of sexual harassment (63 percent of women say they experienced harassment on the job); and dismissive attitudes of male colleagues (53 percent said in order to succeed in their careers they had to "act like a man"); and a lack of mentors (51 percent of engineers say they lack one); and hours that suit men with wives at home but not working mothers (41 percent of technology workers says they need to be available "24/7").

...

The result, she said has been a work environment that dismisses women. Female employees come up against "the kind of culture that evolves when women are in the extreme minority," she said. (Think "Lord of the Flies.") The ideal worker in this realm is "the hacker who goes into his cubicle and doesn't emerge for a week, having not showered or eaten anything but pizza. Those people exist and they are seen as heroes."

So, there are five complaints here:

  1. Sexual harassment
  2. Dismissive attitudes
  3. Lack of mentors
  4. Long hours and inflexible schedules
  5. A culture that rewards lone work

When evaluating these complaints, we need to examine two axes: the pragmatic question of what would benefit companies, and the fairness issue of how companies ought to behave. In many cases, these are aligned. For instance, it's clearly unfair to subject women to sexual harassment and it's doubtful that it's somehow favorable to the company either, since at minimum it demoralizes a significant fraction of your workforce.

On the other hand, in some cases these forces may be in tension. To take one example: if engineers willing to work 80 hours a week are a lot more productive than engineers who can only afford to work 40 hours a week (I'm not saying this is so; I suspect the relationship is a lot more complicated than this), then expecting your engineers to be willing to work long hours might well benefit the company; it's a tradeoff between the additional effort you get out of your existing staff and the reduced population you're able to draw from (assuming that some people simply can't work those hours). Similarly, it could be true that lone hackers slaving away in their cubicles is the best way to produce software (it's far from clear that that's true, but I've certainly seen plenty of high quality software produced that way), in which case again it may be in the company's best interest to rely on such people, even if they're harder to find than the average programmer.

Now, obviously one could say "yes, it's true that practice X would be more efficient, but it's so difficult for a large segment of the population that it's unfair to engage in it". It's not clear to me that this argument has anywhere near the moral force that (for instance) an argument against practices that aren't beneficial to the company has, since you're asking the company to do something that's against their interest. Ignoring the question of male vs. female, if I'm the kind of worker who would like to put in my 40 hours and then go play PS/2, I'm going to be at a disadvantage compared to my co-worker who is prepared to spend 70 hours a week at work. It's not clear to me that when he gets promoted and I don't that that's inherently unfair.

Let's try turning this around and look at a job where most of the employees are women: day care workers. Now, I haven't done a scientific study, but it seems to me pretty likely that the reason that most of these workers are women is that women like working with kids a lot more. Yet, I don't think it would be reasonable to say that this was an anti-male environment and that employers should find some way to remove the aspects of the environment that make it less congenial to men (i.e., the kids). That would obviate the whole point of the job!

So, at minimum we've got some kind of spectrum of practices that are preferential to some types of employees:

  • Practices which are actually detrimental to job performance (these absolutely exist)
  • Practices which are neutral.
  • Practices which improve job performance.
  • Practices which are essential to job performance.

So, I think we can all agree that we should move away from practices that disadvantage women and that are also bad for job performance and we can probably agree that practices which are neutral should be changed as well. This leaves us with how to handle practices which are beneficial to the organization but preferential to some types of employees. [I should note at this point that it can be hard to assess which category any given practice falls into. The people in charge of the organization will generally defend any existing practice, no matter how stupid.] The general social consensus seems to be that organizations should have to make accommodations as long as the hit to their productivity isn't too large. But of course, this leaves us in the uncomfortable position where the organization which is faced with making a change which would probably reduce productivity somewhat is incentivized to claim that it would result in a huge productivity reduction, while activists for whoever is on the disadvantaged end of the practice (in this case, women) have an incentive to claim that there wouldn't be any impact on productivity, with neither side being much interested in the truth.

 

May 16, 2008

Interesting fact: there's a significant amount of evidence that sleeping on the left hand side as opposed to the right hand side significantly reduces GERD. For instance: Khoury et al. (1999):
METHODS: Ten patients, three female and seven male (mean age 47.6 yr, range 30-67 yr) with abnormal recumbent esophageal pH <4 on 24-h pH-metry participated. A standardized high fat dinner (6 PM) and a bedtime snack (10 PM) were administered to all patients. GER during spontaneous sleep positions was assessed with a single channel pH probe placed 5 cm above the lower esophageal sphincter (LES) and with a position sensor taped to the sternum. Data were recorded with a portable digital data logger (Microdigitrapper-S, Synectics Medical) and analyzed for recumbent percent time pH <4 and esophageal acid clearance time in each of four sleeping positions. Time elapsed between change in sleeping position and GER episodes was also calculated.

See also Katz et al. (1994), van Herwaarden et al. (2000). The mechanism doesn't appear to be entirely clear, however.

 

May 15, 2008

I totally agree that banning (or even significantly restricting) people with HIV from entering the US is nuts, but despite Andrew Sullivan's protestations to the contrary, it's really not unenforceable.

This law has lasted so long because no domestic constituency lobbies for its repeal. Immigrants or visitors with HIV are often too afraid to speak up. The ban itself is also largely unenforceable -- it's impossible to take blood from all those coming to America, hold them until the results come through and then deport those who test positive. Enforcement occurs primarily when immigrants volunteer their HIV status -- as I have -- or apply for permanent residence. The result is not any actual prevention of HIV coming into the United States but discrimination against otherwise legal immigrants who are HIV-positive.

Rapid HIV tests are readily available, and the OraQuick test involves only an oral swab and reads out in 20 minutes. It wouldn't be at all hard to design an immigration system that forced people to go through oral HIV testing. You'd just need somewhere to hold them for 20 minutes, which I suspect CBP could easily arrange.

More to the point, the idea that for such a ban to be enforceable requires point of entry testing strikes me as basically wrong. We have a whole array of immigration controls (not a terrorist, not a nazi, etc.) that are based on matching up information that's indexed by personal identity (i.e., your name, age, passport #, etc.) against the person standing in front of the immigration agent. But those checks to a great extent rely on accurate record keeping and enforcement by the country who issued that individual's passport. If my country of origin doesn't bother to check identity before issuing passports, I'll be able to get in even if I'm actually Osama bin Laden. The United States certainly could require that people who wish to enter the country come with a certification that they've had recent HIV testing and are HIV negative. Those who didn't could be deported or tested on entry.

Even absent such infrastructure, there are plenty of immigration requirements that don't get routinely checked but if you're found to have lied are used as the basis for prosecution or deportation. When Mrs. Guesswork got her green card, they asked her if she was a Nazi. I doubt they checked up on that, but I'm sure if it later came out that she was Ilsa, She Wolf of the SS they'd find some way to punish her.

 

May 14, 2008

The NYT has an article about the transition from CFC to HFA metered dose asthma inhalers (because of the negative effect of CFCs on the ozone layer). [See here for calculations about the total amount of CFC emissions from MDIs.] There have been some problems, including:

  • The HFA inhalers have a weaker spray than the CFC inhalers so people worry that they're not getting a full dose. This is a particular problem since the old CFC inhalers tended to produce a weaker spray when they were just about empty.
  • Ventolin HFA has a dose counter so you know whether you are getting down to the end of the inhaler.
  • This spray is particularly weak with the ProAir, which, for some reason, is preferred by a lot of health insurance plans. I've used the ProAir and can attest to having had some concern about whether it's working or not. ProAir, unfortunately, does not have a dose counter.
  • The new albuterol HFA inhalers aren't available in generic, so this increases people's costs significantly.

It's not just the albuterol inhalers which have been changed over. The inhaled corticosteroids are transitioning over as well, but because they're not used as rescue inhalers, I guess people are less sensitive about whether they're working or not.

In other pharma news, generic omeprazole (Prilosec) is now on the market, though prices don't seem to be much less than the brand name version. I wonder if eventually it will come in huge jars for $.01 a pill like ibuprofen.

 

May 13, 2008

Watched the first couple episodes of Rome last night. Looks like a generally pretty solid show, though I notice that being a Roman seems to involve having a lot of noisy, semi-public sex. Anyway, I was particularly struck by the scene where Lucius Vorenus gets home after 8 years away with the legions and his kids don't recognize him. Times sure have changed now that we don't have the Roman Legions. These days when that happens it usually means you've made United Global Services.
 
Lauren Weinstein is rightly concerned about Charter Communications' plans to "enhance" your browsing experience by injecting banner ads into your Web pages based on analysis of your browsing habits.

If this is something you're not that thrilled about (which I can easily understand), then you might get to thinking what your options are. Charter offers an opt-out but as far as I know there's nothing forcing them to do so, and their opt-out appears to be pretty inconvenient:

Yes. As our valued customer, we want you to be in complete control of your online experience. If you wish to opt out of the enhanced service we are offering, you may do so at any time by visiting www.charter.com/onlineprivacy and following our easy to use opt-out feature. To opt out, it is necessary to install a standard opt-out cookie on your computer. If you delete the opt-out cookie, or if you change computers or web browsers, you will need to opt out again.

You could just change ISPs, of course, if you're lucky enough to live in a non-monopoly area and your other choices don't offer this enhanced feature set.

As Weinstein observes, one possible defense is to do HTTPS connections to every server, but that requires cooperation from all the server operators, which has the usual network effect/collective action problems. But there's at least one obvious way to protect yourself unilaterally: set up a VPN to some provider who promises not to mess with your packets. You'd still be getting packet carriage from Charter, but they wouldn't be able to mess with your packets much, other than to drop or delay them. Certainly, they would not be able to inject their own traffic. This technique would probably introduce some latency, but the provider could locate their VPN concentrator near a major exchange point, which would reduce the latency quite a bit. The major obstacle would be finding someone to provide this service; I know there are providers which do IPv6 tunnels, but I don't know if they do v4 tunnels.

The effect of all this is to reduce your local ISP to raw packet carriage. Effectively, you're treating them like a long wire between you and your real ISP, the tunnel provider. Obviously, local ISPs could stop you from doing this, but it's hard to see on what grounds they would do so if they don't block enterprise VPNs.

 

May 12, 2008

Last Sunday's NYT has a depressing article about the sports injury rate in high school and college girls/women. Most of the article is stories about injured athletes. It's a familiar story for anyone with experience with elite athletes. There's an initial injury, followed by an incomplete recovery, followed by successive injuries, followed by even less complete recovery, etc. This is a common pattern for elite athletes, who tend to be highly motivated to train and compete (this is how you get good) and therefore bad at the kind of laying off you need to recover from injuries.
Amy said that she had "a lot of complications" with the first one. But what she described in her understated way sounded more like a nightmare than complications. She briefly became addicted to her pain pills. She lost weight and became so dehydrated she had to be hospitalized and hooked up to an IV. She received a "huge lecture" from the nurses on how to take better care of herself.

But she achieved her goal and made the under-19 team, the highlight of her too-brief career. As Amy walked toward me the first time we met, her right leg was stiff and her whole gait crooked. She moved like a much older woman. If I hadn't known her history, I would never have believed she had been an athlete, let alone an elite one. She had undergone, by her count, five operations on her right knee. Her mother counted eight, and believed that Amy did not put certain minor cuttings in the category of actual operations. She was done playing. She had been told she would need a knee replacement, maybe by the time she turned 30.

Amy told me about her final operation, recalling that when she came out of anesthesia, the surgeon seemed as if he was going to cry. He looked at her in silence for what seemed like a long time, trying to compose himself. Finally, he told her, "Amy, there was nothing in there left to fix."

The implied message of the article is that female injury rates are a lot worse than those of males, but it only cites a small number of statistics and it's clear the data is a bit fuzzy. The article focuses on ACL injuries, which are truly awful but I'm not sure they're representative. For instance, this study suggests that the overall injury rate is comparable for males and females, but that women's knee injury rates are much higher:

The boys' and girls' data were compared and statistically analyzed. The rate of injury was 0.56 among the boys and 0.49 among the girls. The risk of injury per hour of exposure was not significantly different between the two groups. In both groups, the most common injuries were sprains, and the most commonly injured area was the ankle, followed by the knee. Female athletes had a significantly higher rate of knee injuries including a 3.79 times greater risk of anterior cruciate ligament injuries. For both sexes, the risk of injury during a game was significantly higher than during practice.

See also here, which describes a similar pattern, as does this. (Caution, I'm working only on the abstracts here, since the actual articles are behind pay walls.) Bottom line: I'm not sure what to make of this. I haven't done a thorough literature review, but it's not clear to me that in general, as opposed to in the specific case of ACL and perhaps concussion, females are at significantly higher risk of sports injuries than males.

One thing I was sort of surprised not to see was any discussion of the female athlete's triad (anorexia, amenorrhea, and osteoporosis). I'd be interested to know to what extent eating disorders result in increased risk of ACL-type injuries (this isn't a topic I know a lot about).

 

May 10, 2008

Danny McPherson posts about his experience with the free WiFi in the United Red Carpet Club:
More interesting is perhaps the access model they employ. To login, all you need is the United Mileage Plus number of the primary Red Carpet Club account holder. Now, having long questioned the wisdom of a luggage tag that displays these numbers, be it a "hole-punched" Mileage Plus membership card, or a more obvious oval-shaped Red Carpet Club tag, I'm even more wary now. But if you're in bind and need your airport wireless fix, odds are you won't have to walk far to find one available for the taking. As a matter of fact, I see two from where I'm sitting right now.

I've yet to explore how difficult it would be to exhaustive search for valid numbers, or if multiple logins are permitted at a given time, or how far outside of the Red Carpet Club these numbers are valid, or... I also wonder how long it'll be until some poor schmuck is arrested for allegedly downloading child porn from an airport wireless network...

If this were a wired network this wouldn't be a security problem. After all, if you're inside the RCC, presumably you're an RCC member (unless you bought a day pass), in which case you should be entitled to use the network. But as Danny indicates, the wireless AP is probably accessible from outside the RCC, so if you sit outside the club, you should be able to get on the network, making it just a matter of having a valid Mileage Plus number, which you can get off of someone's luggage tag.

As far as exhaustive search goes, MP numbers are 11 digits long, but the first digit seems to always be zero, so this is a 10 digit space. I don't know how many RCC members there are, but Wikipedia claims that there are about 750,000 Premier and Premier Executive members, so let's say there are on the order of 200,000 RCC members, or 2*10^{-5} of the space. If the numbers are randomly distributed, you'd need to search about 50,000 numbers on average in order to find one. This could take quite some time (about 14 hours at one per second). You might be able to get some leverage because the distribution isn't random. They seem to be issued in some kind of increasing sequence, though there seem to be too many numbers for it to be strictly sequential. If there's a check digit like in credit card numbers this would make the space a lot easier to search. (If someone knows the actual algorithm, please write in.) Of course, you only need to know a few valid numbers, so this might not be a totally prohibitive attack if reading it off someone's tag weren't so easy.
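
Here's a back-of-the-envelope sketch of that search estimate. It takes the guesses above at face value (a 10-digit space, about 200,000 valid numbers, uniformly random placement); the check-digit line is purely hypothetical, since I don't know whether one exists.

```python
SPACE = 10**10      # 10 free digits (leading digit assumed to always be 0)
VALID = 200_000     # guessed number of RCC members

p = VALID / SPACE                    # chance that one random guess is valid
expected_guesses = SPACE / VALID     # mean of a geometric distribution, 1/p
hours_at_one_per_second = expected_guesses / 3600

def success_probability(guesses):
    """Chance of hitting at least one valid number within this many random guesses."""
    return 1 - (1 - p) ** guesses

# Hypothetical: a known check digit would leave only 1 in 10 candidates worth trying.
expected_with_check_digit = (SPACE // 10) / VALID

print(expected_guesses)
print(round(hours_at_one_per_second, 1))
print(round(success_probability(100_000), 2))
print(expected_with_check_digit)
```

The mean works out to 50,000 guesses; budgeting 100,000 guesses (a bit over a day at one per second) gets you a valid number with probability around 0.86.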

Three more thoughts:

  • RCC entry itself is a lot more valuable than access to the wireless, since the wireless access doesn't cost United much, but access to the club costs them food and (in the International terminal), free drinks. I assume it's not hard to forge an MP card once you know a valid number. I'm not an RCC member so usually when I'm there it's on the "international ticket" + Star Alliance Gold exception, so they check my ticket, which is hard to forge. Do they insist on seeing your ticket if you're an RCC member? If not, this is actually a new attack vector on the RCC, since it would let you extract numbers even if it weren't easy to read them off other people's luggage.
  • There's actually a fairly easy way to secure the system against remote attacks (ones that don't involve somehow gaining access to the RCC interior) that wouldn't require lining the RCC walls with copper sheeting. For the first login to the RCC network, require not just your RCC #, but also a random passcode given to people on entrance (or maybe posted on the wall). After that, you can install a cookie on their computers and just let them on without a new login. 1
  • I'm a bit curious how the system checks for RCC number validity. Does it have a local copy of the RCC database? Is it connected to United's central systems? That could be interesting.

1 See draft-rescorla-stateless-tokens for a description of some techniques for avoiding the need for a centralized cookie database.
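
As a rough illustration of the general idea (not the construction from the draft), a cookie can carry its own expiry time plus a MAC computed with a key known only to the access point, so nothing needs to be stored per user. The key, token format, and function names below are all made up for the sketch.

```python
import hashlib
import hmac

SECRET = b"club-ap-secret"  # hypothetical key held only by the access point

def issue_token(expires_at):
    """Return a cookie value of the form '<expiry>.<HMAC over expiry>'."""
    msg = str(expires_at).encode()
    tag = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{expires_at}.{tag}"

def token_is_valid(token, now):
    """Recompute the MAC and check the expiry; no database lookup is needed."""
    try:
        expires_str, tag = token.split(".", 1)
        expires_at = int(expires_str)
    except ValueError:
        return False
    expected = hmac.new(SECRET, expires_str.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag.encode(), expected.encode()) and now < expires_at

cookie = issue_token(expires_at=1_000)   # issued after the in-club passcode check
print(token_is_valid(cookie, now=999))   # still within its lifetime
print(token_is_valid(cookie, now=1_001)) # expired
```

A real deployment would presumably also bind the token to the member number and rotate the key, but that's beyond what this sketch tries to show.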

 

May 9, 2008

In preparation for the IETF P2P Infrastructure Workshop, I've revised and expanded this post into a "position paper" submission.

Introduction

In mid-2007 it was revealed [4] that Comcast was blocking peer-to-peer traffic (most famously BitTorrent) on their network by injecting RST packets to terminate TCP [7] connections. The BitTorrent community almost immediately discovered carrying BitTorrent over an encrypted tunnel (VPN or SSH) was not subject to blocking, thus completing another cycle of the ongoing arms race between peer-to-peer implementors and network operators. This paper explores some predictable next moves in the game and their consequences for the network.

This isn't intended to be comprehensive, because the request was for short papers, but I think it hits the high points. You can find the full note here.

 

May 8, 2008

Even the most diehard TeXhead has moments when he needs to read some Word document. Tonight was such a night and I have Office 2004 on my machine for just such an eventuality. (Please don't write in to tell me that I should run Pages. As I said, I don't want to run either of them, but I also don't want to deal with Pages/Word incompatibility.) Anyway, I boot up Word and the Leopard firewall asks me if I'd like to let Word listen for network connections. I go to click no and either manage to click it or raise some other window or something. The dialog disappears and when I check the firewall it sure does say to block MS Word. So, that's OK, I guess.

And then I get to thinking, "Why is Word opening up TCP listening ports anyway?" So, I run netstat -a | grep LISTEN and get:

[49] /usr/sbin/netstat -a | grep LISTEN
tcp4       0      0  *.3369                 *.*                    LISTEN
...

Hmmm. What's 3369? Google doesn't know, so that's not good. I close Word and the port goes away and lsof confirms it's Word:

[52] /usr/sbin/lsof -i TCP:3369

COMMAND  PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
Word    8198  ekr   16u  IPv4 0x6c4d66c      0t0  TCP *:3369 (LISTEN)
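
If you'd rather not eyeball the output, here is a rough sketch of pulling the listening port numbers out of `netstat -a` text. It assumes the BSD/macOS column layout shown above; `listening_tcp_ports` is a name I made up, and rows where netstat prints a service name instead of a number are simply skipped.

```python
from typing import List


def listening_tcp_ports(netstat_output: str) -> List[int]:
    """Return numeric local ports of TCP sockets in the LISTEN state."""
    ports = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) < 6 or fields[-1] != "LISTEN" or not fields[0].startswith("tcp"):
            continue
        # BSD netstat separates address and port with a dot, e.g. "*.3369"
        port = fields[3].rsplit(".", 1)[-1]
        if port.isdigit():
            ports.append(int(port))
    return ports


sample = "tcp4       0      0  *.3369                 *.*                    LISTEN"
print(listening_tcp_ports(sample))  # [3369]
```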

I shut down Word and my WiFi and restart it, but it's not listening now. Maybe I need the network on. Sure enough, I bring the WiFi back up and restart Word and now it's listening, but on a different port: 3828 this time. Stranger and stranger. Now ordinarily this would only be about a 4.0 freakout on a scale of 1 to 10, but it turns out that I only recently installed Office on this machine and was unaware of the following delightful property of MS AutoUpdate: it only installs one update at a time, no matter how many updates are pending. So, when you have 10-20 updates to install, and you're just letting update run itself, it takes forever to get uprev. The consequence of this is that I was loading random people's documents with some two-year-old (and vulnerable) version of Word. Who knows what malware I've had the joy of installing. This jacks things up to a freakout factor of about 6.2.

Next step: compare to another machine. It shows up on my other Mac, which is a little comforting, but of course that machine could be infected too. I double check with Hovav, who is about as paranoid as I am, and his copy of Office is listening, but on some other random port. That's sort of comforting. This is starting to look a lot less like malware and a lot more like a feature of Word. A little more digging tells us the process name that is actually doing the listening. It's Word (as I knew) but with some wacky argument starting with -psn_0_.... Searching on this, we find out that I'm not the only person who has had this question.

If you close UDP 2222, then no other computers will know which TCP port your copy of word has chosen to listen to (in the 3000-3999 range), because that info is broadcasted in the UDP packets. The protocol is thus: Your copy of word spews it's serial number (encoded) and the TCP port it is listening on in a packed on UDP 2222. Other copies of word on the network get this packet and then respond the your copy of word on the specified TCP port if they have the same serial. Then one copy shuts down.

I guess it was malware after all. Outstanding!
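
For the curious, the protocol as described in that quote can be sketched roughly like this. Only UDP port 2222 and the 3000-3999 TCP range come from the description; the packet layout and the use of SHA-256 as the "encoding" are stand-ins I invented, since the real wire format isn't given.

```python
import hashlib
import struct
from typing import Optional

DISCOVERY_UDP_PORT = 2222  # from the quoted description


def encode_serial(serial: str) -> bytes:
    # Stand-in for the unspecified "encoded" serial number (assumption: SHA-256).
    return hashlib.sha256(serial.encode()).digest()


def make_announcement(serial: str, tcp_port: int) -> bytes:
    # Hypothetical layout: 2-byte big-endian TCP port, then the encoded serial.
    return struct.pack("!H", tcp_port) + encode_serial(serial)


def port_to_contact(packet: bytes, my_serial: str) -> Optional[int]:
    """Return the announcer's TCP port if it carries our serial, else None."""
    if len(packet) != 2 + 32:
        return None  # not one of our (hypothetical) announcements
    (tcp_port,) = struct.unpack("!H", packet[:2])
    if packet[2:] == encode_serial(my_serial):
        return tcp_port  # same serial: connect back, after which one copy shuts down
    return None
```

A copy with a different serial just ignores the broadcast, which is why blocking UDP 2222 hides the TCP port from everybody else.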

 

May 2, 2008

George Ou made this argument at the FCC En Banc hearing at Stanford on 4/25 (A/V here).
It's actually quite common throughout the world that TCP RSTs are used.

...

Speaking of the 1:45 AM resets, ISPs all over the world, they've found that up to 12% of sessions get reset, all over the world. It's almost like there's this 12% of background noise of TCP resets that are happening that may not be coming from Comcast but could be coming from any device on the Internet, all routers, all firewalls support that feature and we don't really know where it's coming from.

Here's AT&T's response to Vuze's claim that they use RSTs for "network management purposes" (i.e., terminating connections they don't like):

In response to your specific question about AT&T's network management practices, AT&T does not use "false reset messages" to manage its network. We agree with Vuze that the use of the Vuze Plug-In to measure network traffic has numerous limitations and deficiencies, and does not demonstrate whether any particular network providers or their customers are using TCP Reset messages for network management purposes. Given that Vuze itself has recognized these problems with the measurements generated by its Plug-In, we believe that Vuze should not have published these misleading measurements, nor filed them with the FCC. Moreover, as Vuze and others have acknowledged, TCP resets are generated for many reasons wholly unrelated to the network management practices of broadband network providers, which explains why resets may appear on networks of companies such as AT&T who do not use TCP resets for network management (see, e.g., An Analysis of TCP Reset Behaviour on the Internet, University of Calgary (2004)).

I've reviewed the paper by Arlitt and Williamson to which AT&T is referring (Ou didn't cite his sources), and while it's interesting work, I don't think it really speaks to Ou's argument. The RSTs that Arlitt and Williamson are talking about are primarily ungraceful terminations of TCP connections that would be ending anyway. The authors suggest a number of cases here:

  • Servers aggressively closing connections after short idle times, but the client already has a request in flight and the server responds with an RST.
  • Clients responding to FINs from the server with an RST. The reasons for this are a bit unclear.
  • Servers closing connections with RSTs.
  • Connections to servers which aren't listening on a given port and so are rejecting it.

In all of these cases but the last, though, the Web transactions are actually over, so while there may be some negative effects from not going through the correct TCP finish handshake (cf. RFC 1337), neither side perceives this as a failed transaction. And in the final case, the server explicitly is rejecting the connection, so this seems appropriate as well. It's also fairly straightforward to distinguish these cases as a passive observer (as the authors have done) with the appropriate tools.

What Comcast has done, however, is something different: they were (are?) using RSTs to abort other people's transactions. The base rate of normal RSTs isn't really that useful for assessing the appropriateness of third party RSTs as a network traffic management technique. As a hypothetical, imagine that Comcast were forging FINs instead of RSTs. One could expand Ou's argument to say "FINs are a natural feature of the Internet", but it doesn't really follow that it's desirable to have third parties forging FINs on your connections.

It does bear, on the other hand, on what we can infer from Vuze's data. Vuze hasn't really published that many details of their methodology, but they claim to be measuring the total number of RSTs, not just those of Azureus/BitTorrent connections (Incidentally, I'm not sure how I'd feel as a user about installing some app that sniffed all the traffic on my network and sent statistics to Vuze)[-- see update below; EKR]:

The Vuze Plug-In constantly monitors the rate of network interruptions occurring from RST ("reset") packets by measuring the total number of attempted network connections and the total number of network connections that were interrupted by a reset message. By comparing these two values, one can calculate the ratio of network connections interrupted by reset messages. We have chosen to reflect the median ratio in order to reduce variability in the data given the sample size.

The Plug-In collects data for all Internet connections, not just connections occurring due to use of the Vuze application, and logs it every ten minutes. Then, at the top of the hour, the Plug-In aggregates the data into one-hour blocks and transmits it to Vuze, Inc. By definition, each source of data had the Vuze application installed and launched in order connections.
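
The arithmetic Vuze describes amounts to something like the following sketch: a per-sample ratio of reset connections to attempted connections, then the median across samples. The function names and the sample numbers are invented for illustration and are not Vuze's data or code.

```python
from statistics import median
from typing import Iterable, Tuple


def reset_ratio(attempted: int, reset: int) -> float:
    """Fraction of attempted connections that were interrupted by a reset."""
    if attempted == 0:
        return 0.0  # assumption: a sample with no connection attempts counts as zero
    return reset / attempted


def median_reset_ratio(samples: Iterable[Tuple[int, int]]) -> float:
    """Median of per-sample ratios; each sample is (attempted, reset)."""
    return median(reset_ratio(a, r) for a, r in samples)


# Made-up hourly samples: (attempted connections, connections reset)
hourly = [(200, 10), (100, 24), (400, 10)]
print(median_reset_ratio(hourly))
```

Note that nothing in this calculation distinguishes a reset sent by the other endpoint from one injected in the middle of the network.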

But if you're measuring all RSTs and not attempting to determine which ones are "normal" and which ones represent connection failure, then it's not clear how representative your data is. It is sort of interesting how much variation (about an order of magnitude from 2.5% to 24%) there is in terms of the rate of RSTs, but as Iljitsch van Beijnum observes, this could be the result of caching proxies and the like in the network. You may not particularly want your ISP interposing a proxy, but that's a different question than whether they're actually blocking your P2P traffic.

This isn't the only possible reason, either. For instance, users might just have different software profiles. Given that Vuze claims to have 8000 users on 1200 ASs (with the data being reported for ASs with greater than 20 users), there could well just be a lot of statistical variation. Some evidence of this is that the results from Comcast alone span from 14% to 24%. In order to really make sense of data like Vuze's we'd need to try to distinguish normal RSTs from those injected in the network, which requires more forensics (TTL inspection, IP ID, etc.) than Vuze's paper describes.
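
As a rough illustration of what that kind of forensics looks like, here is a sketch that flags RSTs whose TTL or IP ID doesn't fit the rest of the packets seen from the same sender. The `Packet` type, the thresholds, and the function name are assumptions of mine; the IP ID check only makes sense for senders that increment the IP ID per packet, and a real tool would need packet captures to feed it.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Packet:
    ttl: int      # IP TTL as observed at the receiver
    ip_id: int    # IP identification field
    is_rst: bool  # TCP RST flag set


def suspicious_rsts(flow: List[Packet], ttl_slack: int = 2,
                    ip_id_window: int = 1000) -> List[Packet]:
    """Flag RSTs in one direction of one flow that look unlike the sender's other packets."""
    data = [p for p in flow if not p.is_rst]
    if not data:
        return []  # nothing to compare against
    typical_ttl = median(p.ttl for p in data)
    flagged = []
    last_id = None
    for p in flow:
        if not p.is_rst:
            last_id = p.ip_id
            continue
        # A forged RST usually travels a different number of hops than real packets.
        ttl_off = abs(p.ttl - typical_ttl) > ttl_slack
        # With incrementing IP IDs, a real RST should be close to the previous packet's ID.
        id_off = last_id is not None and (p.ip_id - last_id) % 65536 > ip_id_window
        if ttl_off or id_off:
            flagged.append(p)
    return flagged
```

This is a heuristic, not proof: path changes and randomized IP IDs will produce false positives, which is why I'd treat anything it flags as a starting point for a closer look.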

UPDATE (8:49 AM): I was wrong about this needing to be a packet sniffer. I just read the source (here; thanks to Danny McPherson for pointing out that I could download it.) They're just using netstat to read the network statistics and grabbing the reset counter out of the results. On the other hand, this means that they're not even in principle able to differentiate between RSTs generated on Azureus connections and those on other connections or between those generated by some man in the middle or by endpoints. While the variation in reported RSTs remains interesting, you'd need a significantly more advanced tool than this to really diagnose what's going on.

 

May 1, 2008

In the NYT, Gina Kolata reports on a study that found that a substantial number of athletes show negative results on urine tests for testosterone, even when they're doping:
The 55 men in a drug doping study in Sweden were normal and healthy. And all agreed, for the sake of science, to be injected with testosterone and then undergo the standard urine test to screen for doping with the hormone.

The results were unambiguous: the test worked for most of the men, showing that they had taken the drug. But 17 of the men tested negative. Their urine seemed fine, with no excess testosterone even though the men clearly had taken the drug.

It was, researchers say, a striking demonstration of a genetic discovery. Those 17 men can build muscles with testosterone, they respond normally to the hormone, but they are missing both copies of a gene used to convert the testosterone into a form that dissolves in urine. The result is that they may be able to take testosterone with impunity.

...

Men with the gene deletion still metabolize testosterone, Dr. Schulze says. But, she adds, she does not know where the hormone goes. "We have no idea," she said. "That's what we're trying to find out."

If you've got this gene deletion, you've potentially got an enormous advantage in terms of being able to dope without getting caught. Even for those who don't have the gene deletion, I wonder whether there's some chemistry you could use to force testosterone metabolism down whatever alternate pathway is involved here (or alternately to disable the standard pathway), producing a masking effect for even those with normal genetic profiles.