On testing airport security effectiveness

Linos, Linos, and Colditz's BMJ paper on airport screening is getting a lot of attention. LLC write:
We systematically reviewed the literature on airport security screening tools. A systematic search of PubMed, Embase, ISI Web of Science, Lexis, Nexis, JSTOR, and Academic Search Premier (EBSCOhost) found no comprehensive studies that evaluated the effectiveness of x ray screening of passengers or hand luggage, screening with metal detectors, or screening to detect explosives. When research teams requested such information from the US Transportation Security Administration they were told that evaluating new screening programmes might be useful, but it was overshadowed by "time pressures to implement needed security measures quickly."16 In addition, we noticed that new airport screening protocols were implemented immediately after news reports of terror threats (fig 1).

It's unsurprising that there are no real studies on this topic, but it's not at all clear that, even if we wanted to do some, it would be practical or even possible. The authors suggest a controlled trial of screening effectiveness at detecting specific types of attacks:

After informing the airport managers, gaining approval from research ethics committees and police, and registering our trial with one of the acceptable International Committee of Medical Journal Editors trial registries, we would select passengers at random at the check-in desks and give each traveller a small wrapped package to put in their carry-on bags. (We would do this after they have answered the question about anyone interfering with their luggage.) A total of 600 passengers would be randomised to receive a package, containing a 200 ml bottle of a non-explosive liquid, a knife, or a bag of sand of similar weight (control package) in a 1:1:1 ratio. Investigators and passengers would be blinded to the contents of the package. Our undercover investigators would measure how long it takes to get through security queues and record how many of the tagged customers are stopped and how many get through. A passenger who is stopped and asked to open the wrapped box would be classed as a positive test result, and any unopened boxes would be considered a negative test result.
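As a rough illustration (my own sketch, not from the paper), here is what the arithmetic of such a trial looks like. With 200 passengers per arm, the detection rate in each arm comes with a fairly wide confidence interval; the detection counts below are invented purely to show the calculation:

```python
import math

def detection_ci(detected, n, z=1.96):
    """Normal-approximation 95% confidence interval for a detection rate."""
    p = detected / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical outcomes for the three 200-passenger arms
# (liquid, knife, sand control); the counts are made up for illustration.
for arm, detected in [("liquid", 120), ("knife", 150), ("sand control", 10)]:
    p, lo, hi = detection_ci(detected, 200)
    print(f"{arm}: {p:.2f} detected (95% CI {lo:.2f}-{hi:.2f})")
```

Even at this sample size, the interval on a 60% detection rate spans roughly plus or minus seven percentage points, so the trial could distinguish gross failure from decent performance but not much finer than that.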

This study design seems problematic as a measure of screening effectiveness. Security screening is fundamentally different from screening for diseases, because disease screening isn't adversarial.

To take the simplest case, consider genetic diseases. When you screen for Tay-Sachs, the Tay-Sachs gene isn't trying to figure out how to evade your screen. Even in cases like cystic fibrosis where there are genotypes which produce pathology but aren't detectable with standard screening methods (the basic CF screen only detects 80% of mutations) there's not selective pressure for the undetectable genotype, just pressure against the detectable ones. The undetectable genotypes don't increase in the population.

To take a slightly more complicated case, consider non-genetic diseases, which do evolve. HIV, for instance, regularly evolves resistance to the antiretrovirals we use to treat it. [Warning: I'm working from general principles here. If there are cases of evolved resistance to screening, I'd love to hear about them.] Screening is a different case, though, for at least two reasons. First, HIV drug resistance arises largely from selective pressure between the genotypes present in a given patient: treating that patient with antiretrovirals selects against the susceptible genotypes, so you end up with a much higher fraction of resistant genotypes within the patient. But when you're doing screening, any nontrivial fraction of detectable organisms leads to a positive result and (presumably) treatment, so you don't get as much selective pressure between the detectable and undetectable variants. Second, viruses and bacteria aren't intelligently trying to evade your screening, so even if some stealth did evolve, you would likely have plenty of time to adapt and test your screening technology.

By contrast, in the case of airline screening, you have an intelligent attacker with a very short reaction cycle, so as soon as they know what kind of screening you are using they can move to evade it. Also, you don't need each attacker to independently evolve defenses: as soon as someone figures out an evasion technique, they can tell a lot of other attackers about it. (This is also why signature-based virus detection is such a hard problem, with relatively high false negative rates.) This makes evaluating whether a given set of screening techniques works, as the authors propose, very problematic: by the time you've done your effectiveness study, it's already obsolete.

More importantly, this study design sort of confuses a technique (stopping people from bringing weapons through the security checkpoint) with the goal (stopping people from blowing up airplanes). But of course these aren't the same thing. For instance, you could jump the fence and smuggle explosives into the sterile area. So the question you really want to ask is whether airport security decreases the chance of planes being bombed. To answer it, you need a different study design: one which compares various security regimes in terms of the number of terrorist attacks that occur under them. This is a much harder study to do, for a number of reasons.

First, you have the "outrun the bear" problem. Say that you have both good and bad security, and terrorists preferentially attack airports with bad security. This doesn't necessarily tell you that if everyone adopted good security you would see fewer attacks. The terrorists might just be lazy enough to choose the softer targets but would mount attacks anyway; this is a variant of the adaptiveness problem. We just don't understand the supply model that well.

Second, ignoring this problem, it's not clear we have enough data to do a meaningful study, because the number of terrorist attacks is so low. Remember that there have been no successful US airline hijackings or bombings since September 11th 2001, so if you'd run a study of this type starting in 2002, you would not be able to reject the null hypothesis that good airline security (assuming, as seems likely, that there's existing variation in screening quality) was useless. We just don't know whether the reason we haven't had any attacks in over five years is because of good security or because people aren't trying, and you'd need a lot more data to get a significant result.
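The zero-events problem can be made concrete with a minimal Poisson sketch (the attack rates below are made up for illustration, not estimates of anything):

```python
import math

def p_zero_events(rate_per_year, years):
    """Poisson probability of observing zero attacks over the study window."""
    return math.exp(-rate_per_year * years)

# Hypothetical attack rates (attacks/year) under different explanations
# for five attack-free years; all numbers are invented for illustration.
for label, rate in [("no one is trying", 0.0),
                    ("attempts exist but screening deters them", 0.1),
                    ("modest undeterred attack rate", 0.2)]:
    print(f"{label}: P(zero attacks in 5 yrs) = {p_zero_events(rate, 5):.2f}")
```

Even a nontrivial underlying attack rate of 0.2 per year still gives about a 37% chance of five attack-free years, so observing zero attacks is consistent with all three explanations and can't discriminate between them.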

Given these issues, it's pretty hard to imagine what kind of study would settle the question. That's not to say that I think the current flavor of airport security is useful, but it does mean that the absence of studies showing it works isn't that meaningful a criticism.


It seems like you could construct studies of airport security effectiveness using red team exercises: assign various possible security regimes among the different airports and then have lots of red teams try to breach their security.

I can foresee some challenges, such as achieving an appropriate level of realism without actually blowing up a plane or taking hostages and potentially opening up airports with weak assigned regimes to real attacks. But I'm pretty confident the resulting data would be much better than what we've got now.

However, I probably wouldn't want these studies published.

Kevin, your approach (or something like it) of course happens all the time. Often parts of the studies get leaked, and you see headlines about how 75% of bombs got through security, etc. I'm sure the TSA is running red team exercises on a regular basis, just not publishing the results. I bet that if, instead of just calling the TSA and asking nicely for the results, the paper authors had filed a proper FOIA request, they might have got something -- even if only a "those studies are classified, but here are their dates and titles" response.

Yeah, I know, Craig. The last line of my post was mild sarcasm.


I don't think that your proposed test is as useful as one might like, because it depends on the red teams being accurate models of the actual attackers. But part of the problem is that we don't really have such a model.

I'm not so sure that we don't have a halfway decent model. We know they're human. We know their goal is to terrorize us by killing people in impressive ways. We know they're trained in weapons and explosives. We actually have many of their training materials.

I would think special forces service members with experience fighting alongside local forces in Afghanistan and Iraq who have been given terrorist training materials would make pretty good proxies.

Real-world diseases adapt to screening all the time. For example, herpes has become considerably less symptomatic over the last few decades; as a result, a lot more people have it, though most of them don't know they have it and never will.


Do you have evidence that this is a reaction to screening, as opposed to selection favoring asymptomatic strains because asymptomatic carriers are less careful? They're not quite the same thing.

Seems a mistake to train people that it might be ok to accept packages from strangers* at the airport.

*Strangers, strangers with badges, strangers with uniforms, etc.
