Wait, how does the terrorist score system work?

DHS has been secretly developing/using an Automated Targeting System. Basically, it's a data mining system that assigns risk scores to travelers. Let's let AP tell it:
The scores are assigned to people entering and leaving the United States after computers assess their travel records, including where they are from, how they paid for tickets, their motor vehicle records, past one-way travel, seating preference and what kind of meal they ordered.

...

Government officials could not say whether ATS has apprehended any terrorists. Customs and Border Protection spokesman Bill Anthony said agents refuse entry to about 45 foreign criminals every day based on all the information they have. He could not say how many were spotted by ATS.

...

The government notice says some or all of the ATS data about an individual may be shared with state, local and foreign governments for use in hiring decisions and in granting licenses, security clearances, contracts or other benefits. In some cases, the data may be shared with courts, Congress and even private contractors.

...

In a privacy impact assessment posted on its website this week, Homeland Security said ATS is aimed at discovering high-risk individuals who "may not have been previously associated with a law enforcement action or otherwise be noted as a person of concern to law enforcement."

Ahern said ATS does this by applying rules derived from the government's knowledge of terrorists and criminals to the passenger's travel patterns and records.

For security reasons, Ahern declined to disclose any of the rules, but a Homeland Security document on data-mining gave an innocuous example of a risk assessment rule: "If an individual sponsors more than one fiancee for immigration at the same time, there is likelihood of immigration fraud."

Here's the list of the "predictors" they're using:

  • Passenger manifests
  • Immigration control records
  • Information from personal searches
  • Property seizure records
  • Vehicle seizure records
  • Aircraft arrival records
  • Visa data
  • FBI National Criminal Information Center data
  • Treasury enforcement actions involving people and businesses
  • Dates of flight reservation and travel
  • Passenger name
  • Passenger seat information
  • Passenger address
  • Form of travel payment
  • Billing address
  • E-mail address
  • Telephone numbers
  • Travel itinerary
  • Miles flown as a frequent flyer
  • Travel agency used
  • Travel agent who made arrangements
  • Passenger travel status
  • History of one-way travel
  • History of not showing up for flights
  • Number of bags
  • Special services, such as need for wheelchair or special meals for dietary or religious reasons
  • Voluntary/involuntary upgrades
  • Historical changes to the passenger's record

A few observations:

  • Deciding whether any given variable is a good predictor for any other variable is a hard statistics/econometrics problem. It's not incredibly difficult if the effect is big and the variables are reasonably independent, but generally that's not the case. Doing a study like this with 25+ predictors is a major undertaking—whole papers are written on the topic of a single predictor. Think of all the work that's gone into the far easier question of whether moderate levels of drinking improve health.
  • The more variables you have, the larger the data set you need in order to perform the fit. It's very hard to believe that we have anywhere near enough data points (known terrorists plus their travel histories) to do a proper study. We of course have much more data on "criminals", but even then, answering questions like this with any level of certainty is really hard.
  • At least some of these predictors seem to have a high degree of multicollinearity. In particular, frequent travelers tend to have a lot of frequent flier miles, voluntary upgrades, and low baggage check rates. Multicollinearity like this makes individual regression coefficients unstable and hard to interpret. Of course, it's certainly possible that whoever is doing the analysis pruned their predictor set, but it suggests that there's some gap between what we're seeing and what's really being done (or the analysis isn't being done properly). It's hard to distinguish these without more data.
  • Looking at a lot of the predictors here, it's pretty hard to believe most of them provide a useful signal. I know people with all kinds of travel histories from no travel to United 1K, and as far as I know none of them are terrorists. Personally, I've been everything from UA "General Member" to Premier Executive and haven't blown up any planes.
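To see why the multicollinearity point matters, here's a toy simulation (plain numpy, every number invented for illustration): when "miles flown" and "voluntary upgrades" are essentially the same underlying variable, a regression can fit the outcome just fine while the individual coefficients land almost anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# "Miles flown" and "voluntary upgrades" as nearly-redundant predictors:
# upgrades are basically a noisy copy of miles.
miles = rng.normal(size=n)
upgrades = miles + rng.normal(scale=0.05, size=n)

# The outcome depends only on the shared "frequent traveler" signal.
y = miles + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), miles, upgrades])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The two predictors are almost perfectly correlated...
r = np.corrcoef(miles, upgrades)[0, 1]
print(f"corr(miles, upgrades) = {r:.3f}")
# ...so the individual coefficients are nearly arbitrary; only their
# sum (~1, the true combined effect) is pinned down by the data.
print(f"coef(miles) = {coef[1]:.2f}, coef(upgrades) = {coef[2]:.2f}, "
      f"sum = {coef[1] + coef[2]:.2f}")
```

Rerunning with a different seed moves the two coefficients around substantially while leaving their sum nearly fixed, which is exactly the interpretability problem.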

Unsurprisingly, DHS hasn't disclosed any real information about their methodology, but based on the above, I'm skeptical that they've done any real statistical modelling of the data (whether by regression analysis or neural networks or whatever). More likely, it's just some set of rules of thumb that DHS has come up with, maybe augmented by some ad hoc stats. While you don't necessarily need a model where everything is significant at p<.05, you definitely need effects that give you a very large odds ratio. Otherwise you end up with totally unacceptable false positive rates.
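To put some (entirely invented) numbers on the false positive problem: with a one-in-a-million base rate, even a screening rule with 99% sensitivity and a 1% false positive rate flags almost exclusively innocent travelers.

```python
# Base-rate arithmetic for a hypothetical screening rule.
# All numbers are invented for illustration.
base_rate = 1e-6        # fraction of travelers who are actual terrorists
sensitivity = 0.99      # P(flagged | terrorist)
false_pos_rate = 0.01   # P(flagged | innocent)

# Bayes: P(terrorist | flagged)
p_flag = sensitivity * base_rate + false_pos_rate * (1 - base_rate)
p_terrorist_given_flag = sensitivity * base_rate / p_flag
print(f"P(terrorist | flagged) = {p_terrorist_given_flag:.6f}")

# Scaled to 100 million border crossings:
travelers = 100_000_000
true_hits = travelers * base_rate * sensitivity
false_alarms = travelers * (1 - base_rate) * false_pos_rate
print(f"{true_hits:.0f} true hits vs. {false_alarms:,.0f} false alarms")
```

A flagged traveler here has about a 1-in-10,000 chance of being a real positive, which is why you need a huge odds ratio before a system like this produces anything but noise.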

I can certainly understand that DHS wouldn't want to reveal their methodology. After all, in theory terrorists or criminals could change their behavior to evade detection. Of course they could perfectly well release whether they've actually found any terrorists using these techniques. On the other hand, if it turns out they don't work I could understand why they wouldn't want to release that either.

7 Comments

The multicollinearity can hurt or help, if the samples you're looking for are ones where people deviate from the norm. It could, for example, be the case that while the people you're looking for are frequent travelers, they tend to not sign up for frequent flier programs, and so *don't* have lots of miles, where you might otherwise expect them to.

Also, I tend to believe that they try to tell the press there's more scienciness behind these systems than there is in practice. I suspect strongly that in most cases, these systems are mainly large, well-indexed data collection systems which don't generate predictions so much as provide a system whereby the feds can go get a list of all people who bought tickets at some given travel agency for travel between Morocco and France in the last 2 years, after the French police raid the travel agency and discover nefariousness. That sort of thing.

Deciding whether any given variable is a good predictor for any other variable is a hard statistics/econometrics problem. It's not incredibly difficult if the effect is big and the variables are reasonably independent, but generally that's not the case. Doing a study like this with 25+ predictors is a major undertaking—whole papers are written on the topic of a single predictor. Think of all the work that's gone into the far easier question of whether moderate levels of drinking improve health.

This is exactly what anti-spam software has to do. In fact, more variables probably helps with prediction rather than hinders, which is why doctors actually use a batch of indicators to predict long term health.

The advantage that anti-spam systems have is much more data to work with.

Sure, more variables helps with prediction iff you have enough data to actually perform a fit. But if you don't have enough data it tends to lead to overfitting.
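That overfitting failure mode is easy to demonstrate. In this toy sketch (ordinary least squares on predictors that carry no signal at all), roughly 25 predictors against 30 observations produce an impressive in-sample fit and a useless out-of-sample one:

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_pred = 30, 1000, 25  # ~25 predictors, very few data points

# The predictors carry NO real signal: y is independent noise.
X_train = rng.normal(size=(n_train, n_pred))
y_train = rng.normal(size=n_train)
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# In-sample, the fit looks great...
resid = y_train - X_train @ coef
r2_train = 1 - resid.var() / y_train.var()

# ...but out of sample the "model" is worse than guessing the mean.
X_test = rng.normal(size=(n_test, n_pred))
y_test = rng.normal(size=n_test)
resid_test = y_test - X_test @ coef
r2_test = 1 - resid_test.var() / y_test.var()

print(f"train R^2 = {r2_train:.2f}, test R^2 = {r2_test:.2f}")
```

With nearly as many free parameters as observations, the regression "explains" most of the training noise; spam filters get away with many features only because they train on millions of labeled examples.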

I know people with all kinds of travel histories from no travel to United 1K, and as far as I know none of them are terrorists.

One interview I heard suggested that the prediction model is not so much to catch terrorists as to weed out people who are very unlikely to be terrorists. My interpretation is that the false positive rate is quite high but the false negative rate is very low, so the people who are identified as low risk can be sped through the checks.

Of course, whenever I hear someone answer the question of whether the system has been effective by "we can't tell you, that's classified," I'm very skeptical.

Actually, anti-spam is great example of why this type of thing is so hard to do. I still get a lot of false positives and false negatives with my various spam filters. And this is when you have a corpus of millions of identified positives and negatives with which to train/build your prediction system.

Seeing as how we have maybe a few dozen positives in the terrorist case and one false negative is what we're trying to avoid, the result of any prediction system will be an enormous number of false positives.

What's the big deal?

This score is strictly analogous to the FICO score, which is obviously accurate. Were it not, we'd have people borrowing too much and getting in trouble, and creditworthy folks denied access to cash. Neither of those happens with any appreciable frequency.

Right? Oh, waitaminute...

This might make sense (in technical terms, perhaps not public policy terms) for detecting likely criminals. It's probably hopeless for terrorists because the number of terrorists is so low. That means both that we can't narrow down the model very much (we're fitting a 1000-degree polynomial to 40 data points), and that even a very low false positive rate is probably unacceptable because in practice, it will give us thousands of false positives per correct detection.

I mean, if you're trying to detect drug dealers or car thieves, there are *lots* of those guys, so you can build up a good model. And among people you think are plausible drug-dealing suspects, maybe 1% really are drug dealers, so a false-positive rate of 1% means you needlessly hassle one person per criminal you catch. (Hopefully, this doesn't mean a no-knock midnight raid where you shoot a couple people by mistake.)
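The arithmetic in that comparison can be checked directly (sensitivity and false positive rate invented for illustration): the same detector that hassles about one innocent per drug dealer caught hassles thousands of innocents per terrorist caught, purely because of the base rate.

```python
# Innocent people flagged per real offender caught, at two base rates.
# The detector itself (sensitivity, false positive rate) is held fixed;
# all numbers are illustrative.
def hassle_ratio(base_rate, sensitivity=1.0, fpr=0.01):
    """False positives per true positive for a given prevalence."""
    return (1 - base_rate) * fpr / (base_rate * sensitivity)

print(f"drug dealers (1% base rate):  {hassle_ratio(0.01):.2f} innocents per catch")
print(f"terrorists (1-in-a-million): {hassle_ratio(1e-6):,.0f} innocents per catch")
```

Same rule, same error rates; only the prevalence changes, and the cost per detection goes from "one hassled traveler" to "a small city".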
