How to ask sensitive questions on a survey

Let's say that you want to do a survey where you ask people questions whose answers they might not want to reveal, e.g. "Have you ever taken illegal drugs?"

Here's the problem. Let's assume that we simply ask the question, and that a fraction π of the population has the attribute you're interested in measuring (e.g., they've smoked pot or whatever). Unfortunately, only a fraction λ of those people are willing to admit it. So, when you do your survey, a fraction πλ of respondents answers "Yes". Call this value F. This doesn't help you much: you now know that at least a fraction F of people have the attribute, but you have no way of bounding it from above. All you know is that the true value of π is somewhere between F (if λ=1) and 1 (if λ=F). Obviously, this technique works well if you have a good estimate of λ, but poorly if you don't.
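To make the identifiability problem concrete, here is a minimal Python sketch. The function names are hypothetical, chosen for illustration; the only assumption is that the survey gives you the observed "Yes" rate F = πλ and nothing else. Without λ, every π in [F, 1] fits the same data.

```python
def direct_question_bounds(yes_rate: float) -> tuple[float, float]:
    """Range of π consistent with an observed direct-question "Yes" rate F.

    F = π·λ with 0 < λ ≤ 1, so π = F/λ ≥ F; π is a fraction, so π ≤ 1.
    """
    if not 0 <= yes_rate <= 1:
        raise ValueError("yes_rate must be a fraction in [0, 1]")
    return yes_rate, 1.0


def pi_if_admit_rate_known(yes_rate: float, admit_rate: float) -> float:
    """Recover π = F/λ, which is only possible when λ is known."""
    if not 0 < admit_rate <= 1:
        raise ValueError("admit_rate must be in (0, 1]")
    pi = yes_rate / admit_rate
    if pi > 1:
        raise ValueError("inconsistent inputs: λ must be at least F")
    return pi


F = 0.2
print(direct_question_bounds(F))       # (0.2, 1.0)
print(pi_if_admit_rate_known(F, 1.0))  # 0.2: everyone with the attribute admits it
print(pi_if_admit_rate_known(F, 0.5))  # 0.4: only half of them admit it
```

Both calls to `pi_if_admit_rate_known` explain the same F, which is exactly why the direct question alone can't pin down π.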

The standard methodology for removing this kind of error is called a Randomized Response survey, which uses secret randomness to mask the response. The basic method looks like this:

  1. Interviewer asks question. E.g. "Have you ever smoked marijuana?". We'll assume that "Yes" is the sensitive answer here.
  2. Subject flips a coin.
  3. If the coin comes up heads, the subject answers "Yes".
  4. If the coin comes up tails, the subject answers the question truthfully.
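The four steps above can be sketched in Python as a single respondent's decision rule. This is a minimal simulation, not a survey instrument; the function name and the use of a seeded `random.Random` as the private "coin" are assumptions made for illustration.

```python
import random


def randomized_answer(is_smoker: bool, rng: random.Random) -> str:
    """Answer one respondent gives under the coin-flip protocol.

    The interviewer only ever sees the returned string, never the coin.
    """
    heads = rng.random() < 0.5           # step 2: private fair coin flip
    if heads:
        return "Yes"                     # step 3: heads -> always "Yes"
    return "Yes" if is_smoker else "No"  # step 4: tails -> truthful answer


rng = random.Random(2005)  # fixed seed only so the demo is repeatable
print([randomized_answer(True, rng) for _ in range(5)])   # smokers always say "Yes"
print([randomized_answer(False, rng) for _ in range(5)])  # non-smokers say "No" only on tails
```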

The results can be summarized in the following contingency table:

| Subject \ Coin flip result | Heads (.5) | Tails (.5) |
|---|---|---|
| Smoker (π) | Yes | Yes |
| Non-smoker (1-π) | Yes | No |

In this survey, the only people who will answer "No" are those who both flipped tails and haven't smoked marijuana. Because the coin flip is independent of smoking status, the fraction of No answerers will be approximately (1-π)/2. This makes it very easy to estimate π: we simply take the No response rate, N, and compute 1-2N, which gives us our estimate of π.
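The estimator is a one-line inversion of N = (1-π)/2. Here is a minimal Python sketch of it; the function name and the 45/55 tally in the usage line are hypothetical, not data from any real survey.

```python
def estimate_pi(answers: list[str]) -> float:
    """Estimate π from coin-flip randomized-response answers.

    Only non-smokers who flipped tails answer "No", so the expected "No"
    rate is N = (1 - π)/2; solving for π gives π = 1 - 2N.
    """
    if not answers:
        raise ValueError("need at least one answer")
    no_rate = answers.count("No") / len(answers)
    return 1 - 2 * no_rate


# Hypothetical tally: 45 "No" and 55 "Yes" out of 100 respondents.
answers = ["No"] * 45 + ["Yes"] * 55
print(round(estimate_pi(answers), 3))  # 0.1
```

Because N is a sample proportion, the estimate is noisy, and with a small sample it can even land slightly outside [0, 1]; the sketch deliberately returns the raw 1 - 2N without clamping. The randomization also costs precision: the variance of 1 - 2N is 4N(1-N)/n, compared with π(1-π)/n for an honestly answered direct question.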

Now, obviously the above assumes that people answer truthfully, which they may not do. We need to ask whether it's reasonable to expect them to. Without randomization, the reason that people don't answer truthfully is that it reveals information about them. I.e., if you say "Yes" then the interviewer knows you're a smoker. With a randomized response design, the interviewer still gets some information: if you say "No" you definitely are not a smoker, but if you say "Yes" you might or might not be.

Remember that the Yes response rate Y is given by 1-(1-π)/2 = .5 + π/2. Of that fraction Y, π will have actually smoked marijuana and .5-π/2 will have not, but will have answered "Yes" because of the coin flip. (π/2 will have smoked marijuana but also flipped heads.) Now, assume that the researcher has done the study and made his estimate of π. This means that his a priori estimate is that an arbitrary person he meets (whom he hasn't asked the question) has a π chance of having smoked marijuana. Now, if you answer "Yes" to the randomized question above, he can adjust his estimate: you now have a π/(.5 + π/2) chance of being a smoker.
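The same update is just Bayes' rule applied to the contingency table: a smoker always answers "Yes", a non-smoker answers "Yes" only on heads, and the prior probability of being a smoker is π.

```latex
P(\text{smoker} \mid \text{Yes})
  = \frac{P(\text{Yes} \mid \text{smoker})\, P(\text{smoker})}{P(\text{Yes})}
  = \frac{1 \cdot \pi}{1 \cdot \pi + \tfrac{1}{2}(1 - \pi)}
  = \frac{\pi}{\tfrac{1}{2} + \tfrac{\pi}{2}}
  = \frac{2\pi}{1 + \pi}
```

The denominator is the Yes rate Y = .5 + π/2 from the paragraph above.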

How much does this improve his information? It depends on the value of π. If π is relatively small (e.g. .1), then .5 + π/2 is approximately .5, and so the new estimate becomes roughly 2π: the survey question has caused the interviewer to nearly double his estimate of your chance of being a smoker. On the other hand, if π is fairly high (e.g. .5), then .5 + π/2 starts to approach 1 and the interviewer gets less information about you in particular. Since the ratio of the new estimate to the old one is 1/(.5 + π/2) = 2/(1 + π), in no case does this technique let the interviewer more than double his estimate of the probability that you are a smoker [1], so the amount of individual information leakage is fairly small.
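A short Python sketch makes the dependence on π visible. The helper names are hypothetical; the formulas are the posterior π/(.5 + π/2) from the text and its ratio to the prior π.

```python
def posterior_given_yes(pi: float) -> float:
    """P(smoker | "Yes") for the fair-coin protocol: π / (0.5 + π/2)."""
    return pi / (0.5 + pi / 2)


def lift(pi: float) -> float:
    """How many times larger the posterior is than the prior π (equals 2/(1 + π))."""
    if not 0 < pi <= 1:
        raise ValueError("pi must be in (0, 1]")
    return posterior_given_yes(pi) / pi


for pi in (0.01, 0.1, 0.5, 0.9):
    print(f"π={pi:<4}  posterior={posterior_given_yes(pi):.3f}  lift={lift(pi):.3f}")
# π=0.01  posterior=0.020  lift=1.980
# π=0.1   posterior=0.182  lift=1.818
# π=0.5   posterior=0.667  lift=1.333
# π=0.9   posterior=0.947  lift=1.053
```

The lift approaches 2 only as π goes to 0, and shrinks toward 1 as the attribute becomes common.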

Of course, this demonstration that not much information is transmitted, while using only simple probability theory, is still somewhat involved, so it's not entirely clear that interviewees actually answer truthfully when this technique is used (see here for one analysis). Nevertheless, this general kind of survey design is very widely used to elicit answers to embarrassing questions.

1. The limit at a factor of 2 is a result of the 50/50 nature of the coin flip. If we used a die roll so that people answered truthfully (say) 2/3 of the time and said "Yes" otherwise, the interviewer's advantage would be larger: the bound becomes a factor of 3.
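The footnote's claim generalizes. Suppose, as an assumption for this sketch, that respondents say "Yes" with probability q and answer truthfully otherwise (q = 1/2 is the coin; a hypothetical die rule "Yes on 1-2, truthful on 3-6" gives q = 1/3). Then a "Yes" multiplies the prior by 1/(q + (1-q)π), which is at most 1/q. The function names below are illustrative, and exact fractions are used only to keep the arithmetic readable.

```python
from fractions import Fraction


def yes_rate(pi, q):
    """P("Yes") when respondents say "Yes" with probability q, else answer truthfully."""
    return q + (1 - q) * pi


def estimate_pi_general(no_rate, q):
    """Invert N = (1 - q)(1 - π); with q = 1/2 this is the 1 - 2N rule."""
    return 1 - no_rate / (1 - q)


def max_lift(q):
    """Upper bound on P(smoker | "Yes") / π, approached as π -> 0: 1/q."""
    return 1 / q


coin = Fraction(1, 2)  # fair coin: forced "Yes" half the time
die = Fraction(1, 3)   # hypothetical die rule: forced "Yes" one third of the time
print(max_lift(coin), max_lift(die))             # 2 3
print(estimate_pi_general(Fraction(9, 20), coin))  # 1/10
```

Lowering q leaks more about each individual but, in exchange, gives a less noisy estimate of π; at q = 0 the scheme collapses back to the direct question.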

26 TrackBacks

Listed below are links to blogs that reference this entry: How to ask sensitive questions on a survey.

TrackBack URL for this entry: http://www.educatedguesswork.org/cgi-bin/mt/mt-tb.cgi/96

Recently, I stumbled across a logical economics space where a decision had to be made and no rational information was available. It wasn't exactly that there was no information, but that there was too much noise, and the working hypothesis... Read More


6 Comments

I don't get it--why on earth would you want to reduce the amount of information the subject is giving you? Isn't the goal to make the subject more comfortable about giving you as much information as possible?

Now, this trick may do that, in cases where the subject understands probability theory well enough to figure out that he or she is giving out relatively little information with his or her answer, and can therefore follow the rules safely. But in that case, wouldn't it be better still to enact this same procedure with a secretly biased coin that almost always comes out to "tails"--unbeknownst to the subject--so as to maximize the total information gathered? And if the subject is bad at probability theory, mightn't he or she conclude just about anything--including that this scheme is all a big trick to wheedle information out of him or her, and therefore that it'd be best to lie, or refuse to answer, or run screaming from the study? And mightn't there be much more effective ways of persuading subjects to answer truthfully, without losing any information at all?

Or has this method actually been tested against alternatives and found to be (to my surprise, if it's true) the most comforting for subjects?

Well, the meta-goal is to produce the most accurate estimate of the underlying population statistic. If the subjects lie when asked directly, then you're not getting good information. So, you may get more accurate estimates if the subject thinks they're leaking less individual information--and since you don't actually care about this particular subject's status... Why do you find this concept difficult?

As for the question of whether this works, it's been extensively studied and the results are a bit mixed. In fact, the original post contained a link to a meta-analysis of a variety of studies on this topic that also included references to many of those studies, so you might wish to start there.

The part I have trouble with is the jump from "if the subject thinks they're leaking less individual information" to "if the subject is leaking less individual information". It appears to me that the designers of the protocol made this jump, assuming that if they designed a protocol which reduced the amount of information leaked by each subject, then the subjects would also think they were leaking less information, and therefore be more honest.

I would have thought it more effective to design protocols that make subjects feel more comfortable about being honest--either by giving the impression of less information leakage, or by reassuring the subject that revealing the information is okay--while preserving as much actual information revelation as possible. To me, the protocol you describe sounds like the worst of both worlds--excellent actual information reduction properties, using a probabilistic technique that might well not reassure anyone lacking a sophisticated understanding of probability theory that it's safe to be honest.

Well, whether this works or not doesn't really depend on your opinion--it's an empirical question. Maybe it works, maybe it doesn't, but this kind of armchair analysis doesn't strike me as a very good way to determine the answer.

Yes, of course, it's a purely empirical question. That's why I put the final question in my first comment--to make it clear that while my intuition told me that this would be a lousy method, I was perfectly willing to accept experimental results that contradicted my intuition. When you came back with the answer that the empirical results were in fact inconclusive, though, I started to suspect that I might be on to something.

I should note that the page you linked to, and one of the references it cites, mention several variations on the "randomized response" technique which might be statistically identical or even inferior to the one you described, but still psychologically preferable. (That is, they may yield less information assuming completely honest respondents, but nevertheless work better because they're better at encouraging honesty in respondents.) For example, if you have the responder answer according to a second coin flip, rather than "yes", in the case of "heads", then very little extra information is lost, but the responder no longer has the temptation to rule out an embarrassing revelation entirely simply by answering "no".

Years ago, I read a bit about a large study of family violence (Straus' and Gelles' National Family Violence Survey--the one which achieved some notoriety by discovering that women were only slightly less violent than men towards their spouses). Their survey did not use randomized response, but they did use various methods to encourage accurate responses, including gradually ratcheting up the embarrassment level of the questions, treating the embarrassing questions as natural followups to the less embarrassing ones, and carefully phrasing the questions so that the supposedly embarrassing answers would seem as normal as the unembarrassing answers. It seems quite plausible to me (although, again, this is purely an empirical question) that such techniques may be much more effective than randomized response at persuading people to answer such questions truthfully.

"I should note that the page you linked to, and one of the references it cites, mention several variations on the "randomized response" technique which might be statistically identical or even inferior to the one you described, but still psychologically preferable."

You seem to be under the misimpression that my post was intended to provide a complete tutorial on randomized response theory, rather than providing an introduction to a technique I thought was interesting. The basic insight that randomness might help is what's important.

The rest of your comment is just you rehashing your intuition, which, as I previously noted, isn't really that dispositive.
