What the heck is up with software facts labels?

Chris Walsh points favorably to Paul Black's Software Facts labels. The analogy here (explicitly made by Black) is to the "Nutrition Facts" labels found on food. But compare Black's sample label to a real nutrition label:

Software Facts

Name InvadingAlienOS
Version 1996.7.04
Expected number of users 15

Modules 5 483   Modules from libraries 4 102

     % Vulnerability

Cross Site Scripting 22 65%
     Reflected 12 55%
     Stored 10 55%

SQL Injection 2 10%

Buffer overflow 5 95%

Total Security Mechanisms 284 100%
     Authentication 15 5%
     Access control 3 1%
     Input validation 230 81%
     Encryption 3 1%
          AES 256 bits, Triple DES

Report security flaws to: ciwnmcyi@mothership.milkyway

Total Code 3.1415×10^9 function points 100%
     C 1.1×10^9 function points 35%
     Ratfor 2.0415×10^9 function points 65%

Test Material 2.718×10^6 bytes 100%
     Data 2.69×10^6 bytes 99%
     Executables 27.18×10^3 bytes 1%

Documentation 12 058 pages 100%
     Tutorial 3 971 pages 33%
     Reference 6 233 pages 52%
     Design & Specification 1 854 pages 15%

Libraries: Sun Java 1.5 runtime, Sun J2EE 1.2.2,
Jakarta log4j 1.5, Jakarta Commons 2.1,
Jakarta Struts 2.0, Harold XOM 1.1rc4, Hunter JDOMv1

Compiled with gcc (GCC) 3.3.1

Stripped of all symbols and relocation information.

The most obvious thing about the Nutrition Facts label is that it's designed to be interpreted in an almost completely context-free fashion. I don't need to know what a carbohydrate is or whether cholesterol is good or bad for me because the label tells me how much I'm supposed to eat and that this stuff, whatever it is, has about 10% of my total daily cholesterol intake. In fact, you can construct a mostly balanced diet (though certainly not a maximally healthy one) by just putting together a basket of foods that gets you to around 100% of each nutrient. The one big missing piece of information here is a list of all the vitamins you should be getting (because that list would show mostly zeros here). But it's easy to get that list and then you can still use the adding up procedure. And you still don't need to know why you need Vitamin B1--just make sure you get some.

Of course, people in general do have some opinions about what their nutrient intake should be (e.g., low-carb or high-carb diets, reduced sodium levels), and the label also provides the minimal information that lets you adjust your diet in line with such macro-nutritional goals. But even then, very little context is required to understand the label. That is, if you've been told to eat no more than XX mg of sodium, it's a simple matter of addition to work out what you should be eating.
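The adding-up procedure is simple enough to sketch in a few lines of Python. The foods and percent-daily-value figures below are invented for illustration and are not taken from any real label; the point is only that the reader needs nothing beyond addition.

```python
# Hypothetical percent-daily-value figures, invented for illustration only;
# they are not taken from any real Nutrition Facts label.
BASKET = {
    "cereal": {"fat": 5, "sodium": 10, "carbohydrate": 25},
    "soup": {"fat": 15, "sodium": 40, "carbohydrate": 10},
    "sandwich": {"fat": 30, "sodium": 35, "carbohydrate": 20},
}


def total_daily_value(basket):
    """Sum the percent daily value of each nutrient across all foods."""
    totals = {}
    for nutrients in basket.values():
        for name, percent in nutrients.items():
            totals[name] = totals.get(name, 0) + percent
    return totals


totals = total_daily_value(BASKET)
for name, percent in sorted(totals.items()):
    print(f"{name}: {percent}% of daily value")
```

Nothing in this sketch needs to know what sodium is or why it matters, which is exactly the property the Software Facts label lacks.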

Now let's take a look at Black's label. The first block (Name, etc.) is just identifying so we can pretty much ignore that. The next block reads:

Modules 5 483   Modules from libraries 4 102

     % Vulnerability

Cross Site Scripting 22 65%
     Reflected 12 55%
     Stored 10 55%

SQL Injection 2 10%

Buffer overflow 5 95%

Total Security Mechanisms 284 100%
     Authentication 15 5%
     Access control 3 1%
     Input validation 230 81%
     Encryption 3 1%
          AES 256 bits, Triple DES

Report security flaws to: ciwnmcyi@mothership.milkyway

The first line is obviously supposed to be an analogy to "Calories/Calories from fat", but you can see immediately that there's something wrong here. First, there's no reference anywhere in the label to tell you how many modules there should be or how many should be from libraries. This isn't a simple matter of missing instructions because there's no reasonable consensus on the answer to either of these questions. Indeed, there's no reasonable consensus on how to even count modules in a particular program. (Are SSLv2 and SSLv3 different modules in OpenSSL? How about SSLv3 and TLS? Is OpenSSL one big module?) So, this first line is basically meaningless.

The second chunk here, % Vulnerability, is simply baffling. I think the numbers after the attacks (e.g., Cross Site Scripting 22) are meant to be counts and then we read that we're 65% vulnerable to Cross-Site Scripting and 95% vulnerable to buffer overflow. What the heck does this mean? In the Nutrition Label case, these percentages mean something very specific: the fraction of the RDA of this particular nutrient that this product contains. That doesn't seem to be what they mean here, unless the point is that the recommended number of buffer overflows is a little over 5. So, I have no idea what these numbers mean, and I doubt Black does either.

Even if we ignore the percentages, the raw counts are totally meaningless. There are all sorts of vulnerabilities which the vendor doesn't know about, so how are they supposed to report them? On the other hand, if we're just going to report the ones the vendor knows about, that's not really that useful, because those are presumably the ones they're fixing and what we're really concerned with is the ones that will be discovered tomorrow.

Next we turn to the Total Security Mechanisms block. Again, this leaves us with the problem of defining a security mechanism: is SSL a single mechanism? Or is each algorithm its own mechanism? Each cipher suite? How about the PRF? Is the Finished message one or two? Each X.509 extension? The mind boggles. The percentages here are equally baffling. Should they add up to 100? They don't. And even if they did, how do we do the math? Does 256-bit AES count for twice as much as 128-bit AES?
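The arithmetic is easy to check. The short Python sketch below uses only the counts and percentages printed in the sample label; the variable names are mine. It sums the rows and also compares each printed percentage with that row's count divided by the 284 total, which is one guess at how the percentages were derived and not something the label states.

```python
# Counts and printed percentages copied from the "Total Security Mechanisms"
# block of the sample label.
TOTAL = 284
MECHANISMS = {
    "Authentication": (15, 5),
    "Access control": (3, 1),
    "Input validation": (230, 81),
    "Encryption": (3, 1),
}

count_sum = sum(count for count, _ in MECHANISMS.values())
percent_sum = sum(percent for _, percent in MECHANISMS.values())

print(f"sum of counts: {count_sum} (label says {TOTAL})")
print(f"sum of printed percentages: {percent_sum}%")

# Assumption: each percentage is the row's count over the stated total.
for name, (count, printed) in MECHANISMS.items():
    computed = round(100 * count / TOTAL)
    print(f"{name}: printed {printed}%, count/total gives {computed}%")
```

The rows account for 251 of the 284 mechanisms and 88 of the 100 percent, so even under the most charitable reading the label leaves 33 mechanisms unexplained.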

Moving on, we come to:

Total Code 3.1415×10^9 function points 100%
     C 1.1×10^9 function points 35%
     Ratfor 2.0415×10^9 function points 65%

Test Material 2.718×10^6 bytes 100%
     Data 2.69×10^6 bytes 99%
     Executables 27.18×10^3 bytes 1%

Documentation 12 058 pages 100%
     Tutorial 3 971 pages 33%
     Reference 6 233 pages 52%
     Design & Specification 1 854 pages 15%

Once again, we get a bunch of descriptive information without any normative context. Is it good that this software has C in it? How about Ratfor? I've got my opinions, but this isn't really something that your average user can be expected to assess for themselves. The problem becomes even worse when we get to Test Material and Documentation. First, it's almost impossible to know what appropriate values are here. Second, it would be trivial to game them even if we did have recommendations. Test material's too small? Here's a big file full of zeros. Documentation's too long? Shrink the font. You may be able to standardize this stuff, but I doubt that you will be able to do it in any way that's not easy to game.
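To make the zero-file trick concrete, here is a minimal Python sketch. The file names, contents, and the `reported_test_material_size` helper are all hypothetical; it simply assumes the label's Test Material figure is a total byte count, as the sample label suggests.

```python
def reported_test_material_size(test_files):
    """Return the total byte count, the quantity the label appears to report."""
    return sum(len(data) for data in test_files.values())


# Hypothetical test material: names and contents are invented for illustration.
honest = {"login_test.dat": b"user=alice&pw=secret\n" * 100}

# Same material plus a file of two million zero bytes that tests nothing.
padded = dict(honest)
padded["padding.bin"] = bytes(2_000_000)

print(reported_test_material_size(honest))
print(reported_test_material_size(padded))
```

The second number is roughly a thousand times the first, yet the software is tested exactly as well as before.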

Finally, we have a block containing some "ingredients":

Libraries: Sun Java 1.5 runtime, Sun J2EE 1.2.2,
Jakarta log4j 1.5, Jakarta Commons 2.1,
Jakarta Struts 2.0, Harold XOM 1.1rc4, Hunter JDOMv1

Compiled with gcc (GCC) 3.3.1

Stripped of all symbols and relocation information.

I'm trying to figure out why anyone would need to know this stuff at the level of label reading. I doubt that one person in 100,000 cares what compiler some piece of software was built with. And to the extent people do care, they surely want to know stuff like the compilation flags and the header files it was compiled against. Perfect stuff for some nerd-oriented appendix, but hardly of much use to the average user deciding whether to buy the software. Note, again, the big difference between the nutrition facts label, which is totally usable to a layman, and this, which is practically impenetrable, even to an expert.

Obviously, this is a strawman that's intended to be evocative and the particular information set being described here could change, so why am I focusing on the details like this? Because I don't think that it can be made much better given the current state of knowledge. The computer security community is almost completely unable to offer any objective, easy-to-understand tools for assessing the prospective security of a software product. And when you ask people to do so, you get the kind of data dump of mostly irrelevant descriptive information that this kind of effort represents. Would it be great to be able to succinctly tell users what kind of security they could expect? Sure. But it's not something we're even close to ready for and we won't be until we understand the problem domain much better than we do now.


4 Comments

The only thing in this post with which I even slightly disagree is the final sentence. I think our ability to "measure security" is constrained more by a lack of data than by a lack of domain-specific understanding.


I hasten to add that I may well be wrong.


I pointed favorably to Black's label not because I think it is by any means ideal, but because I view the very fact that it is being discussed as a positive sign.


As I put it in the post you react to:


"Hopefully, continuing research and greater data availability will allow us to have a more compact and tractable for non-geeks version of this [referring to the label] instead of a shrink-wrap license".

If this label idea goes anywhere (and I think of the label more as a metaphor than as a prototype), we can look forward to political wrangling akin to that associated with the USDA's food pyramid. Oh, boy!

Well, if by data you mean empirical measurement-type data, then sure. The problem is that we don't have any good idea what the marginal increase in risk associated with an additional vulnerability is. It's not even clear that this is a well-formed question.

As for whether it's a positive sign: what I'm starting to see is a lot of attempts to construct completely ad-hoc metrics with no real evidence that those metrics correspond to anything useful--probably because actually validating those metrics is hard work. I'm not sure I see that as a positive sign.

Tried trackback, don't think it worked: Some comments - http://spiresecurity.typepad.com/spire_security_viewpoint/2005/10/software_securi.html.

Btw, I think the software facts label was a Jeff Williams of Aspect Security idea that Paul Black has spurred on.

Couldn't agree more, EKR. The only thing I'd like to add is that bad documentation can be worse than no documentation. In this case, giving users information that they won't understand is likely to increase confusion and misunderstanding, not lessen it.
