Fail-what?

In my previous post about SWORDS robots, I referred to "fail-safe" and "fail-unsafe" strategies. Now, clearly, if you're a civilian in the line of fire of a killer robot, you'd consider a strategy in which the robot shuts itself down when it can't communicate with base to be "safe". You might feel a little differently if you were a soldier who had to go out into enemy fire because a minor communication glitch caused your robot to shut down.

As another example, take a system like Wireless Access in Vehicular Environments (WAVE), which provides for communications between vehicles and between vehicles and road-side units. WAVE can be used for safety messages, such as the Curve Speed Warning message, which allows a station at the side of the road to broadcast the maximum safe speed for a given curve. Obviously, you'd like there to be some message integrity here to prevent an attacker from broadcasting a fake speed. Now, what happens when the integrity check fails? Do you ignore the message?

A decent argument could be made that either ignoring or trusting such messages was "fail-safe". Obviously, ignoring them appears safe in the sense that your vehicle reverts to what it was without the WAVE functionality, so you haven't been damaged. On the other hand, the curve speed warning is designed to help safety (that's why it's being broadcast) so ignoring it is arguably failing unsafe! I don't really have a position on what's right or wrong here, but it should be clear that the terminology is confusing.
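To make the two options concrete, here is a minimal Python sketch of a receiver that has to pick one. Everything in it is hypothetical: the `OnIntegrityFailure` policy, the `advised_speed` function, and the payload format (the speed as ASCII digits) are made up for illustration, and an HMAC with a shared key stands in for whatever integrity mechanism a real WAVE deployment would use.

```python
import hashlib
import hmac
from enum import Enum
from typing import Optional


class OnIntegrityFailure(Enum):
    """Hypothetical policy: what to do when the integrity check fails."""
    IGNORE = "ignore"  # drop the message; the car behaves as if it had no WAVE unit
    TRUST = "trust"    # act on the warning anyway


def verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Stand-in integrity check: HMAC-SHA256 over the payload."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


def advised_speed(key: bytes, payload: bytes, tag: bytes,
                  policy: OnIntegrityFailure) -> Optional[int]:
    """Return the speed to show the driver, or None if the message is dropped.

    Assumes the payload is the maximum safe speed as ASCII digits.
    """
    if verify(key, payload, tag) or policy is OnIntegrityFailure.TRUST:
        return int(payload.decode("ascii"))
    return None
```

Both branches can be defended as the "safe" one: `IGNORE` protects you from a forged speed, while `TRUST` protects you from losing a genuine warning to a corrupted tag. The code only makes the choice explicit; it doesn't settle which name either branch deserves.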

I've heard people substitute the terms "fail-open" or "fail-closed", but those are even worse. If you're an electrical engineer, a closed circuit means current flows and an open circuit means current doesn't. On the other hand, an open firewall means that data flows but a closed one means it doesn't.

I don't know of any really good terms, unfortunately.

5 Comments

Fail-safe vs fail-unsafe are two terms that depend very heavily on the definition of safe. As you point out, the soldier has a very different mission than the civilian in the robot's path. Take the case of the STS (Space Shuttle). From the astronauts' point of view (and the accountants'), fail-safe means mechanisms that protect the orbiter and crew. From the RSO's (Range Safety Officer's) point of view, any deviation that puts civilians at risk means STS destruction. Hence, these terms are useless without the context and contracts around their use.

How about fail-stop and fail-go?

I came in here to suggest "fail-active" and "fail-inactive", but I see that Kevin had pretty much the same idea.

Great minds think alike ;-) My original thought was in fact active/inactive, then I got worried that some discipline might have co-opted those terms to have an esoteric and undesirable meaning. I figured no discipline would bother with go/stop. Plus they are shorter for us lazy speakers and writers.

I'm ok with fail-stop and fail-go. However, they convey something different than fail-safe and fail-unsafe. In a way, they are orthogonal, and ultimately you cannot select which is safe and which is unsafe without considering the actor / observer relationships to the system.

I don't think I've ever thought about the duality of this problem before: there is a failure mode in terms of the system's nominal operation, and a failure classification from a safety point of view, the latter being quite subjective / relative.

Interesting stuff.
