In Search of Evaluation

August 6, 2005 by Ping

In a recent e-mail message, Ian Grigg wrote that security professionals often seek perfection whereas users typically deal in fuzzy probabilities and moderate risks.

I run into this conflict in perspective all the time, since I frequently alternate between talking with security folks and usability folks, and am constantly amazed at what is obvious to one group and not the other (or is “obvious” to both groups, yet conflicting).

There are good reasons behind both philosophies.  Understanding their origins can help us address the valid concerns raised by both.

The fundamental thing that makes security different from other computer-related disciplines is the presence of the active adversary.  In particular, attackers adapt to security measures, always looking for new attacks.  What this means for the practitioner is that the security of any flawed program tends to zero over time.[1]  The tiniest flaw becomes a vulnerability as large as any other at the moment that exploit code is posted online.  Given this assumption, the only way to evaluate the security of a system is to attack it.  Staying ahead of the game is a matter of attacking our own systems more vigorously than the attackers do, then patching every flaw we find.

[1] Actually, this is only true if the flaw is in the TCB (trusted computing base).  Sadly, most software today (e.g., Windows) flagrantly violates the principle of least privilege so, in effect, everything is the TCB, and security has in fact tended to zero over time.  There are a few cases where programmers have been careful to build systems out of defensively correct components (e.g., privilege-separated SSH), and this has helped to prevent security from tending to zero over time.

The fundamental thing that makes usability different from other computer-related disciplines is the complexity of human behaviour.  What this means for the practitioner is that predictions and models of how people will use human-computer interfaces are frequently wrong.  Consequently, the only way to evaluate the usability of a system is to test it with real users in a controlled experiment.  Staying ahead of the game is a matter of designing, testing, re-designing, and iterating as much as possible until a particular design demonstrates success in practice with the real target audience in its real context.

When you compare the two perspectives like this, it becomes clear that each group’s methodology fails to address the other group’s primary concern.  How can we evaluate the security of human-computer systems when the users, the software, and the attackers are all moving targets?  The best I’ve been able to come up with so far is this:

Hold yearly competitive user studies in which teams compete to design secure user interfaces and develop attacks against other teams’ user interfaces.  Evaluate the submissions by testing on a large group of users.

This would at least try to address the adaptive-adversary and human-complexity issues while providing for yearly iteration of designs.  It’s a tall order, though.  Do you have any ideas for ways to make it more practical, or for other hybrid evaluation methodologies?

Do we want secure user interfaces, or simple ones?

I see much of the problem with phishing as being the banks sending dancing rabbits, which distract from the funky URL.

If you send a text note saying “Please visit us at your bookmarked URL,” you gain much security by eliminating noise.

This is, of course, over-simplified, but I believe it is worth considering.

 

This thread raises important questions. But I think there is a more fundamental problem right now. The security folks don’t know much about doing user testing and they are just not doing very much of it.

As an example, look at the article “Trusted Paths for Browsers” in the May issue of TISSEC (http://doi.acm.org/10.1145/1065545.1065546). From a security research perspective, this is really nice work and worthy of publication in such a highly regarded journal as TISSEC. But if the authors had submitted it to ToCHI, it would surely have been rejected. In the user study they trained the users to use their proposed interface to identify spoofed web sites, then they presented users with a number of web sites and asked them to identify which ones were spoofed. They did this under a few different conditions and learned something about which conditions lead to greater success. But even under the most successful condition, they didn’t get better than 89% accuracy. I would expect that in more typical use, in which users are browsing as part of their everyday activity and not thinking about their training, they are going to do worse than that. How much worse? We don’t know; the study hasn’t been done.

I think we really need to see a study with this system used as part of normal browsing activity (very difficult to study in a controlled way) or a study similar to the Johnny studies (originally Whitten and Tygar, more recently Garfinkel and Miller) to test this sort of system. The Johnny studies are laboratory studies, but they put users in a role-playing situation in which they seem to be motivated to defend themselves from attacks.

My point is not to pick on this particular TISSEC paper; it is just the most recent example I’ve read of a paper with this particular problem. My point is rather that security researchers are publishing papers with very weak user studies (if they include any user studies at all), and I think the first step is to get them to start thinking about how to do better user studies, perhaps by working in partnership with HCI folks.

Lorrie

“This thread raises important questions. But I think there is a more fundamental problem right now. The security folks don’t know much about doing user testing and they are just not doing very much of it.”

What you say is true, but it seems to me that’s only half the issue. Yes, the security folks are mostly ignoring what matters to usability folks. But usability folks also need to acknowledge that a standard usability testing regime fails to fully address the concerns of security folks as well, because of the adaptive nature of attacks. I hypothesize that part of the reason security folks don’t do standard usability testing is that it doesn’t seem to provide the answers they need, and if we could come up with evaluation methods that had more of the attack-and-defend flavour of computer security research, security folks might be more interested in them.

 
 
