When User Studies Attack

July 8, 2005 by Ping

In one survey, 71% of office workers stopped in the London Underground seemed willing to give up their password in exchange for a chocolate bar, but we don’t know whether the passwords they gave were real.  MailFrontier ran an online phishing IQ test, but it isn’t externally valid because it gives users the wrong primary task: they are hunting for phish instead of going about their ordinary e-mail.

Rob Miller highlighted three challenges that we face:

  1. In the real world, security is a secondary task for most users, so we can’t make it a primary task in a controlled user study.
  2. In the real world, users are protecting real personal data, but security may seem less important in an artificial scenario.
  3. In the real world, users’ rights are genuinely violated, but in a study we can’t ethically violate our subjects’ rights.

I’ll add one more, which was suggested to me by Doug Tygar:

  4. In a typical controlled study, experimenters devise a fixed attack.  But in the real world, attackers adapt their attacks in response to security measures.

How can we conduct studies that test how users behave under attack, while achieving a proper balance of ethical considerations and validity of the results?

Usability folks evaluate their work by running controlled user studies, because they recognize that the only way to know human behaviour is to observe real humans.  Security folks evaluate their work by trying to crack it, because they recognize that attackers adapt to the available vulnerabilities.  So, how about running a competitive user study where grey-hat teams compete to attack subjects?  Could such a study be conducted in an ethical way?

In the Indiana University study, students received messages from indiana.edu addresses asking them to visit a non-Indiana-University website, which asked for their Indiana University password. Remarkably, the experimenters were able to get human subjects approval for their study even though it attacked users without obtaining their consent.

There were four criteria for obtaining the waiver of consent:
1. The research involves no more than minimal risk to the subjects.
2. The waiver or alteration will not adversely affect the rights and welfare of the subjects.
3. The research could not practicably be carried out without the waiver or alteration.
4. Whenever appropriate, the subjects will be provided with additional pertinent information after participation.


Simson Garfinkel asks:

How do we determine the proper dosage of intervention?

We don’t have to identify studies to subjects as security studies.  Should we run a comparative test of consent forms to see whether the form affects the results?

Does it introduce bias to offer users increased payment to motivate them to avoid attacks?

Can we look at attacks as a learning experience?


I’ve been looking at security as a ‘good’ in economic terms. So far, I’ve come up with a bunch of characteristics. My main question is how the market for security works, and in this sense I’ve zeroed in on how a user tests a security ‘good’, which bears some parallels to your question of usability tests.

My ‘goods’ show these characteristics so far:

* a test of the product by a simulated threat is:
  - expensive and/or destructive, and/or
  - unreliable: its results cannot be trusted to predict defence against a real threat;
* the attacker is an active actor in the process, who:
  - profits by success,
  - bypasses defences and avoids statistical boxing, and
  - can be relied upon to be dishonest;
* a test of the product by a real threat is:
  - difficult to arrange,
  - too infrequent to support statistical analysis,
  - potentially destructive, and
  - unreliable: its results cannot be trusted to predict defence against any other real threat;
* an actual event is:
  - destructive and costly,
  - designed to minimise response, neutralising aggressive defence tactics, and
  - versatile: it seeks to migrate;
* the result is a negative-sum game (a symbolic sketch follows this list), in that:
  - as long as the attacker’s profit exceeds their costs, the attack is worth mounting,
  - the costs to the victim will generally exceed the profits to the attacker, the difference being damage, and
  - there may or may not be an insurance market to compensate the victim.
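
To put that last point in symbols (my notation, not the commenter’s): let P_a be the attacker’s profit, C_a the attacker’s cost, and C_v the victim’s cost.

```latex
% Illustrative notation, not from the comment above:
%   P_a = attacker's profit,  C_a = attacker's cost,  C_v = victim's cost.
% The attack is worth mounting whenever the attacker nets a gain:
\[ P_a - C_a > 0 \]
% The victim's loss generally exceeds the attacker's profit, and the
% difference C_v - P_a is the damage, so the joint payoff is negative:
\[ (P_a - C_a) - C_v < 0 \quad \text{whenever } C_v > P_a . \]
```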

Quite a challenge!


Just throwing out some brainstorming ideas here. This idea is mostly derived from a dinner conversation I had with Ping, Tyler Close, and Sue Butler.

Suppose you design your study like this:

1. Have two comparable tools/techniques/whatever that you wish to compare.
2. Instrument both of them to collect the results that you want to measure, (optionally) anonymize them, and submit the results back to you.
3. Solicit test subjects who can be either
a. fully informed — there are two tools, this is one of them, this is what the tool does, it is going to report results back to us, etc.
b. partially informed — this is a tool, this is what it does, it is going to report results back to us, etc.
4. Profit! Err, I mean, publish!

As I recall from my conversation with Ping et al., the hard part of this strategy is automating the measurement and anonymization of the results.
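
Just to make that concrete: here is a minimal sketch of the anonymization step, assuming hypothetical field names and a salted-hash pseudonym scheme that I am inventing for illustration (none of it comes from the dinner conversation):

```python
import hashlib
import json
import secrets

# Hypothetical sketch: a random salt generated once per study deployment
# (here, once per run) so hashed identifiers can't be reversed with a
# dictionary of likely usernames.
SALT = secrets.token_bytes(16)

def anonymize(record: dict) -> dict:
    """Replace directly identifying fields with salted-hash pseudonyms,
    keeping only the measurements the experimenters need."""
    out = {}
    for key, value in record.items():
        if key in ("username", "email", "ip_address"):  # hypothetical field names
            digest = hashlib.sha256(SALT + str(value).encode()).hexdigest()
            out[key] = digest[:16]  # stable pseudonym, not the raw identifier
        else:
            out[key] = value
    return out

# An illustrative instrumented measurement, not a real schema:
event = {"username": "alice", "tool": "A", "clicked_suspect_link": False}
print(json.dumps(anonymize(event)))
```

The salted hash gives each subject a stable pseudonym within one deployment, so results from the two tools can still be paired per subject without the raw identifier ever leaving the subject’s machine.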

You know what I hate about web log discussions? I never know whether anybody is going to read the “comments” section of this old (16 days old) post. The “comments” in a web log are the worst of both worlds: well-meaning interlocutors will probably stop reading them after a day or two, but spam bots, profilers, and enemies can google them for all eternity.

As a small mitigating measure, participants in discussions on this blog receive e-mail notifications of new comments. But that only includes people who have previously posted at least one comment.

Measure for Measure wrote:

“How can we conduct studies that test how users behave under attack, while achieving a proper balance of ethical considerations and validity of the results?”

One approach would be to try a *natural experiment*.

There are a number of organizations that do business, even conduct transactions, using third-party websites (e.g. Capital One, the Art Institute of Chicago). It wouldn’t be too difficult to compare their existing business model to a phishing attack. Collecting data on the transactions that these operations successfully carry out might provide a decent measure of consumer (non-)resistance to transactions with dubious security.

Another natural experiment would involve websites that process credit cards without Verisign or the like. Yes, a few still exist.