This brings us to the close of SOUPS 2009.
It’s sad. I know. But never fear! SOUPS 2010 beckons!
So, here’s hoping I’ll see all you usable security folks at Microsoft’s campus in Seattle around this time next year!
Moderator: Luke Kowalski, Corporate UI Architect, Oracle
Stuart Schechter, Microsoft Research
David Recordon, Open Platforms Tech Lead, SixApart
David Sward, Director of User Centered Design, Symantec
Nancy Frishberg, User Experience Strategist and BayChi chair
Rashmi Sinha, CEO, SlideShare
The opening premise is that Open Source is a neutral factor in the usability of security software.
One of the first myths of open source security is that transparency of the code renders the result more secure and usable. Conversely, this level of exposure leaves it open to exploitation by unethical hackers.
There was some research showing that OpenBSD does indeed accumulate fewer vulnerabilities over time; over a period of eight years the rate was shown to have been cut in half. It does improve. Maybe that wouldn’t happen with closed source, but Windows Vista vs. XP may exhibit roughly the same properties.
OAuth is like your valet key for the web. Google, Yahoo!, etc., had designed their own distributed auth protocols and it was proving annoying for developers, so the community came together and developed a common protocol. Then back in February, when Twitter started using OAuth for SSO, it was noticed that there was a security vulnerability in OAuth when used in this manner. The OAuth community came together with a proposed fix and involved the stakeholders in solving the problem.
Building on what David said, it is important that open source not only use open code, but open standards, because otherwise understanding and fixing it can be very difficult despite the openness.
There has recently been a shift from how vulnerable the code is to how vulnerable the user is. Social engineering is emerging as a greater danger than concrete code deficiencies. As such, whether the code itself is opened or closed is less important when it comes to the effective security.
One thing to consider is the scale of the project and the visibility. Merely being open doesn’t say anything of the expertise of who is looking at the code. Not all eyes are equal. Take Debian, which had an error in its OpenSSL implementation for years - how many people knew about and were exploiting it without reporting it?
Ka-Ping Yee had done an experiment where he planted a bug in 550 lines of code and then set experts loose on trying to find it, but they had trouble finding it. Given that, what are the odds of finding an exploit injected into Firefox? Ka-Ping’s response: Static analysis alone, human review alone, and automated testing alone, are all easy to defeat; however, defeating all three in concert is actually quite difficult.
One could make the statement that open source software gets fixed frequently. But is something going to be more usable because it can be changed more often, or does having such a flexible UI render it more or less confusing to the average user? Is this better or worse than packaged software that at least ensures a static UI between releases?
It’s important to remember that grabbing a nightly build of Thunderbird is not an average user. These projects actually have release schedules, official releases, and the like.
The Open Source community finds bugs in all sorts of manners, including those listed above, but many successful projects have dedicated developers. But how many UX designers are out there working on Open Source projects? It can be very difficult for a UX designer to get involved because the available infrastructure is generally aimed at code developers. Often these projects start with the code, but what happens to end user analysis? Who does these steps?
How do corporations participate in Open Source projects? Sun Microsystems contributed many developers to the Gnome project, providing over 450 icons in different resolutions and sizes. This seems like something unlikely to be done by a volunteer effort. Google open sources many projects, runs Google’s Summer of Code, released Android, and encourages employees to routinely contribute to countless Open Source projects. On the UI front there is also Season of Usability. Facebook has engaged the OpenID community and shared lessons from Facebook Connect. Even Microsoft has Open Source w/ Microsoft. Most of the contributions to Open Office come from Sun and IBM. Oracle will be contributing Btrfs to the Linux community. Part of the success of Open Source has been driven by the involvement of traditional organizations.
Were there any papers at SOUPS from the open source community? It doesn’t seem so. It would be great if the open source community was performing and reporting on the basic research.
One big differentiator is the support for bug reporting. Simson Garfinkel made the point that Apple is very responsive when he submits bugs, and tends to follow up and respond. The results on Open Source projects are generally pathetic. They respond okay to security bugs, but seem to entirely ignore usability bugs - UI tickets often get closed after two years of being ignored.
On the flipside, it is not uncommon for users to report having gotten Open Source bug reports addressed within hours, whereas the commercial software world is often comparatively slow to respond and get new versions out. Developers in the Open Source world are forced to serve many roles because the projects generally have much less infrastructure.
But there are a lot of Open Source projects that DO have a public relations team, a legal team, technical writers, and other functions more commonly associated with commercial software. Is one of the fundamental problems that the Open Source support tools are themselves unpleasant to use? Somebody suggested SuggestionBox.
Part of the problem with Open Source is rampant brand fragmentation. There are oodles of products with cute little names, oodles of non-profit organizations, lots of different metaphors and interaction styles and assumptions. Generally speaking, the companies tend to do a better job of managing their brand and the user experience.
But aren’t there lots of companies too? Why do we have Microsoft and Apple and Google and Yahoo!? When you have a big company and you have standards and have to follow those standards it can often stifle innovation. Maybe the milieu in the Open Source community is actually a strength! It is very easy for an Open Source extension to try new ideas, including those that break existing modes and assumptions. Even if only 1% of the ideas are great ideas, those ideas can quickly propagate and be adopted.
Very few Open Source companies are driven by a business model. Most are volunteer efforts, so it is unfair to apply the same expectations to Open Source projects. If you use Open Source and the UI is kinda clunky and you don’t like it…well…you’re not paying for it, so are you really setting the proper expectations?
Yes, there are some Open Source projects that care about usability and some that do not, but this true for larger closed-source companies as well. But another angle to think of this may well be large vs. small - when an Open Source project is catering to a relatively small community it doesn’t necessarily make sense to devote oodles of effort to being generically usable. For instance, a project aimed at highly technical users probably shouldn’t be wasting time trying to be usable by novices.
People seem to be more willing and interested in contributing their time to try and improve the product. When somebody comes across a bug on Google’s site they don’t necessarily go out of their way to write up a list of grievances and send them to Google, whereas they might for an Open Source project because the underlying models of Open Source invites that, whereas the underlying model of a large company may not be perceived in that way.
Part of the value of Open Source projects lies in their ability to quickly adapt and try out new protocols. Take OpenID as an example - an Open Source project can easily integrate it and even strip it out again if need be, whereas a large company may have a lot of bureaucracy and debate around adopting the new technology. Is this good or bad for general usability? Is flux in the UI overly confusing or a fertile avenue of innovation?
In many ways the Open Source community is a meritocracy. Even when experts in the field try to contribute to projects, sometimes they are ignored because they aren’t well known in the community. How do we bridge these gaps?
There is sense in both the Open Source and Closed Source communities that security bugs are very important and need to be addressed promptly. Usability bugs, however, are routinely ignored on either front, even when they cause poor security in practice. Why? How do we fix this? What should be done when an entire community makes such a fundamental assumption?
There is a huge opportunity for universities to encourage students to participate in Open Source development. It is good for the community, good for its users, and good for the students’ resumes.
The lines between conventional development models and Open Source are indeed blending. Conventional corporations are contributing more, while Open Source projects are gaining real-world adoption and commercial acceptance.
The collaboration between independent developers and companies often results in the most robust communities and projects. Ultimately the software ecosystem needs both and benefits greatly from the collaboration between the two.
Open Source vs. Closed Source is roughly a wash when it comes to usability and security. Both have pros and cons, and neither is a clear win.
It’s very possible that we’re asking the wrong question. The real concern is how this will drive commoditization of the industry.
The Open Source community has been found to be a ready and willing partner in conducting usability and security research. There should be more outreach to that community along these lines.
A lot of value seems to come from the capacity of the Open Source community to question the usability and security of the Closed Source community, rapidly innovate upon perceived shortcomings, and leverage their adaptability to demonstrate the best path forward for all to follow.
Diana Smetters and Nathan Good
Access control is a specification of policy indicating who can do what to whom. Access control is hard to use. People often get around it by granting overly permissive capabilities. Looking at Windows XP, there are over a dozen checkboxes that can be flipped for each file! Yet people like access controls - it has been shown that people like feeling they have control over their sharing.
The study examined how users actually make use of access controls. It focused on group memberships in Administrator-managed systems (Windows domain groups and Unix groups) and User-managed systems (DocuShare and e-mail mailing lists), and ACLs in DocuShare at a “medium-sized industrial research lab,” including ~300 users ranging across researchers, administrative staff, and interns. The systems have been in use for over a decade. The data was collected through active user accounts (IRB wouldn’t approve having the Administrator collect it across the entire system), and anonymized prior to analysis.
It was found that when users are able to create and manage their own groups, they belong to a lot more of them. 90% of DocuShare groups and 55% of mailing lists were closed. Only 13.4% of users owned groups, with group age ranging from four months to eleven years. User groups were often duplicated and sometimes had completely misleading names. The Administrator-created groups tended to be more organized, with more intuitive names. 5.2% of DocuShare objects had their ACLs explicitly modified, with it being more likely to see permissions explicitly set on folders than files. It was more common for updates to change who had access rather than changing what level of access they had. Though users created relatively few ACLs, they were surprisingly complicated when they were created.
Kurt Kluever and Richard Zanibbi
CAPTCHAs are used for a variety of purposes, but most generally to combat spammers. A desirable CAPTCHA should be automatically generated, should not rely on secret databases or algorithms, should be usable, and should be hard to spoof. Most existing CAPTCHAs fail in one or more of these respects, usually usability.
This study proposes using video CAPTCHAs, in which videos are played and a human user is expected to propose appropriate tags for the video. The algorithm to create the CAPTCHA selects a random video from YouTube and uses text and metadata from Related Videos to generate an appropriate set of tags. Any tags that are too common, such as “funny” or “music,” are stripped out. The user’s responses are graded by first normalizing them to lowercase, removing punctuation and stop words, and adding stem words so that “dogs” will match “dog,” then employing Levenshtein distance to allow for minor misspellings.
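The grading pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper’s implementation: the stop-word list, the crude plural stemmer, and the edit-distance threshold are all assumptions made for the example.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and"}  # illustrative subset

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def normalize(tag):
    # lowercase, strip punctuation, drop stop words
    tag = re.sub(r"[^\w\s]", "", tag.lower())
    return " ".join(w for w in tag.split() if w not in STOP_WORDS)

def stem(word):
    # crude plural stemmer so "dogs" matches "dog" (a stand-in
    # for whatever stemming the real system uses)
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def tag_matches(user_tag, valid_tags, max_dist=1):
    # accept a response within a small edit distance of any valid tag,
    # after normalization and stemming
    t = stem(normalize(user_tag))
    return any(levenshtein(t, stem(normalize(v))) <= max_dist
               for v in valid_tags)
```

With this sketch, `tag_matches("Dogs!", ["dog"])` succeeds via normalization and stemming, and a one-character typo like “doog” still passes thanks to the Levenshtein tolerance.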
There were two studies run online, one with 233 participants and one with 300, though only 143 and 184, respectively, completed the survey. The average completion time was 20 seconds in the first study and 17 seconds in the second study. Users reported finding the text CAPTCHAs faster, but the video CAPTCHAs more enjoyable. They simulated an attacker using tag frequency data from the database to guess tags for the video. For one twiddling of the settings humans were successful 77% of the time while the computer was only successful 2% of the time.
Richard Chow, Ian Oberst and Jessica Staddon
It is often necessary to share sensitive documents, but protecting privacy remains important. A typical solution is to redact important bits, but often the redacted information can be recovered. Another approach is to sanitize the data by replacing specific terms with more general terms that hide the underlying data without destroying utility.
The researchers created a tool that helps the user discover privacy risks in the document by scanning the terms in the document and comparing against their prevalence on the web and their linkability to known sensitive terms. The sensitive terms are highlighted to guide the user in sanitizing the document. As the user makes changes the document is continuously rescored so the user can evaluate the effectiveness of their changes. The tool also suggests replacement terms that improve privacy.
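The core scoring idea can be sketched very roughly: terms that are rare on the web are highly identifying and get flagged. This is a toy illustration only; the prevalence table, the threshold, and the treatment of unknown terms are all invented, and the real tool also considers linkability to known sensitive terms.

```python
# Hypothetical web-prevalence estimates (fractions between 0 and 1);
# in a real system these would come from search-engine hit counts.
WEB_PREVALENCE = {
    "actor": 0.90, "american": 0.95, "starred": 0.60,
    "harrison": 0.20, "ford": 0.50,
}

def risky_terms(document, threshold=0.35):
    # flag any term whose estimated web prevalence falls below the
    # threshold; terms missing from the table default to 0.0 and are
    # therefore conservatively flagged
    words = document.lower().replace(",", "").split()
    return [w for w in words if WEB_PREVALENCE.get(w, 0.0) < threshold]
```

Rescoring the document after each edit is then just a matter of re-running this check, which is what lets the user watch the risk estimate fall as they generalize terms.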
The study included twelve users instructed to sanitize two short biographies, of Harrison Ford and Steve Buscemi. Some users behaved differently when they were using the tool than when they weren’t. It seemed that when users were dealing with unfamiliar topics they relied on the tool’s judgement more than their own. The privacy achieved was measured by employing Amazon’s Mechanical Turk service to see if the actors could be identified by the sanitized biographies. The study was focused on preserving privacy and not on the biography’s resulting utility.
Ran Halprin and Moni Naor
Random number generation is important for many security tasks - especially cryptography. And yet getting good random numbers is notoriously difficult in practice. Sources of randomness traditionally include “secret” data such as MAC addresses; real-time data such as hard-disk access and click timing; physical sources such as lava lamps, cloud patterns, and atmospheric noise; and requesting the user do something randomish like striking a bunch of keyboard keys.
How can we safely ask the user? Humans aren’t very good at recognizing randomness, let alone generating it. This study proposed involving the user in a competitive game on the idea that the user will behave erratically during the game and will be accepting of such a technique of gathering randomness. The study proposed a game, “Mice and Elephants,” which was designed to be easy to play, not require much in the way of resources, and encourage users to employ strategies with high entropy. The gameplay instructs the player to hide r mice on a grid, after which the computer moves an elephant around the board trying to crush the user’s mice. To provide variety and encourage diverse placement, each board has obstacles added that obstruct a number of spaces. It is presumed to be safe to use system-generated randomness to build the board and move the elephant since these are used merely to prompt randomness in the user’s response. Further, the implementation uses an extractor to smooth out any patterns in the user’s selections and incorporate the result into a robust PRNG. The game continues until the user has generated enough entropy.
The study included 482 players who made a total of 24,008 clicks. The participants did not know the experiment’s objective. They showed a bias towards corners and edges, and tended to hide multiple mice along the same axis. Still, it was found that there were generally ~10 bits of min-entropy generated per mouse hidden. Going forward the researchers would like to explore different types of games, such as those based on the camera or accelerometer, and to compare the net effort, acceptability, and results to the total quality of non-game inputs.
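The ~10 bits figure is min-entropy, which measures worst-case guessability rather than average-case (Shannon) entropy. A sketch of how one might estimate it from observed placement frequencies (the data here is hypothetical):

```python
import math
from collections import Counter

def min_entropy_bits(placements):
    # min-entropy H_min = -log2(p_max), where p_max is the empirical
    # probability of the single most likely placement; a biased player
    # (e.g. favoring corners) raises p_max and lowers the entropy
    counts = Counter(placements)
    p_max = max(counts.values()) / sum(counts.values())
    return -math.log2(p_max)
```

A perfectly uniform choice among 1024 grid cells would yield exactly 10 bits per placement; the observed corner-and-edge bias is what keeps real players near, rather than above, that figure.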
Ronald Kainda, Ivan Flechais and Andrew William Roscoe
Out-of-band device pairing refers to pairing devices using a channel external to the devices themselves, such as through user interactions. Technical security is achieved by using protocols that are based on formal proofs and is governed by the quality of the secrets involved. However, the security achieved in practice must account for user behavior and responses to protocol failures and the like. Some common OOB methods are the users comparing fingerprints on the two devices and confirming they match, users manually copying the fingerprint from one device to the other, using auxiliary capabilities such as a camera on one device capturing the screen of the other device or sharing a secret via a memory card, and pairing using short-range channels such as infrared exchange.
The study compared the compare & confirm method, the copy & enter method, and a method where one device read a barcode from the other device. For the first two methods they considered both textual and numeric strings. They used the Nokia N95 and N73 with a custom P2P payment system. Participants were presented with written directions and surveyed on their impressions. The participants consistently preferred the numeric compare & confirm method, while pairing via images, barcode scanning or melodies were ranked last. Security failures are defined as subjects either indicating devices matched when they did not, or did not match when they in fact did. As measured by security, copy & enter and barcode scanning were rated as most secure. Combining these measures of usability and security, copy & enter was found to achieve the best balance.
Alfred Kobsa, Rahim Sonawalla, Gene Tsudik, Ersin Uzun and Yang Wang
Secure device pairing refers to pairing two or more devices in a manner that can be trusted, such that the users pair the devices they believe they are pairing without allowing a malicious third party to join in the process. Generally this has to be completed in a context where the devices have no shared secrets, the users are not security experts, and the devices are mass-market consumer devices, so turning to highly expensive solutions is not an option.
Various methods have been proposed to solve this over the years, from physically attaching the devices with a cable, using laser transceivers, to confirming the match of a code displayed on each device. This study sought to evaluate the usability and security of the most promising methods, preemptively rejecting those already shown to have poor usability or security. The first methods compared were users comparing PINs on two devices to match them and similarly comparing two images displayed on the two devices. They also examined button-enabled methods such as pressing a button on one device as the LED flashes on the second device, when the second device vibrates, or when the second device beeps. The study also tested variants of Loud and Clear, where one device speaks out a sentence that the user confirms is displayed on the other device. Finally, they tested a method which used the camera on one device to capture the barcode on the second device, a similar approach using a video camera and a flashing LED on the second device, and HAPADEP audio pairing.
The study included twenty-two subjects. The participants were presented with a scenario and then tasked with completing pairing with each method, assigned in random order, after which they submitted a questionnaire to evaluate the pairings. The study also recorded the pairing attempts for later evaluation. The PIN-compare methods were found to be the quickest and most usable, followed by sentence, then image comparison. Pairing via audio and button pressing got the lowest scores. Subjects perceived PIN-comparison as the most secure method.
Discussion session lead by Simson Garfinkel. Free form discussion follows.
(there were other sessions, but as I only attended this one, this is the only one I got to blog)
Simson wants to talk about system constraints rather than usability constraints. In practice, focusing on one at the detriment of the other simply creates an insecurity at one end or the other. Instead, focusing on both, ideally by leaving the UI constant, allows for a balanced approach.
Too often people assume that security can be achieved through minor tweaks to the UI. We should perhaps be more focused on architectural changes that bring the system more in line with expectations in the first place.
It feels like we cannot get ourselves to ask the right questions - too often we expect the humans in the loop to do things humans are known to be bad at.
Least privilege should be adhered to and seriously addressed, in particular, by allowing for a finer granularity.
Least privilege is very nice, but it has a fundamental assumption of static capabilities. If it isn’t static then it requires privilege management, which becomes more difficult the more privileges there are to manage and the more often they may change.
There are some things that users are pretty good at doing, such as knowing which specific file they want to open. This alignment is invaluable, because it allows for leveraging of the user’s intuition.
Architecture is key. UI is, in the end, the presented abstraction. A simpler, well-designed underlying architecture can allow for simpler, more usable abstractions.
The firewall is a perfect example of a flaw in practice. The firewall was invented to compensate for failures at the OS level. If there is a sufficiently strong firewall then an organization can make useful assumptions about the environment within the firewall.
Windows, Mac, and Linux all run applications with the capability to touch all of the user’s files. In many ways this is too large a level of granularity. What about the web? In many ways, it has a much more usable security model because sites are genuinely separate from one another.
What are areas where we can have big changes with small amounts of work? For instance, we have encrypted filesystems. How could we make use of these to, say, enforce least privilege at an architectural level, such as an installer limited to accessing only a specific folder.
What if the installer was part of the OS instead of trusting third-parties to decide what gets installed on your system? Then the OS has an opportunity to monitor and restrict the capabilities of the installer and limit dangerous behaviors. A good example is the installer for Google’s Android OS.
Another idea for a trusted path is an equivalent of Alt-Tab on the PC, where there is an app-switching UI that is rendered by the OS.
When you delete a file, it should actually be deleted. In Windows Vista, when you format a hard-drive it finally actually formats the hard-drive.
If you have version control and good tracking of changes, then write access matters less because you can always go back to an old version. However, there is a danger then that sharing the file allows those viewing it to see older versions. One solution to this is for sharing to only share current and future versions but not past versions.
Most organizations, given the choice, would rather have better backups that cannot be erased rather than have an ensured ability to securely redact old data.
Automatic updates are a double-edged sword. On one hand, they ensure security patches are distributed and installed. On the other, they cause the system to be vulnerable to newly introduced bugs and security configuration issues.
HP has a process called CATA, which builds security reviews in from the earliest design stages. They have found an 87% reduction in the number of vulnerabilities that have occurred after release. It is mainly an up-front cost to ensure discipline in conducting reviews.
For a long time something like half the CERT advisories were for buffer-overflow vulnerabilities. These days the dominant advisories are cross-site scripting attacks.
Evidence carrying patches could be required to prove that they are sufficiently restricted/limited before execution.
Assigning semantic tags to files and setting permissions based on these tags may allow for a closer alignment with users’ mental models.
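The tag-based permissions idea can be illustrated with a toy sketch. The policy, tag names, and the rule that every tag on a file must grant access are all assumptions made for the example, not part of any discussed system.

```python
# Hypothetical policy: each semantic tag maps to the set of users
# allowed to read files carrying that tag.
TAG_READERS = {
    "work": {"alice", "bob"},
    "family-photos": {"alice", "carol"},
}

def can_read(user, file_tags):
    # a user may read a file only if every tag on the file grants
    # them access (the most restrictive interpretation)
    return all(user in TAG_READERS.get(tag, set()) for tag in file_tags)
```

The appeal is that “who can see my family photos” is a question users can actually answer, whereas per-file ACL checkboxes rarely map to anything in their mental model.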
If the average user logs into their mail account and finds their mail is missing, are they more likely to think that a user hacked into their account and deleted the messages or that there was a system failure that lost the data? It was suggested that users tend to assume an error occurred before suspecting an attacker.
Trusted favorites - i.e., ways of handing links to another user such that the user is ensured that the link they received and clicked upon goes to the location that the person who sent it intended.
Application deletion on most platforms is hard, but it is especially hard on Windows due to some design decisions made for Windows 3.1. An OS that does it well is Android or OLPC.
Some of these are great ideas. But what is the low-hanging fruit we can do simply?
Stuart Schechter and Robert Reeder
What to do when the user forgets their password? A common method is to provide security questions. Unfortunately, an initial analysis of the most commonly used security questions found that none were all that great, suffering from either poor memorability or poor security. What about e-mail based recovery? This doesn’t work well in the important case that the user has forgotten their e-mail password! What other options are available? Some other mechanisms available are social authentication using trustees, SMS messages to mobile phones, printed shared secrets kept in safe places, and remembering old passwords.
If none of these are trusted in isolation, then what combination is sufficient? In particular, how should the UI be designed to communicate to the user which combinations are sufficient? For example, answering a single secret question is fairly weak evidence, but getting an SMS message sent to your cellphone is fairly strong evidence. One metaphor that the researchers examined was an exam metaphor in which the different identification methods are worth differing numbers of points with an indication of the total number of points necessary to “pass” the exam. Another metaphor examined was to present the same information sorted into Strong, Medium, and Weak evidence, with a list of evidential requirements such as saying a Strong piece of evidence combined with any other type of evidence, two Medium pieces of evidence, etc., would suffice.
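Under the hood, the exam metaphor reduces to a simple points threshold. A minimal sketch, where the point values and passing score are invented for illustration (the study did not publish these exact weights):

```python
# Hypothetical evidence weights: weak evidence is worth few points,
# strong evidence many.
POINTS = {
    "secret_question": 1,   # weak
    "trustee_voucher": 2,   # medium
    "old_password": 2,      # medium
    "sms_code": 3,          # strong
}
PASSING_SCORE = 4

def can_recover(evidence):
    # the account is recoverable once the combined evidence
    # "passes the exam"
    return sum(POINTS[e] for e in evidence) >= PASSING_SCORE
```

The Strong/Medium/Weak metaphor encodes the same underlying policy; the study’s question was which presentation users could actually reason about.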
The study had eighteen participants - ten with college degrees and eight without. They showed each participant a series of screenshots depicting Live ID’s default UI versus the two proposed UIs and asked the participants to answer whether Jane Doe would be able to get back into her account given a set of information she had available. The first comparison was the default Live ID UI vs. the Exam metaphor. Users were very confused by the Live ID UI, whereas there was good understanding of the Exam metaphor. They then presented the Short Exam vs. the Long Exam, and surprisingly found that people actually did better for the Long Exam. Finally, they compared the Long Exam vs. the Evidence Scale, and found that the Exam was better understood than the Evidence scale.
Mike Just and David Aspinall
Challenge questions often serve as part of a password recovery mechanism, though they are sometimes included alongside conventional authentication. For a long time there was little research in this area, but some studies have emerged recently, generally concluding that challenge questions are neither very usable nor secure.
This study sought to improve the current state by proposing a systematic and repeatable way to analyze the security and usability of challenge questions, with a focus on user-chosen questions. They proposed a novel experiment for collecting 180 challenge questions. The participants were directed to a website where they entered their security questions, but wrote their answers on a piece of paper that they then sealed in an envelope. A few weeks later they were re-presented with their questions and asked to answer them again, and then compared against their original answers and reported their recall success rate to the researchers. It was conducted in this manner to keep from actually gathering sensitive information from the participants. Unfortunately, even after just twenty-three days the users were unexpectedly poor at properly remembering their answers.
The researchers then analyzed the submitted security questions, looking at three different attackers - one making a blind guess, one making a focused guess, and one allowed to make observations of the user. For each attacker they scored each question as Low, Medium, or High security. Low was assigned to anything below the guessing-resistance threshold set by a 6-character alphabetic password, and Medium to anything below that of an 8-character alphanumeric password. It was found that of the 180 questions, 174 were Low security. Still, since the attacker needed three questions to spoof authentication, these were combinable for the blind and focused guess attackers, meaning the cumulative security was often Medium or High. However, the analysis left out various forms of attack such as another site asking the same questions, the answers to questions correlating, etc.
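The password benchmarks translate directly into bits of guessing resistance, which also makes the "three questions combine" observation concrete: if the answer spaces are independent, their sizes multiply, so the bits add. A sketch (the classification boundaries follow the stated benchmarks; the independence assumption is mine):

```python
import math

# Thresholds in bits, from the stated benchmarks:
LOW_CEILING = 6 * math.log2(26)      # 6-char alphabetic, ~28.2 bits
MEDIUM_CEILING = 8 * math.log2(62)   # 8-char alphanumeric, ~47.6 bits

def classify(bits):
    # score a question's answer space (in bits) against the benchmarks
    if bits < LOW_CEILING:
        return "Low"
    if bits < MEDIUM_CEILING:
        return "Medium"
    return "High"

def combined(bits_per_question):
    # an attacker must answer every question; assuming independent
    # answers, the spaces multiply and the bits add
    return classify(sum(bits_per_question))
```

This is why three individually Low questions of ~10 bits each can jointly rate Medium, and also why correlated answers (which the analysis acknowledged omitting) would break the calculation.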
Alexander De Luca, Martin Denzel and Heinrich Hussmann
Many password entry systems suffer from weakness against attacks where the attacker can view either the keyboard or screen. The proposal is to use eye movements for password entry, building off the findings of EyePassword, Eye Gestures, and PassShapes. The researchers implemented EyePassShapes, in which the user traces a shape by moving their eye between a series of points. Their prototype used a standard eye tracking system (called ERICA).
In an initial evaluation the researchers included ten participants and examined whether it was easier to trace out EyePassShapes on a dotted background versus a gridded background. What differences they found were evaluated to be insignificant, so they went with dotted backgrounds. They then conducted the usability study proper with twenty-four participants using PINs, PassShapes, EyePassShapes, and EyePINs. The usability was based on the user reports which indicated that PINs were the most usable, followed by PassShapes, with EyePassShapes and EyePINs similarly usable.
They then conducted a security evaluation by allowing an attacker three attempts at breaking each entry attempt. The attacker was a security expert who did not participate in the user study, but was able to view videos of each subject filmed from the front and the side. The PIN and PassShapes were broken 100% of the time, with EyePassShapes broken 54.5% of the time and EyePIN 41.7% of the time. It was found that when the EyePassShape was traced as a series of strokes instead of one continuous stroke, it was harder to crack.
Finally, they conducted a memorability evaluation. It was found that EyePassShapes had similar memorability to normal PassShapes. They only evaluated the memorability of a single shape, and not what would happen if users had multiple shapes to remember.