Balancing Usability and Security in a Video CAPTCHA

July 17, 2009 by Richard Conlan
Kurt Kluever and Richard Zanibbi

CAPTCHAs are used for a variety of purposes, most commonly to combat spammers.  A desirable CAPTCHA should be automatically generated, should not rely on secret databases or algorithms, should be usable, and should be hard to spoof.  Most existing CAPTCHAs fail in one or more of these respects, usually usability.

This study proposes video CAPTCHAs, in which a video is played and the human user is expected to supply appropriate tags for it.  The algorithm that creates the CAPTCHA selects a random video from YouTube and uses text and metadata from its Related Videos to generate a set of acceptable tags.  Any tags that are too common, such as “funny” or “music,” are stripped out.  The user’s response is graded after normalization: it is converted to lowercase, punctuation and stop words are removed, and stemmed forms are added so that “dogs” will match “dog.”  Levenshtein distance is then used to allow for minor misspellings.
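
The grading step described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the stop-word list, the crude plural-stripping "stemmer," the `max_distance` threshold, and every function name here are assumptions made for the example.

```python
import string

# Assumption: a tiny illustrative stop-word list; the study's actual list is not given.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "on", "to"}


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def naive_stem(word):
    """Assumption: a stand-in stemmer that only strips a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(text):
    """Lowercase, strip punctuation and stop words, and add stemmed forms."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    forms = set()
    for w in words:
        forms.add(w)
        forms.add(naive_stem(w))
    return forms


def passes(user_input, accepted_tags, max_distance=1):
    """True if any normalized user word is within max_distance of an accepted tag."""
    candidates = normalize(user_input)
    targets = set()
    for tag in accepted_tags:
        targets |= normalize(tag)
    return any(levenshtein(c, t) <= max_distance
               for c in candidates for t in targets)
```

With these assumptions, `passes("Dogs!", ["dog"])` succeeds through stemming and `passes("skatebord", ["skateboard"])` succeeds through the edit-distance allowance.  A fixed threshold of 1 is lenient for very short tags, so a real grader would likely scale it with word length.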

Two studies were run online, one with 233 participants and one with 300, though only 143 and 184, respectively, completed the survey.  The average completion time was 20 seconds in the first study and 17 seconds in the second.  Users reported finding text CAPTCHAs faster, but video CAPTCHAs more enjoyable.  The authors also simulated an attacker that uses tag frequency data from the database to guess tags for the video.  Under one configuration of the settings, humans were successful 77% of the time while the simulated attacker was successful only 2% of the time.
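
A frequency-based attack of the kind described above might look something like the sketch below.  The function names, the number of guesses `k`, and the toy data are all invented for illustration and are not taken from the study.  It also shows why stripping overly common tags from the accepted set matters.

```python
from collections import Counter


def top_tags(tag_lists, k=3):
    """Return the k tags that appear on the most videos in a tag corpus."""
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    return [tag for tag, _ in counts.most_common(k)]


def attack_success_rate(challenges, guesses):
    """Fraction of challenges where at least one guess is an accepted tag."""
    hits = sum(1 for accepted in challenges
               if any(g in accepted for g in guesses))
    return hits / len(challenges)


def strip_common(challenges, common):
    """Remove overly common tags from each challenge's accepted set."""
    return [accepted - set(common) for accepted in challenges]


# Toy data, made up for this example only.
corpus = [["music", "funny"], ["music", "dog"],
          ["funny", "cat"], ["music", "piano"]]
challenges = [{"dog", "puppy"}, {"music", "piano"},
              {"cat"}, {"skateboard"}]

guesses = top_tags(corpus, k=2)  # the attacker always submits the most frequent tags
before = attack_success_rate(challenges, guesses)
after = attack_success_rate(strip_common(challenges, guesses), guesses)
```

On this toy data the attacker's guesses are "music" and "funny," which match one challenge in four before the common tags are stripped and none afterward.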