CAPTCHAs, the semantic gap, and cats
Yesterday, Slashdot posted an article about a new CAPTCHA method proposed by researchers at
A CAPTCHA is a method of verifying that a user is human. A common CAPTCHA displays some text (usually distorted) and asks the user to type it. The intention is that a human can easily recognize the letters, while a computer would have trouble.
IMAGINATION is interesting in that it uses images, not text, to authenticate users. It exploits the semantic gap in order to differentiate humans from machines. The semantic gap is the disparity between a human’s description of a scene and the computational representation of the same scene. While a human may be able to look at an image and conclude that it’s a picture of “two happy people running on the beach”, this is far more difficult for a computer to accomplish.
An earlier effort to use images as CAPTCHAs is KittenAuth. This system presents the user with an array of 16 or so images from a limited number of categories. The user is asked to select images with a common theme (e.g. “select all images of cats”). However, this method has several weaknesses. First, the number of possible solutions is far smaller than that of a text CAPTCHA. Each image is a binary choice, whereas each text character offers at least 26 choices (far more if case sensitivity and numerical digits are also included). Second, the categories of the images can be inferred by trial and error. In other words, it may be easier for an automated system to compromise this type of CAPTCHA by brute force.
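To make the comparison concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a 16-image grid, as described above, and a 6-character text CAPTCHA. The 6-character length is my own illustrative assumption, not a figure from any of the systems discussed, and the function names are hypothetical.

```python
def grid_solution_space(num_images: int) -> int:
    """Each image is either selected or not, so there are 2**n possible answers."""
    return 2 ** num_images


def text_solution_space(alphabet_size: int, length: int) -> int:
    """Each character is drawn independently from the alphabet."""
    return alphabet_size ** length


# 16 images, as in the grid described above.
grid = grid_solution_space(16)

# Assumed 6-character text CAPTCHA (illustrative length only).
text_lower = text_solution_space(26, 6)  # lowercase letters only
text_mixed = text_solution_space(62, 6)  # upper + lower case + digits

print(f"image grid:       {grid:,}")        # 65,536
print(f"text (26 chars):  {text_lower:,}")  # 308,915,776
print(f"text (62 chars):  {text_mixed:,}")  # 56,800,235,584
print(f"text/grid ratio:  {text_lower // grid:,}x")
```

Under these assumptions a blind guess at the image grid succeeds about once in 65,536 tries, while even the lowercase-only text CAPTCHA is thousands of times larger. This counts only raw guessing and ignores the category inference mentioned above, which would shrink the image search space further.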
Microsoft’s Asirra project takes an approach similar to KittenAuth, but with one notable difference – it draws on a database of over three million images. The Asirra project was also inspired by HotCaptcha, which asks users to select three pictures of “hot” people, another task that is easier for humans than for computers, but one that may offend users.
The first proposed image CAPTCHA project was ESP-PIX from CMU. The system presents the user with four images and asks the user to select, from a list, the single word that describes them. Again, there is a possibility that a brute force attack would succeed.
The key challenge in designing an effective CAPTCHA is that the test must be difficult for a computer to solve yet easy for a human to handle. Unfortunately, as OCR and other techniques for interpreting text-based CAPTCHAs improve, the text has to be distorted more heavily, which also makes it harder for humans to read. Perhaps image interpretation tasks offer hope, if the number of possible solutions to an image CAPTCHA can be expanded without frustrating the user.
Image-based CAPTCHAs could also be used to improve image retrieval. The image CAPTCHAs already ask users to annotate or categorize images, but they compare the results against complete ground truth. One possibility is to randomly insert uncategorized or unlabeled images into the system (without notifying the user). These images would not affect the outcome of the CAPTCHA, but they would still be categorized or annotated by human users. Once enough users agree on an image's label, the image could be added to the database.
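The labeling idea above can be sketched roughly as follows. This is only an illustration of one way it might work, not a description of any existing system: the class name `LabelCollector`, the thresholds `min_votes` and `min_agreement`, and the rule of counting only votes from users who passed the real CAPTCHA are all my own assumptions.

```python
from collections import Counter, defaultdict


class LabelCollector:
    """Hypothetical vote collector for unlabeled images mixed into a CAPTCHA."""

    def __init__(self, min_votes=5, min_agreement=0.8):
        self.min_votes = min_votes          # assumed: votes needed before promotion
        self.min_agreement = min_agreement  # assumed: share the top label must reach
        self.votes = defaultdict(Counter)   # image_id -> label counts
        self.promoted = {}                  # image_id -> accepted label

    def record(self, image_id, label, passed_captcha):
        """Record one user's label; return the accepted label once promoted, else None."""
        # Assumption: ignore users who failed the known images (likely bots).
        if not passed_captcha or image_id in self.promoted:
            return None
        counts = self.votes[image_id]
        counts[label] += 1
        total = sum(counts.values())
        top_label, top_count = counts.most_common(1)[0]
        if total >= self.min_votes and top_count / total >= self.min_agreement:
            self.promoted[image_id] = top_label
            del self.votes[image_id]
            return top_label
        return None


# Usage: four users label the same hidden image; one of them disagrees.
collector = LabelCollector(min_votes=4, min_agreement=0.75)
results = [
    collector.record("img-001", "cat", passed_captcha=True),
    collector.record("img-001", "dog", passed_captcha=True),
    collector.record("img-001", "cat", passed_captcha=False),  # ignored
    collector.record("img-001", "cat", passed_captcha=True),
    collector.record("img-001", "cat", passed_captcha=True),
]
print(results)
print(collector.promoted)
```

Here the image is promoted only on the last call, when three of the four counted votes agree. Requiring both a minimum vote count and a minimum agreement level is a design choice meant to keep a single careless or malicious user from deciding a label.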