Ever wonder why you are offered two separate words in the reCaptcha box? They call it a “free anti-bot service that helps digitize books”. What they really mean to say is that if you type in two words, one of the words will help you and the other word will help them.
The security implication of this is only one of the two words is the real test for anti-bot access. The other word is to help them fix issues in their digital book images.
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.
One word they already know and the other word they are trying to decipher. If you type in two random words, you fail their test. If you type in one random word you have a good chance of passing the test as well as giving their database bogus information.
Many years ago as a graduate student I worked on a Xerox implementation for the blind. Fellow blind students would scan books and then give me the output files to correct and verify. I built simple scripts with WordPerfect to look for the number 5, for example, and substitute for the letter s. It was not terribly sophisticated (I am no linguist) but it was enough to save me the trouble of reading every word of every page.
The reCaptcha effort seems to headed in the same direction but using human labor as the solution instead of algorithms. Although I can see why they find this attractive, it begs a question of trust. It also begs the question of whether you want to bother putting in two words or gambling with just one. Try it and see.
One thought on “50% reCaptcha Failure”