Whenever you create a new email account, register at a website, or even post a comment on this blog, you have to type a set of seemingly distorted characters into a rectangular box. This may take only a few seconds of your time, but you probably did not know that you could be helping to digitize books and newspapers in the process.
Via ScienceDaily:
They can work so prodigiously because Carnegie Mellon computer scientists led by Luis von Ahn have taken a widely used Web site security measure, called a CAPTCHA, and given it a second purpose — digitizing books produced prior to the computer age. When Web visitors solve one of the distorted-letter puzzles so they can register for email or post a comment on a blog, they simultaneously help turn the printed word into machine-readable text.
More than a year after implementing their version, called reCAPTCHA (http://recaptcha.net/), on thousands of Web sites worldwide, the researchers conclude that their word deciphering process achieves the industry standard for human transcription services: better than 99 percent accuracy. During the reCAPTCHA system’s first year of operation, more than 1.2 billion reCAPTCHAs have been solved and more than 440 million words have been deciphered. That’s the equivalent of manually transcribing more than 17,600 books.

Von Ahn said reCAPTCHAs are being used to digitize books for the Internet Archive and to digitize newspapers for The New York Times. Digitization allows older works to be indexed, searched, reformatted and stored in the same way as today’s online texts.

Old texts are typically digitized by photographically scanning pages and then transforming the text using optical character recognition (OCR) software. But when ink has faded and paper has yellowed, OCR sometimes can’t recognize some words, as many as one out of every five, according to the Carnegie Mellon team’s tests. Without reCAPTCHA, these words must be deciphered manually at great expense.

“We are demonstrating that we can take human effort, human processing power, that would otherwise be wasted and redirect it to accomplish tasks that computers cannot yet solve,” von Ahn said.
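As a quick sanity check on the figures quoted above, here is a minimal Python sketch of the arithmetic. The numbers come straight from the excerpt. The function names are my own, and since the article says "more than" for each figure, the results are rough lower-bound ratios rather than anything the researchers reported.

```python
# Figures as quoted in the ScienceDaily excerpt (each is "more than" this value).
WORDS_DECIPHERED = 440_000_000
BOOKS_EQUIVALENT = 17_600
CAPTCHAS_SOLVED = 1_200_000_000


def words_per_book(words: int, books: int) -> float:
    """Average book length implied by the quoted words-to-books equivalence."""
    return words / books


def words_per_solve(words: int, solves: int) -> float:
    """Deciphered words per solved reCAPTCHA, using the quoted totals."""
    return words / solves


if __name__ == "__main__":
    print(f"Implied words per book: {words_per_book(WORDS_DECIPHERED, BOOKS_EQUIVALENT):,.0f}")
    print(f"Deciphered words per solved reCAPTCHA: {words_per_solve(WORDS_DECIPHERED, CAPTCHAS_SOLVED):.2f}")
```

In other words, the "17,600 books" comparison works out to about 25,000 words per book, and each deciphered word took roughly three solved reCAPTCHAs on average.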
Read the entire article here