Google Does OCR

Gautam 18 years ago

The Google juggernaut rolls on. Its latest trick is reading and understanding scanned PDF documents. Google could always read and index PDFs created with a text layer; the new trick adds OCR, so it can read, parse, and index scanned text in a PDF too. Impressive.

Via Ars Technica:

As announced on the Official Google Blog, the company is now performing optical character recognition (OCR) on documents that it indexes and identifies as scanned as PDFs. Google has indexed documents that were saved as text-based PDFs for quite some time. But many documents wind up being made into PDFs through scans, which store the text as images. Google has now decided that its open-source OCRopus technology, based on software called “Tesseract” that HP developed, is up to the task of indexing scanned documents that can contain any mixture of text, images, and coffee stains.


