OCR in Preview
Maybe everyone knows this already, but it was news to me. Today I opened a PDF document in Preview. It contained a URL that I wanted to open in my browser. Unfortunately, the document was a full-page bitmap of a scanned paper page. This was obvious just by looking at the document, but my mousing hand went on autopilot. I told my hand, "Err, this is not going to work; it's a bitmap of a scanned page." "Yeah, I know," replied my hand, "but I can't stop myself trying." Oddly, the text was selectable. I copied it. I pasted it. A few characters were wrong, but it was a neat idea. A better-quality scan would no doubt have helped. This:
http://www.apple.com/ilife/video/ilife04_32C.html
became this:
tttp: //www. apple. oom/ilife/videc/ilifeO432C.htm.l
Nice try. The URL was a 404 anyway.
Were you actually able to do this? I was unable to recreate it, but if Automator could access this capability, a new application could be on its way...
It is quite possible the PDF was created in Acrobat and processed by the Paper Capture plug-in, which does OCR. Paper Capture overlays the text it recognizes on the bitmap it used as the source.
I can confirm that Preview does this automatically. I tried it on several scanned PDF documents from three different scanners.
I think this is a great feature! Also, if you search in Spotlight, you can get results from scanned text, with the match highlighted. Amazing.
What about image files such as .tiff? Any ideas?
Sorry, folks. This just isn't going to work. The PDF was already OCR'd, with the text in place. Preview does not OCR automatically. My wife scans about 100 pages a week for school, without OCR, and they have never been searchable, not in Preview (Lion) or in PDF Expert (iOS 5).
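One way to test the claim that an OCR'd scan carries a separate, usually invisible text layer on top of the bitmap is to look inside the PDF's content streams. The Python sketch below is a rough heuristic, not a PDF parser: it assumes streams are either uncompressed or Flate-compressed, ignores object streams, encryption and other filters, and the function name `text_layer_report` is hypothetical. It reports whether any stream uses the `Tj`/`TJ` text-showing operators, and whether such a stream also sets text rendering mode 3 (invisible text), which OCR tools commonly use to lay selectable text over a scanned image.

```python
import re
import sys
import zlib

# Hypothetical heuristic: grab each stream body between "stream" and "endstream".
# Real PDFs may use object streams, other filters (LZW, DCT, ASCII85) or
# encryption, none of which this sketch understands.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# "3 Tr" selects text rendering mode 3 (neither fill nor stroke: invisible text).
INVISIBLE_TEXT_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")

# Matches only the Tj and TJ text-showing operators (' and " are ignored here).
SHOW_TEXT_RE = re.compile(rb"\bT[jJ]\b")


def _inflate(data: bytes) -> bytes:
    """Try to FlateDecode a stream; fall back to the raw bytes if that fails."""
    try:
        # decompressobj tolerates trailing bytes after the compressed data.
        return zlib.decompressobj().decompress(data)
    except zlib.error:
        return data  # not Flate-compressed (or unsupported): scan it as-is


def text_layer_report(pdf_bytes: bytes) -> dict:
    """Heuristically report whether a PDF seems to contain a text layer.

    has_text: some stream shows text with Tj/TJ.
    has_invisible_text: a text-showing stream also sets rendering mode 3.
    Binary image data can produce false positives; treat this as a hint only.
    """
    has_text = False
    has_invisible_text = False
    for match in STREAM_RE.finditer(pdf_bytes):
        content = _inflate(match.group(1))
        if SHOW_TEXT_RE.search(content):
            has_text = True
            if INVISIBLE_TEXT_RE.search(content):
                has_invisible_text = True
    return {"has_text": has_text, "has_invisible_text": has_invisible_text}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of a PDF as the only argument.
    with open(sys.argv[1], "rb") as f:
        print(text_layer_report(f.read()))
```

Saved as a script, it takes the PDF path as its only argument. A result with `has_text` false suggests the file is a bare bitmap with no text layer to select or search; `has_invisible_text` true fits the pattern described above, where OCR text sits invisibly over the scanned image.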
I'm not sure what is doing the OCR (perhaps the Canon all-in-one scanner I'm using), but it gets sent to my Mac as a .pdf, and I can then search the text very nicely in Preview.
Wow. It worked for me. Never knew! Thanks!!!