Weblog entry #13 for dkg

good OCR tools under debian?
Posted by dkg on Fri 26 Jan 2007 at 18:39
Tags: none.
I have never needed to do Optical Character Recognition (turning scanned documents back into text), but it appears i may soon need to (in English, FWIW).

Does anyone have a preferred tool/suite that is packaged for Debian?

A scan of the archive turns up:

  • gocr
  • ocrad
  • unpaper
  • clara
none of which i've ever used, and some of which seem stale (clara's version number is 20031214-2). Suggestions? Things to avoid? Have i missed something important?

 

Comments on this Entry

Posted by redbeard (216.49.xx.xx) on Tue 6 Feb 2007 at 11:57

I haven't used it, but tesseract-ocr (currently only in etch) is supposed to be a commercial-quality OCR package that far outperforms any other open source package currently available.

Apparently, Google released it after acquiring it from UNLV, which acquired it from HP. The reason Google got it is that the original developer was working at Google at the time. Check out this Linux.com article on it for more info.

Good luck.
Michael
