I need a Linux-based server that can be set up to receive images and transform them into text that will be inserted into a database. Is that possible, ideally via an API so the organization can interact with the service if need be?
4 Answers
Tesseract seems to be the best. http://code.google.com/p/tesseract-ocr/
Reviews seem to say it is the only one that beats retyping things. http://www.linux.com/archive/feature/138511 http://www.linux.com/archive/feed/57222
Do people not Google any more? That was 5 minutes of reading what I pulled up with "linux ocr" as my search terms.
- 1,743
I had a project that required OCR. You can use GOCR for the OCR part. To convert JPEGs into the PBM/PNM image format that GOCR reads, you can use djpeg. If you need it to be integrated with the web, you can call the conversion and OCR steps from PHP, and save the result to the DB from there as well.
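For what it's worth, here is a rough sketch of that djpeg-to-GOCR chain, written in Python rather than PHP. It assumes `djpeg` and `gocr` are installed and on the PATH; the function names, the SQLite database and the `ocr_results` table are made up for illustration and not from my project.

```python
import sqlite3
import subprocess


def djpeg_command(jpeg_path):
    # djpeg writes the decoded image to stdout; -pnm selects PBMPLUS output
    # and -grayscale forces a grayscale (PGM) image, which gocr can read.
    return ["djpeg", "-pnm", "-grayscale", jpeg_path]


def gocr_command():
    # "-i -" tells gocr to read the image from stdin.
    return ["gocr", "-i", "-"]


def ocr_jpeg(jpeg_path):
    # Decode the JPEG, then feed the resulting PNM bytes to gocr.
    pnm = subprocess.run(djpeg_command(jpeg_path), check=True,
                         capture_output=True).stdout
    result = subprocess.run(gocr_command(), input=pnm, check=True,
                            capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


def save_text(conn, filename, text):
    # Hypothetical schema: one row per processed image.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_results (filename TEXT, body TEXT)")
    conn.execute(
        "INSERT INTO ocr_results (filename, body) VALUES (?, ?)",
        (filename, text))
    conn.commit()
```

You would call `save_text(conn, path, ocr_jpeg(path))` from whatever handles the upload. Swap SQLite for your real database driver; the parameterized insert looks much the same.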
- 113
I'd set up a message queue and submit tasks to it for processing. All you'd really need to do is upload the image file to a shared storage platform, maybe GlusterFS or similar, then push the filename and path into the message queue. Then set up a worker process that listens to the queue, runs gocr on each file, and pushes the output data into your database.
Easy... in theory. ;)
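To make the worker side a bit more concrete, here is a minimal sketch. It uses Python's in-process `queue.Queue` as a stand-in for a real message broker, and SQLite as a stand-in for your database; both are assumptions for the sake of a runnable example, as are the function and table names. It also assumes the image is already in a format gocr reads (PNM) and that `gocr` is on the PATH.

```python
import queue
import sqlite3
import subprocess


def run_gocr(image_path):
    # Run gocr on a file that already lives on the shared storage.
    out = subprocess.run(["gocr", "-i", image_path], check=True,
                         capture_output=True)
    return out.stdout.decode("utf-8", errors="replace")


def drain_queue(tasks, conn, ocr=run_gocr):
    # Process every path currently on the queue and store the OCR text.
    # The ocr argument is injectable so the loop can be exercised without gocr.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_results (path TEXT, body TEXT)")
    processed = 0
    while True:
        try:
            path = tasks.get_nowait()
        except queue.Empty:
            break
        try:
            text = ocr(path)
            conn.execute(
                "INSERT INTO ocr_results (path, body) VALUES (?, ?)",
                (path, text))
            conn.commit()
            processed += 1
        finally:
            # Mark the task as handled even if OCR failed; the error
            # still propagates to the caller.
            tasks.task_done()
    return processed
```

The uploader only has to do `tasks.put("/mnt/shared/scan-001.pnm")` (a hypothetical path). With a real broker the loop would block waiting for messages instead of stopping when the queue is empty, and you would want retry or dead-letter handling for files that fail.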
- 27,578