    scanimage --device 'brother2:bus1 dev1' --format=pnm --mode 'True Gray' --resolution 300 -l 90 -t 0 -x 210 -y 200 --brightness -20 --contrast 15 >scan-$i.pnm

Adjust the parameters of the scanimage command according to your scanner model (find out which device names you can use with scanimage -L and look up device-specific options with scanimage --help --device yourdevice). Also, adjust the settings for the parameters -l (discard on the left), -t (discard on the top), -x, and -y (the X and Y coordinates of the bottom right corner of the page). Try to position the book so that these parameters define a rectangle that contains only the text, not the binding or the border. Don't worry about the page number; you can cut it out later with little effort.

Your scans may not be positioned consistently, or they may have shadows in the corners. If you feed these images into an OCR program, you won't get accurate results no matter how good the OCR engine might be. However, you can use the unpaper command to preprocess the images before applying the OCR magic and thus get the text recognized more accurately.

If you scanned the pages in the right orientation (that is, right side up), you can use the default settings with unpaper; otherwise, you can use some of the utility's many options. For example, --pre-rotate -90 rotates the image counterclockwise. You can also tell unpaper that two pages are scanned in one image. See the manual page for detailed information.

The following unpaper script prepares the scanned images for optimal OCR performance:

    convert unpapered-$i.pnm prepared-$i.tif && rm unpapered-$i.pnm

The script converts the .pnm files to TIFF because the best OCR tool I have found requires the TIFF input format.

Now comes the most important part: the automated optical character recognition. Many open source tools are available for this job; I tested a selection of them and found that most didn't produce satisfactory results. This is not a representative survey, but it is clear that some open source tools perform far better than others.

To illustrate, I have prepared a small example from a German book written by my wife's grandfather. The figure to the right shows the original text. It's a smaller version of the original 300 DPI scan that I fed to the OCR programs.

    Ja, wer _einer _leute hat ihn njcht jn _3meg.
    Menc_al fra_e 3ch _jch, wa- _ gerade der Maulbeerba_ es 3st.
    Ejne Aprikose ist doc.h eine vjel edlere m3cht.
    Ia, Her meiner _leute hat ihn nicht in _iMe_.
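The scan, clean-up, and conversion steps described here can be chained per page. The following is only a rough sketch, not the author's script: the `page_commands` helper is hypothetical, the plain `unpaper input output` call with default settings is an assumption, and the device string and scan options are simply copied from the example command. It prints the commands rather than running them, so you can review them first.

```shell
#!/bin/sh
# Sketch only: print the per-page commands instead of running them,
# so the pipeline can be reviewed before anything touches the scanner.

# Assumption: device string and scan options are copied from the example
# in the text; replace them with the values for your own scanner.
DEVICE="${DEVICE:-brother2:bus1 dev1}"
SCAN_OPTS="--format=pnm --mode 'True Gray' --resolution 300 -l 90 -t 0 -x 210 -y 200 --brightness -20 --contrast 15"

# page_commands N (hypothetical helper): emit the scan, unpaper,
# and convert steps for page number N, one command per line.
page_commands() {
    i="$1"
    printf "scanimage --device '%s' %s >scan-%s.pnm\n" "$DEVICE" "$SCAN_OPTS" "$i"
    # Assumption: unpaper runs with its default settings (input, then output).
    printf 'unpaper scan-%s.pnm unpapered-%s.pnm\n' "$i" "$i"
    printf 'convert unpapered-%s.pnm prepared-%s.tif && rm unpapered-%s.pnm\n' "$i" "$i" "$i"
}

# Example: print the commands for pages 1 to 3.
for n in 1 2 3; do
    page_commands "$n"
done
```

Once the printed commands look right for your scanner, a single page can be processed with `page_commands 7 | sh`. Printing first is a deliberate choice here: a wrong scan area or device name is cheaper to spot in text than after scanning a whole book.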