Cleaning up an image for OCR with ImageMagick and 'textcleaner'
I have the following image, which I'd like to prepare for OCR with Tesseract:
The objective is to clean the image and remove most of the noise. I'm using the textcleaner script, which uses ImageMagick, with the following parameters:
./textcleaner -g -e normalize -f 30 -o 12 -s 2 original.jpg output.jpg
The output is still not clean:
I tried all kinds of variations of the parameters with no luck. Does anyone have an idea?
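If you save the output as JPEG, you will always get the type of artifacts you are seeing.
This is a typical "feature" of JPEG compression. JPEG is never a good choice for images with sharp lines, strong contrasts between uniformly colored areas, and only a few colors, and this is especially true for black-and-white text. JPEG compresses each 8x8 block of pixels by discarding high-frequency detail, and that loss shows up as noisy fringes ("ringing") around hard edges. JPEG is only "good" for typical photos, with lots of different colors and shading...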
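To see this effect directly, here is a minimal sketch (not part of the original answer) that assumes ImageMagick is installed, either as the IM7 "magick" command or the IM6 "convert"/"identify" pair. It draws a hard-edged black bar on a white background with antialiasing turned off, saves it once as PNG and once as JPEG, and counts the unique colors in each file with identify's %k format escape. The file names bw.png and bw.jpg and the quality setting of 75 are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare a pure black-and-white image stored as PNG (lossless)
# and as JPEG (lossy). Assumes ImageMagick is installed; the file names
# are hypothetical.
set -euo pipefail

# Use the IM7 entry point if present, otherwise fall back to IM6 tools.
if command -v magick >/dev/null 2>&1; then
  IM=(magick)
  ID=(magick identify)
else
  IM=(convert)
  ID=(identify)
fi

# Draw a hard-edged black bar on white with antialiasing disabled,
# so the source image contains exactly two colors.
"${IM[@]}" -size 200x60 xc:white +antialias -fill black \
  -draw "rectangle 21,21 179,39" bw.png

# Store the same pixels as a JPEG at quality 75 (lossy compression).
"${IM[@]}" bw.png -quality 75 bw.jpg

# %k prints the number of unique colors in an image.
echo "PNG colors:  $("${ID[@]}" -format '%k' bw.png)"
echo "JPEG colors: $("${ID[@]}" -format '%k' bw.jpg)"
```

The PNG should still report 2 colors, while the JPEG reports more: those extra colors are the gray fringe pixels around the edges of the bar, the same kind of noise that remains in the textcleaner output when it is saved as JPEG.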
Your problem is resolved if you use PNG as the output format. The following image demonstrates this. I generated it with the same parameters as your last example command, but with PNG as the output format:
textcleaner -g -e normalize -f 30 -o 12 -s 2 \
    http://i.stack.imgur.com/ficx7.jpg \
    out.png
Here is a similar zoomed-in view of the output:
You can improve the output even more if you play with the parameters of the textcleaner script. That's your job... :-)