Cleaning up an image for OCR with ImageMagick and 'textcleaner'


I have the following image, which I'd like to prepare for OCR with Tesseract: [image]

The objective is to clean the image and remove all of the noise. I'm using the textcleaner script, which uses ImageMagick, with the following parameters:

./textcleaner -g -e normalize -f 30 -o 12 -s 2 original.jpg output.jpg 

The output is still not clean: [image]

I tried all kinds of variations of the parameters, but with no luck. Does anyone have an idea?

If you convert to JPEG, you will always have the type of artifacts you are seeing.

This is a typical "feature" of JPEG compression. JPEG is never good for images showing sharp lines and strong contrasts, with uniform colors between different areas of the image and only a few colors in use. That is exactly the case for black-and-white text. JPEG is only "good" for typical photos, with lots of different colors and shading...

Your problem is resolved if you use PNG as the output format. The following image demonstrates this. I generated it with the same parameters your last example command used, but with PNG as the output format:

textcleaner -g -e normalize -f 30 -o 12 -s 2 \
    http://i.stack.imgur.com/ficx7.jpg       \
    out.png

[image: PNG instead of JPEG output]

Here is a similar zoom into the output:

[image: zoomed PNG]

You can probably improve the output even more if you play with the parameters of the textcleaner script. But that's your job... :-)
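
To make the "always write PNG" advice harder to forget, here is a minimal sketch of a wrapper, assuming textcleaner sits in the current directory as in the question. The function names png_name and clean_for_ocr are hypothetical helpers made up for this illustration, not part of textcleaner or ImageMagick. The flags are simply the ones from the question.

```shell
#!/bin/sh
# Hypothetical helper: derive a PNG output name from any input file name,
# so the cleaned image is never written back out as a lossy JPEG.
png_name() {
    base=${1##*/}                     # strip any leading directory
    printf '%s.png\n' "${base%.*}"    # swap the last extension for .png
}

# Hypothetical wrapper: run textcleaner with the flags from the question,
# forcing PNG output, and print the name of the file that was written.
clean_for_ocr() {
    in=$1
    out=$(png_name "$in")
    ./textcleaner -g -e normalize -f 30 -o 12 -s 2 "$in" "$out" &&
        printf '%s\n' "$out"
}
```

After sourcing this, something like `clean_for_ocr original.jpg` should leave original.png in the current directory, and `tesseract original.png result` would then write the recognized text to result.txt.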

