Android - tess-two OCR not decoding correctly
I have followed the tutorials to get Tesseract, tess-two, and eyes-two installed as part of my Android app.
It runs, but the OCR text returned by baseApi.getUTF8Text() is complete gibberish.

    // Decode the photo at 1/4 resolution and show it in the UI
    BitmapFactory.Options options = new BitmapFactory.Options();
    options.inSampleSize = 4;
    Bitmap bmp = BitmapFactory.decodeFile(path, options);
    receipt.setImageBitmap(bmp);

    try {
        // Read the EXIF orientation so the image can be turned upright
        ExifInterface exif = new ExifInterface(path);
        int exifOrientation = exif.getAttributeInt(
                ExifInterface.TAG_ORIENTATION,
                ExifInterface.ORIENTATION_NORMAL);

        int rotate = 0;
        switch (exifOrientation) {
            case ExifInterface.ORIENTATION_ROTATE_90:
                rotate = 90;
                break;
            case ExifInterface.ORIENTATION_ROTATE_180:
                rotate = 180;
                break;
            case ExifInterface.ORIENTATION_ROTATE_270:
                rotate = 270;
                break;
        }

        if (rotate != 0) {
            int w = bmp.getWidth();
            int h = bmp.getHeight();
            Matrix matrix = new Matrix();
            matrix.preRotate(rotate);
            bmp = Bitmap.createBitmap(bmp, 0, 0, w, h, matrix, false);
        }
        // Tesseract expects an ARGB_8888 bitmap
        bmp = bmp.copy(Bitmap.Config.ARGB_8888, true);

        TessBaseAPI baseApi = new TessBaseAPI();
        baseApi.init(DATA_PATH, "eng");
        baseApi.setImage(bmp);
        String ocrText = baseApi.getUTF8Text();
        baseApi.end();

        Log.i("ocr text", "rotate " + rotate);
        Log.i("ocr text", "ocr ");
        Log.i("ocr text", ocrText);
        Log.i("ocr text", "=======================================================================================");
    } catch (IOException e) {
        // new ExifInterface(path) throws IOException if the file cannot be read
        Log.e("ocr text", "Could not read EXIF orientation", e);
    }

Photographing a check, which has OCR characters, returns:

    05-14 11:01:59.131: I/ocr text(18199): rotate 90
    05-14 11:01:59.131: I/ocr text(18199): ocr
    05-14 11:01:59.131: I/ocr text(18199): 4— ‘ ‘
    05-14 11:01:59.131: I/ocr text(18199): \dxfi ‘
    05-14 11:01:59.131: I/ocr text(18199): w man"! no accounv
    05-14 11:01:59.131: I/ocr text(18199): 1’
    05-14 11:01:59.131: I/ocr text(18199): my... «unblm m. mm.
    05-14 11:01:59.131: I/ocr text(18199): :~a
    05-14 11:01:59.131: I/ocr text(18199): «ln.
    05-14 11:01:59.131: I/ocr text(18199): ‘ “w “in. n “h‘m‘
    05-14 11:01:59.131: I/ocr text(18199): mmnwnmw- .; k. '
    05-14 11:01:59.131: I/ocr text(18199): wilt-run”. uni” nl
    05-14 11:01:59.131: I/ocr text(18199): mam.
    05-14 11:01:59.131: I/ocr text(18199): =======================================================================================

Any advice on how to get clean, correct OCR recognition? The device used is a Samsung Galaxy 7".
You can use something like:

    ocrText = ocrText.replaceAll("[^a-zA-Z0-9]+", " ");
    ocrText = ocrText.trim();

which is based on the Tesseract implementation found here: SimpleAndroidOCRActivity.java
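
To try that clean-up outside of Android, here is a minimal sketch in plain Java. The class name OcrTextCleaner and the method clean are made up for illustration and are not part of tess-two. Note that this only strips stray symbols from the result; it does not make the recognition itself more accurate.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of tess-two): post-processes raw OCR output.
final class OcrTextCleaner {

    // One or more characters that are not ASCII letters or digits
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]+");

    private OcrTextCleaner() {
    }

    // Replaces every run of non-alphanumeric characters with a single space,
    // then trims leading and trailing whitespace.
    static String clean(String ocrText) {
        if (ocrText == null) {
            return "";
        }
        return NON_ALPHANUMERIC.matcher(ocrText).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // One of the lines from the log output in the question
        System.out.println(clean("w man\"! no accounv")); // prints "w man no accounv"
    }
}
```

Because the pattern keeps only a-z, A-Z and 0-9, it also removes punctuation such as decimal points and currency signs, so it may need loosening if the text being read contains amounts.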