Tesseract OCR: German special characters

I am using Tesseract OCR in C++ to read German PNG images, and I have run into problems with some special characters such as ß, ä, ö and ü.

How can I get these read correctly? Do I need to train Tesseract?

This is the part of the original image read by tesseract

tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

UPDATE

SetConsoleOutputCP(1252); // changed to German
SetConsoleCP(1252); // changed to German
wcout << "ÄÖÜ?ß" << endl;
// Open input image with leptonica library
Pix *image = pixRead("D:\\Images\\Document.png");
api->Init("D:\\TesseractBeispiele\\Tessaractbeispiel\\Tessaractbeispiel\\tessdata", "deu");
api->SetImage(image);
api->SetVariable("save_blob_choices", "T");
api->SetRectangle(1000, 3000, 9000, 9000);
api->Recognize(NULL);
// Get OCR result
wcout << api->GetUTF8Text();

After changing the code as shown below UPDATE, the hard-coded umlauts are displayed correctly, but the text read from the image is still not correct. What do I need to change?

The Tesseract version is 3.0.2 and the Leptonica version is 1.68.

Source

2016-04-08 Cazzador

Tesseract can recognize Unicode characters. Your console may not be configured to display them.

What encoding/code page is cmd.exe using?

Unicode characters in Windows command line - how?

Source

2016-04-08 13:22:31 nguyenq
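
The UPDATE in the question sets the console to code page 1252, while `GetUTF8Text()` returns UTF-8, so the two encodings do not match. One possible workaround is to convert the UTF-8 bytes to Windows-1252 before printing. The sketch below assumes that only characters in the Latin-1 range (such as ä, ö, ü and ß) matter. The helper name `utf8ToCp1252` is hypothetical and not part of Tesseract.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-8 to Windows-1252 for the code points
// where Windows-1252 matches Latin-1 (U+0000..U+007F and U+00A0..U+00FF).
// Every other character, and every malformed sequence, becomes '?'.
std::string utf8ToCp1252(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {                      // plain ASCII, copy unchanged
            out += static_cast<char>(c);
            ++i;
            continue;
        }
        if ((c & 0xE0) == 0xC0 && i + 1 < utf8.size()) {
            const unsigned char c2 = static_cast<unsigned char>(utf8[i + 1]);
            if ((c2 & 0xC0) == 0x80) {       // well-formed two-byte sequence
                const unsigned cp = ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
                out += (cp >= 0xA0 && cp <= 0xFF)
                           ? static_cast<char>(static_cast<unsigned char>(cp))
                           : '?';
                i += 2;
                continue;
            }
        }
        // Anything else: emit one '?' and skip the trailing continuation bytes.
        out += '?';
        ++i;
        while (i < utf8.size() &&
               (static_cast<unsigned char>(utf8[i]) & 0xC0) == 0x80) {
            ++i;
        }
    }
    return out;
}
```

The converted string would then be written with the narrow `std::cout` while the console code page is 1252. Characters outside the Latin-1 range, such as the euro sign, are lost with this approach, which is why switching the console to UTF-8 may be the better option.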

The console is almost certainly not configured for UTF-8. – MSalters

How do I configure the console for UTF-8? – Cazzador
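
As a rough sketch of one way to do this on Windows: the console output code page can be switched to UTF-8 with `SetConsoleOutputCP(CP_UTF8)`, and the UTF-8 bytes are then written through the narrow `std::cout` rather than `wcout` (a wide stream widens each byte of a `char*` separately, which splits multi-byte characters). The helper names below are hypothetical. Whether the glyphs actually appear also depends on the console font, which this code does not control.

```cpp
#include <iostream>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch console output to UTF-8 (code page 65001).
// On non-Windows systems it does nothing and reports success.
bool enableUtf8Console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Hypothetical helper: write UTF-8 bytes unchanged through a narrow stream.
void printUtf8(std::ostream& out, const std::string& utf8) {
    out << utf8 << '\n';
}
```

With Tesseract, one would copy the result of `api->GetUTF8Text()` into a `std::string`, free the original buffer with `delete[]`, and pass the string to `printUtf8(std::cout, ...)` after calling `enableUtf8Console()`.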

I don't know how to detect German words in an image in a Windows environment, but I know how to do it on Linux. The following code may give you some idea.

/*
 * word_OCR.cpp
 *
 * Created on: Jun 23, 2016
 *  Author: root
 */

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <cstdio>
#include <iostream>

using namespace std;

int main(int argc, char **argv)
{
    if (argc < 2) {
        cout << "Usage: word_OCR <image>\n";
        return 1;
    }

    Pix *image = pixRead(argv[1]);

    if (image == 0) {
        cout << "Cannot load input file!\n";
        return 1;
    }

    tesseract::TessBaseAPI tess;
    // Instead of passing "eng", pass "deu" to load the German language data.
    if (tess.Init("/usr/share/tesseract/tessdata", "deu")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        pixDestroy(&image);
        return 1;
    }

    tess.SetImage(image);
    tess.Recognize(0);

    tesseract::ResultIterator *ri = tess.GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_WORD;

    if (ri != 0)
    {
        do {
            // GetUTF8Text returns a UTF-8 string allocated with new[].
            const char *word = ri->GetUTF8Text(level);

            if (word != 0) {
                cout << word << endl;
                delete[] word;
            }

        } while (ri->Next(level));

        // The iterator is a single object, so use delete rather than delete[].
        delete ri;
    }

    tess.End();
    pixDestroy(&image);
    return 0;
}

One thing to take care of: pass an image with good resolution. Only then does it work well.

Source

2016-06-24 07:40:21

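If you need more accuracy, you can pass an Otsu-thresholded image to pixRead(). You are currently passing a normal image to pixRead(); pass an Otsu-thresholded image instead. I have developed an algorithm for that. Let me know if anyone wants it. –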
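
The commenter's own algorithm is not shown in the thread. For illustration only, here is a minimal sketch of the standard Otsu method on an 8-bit grayscale buffer: it picks the threshold that maximizes the between-class variance of the histogram. The function names `otsuThreshold` and `binarize` are hypothetical, and the code assumes the image has already been loaded and converted to one byte per pixel.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Returns the Otsu threshold t: values <= t form one class, values > t the other.
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    std::array<std::uint64_t, 256> hist{};
    for (std::uint8_t v : gray) ++hist[v];

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * static_cast<double>(hist[i]);

    double sumLow = 0.0, weightLow = 0.0, bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += static_cast<double>(hist[t]);
        sumLow += t * static_cast<double>(hist[t]);
        if (weightLow == 0.0) continue;          // no pixels in the low class yet
        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;            // all pixels are in the low class
        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        const double between = weightLow * weightHigh * diff * diff;
        if (between > bestVar) {                 // keep the first maximum
            bestVar = between;
            best = t;
        }
    }
    return best;
}

// Maps every pixel above the threshold to 255 and all others to 0.
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray, int t) {
    std::vector<std::uint8_t> out;
    out.reserve(gray.size());
    for (std::uint8_t v : gray) out.push_back(v > t ? 255 : 0);
    return out;
}
```

The binarized buffer could then be saved as an image and read with pixRead() as the comment suggests, or handed to the raw-buffer overload of `TessBaseAPI::SetImage` with one byte per pixel.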
