Converting images to text using Tesseract

I am trying to read all the images in a folder and extract the text from them. I keep getting an error message in the second for loop, for example:

AttributeError: 'numpy.ndarray' object has no attribute 'read'

It seems I cannot access the list img. Any ideas?

# import OpenCV, Numpy, Python image library, Tesseract OCR 
import os 
import cv2 
import numpy 
from PIL import Image 
import pytesseract 
import glob 

#set tesseract path 
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe' 

#read all .jpg images in the specified folder 
img = []  

for i in glob.glob("C:\\Users\\daizhang\\Desktop\\Deloitte Development\\Python\\Reports\\Image\\*.jpg"): 
    n= cv2.imread(i,0) #convert image to grayscale  
    print(i) 
    img.append(n) 


for j in img: 
    im = Image.open(j)  # raises the AttributeError: j is a NumPy array, not a file path 
    text = pytesseract.image_to_string(j, lang='eng') 
    with open("C:\\Users\\daizhang\\Desktop\\Deloitte Development\\Python\\Reports\\Image\\test.txt", "w") as f: 
        f.write(text.encode('utf8'))

Source

2017-10-31 Dai Zhang

'Image.open' is for opening a file and creating a PIL Image object from it. To convert the raw image data in a NumPy array into a PIL Image object, use 'Image.fromarray(raw_image)'.
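As a minimal sketch of the conversion described in this comment: the helper names `array_to_pil` and `ocr_array` are hypothetical, and the synthetic white array stands in for one grayscale image returned by `cv2.imread(path, 0)`. The OCR helper assumes pytesseract and the Tesseract binary are installed, so it is defined but not called here.

```python
import numpy as np
from PIL import Image


def array_to_pil(gray_array):
    """Wrap a 2-D uint8 NumPy array (e.g. from cv2.imread(path, 0)) in a PIL Image."""
    return Image.fromarray(gray_array)


def ocr_array(gray_array, lang="eng"):
    """Run Tesseract on a NumPy image (hypothetical helper; needs Tesseract installed)."""
    import pytesseract  # imported lazily so the conversion above works without Tesseract
    return pytesseract.image_to_string(array_to_pil(gray_array), lang=lang)


# Stand-in for one grayscale image: 40 rows x 100 columns, all white.
fake_gray = np.full((40, 100), 255, dtype=np.uint8)

pil_image = array_to_pil(fake_gray)
print(pil_image.mode, pil_image.size)  # PIL reports size as (width, height)
```

In the question's loop, this would mean replacing the 'Image.open(j)' call with 'Image.fromarray(j)' and passing the resulting PIL image to 'image_to_string'.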

I am on Mac OS X, but you can adapt this code to the Windows directory path of your files.

import os 
from os import path 
from glob import glob 
from pytesseract import image_to_string 
from PIL import Image, ImageEnhance, ImageFilter 

def enhance_img(filename, newfilename): 
    # Enhance the image and save it under a new name 
    im = Image.open(filename) 
    im = im.filter(ImageFilter.MedianFilter()) 
    enhancer = ImageEnhance.Contrast(im) 
    im = enhancer.enhance(2) 
    im = im.convert('1') 
    im.save(newfilename) 

def convert_img(filename): 
    image = Image.open(filename) 

    # Convert image to text and append it to parsing.txt 
    with open('parsing.txt', 'a') as out: 
        out.write(image_to_string(image)) 

def find_ext(dir, ext): 
    return glob(path.join(dir, "*.{}".format(ext))) 

# to change directory first, use something like: 
# os.chdir('/path/to/images') 
filenames = find_ext("", "png") 

for file in filenames: 
    # convert image to text 
    convert_img(file)

If you want to enhance the images, add the following block and adjust the code above to loop over the new file names.

def enhance_img(filename, newfilename): 
    # Enhance the image and save it under a new name 
    im = Image.open(filename) 
    im = im.filter(ImageFilter.MedianFilter()) 
    enhancer = ImageEnhance.Contrast(im) 
    im = enhancer.enhance(2) 
    im = im.convert('1') 
    im.save(newfilename) 

for file in find_ext("", "png"): 
    # enhance the image if needed, e.g. scan.png -> scan_1.png 
    newfilename = path.splitext(file)[0] + '_1.png' 
    enhance_img(file, newfilename)
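If you would rather not write intermediate files, the same three steps can be applied in memory and the result passed straight to 'image_to_string'. This is only a sketch: the function name `enhance_in_memory` and the synthetic sample image are assumptions made for illustration, not part of the code above.

```python
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter


def enhance_in_memory(im):
    """Apply a median filter, a contrast boost and a 1-bit conversion; return the new image."""
    im = im.filter(ImageFilter.MedianFilter())   # default 3x3 median filter to reduce noise
    im = ImageEnhance.Contrast(im).enhance(2)    # factor 2 increases the contrast
    return im.convert('1')                       # 1-bit black-and-white image


# Synthetic sample: light grey background with a darker rectangle.
sample = Image.new('RGB', (60, 30), (200, 200, 200))
ImageDraw.Draw(sample).rectangle([10, 5, 50, 25], fill=(40, 40, 40))

enhanced = enhance_in_memory(sample)
print(enhanced.mode, enhanced.size)
```

With a real file you would call something like image_to_string(enhance_in_memory(Image.open(file))) inside the loop, assuming Tesseract is installed.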

Source

2018-01-28 20:18:52 baodev
