Is there a better way to handle file encodings in Python?

I have various text files in unknown encodings. Right now I first open each file in binary mode to detect its encoding, and then open it again with that encoding. Is there a better way to handle file encodings in Python?

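import chardet
from os.path import splitext

# f: path of one input file; extractText: the asker's own function (not shown)
with open(f, 'rb') as bf:
    code = chardet.detect(bf.read())['encoding']  # may be None if detection fails
print(f + ' : ' + str(code))
# Second open of the same file, this time as text in the detected encoding
with open(f, 'r', encoding=code) as source:
    texts = extractText(source.readlines())
# "with" closes each file automatically, so explicit close() calls are unnecessary
with open(splitext(f)[0] + '_texts.txt', 'w', encoding='utf-8') as dist:
    dist.write('\n\n'.join('\n'.join(x) for x in texts))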

Is there a way to solve this problem?


2017-09-13 Jacob
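The code in the question relies on chardet to guess each file's encoding. When the likely encodings are known in advance, a common stdlib-only alternative is to try a short list of candidates in order. This is a swapped-in technique, not one proposed in the thread; the function name `decode_with_fallback` and the candidate list (BOM-aware UTF-8, then cp932, i.e. Windows Shift_JIS, then latin-1) are assumptions chosen for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical default order: strict UTF-8 (BOM-aware) first, then Windows
# Shift_JIS, then latin-1, which accepts any byte and so always succeeds.
DEFAULT_CANDIDATES = ("utf-8-sig", "cp932", "latin-1")


def decode_with_fallback(data: bytes,
                         encodings: Sequence[str] = DEFAULT_CANDIDATES) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes `data` cleanly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    # Only reachable when the list does not end with a catch-all such as latin-1
    raise ValueError("none of the candidate encodings could decode the data")
```

For example, `decode_with_fallback("日本語".encode("cp932"))` returns `('日本語', 'cp932')`. A clean decode only shows the bytes are valid in that encoding, not that the guess is right: latin-1 accepts every byte sequence and legacy codecs such as cp932 accept many, so the strictest candidates should come first.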

Where do those files come from? –

Take a look at this link; it may help with what you are looking for. https://stackoverflow.com/questions/18263136/how-to-deal-with-unknown-encoding-when-scraping-webpages –

@EricDuminil They are files for various different programs. There is no way to guess the encoding. – Jacob

Instead of reopening and rereading the file, you can just decode the bytes you have already read:

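import chardet

# Read the raw bytes once
with open(filename, 'rb') as fileobj:
    binary = fileobj.read()
# Guess the encoding from those bytes, then decode them in memory (no second open)
probable_encoding = chardet.detect(binary)['encoding']
text = binary.decode(probable_encoding)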


2017-09-13 16:47:11 user2357112
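Combining the answer's read-once idea with the question's goal, the sketch below reads each file exactly once, decodes the bytes in memory, and writes a UTF-8 copy named like the question's `_texts.txt` output. It assumes chardet is optional (used only if installed), tries strict UTF-8 before asking chardet, and falls back to latin-1; the names `read_text` and `save_as_utf8` are hypothetical, and the question's `extractText` step is left out because its code is not shown.

```python
import os

try:
    import chardet  # third-party detector used in the thread; optional here
except ImportError:
    chardet = None


def read_text(path):
    """Read `path` once and return (text, encoding_used)."""
    with open(path, "rb") as fileobj:
        data = fileobj.read()  # the only read of the file
    # Strict UTF-8 (BOM-aware) first: non-UTF-8 data rarely decodes cleanly by accident.
    try:
        return data.decode("utf-8-sig"), "utf-8-sig"
    except UnicodeDecodeError:
        pass
    if chardet is not None:
        guess = chardet.detect(data)["encoding"]  # None when chardet cannot tell
        if guess:
            try:
                # errors="replace" turns a wrong guess into U+FFFD marks instead of a crash
                return data.decode(guess, errors="replace"), guess
            except LookupError:
                pass  # chardet named a codec this Python does not provide
    # Last resort: latin-1 maps every byte to a character, so it never fails.
    return data.decode("latin-1"), "latin-1"


def save_as_utf8(path):
    """Write <name>_texts.txt next to `path`, re-encoded as UTF-8."""
    text, encoding = read_text(path)
    out_path = os.path.splitext(path)[0] + "_texts.txt"
    # bytes.decode() does not translate line endings, so newline="" writes them unchanged
    with open(out_path, "w", encoding="utf-8", newline="") as dist:
        dist.write(text)
    return out_path, encoding
```

Trying UTF-8 before the detector is a design choice, not something the answer prescribes: for short files a statistical guess can be wrong, while a strict UTF-8 decode either succeeds or fails outright.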
