String value decode utf-8

I want to decode a string value as utf-8, but the value does not change. Here is my code:

self.textEdit_3.append(str(self.new_header).decode("utf-8") + "\n")

The result looks like this:

[screenshot]

The original output value is:

['matchkey', 'a', 'b', 'd', '안녕'] # 안녕 is Korean Language

I tried changing the default encoding from ASCII to utf-8 (Unicode) so that the value would be decoded. I added the following code in the first lines:

import sys 
reload(sys) 
sys.setdefaultencoding('utf-8')

Why does the string value not change?

Source

2017-11-28 Layla

You can fix your code like this:

header = str(self.new_header).decode('string-escape').decode("utf-8") 
self.textEdit_3.append(header + "\n")

You don't need the setdefaultencoding lines.

Explanation:

The original value is a list containing byte strings:

>>> value = ['matchkey', 'a', 'b', 'd', '안녕'] 
>>> value 
['matchkey', 'a', 'b', 'd', '\xec\x95\x88\xeb\x85\x95']

If you convert this list with str, it uses repr on every list element:

>>> strvalue = str(value) 
>>> strvalue 
"['matchkey', 'a', 'b', 'd', '\\xec\\x95\\x88\\xeb\\x85\\x95']"

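The point that str on a list applies repr to each element can be checked without any encoding involved. The following is only an illustrative sketch: the class name `Tagged` and its two return strings are made up for the demonstration and do not appear in the question.

```python
class Tagged(object):
    """Hypothetical class whose str() and repr() forms differ visibly."""

    def __str__(self):
        return "str-form"

    def __repr__(self):
        return "repr-form"


item = Tagged()
direct = str(item)      # str() on the object itself calls __str__
in_list = str([item])   # str() on a list formats each element with repr()

print(direct)
print(in_list)
```

For a byte string, the repr form is the escaped one, which is why the `\x..` sequences show up as literal backslashes once the whole list is passed through str.
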
The repr parts can be decoded like this:

>>> strvalue = strvalue.decode('string-escape') 
>>> strvalue 
"['matchkey', 'a', 'b', 'd', '\xec\x95\x88\xeb\x85\x95']"

This can now be decoded to Unicode like this:

>>> univalue = strvalue.decode('utf-8') 
>>> univalue 
u"['matchkey', 'a', 'b', 'd', '\uc548\ub155']" 
>>> print univalue 
['matchkey', 'a', 'b', 'd', '안녕']
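
If the list-style formatting with brackets and quotes is not required, a possible alternative is to skip str on the list entirely and decode each element on its own. This is a sketch of a different approach, not the fix given above; the helper name `join_header` and the `", "` separator are assumptions chosen for illustration.

```python
def join_header(items, encoding="utf-8"):
    """Decode byte-string items individually, then join them into one text."""
    decoded = [
        # Only byte strings need decoding; text items are passed through.
        item.decode(encoding) if isinstance(item, bytes) else item
        for item in items
    ]
    return u", ".join(decoded)


# Same data as in the question, written as explicit byte strings.
header = [b"matchkey", b"a", b"b", b"d", b"\xec\x95\x88\xeb\x85\x95"]
text = join_header(header)
```

Because no repr is involved, there are no escape sequences to undo. Under these assumptions the widget call would become something like `self.textEdit_3.append(join_header(self.new_header) + "\n")`, with the output shown as a comma-separated line rather than as a Python list.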

PS: Regarding the problem of reading files with a utf-8 BOM, please test this script:

# -*- coding: utf-8 -*- 

import os, codecs, tempfile 

text = u'a,b,d,안녕' 
data = text.encode('utf-8-sig') 

print 'text:', repr(text), len(text) 
print 'data:', repr(data), len(data) 

f, path = tempfile.mkstemp() 
print 'write:', os.write(f, data) 
os.close(f) 

with codecs.open(path, 'r', encoding='utf-8-sig') as f: 
    string = f.read() 
    print 'read:', repr(string), len(string), string == text
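
To see what the `utf-8-sig` codec changes compared with plain `utf-8`, the same data can be decoded both ways in memory. This is a minimal sketch that assumes the file content matches the test string used in the script; it does not touch the file system.

```python
import codecs

text = u"a,b,d,\uc548\ub155"
data = text.encode("utf-8-sig")       # UTF-8 bytes with a leading BOM

has_bom = data.startswith(codecs.BOM_UTF8)
plain = data.decode("utf-8")          # BOM is kept as a leading u"\ufeff"
stripped = data.decode("utf-8-sig")   # BOM is removed while decoding
```

With plain `utf-8` the decoded text is one character longer and no longer compares equal to the original, which is a common cause of a first column or header name that silently fails to match.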

Source

2017-11-28 19:02:53 ekhumoro

Comments are not for extended discussion; this conversation has been moved to chat (http://chat.stackoverflow.com/rooms/160658/discussion-on-answer-by-ekhumoro-string-value-decode-utf-8).
