I'm trying to read data from a .txt file, find the 30 most common words, and print them. However, whenever I read my .txt file, I get this error:
"UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 338: ordinal not in range(128)"
Here is my code:
filename = 'wh_2015_national_security_strategy_obama.txt'
#grabs the year named in the file (characters 3-6 of 'wh_2015_...')
year = filename[3:7]
ecount = 30
#opens the file and reads it
file = open(filename,'r').read() #THIS IS WHERE THE ERROR IS
#counts the characters, then counts the lines, removes the non-word characters, splits the text into a list and changes it all to lower case.
numchar = len(file)
numlines = file.count('\n')
file = file.replace(",","").replace("'s","").replace("-","").replace(")","")
words = file.lower().split()
dictionary = {}
#this is a set of all the words to not count for the most commonly used.
dontcount = {"the", "of", "in", "to", "a", "and", "that", "we", "our", "is", "for", "at", "on", "as", "by", "be", "are", "will","this", "with", "or",
    "an", "-", "not", "than", "you", "your", "but","it","a","and", "i", "if","they","these","has","been","about","its","his","no",
    "because","when","would","was", "have", "their","all","should","from","most", "were","such","he", "very","which","may","because","--------",
    "had", "only", "no", "one", "--------", "any", "had", "other", "those", "us", "while",
    "..........", "*", "$", "so", "now","what", "who", "my","can", "who","do","could", "over", "-",
    "...............","................", "during","make","************",
    "......................................................................", "get", "how", "after",
    "..................................................", "...........................", "much", "some",
    "through","though","therefore","since","many", "then", "there", "–", "both", "them", "well", "me", "even", "also", "however"}
for w in words:
    if w not in dontcount:
        if w in dictionary:
            dictionary[w] += 1
        else:
            dictionary[w] = 1
num_words = sum(dictionary[w] for w in dictionary)
#This sorts the dictionary and makes it so that the most popular is at the top.
x = [(dictionary[w],w) for w in dictionary]
x.sort()
x.reverse()
#This prints out the number of characters, lines, and words (not including stop words).
print(str(filename))
print('The file has ',numchar,' number of characters.')
print('The file has ',numlines,' number of lines.')
print('The file has ',num_words,' number of words.')
#This provides the structure for how the most common words should be printed out
i = 1
for count, word in x[:ecount]:
    print("{0}, {1}, {2}".format(i,count,word))
    i += 1
Possible duplicate of http://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte & http://stackoverflow.com/questions/26619801/unicodedecodeerror-ascii-codec-cant-decode-byte-0x92-in-position-47-ordinal – Jaimes
Rather than the posts linked above, look at the [Python 3 docs](https://docs.python.org/3/library/functions.html#open) for `open`, in particular its `encoding` parameter. In Python 2, the "new" version of `open` is [`io.open`](https://docs.python.org/2/library/io.html#io.open). PS: that byte is likely the non-standard (Microsoft) right single quotation mark, which is frequently misused as a "curly" apostrophe.
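A minimal sketch of that suggestion, assuming the file was saved as Windows-1252 (`cp1252`), where byte 0x92 maps to the right single quotation mark (U+2019). The sample text and the temporary file are made up for illustration; only the `encoding` argument to `open` matters here, and the real file may turn out to use a different encoding.

```python
import os
import tempfile

# Hypothetical sample: 0x92 is the Windows-1252 right single quotation mark.
raw = b"America\x92s security strategy\n"

# Write the raw bytes to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Forcing ASCII reproduces the kind of error shown in the question.
    ascii_error = None
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError as exc:
        ascii_error = exc

    # Naming the encoding explicitly lets the same bytes decode cleanly.
    # cp1252 is an assumption about how the file was saved.
    with open(path, "r", encoding="cp1252") as f:
        text = f.read()
finally:
    os.remove(path)

print(ascii_error)
print(repr(text))
```

In the question's code, this would amount to changing the read to something like `open(filename, 'r', encoding='cp1252').read()`, again assuming cp1252 really is the file's encoding.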
**None of the above.** Those questions and answers all deal with Python 2. This is a very simple question about Python 3's TextIOWrapper and its encoding, not about exceptions.
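To illustrate the Python 3 side of this: a text-mode `open` returns an `io.TextIOWrapper`, and when no `encoding` is passed the wrapper typically falls back to a locale-dependent default, which is how `'ascii'` can end up being used. The sketch below only inspects the `encoding` attribute on an empty temporary file; the value printed for the default will vary by platform and Python version.

```python
import io
import locale
import os
import tempfile

# Create an empty temporary file just to have something to open.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    # No encoding argument: the wrapper picks a platform-dependent default.
    with open(path, "r") as f:
        is_wrapper = isinstance(f, io.TextIOWrapper)
        default_encoding = f.encoding

    # Explicit encoding: the wrapper uses exactly what was requested.
    with open(path, "r", encoding="cp1252") as f:
        explicit_encoding = f.encoding
finally:
    os.remove(path)

print(is_wrapper)
print(default_encoding, locale.getpreferredencoding(False))
print(explicit_encoding)
```

If the default printed here is an ASCII variant, any byte above 0x7F in the file will raise `UnicodeDecodeError` unless an encoding is given explicitly.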