Python 2.6 and 3.2 problem with urlopen routines on Windows

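In the past, with Python 2.6, I made heavy use of urllib.urlopen to fetch web page content and then post-process the data I received. Now these routines, and the new ones I am trying to use with Python 3.2, are running into what appears to be a Windows-only problem (perhaps even a Windows 7-only problem). Reading a page in a single call fails with the following message: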

Traceback (most recent call last):
  File "TATest.py", line 5, in <module>
    string = fp.read()
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 553, in _read_chunked
    self._safe_read(2)  # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

The traceback above was produced by the following code, run with Python 3.2.2 (64-bit) on Windows 7:

import urllib.request 

fp = urllib.request.urlopen(URL_string_that_I_use) 

string = fp.read() 
fp.close() 
print(string.decode("utf8"))

Using the following code instead:

import urllib.request 

fp = urllib.request.urlopen(URL_string_that_I_use) 
for Line in fp: 
    print(Line.decode("utf8").rstrip('\n')) 
fp.close()

With that loop I get a fair amount of the web page's content, but the rest of the capture is then thwarted by:

Traceback (most recent call last):
  File "TATest.py", line 9, in <module>
    for Line in fp:
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 545, in _read_chunked
    self._safe_read(2) # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

Trying to read another page yields:

Traceback (most recent call last):
  File "TATest.py", line 11, in <module>
    print(Line.decode("utf8").rstrip('\n'))
  File "d:\python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 21: character maps to <undefined>

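I believe this is a Windows issue, but can Python be made more robust in dealing with whatever is causing it? What is the cause? When I try similar code (the version 2.6 code) on Linux, the problem does not occur. Is there a way around this? I have also posted to the gmane.comp.python.devel newsgroup.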

Source

2011-11-15 Thom Ives

It looks like the pages you are reading are encoded as cp1252.

import urllib.request 

fp = urllib.request.urlopen(URL_string_that_I_use) 

string = fp.read() 
fp.close() 
print(string.decode("cp1252"))
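As a small offline illustration of why the value 0x92 is sensitive to the codec, the following sketch decodes and encodes it under cp1252 and UTF-8. It uses only built-in codecs. The variable names are made up for the example, and it does not claim to reproduce the exact bytes of the page in the question.

```python
raw = b"\x92"  # the single byte 0x92

# Under cp1252 this byte is the right single quotation mark (U+2019).
as_cp1252 = raw.decode("cp1252")

# The same byte on its own is not valid UTF-8.
try:
    raw.decode("utf8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The control character U+0092 has no cp1252 encoding, which matches the
# "character maps to <undefined>" message in the traceback.
try:
    "\x92".encode("cp1252")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# errors="replace" substitutes "?" instead of raising.
replaced = "\x92".encode("cp1252", errors="replace")
```

Since the traceback in the question ends inside the `encode` function of cp1252.py, the error there appears to be raised while printing to a cp1252 console, so choosing how unencodable characters are handled on output may matter as much as the codec used for decoding.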

There are many ways to specify the charset of the content, but using the HTTP header should suffice for most pages:

import urllib.request 

fp = urllib.request.urlopen(URL_string_that_I_use) 

string = fp.read().decode(fp.info().get_content_charset()) 
fp.close() 
print(string)
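Note that `get_content_charset()` returns None when the Content-Type header carries no charset parameter, and `decode(None)` would then fail. A minimal sketch of a fallback follows, assuming UTF-8 is an acceptable default. The helper name `decode_body` and the sample headers are hypothetical. `email.message.Message` is used here only because the object returned by `fp.info()` is a subclass of it, which lets the example run offline.

```python
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the header's charset, or a fallback if absent."""
    charset = headers.get_content_charset() or fallback
    # errors="replace" keeps going when a byte does not fit the charset.
    return raw.decode(charset, errors="replace")


# Sample headers built by hand; a real response would supply fp.info().
with_charset = Message()
with_charset["Content-Type"] = "text/html; charset=cp1252"

without_charset = Message()
without_charset["Content-Type"] = "text/html"

text_a = decode_body(b"It\x92s", with_charset)          # header names cp1252
text_b = decode_body(b"caf\xc3\xa9", without_charset)   # falls back to UTF-8
```

With a live response this would be called as `decode_body(fp.read(), fp.info())`.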

Source

2014-06-30 10:56:25

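Thanks, Cees. I had not looked at this for a while and only now realized that you had answered. I am sure this will be valuable in the future.

@ThomIves You're welcome. If the solution works for you, please mark it as accepted.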
