2017-04-13 4 views
-1

Scrape usernames from a forum using Python

I'm hoping to get some help from you! I want to scrape usernames from a forum with Python, but I couldn't figure out how. Here is part of the markup:

Part 1

<td class="alt2" title="reply: 11,view: 1,097"> 
    <div class="smallfont" style="text-align:right; white-space:nowrap"> 
    2017-03-28 <span class="time">23:44</span><br> 

    <a href="member.php?find=lastposter&amp;t=1907777" rel="nofollow">username</a> <a href="showthread.php?p=9575713#post9575713"><img class="inlineimg" src="http://s.bbkz.net/forum/images/buttons_style/tc_2/lastpost.gif" alt="last" title="last" border="0"></a> 
    </div> 
</td> 

Part 2

<div class="smallfont"> 
    <span style="cursor:pointer" onclick="window.open('member.php?u=353562', '_self')">username</span> 
</div> 

Also, the forum pages I want to scrape have URLs of this form: http://www.example.com/forum/forumdisplay.php?f=148&order=desc&page=3

Could you help me get the "username" out of this markup, across different pages, using Python?

Thank you!

[Edit - Time sleep added] Should it be like this?

import requests 
from bs4 import BeautifulSoup 
import time 

url = 'http://www.example.com/forum/forumdisplay.php?f=148&order=desc&page=3' 

html_source = requests.get(url).text 

soup = BeautifulSoup(html_source, 'html.parser') 

a_tags = soup.find_all('a') 

for a in a_tags:
    if 'member.php?' in a['href']:
        print(a.text)

time.sleep(10) 

These are the error messages:

Traceback (most recent call last): 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connection.py", line 138, in _new_conn 
(self.host, self.port), self.timeout, **extra_kw) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 98, in create_connection 
raise err 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\util\connection.py", line 88, in create_connection 
sock.connect(sa) 
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 594, in urlopen 
chunked=chunked) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 361, in _make_request 
conn.request(method, url, **httplib_request_kw) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1106, in request 
self._send_request(method, url, body, headers) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1151, in _send_request 
self.endheaders(body) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1102, in endheaders 
self._send_output(message_body) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 934, in _send_output 
self.send(msg) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 877, in send 
self.connect() 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connection.py", line 163, in connect 
conn = self._new_conn() 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connection.py", line 147, in _new_conn 
self, "Failed to establish a new connection: %s" % e) 
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x029131F0>:  Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\adapters.py", line 423, in send 
timeout=timeout 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 643, in urlopen 
_stacktrace=sys.exc_info()[2]) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\packages\urllib3\util\retry.py", line 363, in increment 
raise MaxRetryError(_pool, url, error or ResponseError(cause)) 
requests.packages.urllib3.exceptions.MaxRetryError: 
HTTPConnectionPool(host='www.example.com', port=80): Max retries exceeded with url: /forum/forumdisplay.php?f=148&order=desc&page=3 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x029131F0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',)) 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
File "C:/Users/user/PycharmProjects/untitled/backpackertw_v1.py", line 6, in <module> 
html_source = requests.get(url).text 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\api.py", line 70, in get 
return request('get', url, params=params, **kwargs) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\api.py", line 56, in request 
return session.request(method=method, url=url, **kwargs) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\sessions.py", line 488, in request 
resp = self.send(prep, **send_kwargs) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\sessions.py", line 609, in send 
r = adapter.send(request, **kwargs) 
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\requests\adapters.py", line 487, in send 
raise ConnectionError(e, request=request) 
requests.exceptions.ConnectionError: 
HTTPConnectionPool(host='www.example.com', port=80): Max retries exceeded with url: /forum/forumdisplay.php?f=148&order=desc&page=3 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x029131F0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',)) 
+2

You can use beautifulsoup, and Google is always your friend. – anonyXmous

+0

'requests', 'beautifulsoup', google .. –

Answers

0

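Your code would look something like this:

import requests
from bs4 import BeautifulSoup

url = 'http://www.example.com/forum/forumdisplay.php?f=148&order=desc&page=3'

html_source = requests.get(url).text

soup = BeautifulSoup(html_source, 'html.parser')

# href=True skips <a> tags that have no href attribute,
# which would otherwise raise a KeyError on a['href'].
a_tags = soup.find_all('a', href=True)

for a in a_tags:
    # Profile links point at member.php, so their text is the username.
    if 'member.php?' in a['href']:
        print(a.text)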
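
Note that the loop above only looks at <a> tags, while the second snippet in the question keeps the username in a <span> whose onclick handler points at member.php. Here is a minimal sketch that covers both forms. It swaps BeautifulSoup for the standard library's html.parser so that it runs without extra installs. The names UsernameParser and extract_usernames, and the sample usernames alice and bob, are made up for illustration, and the sample markup is a trimmed copy of the question's snippets.

```python
from html.parser import HTMLParser


class UsernameParser(HTMLParser):
    """Collect the text of <a>/<span> elements that point at member.php."""

    def __init__(self):
        super().__init__()
        self.usernames = []
        self._stack = []  # one flag per open <a>/<span>: does it point at member.php?

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "span"):
            return
        attrs = dict(attrs)
        # Part 1 keeps the link in href, Part 2 in an onclick handler.
        target = (attrs.get("href") or "") + (attrs.get("onclick") or "")
        self._stack.append("member.php?" in target)

    def handle_endtag(self, tag):
        if tag in ("a", "span") and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack and self._stack[-1]:
            self.usernames.append(text)


def extract_usernames(html):
    """Return the usernames found in one page of forum markup."""
    parser = UsernameParser()
    parser.feed(html)
    parser.close()
    return parser.usernames


# Trimmed copy of the two snippets from the question (usernames are hypothetical).
SAMPLE = """
<td class="alt2">
  <div class="smallfont">
    2017-03-28 <span class="time">23:44</span><br>
    <a href="member.php?find=lastposter&amp;t=1907777" rel="nofollow">alice</a>
    <a href="showthread.php?p=9575713#post9575713"><img class="inlineimg" alt="last"></a>
  </div>
</td>
<div class="smallfont">
  <span style="cursor:pointer" onclick="window.open('member.php?u=353562', '_self')">bob</span>
</div>
"""

print(extract_usernames(SAMPLE))
```

With BeautifulSoup the same two conditions would be a check on each <a> tag's href plus a check on each <span> tag's onclick attribute. This sketch assumes well-nested markup, since it tracks <a> and <span> on a single stack.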

Next, to apply this to several more pages, use a loop and build each URL:

i.e.:

for i in range(10):
    url = 'http://www.example.com/forum/forumdisplay.php?f=148&order=desc&page={}'.format(i)
    ###
    # insert the rest of your code here
    ###
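
To make the loop a little more concrete, here is a rough sketch of one way to put the pieces together, including where a time.sleep call would go (between requests, not once after the loop). The function names page_urls and scrape_pages, the fetch and parse parameters, and the 10-second default delay are assumptions made for illustration. The host is still the placeholder www.example.com from the question.

```python
import time
from urllib.parse import urlencode

# Placeholder host taken from the question; replace it with the real forum.
BASE_URL = "http://www.example.com/forum/forumdisplay.php"


def page_urls(first_page, last_page, forum_id=148):
    """Return the listing URL for every page from first_page to last_page inclusive."""
    urls = []
    for page in range(first_page, last_page + 1):
        query = urlencode([("f", forum_id), ("order", "desc"), ("page", page)])
        urls.append("{}?{}".format(BASE_URL, query))
    return urls


def scrape_pages(urls, fetch, parse, delay=10):
    """Fetch each URL in turn, parse it, and pause between requests.

    fetch(url) must return the page's HTML as a string.
    parse(html) must return a list of usernames.
    """
    results = []
    for index, url in enumerate(urls):
        if index:  # sleep between requests, not before the first one
            time.sleep(delay)
        try:
            html = fetch(url)
        except OSError as exc:  # requests' exceptions derive from OSError
            print("skipping {}: {}".format(url, exc))
            continue
        results.extend(parse(html))
    return results
```

With requests installed, fetch could be something like lambda url: requests.get(url, timeout=10).text, and parse could wrap the BeautifulSoup loop above so that it returns the usernames instead of printing them. Passing the two steps in as functions also lets you try the loop against saved HTML before pointing it at the live site.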
+0

Thank you. But I got an error message like this; please see above. – Jasonm4432

+0

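I saw your edit... look at this part: 'TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond' - you are either accessing the wrong URL or not accessing it correctly. –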

+0

Also, look at the last part, where you get this message: 'requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.example.com', port=80):'. You need to replace 'host='www.example.com'' with the correct host. –
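
One way to confirm that the host is the problem, before running the whole scraper, is to test whether a plain TCP connection to it succeeds. The sketch below uses only the standard library. The helper name can_connect and the 5-second default timeout are made up for illustration.

```python
import socket
from urllib.parse import urlsplit


def can_connect(url, timeout=5):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False
```

If can_connect returns False for the forum URL, then requests.get is likely to fail in the same way, and the hostname or the network setup (proxy, firewall) is the thing to check first.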
