How to get the URL and title from an <a> tag with BeautifulSoup


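I am writing a script that gets all the links from the divs with class="pntc-txt". From each <a> tag I want the href attribute and the text between the tags, as in <a href="">Something</a>. Later I will take that URL and text and insert them into a database. Here is the code I have so far: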

import urllib.request

from bs4 import BeautifulSoup

# Download the page
sock = urllib.request.urlopen("http://as.com/tag/moto_gp/a/")
htmlSource = sock.read()
sock.close()

soup = BeautifulSoup(htmlSource, "html.parser")

# Print the list of <a> tags found in each matching div
for div in soup.find_all('div', {'class': 'pntc-txt'}):
    a = div.find_all('a')
    print(a)

Source

2016-11-02 Albert

This is fully documented here: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes

Try this:

import requests
from bs4 import BeautifulSoup

srcCode = requests.get("http://as.com/tag/moto_gp/a/")
plainText = srcCode.text

soup = BeautifulSoup(plainText, "html.parser")

for div in soup.find_all('div', {'class': 'pntc-txt'}):
    for each in div.find_all('a'):  # get all elements with 'a' tag
        href = each.get('href')
        print(href)         # print the href attribute
        print(each.string)  # print the text inside the tag
        print(each)         # print the whole tag

Note: I also removed the urllib part for reading the HTML page and used the requests package instead.
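Since the question mentions storing the URL and text in a database afterwards, here is a minimal sketch of that step, run against an inline HTML sample rather than the live page. The helper names `extract_links` and `save_links`, the `links` table, and the use of an in-memory SQLite database are assumptions for illustration, not part of the original post. It uses `get_text()` instead of `.string`, because `.string` returns None when the <a> tag contains nested tags.

```python
import sqlite3
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url=""):
    """Return (url, text) pairs for every <a href> inside div.pntc-txt."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_="pntc-txt"):
        for a in div.find_all("a", href=True):  # skip anchors without href
            url = urljoin(base_url, a["href"])   # resolve relative URLs
            text = a.get_text(" ", strip=True)   # also handles nested tags
            links.append((url, text))
    return links


def save_links(conn, links):
    """Insert the pairs into a hypothetical `links` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT, title TEXT)")
    conn.executemany("INSERT INTO links (url, title) VALUES (?, ?)", links)
    conn.commit()


# Inline sample standing in for the downloaded page
sample = """
<div class="pntc-txt"><a href="/a/1.html">First <b>story</b></a></div>
<div class="other"><a href="/ignored.html">Ignored</a></div>
<div class="pntc-txt"><a href="http://example.com/2.html">Second</a><a>No href</a></div>
"""

links = extract_links(sample, base_url="http://as.com/")
conn = sqlite3.connect(":memory:")  # in-memory database, assumed for the demo
save_links(conn, links)
rows = conn.execute("SELECT url, title FROM links").fetchall()
print(rows)
```

With a real page, the `sample` string would be replaced by `requests.get(...).text`, and the connection would point at whatever database is actually in use. The parameterized `?` placeholders keep scraped text from being interpreted as SQL.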

Source

2016-11-02 13:03:48 trahane

Thank you very much! That worked very well :) – Albert
