Python BeautifulSoup: MySQL storage and interaction

I am scraping a table. The returned string has a 1 at the start, and I am having trouble passing it along. I tried slicing it with [0:]. I would like to skip that value, or skip it and take the second value, which is the id.

In addition, when I try to format the items returned from the table for storage, I get an "index out of range" error. I am using the store() function below.

import requests 
from bs4 import BeautifulSoup 
import MySQLdb 

#mysql portion 
mydb = MySQLdb.connect(host='****', 
    user= '****', 
    passwd='****', 
    db='****') 
cur = mydb.cursor() 
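def store(id, ticker): 
    # Use bare %s placeholders: MySQLdb quotes and escapes the values itself, 
    # so wrapping them in \"...\" would store the quote characters too. 
    cur.execute('INSERT IGNORE INTO TEST (id, ticker) VALUES (%s, %s)', (id, ticker)) 
    cur.connection.commit() 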

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,24,25,63,64,65,66,67' 
html = requests.get(base_url) 
soup = BeautifulSoup(html.content, "html.parser") 
main_div = soup.find('div', attrs = {'id':'screener-content'}) 
table = main_div.find('table') 
sub = table.findAll('tr') 
cells = sub[5].findAll('td') 

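for cell in cells: 
    link = cell.a 
    if link is not None: 
        link = link.get_text() 
        # link is now a str, so link[0] and link[1] are its first two 
        # characters, not two separate table fields 
        id = link[0] 
        ticker = link[1] 
        store(id, ticker) 
    print(link)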
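A likely cause of the IndexError, sketched in plain Python without fetching the page: in the loop above each link is the text of a single cell, so for the first column it is just the string "1". Indexing a string returns characters, so link[0] is "1" and link[1] does not exist. Collecting the link texts of the whole row into a list first, and then indexing that list, gives id and ticker. The helper row_to_record and the sample row values are hypothetical, not real finviz data.

```python
from typing import List, Optional, Tuple


def row_to_record(link_texts: List[str]) -> Optional[Tuple[str, str]]:
    """Return (id, ticker) from the link texts of one table row.

    link_texts is assumed to be what
    [a.get_text() for a in row.find_all('a')] would return.
    Rows with fewer than two links (e.g. a header row) are skipped.
    """
    if len(link_texts) < 2:
        return None
    return link_texts[0], link_texts[1]


# The question's loop sees one cell at a time, so link is a plain str.
link = "1"
try:
    second = link[1]  # second *character* of "1", which does not exist
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range

# Hypothetical link texts for one screener row.
sample_row = ["1", "ABCD", "Example Corp", "Technology", "Software"]
print(row_to_record(sample_row))  # ('1', 'ABCD')
print(row_to_record(["No."]))     # None
```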


2017-02-01 Derek_P

What "return string"? Where and how do you get this "return string"? – furas

You define link = [] and later overwrite it with link = cell.a. Then link = link.get_text() gives you a string, yet you do id = link[0] and ticker = link[1]. What are you trying to do? – furas

I removed the redundant link = []. If I comment out the list-item lines, the link.a data is printed. I was collecting the data into list items so that I could store it. –

I'm not sure what you really want to do, but this works for me:

for row in rows: 
    columns = row.find_all('a') 

    id_ = columns[0].get_text() 
    ticker = columns[1].get_text() 
    company = columns[2].get_text() 
    sector = columns[3].get_text() 
    industry = columns[4].get_text() 

    print(id_, ticker, company, sector, industry)
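If you want the five values together rather than as loose variables, one option (a sketch, not part of the original answer) is a small NamedTuple built from the cell texts, with a length check so rows that lack enough cells are skipped instead of raising IndexError. The Stock type and the parse_row helper are hypothetical names, and the sample row is made up.

```python
from typing import List, NamedTuple, Optional


class Stock(NamedTuple):
    id_: str
    ticker: str
    company: str
    sector: str
    industry: str


def parse_row(texts: List[str]) -> Optional[Stock]:
    """Build a Stock from the first five cell texts of a row.

    texts is assumed to be [a.get_text() for a in row.find_all('a')].
    Returns None when the row has fewer than five cells.
    """
    n = len(Stock._fields)
    if len(texts) < n:
        return None
    return Stock(*texts[:n])


row = ["1", "ABCD", "Example Corp", "Technology", "Software", "USA"]
stock = parse_row(row)
print(stock.ticker, stock.sector)  # ABCD Technology
print(parse_row(["1", "ABCD"]))    # None
```

Named fields make the later store step read as stock.ticker instead of columns[1], so a change in column order only has to be fixed in one place.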

Full working code:

import requests
from bs4 import BeautifulSoup

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,24,25,63,64,65,66,67'

html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")

rows = soup.find_all('tr', class_=["table-dark-row-cp", "table-light-row-cp"])

for row in rows:
    columns = row.find_all('td')

    id_ = columns[0].a.get_text()
    ticker = columns[1].a.get_text()
    company = columns[2].a.get_text()
    sector = columns[3].a.get_text()
    industry = columns[4].a.get_text()

    print(id_, ticker, company, sector, industry)
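To tie this back to the question's store() step: the scraped values should be passed to execute() as parameters with bare placeholders, and the driver handles the quoting. The sketch below uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for MySQL, so it runs without a server; sqlite3 uses ? placeholders and INSERT OR IGNORE, whereas MySQLdb uses %s and INSERT IGNORE. The table layout (a test table with id as an integer primary key) and the sample values are assumptions.

```python
import sqlite3
from typing import Iterable, Tuple


def store_rows(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]]) -> None:
    """Insert (id, ticker) pairs, silently skipping ids that already exist."""
    # Bare placeholders: the driver quotes and escapes each value itself.
    # With MySQLdb the equivalent statement would be
    #   'INSERT IGNORE INTO TEST (id, ticker) VALUES (%s, %s)'
    conn.executemany("INSERT OR IGNORE INTO test (id, ticker) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, ticker TEXT)")

# Hypothetical scraped (id_, ticker) text pairs; the second id 1 is ignored.
scraped = [("1", "ABCD"), ("2", "EFGH"), ("1", "ZZZZ")]
store_rows(conn, ((int(i), t) for i, t in scraped))

print(conn.execute("SELECT id, ticker FROM test ORDER BY id").fetchall())
# [(1, 'ABCD'), (2, 'EFGH')]
```

Converting the id text with int() before inserting keeps the column numeric, so the leading "1" from the No. column is stored as the row id rather than being a stray character.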

You can also use CSS selectors:

rows = soup.select('#screener-content table[bgcolor="#d3d3d3"] tr[class]')

or

rows = soup.select('#screener-content table[bgcolor="#d3d3d3"] tr') 
# skip first row with headers 
rows = rows[1:]


2017-02-02 00:25:07 furas

Thanks for the input; I hadn't considered using the "table-dark-row-cp" class. Can I use "and" to get both the "table-dark..." and "table-light..." items? –

Use a list: class_=["table-dark-row-cp", "table-light-row-cp"] – furas

Of course. I was just working on that. Thanks a bunch! –
