Python BeautifulSoup parses a table incorrectly

I am trying to parse a table and write its data to a CSV file, but BeautifulSoup does not parse the table correctly.

    import os
    from urllib.parse import urlparse

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    # One list per output column.
    date = []
    pollster = []
    grade = []
    sample = []
    weight = []
    clinton = []
    trump = []
    johnson = []
    leader = []
    adjusted = []

    url = 'http://projects.fivethirtyeight.com/2016-election-forecast/florida/'
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "lxml")

    # Locate the polls table and walk its body rows.
    the_table = soup.find("table", attrs={"class": "t-desktop t-polls"})
    rows = the_table.tbody.find_all('tr')
    for row in rows:
        # Keep only rows that carry a data-created attribute.
        if 'data-created' in row.attrs:
            cols = row.find_all('td')
            text_cols = [ele.text.strip() for ele in cols]
            date.append(text_cols[2])
            pollster.append(text_cols[3])
            grade.append(text_cols[4])
            sample.append(text_cols[5])
            weight.append(text_cols[6])
            clinton.append(text_cols[7])
            trump.append(text_cols[8])
            johnson.append(text_cols[9])
            leader.append(text_cols[10])
            adjusted.append(text_cols[11])

    df = pd.DataFrame(date, columns=['date'])
    df['pollster'] = pollster
    df['grade'] = grade
    df['sample'] = sample
    df['weight'] = weight
    df['clinton'] = clinton
    df['trump'] = trump
    df['johnson'] = johnson
    df['leader'] = leader
    df['adjusted'] = adjusted

    # Name the CSV after the last path segment of the URL (here: florida.csv).
    s = urlparse(url)
    f = os.getcwd() + "/" + s.path.split('/')[-2] + '.csv'
    df.to_csv(f)

The code above is what I am using, and this is the page: http://projects.fivethirtyeight.com/2016-election-forecast/arizona/

It saves the wrong data to the CSV file:

    ,date,pollster,grade,sample,weight,clinton,trump,johnson,leader,adjusted
    0,Aug. 21-27,USC Dornsife/LA Times,,"2,545",LV,44%,44%,,Clinton +1,Clinton +4
    1,Aug. 24-26,Morning Consult,,"2,007",RV,39%,37%,8%,Clinton +2,Clinton +2
    2,Aug. 20-26,USC Dornsife/LA Times,,"2,460",LV,45%,43%,,Clinton +1,Clinton +5
    3,Aug. 19-25,Ipsos,A-,334,LV,50%,43%,,Clinton +7,Clinton +7
    4,Aug. 19-25,Ipsos,A-,500,LV,53%,31%,,Clinton +22,Clinton +22
    5,Aug. 19-25,Ipsos,A-,443,LV,32%,45%,,Trump +13,Trump +13
    6,Aug. 19-25,Ipsos,A-,518,LV,61%,25%,,Clinton +36,Clinton +36
    7,Aug. 19-25,Ipsos,A-,392,LV,47%,41%,,Clinton +7,Clinton +7
    8,Aug. 19-25,Ipsos,A-,666,LV,49%,42%,,Clinton +7,Clinton +7
    and so on...

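If I change the BeautifulSoup parser, the table is still parsed incorrectly. If I manually save the table as copied from the Chrome inspector or Firefox Firebug, it works. Here is the CSV with the correct data generated that way: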

    ,date,pollster,grade,sample,weight,clinton,trump,johnson,leader,adjusted
    0,Ipsos,A-,362,LV,0.67,43%,46%,,Trump +3,Trump +3
    1,CNN/Opinion Research Corp.,A-,809,LV,1.40,38%,45%,12%,Trump +7,Trump +7
    2,Ipsos,A-,438,LV,0.25,39%,47%,,Trump +8,Trump +8
    3,YouGov,B,"1,095",LV,0.65,42%,44%,5%,Trump +2,Trump +1
    4,OH Predictive Insights/MBQF,C+,996,LV,0.44,45%,42%,4%,Clinton +3,Clinton +2
    5,Integrated Web Strategy,,679,LV,0.35,41%,49%,3%,Trump +8,Trump +5
    6,Public Policy Polling,B+,691,V,0.49,40%,44%,,Trump +4,Trump +1
    7,OH Predictive Insights/MBQF,C+,"1,060",LV,0.16,47%,42%,,Clinton +4,Clinton +4
    8,Greenberg Quinlan Rosner,B-,300,LV,0.23,39%,45%,10%,Trump +6,Trump +6
    9,Public Policy Polling,B+,896,V,0.20,38%,40%,6%,Trump +2,Tie
    10,Behavior Research Center,A,564,RV,0.16,42%,35%,,Clinton +7,Clinton +5
    11,Merrill Poll,B,701,LV,0.11,38%,38%,,Tie,Tie
    12,Strategies 360,B,504,LV,0.03,42%,44%,,Trump +2,Tie

Why does BeautifulSoup parse the full HTML fetched from the web incorrectly?

[EDIT: SOLVED] This code extracts the JSON object race.stateData from the script tag using a regular expression. With that, the data is finally parsed.

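    import json
    import re

    import requests
    from bs4 import BeautifulSoup

    # url is the same page URL used in the code above.
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "lxml")
    # Take the first <script> inside <body> and flatten it to one line.
    script = soup.body.script.text
    script = script.replace("\n", "")
    re_match = re.match(r'.*race\.stateData = (.*);race\.path', script)
    str_json = re_match.group(1)
    j = json.loads(str_json)
    # parsing data code not relevant..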
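To try the extraction step without hitting the network, here is a minimal offline sketch of the same idea. It assumes only that the script text contains `race.stateData = <JSON>;` followed by `race.path`, as the regex above implies. The sample string, its keys (`state`, `polls`, `pollster`, `sampleSize`) and the helper name `extract_state_data` are made up for illustration; the real payload's structure is not shown in this post.

```python
import json
import re

# Hypothetical stand-in for the inline <script> text. The real page's
# payload is far larger, and these keys are invented for this example.
SAMPLE_SCRIPT = """
var race = {};
race.stateData = {"state": "AZ", "polls": [{"pollster": "Ipsos", "sampleSize": 362}]};
race.path = "/arizona/";
"""

# Non-greedy capture between the two assignments; DOTALL lets '.' span
# newlines, so the script does not have to be flattened first.
STATE_DATA_RE = re.compile(r"race\.stateData\s*=\s*(.*?);\s*race\.path", re.DOTALL)


def extract_state_data(script_text):
    """Return the decoded race.stateData object, or None if it is absent."""
    match = STATE_DATA_RE.search(script_text)
    if match is None:
        return None
    return json.loads(match.group(1))


state_data = extract_state_data(SAMPLE_SCRIPT)
print(state_data["polls"][0]["pollster"])
```

Returning None when the pattern is missing avoids the AttributeError that `re_match.group(1)` would raise if the page layout changed and the regex stopped matching.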

Source

2016-08-28 corrado1972
