2017-11-12

beautifulsoup web scrape - python

I'm still learning the intricacies of using Beautiful Soup.

I'm trying to build a DataFrame from http://www.nfl.com/injuries?week=1 that holds each player's name, position, and game/injury status. I've been trying to adapt code I found, but I'm not getting anywhere. Any suggestions on where I'm going wrong?

Edit: After looking a bit more, my original problem was the tag. It looks like <script type="text/javascript">, so I changed that. Now I'm closer, but I don't know how to pull out the relevant data. How can I extract the {player: "", position: "" .....} data?

Here is my code so far, followed by a sample of what I'm trying to collect.

import bs4 
import requests as re 
import pandas as pd  

alpha = re.get('http://www.nfl.com/injuries?week=1') 

beta = bs4.BeautifulSoup(alpha.text,'lxml') 
#print(beta) 

gama = beta.findAll('script', {'type':"text/javascript"}) 
print(gama) 

Sample:

</script>, <script type="text/javascript"> 
nfl.use("node", "datatable", "datatable-sort", "mobile-panel", "overthrow", 
"overthrow-shadows", "tabview", function(Y) { 
var isTeamAway  = false, 
    isTeamHome  = false, 
    isTeam   = false, 
    homeAbbr  = 'DEN', 
    awayAbbr  = 'LAC', 
    gameWeek  = '1', 
    teamTabHome  = Y.one('.colors-DEN-1'), 
    teamTabAway  = Y.one('.colors-LAC-1'), 
    datatableHome = Y.one('.data-table-DEN-1'), 
    datatableAway = Y.one('.data-table-LAC-1'); 

var dataAway = [ 












    {player: "Inman Dontrelle ", position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861" }, 



    {player: "McGrath Sean ", position: "TE", injury: "Knee", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "McGrath", firstName: "Sean", esbId: "MCG631892" }, 











    {player: "Attaochu Jeremiah ", position: "DE", injury: "Hamstring", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Attaochu", firstName: "Jeremiah", esbId: "ATT290361" }, 









    {player: "Boston Jayestin ", position: "S", injury: "Calf", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Boston", firstName: "Jayestin", esbId: "BOS695248" }, 


]; 

var dataHome = [ 


    {player: "Booker Devontae ", position: "RB", injury: "Wrist", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Booker", firstName: "Devontae", esbId: "BOO019902" }, 



    {player: "Talib Aqib ", position: "CB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Talib", firstName: "Aqib", esbId: "TAL428789" }, 



    {player: "Paradis Matthew ", position: "C", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Paradis", firstName: "Matthew", esbId: "PAR002722" }, 



    {player: "Kerr Zachariah ", position: "DT", injury: "Knee", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Kerr", firstName: "Zachariah", esbId: "KER593782" }, 



    {player: "Peko Kyle ", position: "DT", injury: "Foot", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Peko", firstName: "Kyle", esbId: "PEK467819" }, 







    {player: "Dixon Riley ", position: "P", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Dixon", firstName: "Riley", esbId: "DIX641722" }, 



    {player: "Crick Jared ", position: "DE", injury: "Back", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Crick", firstName: "Jared", esbId: "CRI129618" }, 



    {player: "Wolfe Derek ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Wolfe", firstName: "Derek", esbId: "WOL309455" }, 



    {player: "Lynch Paxton ", position: "QB", injury: "right Shoulder", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Lynch", firstName: "Paxton", esbId: "LYN526034" }, 





    {player: "Gotsis Adam ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Gotsis", firstName: "Adam", esbId: "GOT428790" }, 



    {player: "Thomas Demaryius ", position: "WR", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Thomas", firstName: "Demaryius", esbId: "THO095855" }, 



    {player: "Charles Jamaal ", position: "RB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Charles", firstName: "Jamaal", esbId: "CHA561428" }, 




]; 

Answers

1

You can use a regular expression (regex), like this:

import bs4
import requests
import pandas as pd
import re

alpha = requests.get('http://www.nfl.com/injuries?week=1')
beta = bs4.BeautifulSoup(alpha.text, 'lxml')
gama = beta.findAll('script', {'type': "text/javascript"})
for g in gama:
    # re.search returns only the first match, so this prints one record per script tag
    match = re.search(r'\{player(.*)', g.text)
    if match:
        print(match.group(0))

Output:

{player: "Logan Bennie ", position: "DT", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Logan", firstName: "Bennie", esbId: "LOG113260" },
{player: "Pelon Claudeson ", position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Pelon", firstName: "Claudeson", esbId: "PEL747520" },
{player: "Pasztor Austin ", position: "T", injury: "Chest", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Pasztor", firstName: "Austin", esbId: "PAS822673" },
{player: "Flacco Joseph ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Flacco", firstName: "Joseph", esbId: "FLA009602" },
{player: "Dupree Alvin ", position: "LB", injury: "Shoulder", practiceStatus: "Did Not Participate In Practice", gameStatus: "Questionable", lastName: "Dupree", firstName: "Alvin", esbId: "DUP507860" },
{player: "Palmer Carson ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Palmer", firstName: "Carson", esbId: "PAL249055" },
{player: "Bortles Robby ", position: "QB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Bortles", firstName: "Robby", esbId: "BOR650964" },
{player: "Cooper Amari ", position: "WR", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Cooper", firstName: "Amari", esbId: "COO487703" },
{player: "Goode Najee ", position: "LB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Goode", firstName: "Najee", esbId: "GOO217526" },
{player: "Rogers Chester ", position: "WR", injury: "Hamstring", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Rogers", firstName: "Chester", esbId: "ROG146742" },
{player: "Vannett Nicholas ", position: "TE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Vannett", firstName: "Nicholas", esbId: "VAN643509" },
{player: "Norris Jared ", position: "LB", injury: "Groin", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Norris", firstName: "Jared", esbId: "NOR463803" },
{player: "Apple Eli ", position: "CB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Apple", firstName: "Eli", esbId: "APP195645" },
{player: "Anthony Stephone ", position: "LB", injury: "Ankle", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Anthony", firstName: "Stephone", esbId: "ANT204590" },
{player: "Inman Dontrelle ", position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861" },

Note: I changed your "import requests as re" to a plain "import requests", because I had to import the re module itself.
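To get from matched text to the name, position, and status fields the question asks for, each {player: ...} record can be split into key/value pairs. The following is only a minimal sketch under a few assumptions: every value is a double-quoted string, no record contains nested braces, and the script text looks like the sample in the question. The names parse_players, PAIR_RE, and RECORD_RE are hypothetical helpers introduced here, not part of any library.

```python
import re

# Matches one `key: "value"` pair inside a JavaScript object literal.
PAIR_RE = re.compile(r'(\w+):\s*"([^"]*)"')
# Matches one `{player: ... }` record; assumes no nested braces inside a record.
RECORD_RE = re.compile(r'\{\s*player:[^{}]*\}')


def parse_players(js_text):
    """Return one dict per {player: ...} record found in js_text."""
    records = []
    for record in RECORD_RE.findall(js_text):
        # Strip stray whitespace such as the trailing space in "Inman Dontrelle ".
        fields = {key: value.strip() for key, value in PAIR_RE.findall(record)}
        records.append(fields)
    return records


# Two records copied from the sample in the question.
sample = '''
var dataAway = [
    {player: "Inman Dontrelle ", position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861" },
];
var dataHome = [
    {player: "Booker Devontae ", position: "RB", injury: "Wrist", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Booker", firstName: "Devontae", esbId: "BOO019902" },
];
'''

players = parse_players(sample)
for p in players:
    print(p["player"], p["position"], p["gameStatus"])
```

Unlike re.search, findall collects every record in a script rather than only the first. With pandas installed, passing the combined list from all script tags to pd.DataFrame and selecting the player, position, and gameStatus columns should give the table described in the question.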
