2011-12-07 13 views
2

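Extracting information from a web page with Python

Is it possible to extract the scores and goals from a web page such as http://www.uscho.com/standings/division-i-men/2011-2012/? My problem is that the tables have a funky structure. Are there any resources that could help me with this?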

+0

These tables look perfectly fine to me... – cdeszaq

Answers

1

The example web page can be parsed easily with lxml.

Here is a basic script to get you started:

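from urllib.request import urlopen
from lxml import etree

url = 'http://www.uscho.com/standings/division-i-men/2011-2012/'

# Download the page and parse it with lxml's lenient HTML parser.
tree = etree.HTML(urlopen(url).read())

# Each conference sits in its own <section id="section_...">.
for section in tree.xpath('//section[starts-with(@id, "section_")]'):
    # The first <h3> in the section holds the conference name.
    print(section.xpath('h3[1]/text()')[0])
    for row in section.xpath('table/tbody/tr'):
        # Collect every text node in the row: team name first, then the stats.
        cols = row.xpath('td//text()')
        print(' ', cols[0].ljust(25), ' '.join(cols[1:]))
    print()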

Output:

Atlantic Hockey
    Air Force                 8 2 1 .773 17 40-26 9 4 2 .667 53-36 6 0 1 3 3 1
    Mercyhurst                6 1 2 .778 14 21-15 7 7 2 .500 36-49 5 1 1 2 4 1
    RIT                       5 3 2 .600 12 24-20 6 5 2 .538 30-32 5 2 2 1 3 0
    Robert Morris             5 2 1 .688 11 31-20 7 6 1 .536 44-43 3 2 1 3 3 0
    Bentley                   4 3 2 .556 10 25-18 4 8 3 .367 35-43 1 2 2 3 6 1
    Canisius                  4 3 2 .556 10 16-17 4 8 3 .367 23-41 2 2 1 2 6 2
    Holy Cross                5 4 0 .556 10 28-26 7 7 0 .500 40-47 5 1 0 2 6 0
    Niagara                   3 2 4 .556 10 25-22 4 5 5 .464 36-39 1 2 2 3 3 3
    Connecticut               4 5 1 .450 9 30-24 5 8 2 .400 41-42 3 1 0 1 7 2
    American International    2 7 2 .273 6 24-36 3 12 2 .235 35-58 1 4 2 2 8 0
    Army                      1 5 4 .300 6 20-33 1 7 6 .286 26-47 0 4 2 1 3 3
    Sacred Heart              0 10 1 .045 1 30-57 1 14 1 .094 39-86 0 5 1 0 9 0

CCHA
    Ohio State                9 2 1 1 .792 29 42-26 12 3 1 .781 53-31 6 1 1 6 2 0
    Notre Dame                7 2 3 0 .708 24 36-28 10 5 3 .639 55-50 6 3 0 4 2 3
    Western Michigan          6 4 2 2 .583 22 33-28 8 4 4 .625 49-34 5 2 1 3 2 3
    Lake Superior             6 5 1 1 .542 20 31-32 10 6 2 .611 46-43 5 3 0 5 3 2
    Ferris State              6 5 1 0 .542 19 28-27 10 5 1 .656 43-30 5 1 1 5 4 0
    Michigan State            6 4 0 0 .600 18 32-23 10 5 1 .656 56-41 6 1 1 3 3 0
    Northern Michigan         4 5 3 2 .458 17 28-31 7 6 3 .531 41-40 6 1 3 1 5 0
    Miami                     4 6 2 1 .417 15 26-31 8 8 2 .500 48-48 3 3 2 4 5 0
    Michigan                  4 6 2 1 .417 15 36-32 8 8 2 .500 64-47 7 5 0 1 3 2
    Alaska                    4 8 2 0 .357 14 26-33 7 9 2 .444 39-41 4 5 1 2 3 1
    Bowling Green             1 10 1 1 .125 5 14-41 6 10 2 .389 32-49 3 6 1 3 4 1

D-I Independent
    Alabama-Huntsville        0 0 0 .000 0 - 1 15 1 .088 16-67 1 8 1 0 7 0

ECAC
    Cornell                   6 1 1 .812 13 26-11 7 3 1 .682 32-18 4 1 1 3 1 0
    Colgate                   6 2 0 .750 12 28-15 11 4 1 .719 55-36 5 2 0 5 2 0
    Clarkson                  3 4 2 .444 8 19-18 9 6 4 .579 55-37 6 2 0 3 3 4
    St. Lawrence              4 5 0 .444 8 16-22 5 10 0 .333 31-52 3 6 0 2 4 0
    Union                     3 2 2 .571 8 16-13 7 3 5 .633 49-29 1 2 2 6 1 3
    Yale                      4 2 0 .667 8 19-15 6 4 1 .591 36-31 3 2 0 3 1 0
    Dartmouth                 3 3 1 .500 7 18-22 4 5 1 .450 24-30 3 3 1 1 2 0
    Princeton                 3 5 1 .389 7 23-30 4 7 2 .385 30-39 2 2 1 1 4 0
    Quinnipiac                2 4 3 .389 7 18-22 9 6 3 .583 57-40 6 1 2 3 5 1
    Brown                     3 3 0 .500 6 19-20 4 6 1 .409 24-30 2 2 0 1 4 1
    Harvard                   2 3 2 .429 6 20-21 3 3 3 .500 31-31 2 2 1 1 1 2
    Rensselaer                1 6 0 .143 2 8-21 3 12 0 .200 18-42 2 5 0 1 7 0

Hockey East
    Boston College            9 3 0 .750 18 45-29 12 5 0 .706 63-42 5 3 0 6 2 0
    Boston University         6 4 1 .591 13 37-34 8 5 1 .607 47-43 5 3 0 2 2 1
    Merrimack                 6 2 1 .722 13 23-18 9 2 1 .792 37-20 4 1 1 5 1 0
    Massachusetts-Lowell      6 3 0 .667 12 33-27 9 4 0 .692 46-33 4 1 0 5 2 0
    Providence                6 4 0 .600 12 37-29 8 7 1 .531 51-47 7 2 1 1 3 0
    Maine                     5 5 1 .500 11 37-35 6 6 2 .500 45-44 4 3 0 2 3 2
    New Hampshire             4 6 1 .409 9 31-37 6 8 2 .438 56-56 6 2 0 0 6 2
    Northeastern              3 7 2 .333 8 31-35 6 7 2 .467 46-39 2 2 1 4 5 1
    Massachusetts             2 6 3 .318 7 29-39 4 7 4 .400 47-52 4 0 3 0 7 1
    Vermont                   1 8 1 .150 3 22-42 3 10 1 .250 33-59 2 5 1 1 5 0

WCHA
    Minnesota                 10 2 0 .833 20 43-23 13 4 1 .750 75-36 8 1 0 5 3 1
    Minnesota-Duluth          9 2 1 .792 19 52-27 11 3 2 .750 66-39 7 3 0 4 0 2
    Nebraska-Omaha            6 3 3 .625 15 44-41 8 7 3 .528 60-58 5 2 1 3 4 2
    Colorado College          6 4 0 .600 12 44-36 8 4 0 .667 52-38 5 0 0 3 4 0
    North Dakota              6 6 0 .500 12 37-35 8 7 1 .531 49-48 5 2 1 3 5 0
    Denver                    4 3 3 .550 11 39-34 6 5 3 .536 51-44 5 2 2 1 3 1
    Michigan Tech             5 6 1 .458 11 36-35 8 7 1 .531 48-43 6 3 1 2 4 0
    St. Cloud State           4 5 3 .458 11 36-37 6 8 4 .444 57-58 3 1 3 2 7 1
    Bemidji State             4 6 2 .417 10 32-42 6 8 2 .438 43-52 3 2 1 3 6 1
    Wisconsin                 4 7 1 .375 9 35-43 7 8 1 .469 52-52 7 3 0 0 5 1
    Alaska-Anchorage          2 9 1 .208 5 20-47 5 9 2 .375 37-56 2 5 1 1 4 1
    Minnesota State           2 9 1 .208 5 34-52 3 12 1 .219 39-64 1 4 1 2 8 0
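
Since the question asks about goals specifically, each row can be broken down a little further. The following is only a minimal sketch: it assumes that every cell shaped like `40-26` is a "goals for-goals against" pair, which is inferred from the output above rather than confirmed by the page, and `parse_goals` is a hypothetical helper name, not something lxml provides.

```python
import re

# Matches a "goals for-goals against" cell such as "40-26".
GOALS_RE = re.compile(r'^(\d+)-(\d+)$')

def parse_goals(cols):
    """Return (team, [(goals_for, goals_against), ...]) for one table row.

    `cols` is the list of cell texts for a row, in the same form that
    row.xpath('td//text()') yields. Assumption: any cell that looks like
    "40-26" is a goals column. A bare "-" (no games played) does not
    match the pattern and is skipped.
    """
    team = cols[0].strip()
    goals = []
    for cell in cols[1:]:
        match = GOALS_RE.match(cell.strip())
        if match:
            goals.append((int(match.group(1)), int(match.group(2))))
    return team, goals

# One row copied from the output above.
row = ['Air Force', '8', '2', '1', '.773', '17', '40-26',
       '9', '4', '2', '.667', '53-36', '6', '0', '1', '3', '3', '1']
team, goals = parse_goals(row)
print(team, goals)  # Air Force [(40, 26), (53, 36)]
```

Matching on the shape of the cell rather than on a fixed column index keeps the helper working for the CCHA rows, which carry one extra column, and for rows such as Alabama-Huntsville where a goals cell is just `-`.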
1

mechanize and BeautifulSoup
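
As a rough illustration of the BeautifulSoup half of that suggestion, here is a minimal sketch. It assumes the `bs4` package is installed, and it parses a small hypothetical HTML fragment shaped like the structure the lxml answer relies on (a `<section id="section_...">` containing an `<h3>` and a `<table>`); the real page's markup may differ. Fetching the page, which is where mechanize would come in, is left out.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, not copied from the real page.
html = """
<section id="section_1">
  <h3>Atlantic Hockey</h3>
  <table><tbody>
    <tr><td><a href="#">Air Force</a></td><td>8</td><td>2</td><td>1</td></tr>
    <tr><td><a href="#">Mercyhurst</a></td><td>6</td><td>1</td><td>2</td></tr>
  </tbody></table>
</section>
"""

# Use the parser from the standard library so no extra backend is needed.
soup = BeautifulSoup(html, 'html.parser')

def is_section_id(value):
    # Attribute filters receive the attribute value, or None if it is absent.
    return value is not None and value.startswith('section_')

standings = {}
for section in soup.find_all('section', id=is_section_id):
    conference = section.find('h3').get_text(strip=True)
    rows = []
    for tr in section.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # skip rows without <td> cells, such as header rows
            rows.append(cells)
    standings[conference] = rows

print(standings)
```

The result is a dictionary keyed by conference name, with each row stored as a list of cell strings, team name first.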
