I'm trying to scrape a website, but I only want to write specific rows to the final CSV file. IndexError: string index out of range [python, scraping]
This is the result I expect from the filter, and it is what the standalone example below prints:
[['The Conservation Fund', 2013, '$480,674', '$55,266', '$0', 'LAWRENCE A SELZER', 'PRESIDENT & CEO'], ['The Conservation Fund', 2013, '$369,848', '$54,856', '$0', 'RICHARD L ERDMANN', 'EXECUTIVE VICE PRESIDENT'], ['The Conservation Fund', 2013, '$312,232', '$44,386', '$0', 'DAVID K PHILLIPS JR', 'EXECUTIVE VP AND CFO'], ['The Conservation Fund', 2013, '$251,615', '$16,125', '$0', 'DEAN H CANNON', 'SENIOR VP/GENERAL COUNSEL']]
rows = [
["The Conservation Fund",2014,"","","","Program Services: ","$174,530,077"],
["The Conservation Fund",2014,"","","","Administration: ","$2,810,944"],
["The Conservation Fund",2014,"","","","Fundraising: ","$2,144,456"],
["The Conservation Fund",2013,"$480,674","$55,266","$0","LAWRENCE A SELZER","PRESIDENT & CEO"],
["The Conservation Fund",2013,"$369,848","$54,856","$0","RICHARD L ERDMANN","EXECUTIVE VICE PRESIDENT"],
["The Conservation Fund",2013,"$312,232","$44,386","$0","DAVID K PHILLIPS JR","EXECUTIVE VP AND CFO"],
["The Conservation Fund",2013,"$251,615","$16,125","$0","DEAN H CANNON","SENIOR VP/GENERAL COUNSEL"]]
rows1 = [x for x in rows if x[6][0] != '$']
print(rows1)
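When I run this code I get exactly what I expect (the list shown at the top), and I do not get any error. But when I try to select rows the same way inside my scraper, I get:
IndexError: string index out of range
Here is the list comprehension as it appears in the scraper (I can't legally post everything, so I'm pasting only part of the code):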
for page in eins:
    rows = []
    driver.get(page)
    print("Getting {}".format(page))
    soup = BeautifulSoup(driver.page_source, "lxml")
    name = soup.find("h1", {"class" : "centered"})
    print(name.text)
    members = soup.findAll("g", { "transform" : "translate(0,0)"})
    time = soup.find("option", {"selected" : "selected"}).text
    time = int(time)
    for year in members[2:]:
        column = year.find_all("g")
        for thing in column:
            row_info = [name.text, time]
            entries = thing.find_all("text")
            if len(entries) != 5:
                row_info.extend((5 - len(entries)) * [""])
            for entry in entries:
                row_info.append(entry.text)
            rows.append(row_info)
        time = time - 1
    rows1 = [x for x in rows if x[6][0] != "$"]
Now I suddenly get the following error:
Traceback (most recent call last):
  File "Board_members.py", line 53, in <module>
    rows1 = [x for x in rows if x[6][0] != "$"]
  File "Board_members.py", line 53, in <listcomp>
    rows1 = [x for x in rows if x[6][0] != "$"]
IndexError: string index out of range
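Aren't the lists of rows formatted the same way in both cases? What am I doing wrong here? Earlier I tried a for loop with a simple if statement and continue, but everything ends in the same error.
I'm still a beginner, so please forgive my weak code. I looked for answers to this question here, but if they were there, I couldn't understand them. Thank you very much!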
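One way this error could arise is sketched below. It assumes, hypothetically, that some `g` element on the page has no `text` children, so the padding logic produces a row whose last element is an empty string. `build_row` and `find_suspect_rows` are illustrative helpers, not part of the original scraper, and the empty element is an assumption about the page rather than something confirmed.

```python
def build_row(name, year, texts):
    """Mimic the scraper's padding: pad up to five entries, then append the texts."""
    row = [name, year]
    if len(texts) != 5:
        row.extend((5 - len(texts)) * [""])
    row.extend(texts)
    return row


def find_suspect_rows(rows, index=6):
    """Return (position, row) pairs where row[index] is missing or an empty string."""
    return [
        (pos, row)
        for pos, row in enumerate(rows)
        if len(row) <= index or row[index] == ""
    ]


rows = [
    build_row("The Conservation Fund", 2014, ["Fundraising: ", "$2,144,456"]),
    # Hypothetical case: an element with no text entries at all.
    build_row("The Conservation Fund", 2013, []),
]

error_message = None
try:
    rows1 = [x for x in rows if x[6][0] != "$"]
except IndexError as exc:
    # Indexing "" with [0] is what raises here.
    error_message = str(exc)

suspects = find_suspect_rows(rows)
print(error_message)
print(suspects)
```

Running `find_suspect_rows` on the real `rows` list just before the comprehension would show whether any row has an empty or missing seventh element.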
Edit: just for context, the rows in the first example come from a curated CSV file that I created with the scraper. In the CSV they look like this:
organization,year,compensation,other,related,name,position
The Conservation Fund,2015,,,,Total Revenue: ,"$215,096,466"
The Conservation Fund,2015,,,,Contributions: ,"$114,351,967"
The Conservation Fund,2015,,,,Gov't Grants: ,"$9,723,802"
The Conservation Fund,2015,,,,Program Services: ,"$90,762,036"
The Conservation Fund,2015,,,,Investments: ,"$220,002"
The Conservation Fund,2015,,,,Special Events: ,$0
The Conservation Fund,2015,,,,Sales: ,$0
The Conservation Fund,2015,,,,Other: ,"$38,659"
The Conservation Fund,2014,,,,Total Expenses: ,"$179,485,477"
The Conservation Fund,2014,,,,Program Services: ,"$174,530,077"
The Conservation Fund,2014,,,,Administration: ,"$2,810,944"
The Conservation Fund,2014,,,,Fundraising: ,"$2,144,456"
The Conservation Fund,2013,"$480,674","$55,266",$0,LAWRENCE A SELZER,PRESIDENT & CEO
The Conservation Fund,2013,"$369,848","$54,856",$0,RICHARD L ERDMANN,EXECUTIVE VICE PRESIDENT
The Conservation Fund,2013,"$312,232","$44,386",$0,DAVID K PHILLIPS JR,EXECUTIVE VP AND CFO
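For reference, a minimal sketch of how rows in this CSV layout could be read, filtered, and written back out with the standard `csv` module. The in-memory `io.StringIO` buffers stand in for real files, and the two data rows are copied from the excerpt above. Note that `csv.reader` returns every field as a string, so the year is `"2013"` here rather than the integer the scraper produces.

```python
import csv
import io

# Header plus two data rows copied from the CSV excerpt above.
csv_text = '''organization,year,compensation,other,related,name,position
The Conservation Fund,2014,,,,Fundraising: ,"$2,144,456"
The Conservation Fund,2013,"$480,674","$55,266",$0,LAWRENCE A SELZER,PRESIDENT & CEO
'''

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)

# Keep only rows whose last column is not a dollar amount.
kept = [row for row in reader if not row[6].startswith("$")]

# Write the header and the kept rows to the "final" CSV (an in-memory buffer here).
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(kept)
result = out.getvalue()
print(result)
```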
Edit 2: this is the output I get when I print rows just before rows1:
[['The Conservation Fund', 2015, '', '', '', 'Total Revenue: ', '$215,096,466'], ['The Conservation Fund', 2015, '', '', '', 'Contributions: ', '$114,351,967'], ['The Conservation Fund', 2015, '', '', '', "Gov't Grants: ", '$9,723,802'], ['The Conservation Fund', 2015, '', '', '', 'Program Services: ', '$90,762,036'], ['The Conservation Fund', 2015, '', '', '', 'Investments: ', '$220,002'], ['The Conservation Fund', 2015, '', '', '', 'Special Events: ', '$0'], ['The Conservation Fund', 2015, '', '', '', 'Sales: ', '$0'], ['The Conservation Fund', 2015, '', '', '', 'Other: ', '$38,659'], ['The Conservation Fund', 2014, '', '', '', 'Total Expenses: ', '$179,485,477'], ['The Conservation Fund', 2014, '', '', '', 'Program Services: ', '$174,530,077']]
Would code that checks x for empty values possibly fix the error? That said, I tried printing rows before rows1, and that works perfectly.
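A check for empty values could take either of the two forms sketched below. The third row is a hypothetical empty row added only to show the difference between them. Which form is right depends on whether empty rows belong in the final CSV.

```python
rows = [
    ["The Conservation Fund", 2014, "", "", "", "Fundraising: ", "$2,144,456"],
    ["The Conservation Fund", 2013, "$251,615", "$16,125", "$0",
     "DEAN H CANNON", "SENIOR VP/GENERAL COUNSEL"],
    ["The Conservation Fund", 2013, "", "", "", "", ""],  # hypothetical empty row
]

# Option 1: str.startswith never indexes into the string, so "" is safe.
# Empty rows are kept, because "" does not start with "$".
keep_with_empty = [x for x in rows if not x[6].startswith("$")]

# Option 2: require a non-empty value before indexing; empty rows are dropped.
keep_non_empty = [x for x in rows if x[6] and x[6][0] != "$"]

print(len(keep_with_empty), len(keep_non_empty))
```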