Using Beautiful Soup and regular expressions

I'm having a bit of trouble using Beautiful Soup together with a regular expression.

My HTML looks like this:

[<strong>See the full calendar</strong>, <strong>See all events</strong>, <strong>See all committee meetings</strong>, <strong>526 spaces</strong>, <strong>89 spaces</strong>, <strong>53 spaces</strong>, <strong>154 spaces</strong>, <strong>194 spaces</strong>, <strong>See all news releases</strong>] 
 
[<strong>See the full calendar</strong>, <strong>See all events</strong>, <strong>See all committee meetings</strong>, <strong>526 spaces</strong>, <strong>89 spaces</strong>, <strong>53 spaces</strong>, <strong>154 spaces</strong>, <strong>194 spaces</strong>, <strong>See all news releases</strong>]

What I want is just the number of spaces between the strong tags.

I tried using:

print(soup.find_all(re.compile("\d\d\d\s[a-zA-Z]{6}|(strong)")))

However, this returns everything that print(soup.find_all('strong')) returns.

Does anyone know what is going wrong?

Source

2017-07-10 Maverick

Thank you! AttributeError: 'ResultSet' object has no attribute 'split' - any ideas? @Ludisposed – Maverick

Do you need the total of all the spaces, or a separate space counter for each strong tag? – Ludisposed

The end goal is to export this to CSV. Each "x spaces" needs to be a separate record on its own row – Maverick

If I understand you correctly, you can use the text argument of soup.find_all and pass it a compiled regex pattern.

import re

spaces = []
# text= filters on each tag's string content rather than on the tag name
for tag in soup.find_all(text=re.compile(r"\d+(?= spaces)")):
    # "526 spaces" -> 526
    spaces.append(int(tag.string.split()[0]))

print(spaces)

Output:

[526, 89, 53, 154, 194, 526, 89, 53, 154, 194]
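As for why the attempt in the question returned every strong tag: find_all() treats its first positional argument as a filter on tag names, so the pattern is tested against the name "strong" rather than against the text inside the tag. The sketch below illustrates this with the standard re module only; the sample strings are copied from the question, and the variable names are made up for illustration.

import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r"\d\d\d\s[a-zA-Z]{6}|(strong)")

# A tag-name filter is tested against the name, e.g. "strong".
tag_name = "strong"
name_matches = pattern.search(tag_name) is not None

# The text-only pattern from the answer does not match the tag name,
# but it does match the text of the tags the asker wants.
text_pattern = re.compile(r"\d+(?= spaces)")
texts = ["See all events", "526 spaces", "89 spaces"]
wanted = [t for t in texts if text_pattern.search(t)]

print(name_matches)  # True: the "(strong)" alternative matches the tag name
print(wanted)        # ['526 spaces', '89 spaces']

Because of the "|(strong)" alternative, every strong tag passes the name filter, which would explain why the result looks the same as soup.find_all('strong').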

Source

2017-07-10 11:35:18

What the OP wants is confusing. Using the text property is a nice feature I didn't know about. – Ludisposed

@Ludisposed "If I understand you correctly", haha. The OP mentions spaces, and the tags whose content is "xxx spaces" are the ones that stand out, so I assumed that was the meaning. –

Yes, after rereading the OP: "so each x spaces". I think you are right – Ludisposed

First, find all the strong tags:

import re

strong_tags = soup.find_all('strong')
spaces_in_tags = {}

# Afterwards iterate over the tags, then do either of the following

for strong in strong_tags:
    # Work on the tag's text; a Tag object cannot be passed to re or split
    text = strong.get_text()
    # 1. (EDIT: \s+ so that multiple spaces between words count as 1 space)
    number_of_spaces = len(re.findall(r'\s+', text))
    # 2.
    number_of_spaces2 = len(text.split()) - 1

    # Then add them to a dictionary/list, whatever suits your need
    # For example, use the tag's text as the key in a dictionary
    spaces_in_tags[text] = number_of_spaces
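
Since the asker's stated goal in the comments is a CSV with one record per "x spaces" entry, here is a minimal sketch of that last step using only the standard library. It assumes the text of each strong tag has already been collected into a list (strong_texts, copied here from the question's output); the helper names and the "spaces" column header are hypothetical.

import csv
import io
import re

# Text of each strong tag, copied from the question's output.
strong_texts = ["See the full calendar", "526 spaces", "89 spaces",
                "53 spaces", "154 spaces", "194 spaces", "See all news releases"]

def space_counts(texts):
    """Return the integer from every string of the form '<number> spaces'."""
    counts = []
    for text in texts:
        match = re.fullmatch(r"\s*(\d+) spaces\s*", text)
        if match:
            counts.append(int(match.group(1)))
    return counts

def to_csv(counts):
    """Write one count per row into an in-memory CSV and return its text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["spaces"])  # hypothetical column header
    for count in counts:
        writer.writerow([count])
    return buffer.getvalue()

csv_text = to_csv(space_counts(strong_texts))
print(csv_text)

To write a real file instead, the same writer can be pointed at open("spaces.csv", "w", newline=""), where the file name is again only an example.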

Source

2017-07-10 11:24:25 Ludisposed
