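A useful pattern is to count the preceding <tr class="style6"><td>SomeStuff</td></tr> rows: every row in the n-th group has exactly n such header rows among its preceding siblings.
In your example, for the first group, that would be:
//tr[not(@class="style6")][count(preceding-sibling::tr[@class="style6"])=1]
For the second group:
//tr[not(@class="style6")][count(preceding-sibling::tr[@class="style6"])=2]
and so on.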
I don't use nokogiri, so here is an example using Python and lxml:
>>> import lxml.html
>>> from pprint import pprint
>>> doc = lxml.html.fromstring('''<tr class="style6"><td>SomeStuff</td></tr>
... <tr><td>Some other stuff group 1</td></tr>
... <tr><td>Some other stuff group 1</td></tr>
... <tr><td>Some other stuff group 1</td></tr>
... <tr><td>Some other stuff group 1</td></tr>
... <tr><td>Some other stuff group 1</td></tr>
... <tr class="style6"><td>SomeStuff</td></tr>
... <tr><td>Some other stuff group 2</td></tr>
... <tr><td>Some other stuff group 2</td></tr>
... <tr><td>Some other stuff group 2</td></tr>
... <tr><td>Some other stuff group 2</td></tr>
... <tr><td>Some other stuff group 2</td></tr>
... <tr class="style6"><td>SomeStuff</td></tr>
... <tr><td>Some other stuff group 3</td></tr>
... <tr><td>Some other stuff group 3</td></tr>
... <tr><td>Some other stuff group 3</td></tr>
... <tr><td>Some other stuff group 3</td></tr>
... <tr><td>Some other stuff group 3</td></tr>''')
>>> pprint(list(lxml.html.tostring(row)
... for row in doc.xpath('''
... //tr[not(@class="style6")]
... [count(preceding-sibling::tr[@class="style6"])=1]''')))
[b'<tr><td>Some other stuff group 1</td></tr>\n',
 b'<tr><td>Some other stuff group 1</td></tr>\n',
 b'<tr><td>Some other stuff group 1</td></tr>\n',
 b'<tr><td>Some other stuff group 1</td></tr>\n',
 b'<tr><td>Some other stuff group 1</td></tr>\n']
>>> pprint(list(lxml.html.tostring(row)
... for row in doc.xpath('''
... //tr[not(@class="style6")]
... [count(preceding-sibling::tr[@class="style6"])=2]''')))
[b'<tr><td>Some other stuff group 2</td></tr>\n',
 b'<tr><td>Some other stuff group 2</td></tr>\n',
 b'<tr><td>Some other stuff group 2</td></tr>\n',
 b'<tr><td>Some other stuff group 2</td></tr>\n',
 b'<tr><td>Some other stuff group 2</td></tr>\n']
>>> pprint(list(lxml.html.tostring(row)
... for row in doc.xpath('''
... //tr[not(@class="style6")]
... [count(preceding-sibling::tr[@class="style6"])=3]''')))
[b'<tr><td>Some other stuff group 3</td></tr>\n',
 b'<tr><td>Some other stuff group 3</td></tr>\n',
 b'<tr><td>Some other stuff group 3</td></tr>\n',
 b'<tr><td>Some other stuff group 3</td></tr>\n',
 b'<tr><td>Some other stuff group 3</td></tr>']
>>>
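If you need every group rather than one at a time, the same counting idea can be wrapped up. The sketch below is only an illustration: group_xpath and group_rows are hypothetical helper names, and the one-pass grouping uses the standard library's xml.etree.ElementTree instead of lxml, which assumes the rows sit inside a well-formed <table> (real HTML would still need lxml or another lenient parser).

```python
import xml.etree.ElementTree as ET


def group_xpath(n, header_class="style6"):
    """Build the XPath shown above for the n-th group (hypothetical helper)."""
    return (
        f'//tr[not(@class="{header_class}")]'
        f'[count(preceding-sibling::tr[@class="{header_class}"])={n}]'
    )


def group_rows(rows, header_class="style6"):
    """Group rows by the number of header rows seen before them, in one pass."""
    groups = {}
    seen_headers = 0
    for row in rows:
        if row.get("class") == header_class:
            seen_headers += 1  # each header row starts the next group
        else:
            groups.setdefault(seen_headers, []).append(row)
    return groups


# Sample markup modelled on the question, wrapped in <table> so it parses as XML.
table = ET.fromstring(
    "<table>"
    '<tr class="style6"><td>SomeStuff</td></tr>'
    "<tr><td>Some other stuff group 1</td></tr>"
    "<tr><td>Some other stuff group 1</td></tr>"
    '<tr class="style6"><td>SomeStuff</td></tr>'
    "<tr><td>Some other stuff group 2</td></tr>"
    "</table>"
)

groups = group_rows(table.findall("tr"))
for number, members in sorted(groups.items()):
    print(number, [row.findtext("td") for row in members])
```

With lxml, group_xpath(2) can be passed straight to doc.xpath(...) to get the second group. In group_rows, rows that appear before the first header row end up under key 0, which matches what count(...)=0 would select.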