2016-07-20

Answer

By default, lxml creates a parent div when the string contains multiple elements.

You can work with the individual fragments instead:

from lxml import html
test_cases = ['<div>1</div><div>2</div>', 'I am pure text']
for test_case in test_cases:
    # Returns a list of elements; leading text comes back as a plain string
    fragments = html.fragments_fromstring(test_case)
    print(fragments)
    output = ''
    for fragment in fragments:
        if isinstance(fragment, str):
            output += fragment
        else:
            # tostring() returns bytes by default, so decode to str
            output += html.tostring(fragment).decode('UTF-8')
    print(output)

Output:

[<Element div at 0x3403ea8>, <Element div at 0x3489368>] 
<div>1</div><div>2</div> 
['I am pure text'] 
I am pure text 