Python Scrapy: XPath not working on a page

I am using Python 3.5 to scrape an HTML string and extract the name from it. My code is as follows:
from scrapy.selector import Selector
html_string = '''<html
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<link href="https://www.rentlinx.com/Templates/MainStyle.css" type="text/css" rel="stylesheet" ></link>
<style>
<!-- /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:12.0pt;} -->
</style>
</head>
<body>
<table width="100%" cellpadding="12">
<tr>
<td width="100%" style="background: #e8f2f5;">
<img src="https://www.rentlinx.com/images/page-logo-v15.png" alt="new lead" style="padding-top: 2px; padding-bottom: 2px;" />
</td>
</tr>
</table>
<br />
<table width="100%" cellpadding="12">
<tr>
<td>
<p style="font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666; font-weight: bold;">You have a new lead!</p>
<p style="font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666;"> This
<strong>basic (free)</strong> lead was generated for your property courtesy of RentLinx.
</p>
<br />
<table cellpadding="7" style="width: 200px;">
<tr>
<td style="font-family: Tahoma, sans-serif; background: #009dc6; color: white; font-size: 20px; display: inline-block;"> Lead Details </td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td style="border-right: 1px solid #CCC; width: 12px;"></td>
<td style="padding: 10px;">
<span style="font-family: Tahoma, sans-serif; color: #00aedb; font-size: 12px; font-weight: bold; text-transform: uppercase; line-height: 15px;">From:</span>
<br />
<span style="font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666;">Foo bar</span>
</td>
</tr>
<tr>
<td style="border-right: 1px solid #CCC;"></td>
<td style="padding: 10px;">
<span style="font-family: Tahoma, sans-serif; color: #00aedb; font-size: 12px; font-weight: bold; text-transform: uppercase; line-height: 15px;">Date:</span>
<br />
<span style="font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666;">5/21/2016 3:24:10 AM</span>
</td>
</tr>
<tr>
<td style="border-right: 1px solid #CCC;"></td>
<td style="padding: 10px;">
<span style="font-family: Tahoma, sans-serif; color: #00aedb; font-size: 12px; font-weight: bold; text-transform: uppercase; line-height: 15px;">Regarding:</span>
<br />
<span style="font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666;">My street and your street</span>
</td>
</tr>
<tr>
<td style="border-right: 1px solid #CCC;"></td>
<td style="padding: 10px;">
<span style="font-family: Tahoma, sans-serif; color: #00aedb; font-size: 12px; font-weight: bold; text-transform: uppercase; line-height: 15px;">Contact Information:</span>
<br />
<span class="value" style="line-height: 28px; padding-top: 5px;">
<a href="tel:1112223333" title="Call" style="color: #007998; font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; text-decoration: none;">(111) 222-3333</a>
<br />
<a href="mailto:[email protected]" title="Email" style="color: #007998; font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; text-decoration: none;">[email protected]</a>
</span>
</td>
</tr>
<tr>
<td style="border-right: 1px solid #CCC;"></td>
<td style="padding: 10px;">
<span style="font-family: Tahoma, sans-serif; color: #00aedb; font-size: 12px; font-weight: bold; text-transform: uppercase; line-height: 15px;">Comments:</span>
<br />
<span style="font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666;"> Hi, I like your apartment. Thanks </span>
</td>
</tr>
<tr>
<td style="border-right: 1px solid #CCC;"></td>
<td style="padding: 10px;">
<span style="font-family: Tahoma, sans-serif; color: #00aedb; font-size: 12px; font-weight: bold; text-transform: uppercase; line-height: 15px;">Lead From:</span>
<br />
<span style="font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666;">
<a href="https://www.marsplanet.com/13933360/" title="Lead from Mars Planet" style="font-family: Tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666;">MarsPlanet</a>
</span>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table width="100%" cellpadding="12">
<tr>
<td>
<p style="font-family: tahoma, sans-serif; font-size: 16px; line-height: 20px; color: #666;"> Thanks,
<br /> The RentLinx Team
</p>
<p style="background-color: #3f3d5d; color: White; padding: 8px; "> Want more leads like this? Upgrade your property to RentLinx
<strong>
<em>Plus!</em>
</strong> today! Just
<a href="https://www.rentlinx.com" style="color: #fff;">login to RentLinx</a>, then click "Go Plus!"
</p>
<p>
<a href="http://www.facebook.com/rentlinx">
<img src="https://www.rentlinx.com/images/facebook/FB-f-Logo__blue_29.png" width="29" height="29" style="margin: 8px; border: 0;" align="absmiddle" />
</a>Like RentLinx? Please like us on facebook!
<a href="http://www.facebook.com/rentlinx">www.facebook.com/rentlinx</a>
</p>
</td>
</tr>
</table>
<img src="http://delivery.rentlinx.com/" alt="" width="1" height="1" border="0" style="height:1px !important;width:1px !important;border-width:0 !important;margin-top:0 !important;margin-bottom:0 !important;margin-right:0 !important;margin-left:0 !important;padding-top:0 !important;padding-bottom:0 !important;padding-right:0 !important;padding-left:0 !important;"/>
</body>
</html>'''
s = Selector(text=html_string)
name = s.xpath('/html/body/table[2]/tbody/tr/td/table[2]/tbody/tr[1]/td[2]/span[2]/text()').extract()[0]
print(name)
Thanks for the attempt, Rafael. Yes, `tbody` appeared in the inspector, but it was not in the HTML itself. It works: using the XPath without it, as in `name = Selector.xpath('/html/body/table[2]/tr/td/table[2]/tr[1]/td[2]/span[2]/text()').extract_first()`, also works with `Selector`. Why did you advise using `response` instead of `Selector`? – JVK
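Selector is no longer recommended. If you find my answer useful, please consider accepting it as the correct answer =) [Edit] -> See [this](http://doc.scrapy.org/en/latest/news.html#id1): my statement above is actually wrong, and Selector is the way to go! –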
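
The `tbody` point raised in the comments can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the code from the thread: it uses the standard library's `xml.etree.ElementTree` instead of Scrapy's `Selector` so that it runs without extra installs, and `SAMPLE` is a hypothetical, trimmed-down version of the e-mail markup. The idea is the same: browsers insert `tbody` when they render a table, so a path copied from the inspector can contain steps that do not exist in the markup the parser actually sees.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the e-mail markup in the question.
# Like the original, it contains no <tbody> elements.
SAMPLE = """
<html>
  <body>
    <table><tr><td>logo</td></tr></table>
    <table>
      <tr>
        <td>
          <table><tr><td>Lead Details</td></tr></table>
          <table>
            <tr>
              <td></td>
              <td><span>From:</span><br /><span>Foo bar</span></td>
            </tr>
          </table>
        </td>
      </tr>
    </table>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)  # root is the <html> element

# Path as copied from a browser inspector: includes tbody steps.
with_tbody = root.findall(
    "./body/table[2]/tbody/tr/td/table[2]/tbody/tr[1]/td[2]/span[2]"
)

# Same path with the tbody steps removed, matching the real markup.
without_tbody = root.findall(
    "./body/table[2]/tr/td/table[2]/tr[1]/td[2]/span[2]"
)

names_with_tbody = [element.text for element in with_tbody]
names_without_tbody = [element.text for element in without_tbody]

print(names_with_tbody)     # nothing matches, so indexing [0] would fail
print(names_without_tbody)  # the second span in the "From:" cell
```

With Scrapy, the equivalent change is to drop the `tbody` steps from the expression passed to `xpath()`. Using `extract_first()` instead of `extract()[0]`, as in the comment above, avoids an `IndexError` when nothing matches.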