2017-06-06 6 views

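Python - Use RegEx on an HTML file for a specific word?

I have emails with HTML tables embedded in them. I use BeautifulSoup to extract the data tables, but there is data just outside the tables that this method cannot capture. As I said, though, I use bs4 to capture the information from inside the table cells.

Here is an example email with two data tables. I convert this data to a DataFrame. I would like to capture the Package price and add it to each fish weight value.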

for line in f:
    if "Package" in line:
        print("line:", line)

A simple command like the one above prints nothing. When I look closely at the HTML, I see that it looks like this:

<html> 
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<title>FW: NEFS 11 fish available</title> 
<link rel="important stylesheet" href=""> 
<style>div.headerdisplayname {font-weight:bold;}</style></head> 
<body> 
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 11 fish available</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <[email protected]></td></tr><tr><td><b>Date: </b>6/2/2016 5:55 PM</td></tr></table><br> 
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"> 
<head> 
<meta http-equiv="Content-Type" content="text/html; "> 
<meta name="Generator" content="Microsoft Word 15 (filtered medium)"> 
<style><!-- 
/* Font Definitions */ 
@font-face 
    {font-family:"Cambria Math"; 
    panose-1:2 4 5 3 5 4 6 3 2 4;} 
@font-face 
    {font-family:Calibri; 
    panose-1:2 15 5 2 2 2 4 3 2 4;} 
@font-face 
    {font-family:"Franklin Gothic Demi"; 
    panose-1:2 11 7 3 2 1 2 2 2 4;} 
@font-face 
    {font-family:"Franklin Gothic Book"; 
    panose-1:2 11 5 3 2 1 2 2 2 4;} 
@font-face 
    {font-family:Verdana; 
    panose-1:2 11 6 4 3 5 4 4 2 4;} 
/* Style Definitions */ 
p.MsoNormal, li.MsoNormal, div.MsoNormal 
    {margin:0in; 
    margin-bottom:.0001pt; 
    font-size:12.0pt; 
    font-family:"Times New Roman",serif;} 
a:link, span.MsoHyperlink 
    {mso-style-priority:99; 
    color:#0563C1; 
    text-decoration:underline;} 
a:visited, span.MsoHyperlinkFollowed 
    {mso-style-priority:99; 
    color:#954F72; 
    text-decoration:underline;} 
p.msonormal0, li.msonormal0, div.msonormal0 
    {mso-style-name:msonormal; 
    mso-margin-top-alt:auto; 
    margin-right:0in; 
    mso-margin-bottom-alt:auto; 
    margin-left:0in; 
    font-size:12.0pt; 
    font-family:"Times New Roman",serif;} 
span.EmailStyle18 
    {mso-style-type:personal-reply; 
    font-family:"Calibri",sans-serif; 
    color:#1F497D;} 
.MsoChpDefault 
    {mso-style-type:export-only; 
    font-family:"Calibri",sans-serif;} 
@page WordSection1 
    {size:8.5in 11.0in; 
    margin:1.0in 1.0in 1.0in 1.0in;} 
div.WordSection1 
    {page:WordSection1;} 
--></style><!--[if gte mso 9]><xml> 
<o:shapedefaults v:ext="edit" spidmax="1026" /> 
</xml><![endif]--><!--[if gte mso 9]><xml> 
<o:shapelayout v:ext="edit"> 
<o:idmap v:ext="edit" data="1" /> 
</o:shapelayout></xml><![endif]--> 
</head> 
<body lang="EN-US" link="#0563C1" vlink="#954F72"> 
<div class="WordSection1"> 
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Please see below quota listings.<o:p></o:p></span></p> 
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></p> 
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D">Thanks,<o:p></o:p></span></p> 
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></p> 
<p class="MsoNormal"><span style="font-family:&quot;Franklin Gothic Book&quot;,sans-serif;color:#1F497D">Claire Fitz-Gerald<o:p></o:p></span></p> 
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:&quot;Franklin Gothic Book&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></i></p> 
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Franklin Gothic Demi&quot;,sans-serif;color:#002776">Cape Cod Commercial Fishermen's Alliance<o:p></o:p></span></b></p> 
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Franklin Gothic Book&quot;,sans-serif;color:#DE3500">~ Small Boats.&nbsp; Big Ideas. ~</span></b><b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#DE3500"><o:p></o:p></span></b></p> 
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Franklin Gothic Demi&quot;,sans-serif;color:#002776">Celebrating 25 years. Navigating 25 more.</span></b><span style="font-size:11.0pt;font-family:&quot;Franklin Gothic Book&quot;,sans-serif;color:#002060"> 
<o:p></o:p></span></p> 
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1F497D"><o:p>&nbsp;</o:p></span></p> 
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"> NEFS V [mailto:[email protected]] 
<br> 
<b>Sent:</b> Thursday, June 02, 2016 12:32 PM<br> 
<b>To:</b> Ben Martens &lt;[email protected]&gt;; Claire Fitz-Gerald &lt;[email protected]&gt;; Dave Leveille 2 &amp; 6 &lt;[email protected]&gt;; Hank SHS &lt;[email protected]&gt;; John Haran 10 &amp; 13 &lt;[email protected]&gt;; Linda MaCann 7 &amp; 8 &lt;[email protected]&gt;; 
mike walsh 6 &lt;[email protected]&gt;; Patrick NCCS &lt;[email protected]&gt;; paula lynch 12 &lt;[email protected]&gt;; Spice Montgomery 3 &lt;[email protected]&gt;; Stephanie Rafael-DeMello 9 &lt;[email protected]&gt;; tory bramante 6 &lt;[email protected]&gt;; NEFS 
11 Charles Felch &lt;[email protected]&gt;; NEFS 11 David Goethel &lt;[email protected]&gt;; NEFS 11 Fanel Dobre &lt;[email protected]&gt;; NEFS 11 Geordie King &lt;[email protected]&gt;; NEFS 11 Jamie Hayward &lt;[email protected]&gt;; NEFS 11 Jayson Driscoll &lt;[email protected]&gt;; 
NEFS 11 Mike and Pat Anderson &lt;[email protected]&gt;; NEFS 11 Neil Pike &lt;[email protected]&gt;; NEFS 11 Richard Anderson &lt;[email protected]&gt;; NEFS 11 Tom Lyons &lt;[email protected]&gt;; Puggy &lt;[email protected]&gt;<br> 
<b>Subject:</b> NEFS 11 fish available<o:p></o:p></span></p> 
<p class="MsoNormal"><o:p>&nbsp;</o:p></p> 
<div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">All,<o:p></o:p></span></p> 
</div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">NEFS 11 has the following available:<o:p></o:p></span></p> 
</div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p>&nbsp;</o:p></span></p> 
</div> 
<div> 
<p class="MsoNormal"><b><u><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">Package 1:&nbsp; $ 500.00</span></u></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p> 
</div> 
<div> 
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="396" style="width:297.0pt;border-collapse:collapse"> 
<tbody> 
<tr style="height:15.0pt"> 
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gb cod east</span><o:p></o:p></p> 
</td> 
<td width="55" style="width:41.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">1</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gb cod west</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">5</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gom cod</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">148</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gb haddock east</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">1</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gb haddock west</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">2</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gom haddock</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">12</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">white hake</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">4</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">pollock</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">162</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">redfish</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">25</span><o:p></o:p></p> 
</td> 
</tr> 
</tbody> 
</table> 
</div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p>&nbsp;</o:p></span></p> 
</div> 
<div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">​</span><b><u><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">Package 2: $ 5,225.00</span></u></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p> 
</div> 
<div> 
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="387" style="width:290.0pt;border-collapse:collapse"> 
<tbody> 
<tr style="height:15.0pt"> 
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gom cod</span><o:p></o:p></p> 
</td> 
<td width="45" style="width:34.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">916</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gom winter fl</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">498</span><o:p></o:p></p> 
</td> 
</tr> 
</tbody> 
</table> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif;display:none"><o:p>&nbsp;</o:p></span></p> 
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="387" style="width:290.0pt;border-collapse:collapse"> 
<tbody> 
<tr style="height:15.0pt"> 
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gom haddock</span><o:p></o:p></p> 
</td> 
<td width="45" style="width:34.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">284</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">white hake</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">505</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">dab</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">1,293</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">pollock</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">812</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">redfish</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">1,910</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">witch fl</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">352</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">cc/gom yellowtail</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">306</span><o:p></o:p></p> 
</td> 
</tr> 
</tbody> 
</table> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">​<o:p></o:p></span></p> 
</div> 
</div> 
<div> 
<p class="MsoNormal"><b><u><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">Package 3:&nbsp; $ 44,150.00</span></u></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p> 
</div> 
<div> 
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="449" style="width:337.0pt;border-collapse:collapse"> 
<tbody> 
<tr style="height:15.0pt"> 
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gb cod east</span><o:p></o:p></p> 
</td> 
<td width="45" style="width:34.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td width="63" style="width:47.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">5</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gb cod west</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">17</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gom cod</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">5,000</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gom winter fl</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">2,900</span><o:p></o:p></p> 
</td> 
</tr> 
</tbody> 
</table> 
</div> 
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="449" style="width:337.0pt;border-collapse:collapse"> 
<tbody> 
<tr style="height:15.0pt"> 
<td width="232" style="width:174.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gb haddock east</span><o:p></o:p></p> 
</td> 
<td width="45" style="width:34.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td width="63" style="width:47.0pt;padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td width="109" style="width:82.0pt;padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">836</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gb haddock west</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">2,118</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">gom haddock</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">18,000</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">white hake</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">8,842</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">dab</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">8,650</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">pollock</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">78,000</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">redfish</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">35,923</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">witch fl</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">3,250</span><o:p></o:p></p> 
</td> 
</tr> 
<tr style="height:15.0pt"> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">cc/gom yellowtail</span><o:p></o:p></p> 
</td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"></td> 
<td style="padding:0in 0in 0in 0in;height:15.0pt"> 
<p class="MsoNormal"><span style="color:black">2,250</span><o:p></o:p></p> 
</td> 
</tr> 
</tbody> 
</table> 
<div> 
<p class="MsoNormal"><o:p>&nbsp;</o:p></p> 
</div> 
<div> 
<div> 
<p class="MsoNormal"><b><u><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">Package 4:&nbsp; $ 43,135.00</span></u></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p> 
</div> 
</div> 
<div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Verdana&quot;,sans-serif">GOM cod&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;6,900</span><o:p></o:p></p> 
</div> 
</div> 
<div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Verdana&quot;,sans-serif">dabs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3,800</span><o:p></o:p></p> 
</div> 
</div> 
<div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Verdana&quot;,sans-serif">witch fl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4,000</span><o:p></o:p></p> 
</div> 
</div> 
<div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Verdana&quot;,sans-serif">cc/gom yt&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5,100</span><o:p></o:p></p> 
</div> 
</div> 
<div> 
<div> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p>&nbsp;</o:p></span></p> 
</div> 
</div> 
<div> 
<div> 
<p class="MsoNormal"><b><span style="font-size:13.5pt;font-family:&quot;Arial&quot;,sans-serif">GB West Cod&nbsp; - 3,251 lbs libe weight = $ 6,500.00</span></b><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p></o:p></span></p> 
</div> 
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><br clear="all"> 
<o:p></o:p></span></p> 
</div> 
<p class="MsoNormal"><br> 
-- <o:p></o:p></p> 
<div> 
<div> 
<div> 
<div> 
<div> 
<div> 
<div> 
<div> 
<p class="MsoNormal">Daniel Salerno<o:p></o:p></p> 
</div> 
<div> 
<p class="MsoNormal">NEFS 5 &amp; NEFS 11<o:p></o:p></p> 
</div> 
<div> 
<p class="MsoNormal">401-932-0070<o:p></o:p></p> 
</div> 
<div> 
<p class="MsoNormal">401-633-6539 (fax)<o:p></o:p></p> 
</div> 
</div> 
</div> 
</div> 
</div> 
</div> 
</div> 
</div> 
</div> 
</div> 
</body> 
</html> 
</body> 
</html> 

I'm no expert with HTML. Is there a way to capture the Package price and add it to the corresponding fish weight values?

My relevant code is:

import re

package_regex = re.compile(r'package(.*)(?=\n)', re.IGNORECASE)
with open(file_path) as in_f:
    for line in in_f:
        for match in package_regex.finditer(in_f.read()):
            price, stuff = match.groups()
            print("price:", price)
            print("stuff:", stuff)



import email
import bs4

with open(file_path) as in_f:
    msg = email.message_from_file(in_f)  # type: <class 'email.message.Message'>

html_msg = msg.get_payload(1)  # type: <class 'email.message.Message'>

body = html_msg.get_payload(decode=True)  # type: <class 'bytes'> or type: 'int'

html = body.decode()  # type: <class 'str'>
for line in html:
    if "Package" in line:
        print("line:", line)

tables = bs4.BeautifulSoup(html).find_all("table")  # type: <class 'bs4.element.ResultSet'>
data = []
for table in tables:
    for row in table.find_all("tr"):
        data.append([cell.text.strip() for cell in row.find_all("td")])

I tried adding the line `chunks = soup.find_all('p', {'class': "MsoNormal"})` before your last `for` loop, and changed the last `for` loop to: `for line in chunks: if 'Package' in line.text: print line.text for table in tables:` ... with the rest of the for loop unchanged. It prints each `Package #` line followed by the Fish-Weight values for the packages. Maybe that would work for you too? I can post the full code if that isn't clear. – davedwards


Does `chunks` go after `for table in tables`? If you could post it, that would make it easier to understand. – theprowler


Yes, it doesn't matter much where `chunks` goes; I put it after `tables` and before the `for line in chunks` loop. – davedwards

Answers

from bs4 import BeautifulSoup

# html_doc is the HTML as a string (for example, `html` from the question's code)
soup = BeautifulSoup(html_doc, 'html.parser')

tables = soup.find_all('table')

# The "Package N: $ price" headings are <p class="MsoNormal"> elements outside the tables
chunks = soup.find_all('p', {'class': 'MsoNormal'})

for line in chunks:
    if 'Package' in line.text:
        print(line.text)
        # Note: every table is printed after each heading, not only that package's table
        for table in tables:
            for row in table.find_all('tr'):
                print([cell.text.strip() for cell in row.find_all('td')])

Output:

Package 1:  $ 500.00 
[u'Subject: FW: NEFS 11 fish available'] 
[u'From: Claire Fitz-Gerald'] 
[u'Date: 6/2/2016 5:55 PM'] 
[u'gb cod east', u'', u'1'] 
[u'gb cod west', u'', u'5'] 
[u'gom cod', u'', u'148'] 
[u'gb haddock east', u'', u'1'] 
[u'gb haddock west', u'', u'2'] 
[u'gom haddock', u'', u'12'] 
[u'white hake', u'', u'4'] 
[u'pollock', u'', u'162'] 
[u'redfish', u'', u'25'] 
[u'gom cod', u'', u'916'] 
[u'gom winter fl', u'', u'498'] 
[u'gom haddock', u'', u'284'] 
[u'white hake', u'', u'505'] 
[u'dab', u'', u'1,293'] 
[u'pollock', u'', u'812'] 
[u'redfish', u'', u'1,910'] 
[u'witch fl', u'', u'352'] 
[u'cc/gom yellowtail', u'', u'306'] 
[u'gb cod east', u'', u'', u'5'] 
[u'gb cod west', u'', u'', u'17'] 
[u'gom cod', u'', u'', u'5,000'] 
[u'gom winter fl', u'', u'', u'2,900'] 
[u'gb haddock east', u'', u'', u'836'] 
[u'gb haddock west', u'', u'', u'2,118'] 
[u'gom haddock', u'', u'', u'18,000'] 
[u'white hake', u'', u'', u'8,842'] 
[u'dab', u'', u'', u'8,650'] 
[u'pollock', u'', u'', u'78,000'] 
[u'redfish', u'', u'', u'35,923'] 
[u'witch fl', u'', u'', u'3,250'] 
[u'cc/gom yellowtail', u'', u'', u'2,250'] 
​Package 2: $ 5,225.00 
[u'Subject: FW: NEFS 11 fish available'] 
[u'From: Claire Fitz-Gerald'] 
[u'Date: 6/2/2016 5:55 PM'] 
[u'gb cod east', u'', u'1'] 
[u'gb cod west', u'', u'5'] 
[u'gom cod', u'', u'148'] 
[u'gb haddock east', u'', u'1'] 
[u'gb haddock west', u'', u'2'] 
[u'gom haddock', u'', u'12'] 
[u'white hake', u'', u'4'] 
[u'pollock', u'', u'162'] 
[u'redfish', u'', u'25'] 
[u'gom cod', u'', u'916'] 
[u'gom winter fl', u'', u'498'] 
[u'gom haddock', u'', u'284'] 
[u'white hake', u'', u'505'] 
[u'dab', u'', u'1,293'] 
[u'pollock', u'', u'812'] 
[u'redfish', u'', u'1,910'] 
[u'witch fl', u'', u'352'] 
[u'cc/gom yellowtail', u'', u'306'] 
[u'gb cod east', u'', u'', u'5'] 
[u'gb cod west', u'', u'', u'17'] 
[u'gom cod', u'', u'', u'5,000'] 
[u'gom winter fl', u'', u'', u'2,900'] 
[u'gb haddock east', u'', u'', u'836'] 
[u'gb haddock west', u'', u'', u'2,118'] 
[u'gom haddock', u'', u'', u'18,000'] 
[u'white hake', u'', u'', u'8,842'] 
[u'dab', u'', u'', u'8,650'] 
[u'pollock', u'', u'', u'78,000'] 
[u'redfish', u'', u'', u'35,923'] 
[u'witch fl', u'', u'', u'3,250'] 
[u'cc/gom yellowtail', u'', u'', u'2,250'] 
Package 3:  $ 44,150.00 
[u'Subject: FW: NEFS 11 fish available'] 
[u'From: Claire Fitz-Gerald'] 
[u'Date: 6/2/2016 5:55 PM'] 
[u'gb cod east', u'', u'1'] 
[u'gb cod west', u'', u'5'] 
[u'gom cod', u'', u'148'] 
[u'gb haddock east', u'', u'1'] 
[u'gb haddock west', u'', u'2'] 
[u'gom haddock', u'', u'12'] 
[u'white hake', u'', u'4'] 
[u'pollock', u'', u'162'] 
[u'redfish', u'', u'25'] 
[u'gom cod', u'', u'916'] 
[u'gom winter fl', u'', u'498'] 
[u'gom haddock', u'', u'284'] 
[u'white hake', u'', u'505'] 
[u'dab', u'', u'1,293'] 
[u'pollock', u'', u'812'] 
[u'redfish', u'', u'1,910'] 
[u'witch fl', u'', u'352'] 
[u'cc/gom yellowtail', u'', u'306'] 
[u'gb cod east', u'', u'', u'5'] 
[u'gb cod west', u'', u'', u'17'] 
[u'gom cod', u'', u'', u'5,000'] 
[u'gom winter fl', u'', u'', u'2,900'] 
[u'gb haddock east', u'', u'', u'836'] 
[u'gb haddock west', u'', u'', u'2,118'] 
[u'gom haddock', u'', u'', u'18,000'] 
[u'white hake', u'', u'', u'8,842'] 
[u'dab', u'', u'', u'8,650'] 
[u'pollock', u'', u'', u'78,000'] 
[u'redfish', u'', u'', u'35,923'] 
[u'witch fl', u'', u'', u'3,250'] 
[u'cc/gom yellowtail', u'', u'', u'2,250'] 
Package 4:  $ 43,135.00 
[u'Subject: FW: NEFS 11 fish available'] 
[u'From: Claire Fitz-Gerald'] 
[u'Date: 6/2/2016 5:55 PM'] 
[u'gb cod east', u'', u'1'] 
[u'gb cod west', u'', u'5'] 
[u'gom cod', u'', u'148'] 
[u'gb haddock east', u'', u'1'] 
[u'gb haddock west', u'', u'2'] 
[u'gom haddock', u'', u'12'] 
[u'white hake', u'', u'4'] 
[u'pollock', u'', u'162'] 
[u'redfish', u'', u'25'] 
[u'gom cod', u'', u'916'] 
[u'gom winter fl', u'', u'498'] 
[u'gom haddock', u'', u'284'] 
[u'white hake', u'', u'505'] 
[u'dab', u'', u'1,293'] 
[u'pollock', u'', u'812'] 
[u'redfish', u'', u'1,910'] 
[u'witch fl', u'', u'352'] 
[u'cc/gom yellowtail', u'', u'306'] 
[u'gb cod east', u'', u'', u'5'] 
[u'gb cod west', u'', u'', u'17'] 
[u'gom cod', u'', u'', u'5,000'] 
[u'gom winter fl', u'', u'', u'2,900'] 
[u'gb haddock east', u'', u'', u'836'] 
[u'gb haddock west', u'', u'', u'2,118'] 
[u'gom haddock', u'', u'', u'18,000'] 
[u'white hake', u'', u'', u'8,842'] 
[u'dab', u'', u'', u'8,650'] 
[u'pollock', u'', u'', u'78,000'] 
[u'redfish', u'', u'', u'35,923'] 
[u'witch fl', u'', u'', u'3,250'] 
[u'cc/gom yellowtail', u'', u'', u'2,250'] 

Unfortunately, there is some redundancy: the unrelated rows

[u'Subject: FW: NEFS 11 fish available'] 
[u'From: Claire Fitz-Gerald'] 
[u'Date: 6/2/2016 5:55 PM'] 

are repeated in each iteration. This can be avoided; I can edit the answer if necessary.
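One way to avoid the repetition and attach each price only to its own rows is to walk the headings and tables together in document order. The following is a minimal sketch, not a tested solution for the full email: the trimmed `html_doc` sample, the `PACKAGE_RE` pattern, and the `rows_with_prices` helper are all made up for illustration, and it assumes every weight row has exactly two non-empty cells.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample in the same shape as the email above.
html_doc = """
<p class="MsoNormal"><b>Package 1:&nbsp; $ 500.00</b></p>
<table>
<tr><td><p class="MsoNormal">gb cod east</p></td><td></td><td><p class="MsoNormal">1</p></td></tr>
<tr><td><p class="MsoNormal">gom cod</p></td><td></td><td><p class="MsoNormal">148</p></td></tr>
</table>
<p class="MsoNormal"><b>Package 2: $ 5,225.00</b></p>
<table>
<tr><td><p class="MsoNormal">gom cod</p></td><td></td><td><p class="MsoNormal">916</p></td></tr>
</table>
<table>
<tr><td><p class="MsoNormal">dab</p></td><td></td><td><p class="MsoNormal">1,293</p></td></tr>
</table>
"""

# Assumed heading shape: "Package <n>: $ <price>" (whitespace may be non-breaking).
PACKAGE_RE = re.compile(r"Package\s*(\d+):\s*\$\s*([\d,]+\.\d+)")


def rows_with_prices(html):
    """Return (package, price, species, weight) tuples in document order."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    current = None  # (package number, price) of the most recent heading
    # find_all returns matches in document order, so headings and tables interleave.
    for tag in soup.find_all(["p", "table"]):
        if tag.name == "p":
            if tag.find_parent("table") is not None:
                continue  # a <p> inside a table cell, not a heading
            match = PACKAGE_RE.search(tag.get_text())
            if match:
                price = float(match.group(2).replace(",", ""))
                current = (int(match.group(1)), price)
        elif current is not None:  # tables before the first heading are skipped
            for tr in tag.find_all("tr"):
                cells = [td.get_text(strip=True) for td in tr.find_all("td")]
                cells = [c for c in cells if c]  # drop empty spacer cells
                if len(cells) == 2:
                    species, weight = cells
                    rows.append((current[0], current[1], species,
                                 int(weight.replace(",", ""))))
    return rows


rows = rows_with_prices(html_doc)
for row in rows:
    print(row)
```

From here, `pandas.DataFrame(rows, columns=["package", "price", "species", "weight"])` should give one row per fish weight with its package price attached. Note that Package 4 in the sample email lists its weights in plain `<p>` lines rather than in a table, so it would need separate handling.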


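IDK man, it doesn't even come close to working for me. I suck at HTML stuff and I'm not very familiar with BeautifulSoup. Your printout looks like exactly what I want, so I'll keep trying to get your code working tomorrow. Ultimately I want to convert the data to a DataFrame, capturing the Package price and adding it to the corresponding rows. – theprowler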


Really? Hmm, sorry about that. In your code you have `html = body.decode()`, which is a string, so you should be able to replace `html_doc` in my code with your `html` variable. I basically copied the `html` file you pasted into a string and ran the code on it. – davedwards
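On getting that `html` string out of the email: the question's code uses `msg.get_payload(1)`, which assumes the HTML is always the second part of the message. A slightly more defensive sketch, assuming a multipart message, walks the parts and picks the first `text/html` one. The `raw` sample message and the `html_body` helper name are made up for illustration.

```python
import email

# Hypothetical two-part message standing in for the real email file.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Package 1: $ 500.00\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>Package 1: $ 500.00</p>\n"
    "--XYZ--\n"
)


def html_body(msg):
    """Return the first text/html part as a str, or None if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)  # bytes
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


msg = email.message_from_string(raw)
html = html_body(msg)
print(html)
```

The resulting `html` string can then be passed to `BeautifulSoup` in place of `html_doc`.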


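OK OK, so it was the `html_doc` piece that was messing up my code; as usual it was a stupid mistake on my part. Your code worked perfectly. So basically you can't run a simple command like `if "Package" in line:` on an HTML document; you have to use something like `BeautifulSoup` to find the `p` tags with the `MsoNormal` class, and then use further commands similar to `if 'Package' in line.text:`. Am I understanding this right? – theprowler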
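A note on that last question: one likely reason the `for line in html:` loop in the question printed nothing is that `html` is a `str`, and iterating over a `str` yields one character at a time, so `"Package"` can never be found in a single "line". A plain substring test does work once the string is split into lines, although BeautifulSoup remains the more robust way to get at the text without the surrounding tags. A small sketch with a made-up two-line `html` string:

```python
# Hypothetical two-line HTML string for illustration.
html = (
    '<p class="MsoNormal">All,</p>\n'
    '<p class="MsoNormal"><b>Package 1:&nbsp; $ 500.00</b></p>\n'
)

# Iterating over a str yields single characters, so nothing can match.
char_hits = [piece for piece in html if "Package" in piece]

# Splitting into lines first makes the substring test meaningful.
line_hits = [line for line in html.splitlines() if "Package" in line]

print(len(char_hits), len(line_hits))
```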