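I want to extract the body part of each message from the following chain mail using a PHP regex. The chain mail is saved as a .txt file. If the body contains HTML tags, they should be left unchanged during extraction. The goal is to extract every set of email headers, together with its body, from the mail in PHP.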
$content = <<<HEREDOC
From: Matrimony <[email protected]>
Sent: Fri, 12 Aug 2011 16:17:40
To: "[email protected]" <[email protected]>
Subject: Re: bride search
From: brides <[email protected]>
Sent: Fri, 12 Aug 2011 15:49:52
To: "Matrimony " <[email protected]>
Cc: "groom" <[email protected]>
Subject: Re: bride search
PFA
Regds.,
sales
From: shaadi <[email protected]>
Sent: Tue, 22 Feb 2011 16:40:24
To: <[email protected]>, <[email protected]>
Cc: "'lagna '" <[email protected]>, <[email protected]>, <[email protected]>, "'beta data'" <[email protected]>, "'test S'" <[email protected]>
Subject: Re:data transfer would be made live for 145 test
This is to inform you that we are going to test today.
Activity Timing: 9:00 PM onwards
Thanks and Regards,
free matrimony
shaadi Operations
P Please do not print this e-mail unless it is absolutely necessary
From: shaadi [nikaah:[email protected]]
Sent: 21 February 2011 23:09
To: [email protected]; [email protected]
Cc: 'lagna '; [email protected]; [email protected];
Subject: data transfer would be made live for 145 test
Hi,
gtsdhsdbh
anbdsmbsa
sda the data test .
Would request you to send in your feedback.
Thanks and Regards,
beta data
assa xyz
P Please do not print this e-mail unless it is absolutely necessary
HEREDOC;
Current output (O/P):
Array
(
[0] => Array
(
[0] => Re: bride search
[1] => Re: bride search
PFA
Regds.,
sales
[2] => Re:data transfer would be made live for 145 test
This is to inform you that we are going to test today.
Activity Timing: 9:00 PM onwards
Thanks and Regards,
free matrimony
shaadi Operations
P Please do not print this e-mail unless it is absolutely necessary
)
[1] => Array
(
[0] => Re: bride search
[1] => Re: bride search
PFA
Regds.,
sales
[2] => Re:data transfer would be made live for 145 test
This is to inform you that we are going to test today.
Activity Timing: 9:00 PM onwards
Thanks and Regards,
free matrimony
shaadi Operations
P Please do not print this e-mail unless it is absolutely necessary
)
)
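I used
preg_match_all('/(?<=Subject:)(.*?[\n][\s]*?)(?=From:)/is',$content,$rest);
to get the output above, but it does not return the last message, because no "From:" follows it. What regex would capture the data in between? I hope that is clear. If there is another way, please let me know.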
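One small change that addresses the missing last message is to let the lookahead also accept the end of the input. This is a minimal sketch, assuming every message starts with a line beginning "From:"; the shortened sample and its addresses are hypothetical stand-ins for the real data.

```php
<?php
// Shortened, hypothetical stand-in for the $content heredoc above.
$content = <<<HEREDOC
From: a <a@example.com>
Subject: Re: bride search
From: b <b@example.com>
Subject: Re: bride search
PFA
From: c <c@example.com>
Subject: data transfer
Hi,
last body
HEREDOC;

// /m makes ^ match at every line start, /s lets . cross newlines.
// (?=^From:|\z) stops at the next "From:" line or at the end of the input,
// so the final message is captured as well.
preg_match_all('/^Subject:(.*?)(?=^From:|\z)/ms', $content, $rest);

// Group 1 holds the subject line plus whatever follows it.
$bodies = array_map('trim', $rest[1]);
print_r($bodies);
```

A body line that itself begins with "From:" would still end the capture early, so this only suits mail where that cannot happen.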
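// The body runs up to the next line starting with "From: " or the end of input (\z), so a message with an empty body matches cleanly.
preg_match_all('/^From:\x20(?<From>[^\n]*)\n^Sent:\x20(?<Sent>[^\n]*)\n^To:\x20(?<To>[^\n]*)\n(?:^Cc:\x20(?<Cc>[^\n]*)\n)?^Subject:\x20(?<Subject>[^\n]*)\n(?<Body>.*?)(?=^From:\x20|\z)/ms',$content,$matches);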
echo "<pre>".print_r($matches,true);
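To read the result one message at a time, PREG_SET_ORDER can be combined with the named groups. The following is a minimal sketch rather than a definitive implementation: the sample addresses are hypothetical, the `$messages` layout is only an illustration, and the pattern ends each body at the next line starting with "From: " or at the end of input.

```php
<?php
// Shortened, hypothetical stand-in for the $content heredoc in the question.
$content = <<<HEREDOC
From: a <a@example.com>
Sent: Fri, 12 Aug 2011 16:17:40
To: b <b@example.com>
Subject: Re: bride search
From: b <b@example.com>
Sent: Fri, 12 Aug 2011 15:49:52
To: a <a@example.com>
Cc: c <c@example.com>
Subject: Re: bride search
PFA
Regds.,
sales
HEREDOC;

// One header per line, Cc optional, then a lazy body.
$pattern = '/^From:\x20(?<From>[^\n]*)\n'
         . '^Sent:\x20(?<Sent>[^\n]*)\n'
         . '^To:\x20(?<To>[^\n]*)\n'
         . '(?:^Cc:\x20(?<Cc>[^\n]*)\n)?'
         . '^Subject:\x20(?<Subject>[^\n]*)\n'
         . '(?<Body>.*?)(?=^From:\x20|\z)/ms';

// PREG_SET_ORDER groups the captures per match, i.e. per message.
preg_match_all($pattern, $content, $sets, PREG_SET_ORDER);

$messages = [];
foreach ($sets as $m) {
    $messages[] = [
        'from'    => $m['From'],
        'subject' => $m['Subject'],
        'cc'      => $m['Cc'] ?? '',   // empty when the message has no Cc line
        'body'    => trim($m['Body']), // HTML tags in the body are not altered
    ];
}
print_r($messages);
```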
I think you will need some very clever parsing to figure this out. http://www.mangalsutrabandhan.com
I'm not sure regex is the best option. It would be better to split the document based on the "clusters" of to/from/subject data. From there, anything in between should be considered content.
Could you edit the question to clarify the desired output? – paulmelnikow
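The splitting approach suggested in the comments could look roughly like the sketch below. It assumes each message begins with a line starting "From: " and that the header cluster consists only of From, Sent, To, Cc, and Subject lines; the sample data and the `$messages` layout are hypothetical.

```php
<?php
// Shortened, hypothetical stand-in for the $content heredoc in the question.
$content = <<<HEREDOC
From: a <a@example.com>
Sent: Fri, 12 Aug 2011 16:17:40
To: b <b@example.com>
Subject: Re: bride search
From: b <b@example.com>
Sent: Fri, 12 Aug 2011 15:49:52
To: a <a@example.com>
Subject: Re: bride search
PFA
<b>Regds.,</b>
HEREDOC;

// Split before every line that starts with "From: ".
// The lookahead is zero-width, so the "From:" line stays in its chunk.
$chunks = preg_split('/^(?=From:\x20)/m', $content, -1, PREG_SPLIT_NO_EMPTY);

$messages = [];
foreach ($chunks as $chunk) {
    $lines = explode("\n", rtrim($chunk, "\n"));
    $headers = [];
    // Leading lines that look like a known header form the cluster.
    while ($lines && preg_match('/^(From|Sent|To|Cc|Subject):\s*(.*)$/', $lines[0], $m)) {
        $headers[$m[1]] = $m[2];
        array_shift($lines);
    }
    // Whatever remains is the body, left untouched (HTML tags included).
    $messages[] = ['headers' => $headers, 'body' => implode("\n", $lines)];
}
print_r($messages);
```

A body whose first line happens to look like one of those headers would be misread as part of the cluster, so real data may need a stricter rule for where the headers end.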