Using preg_match_all to pull all <a href links that are NOT mailto: links

I'm trying to use preg_match_all to scan the source of a page and pull all links that are mailto: links into one array and all links that are not mailto: links into another array. Currently I'm using:

$searches = array('reg'=>'/href(=|=\'|=\")(?!mailto)(.+)\"/i','mailto'=>'/href(=|=\'|=\")(?=mailto)(.+)\"/i'); 
foreach ($searches as $key=>$search) 
{ 
    preg_match_all($search,$source,$found[$key]); 
}

The mailto: links search is working perfectly, but I can't find the reason why the non mailto: link search is pulling both mailto: and non-mailto: links, even with the negative look ahead assertion in place. What am I doing wrong?

Source

2012-02-12 mtylerb

The lookahead doesn't fail against '"mailto:', so the pattern still matches. [**The pony, he comes**](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) - the canonical reference for why parsing HTML with regex is a bad idea – rdlowrey

A saner solution that isn't so fragile would be to use DOMDocument...

$dom = new DOMDocument;
$dom->loadHTML($html);

$mailLinks = $nonMailLinks = array();

$a = $dom->getElementsByTagName('a');

foreach ($a as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $href = trim($anchor->getAttribute('href'));
        if (substr($href, 0, 7) == 'mailto:') {
            $mailLinks[] = $href;
        } else {
            $nonMailLinks[] = $href;
        }
    }
}

CodePad.
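For anyone who wants to try the DOMDocument approach on its own, here is a minimal self-contained sketch. The function name splitLinks, the array keys and the sample HTML are made up for illustration and are not part of the answer above. It also adds two things the answer does not do: it silences libxml warnings for sloppy markup and it compares the scheme case-insensitively.

```php
<?php
// Hypothetical helper (name and return shape are assumptions for this sketch).
function splitLinks($html)
{
    $dom = new DOMDocument;

    // Real-world pages are rarely valid HTML; collect libxml warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array('mailto' => array(), 'reg' => array());

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        if (!$anchor->hasAttribute('href')) {
            continue; // anchors without href (e.g. named anchors) are skipped
        }
        $href = trim($anchor->getAttribute('href'));
        // strncasecmp() returns 0 when the first 7 bytes match, ignoring case.
        $key = (strncasecmp($href, 'mailto:', 7) === 0) ? 'mailto' : 'reg';
        $links[$key][] = $href;
    }

    return $links;
}

// Made-up sample input.
$html = '<p><a href="MAILTO:me@example.com">Mail</a> '
      . '<a href="http://example.com/">Site</a> '
      . '<a name="top">No href</a></p>';

$links = splitLinks($html);
print_r($links);
```

The case-insensitive check is a judgment call: substr() with == as in the answer would file "MAILTO:..." under the non-mail links.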

Source

2012-02-12 22:54:49 alex

Interesting, I've never used the DOMDocument class. I'll explore that option. Thanks. – mtylerb

Your regex looks for the shortest alternative here:

(=|=\'|=\")

You need to either reorder that so the bare = comes last, or use the more common:

=[\'\"]?

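Alternatively, or in addition, swap the .+? for the more explicit/restrictive [^\'\">]+ so that the negative assertion cannot be sidestepped by .+ matching the opening quote.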
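To make the above concrete, here is a rough sketch that first shows the original pattern letting a mailto: link through, then applies both suggestions together (optional quote plus the restrictive character class). The sample markup and the exact adjusted patterns are my own assumptions, not something taken from the question's page, and the patterns do not handle whitespace around the = sign.

```php
<?php
// Original "reg" pattern from the question. The bare "=" alternative is
// tried first, so (?!mailto) is tested against the quote character and
// succeeds; .+ then swallows the quote and the mailto: URL.
$original = '/href(=|=\'|=\")(?!mailto)(.+)\"/i';
preg_match($original, '<a href="mailto:me@example.com">', $m);
// $m[2] now starts with a double quote followed by mailto:

// Adjusted patterns (an assumption for this sketch): optional quote, then
// the lookahead, then a class that cannot match a quote or ">". If the
// engine backtracks and skips the quote, the class fails on it immediately.
$searches = array(
    'reg'    => '/href=[\'"]?(?!mailto:)([^\'">]+)/i',
    'mailto' => '/href=[\'"]?(?=mailto:)([^\'">]+)/i',
);

// Made-up sample input.
$source = '<a href="mailto:me@example.com">Mail</a> '
        . '<a href="http://example.com/">Site</a> '
        . "<a href='/about'>About</a>";

$found = array();
foreach ($searches as $key => $search) {
    preg_match_all($search, $source, $found[$key]);
}

// Index 1 of each result holds the captured URLs.
print_r($found['reg'][1]);
print_r($found['mailto'][1]);
```

Note that with =[\'\"]? alone the engine can still backtrack to "no quote matched" and let .+ consume the quote, which is why the character class matters here.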

Source

2012-02-12 22:55:57 mario
