How to get all URLs from a page (PHP)

I have a page with URLs on it (something like a list of bookmarks/sites). How can I use PHP to get all the URLs from that page and write them to a txt file (one per line, only the URL without the description)?

The page looks like this:

Some description

Other description

Another one

And I would like the script's txt output to look like this:

2009-07-15 Phil

One way:

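$url = "http://wwww.somewhere.com";
$data = file_get_contents($url);
// Drop every tag except <a> so that only the links keep their markup.
$data = strip_tags($data, "<a>");
// Split on </a> so that each chunk contains at most one link.
$d = preg_split("/<\/a>/", $data);
foreach ($d as $k => $u) {
    if (strpos($u, "<a href=") !== FALSE) {
        // Remove everything up to and including the opening quote of href...
        $u = preg_replace("/.*<a\s+href=\"/sm", "", $u);
        // ...and everything from the closing quote onward (s flag: also across newlines).
        $u = preg_replace("/\".*/s", "", $u);
        print $u . "\n";
    }
}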


2009-07-15 00:34:32 ghostdog74

If there is a link like this: the code above will not find it.

Another way:

$url = "http://wwww.somewhere.com"; 

$html = file_get_contents($url); 

$doc = new DOMDocument(); 
$doc->loadHTML($html); //helps if html is well formed and has proper use of html entities! 

$xpath = new DOMXpath($doc); 

$nodes = $xpath->query('//a'); 

foreach($nodes as $node) { 
    var_dump($node->getAttribute('href')); 
}
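
The DOM approach above only dumps each href, while the question asks for a text file with one URL per line. A minimal sketch of that last step follows. The helper name `extract_hrefs`, the inline sample HTML, and the file name `urls.txt` are assumptions made for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: collect the href of every <a> element in an HTML string.
function extract_hrefs(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // getAttribute() returns an empty string when the attribute is missing.
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $urls[] = $href;
        }
    }
    return $urls;
}

// Inline sample standing in for the bookmark page (an assumption, not the asker's page).
$html = '<p><a href="http://example.com/one">Some description</a></p>'
      . "<p><a href='http://example.com/two'>Other description</a></p>"
      . '<p><a name="top">No href here</a></p>';

$urls = extract_hrefs($html);

// One URL per line; urls.txt is a placeholder file name.
file_put_contents('urls.txt', implode("\n", $urls) . "\n");
```

For a live page, the sample string would be replaced by the result of `file_get_contents($url)`. Because the parser reads the attribute itself, single-quoted and double-quoted href values are handled the same way.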


2013-03-14 16:17:29 user2066719

You can use this to get all the links in a given web page.

<?php

    // Placeholder address: $url was not defined in the original snippet.
    $url = "http://wwww.somewhere.com";
    $var = fread_url($url);

    // Capture group 1 holds the href value, group 2 the link text.
    // $matches is passed normally; call-time "&$matches" is a fatal error since PHP 5.4.
    preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
        "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
        $var, $matches);

    foreach ($matches[1] as $link)
    {
        print($link."<br>");
    }

    // Fetch a page with cURL when it is available, otherwise fall back to fopen().
    function fread_url($url, $ref = "")
    {
        $html = ""; // initialised so the fopen() branch can append to it
        if (function_exists("curl_init")) {
            $user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; ".
                "Windows NT 5.0)";
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
            curl_setopt($ch, CURLOPT_HTTPGET, 1);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_REFERER, $ref);
            curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
            $html = curl_exec($ch);
            curl_close($ch);
        }
        else {
            $hfile = fopen($url, "r");
            if ($hfile) {
                while (!feof($hfile)) {
                    $html .= fgets($hfile, 1024);
                }
                fclose($hfile);
            }
        }
        return $html;
    }

    ?>
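
A regex like the one above returns raw attribute values, which can include fragment links, `mailto:` addresses, HTML-encoded ampersands, and repeats. Since the question asks for a clean list of URLs, here is a hedged sketch of one possible clean-up step. The helper name `clean_urls`, the sample input, and the choice to keep only absolute http/https URLs are assumptions for illustration, not something the original answer specifies.

```php
<?php
// Hypothetical helper: keep only absolute http(s) URLs, drop duplicates, preserve order.
function clean_urls(array $candidates): array
{
    $seen = [];
    foreach ($candidates as $candidate) {
        // Attribute values taken straight from markup may still contain entities such as &amp;.
        $url = trim(html_entity_decode($candidate));
        $scheme = parse_url($url, PHP_URL_SCHEME);
        if (is_string($scheme) && in_array(strtolower($scheme), ['http', 'https'], true)) {
            $seen[$url] = true; // array keys are unique, so repeated URLs collapse
        }
    }
    return array_keys($seen);
}

// Sample values standing in for the captured href matches (assumed data).
$found = [
    'http://example.com/a?x=1&amp;y=2',
    '#top',
    'mailto:someone@example.com',
    'http://example.com/a?x=1&amp;y=2',
    'https://example.org/',
];

$urls = clean_urls($found);
print implode("\n", $urls) . "\n";
```

Relative links are dropped by this filter; resolving them against the page address would need extra logic that is not shown here.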


2016-04-30 13:47:16
