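Limiting the length of an XML/HTML string

I'm parsing an XML file and trying to display the first 150 words of each article followed by a READ MORE link. However, it doesn't cut at exactly 150 words. I also don't know how to make it skip IMG tag code and the like when counting... Here is the code: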

    // Script displays 3 most recent blog posts from blog.pinchit.com (blog.pinchit.com/api/read) 
    // The entries on homepage show the first 150 words of description and "READ MORE" link 

    // PART 1 - PARSING 

    // if it was a JSON file 
    // $string=file_get_contents("http://blog.pinchit.com/api/read"); 
    // $json_a=json_decode($string,true); 
    // var_export($json_a); 


    // XML Parsing 
    $file = "http://blog.pinchit.com/api/read"; 
    $posts_to_display = 3; 
    $posts = array(); 

    // get all the file nodes 
    if(!$xml=simplexml_load_file($file)){ 
     trigger_error('Error reading XML file',E_USER_ERROR); 
    } 

    // counter for posts member array 
    $counter = 0; 

    // Accessing elements within an XML document that contain characters not permitted under PHP's naming convention 
    // (e.g. the hyphen) can be accomplished by encapsulating the element name within braces and the apostrophe. 

    foreach($xml->posts->post as $post){ 

     //post's title 
     $posts[$counter]['title'] = $post->{'regular-title'}; 

     // post's full body 
     $posts[$counter]['body'] = $post->{'regular-body'}; 

     // post's body's first 150 words 
     //for some reason, I am not sure if it's exactly 150 
     $posts[$counter]['preview'] = substr($posts[$counter]['body'], 0, 150); 

     //strip all the html tags so it doesn't mess up the page 
     $posts[$counter]['preview'] = strip_tags($posts[$counter]['preview']); 


     //post's id 
     $posts[$counter]['id'] = $post->attributes()->id; 


     $posts_to_display--; 
     $counter++; 
     //exit the for loop after we parse out all the articles that we want 
     if ($posts_to_display == 0) break; 
    } 

    // Displays all of the posts 

    foreach($posts as $post){ 

     echo "<b>" . $post['title'] . "</b>"; 
     echo "<br/>"; 
     echo $post['preview']; 
     echo " <a href='http://blog.pinchit.com/post/" . $post['id'] . "'>Read More</a>"; 
     echo "<br/><br/>"; 

    }

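Below is what the result currently looks like:

Download: Club Sportiva Nothing makes you feel as completely free and in control as a day behind the wheel of a sleek, sophisticated, sexy sports car. Hotel Utah Saloon Hotel Utah Read More

Monday Menu: It Read More

Pinchy Drinks & Rocks It's no surprise at all spicy grapefruit, paprika, Creamsicles feel summery and savory today, and we have to admit it takes a lot to resist the urge to make this every appetizer, every dessert, or every drink Read More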

2011-08-01 CodeCrack

HTML tags count toward your character total. Strip the tags first, and then take your preview sample:

$preview = strip_tags($posts[$counter]['body']); 
$posts[$counter]['preview'] = substr($preview, 0, 150).'...';

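Also, it is customary to append an ellipsis ("...") to the end of truncated text to show that it continues.

Note that this may remove tags you want to keep, such as <p> and <br>. If you want to preserve them, you can pass them as the second argument to strip_tags: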

$preview = strip_tags($posts[$counter]['body'], '<br><p>'); 
$posts[$counter]['preview'] = substr($preview, 0, 150).'...';

However, be warned that XML-style tags (<br />) may throw this off. If you have mixed XML/HTML, you'll need to step up your tag filtering with something like htmLawed, but the concept stays the same: strip the HTML before truncating.
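Since the question asks for 150 words rather than 150 characters, here is a minimal sketch of a word-based cut. The helper name preview_words and its parameters are made up for illustration and are not from the original post. Note that strip_tags does not insert spaces where tags were, so text from adjacent block elements can run together.

```php
<?php
// Hypothetical helper (not from the original post): returns the first
// $limit words of $html as plain text, adding '...' only when truncated.
function preview_words(string $html, int $limit = 150): string
{
    // Remove tags first so markup never counts toward the limit.
    $text = trim(strip_tags($html));

    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    if (count($words) <= $limit) {
        return implode(' ', $words);
    }

    return implode(' ', array_slice($words, 0, $limit)) . '...';
}

// Example: keep only the first three words.
echo preview_words('<p>One <b>two</b> three <img src="x.png" /> four five</p>', 3);
```

In the loop from the question this would presumably be called as preview_words((string) $post->{'regular-body'}), which replaces the substr and strip_tags pair.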

2011-08-01 20:20:07

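Ah yes.. I completely forgot to strip the tags from the body. Thanks! – CodeCrack

Looking at the <regular-body> tag, it appears to contain HTML. So I would suggest parsing it into a DOMDocument (http://www.php.net/manual/en/domdocument.loadhtml.php). You can then loop over all the items and ignore specific tags (e.g. ignore <img>, but not <p>). After that, render what you need and truncate it to 150 characters.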
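A rough sketch of that DOMDocument approach might look like the following. The helper name preview_without_images is hypothetical, and the code assumes the body is an HTML fragment in plain ASCII or Latin-1; UTF-8 input would probably need an explicit encoding hint for loadHTML, which this sketch leaves out.

```php
<?php
// Hypothetical helper (not from the original post): drops <img> elements,
// then returns at most $limit characters of the remaining text.
function preview_without_images(string $html, int $limit = 150): string
{
    $doc = new DOMDocument();

    // Feed markup is often sloppy; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrapping in a <div> keeps loadHTML from being handed an empty string.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array before removing nodes from the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $img->parentNode->removeChild($img);
    }

    // loadHTML adds <html><body> wrappers, so read the text from <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body === null ? '' : $body->textContent;

    // Collapse the whitespace left behind by the removed markup.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (strlen($text) <= $limit) {
        return $text;
    }

    // substr counts bytes, which is fine for the ASCII input assumed here.
    return substr($text, 0, $limit) . '...';
}

echo preview_without_images('<p>Hello <img src="a.png" alt="x"> world</p>');
```

The same loop could skip or keep other tags by changing the name passed to getElementsByTagName; only <img> is removed here because that is the tag the question mentions.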

2011-08-01 20:20:14 afuzzyllama
