2016-05-30 8 views

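PHP preg_split or preg_match to keep punctuation in the array

I want to break a paragraph into sentences and then 'explode' each sentence into words, while keeping the punctuation marks as elements of the array.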

Example text:

$meta = 'I am looking to break this paragraph into chunks. 
     I have researched, tried and tested various combinations; however, I cannot 
     seem to make it work. Would anyone help me figure this out? 
     I thank you in advance...';

The desired output looks like this:

Array ([0] => 
      Array ([0] => I [1] => am [2] => looking [3] => to [4] => break [5] => [6] => this [7] => paragraph [8] => into [9] => chunks [10] => .) 
     [1] =>  
      Array ([0] => I [2] => have [3] => researched [4] => , [5] => tried [...... 
      ......] [5] => figure [6] => this [7] => out [8] => ?) 
     [3] => 
      Array ([0] => I [1] => thank [2] => you [3] => in [4] => advance [5] => ...) 
    ) 

I tried using:

$s = preg_split('/\s*[!?.]\s*/u', $meta, -1, PREG_SPLIT_NO_EMPTY); 

to separate the sentences. This works, but the punctuation disappears.

I suspect preg_match could do what I want while keeping the punctuation, and I would really appreciate help building this two-level array.

Answers

Try this:

$meta = 'I am looking to break this paragraph into chunks. 
     I have researched, tried and tested various combinations; however, I cannot 
     seem to make it work. Would anyone help me figure this out? 
     I thank you in advance...'; 

preg_match_all('/(\w+|[.;?,]+)/', $meta, $m); 
print_r($m); 

Explanation:

/           : regex delimiter
(           : begin group 1
    \w+     : 1 or more word characters, i.e. [a-zA-Z0-9_]
    |       : OR
    [.;?,]+ : 1 or more punctuation characters
)           : end of group 1
/           : regex delimiter

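This matches every word and every run of punctuation characters and stores each one in group 1.

If you want it to be Unicode-aware, you can use \p{L} for any letter and \p{P} for any punctuation character, together with the u modifier so that the pattern and the subject are treated as UTF-8:

/(\p{L}+|\p{P}+)/u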
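As a minimal sketch of the Unicode variant in use: the sample string `$text` below is made up for illustration and does not come from the question. The `u` modifier is what makes PCRE read both the pattern and the subject as UTF-8.

```php
<?php
// Hypothetical sample text with non-ASCII letters (not from the question).
$text = 'Où est le café? Très loin, hélas...';

// \p{L}+ : one or more letters in any script
// \p{P}+ : one or more punctuation characters
// u      : treat pattern and subject as UTF-8
preg_match_all('/\p{L}+|\p{P}+/u', $text, $m);

// $m[0] holds the full matches: words and punctuation runs, in order.
print_r($m[0]);
```

Note that `\p{L}` matches letters only, so digits and underscores, which `\w` accepts, are skipped unless you extend the pattern, for example with `\p{N}`.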

Output:

Array 
(
    [0] => Array 
     (
      [0] => I 
      [1] => am 
      [2] => looking 
      [3] => to 
      [4] => break 
      [5] => this 
      [6] => paragraph 
      [7] => into 
      [8] => chunks 
      [9] => . 
      [10] => I 
      [11] => have 
      [12] => researched 
      [13] => , 
      [14] => tried 
      [15] => and 
      [16] => tested 
      [17] => various 
      [18] => combinations 
      [19] => ; 
      [20] => however 
      [21] => , 
      [22] => I 
      [23] => cannot 
      [24] => seem 
      [25] => to 
      [26] => make 
      [27] => it 
      [28] => work 
      [29] => . 
      [30] => Would 
      [31] => anyone 
      [32] => help 
      [33] => me 
      [34] => figure 
      [35] => this 
      [36] => out 
      [37] => ? 
      [38] => I 
      [39] => thank 
      [40] => you 
      [41] => in 
      [42] => advance 
      [43] => ... 
     ) 

    [1] => Array 
     (
      [0] => I 
      [1] => am 
      [2] => looking 
      [3] => to 
      [4] => break 
      [5] => this 
      [6] => paragraph 
      [7] => into 
      [8] => chunks 
      [9] => . 
      [10] => I 
      [11] => have 
      [12] => researched 
      [13] => , 
      [14] => tried 
      [15] => and 
      [16] => tested 
      [17] => various 
      [18] => combinations 
      [19] => ; 
      [20] => however 
      [21] => , 
      [22] => I 
      [23] => cannot 
      [24] => seem 
      [25] => to 
      [26] => make 
      [27] => it 
      [28] => work 
      [29] => . 
      [30] => Would 
      [31] => anyone 
      [32] => help 
      [33] => me 
      [34] => figure 
      [35] => this 
      [36] => out 
      [37] => ? 
      [38] => I 
      [39] => thank 
      [40] => you 
      [41] => in 
      [42] => advance 
      [43] => ... 
     ) 

) 
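The result above is one flat list of tokens (`$m[0]` is the full match, `$m[1]` the identical capture group), whereas the question asks for one sub-array per sentence. Here is a minimal sketch of one way to get there, assuming a sentence ends with `.`, `?` or `!` followed by whitespace. The lookbehind split, the extra `!` in the character class, and the names `$sentences` and `$result` are my own additions, not part of the answer.

```php
<?php
$meta = 'I am looking to break this paragraph into chunks.
     I have researched, tried and tested various combinations; however, I cannot
     seem to make it work. Would anyone help me figure this out?
     I thank you in advance...';

// Split after sentence-ending punctuation without consuming it:
// the lookbehind (?<=[.?!]) only asserts that the punctuation is there,
// so only the whitespace that follows is removed.
$sentences = preg_split('/(?<=[.?!])\s+/', $meta, -1, PREG_SPLIT_NO_EMPTY);

$result = [];
foreach ($sentences as $sentence) {
    // Tokenize each sentence into words and punctuation runs.
    // No capture group is needed; "!" is added here as an assumption.
    preg_match_all('/\w+|[.;?,!]+/', $sentence, $m);
    $result[] = $m[0];
}

print_r($result);
```

With the sample text this yields four sub-arrays, one per sentence, each ending in its punctuation token. The sentence split is naive: an abbreviation such as "e.g." followed by a space would also be treated as a sentence boundary.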
Thanks @Toto, that worked. If you get a chance to explain what you wrote, to help me understand and learn, I would appreciate it. – Jacob

@Jacob: See my edit. – Toto

Thanks for taking the time to explain! – Jacob
