Search a string and split the original string by the found content

I would like to know whether there is a good way to find content in a string and split the result by the found content. For example, given the string:

string str = "you androids don't exactly cover for each other in times of stress. " +
    "i think you're right it would seem we lack a specific talent you humans possess " +
    "i believe it's called empathy";

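and search strings, for example:

 
var sList = new List<string> {"for each other", "talent", "you humans"};

the result should be the found strings, separated by the remaining parts of the original string.

If the same string occurs in two different search strings (here it is "you"), the result should look like this: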

 
you 
androids don't exactly cover 
for each other 
other in 
times of stress. i think you're right it would seem we lack a specific 
talent 
you 
you humans 
possess i believe it's called empathy

Source

2017-05-26 Anonymous

I'm not sure how you would do the second one, where you duplicate content, but the first should be possible using 'Regex.Split'. – juharr
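The 'Regex.Split' suggestion in the comment above can be sketched roughly as follows for the first, non-overlapping case. This is only an illustration, not code from the thread: the class `SplitDemo` and the helper `SplitKeepingMatches` are hypothetical names, and sorting the search strings longest-first is an assumption made so that a longer string wins over a shorter one that starts at the same position. It relies on the documented behavior that `Regex.Split` includes captured text in its result when the pattern contains a capturing group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper (not from the original post): splits text on any of the
    // search strings and keeps the matched strings in the result.
    public static List<string> SplitKeepingMatches(string text, IEnumerable<string> searchStrings)
    {
        // Assumption: try longer search strings first so they win over shorter prefixes.
        var alternatives = searchStrings
            .OrderByDescending(s => s.Length)
            .Select(Regex.Escape);

        // The capturing group makes Regex.Split return the separators as well.
        var pattern = "(" + string.Join("|", alternatives) + ")";

        return Regex.Split(text, pattern)
            .Select(part => part.Trim())
            .Where(part => part.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        var str = "you androids don't exactly cover for each other in times of stress. " +
                  "i think you're right it would seem we lack a specific talent you humans possess " +
                  "i believe it's called empathy";
        var sList = new List<string> { "for each other", "talent", "you humans" };

        foreach (var part in SplitKeepingMatches(str, sList))
        {
            Console.WriteLine(part);
        }
    }
}
```

Because every character belongs to at most one match here, this approach cannot emit the overlapping pieces ("for each other" followed by "other in") that the second case asks for.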

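You can use a regular expression to find the matches.

 
var sList = new List<string> {"for each other", "other in", "talent", "you humans", "you"};

To produce the correct output for a set of search strings within a string, you need to account for the gaps between the matches and reconcile overlapping match ranges.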

using System; 
using System.Collections.Generic; 
using System.Text.RegularExpressions; 
using System.Linq; 

public class Program 
{ 
    public static void Main() 
    { 
     string str = "you androids don't exactly cover for each other in times of stress. i think youre right it would seem we lack a specific talent you humans possess i believe it's called empathy"; 
     var sList = new List<string> {"for each other", "other in", "talent", "you humans", "you"}; 
     // Marks which characters of str are covered by at least one match (bool[] elements default to false).
     var chRangeMap = new bool[str.Length];

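     // Find every whole-word occurrence of each search string and record its character range.
     var matchedTokenMap = sList
      .Select(i => "\\b" + Regex.Escape(i) + "\\b")
      .SelectMany(p => (new Regex(p)).Matches(str).OfType<Match>())
      .Select(m => new
        {
         StartIndex = m.Index,
         EndIndex = m.Index + m.Length,
         Length = m.Length
        })
      .Select(r => {
       // Side effect: flag the matched characters so the gaps can be derived later.
       for (var i = r.StartIndex; i < r.EndIndex; ++i) chRangeMap[i] = true;
       return r;
       })
      .ToList(); // materialize now so chRangeMap is filled before the gaps are computed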

     // Append the unmatched gaps, then order by position (shorter token first on ties).
     var fullTokenized =
      matchedTokenMap.Concat(
       GetArrayRanges(chRangeMap, false) 
        .Select(r => new 
          { 
           StartIndex = r.Item1, 
           EndIndex = r.Item2, 
           Length = r.Item2 - r.Item1 
          }) 
      ) 
      .OrderBy(k => k.StartIndex).ThenBy(sk => sk.Length); 

     foreach(var token in fullTokenized) 
     { 
      WriteTrimmed(str.Substring(token.StartIndex, token.Length)); 
     } 
    } 

    private static void WriteTrimmed(string str) 
    { 
     str = str.Trim(); 
     if (!string.IsNullOrWhiteSpace(str)) 
     { 
      Console.WriteLine(str); 
     } 
    } 

    // Yields [start, end) index ranges of consecutive elements equal to seekValue.
    private static IEnumerable<Tuple<int, int>> GetArrayRanges(bool[] array, bool seekValue)
    { 
     int? rangeStart = null; 

     for(var i = 0; i < array.Length; ++i) 
     { 
      if (array[i] == seekValue) 
      { 
       if (!rangeStart.HasValue) 
       { 
        rangeStart = i; 
       } 
      } 
      else 
      { 
       if (rangeStart.HasValue) 
       { 
        yield return Tuple.Create(rangeStart.Value, i); 
        rangeStart = null; 
       } 
      } 
     } 

     if (rangeStart.HasValue) 
     { 
      yield return Tuple.Create(rangeStart.Value, array.Length); 
     } 
    } 
}

DotNETFiddle of the code.
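Note that the sample string in the code above spells "youre" without an apostrophe, whereas the question has "you're". One plausible reason (an assumption; the answer does not say so) is that `\b` treats the position before an apostrophe as a word boundary, so `\byou\b` would also match inside "you're". A minimal check, using a hypothetical helper `WholeWordIndexes` that is not part of the answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: returns the start index of every whole-word match of `word`.
    public static int[] WholeWordIndexes(string text, string word)
    {
        var pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern)
            .Cast<Match>()
            .Select(m => m.Index)
            .ToArray();
    }

    public static void Main()
    {
        // With the apostrophe, "you" is followed by a non-word character, so it matches.
        Console.WriteLine(WholeWordIndexes("i think you're right", "you").Length); // 1

        // Without it, "you" is followed by the letter 'r', so there is no boundary.
        Console.WriteLine(WholeWordIndexes("i think youre right", "you").Length);  // 0
    }
}
```

If the input must keep its apostrophes, the boundary check would need to be adapted, for example with lookarounds that also treat an apostrophe as part of a word.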

Source

2017-05-26 16:45:40 LB2

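It would be better to use 'Regex.Escape' to escape any potential special characters. – juharr

@juharr You are absolutely right. I have updated it. Thanks for the suggestion. – LB2

Hi LB2, this is a useful solution, except for the result when the same value occurs in different search strings. –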

Try this:

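// Start with the whole string as a single part.
List<string> parts = new List<string> { str };
// For each separator, split every part into: text before, the separator, text after.
// The "|(.+)" alternative keeps parts that do not contain the separator.
sList.ForEach(separator => parts = parts
    .SelectMany(part => Regex.Match(part, "(.*) ?(\\b" + Regex.Escape(separator) + "\\b) ?(.*)|(.+)")
     .Groups
     .Cast<Group>()
     .Where(group => group.Success)
     .Select(group => group.Value)
     .Skip(1)) // skip group 0, which is the whole match
    .ToList());

// Drop empty or whitespace-only parts.
parts = parts
    .Where(x => !string.IsNullOrWhiteSpace(x))
    .ToList();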

Output:

you 
androids don't exactly cover 
for each other 
in times of stress. i think youre right it would seem we lack a specific 
talent 
you 
humans 
possess i believe it's called empathy

Dotnet Fiddle Demo

Source

2017-05-26 17:07:35 degant

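The output does not match the expected output. There is an extra "you" in the second-to-last entry. :p –

@RufusL It is actually correct: because 'you' is also in the 'sList' used for splitting, 'you humans' really does get broken into 'you' + 'humans'. The OP needs to confirm this. – degant

I know, it is a bug in the expected output, not in your code. Hence the smiley face at the end of my comment... –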
