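Document filtering with regular expressions

I am trying to find the best way to validate an input document. I need to check every line of the document, and each line may contain one or more invalid characters. The result of the search (validation) should be the index of each line that contains invalid characters, together with the index of each invalid character within that line.

I know how to do this the standard way (open the file -> read all lines -> check the characters one by one), but that approach is not well optimized. In my opinion, the best solution would be to use "MatchCollection" instead.

But how do I do this correctly in C#?

Link: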

http://www.dotnetperls.com/regex-matches

Example: "Some Înput text here,\nÎs another lÎne of thÎs text."

First line [0]: invalid character found at index [5]. Second line [1]: invalid characters found at indexes [0, 12, 21].

using System; 
using System.Text.RegularExpressions; 

namespace RegularExpresion 
{ 
    class Program 
    { 
     private static Regex regex = null; 

     static void Main(string[] args) 
     { 
      string input_text = "Some Înput text here, Îs another lÎne of thÎs text."; 

      string line_pattern = "\n"; 

      string invalid_character = "Î"; 

      regex = new Regex(line_pattern); 

      // Check whether the document has multiple lines or a single line
      if (IsMultipleLine(input_text)) 
      { 
       /// ---> How to do this correctly for each line ? <--- 
      } 
      else 
      { 
       Console.WriteLine("Is a single line file"); 

       regex = new Regex(invalid_character); 

       MatchCollection mc = regex.Matches(input_text); 

       Console.WriteLine($"How many matches: {mc.Count}"); 

       foreach (Match match in mc) 
        Console.WriteLine($"Index: {match.Index}"); 
      } 

      Console.ReadKey(); 
     } 

     public static bool IsMultipleLine(string input) => regex.IsMatch(input); 
    } 
}

Output:

Is a single line file
How many matches: 4
Index: 5
Index: 22
Index: 34
Index: 43

Source

2016-09-04 Nerus

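What is an *invalid character*? The *standard way* may well be faster; please post your code.

I think you want to match non-ASCII letters. Try 'Regex.Matches(s, @"[\p{L}-[a-zA-Z]]")'. Note, however, that this does not give you any line index information.

As the code shows, I cannot find a multi-line solution using MatchCollection. – Nerus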
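The pattern suggested in the comment above relies on .NET character class subtraction: `[\p{L}-[a-zA-Z]]` matches any Unicode letter except the ASCII letters. A minimal sketch of how it could be used on a single line follows; the class name NonAsciiLetterDemo and the helper FindIndexes are hypothetical names chosen for this illustration, not part of the original posts.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NonAsciiLetterDemo
{
    // Any Unicode letter (\p{L}) minus the ASCII letters a-z and A-Z,
    // expressed with .NET character class subtraction.
    private static readonly Regex NonAsciiLetter = new Regex(@"[\p{L}-[a-zA-Z]]");

    // Hypothetical helper: returns the zero-based index of every
    // non-ASCII letter found in the given string.
    public static List<int> FindIndexes(string s)
    {
        var indexes = new List<int>();
        foreach (Match m in NonAsciiLetter.Matches(s))
            indexes.Add(m.Index);
        return indexes;
    }

    static void Main()
    {
        foreach (int i in FindIndexes("Some Înput text here,"))
            Console.WriteLine($"Index: {i}");
    }
}
```

Digits, spaces and punctuation are not letters, so this pattern never flags them. As the comment says, it only reports positions within the string it is given, so line information has to come from elsewhere.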

Link: http://www.dotnetperls.com/regexoptions-multiline

SOLUTION

using System; 
using System.Text.RegularExpressions; 

namespace RegularExpresion 
{ 
    class Program 
    { 
     private static Regex regex = null; 

     static void Main(string[] args) 
     { 
      string input_text = @"Some Înput text here, 
Îs another lÎne of thÎs text."; 

      string line_pattern = "\n"; 

      string invalid_character = "Î"; 

      regex = new Regex(line_pattern); 

      // Check whether the document has multiple lines or a single line
      if (IsMultipleLine(input_text)) 
      { 
       Console.WriteLine("Is a multiple line file"); 

       MatchCollection matches = Regex.Matches(input_text, "^(.+)$", RegexOptions.Multiline); 

       int line = 0; 

       foreach (Match match in matches) 
       { 
        foreach (Capture capture in match.Captures) 
        { 
         line++; 

         Console.WriteLine($"Line: {line}"); 

         RegexpLine(capture.Value, invalid_character); 
        } 
       } 
      } 
      else 
      { 
       Console.WriteLine("Is a single line file"); 

       RegexpLine(input_text, invalid_character); 
      } 

      Pause(); 
     } 

     public static bool IsMultipleLine(string input) => regex.IsMatch(input); 

     public static void RegexpLine(string line, string characters) 
     { 
      regex = new Regex(characters); 

      MatchCollection mc = regex.Matches(line); 

      Console.WriteLine($"How many matches: {mc.Count}"); 

      foreach (Match match in mc) 
       Console.WriteLine($"Index: {match.Index}"); 
     } 

     public static ConsoleKeyInfo Pause(string message = "please press ANY key to continue...") 
     { 
      Console.WriteLine(message); 

      return Console.ReadKey(); 
     } 
    } 
}

Thanks for the help, everyone. It would be nice if someone smarter than me could check this code in terms of performance.

Regards, Nerus.
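One thing to watch in the solution above: "^(.+)$" needs at least one character per match, so blank lines produce no match and the line counter skips them. A possible alternative, sketched here under the assumption that lines are separated by '\n' and that the invalid-character pattern itself never matches a newline, is to run a single Matches call over the whole text and derive the line and in-line index from match.Index. The names SinglePassDemo and Locate are hypothetical, and the value tuples require C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SinglePassDemo
{
    // Hypothetical helper: returns zero-based (line, column) pairs for every
    // match of invalidPattern in text. Lines are assumed to end with '\n'.
    public static List<(int Line, int Column)> Locate(string text, string invalidPattern)
    {
        var result = new List<(int Line, int Column)>();
        int line = 0;       // index of the line that contains the current match
        int lineStart = 0;  // offset in text where that line begins
        int scanned = 0;    // offset up to which newlines have been counted

        foreach (Match m in Regex.Matches(text, invalidPattern))
        {
            // Count every newline between the previous match and this one.
            int nl;
            while ((nl = text.IndexOf('\n', scanned, m.Index - scanned)) >= 0)
            {
                line++;
                lineStart = nl + 1;
                scanned = nl + 1;
            }
            scanned = m.Index;
            result.Add((line, m.Index - lineStart));
        }
        return result;
    }

    static void Main()
    {
        string text = "Some Înput text here,\nÎs another lÎne of thÎs text.";
        foreach (var (line, column) in Locate(text, "Î"))
            Console.WriteLine($"Line {line}, index {column}");
    }
}
```

For the sample text this reports index 5 on line 0 and indexes 0, 12 and 21 on line 1. Because newlines are counted directly, blank lines still advance the line number, and a trailing '\r' from Windows line endings does not shift the reported indexes. Whether this is actually faster than splitting first would need to be measured.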

Source

2016-09-04 11:38:24 Nerus

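My approach would be to split the text into an array of strings, each containing one line. If the array's length is exactly 1, there is only one line. From there, use Regex to match against each line and find the invalid characters you are looking for.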

string input_text = "Some Înput text here,\nÎs another lÎne of thÎs text."; 
string line_pattern = "\n"; 

// split the string into string arrays 
string[] input_texts = input_text.Split(new string[] { line_pattern }, StringSplitOptions.RemoveEmptyEntries); 

string invalid_character = "Î"; 

if (input_texts != null && input_texts.Length > 0) 
{ 
    if (input_texts.Length == 1) 
    { 
     Console.WriteLine("Is a single line file"); 
    } 

    // loop every line 
    foreach (string oneline in input_texts) 
    { 
     Regex regex = new Regex(invalid_character); 

     MatchCollection mc = regex.Matches(oneline); 

     Console.WriteLine("How many matches: {0}", mc.Count); 

     foreach (Match match in mc) 
     { 
      Console.WriteLine("Index: {0}", match.Index); 
     } 
    } 
}

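--- EDIT ---

Things to consider:

If you are reading the input from a file, I recommend reading it line by line rather than as one whole text.
Usually, when searching for invalid characters, you do not specify them explicitly; instead you look for a pattern, for example any character that is not a-z, A-Z, or 0-9. Your regex will then be a little different.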
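The two points above can be combined in a short sketch: read the input one line at a time through a TextReader and flag every character that falls outside an allow-list. The names LineByLineDemo and Validate are hypothetical, and the allow-list used in Main (ASCII letters, digits, space, comma, period) is only an assumption for this demo; adjust it to whatever counts as valid in your documents.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class LineByLineDemo
{
    // Hypothetical helper: reads the input line by line, writes one report
    // line per invalid character, and returns how many were found.
    public static int Validate(TextReader reader, Regex invalid, TextWriter output)
    {
        int total = 0;
        int lineIndex = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match m in invalid.Matches(line))
            {
                output.WriteLine($"Line {lineIndex}, index {m.Index}: '{m.Value}'");
                total++;
            }
            lineIndex++; // blank lines are counted too
        }
        return total;
    }

    static void Main()
    {
        // Negated character class: anything NOT in the allow-list is invalid.
        var invalid = new Regex(@"[^a-zA-Z0-9 ,.]");
        string text = "Some Înput text here,\nÎs another lÎne of thÎs text.";

        // For a real file, File.OpenText(path) returns a StreamReader that
        // can be passed in place of the StringReader used here.
        using (var reader = new StringReader(text))
        {
            int count = Validate(reader, invalid, Console.Out);
            Console.WriteLine($"Invalid characters: {count}");
        }
    }
}
```

Only one line is held in memory at a time, and the Regex object is created once rather than once per line.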

Source

2016-09-04 12:55:11 kurakura88
