How can I calculate the frequency of words in a document in Java?

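I am new to Java. I have a program that uses arrays to count the words in several documents and writes the output to a new file with a specific name. I used the functions below. Can I replace them with something simpler? How can I calculate the frequency of words in a document in Java?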

public static void main(String[] args) { 
    String fn = "C:\\Users\\Angel\\Desktop\\myproject\\Preprocessing/"; 
    File ff = new File(fn); 
    ff.mkdir(); 

    int flage; 

    String dir = "C:\\Users\\Angel\\Desktop\\myproject\\ConvertingToText"; //read 
    String s = ""; 
    File folder = new File(dir); 
    String se = ""; 
    File fs[] = folder.listFiles(); 
    /*for(File f:fs) // print files name 
     {System.out.println(f.getName());}*/ 
    for (File f: fs) { 
     String fn1 = fn + f.getName() + "/"; 
     File ff1 = new File(fn1); 
     ff1.mkdir(); 
     System.out.println(f.getName()); 
     System.out.println(f.getAbsolutePath()); // 
     File folder2 = new File(f.getAbsolutePath()); 
     File[] f3 = folder2.listFiles(); 
     for (File fi: f3) { 
      s = readTextFile(fi.getAbsolutePath()); 
      String fn4 = fn1 + fi.getName() + "/"; 
      s = s.toLowerCase(); 

      String[] keys = s.split(" "); 
      String[] uniquewords; 
      int count = 0; 
      //System.out.println(s); 
      uniquewords = getUniquewords(keys); 

      for (String key: uniquewords) { 
       if (null == key) { 
        break; 
       } 
       for (String sr: keys) { 
        if (key.equals(sr)) { 
         count++; 
        } 
       } 
       System.out.println("[" + key + "]" + count); 
       count = 0; 
      } 
     } 
    } 
} 
private static String[] getUniquewords(String[] keys) { 
    String[] uniquewords = new String[keys.length]; 

    uniquewords[0] = keys[0]; 
    int uniquewordIndex = 1; 
    boolean keyAlreadyExists = false; 

    for (int i = 1; i < keys.length; i++) { 
     for (int j = 0; j <= uniquewordIndex; j++) { 
      if (keys[i].equals(uniquewords[j])) { 
       keyAlreadyExists = true; 
      } 
     } 

     if (!keyAlreadyExists) { 
      uniquewords[uniquewordIndex] = keys[i]; 
      uniquewordIndex++; 
     } 
     keyAlreadyExists = false; 
    } 
    return uniquewords; 
}


2017-01-24 nani je

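I need to count how many times a word occurs in each document. For example, for the word "hello", I want to know how many times it appears in document1, document2, and so on, and then write the result to a file. –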

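I recommend using a Map to store the count for each unique word. That way you iterate over the text only once and build up a result that you can print afterwards.

A simple implementation looks like this: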

import java.util.HashMap;
import java.util.Map;

public Map<String, Integer> wordFrequencyIn(String text) {
    // Split on any run of whitespace, not just a single " "
    String[] words = text.trim().toLowerCase().split("\\s+");
    Map<String, Integer> result = new HashMap<>();

    for (String word : words) {
        if (word.isEmpty()) {
            continue; // blank input produces a single empty token
        }
        int count = result.getOrDefault(word, 0);
        result.put(word, count + 1);
    }

    return result;
}

To print the map, you can iterate over its entries:

for (Map.Entry<String, Integer> entry : result.entrySet()) { 
    System.out.println("[" + entry.getKey() + "]" + entry.getValue()); 
}
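
For anyone who wants to try this end to end, here is a minimal sketch that puts the counting and the printing into one runnable class. The class name `WordFrequencyDemo` and the sample sentence are made up for illustration. It also swaps in a `TreeMap` with `Map.merge` instead of `HashMap` with `getOrDefault`, purely so that the printout comes out in alphabetical order; the counts are the same either way.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; the name is not from the original post.
class WordFrequencyDemo {

    // Counts how often each whitespace-separated word occurs, ignoring case.
    static Map<String, Integer> wordFrequencyIn(String text) {
        // TreeMap keeps its keys sorted, so the entries print alphabetically.
        Map<String, Integer> result = new TreeMap<>();
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // blank input yields a single empty token
                // Insert 1 for a new word, otherwise add 1 to the stored count.
                result.merge(word, 1, Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = wordFrequencyIn("Hello world hello Java");
        for (Map.Entry<String, Integer> entry : result.entrySet()) {
            System.out.println("[" + entry.getKey() + "]" + entry.getValue());
        }
    }
}
```

With the sample sentence this prints `[hello]2`, `[java]1` and `[world]1`, one per line. Note that punctuation is not stripped, so "hello," and "hello" would be counted as different words.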

If you are not used to working with maps, I recommend taking a look at Oracle's tutorials and documentation.
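
Since the comment on the question asks for per-document counts written to a file, here is a rough sketch of how the same map-based counting could be applied to a folder of text files. Everything specific in it is an assumption: the class name `DocumentWordCounts`, the method `countAll`, a flat input folder containing `.txt` files, UTF-8 encoding, and an output format of one `word count` pair per line. The folder layout in the question is nested one level deeper, so this would need adapting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; names and file layout are assumptions.
class DocumentWordCounts {

    // Same counting idea as in the answer, with sorted keys for stable output.
    static Map<String, Integer> wordFrequencyIn(String text) {
        Map<String, Integer> result = new TreeMap<>();
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                result.merge(word, 1, Integer::sum);
            }
        }
        return result;
    }

    // For every *.txt file in inputDir, writes a file with the same name to
    // outputDir that holds one "<word> <count>" line per distinct word.
    static void countAll(Path inputDir, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(inputDir, "*.txt")) {
            for (Path doc : docs) {
                String text = new String(Files.readAllBytes(doc), StandardCharsets.UTF_8);
                List<String> lines = new ArrayList<>();
                for (Map.Entry<String, Integer> e : wordFrequencyIn(text).entrySet()) {
                    lines.add(e.getKey() + " " + e.getValue());
                }
                Files.write(outputDir.resolve(doc.getFileName()), lines, StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java DocumentWordCounts <inputDir> <outputDir>");
            return;
        }
        countAll(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

To look up a single word such as "hello" across documents, you could instead keep each document's map in memory and call `get("hello")` on it; writing the full table per file is just one possible layout.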


2017-01-26 17:41:27 phss

Thanks, that is very helpful –

@nanije If this worked for you, you might want to mark the answer as accepted. – phss

Oh, I will do that next time @phss –
