Counting word associations

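I am new to Java. I need to count how often the words in a sentence are associated with each other. For example, for the sentence "Dog is a Dog and Cat is a Cat", the first row of the final association counts would be: Dog-Dog (0), Dog-is (2), Dog-a (…), Dog-and (1), Dog-Cat (2), and so on.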

In other words, I am trying to build an association matrix. How can I implement it?

Source

2010-12-19 Rushdi Shams

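Interesting! What will this be used for? Also, why is the count for "dog-is" 2? See if this process helps: http://it.toolbox.com/blogs/enterprise-solutions/building-an-association-matrix-15499

@Pangea: Well, "Dog" has 2 "is", so the Dog-is pair gets the value 2. Building the matrix by hand as a table is easy, but I get lost when it comes to implementing it.

I'm sorry, but I see "Dog is" occurring only once, right? "Dog is a Dog and Cat is a Cat"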
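
One way to check the disagreement in the comments above is to count only directly adjacent word pairs. The following is a minimal sketch under that assumption; the class name AdjacentPairCount and the method countAdjacent are hypothetical, and the question itself does not say that adjacency is the intended definition of an association.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: counts ordered pairs of directly adjacent words.
class AdjacentPairCount {
    static Map<String, Integer> countAdjacent(String sentence) {
        // Split on runs of whitespace (assumption: no punctuation handling).
        String[] words = sentence.trim().split("\\s+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            // Key is "left-right", so word order matters here.
            counts.merge(words[i] + "-" + words[i + 1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countAdjacent("Dog is a Dog and Cat is a Cat"));
    }
}
```

Under this reading, Dog-is is counted once and is-a twice, which agrees with the last comment rather than with the value 2 given in the question.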

1. Split the sentence into separate words.
2. Generate the pairs.
3. Merge identical pairs.

It's as simple as that:

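import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairCounter { // wrapper class name is only for this example
    public static void main(String[] args) {
        String sentence = "Dog is a Dog and Cat is a Cat"; // example from the question

        // First step: split the sentence into words on whitespace.
        String[] words = sentence.split("\\s+");

        // Second step: generate every pair; n words yield n * (n - 1) / 2 pairs.
        List<List<String>> pairs = new ArrayList<>(words.length * (words.length - 1) / 2);
        for (int i = 0; i < words.length - 1; i++) {
            for (int j = i + 1; j < words.length; j++) {
                List<String> pair = Arrays.asList(words[i], words[j]);
                Collections.sort(pair); // so (a, b) and (b, a) count as the same pair
                pairs.add(pair);
            }
        }

        // Third step: merge identical pairs by counting occurrences.
        Map<List<String>, Integer> pair2count = new LinkedHashMap<>();
        for (List<String> pair : pairs) {
            pair2count.merge(pair, 1, Integer::sum);
        }

        // Output
        System.out.println(pair2count);
    }
}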
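
The pair counts from the three steps above can also be laid out as the matrix the question asks about. The following is a rough sketch, not a definitive implementation: the class AssociationMatrix and the methods indexWords and build are hypothetical names, and it assumes the same all-pairs counting as the answer, stored symmetrically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all-pairs co-occurrence counts stored in a square matrix.
class AssociationMatrix {
    // Assigns each distinct word a row/column index, in order of first appearance.
    static Map<String, Integer> indexWords(String[] words) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String w : words) {
            index.putIfAbsent(w, index.size());
        }
        return index;
    }

    // Counts every pair of word positions (i < j) into a symmetric matrix.
    static int[][] build(String[] words, Map<String, Integer> index) {
        int n = index.size();
        int[][] m = new int[n][n];
        for (int i = 0; i < words.length - 1; i++) {
            for (int j = i + 1; j < words.length; j++) {
                int a = index.get(words[i]);
                int b = index.get(words[j]);
                m[a][b]++;
                if (a != b) {
                    m[b][a]++; // mirror the count; the diagonal is counted once
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        String[] words = "Dog is a Dog and Cat is a Cat".split("\\s+");
        Map<String, Integer> index = indexWords(words);
        int[][] m = build(words, index);
        List<String> vocab = new ArrayList<>(index.keySet());
        for (int r = 0; r < m.length; r++) {
            System.out.println(vocab.get(r) + " " + Arrays.toString(m[r]));
        }
    }
}
```

Note that with all-pairs counting the Dog-is cell comes out as 4, not the 2 given in the question, so the counting rule may need adjusting once the intended definition of an association is settled.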

Source

2010-12-19 00:31:53 Roman

Thanks, Roman. I can already split the sentence into words:

import java.text.BreakIterator;
import java.util.Locale;

public class WordSplitter { // wrapper class name is only for this example
    public static void main(String[] args) {
        String target = "Dog is a Dog and Cat is a Cat";
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.US);
        // Sentence iterator over the whole text
        BreakIterator sentenceIterator = BreakIterator.getSentenceInstance();
        sentenceIterator.setText(target);

        int index = 0; // start offset of the current sentence
        while (sentenceIterator.next() != BreakIterator.DONE) {
            String sentence = target.substring(index, sentenceIterator.current());
            System.out.println(sentence);

            wordIterator.setText(sentence);
            int start = wordIterator.first();
            int end = wordIterator.next();
            while (end != BreakIterator.DONE) {
                String word = sentence.substring(start, end);
                // Skip tokens that start with whitespace or punctuation
                if (Character.isLetterOrDigit(word.charAt(0))) {
                    System.out.println(word);
                }
                start = end;
                end = wordIterator.next();
            }
            index = sentenceIterator.current();
        }
    }
}

However, I could not work out the other two steps. Thanks.

Source

2010-12-19 00:43:26

+1 for using BreakIterator

This is overkill, IMHO. 'target.split("\\s")' is enough and can replace all of this complicated code. – Roman

+1 for BreakIterator. – orangepips
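
For comparison, Roman's suggestion in the comments can be sketched as follows. The class SplitWords and the method words are hypothetical names, and the pattern "\\s+" is a small variation on the suggested "\\s" so that repeated spaces do not produce empty tokens.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of whitespace-only tokenization.
class SplitWords {
    static List<String> words(String text) {
        // Trim first so leading whitespace does not yield an empty first token.
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(words("Dog is a Dog and Cat is a Cat"));
    }
}
```

One difference to keep in mind: a plain whitespace split leaves punctuation attached to the neighbouring word, whereas the BreakIterator loop filters out tokens that do not start with a letter or digit.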
