Exception when using Apache Lucene to remove stop words

I am using the following code to remove stop words from input text. An exception is thrown when tokenStream.incrementToken() is executed:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

Code:

public static String removeStopWords(String textFile) throws Exception {
    CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();
    TokenStream tokenStream = new StandardTokenizer();
    tokenStream = new StopFilter(tokenStream, stopWords);
    StringBuilder sb = new StringBuilder();
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        String term = charTermAttribute.toString();
        sb.append(term + " ");
    }
    return sb.toString();
}

Source

2017-08-20 Rizstien

Instantiate your TokenStream as follows:

TokenStream tokenStream = new StandardAnalyzer().tokenStream("field",new StringReader(textFile));
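For context, here is a minimal sketch of the whole method built around that line. It assumes Lucene 7 or later on the classpath; the class name StopWordRemover is only illustrative. The stop set is passed to StandardAnalyzer explicitly because, as far as I know, the no-argument constructor removes English stop words in older releases but uses an empty stop set from Lucene 8 onward. Note that StandardAnalyzer also lowercases tokens, which the tokenizer-plus-StopFilter chain in the question does not do.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Illustrative class name; not part of the original question or answer.
public class StopWordRemover {

    public static String removeStopWords(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the TokenStream first, then the Analyzer.
        try (Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.getDefaultStopSet());
             TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text))) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();                      // required before the first incrementToken()
            while (tokenStream.incrementToken()) {
                if (sb.length() > 0) {
                    sb.append(' ');                   // separate terms without a trailing space
                }
                sb.append(term.toString());
            }
            tokenStream.end();                        // signals end-of-stream before close()
        }
        return sb.toString();
    }
}
```

The analyzer hands the input reader to its tokenizer, which is the step the question's code never performs: a StandardTokenizer created with new StandardTokenizer() has no input until setReader(...) is called on it.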

Source

2017-08-21 21:05:57 darcula

In this code, what is "field"? – Rizstien

"field" is the name of the field (IndexableField) that the created TokenStream is used for. If the tokenStream is not specific to a field, you can pass null instead. Also, since your input is a String, you can use tokenStream(null, textFile); – darcula
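A short sketch of the String overload mentioned in this comment, again assuming Lucene 7 or later. The class name StringOverloadExample and the field name "content" are hypothetical; StandardAnalyzer does not vary its analysis by field, so the name is only a label here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Illustrative class name; not part of the original thread.
public class StringOverloadExample {

    public static List<String> tokens(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.getDefaultStopSet());
             // Analyzer.tokenStream(String, String) accepts the text directly,
             // so no StringReader is needed. "content" is a hypothetical field name.
             TokenStream tokenStream = analyzer.tokenStream("content", text)) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();
            while (tokenStream.incrementToken()) {
                result.add(term.toString());
            }
            tokenStream.end();
        }
        return result;
    }
}
```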
