Stanford NLP currency normalization not working

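I am using Stanford CoreNLP for extraction. Below is the sentence from which I am trying to extract the currency amount together with its currency symbol:

5 March 2015 Kering Issue of €500,000,000 0.875 per cent

The data I need to extract is €500,000,000 0.875.

By default, CoreNLP gives the sentence as

5 March 2015 Kering Issue of **$**500,000,000 0.875 per cent

This is how I run the pipeline: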

props.put("annotators", "tokenize, cleanxml, ssplit, pos, lemma, ner, regexner"); 
props.setProperty("ner.useSUTime", "0"); 
_pipeline = new StanfordCoreNLP(props); 
Annotation document = new Annotation(text); 
_pipeline.annotate(document);

To stop the currency symbol from being changed, I wrote the following tokenizer setup:

public static readonly TokenizerFactory TokenizerFactory = PTBTokenizer.factory(new CoreLabelTokenFactory(), 
      "normalizeCurrency=false"); 
DocumentPreprocessor docPre = new DocumentPreprocessor(new java.io.StringReader(textChunk)); 
docPre.setTokenizerFactory(TokenizerFactory);

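With that tokenizer, the text comes through properly:

text = 5 March 2015 Kering Issue of €500,000,000 0.875 per cent

The pipeline still replaced the symbol, so I added the line props.put("tokenize.options", "normalizeCurrency=false"); to its properties. But the output is still the same, $5.000000000875E9. Can anyone please help me?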

<token id="9"> 
    <word>$</word> 
    <lemma></lemma> 
    <CharacterOffsetBegin>48</CharacterOffsetBegin> 
    <CharacterOffsetEnd>49</CharacterOffsetEnd> 
    <POS>CD</POS> 
    <NER>MONEY</NER> 
    <NormalizedNER>$5.000000000875E9</NormalizedNER> 
</token>
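
A side observation on the NormalizedNER value above, offered as an assumption about what the normalizer did rather than as documented CoreNLP behavior: 5.000000000875E9 equals 5000000000.875, which is exactly what results when the tokens 500,000,000 and 0.875 are joined into one number. That would suggest the amount and the coupon rate were grouped into a single MONEY entity. The sketch below checks only the arithmetic with the Java standard library. It makes no CoreNLP calls, and the class and method names are hypothetical.

```java
class NormalizedMoneySketch {

    // Strip grouping commas from the first token and append the second
    // token as plain text, producing one numeric string.
    static String joinNumberTokens(String first, String second) {
        return first.replace(",", "") + second;
    }

    public static void main(String[] args) {
        String joined = joinNumberTokens("500,000,000", "0.875");
        System.out.println(joined); // 5000000000.875

        // Compare numerically with the value reported in <NormalizedNER>.
        double fromTokens = Double.parseDouble(joined);
        double fromNer = Double.parseDouble("5.000000000875E9");
        System.out.println(fromTokens == fromNer); // true
    }
}
```

If this reading is right, the odd number is a separate issue from the currency symbol: it comes from two adjacent number tokens being treated as one amount.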

This <token> element is the output I am getting.

Source

2017-03-23 Madhu

When I ran this code, it did not change the currency symbol to "$":

package edu.stanford.nlp.examples; 

import edu.stanford.nlp.ling.*; 
import edu.stanford.nlp.pipeline.*; 

import java.util.*; 

public class TokenizeOptionsExample { 

    public static void main(String[] args) {
        Annotation document = new Annotation("5 March 2015 Kering Issue of €500,000,000 0.875 per cent");
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit");
        props.setProperty("tokenize.options", "normalizeCurrency=false");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        for (CoreLabel token : document.get(CoreAnnotations.TokensAnnotation.class)) {
            System.out.println(token);
        }
    }
}

Source

2017-03-27 06:48:03 StanfordNLPHelp

Thank you for your help. I added props.setProperty("tokenize.options", "normalizeCurrency=false"); but the output is the same as before, without any currency symbol, like <token id="9"> 48 49 CD MONEY 5.000000000875E9 – Madhu

Please tell me how to write for (CoreLabel token : document.get(CoreAnnotations.TokensAnnotation.class)) in C#. I don't know Java well, and I am using C# for the extraction. – Madhu
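
On the loop question: the enhanced for loop in the answer is shorthand for walking a java.util.List, so an index-based loop that uses only size() and get(i) visits the same elements. Because it relies on nothing but ordinary method calls, that form may be easier to carry over to C#. The sketch below is a minimal illustration in Java. It uses a List<String> as a stand-in for the List<CoreLabel> returned by document.get(CoreAnnotations.TokensAnnotation.class), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexLoopSketch {

    // Visits every element by index instead of with the enhanced for loop
    // and returns the elements in the order they were visited.
    static List<String> collectByIndex(List<String> tokens) {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            seen.add(tokens.get(i));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Stand-in for the token list produced by the pipeline.
        List<String> tokens = Arrays.asList("Kering", "Issue", "of");
        for (String token : collectByIndex(tokens)) {
            System.out.println(token);
        }
    }
}
```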
