CoreNLP sentiment training data is in the wrong format

I am trying to train my own sentiment analysis model with CoreNLP. I want to do this from Java code (not from the command line), so I copied parts of https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/sentiment/BuildBinarizedDataset.java to prepare the data, and then some parts of https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/sentiment/SentimentTraining.java to do the actual training.

The println at the end of the code below gives me something like:

> Debugging collaped Unary:(ROOT (NP (DT The) (NNS performances)) (@S (VP (VBP are) (ADJP (RB uniformly) (JJ good))) (. .)))

// Excerpt: inputPath, parser, binarizer, transformer and sentimentModel are set up beforehand, as in BuildBinarizedDataset.
String text = IOUtils.slurpFileNoExceptions(inputPath);
String[] chunks = text.split("\\n\\s*\\n+"); // a blank line separates chunks
for (String chunk : chunks) {
    if (chunk.trim().isEmpty()) {
        continue;
    }
    // The first line of a chunk holds the label followed by the sentence; the remaining lines label sub-phrases.
    String[] lines = chunk.trim().split("\\n");
    String sentence = lines[0];
    StringReader sin = new StringReader(sentence);
    DocumentPreprocessor document = new DocumentPreprocessor(sin);
    document.setSentenceFinalPuncWords(new String[] { "\n" });
    List<HasWord> tokens = document.iterator().next();
    // The first token is the sentence-level label; the rest is the sentence itself.
    Integer mainLabel = Integer.valueOf(tokens.get(0).word());
    tokens = tokens.subList(1, tokens.size());
    Map<Pair<Integer, Integer>, String> spanToLabels = Generics.newHashMap();
    for (int i = 1; i < lines.length; ++i) {
        extractLabels(spanToLabels, tokens, lines[i]);
    }
    // Parse, binarize, then collapse unary chains.
    Tree tree = parser.apply(tokens);
    Tree binarized = binarizer.transformTree(tree);
    Tree collapsedUnary = transformer.transformTree(binarized);
    if (sentimentModel != null) {
        // Use an existing model to predict labels for every node.
        Trees.convertToCoreLabels(collapsedUnary);
        SentimentCostAndGradient scorer = new SentimentCostAndGradient(sentimentModel, null);
        scorer.forwardPropagateTree(collapsedUnary);
        setPredictedLabels(collapsedUnary);
    } else {
        setUnknownLabels(collapsedUnary, mainLabel);
    }
    Trees.convertToCoreLabels(collapsedUnary);
    collapsedUnary.indexSpans();
    // Overwrite the labels of the spans that were annotated explicitly.
    for (Map.Entry<Pair<Integer, Integer>, String> pairStringEntry : spanToLabels.entrySet()) {
        setSpanLabel(collapsedUnary, pairStringEntry.getKey(), pairStringEntry.getValue());
    }

    //trainingTrees.add(collapsedUnary);
    System.out.println("Debugging collaped Unary:" + collapsedUnary);
}
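For readers unfamiliar with the input this loop expects, here is a rough sketch of the chunk layout as far as the snippet shows it: blank lines separate chunks, the first line of a chunk starts with the sentence-level label, and any further lines are handed to extractLabels. The class name LabelledChunks, the whitespace split (the real code tokenizes with DocumentPreprocessor) and the sample sentences and labels are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class LabelledChunks {
    /** One input chunk: sentence-level label, sentence text, and the raw sub-phrase lines. */
    static final class Chunk {
        final int mainLabel;
        final String sentence;
        final List<String> phraseLines;

        Chunk(int mainLabel, String sentence, List<String> phraseLines) {
            this.mainLabel = mainLabel;
            this.sentence = sentence;
            this.phraseLines = phraseLines;
        }
    }

    static List<Chunk> parse(String text) {
        List<Chunk> chunks = new ArrayList<>();
        // Same chunk separator as in the snippet: one or more blank lines.
        for (String raw : text.split("\\n\\s*\\n+")) {
            if (raw.trim().isEmpty()) {
                continue;
            }
            String[] lines = raw.trim().split("\\n");
            // Assumption: label and sentence are separated by whitespace.
            String[] first = lines[0].trim().split("\\s+", 2);
            int mainLabel = Integer.parseInt(first[0]);
            String sentence = first.length > 1 ? first[1] : "";
            List<String> phraseLines = new ArrayList<>();
            for (int i = 1; i < lines.length; i++) {
                phraseLines.add(lines[i].trim());
            }
            chunks.add(new Chunk(mainLabel, sentence, phraseLines));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hypothetical input: two chunks, the first with one labelled sub-phrase.
        String text = "3 The performances are uniformly good .\n4 uniformly good\n\n1 Some other sentence .\n";
        for (Chunk c : parse(text)) {
            System.out.println(c.mainLabel + " | " + c.sentence + " | " + c.phraseLines);
        }
    }
}
```

A chunk with no sub-phrase lines gets an empty list here, which mirrors the inner loop in the snippet simply not running.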

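My condensed code corresponds to lines 171 to 226 of the former link (I condensed it a bit to understand what is going on). From what I understand, the tree is instead supposed to look like this (copied from a different sentence here, sorry for the formatting):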

(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2

as explained in https://mailman.stanford.edu/pipermail/java-nlp-user/2013-November/004308.html, "stanford corenlp sentiment training set", "How to train the Stanford NLP Sentiment Analysis tool", etc.
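To make the difference between the two tree strings concrete, here is a small check that only looks at the bracketed text, not at CoreNLP's Tree class. It reports the first node label that is not a number from 0 to 4. The class and method names are made up for this sketch, and the 0 to 4 range is an assumption about the five sentiment classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentimentLabelCheck {
    // A node label is the token that directly follows an opening parenthesis.
    private static final Pattern NODE_LABEL = Pattern.compile("\\(\\s*([^\\s()]+)");

    /** Returns the first node label that is not an integer in 0..4, or null if every label is valid. */
    static String firstInvalidLabel(String tree) {
        Matcher m = NODE_LABEL.matcher(tree);
        while (m.find()) {
            String label = m.group(1);
            if (!label.matches("[0-4]")) {
                return label;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The tree printed by the debugging println.
        String parsed = "(ROOT (NP (DT The) (NNS performances)) (@S (VP (VBP are) (ADJP (RB uniformly) (JJ good))) (. .)))";
        // The truncated fragment of the expected format, used as is.
        String expected = "(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2";
        System.out.println(firstInvalidLabel(parsed));   // ROOT
        System.out.println(firstInvalidLabel(expected)); // null
    }
}
```

The check does not verify that the brackets are balanced, so it says nothing about whether a string is a complete tree.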

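Nothing more happens in BuildBinarizedDataset after the lines I copied. Can someone tell me how to get the tree into the correct format? (I feel very stupid here, having hacked something together, and I must be missing something.)

The error I get later on, in SentimentTraining, is the following: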

Exception in thread "main" java.lang.NumberFormatException: For input string: "DT" 
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
at java.lang.Integer.parseInt(Integer.java:580) 
at java.lang.Integer.valueOf(Integer.java:766) 
at edu.stanford.nlp.sentiment.SentimentUtils.attachLabels(SentimentUtils.java:37) 
at edu.stanford.nlp.sentiment.SentimentUtils.attachLabels(SentimentUtils.java:33) 
at edu.stanford.nlp.sentiment.SentimentUtils.attachLabels(SentimentUtils.java:33) 
at edu.stanford.nlp.sentiment.SentimentUtils.readTreesWithLabels(SentimentUtils.java:69) 
at edu.stanford.nlp.sentiment.SentimentUtils.readTreesWithGoldLabels(SentimentUtils.java:50) 
at de.dkt.eservices.esentimentanalysis.modules.CoreNLPSentimentAnalyzer.trainModel(CoreNLPSentimentAnalyzer.java:251) 
at de.dkt.eservices.esentimentanalysis.modules.CoreNLPSentimentAnalyzer.main(CoreNLPSentimentAnalyzer.java:306)

Which makes sense, given that it expects numbers but gets the labels of the nodes in the tree.
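The failure itself is easy to reproduce outside CoreNLP, since the stack trace ends in Integer.valueOf being called on a node label. The sketch below shows that call failing on "DT", plus a tolerant variant. The helper parseSentimentLabel and its -1 sentinel are hypothetical and are not part of SentimentUtils.

```java
class LabelParseDemo {
    /** Returns the sentiment class for a numeric label, or -1 (a hypothetical sentinel) otherwise. */
    static int parseSentimentLabel(String label) {
        try {
            return Integer.parseInt(label);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        try {
            Integer.valueOf("DT"); // the same call that appears in the stack trace
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "DT"
        }
        System.out.println(parseSentimentLabel("3"));  // 3
        System.out.println(parseSentimentLabel("DT")); // -1
    }
}
```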

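Any pointers here are much appreciated!

Source

2017-06-09 Igor

Answer

I have not found a real solution, but in case someone else runs into this problem, the following did the trick:

public static Tree traverseTreeAndChangePosTagsToNumbers(Tree tree) {
    for (Tree subtree : tree.getChildrenAsList()) {
        String label = subtree.label().toString();
        if (label.matches("\\D+")) {
            // Non-numeric label (POS tag or phrase category): fall back to neutral.
            subtree.label().setValue("2");
        } else {
            // Numeric label outside the valid range 0..4: also fall back to neutral.
            int value = Integer.parseInt(label);
            if (value < 0 || value > 4) {
                subtree.label().setValue("2");
            }
        }
        // Do not descend into preterminals, so the words themselves stay untouched.
        if (!subtree.isPreTerminal()) {
            traverseTreeAndChangePosTagsToNumbers(subtree);
        }
    }

    return tree;
}

It is not a very decent solution, since it ignores the options CoreNLP provides for sentiment scope (that is, annotating sub-phrases in the tree): the number for sub-phrases is always 2 (neutral), so the sentiment is always based on the value for the whole sentence/tree. But at least it gets rid of the format error.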
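The same relabelling idea can also be sketched on the bracketed string form of the tree, which avoids the CoreNLP dependency when experimenting. This is a different technique from the method in this answer, not a replacement for it: it rewrites the text with a regular expression, gives the root the sentence-level label and every other node 2 (neutral). The class name NeutralRelabel and the sentence label 3 in the example are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NeutralRelabel {
    // A node label is the token that directly follows an opening parenthesis.
    private static final Pattern NODE_LABEL = Pattern.compile("\\(([^\\s()]+)");

    /** Replaces every node label with "2", except the root, which gets sentenceLabel. */
    static String relabel(String tree, int sentenceLabel) {
        Matcher m = NODE_LABEL.matcher(tree);
        StringBuffer out = new StringBuffer();
        boolean root = true;
        while (m.find()) {
            String label = root ? Integer.toString(sentenceLabel) : "2";
            m.appendReplacement(out, "(" + label);
            root = false;
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String parsed = "(ROOT (NP (DT The) (NNS performances)) (@S (VP (VBP are) (ADJP (RB uniformly) (JJ good))) (. .)))";
        // Prints: (3 (2 (2 The) (2 performances)) (2 (2 (2 are) (2 (2 uniformly) (2 good))) (2 .)))
        System.out.println(relabel(parsed, 3));
    }
}
```

As with the method above, sub-phrase sentiment is lost: only the root carries information, so a model trained on such trees learns from sentence-level labels alone.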

Source

2017-07-11 09:43:21 Igor
