Mallet: OutOfMemoryError: Java heap space

Training on my data with Mallet stops with an OutOfMemoryError. The MEMORY attribute in bin/mallet is already set to 3GB, and the training file output.mallet is only 31 MB. I tried reducing the size of the training data, but the same error is still thrown:

~/dev/test_models/Mallet$ bin/mallet train-classifier --input output.mallet --trainer NaiveBayes --training-portion 0.0001 --num-trials 10
Training portion = 1.0E-4 
Unlabeled training sub-portion = 0.0 
Validation portion = 0.0 
Testing portion = 0.9999 

-------------------- Trial 0 -------------------- 

Trial 0 Training NaiveBayesTrainer with 7 instances 
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space 
     at cc.mallet.types.Multinomial$Estimator.setAlphabet(Multinomial.java:309) 
     at cc.mallet.classify.NaiveBayesTrainer.setup(NaiveBayesTrainer.java:251) 
     at cc.mallet.classify.NaiveBayesTrainer.trainIncremental(NaiveBayesTrainer.java:200) 
     at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:193) 
     at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:59) 
     at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:415)

I would appreciate any help or insight into this problem.

EDIT: This is my bin/mallet file:

#!/bin/bash 


malletdir=`dirname $0` 
malletdir=`dirname $malletdir` 

cp=$malletdir/class:$malletdir/lib/mallet-deps.jar:$CLASSPATH 
#echo $cp 

MEMORY=10g 

CMD=$1 
shift 

help() 
{ 
cat <<EOF 
Mallet 2.0 commands: 

    import-dir   load the contents of a directory into mallet instances (one per file) 
    import-file  load a single file into mallet instances (one per line) 
    import-svmlight load SVMLight format data files into Mallet instances 
    info    get information about Mallet instances 
    train-classifier train a classifier from Mallet data files 
    classify-dir  classify data from a single file with a saved classifier 
    classify-file  classify the contents of a directory with a saved classifier 
    classify-svmlight classify data from a single file in SVMLight format 
    train-topics  train a topic model from Mallet data files 
    infer-topics  use a trained topic model to infer topics for new documents 
    evaluate-topics estimate the probability of new documents under a trained model 
    prune    remove features based on frequency or information gain 
    split    divide data into testing, training, and validation portions 
    bulk-load   for big input files, efficiently prune vocabulary and import docs 

Include --help with any option for more information 
EOF 
} 

CLASS= 

case $CMD in 
     import-dir) CLASS=cc.mallet.classify.tui.Text2Vectors;; 
     import-file) CLASS=cc.mallet.classify.tui.Csv2Vectors;; 
     import-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Vectors;; 
     info) CLASS=cc.mallet.classify.tui.Vectors2Info;; 
     train-classifier) CLASS=cc.mallet.classify.tui.Vectors2Classify;; 
     classify-dir) CLASS=cc.mallet.classify.tui.Text2Classify;; 
     classify-file) CLASS=cc.mallet.classify.tui.Csv2Classify;; 
     classify-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Classify;; 
     train-topics) CLASS=cc.mallet.topics.tui.TopicTrainer;; 
     infer-topics) CLASS=cc.mallet.topics.tui.InferTopics;; 
     evaluate-topics) CLASS=cc.mallet.topics.tui.EvaluateTopics;; 
     prune) CLASS=cc.mallet.classify.tui.Vectors2Vectors;; 
     split) CLASS=cc.mallet.classify.tui.Vectors2Vectors;; 
     bulk-load) CLASS=cc.mallet.util.BulkLoader;; 
     run) CLASS=$1; shift;; 
     *) echo "Unrecognized command: $CMD"; help; exit 1;; 
esac 

java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"

The original training file has 60,000 items. When I reduce the number of items (to 20,000 instances), training runs normally, but it uses about 10GB of RAM.

Source

2017-06-22 Long Le Minh

Exactly which file did you change? bin/mallet or bin/mallet.sh? – mikep

bin/mallet and bin/mallet.bat –

Is there a call to java in either of them? – mikep

Check the call to java in bin/mallet and add the flag -Xmx3g, making sure there is no other -Xmx already in it (if there is, edit that one instead).

Source

2017-06-22 04:43:20 mikep
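
The check suggested in the answer above can be sketched in shell. This is only an illustration, not part of Mallet: the function name show_heap_settings is made up here, and it assumes the launcher is a plain text script, as the bin/mallet file in the question is. It lists every -Xmx flag and every assignment to the MEMORY variable, with line numbers, so a second or stale heap setting is easy to spot.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with Mallet): print the heap-related
# settings found in a launcher script, each prefixed with its line number.
show_heap_settings() {
    local script=$1

    # Every -Xmx flag in the file. More than one hit means there are
    # competing heap limits, and the duplicates are worth removing.
    # grep exits non-zero when nothing matches; that is not an error here.
    grep -n -o -e '-Xmx[^[:space:]]*' "$script" || true

    # Assignments to MEMORY, the variable that bin/mallet expands into -Xmx.
    grep -n -E '^[[:space:]]*MEMORY=' "$script" || true
}
```

Run against the script in the question, show_heap_settings bin/mallet should report the single -Xmx$MEMORY flag on the java line and the MEMORY= assignment that feeds it. To confirm the limit the JVM actually ends up with, HotSpot JVMs accept java -Xmx3g -XX:+PrintFlagsFinal -version, whose output includes a MaxHeapSize entry; other JVMs may not support that flag.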
