2016-06-17 3 views

Stanford CoreNLP in R: Spanish doesn't work

I am starting to use the Stanford CoreNLP package in R to analyze Spanish text. This is what I tried:

R 

R version 3.2.2 (2015-08-14) -- "Fire Safety" 
Copyright (C) 2015 The R Foundation for Statistical Computing 
Platform: x86_64-pc-linux-gnu (64-bit) 

R is free software and comes with ABSOLUTELY NO WARRANTY. 
You are welcome to redistribute it under certain conditions. 
Type 'license()' or 'licence()' for distribution details. 

    Natural language support but running in an English locale 

R is a collaborative project with many contributors. 
Type 'contributors()' for more information and 
'citation()' on how to cite R or R packages in publications. 

Type 'demo()' for some demos, 'help()' for on-line help, or 
'help.start()' for an HTML browser interface to help. 
Type 'q()' to quit R. 

> install.packages("coreNLP") 
Installing package into ‘/home/ach/R/x86_64-pc-linux-gnu-library/3.2’ 
(as ‘lib’ is unspecified) 
--- Please select a CRAN mirror for use in this session --- 
trying URL 'https://cran.rediris.es/src/contrib/coreNLP_0.4-1.tar.gz' 
Content type 'application/x-gzip' length 17392 bytes (16 KB) 
================================================== 
downloaded 16 KB 

* installing *source* package ‘coreNLP’ ... 
** package ‘coreNLP’ successfully unpacked and MD5 sums checked 
** R 
** data 
*** moving datasets to lazyload DB 
** inst 
** preparing package for lazy loading 
** help 
*** installing help indices 
** building package indices 
** testing if installed package can be loaded 
* DONE (coreNLP) 

The downloaded source packages are in 
    ‘/tmp/RtmpO3q77z/downloaded_packages’ 
> library(coreNLP) 
> downloadCoreNLP(type="base") 
trying URL 'http://nlp.stanford.edu/software//stanford-corenlp-full-2015-04-20.zip' 
Content type 'application/zip' length 360824440 bytes (344.1 MB) 
================================================== 
downloaded 344.1 MB 

[1] 0 
> 
> downloadCoreNLP(type="spanish") 
trying URL 'http://nlp.stanford.edu/software//stanford-spanish-corenlp-2015-01-08-models.jar' 
Content type 'application/x-java-archive' length 25007256 bytes (23.8 MB) 
================================================== 
downloaded 23.8 MB 

> initCoreNLP() 
Searching for resource: config.properties 
Adding annotator tokenize 
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer. 
Adding annotator ssplit 
Adding annotator pos 
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec]. 
Adding annotator lemma 
Adding annotator ner 
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [3.5 sec]. 
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.2 sec]. 
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [2.3 sec]. 
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1. 
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt 
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt 
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt 
Adding annotator parse 
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec]. 
Adding annotator dcoref 
Adding annotator sentiment 
> sInes <- "Hola padre. Acabo de llegar a casa. Tengo ganas de cenar"
> annotation <- annotateString(sInes)
> token <- getToken(annotation)
> token[token$sentence==2,c(1:4,7)]
  sentence id  token  lemma POS
4        2  1  Acabo  Acabo NNP
5        2  2     de     de NNP
6        2  3 llegar llegar NNP
7        2  4      a      a  DT
8        2  5   casa   casa  FW
9        2  6      .      .   .

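Everything appears to run without errors, as far as I can see, but the results are wrong: the log shows that only English models are loaded, and the Spanish text is tagged with them. For example, "casa" is incorrectly tagged as a foreign word (FW).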

Does anyone have any ideas about this?

props.setProperty("tokenize.language", "es"); 

Answers


Thankfully, changing the language setting is easy:

# update to newest version of the package 
devtools::install_github("statsmaths/coreNLP") 

# download base library (mandatory): 
coreNLP::downloadCoreNLP() 

# download desired language library: 
coreNLP::downloadCoreNLP(type="spanish") 

# attach package 
library(coreNLP) 

# run initCoreNLP specifying your language of choice 
initCoreNLP(type="spanish") 

I tried this in the R shell both before and after the initCoreNLP() command, but I get: Error: could not find function "props.setProperty" – ACCaminero
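The error is expected: `props.setProperty(...)` is a Java call on a `java.util.Properties` object, not an R function. One possible way to pass the same setting from R is a properties file. The following is only a minimal sketch, assuming that the installed coreNLP version accepts a `parameterFile` argument in `initCoreNLP()` and that the Spanish tagger path below matches what is inside the downloaded Spanish models jar. The file name `spanish.properties` and the `run_pipeline` switch are hypothetical names used for illustration.

```r
# Properties for a Spanish pipeline. The pos.model path is an assumption:
# check it against the contents of the Spanish models jar you downloaded.
spanish_props <- c(
  "annotators = tokenize, ssplit, pos",
  "tokenize.language = es",
  "pos.model = edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger"
)

# Write the properties to a temporary file (hypothetical file name).
props_file <- file.path(tempdir(), "spanish.properties")
writeLines(spanish_props, props_file)

# Starting the pipeline needs Java, the coreNLP package, and the downloaded
# jars, so it only runs when the switch is set to TRUE by hand.
run_pipeline <- FALSE
if (run_pipeline && requireNamespace("coreNLP", quietly = TRUE)) {
  # Assumption: initCoreNLP() takes the path to a properties file here.
  coreNLP::initCoreNLP(parameterFile = props_file)
  annotation <- coreNLP::annotateString("Acabo de llegar a casa.")
  print(coreNLP::getToken(annotation))
}
```

If the Spanish models are picked up, the initialization log should mention the Spanish tagger rather than `english-left3words-distsim.tagger`.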


The package author recently made an update: you need not only to download the Spanish models, but also to set the tokenizer to Spanish. Thanks a lot.

Agustín