Stemcompletion in R

I am working on text mining in R. After removing punctuation, numbers, URLs, and stop words, I have a few documents in my corpus:

myStopwords <- setdiff(myStopwords, c("r", "big")) 
myCorpus <- tm_map(myCorpus, removeWords, myStopwords) 
myCorpus <- tm_map(myCorpus, stripWhitespace) 
myCorpusCopy <- myCorpus 
for (i in c(1:2, 320)) 
{ 
    cat(paste0("[", i, "] ")) 
    writeLines(strwrap(as.character(myCorpus[[i]]), 60)) 
} 

[1] examples calling java code r 
[2] simulating mapreduce r big data analysis using flights data 
rbloggers 
[320] r reference card data mining now cran lists many useful r 
functions packages data mining applications

After that, I try stemming and stem completion as follows:

myCorpus <- tm_map(myCorpus, stemDocument) 
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)

But when I run the for loop again, it prints NA:

for (i in c(1:2, 320)) 
{ 
    cat(paste0("[", i, "] ")) 
    writeLines(strwrap(as.character(myCorpus[[i]]), 60)) 
} 

[1] NA 
[2] NA 
[320] NA

As shown above, every document comes out as NA after the stemming step. Any idea what I am doing wrong here?

Source

2017-07-02 subro

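I reproduced your problem with a built-in dataset, using the code below. I found that after the last line of that code, the elements of the myCorpus object have several entries in their structure, such as meta and, in my case, content: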

data("crude") 

myCorpus  <- as.VCorpus(crude) 
myCorpusCopy <- myCorpus 
myCorpus <- tm_map(myCorpus, stemDocument) 
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)

The elements are now named character vectors.

You can still access the elements:

myCorpus[[1]]

Diamond Shamrock Corp said that\neffect today it had cut it contract price for crude oil by\n1.50 dlrs a barrel.\n The reduct bring it post price for West Texas\nIntermedi to 16.00 dlrs a barrel, the copani said.\n "The price reduct today was made in the light of falling\noil product price and a weak crude oil market," a company\nspokeswoman said.\n Diamond is the latest in a line of U.S. oil compani that\nhav cut it contract, or posted, price over the last two days\ncit weak oil markets.\n Reuter 
                                                                                                                                 "content" 
                                                                                                                                  <NA> 
                                                                                                                                 "meta"

However, as.character() picks up the wrong part of the new structure of the object's elements (inspect it with str()), not the part you want. The body text is now actually stored as the names.
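To illustrate where the names come from, here is a minimal sketch of calling stemCompletion() directly on a character vector of stems. The stems and the dictionary are made up for illustration only. The function returns a named character vector whose names are the input stems, which is probably why a whole document passed in through tm_map() ends up as a name.

```r
library(tm)  # provides stemCompletion()

# Hypothetical stems and dictionary, chosen only for illustration.
stems <- c("compan", "entit", "suppl")
dictionary <- c("company", "entities", "supplier")

completed <- stemCompletion(stems, dictionary = dictionary)

# The input stems come back as the names; the completions are the values.
print(names(completed))
print(unname(completed))
```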

I was able to fix the loop like this:

for (i in c(1:2, length(myCorpus))) 
{ 
    cat(paste0("[", i, "] ")) 
    writeLines(strwrap(as.character(names(myCorpus[[i]])), 60)) 
}

[1] Diamond Shamrock Corp said that effect today it had cut it 
contract price for crude oil by 1.50 dlrs a barrel. The 
reduct bring it post price for West Texas Intermedi to 
16.00 dlrs a barrel, the copani said. "The price reduct 
today was made in the light of falling oil product price 
and a weak crude oil market," a company spokeswoman said. 
Diamond is the latest in a line of U.S. oil compani that 
hav cut it contract, or posted, price over the last two 
days cit weak oil markets. Reuter 

[2] OPEC may be forc to meet befor a schedul June session to 
readdress it product cutting agr if the organ want to halt 
the current slide in oil prices, oil industri analyst said. 
"The movement to higher oil price was never to be as easy a 
OPEC thought. They may need an emerg meet to sort out th 
problems," said Daniel Yergin, director of Cambridg Energy 
Research Associates, CERA. Analyst and oil industri sourc 
said the problem OPEC face is excess oil suppli in world 
oil markets. "OPEC problem is not a price problem but a 
production issu and must be address in that way," said Paul 
Mlotok, oil analyst with Salomon Brother Inc. He said the 
market earlier optim about OPEC and its abl to keep product 
under control have given way to a pessimist outlook that 
the organ must address soon if it wish to regain the initi 
in oil prices. But some other analyst were uncertain that 
even an emerg meet would address the problem of OPEC 
production abov the 15.8 mln bpd quota set last December. 
"OPEC has to learn that in a buyer market you cannot have 
deem quotas, fix price and set differentials," said the 
region manag for one of the major oil compani who spoke on 
condit that he not be named. "The market is now tri to 
teach them that lesson again," he added. David T. Mizrahi, 
editor of Mideast reports, expect OPEC to meet befor June, 
although not immediately. However, he is not optimist that 
OPEC can address it princip problems. "They will not meet 
now as they tri to take advantag of the wint demand to sell 
their oil, but in late March and April when demand 
slackens," Mizrahi said. But Mizrahi said that OPEC is 
unlik to do anyth more than reiter it agreement to keep 
output at 15.8 mln bpd." Analyst said that the next two 
month will be critic for OPEC abil to hold togeth price and 
output. "OPEC must hold to it pact for the next six to 
eight weeks sinc buyer will come back into the market 
then," said Dillard Sprigg of Petroleum Analysi Ltd in New 
York. But Bijan Moussavar-Rahmani of Harvard Univers 
Energy and Environ Polici Center said that the demand for 
OPEC oil ha been rise through the first quarter and this 
may have prompt excess in it production. "Demand for their 
(OPEC) oil is clear abov 15.8 mln bpd and is probabl closer 
to 17 mln bpd or higher now so what we ar see character as 
cheat is OPEC meet this demand through current production," 
he told Reuter in a telephon interview. Reuter 
[20] Argentin crude oil product was down 10.8 pct in Januari 
1987 to 12.32 mln barrels, from 13.81 mln barrel in Januari 
1986, Yacimiento Petrolifero Fiscales said. Januari 1987 
natur gas output total 1.15 billion cubic metrers, 3.6 pct 
higher than 1.11 billion cubic metr produced in Januari 
1986, Yacimiento Petrolifero Fiscal added. Reuter
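Note that the loop above prints the stemmed text, but the stems themselves are still not completed. One possible way to get completed words is to split each document into words and call stemCompletion() on the stems, not on the whole document. The following is only a sketch under assumptions: complete_text() is a hypothetical helper, the toy texts are taken from the question's already cleaned documents, and the SnowballC package is assumed to be installed. As far as I know, stemCompletion() matches stems with regular expressions, so the input should be free of punctuation.

```r
library(tm)         # provides stemCompletion()
library(SnowballC)  # provides wordStem()

# Hypothetical helper: stem and complete one plain-text string word by word.
complete_text <- function(text, dictionary) {
  words <- unlist(strsplit(text, "[[:space:]]+"))
  words <- words[nzchar(words)]
  stems <- wordStem(words, language = "english")
  completed <- unname(stemCompletion(stems, dictionary = dictionary))
  # Fall back to the stem itself where no completion was found.
  missing <- is.na(completed) | !nzchar(completed)
  completed[missing] <- stems[missing]
  paste(completed, collapse = " ")
}

# Toy input: two documents from the question, already cleaned.
texts <- c("examples calling java code r",
           "simulating mapreduce r big data analysis using flights data rbloggers")

# Dictionary of unstemmed words, built before any stemming takes place.
dictionary <- unique(unlist(strsplit(texts, " ", fixed = TRUE)))

result <- vapply(texts, complete_text, character(1),
                 dictionary = dictionary, USE.NAMES = FALSE)
print(result)
```

If a corpus object is needed afterwards, the completed strings can be wrapped again with VCorpus(VectorSource(result)).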

Source

2017-07-02 04:27:42
