2017-12-07

I have a list of texts obtained from ngram, and I want to add them as columns to my original data table. How can I make each ngram text a separate column in R?

> prep_test 
                          prep_test 
1:      Women Athletic,Athletic Apparel,Apparel Pants,Pants Tights,Tights Leggings 
2:                  Beauty Makeup,Makeup Face 
3:                  Beauty Makeup,Makeup Face 
4:  Electronics Cell,Cell Phones,Phones Accessories,Accessories Cases,Cases Covers,Covers Skins 
5:                   Women Shoes,Shoes Boots 
6:             Men Men,Men s,s Accessories,Accessories Belts 
7: Electronics Cell,Cell Phones,Phones Accessories,Accessories Cell,Cell Phones,Phones Smartphones 
8:               Women Tops,Tops Blouses,Blouses Other 
9:      Women Athletic,Athletic Apparel,Apparel Pants,Pants Tights,Tights Leggings 
10:            Home Home,Home DÃ,DÃ cor,cor Home,Home Fragrance 



str(prep_test) 
Classes ‘data.table’ and 'data.frame': 10 obs. of 1 variable: 
$ prep_test:List of 10 
    ..$ : chr "Women Athletic" "Athletic Apparel" "Apparel Pants" "Pants Tights" ... 
    ..$ : chr "Beauty Makeup" "Makeup Face" 
    ..$ : chr "Beauty Makeup" "Makeup Face" 
    ..$ : chr "Electronics Cell" "Cell Phones" "Phones Accessories" "Accessories Cases" ... 
    ..$ : chr "Women Shoes" "Shoes Boots" 
    ..$ : chr "Men Men" "Men s" "s Accessories" "Accessories Belts" 
    ..$ : chr "Electronics Cell" "Cell Phones" "Phones Accessories" "Accessories Cell" ... 
    ..$ : chr "Women Tops" "Tops Blouses" "Blouses Other" 
    ..$ : chr "Women Athletic" "Athletic Apparel" "Apparel Pants" "Pants Tights" ... 
    ..$ : chr "Home Home" "Home DÃ" "DÃ cor" "cor Home" ... 
- attr(*, ".internal.selfref")=<externalptr> 

Here is my current code:

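library(ngram)       # provides ngram_asweka() 
library(data.table) 

# Return the bigrams of a single string as a character vector 
bigram_fun <- function(y){ 
    # collapse runs of punctuation and blanks into single spaces before tokenizing 
    y <- gsub("[[:punct:][:blank:]]+", " ", y) 
    y <- ngram_asweka(y, min=2, max=2) 
    #y <- str_split_fixed(y, ",", n=Inf) 
    #y <- unlist(y) 
    return(y) 
} 

# 'all' is my full data table; column 9 holds the text 
prep_test <- all[1:10, 9] 
prep_test <- apply(prep_test, 1, bigram_fun) 
prep_test <- data.table(prep_test) 
prep_test 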
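
A likely reason the result ends up as a single list column: `apply()` can only simplify its results to a matrix when every call returns a vector of the same length, and the rows here yield different numbers of bigrams. The sketch below illustrates this with base R only. `toy_bigrams()` is a hypothetical stand-in for `ngram_asweka()`, and the two input strings are made up for illustration.

```r
# Hypothetical stand-in for ngram_asweka(): pair each word with the next one
toy_bigrams <- function(y) {
    words <- strsplit(y, " ", fixed = TRUE)[[1]]
    if (length(words) < 2) return(character(0))
    paste(head(words, -1), tail(words, -1))
}

# One-column character matrix, similar in shape to all[1:10, 9]
m <- matrix(c("Beauty Makeup Face", "Women Shoes Boots Other"), ncol = 1)

res <- apply(m, 1, toy_bigrams)

# The rows produce 2 and 3 bigrams respectively, so apply() cannot
# simplify to a matrix and returns a list instead
is.list(res)
lengths(res)
```

Wrapping such a list in `data.table()` gives one list column rather than one column per bigram, which matches the `str()` output shown earlier.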

dput output:

> dput(prep_test) 
list(c("Women Athletic", "Athletic Apparel", "Apparel Pants", 
"Pants Tights", "Tights Leggings"), c("Beauty Makeup", "Makeup Face" 
), c("Beauty Makeup", "Makeup Face"), c("Electronics Cell", "Cell Phones", 
"Phones Accessories", "Accessories Cases", "Cases Covers", "Covers Skins" 
), c("Women Shoes", "Shoes Boots"), c("Men Men", "Men s", "s Accessories", 
"Accessories Belts"), c("Electronics Cell", "Cell Phones", "Phones Accessories", 
"Accessories Cell", "Cell Phones", "Phones Smartphones"), c("Women Tops", 
"Tops Blouses", "Blouses Other"), c("Women Athletic", "Athletic Apparel", 
"Apparel Pants", "Pants Tights", "Tights Leggings"), c("Home Home", 
"Home DÃ", "DÃ cor", "cor Home", "Home Fragrance")) 

Desired result: one column per n-gram

Bigram 1   Bigram 2   Bigram 3    Bigram 4  ... 
"Women Athletic" "Athletic Apparel" "Apparel Pants"  "Pants Tights"... 
"Beauty Makeup" "Makeup Face"  NA     NA   ... 
"Beauty Makeup" "Makeup Face"  NA     NA   ... 
"Electronics Cell" "Cell Phones"  "Phones Accessories" "Accessories Cases" 
"Women Shoes"  "Shoes Boots"  NA     NA 

Any answers are appreciated. Sorry for the poorly asked question; I am a beginner here.

Any of your code... – Chris

Since 'prep_test' is a data.table object in your question, please upload the 'dput' of your data so that it is reproducible. However, your 'dput' contains a list, not a data table. Am I missing something? – jazzurro

Answers

This should work:

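library(plyr) 
# mylist is the list of bigram vectors shown in the dput output above 
# each vector becomes a one-row data frame; rbind.fill pads missing columns with NA 
df <- rbind.fill(lapply(mylist, function(x) as.data.frame(t(x), stringsAsFactors = FALSE))) 
colnames(df) <- paste0("Bigram ", seq_len(ncol(df))) 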

Output:

  Bigram 1   Bigram 2   Bigram 3   Bigram 4  Bigram 5   Bigram 6 
1 Women Athletic Athletic Apparel  Apparel Pants  Pants Tights Tights Leggings    <NA> 
2  Beauty Makeup  Makeup Face    <NA>    <NA>   <NA>    <NA> 
3  Beauty Makeup  Makeup Face    <NA>    <NA>   <NA>    <NA> 
4 Electronics Cell  Cell Phones Phones Accessories Accessories Cases Cases Covers  Covers Skins 
5  Women Shoes  Shoes Boots    <NA>    <NA>   <NA>    <NA> 
6   Men Men   Men s  s Accessories Accessories Belts   <NA>    <NA> 
7 Electronics Cell  Cell Phones Phones Accessories Accessories Cell  Cell Phones Phones Smartphones 
8  Women Tops  Tops Blouses  Blouses Other    <NA>   <NA>    <NA> 
9 Women Athletic Athletic Apparel  Apparel Pants  Pants Tights Tights Leggings    <NA> 
10  Home Home   Home DÃ    DÃ cor   cor Home Home Fragrance    <NA> 
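
For comparison, the same pad-and-bind idea can be sketched in base R without plyr. This is only an illustrative alternative, not part of the original answer. The short `mylist` below is a trimmed-down sample of the question's data, and `n_max`, `padded`, and `wide` are names chosen for this sketch.

```r
# Trimmed-down sample of the bigram list from the question
mylist <- list(c("Women Athletic", "Athletic Apparel", "Apparel Pants"),
               c("Beauty Makeup", "Makeup Face"),
               c("Women Shoes", "Shoes Boots"))

# Length of the longest bigram vector
n_max <- max(lengths(mylist))

# `length<-` extends each vector to n_max, filling the new slots with NA
padded <- lapply(mylist, `length<-`, n_max)

# Stack the equal-length vectors as rows of a character matrix,
# then convert to a data frame and name the columns
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
colnames(wide) <- paste0("Bigram ", seq_len(ncol(wide)))
wide
```

Because every vector is padded to the same length first, `rbind` produces a rectangular matrix and the shorter rows end in `NA`.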

Hope this helps!

This works! Awesome, thx Florian :D

You can convert the bigrams to data frames, bind them into one melted (long-format) data frame, and then cast that into a tidy wide-format data frame, as follows.

theBigrams <- list(c("Women Athletic", "Athletic Apparel", "Apparel Pants", 
"Pants Tights", "Tights Leggings"), c("Beauty Makeup", "Makeup Face"), 
c("Beauty Makeup", "Makeup Face"), c("Electronics Cell", "Cell Phones", 
"Phones Accessories", "Accessories Cases", "Cases Covers", "Covers Skins" 
), c("Women Shoes", "Shoes Boots"), c("Men Men", "Men s", "s Accessories", 
"Accessories Belts"), c("Electronics Cell", "Cell Phones", "Phones Accessories", 
"Accessories Cell", "Cell Phones", "Phones Smartphones"), c("Women Tops", 
"Tops Blouses", "Blouses Other"), c("Women Athletic", "Athletic Apparel", 
"Apparel Pants", "Pants Tights", "Tights Leggings"), c("Home Home", 
"Home DÃ", "DÃ cor", "cor Home", "Home Fragrance")) 

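library(reshape2) 

# Build one long ("melted") data frame: one row per (id, bigram position) pair 
meltedBigrams <- do.call(rbind,lapply(seq_along(theBigrams),function(i) { 
    x <- theBigrams[[i]] 
    bigram <- seq_along(x) 
    id <- rep(i,length(x)) 
    data.frame(id,bigram,value=x,stringsAsFactors=FALSE) 
})) 
# Cast to wide format: one row per id, one column per bigram position 
castData <- dcast(meltedBigrams,id ~ bigram,value.var="value") 
castData 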

...and the output:

> castData 
    id    1    2     3     4    5     6 
1 1 Women Athletic Athletic Apparel  Apparel Pants  Pants Tights Tights Leggings    <NA> 
2 2 Beauty Makeup  Makeup Face    <NA>    <NA>   <NA>    <NA> 
3 3 Beauty Makeup  Makeup Face    <NA>    <NA>   <NA>    <NA> 
4 4 Electronics Cell  Cell Phones Phones Accessories Accessories Cases Cases Covers  Covers Skins 
5 5  Women Shoes  Shoes Boots    <NA>    <NA>   <NA>    <NA> 
6 6   Men Men   Men s  s Accessories Accessories Belts   <NA>    <NA> 
7 7 Electronics Cell  Cell Phones Phones Accessories Accessories Cell  Cell Phones Phones Smartphones 
8 8  Women Tops  Tops Blouses  Blouses Other    <NA>   <NA>    <NA> 
9 9 Women Athletic Athletic Apparel  Apparel Pants  Pants Tights Tights Leggings    <NA> 
10 10  Home Home   Home DÃ    DÃ cor   cor Home Home Fragrance    <NA> 
> 

Thx Len Greski :D This works too! Thx so much!
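
Since the question starts from a data.table and asks for the bigrams to be added back as columns, a data.table-only variant may also be worth sketching. It assumes the data.table package is installed. The table `dt` and its `item_id` column are hypothetical placeholders for the original data, and `bigrams` is a trimmed-down sample of the list from the question.

```r
library(data.table)

# Hypothetical original table; item_id is a placeholder column
dt <- data.table(item_id = 1:3)

# Trimmed-down sample of the bigram list, one element per row of dt
bigrams <- list(c("Women Athletic", "Athletic Apparel", "Apparel Pants"),
                c("Beauty Makeup", "Makeup Face"),
                c("Women Shoes", "Shoes Boots"))

# data.table::transpose() turns the list of rows into a list of columns,
# padding the shorter vectors with NA
wide <- as.data.table(data.table::transpose(bigrams, fill = NA))
setnames(wide, paste0("Bigram ", seq_len(ncol(wide))))

# Attach the bigram columns to the original table, row by row
result <- cbind(dt, wide)
result
```

This relies on the list being in the same row order as the original table, since `cbind` matches rows by position rather than by key.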