2017-01-06 9 views
-1

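How to split a Python list in R

I have lists created in Python embedded in the cells of a csv. I am trying to coerce the elements into a data table in R, but I am stuck on certain vectors that contain text. The reason is that strsplit() splitting on "," works fine for numbers, but if the text has embedded commas, some vectors end up longer than others. Below is a reproducible example. Thanks for any help you can provide!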

x <- c("['SPOSORSHIP FOR CONVENTION']", "['GENERAL CONTRIBUTION', 'GENERAL CONTRIBUTION']", 
"['WOMEN & POPULATION']", "['PROGRAM SUPPORT', 'PROGRAM SUPPORT']", 
"['MULTIPLE GRANTS FOR MULTIPLE PURPOSES']", "['IMPROVING NATIONAL PARKS']", 
"['general operating support']", "['Civic Engagement', 'Animal Welfare', 'Religion']", 
"['RESEARCH SUBAWARD']", "['OPERATIONAL SUPPORT', 'OPERATIONAL SUPPORT']", 
"['PROMOTE FILM INDUSTRY']", "['TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS']", 
"['10TH ANNUAL GREAT LAKES RESTORATION CONFERENCE AND PETER WEGE TRIBUTE LUNCHEON']", 
"['Conservation', 'Conservation']", "['FOR GENERAL OPERATING SUPPORT']" 
) 
+0

Everything enclosed in single quotes should be a unique element of a new character vector. So, something similar to strsplit, followed by unlist... – StanO

+0

Could you edit the question so that it reproduces the problem, and reduce the example to a minimal size? –

Answers

1

This will probably help. I first strip the leading [' and the trailing '] from each string, then split on ', ':

cleaned_text <- gsub("^\\['|'\\]$", "", x)   # remove the leading [' and trailing ']
unlist(strsplit(cleaned_text, "', '"))      # split on ', '
[1] "SPOSORSHIP FOR CONVENTION"              
[2] "GENERAL CONTRIBUTION"               
[3] "GENERAL CONTRIBUTION"               
[4] "WOMEN & POPULATION"                
[5] "PROGRAM SUPPORT"                
[6] "PROGRAM SUPPORT"                
[7] "MULTIPLE GRANTS FOR MULTIPLE PURPOSES"           
[8] "IMPROVING NATIONAL PARKS"              
[9] "general operating support"              
[10] "Civic Engagement"                
[11] "Animal Welfare"                 
[12] "Religion"                  
[13] "RESEARCH SUBAWARD"                
[14] "OPERATIONAL SUPPORT"               
[15] "OPERATIONAL SUPPORT"               
[16] "PROMOTE FILM INDUSTRY"               
[17] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"            
[18] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"            
[19] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"            
[20] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"            
[21] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"            
[22] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"            
[23] "10TH ANNUAL GREAT LAKES RESTORATION CONFERENCE AND PETER WEGE TRIBUTE LUNCHEON" 
[24] "Conservation"                 
[25] "Conservation"                 
[26] "FOR GENERAL OPERATING SUPPORT" 
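
If the goal is a table rather than one flat vector, delaying `unlist()` keeps track of which input cell each piece came from. The following is only a sketch: the column names `row` and `value` and the two-element vector `x_small` are assumptions made for illustration, not part of the original answer.

```r
# Two elements borrowed from the question's vector, for illustration only
x_small <- c("['PROGRAM SUPPORT', 'PROGRAM SUPPORT']",
             "['Civic Engagement', 'Animal Welfare', 'Religion']")

# Strip the leading [' and trailing '], then split each string on ', '
parts <- strsplit(gsub("^\\['|'\\]$", "", x_small), "', '", fixed = TRUE)

# One row per piece, with the index of the source string alongside it
long <- data.frame(
  row   = rep(seq_along(parts), lengths(parts)),
  value = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

Because `row` records the position in the input vector, the pieces can later be joined back to the other columns of the csv.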
+1

That's it! Thank you!!! – StanO

1

Two solutions. Both remove the [' and '] and then split on ', ':

# with stringr 
library(stringr) 
a <- str_replace_all(x, "\\['|'\\]", "") %>% 
    str_split("', '") %>% 
    unlist 

# with base 
b <- unlist(strsplit(gsub("\\['|'\\]", "", x), "', '")) 

identical(a, b) 

Result:

[1] "SPOSORSHIP FOR CONVENTION"
[2] "GENERAL CONTRIBUTION"
[3] "GENERAL CONTRIBUTION"
[4] "WOMEN & POPULATION"
...

The trick is to trim the strings first, and then to split on ', ' rather than on the bare comma.
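
To make the difference concrete, here is a minimal sketch using a made-up element with an embedded comma. The string `y` is hypothetical and does not appear in the question's data. Splitting on the bare comma yields three pieces, while trimming first and splitting on ', ' recovers the two original elements:

```r
# Hypothetical input: a Python-style list whose first element contains a comma
y <- "['PARKS, TRAILS AND TREES', 'ARTS']"

# Naive approach: split on every comma, including the one inside the text
naive <- unlist(strsplit(y, ","))
length(naive)   # 3 pieces, one more than the list really holds

# Trim the leading [' and trailing '] first, then split on the full delimiter ', '
trimmed <- gsub("^\\['|'\\]$", "", y)
pieces <- unlist(strsplit(trimmed, "', '", fixed = TRUE))
pieces          # "PARKS, TRAILS AND TREES" "ARTS"
```

This still assumes that no element contains the literal sequence ', ' itself. Text like that would need a real parser rather than a delimiter split.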
