How can I find the common words between two text columns containing strings?

I have two columns: 'Title', which contains data such as 'What is Physics?', and 'Content', which contains data such as 'Physics is the study of....'. I need the words the two columns have in common, e.g. ['is', 'Physics']. This has to be done for every row of the data. How can I achieve this using R?

Regards,


2017-01-13 AYa

I think you want something like the following:

df <- data.frame(col1 = c('what is physics?', 'set cover is NP hard', 'abstract algebra'),
                 col2 = c('Physics is the study of...', 'Example of an NP complete problem is 3-SAT', 'linear algebra'),
                 stringsAsFactors = FALSE)
df
#                   col1                                       col2
# 1     what is physics?                 Physics is the study of...
# 2 set cover is NP hard Example of an NP complete problem is 3-SAT
# 3     abstract algebra                             linear algebra

# For each row: replace every run of non-letters with a space, split on spaces,
# lowercase the words, and keep the ones that occur in both columns
apply(df, 1, function(x) intersect(tolower(unlist(strsplit(gsub('[^a-zA-Z]+', ' ', x[1]), split = ' '))),
                                   tolower(unlist(strsplit(gsub('[^a-zA-Z]+', ' ', x[2]), split = ' ')))))

# [[1]]
# [1] "is"      "physics"
#
# [[2]]
# [1] "is" "np"
#
# [[3]]
# [1] "algebra"
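A variant of the same idea, sketched under a few assumptions: the helper names `tokenize` and `common_words` are hypothetical (they are not from the answer), a "word" is taken to be a run of ASCII letters, and `mapply` is used instead of `apply` so the data frame is never coerced to a character matrix. Empty strings, which `strsplit` produces when a string starts with punctuation, are dropped so they can never count as a shared word.

```r
# Hypothetical helper: lowercase a single string and split it into words,
# treating any run of non-letters (ASCII only) as a separator
tokenize <- function(s) {
  words <- strsplit(tolower(s), "[^a-z]+")[[1]]
  words[words != ""]  # drop the empty string left by a leading separator
}

# Hypothetical helper: words shared by two strings, in first-string order
common_words <- function(a, b) intersect(tokenize(a), tokenize(b))

df <- data.frame(col1 = c('what is physics?', 'set cover is NP hard', 'abstract algebra'),
                 col2 = c('Physics is the study of...', 'Example of an NP complete problem is 3-SAT', 'linear algebra'),
                 stringsAsFactors = FALSE)

# mapply walks both columns in parallel and returns one vector per row
res <- mapply(common_words, df$col1, df$col2, SIMPLIFY = FALSE, USE.NAMES = FALSE)
res

# Optional: store each row's result as one comma-separated string
df$common <- vapply(res, paste, character(1), collapse = ", ")
df$common
```

The last step is optional; it turns a row's result into a single string such as "is, physics", which is easier to keep next to the two text columns in the same data frame than a list.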


2017-01-13 20:05:45

It says: Error in unlist(strsplit(gsub("[^a-zA-Z\\s]", " ", x[2])), split = " ") : unused argument (split = " ") – AYa

Could you check it now? There was a typo. –
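The "unused argument" error in the comment above is what a misplaced parenthesis produces: `split` ends up as an argument to `unlist()`, which has no such parameter, instead of to `strsplit()`. A minimal illustration on a single made-up string (the variable names `s`, `err` and `words` exist only for this example):

```r
s <- "Physics is the study of..."

# Wrong: strsplit() is closed too early, so `split` is passed to unlist();
# R rejects the unknown argument before strsplit() is even evaluated
err <- tryCatch(unlist(strsplit(gsub("[^a-zA-Z]+", " ", s)), split = " "),
                error = function(e) conditionMessage(e))
err

# Right: `split` belongs inside the strsplit() call
words <- unlist(strsplit(gsub("[^a-zA-Z]+", " ", s), split = " "))
words
```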
