Collapse multiple rows of strings into one row based on a condition


Say I have this data:

df <- data.frame(
    text = c("Treatment1: This text is","on two lines","","Treatment2:This text","has","three lines","","Treatment3: This has one") 
       ) 
df 
         text 
1 Treatment1: This text is 
2    on two lines 
3       
4  Treatment2:This text 
5      has 
6    three lines 
7       
8 Treatment3: This has one

How would I parse this text so that each "Treatment" is on its own row, with all the text below it collapsed onto that same row?

For example, this is the desired output:

text 
1 Treatment1: This text is on two lines 
2 Treatment2: This text has three lines     
3 Treatment3: This has one

Can anyone recommend a way to do this?

Source

2017-10-15 boshek

Maybe something like the following.
First, the data in dput format, which is the best format for sharing a dataset in a post.

df <- 
structure(list(text = c("Treatment1: This text is", "on two lines", 
"", "Treatment2:This text", "has", "three lines", "", "Treatment3: This has one" 
)), .Names = "text", class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

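Now the base R code.

# Number each block: the counter goes up at every row that mentions "treatment"
fact <- cumsum(grepl("treatment", df$text, ignore.case = TRUE))
# Split the rows by block, paste each block into one trimmed string, and stack the results
result <- do.call(rbind, lapply(split(df, fact), function(x)
        trimws(paste(x$text, collapse = " "))))
result <- as.data.frame(result)
names(result) <- "text"
result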
#         text 
#1 Treatment1: This text is on two lines 
#2 Treatment2:This text has three lines 
#3    Treatment3: This has one
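
To make the grouping step easier to follow, here is a small sketch of what the cumsum(grepl(...)) line produces on the strings from the question. The names text and starts are introduced only for this illustration.

```r
# Same strings as in the question
text <- c("Treatment1: This text is", "on two lines", "",
          "Treatment2:This text", "has", "three lines", "",
          "Treatment3: This has one")

# TRUE wherever a new treatment block starts
starts <- grepl("treatment", text, ignore.case = TRUE)

# The running sum of those flags gives every row its block number,
# so continuation rows and blank rows inherit the number of the
# treatment row above them
fact <- cumsum(starts)
fact
# [1] 1 1 1 2 2 2 2 3
```

Rows that share a number are then pasted together by split() and paste().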

EDIT.
Following Rich Scriven's comment, tapply can simplify the code above considerably. (I sometimes overcomplicate things; I didn't see that.)

result2 <- data.frame(
    text = tapply(df$text, fact, function(x) trimws(paste(x, collapse = " "))) 
) 

all.equal(result, result2) 
#[1] "Component “text”: 'current' is not a factor"
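
The message above is about the column type (factor or not), not about the strings themselves. As a hedged sketch, one way to check that the two approaches give the same text is to compare the columns as plain character vectors. The stringsAsFactors = FALSE argument and the name same are additions for this illustration, and whether the original all.equal call reports a difference may depend on the R version.

```r
df <- data.frame(
    text = c("Treatment1: This text is", "on two lines", "",
             "Treatment2:This text", "has", "three lines", "",
             "Treatment3: This has one"),
    stringsAsFactors = FALSE
)
fact <- cumsum(grepl("treatment", df$text, ignore.case = TRUE))

# First approach: split, paste, rbind
result <- as.data.frame(do.call(rbind, lapply(split(df, fact), function(x)
    trimws(paste(x$text, collapse = " ")))))
names(result) <- "text"

# Second approach: tapply
result2 <- data.frame(
    text = tapply(df$text, fact, function(x) trimws(paste(x, collapse = " ")))
)

# Compare only the strings, ignoring factor levels, names and dimensions
same <- all.equal(as.character(result$text), as.character(result2$text))
same
```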

Source

2017-10-15 21:54:49

Take a look at 'tapply()'. It can be used in place of 'do.call(rbind, lapply(split(...)))'.

@RichScriven Thanks, I've edited the answer to include your suggestion.

x <- gsub("\\s+Treatment", "*BREAK*Treatment", 
      as.character(paste(df[[1]], collapse = " "))) 
data.frame(text = unlist(strsplit(x, "\\*BREAK\\*")))
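
The collapsed strings keep "Treatment2:This" exactly as typed, while the desired output in the question shows a space after the colon. A minimal sketch of one way to normalize that, assuming every label has the form "Treatment", digits, then a colon (the names collapsed and normalized are made up for this example):

```r
# Collapsed strings as produced by the code above
collapsed <- c("Treatment1: This text is on two lines",
               "Treatment2:This text has three lines",
               "Treatment3: This has one")

# Rewrite the label so it is followed by exactly one space
normalized <- sub("^(Treatment[0-9]+):\\s*", "\\1: ", collapsed)
data.frame(text = normalized)
```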

Source

2017-10-15 21:56:54
