Replace strings that do not contain a specific punctuation mark in R

I want to replace the strings that do not contain the punctuation character '/'.

    sentence <- 'I/NP to/INF this/NP like/CON that/NP Peter wow er ! is'

The tokens that are not attached to a '/' tag [Peter, wow, !, er, is] need to be tagged with '/UN'.

This is what I tried:

    # split the sentence into whitespace-separated tokens
    seg <- unlist(strsplit(sentence, '[[:space:]]+'))
    # keep only the tokens without a "/" tag
    segment <- seg[!grepl('/', seg)]
    # build the tagged version of each of those tokens
    replace <- gsub('(\\S+)', '\\1/UN', segment)

    # apply each pattern/replacement pair in turn to the whole string
    mgsub <- function(pattern, replacement, x, ...) {
        if (length(pattern) != length(replacement)) {
            stop("pattern and replacement do not have the same length.")
        }
        result <- x
        for (i in seq_along(pattern)) {
            result <- gsub(pattern[i], replacement[i], result, ...)
        }
        result
    }

    mgsub(segment, replace, sentence)

Unfortunately, this is the result I got:

[1] "I/NP to/INF this/UN/NP like/CON that/NP Peter/UN/UN wow/UN er/UN !/UN is/UN"
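The doubled and misplaced tags appear to come from `gsub()` matching each pattern anywhere in the string, not only as a whole token. This small sketch reproduces the two problem cases in isolation: "is" matches inside "this/NP", and "er" matches inside "Peter/UN", which the earlier "Peter" replacement had already tagged.

```r
# gsub() matches substrings, so a short token also hits longer words
step1 <- gsub("is", "is/UN", "this/NP")
print(step1)
# [1] "this/UN/NP"

# "Peter" is tagged first; the later "er" replacement then matches inside it
step2 <- gsub("er", "er/UN", gsub("Peter", "Peter/UN", "Peter er"))
print(step2)
# [1] "Peter/UN/UN er/UN"
```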

This is what I want to get:

[1] "I/NP to/INF this/NP like/CON that/NP Peter/UN wow/UN er/UN !/UN is/UN"

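Note that `sentence` is only a sample: please consider other possible inputs too, so that the code handles all of them rather than just this one.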
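One way to sidestep substring matching entirely is to tag whole tokens instead of searching the whole string. This is a minimal sketch, assuming tokens are separated by whitespace and that any token containing "/" is already tagged; the helper name `tag_untagged` is hypothetical.

```r
# Hypothetical helper: append "/UN" to every whitespace-separated token
# that does not already contain a "/" tag. Expects a single string.
tag_untagged <- function(s, tag = "UN") {
  tokens <- strsplit(trimws(s), "[[:space:]]+")[[1]]
  untagged <- !grepl("/", tokens, fixed = TRUE)   # tokens lacking a tag
  tokens[untagged] <- paste0(tokens[untagged], "/", tag)
  paste(tokens, collapse = " ")                   # rejoin with single spaces
}

sentence <- 'I/NP to/INF this/NP like/CON that/NP Peter wow er ! is'
tag_untagged(sentence)
# [1] "I/NP to/INF this/NP like/CON that/NP Peter/UN wow/UN er/UN !/UN is/UN"
tag_untagged("Hello there/NP ?")
# [1] "Hello/UN there/NP ?/UN"
```

Because the tokens are rejoined with `paste(collapse = " ")`, runs of whitespace in the input are collapsed to single spaces.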


2017-05-07 Rcoding

Out of curiosity, how are you generating the POS tags? I'm assuming OpenNLP is tagging your leftovers... –

If you want to add /UN to every word that does not contain a /, you can use gsub. For example:

    gsub("(?<=^| )([^\\/ ]+)(?= |$)", "\\1/UN", sentence, perl = TRUE)
    # [1] "I/NP to/INF this/NP like/CON that/NP Peter/UN wow/UN er/UN !/UN is/UN"

This regex looks for runs of characters that contain neither a slash nor a space ([^\\/ ]+) and are bounded by a space or by the start/end of the string.
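To see each part of the pattern and check it on an input other than the sample, here is a sketch that wraps the regex, with the lookbehind written as "start of string or a space" as described, in a small function. The name `tag_un_regex` is hypothetical; `perl = TRUE` is what enables the lookbehind and lookahead.

```r
# Pattern pieces, as written in the R string:
#   (?<=^| )    lookbehind: preceded by the start of the string or a space
#   ([^\\/ ]+)  capture one or more characters that are neither "/" nor a space
#   (?= |$)     lookahead: followed by a space or the end of the string
tag_un_regex <- function(s) {
  gsub("(?<=^| )([^\\/ ]+)(?= |$)", "\\1/UN", s, perl = TRUE)
}

tag_un_regex("Hello there/NP ?")
# [1] "Hello/UN there/NP ?/UN"
```

Because the string is rewritten in place, the original spacing between tokens is preserved.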


2017-05-07 18:23:49 MrFlick

Thanks! That's great! – Rcoding
