How to separate strings based on characters using R

I have strings in which I need to find words that contain a period in the middle. Some of the strings are concatenated, so I need to break them apart into words so that I can then filter for the words with dots.

Below is a sample of what I have and what I get so far:

punctToRemove <- c("[^[:alnum:][:space:]._]") 

s <- c("get_degree('TITLE',PERS.ID)", 
     "CLIENT_NEED.TYPE_CODe=21", 
     "2.1.1Report Field Level Definition", 
     "The user defined field. The user will validate")

This is what I currently get:

gsub(punctToRemove, " ", s) 

[1] "get_degree  TITLE  PERS.ID "
[2] "CLIENT_NEED.TYPE_CODe 21"      
[3] "2.1.1Report Field Level Definition"    
[4] "The user defined field. The user will validate"

Sample of what I want:

[1] "get_degree ( ' TITLE ' , PERS.ID ) "   # spaces before and after the "(", "'", ",", and ")"
[2] "CLIENT_NEED.TYPE_CODe = 21"     # spaces before and after the "=" sign. Dot and underscore remain untouched.   
[3] "2.1.1Report Field Level Definition"   # no changes 
[4] "The user defined field. The user will validate" # no changes


2016-09-01 user3357059

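We can use regex lookarounds. The lookbehind `(?<=...)` matches the empty position just after one of the delimiters and the lookahead `(?=...)` matches the empty position just before one, so replacing those zero-width positions with a space pads each delimiter without removing it.

Each result has the same number of characters as the corresponding string in the OP's expected output: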

s1 <- gsub("(?<=['=(),])|(?=['(),=])", " ", s, perl = TRUE) 
s1 
#[1] "get_degree ( ' TITLE ' , PERS.ID ) "
#[2] "CLIENT_NEED.TYPE_CODe = 21"      
#[3] "2.1.1Report Field Level Definition"    
#[4] "The user defined field. The user will validate" 

nchar(s1) 
#[1] 35 26 34 46

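Coming back to the OP's stated goal of filtering for words with a dot, a minimal sketch of the follow-up step might look like this. It assumes a "word" is any whitespace-delimited token and that "a period in the middle" means at least one non-period character on each side of the period; `words` and `dotted` are names introduced here for illustration only.

```r
s <- c("get_degree('TITLE',PERS.ID)",
       "CLIENT_NEED.TYPE_CODe=21",
       "2.1.1Report Field Level Definition",
       "The user defined field. The user will validate")

# Pad the delimiters with spaces using the lookaround pattern
s1 <- gsub("(?<=['=(),])|(?=['(),=])", " ", s, perl = TRUE)

# Split every string on runs of whitespace and flatten into one vector
words <- unlist(strsplit(s1, "[[:space:]]+"))

# Keep tokens with a period that has a non-period character on both sides
# (assumption: this is what "a period in the middle" means)
dotted <- words[grepl("[^.]\\.[^.]", words)]
dotted
# dotted contains: PERS.ID, CLIENT_NEED.TYPE_CODe, 2.1.1Report
```

Under these assumptions `field.` is dropped because nothing follows its period, while `2.1.1Report` is kept; the pattern would need tightening if such tokens should be excluded.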


2016-09-01 15:24:04 akrun

I updated the code to also handle cases where there is a bar or bars, as in the following: **CLIENT**. This is how it looks now: `gsub("(?<=[\\|'=])|(?=[\\|'(),=])", " ", s, perl = TRUE)` – user3357059
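A sketch of how the bar case mentioned in the comment could be handled, assuming the bar should be padded exactly like the other delimiters. The input string `x` is hypothetical, and the sketch uses the same character set in both lookarounds, which may differ from what the commenter ended up with.

```r
# Hypothetical input containing a bar next to another delimiter
x <- "A|B=C"

# Inside a character class the bar is a literal character, so it can simply
# be added to the set used by both the lookbehind and the lookahead
x1 <- gsub("(?<=[|'=(),])|(?=[|'=(),])", " ", x, perl = TRUE)
x1
#[1] "A | B = C"
```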

For this example:

library(stringr)
# Pad each delimiter with spaces, one character at a time
s <- str_replace_all(s, "\\)", " ) ")
s <- str_replace_all(s, "\\(", " (")
s <- str_replace_all(s, "=", " = ")
s <- str_replace_all(s, "'", " ' ")
s <- str_replace_all(s, ",", " , ")
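The per-character replacements can also be collapsed into fewer calls. The following is a sketch in base R rather than stringr (a swapped-in technique, not the answerer's code): it captures any one delimiter, writes it back surrounded by spaces via the `\\1` backreference, then squeezes runs of spaces into one. `s2` is a name introduced here for illustration.

```r
s <- c("get_degree('TITLE',PERS.ID)",
       "CLIENT_NEED.TYPE_CODe=21",
       "2.1.1Report Field Level Definition",
       "The user defined field. The user will validate")

# Surround every delimiter with a space on each side
s2 <- gsub("(['=(),])", " \\1 ", s)

# Adjacent delimiters now have two spaces between them; collapse those runs
s2 <- gsub(" {2,}", " ", s2)
s2
```

For the sample strings this gives the same result as the lookaround version. Note that the second `gsub()` would also collapse any runs of spaces already present in the input.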


2016-09-01 15:23:33
