Is there a more efficient way to split a column?

Some values are not imported correctly when I run this read.table call:

hs.industry <- read.table("https://download.bls.gov/pub/time.series/hs/hs.industry", header = TRUE, fill = TRUE, sep = "\t", quote = "", stringsAsFactors = FALSE)

Specifically, for a few rows the industry_code and industry_name are merged into a single value in the industry_code column (I don't know why). Given that every industry_code is a 4-digit number, my approach to splitting and fixing them is:

for (i in 1:nrow(hs.industry)) {
    if (isTRUE(nchar(hs.industry$industry_code[i]) > 4)) {
        hs.industry$industry_name[i] <- gsub("[[:digit:]]","",hs.industry$industry_code[i])
        hs.industry$industry_code[i] <- gsub("[^0-9]", "",hs.industry$industry_code[i])
    }
}

I feel this is terribly inefficient, but I'm not sure what a better approach would be.

Thanks!

Source

2017-03-06 Michael

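The problem is that lines 29 and 30 of the file (rows 28 and 29 if we don't count the header) have a formatting error: they use 4 spaces instead of a proper tab character. A bit of extra data cleaning is needed.

Use readLines to read in the raw text, correct the formatting error, and then read in the cleaned table: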

# read in each line of the file as one element of a character vector
hs.industry <- readLines('https://download.bls.gov/pub/time.series/hs/hs.industry')

# replace any run of 4 or more non-word characters (here, the stray spaces) with a tab
hs.industry <- gsub('\\W{4,}', '\t', hs.industry)

# collapse the vector into one string, with lines separated by a newline (\n)
hs.industry <- paste(hs.industry, collapse = '\n')

# read in the cleaned table
hs.industry <- read.table(text = hs.industry, sep = '\t', header = TRUE, quote = '')
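
To try the steps above without downloading the file, here is a minimal sketch on a few made-up lines. The sample rows and the names `raw_lines`, `bad_lines`, `fixed_lines` and `sample_df` are assumptions for illustration, not the actual BLS contents. It first counts the tab-separated fields per line to locate malformed rows, then repairs them. Note that it uses the narrower pattern `" {4,}"` (spaces only) rather than `\\W{4,}`; that substitution is a choice made for this sketch.

```r
# Hypothetical sample mimicking the file layout; the last row uses
# four spaces instead of a tab between the code and the name.
raw_lines <- c(
  "industry_code\tindustry_name\tdisplay_level",
  "1000\tMetal mining\t1",
  "1400\tNonmetallic minerals\t1",
  "1410    Dimension stone\t2"
)

# Count tab-separated fields on each line; malformed lines differ from the header.
n_fields <- lengths(strsplit(raw_lines, "\t", fixed = TRUE))
bad_lines <- which(n_fields != n_fields[1])

# Replace any run of four or more spaces with a single tab.
fixed_lines <- gsub(" {4,}", "\t", raw_lines)

# Parse the repaired text into a data frame.
sample_df <- read.table(text = paste(fixed_lines, collapse = "\n"),
                        sep = "\t", header = TRUE, quote = "",
                        stringsAsFactors = FALSE)
```

Checking the field count first shows which lines are affected before anything is rewritten, which helps confirm that the replacement only touches the rows you expect.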

Source

2017-03-06 18:45:48 jdobres

Thanks! Can you explain why the collapse is needed? – Michael

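When using `read.table` with the `text` argument, the text must be a single character string, not a list of strings. So we collapse the list of strings (each item representing one line of the original text) into one string, joined by newline characters. – jdobres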
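
A minimal sketch of what the collapse does, using made-up lines (`lines_vec`, `single_string` and `df_collapsed` are hypothetical names). As far as I know, `read.table` hands `text` to a text connection, which also accepts a character vector, so the collapse may not be strictly required in current R versions; the sketch only illustrates the step itself.

```r
# Hypothetical lines, as readLines() would return them: one element per line.
lines_vec <- c("code\tname", "1000\tMetal mining", "1400\tNonmetallic minerals")

# paste() with collapse joins all elements into one string,
# inserting a newline between consecutive lines.
single_string <- paste(lines_vec, collapse = "\n")

# The vector has three elements; the collapsed result has one.
n_before <- length(lines_vec)
n_after <- length(single_string)

# The single string parses back into one row per original line.
df_collapsed <- read.table(text = single_string, sep = "\t", header = TRUE,
                           quote = "", stringsAsFactors = FALSE)
```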

Alternatively, identify only the entries that have the problem and apply gsub to just those; you don't need to loop over every row.

replace_indx <- which(nchar(hs.industry$industry_code) > 4) 
hs.industry$industry_name[replace_indx] <- gsub("\\d+\\s+", "", hs.industry$industry_code[replace_indx]) 
hs.industry$industry_code[replace_indx] <- gsub("\\D+", "", hs.industry$industry_code[replace_indx])

I also use "\\d+\\s+" to improve your string replacement; here it removes the spaces as well:

gsub("[[:digit:]]","",hs.industry$industry_code[replace_indx]) 
# [1] " Dimension stone"   " Crushed and broken stone" 

gsub("\\d+\\s+", "", hs.industry$industry_code[replace_indx]) 
# [1] "Dimension stone"   "Crushed and broken stone"
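
The same split can also be written with one pattern and capture groups, which makes the "four digits, whitespace, then the name" assumption explicit. This is a minimal sketch on made-up values; the `codes` vector and the variable names are hypothetical.

```r
# Hypothetical column values: one clean code and two merged entries.
codes <- c("1000", "1410 Dimension stone", "1420 Crushed and broken stone")

# Group 1 captures the 4-digit code, group 2 captures the name after the whitespace.
pattern <- "^(\\d{4})\\s+(.*)$"
is_merged <- grepl(pattern, codes)

# Extract each part only where the pattern matched; clean entries keep their code
# and get NA for the name (in real data the existing name would be left untouched).
name_part <- ifelse(is_merged, sub(pattern, "\\2", codes), NA_character_)
code_part <- ifelse(is_merged, sub(pattern, "\\1", codes), codes)
```

Anchoring the pattern with `^` and `$` means an entry that does not start with exactly four digits followed by whitespace is left alone rather than being partially stripped.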

Source

2017-03-06 18:44:10 Djork
