2017-10-18

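Recoding comma-separated entries in R

I have a data frame (df2) with two variables, Mood and PartOfTown. Mood is a multiple-choice question (any combination of options is allowed), and PartOfTown describes the geographical location.

The problem is that the centres code Mood differently: centres in the north of the town use NorthCode, while centres in the south use SouthCode (df1).

I would like to recode every entry in the dataset (df2) to SouthCode, so that I end up with a dataset like df3. I would like a general solution, because new entries may arrive with combinations that are not currently in the dataset. Any ideas would be much appreciated.

Centre codes and definitions for mood: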

df1 <- data.frame(NorthCode=c(4,5,6,7,99),NorthDef=c("happy","sad","tired","energetic","other"),SouthCode=c(7,8,9,5,99),SouthDef=c("happy","sad","tired","energetic","other")) 

Starting point:

df2 <- data.frame(Mood=c("4","5","6","7","4,5","5,6,99","99","7","8","9","5","7,8","8,5,99","99"),Region=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south")) 

Desired result:

df3 <- data.frame(Mood=c("7","8","9","5","7,8","8,9,99","99","7","8","9","5","7,8","8,5,99","99"),PartofTown=c("north","north","north","north","north","north","north","south","south","south","south","south","south","south")) 

Current attempt: I tried to start by splitting the entries, but could not get it to work.

unlist(strsplit(df2$Mood, ",")) 

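Answer

You were on the right track with strsplit, but you need to add stringsAsFactors = FALSE to data.frame() so that Mood is a character vector rather than a factor (in R 4.0.0 and later this is already the default). You can then keep the split elements as a list and use lapply() to match the old codes to the new ones.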

df1 <- 
    data.frame(NorthCode = c(4,5,6,7,99), 
               NorthDef = c("happy","sad","tired","energetic","other"), 
               SouthCode = c(7,8,9,5,99), 
               SouthDef = c("happy","sad","tired","energetic","other"), 
               stringsAsFactors = FALSE) 

df2 <- 
    data.frame(Mood = c("4","5","6","7","4,5","5,6,99","99","7","8","9","5","7,8","8,5,99","99"), 
               Region = c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"), 
               stringsAsFactors = FALSE) 

df3 <- 
    data.frame(Mood = c("7","8","9","5","7,8","8,9,99","99","7","8","9","5","7,8","8,5,99","99"), 
               PartofTown = c("north","north","north","north","north","north","north","south","south","south","south","south","south","south"), 
               stringsAsFactors = FALSE) 

# Split the Moods into separate values 
splitCodes <- strsplit(df2$Mood, ",") 
# Add the Region as the name of each element in the new list 
names(splitCodes) <- df2$Region 

# Recode the values by matching the north values to the south values; 
# entries from the south are already in SouthCode and are left unchanged 
recoded <- 
    lapply(
        seq_along(splitCodes), 
        function(x) { 
            if (names(splitCodes)[x] == "north") { 
                df1$SouthCode[match(splitCodes[[x]], df1$NorthCode)] 
            } else { 
                splitCodes[[x]] 
            } 
        } 
    ) 

# Add the recoded values back to df2 
df2$recoded <- sapply(recoded, paste, collapse = ",") 

# Check if the recoded values match your desired values  
identical(df2$recoded, df3$Mood) 
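
Since the question asks for a general solution, the same idea can be wrapped in a function. The following is only a minimal sketch, not part of the original answer: recode_moods, its arguments, and the small codes_demo and demo data frames are hypothetical names introduced for illustration. It assumes that a code missing from the lookup table should be left unchanged rather than turned into NA, which may or may not be what you want.

```r
# Hypothetical helper (an assumption, not from the original answer):
# recode comma-separated codes through a named lookup vector.
recode_moods <- function(moods, regions, from, to, region_to_recode = "north") {
  # Named character vector: names are the old codes, values the new codes
  lookup <- setNames(as.character(to), as.character(from))
  # as.character() guards against Mood having been read in as a factor
  parts <- strsplit(as.character(moods), ",", fixed = TRUE)
  vapply(seq_along(parts), function(i) {
    codes <- trimws(parts[[i]])
    if (regions[i] == region_to_recode) {
      mapped <- unname(lookup[codes])
      # Assumption: codes with no match in the lookup are kept as they are
      codes <- ifelse(is.na(mapped), codes, mapped)
    }
    paste(codes, collapse = ",")
  }, character(1))
}

# Small illustrative data; "12" is deliberately not in the code table
codes_demo <- data.frame(NorthCode = c(4, 5, 6, 7, 99),
                         SouthCode = c(7, 8, 9, 5, 99))
demo <- data.frame(Mood = c("4", "5,6,99", "8,5,99", "4,12"),
                   Region = c("north", "north", "south", "north"),
                   stringsAsFactors = FALSE)
demo$recoded <- recode_moods(demo$Mood, demo$Region,
                             codes_demo$NorthCode, codes_demo$SouthCode)
```

Using vapply() with character(1) rather than sapply() means the result is always a character vector with one element per row, even if the input is empty.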