Frequency tables with several variables in R

I am trying to replicate a table that is often used in official statistics, but without success so far. Consider a data frame like this:

d1 <- data.frame(StudentID = c("x1", "x10", "x2", 
          "x3", "x4", "x5", "x6", "x7", "x8", "x9"), 
      StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'), 
      ExamenYear = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'), 
      Exam   = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'), 
      participated = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'), 
      passed  = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'), 
      stringsAsFactors = FALSE)

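I would like to create a table showing, PER YEAR, the number of all students (All), of those who are female, of those who participated, and of those who passed. Note that "ofwhich" below refers to all students.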

The table I have in mind would look like this:

cbind(All = table(d1$ExamenYear), 
    participated  = table(d1$ExamenYear, d1$participated)[,2], 
    ofwhichFemale  = table(d1$ExamenYear, d1$StudentGender)[,1], 
    ofwhichpassed  = table(d1$ExamenYear, d1$passed)[,2])

I am sure there is a better way to do this kind of thing in R.

Note: I have seen LaTeX solutions, but they will not work for me because I need to export the table to Excel.

Thanks in advance.

Source

2012-08-07 user1043144

Using plyr:

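library(plyr)
# Split d1 by exam year and compute one summary row per year
ddply(d1, .(ExamenYear), summarize,
      All = length(ExamenYear),                   # students per year
      participated = sum(participated == "yes"),  # of which participated
      ofwhichFemale = sum(StudentGender == "F"),  # of which female
      ofWhichPassed = sum(passed == "yes"))       # of which passed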

Gives:

  ExamenYear All participated ofwhichFemale ofWhichPassed
1       2007   3            2             2             2
2       2008   4            3             2             3
3       2009   3            3             0             2

Source

2012-08-07 19:13:18 Andy

Thank you very much. I am definitely going to learn plyr. – user1043144

Nice answer, but one minute later than @csgillespie. –

@ジャバー, I think you mean one minute *earlier*. Your comment does not need the "but". – A5C1D2H2I1M1N2O1R2T1

The plyr package is great for this sort of thing. First, load the package:

library(plyr)

Then use the ddply function:

ddply(d1, "ExamenYear", summarise,
      All = length(passed),  # any column works for this count
      participated = sum(participated == "yes"),
      ofwhichFemale = sum(StudentGender == "F"),
      ofwhichpassed = sum(passed == "yes"))

Basically, ddply expects a data frame as input and returns a data frame. It splits the input data frame by ExamenYear and computes a few summary statistics on each sub-table. Note that inside ddply you do not need the $ notation to refer to columns.
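To make the split / summarise / recombine steps described above concrete, here is a rough base R sketch of the same idea. It is not how plyr is implemented internally; the helper name summarise_year and the result name by_year are made up for this illustration.

```r
# Same data frame as in the question
d1 <- data.frame(
  StudentID = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear = c("2007", "2007", "2007", "2008", "2008", "2008", "2008",
                 "2009", "2009", "2009"),
  Exam = c("algebra", "stats", "bio", "algebra", "algebra", "stats", "stats",
           "algebra", "bio", "bio"),
  participated = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes", "yes"),
  passed = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE)

# Split: one sub-data-frame per exam year
pieces <- split(d1, d1$ExamenYear)

# Apply: summarise a single piece (summarise_year is a hypothetical helper)
summarise_year <- function(piece) {
  data.frame(ExamenYear = piece$ExamenYear[1],
             All = nrow(piece),
             participated = sum(piece$participated == "yes"),
             ofwhichFemale = sum(piece$StudentGender == "F"),
             ofwhichpassed = sum(piece$passed == "yes"))
}

# Combine: stack the one-row results into a single data frame
by_year <- do.call(rbind, lapply(pieces, summarise_year))
rownames(by_year) <- NULL
by_year
```

Inside summarise_year the columns have to be reached with piece$..., which is exactly the bookkeeping that ddply takes care of.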

Source

2012-08-07 19:14:21 csgillespie

Thank you. You both made my day. – user1043144

A couple of modifications could have been made to your code to make it easier to read: using with() to reduce the number of df$ calls, and using character indices to make it more self-documenting. With those changes it would be a worthy competitor to the ddply solution:

with(d1, cbind(All = table(ExamenYear),
               participated = table(ExamenYear, participated)[, "yes"],
               ofwhichFemale = table(ExamenYear, StudentGender)[, "F"],
               ofwhichpassed = table(ExamenYear, passed)[, "yes"]))

     All participated ofwhichFemale ofwhichpassed
2007   3            2             2             2
2008   4            3             2             3
2009   3            3             0             2

I would expect this to be much faster than the ddply solution, although that will only be apparent if you are working with larger datasets.
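Since the question mentions exporting the table to Excel, here is one possible way to finish the job with base R only. It assumes a CSV file that Excel can open is acceptable; the names tab, out and out_file, and the temporary file path, are placeholders chosen for this sketch.

```r
# Same data frame as in the question
d1 <- data.frame(
  StudentID = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear = c("2007", "2007", "2007", "2008", "2008", "2008", "2008",
                 "2009", "2009", "2009"),
  Exam = c("algebra", "stats", "bio", "algebra", "algebra", "stats", "stats",
           "algebra", "bio", "bio"),
  participated = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes", "yes"),
  passed = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE)

# Frequency table as a matrix, with the years as row names
tab <- with(d1, cbind(All = table(ExamenYear),
                      participated = table(ExamenYear, participated)[, "yes"],
                      ofwhichFemale = table(ExamenYear, StudentGender)[, "F"],
                      ofwhichpassed = table(ExamenYear, passed)[, "yes"]))

# Move the year from the row names into an ordinary column
out <- data.frame(ExamenYear = rownames(tab), tab)
rownames(out) <- NULL

# Write a CSV file; out_file is a placeholder path in the session's temp dir
out_file <- file.path(tempdir(), "exam_summary.csv")
write.csv(out, out_file, row.names = FALSE)
```

Writing a native .xlsx file instead would need an add-on package, which is not shown here.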

Source

2012-08-07 19:28:11

You may also want to take a look at plyr's next iteration: dplyr.

It uses a ggplot-like syntax and provides fast performance by writing key pieces in C++.

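library(dplyr)

# %>% replaces the %.% chaining operator of early dplyr releases
d1 %>%
  group_by(ExamenYear) %>%
  summarise(ALL = n(),  # n() counts the rows in each group
            participated = sum(participated == "yes"),
            ofwhichFemale = sum(StudentGender == "F"),
            ofWhichPassed = sum(passed == "yes"))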

Source

2014-01-26 07:24:42
