Counting multiple values in a groupby object

I would like to count multiple values (contained in a list per cell) on a groupby object. I have the following dataframe:

| | Record the respondent’s sex | 7. What do you use the phone for? |
|---|-----------------------------|---------------------------------------------|
| 0 | Male | sending texts;calls;receiving sending texts |
| 1 | Female | sending texts;calls;WhatsApp;Facebook |
| 2 | Male | sending texts;calls;receiving texts |
| 3 | Female | sending texts;calls |

After grouping on Record the respondent’s sex, I would like to count all the values in the column 7. What do you use the phone for?:

| 7. What do you use the phone for? | Record the respondent’s sex | count |
|-----------------------------------|-----------------------------|-------|
| sending texts | Male | 2 |
| calls | Male | 2 |
| receiving texts | Male | 2 |
| sending texts | Female | 2 |
| calls | Female | 2 |
| WhatsApp | Female | 1 |
| Facebook | Female | 1 |

When there is only one value per cell, this works fine, and I can get the result with:

import pandas as pd

# Group by respondent sex, then count each answer within each group.
grouped = df.groupby(['Record the respondent’s sex'], sort=True)

question_counts = grouped['2. Are you a teacher, caregiver, or young adult ?'].value_counts(normalize=False, sort=True)

# Turn the (group, question) -> count mapping into one record per pair.
question_data = [
    {'2. Are you a teacher, caregiver, or young adult ?': question,
     'Record the respondent’s sex': group,
     'count': count * 100}
    for (group, question), count in dict(question_counts).items()]

df_question = pd.DataFrame(question_data)

This gives me a table that looks exactly like the one above!

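However, value_counts() does not work on lists with multiple values. It throws the error TypeError: unhashable type: 'list'. The question "Counting occurrence of values in a Panda series?" shows several ways to handle this, but I can't get any of them to work on a GroupBy object.

Source

2017-09-10 John Boss

Exploding/duplicating the multiple values in the column certainly seems like the easiest and fastest way to go about this (see https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows), although the accepted answer below shows that it can be done without doing so.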

import pandas as pd

# Initialize sample data.
df = pd.DataFrame({'Record the respondent’s sex': ['Male', 'Female'] * 2,
                   '7. What do you use the phone for?': [
                       "sending texts;calls;receiving sending texts",
                       "sending texts;calls;WhatsApp;Facebook",
                       "sending texts;calls;receiving texts",
                       "sending texts;calls"
                   ]})

# Split the values on ';' and separate into columns. Melt the result.
df2 = pd.melt(
    pd.concat([df['Record the respondent’s sex'],
               df.loc[:, "7. What do you use the phone for?"].apply(
                   lambda cell: cell.split(';')).apply(pd.Series)], axis=1),
    id_vars='Record the respondent’s sex')[['Record the respondent’s sex', 'value']]

# Group on gender and rename columns. Passing name='count' avoids a
# column-name clash with the 'value' index level on pandas versions that
# name the counts after the counted column.
result = df2.groupby('Record the respondent’s sex')['value'].value_counts().reset_index(name='count')
result.columns = ['Record the respondent’s sex', '7. What do you use the phone for?', 'count']

# Reorder columns.
>>> result[['7. What do you use the phone for?', 'Record the respondent’s sex', 'count']]
   7. What do you use the phone for?  Record the respondent’s sex  count
0                              calls                       Female      2
1                      sending texts                       Female      2
2                           Facebook                       Female      1
3                           WhatsApp                       Female      1
4                              calls                         Male      2
5                      sending texts                         Male      2
6            receiving sending texts                         Male      1
7                    receiving texts                         Male      1

Source

2017-09-10 17:30:01 Alexander

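I hadn't thought of pd.melt(), but it seems to do the trick nicely. Thanks!

For future readers: the route of creating an extra row for each of the multiple values (which is how MaxU identified this as a duplicate) is easy, and probably faster.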
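The extra-row route mentioned in the comments could look roughly like the sketch below. It assumes a pandas version that provides DataFrame.explode (0.25 or later), which did not exist when this thread was written; the variable names `sex_col`, `use_col` and `exploded` are only for illustration. The sample data is the same as in the answer above.

```python
import pandas as pd

sex_col = "Record the respondent’s sex"
use_col = "7. What do you use the phone for?"

df = pd.DataFrame({
    sex_col: ["Male", "Female"] * 2,
    use_col: [
        "sending texts;calls;receiving sending texts",
        "sending texts;calls;WhatsApp;Facebook",
        "sending texts;calls;receiving texts",
        "sending texts;calls",
    ],
})

# Split each cell on ';' into a list, then give every list element its own row.
exploded = df.copy()
exploded[use_col] = exploded[use_col].str.split(";")
exploded = exploded.explode(use_col)

# Count how often each answer occurs within each group.
result = (
    exploded.groupby([sex_col, use_col])
    .size()
    .reset_index(name="count")
)
print(result)
```

Using groupby(...).size() on both columns is one way to get the counts; value_counts() on the grouped column, as in the answer, should work equally well once every row holds a single value.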
