Pandas rolling sum on string column

I'm using Python 3 with pandas version '0.19.2'.

I have a pandas df as follows:

chat_id  line
1        'Hi.'
1        'Hi, how are you?.'
1        'I'm well, thanks.'
2        'Is it going to rain?.'
2        'No, I don't think so.'

I want to group by 'chat_id', then do something like a rolling sum on 'line' to get the following:

chat_id  line                     conversation
1        'Hi.'                    'Hi.'
1        'Hi, how are you?.'      'Hi. Hi, how are you?.'
1        'I'm well, thanks.'      'Hi. Hi, how are you?. I'm well, thanks.'
2        'Is it going to rain?.'  'Is it going to rain?.'
2        'No, I don't think so.'  'Is it going to rain?. No, I don't think so.'

I believe df.groupby('chat_id')['line'].cumsum() would only work on a numeric column.

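I have also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in the complete conversation, but then I can't figure out how to unpack that list to create the 'rolling sum' style conversation column.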

asked 2017-04-23 by user3591836

Interesting. 'cumsum' works when called on a Series, but raises an error when called on a groupby object. – ayhan

For me, apply with Series.cumsum works; if you need a separator, add a space:

df['new'] = df.groupby('chat_id')['line'].apply(lambda x: (x + ' ').cumsum().str.strip())
print(df)
   chat_id                   line                                          new
0        1                    Hi.                                          Hi.
1        1      Hi, how are you?.                        Hi. Hi, how are you?.
2        1      I'm well, thanks.      Hi. Hi, how are you?. I'm well, thanks.
3        2  Is it going to rain?.                        Is it going to rain?.
4        2  No, I don't think so.  Is it going to rain?. No, I don't think so.

df['line'] = df['line'].str.strip("'")
df['new'] = df.groupby('chat_id')['line'].apply(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print(df)
   chat_id                   line  \
0        1                    Hi.
1        1      Hi, how are you?.
2        1      I'm well, thanks.
3        2  Is it going to rain?.
4        2  No, I don't think so.

                                             new
0                                          'Hi.'
1                        'Hi. Hi, how are you?.'
2      'Hi. Hi, how are you?. I'm well, thanks.'
3                        'Is it going to rain?.'
4  'Is it going to rain?. No, I don't think so.'
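
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. The sample DataFrame is rebuilt from the question. The column name `conversation` and the use of `transform` in place of `apply` are my assumptions, not part of the answer above. `transform` is swapped in because on newer pandas releases `groupby(...).apply` may add the group key to the result's index, which would get in the way of assigning the result back to `df`.

```python
import pandas as pd

# Sample data rebuilt from the question (quotes around each line dropped).
df = pd.DataFrame({
    'chat_id': [1, 1, 1, 2, 2],
    'line': [
        'Hi.',
        'Hi, how are you?.',
        "I'm well, thanks.",
        'Is it going to rain?.',
        "No, I don't think so.",
    ],
})

# Within each chat: append a space to every line, take the cumulative "sum"
# (which concatenates strings), then strip the trailing space.
# transform returns a result indexed like df, so it can be assigned directly.
df['conversation'] = (
    df.groupby('chat_id')['line']
      .transform(lambda s: (s + ' ').cumsum().str.strip())
)

print(df['conversation'].tolist())
```

Each row of `conversation` then holds every line of that chat up to and including the current one, separated by single spaces.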

answered 2017-04-23 08:54:38 by jezrael

The result is ValueError: cannot reindex from a duplicate axis – user3591836

What is your pandas version? 'print(pd.show_versions())'. I cannot simulate your error. I tested with duplicated values and with duplicates in the index, and it works perfectly in version 0.19.2. – jezrael

Sorry, you are right. I had to run reset_index() on the df. – user3591836
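
The reset_index() fix from the last comment might look like the sketch below. The duplicated index values are made up for illustration, and `drop=True`, `transform` and the `conversation` column name are assumptions on my part; the comment only says that reset_index() had to be run on the df.

```python
import pandas as pd

# Hypothetical frame whose index contains duplicate labels
# (for example, after concatenating two frames).
df = pd.DataFrame(
    {
        'chat_id': [1, 1, 2, 2],
        'line': [
            'Hi.',
            'Hi, how are you?.',
            'Is it going to rain?.',
            "No, I don't think so.",
        ],
    },
    index=[0, 1, 0, 1],
)

# Replace the duplicated index with a fresh 0..n-1 range.
# drop=True discards the old index instead of keeping it as a column.
df = df.reset_index(drop=True)

# Same cumulative concatenation per chat as in the answer.
df['conversation'] = (
    df.groupby('chat_id')['line']
      .transform(lambda s: (s + ' ').cumsum().str.strip())
)
```

With a unique index in place, assigning the grouped result back to the frame no longer has to align against repeated labels.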
