With
from dfply import *
D >>\
groupby(X.vector, X.gp) >>\
summarize(b=X.sq.sum())
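Given that result, I think you first need to convert the vector column to tuples in pandas, because NumPy arrays are not hashable and so cannot serve as group keys: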
print(D['sq'].groupby([D['vector'].apply(tuple), D['gp']]).sum().reset_index())
                                     vector  gp   sq
0            (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)   0    0
1          (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)   1    1
2        (4, 5, 6, 7, 8, 9, 10, 11, 12, 13)   0   20
3      (6, 7, 8, 9, 10, 11, 12, 13, 14, 15)   1   34
4    (8, 9, 10, 11, 12, 13, 14, 15, 16, 17)   0  100
5  (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)   1  130
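The tuple-key approach above can be sketched as a small self-contained example. The frame `D` below is hypothetical sample data (the original `D` is not shown here), and the names `keys`, `array_is_hashable` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every cell of `vector` holds a NumPy array.
D = pd.DataFrame({
    "vector": [np.arange(3), np.arange(3), np.arange(3, 6)],
    "gp": [0, 0, 1],
    "sq": [1, 4, 9],
})

# Group keys must be hashable; an ndarray is not, a tuple of numbers is.
try:
    hash(D["vector"].iat[0])
    array_is_hashable = True
except TypeError:
    array_is_hashable = False

# Build a Series of tuples and use it as a grouping key next to `gp`.
keys = D["vector"].apply(tuple)
result = D["sq"].groupby([keys, D["gp"]]).sum().reset_index()
print(result)
```

Because the key Series is built separately, `D` itself is left unchanged and its `vector` column still holds arrays.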
Another solution is to convert the column first:
D['vector'] = D['vector'].apply(tuple)
print(D.groupby(['vector','gp'])['sq'].sum().reset_index())
                                     vector  gp   sq
0            (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)   0    0
1          (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)   1    1
2        (4, 5, 6, 7, 8, 9, 10, 11, 12, 13)   0   20
3      (6, 7, 8, 9, 10, 11, 12, 13, 14, 15)   1   34
4    (8, 9, 10, 11, 12, 13, 14, 15, 16, 17)   0  100
5  (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)   1  130
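And if necessary, convert back to an array at the end:
import numpy as np
D['vector'] = D['vector'].apply(tuple)  # tuples are hashable, so they can be group keys
df = D.groupby(['vector','gp'])['sq'].sum().reset_index()
df['vector'] = df['vector'].apply(np.array)  # turn each tuple back into an ndarray
print(df)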
                                     vector  gp   sq
0            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   0    0
1          [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]   1    1
2        [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]   0   20
3      [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]   1   34
4    [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]   0  100
5  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]   1  130
print(type(df['vector'].iat[0]))
<class 'numpy.ndarray'>
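The convert, group, convert-back steps can also be wrapped in one helper that works on a copy, so the caller's frame keeps its arrays. This is only a sketch: the function name `sum_by_array_key`, its parameters, and the sample frame `D` are hypothetical and not part of the original code.

```python
import numpy as np
import pandas as pd


def sum_by_array_key(df, array_col, other_keys, value_col):
    """Sum `value_col` grouped by an array-valued column plus other keys."""
    tmp = df.copy()                                # leave the caller's frame untouched
    tmp[array_col] = tmp[array_col].apply(tuple)   # arrays -> hashable tuples
    out = (
        tmp.groupby([array_col] + list(other_keys))[value_col]
        .sum()
        .reset_index()
    )
    out[array_col] = out[array_col].apply(np.array)  # tuples -> arrays again
    return out


# Hypothetical sample data with array-valued cells.
D = pd.DataFrame({
    "vector": [np.arange(3), np.arange(3), np.arange(3, 6)],
    "gp": [0, 0, 1],
    "sq": [1, 4, 9],
})

out = sum_by_array_key(D, "vector", ["gp"], "sq")
print(out)
print(type(out["vector"].iat[0]))
```

Working on a copy is a design choice: the snippets above overwrite `D['vector']` in place, which is fine for a one-off script but can surprise later code that still expects arrays.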
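I tried your code and it works for me:
import numpy as np
from dfply import *
D['vector'] = D['vector'].apply(tuple)  # make the group key hashable first
a = D >> groupby(X.vector, X.gp) >> summarize(b=X.sq.sum())
a['vector'] = a['vector'].apply(np.array)  # convert back to arrays afterwards
print(a)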
   gp                                    vector    b
0   0            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]    0
1   1          [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]    1
2   0        [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]   20
3   1      [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]   34
4   0    [8, 9, 10, 11, 12, 13, 14, 15, 16, 17]  100
5   1  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]  130
to get it. – piRSquared