Exploding a table into rows in pandas

What is the best-performing way in Python to explode a table or matrix into rows, where each row carries a column name and its value?

Let's say we have loaded the following table into pandas:

Date A B 
t1 1 2 
t2 3 4
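If you want to reproduce the table above, one way (a sketch, assuming the table is pasted as whitespace-separated text) is to read it with `pd.read_csv` and a regex separator:

```python
import io

import pandas as pd

# The table as pasted in the question (whitespace-separated text).
text = """Date A B
t1 1 2
t2 3 4
"""

# sep=r"\s+" splits on any run of whitespace, so header and rows line up.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df)
```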

I would like to explode the table as follows, so that it becomes a Series of four lines:

t1-A-1 
t1-B-2 
t2-A-3 
t2-B-4

Performance matters here, because the original table has dozens of columns and hundreds of rows.

And what about the following table:

Date A B C 
t1 1 5 9 
t1 2 6 10 
t2 3 7 11 
t2 4 8 12

The output series would then look like this:

Date code 
t1 "str1"1"str2"B"str2"5 
t1 "str1"2"str2"B"str2"6 
t2 "str1"3"str2"B"str2"7 
t2 "str1"4"str2"B"str2"8 
.. .. 
t2 "str1"4"str2"C"str2"12
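For this second case, the sample output suggests that both Date and A act as row keys, while only B and C are exploded. Assuming that reading, and assuming "str1"/"str2" are literal separators (replace them with whatever you actually need), here is a sketch that uses `DataFrame.melt` instead of `stack`. This is a swapped-in technique, not one taken from the answers below:

```python
import pandas as pd

# The second example table from the question.
df = pd.DataFrame({
    "Date": ["t1", "t1", "t2", "t2"],
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [9, 10, 11, 12],
})

# Assumption: "str1" and "str2" in the sample are literal separator strings.
STR1, STR2 = '"str1"', '"str2"'

# melt keeps Date and A as identifiers and turns B and C into (col, val) rows,
# ordered column by column: all B rows first, then all C rows.
long = df.melt(id_vars=["Date", "A"], var_name="col", value_name="val")

# Vectorized string concatenation; no per-row Python function is called.
code = STR1 + long["A"].astype(str) + STR2 + long["col"] + STR2 + long["val"].astype(str)
result = pd.Series(code.to_numpy(), index=long["Date"], name="code")
print(result)
```

`melt` already emits the B rows before the C rows, which matches the order shown in the question without an extra sort.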

Thanks for your help!

Source

2017-06-22 Guga

What happens to B5 and C3? – Allen

df.set_index('Date').stack().reset_index().apply(lambda x: '-'.join(x.astype(str)), axis=1)

Output:

0 t1-A-1 
1 t1-B-2 
2 t2-A-3 
3 t2-B-4 
dtype: object

Source

2017-06-22 20:49:14

That's great! Thanks. – Guga
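To see what each step of the stack/apply one-liner above does, here is a sketch that splits it up, assuming the 2x2 table from the question. The last step replaces the row-wise `apply(axis=1)` with vectorized string concatenation; that substitution is mine, not part of the answer, and is usually faster because it avoids calling a Python lambda once per row:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["t1", "t2"], "A": [1, 3], "B": [2, 4]})

# Step 1: move Date into the index so only the value columns get stacked.
wide = df.set_index("Date")
# Step 2: stack() turns every (Date, column) cell into its own entry of a Series
# with a two-level index.
long = wide.stack()
# Step 3: reset_index() flattens that into three columns: date, column name, value.
flat = long.reset_index()
flat.columns = ["Date", "col", "val"]
# Step 4: concatenate whole columns at once instead of joining row by row.
result = flat["Date"] + "-" + flat["col"] + "-" + flat["val"].astype(str)
print(result.tolist())  # ['t1-A-1', 't1-B-2', 't2-A-3', 't2-B-4']
```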

If performance is key, use numpy:

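import numpy as np 
import pandas as pd 
from functools import reduce 
# numpy.char.add: element-wise string concatenation 
# (the numpy.core.defchararray import path is deprecated in NumPy 2) 
from numpy.char import add as cadd 

def proc(d1): 
    v = d1.to_numpy()                                       # (n, m) array of values 
    n, m = v.shape 
    dates = np.repeat(d1.index.to_numpy().astype(str), m)   # each date once per column 
    cols = np.tile(d1.columns.to_numpy().astype(str), n)    # column names cycled once per row 
    vals = v.ravel().astype(str)                            # values flattened row by row 
    return pd.Series(reduce(cadd, [dates, '-', cols, '-', vals])) 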

proc(df.set_index('Date')) 

0 t1-A-1 
1 t1-B-2 
2 t2-A-3 
3 t2-B-4 
dtype: object
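The trick in `proc` is that `np.repeat`, `np.tile` and `ravel` produce three arrays of length n*m whose positions line up cell by cell. A small sketch on plain numpy arrays (the names `values`, `index` and `columns` are just illustrative stand-ins for the DataFrame parts) makes the alignment visible:

```python
from functools import reduce

import numpy as np

values = np.array([[1, 2], [3, 4]])   # stand-in for d1.to_numpy()
index = np.array(["t1", "t2"])        # stand-in for the Date index
columns = np.array(["A", "B"])        # stand-in for the column labels
n, m = values.shape

dates = np.repeat(index, m)    # each row label once per column: t1 t1 t2 t2
cols = np.tile(columns, n)     # column labels cycled once per row: A B A B
vals = values.ravel()          # row-major (C order) flattening: 1 2 3 4

# Position k of all three arrays describes the same cell, so element-wise
# concatenation builds the labels without any Python-level loop.
labels = reduce(np.char.add, [dates, "-", cols, "-", vals.astype(str)])
print(labels.tolist())  # ['t1-A-1', 't1-B-2', 't2-A-3', 't2-B-4']
```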

Timings

%timeit proc(df.set_index('Date')) 
%timeit df.set_index('Date').stack().reset_index().apply(lambda x: '-'.join(x.astype(str)), axis=1)

Small data:

1000 loops, best of 3: 494 µs per loop 
100 loops, best of 3: 2.17 ms per loop

Large data:

import numpy as np 
import pandas as pd 
from string import ascii_letters 

# 1000 dates x 52 single-letter columns of random digits 0-9 
np.random.seed([3,1415]) 
df = pd.DataFrame(
    np.random.randint(10, size=(1000, 52)), 
    pd.Index(['t{:05d}'.format(i) for i in range(1000)], name='Date'), 
    list(ascii_letters) 
).reset_index() 

10 loops, best of 3: 156 ms per loop 
1 loop, best of 3: 3.75 s per loop
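`%timeit` is an IPython magic. Outside IPython, a comparable check can be sketched with the standard `timeit` module. The function names below are hypothetical wrappers around the two answers' code, and `ROWS` is reduced from 1000 to 200 (an assumption, to keep a quick run short); no timings are claimed here, so run it on your own machine:

```python
import timeit
from functools import reduce
from string import ascii_letters

import numpy as np
import pandas as pd


def explode_apply(df):
    # Row-wise version: stack, flatten, then join each row with apply(axis=1).
    return df.set_index("Date").stack().reset_index().apply(
        lambda x: "-".join(x.astype(str)), axis=1
    )


def explode_numpy(df):
    # Vectorized version: aligned label arrays concatenated with np.char.add.
    d1 = df.set_index("Date")
    v = d1.to_numpy()
    n, m = v.shape
    dates = np.repeat(d1.index.to_numpy().astype(str), m)
    cols = np.tile(d1.columns.to_numpy().astype(str), n)
    vals = v.ravel().astype(str)
    return pd.Series(reduce(np.char.add, [dates, "-", cols, "-", vals]))


# ROWS is an assumption for a quick run; the answer's "large data" used 1000 rows.
ROWS = 200
rng = np.random.RandomState([3, 1415])
big = pd.DataFrame(
    rng.randint(10, size=(ROWS, 52)),
    pd.Index(["t{:05d}".format(i) for i in range(ROWS)], name="Date"),
    list(ascii_letters),
).reset_index()

# Sanity check: both approaches should give identical strings in the same order.
same = explode_numpy(big).tolist() == explode_apply(big).tolist()
print("same output:", same)

for name, fn in [("numpy", explode_numpy), ("apply", explode_apply)]:
    best = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f"{name}: {best * 1000:.1f} ms per call")
```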

Source

2017-06-22 21:06:40 piRSquared
