2017-05-16 9 views
4

私は複数のインデックス付きパンダデータフレームを持っています。ここでは、 '||' 3つの新しい列 'Connection'、 'Val1' 'Val2'を含むインデックスのレイヤーを追加します。パンダ複数のインデックスデータフレームを分割する

誰かがこれを行うヒントを教えてくれないと助かります。

現在の例のデータフレーム:

       Experiment1      Experiment2  
    Target  Analyze1_ab    Analyze2_zz   Analyze1_yy   
    XXX_1 Edge2||3.1E-07||-0.5 Edge2||2.1E-06||-0.9  Edge2||6.4E-02||-0.3 
    XXX_4 Edge1||6.4E-12||1.1  Edge1||2.4E-11||9.4  Edge1||1.4E-11||1.4  
    ABC_1 Edge1||3.9E-07||0.7  Edge1||2.9E-07||5.6  Edge1||6.8E-02||0.4  
    ABC_2 Edge2||1.1E-09||-0.5 Edge2||1.2E-09||1.2  Edge2||1.0E-03||-0.5 
    ABC_3 Edge2||4.6E-25||-0.8 Edge2||2.6E-10||1.9  Edge2||5.0E-17||-0.9 
    XXX_2 Edge2||1.7E-07||-0.5 Edge2||5.7E-08||-0.3  Edge2||4.1E-02||-0.3 
    ABC_4 Edge1||8.1E-02||0.5  Edge1||9.1E-02||1.5  Edge1||5.4E-02||0.6  
    ABC_5 Edge1||6.7E-02||0.3  Edge1||4.2E-02||1.9  Edge1||5.6E-03||0.4  
    XXX_3 Edge2||3.1E-03||-0.4 Edge1||2.4E-11||1.1  Edge2||2.4E-02||-0.3 

所望のデータフレーム:

        Experiment1       Experiment2  
    Target   Analyze1_ab     Analyze2_zz    Analyze1_yy  
      Connection Val1 Val2 Connection Val1 Val2 Connection Val1 Val2 
    XXX_1 Edge2 3.10E-07 -0.5 Edge2 2.10E-06 -0.9 Edge2 6.40E-02 -0.3 
    XXX_4 Edge1 6.40E-12 1.1  Edge1 2.40E-11 9.4  Edge1 1.40E-11 1.4 
    ABC_1 Edge1 3.90E-07 0.7  Edge1 2.90E-07 5.6  Edge1 6.80E-02 0.4 
    ABC_2 Edge2 1.10E-09 -0.5 Edge2 1.20E-09 1.2  Edge2 1.00E-03 -0.5 
    ABC_3 Edge2 4.60E-25 -0.8 Edge2 2.60E-10 1.9  Edge2 5.00E-17 -0.9 
    XXX_2 Edge2 1.70E-07 -0.5 Edge2 5.70E-08 -0.3 Edge2 4.10E-02 -0.3 
    ABC_4 Edge1 8.10E-02 0.5  Edge1 9.10E-02 1.5  Edge1 5.40E-02 0.6 
    ABC_5 Edge1 6.70E-02 0.3  Edge1 4.20E-02 1.9  Edge1 5.60E-03 0.4 
    XXX_3 Edge2 3.10E-03 -0.4 Edge1 2.40E-11 1.1  Edge2 2.40E-02 -0.3 
+0

「ターゲット」とは何ですか?それはインデックスか列ですか? – MaxU

+0

こんにちはマックス、私はターゲットとして列にしたい。 – Rtut

答えて

2
import pandas as pd 

# Initialize DataFrame 
# ----------------------------------------------------------------------------- 

df = pd.DataFrame({ 
    'Analyze1_ab': ['Edge2||3.1E-07||-0.5', 'Edge1||6.4E-12||1.1'], 
    'Analyze2_zz': ['Edge2||2.1E-06||-0.9', 'Edge1||2.4E-11||9.4'], 
    'Analyze1_yy': ['Edge2||6.4E-02||-0.3', 'Edge1||1.4E-11||1.4'], 
    'Target': ['XXX_1', 'XXX_4'],}) 

df.columns = pd.MultiIndex.from_tuples(
    [('Experiment1', 'Analyze1_ab'), 
    ('Experiment2', 'Analyze1_yy'), 
    ('Experiment1', 'Analyze2_zz'), 
    ('Target', '')]) 

# Split 'Analyses' columns by double pipes || 
# ----------------------------------------------------------------------------- 

# Initialize final DataFrame 
final_df = pd.DataFrame() 

for col_name in df.columns: 
    if (col_name[1].startswith('Analyze') and 
      df[col_name].str.contains('||').all()): 
     # Split 'Analysis' by || into new columns 
     splitted_analysis = df[col_name].str.split('\|\|', expand=True) 
     # The new column names are 0, 1, 2. Let's rename them. 
     splitted_analysis.columns = ['Connection', 'Val1', 'Val2'] 
     # Recreate MultiIndex 
     splitted_analysis.columns = pd.MultiIndex.from_tuples(
      [(col_name[0], col_name[1], c) for c in splitted_analysis.columns]) 
     # Concatenate the new columns to the final_df 
     final_df = pd.concat(objs=[final_df, splitted_analysis], axis=1) 

# Add 'Target' column in the final_df. 
# First, extract it. 
target_col = pd.DataFrame(df[('Target', '')]) 
# Then, increase MultiIndex level of 'Target' from 2 to 3, 
# to allow smooth concatenation with the final_df. 
target_col.columns = pd.MultiIndex.from_tuples([('Target', '', '')]) 
final_df = pd.concat([final_df, target_col], axis=1) 

検証:print(final_df)

Experiment1    Experiment2    Experiment1    Target 
    Analyze1_ab    Analyze1_yy    Analyze2_zz      
    Connection  Val1 Val2 Connection  Val1 Val2 Connection  Val1 Val2  
0  Edge2 3.1E-07 -0.5  Edge2 6.4E-02 -0.3  Edge2 2.1E-06 -0.9 XXX_1 
1  Edge1 6.4E-12 1.1  Edge1 1.4E-11 1.4  Edge1 2.4E-11 9.4 XXX_4 

検証:pprint.pprint([c for c in final_df.columns])

[('Experiment1', 'Analyze1_ab', 'Connection'), 
('Experiment1', 'Analyze1_ab', 'Val1'), 
('Experiment1', 'Analyze1_ab', 'Val2'), 
('Experiment2', 'Analyze1_yy', 'Connection'), 
('Experiment2', 'Analyze1_yy', 'Val1'), 
('Experiment2', 'Analyze1_yy', 'Val2'), 
('Experiment1', 'Analyze2_zz', 'Connection'), 
('Experiment1', 'Analyze2_zz', 'Val1'), 
('Experiment1', 'Analyze2_zz', 'Val2'), 
('Target', '', '')] 
4

セットアップ

df 
Out[2319]: 
     ID   Experiment1         Experiment2 
    Target   Analyze1_ab   Analyze2_ab   Analyze1_yy 
0 XXX_1 Edge2||3.1E-07||-0.5 Edge2||2.1E-06||-0.9 Edge2||6.4E-02||-0.3 
1 XXX_4 Edge1||6.4E-12||1.1 Edge1||2.4E-11||9.4 Edge1||1.4E-11||1.4 
2 ABC_1 Edge1||3.9E-07||0.7 Edge1||2.9E-07||5.6 Edge1||6.8E-02||0.4 
3 ABC_2 Edge2||1.1E-09||-0.5 Edge2||1.2E-09||1.2 Edge2||1.0E-03||-0.5 
4 ABC_3 Edge2||4.6E-25||-0.8 Edge2||2.6E-10||1.9 Edge2||5.0E-17||-0.9 
5 XXX_2 Edge2||1.7E-07||-0.5 Edge2||5.7E-08||-0.3 Edge2||4.1E-02||-0.3 
6 ABC_4 Edge1||8.1E-02||0.5 Edge1||9.1E-02||1.5 Edge1||5.4E-02||0.6 
7 ABC_5 Edge1||6.7E-02||0.3 Edge1||4.2E-02||1.9 Edge1||5.6E-03||0.4 
8 XXX_3 Edge2||3.1E-03||-0.4 Edge1||2.4E-11||1.1 Edge2||2.4E-02||-0.3 

ソリューション

#split columns by '||' and rebuild a Dataframe with the separated columns 
df2 = pd.DataFrame(np.asarray(df.iloc[:,1:].apply(lambda x: x.str.split('\|\|')).values.tolist()).reshape(9,-1)) 
#set Multilevel columns 
df2.columns=pd.MultiIndex.from_tuples([('Experiment1','Analyze1_ab','Connection'), 
('Experiment1','Analyze1_ab','Val1'), 
('Experiment1','Analyze1_ab','Val2'), 
('Experiment1','Analyze2_zz','Connection'), 
('Experiment1','Analyze2_zz','Val1'), 
('Experiment1','Analyze2_zz','Val2'), 
('Experiment2','Analyze1_yy','Connection'), 
('Experiment2','Analyze1_yy','Val1'), 
('Experiment2','Analyze1_yy','Val2')]) 
#add Target column 
df2.insert(0,'Target',df.iloc[:,0]) 

Out[2324]: 
    Target Experiment1           Experiment2 
     Analyze1_ab    Analyze2_zz    Analyze1_yy 
      Connection  Val1 Val2 Connection  Val1 Val2 Connection  Val1 Val2 
0 XXX_1  Edge2 3.1E-07 -0.5  Edge2 2.1E-06 -0.9  Edge2 6.4E-02 -0.3 
1 XXX_4  Edge1 6.4E-12 1.1  Edge1 2.4E-11 9.4  Edge1 1.4E-11 1.4 
2 ABC_1  Edge1 3.9E-07 0.7  Edge1 2.9E-07 5.6  Edge1 6.8E-02 0.4 
3 ABC_2  Edge2 1.1E-09 -0.5  Edge2 1.2E-09 1.2  Edge2 1.0E-03 -0.5 
4 ABC_3  Edge2 4.6E-25 -0.8  Edge2 2.6E-10 1.9  Edge2 5.0E-17 -0.9 
5 XXX_2  Edge2 1.7E-07 -0.5  Edge2 5.7E-08 -0.3  Edge2 4.1E-02 -0.3 
6 ABC_4  Edge1 8.1E-02 0.5  Edge1 9.1E-02 1.5  Edge1 5.4E-02 0.6 
7 ABC_5  Edge1 6.7E-02 0.3  Edge1 4.2E-02 1.9  Edge1 5.6E-03 0.4 
8 XXX_3  Edge2 3.1E-03 -0.4  Edge1 2.4E-11 1.1  Edge2 2.4E-02 -0.3 
1

あなたはこのようにそれを行うことができます。

結果:

In [320]: x 
Out[320]: 
     Experiment1           Experiment2 
    Target Analyze1_ab    Analyze2_zz    Analyze1_yy 
      Connection  Val1 Val2 Connection  Val1 Val2 Connection  Val1 Val2 
0 XXX_1  Edge2 3.1E-07 -0.5  Edge2 2.1E-06 -0.9  Edge2 6.4E-02 -0.3 
1 XXX_4  Edge1 6.4E-12 1.1  Edge1 2.4E-11 9.4  Edge1 1.4E-11 1.4 
2 ABC_1  Edge1 3.9E-07 0.7  Edge1 2.9E-07 5.6  Edge1 6.8E-02 0.4 
3 ABC_2  Edge2 1.1E-09 -0.5  Edge2 1.2E-09 1.2  Edge2 1.0E-03 -0.5 
4 ABC_3  Edge2 4.6E-25 -0.8  Edge2 2.6E-10 1.9  Edge2 5.0E-17 -0.9 
5 XXX_2  Edge2 1.7E-07 -0.5  Edge2 5.7E-08 -0.3  Edge2 4.1E-02 -0.3 
6 ABC_4  Edge1 8.1E-02 0.5  Edge1 9.1E-02 1.5  Edge1 5.4E-02 0.6 
7 ABC_5  Edge1 6.7E-02 0.3  Edge1 4.2E-02 1.9  Edge1 5.6E-03 0.4 
8 XXX_3  Edge2 3.1E-03 -0.4  Edge1 2.4E-11 1.1  Edge2 2.4E-02 -0.3 
関連する問題