Merging two Pandas DataFrames and taking the record from one of them when the indexes match

I want to merge two Pandas DataFrames, but when an index matches I only want to keep the row from one specific df. df1.ix['apple'] should take precedence over df2.ix['apple'], while orange and grape are each unique to one frame and should both be kept.

So, if I have

df1
               A    B
type   model
apple  v1     10  xyz
orange v2     11  pqs

df2
               A    B
type   model
apple  v3     11  xyz
grape  v4     12  def

then I want to end up with

df3
               A    B
type   model
apple  v1     10  xyz
orange v2     11  pqs
grape  v4     12  def

I've tried a few index comparisons, but df2.drop(df1.index[[0]]) removes the entire contents of df2.
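A drop-based approach can work if every df1 label is passed to drop at once instead of a single position. Below is a minimal sketch that assumes flat string indexes (the question's extra model level is left out). It uses the errors="ignore" option of DataFrame.drop so that df1 labels missing from df2, such as orange, are skipped instead of raising a KeyError:

```python
import pandas as pd

# Flat-index versions of the question's frames (the 'model' level is omitted here)
df1 = pd.DataFrame({"A": [10, 11], "B": ["xyz", "pqs"]}, index=["apple", "orange"])
df2 = pd.DataFrame({"A": [11, 12], "B": ["xyz", "def"]}, index=["apple", "grape"])

# Drop every df2 row whose label also appears in df1;
# errors="ignore" skips df1 labels (like 'orange') that df2 does not contain
df2_only = df2.drop(df1.index, errors="ignore")

# df1's rows win for shared labels; df2 only contributes rows df1 lacks
df3 = pd.concat([df1, df2_only])
print(df3)
```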

Both DataFrames have a MultiIndex with a similar structure:

MultiIndex(
    levels=[[u'apple', u'orange', u'grape', ...], [u'v1', u'v2', u'v3', ... ]], 
    labels=[[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, ...]], 
    names=[u'type', u'model'] 
)
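Because the model values differ (v1 in df1, v3 in df2), the two apple rows never match as full (type, model) tuples, so the comparison has to use the type level on its own. Here is a minimal sketch that rebuilds the sample data by hand and assumes the level names type and model shown above. It keeps all of df1 and adds only the df2 rows whose type df1 does not already have:

```python
import pandas as pd

# Rebuild the question's frames with a (type, model) MultiIndex
df1 = pd.DataFrame(
    {"A": [10, 11], "B": ["xyz", "pqs"]},
    index=pd.MultiIndex.from_tuples(
        [("apple", "v1"), ("orange", "v2")], names=["type", "model"]
    ),
)
df2 = pd.DataFrame(
    {"A": [11, 12], "B": ["xyz", "def"]},
    index=pd.MultiIndex.from_tuples(
        [("apple", "v3"), ("grape", "v4")], names=["type", "model"]
    ),
)

# Compare only the 'type' level: ('apple', 'v1') and ('apple', 'v3') differ as tuples
in_df1 = df2.index.get_level_values("type").isin(df1.index.get_level_values("type"))

# Keep every df1 row, plus the df2 rows whose type df1 does not have
df3 = pd.concat([df1, df2[~in_df1]])
print(df3)
```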


2016-06-29 getglad

DataFrame.combine_first() is designed for exactly this:

import pandas as pd 

df1 = pd.DataFrame({'A': [10, 11], 'B': ['xyz', 'pqs']}, index=['apple', 'orange']) 
df2 = pd.DataFrame({'A': [11, 12], 'B': ['xyz', 'def']}, index=['apple', 'grape']) 

df3 = df1.combine_first(df2)

which yields

df3
           A    B
apple   10.0  xyz
grape   12.0  def
orange  11.0  pqs
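In the output above, the rows come back in label order (apple, grape, orange) instead of the order used in the question. If the order matters, one option is to reindex afterwards. This is a sketch of my own, not part of the original answer: it puts df1's labels first, followed by the labels that only df2 has:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [10, 11], "B": ["xyz", "pqs"]}, index=["apple", "orange"])
df2 = pd.DataFrame({"A": [11, 12], "B": ["xyz", "def"]}, index=["apple", "grape"])

# Desired order: df1's labels first, then labels that only df2 has
order = df1.index.append(df2.index.difference(df1.index))

# combine_first aligns on the union of both indexes; reindex restores the order above
df3 = df1.combine_first(df2).reindex(order)
print(df3)
```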

Edit: I see that the question changed substantially after I posted. The model level was added to the index, which effectively turns it into a MultiIndex.

import pandas as pd 

# Create the df1 in the question 
df1 = pd.DataFrame({'model': ['v1', 'v2'], 'A': [10, 11], 'B': ['xyz', 'pqs']}, 
        index=['apple', 'orange']) 
df1.index.name = 'type' 
df1.set_index('model', append=True, inplace=True) 

# Create the df2 in the question 
df2 = pd.DataFrame({'model': ['v3', 'v4'], 'A': [11, 12], 'B': ['xyz', 'def']}, 
        index=['apple', 'grape']) 
df2.index.name = 'type' 
df2.set_index('model', append=True, inplace=True) 

# Solution: remove the `model` from the index and apply the above 
#  technique. Restore it to the index at the end if you want. 
df1.reset_index(level=1, inplace=True) 
df2.reset_index(level=1, inplace=True) 
df3 = df1.combine_first(df2).set_index('model', append=True)

Result:

df3
                 A    B
type   model
apple  v1     10.0  xyz
grape  v4     12.0  def
orange v2     11.0  pqs


2016-06-29 18:35:09

I've updated my question, but I have a MultiIndex. When I use combine_first, it merges the indexes together, so I end up with both apples. Would it be possible to use '.combine_first' with '.groupby(level=0)'? – getglad

@getglad: please post this as a new question, with more details and a minimal, complete, and verifiable example (http://stackoverflow.com/help/mcve) –

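You can try the following. It covers the cases where combine_first() does not do what you want: when df1 has NaN cells (combine_first() fills them from df2), and when both frames have a MultiIndex, e.g. one created with pd.read_csv(..., index_col=[3, 1]), where combine_first() raises NotImplementedError: merging with both multi-indexes is not implemented: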

In [53]: df1
Out[53]:
              A    B
ind1 ind2
foo  apple   10  NaN
bar  orange  11  pqs
baz  grape   12  def

In [54]: df2
Out[54]:
              A    B
ind1 ind2
foo  apple   11  xyz
baz  grape   12  def

In [55]: pd.concat([df1, df2.loc[df2.index.difference(df1.index)]])
Out[55]:
              A    B
ind1 ind2
foo  apple   10  NaN
bar  orange  11  pqs
baz  grape   12  def
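One caveat applies to the question's own data. Index.difference compares complete (type, model) tuples, so ('apple', 'v3') in df2 counts as new even though df1 already has an apple row. The sketch below uses the question's values and assumes the level names type and model. It contrasts the full-tuple difference with a difference taken on the type level only:

```python
import pandas as pd

idx1 = pd.MultiIndex.from_tuples([("apple", "v1"), ("orange", "v2")], names=["type", "model"])
idx2 = pd.MultiIndex.from_tuples([("apple", "v3"), ("grape", "v4")], names=["type", "model"])
df1 = pd.DataFrame({"A": [10, 11], "B": ["xyz", "pqs"]}, index=idx1)
df2 = pd.DataFrame({"A": [11, 12], "B": ["xyz", "def"]}, index=idx2)

# Full-tuple difference: ('apple', 'v3') is not in df1, so both apples survive
naive = pd.concat([df1, df2.loc[df2.index.difference(df1.index)]])

# Level-based difference: only the types that df1 lacks (here just 'grape')
df2_types = df2.index.get_level_values("type")
new_types = df2_types.difference(df1.index.get_level_values("type"))
by_type = pd.concat([df1, df2.loc[df2_types.isin(new_types)]])

print(naive)
print(by_type)
```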

OLD answer:

For example (pay attention to the apple row in df1):

In [33]: df1
Out[33]:
         A    B
apple   10  NaN
orange  11  pqs
grape   12  def

In [34]: df2
Out[34]:
         A    B
apple   11  xyz
grape   12  def

In [35]: df1.combine_first(df2)
Out[35]:
         A    B
apple   10  xyz
grape   12  def
orange  11  pqs

In [36]: pd.concat([df1, df2.loc[df2.index.difference(df1.index)]])
Out[36]:
         A    B
apple   10  NaN
orange  11  pqs
grape   12  def

Otherwise (no NaNs in df1 and a regular index), @Alberto Garcia-Raboso's solution is definitely the better and faster one.


2016-06-29 18:27:02 MaxU

Just curious: what was wrong with your first answer, '.groupby(level=0).first()'? – getglad

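@getglad, it doesn't work correctly: for an index that appears in both frames, the resulting DF can end up with values from the matching df2 row instead of df1's. – MaxU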
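The behavior behind this comes from how GroupBy.first() works. It takes the first non-null value in each column separately, so a NaN in a df1 row gets filled from the matching df2 row. Index.duplicated, by contrast, keeps or drops whole rows. The sketch below shows the difference. It is my own variant, assuming flat indexes and a NaN placed in df1's apple row for illustration:

```python
import numpy as np
import pandas as pd

# df1's apple row has a NaN in column B (illustrative data)
df1 = pd.DataFrame({"A": [10, 11], "B": [np.nan, "pqs"]}, index=["apple", "orange"])
df2 = pd.DataFrame({"A": [11, 12], "B": ["xyz", "def"]}, index=["apple", "grape"])

# df1 rows come first, so "first occurrence" means "df1 wins"
stacked = pd.concat([df1, df2])

# first() works column by column and skips NaN, so apple's B is taken from df2
by_group = stacked.groupby(level=0).first()

# duplicated(keep="first") drops entire later rows, so apple keeps df1's NaN
by_row = stacked[~stacked.index.duplicated(keep="first")]

print(by_group)
print(by_row)
```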
