Indexing and selecting only the columns that are found in pandas (Python)

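I'm facing a basic problem while using pandas in Python. For example, my dataframe "a" has the columns q, w, e, r. Now I want to take a subset of a:

b = a[['w', 'e', 'r', 'z']]

But this does not create the subset, because z is not in the dataframe. Even though z is not found in a, I want b to be created from the remaining columns w, e, r. Please help me with how to handle this.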
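To make the failure concrete, here is a minimal sketch that reproduces it. The one-row values are made up for illustration; only the column names q, w, e, r come from the question. In current pandas versions, selecting with a list that contains a missing label raises a KeyError instead of returning the columns that do exist.

```python
import pandas as pd

# Hypothetical one-row frame; only the column names come from the question.
a = pd.DataFrame({'q': [1], 'w': [2], 'e': [3], 'r': [4]})

try:
    b = a[['w', 'e', 'r', 'z']]   # 'z' is not a column of a
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```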

Source

2016-05-27 Manu Sharma

It appears that the method using isin is not the most efficient way to do it:

%timeit a[a.columns[a.columns.isin(['w', 'e', 'r', 'z'])]]
out: 1000 loops, best of 3: 528 µs per loop

Whereas if you just use a filter:

%timeit a[[col for col in ['w', 'e', 'r', 'z'] if col in a.columns]]
out: 1000 loops, best of 3: 431 µs per loop

On the other hand, using isin re-orders the requested columns to match the dataframe's own column index, whereas the filter keeps the order of the list you passed:

a = pd.DataFrame({'q': [1], 'w': [2], 'e': [3], 'r': [4]})
out:
   e  q  r  w
0  3  1  4  2

a[a.columns[a.columns.isin(['w', 'e', 'r', 'z'])]]
out:
   e  r  w
0  3  4  2

a[[col for col in ['w', 'e', 'r', 'z'] if col in a.columns]]
out:
   w  e  r
0  2  3  4
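The ordering difference between the two approaches can be checked with a small sketch. One assumption is added here: the column order is pinned explicitly with the columns argument, because newer pandas versions keep dict insertion order rather than the alphabetical order shown in the output above.

```python
import pandas as pd

# Column order is pinned explicitly so the result does not depend on
# how a given pandas version orders dict keys.
a = pd.DataFrame({'q': [1], 'w': [2], 'e': [3], 'r': [4]},
                 columns=['e', 'q', 'r', 'w'])
wanted = ['w', 'e', 'r', 'z']

# isin keeps the dataframe's own column order.
via_isin = a[a.columns[a.columns.isin(wanted)]]
# The list comprehension keeps the order of the requested list.
via_filter = a[[col for col in wanted if col in a.columns]]

print(list(via_isin.columns))    # ['e', 'r', 'w']
print(list(via_filter.columns))  # ['w', 'e', 'r']
```

Which one is preferable depends on whether the caller's order or the dataframe's order should win.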

Source

2016-05-27 07:55:42 ysearka

You can filter manually before indexing:

filtered_col = [col for col in ['w', 'e', 'r', 'z'] if col in a.columns]
b = a[filtered_col]
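If this filtering is needed in several places, it could be wrapped in a small helper. The sketch below is only an illustration: the function name select_existing is hypothetical, and returning the list of missing names is an extra that the answer does not mention.

```python
import pandas as pd

def select_existing(df, wanted):
    """Return df restricted to the wanted columns that exist, plus the missing names."""
    present = [col for col in wanted if col in df.columns]
    missing = [col for col in wanted if col not in df.columns]
    return df[present], missing

# Hypothetical data using the column names from the question.
a = pd.DataFrame({'q': [1], 'w': [2], 'e': [3], 'r': [4]})
b, missing = select_existing(a, ['w', 'e', 'r', 'z'])

print(list(b.columns))  # ['w', 'e', 'r']
print(missing)          # ['z']
```

Returning the missing names lets the caller decide whether a skipped column should be logged or ignored.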

Source

2016-05-27 07:43:51 zaxliu

You may have made a mistake: you probably want to change the second 'for' to 'if'. – ysearka

@ysearka, thank you. – zaxliu

IIUC, you can do it with the isin approach on the columns of a:

mask = a.columns[a.columns.isin(['w', 'e', 'r', 'z'])]
b = a[mask]

Example:

import numpy as np
import pandas as pd

np.random.seed(632)
df = pd.DataFrame(np.random.randn(5, 4), columns=list('abcd'))

In [56]: df
Out[56]:
          a         b         c         d
0 -0.202506  1.245011  0.628800 -1.787930
1 -1.076415  0.603727 -1.242478  0.430865
2 -1.689979  0.885975 -1.408643  0.545198
3 -1.351751 -0.095847  1.506013  1.454067
4 -1.081069 -0.162412 -0.141595 -1.180774

mask = df.columns[df.columns.isin(['a', 'b', 'c', 'e'])] 

In [57]: mask 
Out[57]: Index(['a', 'b', 'c'], dtype='object') 

In [58]: df[mask]
Out[58]:
          a         b         c
0 -0.202506  1.245011  0.628800
1 -1.076415  0.603727 -1.242478
2 -1.689979  0.885975 -1.408643
3 -1.351751 -0.095847  1.506013
4 -1.081069 -0.162412 -0.141595
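A closely related variant, shown here as a sketch and not as part of the answer itself, passes the boolean result of isin straight to .loc instead of first building an Index of labels. The variable names keep and subset are chosen for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(632)
df = pd.DataFrame(np.random.randn(5, 4), columns=list('abcd'))

# Boolean array with one entry per column; 'e' simply matches nothing.
keep = df.columns.isin(['a', 'b', 'c', 'e'])
subset = df.loc[:, keep]

print(keep.tolist())         # [True, True, True, False]
print(list(subset.columns))  # ['a', 'b', 'c']
```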

Source

2016-05-27 07:46:16
