2016-09-26 4 views
Querying a pandas DataFrame to filter rows where a column is not NaN

I am new to Python and pandas.

I need to query a DataFrame and keep only the rows where one of the columns is not NaN. I tried:

a=dictionarydf.label.isnull() 

but `a` is only populated with True/False values. I then tried this:

dictionarydf.query(dictionarydf.label.isnull()) 

but, as I expected, it raised an error.

Sample data:

    reference_word         all_matching_words   label  review
0          account             fees - account     NaN       N
1          account           mobile - account     NaN       N
2          account          monthly - account     NaN       N
3   administration  delivery - administration     NaN       N
4   administration      fund - administration     NaN       N
5          advisor             fees - advisor     NaN       N
6          advisor          optimum - advisor     NaN       N
7          advisor              sub - advisor     NaN       N
8            aichi           delivery - aichi     NaN       N
9            aichi               pref - aichi     NaN       N
10         airport              biz - airport  travel       N
11         airport              cfo - airport  travel       N
12         airport           cfomtg - airport  travel       N
13         airport          meeting - airport  travel       N
14         airport           summit - airport  travel       N
15         airport             taxi - airport  travel       N
16         airport            train - airport  travel       N
17         airport         transfer - airport  travel       N
18         airport             trip - airport  travel       N
19             ais                admin - ais     NaN       N
20             ais               alpine - ais     NaN       N
21             ais                 fund - ais     NaN       N
22      allegiance       custody - allegiance     NaN       N
23      allegiance          fees - allegiance     NaN       N
24           alpha               late - alpha     NaN       N
25           alpha               meal - alpha     NaN       N
26           alpha               taxi - alpha     NaN       N
27          alpine             admin - alpine     NaN       N
28          alpine               ais - alpine     NaN       N
29          alpine              fund - alpine     NaN       N

I want to filter the data down to the rows where label is not NaN.

Expected output:

   reference_word  all_matching_words   label  review
0         airport       biz - airport  travel       N
1         airport       cfo - airport  travel       N
2         airport    cfomtg - airport  travel       N
3         airport   meeting - airport  travel       N
4         airport    summit - airport  travel       N
5         airport      taxi - airport  travel       N
6         airport     train - airport  travel       N
7         airport  transfer - airport  travel       N
8         airport      trip - airport  travel       N

Answers

You can use dropna:

df = df.dropna(subset=['label']) 

print(df)
    reference_word  all_matching_words   label  review
10         airport       biz - airport  travel       N
11         airport       cfo - airport  travel       N
12         airport    cfomtg - airport  travel       N
13         airport   meeting - airport  travel       N
14         airport    summit - airport  travel       N
15         airport      taxi - airport  travel       N
16         airport     train - airport  travel       N
17         airport  transfer - airport  travel       N
18         airport      trip - airport  travel       N
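
The filtered rows above keep their original index labels (10 to 18), while the expected output in the question is numbered from 0. Here is a minimal sketch of one way to get that numbering, using a small made-up four-row frame rather than the full sample data. The name `filtered` and the `reset_index(drop=True)` step are additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data (not the full 30-row sample).
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "taxi - airport", "fund - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# Drop rows whose label is missing, then renumber the index from 0.
# dropna returns a new DataFrame; df itself is left unchanged.
filtered = df.dropna(subset=["label"]).reset_index(drop=True)
print(filtered)
```

If the original row labels matter later (for example, to join back to the source data), skip the `reset_index` call.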

Another solution is notnull with boolean indexing:

df = df[df.label.notnull()] 

print(df)
    reference_word  all_matching_words   label  review
10         airport       biz - airport  travel       N
11         airport       cfo - airport  travel       N
12         airport    cfomtg - airport  travel       N
13         airport   meeting - airport  travel       N
14         airport    summit - airport  travel       N
15         airport      taxi - airport  travel       N
16         airport     train - airport  travel       N
17         airport  transfer - airport  travel       N
18         airport      trip - airport  travel       N
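
The True/False Series that the question found confusing is exactly what boolean indexing consumes: it acts as a row mask. `DataFrame.query`, by contrast, expects a string expression, which is why passing it a Series fails. The sketch below shows both forms on a small made-up frame. The names `mask`, `by_mask`, and `by_query` are illustrative, and `engine="python"` is a conservative choice here so that the method call inside the expression is evaluated by pandas itself.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data (not the full 30-row sample).
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "taxi - airport", "fund - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# A boolean Series: True where label is present, False where it is NaN.
mask = df["label"].notnull()

# Boolean indexing keeps only the rows where the mask is True.
by_mask = df[mask]

# query() takes the condition as a string, not as a Series.
by_query = df.query("label.notnull()", engine="python")

print(by_mask.equals(by_query))
```

Both forms return a new DataFrame and leave `df` untouched, so choosing between them is mostly a matter of readability.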
That solved the problem, and thanks for the quick answer :) @jezrael. I don't need to delete rows or create a duplicate DataFrame, so I went with boolean indexing. Both solutions worked perfectly. – Dileep
