2016-12-18 10 views
0

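Pandas: dropping rows does not work

I am trying to delete the rows that contain '?'. When I run the cell, I get the same data as if I had done nothing. This is the link to the data set.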

import pandas as pd 
from IPython.display import display 

adult = pd.read_csv('adult.data.csv') 
adult = adult[adult.Workclass != '?'] 
display(adult) 
+1

Maybe there is some whitespace; try removing it with str.strip: adult = adult[adult.Workclass.str.strip() != '?'] – jezrael

+0

@jezrael That worked! Thank you very much! – johnwj

Answers

2

I think you need str.strip to remove the whitespace:

adult = adult[adult.Workclass.str.strip() != '?'] 
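
To illustrate why the original comparison matches nothing, here is a minimal sketch. It assumes the file separates fields with a comma followed by a space, as adult.data does; the three-row sample and the column names Age, Workclass and Fnlwgt are made up for the demonstration.

```python
import io

import pandas as pd

# Hypothetical three-row sample that mimics the "comma + space" layout
raw = "39, State-gov, 77516\n54, ?, 180211\n38, Private, 215646\n"
sample = pd.read_csv(
    io.StringIO(raw), header=None, names=["Age", "Workclass", "Fnlwgt"]
)

# The string values keep their leading space, so the cell holds ' ?', not '?'
print(repr(sample.loc[1, "Workclass"]))

# Comparing against '?' matches no row, so nothing is dropped
unchanged = sample[sample.Workclass != "?"]

# Stripping the whitespace first lets the comparison match
filtered = sample[sample.Workclass.str.strip() != "?"]

print(len(unchanged), len(filtered))
```

Printing the value with repr makes the hidden leading space visible, which is a quick way to check for this kind of problem.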

Test with your data (no column names are set, so the test uses column 6):

import pandas as pd 
from IPython.display import display 

adult = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', header=None) 
adult = adult[adult[6].str.strip() != '?'] 
display(adult.head(30)) 

    0     1  2    3 4      5 \ 
0 39   State-gov 77516  Bachelors 13   Never-married 
1 50 Self-emp-not-inc 83311  Bachelors 13  Married-civ-spouse 
2 38   Private 215646  HS-grad 9    Divorced 
3 53   Private 234721   11th 7  Married-civ-spouse 
4 28   Private 338409  Bachelors 13  Married-civ-spouse 
5 37   Private 284582  Masters 14  Married-civ-spouse 
6 49   Private 160187   9th 5 Married-spouse-absent 
7 52 Self-emp-not-inc 209642  HS-grad 9  Married-civ-spouse 
8 31   Private 45781  Masters 14   Never-married 
9 42   Private 159449  Bachelors 13  Married-civ-spouse 
10 37   Private 280464 Some-college 10  Married-civ-spouse 
11 30   State-gov 141297  Bachelors 13  Married-civ-spouse 
12 23   Private 122272  Bachelors 13   Never-married 
13 32   Private 205019  Assoc-acdm 12   Never-married 
14 40   Private 121772  Assoc-voc 11  Married-civ-spouse 
15 34   Private 245487  7th-8th 4  Married-civ-spouse 
16 25 Self-emp-not-inc 176756  HS-grad 9   Never-married 
17 32   Private 186824  HS-grad 9   Never-married 
18 38   Private 28887   11th 7  Married-civ-spouse 
19 43 Self-emp-not-inc 292175  Masters 14    Divorced 
20 40   Private 193524  Doctorate 16  Married-civ-spouse 
21 54   Private 302146  HS-grad 9    Separated 
22 35  Federal-gov 76845   9th 5  Married-civ-spouse 
23 43   Private 117037   11th 7  Married-civ-spouse 
24 59   Private 109015  HS-grad 9    Divorced 
25 56   Local-gov 216851  Bachelors 13  Married-civ-spouse 
26 19   Private 168294  HS-grad 9   Never-married 
28 39   Private 367260  HS-grad 9    Divorced 
29 49   Private 193366  HS-grad 9  Married-civ-spouse 
30 23   Local-gov 190709  Assoc-acdm 12   Never-married 

        6    7     8  9  10 \ 
0   Adm-clerical Not-in-family    White  Male 2174 
1  Exec-managerial   Husband    White  Male  0 
2 Handlers-cleaners Not-in-family    White  Male  0 
3 Handlers-cleaners   Husband    Black  Male  0 
4  Prof-specialty   Wife    Black Female  0 
5  Exec-managerial   Wife    White Female  0 
6  Other-service Not-in-family    Black Female  0 
7  Exec-managerial   Husband    White  Male  0 
8  Prof-specialty Not-in-family    White Female 14084 
9  Exec-managerial   Husband    White  Male 5178 
10  Exec-managerial   Husband    Black  Male  0 
11  Prof-specialty   Husband Asian-Pac-Islander  Male  0 
12  Adm-clerical  Own-child    White Female  0 
13    Sales Not-in-family    Black  Male  0 
14  Craft-repair   Husband Asian-Pac-Islander  Male  0 
15 Transport-moving   Husband Amer-Indian-Eskimo  Male  0 
16  Farming-fishing  Own-child    White  Male  0 
17 Machine-op-inspct  Unmarried    White  Male  0 
18    Sales   Husband    White  Male  0 
19  Exec-managerial  Unmarried    White Female  0 
20  Prof-specialty   Husband    White  Male  0 
21  Other-service  Unmarried    Black Female  0 
22  Farming-fishing   Husband    Black  Male  0 
23 Transport-moving   Husband    White  Male  0 
24  Tech-support  Unmarried    White Female  0 
25  Tech-support   Husband    White  Male  0 
26  Craft-repair  Own-child    White  Male  0 
28  Exec-managerial Not-in-family    White  Male  0 
29  Craft-repair   Husband    White  Male  0 
30  Protective-serv Not-in-family    White  Male  0 

     11 12    13  14 
0  0 40 United-States <=50K 
1  0 13 United-States <=50K 
2  0 40 United-States <=50K 
3  0 40 United-States <=50K 
4  0 40   Cuba <=50K 
5  0 40 United-States <=50K 
6  0 16   Jamaica <=50K 
7  0 45 United-States >50K 
8  0 50 United-States >50K 
9  0 40 United-States >50K 
10  0 80 United-States >50K 
11  0 40   India >50K 
12  0 30 United-States <=50K 
13  0 50 United-States <=50K 
14  0 40    ? >50K 
15  0 45   Mexico <=50K 
16  0 35 United-States <=50K 
17  0 40 United-States <=50K 
18  0 50 United-States <=50K 
19  0 45 United-States >50K 
20  0 60 United-States >50K 
21  0 20 United-States <=50K 
22  0 40 United-States <=50K 
23 2042 40 United-States <=50K 
24  0 40 United-States <=50K 
25  0 40 United-States >50K 
26  0 40 United-States <=50K 
28  0 80 United-States <=50K 
29  0 40 United-States <=50K 
30  0 52 United-States <=50K 

Edit by comment:

To drop all rows where at least one column contains the value ?:

# select the object columns (these hold the strings)
df = adult.select_dtypes(['object'])
# strip whitespace, compare with '?', and flag rows with at least one match
mask = (df.apply(lambda x: x.str.strip()) == '?').any(axis=1)
# print(mask)
# boolean indexing, inverting the mask with ~
print(adult[~mask])
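
As a possible alternative (not part of the original answer), the markers could be handled while the file is read. The sketch below assumes that every missing value is written as a bare ? after the separator; the four-row sample is made up. It relies on the skipinitialspace and na_values parameters of read_csv.

```python
import io

import pandas as pd

# Hypothetical sample; rows 1 and 2 contain at least one '?'
raw = (
    "39, State-gov, Adm-clerical\n"
    "54, ?, ?\n"
    "40, Private, ?\n"
    "38, Private, Sales\n"
)
sample = pd.read_csv(
    io.StringIO(raw),
    header=None,
    skipinitialspace=True,  # drop the space that follows each comma
    na_values="?",          # read a bare '?' as a missing value (NaN)
)

# Drop every row that has a missing value in any column
cleaned = sample.dropna()
print(cleaned)
```

Note that dropna also removes rows whose values are missing for other reasons, such as empty fields, so this only matches the mask approach when ? is the sole missing-value marker.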
+0

Thank you. How can I do this for all columns, without querying each column separately and merging the results again? – johnwj

+0

Give me a moment, please. – jezrael

+0

Please check the edited answer. – jezrael
