2017-08-18 4 views
1

Here is a snippet using pd.cut():

import pandas as pd

test = pd.DataFrame({'days': [0, 31, 45]})
test['range'] = pd.cut(test.days, [0, 30, 60])

This yields the output:

   days     range
0     0       NaN
1    31  (30, 60]
2    45  (30, 60]

I'm surprised that 0 does not end up in (0, 30]. What should I do to classify 0 as (0, 30]?

Answers

1
test['range'] = pd.cut(test.days, [0, 30, 60], include_lowest=True)
print(test)
   days           range
0     0  (-0.001, 30.0]
1    31    (30.0, 60.0]
2    45    (30.0, 60.0]
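
The NaN in the question comes from the default right=True, which makes every bin open on the left, so the first bin (0, 30] does not contain 0. A minimal sketch that checks this with pd.Interval (not used in the answer itself, shown here only to illustrate bin membership):

```python
import pandas as pd

# Default pd.cut bins are closed on the right and open on the left.
default_bin = pd.Interval(0, 30, closed='right')     # (0, 30]
print(0 in default_bin)    # the left edge is excluded
print(30 in default_bin)   # the right edge is included

# With right=False the bins are closed on the left instead.
left_closed_bin = pd.Interval(0, 30, closed='left')  # [0, 30)
print(0 in left_closed_bin)
print(30 in left_closed_bin)

# pd.cut agrees: 0 is only binned once the lowest edge is included.
days = pd.Series([0, 31, 45])
default_cut = pd.cut(days, [0, 30, 60])
lowest_cut = pd.cut(days, [0, 30, 60], include_lowest=True)
print(default_cut.isna().tolist())
print(lowest_cut.isna().tolist())
```

The two isna() lists differ only in the first position, which is the row holding 0.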

See the difference:

test = pd.DataFrame({'days': [0, 20, 30, 31, 45, 60]})

# include_lowest=True: 0 falls in the first bin
test['range1'] = pd.cut(test.days, [0, 30, 60], include_lowest=True)
# right=False: 30 falls in the [30, 60) bin
test['range2'] = pd.cut(test.days, [0, 30, 60], right=False)
# default: 30 falls in the (0, 30] bin
test['range3'] = pd.cut(test.days, [0, 30, 60])
print(test)
   days          range1    range2    range3
0     0  (-0.001, 30.0]   [0, 30)       NaN
1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
3    31    (30.0, 60.0]  [30, 60)  (30, 60]
4    45    (30.0, 60.0]  [30, 60)  (30, 60]
5    60    (30.0, 60.0]       NaN  (30, 60]

Alternatively, use numpy.searchsorted; note that the array of bin edges (arr) has to be sorted:

import numpy as np

arr = np.array([0, 30, 60])
test['range1'] = arr.searchsorted(test.days)
test['range2'] = arr.searchsorted(test.days, side='right') - 1
print(test)
   days  range1  range2
0     0       0       0
1    20       1       0
2    30       1       1
3    31       2       1
4    45       2       1
5    60       2       2
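
To relate the two approaches, the sketch below compares the searchsorted indices with the integer bin codes that pd.cut returns when labels=False. The sample values are deliberately kept inside [0, 60), which is an assumption of this comparison: searchsorted still returns an index for values on or beyond the last edge, whereas pd.cut returns NaN there. The names edges, cut_codes and search_codes are illustrative.

```python
import numpy as np
import pandas as pd

edges = np.array([0, 30, 60])          # bin edges; must be sorted ascending
days = pd.Series([0, 20, 30, 31, 45])  # all values lie inside [0, 60)

# Integer bin codes from pd.cut with left-closed bins [0, 30) and [30, 60).
cut_codes = pd.cut(days, edges, right=False, labels=False)

# side='right' counts the edges <= value; subtracting 1 gives the bin index.
search_codes = edges.searchsorted(days.to_numpy(), side='right') - 1

print(cut_codes.tolist())
print(search_codes.tolist())
```

Both lines print the same codes for these in-range values, so the side='right' variant can stand in for pd.cut(..., right=False) when out-of-range inputs are handled separately.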
1

As the pd.cut documentation describes, pass the parameter right=False:

test = pd.DataFrame({'days': [0, 31, 45]})
test['range'] = pd.cut(test.days, [0, 30, 60], right=False)

test

   days     range
0     0   [0, 30)
1    31  [30, 60)
2    45  [30, 60)
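
One caveat with right=False: the last bin becomes [30, 60), so a value of exactly 60 falls outside every bin. A possible workaround, which is an assumption here and not part of the answer above, is to append an open-ended upper edge such as float('inf'):

```python
import pandas as pd

days = pd.Series([0, 30, 45, 60])

# With right=False the bins are [0, 30) and [30, 60); 60 is not covered.
two_bins = pd.cut(days, [0, 30, 60], right=False)
print(two_bins.isna().tolist())

# Adding an infinite upper edge (an illustrative choice) creates [60, inf).
open_ended = pd.cut(days, [0, 30, 60, float('inf')],
                    right=False, labels=False)
print(open_ended.tolist())
```

Whether an extra catch-all bin is appropriate depends on the data; if 60 should belong to the second bin, include_lowest=True with the default right=True is the closer fit.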