2016-10-17

groupby gives IndexError: index out of bounds

Grouping on ['Stock','Date'] works as expected:

df1.groupby(['Stock','Date']).head(1) 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
5 2016-10-11  NaN 10:08:36.657 XYZ 

But the following raises IndexError: index out of bounds. Any ideas?

import pandas as pd 
from numpy import nan 

df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11',5: '2016-10-11'}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'ABC', 4: 'ABC', 5: 'XYZ'}, 'StartTime': {0: '08:00:00.241', 1: '08:00:00.243', 2: '12:34:23.563', 3: '08:14.05.908', 4: '18:54:50.100', 5: '10:08:36.657'}, 'EndTime': {0: nan,1: nan, 2: nan, 3: nan, 4: nan, 5: nan}}) 

df1.groupby(['Stock','EndTime']).head(1) 

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/users/.../egg_cache/p/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py", line 994, in head
    in_head = self._cumcount_array() < n
  File "/users/.../egg_cache/p/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py", line 1034, in _cumcount_array
    arr = np.arange(self.grouper._max_groupsize, dtype='int64')
  File "pandas/src/properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:41917)
  File "/users/.../egg_cache/p/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py", line 1343, in _max_groupsize
    if self.indices:
  File "pandas/src/properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:41917)
  File "/users/.../egg_cache/p/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py", line 1309, in indices
    return _get_indices_dict(label_list, keys)
  File "/users/.../egg_cache/p/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/groupby.py", line 3767, in _get_indices_dict
    return lib.indices_fast(sorter, group_index, keys, sorted_labels)
  File "pandas/lib.pyx", line 1385, in pandas.lib.indices_fast (pandas/lib.c:23875)
  File "pandas/src/util.pxd", line 41, in util.get_value_at (pandas/lib.c:62901)
IndexError: index out of bounds

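If I leave out the all-NaN column, as in the ['Stock','Date'] grouping, it works fine. Is this a bug in pandas, or am I missing something here? I have read https://github.com/pandas-dev/pandas/issues/11016
If it is a bug, is there any workaround other than dropping all the all-NaN columns?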
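One possible workaround, sketched under the assumption that the error is triggered only by NaN values in the grouping key: build the key from a NaN-free copy of EndTime (via fillna) and group on that, leaving df1 itself unchanged. SENTINEL = -1 is a hypothetical placeholder that must not collide with any real EndTime value, and this has not been verified on pandas 0.16.2 specifically.

```python
import numpy as np
import pandas as pd

# Same data as the question's first df1 (EndTime is entirely NaN).
df1 = pd.DataFrame({
    'Date': ['2016-10-11'] * 6,
    'Stock': ['ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'XYZ'],
    'StartTime': ['08:00:00.241', '08:00:00.243', '12:34:23.563',
                  '08:14.05.908', '18:54:50.100', '10:08:36.657'],
    'EndTime': [np.nan] * 6,
})

# Hypothetical sentinel: assumed never to occur as a real EndTime value.
SENTINEL = -1

# Grouping key with NaN replaced; df1['EndTime'] itself keeps its NaNs.
end_key = df1['EndTime'].fillna(SENTINEL)

# Group on Stock plus the filled key and keep the first row of each group.
first_rows = df1.groupby([df1['Stock'], end_key]).head(1)
print(first_rows)
```

Because head returns rows of the original frame, the EndTime column in the result still shows NaN; only the grouping key sees the sentinel.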

Some more interesting observations:

df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11',5: '2016-10-11'}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'ABC', 4: 'ABC', 5: 'XYZ'}, 'StartTime': {0: '08:00:00.241', 1: '08:00:00.243', 2: '12:34:23.563', 3: '08:14.05.908', 4: '18:54:50.100', 5: '10:08:36.657'}, 'EndTime': {0: nan,1: nan, 2: 1, 3: nan, 4: nan, 5: nan}}) 

print df1 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
1 2016-10-11  NaN 08:00:00.243 ABC 
2 2016-10-11  1 12:34:23.563 ABC 
3 2016-10-11  NaN 08:14.05.908 ABC 
4 2016-10-11  NaN 18:54:50.100 ABC 
5 2016-10-11  NaN 10:08:36.657 XYZ 

df1.groupby(['Stock','EndTime']).head(1) 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
2 2016-10-11  1 12:34:23.563 ABC 

The output above looks wrong to me. Shouldn't it be:

     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
2 2016-10-11  1 12:34:23.563 ABC 
5 2016-10-11  NaN 10:08:36.657 XYZ 
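For comparison on newer pandas: groupby excludes rows whose key contains NaN by default, and since pandas 1.1 it accepts dropna=False, which treats NaN as an ordinary key value. A minimal sketch, assuming pandas 1.1 or later (so not usable on 0.16.2 or 0.19.0), showing that the default keeps only the (ABC, 1.0) group while dropna=False returns the three rows listed above:

```python
import numpy as np
import pandas as pd

# The question's second df1: only row 2 has a non-NaN EndTime.
df1 = pd.DataFrame({
    'Date': ['2016-10-11'] * 6,
    'Stock': ['ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'XYZ'],
    'StartTime': ['08:00:00.241', '08:00:00.243', '12:34:23.563',
                  '08:14.05.908', '18:54:50.100', '10:08:36.657'],
    'EndTime': [np.nan, np.nan, 1, np.nan, np.nan, np.nan],
})

# Default dropna=True: rows with NaN in any key are left out of the groups.
n_default = df1.groupby(['Stock', 'EndTime']).ngroups

# dropna=False (pandas >= 1.1): NaN forms its own group per Stock.
first_rows = df1.groupby(['Stock', 'EndTime'], dropna=False).head(1)
print(n_default)
print(first_rows)
```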

Now for the following case:

df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11',5: '2016-10-11'}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'ABC', 4: 'ABC', 5: 'XYZ'}, 'StartTime': {0: '08:00:00.241', 1: '08:00:00.243', 2: '12:34:23.563', 3: '08:14.05.908', 4: '18:54:50.100', 5: '10:08:36.657'}, 'EndTime': {0: nan,1: nan, 2: nan, 3: nan, 4: nan, 5: 1}}) 

print df1 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
1 2016-10-11  NaN 08:00:00.243 ABC 
2 2016-10-11  NaN 12:34:23.563 ABC 
3 2016-10-11  NaN 08:14.05.908 ABC 
4 2016-10-11  NaN 18:54:50.100 ABC 
5 2016-10-11  1 10:08:36.657 XYZ 

df1.groupby(['Stock','EndTime']).head(1) 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
5 2016-10-11  1 10:08:36.657 XYZ 

This one is fine.


'df1.groupby(['Stock','EndTime']).head(1)' (for the first 'df1') works fine for me (pandas 0.19.0) – MaxU


Mine is 0.16.2. Unfortunately, getting an updated version approved in a corporate environment may take a while, so I am looking for a workaround. – Rahul


@MaxU could you share the output for all three cases with 0.19.0? Thanks – Rahul

Answer


@Rahul, here is the output of your code:

In [5]: df1 
Out[5]: 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
1 2016-10-11  NaN 08:00:00.243 ABC 
2 2016-10-11  NaN 12:34:23.563 ABC 
3 2016-10-11  NaN 08:14.05.908 ABC 
4 2016-10-11  NaN 18:54:50.100 ABC 
5 2016-10-11  NaN 10:08:36.657 XYZ 

In [6]: df1.groupby(['Stock','EndTime']).head(1) 
Out[6]: 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 

In [7]: df1.groupby(['Stock','Date']).head(1) 
Out[7]: 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
5 2016-10-11  NaN 10:08:36.657 XYZ 

In [8]: df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11', 5: '2016-10-11'},
    ...:                     'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'ABC', 4: 'ABC', 5: 'XYZ'},
    ...:                     'StartTime': {0: '08:00:00.241', 1: '08:00:00.243', 2: '12:34:23.563', 3: '08:14.05.908', 4: '18:54:50.100', 5: '10:08:36.657'},
    ...:                     'EndTime': {0: nan, 1: nan, 2: 1, 3: nan, 4: nan, 5: nan}})
    ...:

In [9]: df1.groupby(['Stock','EndTime']).head(1) 
Out[9]: 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
2 2016-10-11  1.0 12:34:23.563 ABC 

In [10]: df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11', 5: '2016-10-11'},
    ...:                      'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'ABC', 4: 'ABC', 5: 'XYZ'},
    ...:                      'StartTime': {0: '08:00:00.241', 1: '08:00:00.243', 2: '12:34:23.563', 3: '08:14.05.908', 4: '18:54:50.100', 5: '10:08:36.657'},
    ...:                      'EndTime': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: 1}})
    ...:

In [11]: df1.groupby(['Stock','EndTime']).head(1) 
Out[11]: 
     Date EndTime  StartTime Stock 
0 2016-10-11  NaN 08:00:00.241 ABC 
5 2016-10-11  1.0 10:08:36.657 XYZ 

Thanks @MaxU. Shouldn't Out[6] have one more row, for Stock XYZ? – Rahul


@Rahul, why do you think so? – MaxU


The same goes for Out[9]. Am I missing something? – Rahul
