2016-08-18 1 views
2

Why can't I apply pandas.DatetimeIndex to multiple columns?

I am trying to drop the time portion of several pandas columns using the following code:

group_df['submitted_on'] = pd.DatetimeIndex(group_df['submitted_on']).to_period('d') 
group_df['resolved_on'] = pd.DatetimeIndex(group_df['resolved_on']).to_period('d') 

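This works fine for the first column, but I can't figure out why it fails when I apply it to more than one column.

When I try to run the second line, I get the following error. The bare ValueError doesn't tell me anything: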

File "C:/Users/anshanno/PycharmProjects/RETIvizScript/RetiViz.py", line 271, in join_groups 
    group_df['resolved_on'] = pd.DatetimeIndex(group_df['resolved_on']).to_period('d') 
    File "C:\Python27\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper 
    return func(*args, **kwargs) 
    File "C:\Python27\lib\site-packages\pandas\tseries\index.py", line 349, in __new__ 
    values, freq=freq, dayfirst=dayfirst, yearfirst=yearfirst) 
    File "pandas\tslib.pyx", line 2347, in pandas.tslib.parse_str_array_to_datetime (pandas\tslib.c:42450) 
ValueError 

I tried errors='coerce' with no luck; I still get the same undescriptive error.

group_df['resolved_on'] = pd.DatetimeIndex(group_df['resolved_on'], errors='coerce').to_period('d') 

Edit (sample data):

"identifier","status","submitted_on","resolved_on","closed_on","duplicate_on","junked_on","unproducible_on","verified_on" 
"xx1","D","2004-07-28 07:00:00.0","null","null","2004-08-26 07:00:00.0","null","null","null" 
"xx2","N","2010-03-02 03:00:16.0","null","null","null","null","null","null" 
"xx3","U","2005-10-26 14:20:20.0","null","null","null","null","2005-11-01 13:02:22.0","null" 
"xx4","V","2006-06-30 07:00:00.0","2006-09-15 07:00:00.0","null","null","null","null","2006-11-20 08:00:00.0" 
"xx5","R","2012-09-21 06:30:58.0","2013-06-06 09:35:25.0","null","null","null","null","null" 
"xx6","D","2009-11-25 02:16:03.0","null","null","2010-02-26 12:28:22.0","null","null","null" 
"xx7","D","2003-08-29 07:00:00.0","null","null","2003-08-29 07:00:00.0","null","null","null" 
"xx8","R","2003-06-06 12:00:00.0","2003-06-24 12:00:00.0","null","null","null","null","null" 
"xx9","R","2004-11-05 08:00:00.0","2004-11-15 08:00:00.0","null","null","null","null","null" 
"xx10","R","2008-02-21 05:13:39.0","2008-09-25 17:20:57.0","null","null","null","null","null" 
"xx11","R","2007-03-08 17:47:44.0","2007-03-21 23:47:57.0","null","null","null","null","null" 
"xx12","R","2011-08-22 19:50:25.0","2012-06-21 05:52:12.0","null","null","null","null","null" 
"xx13","J","2003-07-07 12:00:00.0","null","null","null","2003-07-10 12:00:00.0","null","null" 
"xx14","A","2008-09-24 11:36:34.0","null","null","null","null","null","null" 
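Judging from this sample, a plausible cause is the literal string "null" in resolved_on, which submitted_on never contains. The sketch below is one hedged way to check this: it coerces a column and lists the raw values that failed to parse. The series `resolved` and the names `parsed` and `bad_values` are illustrative and do not come from the original script.

```python
import pandas as pd

# Hypothetical excerpt of the resolved_on column from the sample above.
resolved = pd.Series([
    "null",
    "2006-09-15 07:00:00.0",
    "2013-06-06 09:35:25.0",
    "null",
])

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(resolved, errors="coerce")

# Raw values that could not be parsed as dates.
bad_values = resolved[parsed.isna()].unique().tolist()
print(bad_values)
```

If the only entry reported is "null", the placeholder strings are the likely source of the ValueError rather than a malformed timestamp.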

Thanks everyone, any help is appreciated.

+0

We need a subset of the data that reproduces this error. – piRSquared

+0

Added, thanks :) – anshanno

Answers

2

Use pd.to_datetime instead of pd.DatetimeIndex:

group_df['submitted_on'] = pd.to_datetime(group_df['submitted_on'], 'coerce').dt.to_period('d') 
group_df['resolved_on'] = pd.to_datetime(group_df['resolved_on'], 'coerce').dt.to_period('d') 

group_df 

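For a version that can be run without the original file, here is a minimal sketch of the same approach on a hypothetical three-row frame modelled on the sample data in the question. It passes errors='coerce' by keyword and uses the uppercase 'D' frequency alias. Both are choices made for this sketch, not part of the answer above.

```python
import pandas as pd

# Hypothetical frame built from three rows of the question's sample data.
group_df = pd.DataFrame({
    "identifier": ["xx1", "xx4", "xx5"],
    "submitted_on": [
        "2004-07-28 07:00:00.0",
        "2006-06-30 07:00:00.0",
        "2012-09-21 06:30:58.0",
    ],
    "resolved_on": [
        "null",
        "2006-09-15 07:00:00.0",
        "2013-06-06 09:35:25.0",
    ],
})

# Parse each column, turning unparseable entries such as "null" into NaT,
# then keep only the day by converting to a daily period.
for col in ["submitted_on", "resolved_on"]:
    parsed = pd.to_datetime(group_df[col], errors="coerce")
    group_df[col] = parsed.dt.to_period("D")

print(group_df)
```

Rows whose resolved_on was "null" end up as NaT, so they can be filtered later with isna() or notna().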

+0

Thank you. Out of curiosity, why does 'coerce' work without 'errors=' in this case? – anshanno

+0

@anshanno It works with or without it. The difference is that I called 'pd.to_datetime' instead of 'pd.DatetimeIndex'. – piRSquared

+0

Right, I understood that part. I just didn't know that 'errors=' wasn't required. Thanks again :) – anshanno
