You can use extract to pull out the month and day, prepend them to each year from the right with radd, and convert the result with to_datetime:
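import numpy as np
import pandas as pd

L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
     ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]
a = np.array(L)
# raw string so that \s and \d are passed to the regex engine unchanged
pat = r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})'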
d = pd.Series(a[:, 0]).str.extract(pat, expand=True).apply('-'.join, 1).add('-')
print (d)
0    Dec-31-
1    Mar-31-
dtype: object
L1 = pd.DataFrame(a[:, 1:]).radd(d, 0).apply(pd.to_datetime).values.astype('datetime64[D]')
print (L1)
[['2016-12-31' '2015-12-31' '2014-12-31']
['2016-03-31' '2015-03-31' '2014-03-31']]
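The one-liners above pack several steps together. As a rough sketch of the same idea, here is the chain split into named steps. The variable names (`parts`, `prefix`, `combined`, `result`) and the explicit `format='%b-%d-%Y'` argument are my additions for illustration, not part of the code above; the explicit format assumes English month abbreviations.

```python
import numpy as np
import pandas as pd

L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
     ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]
a = np.array(L)

pat = r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})'

# Step 1: extract month abbreviation (column 0) and day (column 1)
parts = pd.Series(a[:, 0]).str.extract(pat, expand=True)

# Step 2: build a per-row prefix such as 'Dec-31-'
prefix = parts[0] + '-' + parts[1] + '-'

# Step 3: radd computes prefix + value, aligned on the row index (axis=0),
# so every year in a row becomes e.g. 'Dec-31-2016'
years = pd.DataFrame(a[:, 1:])
combined = years.radd(prefix, axis=0)

# Step 4: parse each column; the explicit format is an assumption added here
dates = combined.apply(pd.to_datetime, format='%b-%d-%Y')

# Step 5: drop the time component
result = dates.values.astype('datetime64[D]')
print(result)
```

Passing a format is optional, but it avoids relying on format inference when every string has the same shape.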
If performance is important, use a dictionary to map the month names:
d = {'Jan':'01', 'Feb':'02', 'Mar':'03', 'Apr':'04', 'May':'05', 'Jun':'06',
     'Jul':'07', 'Aug':'08', 'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
L2 = []
for l in L:
    # month abbreviation and day are the 3rd- and 2nd-to-last words
    a = l[0].split()[-3:-1]
    a = '-'.join([d[a[0]], a[1]])
    L2.append([x + '-' + a for x in l[1:]])
print (L2)
[['2016-12-31', '2015-12-31', '2014-12-31'],
['2016-03-31', '2015-03-31', '2014-03-31']]
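If you need this in more than one place, the loop can be wrapped in a small function. This is only a sketch: the name `fiscal_dates` and the `MONTHS` constant are hypothetical, and the `zfill(2)` padding for single-digit days is an addition that the loop above does not do.

```python
MONTHS = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04',
          'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08',
          'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}


def fiscal_dates(rows):
    """Return ISO-style date strings for each row.

    Assumes every row looks like ['... <Mon> <day> <year>', year1, year2, ...].
    """
    out = []
    for header, *years in rows:
        # month abbreviation and day are the 3rd- and 2nd-to-last words
        month, day = header.split()[-3:-1]
        suffix = MONTHS[month] + '-' + day.zfill(2)
        out.append([year + '-' + suffix for year in years])
    return out


L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
     ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]
print(fiscal_dates(L))
```

An unknown month abbreviation raises a KeyError here, which is usually preferable to silently producing a malformed date.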
Finally, if you need a numpy array:
print (np.array(L2))
[['2016-12-31' '2015-12-31' '2014-12-31']
['2016-03-31' '2015-03-31' '2014-03-31']]
Timings:
L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
     ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']] * 10000
In [262]: %%timeit
...: d = {'Jan':'01', 'Feb':'02', 'Mar':'03', 'Apr':'04', 'May':'05', 'Jun':'06',
...:      'Jul':'07', 'Aug':'08', 'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
...:
...: L2 = []
...: for l in L:
...:     a = l[0].split()[-3:-1]
...:     a = '-'.join([d.get(a[0]), a[1]])
...:     L2.append([x + '-' + a for x in l[1:]])
...:
10 loops, best of 3: 44.3 ms per loop
In [263]: %%timeit
...: out_list=[]
...: for l in L:
...:     l_date = datetime.strptime((" ").join(l[0].split()[-3:]), '%b %d %Y')
...:     out_list.append([("-").join([str(l_year),str(l_date.month),str(l_date.day)])
...:                      for l_year in l[-3:]])
...:
1 loop, best of 3: 303 ms per loop
In [264]: %%timeit
...: a = np.array(L)
...: pat = '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})'
...: d = pd.Series(a[:, 0]).str.extract(pat, expand=True).apply('-'.join, 1).add('-')
...: L1 = pd.DataFrame(a[:, 1:]).radd(d, 0).apply(pd.to_datetime).values.astype('datetime64[D]')
...:
1 loop, best of 3: 7.46 s per loop
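To repeat a comparison like this outside IPython, something along these lines should work with the standard timeit module. It is a sketch only: the function names `with_dict` and `with_strptime` are hypothetical wrappers around the two loops timed above, the list is smaller than in the timings above to keep the run short, and the numbers will differ by machine and library version.

```python
import timeit
from datetime import datetime

L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
     ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']] * 1000

MONTHS = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04',
          'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08',
          'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}


def with_dict(rows):
    # dictionary lookup plus string concatenation
    out = []
    for l in rows:
        month, day = l[0].split()[-3:-1]
        suffix = '-'.join([MONTHS[month], day])
        out.append([x + '-' + suffix for x in l[1:]])
    return out


def with_strptime(rows):
    # parse the trailing 'Mon DD YYYY' with strptime; month is not zero-padded
    out = []
    for l in rows:
        l_date = datetime.strptime(' '.join(l[0].split()[-3:]), '%b %d %Y')
        out.append(['-'.join([str(y), str(l_date.month), str(l_date.day)])
                    for y in l[-3:]])
    return out


for func in (with_dict, with_strptime):
    # best of 3 repeats, 5 calls each, reported per call
    best = min(timeit.repeat(lambda: func(L), number=5, repeat=3)) / 5
    print('{}: {:.1f} ms per call'.format(func.__name__, best * 1000))
```

Note that the two functions do not return identical strings: the strptime version produces an unpadded month such as '2016-3-31'.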
I prefer your numpy solution, though; it is also more intuitive. Thanks –