2017-05-16 4 views
0

私は以下のような「sm」という名前のpandasデータフレームを多数の行に持っています。会社と年ごとの売上成長率を計算して新しい列名 'sg' 。pandasデータフレーム年と企業別に機能を持つ列を追加する

fyear ch  conm  sale ipodate 
0 1996 51.705 AAR CORP 589.328 NaN 
1 1997 17.222 AAR CORP 782.123 NaN 
2 1998 8.250 AAR CORP 918.036 NaN 
3 1999 1.241 AAR CORP 1024.333 NaN 
4 2000 13.809 AAR CORP 874.255 NaN 
5 2001 34.522 AAR CORP 638.721 NaN 
6 2002 29.154 AAR CORP 606.337 NaN 
7 2003 41.010 AAR CORP 651.958 NaN 
8 2004 40.508 AAR CORP 747.848 NaN 
9 2005 121.738 AAR CORP 897.284 NaN 
10 2006 83.317 AAR CORP 1061.169 NaN 
11 2007 109.391 AAR CORP 1384.919 NaN 
12 2008 112.505 AAR CORP 1423.976 NaN 
13 2009 79.370 AAR CORP 1352.151 NaN 
14 2010 57.433 AAR CORP 1775.782 NaN 
15 2011 67.720 AAR CORP 2074.498 NaN 
16 2012 75.300 AAR CORP 2167.100 NaN 
17 2013 89.200 AAR CORP 2035.000 NaN 
18 2014 54.700 AAR CORP 1594.300 NaN 
19 2015 31.200 AAR CORP 1662.600 NaN 
20 1997 64.000 AMERICAN AIRLINES GROUP INC 18570.000 NaN 
21 1998 95.000 AMERICAN AIRLINES GROUP INC 19205.000 NaN 
22 1999 85.000 AMERICAN AIRLINES GROUP INC 17730.000 NaN 
23 2000 89.000 AMERICAN AIRLINES GROUP INC 19703.000 NaN 
24 2001 120.000 AMERICAN AIRLINES GROUP INC 18963.000 NaN 
115466 2014 290.500 ALLEGION PLC 2118.300 NaN 
115467 2015 199.700 ALLEGION PLC 2068.100 NaN 
115468 2016 312.400 ALLEGION PLC 2238.000 NaN 
115470 2013 2.063 AGILITY HEALTH INC 63.052 NaN 
115471 2014 1.301 AGILITY HEALTH INC 62.105 NaN 
115472 2015 1.307 AGILITY HEALTH INC 62.328 NaN 
115473 2013 109.819 NORDIC AMERICAN OFFSHORE NaN NaN 
115474 2014 46.398 NORDIC AMERICAN OFFSHORE 52.789 NaN 
115475 2015 5.339 NORDIC AMERICAN OFFSHORE 36.372 NaN 
115476 2016 2.953 NORDIC AMERICAN OFFSHORE 16.249 NaN 
115477 2011 2.040 DORIAN LPG LTD 34.571 20140508.0 
115478 2012 1.042 DORIAN LPG LTD 38.662 20140508.0 
115479 2013 279.132 DORIAN LPG LTD 29.634 20140508.0 
115480 2014 204.821 DORIAN LPG LTD 104.129 20140508.0 
115481 2015 46.412 DORIAN LPG LTD 289.208 20140508.0 
115482 2013 948.684 NOMAD FOODS LTD 2074.842 NaN 
115483 2014 855.541 NOMAD FOODS LTD 1816.239 NaN 
115484 2015 671.846 NOMAD FOODS LTD 971.013 NaN 
115485 2016 347.688 NOMAD FOODS LTD 2034.109 NaN 
115487 2014 2638.000 ATHENE HOLDING LTD 4100.000 20161209.0 
115488 2015 2720.000 ATHENE HOLDING LTD 2616.000 20161209.0 
115489 2016 2459.000 ATHENE HOLDING LTD 4107.000 20161209.0 
115490 2013 3.956 MIDATECH PHARMA PLC 0.244 NaN 
115491 2014 47.240 MIDATECH PHARMA PLC 0.245 NaN 
115492 2015 23.852 MIDATECH PHARMA PLC 2.028 NaN 
115493 2016 21.723 MIDATECH PHARMA PLC 8.541 NaN 

私は多少、このようなコードを実装

d = sm.loc[sm['conm'] == 'AAR CORP'] 
dt = d.loc[d.fyear == 1996,'sale'].values[0] 
dtp1 = d.loc[d.fyear == 1997,'sale'].values[0] 
sg = (dtp1-dt)/ dt * 100 
d.ix[d.fyear == 1997,'sg'] = sg 

、それが私の最後の列を与えます。

fyear ch  conm  sale ipodate sg 
0 1996 51.705 AAR CORP 589.328 NaN  Nan 
1 1997 17.222 AAR CORP 782.123 NaN  32.71438 
2 1998 8.250 AAR CORP 918.036 NaN  Nan 

は、私が「SG」列が「販売」列の隣になりたいと私はすべての年(1996年から2015年)のためのすべての企業のために、この売上成長率を計算し、与えられたt年と同じ行に挿入します。今私は会社名を小さなデータフレームにスライスしてから売上成長率を計算しますが、私のユニークな会社名は9000を超えているので、私の現在の方法は非効率的です。すべての会社名でスライスせずにこれを行うことはできますか?前もって感謝します。

答えて

1
#sort df first 
df.sort_values(by=['conm','fyear'],inplace=True) 
#you can use groupby first by company, get the rolling increase for each company and then insert it to the dataframe. 

df.insert(df.columns.tolist().index('sale')+1, 
      'sm',df.groupby(by=['conm'])['sale']\ 
      .apply(lambda x: x.rolling(2).apply(lambda y: (y[1]-y[0])/y[0]*100))) 

df 
Out[296]: 
    fyear  ch       conm  sale   sm ipodate 
0 1996 51.705      AAR CORP 589.328  NaN  NaN 
1 1997 17.222      AAR CORP 782.123 32.714380  NaN 
2 1998 8.250      AAR CORP 918.036 17.377446  NaN 
3 1999 1.241      AAR CORP 1024.333 11.578740  NaN 
4 2000 13.809      AAR CORP 874.255 -14.651290  NaN 
5 2001 34.522      AAR CORP 638.721 -26.941110  NaN 
6 2002 29.154      AAR CORP 606.337 -5.070132  NaN 
7 2003 41.010      AAR CORP 651.958 7.524034  NaN 
8 2004 40.508      AAR CORP 747.848 14.708003  NaN 
9 2005 121.738      AAR CORP 897.284 19.982135  NaN 
10 2006 83.317      AAR CORP 1061.169 18.264563  NaN 
11 2007 109.391      AAR CORP 1384.919 30.508807  NaN 
12 2008 112.505      AAR CORP 1423.976 2.820165  NaN 
13 2009 79.370      AAR CORP 1352.151 -5.043975  NaN 
14 2010 57.433      AAR CORP 1775.782 31.330155  NaN 
15 2011 67.720      AAR CORP 2074.498 16.821659  NaN 
16 2012 75.300      AAR CORP 2167.100 4.463827  NaN 
17 2013 89.200      AAR CORP 2035.000 -6.095704  NaN 
18 2014 54.700      AAR CORP 1594.300 -21.656020  NaN 
19 2015 31.200      AAR CORP 1662.600 4.284012  NaN 
20 1997 64.000 AMERICAN AIRLINES GROUP INC 18570.000  NaN  NaN 
21 1998 95.000 AMERICAN AIRLINES GROUP INC 19205.000 3.419494  NaN 
22 1999 85.000 AMERICAN AIRLINES GROUP INC 17730.000 -7.680292  NaN 
23 2000 89.000 AMERICAN AIRLINES GROUP INC 19703.000 11.128032  NaN 
24 2001 120.000 AMERICAN AIRLINES GROUP INC 18963.000 -3.755773  NaN 
1

もう1つの可能な解決策(それはapply()を使用していないので、それが潜在的に高速です):

df['sm'] = (df.sort_values(['conm', 'fyear'])\ 
       .groupby('conm')['sale']\ 
       .diff()\ 
       .shift(-1)/df['sale']).shift() * 100 

このソリューションは、consecuitive年の間に1年の差が常に存在することを前提としています。

関連する問題