2017-09-13 19 views
1

この質問はpandas Dataframeでいくつかのデータ分析を実行したいと思います。pandasデータフレームの前の日付レコードの列を追跡する方法は?

    derived_symbol sport_name person_name  city \ 
0  football.RAM.mumbai.ram_count football   RAM mumbai 
1  football.RAM.mumbai.mum_count football   RAM mumbai 
2  football.RAM.delhi.mum_count football   RAM  delhi 
3  football.RAM.delhi.ram_count football   RAM  delhi 
4  football.RAM.mumbai.ram_count football   RAM mumbai 
5  football.RAM.mumbai.mum_count football   RAM mumbai 
6  football.RAM.delhi.mum_count football   RAM  delhi 
7  football.RAM.delhi.ram_count football   RAM  delhi 
8  basketball.MAH.pune.mah_count basketball   MAH  pune 
9  basketball.MAH.nagpur.mah_count basketball   MAH nagpur 
10  basketball.MAH.TOTAL.mah_count basketball   MAH No Entry 
11 basketball.MAH.TOTAL.nagpur_count basketball   MAH nagpur 
12 basketball.MAH.TOTAL.pune_count basketball   MAH  pune 
13  football.RAM.TOTAL.delhi_count football   RAM  delhi 
14  football.RAM.TOTAL.delhi_count football   RAM  delhi 
15  football.RAM.TOTAL.mum_count football   RAM No Entry 
16  football.RAM.TOTAL.mum_count football   RAM No Entry 
17 football.RAM.TOTAL.mumbai_count football   RAM mumbai 
18 football.RAM.TOTAL.mumbai_count football   RAM mumbai 
19  football.RAM.TOTAL.ram_count football   RAM No Entry 
20  football.RAM.TOTAL.ram_count football   RAM No Entry 

    person_symbol  month sir person_count 
0   ram 2017-01-23 a   10 
1   mum 2017-01-23 a   14 
2   mum 2017-01-23 a   25 
3   ram 2017-01-23 a   20 
4   ram 2017-02-22 b   34 
5   mum 2017-02-22 b   23 
6   mum 2017-02-22 b   43 
7   ram 2017-02-22 b   34 
8   mah 2017-03-03 c   10 
9   mah 2017-03-03 c   20 
10   mah 2017-03-03 c   30 
11  No Entry 2017-03-03 c   20 
12  No Entry 2017-03-03 c   10 
13  No Entry 2017-01-23 a   45 
14  No Entry 2017-02-22 b   77 
15   mum 2017-01-23 a   39 
16   mum 2017-02-22 b   66 
17  No Entry 2017-01-23 a   24 
18  No Entry 2017-02-22 b   57 
19   ram 2017-01-23 a   30 
20   ram 2017-02-22 b   68 

このDataframeにprevious_person_countカラムを追加します。このデータフレームの「月」列には、「yyyy-mm-dd」という形式の日付が含まれています。その月を特定するには、月、すなわち「mm」フィールドを調べる必要があります。

この月を見て、次の月の「previous_person_count」値に「person_count」値を設定する必要があります。

Exceted出力:

   derived_symbol sport_name person_name  city \ 
0  football.RAM.mumbai.ram_count football   RAM mumbai 
1  football.RAM.mumbai.mum_count football   RAM mumbai 
2  football.RAM.delhi.mum_count football   RAM  delhi 
3  football.RAM.delhi.ram_count football   RAM  delhi 
4  football.RAM.mumbai.ram_count football   RAM mumbai 
5  football.RAM.mumbai.mum_count football   RAM mumbai 
6  football.RAM.delhi.mum_count football   RAM  delhi 
7  football.RAM.delhi.ram_count football   RAM  delhi 
8  basketball.MAH.pune.mah_count basketball   MAH  pune 
9  basketball.MAH.nagpur.mah_count basketball   MAH nagpur 
10  basketball.MAH.TOTAL.mah_count basketball   MAH No Entry 
11 basketball.MAH.TOTAL.nagpur_count basketball   MAH nagpur 
12 basketball.MAH.TOTAL.pune_count basketball   MAH  pune 
13  football.RAM.TOTAL.delhi_count football   RAM  delhi 
14  football.RAM.TOTAL.delhi_count football   RAM  delhi 
15  football.RAM.TOTAL.mum_count football   RAM No Entry 
16  football.RAM.TOTAL.mum_count football   RAM No Entry 
17 football.RAM.TOTAL.mumbai_count football   RAM mumbai 
18 football.RAM.TOTAL.mumbai_count football   RAM mumbai 
19  football.RAM.TOTAL.ram_count football   RAM No Entry 
20  football.RAM.TOTAL.ram_count football   RAM No Entry 

    person_symbol  month sir person_count  previous_person_count 
0   ram 2017-01-23 a   10  0 
1   mum 2017-01-23 a   14  0 
2   mum 2017-01-23 a   25  0 
3   ram 2017-01-23 a   20  0 
4   ram 2017-02-22 b   34  10 
5   mum 2017-02-22 b   23  14 
6   mum 2017-02-22 b   43  25 
7   ram 2017-02-22 b   34  20 
8   mah 2017-03-03 c   10  0 
9   mah 2017-03-03 c   20  0 
10   mah 2017-03-03 c   30  0 
11  No Entry 2017-03-03 c   20  0 
12  No Entry 2017-03-03 c   10  0 
13  No Entry 2017-01-23 a   45  0 
14  No Entry 2017-02-22 b   77  45 
15   mum 2017-01-23 a   39  0 
16   mum 2017-02-22 b   66  39 
17  No Entry 2017-01-23 a   24  0 
18  No Entry 2017-02-22 b   57  24 
19   ram 2017-01-23 a   30  0 
20   ram 2017-02-22 b   68  30 

編集リファレンスコード:

df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football','basketball','basketball'], 
      'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','mahesh','mahesh'], 
       'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi','pune','nagpur'], 
     'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram','mah','mah'], 
     'person_count': ['10','14','25','20','34','23','43','34','10','20'], 
     'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-22','2017-02-22','2017-02-22','2017-02-22','2017-03-03','2017-03-03'], 
     'sir': ['a','a','a','a','b','b','b','b','c','c']}) 

df = df[['sport_name','person_name','city','person_symbol','person_count','month','sir']] 

df['person_name'] = df['person_name'].apply(symbology) 

df['person_count'] = df['person_count'].astype(int) 

print df 
df1=df.set_index(['sport_name','person_name','person_count','month','sir']).stack().reset_index(name='val') 

df1['derived_symbol'] = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count' 

df2 = df1.groupby(['derived_symbol','month','sir','sport_name','person_name','level_5','val'])['person_count'].sum().reset_index(name='person_count') 

df3 = df2.set_index(['derived_symbol','month','sir','sport_name','person_name','person_count','level_5'])['val'].unstack().fillna('No Entry').rename_axis(None, 1).reset_index() 

df['derived_symbol'] = df['sport_name'] + '.' + df['person_name'] + '.' + df['city'] + "."+ df['person_symbol'] + '_count' 
df4 = pd.concat([df, df3]).reset_index(None) 
print df3 
del df4['index'] 
df4 = df4[['derived_symbol','sport_name','person_name','city','person_symbol','month','sir','person_count']] 
print df4 

利便性:

d = {'city': {0: 'mumbai', 
    1: 'mumbai', 
    2: 'delhi', 
    3: 'delhi', 
    4: 'mumbai', 
    5: 'mumbai', 
    6: 'delhi', 
    7: 'delhi', 
    8: 'pune', 
    9: 'nagpur', 
    10: 'No Entry', 
    11: 'nagpur', 
    12: 'pune', 
    13: 'delhi', 
    14: 'delhi', 
    15: 'No Entry', 
    16: 'No Entry', 
    17: 'mumbai', 
    18: 'mumbai', 
    19: 'No Entry', 
    20: 'No Entry'}, 
'derived_symbol': {0: 'football.RAM.mumbai.ram_count', 
    1: 'football.RAM.mumbai.mum_count', 
    2: 'football.RAM.delhi.mum_count', 
    3: 'football.RAM.delhi.ram_count', 
    4: 'football.RAM.mumbai.ram_count', 
    5: 'football.RAM.mumbai.mum_count', 
    6: 'football.RAM.delhi.mum_count', 
    7: 'football.RAM.delhi.ram_count', 
    8: 'basketball.MAH.pune.mah_count', 
    9: 'basketball.MAH.nagpur.mah_count', 
    10: 'basketball.MAH.TOTAL.mah_count', 
    11: 'basketball.MAH.TOTAL.nagpur_count', 
    12: 'basketball.MAH.TOTAL.pune_count', 
    13: 'football.RAM.TOTAL.delhi_count', 
    14: 'football.RAM.TOTAL.delhi_count', 
    15: 'football.RAM.TOTAL.mum_count', 
    16: 'football.RAM.TOTAL.mum_count', 
    17: 'football.RAM.TOTAL.mumbai_count', 
    18: 'football.RAM.TOTAL.mumbai_count', 
    19: 'football.RAM.TOTAL.ram_count', 
    20: 'football.RAM.TOTAL.ram_count'}, 
'month': {0: '2017-01-23', 
    1: '2017-01-23', 
    2: '2017-01-23', 
    3: '2017-01-23', 
    4: '2017-02-22', 
    5: '2017-02-22', 
    6: '2017-02-22', 
    7: '2017-02-22', 
    8: '2017-03-03', 
    9: '2017-03-03', 
    10: '2017-03-03', 
    11: '2017-03-03', 
    12: '2017-03-03', 
    13: '2017-01-23', 
    14: '2017-02-22', 
    15: '2017-01-23', 
    16: '2017-02-22', 
    17: '2017-01-23', 
    18: '2017-02-22', 
    19: '2017-01-23', 
    20: '2017-02-22'}, 
'person_count': {0: 10, 
    1: 14, 
    2: 25, 
    3: 20, 
    4: 34, 
    5: 23, 
    6: 43, 
    7: 34, 
    8: 10, 
    9: 20, 
    10: 30, 
    11: 20, 
    12: 10, 
    13: 45, 
    14: 77, 
    15: 39, 
    16: 66, 
    17: 24, 
    18: 57, 
    19: 30, 
    20: 68}, 
'person_name': {0: 'RAM', 
    1: 'RAM', 
    2: 'RAM', 
    3: 'RAM', 
    4: 'RAM', 
    5: 'RAM', 
    6: 'RAM', 
    7: 'RAM', 
    8: 'MAH', 
    9: 'MAH', 
    10: 'MAH', 
    11: 'MAH', 
    12: 'MAH', 
    13: 'RAM', 
    14: 'RAM', 
    15: 'RAM', 
    16: 'RAM', 
    17: 'RAM', 
    18: 'RAM', 
    19: 'RAM', 
    20: 'RAM'}, 
'person_symbol': {0: 'ram', 
    1: 'mum', 
    2: 'mum', 
    3: 'ram', 
    4: 'ram', 
    5: 'mum', 
    6: 'mum', 
    7: 'ram', 
    8: 'mah', 
    9: 'mah', 
    10: 'mah', 
    11: 'No Entry', 
    12: 'No Entry', 
    13: 'No Entry', 
    14: 'No Entry', 
    15: 'mum', 
    16: 'mum', 
    17: 'No Entry', 
    18: 'No Entry', 
    19: 'ram', 
    20: 'ram'}, 
'sir': {0: 'a', 
    1: 'a', 
    2: 'a', 
    3: 'a', 
    4: 'b', 
    5: 'b', 
    6: 'b', 
    7: 'b', 
    8: 'c', 
    9: 'c', 
    10: 'c', 
    11: 'c', 
    12: 'c', 
    13: 'a', 
    14: 'b', 
    15: 'a', 
    16: 'b', 
    17: 'a', 
    18: 'b', 
    19: 'a', 
    20: 'b'}, 
'sport_name': {0: 'football', 
    1: 'football', 
    2: 'football', 
    3: 'football', 
    4: 'football', 
    5: 'football', 
    6: 'football', 
    7: 'football', 
    8: 'basketball', 
    9: 'basketball', 
    10: 'basketball', 
    11: 'basketball', 
    12: 'basketball', 
    13: 'football', 
    14: 'football', 
    15: 'football', 
    16: 'football', 
    17: 'football', 
    18: 'football', 
    19: 'football', 
    20: 'football'}} 

答えて

1

あなたができることは、月の番号(日付から)とそれ以前のものを計算した後に、merge自身へのデータフレームです。

まず、これらの2つの値を計算してみましょう。便宜上、私は最初に生のmonth文字列値をdatetimeに変換しました。これにより、relativedeltaを使用して前の月を計算することができました。これにより、たとえ年の変更があったとしても、行動が正しいことが保証されます。

In [12]: relevant_columns = ['city', 'person_symbol', 'sport_name'] 

In [13]: pd.merge(df, df, left_on=relevant_columns + ['previous_month_num'], right_on=rele 
    ...: vant_columns + ['month_num'], how='left', suffixes=('', '_previous'))[list(df.col 
    ...: umns) + ['person_count_previous']].fillna(0).drop(['month_num', 'previous_month_n 
    ...: um'], axis=1) 
Out[13]: 
    city  month person_count person_name person_symbol sir sport_name \ 
0 mumbai 2017-01-23   10  ramesh   ram a football 
1 mumbai 2017-01-23   14  ramesh   mum a football 
2 delhi 2017-01-23   25  ramesh   mum a football 
3 delhi 2017-01-23   20  ramesh   ram a football 
4 mumbai 2017-02-22   34  ramesh   ram b football 
5 mumbai 2017-02-22   23  ramesh   mum b football 
6 delhi 2017-02-22   43  ramesh   mum b football 
7 delhi 2017-02-22   34  ramesh   ram b football 
8 pune 2017-03-03   10  mahesh   mah c basketball 
9 nagpur 2017-03-03   20  mahesh   mah c basketball 

    person_count_previous 
0      0 
1      0 
2      0 
3      0 
4     10 
5     14 
6     25 
7     20 
8      0 
9      0 

いくつかのコメント:私たちは、その後のキーをマージするよう計算された月の値を使用して、自分自身へのデータフレームをマージすることができ

In [7]: df['month'] = pd.to_datetime(df['month']) 

In [8]: df['month_num'] = df['month'].apply(lambda x: x.strftime('%Y-%m')) 

In [9]: from dateutil.relativedelta import relativedelta 

In [10]: df['previous_month_num'] = df['month'].apply(lambda x: (x + relativedelta(months=-1)).strftime('%Y-%m')) 

In [11]: df 
Out[11]: 
    city  month person_count person_name person_symbol sir sport_name \ 
0 mumbai 2017-01-23   10  ramesh   ram a football 
1 mumbai 2017-01-23   14  ramesh   mum a football 
2 delhi 2017-01-23   25  ramesh   mum a football 
3 delhi 2017-01-23   20  ramesh   ram a football 
4 mumbai 2017-02-22   34  ramesh   ram b football 
5 mumbai 2017-02-22   23  ramesh   mum b football 
6 delhi 2017-02-22   43  ramesh   mum b football 
7 delhi 2017-02-22   34  ramesh   ram b football 
8 pune 2017-03-03   10  mahesh   mah c basketball 
9 nagpur 2017-03-03   20  mahesh   mah c basketball 

    month_num previous_month_num 
0 2017-01   2016-12 
1 2017-01   2016-12 
2 2017-01   2016-12 
3 2017-01   2016-12 
4 2017-02   2017-01 
5 2017-02   2017-01 
6 2017-02   2017-01 
7 2017-02   2017-01 
8 2017-03   2017-02 
9 2017-03   2017-02 

  • 私は参照列として['city', 'person_symbol', 'sport_name']を使用しますが、あなたが達成したいと思っているものに応じて、もう少し追加してください。
  • 新しい列の名前はperson_count_previousですが、renameにすることができます。これは最適な方法です。
  • デフォルトでは、前回のカウントと一致しない場合、列はNaNになります。 fillnaのおかげで値を0に置き換えました。
  • dropを使用して「一時的な」列を削除しましたが、自由に保管してください。
+0

@ 3kt-ありがとうございました。私も週に同じことをすることができるような1つの質問がありますか?データが毎週利用可能である場合。そのデータから先週のカウントを計算できますか? – kit

+0

@ 3kt-私はそれをしました。ありがとう – kit

関連する問題