2017-11-24 4 views

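Pandas: reading nested JSON with NaN entries

I am trying to read JSON with nested dictionaries into pandas, following this pandas tutorial. Some of the nested lists/dictionaries are missing (NaN), so calling the normalize function fails with a KeyError. This happens because the nested key exists only for certain top-level elements of the list.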

Here is my data:

q 
Out[235]: 
[{u'Code': u'GE', 
    u'datetime': u'2011-11-14T19:30:03-05:00[US/Eastern]'}, 
{u'Code': u'PP', 
    u'datetime': u'2012-21-14T18:50-05:00[US/Eastern]'}, 
{u'Code': u'IO', 
    u'Summary': [{u'prod': u'book', 
    u'num': 81.04, 
    u'devil': 17}, 
    {u'prod': u'game', 
    u'num': 191.5, 
    u'devil': 10}, 
    {u'prod': u'desk', 
    u'num': 55.5, 
    u'devil': -6}, 
    {u'angel': u'ipo', 
    u'num': 503.0, 
    u'devil': 1}], 
    u'datetime': u'2013-10-14T16:30-05:00[US/Eastern]'}, 
{u'Code': u'BI', 
    u'datetime': u'2014-11-14T12:30-05:00[US/Eastern]'}, 
{u'Code': u'EZ', 
    u'datetime': u'2015-12-14T10:00-05:00[US/Eastern]'}, 
{u'Code': u'JC', 
    u'datetime': u'2016-10-14T08:30:01-05:00[US/Eastern]'}, 
{u'Code': u'WX', 
    u'Summary': [{u'angel': u'yut', 
    u'num': 0, 
    u'prod': u'read', 
    u'devil': 0.0}, 
    {u'angel': u'fgf', 
    u'prod': u'fart', 
    u'devil': 0.0}, 
    {u'prod': u'red', 
    u'num': 673, 
    u'angel': u'deft', 
    u'devil': 0}, 
    { u'devil': 0, 
    u'prod': u'dog'}, 
    {u'angel': u'hut', 
    u'devil': 99}], 
    u'datetime': u'2017-10-13T05:00:02-05:00[US/Eastern]'}] 

As mentioned, I can partially display it in a DataFrame like this:

pd.DataFrame(q) 
    Out[229]: 
      Code           Summary      datetime 
    0   GE            NaN 2011-11-11T19:30:03-05:00[US/Eastern] 
    1   PP            NaN 2012-12-25T18:50-05:00[US/Eastern] 
    2   IO [{u'prod': u'book', u'angel': u'I...    2013-11-04T16:30-05:00[US/Eastern] 
    3   BI            NaN 2014-12-14T08:30:01-05:00[US/Eastern] 
    4   JC            NaN 2016-11-14T04:30-05:00[US/Eastern] 
    5   WX [{u'prod': u'orange', u'devil': -2, u's...   2017-10-13T03:30:08-05:00[US/Eastern] 

Running pd.io.json.json_normalize(q, 'Summary', ['Code', 'datetime']) results in:

KeyError: 'Summary'

How can I get around this? Ideally, the records where 'Summary' does not exist would just have NaN cell values.
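The failure described above can be reproduced with a much smaller input. The sketch below is illustrative only: `sample` is a hypothetical two-record cut-down of `q`, `try_normalize` is a helper name made up for this example, and it assumes pandas 1.0 or later, where the function is exposed as `pd.json_normalize` (the question uses the older `pd.io.json.json_normalize` path).

```python
import pandas as pd

# Hypothetical two-record sample in the same shape as `q`:
# only the second record carries a 'Summary' list.
sample = [
    {"Code": "GE", "datetime": "2011-11-14T19:30:03-05:00[US/Eastern]"},
    {"Code": "IO",
     "Summary": [{"prod": "book", "num": 81.04, "devil": 17}],
     "datetime": "2013-10-14T16:30-05:00[US/Eastern]"},
]


def try_normalize(records):
    """Return the normalized frame, or the KeyError raised while building it."""
    try:
        return pd.json_normalize(records, record_path="Summary",
                                 meta=["Code", "datetime"])
    except KeyError as exc:
        return exc


failed = try_normalize(sample)       # first record has no 'Summary' key
worked = try_normalize(sample[1:])   # every record has 'Summary'

print(type(failed).__name__)
print(worked)
```

In this sketch the call only succeeds when every record passed in has the `record_path` key, which is why a single record without 'Summary' is enough to trigger the error.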

@MaxU Sorry, I just noticed a typo and edited it. Please let me know if there is a problem with my sample data. – guy

@MaxU How does it look now? – guy

Yup, it looks better now ;-) Can you provide your desired data set? – MaxU

Answers


IIUC:

In [94]: (json_normalize([x for x in q if x.get('Summary')],
    ...:                 'Summary',
    ...:                 ['Code', 'datetime'])
    ...:  .append(pd.DataFrame([x for x in q if not x.get('Summary')])))
    ...:
Out[94]: 
   Code  angel                               datetime  devil     num  prod
0    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   17.0   81.04  book
1    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   10.0  191.50  game
2    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   -6.0   55.50  desk
3    IO    ipo     2013-10-14T16:30-05:00[US/Eastern]    1.0  503.00   NaN
4    WX    yut  2017-10-13T05:00:02-05:00[US/Eastern]    0.0    0.00  read
5    WX    fgf  2017-10-13T05:00:02-05:00[US/Eastern]    0.0     NaN  fart
6    WX   deft  2017-10-13T05:00:02-05:00[US/Eastern]    0.0  673.00   red
7    WX    NaN  2017-10-13T05:00:02-05:00[US/Eastern]    0.0     NaN   dog
8    WX    hut  2017-10-13T05:00:02-05:00[US/Eastern]   99.0     NaN   NaN
0    GE    NaN  2011-11-14T19:30:03-05:00[US/Eastern]    NaN     NaN   NaN
1    PP    NaN     2012-21-14T18:50-05:00[US/Eastern]    NaN     NaN   NaN
2    BI    NaN     2014-11-14T12:30-05:00[US/Eastern]    NaN     NaN   NaN
3    EZ    NaN     2015-12-14T10:00-05:00[US/Eastern]    NaN     NaN   NaN
4    JC    NaN  2016-10-14T08:30:01-05:00[US/Eastern]    NaN     NaN   NaN

Using pd.concat() instead (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0):

In [95]: pd.concat([json_normalize([x for x in q if x.get('Summary')],
    ...:                           'Summary',
    ...:                           ['Code', 'datetime']),
    ...:            pd.DataFrame([x for x in q if not x.get('Summary')])],
    ...:           ignore_index=True)
    ...:
Out[95]: 
    Code  angel                               datetime  devil     num  prod
 0    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   17.0   81.04  book
 1    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   10.0  191.50  game
 2    IO    NaN     2013-10-14T16:30-05:00[US/Eastern]   -6.0   55.50  desk
 3    IO    ipo     2013-10-14T16:30-05:00[US/Eastern]    1.0  503.00   NaN
 4    WX    yut  2017-10-13T05:00:02-05:00[US/Eastern]    0.0    0.00  read
 5    WX    fgf  2017-10-13T05:00:02-05:00[US/Eastern]    0.0     NaN  fart
 6    WX   deft  2017-10-13T05:00:02-05:00[US/Eastern]    0.0  673.00   red
 7    WX    NaN  2017-10-13T05:00:02-05:00[US/Eastern]    0.0     NaN   dog
 8    WX    hut  2017-10-13T05:00:02-05:00[US/Eastern]   99.0     NaN   NaN
 9    GE    NaN  2011-11-14T19:30:03-05:00[US/Eastern]    NaN     NaN   NaN
10    PP    NaN     2012-21-14T18:50-05:00[US/Eastern]    NaN     NaN   NaN
11    BI    NaN     2014-11-14T12:30-05:00[US/Eastern]    NaN     NaN   NaN
12    EZ    NaN     2015-12-14T10:00-05:00[US/Eastern]    NaN     NaN   NaN
13    JC    NaN  2016-10-14T08:30:01-05:00[US/Eastern]    NaN     NaN   NaN
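For readers on a recent pandas, the same split-and-concatenate idea can be written as a standalone script. This is only a sketch under a few assumptions: pandas 1.0 or later (so that `pd.json_normalize` is available at the top level), Python 3 string literals, and a hypothetical shortened `records` list standing in for `q`. The names `with_summary`, `without_summary`, `nested`, `flat`, and `result` are made up for the example.

```python
import pandas as pd

# Hypothetical, shortened sample in the same shape as `q` above.
records = [
    {"Code": "GE", "datetime": "2011-11-14T19:30:03-05:00[US/Eastern]"},
    {"Code": "IO",
     "Summary": [{"prod": "book", "num": 81.04, "devil": 17},
                 {"angel": "ipo", "num": 503.0, "devil": 1}],
     "datetime": "2013-10-14T16:30-05:00[US/Eastern]"},
    {"Code": "BI", "datetime": "2014-11-14T12:30-05:00[US/Eastern]"},
]

# Split the records on whether they carry a non-empty 'Summary' list.
with_summary = [r for r in records if r.get("Summary")]
without_summary = [r for r in records if not r.get("Summary")]

# One row per 'Summary' entry, with 'Code' and 'datetime' repeated as metadata.
nested = pd.json_normalize(with_summary, record_path="Summary",
                           meta=["Code", "datetime"])

# Records without 'Summary' become plain rows; the missing columns
# are filled with NaN when the two frames are concatenated.
flat = pd.DataFrame(without_summary)

result = pd.concat([nested, flat], ignore_index=True)
print(result)
```

As in the answer above, rows that come from records without 'Summary' end up after the expanded ones, so the original record order is not preserved. If that order matters, sorting on a suitable key afterwards is one option.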