2016-04-19 14 views
-1

Parsing nested JSON into a pandas DataFrame

I am trying to parse a JSON string into a pandas DataFrame at the finest level of granularity.

Attempts

First I tried read_json:

jsonData = pd.read_json(apiRequest) 


However, a large chunk of the data is still nested under networkRank.

Next I tried json_normalize, but this time the data one level up, such as latitude and longitude, is missing.

result = json_normalize(json_data['networkRank']) 


I also tried to build the DataFrame from scratch by parsing "into" the nested structure, but this code results in an error:

result_nested = json_normalize(json_data, 'networkRank', ['longitude', 'latitude', ['networkRank', 'type3G', 'downloadSpeed']]) 

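Goal

Parse the JSON data into a flat table with all fields, adding the latitude, longitude, and distance data to each data row of the second result (the json_normalize output).

JSON string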

{'apiVersion': '2', 
'distance': 10, 
'latitude': '-6.162959', 
'longitude': '35.751607', 
'networkRank': [{'networkId': '6402', 
    'networkName': 'Vodacom', 
    'type3G': {'averageRssiAsu': '9.5429091136', 
    'averageRssiDb': '-69.5664329624972', 
    'downloadSpeed': '1508.1304', 
    'networkId': '6402', 
    'networkName': 'Vodacom', 
    'networkType': '3', 
    'pingTime': '320.9600', 
    'reliability': '0.804236452826138', 
    'sampleSizeRSSI': '948', 
    'sampleSizeSpeed': '29', 
    'uploadSpeed': '893.7692'}}, 
    {'networkId': '6400', 
    'networkName': 'tiGO', 
    'type3G': {'averageRssiAsu': '15.3537142857', 
    'averageRssiDb': '-61.4563389583101', 
    'downloadSpeed': '516.0000', 
    'networkId': '6400', 
    'networkName': 'tiGO', 
    'networkType': '3', 
    'pingTime': '259.0000', 
    'reliability': '0.911904765537807', 
    'sampleSizeRSSI': '935', 
    'sampleSizeSpeed': '21', 
    'uploadSpeed': '320.4211'}}, 
    {'networkId': '6403', 
    'networkName': 'Airtel', 
    'type3G': {'averageRssiAsu': '13.2729999375', 
    'averageRssiDb': '-58.1521092977699', 
    'downloadSpeed': '1080.2500', 
    'networkId': '6403', 
    'networkName': 'Airtel', 
    'networkType': '3', 
    'pingTime': '194.5556', 
    'reliability': '0.554680264185345', 
    'sampleSizeRSSI': '587', 
    'sampleSizeSpeed': '21', 
    'uploadSpeed': '572.1579'}}], 
'network_type': None, 
'perMinuteCurrent': 0, 
'perMinuteLimit': 10, 
'perMonthCurrent': 0, 
'perMonthLimit': 2000} 

Answers

0

1) Parse the JSON string into a Python structure.

2) Iterate over the 'networkRank' list in the dictionary and add each key you want to every record:

for data_row in deserialized_json['networkRank']: 
    data_row['latitude'] = deserialized_json['latitude'] 
    # etc 

3) Build the DataFrame from that list:

yourdataframe = pd.DataFrame(deserialized_json['networkRank']) 
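
The three steps above can be sketched end to end as follows. This is only a minimal illustration: the `raw` string is a trimmed, hypothetical stand-in for the API response in the question, and the choice of which top-level keys to copy is an assumption based on the stated goal.

```python
import json

import pandas as pd

# Trimmed, hypothetical stand-in for the API response in the question.
raw = '''
{"distance": 10, "latitude": "-6.162959", "longitude": "35.751607",
 "networkRank": [
   {"networkId": "6402", "networkName": "Vodacom",
    "type3G": {"downloadSpeed": "1508.1304", "uploadSpeed": "893.7692"}},
   {"networkId": "6400", "networkName": "tiGO",
    "type3G": {"downloadSpeed": "516.0000", "uploadSpeed": "320.4211"}}
 ]}
'''

# 1) Parse the JSON string into a Python structure.
deserialized_json = json.loads(raw)

# 2) Copy the chosen top-level keys into every record of 'networkRank'.
for data_row in deserialized_json['networkRank']:
    for key in ('latitude', 'longitude', 'distance'):
        data_row[key] = deserialized_json[key]

# 3) Build the DataFrame from the list of records.
yourdataframe = pd.DataFrame(deserialized_json['networkRank'])
print(yourdataframe.columns.tolist())
```

Note that pd.DataFrame does not expand nested dictionaries, so with this approach type3G remains a single column whose cells are dicts.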
2

This function flattens dictionaries and lists by calling itself recursively.

from collections import OrderedDict 

def flatten(json_object, container=None, name=''): 
    if container is None: 
        container = OrderedDict() 
    if isinstance(json_object, dict): 
        for key in json_object: 
            flatten(json_object[key], container=container, name=name + key + '_') 
    elif isinstance(json_object, list): 
        for n, item in enumerate(json_object, 1): 
            flatten(item, container=container, name=name + str(n) + '_') 
    else: 
        container[str(name[:-1])] = str(json_object) 
    return container 

Examples:

flatten([1, 2, 3]) 
OrderedDict([('1', '1'), ('2', '2'), ('3', '3')]) 

flatten([1, 2, 3], name='x') 
OrderedDict([('x1', '1'), ('x2', '2'), ('x3', '3')]) 

flatten({'a': [1, 2, 3], 'b': 4, 'c': {'d': [5, 6], 'e': 7}}, name='x') 
OrderedDict([('xa_1', '1'), 
      ('xa_2', '2'), 
      ('xa_3', '3'), 
      ('xc_e', '7'), 
      ('xc_d_1', '5'), 
      ('xc_d_2', '6'), 
      ('xb', '4')]) 

With the response:

# j = the deserialized JSON response (a dict) 
>>> pd.DataFrame(flatten(j), index=[0]).T 
                 0 
perMinuteLimit          10 
distance            10 
perMonthCurrent          0 
longitude          35.751607 
perMonthLimit          2000 
latitude          -6.162959 
perMinuteCurrent          0 
networkRank_1_networkId       6402 
networkRank_1_type3G_sampleSizeSpeed     29 
networkRank_1_type3G_averageRssiAsu  9.5429091136 
networkRank_1_type3G_pingTime     320.9600 
networkRank_1_type3G_networkType      3 
networkRank_1_type3G_averageRssiDb -69.5664329624972 
networkRank_1_type3G_networkName    Vodacom 
networkRank_1_type3G_networkId      6402 
networkRank_1_type3G_downloadSpeed   1508.1304 
networkRank_1_type3G_uploadSpeed    893.7692 
networkRank_1_type3G_reliability  0.804236452826138 
networkRank_1_type3G_sampleSizeRSSI     948 
networkRank_1_networkName      Vodacom 
networkRank_2_networkId       6400 
networkRank_2_type3G_sampleSizeSpeed     21 
networkRank_2_type3G_averageRssiAsu  15.3537142857 
networkRank_2_type3G_pingTime     259.0000 
networkRank_2_type3G_networkType      3 
networkRank_2_type3G_averageRssiDb -61.4563389583101 
networkRank_2_type3G_networkName     tiGO 
networkRank_2_type3G_networkId      6400 
networkRank_2_type3G_downloadSpeed    516.0000 
networkRank_2_type3G_uploadSpeed    320.4211 
networkRank_2_type3G_reliability  0.911904765537807 
networkRank_2_type3G_sampleSizeRSSI     935 
networkRank_2_networkName       tiGO 
networkRank_3_networkId       6403 
networkRank_3_type3G_sampleSizeSpeed     21 
networkRank_3_type3G_averageRssiAsu  13.2729999375 
networkRank_3_type3G_pingTime     194.5556 
networkRank_3_type3G_networkType      3 
networkRank_3_type3G_averageRssiDb -58.1521092977699 
networkRank_3_type3G_networkName     Airtel 
networkRank_3_type3G_networkId      6403 
networkRank_3_type3G_downloadSpeed   1080.2500 
networkRank_3_type3G_uploadSpeed    572.1579 
networkRank_3_type3G_reliability  0.554680264185345 
networkRank_3_type3G_sampleSizeRSSI     587 
networkRank_3_networkName      Airtel 
network_type          None 
apiVersion           2 
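
If one row per network is preferred over a single long column, one possible variation is to apply flatten to each networkRank record separately and then attach the top-level values. The sketch below repeats the function so it runs on its own; the dictionary `j` is a trimmed, hypothetical stand-in for the response, and the list of top-level keys is an assumption taken from the question's goal.

```python
from collections import OrderedDict

import pandas as pd


def flatten(json_object, container=None, name=''):
    """Same recursive flattener as above, repeated so this runs standalone."""
    if container is None:
        container = OrderedDict()
    if isinstance(json_object, dict):
        for key in json_object:
            flatten(json_object[key], container=container, name=name + key + '_')
    elif isinstance(json_object, list):
        for n, item in enumerate(json_object, 1):
            flatten(item, container=container, name=name + str(n) + '_')
    else:
        container[str(name[:-1])] = str(json_object)
    return container


# Trimmed, hypothetical stand-in for the deserialized response.
j = {
    'distance': 10,
    'latitude': '-6.162959',
    'longitude': '35.751607',
    'networkRank': [
        {'networkId': '6402', 'networkName': 'Vodacom',
         'type3G': {'downloadSpeed': '1508.1304'}},
        {'networkId': '6400', 'networkName': 'tiGO',
         'type3G': {'downloadSpeed': '516.0000'}},
    ],
}

# Flatten each network record on its own so that each becomes one row.
rows = [flatten(record) for record in j['networkRank']]
df = pd.DataFrame(rows)

# Attach the top-level scalars to every row.
for key in ('latitude', 'longitude', 'distance'):
    df[key] = j[key]

print(df)
```

Because flatten converts every leaf to str, the flattened columns hold strings; only the values attached afterwards keep their original types.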
0

Is this what you want?

In [22]: df = json_normalize(json_data['networkRank']) 

In [23]: df['distance'] = json_data['distance'] 

In [24]: df['latitude'] = json_data['latitude'] 

In [25]: df['longitude'] = json_data['longitude'] 

In [26]: df 
Out[26]: 
    networkId networkName type3G.averageRssiAsu type3G.averageRssiDb \ 
0  6402  Vodacom   9.5429091136 -69.5664329624972 
1  6400  tiGO   15.3537142857 -61.4563389583101 
2  6403  Airtel   13.2729999375 -58.1521092977699 

    type3G.downloadSpeed type3G.networkId type3G.networkName type3G.networkType \ 
0   1508.1304    6402   Vodacom     3 
1    516.0000    6400    tiGO     3 
2   1080.2500    6403    Airtel     3 

    type3G.pingTime type3G.reliability type3G.sampleSizeRSSI \ 
0  320.9600 0.804236452826138     948 
1  259.0000 0.911904765537807     935 
2  194.5556 0.554680264185345     587 

    type3G.sampleSizeSpeed type3G.uploadSpeed distance latitude longitude 
0      29   893.7692  10 -6.162959 35.751607 
1      21   320.4211  10 -6.162959 35.751607 
2      21   572.1579  10 -6.162959 35.751607 
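
The same table can probably be produced in a single call by passing record_path and meta to json_normalize, which avoids assigning the top-level columns one by one. This is a sketch under assumptions: it uses pd.json_normalize, the name under which the function is exposed in pandas 1.0 and later, and `json_data` here is a trimmed, hypothetical stand-in for the full response.

```python
import pandas as pd

# Trimmed, hypothetical stand-in for the deserialized response.
json_data = {
    'distance': 10,
    'latitude': '-6.162959',
    'longitude': '35.751607',
    'networkRank': [
        {'networkId': '6402', 'networkName': 'Vodacom',
         'type3G': {'downloadSpeed': '1508.1304', 'uploadSpeed': '893.7692'}},
        {'networkId': '6400', 'networkName': 'tiGO',
         'type3G': {'downloadSpeed': '516.0000', 'uploadSpeed': '320.4211'}},
    ],
}

# record_path selects the list whose items become rows; nested dicts inside
# each record are expanded into dotted column names such as
# 'type3G.downloadSpeed'. meta lists top-level keys repeated on every row.
df = pd.json_normalize(
    json_data,
    record_path='networkRank',
    meta=['distance', 'latitude', 'longitude'],
)
print(df.columns.tolist())
```

The meta entries must be reachable from the top level of the document; a meta path that passes through the networkRank list itself cannot be resolved that way.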