Reading sub-level JSON data using Pandas

I am stuck trying to read sub-level (nested) data using Pandas.

Background: I am using the NYT Archive API to download a set of data, and I save it to a JSON file that actually holds a list of JSON objects.

Steps:

I read the JSON file using the read_json method.

pandas_df = pd.read_json("data.json")

When I look at a sample of the result using head(), it looks like this:

pandas_df.head() 
    copyright \ 
0 Copyright (c) 2013 The New York Times Company.... 
1 Copyright (c) 2013 The New York Times Company.... 
2 Copyright (c) 2013 The New York Times Company.... 
3 Copyright (c) 2013 The New York Times Company.... 
4 Copyright (c) 2013 The New York Times Company.... 

              response 
0 {'docs': [{'subsection_name': None, 'slideshow... 
1 {'docs': [{'subsection_name': None, 'slideshow... 
2 {'docs': [{'subsection_name': None, 'slideshow... 
3 {'docs': [{'subsection_name': None, 'slideshow... 
4 {'docs': [{'subsection_name': None, 'slideshow...

I only need the information in the response column. So if I change the code as follows:

print(pandas_df["response"].head()) 
0 {'docs': [{'subsection_name': None, 'slideshow... 
1 {'docs': [{'subsection_name': None, 'slideshow... 
2 {'docs': [{'subsection_name': None, 'slideshow... 
3 {'docs': [{'subsection_name': None, 'slideshow... 
4 {'docs': [{'subsection_name': None, 'slideshow... 
Name: response, dtype: object

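Question:

How can I pull out the data using the elements inside docs, such as subsection_name, slideshow_credits, and so on, so that I can view it in a tabular form like a DataFrame?

Let me know if you need more information.

Thank you.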

EDIT 1:

Adding the first element from the JSON file. The whole file is too large to post (about 1 GB).

{ 
    "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.", 
    "response": { 
    "meta": { 
     "hits": 7652 
    }, 
    "docs": [ 
     { 
     "web_url": "http://www.nytimes.com/interactive/2016/technology/personaltech/cord-cutting-guide.html", 
     "snippet": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.", 
     "lead_paragraph": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.", 
     "abstract": null, 
     "print_page": null, 
     "blog": [], 
     "source": "The New York Times", 
     "multimedia": [ 
      { 
      "width": 190, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg", 
      "height": 126, 
      "subtype": "wide", 
      "legacy": { 
       "wide": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg", 
       "wideheight": "126", 
       "widewidth": "190" 
      }, 
      "type": "image" 
      }, 
      { 
      "width": 600, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg", 
      "height": 346, 
      "subtype": "xlarge", 
      "legacy": { 
       "xlargewidth": "600", 
       "xlarge": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg", 
       "xlargeheight": "346" 
      }, 
      "type": "image" 
      }, 
      { 
      "width": 75, 
      "url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg", 
      "height": 75, 
      "subtype": "thumbnail", 
      "legacy": { 
       "thumbnailheight": "75", 
       "thumbnail": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg", 
       "thumbnailwidth": "75" 
      }, 
      "type": "image" 
      } 
     ], 
     "headline": { 
      "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits", 
      "kicker": "Tech Fix" 
     }, 
     "keywords": [ 
      { 
      "rank": "1", 
      "is_major": "N", 
      "name": "subject", 
      "value": "Video Recordings, Downloads and Streaming" 
      }, 
      { 
      "rank": "2", 
      "is_major": "N", 
      "name": "subject", 
      "value": "Television Sets and Media Devices" 
      }, 
      { 
      "rank": "1", 
      "is_major": "Y", 
      "name": "subject", 
      "value": "Television" 
      } 
     ], 
     "pub_date": "2016-01-01T05:00:00Z", 
     "document_type": "multimedia", 
     "news_desk": "Technology/Personal Tech", 
     "section_name": "Technology", 
     "subsection_name": "Personal Tech", 
     "byline": { 
      "person": [ 
      { 
       "firstname": "Brian", 
       "middlename": "X.", 
       "lastname": "CHEN", 
       "rank": 1, 
       "role": "reported", 
       "organization": "" 
      } 
      ], 
      "original": "By BRIAN X. CHEN" 
     }, 
     "type_of_material": "Interactive Feature", 
     "_id": "57fdfb9895d0e022439c2b57", 
     "word_count": null, 
     "slideshow_credits": null 
     }]}}

Source

2017-03-31 disp_name

Can you post the entire raw JSON for the first few rows?

Added. Please take a look.

I want to read most of the values inside "docs".

You should be able to extract all the elements under the docs list, which is nested inside the response dictionary, into a DataFrame.

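import json
import pandas as pd

# Load the whole file; this assumes the top level is a single JSON object
with open('data.json') as f:
    data = json.load(f)
# Build one row per entry in the nested "docs" list
df = pd.DataFrame(data['response']['docs'])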
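Some of the values inside docs (headline, byline) are themselves dictionaries, so with a plain DataFrame they end up as dict-valued cells. If you want them as separate columns, one option is pd.json_normalize (available as a top-level function in pandas 1.0 and later). The sketch below is only an illustration: the data variable is a hypothetical, trimmed copy of the sample record from the question, not the real file.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for one record of the file (assumption:
# same shape as the sample posted in the question).
data = {
    "response": {
        "docs": [
            {
                "web_url": "http://www.nytimes.com/interactive/2016/technology/personaltech/cord-cutting-guide.html",
                "headline": {
                    "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
                    "kicker": "Tech Fix",
                },
                "section_name": "Technology",
                "subsection_name": "Personal Tech",
                "byline": {
                    "person": [{"firstname": "Brian", "lastname": "CHEN"}],
                    "original": "By BRIAN X. CHEN",
                },
                "slideshow_credits": None,
            }
        ]
    }
}

# Nested dicts are flattened into dotted column names such as
# "headline.main"; nested lists (e.g. "byline.person") stay as list cells.
flat_df = pd.json_normalize(data["response"]["docs"])

print(flat_df[["headline.main", "subsection_name", "byline.original"]])
```

List-valued fields such as multimedia and keywords are left as lists in a single cell here; they would need separate handling if you want one row per item.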

Source

2017-03-31 14:50:34

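The last line raises an error: "TypeError: list indices must be integers or slices, not str". Why is that? Is it because I am reading a file that contains multiple JSON objects inside a list?

I modified the JSON input slightly, adding a closing bracket and two closing braces. Copy that exact JSON directly into a file and rerun the code. It should work.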
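If the file really is a list of such objects, as described in the question, then data is a Python list and data['response'] fails with exactly that TypeError. A minimal sketch of one way around it, assuming every element of the list has a response dictionary with a docs list, is to collect the docs from each element first. The data variable and its values below are made up for illustration and stand in for the result of json.load(f).

```python
import pandas as pd

# Hypothetical stand-in for json.load(f) when the file holds a LIST of
# API responses (assumption: each element has response -> docs).
data = [
    {
        "copyright": "placeholder",
        "response": {
            "meta": {"hits": 2},
            "docs": [
                {"_id": "a1", "section_name": "Technology"},
                {"_id": "a2", "section_name": "World"},
            ],
        },
    },
    {
        "copyright": "placeholder",
        "response": {
            "meta": {"hits": 1},
            "docs": [{"_id": "b1", "section_name": "Sports"}],
        },
    },
]

# Indexing the list with a string is what raises the TypeError, so walk
# the list and gather every doc from every element into one flat list.
all_docs = [doc for item in data for doc in item["response"]["docs"]]

# One row per doc, across all elements of the file.
df = pd.DataFrame(all_docs)
print(df)
```

Note that json.load still reads the whole file into memory, which may matter for a file of around 1 GB.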
