Reading sub-level JSON data with pandas
I'm stuck trying to read sub-level data with pandas.
Background: I'm using the NYT Archive API to download a set of data, and I save it to a JSON file that contains a list of JSON objects.
Steps:
I read the JSON file with the read_json method.
import pandas as pd
pandas_df = pd.read_json("data.json")
When I look at a sample of the result with head(), it looks like this:
pandas_df.head()
copyright \
0 Copyright (c) 2013 The New York Times Company....
1 Copyright (c) 2013 The New York Times Company....
2 Copyright (c) 2013 The New York Times Company....
3 Copyright (c) 2013 The New York Times Company....
4 Copyright (c) 2013 The New York Times Company....
response
0 {'docs': [{'subsection_name': None, 'slideshow...
1 {'docs': [{'subsection_name': None, 'slideshow...
2 {'docs': [{'subsection_name': None, 'slideshow...
3 {'docs': [{'subsection_name': None, 'slideshow...
4 {'docs': [{'subsection_name': None, 'slideshow...
I only need the information in the response column, so I changed the code as follows:
print(pandas_df["response"].head())
0 {'docs': [{'subsection_name': None, 'slideshow...
1 {'docs': [{'subsection_name': None, 'slideshow...
2 {'docs': [{'subsection_name': None, 'slideshow...
3 {'docs': [{'subsection_name': None, 'slideshow...
4 {'docs': [{'subsection_name': None, 'slideshow...
Name: response, dtype: object
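The `dtype: object` output suggests that each cell in the `response` column is a plain Python dict rather than a flattened row. The following is a minimal sketch for checking that, assuming a hypothetical one-record stand-in for data.json (the `records` list is made up for illustration and keeps only a few fields):

```python
import io
import json

import pandas as pd

# Hypothetical one-record stand-in for data.json, trimmed to a few fields.
records = [
    {
        "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
        "response": {
            "meta": {"hits": 7652},
            "docs": [
                {"section_name": "Technology", "subsection_name": "Personal Tech"}
            ],
        },
    }
]

# Read the JSON text the same way the file is read in the steps above.
pandas_df = pd.read_json(io.StringIO(json.dumps(records)))

# Each cell of the "response" column holds the nested object as a dict.
first = pandas_df["response"].iloc[0]
print(type(first))          # type of a single cell
print(list(first.keys()))   # top-level keys inside "response"
print(len(first["docs"]))   # number of documents in this record
```

If the cells are dicts, the nested `docs` list has to be unpacked explicitly before it can be shown as columns.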
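Question:
How can I extract data using the elements inside docs, such as subsection, slideshow, and so on? Can it be viewed in a tabular form like a DataFrame?
Please let me know if you need more information.
Thank you.
EDIT 1:
I'm adding the first element from the JSON file. The whole file is too large to post, about 1 GB.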
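One possible way to get a tabular view is `pd.json_normalize` (available as a top-level function since pandas 1.0), pointing `record_path` at the nested `docs` list. This is only a sketch under assumptions: the `records` list below is a hypothetical, trimmed-down stand-in shaped like the NYT Archive API response described in this post, not the real file.

```python
import pandas as pd

# Hypothetical records shaped like the downloaded data, trimmed to a few fields.
records = [
    {
        "copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
        "response": {
            "meta": {"hits": 7652},
            "docs": [
                {
                    "_id": "57fdfb9895d0e022439c2b57",
                    "headline": {
                        "main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
                        "kicker": "Tech Fix",
                    },
                    "section_name": "Technology",
                    "subsection_name": "Personal Tech",
                    "slideshow_credits": None,
                    "byline": {"original": "By BRIAN X. CHEN"},
                }
            ],
        },
    }
]

# One row per entry in response -> docs; nested dicts become dotted columns
# such as "headline.main" and "byline.original".
docs_df = pd.json_normalize(records, record_path=["response", "docs"])

print(docs_df[["section_name", "subsection_name", "headline.main", "byline.original"]])
```

With the real file, `records` could come from `json.load` on data.json, assuming the file is a single JSON array and fits in memory; a file of about 1 GB may need to be processed in pieces instead.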
{
"copyright": "Copyright (c) 2013 The New York Times Company. All Rights Reserved.",
"response": {
"meta": {
"hits": 7652
},
"docs": [
{
"web_url": "http://www.nytimes.com/interactive/2016/technology/personaltech/cord-cutting-guide.html",
"snippet": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.",
"lead_paragraph": "We teamed up with The Wirecutter to come up with cord-cutter bundles for movie buffs, sports addicts, fans of premium TV shows, binge watchers and families with children.",
"abstract": null,
"print_page": null,
"blog": [],
"source": "The New York Times",
"multimedia": [
{
"width": 190,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg",
"height": 126,
"subtype": "wide",
"legacy": {
"wide": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbWide.jpg",
"wideheight": "126",
"widewidth": "190"
},
"type": "image"
},
{
"width": 600,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg",
"height": 346,
"subtype": "xlarge",
"legacy": {
"xlargewidth": "600",
"xlarge": "images/2016/10/13/business/13TECHFIX/06TECHFIX-articleLarge.jpg",
"xlargeheight": "346"
},
"type": "image"
},
{
"width": 75,
"url": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg",
"height": 75,
"subtype": "thumbnail",
"legacy": {
"thumbnailheight": "75",
"thumbnail": "images/2016/10/13/business/13TECHFIX/06TECHFIX-thumbStandard.jpg",
"thumbnailwidth": "75"
},
"type": "image"
}
],
"headline": {
"main": "The Definitive Guide to Cord-Cutting in 2016, Based on Your Habits",
"kicker": "Tech Fix"
},
"keywords": [
{
"rank": "1",
"is_major": "N",
"name": "subject",
"value": "Video Recordings, Downloads and Streaming"
},
{
"rank": "2",
"is_major": "N",
"name": "subject",
"value": "Television Sets and Media Devices"
},
{
"rank": "1",
"is_major": "Y",
"name": "subject",
"value": "Television"
}
],
"pub_date": "2016-01-01T05:00:00Z",
"document_type": "multimedia",
"news_desk": "Technology/Personal Tech",
"section_name": "Technology",
"subsection_name": "Personal Tech",
"byline": {
"person": [
{
"firstname": "Brian",
"middlename": "X.",
"lastname": "CHEN",
"rank": 1,
"role": "reported",
"organization": ""
}
],
"original": "By BRIAN X. CHEN"
},
"type_of_material": "Interactive Feature",
"_id": "57fdfb9895d0e022439c2b57",
"word_count": null,
"slideshow_credits": null
}]}}
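The sample above also contains list-valued fields inside each document, such as keywords and multimedia, which do not fit into a single row per document. A minimal sketch of one way to handle them, assuming the documents have already been collected into a plain list of dicts (the `docs` list here is a hypothetical excerpt of the sample, keeping only `_id` and `keywords`):

```python
import pandas as pd

# Hypothetical excerpt of one document from the sample, keeping only two fields.
docs = [
    {
        "_id": "57fdfb9895d0e022439c2b57",
        "keywords": [
            {"rank": "1", "is_major": "N", "name": "subject",
             "value": "Video Recordings, Downloads and Streaming"},
            {"rank": "2", "is_major": "N", "name": "subject",
             "value": "Television Sets and Media Devices"},
            {"rank": "1", "is_major": "Y", "name": "subject",
             "value": "Television"},
        ],
    }
]

# One row per keyword; "_id" is carried along so each keyword can be
# joined back to the document it came from.
keywords_df = pd.json_normalize(docs, record_path="keywords", meta=["_id"])

print(keywords_df)
```

The same pattern should apply to multimedia by changing `record_path`; keeping `_id` as meta gives a key for merging the per-keyword table back onto a per-document table.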
Can you post the complete raw JSON for the first few rows?
Added. Please take a look.
I want to read most of the values inside "docs".