2017-03-21 6 views
1

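How do I parse the following JSON using pandas?

Hello, I am working with a JSON file that contains several conversations. The format is as follows; everything from one opening bracket to its matching closing bracket is one complete conversation: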

[ 
    { 
     "created": "2017-02-02T11:57:41+0000", 
     "from": "Bank", 
     "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks." 
    }, 
    { 
     "created": "2017-02-01T22:19:58+0000" , 
     "from": "Alex ", 
     "message": "Could someone please help me?, I am callig to CC and they don't answer" 
    }, 
    { 
     "created": "2017-02-01T22:19:42+0000", 
     "from": "Alex ", 
     "message": "the sms with the corresponding key and token has not arrived" 
    }, 
    { 
     "created": "2017-02-01T22:19:28+0000", 
     "from": "Alex ", 
     "message": "I have issues to make payments from the app" 
    }, 
    { 
     "created": "2017-02-01T22:19:18+0000", 
     "from": "Alex ", 
     "message": "Good afternoon" 
    } 
], 

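I would like to parse this JSON so that the questions end up in one column and the answers, which are always provided by the bank, end up in a second column matched to them. For the first interaction, the result would look like this: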

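All of the user's comments:

"Good afternoon, I have issues to make payments from the app, the sms with the corresponding key and token has not arrived, Could someone please help me?, I am callig to CC and they don't answer"

All of the answers:

"Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."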

My desired output is to parse the whole JSON and build these two columns. Note that the messages can be ordered by their date and time. To get this, I tried:

import json
import numpy as np
import pandas as pd

with open('/home/adolfo/Desktop/CONVERSATIONS/test2.json') as json_data: 
    d = json.load(json_data) 
    df = pd.DataFrame.from_records(np.concatenate(d)) 

print(df) 

But I got:

     created from \ 
0 2017-02-02T11:57:41+0000 Bank 
1 2017-02-01T22:19:58+0000 Alex  
2 2017-02-01T22:19:42+0000 Alex  
3 2017-02-01T22:19:28+0000 Alex  
4 2017-02-01T22:19:18+0000 Alex  
5 2017-02-02T11:57:41+0000 Bank 
6 2017-02-01T22:19:58+0000 Alex  
7 2017-02-01T22:19:42+0000 Alex  
8 2017-02-01T22:19:28+0000 Alex  
9 2017-02-01T22:19:18+0000 Alex  
10 2017-02-01T22:19:12+0000 Bank 
11 2017-02-01T16:22:30+0000 Alex 

               message 
0 Hi Alex, if you have not perform the modificat... 
1 Could someone please help me?, I am callig to ... 
2 the sms with the corresponding key and token h... 
3   I have issues to make payments from the app 
4          Good afternoon 
5 Hi Alex, if you have not perform the modificat... 
6 Could someone please help me?, I am callig to ... 
7 the sms with the corresponding key and token h... 
8   I have issues to make payments from the app 
9          Good afternoon 
10 Hello Alexander, the money is available to be... 
11 hello they have deposited the money into my ac... 
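
For reference, the output above is a single flat table: `np.concatenate` merges all conversations into one sequence, so the table no longer records where one conversation ends and the next begins. A minimal sketch of one way to keep that information, assuming the file has the nested list-of-lists shape shown in this question. The sample messages are shortened, and the names `raw`, `rows` and `conversation` are illustrative, not part of the original code.

```python
import json
import pandas as pd

# Shortened sample in the same nested shape as the question's JSON (assumption)
raw = """[
  [
    {"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex"},
    {"created": "2017-02-01T22:19:28+0000", "from": "Alex ", "message": "I have issues"},
    {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"}
  ],
  [
    {"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Alexander"},
    {"created": "2017-02-01T16:22:30+0000", "from": "Alex", "message": "hello"}
  ]
]"""

conversations = json.loads(raw)

# Flatten the nested lists, tagging every message with its conversation number
rows = [
    {"conversation": i, **msg}
    for i, conv in enumerate(conversations)
    for msg in conv
]
df = pd.DataFrame(rows)

# Parse the timestamps and order the messages chronologically per conversation
df["created"] = pd.to_datetime(df["created"])
df = df.sort_values(["conversation", "created"]).reset_index(drop=True)

print(df[["conversation", "created", "from"]])
```

With a `conversation` column in place, later grouping steps can treat each conversation separately instead of mixing all messages together.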

So I would really appreciate support in achieving this task, using this JSON example:

[ 
    [ 
     { 
      "created": "2017-02-02T11:57:41+0000", 
      "from": "Bank", 
      "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks." 
     }, 
     { 
      "created": "2017-02-01T22:19:58+0000" , 
      "from": "Alex ", 
      "message": "Could someone please help me?, I am callig to CC and they don't answer" 
     }, 
     { 
      "created": "2017-02-01T22:19:42+0000", 
      "from": "Alex ", 
      "message": "the sms with the corresponding key and token has not arrived" 
     }, 
     { 
      "created": "2017-02-01T22:19:28+0000", 
      "from": "Alex ", 
      "message": "I have issues to make payments from the app" 
     }, 
     { 
      "created": "2017-02-01T22:19:18+0000", 
      "from": "Alex ", 
      "message": "Good afternoon" 
     } 
    ], 
    [ 
     { 
      "created": "2017-02-01T22:19:12+0000", 
      "from": "Bank", 
      "message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459" 
     }, 
     {    
      "created": "2017-02-01T16:22:30+0000", 
      "from": "Alex", 
      "message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot" 
     } 

    ] 


] 

After the useful feedback here, I tried:

df = pd.read_json('/home/adolfo/Desktop/CONVERSATIONS/test2.json') 

df.created = pd.to_datetime(df.created) 

df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='') 

I got:

--------------------------------------------------------------------------- 
AttributeError       Traceback (most recent call last) 
<ipython-input-44-8881c5d91cd0> in <module>() 
    63 df = pd.read_json('/home/adolfo/Desktop/CONVERSATIONS/test2.json') 
    64 
---> 65 df.created = pd.to_datetime(df.created) 
    66 
    67 df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='') 

/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py in __getattr__(self, name) 
    2742    if name in self._info_axis: 
    2743     return self[name] 
-> 2744    return object.__getattribute__(self, name) 
    2745 
    2746  def __setattr__(self, name, value): 

AttributeError: 'DataFrame' object has no attribute 'created' 
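
A likely reason for this error, sketched under the assumption that the file is a list of lists as in the example above: `pd.read_json` turns each inner list into a row, so the columns are integer positions and every cell holds a whole message dict. No `created` column exists until the lists are flattened. The sample data and the names `nested` and `flat` are illustrative.

```python
import io
import json
import pandas as pd

# Shortened sample in the same nested shape as the question's JSON (assumption)
raw = """[
  [
    {"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex"},
    {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"}
  ],
  [
    {"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Alexander"}
  ]
]"""

# Reading the nested structure directly: one row per conversation,
# with integer column labels rather than 'created', 'from', 'message'
nested = pd.read_json(io.StringIO(raw))
print(nested.columns.tolist())
has_created = "created" in nested.columns

# Flattening the inner lists first gives one row per message
flat = pd.DataFrame([msg for conv in json.loads(raw) for msg in conv])
flat["created"] = pd.to_datetime(flat["created"])
print(flat.columns.tolist())
```

Running `print(df.head())` on the real data right after loading is a quick way to check which of the two shapes was produced.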
+0

It's not clear to me what is wrong with the default parsing. –

+0

@StephenRauch The problem with the first approach is that the result is not sorted the way I need. I need two columns: the first with all the comments and the second with all the answers. – neo33

+0

How do you distinguish between comments and answers? –

Answers

1
import json
import numpy as np
import pandas as pd

j = """[ 
    [ 
     { 
      "created": "2017-02-02T11:57:41+0000", 
      "from": "Bank", 
      "message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks." 
     }, 
     { 
      "created": "2017-02-01T22:19:58+0000" , 
      "from": "Alex ", 
      "message": "Could someone please help me?, I am callig to CC and they don't answer" 
     }, 
     { 
      "created": "2017-02-01T22:19:42+0000", 
      "from": "Alex ", 
      "message": "the sms with the corresponding key and token has not arrived" 
     }, 
     { 
      "created": "2017-02-01T22:19:28+0000", 
      "from": "Alex ", 
      "message": "I have issues to make payments from the app" 
     }, 
     { 
      "created": "2017-02-01T22:19:18+0000", 
      "from": "Alex ", 
      "message": "Good afternoon" 
     } 
    ], 
    [ 
     { 
      "created": "2017-02-01T22:19:12+0000", 
      "from": "Bank", 
      "message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459" 
     }, 
     {    
      "created": "2017-02-01T16:22:30+0000", 
      "from": "Alex", 
      "message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot" 
     } 

    ] 


]""" 

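# Parse the JSON text into a list of conversations (each a list of message dicts)
js = json.loads(j) 
# Build one DataFrame per conversation, keyed by its position in the outer list
df = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(js)}) 

df.created = pd.to_datetime(df.created) 

# Label each message, then pivot so that 'Answer' and 'Question' become columns
df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='') 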
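
The pivot above yields one row per timestamp. If the goal is instead one row per conversation, with all questions joined in one column and all answers in the other (as the question describes), the following is a minimal sketch of one possible extension. It is not part of the original answer; the shortened sample data, the `role` and `conversation` names, and the choice of `", "` as separator are assumptions.

```python
import json
import numpy as np
import pandas as pd

# Shortened sample in the same nested shape as the question's JSON (assumption)
raw = """[
  [
    {"created": "2017-02-02T11:57:41+0000", "from": "Bank", "message": "Hi Alex, please verify your DNI."},
    {"created": "2017-02-01T22:19:28+0000", "from": "Alex ", "message": "I have issues to make payments from the app"},
    {"created": "2017-02-01T22:19:18+0000", "from": "Alex ", "message": "Good afternoon"}
  ],
  [
    {"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "Hello Alexander, the money is available"},
    {"created": "2017-02-01T16:22:30+0000", "from": "Alex", "message": "hello they have deposited the money"}
  ]
]"""

conversations = json.loads(raw)

# One frame per conversation; the outer index level becomes a 'conversation' column
df = pd.concat(
    {i: pd.DataFrame(conv) for i, conv in enumerate(conversations)},
    names=["conversation", "row"],
).reset_index(level="conversation")

df["created"] = pd.to_datetime(df["created"])
# Strip whitespace first, since some sender names carry a trailing space ("Alex ")
df["role"] = np.where(df["from"].str.strip() == "Bank", "Answer", "Question")

# Sort chronologically, then join the messages of each role within each conversation
result = (
    df.sort_values("created")
      .groupby(["conversation", "role"])["message"]
      .agg(", ".join)
      .unstack(fill_value="")
)
print(result)
```

Grouping by conversation also avoids relying on the timestamps being unique, which the `set_index(...).unstack()` pivot requires.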


+0

Hello, thank you very much, but with the actual dataset I – neo33

+0

Your example works really well, but when I put in the path to my real data I don't know what is happening – neo33

+0

@neo33 Please edit your post to include the output of `print(df.head())` – piRSquared
