Reading MongoDB with pandas using Python 3.5

My tweets database in MongoDB has the schema below. I would like to read it into a pandas DataFrame with a separate column for each field. I also need the inner components of hashtags, namely text and indices.

{ 
     "_id" : ObjectId("5a11200441f0c41f447ce56c"), 
     "created" : ISODate("2017-11-19T06:09:06Z"), 
     "text" : "#Bitcoin Hong Kong's bitcoin businesses suffer after local bank accounts frozen", 
     "username" : "PennyStocksMomo", 
     "hashtags" : [ 
       { 
         "text" : "Bitcoin", 
         "indices" : [ 
           0, 
           8 
         ] 
       } 
     ], 
     "language" : "en", 
     "id" : "932128582767296512", 
     "followers" : 5715 
}

EDIT:

I was using the code below.

import pandas as pd 
from pymongo import MongoClient 
client = MongoClient() 
db = client.BitCoinDatabase 
collection = db.tweets 
data = pd.DataFrame(list(collection.find())) 



_id created followers hashtags id language text username 
0 5a11200441f0c41f447ce56c 2017-11-19 06:09:06 5715 [{'text': 'Bitcoin', 'indices': [0, 8]}] 932128582767296512 en #Bitcoin Hong Kong's bitcoin businesses suffer... PennyStocksMomo 
1 5a11200441f0c41f447ce56d 2017-11-19 06:09:06 19526 [{'text': 'Bitcoin', 'indices': [0, 8]}] 932128583077675008 en #Bitcoin Hong Kong's bitcoin businesses suffer... CryptoTraderPro

Source

2017-11-19 Amar Kumar

I believe [this](https://stackoverflow.com/q/16249736/2901002) helps. – jezrael

I can read the data, but I also need the text and indices of the hashtags, as shown in the edit. –

Does each row of 'hashtags' hold only one 'dict' in its 'list'? – jezrael

pymongo

Create a DataFrame from the hashtags column and join it to the original.

Note: this solution works only if the list in each row of the hashtags column contains a single dict.

df = df.join(pd.DataFrame(df['hashtags'].str[0].values.tolist()).add_suffix('_hash')) 
print (df) 
         _id   created followers \ 
0 5a11200441f0c41f447ce56c 2017-11-19 06:09:06 5715 
1 5a11200441f0c41f447ce56d 2017-11-19 06:09:06 19526 

            hashtags     id language \ 
0 [{'text': 'Bitcoin', 'indices': [0, 8]}] 932128582767296512  en 
1 [{'text': 'Bitcoin', 'indices': [0, 8]}] 932128583077675008  en 

               text   username \ 
0 #Bitcoin Hong Kong's bitcoin businesses suffer... PennyStocksMomo 
1 #Bitcoin Hong Kong's bitcoin businesses suffer... CryptoTraderPro 

    indices_hash text_hash 
0  [0, 8] Bitcoin 
1  [0, 8] Bitcoin
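
The join above keeps only the first dict of each list. If a tweet can carry several hashtags, one possible approach is to build one row per (tweet, hashtag) pair and merge it back on the tweet's index. The sketch below is an illustration only: the sample rows, the second hashtag, and the names tag_rows, tags_df, long_df and tweet_index are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical sample shaped like the tweet documents above (not real data).
data = pd.DataFrame({
    "username": ["PennyStocksMomo", "CryptoTraderPro", "NoTagsUser"],
    "hashtags": [
        [{"text": "Bitcoin", "indices": [0, 8]},
         {"text": "HongKong", "indices": [9, 18]}],
        [{"text": "Bitcoin", "indices": [0, 8]}],
        float("nan"),  # a tweet stored without hashtags
    ],
})

# One record per (tweet, hashtag) pair; non-list values such as NaN are skipped.
tag_rows = [
    {"tweet_index": i, "text_hash": tag["text"], "indices_hash": tag["indices"]}
    for i, tags in data["hashtags"].items()
    if isinstance(tags, list)
    for tag in tags
]
tags_df = pd.DataFrame(tag_rows, columns=["tweet_index", "text_hash", "indices_hash"])

# Left merge keeps tweets without hashtags; their *_hash columns stay NaN.
long_df = data.drop(columns="hashtags").merge(
    tags_df, left_index=True, right_on="tweet_index", how="left"
)
print(long_df)
```

The result is in long format, so a tweet with two hashtags appears twice. Whether that or one wide row per tweet is preferable depends on how the hashtags are used afterwards.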

EDIT:

AttributeError: 'float' object has no attribute 'keys'

This means that some of the values are NaN.

I tried to simulate it:

print (df) 
         _id   created followers \ 
0 5a11200441f0c41f447ce56c 2017-11-19 06:09:06 5715 
1 5a11200441f0c41f447ce56d 2017-11-19 06:09:06 19526 
2 5a11200441f0c41f447ce56c 2017-11-19 06:09:06 5715 

            hashtags     id language \ 
0 [{'text': 'Bitcoin', 'indices': [0, 8]}] 932128582767296512  en 
1          NaN 932128583077675008  en 
2 [{'text': 'Bitcoin', 'indices': [0, 8]}] 932128582767296512  en 

               text   username 
0 #Bitcoin Hong Kong's bitcoin businesses suffer... PennyStocksMomo 
1 #Bitcoin Hong Kong's bitcoin businesses suffer... CryptoTraderPro 
2 #Bitcoin Hong Kong's bitcoin businesses suffer... PennyStocksMomo

The solution is to remove the NaNs first, then pass the index parameter to the DataFrame constructor so that the new rows align with the original data:

hashtags = df['hashtags'].dropna() 
df = df.join(pd.DataFrame(hashtags.str[0].values.tolist(), 
      index=hashtags.index).add_suffix('_hash')) 
print (df) 
         _id   created followers \ 
0 5a11200441f0c41f447ce56c 2017-11-19 06:09:06 5715 
1 5a11200441f0c41f447ce56d 2017-11-19 06:09:06 19526 
2 5a11200441f0c41f447ce56c 2017-11-19 06:09:06 5715 

            hashtags     id language \ 
0 [{'text': 'Bitcoin', 'indices': [0, 8]}] 932128582767296512  en 
1          NaN 932128583077675008  en 
2 [{'text': 'Bitcoin', 'indices': [0, 8]}] 932128582767296512  en 

               text   username \ 
0 #Bitcoin Hong Kong's bitcoin businesses suffer... PennyStocksMomo 
1 #Bitcoin Hong Kong's bitcoin businesses suffer... CryptoTraderPro 
2 #Bitcoin Hong Kong's bitcoin businesses suffer... PennyStocksMomo 

    indices_hash text_hash 
0  [0, 8] Bitcoin 
1   NaN  NaN 
2  [0, 8] Bitcoin
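
The two steps above can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name add_first_hashtag, its parameters, and the sample frame are hypothetical. It additionally checks that the column exists and relies on .str[0] returning NaN for empty lists, so those rows are dropped together with the NaN rows.

```python
import pandas as pd


def add_first_hashtag(df, column="hashtags", suffix="_hash"):
    """Join the first hashtag dict of each row as new columns (hypothetical helper)."""
    if column not in df.columns:
        # Failing early gives a clearer message than a bare KeyError later on.
        raise KeyError(
            "column {!r} not found; available: {}".format(column, list(df.columns))
        )
    # .str[0] yields NaN for NaN cells and for empty lists, so dropna removes both.
    first = df[column].str[0].dropna()
    # Reusing first.index aligns the new columns with the original rows.
    parts = pd.DataFrame(first.tolist(), index=first.index).add_suffix(suffix)
    return df.join(parts)


# Hypothetical sample: one normal row, one NaN, one empty list.
sample = pd.DataFrame({
    "username": ["a", "b", "c"],
    "hashtags": [[{"text": "Bitcoin", "indices": [0, 8]}], float("nan"), []],
})
result = add_first_hashtag(sample)
print(result)
```

Rows without a usable hashtag keep NaN in the text_hash and indices_hash columns, as in the output above.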

Source

2017-11-19 08:47:58 jezrael

This was helpful and I upvoted it, but it did not work in my case. –

What is the problem? – jezrael

I am using this: df = df.join(pd.DataFrame(df['hashtags'].str[0].values.tolist()).add_suffix('_hash')) and I get a KeyError for hashtags, where df = data (the DataFrame from the code above). –
