Python: splitting MongoDB query data into chunks

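I am writing a Python script that performs the following steps:

Query a MongoDB database
Parse and aggregate the results
Upload the data to a ServiceNow table using a REST API

The script works, but the data set is too large and the REST transaction times out after 60 seconds (the connection is closed by the ServiceNow server at the destination).

I need to split the data into chunks and send a separate REST transaction for each chunk, so that the complete data set is sent via POST without exceeding the timeout limit.

How can I modify the script below to achieve this?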

#!/usr/bin/env python 

from config import * 

import os, sys 

mypath = os.path.dirname(os.path.realpath(__file__)) 
sys.path.append(os.path.join(mypath, "api-python-client")) 

from apiclient.mongo import * 

from pymongo import MongoClient 

import json 

import requests 

from bson.json_util import dumps 

client = MongoClient(mongo_uri) 

#Create ServiceNow URL 
svcnow_url = create_svcnow_url('u_imp_cmps') 

#BITSDB Nmap Collection 
db = client[mongo_db] 

#Aggregate - RDBMS equivalent to Alias select x as y 
#Rename fields to match ServiceNow field names 
computers = db['computer'].aggregate([ 
     {"$unwind": "$hostnames"}, 
     {"$project" : { 
       "_id":0, 
       "u_hostname": "$hostnames.name", 
       "u_ipv4": "$addresses.ipv4", 
       "u_status": "$status.state", 
       "u_updated_timestamp": "$last_seen" 
     }} 

]) 

j = dumps({"records":computers}) 
#print(j) 


#Set proper headers 
headers = {"Content-Type":"application/json","Accept":"application/json"} 

#Build HTTP Request 
response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd), headers=headers ,data=j) 

#Check for HTTP codes other than 200 
if response.status_code != 200: 
     print('Status:', response.status_code, 'Headers:', response.headers, 'Response Text', response.text, 'Error Response:',response.json()) 
     exit() 

#Decode the JSON response into a dictionary and use the data 
print('Status:',response.status_code,'Headers:',response.headers,'Response:',response.json())

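Update: I have a plan, but I am not sure exactly how to implement it.

Set the cursor to a fixed batch size of 1000 records each
When a batch is full, create the JSON output and send the data via requests
In a loop, keep grabbing new batches and sending each batch to the destination until the entire data set has been sent

https://docs.mongodb.com/v3.0/reference/method/cursor.batchSize/

Basically, I think I can solve this by creating batches and looping through them, making a new API call each time. Please let me know if anyone has thoughts on whether this is a good plan and how to implement a solution. Thanks.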
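The batching plan above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested ServiceNow integration: `chunked` and `post_in_batches` are hypothetical helper names, `send` stands in for the `requests.post` call from the script, and a generator stands in for the MongoDB cursor. Note that the cursor batch size only controls how many documents the driver fetches from the server per round trip; iterating the cursor in Python still yields one document at a time, so the grouping is done here with `itertools.islice`.

```python
import json
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items until `iterable` is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def post_in_batches(records, send, batch_size=1000):
    """Serialize each batch as its own payload, pass it to `send`,
    and return the number of batches sent."""
    sent = 0
    for batch in chunked(records, batch_size):
        # Same {"records": [...]} envelope the original script posts
        payload = json.dumps({"records": batch})
        send(payload)
        sent += 1
    return sent


if __name__ == "__main__":
    # Stand-in for the aggregation cursor: 2500 fake documents
    fake_cursor = ({"u_hostname": "host%d" % i} for i in range(2500))
    payloads = []
    count = post_in_batches(fake_cursor, payloads.append, batch_size=1000)
    print(count, [len(json.loads(p)["records"]) for p in payloads])
    # prints: 3 [1000, 1000, 500]
```

With a real cursor, `json.dumps` would likely need to be swapped for `bson.json_util.dumps`, as in the script above, because fields such as `last_seen` may hold BSON types that the standard `json` module cannot serialize.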

Source

2016-06-17 pengz

Does anyone have any suggestions, at least on how to get started? I am stuck. Thank you. – pengz

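j = dumps({"records":computers}) returns a single JSON string, not a list, so looping over j would walk through it one character at a time. Convert the cursor to a list first; you can then point to a single data entry with records[x] or iterate over the list with a for loop. Each entry is serialized and posted on its own, and each of these should be acceptable to ServiceNow.

# Convert the cursor to a list so it can be indexed and iterated per record
records = list(computers)

# Set proper headers (these are always the same, so this
# can be assigned outside of the for loop)
headers = {"Content-Type":"application/json","Accept":"application/json"}

for data_point in records:

    # Build the HTTP request (note we serialize data_point instead of the whole result).
    # Assumption: the endpoint accepts the same {"records": [...]} envelope
    # as the original script, here holding a single record.
    response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd), headers=headers, data=dumps({"records": [data_point]}))

    # Check for HTTP codes other than 200
    if response.status_code != 200:
        # Print the raw text only; an error body may not be valid JSON
        print('Status:', response.status_code, 'Headers:', response.headers, 'Response Text', response.text)
    else:
        # This is a response of success for a single record
        print('Status:', response.status_code, 'Headers:', response.headers, 'Response:', response.json())

exit()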

If there are 100 new entries in MongoDB, this will make 100 POST calls to ServiceNow. Your ServiceNow instance should be able to handle the load, and you can easily identify the records that failed to load.
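One way to keep track of which records failed to load, sketched under assumptions: `post_records` and `send` are hypothetical names, and `send` is expected to return an HTTP status code. In the real script it would wrap something like `requests.post(...).status_code`.

```python
import json


def post_records(records, send):
    """POST records one at a time and return those whose POST
    did not come back with status 200."""
    failed = []
    for record in records:
        status = send(json.dumps({"records": [record]}))
        if status != 200:
            failed.append(record)
    return failed


if __name__ == "__main__":
    def fake_send(payload):
        # Pretend the server rejects one particular hostname
        record = json.loads(payload)["records"][0]
        return 500 if record["u_hostname"] == "bad-host" else 200

    sample = [{"u_hostname": "good-host"}, {"u_hostname": "bad-host"}]
    print(post_records(sample, fake_send))
```

The returned list could then be logged or retried in a later run.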

If you need to condense the number of calls for any reason, I would suggest splitting your list into "sublists", like the one-liner featured in this answer:

# Convert the cursor to a list so it can be sliced
# (a cursor can only be consumed once)
records = list(computers)

# Set proper headers (these are always the same, so this
# can be assigned outside of the for loop)
headers = {"Content-Type":"application/json","Accept":"application/json"}

# Each POST will send up to 10 records of data
split_size = 10

# Note the two places where our split_size variable is used
for data_chunk in [records[x:x+split_size] for x in range(0, len(records), split_size)]:

    # Build the HTTP request (note we serialize data_chunk instead of the whole result)
    response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd), headers=headers, data=dumps({"records": data_chunk}))

    # Check for HTTP codes other than 200
    if response.status_code != 200:
        # Print the raw text only; an error body may not be valid JSON
        print('Status:', response.status_code, 'Headers:', response.headers, 'Response Text', response.text)
    else:
        # This is a response of success for one chunk of up to split_size records
        print('Status:', response.status_code, 'Headers:', response.headers, 'Response:', response.json())

exit()

Source

2016-06-21 03:45:50

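Hi Steve, thanks for the answer. I have an issue where the loop iterates over the data character by character (not record by record). In other words, when I run "for data_point in j: print(data_point)", it returns every character one at a time instead of each record. How can I mitigate this issue? Thanks! – pengz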

Ah, I think I know why it is doing that. The data type of the variable "j" is a string, not a list, as "print(type(j))" shows. – pengz

I was able to fix it by converting the cursor data to a list and iterating over the list. Thanks again for your help. – pengz
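The behaviour described in these comments can be reproduced with a small sketch. It uses the standard `json` module and a plain iterator as a stand-in for the MongoDB cursor (both are assumptions made to keep the example self-contained); the same distinction applies to `bson.json_util.dumps`, which also returns a string.

```python
import json

# Stand-in for the aggregation cursor: any one-shot iterable of dicts
cursor = iter([{"u_hostname": "a"}, {"u_hostname": "b"}])

# Converting the cursor to a list gives something that can be
# indexed, sliced and iterated record by record
records = list(cursor)

# Serializing produces one string; iterating it yields characters
j = json.dumps({"records": records})

print(type(j).__name__)   # prints: str
print(list(j)[:3])        # prints: ['{', '"', 'r']

# Iterating the list yields whole records, as intended
for record in records:
    print(record["u_hostname"])
```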
