Parallel downloads with multiprocessing and PySftp

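I'm trying to write code that downloads N files of the same type at the same time, using the pysftp and multiprocessing libs. I've done basic Python training, collected pieces of code and combined them into one script, but I can't get it to work. I'd appreciate it if somebody could help me with this. The error occurs after the vFtp.close() command, in the part that is supposed to start the simultaneous downloads.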

from multiprocessing import Pool 
import pysftp 
import os 

vHost='10.11.12.13' 
vLogin='admin' 
vPwd='pass1234' 
vFtpPath='/export/home/' 

os.chdir('d:/test/') 
os.getcwd() 

cnopts=pysftp.CnOpts() 
cnopts.hostkeys = None 

vFtp=pysftp.Connection(vHost,username=vLogin,password=vPwd,cnopts=cnopts) 
vFtp.cwd(vFtpPath) 
vObjectList=vFtp.listdir() 
vFileList=[] 
vFoldList=[] 

for vObject in vObjectList: 
    vType=str(vFtp.lstat(vObject))[:1] 
    if vType!='d': 
     vFileList.append(vObject) 
    else: 
     vFoldList.append(vObject) 

vFtp.close() 

def fDownload(vFileAux): 
    vFtpAux=pysftp.Connection(vHost,username=vLogin,password=vPwd,cnopts=cnopts) 
    vFtpAux.cwd(vFtpPath) 
    vFtpAux.get(vFileAux,preserve_mtime=True) 
    vFtpAux.close() 

if __name__ == "__main__": 
    vPool=Pool(3) 
    vPool.map(fDownload,vFileList)

Source

2017-08-12 Thiago Matsui

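It looks like you're trying to get the list of files and then download them concurrently using multiple processes. Instead of examining the files manually, try using the walktree method on the connection object: pysftp walktree.

Here is a working example I made in Python 3.5. I'm only using a local FTP server and small files, so I simulated a download delay. Change the max_workers argument to set the number of simultaneous downloads.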

"""Demo using sftp to download files simultaneously.""" 
import pysftp 
import os 
from concurrent.futures import ProcessPoolExecutor 
import time 


def do_nothing(s): 
    """ 
    Using this as the callback for directories and unknown items found 
    using walktree. 
    """ 
    pass 


def download(file): 
    """ 
    Simulates a 1-second download. 
    """ 
    with pysftp.Connection(
            host='convox', username='abc', private_key='/home/abc/test') as sftp:
        # artificial delay so the small test files behave like slow downloads
        time.sleep(1)
        print('Downloading {}'.format(file))
        sftp.get(file)


def get_list_of_files(remote_dir): 
    """ 
    Walks remote directory tree and returns list of files. 
    """ 
    with pysftp.Connection(
            host='convox', username='abc', private_key='/home/abc/test') as sftp:
        files = []

        # if this finds a file it will send the filename to the file callback
        # which in this case just appends to the 'files' list
        sftp.walktree(remote_dir, fcallback=files.append,
                      dcallback=do_nothing, ucallback=do_nothing)

    return files

if __name__ == '__main__': 
    remote_dir = '/home/abc/remoteftp/' 
    download_target = '/home/abc/localftp/' 

    # if you don't specify a localpath in sftp.get then it just downloads to 
    # the os cwd, so set it here 
    os.chdir(download_target) 

    files = get_list_of_files(remote_dir) 
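    # the with block waits for all downloads to finish; list() consumes the
    # results so an exception raised in a worker is re-raised here
    with ProcessPoolExecutor(max_workers=4) as pool:
        list(pool.map(download, files))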
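The comment in the example notes that sftp.get without a localpath writes into the current working directory. Since walktree also descends into subdirectories, two remote files with the same name in different folders would then land on the same local file. A minimal sketch of one way around that, assuming you want to mirror the remote layout locally, follows. The names local_path_for and download_to are hypothetical helpers, not part of pysftp; the connection settings are the same placeholders used in the example.

```python
import os
import posixpath


def local_path_for(remote_path, remote_root, local_root):
    """Map a remote file path to a local path under local_root.

    Hypothetical helper for illustration; not part of pysftp.
    """
    # Remote SFTP paths use forward slashes, so use posixpath on that side.
    relative = posixpath.relpath(remote_path, remote_root)
    return os.path.join(local_root, *relative.split('/'))


def download_to(remote_path, remote_root, local_root):
    """Download one file, keeping its position in the remote tree."""
    import pysftp  # imported lazily so local_path_for can be tried on its own

    target = local_path_for(remote_path, remote_root, local_root)
    # Create the local folder if needed; exist_ok avoids errors when
    # several workers create the same folder.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with pysftp.Connection(
            host='convox', username='abc', private_key='/home/abc/test') as sftp:
        sftp.get(remote_path, localpath=target, preserve_mtime=True)
    return target
```

To use it with pool.map, the extra arguments can be bound first, for example with functools.partial(download_to, remote_root=remote_dir, local_root=download_target). With this approach the os.chdir call is no longer needed.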

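Edit: ProcessPoolExecutor is meant for running work on multiple CPU cores and is limited by your processor. For network tasks such as downloading, you can use threads instead. In the code above this is the only change: import and use ThreadPoolExecutor instead of ProcessPoolExecutor. You can then set max_workers higher, since threads are not tied to the number of CPU cores.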
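The switch to threads described in the edit could look roughly like the sketch below. It is only an illustration under a few assumptions: download_all and sftp_download are hypothetical names, max_workers=8 is an arbitrary value, and the connection settings are the placeholders from the example. It uses submit and as_completed rather than map so that one failed file does not hide the results of the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def sftp_download(remote_path):
    """Download one file over its own SFTP connection."""
    import pysftp  # imported lazily so download_all can be tried without pysftp

    with pysftp.Connection(
            host='convox', username='abc', private_key='/home/abc/test') as sftp:
        sftp.get(remote_path)
    return remote_path


def download_all(files, download_one=sftp_download, max_workers=8):
    """Run download_one over files in a thread pool.

    Returns (succeeded, failed), where failed maps each path to the
    exception raised while downloading it.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which future belongs to which file.
        futures = {pool.submit(download_one, f): f for f in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
                succeeded.append(path)
            except Exception as exc:  # collect the error and keep going
                failed[path] = exc
    return succeeded, failed
```

In the main block this would replace the pool lines with something like succeeded, failed = download_all(files), followed by printing or retrying whatever ended up in failed.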

Source

2017-08-13 16:39:11 Teleodynamics

Thank you, that is a very simple and clear example. I'll run some tests and let you know the results.
