2016-08-31 4 views
0

分散スケジューラでdaskを使用しています。私はs3のcsvを介してすべてのワーカーノードに読み込まれたデータセットを複製しようとしています。例:daskを持つデータセットをすべての作業者にレプリケートします

from distributed import Executor 
import dask.dataframe as dd 

e= Executor('127.0.0.1:8786',set_as_default=True) 
df = dd.read_csv('s3://bucket/file.csv', blocksize=None) 
df = e.persist(df) 
e.replicate(df) 

distributed.utils - ERROR - unhashable type: 'list' 
Traceback (most recent call last): 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/distributed/utils.py", line 102, in f 
    result[0] = yield gen.maybe_future(func(*args, **kwargs)) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run 
    value = future.result() 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result 
    raise_exc_info(self._exc_info) 
    File "<string>", line 3, in raise_exc_info 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run 
    yielded = self.gen.throw(*exc_info) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/distributed/executor.py", line 1347, in _replicate 
    branching_factor=branching_factor) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run 
    value = future.result() 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result 
    raise_exc_info(self._exc_info) 
    File "<string>", line 3, in raise_exc_info 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/gen.py", line 1021, in run 
    yielded = self.gen.throw(*exc_info) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/distributed/core.py", line 444, in send_recv_from_rpc 
    result = yield send_recv(stream=stream, op=key, **kwargs) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run 
    value = future.result() 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result 
    raise_exc_info(self._exc_info) 
    File "<string>", line 3, in raise_exc_info 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/gen.py", line 1024, in run 
    yielded = self.gen.send(value) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/distributed/core.py", line 345, in send_recv 
    six.reraise(*clean_exception(**response)) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/six.py", line 685, in reraise 
    raise value.with_traceback(tb) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/distributed/core.py", line 211, in handle_stream 
    result = yield gen.maybe_future(handler(stream, **msg)) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/gen.py", line 1015, in run 
    value = future.result() 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result 
    raise_exc_info(self._exc_info) 
    File "<string>", line 3, in raise_exc_info 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/tornado/gen.py", line 285, in wrapper 
    yielded = next(result) 
    File "/root/.miniconda/envs/dask_env/lib/python3.5/site-packages/distributed/scheduler.py", line 1324, in replicate 
    keys = set(keys) 
TypeError: unhashable type: 'list' 

これはデータフレームを複製する正しい方法ですか?何らかの理由でe.persist(df)返されたオブジェクトが​​で動作しないようです。

+0

で解決されました。その関数は、単一のキーではなく、キーのリストを取り込むように見えます。 – Blender

+0

'[df]'と同じエラー –

答えて

関連する問題