DirectRunnerで私のパイプラインをテストしたところ、すべてうまくいきました。 これをDataflowRunnerで実行します。それは動作しません。私のパイプラインコードを入力する前でさえ失敗し、スタックドライバーのログに完全に圧倒されます。何を意味しているのか分からず、何が間違っているのか本当に分かりません。dataflowがワーカーをセットアップできませんでした
- 実行グラフは、しかし決して成功に見え、
- ワーカープールが開始され、1人の労働者は、セットアッププロセスを実行しようとしている細かいロード見える
私はデバッグのために有用な情報を提供するかもしれないと思い、いくつかのログ:
AttributeError:'module' object has no attribute 'NativeSource' /usr/bin/python failed with exit status 1
Back-off 20s restarting failed container=python pod=dataflow-fiona-backlog-clean-test2-06140817-1629-harness-3nxh_default(50a3915d6501a3ec74d6d385f70c8353)
checking backoff for container "python" in pod "dataflow-fiona-backlog-clean-test2-06140817-1629-harness-3nxh"
INFO SSH key is not a complete entry: .....この問題はどのように解決する必要がありますか?
編集:ここ 私のsetup.pyそれは場合に役立ちます:
from distutils.command.build import build as _build
import subprocess
import setuptools
# This class handles the pip install mechanism.
class build(_build): # pylint: disable=invalid-name
"""A build command class that will be invoked during package install.
The package built using the current setup.py will be staged and later
installed in the worker using `pip install package'. This class will be
instantiated during install for this specific scenario and will trigger
running the custom commands specified.
"""
sub_commands = _build.sub_commands + [('CustomCommands', None)]
# Some custom command to run during setup. The command is not essential for this
# workflow. It is used here as an example. Each command will spawn a child
# process. Typically, these commands will include steps to install non-Python
# packages. For instance, to install a C++-based library libjpeg62 the following
# two commands will have to be added:
#
# ['apt-get', 'update'],
# ['apt-get', '--assume-yes', install', 'libjpeg62'],
#
# First, note that there is no need to use the sudo command because the setup
# script runs with appropriate access.
# Second, if apt-get tool is used then the first command needs to be 'apt-get
# update' so the tool refreshes itself and initializes links to download
# repositories. Without this initial step the other apt-get install commands
# will fail with package not found errors. Note also --assume-yes option which
# shortcuts the interactive confirmation.
#
# The output of custom commands (including failures) will be logged in the
# worker-startup log.
CUSTOM_COMMANDS = [
['echo', 'Custom command worked!']]
class CustomCommands(setuptools.Command):
"""A setuptools Command class able to run arbitrary commands."""
def initialize_options(self):
pass
def finalize_options(self):
pass
def RunCustomCommand(self, command_list):
print 'Running command: %s' % command_list
p = subprocess.Popen(
command_list,
stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# Can use communicate(input='y\n'.encode()) if the command run requires
# some confirmation.
stdout_data, _ = p.communicate()
print 'Command output: %s' % stdout_data
if p.returncode != 0:
raise RuntimeError(
'Command %s failed: exit code: %s' % (command_list, p.returncode))
def run(self):
for command in CUSTOM_COMMANDS:
self.RunCustomCommand(command)
# Configure the required packages and scripts to install.
# Note that the Python Dataflow containers come with numpy already installed
# so this dependency will not trigger anything to be installed unless a version
# restriction is specified.
REQUIRED_PACKAGES = ['apache-beam==2.0.0',
'datalab==1.0.1',
'google-cloud==0.19.0',
'google-cloud-bigquery==0.22.1',
'google-cloud-core==0.22.1',
'google-cloud-dataflow==0.6.0',
'pandas==0.20.2']
setuptools.setup(
name='geotab-backlog-dataflow',
version='0.0.1',
install_requires=REQUIRED_PACKAGES,
packages=setuptools.find_packages(),
)
(のみREQUIRED_PACKAGES
とsetuptools.setup
セクションを変更し、 [here]からcopyed)労働者-起動ログ:し、それを以下の例外で終了した。
I /usr/bin/python failed with exit status 1
I /usr/bin/python failed with exit status 1
I AttributeError: 'module' object has no attribute 'NativeSource'
I class ConcatSource(iobase.NativeSource):
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/concat_reader.py", line 26, in <module>
I from dataflow_worker import concat_reader
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/maptask.py", line 31, in <module>
I from dataflow_worker import maptask
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 26, in <module>
I from dataflow_worker import executor
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 63, in <module>
I from dataflow_worker import batchworker
I File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/start.py", line 26, in <module>
I exec code in run_globals
I File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
I "__main__", fname, loader, pkg_name)
I File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
I AttributeError: 'module' object has no attribute 'NativeSource'
I class ConcatSource(iobase.NativeSource):
ジョブIDを共有できますか?クラウドロギングの中には、ワーカーが起動しなかった理由についてもう少し詳細を含むかもしれない他のログ(例えば、worker、worker_startupなど)のいくつかの追加情報があるかもしれません。 –
@BenChambersジョブID:2017-06-13_14_00_46-14992846213167080594、2017-06-14_08_17_20-10061999408051657645私はワーカー起動ログから例外を見ることができます。私はそれらを私のポストに加えました。 – foxwendy