I am trying to submit a Spark application to a remote Hadoop cluster, following the article here, but the specified Python environment cannot be found.

Here is how I submit the job:

PYSPARK_PYTHON=./PY_ENV/prod_env3/bin/python /home/hadoop/spark-1.6.0-bin-hadoop2.6/bin/spark-submit \ 
--master yarn \ 
--name run.py \ 
--deploy-mode cluster \ 
--executor-memory 2g \ 
--executor-cores 1 \ 
--num-executors 3 \ 
--jars /home/hadoop/projects/cms_counter/spark-streaming-kafka-assembly_2.10-1.6.0.jar \ 
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PY_ENV/prod_env3/bin/python \ 
--archives /opt/anaconda/envs/prod_env3.zip#PY_ENV \ 
/home/hadoop/run.py 
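
For context, the archive#alias form of --archives makes YARN unpack the zip into each container's working directory under the alias name, so ./PY_ENV/prod_env3/bin/python only resolves if the zip contains a top-level prod_env3/ directory; and since this is cluster mode, the driver runs inside the AM container, which is why spark.yarn.appMasterEnv.PYSPARK_PYTHON is set in addition to the shell variable. A quick sanity check of the archive layout before submitting (a minimal sketch using the paths from the command above):

# Entries should start with 'prod_env3/' for the interpreter path 
# ./PY_ENV/prod_env3/bin/python to exist after extraction. 
unzip -l /opt/anaconda/envs/prod_env3.zip | head 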

I can see that the environment, the script, and the jars get uploaded to .sparkStaging/application_1490199711887_0131 on HDFS:

17/03/27 11:55:42 INFO ConfiguredRMFailoverProxyProvider: Failing over to rm188 
17/03/27 11:55:42 INFO Client: Requesting a new application from cluster with 3 NodeManagers 
17/03/27 11:55:42 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (52586 MB per container) 
17/03/27 11:55:42 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead 
17/03/27 11:55:42 INFO Client: Setting up container launch context for our AM 
17/03/27 11:55:42 INFO Client: Setting up the launch environment for our AM container 
17/03/27 11:55:42 INFO Client: Preparing resources for our AM container 
17/03/27 11:55:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/03/27 11:55:43 INFO Client: Uploading resource file:/home/hadoop/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/spark-assembly-1.6.0-hadoop2.6.0.jar 
17/03/27 11:55:45 INFO Client: Uploading resource file:/home/hadoop/projects/cms_counter/spark-streaming-kafka-assembly_2.10-1.6.0.jar -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/spark-streaming-kafka-assembly_2.10-1.6.0.jar 
17/03/27 11:55:45 INFO Client: Uploading resource file:/opt/anaconda/envs/prod_env3.zip#PY_ENV -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/prod_env3.zip 
17/03/27 11:55:46 INFO Client: Uploading resource file:/home/hadoop/run.py -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/run.py 
17/03/27 11:55:46 INFO Client: Uploading resource file:/home/hadoop/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/pyspark.zip 
17/03/27 11:55:46 INFO Client: Uploading resource file:/home/hadoop/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/py4j-0.9-src.zip 
17/03/27 11:55:46 INFO Client: Uploading resource file:/tmp/spark-7c8130fc-454f-4920-95ce-30211cea3576/__spark_conf__8359653165366110281.zip -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/__spark_conf__8359653165366110281.zip 
17/03/27 11:55:46 INFO SecurityManager: Changing view acls to: root 
17/03/27 11:55:46 INFO SecurityManager: Changing modify acls to: root 
17/03/27 11:55:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
17/03/27 11:55:46 INFO Client: Submitting application 131 to ResourceManager 
17/03/27 11:55:46 INFO YarnClientImpl: Submitted application application_1490199711887_0131 
17/03/27 11:55:47 INFO Client: Application report for application_1490199711887_0131 (state: ACCEPTED) 

And I can verify that the files are there:

[root@… ~]# hadoop fs -ls .sparkStaging/application_1490199711887_0131 
Found 7 items 
-rw-r--r-- 3 root supergroup  24766 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/__spark_conf__8359653165366110281.zip 
-rw-r--r-- 3 root supergroup 36034763 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/prod_env3.zip 
-rw-r--r-- 3 root supergroup  44846 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/py4j-0.9-src.zip 
-rw-r--r-- 3 root supergroup  355358 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/pyspark.zip 
-rw-r--r-- 3 root supergroup  2099 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/run.py 
-rw-r--r-- 3 root supergroup 187548272 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/spark-assembly-1.6.0-hadoop2.6.0.jar 
-rw-r--r-- 3 root supergroup 13350134 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/spark-streaming-kafka-assembly_2.10-1.6.0.jar 
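
A successful upload only shows that the zip reached HDFS; it says nothing about the directory layout inside it. One way to double-check is to pull the staged copy back and list it (a sketch, assuming read access to the staging directory):

# Fetch the staged archive and inspect its top-level entries. 
hadoop fs -get .sparkStaging/application_1490199711887_0131/prod_env3.zip /tmp/prod_env3.zip 
unzip -l /tmp/prod_env3.zip | head 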

Yet it still tells me that the Python environment cannot be found:
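
(The ApplicationMaster log below can be retrieved with the standard YARN command for aggregated container logs:)

yarn logs -applicationId application_1490199711887_0131 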

SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/data/sde/yarn/nm/usercache/root/filecache/1454/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
17/03/27 11:54:10 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT] 
17/03/27 11:54:11 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1490199711887_0131_000001 
17/03/27 11:54:11 INFO SecurityManager: Changing view acls to: yarn,root 
17/03/27 11:54:11 INFO SecurityManager: Changing modify acls to: yarn,root 
17/03/27 11:54:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root) 
17/03/27 11:54:12 INFO ApplicationMaster: Starting the user application in a separate Thread 
17/03/27 11:54:12 INFO ApplicationMaster: Waiting for spark context initialization 
17/03/27 11:54:12 INFO ApplicationMaster: Waiting for spark context initialization ... 
17/03/27 11:54:12 ERROR ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "./PY_ENV/prod_env3/bin/python": error=2, No such file or directory 
java.io.IOException: Cannot run program "./PY_ENV/prod_env3/bin/python": error=2, No such file or directory 
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047) 
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:82) 
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542) 
Caused by: java.io.IOException: error=2, No such file or directory 
    at java.lang.UNIXProcess.forkAndExec(Native Method) 
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:186) 
    at java.lang.ProcessImpl.start(ProcessImpl.java:130) 
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) 
    ... 7 more 
17/03/27 11:54:12 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.io.IOException: Cannot run program "./PY_ENV/prod_env3/bin/python": error=2, No such file or directory) 
17/03/27 11:54:22 ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application. 
17/03/27 11:54:22 INFO ShutdownHookManager: Shutdown hook called 

Clearly I must be missing something, but any leads would be appreciated.

Answer

I was zipping the wrong Python environment (zip -r prod_env3.zip prod_env). Sorry for the trouble.
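
In other words, the top-level directory inside the zip was prod_env, so after YARN unpacked it under the PY_ENV alias, the path ./PY_ENV/prod_env3/bin/python did not exist. Rebuilding the archive so its top-level directory matches the interpreter path used at submit time fixes it (a sketch, assuming the environment lives at /opt/anaconda/envs/prod_env3):

cd /opt/anaconda/envs 
# The zip's top-level entry must be 'prod_env3/' to match 
# ./PY_ENV/prod_env3/bin/python from the spark-submit command. 
zip -r prod_env3.zip prod_env3 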
