EMRでスパークステップを実行できません

私に感謝してくれてありがとうございます。EMRでスパークステップを実行できません

私はスパークのステップとしてAmazon EMRで単語数のマップを減らす問題がありました。しかし、私はマスターノードにsshして問題なくspark-shellでワードカウントロジックを実行しました。

エラーはありませんでしたが、それは、__spark_conf_xx.zipはマスターHDFS上に存在しないと文句を言い

16/04/05 07:20:21 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b/__spark_conf__9006968814682693730.zip -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip

をコピーするときに、次のようにログは次のとおりです。

16/04/05 07:20:16 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-26-247.ap-northeast-1.compute.internal/172.31.26.247:8032 
16/04/05 07:20:16 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers 
16/04/05 07:20:16 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container) 
16/04/05 07:20:16 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead 
16/04/05 07:20:16 INFO yarn.Client: Setting up container launch context for our AM 
16/04/05 07:20:16 INFO yarn.Client: Setting up the launch environment for our AM container 
16/04/05 07:20:16 INFO yarn.Client: Preparing resources for our AM container 
16/04/05 07:20:17 INFO yarn.Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.1-hadoop2.7.2-amzn-0.jar -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/spark-assembly-1.6.1-hadoop2.7.2-amzn-0.jar 
16/04/05 07:20:18 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1459839695291 
16/04/05 07:20:18 INFO metrics.MetricsSaver: Created MetricsSaver j-3AZL0AH5ALBBL:i-96753119:SparkSubmit:11699 period:60 /mnt/var/em/raw/i-96753119_20160405_SparkSubmit_11699_raw.bin 
16/04/05 07:20:19 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 2327 raw values into 1 aggregated values, total 1 
16/04/05 07:20:20 INFO fs.EmrFileSystem: Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation 
16/04/05 07:20:20 INFO yarn.Client: Uploading resource s3://gda-test/logic/wordCount.jar -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/wordCount.jar 
16/04/05 07:20:20 INFO s3n.S3NativeFileSystem: Opening 's3://gda-test/logic/wordCount.jar' for reading 
16/04/05 07:20:20 INFO metrics.MetricsSaver: Thread 1 created MetricsLockFreeSaver 1 
16/04/05 07:20:21 INFO metrics.MetricsSaver: 1 MetricsLockFreeSaver 1 comitted 33 matured S3ReadDelay values 
16/04/05 07:20:21 INFO yarn.Client: Uploading resource file:/mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b/__spark_conf__9006968814682693730.zip -> hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip 
16/04/05 07:20:21 INFO spark.SecurityManager: Changing view acls to: hadoop 
16/04/05 07:20:21 INFO spark.SecurityManager: Changing modify acls to: hadoop 
16/04/05 07:20:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop) 
16/04/05 07:20:21 INFO yarn.Client: Submitting application 1 to ResourceManager 
16/04/05 07:20:21 INFO impl.YarnClientImpl: Submitted application application_1459839685827_0001 
16/04/05 07:20:22 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:22 INFO yarn.Client: 
    client token: N/A 
    diagnostics: N/A 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1459840821323 
    final status: UNDEFINED 
    tracking URL: http://ip-172-31-26-247.ap-northeast-1.compute.internal:20888/proxy/application_1459839685827_0001/ 
    user: hadoop 
16/04/05 07:20:23 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:24 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:25 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:26 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:27 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:28 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:29 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:30 INFO yarn.Client: Application report for application_1459839685827_0001 (state: ACCEPTED) 
16/04/05 07:20:31 INFO yarn.Client: Application report for application_1459839685827_0001 (state: FAILED) 
16/04/05 07:20:31 INFO yarn.Client: 
    client token: N/A 
    diagnostics: Application application_1459839685827_0001 failed 2 times due to AM Container for appattempt_1459839685827_0001_000002 exited with exitCode: -1000 
For more detailed output, check application tracking page:http://ip-172-31-26-247.ap-northeast-1.compute.internal:8088/cluster/app/application_1459839685827_0001Then, click on links to logs of each attempt. 
Diagnostics: File does not exist: hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip 
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-26-247.ap-northeast-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1459839685827_0001/__spark_conf__9006968814682693730.zip 
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309) 
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301) 
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301) 
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253) 
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) 
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) 
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358) 
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 

Failing this attempt. Failing the application. 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1459840821323 
    final status: FAILED 
    tracking URL: http://ip-172-31-26-247.ap-northeast-1.compute.internal:8088/cluster/app/application_1459839685827_0001 
    user: hadoop 
Exception in thread "main" org.apache.spark.SparkException: Application application_1459839685827_0001 finished with failed status 
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034) 
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081) 
    at org.apache.spark.deploy.yarn.Client.main(Client.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
16/04/05 07:20:31 INFO util.ShutdownHookManager: Shutdown hook called 
16/04/05 07:20:31 INFO util.ShutdownHookManager: Deleting directory /mnt/tmp/spark-1d701ab0-7990-4ca2-bee2-099aed8e8e6b 
Command exiting with ret '1'

出典

2016-04-05 Sovann

私は解決策を見つけました。

ロジックとjarがJava8であり、EMRクラスターがデフォルトでJava7を使用するため、Javaバージョンの不一致が原因でした。

私の場合、Spark & Hadoopの場合、クラスタを作成するときにAdvanced Optionを使用して次のようにenvをカスタマイズする必要があります。 http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-configure-apps.html#configuring-java8

私は、この情報が同じ問題に直面している人にとって有益であることを願っています。

出典

2016-04-06 03:35:52 Sovann

EMRでスパークステップを実行できません

答えて

関連する問題