2017-02-27

A Spark cluster (7 workers * 2 cores) running Spark 2.0.2 is set up next to an HDFS cluster. The Spark executors cannot connect to a mysterious port, 35529.

When I read some HDFS files from Jupyter, I see the application start with 14 cores and 3 executor instances requested, but it then fails because the executors cannot connect to a strange "localhost" port, 35529. Here is what I run:

from pyspark.sql import SparkSession

# master and appName are defined elsewhere in the notebook
spark = SparkSession.builder.master(master).appName(appName).config("spark.executor.instances", 3).getOrCreate()
sc = spark.sparkContext

# HDFS namenode and the path to read (one hour of logs)
hdfs_master = "hdfs://xx.xx.xx.xx:8020"
hdfs_path = "/logs/cycliste_debug/2017/2017_02/2017_02_20/23h/*"
infos = sc.textFile(hdfs_master + hdfs_path)
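The stack trace below shows the executors trying to reach the driver at localhost/127.0.0.1:35529, which suggests the driver advertised "localhost" as its address. As a sketch (193.xx.xx.xxx stands in for the driver machine's routable IP), the advertised address can be pinned in the driver's configuration:

```
# conf/spark-defaults.conf on the machine running Jupyter -- a sketch;
# replace 193.xx.xx.xxx with that machine's routable address
spark.driver.host    193.xx.xx.xxx
```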


(I find it strange that 14 cores are allocated, when only 3 * 2 CPUs should be possible, i.e. spark.executor.instances * cores per node.)

Here is the executor summary for app-20170227140938-0009:

ExecutorID Worker Cores Memory (MB) State Logs 
1488 worker-20170227125912-xx.xx.xx.xx-38028 2 1024 RUNNING stdout stderr 
1489 worker-20170227125954-xx.xx.xx.xx-48962 2 1024 RUNNING stdout stderr 
5  worker-20170227125959-xx.xx.xx.xx-48149 2 1024 RUNNING stdout stderr 
1486 worker-20170227130012-xx.xx.xx.xx-47639 2 1024 RUNNING stdout stderr 
1490 worker-20170227130027-xx.xx.xx.xx-44921 2 1024 RUNNING stdout stderr 
1485 worker-20170227130152-xx.xx.xx.xx-50620 2 1024 RUNNING stdout stderr 
1487 worker-20170227130248-xx.xx.xx.xx-42100 2 1024 RUNNING stdout stderr 
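The 14 cores are consistent with the table: in standalone mode, spark.executor.instances is effectively ignored (it is a YARN setting), so one executor is likely started on every worker; capping the total with spark.cores.max should give the expected 3 * 2. A quick sanity check of the arithmetic:

```python
# One executor was started on each of the 7 workers, 2 cores each,
# matching the 14 cores reported by the UI.
workers = 7
cores_per_executor = 2
total_cores = workers * cores_per_executor
print(total_cores)  # 14
```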

And here is an example of the error for one worker, from the stderr log page for app-20170227140938-0009/1488:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
17/02/27 14:37:57 INFO CoarseGrainedExecutorBackend: Started daemon with process name: [email protected] 
17/02/27 14:37:57 INFO SignalUtils: Registered signal handler for TERM 
17/02/27 14:37:57 INFO SignalUtils: Registered signal handler for HUP 
17/02/27 14:37:57 INFO SignalUtils: Registered signal handler for INT 
17/02/27 14:37:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/02/27 14:37:58 INFO SecurityManager: Changing view acls to: spark 
17/02/27 14:37:58 INFO SecurityManager: Changing modify acls to: spark 
17/02/27 14:37:58 INFO SecurityManager: Changing view acls groups to: 
17/02/27 14:37:58 INFO SecurityManager: Changing modify acls groups to: 
17/02/27 14:37:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set() 
17/02/27 14:38:01 WARN ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy? 
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713) 
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:174) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:270) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) 
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) 
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) 
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) 
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88) 
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:188) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71) 
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) 
    ... 4 more 
Caused by: java.io.IOException: Failed to connect to localhost/127.0.0.1:35529 
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228) 
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179) 
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197) 
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191) 
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:35529 
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) 
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224) 
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289) 
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) 
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) 
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) 
    ... 1 more 

It looks like a simple communication problem between two processes.

So I checked /etc/hosts:

127.0.0.1 localhost 
193.xx.xx.xxx vpsxxxx.ovh.net vpsxxxx 
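Whether a daemon advertises the loopback address or the public one often comes down to which /etc/hosts entry the machine's hostname matches first. A minimal illustration of that lookup (this is not Spark's actual resolution logic; the hostnames and IPs are made up):

```python
def resolve_from_hosts(lines, name):
    """Return the first IP whose aliases include `name`, /etc/hosts style."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *aliases = line.split()
        if name in aliases:
            return ip
    return None

hosts = [
    "127.0.0.1 localhost",
    "193.0.2.10 vps.example.net vps",
]

print(resolve_from_hosts(hosts, "vps"))        # 193.0.2.10
print(resolve_from_hosts(hosts, "localhost"))  # 127.0.0.1
```

If the hostname a Spark daemon resolves maps to 127.0.0.1, other machines cannot connect back to it.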

Any ideas?

Answers


Check whether SPARK_LOCAL_IP is set to the correct IP on each slave.
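For example, SPARK_LOCAL_IP can be exported in conf/spark-env.sh on each worker (the IP below is a placeholder; use that machine's own routable address, not 127.0.0.1):

```shell
# conf/spark-env.sh on each worker -- a sketch with a placeholder IP
export SPARK_LOCAL_IP=193.0.2.10
```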
