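Spark worker nodes started but not showing in WebUI

I am trying to set up a small 3-node Spark cluster using some Raspberry Pis and my main desktop, with the Pis as workers and the desktop as the master node. I am running Cassandra (DSE, not open source) on all three nodes, so the network is configured correctly. When I go to the web UI, it shows only my main computer. I can enter each worker node's web UI address and get its web UI page, but the workers do not seem to know about my master node. I have each of the slave nodes listed in the slaves file. I feel like I am missing just one small thing to get this working. Any suggestions are welcome. Below are some logs and other information that I think will help while keeping this fairly short and concise.

spark-env.sh on all nodes:
export SPARK_WORKER_CORES=6
export SPARK_MASTER_HOST=192.168.0.106
export SPARK_LOCAL_IP=192.168.0.201

The worker log is as follows: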
Spark Command: /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/jre/bin/java -cp /home/spark/spark/conf/:/home/spark/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://Palehorse:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/07/05 03:22:40 INFO Worker: Started daemon with process name: [email protected]
17/07/05 03:22:40 INFO SignalUtils: Registered signal handler for TERM
17/07/05 03:22:40 INFO SignalUtils: Registered signal handler for HUP
17/07/05 03:22:40 INFO SignalUtils: Registered signal handler for INT
17/07/05 03:22:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/05 03:22:42 INFO SecurityManager: Changing view acls to: spark
17/07/05 03:22:42 INFO SecurityManager: Changing modify acls to: spark
17/07/05 03:22:42 INFO SecurityManager: Changing view acls groups to:
17/07/05 03:22:42 INFO SecurityManager: Changing modify acls groups to:
17/07/05 03:22:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
17/07/05 03:22:43 INFO Utils: Successfully started service 'sparkWorker' on port 35342.
17/07/05 03:22:44 INFO Worker: Starting Spark worker 192.168.0.201:35342 with 6 cores, 1024.0 MB RAM
17/07/05 03:22:44 INFO Worker: Running Spark version 2.1.1
17/07/05 03:22:44 INFO Worker: Spark home: /home/spark/spark
17/07/05 03:22:45 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
17/07/05 03:22:45 INFO WorkerWebUI: Bound WorkerWebUI to 192.168.0.201, and started at http://192.168.0.201:8081
17/07/05 03:22:45 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:22:51 INFO Worker: Retrying connection to master (attempt # 1)
17/07/05 03:22:51 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:22:57 INFO Worker: Retrying connection to master (attempt # 2)
17/07/05 03:22:57 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:03 INFO Worker: Retrying connection to master (attempt # 3)
17/07/05 03:23:03 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:09 INFO Worker: Retrying connection to master (attempt # 4)
17/07/05 03:23:09 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:15 INFO Worker: Retrying connection to master (attempt # 5)
17/07/05 03:23:15 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:21 INFO Worker: Retrying connection to master (attempt # 6)
17/07/05 03:23:21 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:57 INFO Worker: Retrying connection to master (attempt # 7)
17/07/05 03:23:57 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:24:33 INFO Worker: Retrying connection to master (attempt # 8)
17/07/05 03:24:33 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:24:45 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
17/07/05 03:24:45 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
17/07/05 03:24:45 WARN Worker: Failed to connect to master Palehorse:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:229)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
... 4 more
17/07/05 03:25:09 INFO Worker: Retrying connection to master (attempt # 9)
17/07/05 03:25:09 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:25:45 INFO Worker: Retrying connection to master (attempt # 10)
17/07/05 03:25:45 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:26:21 INFO Worker: Retrying connection to master (attempt # 11)
17/07/05 03:26:21 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:26:57 INFO Worker: Retrying connection to master (attempt # 12)
17/07/05 03:26:57 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:27:09 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
17/07/05 03:27:09 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
17/07/05 03:27:09 WARN Worker: Failed to connect to master Palehorse:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:229)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
... 4 more
17/07/05 03:27:33 INFO Worker: Retrying connection to master (attempt # 13)
17/07/05 03:27:33 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:28:09 INFO Worker: Retrying connection to master (attempt # 14)
17/07/05 03:28:09 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:28:45 INFO Worker: Retrying connection to master (attempt # 15)
17/07/05 03:28:45 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:29:21 INFO Worker: Retrying connection to master (attempt # 16)
17/07/05 03:29:21 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:29:33 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
17/07/05 03:29:33 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
17/07/05 03:29:33 WARN Worker: Failed to connect to master Palehorse:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:229)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
... 4 more
17/07/05 03:29:57 ERROR Worker: All masters are unresponsive! Giving up.
Can you ping '192.168.0.106' from the worker machines? Can the master machine ping the worker machines? From your log: _"Connecting to Palehorse/198.105.254.63:7077 timed out"_ What is that IP?
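One way to follow up on the question about that IP is to check, on a worker, what the name Palehorse resolves to. The log shows it resolving to 198.105.254.63, which is outside the 192.168.0.x network that spark-env.sh uses. The sketch below is only an illustration: the function names `classify_ipv4`, `resolve_host` and `check_master` are made up for this example, and it assumes a Linux system where `getent` is available. The JVM may not resolve names in exactly the same way as `getent`, so treat the result as a hint.

```shell
#!/bin/sh
# Hypothetical helper: label an IPv4 address as "private" (RFC 1918) or "public".
classify_ipv4() {
    case "$1" in
        10.*|192.168.*) echo private ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo private ;;
        *) echo public ;;
    esac
}

# Hypothetical helper: first address that the system resolver returns for a name.
# getent consults /etc/hosts and DNS in the order given by nsswitch.conf.
resolve_host() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Hypothetical helper: report what a master host name resolves to.
check_master() {
    host="$1"
    addr="$(resolve_host "$host")"
    if [ -z "$addr" ]; then
        echo "$host does not resolve"
        return 1
    fi
    echo "$host resolves to $addr ($(classify_ipv4 "$addr"))"
}

# Example, run on a worker node:
#   check_master Palehorse
```

If the reported address is public rather than 192.168.0.106, the worker is probably trying to reach some machine other than the desktop, which would be consistent with the timeouts in the log.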
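Well, all the machines can talk to each other... I can ssh from each of them to each of the others. I did just think of something, though. When I run start-all.sh, I am prompted to enter a password for each slave node. Maybe the same thing is happening in the other direction, and it is failing because it cannot prompt me for a password. Is this normal, or do I need to change some user settings?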
Can you start the standalone worker manually and show any exceptions you get? I would avoid 'start-all.sh' until this is fixed.
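A minimal sketch of starting a worker by hand, so that any exception is printed straight to the terminal. The Spark home, the worker class and the web UI port are copied from the "Spark Command" line in the log. Putting the master's IP from spark-env.sh into the URL instead of the name Palehorse is my own assumption, meant to take name resolution out of the picture. The `RUN` switch and the `worker_command` function are hypothetical names used only in this example.

```shell
#!/bin/sh
# Assumed values, taken from the question's log and spark-env.sh.
SPARK_HOME="${SPARK_HOME:-/home/spark/spark}"
MASTER_URL="${MASTER_URL:-spark://192.168.0.106:7077}"

# Hypothetical helper: print the foreground worker command line without running it.
worker_command() {
    printf '%s/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 %s\n' \
        "$SPARK_HOME" "$MASTER_URL"
}

# Dry run by default; set RUN=1 to launch the worker in the foreground.
if [ "${RUN:-0}" = "1" ]; then
    exec "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker \
        --webui-port 8081 "$MASTER_URL"
else
    worker_command
fi
```

Running the worker in the foreground like this does not use ssh at all, so it also separates the password prompts seen with start-all.sh from the connection problem itself.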