2017-04-18

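I have a ZooKeeper cluster with three nodes. The ZooKeeper configuration is shown below. On restart, a success message is displayed, but the status check reports a failure. This node cannot communicate with the leader node.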

zoo.cfg

dataDir=/ngs/app/<app>/zookeeper-3.4.6/zookeeperdata/1 
clientPort=2181 
initLimit=5 
syncLimit=2 
server.1=pr2-ligerp-lapp27.<domain.com>:2888:3888 
server.2=pr2-ligerp-lapp28.<domain.com>:2889:3889 
server.3=pr2-ligerp-lapp29.<domain.com>:2890:3890 
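One way to check whether the ports named in these `server.N` lines are reachable from a given node is to parse the entries and try a plain TCP connect. The following is a minimal Python sketch, not part of ZooKeeper; the function names `parse_servers` and `port_open` are hypothetical.

```python
import re
import socket

# Matches lines of the form server.<id>=<host>:<quorumPort>:<electionPort>
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)\s*$")

def parse_servers(cfg_text):
    """Return {server_id: (host, quorum_port, election_port)} from zoo.cfg text."""
    servers = {}
    for line in cfg_text.splitlines():
        m = SERVER_LINE.match(line.strip())
        if m:
            sid, host, quorum, election = m.groups()
            servers[int(sid)] = (host, int(quorum), int(election))
    return servers

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_open(host, election_port)` for each parsed entry from every node gives a rough picture of connectivity. As far as I know, the first port (quorum port) is only bound by the current leader, so a failed connect to it on a follower is expected; the election port should be open on every running server.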

Please find the logs below:

sh zkServer.sh start

JMX enabled by default 
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg 
Starting zookeeper ... STARTED 
-bash-4.1$ 
-bash-4.1$ cat zookeeper.out 
2017-04-18 18:58:13,840 [myid:] - INFO [main:[email protected]] - Reading configuration from: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg 
2017-04-18 18:58:13,843 [myid:] - INFO [main:[email protected]] - Defaulting to majority quorums 
2017-04-18 18:58:13,845 [myid:1] - INFO [main:[email protected]] - autopurge.snapRetainCount set to 3 
2017-04-18 18:58:13,845 [myid:1] - INFO [main:[email protected]] - autopurge.purgeInterval set to 0 
2017-04-18 18:58:13,846 [myid:1] - INFO [main:[email protected]] - Purge task is not scheduled. 
2017-04-18 18:58:13,854 [myid:1] - INFO [main:[email protected]] - Starting quorum peer 
2017-04-18 18:58:13,861 [myid:1] - INFO [main:[email protected]] - binding to port 0.0.0.0/0.0.0.0:2181 
2017-04-18 18:58:13,875 [myid:1] - INFO [main:[email protected]] - tickTime set to 3000 
2017-04-18 18:58:13,875 [myid:1] - INFO [main:[email protected]] - minSessionTimeout set to -1 
2017-04-18 18:58:13,875 [myid:1] - INFO [main:[email protected]] - maxSessionTimeout set to -1 
2017-04-18 18:58:13,875 [myid:1] - INFO [main:[email protected]] - initLimit set to 5 
2017-04-18 18:58:13,884 [myid:1] - INFO [main:[email protected]] - Reading snapshot /ngs/app/ligerp/solr/zookeeper-3.4.6/zookeeperdata/1/version-2/snapshot.1300000032 
2017-04-18 18:58:13,954 [myid:1] - INFO [Thread-1:[email protected]] - My election bind port: pr2-ligerp-lapp27.<domain>/10.136.145.38:3888 
2017-04-18 18:58:13,960 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]14] - LOOKING 
2017-04-18 18:58:13,961 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - New election. My id = 1, proposed zxid=0x130000024b 
2017-04-18 18:58:13,962 [myid:1] - INFO [WorkerReceiver[myid=1]:[email protected]] - Notification: 1 (message format version), 1 (n.leader), 0x130000024b (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x13 (n.peerEpoch) LOOKING (my state) 
2017-04-18 18:58:13,964 [myid:1] - INFO [WorkerSender[myid=1]:[email protected]] - Have smaller server identifier, so dropping the connection: (2, 1) 
2017-04-18 18:58:13,964 [myid:1] - INFO [WorkerSender[myid=1]:[email protected]] - Have smaller server identifier, so dropping the connection: (3, 1) 
2017-04-18 18:58:14,165 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Have smaller server identifier, so dropping the connection: (2, 1) 
2017-04-18 18:58:14,166 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Have smaller server identifier, so dropping the connection: (3, 1) 
2017-04-18 18:58:14,166 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Notification time out: 400 
2017-04-18 18:58:15,566 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Have smaller server identifier, so dropping the connection: (2, 1) 
2017-04-18 18:58:15,567 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Have smaller server identifier, so dropping the connection: (3, 1) 
2017-04-18 18:58:15,567 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Notification time out: 800 
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Have smaller server identifier, so dropping the connection: (2, 1) 
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Have smaller server identifier, so dropping the connection: (3, 1) 
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Notification time out: 1600 
2017-04-18 18:58:17,969 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Have smaller server identifier, so dropping the connection: (2, 1) 
2017-04-18 18:58:17,969 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Have smaller server identifier, so dropping the connection: (3, 1) 
2017-04-18 18:58:17,970 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:[email protected]] - Notification time out: 3200 

However, after checking the status, we found that it is not running, even though we can see the process ID. Could someone help resolve this issue?

sh zkServer.sh status

JMX enabled by default 
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg 
Error contacting service. It is probably not running. 
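The leader log in this question contains `Processing srvr command`, which suggests the status check works by sending the `srvr` four-letter command to the client port. A minimal Python sketch of the same probe follows; `four_letter_word` and `parse_mode` are hypothetical helper names, and the host and port in the usage note are assumptions to be replaced with the node's real `clientPort`.

```python
import socket

def four_letter_word(host, port, cmd="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

For example, `parse_mode(four_letter_word("127.0.0.1", 2181))` returns the mode when the server is serving requests. A reply without a `Mode:` line, which yields `None` here, would be consistent with a process that is running but has not joined a quorum, matching the symptom of a visible process ID alongside a failing status check.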

This is the exception on the ZooKeeper leader node:

2017-04-18 18:25:32,634 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:[email protected]] - Accepted socket connection from /127.0.0.1:47916 
2017-04-18 18:25:32,635 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:[email protected]] - Processing srvr command from /127.0.0.1:47916 
2017-04-18 18:25:32,635 [myid:3] - INFO [Thread-22:[email protected]] - Closed socket connection for client /127.0.0.1:47916 (no session established for client) 
2017-04-18 18:30:01,662 [myid:3] - WARN [RecvWorker:1:[email protected]] - Connection broken for id 1, my id = 3, error = 
java.io.EOFException 
    at java.io.DataInputStream.readInt(DataInputStream.java:392) 
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765) 
2017-04-18 18:30:01,663 [myid:3] - WARN [RecvWorker:1:[email protected]] - Interrupting SendWorker 
2017-04-18 18:30:01,662 [myid:3] - ERROR [LearnerHandler-/10.136.145.38:47656:[email protected]] - Unexpected exception causing shutdown while sock still open 
java.io.EOFException 
    at java.io.DataInputStream.readInt(DataInputStream.java:392) 
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) 
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) 
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103) 
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546) 
2017-04-18 18:30:01,663 [myid:3] - WARN [SendWorker:1:[email protected]] - Interrupted while waiting for message on queue 
java.lang.InterruptedException 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088) 
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418) 
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849) 
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64) 
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685) 
2017-04-18 18:30:01,663 [myid:3] - WARN [LearnerHandler-/10.136.145.38:47656:[email protected]] - ******* GOODBYE /10.136.145.38:47656 ******** 
2017-04-18 18:30:01,663 [myid:3] - WARN [SendWorker:1:[email protected]] - Send worker leaving thread 
2017-04-18 18:39:40,076 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:[email protected]] - Accepted socket connection from /10.136.145.38:58748 
2017-04-18 18:39:40,077 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:[email protected]] - caught end of stream exception 
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket 
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) 
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) 
    at java.lang.Thread.run(Thread.java:745) 
2017-04-18 18:39:40,078 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:[email protected]] - Closed socket connection for client /10.136.145.38:58748 (no session established for client) 
2017-04-18 18:42:46,516 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:[email protected]] - Accepted socket connection from /127.0.0.1:47988 
2017-04-18 18:42:46,516 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:[email protected]] - Processing srvr command from /127.0.0.1:47988 
2017-04-18 18:42:46,517 [myid:3] - INFO [Thread-23:[email protected]] - Closed socket connection for client /127.0.0.1:47988 (no session established for client) 

Answer


I fixed the problem by making the following changes on all ZooKeeper nodes:

  1. Changed the ZooKeeper hostnames in zoo.cfg to IP addresses.
  2. Started the ZooKeeper instances in descending order of server ID.
  3. Increased initLimit from 5 to 100. A sample zoo.cfg looks like this:

    dataDir=/ngs/app/ligerp/solr/zookeeper-3.4.6/zookeeperdata/1
    clientPort=2181
    initLimit=100
    syncLimit=2
    server.1=10.136.145.38:2888:3888
    server.2=10.136.145.39:2889:3889
    server.3=10.136.145.40:2890:3890
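To put the third change in concrete terms: `initLimit` is expressed in ticks, so the time a follower is allowed for its initial connection and sync with the leader is `initLimit * tickTime`. The sketch below assumes `tickTime` is 3000 ms, the value reported in the question's startup log; `init_timeout_ms` is a hypothetical helper name.

```python
def init_timeout_ms(init_limit, tick_time_ms):
    """Time allowed for a follower's initial connect-and-sync, in milliseconds."""
    return init_limit * tick_time_ms

TICK_TIME_MS = 3000  # assumption: taken from "tickTime set to 3000" in the log

before = init_timeout_ms(5, TICK_TIME_MS)
after = init_timeout_ms(100, TICK_TIME_MS)
print(before, after)  # 15000 300000
```

That is 15 seconds versus 5 minutes. If the follower was timing out during its initial sync, the larger window may be what resolved the issue, though the answer does not say which of the three changes was decisive.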
