2017-10-26 12 views
0

私はSpringブートアプリケーションで2つのノードのクラスタを作った。外部リソースはありません。メモリにはDBのみ、分散キャッシュにはApache Igniteがあります。私は、1つのノードがすでに開始され、2つ目のノードが起動すると、2つのノードのtcpダンプとJFRを作成しました。Igniteは起動に時間がかかりすぎて、アプリケーションランタイム中にしばしば失敗する

そして、私は起動しようとしたノード上でこれを参照してください。 JFR for second node which is in starting state

10秒だけのバイトを取得するため。同時に、このノード上の別のIgnite SPIスレッド「tcp-disco-msg-worker」が同じリモートIPとポートから読み取ろうとしましたが、約10秒もかかりました。

JFR SPI long read

、すでに同じことをしようとし始めたと同時に第1ノード - 10秒程度 First node

のための1つのバイトを読んだから、彼らはからのメッセージをお読みしようとしていたようですお互いを同時にブロックし、お互いにブロックします。このSPIスレッドが読み取りを試みると、応答を書き込めません。

しかし、全く同じアプリケーションとその設定の2つのサーバーがある別のクラスターでは、このような動作は見られません。長い読み取りはありません。

私の最初のクラスタで何が問題なのでしょうか?Igniteは両方のノードで同時にトポロジーに関するメタ情報を読み込もうとしますか?

そして、主な問題は、SPIスレッドがクラスタがSEGMENTED状態にあると判断するため、最初のクラスタのノードの1つでIgniteが頻繁に失敗することです。 SPIスレッドは、通常の通信中にノードの1つからメタ情報を取得できないため、この状態になります。

いくつかのネットワークの問題がこのクラスタにあるようですが、すべての低レベルテストはすべてが問題ないことを示しています。

私は彼らがセグメント化された状態の長いGC、ネットワークの問題などについて言うことを知っています。しかし、私はこのようなものは見ません。 SPIスレッドのソケット通信中に読み込みが非常に遅くなります。そしてそれは1つのクラスターにしかありません。同じ構成の別のクラスタには、このような問題はありません。

誰でも私にアドバイスを与えて、何を確認したりテストしたらいいですか?

UPD: 私の設定:

@Bean 
public IgniteCache<String, WaveFile> getCache() { 
    log.info("Ignite addresses: " + Arrays.asList(igniteAddresses)); 
    IgniteConfiguration conf = new IgniteConfiguration(); 
    Slf4jLogger gridLog = new Slf4jLogger(log); 
    conf.setGridLogger(gridLog); 
    conf.setClientMode(false); 
    conf.setNetworkTimeout(networkTimeout); 
    conf.setClassLoader(WaveFile.class.getClassLoader()); 

    TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi(); 
    TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder(); 
    ipFinder.setAddresses(Arrays.asList(igniteAddresses)); 
    tcpDiscoverySpi.setIpFinder(ipFinder); 
    tcpDiscoverySpi.setAckTimeout(failureDetectionTimeout); 
    conf.setDiscoverySpi(tcpDiscoverySpi); 
    conf.setSqlConnectorConfiguration(null); 

    MemoryConfiguration memoryConfiguration = new MemoryConfiguration(); 
    MemoryPolicyConfiguration memoryPolicyConfiguration = new MemoryPolicyConfiguration(); 
    memoryPolicyConfiguration.setInitialSize(5L * 1024 * 1024 * 1024); 
    memoryConfiguration.setMemoryPolicies(memoryPolicyConfiguration); 
    conf.setMemoryConfiguration(memoryConfiguration); 

    Ignite ignite = Ignition.start(conf); 
    // Cache Config 
    CacheConfiguration<String, WaveFile> cacheConfig = new CacheConfiguration<>(); 
    cacheConfig.setName(cacheName); 
    cacheConfig.setCacheMode(CacheMode.PARTITIONED); 
    cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC); 
    cacheConfig.setCopyOnRead(false); 
    cacheConfig.setEagerTtl(eagerTtl); 
    cacheConfig.setExpiryPolicyFactory(FactoryBuilder.factoryOf(
      new TouchedExpiryPolicy(new Duration(TimeUnit.SECONDS, expirationPeriod)))); 
    return ignite.getOrCreateCache(cacheConfig); 
} 

ログ。そこには4つのノードがありますが、実際には2つのノードでのみテストを行いました。

2017-10-27 14:01:53.084 [main] ERROR - Failed to resolve default logging config file: config/java.util.logging.properties 
2017-10-27 14:02:35.175 [main] INFO com.voice.Application - Ignite addresses: [10.200.43.28, 10.200.43.32, 10.200.43.33, 10.200.43.34] 
2017-10-27 14:02:35.369 [main] INFO o.a.ignite.internal.IgniteKernal - 

>>> __________ ________________ 
>>> /_/ ___/ |//_/_ __/ __/ 
>>> _/ // (7 7 ////// _/ 
>>> /___/\___/_/|_/___/ /_/ /___/ 
>>> 
>>> ver. 2.1.0#20170720-sha1:a6ca5c8a 
>>> 2017 Copyright(C) Apache Software Foundation 
>>> 
>>> Ignite documentation: http://ignite.apache.org 

[14:02:35] __________ ________________ 
[14:02:35] /_/ ___/ |//_/_ __/ __/ 
[14:02:35] _/ // (7 7 ////// _/ 
[14:02:35] /___/\___/_/|_/___/ /_/ /___/ 
[14:02:35] 
[14:02:35] ver. 2.1.0#20170720-sha1:a6ca5c8a 
[14:02:35] 2017 Copyright(C) Apache Software Foundation 
[14:02:35] 
[14:02:35] Ignite documentation: http://ignite.apache.org 
[14:02:35] 
[14:02:35] Quiet mode. 
[14:02:35] ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat} 
[14:02:35] 
2017-10-27 14:02:35.370 [main] INFO o.a.ignite.internal.IgniteKernal - Config URL: n/a 
2017-10-27 14:02:35.370 [main] INFO o.a.ignite.internal.IgniteKernal - Daemon mode: off 
[14:02:35] OS: Linux 3.10.0-327.el7.x86_64 amd64 
2017-10-27 14:02:35.370 [main] INFO o.a.ignite.internal.IgniteKernal - OS: Linux 3.10.0-327.el7.x86_64 amd64 
2017-10-27 14:02:35.371 [main] INFO o.a.ignite.internal.IgniteKernal - OS user: mopopov 
2017-10-27 14:02:35.371 [main] INFO o.a.ignite.internal.IgniteKernal - PID: 31516 
[14:02:35] VM information: Java(TM) SE Runtime Environment 1.8.0_121-b13 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.121-b13 
2017-10-27 14:02:35.372 [main] INFO o.a.ignite.internal.IgniteKernal - Language runtime: Java Platform API Specification ver. 1.8 
2017-10-27 14:02:35.372 [main] INFO o.a.ignite.internal.IgniteKernal - VM information: Java(TM) SE Runtime Environment 1.8.0_121-b13 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.121-b13 
2017-10-27 14:02:35.373 [main] INFO o.a.ignite.internal.IgniteKernal - VM total memory: 10.0GB 
2017-10-27 14:02:35.374 [main] INFO o.a.ignite.internal.IgniteKernal - Remote Management [restart: off, REST: on, JMX (remote: off)] 
2017-10-27 14:02:35.374 [main] INFO o.a.ignite.internal.IgniteKernal - IGNITE_HOME=null 
2017-10-27 14:02:35.374 [main] INFO o.a.ignite.internal.IgniteKernal - VM arguments: [-Xmx12000m, -XX:+UnlockCommercialFeatures, -XX:+HeapDumpOnOutOfMemoryError, -Dspring.profiles.active=prod] 
2017-10-27 14:02:35.374 [main] INFO o.a.ignite.internal.IgniteKernal - System cache's MemoryPolicy size is configured to 40 MB. Use MemoryConfiguration.systemCacheMemorySize property to change the setting. 
2017-10-27 14:02:35.388 [main] INFO o.a.ignite.internal.IgniteKernal - Configured caches [in 'sysMemPlc' memoryPolicy: ['ignite-sys-cache']] 
2017-10-27 14:02:35.395 [pub-#14%null%] WARN o.a.ignite.internal.GridDiagnostic - Initial heap size is 250MB (should be no less than 512MB, use -Xms512m -Xmx512m). 
[14:02:35] Initial heap size is 250MB (should be no less than 512MB, use -Xms512m -Xmx512m). 
[14:02:35] Configured plugins: 
2017-10-27 14:02:35.477 [main] INFO o.a.i.i.p.p.IgnitePluginProcessor - Configured plugins: 
[14:02:35] ^-- None 
2017-10-27 14:02:35.478 [main] INFO o.a.i.i.p.p.IgnitePluginProcessor - ^-- None 
[14:02:35] 
2017-10-27 14:02:35.478 [main] INFO o.a.i.i.p.p.IgnitePluginProcessor - 
2017-10-27 14:02:35.555 [main] INFO o.a.i.s.c.tcp.TcpCommunicationSpi - Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false] 
2017-10-27 14:02:35.560 [main] WARN o.a.i.s.c.tcp.TcpCommunicationSpi - Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides. 
[14:02:35] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides. 
2017-10-27 14:02:35.586 [main] WARN o.a.i.s.c.noop.NoopCheckpointSpi - Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation) 
2017-10-27 14:02:35.627 [main] WARN o.a.i.i.m.c.GridCollisionManager - Collision resolution is disabled (all jobs will be activated upon arrival). 
[14:02:35] Security status [authentication=off, tls/ssl=off] 
2017-10-27 14:02:35.628 [main] INFO o.a.ignite.internal.IgniteKernal - Security status [authentication=off, tls/ssl=off] 
2017-10-27 14:02:36.072 [main] INFO o.a.i.i.p.r.p.t.GridTcpRestProtocol - Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211] 
2017-10-27 14:02:36.133 [main] INFO o.a.ignite.internal.IgniteKernal - Non-loopback local IPs: 10.200.43.34, fe80:0:0:0:250:56ff:fe93:3f60%eno16777984 
2017-10-27 14:02:36.133 [main] INFO o.a.ignite.internal.IgniteKernal - Enabled local MACs: 005056933F60 
2017-10-27 14:02:36.188 [main] INFO o.a.i.s.d.tcp.TcpDiscoverySpi - Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=64941c34-2912-4934-8c92-c0c089391207] 
2017-10-27 14:02:51.267 [main] WARN o.a.i.s.d.tcp.TcpDiscoverySpi - Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000] 
2017-10-27 14:03:46.397 [exchange-worker-#29%null%] INFO o.a.ignite.internal.exchange.time - Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=false, evt=10, node=TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 4212-bio-lanta-01/10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102156183, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 4212-bio-lanta-01/10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102156183, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=null] 
2017-10-27 14:03:47.022 [exchange-worker-#29%null%] INFO o.a.i.i.p.cache.GridCacheProcessor - Started cache [name=ignite-sys-cache, memoryPolicyName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL] 
2017-10-27 14:03:47.062 [exchange-worker-#29%null%] INFO o.a.i.i.p.cache.GridCacheProcessor - Started cache [name=SHARED_AUDIO_CACHE, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC] 
2017-10-27 14:03:57.115 [grid-nio-worker-tcp-comm-0-#17%null%] INFO o.a.i.s.c.tcp.TcpCommunicationSpi - Established outgoing communication connection [locAddr=/10.200.43.34:53456, rmtAddr=/10.200.43.28:47100] 
2017-10-27 14:03:57.172 [exchange-worker-#29%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], waitTime=0ms] 
2017-10-27 14:03:57.252 [exchange-worker-#29%null%] INFO o.a.ignite.internal.exchange.time - Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=false] 
2017-10-27 14:03:57.293 [sys-#32%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], time=0ms] 
2017-10-27 14:03:57.344 [exchange-worker-#29%null%] INFO o.a.i.i.p.c.GridCachePartitionExchangeManager - Rebalancing scheduled [order=[ignite-sys-cache, SHARED_AUDIO_CACHE]] 
2017-10-27 14:03:57.344 [exchange-worker-#29%null%] INFO o.a.i.i.p.c.GridCachePartitionExchangeManager - Rebalancing started [top=AffinityTopologyVersion [topVer=13, minorTopVer=0], evt=NODE_JOINED, node=64941c34-2912-4934-8c92-c0c089391207] 
2017-10-27 14:03:57.345 [exchange-worker-#29%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionDemander - Starting rebalancing [mode=SYNC, fromNode=110e4e93-2954-4b76-8649-a23f4a83990a, partitionsCount=100, topology=AffinityTopologyVersion [topVer=13, minorTopVer=0], updateSeq=1] 
2017-10-27 14:03:57.391 [utility-#34%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionDemander - Completed (final) rebalancing [fromNode=110e4e93-2954-4b76-8649-a23f4a83990a, topology=AffinityTopologyVersion [topVer=13, minorTopVer=0], time=51 ms] 
2017-10-27 14:03:57.395 [utility-#34%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionDemander - Starting rebalancing [mode=ASYNC, fromNode=110e4e93-2954-4b76-8649-a23f4a83990a, partitionsCount=518, topology=AffinityTopologyVersion [topVer=13, minorTopVer=0], updateSeq=1] 
2017-10-27 14:03:57.449 [sys-#36%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionDemander - Completed (final) rebalancing [fromNode=110e4e93-2954-4b76-8649-a23f4a83990a, topology=AffinityTopologyVersion [topVer=13, minorTopVer=0], time=51 ms] 
[14:03:57] Performance suggestions for grid (fix if possible) 
2017-10-27 14:03:57.463 [main] INFO o.a.ignite.internal.IgniteKernal - Performance suggestions for grid (fix if possible) 
[14:03:57] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true 
2017-10-27 14:03:57.463 [main] INFO o.a.ignite.internal.IgniteKernal - To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true 
[14:03:57] ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options) 
2017-10-27 14:03:57.463 [main] INFO o.a.ignite.internal.IgniteKernal - ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options) 
[14:03:57] ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options) 
2017-10-27 14:03:57.464 [main] INFO o.a.ignite.internal.IgniteKernal - ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options) 
[14:03:57] ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options) 
2017-10-27 14:03:57.464 [main] INFO o.a.ignite.internal.IgniteKernal - ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options) 
[14:03:57] ^-- Speed up flushing of dirty pages by OS (alter vm.dirty_expire_centisecs parameter by setting to 500) 
2017-10-27 14:03:57.464 [main] INFO o.a.ignite.internal.IgniteKernal - ^-- Speed up flushing of dirty pages by OS (alter vm.dirty_expire_centisecs parameter by setting to 500) 
[14:03:57] ^-- Reduce pages swapping ratio (set vm.swappiness=10) 
2017-10-27 14:03:57.464 [main] INFO o.a.ignite.internal.IgniteKernal - ^-- Reduce pages swapping ratio (set vm.swappiness=10) 
[14:03:57] ^-- Disable fully synchronous writes (set 'writeSynchronizationMode' to PRIMARY_SYNC or FULL_ASYNC) 
2017-10-27 14:03:57.464 [main] INFO o.a.ignite.internal.IgniteKernal - ^-- Disable fully synchronous writes (set 'writeSynchronizationMode' to PRIMARY_SYNC or FULL_ASYNC) 
[14:03:57] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning 
2017-10-27 14:03:57.464 [main] INFO o.a.ignite.internal.IgniteKernal - Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning 
[14:03:57] 
2017-10-27 14:03:57.465 [main] INFO o.a.ignite.internal.IgniteKernal - 
[14:03:57] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat} 
2017-10-27 14:03:57.465 [main] INFO o.a.ignite.internal.IgniteKernal - To start Console Management & Monitoring run ignitevisorcmd.{sh|bat} 
[14:03:57] 
[14:03:57] Ignite node started OK (id=64941c34) 
2017-10-27 14:03:57.465 [main] INFO o.a.ignite.internal.IgniteKernal - 
2017-10-27 14:03:57.466 [main] INFO o.a.ignite.internal.IgniteKernal - 

>>> +----------------------------------------------------------------------+ 
>>> Ignite ver. 2.1.0#20170720-sha1:a6ca5c8a97e9a4c9d73d40ce76d1504c14ba1940 
>>> +----------------------------------------------------------------------+ 
>>> OS name: Linux 3.10.0-327.el7.x86_64 amd64 
>>> CPU(s): 6 
>>> Heap: 10.0GB 
>>> VM name: [email protected] 
>>> Local node [ID=64941C34-2912-4934-8C92-C0C089391207, order=13, clientMode=false] 
>>> Local node addresses: [4212-bio-lanta-01/0:0:0:0:0:0:0:1%lo, /10.200.43.34, /127.0.0.1] 
>>> Local ports: TCP:11211 TCP:47100 TCP:47500 

[14:03:57] Topology snapshot [ver=13, servers=2, clients=0, CPUs=12, heap=22.0GB] 
2017-10-27 14:03:57.468 [main] INFO o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=13, servers=2, clients=0, CPUs=12, heap=22.0GB] 
2017-10-27 14:03:58.944 [exchange-worker-#29%null%] INFO o.a.ignite.internal.exchange.time - Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], crd=false, evt=18, node=TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 4212-bio-lanta-01/10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102156183, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 4212-bio-lanta-01/10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102156183, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=CacheAffinityChangeMessage [id=729ac7d5f51-0fc83046-7af0-474f-8487-f526629190a8, topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]] 
2017-10-27 14:03:58.952 [exchange-worker-#29%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], waitTime=0ms] 
2017-10-27 14:03:58.975 [exchange-worker-#29%null%] INFO o.a.ignite.internal.exchange.time - Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], crd=false] 
2017-10-27 14:03:58.991 [sys-#41%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], time=0ms] 
2017-10-27 14:03:58.994 [exchange-worker-#29%null%] INFO o.a.i.i.p.c.GridCachePartitionExchangeManager - Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=110e4e93-2954-4b76-8649-a23f4a83990a] 
2017-10-27 14:04:01.144 [main] INFO com.voice.Application - Run config context -> ok 
2017-10-27 14:04:01.145 [main] INFO com.voice.Application - StartUp date: Fri Oct 27 14:01:48 MSK 2017 

UPD2:ノードに障害が発生しなったログ:デバッグモードで

2017-10-28 07:48:56.320 [grid-nio-worker-tcp-comm-2-#19%null%] INFO o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:52329] 
2017-10-28 07:48:56.323 [grid-nio-worker-tcp-comm-3-#20%null%] INFO o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100, rmtAddr=/0:0:0:0:0:0:0:1:55692] 
2017-10-28 07:48:56.323 [grid-nio-worker-tcp-comm-0-#17%null%] INFO o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:52335] 
2017-10-28 07:48:59.535 [RTP-948] INFO com.voice.rtp.RtpReceiverImpl - Start callId 887B4B84-C47F-461E-35B0-97D0D7F86A51-7092 RTP processing with local port 36496 
2017-10-28 07:49:00.116 [grid-nio-worker-tcp-comm-1-#18%null%] INFO o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100, rmtAddr=/0:0:0:0:0:0:0:1:55698] 
2017-10-28 07:49:00.117 [grid-nio-worker-tcp-comm-2-#19%null%] INFO o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:52341] 
2017-10-28 07:49:00.372 [tcp-disco-msg-worker-#2%null%] WARN o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to short-time network problems). 
2017-10-28 07:49:00.373 [disco-event-worker-#28%null%] WARN o.a.i.i.m.d.GridDiscoveryManager - Local node SEGMENTED: TcpDiscoveryNode [id=c1886afe-b28f-45b5-a91b-cfdd9e635374, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.33, 127.0.0.1], sockAddrs=[4212-bio-lanta-03/10.200.43.33:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=15, intOrder=10, lastExchangeTime=1509102474635, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false] 
2017-10-28 07:49:00.399 [disco-event-worker-#28%null%] WARN o.a.i.i.m.d.GridDiscoveryManager - Stopping local node according to configured segmentation policy. 
2017-10-28 07:49:00.400 [disco-event-worker-#28%null%] WARN o.a.i.i.m.d.GridDiscoveryManager - Node FAILED: TcpDiscoveryNode [id=110e4e93-2954-4b76-8649-a23f4a83990a, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.28, 127.0.0.1], sockAddrs=[/10.200.43.28:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=11, intOrder=7, lastExchangeTime=1509102530512, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false] 
2017-10-28 07:49:00.413 [disco-event-worker-#28%null%] INFO o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=16, servers=3, clients=0, CPUs=18, heap=31.0GB] 
2017-10-28 07:49:00.450 [Thread-52685] INFO o.a.i.i.p.r.p.t.GridTcpRestProtocol - Command protocol successfully stopped: TCP binary 
2017-10-28 07:49:00.452 [disco-event-worker-#28%null%] INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=16, minorTopVer=0], time=0ms] 
2017-10-28 07:49:00.470 [disco-event-worker-#28%null%] WARN o.a.i.i.m.d.GridDiscoveryManager - Node FAILED: TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102540526, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false] 
2017-10-28 07:49:00.479 [disco-event-worker-#28%null%] INFO o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=17, servers=2, clients=0, CPUs=12, heap=21.0GB] 
2017-10-28 07:49:00.480 [disco-event-worker-#28%null%] WARN o.a.i.i.m.d.GridDiscoveryManager - Node FAILED: TcpDiscoveryNode [id=f6678eb1-f2f6-4da1-8d4d-a259c435b5a6, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.32, 127.0.0.1], sockAddrs=[/10.200.43.32:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=14, intOrder=9, lastExchangeTime=1509102550535, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false] 
2017-10-28 07:49:00.481 [disco-event-worker-#28%null%] INFO o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=18, servers=1, clients=0, CPUs=6, heap=10.0GB] 
2017-10-28 07:49:01.125 [RTP-948] ERROR com.voice.rtp.RtpReceiverImpl - During RTP socket communications 
javax.cache.CacheException: class org.apache.ignite.internal.NodeStoppingException: Operation has been cancelled (node is stopping). 
     at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1312) 
     at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.cacheException(IgniteCacheProxy.java:2630) 
     at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.get(IgniteCacheProxy.java:1188) 

ログここにある:https://drive.google.com/drive/folders/0B00-0mjd5sy1MjNnSkdyR1M4VFk?usp=sharing

+0

どのような種類のipFinderをお使いですか? 設定ファイルとログをプロファイルできますか? – Konstantin

+0

投稿に追加されました。 TcpDiscoveryVmIpFinder – Donz

+0

ログは通常のスタートアップのようです。ノードが分割されている状況でログを追加できますか? – Konstantin

答えて

1

問題は、DNSのクラスタホスト名が存在しない場合にありました。これを訂正した後、読み取り時間が10秒でなく、クラスタも少なくとも半日以上正しく動作しています。

関連する問題