One day the Impala job logs in production started reporting intermittent errors:
ERROR: No backends configured
Could not execute command: xxx
The errors did not occur on every run. The statestore log showed:
I0223 16:18:34.338698 10924 state-store.cc:194] Creating new topic: ''impala-membership' on behalf of subscriber: 'xxxhostname:22000
I0223 16:18:34.338739 10924 state-store.cc:200] Registering: xxxhostname:22000
I0223 16:18:34.339864 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)
I0223 16:18:34.840463 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)
I0223 16:18:35.341156 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)
I0223 16:18:36.843724 10904 state-store.cc:365] Subscriber: xxxhostname:22000 has either failed or disconnected.
I0223 16:18:47.536650 10911 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)
I0223 16:18:54.343158 10924 state-store.cc:200] Registering: xxxhostname:22000
W0223 16:18:54.343209 10924 state-store.cc:215] Duplicate registration of subscriber: xxxhostname:22000, possible duplicate subscriber IDs or recovering subscriber
Looking at xxxhostname itself, sure enough, it also kept failing to register:
I0224 21:32:38.173282 33088 state-store-subscriber.cc:169] Trying to register ...
I0224 21:32:38.173743 33088 state-store-subscriber.cc:172] Reconnected to state-store. Exiting recovery mode
I0224 21:32:48.173959 33088 state-store-subscriber.cc:166] xxxhostname:22000: Connection with state-store lost, entering recovery mode
I0224 21:32:48.174018 33088 state-store-subscriber.cc:169] Trying to register ...
W0224 21:32:48.174432 33088 state-store-subscriber.cc:181] Failed to re-register with state-store: Duplicate registration of subscriber: xxxhostname:22000
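The interval between "Exiting recovery mode" and the next "entering recovery mode" in the subscriber log can be measured with a small script. This is a minimal sketch, not part of Impala: the helper names `parse_glog` and `seconds_between` are made up for illustration, and because glog lines carry no year, the `year` default is only a placeholder. The regex also accepts copies of the log where the space after the month and day was lost.

```python
import re
from datetime import datetime

# glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
# The \s* after MMDD tolerates pasted logs where that space was dropped.
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4})\s*(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+)\s+(?P<src>\S+)\]\s*(?P<msg>.*)$"
)

def parse_glog(line, year=2000):
    """Return (timestamp, message) for one glog line, or None if it does not match.

    glog omits the year, so `year` is an assumed placeholder value.
    """
    m = GLOG_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year}{m['mmdd']} {m['time']}", "%Y%m%d %H:%M:%S.%f")
    return ts, m["msg"]

def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first message containing start_marker to the next
    message containing end_marker, or None if the pair is not found."""
    start = None
    for line in lines:
        parsed = parse_glog(line)
        if parsed is None:
            continue
        ts, msg = parsed
        if start is None and start_marker in msg:
            start = ts
        elif start is not None and end_marker in msg:
            return (ts - start).total_seconds()
    return None

# Two lines taken from the subscriber log above.
subscriber_log = [
    "I0224 21:32:38.173743 33088 state-store-subscriber.cc:172] "
    "Reconnected to state-store. Exiting recovery mode",
    "I0224 21:32:48.173959 33088 state-store-subscriber.cc:166] "
    "xxxhostname:22000: Connection with state-store lost, entering recovery mode",
]

gap = seconds_between(subscriber_log, "Exiting recovery mode", "entering recovery mode")
print(f"{gap:.3f} s between reconnect and the next connection loss")
```

For these two lines the gap comes out at about 10 seconds, which matches the default value of statestore_subscriber_timeout_seconds listed further down and supports the timeout theory.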
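This looks like a heartbeat timeout. The fix in the end was to set statestore_subscriber_timeout_seconds=60 (60 seconds) and restart for it to take effect. I could not find any documentation that explains the related parameters in detail, so I went through the code and listed the relevant ones here: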
statestore_subscriber_timeout_seconds, 10, "The amount of time (in seconds) that may elapse before the connection with the statestore is considered lost.";
statestore_subscriber_cnxn_attempts, 10, "The number of times to retry an RPC connection to the statestore. A setting of 0 means retry indefinitely";
statestore_subscriber_cnxn_retry_interval_ms, 3000, "The interval, in ms, to wait between attempts to make an RPC connection to the statestore.";
statestore_max_missed_heartbeats, 5, "Maximum number of consecutive heartbeats an impalad can miss before being declared failed by the statestore.";
statestore_suspect_heartbeats, 2, "(Advanced) Number of consecutive heartbeats an impalad can miss before being suspected of failure by the statestore";
statestore_num_heartbeat_threads, 10, "(Advanced) Number of threads used to send heartbeats in parallel to all registered subscribers.";
statestore_heartbeat_frequency_ms, 500, "(Advanced) Frequency (in ms) with which the statestore sends heartbeats to subscribers.";
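To see how these defaults relate to the timings in the logs, the two timeout windows can be worked out from the flag values. This is a rough sketch under a simplifying assumption that is mine, not taken from the Impala code: the statestore gives up on a subscriber after statestore_max_missed_heartbeats consecutive failed heartbeats sent one heartbeat period apart. The function names are hypothetical.

```python
# Default values copied from the flag list above.
flags = {
    "statestore_heartbeat_frequency_ms": 500,
    "statestore_max_missed_heartbeats": 5,
    "statestore_subscriber_timeout_seconds": 10,
}

def statestore_failure_window_s(f):
    """Assumed model: the statestore declares a subscriber failed after
    max_missed consecutive failed heartbeats, one per heartbeat period."""
    return (
        f["statestore_max_missed_heartbeats"]
        * f["statestore_heartbeat_frequency_ms"]
        / 1000.0
    )

def subscriber_timeout_s(f):
    """Time the subscriber waits before it considers the statestore connection lost."""
    return float(f["statestore_subscriber_timeout_seconds"])

# Same flags with the subscriber timeout raised to 60, as in the fix described above.
tuned = dict(flags, statestore_subscriber_timeout_seconds=60)

print("statestore failure window (s):", statestore_failure_window_s(flags))
print("subscriber timeout, default (s):", subscriber_timeout_s(flags))
print("subscriber timeout, tuned (s):", subscriber_timeout_s(tuned))
```

With the defaults, the statestore-side window is 2.5 seconds, which is roughly the distance between the first "Unable to update subscriber" line (16:18:34.339864) and the "has either failed or disconnected" line (16:18:36.843724) in the statestore log. Raising statestore_subscriber_timeout_seconds only changes the subscriber side of the picture and leaves the statestore-side window untouched.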
This article is reposted from MIKE老毕's 51CTO blog. Original link: http://blog.51cto.com/boylook/1367252. To repost it, please contact the original author.