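Netconf in ODL supports periodic reconnection after a device goes offline unexpectedly. The relevant functionality is described below.
After a node is added successfully, a Communicator is created for the device; it handles the connection and communication logic between the controller and that device node.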
AbstractNetconfTopology.java
protected NetconfConnectorDTO createDeviceCommunicator(final NodeId nodeId,
                                                       final NetconfNode node) {
    //setup default values since default value is not supported yet in mdsal
    // TODO remove this when mdsal starts supporting default values (read the node's configured parameters)
    final Long defaultRequestTimeoutMillis = node.getDefaultRequestTimeoutMillis() == null ? DEFAULT_REQUEST_TIMEOUT_MILLIS : node.getDefaultRequestTimeoutMillis();
    final Long keepaliveDelay = node.getKeepaliveDelay() == null ? DEFAULT_KEEPALIVE_DELAY : node.getKeepaliveDelay(); // keepalive heartbeat interval, 120 s
    final Boolean reconnectOnChangedSchema = node.isReconnectOnChangedSchema() == null ? DEFAULT_RECONNECT_ON_CHANGED_SCHEMA : node.isReconnectOnChangedSchema();
    IpAddress ipAddress = node.getHost().getIpAddress();
    InetSocketAddress address = new InetSocketAddress(ipAddress.getIpv4Address() != null ?
            ipAddress.getIpv4Address().getValue() : ipAddress.getIpv6Address().getValue(),
            node.getPort().getValue());
    RemoteDeviceId remoteDeviceId = new RemoteDeviceId(nodeId.getValue(), address);
    RemoteDeviceHandler<NetconfSessionPreferences> salFacade =
            createSalFacade(remoteDeviceId, node, domBroker, bindingAwareBroker);
    // Depends on the node's keepaliveDelay setting: when it is 0, the plain NetconfDeviceSalFacade is used, i.e. there is no keepalive heartbeat
    if (keepaliveDelay > 0) {
        LOG.warn("Adding keepalive facade, for device {}", nodeId);
        salFacade = new KeepaliveSalFacade(remoteDeviceId, salFacade, keepaliveExecutor.getExecutor(), keepaliveDelay, defaultRequestTimeoutMillis);
    }
    final NetconfDevice.SchemaResourcesDTO schemaResourcesDTO = setupSchemaCacheDTO(nodeId, node);
    final NetconfDevice device = new NetconfDevice(schemaResourcesDTO, remoteDeviceId, salFacade,
            processingExecutor.getExecutor(), reconnectOnChangedSchema);
    final Optional<NetconfSessionPreferences> userCapabilities = getUserCapabilities(node);
    NetconfDeviceCommunicator communicator = userCapabilities.isPresent() ?
            new NetconfDeviceCommunicator(
                    remoteDeviceId, device, new UserPreferences(userCapabilities.get(), node.getYangModuleCapabilities().isOverride())):
            new NetconfDeviceCommunicator(remoteDeviceId, device);
    final NetconfConnectorDTO netconfConnectorDTO = new NetconfConnectorDTO(communicator, salFacade);
    salFacade.setListener(communicator);
    setCommunicator(nodeId, netconfConnectorDTO.getCommunicator());
    return netconfConnectorDTO;
}
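The two decisions made in createDeviceCommunicator (fall back to a default when a leaf is unset, and wrap the facade only when keepaliveDelay is positive) can be looked at in isolation. The following is a minimal sketch, not ODL code: the Facade interface, both implementations, and the helper names are hypothetical stand-ins, and the 120-second default is taken from the comment in the snippet.

```java
/** Hypothetical stand-ins; not ODL's RemoteDeviceHandler or KeepaliveSalFacade. */
final class FacadeSelectionSketch {
    // Assumed default in seconds, taken from the comment in the snippet.
    static final long DEFAULT_KEEPALIVE_DELAY = 120L;

    interface Facade {
        String name();
    }

    /** Plays the role of the facade without any keepalive. */
    static final class PlainFacade implements Facade {
        @Override
        public String name() {
            return "plain";
        }
    }

    /** Decorator that would add keepalive probing around another facade. */
    static final class KeepaliveFacade implements Facade {
        private final Facade delegate;
        private final long delaySeconds;

        KeepaliveFacade(final Facade delegate, final long delaySeconds) {
            this.delegate = delegate;
            this.delaySeconds = delaySeconds;
        }

        @Override
        public String name() {
            return "keepalive(" + delaySeconds + "s)->" + delegate.name();
        }
    }

    private FacadeSelectionSketch() {
    }

    /** Uses the configured value, or the fallback when the leaf was not set. */
    static long orDefault(final Long configured, final long fallback) {
        return configured == null ? fallback : configured;
    }

    /** Wraps the plain facade only when the effective delay is positive. */
    static Facade select(final Long configuredDelay) {
        final long delay = orDefault(configuredDelay, DEFAULT_KEEPALIVE_DELAY);
        final Facade plain = new PlainFacade();
        return delay > 0 ? new KeepaliveFacade(plain, delay) : plain;
    }

    public static void main(final String[] args) {
        System.out.println(select(null).name()); // unset leaf: default delay applies
        System.out.println(select(0L).name());   // 0 disables the keepalive wrapper
    }
}
```

Because the wrapper is a decorator, the rest of the code only ever sees one facade reference, whether or not keepalive is enabled.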
leaf connection-timeout-millis {
    description "Specifies timeout in milliseconds after which connection must be established.";
    type uint32;
    default 20000;
}
leaf default-request-timeout-millis {
    description "Timeout for blocking operations within transactions.";
    type uint32;
    default 60000;
}
leaf max-connection-attempts {
    description "Maximum number of connection retries. Non positive value or null is interpreted as infinity.";
    type uint32;
    default 0; // retry forever
}
leaf between-attempts-timeout-millis {
    description "Initial timeout in milliseconds to wait between connection attempts. Will be multiplied by sleep-factor with every additional attempt";
    type uint16;
    default 2000;
}
leaf sleep-factor {
    type decimal64 {
        fraction-digits 1;
    }
    default 1.5;
}
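After the session is created, channelActive in AbstractSessionNegotiator runs startNegotiation and sends the Hello message; handleMessage in NetconfClientSessionNegotiator processes the Hello message returned by the device.
getSessionForHelloMessage changes the session state to ESTABLISHED.
connection-timeout-millis: the time allowed, once negotiation starts, for the session to move from OPEN_WAIT to ESTABLISHED. If the timer expires and the promise is neither completed nor cancelled, negotiation fails and the channel is closed.
default-request-timeout-millis: used by invokeRpc of KeepaliveDOMRpcService in the KeepaliveSalFacade class; an RPC call that exceeds this timeout is cancelled.
maxConnectionAttempts, betweenAttemptsTimeoutMillis, sleepFactor: used by the reconnect logic to compute when to reconnect.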
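Both timeouts described above follow the same pattern: a timer fires after the configured interval and acts only if the pending operation has neither completed nor been cancelled. The following is a minimal sketch of that pattern using only the JDK; TimeoutGuardSketch and guard are hypothetical names, and the sketch fails the future with a TimeoutException where the ODL code fails the negotiation or cancels the RPC.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical timeout guard; not the ODL implementation. */
final class TimeoutGuardSketch {
    private TimeoutGuardSketch() {
    }

    /**
     * Fails {@code pending} if it is still unfinished after {@code timeoutMillis},
     * then runs {@code onTimeout} (for example, closing the channel).
     */
    static <T> void guard(final CompletableFuture<T> pending, final ScheduledExecutorService timer,
            final long timeoutMillis, final Runnable onTimeout) {
        timer.schedule(() -> {
            // completeExceptionally returns false when the future has already
            // completed or been cancelled, so finished work is left alone.
            if (pending.completeExceptionally(
                    new TimeoutException("no result after " + timeoutMillis + " ms"))) {
                onTimeout.run();
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(final String[] args) {
        final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Stands in for a session promise that never reaches ESTABLISHED.
        final CompletableFuture<String> session = new CompletableFuture<>();
        guard(session, timer, 100, () -> System.out.println("negotiation timed out, closing channel"));
        // Block until the guard fires; handle() swallows the exception for the demo.
        session.handle((value, error) -> error != null).join();
        timer.shutdown();
    }
}
```

Checking the return value of completeExceptionally, rather than testing isDone first, avoids a race between the timer thread and the thread that completes the operation.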
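Keepalive heartbeat mechanism:
As the name implies, keepalive applies only once the node is already connected (for example, while the session is idle). KeepaliveSalFacade.java
sessionCreated(IoSession session): called when a new connection is created.
sessionOpened(IoSession session): called when a new connection is opened. It is called after sessionCreated.
sessionClosed(IoSession session): called when the connection is closed.
sessionIdle(IoSession session, IdleStatus status): called when the connection becomes idle.
exceptionCaught(IoSession session, Throwable cause): called when the I/O handler implementation throws an exception.
Notes:
sessionCreated versus sessionOpened: sessionCreated is called by the I/O processing thread, whereas sessionOpened is called by other threads.
For performance reasons, therefore, do not do much work in sessionCreated.
For sessionIdle, the idle time is disabled by default, so sessionIdle is never called. It can be enabled with IoSessionConfig.setIdleTime(IdleStatus, int).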
KeepaliveSalFacade.java
@Override
public void onDeviceConnected(final SchemaContext remoteSchemaContext, final NetconfSessionPreferences netconfSessionPreferences, final DOMRpcService deviceRpc) {
    this.currentDeviceRpc = deviceRpc;
    final DOMRpcService deviceRpc1 = new KeepaliveDOMRpcService(deviceRpc, resetKeepaliveTask, defaultRequestTimeoutMillis, executor);
    salFacade.onDeviceConnected(remoteSchemaContext, netconfSessionPreferences, deviceRpc1);
    LOG.debug("{}: Netconf session initiated, starting keepalives", id);
    scheduleKeepalive();
}
After the connection succeeds, scheduleKeepalive is called to start the keepalive heartbeat.
private void scheduleKeepalive() {
    Preconditions.checkState(currentDeviceRpc != null);
    LOG.trace("{}: Scheduling next keepalive in {} {}", id, keepaliveDelaySeconds, TimeUnit.SECONDS);
    currentKeepalive = executor.schedule(new Keepalive(currentKeepalive), keepaliveDelaySeconds, TimeUnit.SECONDS);
}
In KeepaliveSalFacade.java, Keepalive implements Runnable and FutureCallback. It invokes an RPC (get-config); in its callbacks, every outcome other than a successful response triggers a reconnect.
@Override
public void onSuccess(final DOMRpcResult result) {
    if (result != null && result.getResult() != null) {
        LOG.debug("{}: Keepalive RPC successful with response: {}", id, result.getResult());
        scheduleKeepalive();
    } else {
        LOG.warn("{} Keepalive RPC returned null with response: {}. Reconnecting netconf session", id, result);
        reconnect();
    }
}

@Override
public void onFailure(@Nonnull final Throwable t) {
    LOG.warn("{}: Keepalive RPC failed. Reconnecting netconf session.", id, t);
    reconnect();
}
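The callback logic reduces to a single decision: reschedule the keepalive only when the RPC returned a non-null result, and reconnect in every other case. The following is a minimal sketch of that decision using CompletableFuture in place of Guava's FutureCallback; the class, enum, and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical restatement of the keepalive callback decision; not ODL code. */
final class KeepaliveOutcomeSketch {
    enum Action { RESCHEDULE, RECONNECT }

    private KeepaliveOutcomeSketch() {
    }

    /** A failure or a null result means the session can no longer be trusted. */
    static Action decide(final Object result, final Throwable failure) {
        if (failure != null) {
            return Action.RECONNECT;
        }
        return result != null ? Action.RESCHEDULE : Action.RECONNECT;
    }

    /** Maps the outcome of a keepalive RPC to the action to take next. */
    static <T> CompletableFuture<Action> watch(final CompletableFuture<T> rpc) {
        return rpc.handle((result, failure) -> decide(result, failure));
    }

    public static void main(final String[] args) {
        final CompletableFuture<String> ok = CompletableFuture.completedFuture("<data/>");
        final CompletableFuture<String> broken = new CompletableFuture<>();
        broken.completeExceptionally(new IllegalStateException("session closed"));
        System.out.println(watch(ok).join());     // prints RESCHEDULE
        System.out.println(watch(broken).join()); // prints RECONNECT
    }
}
```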
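RPCs other than the get-config request also return data from the node and therefore also prove that the session is alive, so the success callback of invokeRpc in KeepaliveDOMRpcService resets the keepalive timer. Business RPCs thus reduce the keepalive heartbeat load.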
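The timer reset can be sketched as "cancel the pending probe, then schedule a fresh one", triggered whenever a tracked RPC completes successfully. The following is a minimal sketch under that assumption; KeepaliveTimerSketch and its methods are hypothetical names, and the probe is an arbitrary Runnable rather than a real get-config call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical keepalive timer that business RPCs can reset; not ODL code. */
final class KeepaliveTimerSketch {
    private final ScheduledExecutorService executor;
    private final long delayMillis;
    private final Runnable probe;
    private ScheduledFuture<?> pending;

    KeepaliveTimerSketch(final ScheduledExecutorService executor, final long delayMillis,
            final Runnable probe) {
        this.executor = executor;
        this.delayMillis = delayMillis;
        this.probe = probe;
    }

    /** Schedules the next keepalive probe one full interval from now. */
    synchronized void schedule() {
        pending = executor.schedule(probe, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Cancels the pending probe and restarts the interval. */
    synchronized void reset() {
        if (pending != null) {
            pending.cancel(false);
        }
        schedule();
    }

    synchronized ScheduledFuture<?> pending() {
        return pending;
    }

    /** Wraps a business RPC so that its success counts as proof of liveness. */
    <T> CompletableFuture<T> track(final CompletableFuture<T> rpc) {
        return rpc.whenComplete((result, failure) -> {
            if (failure == null) {
                reset();
            }
        });
    }

    public static void main(final String[] args) {
        final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        final KeepaliveTimerSketch timer =
                new KeepaliveTimerSketch(executor, 120_000, () -> System.out.println("probe"));
        timer.schedule();
        final ScheduledFuture<?> first = timer.pending();
        timer.track(CompletableFuture.completedFuture("business reply"));
        System.out.println("first probe cancelled: " + first.isCancelled());
        executor.shutdownNow();
    }
}
```

On a busy session the probe is pushed back by every successful RPC, so the dedicated keepalive request is only sent after a full quiet interval.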
<node xmlns="urn:TBD:params:xml:ns:yang:network-topology">
    <node-id>testa</node-id>
    <host xmlns="urn:opendaylight:netconf-node-topology">10.42.94.233</host>
    <port xmlns="urn:opendaylight:netconf-node-topology">17830</port>
    <username xmlns="urn:opendaylight:netconf-node-topology">admin</username>
    <password xmlns="urn:opendaylight:netconf-node-topology">admin</password>
    <tcp-only xmlns="urn:opendaylight:netconf-node-topology">false</tcp-only>
    <keepalive-delay xmlns="urn:opendaylight:netconf-node-topology">0</keepalive-delay>
    <sleep-factor xmlns="urn:opendaylight:netconf-node-topology">1</sleep-factor>
    <reconnect-on-changed-schema xmlns="urn:opendaylight:netconf-node-topology">true</reconnect-on-changed-schema>
</node>
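These values can be set through the node configuration parameters; see the YANG file netconf-node-topology.yang.
Reconnection after a disconnect:
A disconnect implies that a connection was established earlier. To establish a connection, Netconf first adds the device node (a write to the config datastore).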
ProtocolSessionPromise.java
synchronized void connect() {
    final Object lock = this;
    try {
        final int timeout = this.strategy.getConnectTimeout();
        LOG.debug("Promise {} attempting connect for {}ms", lock, timeout);
        if (this.address.isUnresolved()) {
            this.address = new InetSocketAddress(this.address.getHostName(), this.address.getPort());
        }
        this.b.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, timeout);
        final ChannelFuture connectFuture = this.b.connect(this.address);
        // Add listener that attempts reconnect by invoking this method again.
        connectFuture.addListener(new BootstrapConnectListener(lock));
        this.pending = connectFuture;
    } catch (final Exception e) {
        LOG.info("Failed to connect to {}", address, e);
        setFailure(e);
    }
}
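timeout defaults to 2 seconds: if the connection cannot be established within 2 seconds (later values are computed by the backoff strategy), the attempt times out (reported as a semaphore timeout).
The key part of the BootstrapConnectListener.java listener is its handling of a failed connection (the reconnect):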
LOG.debug("Attempt to connect to {} failed", ProtocolSessionPromise.this.address, cf.cause());
final Future<Void> rf = ProtocolSessionPromise.this.strategy.scheduleReconnect(cf.cause());
rf.addListener(new ReconnectingStrategyListener());
ProtocolSessionPromise.this.pending = rf;
When the connection attempt times out, the reconnect logic starts; the strategy used is TimedReconnectStrategy.java.
leaf between-attempts-timeout-millis {
    description "Initial timeout in milliseconds to wait between connection attempts. Will be multiplied by sleep-factor with every additional attempt";
    config true;
    type uint16;
    default 2000;
}
The wait between reconnect attempts follows a backoff algorithm (driven by sleep-factor).
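To make the backoff concrete, the following minimal sketch computes the wait before each attempt from the two YANG leaves, assuming the delay is simply multiplied by sleep-factor after every attempt, as the leaf description states. BackoffSketch and its methods are hypothetical names; this is not the code of TimedReconnectStrategy, which may apply additional limits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of multiplicative backoff; not ODL's TimedReconnectStrategy. */
final class BackoffSketch {
    private BackoffSketch() {
    }

    /** Wait before the given 0-based attempt: initial * sleepFactor^attempt. */
    static long delayMillis(final long initialMillis, final double sleepFactor, final int attempt) {
        double delay = initialMillis;
        for (int i = 0; i < attempt; i++) {
            delay *= sleepFactor;
        }
        return Math.round(delay);
    }

    /** A non-positive maximum means "retry forever", as in the YANG description. */
    static boolean mayRetry(final long maxAttempts, final long attemptsSoFar) {
        return maxAttempts <= 0 || attemptsSoFar < maxAttempts;
    }

    /** Collects the waits for the first {@code count} attempts. */
    static List<Long> firstDelays(final long initialMillis, final double sleepFactor, final int count) {
        final List<Long> delays = new ArrayList<>();
        for (int attempt = 0; attempt < count; attempt++) {
            delays.add(delayMillis(initialMillis, sleepFactor, attempt));
        }
        return delays;
    }

    public static void main(final String[] args) {
        // Defaults from the YANG model: 2000 ms initial wait, sleep-factor 1.5.
        System.out.println(firstDelays(2000, 1.5, 5)); // prints [2000, 3000, 4500, 6750, 10125]
    }
}
```

With a sleep-factor of 1 the wait stays constant at the initial value, so retries happen at a fixed interval.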
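ReconnectingStrategyListener is simpler: once the future for the computed reconnect delay completes, it connects again.
The connect flow then returns to its starting point, forming a loop.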
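The loop (connect, fail, wait for the backoff delay, connect again) can be sketched without Netty. In the following minimal sketch a BooleanSupplier stands in for the real connection attempt and a ScheduledExecutorService stands in for the strategy's delayed future; all names are hypothetical and the sketch ignores max-connection-attempts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical connect/retry loop; not ODL's ProtocolSessionPromise. */
final class ReconnectLoopSketch {
    private final ScheduledExecutorService timer;
    private final BooleanSupplier connector; // stand-in for one connection attempt
    private final double sleepFactor;
    private final CompletableFuture<Integer> connected = new CompletableFuture<>();
    private long nextDelayMillis;
    private int attempts;

    ReconnectLoopSketch(final ScheduledExecutorService timer, final BooleanSupplier connector,
            final long initialDelayMillis, final double sleepFactor) {
        this.timer = timer;
        this.connector = connector;
        this.nextDelayMillis = initialDelayMillis;
        this.sleepFactor = sleepFactor;
    }

    /** Tries once; on failure schedules itself again after the current backoff delay. */
    synchronized void connect() {
        attempts++;
        if (connector.getAsBoolean()) {
            connected.complete(attempts);
            return;
        }
        final long wait = nextDelayMillis;
        nextDelayMillis = Math.round(nextDelayMillis * sleepFactor);
        final Runnable retry = this::connect; // the loop: the retry calls connect() again
        timer.schedule(retry, wait, TimeUnit.MILLISECONDS);
    }

    /** Completes with the number of attempts that were needed. */
    CompletableFuture<Integer> connected() {
        return connected;
    }

    public static void main(final String[] args) {
        final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        final AtomicInteger calls = new AtomicInteger();
        // Simulated device: the first two attempts fail, the third succeeds.
        final ReconnectLoopSketch loop =
                new ReconnectLoopSketch(timer, () -> calls.incrementAndGet() >= 3, 10, 1.5);
        loop.connect();
        System.out.println("connected after attempts: " + loop.connected().join());
        timer.shutdown();
    }
}
```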
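How is reconnection triggered once an established connection drops?
After the device goes offline, a series of channelInactive callbacks fire, eventually reaching ClosedChannelHandler.channelInactive, which triggers connect on ReconnectPromise.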
@Override
public void channelInactive(final ChannelHandlerContext ctx) throws Exception {
    // This is the ultimate channel inactive handler, not forwarding
    if (promise.isCancelled()) {
        return;
    }
    if (promise.isInitialConnectFinished() == false) {
        LOG.debug("Connection to {} was dropped during negotiation, reattempting", promise.address);
    }
    LOG.debug("Reconnecting after connection to {} was dropped", promise.address);
    promise.connect();
}
The final log line indicates the reconnect.
For the subsequent SSH connection:
After the reconnect, control enters AbstractChannelHandlerContext.java
private void invokeConnect(SocketAddress remoteAddress, SocketAddress localAddress, ChannelPromise promise) {
    if (isAdded()) {
        try {
            ((ChannelOutboundHandler) handler()).connect(this, remoteAddress, localAddress, promise);
        } catch (Throwable t) {
            notifyOutboundHandlerException(t, promise);
        }
    } else {
        connect(remoteAddress, localAddress, promise);
    }
}
The handler() call returns the following in turn:
DefaultChannelPipeline.java connect
NetconfHelloMessageToXMLEncoder
EOMFramingMechanismEncoder
AsyncSshHandler.java