Connection establishment and the reconnection mechanism in ZeroMQ (Java)


fjs_cloud


CC 4.0 BY-SA license

Article link: https://blog.youkuaiyun.com/fjslovejhl/article/details/17563223


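This article looks at how ZeroMQ automatically reconnects after a connection drops. It covers establishing the TCP connection, handling the disconnect, and the reconnection mechanism, with a focus on keeping message delivery stable and reliable in a distributed environment.

The previous article analyzed Dealer, the simplest socket type in ZeroMQ. Analysis of the other concrete socket types can wait until later, or until I actually need them.

For a messaging framework, though, what matters most is reliable communication, and the most important part of that is reconnecting after a connection is lost.

Before looking at reconnection itself, let's see how ZeroMQ actively establishes a connection to a remote peer. Start with the connect method defined in SocketBase: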

 // Establish a connection to a remote address
    public boolean connect (String addr_) {
        if (ctx_terminated) {
            throw new ZError.CtxTerminatedException();
        }

        //  Process pending commands, if any.
        boolean brc = process_commands (0, false);
        if (!brc)
            return false;

        //  Parse addr_ string.
        URI uri;
        try {
            uri = new URI(addr_);   // Build the URI object
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
        
        String protocol = uri.getScheme();   // Get the protocol (URI scheme)
        String address = uri.getAuthority();
        String path = uri.getPath();
        if (address == null)
            address = path;

        check_protocol (protocol);  // Check that the protocol is supported

        if (protocol.equals("inproc")) {    // In-process communication

            //  TODO: inproc connect is specific with respect to creating pipes
            //  as there's no 'reconnect' functionality implemented. Once that
            //  is in place we should follow generic pipe creation algorithm.

            //  Find the peer endpoint.
            Ctx.Endpoint peer = find_endpoint (addr_);
            if (peer.socket == null)
                return false;
            // The total HWM for an inproc connection should be the sum of
            // the binder's HWM and the connector's HWM.
            int  sndhwm = 0;
            if (options.sndhwm != 0 && peer.options.rcvhwm != 0)
                sndhwm = options.sndhwm + peer.options.rcvhwm;
            int  rcvhwm = 0;
            if (options.rcvhwm != 0 && peer.options.sndhwm != 0)
                rcvhwm = options.rcvhwm + peer.options.sndhwm;

            //  Create a bi-directional pipe to connect the peers.
            ZObject[] parents = {this, peer.socket};
            Pipe[] pipes = {null, null};
            int[] hwms = {sndhwm, rcvhwm};
            boolean[] delays = {options.delay_on_disconnect, options.delay_on_close};
            Pipe.pipepair (parents, pipes, hwms, delays);

            //  Attach local end of the pipe to this socket object.
            attach_pipe (pipes [0]);

            //  If required, send the identity of the peer to the local socket.
            if (peer.options.recv_identity) {
                Msg id = new Msg (options.identity_size);
                id.put (options.identity, 0 , options.identity_size);
                id.set_flags (Msg.identity);
                boolean written = pipes [0].write (id);
                assert (written);
                pipes [0].flush ();
            }
            
            //  If required, send the identity of the local socket to the peer.
            if (options.recv_identity) {
                Msg id = new Msg (peer.options.identity_size);
                id.put (peer.options.identity, 0 , peer.options.identity_size);
                id.set_flags (Msg.identity);
                boolean written = pipes [1].write (id);
                assert (written);
                pipes [1].flush ();
            }

            //  Attach remote end of the pipe to the peer socket. Note that peer's
            //  seqnum was incremented in find_endpoint function. We don't need it
            //  increased here.
            send_bind (peer.socket, pipes [1], false);

            // Save last endpoint URI
            options.last_endpoint = addr_;

            // remember inproc connections for disconnect
            inprocs.put(addr_, pipes[0]);

            return true;
        }

        // Choose an I/O thread on which to deploy the session that is about to be created
        IOThread io_thread = choose_io_thread (options.affinity);
        if (io_thread == null) {
            throw new IllegalStateException("Empty IO Thread");
        }
        // Create the address object
        Address paddr = new Address (protocol, address);

        if (protocol.equals("tcp")) {  // TCP
            paddr.resolved( new  TcpAddress () );
            paddr.resolved().resolve (
                address, options.ipv4only != 0 ? true : false);
        } else if(protocol.equals("ipc")) {  // Inter-process communication
            paddr.resolved( new IpcAddress () );
            paddr.resolved().resolve (address, true);
        }
        //  Create session.
        // The first argument is the I/O thread the session attaches to; the second (true) means the session must actively connect
        SessionBase session = SessionBase.create (io_thread, true, this,
            options, paddr);
        assert (session != null);

        //  PGM does not support subscription forwarding; ask for all data to be
        //  sent to this pipe.
        boolean icanhasall = false;
        if (protocol.equals("pgm") || protocol.equals("epgm"))
            icanhasall = true;

        // Create the pipe pair that links the session to this socket
        if (options.delay_attach_on_connect != 1 || icanhasall) {
            //  Create a bi-directional pipe.
            ZObject[] parents = {this, session};
            Pipe[] pipes = {null, null};
            int[] hwms = {options.sndhwm, options.rcvhwm};
            boolean[] delays = {options.delay_on_disconnect, options.delay_on_close};
            Pipe.pipepair (parents, pipes, hwms, delays);

            //  Attach local end of the pipe to the socket object.
            // Attach the first pipe to this socket
            attach_pipe (pipes [0], icanhasall);

            //  Attach remote end of the pipe to the session object later on.
            // Attach the other pipe to the session, so that the session and the socket can talk through the pipe pair
            session.attach_pipe (pipes [1]);
        }
        
        // Save last endpoint URI
        options.last_endpoint = paddr.toString ();

        add_endpoint (addr_, session);  // Associate this session with the address
        return true;
    }
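
To make the address handling at the top of connect more concrete, here is a minimal sketch of how java.net.URI splits a few typical endpoint strings. The class name EndpointParseSketch and the sample endpoints are made up for illustration. Only the scheme/authority/path fallback mirrors the excerpt above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the library: it only replays the
// scheme/authority/path logic from the connect excerpt.
final class EndpointParseSketch {

    // Returns {protocol, address} derived the same way as in connect.
    static String[] parse(String endpoint) {
        URI uri;
        try {
            uri = new URI(endpoint);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
        String protocol = uri.getScheme();
        String address = uri.getAuthority();
        if (address == null) {
            // No authority part (e.g. ipc:///path): fall back to the path.
            address = uri.getPath();
        }
        return new String[] {protocol, address};
    }

    public static void main(String[] args) {
        String[] samples = {"tcp://127.0.0.1:5555", "ipc:///tmp/demo.sock", "inproc://workers"};
        for (String endpoint : samples) {
            String[] parts = parse(endpoint);
            System.out.println(endpoint + " -> protocol=" + parts[0] + ", address=" + parts[1]);
        }
    }
}
```

For an ipc endpoint the authority is empty, which is why the excerpt falls back to the path when address is null.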

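Here we focus on how TCP connections are established, since TCP is what gets used in a distributed environment. From the earlier articles we know that one Socket may sit on top of several connections, that each connection corresponds to a StreamEngine object, and that each StreamEngine is associated with a Session object that handles the interaction with the Socket above it. So the main job of this code is to create the Session object and the Pipe objects. It then calls add_endpoint to deploy the session. Here is that method: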

    // Tracks addresses and sessions: records every address a connection was set up for, together with its session
    private void add_endpoint (String addr_, Own endpoint_) {
        //  Activate the session. Make it a child of this socket.
        launch_child (endpoint_);   // Deploy the endpoint; the main effect is to add it to its I/O thread
        endpoints.put (addr_, endpoint_);
    }

This is what actually deploys the session object. Once deployed, the session runs its process_plug method, defined as follows:

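    // Handle the plug command; start connecting if this session needs to connect
    protected void process_plug () {
        io_object.set_handler(this);  // Set the io object's handler so it can respond to I/O events
        if (connect) {  // If this session must actively connect to the remote peer
            start_connecting (false);   // Start connecting; false means do not wait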
        }
    }

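This first sets the event callback for the io object. The connect field is set when the session is created: it is true for a connection that was established actively and false for one accepted by a listener. Now look at start_connecting: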

    // A session created by connect calls this method to establish the connection
    private void start_connecting (boolean wait_) {
        assert (connect);

        //  Choose I/O thread to run connecter in. Given that we are already
        //  running in an I/O thread, there must be at least one available.
        IOThread io_thread = choose_io_thread (options.affinity);  // Pick an I/O thread on which to deploy the TcpConnecter
        assert (io_thread != null);

        //  Create the connecter object.

        if (addr.protocol().equals("tcp")) {
            TcpConnecter connecter = new TcpConnecter (
                io_thread, this, options, addr, wait_);
            //alloc_assert (connecter);
            launch_child (connecter);  // Deploy the TcpConnecter
            return;
        }
        
        if (addr.protocol().equals("ipc")) {
            IpcConnecter connecter = new IpcConnecter (
                io_thread, this, options, addr, wait_);
            //alloc_assert (connecter);
            launch_child (connecter);
            return;
        }
        
        assert (false);
    }

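A parameter is passed in here. It is used when the TcpConnecter is constructed and indicates whether the connection attempt should be delayed. For the initial connection it is false, meaning no delay. As we will see below, reconnection uses a delayed connect.

Also note that the actual connection setup is delegated to a TcpConnecter object, which is essentially a helper class.

I won't list its code in full. Roughly, the process is: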

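(1) Create a SocketChannel, set it to non-blocking mode, and call connect to start connecting to the remote address.

(2) Register the SocketChannel with the I/O thread's poller and ask for the connect event.

(3) In the connect event callback, deregister the SocketChannel from the poller, create a new StreamEngine to wrap it, and attach that StreamEngine to the session created earlier.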
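
The three steps can be sketched with plain java.nio. This is not TcpConnecter's code. The class NonBlockingConnectSketch and its methods are hypothetical, and the sketch simply blocks in select() where the real poller would run an event loop. The demo method assumes the loopback interface is available.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical illustration of a non-blocking connect driven by a Selector.
final class NonBlockingConnectSketch {

    // Returns true if the connection completed within timeoutMillis.
    static boolean connect(InetSocketAddress remote, long timeoutMillis) throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            // Step 1: non-blocking channel, start the connect.
            channel.configureBlocking(false);
            if (channel.connect(remote)) {
                return true; // local connects may complete immediately
            }
            // Step 2: register for the connect event.
            SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(timeoutMillis) == 0) {
                return false; // nothing happened before the timeout
            }
            // Step 3: deregister, then check how the connect ended.
            key.cancel();
            try {
                return channel.finishConnect();
            } catch (IOException e) {
                return false; // refused or failed; a real connecter would schedule a retry
            }
        }
    }

    // Connects to a throwaway listener on the loopback interface.
    static boolean demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            return connect((InetSocketAddress) server.getLocalAddress(), 2000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the real code, the point where this sketch returns true is where the channel would be wrapped in a StreamEngine and handed to the session.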

Here is what the connect event callback does:

    // Callback for the connect event; it may also fire because the attempt failed or timed out
    public void connect_event (){
        boolean err = false;
        SocketChannel fd = null;
        try {
            fd = connect ();   // Get the channel whose connection has completed
        } catch (ConnectException e) {
            err = true;
        } catch (SocketException e) {
            err = true;
        } catch (SocketTimeoutException e) {
            err = true;
        } catch (IOException e) {
            throw new ZError.IOException(e);
        }

        io_object.rm_fd (handle);  // Remove the handle from the poller; this TcpConnecter has done its job
        handle_valid = false;
        
        if (err) {
            //  Handle the error condition by attempt to reconnect.
            close ();   
            add_reconnect_timer();  // Schedule another connection attempt
            return;
        }
        
        handle = null;
        
        try {
            
            Utils.tune_tcp_socket (fd);
            Utils.tune_tcp_keepalives (fd, options.tcp_keepalive, options.tcp_keepalive_cnt, options.tcp_keepalive_idle, options.tcp_keepalive_intvl);
        } catch (SocketException e) {
            throw new RuntimeException(e);
        }

        //  Create the engine object for this connection.
        
        // Create the StreamEngine that wraps the connected channel
        StreamEngine engine = null;
        try {
            engine = new StreamEngine (fd, options, endpoint);
        } catch (ZError.InstantiationException e) {
            socket.event_connect_delayed (endpoint, -1);
            return;
        }

        //  Attach the engine to the corresponding session object.
        send_attach (session, engine);  // Bind the engine to the session; the engine is also registered with the I/O thread's poller

        //  Shut the connecter down.
        terminate ();  // Shut down this connecter

        socket.event_connected (endpoint, fd);  // Notify the owning socket that the connection is established
    }

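What it does is clear enough from the code. Note that a failed or timed-out connection attempt also leads to a reconnect attempt.

That covers how a connection is established. Next, let's see how reconnection happens after a connection is lost, starting with what runs when the connection drops.

We first need to know how to tell that the underlying channel's connection is gone. Once the peer has closed the connection, a read on the channel returns -1. That tells us where to start reading: StreamEngine's in_event method, and what it does after read returns -1: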

    // Callback invoked when the underlying channel has data to read
    public void in_event ()  {
        if (handshaking)
            if (!handshake ())
                return;
        
        assert (decoder != null);
        boolean disconnection = false;

        //  If there's no data to process in the buffer...
        if (insize == 0) {  // Nothing left in inbuf to process

            //  Retrieve the buffer and read as much data as possible.
            //  Note that buffer can be arbitrarily large. However, we assume
            //  the underlying TCP layer has fixed buffer size and thus the
            //  number of bytes read will be always limited.
            inbuf = decoder.get_buffer ();  // Get a buffer from the decoder to read into; the read is bounded by the TCP receive buffer size already set on the socket
            insize = read (inbuf);  // Read incoming data into the buffer and record the byte count
            inbuf.flip();  // Prepare to read the data back out of the buffer

            //  Check whether the peer has closed the connection.
            if (insize == -1) {  // -1 means the underlying socket connection is gone
                insize = 0;
                disconnection = true;  // Set the flag
            }
        }

        //  Push the data to the decoder.
        int processed = decoder.process_buffer (inbuf, insize);  // Decode the data that was read

        if (processed == -1) {
            disconnection = true;
        } else {

            //  Stop polling for input if we got stuck.
            if (processed < insize)  // Fewer bytes were processed than read: stop polling for input
                io_object.reset_pollin (handle);

            //  Adjust the buffer.
            insize -= processed;  // Bytes still waiting to be processed
        }

        //  Flush all messages the decoder may have produced.
        session.flush ();  // Hand the messages produced by the decoder to the session

        //  An input error has occurred. If the last decoded message
        //  has already been accepted, we terminate the engine immediately.
        //  Otherwise, we stop waiting for socket events and postpone
        //  the termination until after the message is accepted.
        if (disconnection) {   // The connection is gone and has to be handled
            if (decoder.stalled ()) {
                io_object.rm_fd (handle);
                io_enabled = false;
            } else {
                error ();
        
            }
        }
    }
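
The -1 return value that in_event checks for is ordinary java.nio behavior and can be observed in isolation. The following sketch is independent of ZeroMQ. The class name PeerCloseReadSketch is hypothetical, and it assumes the loopback interface is available.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical illustration: what read() reports after the peer closes.
final class PeerCloseReadSketch {

    // Connects to a local listener, lets the accepted side close
    // immediately, and returns what the client's read() reports.
    static int readAfterPeerClose() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                // The "remote peer" accepts and closes without sending anything.
                server.accept().close();
                // Blocking read: returns once end-of-stream is seen.
                return client.read(ByteBuffer.allocate(16));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read returned " + readAfterPeerClose());
    }
}
```

An orderly close by the peer shows up as end-of-stream on the next read, which is the signal in_event turns into the disconnection flag.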

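As the code shows, when read returns -1 the disconnection flag is set. If the decoder is stalled on a message that has not been accepted yet, the engine only stops polling the handle. Otherwise it calls error. Here is what error does: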

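    // On error, have the owning ZMQ socket drop this connection
    private void error ()  {
        assert (session != null);
        socket.event_disconnected (endpoint, handle);  // Notify the owning socket of the disconnect
        session.detach ();   // The session cleans up its pipe to the socket and then attempts to reconnect
        unplug ();  // Unregister from the poller
        destroy ();  // Close the underlying channel and destroy this engine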
    }

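Once the underlying connection is gone, the channel is no longer usable, and neither is the StreamEngine that wraps it. What remains is to destroy the engine, deregister it from the poller, and tell the owning socket that the connection to this address has been lost. The session must also be told so that it can do its own handling, and that handling includes reconnection. Here is what the session does: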

    // Drops the association with the underlying engine
    public void detach()  {
        //  Engine is dead. Let's forget about it.
        engine = null;  // Release the reference to the engine

        //  Remove any half-done messages from the pipes.
        clean_pipes ();  // Discard messages that were only partially received
 
        //  Send the event to the derived class.
        detached ();   // Tear down the pipe if required, then reconnect

        //  Just in case there's only a delimiter in the pipe.
        if (pipe != null)
            pipe.check_read ();
    }

The reconnection code is still not visible here, so continue into the detached method:

    private void detached() {
        //  Transient session self-destructs after peer disconnects.
        if (!connect) {  // A session that did not actively connect simply terminates; otherwise a reconnect is attempted
            terminate ();
            return;
        }

        //  For delayed connect situations, terminate the pipe
        //  and reestablish later on
        //  Compare protocol names with equals(), not by reference.
        if (pipe != null && options.delay_attach_on_connect == 1
            && !addr.protocol ().equals ("pgm") && !addr.protocol ().equals ("epgm")) {
            pipe.hiccup ();
            pipe.terminate (false);
            terminating_pipes.add (pipe);
            pipe = null;
        }
        
        reset ();  // Reset the session state flags

        // Actively attempt to reconnect
        if (options.reconnect_ivl != -1) {
            start_connecting (true);   // Reconnect; true means the attempt is delayed
        }

        //  For subscriber sockets we hiccup the inbound pipe, which will cause
        //  the socket object to resend all the subscriptions.
        if (pipe != null && (options.type == ZMQ.ZMQ_SUB || options.type == ZMQ.ZMQ_XSUB))
            pipe.hiccup ();

    }
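
Stripped of the pipe handling, the branching in detached comes down to two inputs: whether the session connected actively, and whether the reconnect interval is -1. The sketch below restates only that decision. The names DetachDecisionSketch, Action and onDetach are invented for illustration and do not exist in the library.

```java
// Hypothetical restatement of the decision made in detached.
final class DetachDecisionSketch {

    enum Action { TERMINATE, RECONNECT_DELAYED, STAY_IDLE }

    // activeConnect corresponds to the session's connect flag and
    // reconnectIvl to options.reconnect_ivl in the excerpt above.
    static Action onDetach(boolean activeConnect, int reconnectIvl) {
        if (!activeConnect) {
            // Session came from a listener: it just goes away.
            return Action.TERMINATE;
        }
        if (reconnectIvl != -1) {
            // Corresponds to start_connecting (true) in the excerpt.
            return Action.RECONNECT_DELAYED;
        }
        // Reconnection disabled: the session stays but does not retry.
        return Action.STAY_IDLE;
    }

    public static void main(String[] args) {
        System.out.println(onDetach(false, 100));
        System.out.println(onDetach(true, 100));
        System.out.println(onDetach(true, -1));
    }
}
```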

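Here start_connecting is called again, but this time with true. The flow is much the same as for the initial connection, except that the connection attempt is delayed.

That is, a timer is set on the I/O thread, and the connection is attempted only when the timer expires. This keeps reconnection attempts to a bounded rate.

I won't go into the timer details here; they are quite simple.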
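
For readers who want a feel for how such a delay might be chosen, here is a generic sketch of a jittered reconnect interval with optional exponential growth up to a cap. It is not taken from the library's source. The class ReconnectBackoffSketch, its fields, and the doubling policy are assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical delay policy: base interval plus random jitter,
// doubling after each attempt while a larger maximum is configured.
final class ReconnectBackoffSketch {
    private final int baseMillis;
    private final int maxMillis;
    private final Random random;
    private int currentMillis;

    ReconnectBackoffSketch(int baseMillis, int maxMillis, Random random) {
        if (baseMillis <= 0) {
            throw new IllegalArgumentException("baseMillis must be positive");
        }
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.random = random;
        this.currentMillis = baseMillis;
    }

    // Delay to wait before the next connection attempt.
    int nextDelayMillis() {
        // Jitter keeps many clients from retrying in lockstep.
        int delay = currentMillis + random.nextInt(baseMillis);
        if (maxMillis > baseMillis) {
            // Grow the interval, but never past the configured cap.
            currentMillis = (int) Math.min((long) currentMillis * 2, maxMillis);
        }
        return delay;
    }

    public static void main(String[] args) {
        ReconnectBackoffSketch backoff = new ReconnectBackoffSketch(100, 800, new Random());
        for (int attempt = 1; attempt <= 6; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + backoff.nextDelayMillis() + " ms");
        }
    }
}
```

With a maximum no larger than the base, the interval never grows and every delay is the base plus jitter, which is the simplest way to keep retries at a steady rate.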

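From the code above we can see that after a connection drops, ZeroMQ automatically tries to reconnect, provided the connection was one it established actively rather than one accepted by a listener. Nicely done.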