Erlang节点重启导致的incarnation问题(转)

本文深入探讨了Erlang节点重启时出现的incarnation问题,包括错误日志、pid格式、creation字段的含义以及如何解决因节点重启导致的进程通信错误。通过分析Erlang External Term Format和dist_entry结构,揭示了creation在节点注册和通信中的作用,最后提供了解决分布式环境中节点失效问题的建议。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

转自霸爷的博客:

 转载自系统技术非业余研究

本文链接地址: Erlang节点重启导致的incarnation问题

遇到个问题,

=ERROR REPORT==== 10-Mar-2016::09:44:07 ===
Discarding message {'$gen_cast',close_be_covered} from <0.771.0> to <0.774.0> in an old incarnation (2) of this node (3)

 网上查阅了霸爷的博客,对该问题脑洞大开.

 

今天晚上mingchaoyan同学在线上问以下这个问题:

152489 =ERROR REPORT==== 2013-06-28 19:57:53 ===
152490 Discarding message {send,<<19 bytes>>} from <0.86.1> to <0.6743.0> in an old incarnation (1 ) of this node (2)
152491
152492
152493 =ERROR REPORT==== 2013-06-28 19:57:55 ===
152494 Discarding message {send,<<22 bytes>>} from <0.1623.1> to <0.6743.0> in an old incarnation (1) of this node (2

我们中午服务器更新后,日志上满屏的这些错误,请问您有遇到过类似的错误吗?或者提过些定位问题,解决问题的思路,谢谢

这个问题有点意思,从日志提示来再结合源码来看,马上我们就可以找到打出这个提示的地方:

/*bif.c*/
Sint
do_send(Process *p, Eterm to, Eterm msg, int suspend) {
    Eterm portid;
...
else if (is_external_pid(to)) {
        dep = external_pid_dist_entry(to);
        if(dep == erts_this_dist_entry) {
            erts_dsprintf_buf_t *dsbufp = erts_create_logger_dsbuf();
            erts_dsprintf(dsbufp,
                          "Discarding message %T from %T to %T in an old "
                          "incarnation (%d) of this node (%d)\n",
                          msg,
                          p->id,
                          to,
                          external_pid_creation(to),
                          erts_this_node->creation);
            erts_send_error_to_logger(p->group_leader, dsbufp);
            return 0;
        }
..
}

触发这句警告提示必须满足以下条件:
1. 目标Pid必须是external_pid。
2. 该pid归宿的外部节点所对应的dist_entry和当前节点的dist_entry相同。

通过google引擎,我找到了和这个描述很相近的问题:参见 这里 ,该作者很好的描述和重现了这个现象,但是他没有解释出具体的原因。

好,那我们顺着他的路子来重新下这个问题.
但演示之前,我们先巩固下基础,首先需要明白pid的格式:
可以参见这篇文章:

pid的核心内容摘抄如下:

Printed process ids < A.B.C > are composed of [6]:
A, the node number (0 is the local node, an arbitrary number for a remote node)
B, the first 15 bits of the process number (an index into the process table) [7]
C, bits 16-18 of the process number (the same process number as B) [7]

再参见Erlang External Term Format 文档的章节9.10
描述了PID_EXT的组成:

1 N 4 4 1
103 Node ID Serial Creation
Table 9.16:
Encode a process identifier object (obtained from spawn/3 or friends). The ID and Creation fields works just like in REFERENCE_EXT, while the Serial field is used to improve safety. In ID, only 15 bits are significant; the rest should be 0.


我们可以看到一个字段 Creation, 这个东西我们之前怎么没见过呢?
参考erlang的文档 我们可以知道:

creation
Returns the creation of the local node as an integer. The creation is changed when a node is restarted. The creation of a node is stored in process identifiers, port identifiers, and references. This makes it (to some extent) possible to distinguish between identifiers from different incarnations of a node. Currently valid creations are integers in the range 1..3, but this may (probably will) change in the future. If the node is not alive, 0 is returned.

追踪这个creation的来源,我们知道这个变量来自epmd. 具体点的描述就是每次节点都会像epmd注册名字,epmd会给节点返回这个creation. net_kernel会把这个creation通过set_node这个bif登记到该节点的erts_this_dist_entry->creation中去:

/* erl_node_tables.c */
void
erts_set_this_node(Eterm sysname, Uint creation)
{
...
    erts_this_dist_entry->sysname = sysname;
    erts_this_dist_entry->creation = creation;
...
}
 
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值