1. About rpc
First, the symptom. On node a:
erl -name a@192.168.35.147
Erlang/OTP 17 [erts-6.0] [source] [64-bit] [smp:2:2] [async-threads:10] [kernel-poll:false]
Eshell V6.0 (abort with ^G)
(a@192.168.35.147)1> net_kernel:connect('b@192.168.35.147').
true
(a@192.168.35.147)2> nodes().
['b@192.168.35.147']
(a@192.168.35.147)3> rpc:call('b@192.168.35.147', ets, new, [nimeizi, [named_table, public]]).
nimeizi
On node b:
erl -name b@192.168.35.147
Erlang/OTP 17 [erts-6.0] [source] [64-bit] [smp:2:2] [async-threads:10] [kernel-poll:false]
Eshell V6.0 (abort with ^G)
(b@192.168.35.147)1> nodes().
['a@192.168.35.147']
(b@192.168.35.147)2> ets:info(nimeizi).
undefined
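As shown above, nodes a and b are connected. Node a uses the remote call rpc:call/4 to create an ETS table on b and gets back a successful return value (the table name), yet the table does not exist on node b. The documentation describes rpc:call/4 as: "Evaluates apply(Module, Function, Args) on the node Node and returns the corresponding value Res, or {badrpc, Reason} if the call fails." Here is the relevant part of the source, rpc.erl (OTP 17):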
call(N,M,F,A) when node() =:= N ->  %% Optimize local call
    local_call(M, F, A);
call(N,M,F,A) ->
    do_call(N, {call,M,F,A,group_leader()}, infinity).

do_call(Node, Request, infinity) ->
    rpc_check(catch gen_server:call({?NAME,Node}, Request, infinity));
do_call(Node, Request, Timeout) ->
    Tag = make_ref(),
    {Receiver,Mref} =
        erlang:spawn_monitor(
          fun() ->
                  %% Middleman process. Should be unsensitive to regular
                  %% exit signals.
                  process_flag(trap_exit, true),
                  Result = gen_server:call({?NAME,Node}, Request, Timeout),
                  exit({self(),Tag,Result})
          end),
    receive
        {'DOWN',Mref,_,_,{Receiver,Tag,Result}} ->
            rpc_check(Result);
        {'DOWN',Mref,_,_,Reason} ->
            %% The middleman code failed. Or someone did
            %% exit(_, kill) on the middleman process => Reason==killed
            rpc_check_t({'EXIT',Reason})
    end.

rpc_check_t({'EXIT', {timeout,_}}) -> {badrpc, timeout};
rpc_check_t(X) -> rpc_check(X).

rpc_check({'EXIT', {{nodedown,_},_}}) -> {badrpc, nodedown};
rpc_check({'EXIT', X}) -> exit(X);
rpc_check(X) -> X.

handle_call({call, Mod, Fun, Args, Gleader}, To, S) ->
    handle_call_call(Mod, Fun, Args, Gleader, To, S);
%% (other handle_call/3 clauses omitted from this excerpt)

handle_call_call(Mod, Fun, Args, Gleader, To, S) ->
    RpcServer = self(),
    %% Spawn not to block the rpc server.
    {Caller,_} =
        erlang:spawn_monitor(
          fun () ->
                  set_group_leader(Gleader),
                  Reply =
                      %% in case some sucker rex'es
                      %% something that throws
                      case catch apply(Mod, Fun, Args) of
                          {'EXIT', _} = Exit ->
                              {badrpc, Exit};
                          Result ->
                              Result
                      end,
                  RpcServer ! {self(), {reply, Reply}}
          end),
    {noreply, gb_trees:insert(Caller, To, S)}.
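As the source shows, when rpc:call/4 is invoked, the RPC server (rex) on the target node spawns a new process to evaluate apply(M, F, A) and sends the result back, after which that process exits. An ETS table is owned by the process that created it and is deleted when its owner terminates, so a table created with ets:new/2 this way disappears as soon as the worker process ends. Is there a way around this? The documentation describes another function, rpc:block_call/4, as follows: "Like call/4, but the RPC server at Node does not create a separate process to handle the call. Thus, this function can be used if the intention of the call is to block the RPC server from any other incoming requests until the request has been handled. The function can also be used for efficiency reasons when very small fast functions are evaluated, for example BIFs that are guaranteed not to suspend." In other words, with rpc:block_call/4 the call is evaluated inside the RPC server process itself, so in this OTP 17 implementation a table created that way is owned by the long-lived rex process and survives the call.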
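The ownership rule can be reproduced on a single node without rpc. The sketch below is a minimal illustration (the module name rpc_ets_demo and the table name nimeizi_demo are made up). short_lived_owner/0 creates a named table inside a short-lived process, waits for that process to exit, and then looks the table up again. It should return undefined. create_remote/2 is just the block_call variant of the call from the transcript. The claim that it leaves the table owned by rex assumes the OTP 17 rpc implementation discussed here, so check the rpc documentation of the release you actually run.

```erlang
-module(rpc_ets_demo).
-export([short_lived_owner/0, create_remote/2]).

%% Mimics what rex does for rpc:call/4: a short-lived process creates
%% a named ETS table and then exits normally.
short_lived_owner() ->
    {Pid, Ref} = spawn_monitor(fun() ->
                                       ets:new(nimeizi_demo, [named_table, public])
                               end),
    %% Wait until the owner process has terminated.
    receive
        {'DOWN', Ref, process, Pid, _Reason} -> ok
    end,
    %% The owner is gone, so the table was deleted together with it.
    ets:info(nimeizi_demo).

%% Workaround based on the rpc docs (OTP 17 behaviour assumed): with
%% block_call the rex server on Node evaluates ets:new/2 itself, so the
%% table is owned by rex instead of a throwaway worker process.
create_remote(Node, Name) ->
    rpc:block_call(Node, ets, new, [Name, [named_table, public]]).
```

From node a this would be used as rpc_ets_demo:create_remote('b@192.168.35.147', nimeizi). Afterwards, ets:info(nimeizi) on node b should no longer return undefined. The module only needs to be loaded on a, because b just runs ets:new/2. As the quoted documentation warns, rex is blocked while the call runs, so this is only suitable for short calls like this one.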
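Conclusions: 1. Some descriptions in the Erlang documentation are very terse and have to be read together with related entries; 2. the source code is a great tool.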
2. epmd
On node b (Windows):
C:\Users\pyfn1100>erl -name b@192.168.35.142 -setcookie 123456
Eshell V6.1 (abort with ^G)
(b@192.168.35.142)1> net_kernel:connect('a@192.168.35.147').
false
On node a (Linux):
erl -name a@192.168.35.147 -setcookie 123456
Erlang/OTP 17 [erts-6.0] [source] [64-bit] [smp:2:2] [async-threads:10] [kernel-poll:false]
Eshell V6.0 (abort with ^G)
(a@192.168.35.147)1> net_kernel:connect('b@192.168.35.142').
true
(a@192.168.35.147)2> nodes().
['b@192.168.35.142']
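As shown above, node b runs on Windows and node a on Linux. Connecting from b to a fails (net_kernel:connect/1 returns false), while connecting from a to b succeeds. All firewalls are turned off, and the cause is still unknown. My first suspect is epmd. For more on epmd, see: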
http://blog.yufeng.info/archives/2169
http://mryufeng.iteye.com/blog/288235
http://mryufeng.iteye.com/blog/120666
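epmd (the Erlang Port Mapper Daemon) listens on TCP port 4369 on every host that runs distributed nodes. To connect to a node, the caller first asks the epmd on the target host for that node's distribution port, then opens a connection to that port. The failing direction is b to a, so running the two steps separately on node b may help separate an unreachable epmd on a's host from a failure later in the connection setup. The helper below is a hypothetical sketch (dist_check and check/1 are invented names). It uses net_adm:names/1 to query the remote epmd and net_adm:ping/1 to attempt the real connection.

```erlang
-module(dist_check).
-export([check/1]).

%% Hypothetical diagnostic helper. For a long node name such as
%% 'a@192.168.35.147', ask the epmd on that host which nodes it has
%% registered, then try to connect to the node itself.
check(Node) ->
    [_Name, Host] = string:tokens(atom_to_list(Node), "@"),
    %% {ok, [{NodeName, Port}]} if epmd on Host answered,
    %% {error, Reason} if it could not be reached.
    EpmdNames = net_adm:names(Host),
    %% pong if a distribution connection was set up, pang otherwise.
    Ping = net_adm:ping(Node),
    {EpmdNames, Ping}.
```

On node b this would be called as dist_check:check('a@192.168.35.147'). If the first element is {ok, [{"a", Port}]} but the second is pang, epmd on a's host is answering, which points at the later connection to Port (or the handshake) rather than at epmd itself.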