Notes on the Elasticsearch node client

License: CC 4.0 BY-SA


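This article discusses the trade-offs between TransportClient and NodeClient in a Java environment and offers guidance on choosing between them. TransportClient suits applications that should stay decoupled from the cluster, especially those that create and destroy connections quickly. NodeClient knows the cluster state and saves a network hop, so it is more efficient when only a few persistent connections are needed. However, NodeClient ties the application to the cluster and may cause firewall problems. The article closes by recommending a locally running Elasticsearch server, accessed through TransportClient, to avoid the problems NodeClient brings.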


The node client, on the other hand, is actually a node within the cluster (but does not hold data, and cannot become master). (Note: it is simply a node, but one with no data that is not a master.) Because it is a node, it knows the entire cluster state (where all the nodes reside, which shards live in which nodes, and so forth). This means it can execute APIs with one less network hop. (Note: it can execute the various APIs directly, with fewer network hops.)
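The "one less network hop" point can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: `HopSketch`, its method names, and the way hops are counted are hypothetical and are not part of the Elasticsearch API. It assumes an external client contacts some node, which forwards the request if it does not hold the target shard, whereas a node client already holds the cluster state and contacts the right node directly.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: class name, method names and hop counting are
// illustrative assumptions, not Elasticsearch code.
class HopSketch {
    // Simplified "cluster state": which node holds which shard.
    private final Map<Integer, String> shardToNode = new HashMap<>();

    HopSketch assign(int shard, String node) {
        shardToNode.put(shard, node);
        return this;
    }

    // External client: one hop to the contacted node, plus one more
    // if that node has to forward the request to the shard's holder.
    int hopsFromExternalClient(int shard, String contactedNode) {
        return contactedNode.equals(shardToNode.get(shard)) ? 1 : 2;
    }

    // Node client: it already knows the cluster state, so it sends the
    // request straight to the node that holds the shard.
    int hopsFromNodeClient(int shard) {
        if (!shardToNode.containsKey(shard)) {
            throw new IllegalArgumentException("unknown shard " + shard);
        }
        return 1;
    }
}
```

With shard 1 assigned to `node-b`, an external client that happens to contact `node-a` needs two hops in this model, while the node client needs one.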

If you need only a few long-lived, persistent connection objects to the cluster (note: just a few long-lived connections to the Elasticsearch cluster), a node client can be a bit more efficient since it knows the cluster layout. But it ties your application into the cluster, so it may pose problems from a firewall perspective.

NodeBuilder can also be used to connect to a cluster.

***Creating a NodeClient***

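// Elasticsearch 2.x; requires: import static org.elasticsearch.node.NodeBuilder.nodeBuilder;
Node node = nodeBuilder().clusterName("yourcluster").client(true).node();

Client client = node.client(); // on shutdown, call node.close()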

***Note: creating a TransportClient***

Client client = TransportClient.builder().build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300))
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host2"), 9300));

// on shutdown
client.close();

A node client joins the cluster as another node and is aware of the whole topology. Using nodes, you can use multicast to discover other running nodes (note: other nodes are found via multicast, so there is no need to specify IP addresses as with the transport client).

In my opinion, TransportClient is preferable to NodeClient because the other cluster nodes receive no useless information when a TransportClient stops. When a NodeClient stops, every node has to be told, even though they do not need to manage it since it holds no data. I have also seen in debug mode that NodeClient starts more threads than TransportClient, so I think TransportClient has a smaller memory footprint. By the way, if you are using Spring, you can use the spring-elasticsearch factories for that. If not, you can always have a look at the source code to see how I manage NodeClient vs TransportClient. Hope this helps.

EDIT 2016-03-09: NodeClient should not be used. If there is a need for that, people should create a client node (launch an elasticsearch node with node.data: false and node.master: false) and use a TransportClient to connect to it locally. (Note: do not use the NodeClient class; instead, run the transport client directly against a node on the same machine.)

Currently we suggest that users create a Node (using NodeBuilder in 2.x) to have a client that is capable of keeping up-to-date information. This is generally a bad idea, as it means elasticsearch has no control over, e.g., max heap size or gc settings (note: heap size management and GC settings), and it is also problematic for users because they must deal with dependency collisions (and in 2.x+ dependencies of elasticsearch itself).

A better alternative, and what we should document, is to run a local elasticsearch server using bin/elasticsearch, and then use the transport client to connect to that local node (note: run an Elasticsearch node and open a transport client connection to it locally). This local connection is virtually free, and allows the client code to be completely isolated from the elasticsearch process (note: the user's threads and the Elasticsearch threads stay separate). Plugins are then also easy to deal with: just install them in elasticsearch as usual.
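Before building a transport client against a local node, an application may want to check that the node's transport port is actually accepting connections. The following is a minimal sketch using only the JDK. `LocalNodeProbe` and `acceptsConnections` are hypothetical names, and the port to probe (9300 in the TransportClient snippet above) is an assumption about how the local node is configured.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the Elasticsearch API.
class LocalNodeProbe {
    // Returns true if a TCP connection to the loopback address on the
    // given port succeeds within timeoutMillis, false otherwise.
    static boolean acceptsConnections(int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), port),
                    timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or otherwise failed.
            return false;
        }
    }
}
```

A call such as `LocalNodeProbe.acceptsConnections(9300, 1000)` only shows that something is listening on that port. It does not verify that the listener is an Elasticsearch node or that the cluster name matches.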

Original text:

Transport Client Versus Node Client

If you are using Java, you may wonder when to use the transport client versus the node client. As discussed at the beginning of the book, the transport client acts as a communication layer between the cluster and your application. It knows the API and can automatically round-robin between nodes, sniff the cluster for you, and more. But it is external to the cluster, similar to the REST clients.
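To illustrate what "round-robin between nodes" means, here is a small self-contained sketch of round-robin host selection. It is not the transport client's actual implementation. `RoundRobinHosts` and `pick` are hypothetical names, and the host strings are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper that shows the idea of round-robin selection;
// it is not Elasticsearch code.
class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.hosts = new ArrayList<>(hosts); // defensive copy
    }

    // Returns the hosts in order, wrapping around after the last one.
    // floorMod keeps the index non-negative even if the counter overflows.
    String pick() {
        int index = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(index);
    }
}
```

With two hosts, `host1` and `host2`, successive calls to `pick()` return `host1`, `host2`, `host1`, and so on, so requests are spread evenly across the configured nodes.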

The node client, on the other hand, is actually a node within the cluster (but does not hold data, and cannot become master). Because it is a node, it knows the entire cluster state (where all the nodes reside, which shards live in which nodes, and so forth). This means it can execute APIs with one less network hop.

There are use cases for both clients:

The transport client is ideal if you want to decouple your application from the cluster. For example, if your application quickly creates and destroys connections to the cluster, a transport client is much "lighter" than a node client, since it is not part of a cluster.

Similarly, if you need to create thousands of connections, you don’t want to have thousands of node clients join the cluster. The transport client will be a better choice.

On the flip side, if you need only a few long-lived, persistent connection objects to the cluster, a node client can be a bit more efficient since it knows the cluster layout. But it ties your application into the cluster, so it may pose problems from a firewall perspective.