Elasticsearch Java Client

Index API

The index API allows one to index a typed JSON document into a specific index and make it searchable.

Internally, each type is converted to byte[] (so a String is converted to a byte[]). Therefore, if the object is already in this form, use it directly. The jsonBuilder is a highly optimized JSON generator that directly constructs a byte[]. (All JSON documents are ultimately sent in byte form.)


The essentials of the ES Java client API: cluster name, nodes (ip:port), JSON, serialization, index name, index type.

https://blog.youkuaiyun.com/chupengfei_521/article/details/72812613

1. Add the Maven dependency

2. Obtain/close the client:

// on startup

TransportClient client = new PreBuiltTransportClient(Settings.EMPTY)
        .addTransportAddress(new TransportAddress(InetAddress.getByName("host1"), 9300))
        .addTransportAddress(new TransportAddress(InetAddress.getByName("host2"), 9300));

// on shutdown

client.close();

Note that you have to set the cluster name if you use one different than "elasticsearch":

Settings settings = Settings.builder()
        .put("cluster.name", "myClusterName").build();
TransportClient client = new PreBuiltTransportClient(settings);
//Add transport addresses and do something with the client...

The Transport client comes with a cluster sniffing feature that allows it to dynamically add new hosts and remove old ones. When sniffing is enabled, the transport client connects to the nodes in its internal node list, which is built by calling addTransportAddress. After that, the client calls the internal cluster state API on those nodes to discover the available data nodes, and its internal node list is replaced with those data nodes only. This list is refreshed every five seconds by default. Note that the IP addresses the sniffer connects to are the ones declared as the publish address in those nodes' Elasticsearch configuration.

Keep in mind that the list might not include the original node it connected to if that node is not a data node. If, for instance, you initially connect to a master node, then after sniffing no further requests will go to that master node; they go to the data nodes instead. The transport client excludes non-data nodes in order to avoid sending search traffic to master-only nodes.

To enable sniffing, set client.transport.sniff to true:
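For example, a minimal sketch combining the cluster name setting shown above with the sniffing flag:

Settings settings = Settings.builder()
        .put("cluster.name", "myClusterName")   // only needed if the cluster is not named "elasticsearch"
        .put("client.transport.sniff", true)    // enable cluster sniffing
        .build();
TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new TransportAddress(InetAddress.getByName("host1"), 9300));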


3. Create an index:

/* The first argument is the index name, the second the type name. Mapped loosely onto a relational database: an index corresponds to a database name, a type to a table name, and one index can contain multiple types. */

// s is the document source as a JSON string
IndexResponse res = client.prepareIndex("chupengfei", "xiaofei").setSource(s).execute().actionGet();

4. Index a document

Index document

The following example indexes a JSON document into an index called twitter, under a type called tweet, with an id of 1:

import static org.elasticsearch.common.xcontent.XContentFactory.*;

IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        .get();

Note that you can also index your documents as a JSON String and that you don't have to give an ID:

String json = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
    "}";

IndexResponse response = client.prepareIndex("twitter", "tweet")
        .setSource(json)
        .get();

The IndexResponse object will give you a report:

// Index name
String _index = response.getIndex();
// Type name
String _type = response.getType();
// Document ID (generated or not)
String _id = response.getId();
// Version (if it's the first time you index this document, you will get: 1)
long _version = response.getVersion();
// isCreated() is true if the document is a new one, false if it has been updated
boolean created = response.isCreated();

5. Get a document

GetResponse response = client.prepareGet("twitter", "tweet", "1").get();
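You can then check whether the document exists and read its source (a small sketch using the standard GetResponse accessors):

if (response.isExists()) {
    String json = response.getSourceAsString();  // the document's _source as a JSON string
    String id = response.getId();                // "1" in this example
}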

6. Update a document

Update API

You can either create an UpdateRequest and send it to the client:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

Or you can use the prepareUpdate() method:

client.prepareUpdate("ttl", "doc", "1")
        .setScript(new Script("ctx._source.gender = \"male\""  , ScriptService.ScriptType.INLINE, null, null))
        .get();

client.prepareUpdate("ttl", "doc", "1")
        .setDoc(jsonBuilder()               
            .startObject()
                .field("gender", "male")
            .endObject())
        .get();

Your script. It could also be a locally stored script name. In that case, you’ll need to use ScriptService.ScriptType.FILE

Document which will be merged to the existing one.

Note that you can’t provide both script and doc.

Update by script

The update API allows you to update a document based on a provided script:

UpdateRequest updateRequest = new UpdateRequest("ttl", "doc", "1")
        .script(new Script("ctx._source.gender = \"male\""));
client.update(updateRequest).get();

Update by merging documents

The update API also supports passing a partial document, which will be merged into the existing document (simple recursive merge, inner merging of objects, replacing core "keys/values" and arrays). For example:

UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject());
client.update(updateRequest).get();

Upsert

There is also support for upsert. If the document does not exist, the content of the upsert element will be used to index the fresh doc:

IndexRequest indexRequest = new IndexRequest("index", "type", "1")
        .source(jsonBuilder()
            .startObject()
                .field("name", "Joe Smith")
                .field("gender", "male")
            .endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .upsert(indexRequest);  // if the document does not exist, the one in indexRequest will be indexed
client.update(updateRequest).get();

If the document index/type/1 already exists, after this operation we will have a document like this (the gender field is the one added by the update request):

{
    "name"  : "Joe Dalton",
    "gender": "male"
}

If it does not exist, we will have a new document:

{
    "name" : "Joe Smith",
    "gender": "male"
}

7. Get multiple documents

Multi Get API

The multi get API allows you to get a list of documents based on their index, type and id:

MultiGetResponse multiGetItemResponses = client.prepareMultiGet()
    .add("twitter", "tweet", "1")            // get by a single id
    .add("twitter", "tweet", "2", "3", "4")  // or by a list of ids for the same index / type
    .add("another", "type", "foo")           // you can also get from another index
    .get();

// iterate over the result set
for (MultiGetItemResponse itemResponse : multiGetItemResponses) {
    GetResponse response = itemResponse.getResponse();
    if (response.isExists()) {                       // you can check if the document exists
        String json = response.getSourceAsString();  // access to the _source field
    }
}

8. Index or delete multiple documents

Bulk API

The bulk API allows one to index and delete several documents in a single request. Here is a sample usage:

import static org.elasticsearch.common.xcontent.XContentFactory.*;

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        );

bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "another post")
                    .endObject()
                  )
        );

BulkResponse bulkResponse = bulkRequest.get();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
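To see what exactly failed, you can iterate over the individual items; a minimal sketch (BulkResponse is iterable over its BulkItemResponse items):

for (BulkItemResponse item : bulkResponse) {
    if (item.isFailed()) {
        // getFailureMessage() describes why this particular index/delete action failed
        System.err.println(item.getId() + ": " + item.getFailureMessage());
    }
}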

9. Process documents in bulk

Using Bulk Processor

The BulkProcessor class offers a simple interface to flush bulk operations automatically based on the number or size of requests, or after a given period.

To use it, first create a BulkProcessor instance:

import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

BulkProcessor bulkProcessor = BulkProcessor.builder(
        client,  // your elasticsearch client
        new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId,
                                   BulkRequest request) {
                // called just before bulk is executed; you can for example
                // inspect the number of actions with request.numberOfActions()
            }

            @Override
            public void afterBulk(long executionId,
                                  BulkRequest request,
                                  BulkResponse response) {
                // called after bulk execution; you can for example check
                // whether some requests failed with response.hasFailures()
            }

            @Override
            public void afterBulk(long executionId,
                                  BulkRequest request,
                                  Throwable failure) {
                // called when the bulk failed and raised a Throwable
            }
        })
        .setBulkActions(10000)                               // execute the bulk every 10,000 requests
        .setBulkSize(new ByteSizeValue(1, ByteSizeUnit.GB))  // flush the bulk every 1gb
        .setFlushInterval(TimeValue.timeValueSeconds(5))     // flush every 5 seconds whatever the number of requests
        .setConcurrentRequests(1)
        .setBackoffPolicy(
            BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3))
        .build();

setConcurrentRequests sets the number of concurrent requests. A value of 0 means that only a single request will be allowed to execute. A value of 1 means 1 concurrent request is allowed to execute while accumulating new bulk requests.

setBackoffPolicy sets a custom backoff policy which will initially wait for 100ms, increase exponentially and retry up to three times. A retry is attempted whenever one or more bulk item requests have failed with an EsRejectedExecutionException, which indicates that there were too few compute resources available for processing the request. To disable backoff, pass BackoffPolicy.noBackoff().

Then you can simply add your requests to the BulkProcessor:

bulkProcessor.add(new IndexRequest("twitter", "tweet", "1").source(/* your doc here */));
bulkProcessor.add(new DeleteRequest("twitter", "tweet", "2"));

By default, BulkProcessor:

  • sets bulkActions to 1000
  • sets bulkSize to 5mb
  • does not set flushInterval
  • sets concurrentRequests to 1
  • sets backoffPolicy to an exponential backoff with 8 retries and a start delay of 50ms. The total wait time is roughly 5.1 seconds.

When all documents are loaded into the BulkProcessor, it can be closed using the awaitClose or close methods:

bulkProcessor.awaitClose(10, TimeUnit.MINUTES);

or

bulkProcessor.close();

Both methods flush any remaining documents and disable all other scheduled flushes, if they were scheduled by setting flushInterval. If concurrent requests were enabled, the awaitClose method waits for up to the specified timeout for all bulk requests to complete and then returns true; if the specified waiting time elapses before all bulk requests complete, false is returned. The close method doesn't wait for any remaining bulk requests to complete and exits immediately.
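For example, to block until everything has been flushed and learn whether all requests completed in time (a small sketch; awaitClose throws InterruptedException, which the caller must handle):

import java.util.concurrent.TimeUnit;

boolean allCompleted = bulkProcessor.awaitClose(10, TimeUnit.MINUTES);
// false means the timeout elapsed before all bulk requests finished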


10. Search documents:

The search API allows one to execute a search query and get back search hits that match the query. It can be executed across one or more indices and across one or more types. The query can be provided using the query Java API. The body of the search request is built using the SearchSourceBuilder. Here is an example:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.QueryBuilders;

SearchResponse response = client.prepareSearch("index1", "index2")
        .setTypes("type1", "type2")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.termQuery("multi", "test"))                 // Query
        .setPostFilter(QueryBuilders.rangeQuery("age").from(12).to(18))     // Filter
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();

Note that all parameters are optional. Here is the smallest search call you can write:

// MatchAll on the whole cluster with all default options

SearchResponse response = client.prepareSearch().execute().actionGet();

Using scrolls in Java

Read the scroll documentation first!

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch("test")
        .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
while (true) {

    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
    //Break condition: No hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}
Note

The size-parameter is per shard, so if you run a query against multiple indices (leading to many shards being involved in the query) the result might be more documents per execution of the scroll than you would expect!
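When you are done scrolling, it is good practice to release the search context on the server instead of waiting for the scroll timeout. A minimal sketch, reusing the client and scrollResp from above:

ClearScrollResponse clearResp = client.prepareClearScroll()
        .addScrollId(scrollResp.getScrollId())
        .execute().actionGet();
// clearResp.isSucceeded() tells you whether the scroll context was released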


Full text queries

The high-level full text queries are usually used for running full text queries on full text fields like the body of an email. They understand how the field being queried is analyzed and will apply each field's analyzer (or search_analyzer) to the query string before executing.

The queries in this group are:

match query
The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
multi_match query
The multi-field version of the match query.
common_terms query
A more specialized query which gives more preference to uncommon words.
query_string query
Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.
simple_query_string query
A simpler, more robust version of the query_string syntax suitable for exposing directly to users.

Match Query

See Match Query

QueryBuilder qb = matchQuery(
    "name",                  // field
    "kimchy elasticsearch"   // text
);

Multi Match Query

See Multi Match Query

QueryBuilder qb = multiMatchQuery(
    "kimchy elasticsearch",  // text
    "user", "message"        // fields
);

Common Terms Query

See Common Terms Query

QueryBuilder qb = commonTermsQuery("name",     // field
                                   "kimchy");  // value

Query String Query

See Query String Query

QueryBuilder qb = queryStringQuery("+kimchy -elasticsearch");  // text

Simple Query String Query

See Simple Query String Query

QueryBuilder qb = simpleQueryStringQuery("+kimchy -elasticsearch");  // text


MultiSearch API

See MultiSearch API Query documentation

SearchRequestBuilder srb1 = client
    .prepareSearch().setQuery(QueryBuilders.queryStringQuery("elasticsearch")).setSize(1);
SearchRequestBuilder srb2 = client
    .prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);

MultiSearchResponse sr = client.prepareMultiSearch()
        .add(srb1)
        .add(srb2)
        .execute().actionGet();

// You will get all individual responses from MultiSearchResponse#getResponses()
long nbHits = 0;
for (MultiSearchResponse.Item item : sr.getResponses()) {
    SearchResponse response = item.getResponse();
    nbHits += response.getHits().getTotalHits();
}




Compound queries

Compound queries wrap other compound or leaf queries, either to combine their results and scores, to change their behaviour, or to switch from query to filter context.

The queries in this group are:

constant_score query
A query which wraps another query, but executes it in filter context. All matching documents are given the same "constant" _score.
bool query
The default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses. The must and should clauses have their scores combined (the more matching clauses, the better) while the must_not and filter clauses are executed in filter context.
dis_max query
A query which accepts multiple queries, and returns any documents which match any of the query clauses. While the bool query combines the scores from all matching queries, the dis_max query uses the score of the single best-matching query clause (see the sketch after this list).
function_score query
Modify the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.
boosting query
Return documents which match a positive query, but reduce the score of documents which also match a negative query.
indices query
Execute one query for the specified indices, and another for other indices.
and, or, not
Synonyms for the bool query.
filtered query
Combine a query clause in query context with another in filter context. [2.0.0] Deprecated in 2.0.0. Use the bool query instead.
limit query
Limits the number of documents examined per shard.
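As a concrete illustration of the dis_max entry above, here is a minimal sketch in the same QueryBuilders style used elsewhere in this post (field values are made up for the example):

QueryBuilder qb = disMaxQuery()
        .add(termQuery("name", "kimchy"))         // first sub-query
        .add(termQuery("name", "elasticsearch"))  // second sub-query
        .boost(1.2f)                              // boost factor for the whole query
        .tieBreaker(0.7f);                        // how much non-best clauses contribute to the score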

Constant Score Query

See Constant Score Query

QueryBuilder qb = constantScoreQuery(
        termQuery("name","kimchy")      // your query
    )
    .boost(2.0f);                       // query score

Bool Query

See Bool Query

QueryBuilder qb = boolQuery()
    .must(termQuery("content", "test1"))     // must query
    .must(termQuery("content", "test4"))     // must query
    .mustNot(termQuery("content", "test2"))  // must not query
    .should(termQuery("content", "test3"));  // should query


Bool Query

A query that matches documents matching boolean combinations of other queries. The bool query maps to Lucene BooleanQuery. It is built using one or more boolean clauses, each clause with a typed occurrence. The occurrence types are:



 

must
The clause (query) must appear in matching documents and will contribute to the score.

filter
The clause (query) must appear in matching documents. However, unlike must, the score of the query will be ignored.

should
The clause (query) should appear in the matching document. In a boolean query with no must or filter clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter (see the sketch below).

must_not
The clause (query) must not appear in the matching documents.
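To tie these occurrence types back to the Java API, here is a minimal sketch that uses all four (field names invented for the example; minimumNumberShouldMatch is the builder method for minimum_should_match in the client generation used throughout this post):

QueryBuilder qb = boolQuery()
    .must(termQuery("content", "test1"))       // must: has to match, contributes to the score
    .filter(termQuery("status", "published"))  // filter: has to match, score ignored
    .should(termQuery("content", "test3"))     // should: optional unless a minimum is set
    .mustNot(termQuery("content", "test2"))    // must_not: must not match
    .minimumNumberShouldMatch(1);              // require at least one should clause to match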

