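In SolrCloud, the vast majority of the queries we send are fanned out to multiple shards. Sometimes, though, I already know exactly which shard a query needs to hit. How can I send the request to just that one shard? This post looks at how SolrCloud decides whether a request is distributed. The code lives in HttpShardHandler; let's look at its checkDistributed method:
/**
* Decides whether this request is distributed, based on whether ZooKeeper is present.
* If it is, finds the shards chosen by the Router and, for each of those shards, adds the
* URLs of its replicas (separated by |), storing the result in rb.shards and rb.slices.
*/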
@Override
public void checkDistributed(ResponseBuilder rb) {
SolrQueryRequest req = rb.req;
SolrParams params = req.getParams();
rb.isDistrib = params.getBool("distrib", req.getCore().getCoreDescriptor().getCoreContainer().isZooKeeperAware());// Check the distrib parameter first; if it is set, use it, otherwise default to whether ZooKeeper is enabled.
String shards = params.get(ShardParams.SHARDS);// The shards parameter given in the request.
// for back compat, a shards param with URLs like localhost:8983/solr will mean that this
// search is distributed.
boolean hasShardURL = shards != null && shards.indexOf('/') > 0;
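rb.isDistrib = hasShardURL | rb.isDistrib;// Three things decide whether a request is distributed (i.e. forwarded to multiple shards): the distrib parameter, whether ZooKeeper is in use, and whether shards contains URLs.
if (rb.isDistrib) {// The request is distributed.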
// since the cost of grabbing cloud state is still up in the air, we grab it only if we need it.
ClusterState clusterState = null;
Map<String,Slice> slices = null;
CoreDescriptor coreDescriptor = req.getCore().getCoreDescriptor();
CloudDescriptor cloudDescriptor = coreDescriptor.getCloudDescriptor();
ZkController zkController = coreDescriptor.getCoreContainer().getZkController();
if (shards != null) {// If shards was given in the request parameters, use those shards.
List<String> lst = StrUtils.splitSmart(shards, ",", true);// Several shards can be listed, separated by commas.
rb.shards = lst.toArray(new String[lst.size()]);
rb.slices = new String[rb.shards.length];
if (zkController != null) {
// figure out which shards are slices
for (int i = 0; i < rb.shards.length; i++) {
if (rb.shards[i].indexOf('/') < 0) {
// this is a logical shard
rb.slices[i] = rb.shards[i];
rb.shards[i] = null;
}
}
}
} else if (zkController != null) {// shards was not given and ZooKeeper is in use.
// we weren't provided with an explicit list of slices to query via "shards", so use the cluster state
clusterState = zkController.getClusterState();
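String shardKeys = params.get(ShardParams._ROUTE_);// shardKeys is the _route_ parameter, which tells the router which shard(s) to target. It works with any Router (with the implicit router, for example, the value is simply the name of the shard to query).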
// This will be the complete list of slices we need to query for this request.
slices = new HashMap<>();
// we need to find out what collections this request is for.
// A comma-separated list of specified collections.
// Eg: "collection1,collection2,collection3"
String collections = params.get("collection");// The collection parameter; it may list several collections separated by commas.
if (collections != null) {
// If there were one or more collections specified in the query, split
// each parameter and store as a separate member of a List.
List<String> collectionList = StrUtils.splitSmart(collections, ",", true);
// In turn, retrieve the slices that cover each collection from the
// cloud state and add them to the Map 'slices'.
for (String collectionName : collectionList) {// Assume there is only one collection.
// The original code produced <collection-name>_<shard-name> when the collections
// parameter was specified (see ClientUtils.appendMap)
// Is this necessary if only one collection is specified?
// i.e. should we change multiCollection to collectionList.size() > 1?
addSlices(slices, clusterState, params, collectionName, shardKeys, true);// Use this collection's routing strategy and the request parameters to find every shard to query. This goes through the DocRouter; see http://suichangkele.iteye.com/blog/2363305 for details.
}
} else {
// just this collection
String collectionName = cloudDescriptor.getCollectionName();
addSlices(slices, clusterState, params, collectionName, shardKeys, false);
}
// Store the logical slices in the ResponseBuilder and create a new
// String array to hold the physical shards (which will be mapped
// later).
rb.slices = slices.keySet().toArray(new String[slices.size()]);
rb.shards = new String[rb.slices.length];
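}
// ... (remainder omitted: maps each logical slice to its replicas' URLs, joined with "|", in rb.shards)
}
}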
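After reading this code, SolrCloud's routing rules for distributed requests are clear. If we specify shards, the specified shards are queried. If we do not, the collection's DocRouter decides which shards to route to based on the _route_ parameter. I covered how DocRouter works in http://suichangkele.iteye.com/blog/2363305.
This article analyzed how distributed queries work in SolrCloud, focusing on how request parameters control whether a query is distributed and how to send a query to a specific shard.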
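The decision rules above can be modeled in a few lines of plain Java. This is a simplified sketch, not Solr code. The class DistribDecision and its methods are hypothetical names, and the distrib parameter is modeled as a nullable Boolean where null means "not sent". The comma split ignores the backslash-escape handling of StrUtils.splitSmart. Only the isDistrib computation and the shards branch are modeled; the _route_/DocRouter branch is not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the decision logic in
// HttpShardHandler.checkDistributed; this is not Solr code.
class DistribDecision {

    // distribParam == null means "distrib" was not sent, so the default is
    // whether the node is ZooKeeper-aware. A shards value containing '/'
    // (i.e. a URL) forces distributed mode regardless of distrib.
    static boolean isDistrib(Boolean distribParam, boolean zkAware, String shards) {
        boolean distrib = (distribParam != null) ? distribParam : zkAware;
        boolean hasShardUrl = shards != null && shards.indexOf('/') > 0;
        return hasShardUrl || distrib;
    }

    // Models the "shards != null" branch: when ZooKeeper is present, entries
    // without '/' are treated as logical slice names; URL entries are skipped
    // here because they stay in rb.shards as physical shards.
    static List<String> logicalSlices(String shards, boolean zkAware) {
        List<String> slices = new ArrayList<>();
        if (shards == null || !zkAware) {
            return slices;
        }
        for (String entry : shards.split(",")) { // simplified: no escape handling
            if (entry.indexOf('/') < 0) {
                slices.add(entry);
            }
        }
        return slices;
    }

    public static void main(String[] args) {
        System.out.println(isDistrib(null, true, null));                          // true
        System.out.println(isDistrib(false, true, null));                         // false
        System.out.println(isDistrib(false, false, "localhost:8983/solr/core1")); // true
        System.out.println(logicalSlices("shard1,localhost:8983/solr/core2", true)); // [shard1]
    }
}
```

The third call shows a detail that is easy to miss. Because hasShardURL is ORed into rb.isDistrib, a URL-style shards value makes the request distributed even when distrib=false is passed.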
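In practice, the two ways to pin a query to one shard are the shards parameter and the _route_ parameter. When ZooKeeper is present, a logical name such as shard1 in shards is treated as a slice, as in the zkController branch of the code. The sketch below only builds the request URLs. The host, the collection name collection1, the shard name shard1 and the route key user1! are hypothetical, and SingleShardQuery is an illustrative class, not part of Solr or SolrJ. With the default compositeId router, the _route_ value is a route key that is hashed to a shard. With the implicit router, it is the shard name itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that builds /select URLs targeting a single shard.
class SingleShardQuery {

    // Appends URL-encoded query parameters, in insertion order, to baseUrl/select.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        return baseUrl + "/select?" + params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/collection1"; // hypothetical collection URL

        // Option 1: name the logical shard explicitly via "shards".
        Map<String, String> byShard = new LinkedHashMap<>();
        byShard.put("q", "*:*");
        byShard.put("shards", "shard1");
        System.out.println(buildUrl(base, byShard));

        // Option 2: let the collection's DocRouter pick the shard via "_route_".
        Map<String, String> byRoute = new LinkedHashMap<>();
        byRoute.put("q", "*:*");
        byRoute.put("_route_", "user1!");
        System.out.println(buildUrl(base, byRoute));
    }
}
```

Either URL can be sent with any HTTP client. Sending distrib=false to a specific core's URL is a third option, which keeps the query on that core alone.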