Question
I am implementing a feature that requires looking up rows in Cassandra by a list of primary keys.
Below is some example data, where id is the primary key:
mytable
id  column1
1   423
2   542
3   678
4   45534
5   435634
6   2435
7   678
8   4564
9   546
Most of my queries are lookups by a single id, but in some special cases I would like to fetch data for a list of ids.
The way I am currently doing it is as follows:
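import java.util.ArrayList;
import java.util.List;
// Existing single-row lookup by primary key
public Object fetchFromCassandraForId(int id);
int[] ids = {1, 3, 5, 7, 9};
List<Object> results = new ArrayList<>();
for (int id : ids) {
    // One network round trip per id
    results.add(fetchFromCassandraForId(id));
}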
This results in multiple network calls to Cassandra. Is it possible to batch this somehow? In other words, does Cassandra support a fast lookup by a list of ids, such as:
select column1 from mytable where id in (1, 3, 5, 7, 9);
Any help or pointers would be appreciated.
Answer 1:
If id is the full primary key, then Cassandra supports this, although it's not recommended from a performance point of view, because the query is executed as follows:
the request is sent to a coordinator node
the coordinator finds a replica for each id and sends an individual request to each of them
the coordinator waits for the results from every node, collects them into a result set, and sends it back
As a result:
all of your sub-queries have to wait for the slowest replica
you have an additional network hop from the coordinator to each replica
you put more pressure on the coordinator node, as it needs to keep the results in memory
If instead you send many parallel, asynchronous requests from the application, one per id value, then you:
avoid the additional hop: if you're using prepared statements with token-aware load balancing, each query is sent directly to a replica
can start processing results as you receive them, without waiting for everything
So sending parallel asynchronous requests could be faster than sending one request with IN...
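The parallel approach could be sketched roughly as below. This is only an illustration using the JDK's CompletableFuture, not driver code: the class ParallelLookup, the method fetchAll, and the fetchAsync parameter are hypothetical names, and fetchAsync stands in for whatever asynchronous call your driver provides (for example, asynchronously executing a prepared statement bound to one id). The in-memory map in main is demo data only.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ParallelLookup {

    /**
     * Issues one asynchronous lookup per id and gathers the results by id.
     * fetchAsync is a hypothetical stand-in for the driver's async call.
     * Assumes every id exists: ConcurrentHashMap rejects null values.
     */
    public static <K, V> Map<K, V> fetchAll(
            List<K> ids, Function<K, CompletableFuture<V>> fetchAsync) {
        Map<K, V> results = new ConcurrentHashMap<>();

        // Start every request before waiting on any of them.
        CompletableFuture<?>[] pending = ids.stream()
                .map(id -> fetchAsync.apply(id)
                        // Each result is handled as soon as it arrives.
                        .thenAccept(value -> results.put(id, value)))
                .toArray(CompletableFuture[]::new);

        // Block until all lookups have completed (rethrows if any failed).
        CompletableFuture.allOf(pending).join();
        return results;
    }

    public static void main(String[] args) {
        // In-memory stand-in for mytable (id -> column1), for demonstration only.
        Map<Integer, Integer> mytable =
                Map.of(1, 423, 3, 678, 5, 435634, 7, 678, 9, 546);

        Map<Integer, Integer> rows = fetchAll(
                List.of(1, 3, 5, 7, 9),
                id -> CompletableFuture.supplyAsync(() -> mytable.get(id)));

        System.out.println(rows.get(5)); // prints 435634
    }
}
```

With a real driver you would replace the lambda passed to fetchAll with the driver's asynchronous execute call, and you would probably want to cap the number of in-flight requests for very long id lists.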
Source: https://stackoverflow.com/questions/62643342/cassandra-lookup-by-list-of-primary-keys-in-java
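When looking up multiple primary keys in Cassandra, issuing one query per key results in multiple network requests. Cassandra does support fetching them in a single query with an IN clause, but this is not recommended for performance reasons, because the coordinator node has to send a request to a replica for each key and wait for all of the results. Concurrent asynchronous requests may be faster: they can be sent directly to the replicas and their results processed as they arrive, which reduces network latency and the pressure on the coordinator node.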