Tokyo Cabinet Observations

http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/


I’m using Tokyo Cabinet with the Python tc bindings for a decent-sized dataset (~19G in a single hash table) on OS X. A few observations and oddities:

* Writes slow down significantly as the database size grows. I’m writing 97 roughly equal-sized batches to the tch table. The first batch takes ~40 seconds, and processing time increases fairly linearly, with the last batch taking ~14 minutes. I’m not sure why this would be the case, but it’s discouraging. I’ll probably write a simple partitioning scheme to split the data into multiple databases and keep each one small, but it seems like this should be handled out of the box.
* [Update] I implemented a simple partitioning scheme, and sure enough it makes a big difference. Apparently keeping the file size small (where small is < 500G) is important. Surprising: why doesn’t TC implement partitioning if it’s susceptible to performance issues at larger file sizes? Is this a python tc issue or a Tokyo Cabinet issue?
* [Also] It seems I can only open 53-54 tc.HDB() instances before I get an ‘mmap error’, which limits how finely I can partition.
* Reading records that have already been read from the tch go much faster on the second access (roughly an order of magnitude). I suspect this is the disk cache at work, but if anyone has more information on this, please enlighten me.
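The partitioning scheme mentioned above can be sketched roughly as follows: hash each key to pick one of N shard files, so no single database grows large enough to hit the slowdown. This is a minimal illustration, not the post's actual code; plain dicts stand in for the tc.HDB() handles, and the shard count is an arbitrary example value.

```python
import hashlib

NUM_SHARDS = 8  # illustrative; in practice, enough to keep each file small

def shard_for(key, num_shards=NUM_SHARDS):
    """Deterministically map a key to a shard index via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

# Each shard would really be a tc.HDB() opened on its own .tch file;
# plain dicts stand in here so the sketch is self-contained and runnable.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

put("user:42", "alice")
print(get("user:42"))
```

A stable hash (rather than Python's built-in `hash()`, which can vary between runs) matters here, since the same key must always route to the same .tch file across process restarts. Note the mmap limit above caps how many shards you can hold open at once.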

Another somewhat surprising aspect: using the tc library you’re essentially embedding Tokyo Cabinet in your app; I had assumed access would be network-based, but it’s not. For network access you can use either the memcached protocol or pytyrant.
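To make the memcached-protocol option concrete: Tokyo Tyrant (the network server in front of Tokyo Cabinet) speaks the standard memcached text protocol, so any memcached client can talk to it. The sketch below just builds the raw protocol lines a client would write to the socket; it assumes a familiarity with the memcached `set`/`get` wire format and doesn't require a running server.

```python
def memcached_set(key, value):
    """Build a memcached-protocol 'set' command (flags=0, no expiry).
    Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"""
    data = value.encode("utf-8")
    return b"set %s 0 0 %d\r\n%s\r\n" % (key.encode("utf-8"), len(data), data)

def memcached_get(key):
    """Build a memcached-protocol 'get' command."""
    return b"get %s\r\n" % key.encode("utf-8")

cmd = memcached_set("greeting", "hello")
# A real client would write `cmd` to a socket connected to the Tyrant
# server's port and read back "STORED\r\n" on success.
```

In practice you'd just point an existing memcached client library at the Tyrant host and port rather than hand-rolling the protocol; this only shows what goes over the wire.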