Elasticsearch 2.2+: enabling compression with index.codec: best_compression

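Elasticsearch 2.0 introduced a new compression option, best_compression, which uses the DEFLATE algorithm in place of the default LZ4. It trades some read/write speed for a noticeably smaller index on disk: in the tests quoted below, index size dropped by up to 25.6%.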

The official description, from https://www.elastic.co/guide/en/elasticsearch/reference/2.2/index-modules.html#_static_index_settings:

index.codec: The default value compresses stored data with LZ4 compression, but this can be set to best_compression, which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance.

Note: in versions 2.1 and earlier this is an experimental feature. It is only stable from 2.2 onward.
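Because index.codec is listed among the static index settings, it is normally supplied when an index is created. A minimal Python sketch of such a request is shown here; the cluster address and the index name logs-archive are assumptions made for illustration, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumed values for illustration; neither appears in the article.
ES_URL = "http://localhost:9200"
INDEX_NAME = "logs-archive"


def build_create_index_request(es_url, index_name):
    """Build (but do not send) a create-index request that selects best_compression."""
    body = {"settings": {"index": {"codec": "best_compression"}}}
    return urllib.request.Request(
        url=f"{es_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(ES_URL, INDEX_NAME)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To create the index on a running cluster, send the request:
# urllib.request.urlopen(request)
```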

 

Now you can also enable better compression on the cold nodes by setting index.codec: best_compression in their config/elasticsearch.yml file in order to be able to archive more data with the same amount of disk space.

Source: https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch
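The quote above sets the codec node-wide through the configuration file. For a single index that already exists, a static setting such as index.codec can instead be changed while the index is closed. The sketch below builds the three requests involved (close, update settings, reopen) without sending them; the cluster address and index name are hypothetical. Segments written before the change should keep their old compression until they are rewritten, for example by a merge.

```python
import json
import urllib.request

# Assumed values for illustration; neither appears in the article.
ES_URL = "http://localhost:9200"
INDEX_NAME = "logs-archive"


def build_codec_change_requests(es_url, index_name):
    """Return the close / update-settings / open requests, in the order they should be sent."""
    base = f"{es_url}/{index_name}"
    settings = json.dumps({"index": {"codec": "best_compression"}}).encode("utf-8")
    return [
        # 1. Static settings can only be changed on a closed index.
        urllib.request.Request(f"{base}/_close", method="POST"),
        # 2. Switch the stored-fields codec to DEFLATE.
        urllib.request.Request(
            f"{base}/_settings",
            data=settings,
            headers={"Content-Type": "application/json"},
            method="PUT",
        ),
        # 3. Reopen the index so it can serve requests again.
        urllib.request.Request(f"{base}/_open", method="POST"),
    ]


requests_to_send = build_codec_change_requests(ES_URL, INDEX_NAME)
for req in requests_to_send:
    print(req.get_method(), req.full_url)

# Against a running cluster, send them one by one:
# for req in requests_to_send:
#     urllib.request.urlopen(req)
```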

 

The data below is taken from https://www.elastic.co/blog/elasticsearch-storage-the-true-story-2.0:

The test methodology hasn’t changed so you can check out the old blog post or the README in the Github repo for the details. 

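Structured data file. Original file size: 67644119

| Test | String fields | _all | Index size w/ LZ4 | Index size w/ DEFLATE | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|------|---------------|------|-------------------|-----------------------|------------------------|----------------------------|-------------------|
| 1 | analyzed and not_analyzed | enabled | 63047579 | 53131592 | 0.932 | 0.785 | -0.157 |
| 2 | analyzed and not_analyzed | disabled | 48271433 | 38327106 | 0.713 | 0.566 | -0.206 |
| 3 | not_analyzed | disabled | 38920800 | 29014796 | 0.575 | 0.428 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 65382872 | 49532858 | 0.966 | 0.732 | -0.242 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 43083702 | 32063602 | 0.636 | 0.474 | -0.255 |

Semi-structured data file. Original file size: 75037027

| Test | String fields | _all | Index size w/ LZ4 | Index size w/ DEFLATE | Expansion ratio w/ LZ4 | Expansion ratio w/ DEFLATE | Impact of DEFLATE |
|------|---------------|------|-------------------|-----------------------|------------------------|----------------------------|-------------------|
| 1 | analyzed and not_analyzed | enabled | 100478376 | 82132782 | 1.339 | 1.094 | -0.182 |
| 2 | analyzed and not_analyzed | disabled | 75238480 | 56911638 | 1.002 | 0.758 | -0.243 |
| 3 | not_analyzed | disabled | 71866672 | 53553561 | 0.957 | 0.713 | -0.254 |
| 3b | not_analyzed, except for 'message' field which is retained and analyzed | disabled | 104638750 | 83824398 | 1.394 | 1.117 | -0.198 |
| 4 | not_analyzed, except for 'agent' field which is analyzed | disabled | 72925624 | 54603882 | 0.971 | 0.727 | -0.251 |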

With the standard LZ4-based compression, the indexed data size to raw data size ratio ranged from 0.575 to 1.394. After enabling DEFLATE-based compression using the best_compression index.codec option, the indexed data size to raw data size ratio range came down to 0.429 to 1.117. Enabling the best_compression option resulted in a 15.7% to 25.6% reduction in indexed data size depending on the test parameters. 
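The ratios in the table can be reproduced from the raw sizes. The sketch below does so for test 1 of the structured data file, assuming that the expansion ratio is index size divided by original file size and that "Impact of DEFLATE" is the relative change in index size from LZ4 to DEFLATE. Both definitions are inferred from the numbers rather than stated in the quoted text, and for some other rows the last printed digit may differ by one depending on how the source rounded.

```python
# Sizes copied from the table: structured data file, test 1.
RAW_SIZE = 67644119
LZ4_INDEX_SIZE = 63047579
DEFLATE_INDEX_SIZE = 53131592


def expansion_ratio(index_size, raw_size):
    """Indexed data size relative to the raw data size."""
    return index_size / raw_size


def deflate_impact(lz4_size, deflate_size):
    """Relative change in index size when switching from LZ4 to DEFLATE (assumed definition)."""
    return (deflate_size - lz4_size) / lz4_size


lz4_ratio = expansion_ratio(LZ4_INDEX_SIZE, RAW_SIZE)
deflate_ratio = expansion_ratio(DEFLATE_INDEX_SIZE, RAW_SIZE)
impact = deflate_impact(LZ4_INDEX_SIZE, DEFLATE_INDEX_SIZE)

print(f"expansion ratio with LZ4:     {lz4_ratio:.3f}")
print(f"expansion ratio with DEFLATE: {deflate_ratio:.3f}")
print(f"impact of DEFLATE:            {impact:.3f}")
```

Substituting the index sizes measured on your own data gives the same three figures for your data set.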

As you can see, the ratio of index size to raw data size can vary greatly based on your mapping configuration, what fields you decide to create/retain, and the characteristics of the data set itself. We encourage you to run similar tests yourself to determine what the data compression/expansion factor is for your data set and application requirements.

Conclusion

There were many amazing features added to Elasticsearch 2.0 worth considering. As we’ve discussed, two of these new features in particular can reduce the hardware footprint required for an Elasticsearch cluster by 15-25% or more: 1) the addition of a best_compression option and 2) enabling doc_values by default. This allows us to get to compression ratios between 0.429 and 1.117.

 














This article is reposted from 张昺华-sky's blog on cnblogs (博客园). Original link: http://www.cnblogs.com/bonelee/p/6269582.html. To republish it, please contact the original author.

