JSON Compression algorithms

This article examines two JSON compression algorithms: CJSON and HPack. By comparing JSON files of different sizes before and after compression, it looks at the differences in efficiency and output size, and discusses the factors to consider when choosing a compression strategy in practice.

By Alex (link to the original article)
About
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. It can be used as a data-interchange format, just like XML, and it has several advantages over the latter: JSON is really simple, its format is self-documenting, and it is much shorter because it carries no markup overhead. That is why JSON is considered a fat-free alternative to XML.
However, the purpose of this post is not to discuss the pros and cons of JSON versus XML. Though JSON is one of the most widely used data-interchange formats, there is still room for improvement. For instance, JSON uses quotes excessively, and key names are very often repeated. This problem can be addressed by JSON compression algorithms, of which more than one is available. Here you'll find an analysis of two JSON compression algorithms and a conclusion on whether JSON compression is useful and when it should be used.

Compressing JSON with the CJSON algorithm
CJSON compresses JSON with automatic type extraction. It tackles the most pressing problem: the need to repeat key names over and over. Using this compression algorithm, the following JSON:
[
  { // This is a point
    "x": 100,
    "y": 100
  },
  { // This is a rectangle
    "x": 100,
    "y": 100,
    "width": 200,
    "height": 150
  },
  {} // an empty object
]
Can be compressed as:
{
  "templates": [
    [0, "x", "y"], [1, "width", "height"]
  ],
  "values": [
    { "values": [1, 100, 100] },
    { "values": [2, 100, 100, 200, 150] },
    {}
  ]
}
A more detailed description of the compression algorithm, along with the source code, can be found on the project's home page.
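To make the idea concrete, here is a minimal TypeScript sketch of the key-set extraction step. It is illustrative code, not the CJSON source: the real format encodes templates as a tree (the first element of each template references a parent template), which is why the template ids in the example above start at 1, whereas this simplified version uses flat, 0-based ids and handles only flat objects.

// Simplified CJSON-style packer: pull repeated key sets out into templates.
type Json = string | number | boolean | null;

function cjsonPack(objects: Record<string, Json>[]) {
  const templates: string[][] = [];               // template id -> ordered key list
  const templateIndex = new Map<string, number>(); // key-set signature -> template id
  const values: { values?: Json[] }[] = [];

  for (const obj of objects) {
    const keys = Object.keys(obj);
    if (keys.length === 0) {                       // empty objects are kept as-is
      values.push({});
      continue;
    }
    const signature = keys.join(",");
    let id = templateIndex.get(signature);
    if (id === undefined) {                        // first time we see this key set
      id = templates.length;
      templates.push(keys);
      templateIndex.set(signature, id);
    }
    // First slot references the template, the rest are the bare values.
    values.push({ values: [id, ...keys.map(k => obj[k])] });
  }
  return { templates, values };
}

// The point/rectangle collection from the example above
// (output uses 0-based template ids, unlike the article's 1-based example).
console.log(JSON.stringify(cjsonPack([
  { x: 100, y: 100 },
  { x: 100, y: 100, width: 200, height: 150 },
  {},
])));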
Compressing JSON with the HPack algorithm
JSON.hpack is a lossless, cross-language, performance-focused data set compressor. It can reduce by up to 70% the number of characters used to represent a generic homogeneous collection. The algorithm provides several levels of compression (from 0 to 4). Level 0 performs the most basic compression by removing the keys (property names) from the structure and creating a header at index 0 that holds each property name. The higher levels reduce the size even further by also exploiting duplicated values.
For the following JSON:

[{
  name: "Andrea",
  age: 31,
  gender: "Male",
  skilled: true
}, {
  name: "Eva",
  age: 27,
  gender: "Female",
  skilled: true
}, {
  name: "Daniele",
  age: 26,
  gender: "Male",
  skilled: false
}]
the hpack algorithm produces a compressed version which looks like this:
[["name","age","gender","skilled"],["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]]
More details about the hpack algorithm can be found on the project home page.
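To illustrate the level-0 transformation, here is a minimal TypeScript sketch. The function names hpack and hunpack are illustrative only, not the actual JSON.hpack API, and the sketch assumes a homogeneous collection of flat objects.

// Level-0 hpack-style packing: a header row with the keys, then one row of values per object.
function hpack(collection: Record<string, unknown>[]): unknown[][] {
  if (collection.length === 0) return [];
  // Index 0 holds the property names; every other row holds one object's values in that order.
  const header = Object.keys(collection[0]);
  return [header, ...collection.map(obj => header.map(key => obj[key]))];
}

// Inverse operation: rebuild the original objects from the packed form.
function hunpack(packed: unknown[][]): Record<string, unknown>[] {
  if (packed.length === 0) return [];
  const [header, ...rows] = packed as [string[], ...unknown[][]];
  return rows.map(row => {
    const obj: Record<string, unknown> = {};
    header.forEach((key, i) => { obj[key] = row[i]; });   // zip keys back onto values
    return obj;
  });
}

const people = [
  { name: "Andrea", age: 31, gender: "Male", skilled: true },
  { name: "Eva", age: 27, gender: "Female", skilled: true },
  { name: "Daniele", age: 26, gender: "Male", skilled: false },
];
console.log(JSON.stringify(hpack(people)));            // matches the packed form shown above
console.log(JSON.stringify(hunpack(hpack(people))));   // round-trips back to the original objects

The unpacking step matters because, as discussed in the conclusion, the receiver must apply it before the data can be used.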
Analysis
The purpose of this analysis is to compare the two JSON compression algorithms described above. For this purpose we will use 5 JSON files of different sizes, varying from roughly 50KB to 1MB. Each JSON file will be served to a browser from a servlet container (Tomcat) with the following transformations (a rough sketch of the size measurement follows the list):
Unmodified JSON - no change on the server side
Minimized JSON - whitespace and new lines removed (the most basic JS optimization)
Compressed JSON using the CJSON algorithm
Compressed JSON using the HPack algorithm
Gzipped JSON - no change to the JSON content itself on the server side
Gzipped and minimized JSON
Gzipped and compressed using the CJSON algorithm
Gzipped and compressed using the HPack algorithm
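The original measurements were taken with Tomcat on the server and are not reproduced here, but the size comparison itself is easy to recreate. Below is a rough sketch, assuming a Node.js runtime (the built-in zlib module provides gzip; sample.json is a placeholder file name), which should yield figures comparable to the Minimized and Gzipped rows in the table below.

// Rough size comparison: raw vs minimized vs gzipped JSON, using Node.js built-ins only.
import { readFileSync } from "node:fs";
import { gzipSync } from "node:zlib";

const raw = readFileSync("sample.json", "utf8");    // placeholder input file
const minimized = JSON.stringify(JSON.parse(raw));  // parse + re-serialize strips whitespace and new lines

console.log("original :", Buffer.byteLength(raw));
console.log("minimized:", Buffer.byteLength(minimized));
console.log("gzipped  :", gzipSync(raw).length);
console.log("gzip+min :", gzipSync(minimized).length);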
Results
The table below contains the results of the benchmark. Each row corresponds to one of the transformations mentioned earlier, and there is one column for each of the 5 JSON files processed.
Transformation                           json1     json2     json3     json4     json5
Original JSON size (bytes)               52966    104370    233012    493589   1014099
Minimized                                33322     80657    180319    382396    776135
Compressed with CJSON                    24899     48605    108983    231760    471230
Compressed with HPack                     5727     10781     23162     49099     99575
Gzipped                                   2929      5374     11224     23167     43550
Gzipped and minimized                     2775      5035     10411     21319     42083
Gzipped and compressed with CJSON         2568      4605      9397     19055     37597
Gzipped and compressed with HPack         1982      3493      6981     13998     27358

Chart: Relative size of transformations (%)

The relative-size chart is useful for seeing whether the size of the JSON being compressed affects the efficiency of compression or minimization. You can notice the following:
minimization is more effective for smaller files: json1 shrinks from 52966 to 33322 bytes, i.e. to about 63% of its original size
for large and very large JSON files, minimization settles at a roughly constant ratio of about 75% of the original size
the compression algorithms have roughly the same efficiency for any size of JSON file
the CJSON algorithm is less efficient (output around 45-47% of the original size) than the HPack algorithm (around 10%)
the CJSON algorithm is also slower than the HPack algorithm
plain gzip already brings the content down to the same order of size as the compressed content (in the table it is in fact smaller than the HPack output)
combining compression or minimization with gzip improves the overall ratio only marginally, by about 1-2 percentage points of the original size
Conclusion
Both JSON compression algorithms are supported by wro4j since version 1.3.8 through the following processors: CJsonProcessor and JsonHPackProcessor. Both of them provide two methods: pack and unpack. The underlying implementation uses the Rhino engine to run the JavaScript code on the server side.
JSON compression algorithms reduce JSON size considerably. There are several such algorithms; we have covered two of them: CJSON and HPack. HPack appears to be much more efficient than CJSON and also significantly faster. When two parties exchange JSON and the source compresses it before it reaches the target, the client (target) has to apply the inverse operation (unpacking), otherwise the JSON cannot be used. This introduces a small overhead which must be taken into account when deciding whether JSON compression should be used.

When gzipping the content is allowed, it is more efficient than any of the compression algorithms. In other words, it is not worth compressing JSON on the server if the client accepts gzipped content. Server-side compression does make sense when the client cannot work with gzipped content and it is important to keep the traffic volume as low as possible (due to cost and time).
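To make that decision concrete, here is a rough Node.js sketch of the server-side choice, using only the built-in node:http and node:zlib modules. The inline pack function stands in for whichever packer is actually used; this is not the wro4j implementation.

// Decide per request: prefer gzip when the client advertises it, otherwise fall back to packing.
import { createServer } from "node:http";
import { gzipSync } from "node:zlib";

const people = [
  { name: "Andrea", age: 31, gender: "Male", skilled: true },
  { name: "Eva", age: 27, gender: "Female", skilled: true },
];

// Same level-0 packing idea as in the earlier sketch.
const pack = (rows: Record<string, unknown>[]) =>
  [Object.keys(rows[0]), ...rows.map(r => Object.keys(rows[0]).map(k => r[k]))];

createServer((req, res) => {
  // Prefer gzip whenever the client advertises support for it.
  if (/\bgzip\b/.test(String(req.headers["accept-encoding"] ?? ""))) {
    res.writeHead(200, { "Content-Type": "application/json", "Content-Encoding": "gzip" });
    res.end(gzipSync(JSON.stringify(people)));
  } else {
    // Otherwise fall back to packing; the client must unpack before using the data.
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(pack(people)));
  }
}).listen(8080);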

Another use case for a JSON compression algorithm is sending large JSON content from the client to the server (where it is typically sent ungzipped). In this case, it is important to unpack the JSON content on the server before consuming it.
