Processing JSON Files with Bash Shell

This article uses a test case to show how to process JSON data with Bash Shell and the jq tool. In data pipeline and configuration management systems, it is common to run tasks that depend on one another. The article shows how to write a program that executes shell scripts with such dependencies and provides a sample JSON file. Each job has input and output files; a job is executed only when all of its input files exist, and it is skipped if its output file already exists.

Preface

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate. This article presents a real test case whose logic is similar to a Makefile, and I use Bash to process the JSON. My coding skills are limited, so please bear with me; everyone is welcome to study the problem together and to try implementing it in other languages.

Processing JSON data with jq


Update history

2015-06-19 - Initial draft

Read the original - http://wsgzao.github.io/post/bash-json/

Further reading


Test Case

In data pipeline systems and configuration management systems, it's very common that you need to execute a bunch of jobs that have dependencies on each other.

Write a program pipeline_runner to execute a list of shell scripts. The definition of those scripts and their dependencies is described in a JSON file. The program takes only one argument, which is the file path of the JSON file that defines the jobs.

For example,
// jobs.json

{
    "log0_compressed" : {
        "commands": "curl http://websrv0/logs/access.log.gz > access0.log.gz",
        "input": [],
        "output": "access0.log.gz"
    },
    "log0" : {
        "commands": "gunzip access0.log.gz",
        "input": ["access0.log.gz"],
        "output": "access0.log"
    },
    "log1_compressed": {
        "commands": "curl http://websrv1/logs/access.log.gz > access1.log.gz",
        "input": [],
        "output": "access1.log.gz"
    },
    "log1" : {
        "commands": "gunzip access1.log.gz",
        "input": ["access1.log.gz"],
        "output": "access1.log"
    },
    "log_combined": {
        "commands": "cat access0.log access1.log > access.log",
        "input": ["access0.log", "access1.log"],
        "output": "access.log"
    }
}

To run the program:

pipeline_runner jobs.json

As you can see, each job has its input files and output files.
- A job will only be executed if all its input files exist.
- A job can have multiple input files (or none) but only produce one output file.
- Users could run the program multiple times, but if a job’s output file already exists, the program would skip the job.

If this is still not clear, think of Makefiles on Linux systems; the logic is quite similar.

You may complete the test in whichever programming language you prefer.
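The Bash solution below relies on jq to pull individual fields out of jobs.json. As a quick orientation, here is a minimal sketch of the kind of jq queries involved, assuming the jq binary sits in the current directory as ./jq (as in the script and the download at the end of this post):

# Pretty-print the file to confirm it is valid JSON
./jq . jobs.json

# List all job names (the top-level keys)
./jq -r 'keys[]' jobs.json

# Read one job's command, output file and number of input files
./jq -r '.log_combined.commands' jobs.json
./jq -r '.log_combined.output' jobs.json
./jq -r '.log_combined.input | length' jobs.json

The -r flag prints raw strings without JSON quoting, which is what you want when feeding the values back into the shell.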

Bash Shell

#!/bin/bash

# dos2unix *.sh
# Program:
# This program tests processing JSON with jq.
# History:
# 2015/06/18 by OX

#---------------------------- custom variables ---------------------start
runuser=root

# commands
log_combined_commands=$(./jq -r '.log_combined.commands' jobs.json)
log1_commands=$(./jq -r '.log1.commands' jobs.json)
log1_compressed_commands=$(./jq -r '.log1_compressed.commands' jobs.json)
log0_commands=$(./jq -r '.log0.commands' jobs.json)
log0_compressed_commands=$(./jq -r '.log0_compressed.commands' jobs.json)

# input file name
log0_input=$(./jq -r '.log0.input[0]' jobs.json)
log1_input=$(./jq -r '.log1.input[0]' jobs.json)
log_combined_input1=$(./jq -r '.log_combined.input[0]' jobs.json)
log_combined_input2=$(./jq -r '.log_combined.input[1]' jobs.json)

# output file name
log_combined_output=$(./jq -r '.log_combined.output' jobs.json)
log1_output=$(./jq -r '.log1.output' jobs.json)
log1_compressed_output=$(./jq -r '.log1_compressed.output' jobs.json)
log0_output=$(./jq -r '.log0.output' jobs.json)
log0_compressed_output=$(./jq -r '.log0_compressed.output' jobs.json)



#---------------------------- custom variables ---------------------end

#---------------------------- user check ---------------------start
if [ "`whoami`" != "$runuser" ]; then
    echo "Please re-run ${this_file} as $runuser."
    exit 1
fi
#---------------------------- user check ---------------------end

#---------------------------- function ---------------------start

pause()
{
    read -n1 -p "Press any key to continue..."
}

log_combined_check_first()
{
if [ -f "$log_combined_output" ]; then
   echo "${log_combined_output} has been generated, the programe will exit"
   exit 0
fi
}

log0_compressed_check()
{
if [ ! -f "$log0_compressed_output" ]; then
   eval ${log0_compressed_commands}
fi
}

log0_check()
{
if [ ! -f "$log0_output" ]; then
   eval ${log0_commands}
fi
}

log1_compressed_check()
{
if [ ! -f "$log1_compressed_output" ]; then
   eval ${log1_compressed_commands}
fi
}

log1_check()
{
if [ ! -f "$log1_output" ]; then
   eval ${log1_commands}
fi
}

log_combined_check()
{
if [ ! -f "$log_combined_output" ]; then
   eval ${log_combined_commands}
   echo "${log_combined_output} has been generated, the programe will exit"
fi
}


#---------------------------- function ---------------------end

#---------------------------- main ---------------------start

echo "
Please read first:
[0]Make sure jobs.json and jq are present in the current directory first.
[1]A job will only be executed if all its input files exist.
[2]A job can have multiple input files (or none) but only produce one output file.
[3]Users could run the program multiple times, but if a job's output file already exists, the program would skip the job.

"
pause


#check if file exist and do the job

log_combined_check_first

log0_compressed_check
log0_check

log1_compressed_check
log1_check

log_combined_check


#---------------------------- main ---------------------end

Summary

My code does not handle an arbitrary number of jobs or input files (everything is hard-coded for the example above), so I would welcome pointers from more experienced developers; a rough sketch of a more generic approach follows.
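As a starting point, below is a minimal sketch of a more generic runner. It is only an illustration under a few assumptions: jq is available as ./jq next to the jobs file, and job names and file paths contain no whitespace. The idea is to read every job name from the JSON, skip jobs whose output file already exists, run a job only when all of its input files exist, and repeat until a full pass makes no further progress.

#!/bin/bash
# Hypothetical generic runner (sketch): ./pipeline_runner_generic jobs.json
# Assumes ./jq sits in the current directory, as in the script above.

json_file="$1"
jobs=$(./jq -r 'keys[]' "$json_file")

progress=1
while [ "$progress" -eq 1 ]; do
    progress=0
    for job in $jobs; do
        output=$(./jq -r --arg j "$job" '.[$j].output' "$json_file")

        # Skip the job if its output file already exists
        [ -f "$output" ] && continue

        # Run the job only if every one of its input files exists
        ready=1
        for input in $(./jq -r --arg j "$job" '.[$j].input[]' "$json_file"); do
            [ -f "$input" ] || ready=0
        done

        if [ "$ready" -eq 1 ]; then
            commands=$(./jq -r --arg j "$job" '.[$j].commands' "$json_file")
            echo "Running $job: $commands"
            eval "$commands"
            progress=1
        fi
    done
done

Using jq --arg avoids quoting problems when a job name is interpolated into a filter, and the outer while loop acts as a crude dependency resolver: each pass runs whatever has become runnable and stops once a full pass changes nothing.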

file://D:\pipeline (2 folders, 4 files, 490.55 KB, 531.04 KB in total.)
│  jobs.json 794 bytes
│  jq 486.13 KB
│  pipeline_runner 2.95 KB
│  README.md 714 bytes
├─logs (0 folders, 2 files, 248 bytes, 248 bytes in total.)
│      access0.log.gz 123 bytes
│      access1.log.gz 125 bytes
└─result (0 folders, 5 files, 40.25 KB, 40.25 KB in total.)
        access.log 20.00 KB
        access0.log 10.00 KB
        access0.log.gz 123 bytes
        access1.log 10.00 KB
        access1.log.gz 125 bytes

http://pan.baidu.com/s/1hq1rCP2
