How to Parse a CSV File in Bash

1. Overview

In this tutorial, we’ll learn how to parse values from Comma-Separated Values (CSV) files with various Bash built-in utilities.

First, we’ll discuss the prerequisites to read records from a file. Then we’ll explore different techniques to parse CSV files into Bash variables and array lists.

Finally, we’ll examine a few third-party tools for advanced CSV parsing.

2. Prerequisites

Let’s briefly review the standards defined for CSV files:

  1. Each record is on a separate line, delimited by a line break.
  2. The last record in the file may or may not end with a line break.
  3. There may be an optional header line appearing as the first line of the file with the same format as regular record lines.
  4. Within the header and records, there may be one or more fields separated by a comma.
  5. Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.
  6. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.

CSV files containing records with commas or line breaks within quoted strings aren’t in our scope; however, we’ll discuss them briefly in the last section of the article.
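Even though such files are out of scope here, it helps to see what rules 5 and 6 look like in practice. A single record might read as follows (a hypothetical sample, purely for illustration):

3,"Wayne Manor, Gotham","He said ""hello"" twice"

The second and third fields are quoted because they contain a comma and embedded double quotes, respectively, and each literal double quote inside the third field is doubled.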

Now let’s set up our standard sample CSV file:

$ cat input.csv
SNo,Quantity,Price,Value
1,2,20,40
2,5,10,50

2.1. Reading Records From a File

We’ll run an example to read records from our input file:

#!/bin/bash
while read line
do
   echo "Record is : $line"
done < input.csv

Here we used the read command to read the line-break (\n) separated records of our CSV file. We’ll check the output from our script:

Record is : SNo,Quantity,Price,Value
Record is : 1,2,20,40
Record is : 2,5,10,50

As we can see, there's a complication: the header line is also processed as a record. So let's dive into the solutions.

2.2. Ignoring the Header Line

We’ll run another example to exclude the header line from the output:

#!/bin/bash
while read line
do
   echo "Record is : $line"
done < <(tail -n +2 input.csv)

Here we used the tail command to read from the second line of the file. Subsequently, we passed the output as a file to the while loop using process substitution. The <(..) section enables us to specify the tail command and lets Bash read from its output like a file:

Record is : 1,2,20,40
Record is : 2,5,10,50

Now we’ll try another way to achieve the same result:

#!/bin/bash
exec < input.csv
read header
while read line
do
   echo "Record is : $line"
done

In this approach, we used the exec command to change the standard input to read from the file. Then we used the read command to process the header line. Subsequently, we processed the remaining file in the while loop.
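Alternatively, we can get the same result without exec by redirecting the file into a command group, where the first read consumes the header and the loop processes the remaining records (a small sketch of an equivalent approach):

#!/bin/bash
{
  read -r header
  while read -r line
  do
     echo "Record is : $line"
  done
} < input.csv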

3. Parsing Values From a CSV File

So far, we’ve been reading line-break-separated records from CSV files. Henceforth, we’ll look at methods to read the values from each data record.

3.1. From All Columns

Let’s see how to store the field values as we loop through the CSV file:

#! /bin/bash
while IFS="," read -r rec_column1 rec_column2 rec_column3 rec_column4
do
  echo "Displaying Record-$rec_column1"
  echo "Quantity: $rec_column2"
  echo "Price: $rec_column3"
  echo "Value: $rec_column4"
  echo ""
done < <(tail -n +2 input.csv)

Note that we’re setting the Input Field Separator (IFS) to “,”  in the while loop. As a result, we can parse the comma-delimited field values into Bash variables using the read command.

We’ll also check the output generated on executing the above script:

Displaying Record-1
Quantity: 2
Price: 20
Value: 40

Displaying Record-2
Quantity: 5
Price: 10
Value: 50
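To observe the field splitting in isolation, we can feed a single record to read through a here-string (a quick sketch; the variable names are just illustrative):

$ IFS="," read -r sno qty price value <<< "1,2,20,40"
$ echo "$qty"
2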

3.2. From the First Few Columns

There can be instances where we’re interested in reading only the first few columns of the file for processing.

We’ll demonstrate this with an example:

#! /bin/bash
while IFS="," read -r rec_column1 rec_column2 rec_remaining
do
  echo "Displaying Record-$rec_column1"
  echo "Quantity: $rec_column2"
  echo "Remaining fields of Record-$rec_column1 : $rec_remaining"
  echo ""
done < <(tail -n +2 input.csv)

In this example, we store the values of the first and second fields of the input CSV in the rec_column1 and rec_column2 variables, respectively. Notably, read assigns all the remaining fields to the rec_remaining variable.

Let’s look at the output of our script:

Displaying Record-1
Quantity: 2
Remaining fields of Record-1 : 20,40

Displaying Record-2
Quantity: 5
Remaining fields of Record-2 : 10,50
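If we later need the individual values held in rec_remaining, we can split it with another read (a small sketch reusing the variable names from above):

IFS="," read -r rec_column3 rec_column4 <<< "$rec_remaining"
echo "Price: $rec_column3, Value: $rec_column4"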

3.3. From Specific Column Numbers

Again, we’ll use process substitution to pass only specific columns to the while loop for reading. To fetch those columns, we’ll utilize the cut command:

#! /bin/bash
while IFS="," read -r rec1 rec2
do
  echo "Displaying Record-$rec1"
  echo "Price: $rec2"
done < <(cut -d "," -f1,3 input.csv | tail -n +2)

As a result, we can parse only the first and third columns of our input CSV.

We’ll validate it with the output:

Displaying Record-1
Price: 20
Displaying Record-2
Price: 10
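As an aside, awk can select the same columns and skip the header in a single step (an alternative sketch, not part of the approach above):

$ awk -F',' 'NR > 1 {print "Displaying Record-"$1; print "Price: "$3}' input.csv
Displaying Record-1
Price: 20
Displaying Record-2
Price: 10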

3.4. From Specific Column Names

There can be situations where we might need to parse the values from CSV based on column names in the header line.

We’ll illustrate this with a simple user-input-driven script:

#! /bin/bash
col_a='SNo'
read -p "Enter the column name to be printed for each record: " col_b
loc_col_a=$(head -1 input.csv | tr ',' '\n' | nl |grep -w "$col_a" | tr -d " " | awk -F " " '{print $1}')
loc_col_b=$(head -1 input.csv | tr ',' '\n' | nl |grep -w "$col_b" | tr -d " " | awk -F " " '{print $1}')
while IFS="," read -r rec1 rec2
do
  echo "Displaying Record-$rec1"
  echo "$col_b: $rec2"
  echo ""
done < <(cut -d "," -f${loc_col_a},${loc_col_b} input.csv | tail -n +2)

This script takes col_b as input from the user, and prints the corresponding column value for every record in the file.

We calculated the location of a column using a combination of the tr, nl, grep, and awk commands.

First, we converted the commas in the header line into line breaks using the tr command. Then we prepended the line number to each line using the nl command. Next, we searched for the column name in that output using the grep command, and removed the leading spaces using the tr command.

Finally, we used the awk command to get the first field, which corresponds to the column number.
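An equivalent, somewhat shorter way to locate a column is to let awk scan the header directly (an alternative sketch, not the exact pipeline used in the script above):

loc_col_b=$(awk -F',' -v col="$col_b" 'NR == 1 {for (i = 1; i <= NF; i++) if ($i == col) print i; exit}' input.csv)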

We’ll save the above script as parse_csv.sh for execution:

$ ./parse_csv.sh
Enter the column name to be printed for each record: Price
Displaying Record-1
Price: 20

Displaying Record-2
Price: 10

As expected, when “Price” is given as the input, only the values of the column number corresponding to the string “Price” in the header are printed. This approach can be particularly useful when the sequence of columns in a CSV file isn’t guaranteed.

4. Mapping Columns of CSV File into Bash Arrays

In the previous section, we parsed the field values into Bash variables for each record. Now we’ll check methods to parse entire columns of CSV into Bash arrays:

#! /bin/bash
arr_record1=( $(tail -n +2 input.csv | cut -d ',' -f1) )
arr_record2=( $(tail -n +2 input.csv | cut -d ',' -f2) )
arr_record3=( $(tail -n +2 input.csv | cut -d ',' -f3) )
arr_record4=( $(tail -n +2 input.csv | cut -d ',' -f4) )
echo "array of SNos  : ${arr_record1[@]}"
echo "array of Qty   : ${arr_record2[@]}"
echo "array of Price : ${arr_record3[@]}"
echo "array of Value : ${arr_record4[@]}"Copy

We’re using command substitution to exclude the header line using the tail command, and then using the cut command to filter the respective columns. Notably, the first set of parentheses is required to hold the output of the command substitution in variable arr_record1 as an array.

Let’s check the script output:

array of SNos  : 1 2
array of Qty   : 2 5
array of Price : 20 10
array of Value : 40 50
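Note that the unquoted command substitutions above rely on word splitting, which is fine for our numeric data. If the column values could contain spaces or glob characters, mapfile (also known as readarray, available in Bash 4+) with process substitution is a safer way to build the arrays (a sketch):

mapfile -t arr_record1 < <(tail -n +2 input.csv | cut -d ',' -f1)
echo "array of SNos  : ${arr_record1[@]}"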

5. Parsing CSV File Into a Bash Array

There may be cases where we prefer to map the entire CSV file into an array. We can then use the array to process the records.

Let’s check the implementation:

#! /bin/bash 
arr_csv=() 
while IFS= read -r line 
do
    arr_csv+=("$line")
done < input.csv

echo "Displaying the contents of array mapped from csv file:"
index=0
for record in "${arr_csv[@]}"
do
    echo "Record at index-${index} : $record"
    ((index++))
done

In this example, we read each line from our input CSV and appended it to the array arr_csv (+= appends an element to a Bash array). Then we printed the records of the array using a for loop.

Let’s check the output:

Displaying the contents of array mapped from csv file:
Record at index-0 : SNo,Quantity,Price,Value
Record at index-1 : 1,2,20,40
Record at index-2 : 2,5,10,50

For Bash versions 4 and above, we can also populate the array using the readarray command:

readarray -t array_csv < input.csv

This reads lines from input.csv into an array variable, array_csv. The -t option removes the trailing newline from each line.
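We can also combine readarray with process substitution to skip the header while populating the array (a short sketch):

readarray -t arr_records < <(tail -n +2 input.csv)
echo "First record: ${arr_records[0]}"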

6. Parsing CSV Files Having Line Breaks and Commas Within Records

So far, we’ve used the file input.csv for running all our illustrations.

Now we’ll create another CSV file containing line breaks and commas within quoted strings:

$ cat address.csv
SNo,Name,Address
1,Bruce Wayne,"1007 Mountain Drive,
Gotham"
2,Sherlock Holmes,"221b Baker Street, London"

There can be several more permutations and combinations of line-breaks, commas, and quotes within CSV files. For this reason, it’s a complex task to process such CSV files with only Bash built-in utilities. Generally, third-party tools, like csvkit, are employed for advanced CSV parsing.

However, another suitable alternative is Python’s CSV module, as Python is generally pre-installed on most Linux distributions.
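For instance, a minimal sketch invoking the csv module from the shell on our address.csv could look like this:

#!/bin/bash
python3 <<'EOF'
import csv

with open("address.csv", newline="") as f:
    for row in csv.reader(f):
        print(row)
EOF

This prints each record as a properly parsed list, keeping the comma and the line break inside the quoted Address field intact:

['SNo', 'Name', 'Address']
['1', 'Bruce Wayne', '1007 Mountain Drive,\nGotham']
['2', 'Sherlock Holmes', '221b Baker Street, London']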

7. Conclusion

In this article, we studied multiple techniques to parse values from CSV files.

First, we discussed the CSV standards and checked the steps to read records from a file. Next, we implemented several case-studies to parse the field values of a CSV file. We also explored ways to handle the optional header line of CSV files.

Then we presented techniques to store either columns or all the records of a CSV file into Bash arrays. Finally, we offered a brief introduction to some third-party tools for advanced CSV parsing.
