1. Text file
f = open(file, mode)
mode | explanation |
---|---|
r | default mode, read only |
w | write only, truncating the file first |
a | open file for appending |
x | create a new file and open it for writing; fails if the file already exists |
r+ | open file for both reading and writing |
t | text mode, default |
b | binary mode |
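The modes in the table can be tried out on a scratch file. The following is a minimal sketch, not a definitive recipe: the temporary directory and the file name modes_demo.txt are assumptions made only for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

# "x" creates the file and raises FileExistsError if it already exists.
with open(path, "x") as f:
    f.write("first\n")

try:
    open(path, "x")
    x_failed = False
except FileExistsError:
    x_failed = True

# "a" appends to the end of the existing content.
with open(path, "a") as f:
    f.write("second\n")
with open(path) as f:
    after_append = f.read()

# "w" truncates the file before writing.
with open(path, "w") as f:
    f.write("third\n")
with open(path) as f:
    after_write = f.read()

# "rb" combines read mode with binary mode and returns bytes, not str.
with open(path, "rb") as f:
    raw = f.read()

print(x_failed, repr(after_append), repr(after_write), type(raw).__name__)
```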
Example:
f = open("./Data/CauliflowerPizza.txt")
f.read()
# Operations
f.close()
It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks:
Example:
with open('./Data/CauliflowerPizza.txt') as f:
    read_data = f.read()
# check that the file is closed
f.closed  # True
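To make the comparison with try-finally concrete, here is a small sketch that does the same read both ways. The scratch file with_demo.txt and the simulated ValueError are assumptions for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(path, "w") as f:
    f.write("hello\n")

# Manual version: close() has to sit in a finally clause
# so that it also runs when an exception is raised.
f1 = open(path)
try:
    data1 = f1.read()
finally:
    f1.close()

# with version: the file is closed when the block exits,
# even though an exception is raised inside it.
try:
    with open(path) as f2:
        data2 = f2.read()
        raise ValueError("simulated failure inside the with block")
except ValueError:
    pass

print(f1.closed, f2.closed)
```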
Here, f is a file object that has already been created. The rest of this section looks at some of the methods of file objects.
f.read(size)
: size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most size characters (in text mode) or size bytes (in binary mode) are read and returned.
Example:
f.read()
# the whole content if it doesn't exceed the memory
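When a file may be too large to read at once, passing size lets you process it in pieces. The sketch below assumes a tiny scratch file and a chunk size of 4 characters, both chosen only to keep the output readable.

```python
import os
import tempfile

# Hypothetical scratch file with ten characters.
path = os.path.join(tempfile.mkdtemp(), "chunks_demo.txt")
with open(path, "w") as f:
    f.write("abcdefghij")

chunks = []
with open(path) as f:
    while True:
        chunk = f.read(4)  # at most 4 characters per call
        if not chunk:      # an empty string signals the end of the file
            break
        chunks.append(chunk)

print(chunks)
```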
f.readline()
: reads a single line from the file each time it is called; the newline character ('\n') is kept at the end of the returned string.
Example:
f = open('./Data/CauliflowerPizza.txt')
f.readline() #'INGREDIENTS\n'
f.readline() # 'For the pizza base: butter, ghee or coconut oil, for greasing; 140g cauliflower (about 1/4 of a head without the stalk); 1 egg white, beaten; 50g ground almonds; 40g buckwheat flour; 1/2 tsp sea salt; 1/2 tsp black pepper; 1/4 tsp bicarbonate of soda\n'
#...
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:
for line in f:
    print(line, end='')
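The two ways of reading lines can be combined, as in this small sketch. The file contents are made up; the point is that iteration continues from where readline() stopped, and that readline() returns an empty string at the end of the file.

```python
import os
import tempfile

# Hypothetical scratch file; the last line has no trailing newline.
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(path, "w") as f:
    f.write("INGREDIENTS\nbutter\n\nlast line without newline")

with open(path) as f:
    first = f.readline()                      # includes the trailing '\n'
    rest = [line.rstrip("\n") for line in f]  # continues after the first line
    at_eof = f.readline()                     # nothing left to read

print(repr(first), rest, repr(at_eof))
```

A blank line in the file comes back as '\n', while the end of the file comes back as '', so the two cases can be told apart.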
f.write(string)
: writes the contents of string to the file.
# "r+" requires that the file already exists; otherwise create it first with "x"
f2 = open("./Data/writeToFile.txt", "r+")
f2.write("This is a test\n")
# returns 15, the number of characters written
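One detail worth seeing in action: with "r+" the write starts at the beginning of the file and overwrites existing characters rather than inserting. This is a minimal sketch on an assumed scratch file.

```python
import os
import tempfile

# Hypothetical scratch file with some existing content.
path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")
with open(path, "w") as f:
    f.write("0123456789\n")

with open(path, "r+") as f:
    written = f.write("This")  # overwrites the first four characters

with open(path) as f:
    content = f.read()

print(written, repr(content))
```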
f.seek(offset, whence)
: changes the file object's position. The position is computed by adding offset to a reference point; the reference point is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file. whence can be omitted and defaults to 0, using the beginning of the file as the reference point.
# f2: "This is a test\n"
f2.seek(2)
f2.read(1)  # 'i'
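The other whence values are easiest to try in binary mode, because text files only allow seeks relative to the beginning of the file (apart from seeking to the very end with seek(0, 2)). The following sketch uses an assumed scratch file holding the same test string.

```python
import os
import tempfile

# Hypothetical scratch file, written in binary mode.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")
with open(path, "wb") as f:
    f.write(b"This is a test\n")

with open(path, "rb") as f:
    f.seek(2)                # whence=0: 2 bytes from the start
    a = f.read(1)            # position is now 3
    f.seek(2, os.SEEK_CUR)   # whence=1: 2 bytes forward, to position 5
    b = f.read(2)
    f.seek(-5, os.SEEK_END)  # whence=2: 5 bytes before the end
    c = f.read()
    pos = f.tell()           # current position, here the file length

print(a, b, c, pos)
```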
2. CSV file
Python has a built-in csv module that defines functions for reading and writing CSV files, for example csv.reader() and csv.writer(). See the official documentation of the csv module for details.
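As a quick illustration of csv.writer() and csv.reader(), the sketch below round-trips a few made-up rows through an in-memory buffer instead of a real file; the data and the use of io.StringIO are assumptions for the sake of a self-contained example.

```python
import csv
import io

# Made-up rows; the last name contains a comma and must be quoted.
rows = [["name", "age"], ["Ann", "31"], ["Bob, Jr.", "45"]]

# Write the rows to an in-memory text buffer.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerows(rows)
text = buffer.getvalue()

# Read them back; every field comes back as a string.
reader = csv.reader(io.StringIO(text, newline=""))
parsed = list(reader)

print(text)
print(parsed)
```

With a real file, the csv documentation recommends opening it with newline='' so the module can handle line endings itself.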
In practice, when we open a CSV file we usually intend to pre-process a data set, and we want the data shown in a neat, table-like format. For that reason, we use the Pandas library and its methods to read and write CSV files instead.
import pandas as pd
df = pd.read_csv("./Data/titanic.csv")
df.head()
The following table shows some commonly used parameters.
parameter | explanation |
---|---|
sep | str, default “,”; Delimiter to use |
header | int, list of int, default ‘infer’; Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. |
names | array-like, optional; List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed. |
# pass names without setting header=0: the original header row is kept as the first data row
import pandas as pd
df = pd.read_csv("./Data/titanic.csv", names= ['1','2','3','4','5','6','7','8','9'])
df.head()
# replace the column names by setting header=0
import pandas as pd
df = pd.read_csv("./Data/titanic.csv", header= 0, names= ['1','2','3','4','5','6','7','8','9'])
df.head()
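The difference between the two calls is easier to see on a tiny data set. This sketch replaces titanic.csv with a two-row CSV string (made-up values, read through io.StringIO) so that it runs without the data file.

```python
import io

import pandas as pd

# Made-up CSV text with a header row and two data rows.
csv_text = "id,fare\n1,7.25\n2,71.28\n"

# Default: column names are inferred from the first line.
df_infer = pd.read_csv(io.StringIO(csv_text))

# names without header=0: the header line is treated as data.
df_names_only = pd.read_csv(io.StringIO(csv_text), names=["a", "b"])

# names with header=0: the header line is replaced by the new names.
df_replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=["a", "b"])

print(df_infer.shape, df_names_only.shape, df_replaced.shape)
```

Note that in the second case the old header text ends up in the first row, so the columns are read as strings rather than numbers.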
df.to_csv()
The following table shows some commonly used parameters.
parameter | explanation |
---|---|
path_or_buf | File path or object |
header | bool or list of str, default True; Write out the column names. |
sep | str, default ‘,’; String of length 1. Field delimiter for the output file. |
index | bool, default True; Write row names (index). |
df.to_csv('writeToCSV.csv',index=False)
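To see what these parameters change, the sketch below calls to_csv() without a path, in which case pandas returns the CSV text as a string instead of writing a file. The small DataFrame is made up for illustration.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

with_index = df.to_csv()           # index written as the first column
no_index = df.to_csv(index=False)  # index left out
custom = df.to_csv(index=False, sep=";", header=["n", "a"])  # new delimiter and column names

print(with_index)
print(no_index)
print(custom)
```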
3. Other types of files
- Excel
pd.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, parse_cols=None, usecols=None, squeeze=False, dtype=None, engine=None, skiprows=None, nrows=None, **kwds)
import pandas as pd
df = pd.read_excel("products.xlsx")
- SAS
pd.read_sas(filepath_or_buffer, format=None, index=None, encoding=None, chunksize=None, iterator=False)
import pandas as pd
df = pd.read_sas("customers.sas7bdat", format='sas7bdat', encoding='latin1')
For more usage, see the pandas documentation.