1. Read/Write file

Content

1. Text file

f = open(file, mode)

mode	explanation
r	default mode, read only
w	write only, truncating the file first
a	open file for appending
x	exclusive creation: create a new file and open it for writing; fails if the file already exists
r+	open file for both reading and writing
t	text mode, default
b	binary mode
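The modes in the table can be tried out with a small sketch like the one below. The scratch file name `modes_demo.txt` and the temporary directory are assumptions made for illustration; they are not part of the original examples.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only)
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# "x" creates the file and raises FileExistsError if it already exists
with open(path, "x") as f:
    f.write("first\n")

try:
    open(path, "x")
    x_failed = False
except FileExistsError:
    x_failed = True

# "a" appends to the end of the existing content
with open(path, "a") as f:
    f.write("second\n")

with open(path) as f:  # no mode given: same as "r" (or "rt")
    after_append = f.read()

# "w" truncates the file before writing
with open(path, "w") as f:
    f.write("replaced\n")

with open(path, "r") as f:
    after_write = f.read()

# "b" returns bytes instead of str
with open(path, "rb") as f:
    raw = f.read()
```

Here `after_append` holds both lines, while `after_write` holds only the last one, because `w` discarded the earlier content.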

Example:

f = open("./Data/CauliflowerPizza.txt")
f.read()
# Operations
f.close()

It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks:

Example:

with open('./Data/CauliflowerPizza.txt') as f:
    read_data = f.read()

# check the file is closed
f.closed  # True
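To make the comparison with try-finally concrete, here is a minimal sketch of both styles side by side. The file `with_demo.txt` is a hypothetical scratch file created only for this illustration, and the `ValueError` is raised on purpose to simulate a failure.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only)
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(path, "w") as f:
    f.write("hello\n")

# Manual version: close() has to go in a finally block
f1 = open(path)
try:
    data1 = f1.read()
finally:
    f1.close()

# with version: the file is closed even though the body raises
try:
    with open(path) as f2:
        data2 = f2.read()
        raise ValueError("simulated failure")
except ValueError:
    pass

print(f1.closed, f2.closed)
```

Both file objects end up closed; the with version simply needs less code to get there.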

Here, f is a file object that has already been created, so we can now look at some of the methods of a file object.

f.read(size): size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most size characters (in text mode) or size bytes (in binary mode) are read and returned.

Example:

f.read() 
# the whole content if it doesn't exceed the memory
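When a file may be too large to read at once, passing size lets you process it piece by piece. The helper `read_in_chunks` below is a hypothetical name introduced for this sketch, and an in-memory `io.StringIO` stands in for a real file.

```python
import io


def read_in_chunks(file_obj, size=4):
    """Yield successive pieces of at most `size` characters (or bytes)."""
    while True:
        chunk = file_obj.read(size)
        if not chunk:  # read() returns an empty result at end of file
            break
        yield chunk


# In-memory stand-in for an open text file (illustration only)
demo = io.StringIO("INGREDIENTS\n")
chunks = list(read_in_chunks(demo, size=4))
print(chunks)
```

Joining the chunks gives back the original text, but at no point is more than `size` characters held by a single read.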

f.readline(): reads a single line from the file on each call; the newline character ('\n') is kept at the end of the returned string.
Example:

f = open('./Data/CauliflowerPizza.txt')
f.readline() #'INGREDIENTS\n'
f.readline() # 'For the pizza base: butter, ghee or coconut oil, for greasing; 140g cauliflower (about 1/4 of a head without the stalk); 1 egg white, beaten; 50g  ground almonds; 40g buckwheat flour; 1/2 tsp sea salt; 1/2 tsp black pepper; 1/4 tsp bicarbonate of soda\n'
#...

For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

for line in f:
    print(line, end='')
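As a small variation on the loop above, the sketch below numbers each line and strips the trailing newline. The sample text is shortened and held in an `io.StringIO` object, which is an assumption made so the example runs without the recipe file.

```python
import io

# In-memory stand-in for the recipe file (shortened, illustration only)
f_demo = io.StringIO("INGREDIENTS\nFor the pizza base\nMETHOD\n")

numbered = []
for number, line in enumerate(f_demo, start=1):
    # Each line still ends with '\n', so strip it before formatting
    numbered.append(f"{number}: {line.rstrip()}")

print(numbered)
```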

f.write(string): writes the contents of string to the file.

# "r+" requires that the file already exists; otherwise use "x" to create it
f2 = open("./Data/writeToFile.txt", "r+") 

f2.write("This is a test\n") 
# 15, returning the number of characters written.
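One detail worth seeing in action: a file opened with "r+" is not truncated, so writing starts at the beginning and overwrites existing characters in place. The sketch below assumes a hypothetical scratch file `rplus_demo.txt` that is created first with "w".

```python
import os
import tempfile

# Hypothetical scratch file (illustration only)
path = os.path.join(tempfile.mkdtemp(), "rplus_demo.txt")
with open(path, "w") as f:
    f.write("0123456789\n")

with open(path, "r+") as f:
    written = f.write("This")  # overwrites the first 4 characters
    f.seek(0)                  # go back to the start before reading
    content = f.read()

print(written, repr(content))
```

The rest of the original line survives after the overwritten part, which is the main difference from mode "w".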

f.seek(offset, whence): change the file object’s position. The position is computed from adding offset to a reference point; the reference point is selected by the whence argument. whence can be omitted and defaults to 0, using the beginning of the file as the reference point.

# f2: "This is a test\n"
f2.seek(2)
f2.read(1)  # 'i'
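The other values of whence can be illustrated with a binary buffer; in text mode, only seeks relative to the beginning of the file are allowed (apart from seeking to the very end with `seek(0, 2)`). The sketch below uses `io.BytesIO` as a stand-in for a file opened in binary mode, which is an assumption made to keep it self-contained.

```python
import io

# In-memory stand-in for a binary file containing 15 bytes
buf = io.BytesIO(b"This is a test\n")

buf.seek(2)                 # whence=0 (default): 2 bytes from the start
from_start = buf.read(1)    # position is now 3

buf.seek(2, io.SEEK_CUR)    # whence=1: 2 bytes forward from the current position
from_current = buf.read(2)  # position is now 7

buf.seek(-5, io.SEEK_END)   # whence=2: 5 bytes before the end
from_end = buf.read(4)
position = buf.tell()       # current position after the last read

print(from_start, from_current, from_end, position)
```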

2. CSV file

Python has a built-in csv module that defines functions for reading and writing CSV files, for example csv.reader() and csv.writer(). See the official documentation of the csv module for details.
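A minimal round trip with csv.writer() and csv.reader() might look like the sketch below. The file name `demo.csv` and the sample rows are made up for illustration. Opening the file with `newline=""` follows the recommendation in the csv module documentation.

```python
import csv
import os
import tempfile

# Hypothetical scratch file and sample rows (illustration only)
csv_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
rows = [["name", "age"], ["Alice", "30"], ["Bob", "25"]]

# Write all rows; newline="" lets the csv module control line endings
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read them back; every row comes back as a list of strings
with open(csv_path, newline="") as f:
    reader = csv.reader(f)
    read_back = list(reader)

print(read_back)
```

Note that csv.reader() returns every field as a string, so numeric columns need to be converted by hand.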

In practice, when we open a CSV file we usually want to pre-process the data set, and it helps to see the data in a clean, table-like format. For that reason, we use the pandas library and its methods to read and write CSV files instead.

import pandas as pd
df = pd.read_csv("./Data/titanic.csv")
df.head()

The following table shows some commonly used parameters.

parameter	explanation
sep	str, default ","; delimiter to use
header	int, list of int, default 'infer'; the default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names.
names	array-like, optional; list of column names to use. If the file contains a header row, you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

# replace the columns name without setting header = 0
import pandas as pd
df = pd.read_csv("./Data/titanic.csv", names= ['1','2','3','4','5','6','7','8','9'])
df.head()

# replace the column names, this time with header=0 so the original header row is dropped
import pandas as pd
df = pd.read_csv("./Data/titanic.csv", header= 0, names= ['1','2','3','4','5','6','7','8','9'])
df.head()
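Since the output screenshots are not available here, the effect of header and names can be checked on a tiny in-memory CSV. The sample data and the column names `x` and `y` are assumptions made for this sketch; they are not taken from the Titanic file.

```python
import io

import pandas as pd

# Tiny made-up CSV with a header row (illustration only)
csv_text = "a,b\n1,2\n3,4\n"

# Default: column names are inferred from the first line
inferred = pd.read_csv(io.StringIO(csv_text))

# names without header=0: the original header row is kept as a data row
names_only = pd.read_csv(io.StringIO(csv_text), names=["x", "y"])

# names with header=0: the original header row is replaced
renamed = pd.read_csv(io.StringIO(csv_text), header=0, names=["x", "y"])

print(len(inferred), len(names_only), len(renamed))
```

In the `names_only` frame, the first data row contains the strings "a" and "b", which is usually not what you want when the file already has a header.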

df.to_csv()
The following table shows some commonly used parameters.

parameter	explanation
path_or_buf	file path or object
header	bool or list of str, default True; write out the column names.
sep	str, default ','; string of length 1. Field delimiter for the output file.
index	bool, default True; write row names (index).

df.to_csv('writeToCSV.csv',index=False)
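The parameters in the table can be compared on a small made-up DataFrame, as in the sketch below. When path_or_buf is omitted, to_csv() returns the CSV text as a string, which makes the differences easy to inspect without writing a file. The names `df_demo`, `with_index`, `no_index`, and `custom` are introduced only for this illustration.

```python
import pandas as pd

# Small made-up DataFrame (illustration only)
df_demo = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# path_or_buf omitted: the CSV text is returned as a string
with_index = df_demo.to_csv()

# index=False drops the row labels
no_index = df_demo.to_csv(index=False)

# A different delimiter and no header row
custom = df_demo.to_csv(index=False, sep=";", header=False)

print(with_index)
print(no_index)
print(custom)
```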

3. Other types of files

Excel
pd.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, parse_cols=None, usecols=None, squeeze=False, dtype=None, engine=None, skiprows=None, nrows=None, **kwds,)

import pandas as pd
df = pd.read_excel("products.xlsx")

SAS
pd.read_sas(filepath_or_buffer, format=None, index=None, encoding=None, chunksize=None, iterator=False)

import pandas as pd
df = pd.read_sas("customers.sas7bdat", format='sas7bdat', encoding='latin1')

For more usage, see the pandas documentation.