1. Read/Write file

1. Text file

f = open(file, mode)

| mode | explanation |
| --- | --- |
| r | default mode, read only |
| w | write only, truncating the file first |
| a | open file for appending |
| x | create a new file and open it for writing |
| r+ | open file for both reading and writing |
| t | text mode, default |
| b | binary mode |
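
The modes in the table can be tried out side by side. The following is a minimal sketch, assuming a throwaway temporary directory and a hypothetical file name (modes_demo.txt) that are not part of the original example:

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

# "x" creates the file; it only succeeds if the file does not exist yet.
with open(path, "x") as f:
    f.write("first\n")

# "a" keeps the existing content and writes at the end of the file.
with open(path, "a") as f:
    f.write("second\n")

with open(path, "r") as f:
    after_append = f.read()

# "w" truncates the file before writing.
with open(path, "w") as f:
    f.write("third\n")

with open(path) as f:  # mode defaults to "r" (read only, text mode)
    after_truncate = f.read()

# Opening an existing file with "x" raises FileExistsError.
try:
    open(path, "x")
    x_failed = False
except FileExistsError:
    x_failed = True
```

After running it, after_append holds both lines, while after_truncate only holds the line written in "w" mode.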

Example:

f = open("./Data/CauliflowerPizza.txt")
f.read()
# Operations
f.close()

It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks:

Example:

with open('./Data/CauliflowerPizza.txt') as f:
    read_data = f.read()

# check the file is closed
f.closed  # True
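
To make the comparison with try-finally concrete, here is a small sketch. It assumes a temporary file with a hypothetical name (with_demo.txt) instead of the recipe file, so that it runs on its own:

```python
import os
import tempfile

# Hypothetical stand-in file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "with_demo.txt")
with open(path, "w") as f:
    f.write("INGREDIENTS\n")

# Manual version: close() has to be placed in a finally block.
f_manual = open(path)
try:
    manual_data = f_manual.read()
finally:
    f_manual.close()

# Equivalent version using with: the file is closed automatically.
with open(path) as f_with:
    with_data = f_with.read()

# The file is closed even when the suite raises an exception.
try:
    with open(path) as f_err:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = f_err.closed
```

Both versions read the same data, and all three file objects report closed as True afterwards.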

Here, f is a file object that has already been created. The rest of this section looks at some of the methods of a file object.

f.read(size): size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most size characters (in text mode) or size bytes (in binary mode) are read and returned.

Example:

f.read() 
# the whole content if it doesn't exceed the memory
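
When a file may be too large to read at once, passing size lets you read it piece by piece. The sketch below is one possible way to do this; the file name (chunks_demo.txt) and the chunk size of 4 characters are assumptions chosen only for illustration:

```python
import os
import tempfile

# Hypothetical small file standing in for a large one.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "chunks_demo.txt")
with open(path, "w") as f:
    f.write("abcdefghij")

chunks = []
with open(path) as f:
    while True:
        chunk = f.read(4)  # at most 4 characters per call
        if chunk == "":    # an empty string means end of file
            break
        chunks.append(chunk)
```

Each call returns at most size characters, so the last chunk can be shorter than the others.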

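f.readline(): reads a single line from the file each time it is called. The newline character ('\n') is kept at the end of the returned string, and an empty string is returned once the end of the file has been reached.

Example: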

f = open('./Data/CauliflowerPizza.txt')
f.readline() #'INGREDIENTS\n'
f.readline() # 'For the pizza base: butter, ghee or coconut oil, for greasing; 140g cauliflower (about 1/4 of a head without the stalk); 1 egg white, beaten; 50g  ground almonds; 40g buckwheat flour; 1/2 tsp sea salt; 1/2 tsp black pepper; 1/4 tsp bicarbonate of soda\n'
#...

For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

for line in f:
    print(line, end='')
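
The sketch below combines readline() with a loop over the file object. It assumes a small temporary file with a hypothetical name (lines_demo.txt) and made-up content rather than the recipe file:

```python
import os
import tempfile

# Hypothetical three-line file; the last line has no trailing newline.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "lines_demo.txt")
with open(path, "w") as f:
    f.write("INGREDIENTS\nbutter\nalmonds")

with open(path) as f:
    first = f.readline()                      # includes the trailing "\n"
    rest = [line.rstrip("\n") for line in f]  # loop continues after line 1
    at_eof = f.readline()                     # nothing is left to read
```

The loop picks up where readline() stopped, and readline() returns an empty string once the file is exhausted.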

f.write(string): writes the contents of string to the file.

# "r+" requires the file to exist already; otherwise create it first with "x"
f2 = open("./Data/writeToFile.txt", "r+") 

f2.write("This is a test\n") 
# 15, returning the number of characters written.
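
In text mode, write() only accepts strings, so other objects have to be converted first. The following sketch illustrates this; the file name (write_demo.txt) and the sample tuple are assumptions made for the example:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "write_demo.txt")  # hypothetical file name

with open(path, "w") as f:
    written = f.write("This is a test\n")  # number of characters written
    value = ("the answer", 42)
    f.write(str(value))                    # convert non-strings with str()
    try:
        f.write(42)                        # not a string
        type_error_raised = False
    except TypeError:
        type_error_raised = True

with open(path) as f:
    content = f.read()
```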

f.seek(offset, whence): changes the file object's position. The position is computed by adding offset to a reference point; the reference point is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file. whence can be omitted and defaults to 0, using the beginning of the file as the reference point.

# f2: "This is a test\n"
f2.seek(2)
f2.read(1) # 'i'
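
The other whence values are easiest to see in binary mode, because text files only allow seeking relative to the beginning of the file (apart from seeking to the very end with seek(0, 2)). A minimal sketch, assuming a hypothetical temporary file (seek_demo.bin) with sixteen known bytes:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "seek_demo.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(b"0123456789abcdef")

with open(path, "rb") as f:
    f.seek(5)                    # whence=0: 6th byte from the start
    sixth = f.read(1)
    f.seek(-3, os.SEEK_END)      # whence=2: 3rd byte before the end
    third_from_end = f.read(1)
    position = f.tell()          # current position after that read
    f.seek(-4, os.SEEK_CUR)      # whence=1: 4 bytes back from here
    relative = f.read(2)
```

os.SEEK_CUR and os.SEEK_END are the named constants for whence values 1 and 2.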

2. CSV file

Python has a built-in csv module that defines functions for reading and writing CSV files, for example csv.reader() and csv.writer(). See the official documentation of the csv module for details.
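
As a quick illustration of csv.writer() and csv.reader(), here is a small round-trip sketch. The file name (people.csv) and the rows are made up for the example:

```python
import csv
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "people.csv")  # hypothetical file name

# Made-up rows; the comma inside "Bob, Jr." forces the writer to quote it.
rows = [["name", "age"], ["Alice", "30"], ["Bob, Jr.", "25"]]

# newline="" is what the csv documentation recommends for csv file objects.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

with open(path, newline="") as f:
    read_back = list(csv.reader(f))  # every field comes back as a string
```

Note that csv.reader() returns every field as a string, so numeric columns need to be converted by hand.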

In practice, when we open a CSV file we usually intend to pre-process a data set, and we would like to see it in a neat, table-like format. For that reason, we use the pandas library and its methods to read and write CSV files instead.

import pandas as pd
df = pd.read_csv("./Data/titanic.csv")
df.head()


The following table shows some commonly used parameters of pd.read_csv().

| parameter | explanation |
| --- | --- |
| sep | str, default ","; delimiter to use |
| header | int, list of int, default 'infer'; the default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. |
| names | array-like, optional; list of column names to use. If the file contains a header row, you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed. |

# replace the column names without setting header=0 (the original header row is kept as a data row)
import pandas as pd
df = pd.read_csv("./Data/titanic.csv", names= ['1','2','3','4','5','6','7','8','9'])
df.head()


# replace the column names with header=0 (the original header row is discarded)
import pandas as pd
df = pd.read_csv("./Data/titanic.csv", header= 0, names= ['1','2','3','4','5','6','7','8','9'])
df.head()
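
Since the output images are not available, the difference between the two calls can be reproduced on a tiny in-memory CSV. This is only a sketch: the data and the replacement names below are made up and stand in for titanic.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in data: one header row and two data rows.
csv_text = "name,age\nAlice,30\nBob,25\n"
new_names = ["n", "a"]

# Default: column names are inferred from the first line.
df_default = pd.read_csv(io.StringIO(csv_text))

# names without header=0: the old header row becomes the first data row.
df_names_only = pd.read_csv(io.StringIO(csv_text), names=new_names)

# names with header=0: the old header row is replaced by the new names.
df_replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)
```

With names alone the frame has one extra row containing the old header, which also turns the numeric column into strings; with header=0 the data rows are unchanged and only the labels differ.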


df.to_csv()
The following table shows some commonly used parameters.

| parameter | explanation |
| --- | --- |
| path_or_buf | file path or object |
| header | bool or list of str, default True; write out the column names |
| sep | str, default ","; string of length 1, field delimiter for the output file |
| index | bool, default True; write row names (index) |

df.to_csv('writeToCSV.csv',index=False)
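
The effect of index, sep and header is easy to check without writing a file, because to_csv() returns the CSV text as a string when no path is given. A minimal sketch with a made-up two-row DataFrame:

```python
import io

import pandas as pd

# Hypothetical data used only for this illustration.
df_demo = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

with_index = df_demo.to_csv()                # index=True by default
without_index = df_demo.to_csv(index=False)  # drop the row labels
semicolon = df_demo.to_csv(index=False, sep=";", header=False)

# Reading the index-free text back gives the original columns again.
round_trip = pd.read_csv(io.StringIO(without_index))
```

With the default index=True, the row labels are written as an extra, unnamed first column, which is why index=False is often used when the index carries no information.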

3. Other types of files

  • Excel
    pd.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, parse_cols=None, usecols=None, squeeze=False, dtype=None, engine=None, skiprows=None, nrows=None, **kwds)
import pandas as pd
df = pd.read_excel("products.xlsx")
  • SAS
    pd.read_sas(filepath_or_buffer, format=None, index=None, encoding=None, chunksize=None, iterator=False)
import pandas as pd
df = pd.read_sas("customers.sas7bdat", format='sas7bdat', encoding='latin1')

For more usage, see the pandas documentation.
