Python Operators
print(5 // 2)  # floor (integer) division; prints 2
Python has no increment or decrement operators such as "a++" or "a--"; write "a += 1" or "a -= 1" instead.
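A minimal sketch of the two points above, assuming Python 3: "//" rounds toward negative infinity (so it is floor division, not truncation toward zero), "%" returns the matching remainder, and augmented assignment replaces the missing "++"/"--".

```python
# Floor division rounds toward negative infinity, not toward zero.
print(5 // 2)    # 2
print(-5 // 2)   # -3 (the floor of -2.5)
print(-5 % 2)    # 1, so that (-5 // 2) * 2 + (-5 % 2) == -5

# Python has no a++ / a--; use augmented assignment instead.
a = 0
a += 1   # increment
a += 1
a -= 1   # decrement
print(a)  # 1
```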
Function
def func2(*args, **kwargs):
    print(args)
    print(kwargs)

def func1(v, *args, **kwargs):
    func2(*args, **kwargs)
    if 'power' in kwargs:
        return v ** kwargs['power']
    else:
        return v
print(func1(10, 'extra 1', 'extra 2', power=3))
print('--------------')
print(func1(10, 5))
('extra 1', 'extra 2')
{'power': 3}
1000
--------------
(5,)
{}
10
*args collects any number of positional (unnamed) arguments into a tuple.
**kwargs collects keyword arguments into a dict.
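A small sketch that makes the tuple/dict types visible, using a hypothetical helper named collect (not from the original). It also shows that the same * and ** syntax works in reverse at a call site, unpacking a list and a dict into arguments, which is what func1 does when it calls func2.

```python
def collect(*args, **kwargs):
    """Hypothetical helper: return the collected arguments unchanged."""
    return args, kwargs

args, kwargs = collect(1, 'two', x=3, y=4)
print(type(args).__name__, args)      # tuple (1, 'two')
print(type(kwargs).__name__, kwargs)  # dict {'x': 3, 'y': 4}

# At the call site, * unpacks a list/tuple and ** unpacks a dict.
positional = [10, 20]
keywords = {'z': 30}
print(collect(*positional, **keywords))  # ((10, 20), {'z': 30})
```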
print('======================================')
def func(*args, **kwargs):
    print('args=', args)
    print('kwargs=', kwargs)
    print('======================================')
func(1, 2, 3)
func(a=1, b=2, c=3)
func(1, 2, 3, a=1, b=2, c=3)
func(1, 'b', 'c', a=1, b='b', c='c')
======================================
args= (1, 2, 3)
kwargs= {}
======================================
args= ()
kwargs= {'a': 1, 'b': 2, 'c': 3}
======================================
args= (1, 2, 3)
kwargs= {'a': 1, 'b': 2, 'c': 3}
======================================
args= (1, 'b', 'c')
kwargs= {'a': 1, 'b': 'b', 'c': 'c'}
======================================
String
cs_class_code = 'CS-229'
print('I like ' + str(cs_class_code) + ' a lot!')
print(f'I like {cs_class_code} a lot!')
print('I love CS229. (upper)'.upper())
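print('I love CS229. (rjust 50)'.rjust(50))  # right-justify in a field of width 50, padding with spaces on the left; if the width is smaller than the string's length, the original string is returned
print('we love CS229. (capitalize)'.capitalize())  # uppercase the first character and lowercase all the others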
print(' I love CS229. (strip) '.strip())
I like CS-229 a lot!
I like CS-229 a lot!
I LOVE CS229. (UPPER)
                          I love CS229. (rjust 50)
We love cs229. (capitalize)
I love CS229. (strip)
print('Old school formatting: {2}, {1}, {0:10.2F}'.format(1.358, 'b', 'c'))
# Fields are filled by index: {2}, then {1}, then {0}. '10.2F' prints the float right-aligned in a field 10 characters wide, rounded to 2 decimal places
Old school formatting: c, b,       1.36
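A quick check of the format spec above, written as a sketch: the same "width.precision" spec works inside an f-string, and the padded number really occupies 10 characters. Lowercase 'f' is used here; 'F' behaves the same for ordinary numbers and differs only in printing NAN/INF in uppercase.

```python
x = 1.358
old = '{2}, {1}, {0:10.2f}'.format(x, 'b', 'c')
new = f'c, b, {x:10.2f}'
print(repr(old))   # 'c, b,' followed by the number padded to 10 characters
print(old == new)  # True
print(len('{:10.2f}'.format(x)))  # 10: six spaces, then '1.36'
```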
List
list_2 = [1, 2, 3]
list_2.append(4)
list_2.insert(0, 'ZERO')  # insert the element 'ZERO' at index 0 of list_2
list_1_temp = ['a', 'b']
list_1_temp.extend(list_2)
print(list_1_temp)
['a', 'b', 'ZERO', 1, 2, 3, 4]
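The example uses extend; a short sketch contrasting it with append (the variable names are illustrative only) shows why the result above is flat: append adds its argument as one element, while extend, like "+=", adds each element of the iterable.

```python
appended = [1, 2]
appended.append([3, 4])   # the whole list becomes ONE new element
print(appended)           # [1, 2, [3, 4]]

extended = [1, 2]
extended.extend([3, 4])   # each element is added separately
print(extended)           # [1, 2, 3, 4]

added = [1, 2]
added += [3, 4]           # += on a list behaves like extend
print(added)              # [1, 2, 3, 4]
```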
pprint is your friend
import pprint
data=['generate_csv\\train_00.csv','generate_csv\\train_01.csv',
'generate_csv\\train_02.csv', 'generate_csv\\train_03.csv',
'generate_csv\\train_04.csv', 'generate_csv\\train_05.csv',
'generate_csv\\train_06.csv', 'generate_csv\\train_07.csv',
'generate_csv\\train_08.csv', 'generate_csv\\train_09.csv',
'generate_csv\\train_10.csv', 'generate_csv\\train_11.csv']
print(data)
print("--------divider--------------")
pprint.pprint(data)
['generate_csv\\train_00.csv', 'generate_csv\\train_01.csv', 'generate_csv\\train_02.csv', 'generate_csv\\train_03.csv', 'generate_csv\\train_04.csv', 'generate_csv\\train_05.csv', 'generate_csv\\train_06.csv', 'generate_csv\\train_07.csv', 'generate_csv\\train_08.csv', 'generate_csv\\train_09.csv', 'generate_csv\\train_10.csv', 'generate_csv\\train_11.csv']
--------divider--------------
['generate_csv\\train_00.csv',
 'generate_csv\\train_01.csv',
 'generate_csv\\train_02.csv',
 'generate_csv\\train_03.csv',
 'generate_csv\\train_04.csv',
 'generate_csv\\train_05.csv',
 'generate_csv\\train_06.csv',
 'generate_csv\\train_07.csv',
 'generate_csv\\train_08.csv',
 'generate_csv\\train_09.csv',
 'generate_csv\\train_10.csv',
 'generate_csv\\train_11.csv']
List comprehension can save a lot of lines
import pprint as pp
long_long_list = [(i, j) for i in range(3) for j in range(5)]
long_list_list = [[i for i in range(3)] for _ in range(5)]
pp.pprint(long_long_list)
pp.pprint(long_list_list)
[(0, 0),
 (0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (1, 0),
 (1, 1),
 (1, 2),
 (1, 3),
 (1, 4),
 (2, 0),
 (2, 1),
 (2, 2),
 (2, 3),
 (2, 4)]
[[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
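To see what the nested comprehension does, here is a sketch of the equivalent explicit loops: the left-most "for" is the outer loop. It also shows the optional "if" clause, which filters elements and goes slightly beyond the original example.

```python
# Explicit loops equivalent to [(i, j) for i in range(3) for j in range(5)]
pairs = []
for i in range(3):          # outer loop: the first "for" in the comprehension
    for j in range(5):      # inner loop: the second "for"
        pairs.append((i, j))
print(pairs == [(i, j) for i in range(3) for j in range(5)])  # True

# An optional "if" clause keeps only the elements that pass the test.
evens = [x for x in range(10) if x % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8]
```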
List is iterable!
string_list = ['a', 'b', 'c']
for s in string_list:
    print(s)
for i, s in enumerate(string_list):
    print(f'{i}, {s}')
a
b
c
0, a
1, b
2, c
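enumerate() turns an iterable (such as a list, tuple, or string) into a sequence of (index, element) pairs, giving you both the index and the value of each item.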
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print(list(enumerate(seasons)))  # indices start at 0 by default
print(list(enumerate(seasons, start=1)))  # indices start at 1
# Using enumerate in a for loop
seq = ['one', 'two', 'three']
for i, element in enumerate(seq):
    print(i, element)
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
0 one
1 two
2 three
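Slicing. With NumPy arrays (covered later), you can slice multi-dimensional arrays as well.
Basic slicing syntax: object[start_index : end_index : step]
- step: may be positive or negative. Its absolute value sets the stride, and its sign sets the direction: positive means left to right, negative means right to left. If omitted, step defaults to 1, i.e. left to right one element at a time.
- start_index: the starting index (included). If omitted, slicing starts from one end of the object; which end depends on the sign of step: the beginning for a positive step, the end for a negative step.
- end_index: the stopping index (excluded). If omitted, slicing continues to one end of the object, again chosen by the sign of step: the end for a positive step, the beginning for a negative step.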
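A few slices that exercise each rule above, as a sketch on a plain list of 0 to 9 (the name nums is only for illustration):

```python
nums = list(range(10))   # 0 through 9

print(nums[2:7])     # [2, 3, 4, 5, 6]  start included, end excluded
print(nums[::2])     # [0, 2, 4, 6, 8]  step 2, left to right
print(nums[::-1])    # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]  negative step: right to left
print(nums[7:2:-2])  # [7, 5, 3]  start 7 included, end 2 excluded
print(nums[-3:])     # [7, 8, 9]  omitted end with positive step: run to the end
print(nums[:-3:-1])  # [9, 8]  omitted start with negative step: begin at the end
```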
long_list = [i for i in range(9)]
print(long_list)
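print(long_list[:5])  # indices 0 to 4 (index 5 is excluded)
print(long_list[:-1])  # everything except the last element
print(long_list[4:-1])  # from index 4 up to, but not including, the last element
long_list[3:5] = [-1, -2]  # replace the elements at indices 3 and 4
print(long_list)
long_list.pop()  # remove (and return) the last element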
print(long_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 5, 6, 7]
[4, 5, 6, 7]
[0, 1, 2, -1, -2, 5, 6, 7, 8]
[0, 1, 2, -1, -2, 5, 6, 7]
Sorting a list (but remember that sorting can be costly: Python's sort takes O(n log n) time). For details, see the sorting documentation in the official Python docs.
random_list = [3, 12, 5, 6, 8, 2]
print(sorted(random_list))
random_list_2 = [(3, 'z'), (12, 'r'), (5, 'a'), (6, 'e'), (8, 'c'), (2, 'g')]
print(sorted(random_list_2, key=lambda x: x[0]))
print(sorted(random_list_2, key=lambda x: x[1]))
[2, 3, 5, 6, 8, 12]
[(2, 'g'), (3, 'z'), (5, 'a'), (6, 'e'), (8, 'c'), (12, 'r')]
[(5, 'a'), (8, 'c'), (6, 'e'), (2, 'g'), (12, 'r'), (3, 'z')]
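Two related options, sketched below: operator.itemgetter can replace the lambda key, and reverse=True sorts in descending order. Note that sorted() returns a new list, while list.sort() sorts in place and returns None.

```python
from operator import itemgetter

pairs = [(3, 'z'), (12, 'r'), (5, 'a')]
# itemgetter(0) is equivalent to key=lambda x: x[0]
print(sorted(pairs, key=itemgetter(0), reverse=True))  # [(12, 'r'), (5, 'a'), (3, 'z')]
print(pairs)  # unchanged: [(3, 'z'), (12, 'r'), (5, 'a')]

nums = [3, 12, 5]
result = nums.sort()   # sorts nums itself
print(nums, result)    # [3, 5, 12] None
```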
a = [[1, 2, 3]]*3
b = [[1, 2, 3] for i in range(3)]
a[0][1] = 4  # all three rows of a are the same list object, so this also changes a[1][1] and a[2][1]
b[0][1] = 4
print(a)
print(b)
[[1, 4, 3], [1, 4, 3], [1, 4, 3]]
[[1, 4, 3], [1, 2, 3], [1, 2, 3]]
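The "is" operator tests object identity and makes the difference visible directly; this sketch rebuilds a and b from the example above.

```python
a = [[1, 2, 3]] * 3                  # * repeats references to ONE inner list
b = [[1, 2, 3] for _ in range(3)]    # a new inner list on every iteration

print(a[0] is a[1], a[1] is a[2])    # True True
print(b[0] is b[1])                  # False
```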
import copy
import pprint as pp
orig_list = [[1, 2], [3, 4]]
dup_list = copy.deepcopy(orig_list)
dup_list[0][1] = 'okay'
pp.pprint(orig_list)
pp.pprint(dup_list)
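Running the block above leaves orig_list as [[1, 2], [3, 4]] while dup_list becomes [[1, 'okay'], [3, 4]]. The sketch below contrasts deepcopy with copy.copy, a shallow copy that creates a new outer list but shares the inner lists (the names shallow and deep are illustrative).

```python
import copy

orig_list = [[1, 2], [3, 4]]
shallow = copy.copy(orig_list)    # new outer list, same inner list objects
deep = copy.deepcopy(orig_list)   # inner lists copied recursively

shallow[0][1] = 'shallow'
print(orig_list)  # [[1, 'shallow'], [3, 4]]  the change leaks into the original
deep[1][0] = 'deep'
print(orig_list)  # [[1, 'shallow'], [3, 4]]  unaffected by the deep copy
print(deep)       # [[1, 2], ['deep', 4]]
```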
Tuple
A tuple is like a list that you cannot edit: its elements cannot be modified.
my_tuple = (10, 20, 30)
my_tuple[0] = 40
Traceback (most recent call last):
File "E:/PythonPrj/Fcn/main.py", line 2, in <module>
my_tuple[0] = 40
TypeError: 'tuple' object does not support item assignment
Unpacking assignment makes your code shorter (it also works for lists).
a, b, c = my_tuple
print(f"a={a}, b={b}, c={c}")
for obj in enumerate(my_tuple):
    print(obj)
a=10, b=20, c=30
(0, 10)
(1, 20)
(2, 30)
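Two common variations of unpacking, as a sketch: a starred name collects the remaining items into a list, and tuple packing/unpacking swaps variables without a temporary.

```python
my_tuple = (10, 20, 30)

first, *rest = my_tuple      # rest receives everything after the first item
print(first, rest)           # 10 [20, 30]

x, y = 1, 2
x, y = y, x                  # swap in one line
print(x, y)                  # 2 1

for idx, value in enumerate(my_tuple):   # unpack each (index, element) pair
    print(idx, value)
```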
Dictionary/Set
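In a dictionary, each key => value pair is written with a colon (:) between key and value, pairs are separated by commas (,), and the whole dictionary is enclosed in curly braces ({}), in the form: d = {key1 : value1, key2 : value2 }
Keys are unique: if a key is repeated, the last key-value pair replaces the earlier ones. Values do not need to be unique.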
my_dict = {(5 - i): i ** 2 for i in range(10)}
print(my_dict)
print(my_dict.keys())
{5: 0, 4: 1, 3: 4, 2: 9, 1: 16, 0: 25, -1: 36, -2: 49, -3: 64, -4: 81}
dict_keys([5, 4, 3, 2, 1, 0, -1, -2, -3, -4])
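A sketch of the key rules described above: a repeated key keeps only its last value (in the position where the key first appeared), values may repeat, and .get() returns a default for a missing key where [] raises KeyError.

```python
d = {'a': 1, 'b': 1, 'a': 2}   # 'a' appears twice; value 1 appears twice
print(d)                       # {'a': 2, 'b': 1}

try:
    d['c']                     # missing key with []
except KeyError:
    print('KeyError')

print(d.get('c'))              # None
print(d.get('c', 0))           # 0
```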
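A set is an unordered collection of unique elements.
Sets can be created with curly braces { } or with the set() function. Note: an empty set must be created with set(), not { }, because { } creates an empty dictionary.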
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}
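print(basket)  # duplicates are removed; the printed order may differ between runs
print('orange' in basket)  # fast membership test
basket.pop()  # removes and returns an arbitrary element; sets are unordered, so you cannot choose which one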
{'orange', 'banana', 'pear', 'apple'}
True
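A short sketch confirming the empty-set rule and the pop() behaviour described above; which fruit pop() removes is not predictable, so only its effect is checked.

```python
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__, type(empty_set).__name__)  # dict set

basket = {'apple', 'orange', 'apple', 'pear'}
print(len(basket))            # 3: the duplicate 'apple' is stored once

removed = basket.pop()        # an arbitrary element
print(removed not in basket)  # True
print(len(basket))            # 2
```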
Here is how to iterate through a dictionary. Remember that a dictionary is NOT sorted by key: since Python 3.7 it keeps insertion order.
for k, it in my_dict.items():  # similar to a for loop over enumerate(list)
    print(k, it)
# Sorting keys by their string representation
for k, it in sorted(my_dict.items(), key=lambda x: str(x[0])):
    print(k, it)
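A sketch of other common orderings, using a small hypothetical dict named scores: sorted(d.items()) orders by key, a key function on the value orders by value, and sorting numbers by their string form (as in the loop above) gives a different order than sorting them numerically.

```python
scores = {'b': 2, 'c': 3, 'a': 1}   # hypothetical example data

print(sorted(scores.items()))       # [('a', 1), ('b', 2), ('c', 3)]  by key
by_value = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(by_value)                     # [('c', 3), ('b', 2), ('a', 1)]  by value, largest first
print(list(scores))                 # ['b', 'c', 'a']  the dict keeps insertion order

print(sorted([10, 9, -1]))           # [-1, 9, 10]  numeric order
print(sorted([10, 9, -1], key=str))  # [-1, 10, 9]  string order: '-1' < '10' < '9'
```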
Numpy
Array initialization
import numpy as np
print(np.ones(3))
print(np.ones((3, 3)))
print(np.eye(3))
[1. 1. 1.]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Sampling from the uniform distribution on [0, 1):
print(np.random.random(3))
print(np.random.random((3, 3)))
[0.69616054 0.53436214 0.92546999]
(a 3 x 3 array of random values in [0, 1); the numbers differ on every run)
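Because these samples change on every run, a seeded generator is handy for reproducible experiments. The sketch below uses np.random.default_rng (available since NumPy 1.17 and recommended over the legacy np.random.random interface); the seed value 0 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # same seed -> same numbers on every run
samples = rng.random((3, 3))          # uniform on [0, 1), shape (3, 3)
print(samples.shape)                  # (3, 3)
print(samples.min() >= 0.0, samples.max() < 1.0)  # True True
```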
Array shape
Shape/reshape and multi-dimensional arrays
import numpy as np
array_1d = np.array([1, 2, 3, 4])
print(array_1d.reshape(-1,2))
print(array_1d.reshape(-1,2).shape)  # reshape to 2 columns (-1 lets NumPy infer the number of rows); similarly, array_1d.reshape(2, -1) gives 2 rows
[[1 2]
 [3 4]]
(2, 2)
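A sketch of how the -1 placeholder is resolved: NumPy divides the total size by the product of the other dimensions, so only one dimension may be -1 and the division must be exact, otherwise reshape raises ValueError.

```python
import numpy as np

a = np.arange(12)
print(a.reshape(3, -1).shape)     # (3, 4): -1 becomes 12 // 3
print(a.reshape(-1, 6).shape)     # (2, 6)
print(a.reshape(2, 2, -1).shape)  # (2, 2, 3)

try:
    a.reshape(5, -1)              # 12 elements cannot fill 5 equal rows
except ValueError as err:
    print('ValueError:', err)
```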
import numpy as np
large_array = np.array([i for i in range(400)])
large_array = large_array.reshape((20, 20))
print(large_array[:, 5])  # the element at index 5 of every row, i.e. column 5
large_3d_array = np.array([i for i in range(1000)])
large_3d_array = large_3d_array.reshape((10, 10, 10))
print(large_3d_array)
print(large_3d_array[:, 1, 1]) #[ 11 111 211 311 411 511 611 711 811 911]
print(large_3d_array[2, :, 1]) #[201 211 221 231 241 251 261 271 281 291]
print(large_3d_array[2, 3, :]) #[230 231 232 233 234 235 236 237 238 239]
print(large_3d_array[1, :, :])
[  5  25  45  65  85 105 125 145 165 185 205 225 245 265 285 305 325 345
 365 385]
[[[  0   1   2   3   4   5   6   7   8   9]
  [ 10  11  12  13  14  15  16  17  18  19]
  [ 20  21  22  23  24  25  26  27  28  29]
  [ 30  31  32  33  34  35  36  37  38  39]
  [ 40  41  42  43  44  45  46  47  48  49]
  [ 50  51  52  53  54  55  56  57  58  59]
  [ 60  61  62  63  64  65  66  67  68  69]
  [ 70  71  72  73  74  75  76  77  78  79]
  [ 80  81  82  83  84  85  86  87  88  89]
  [ 90  91  92  93  94  95  96  97  98  99]]

 [[100 101 102 103 104 105 106 107 108 109]
  [110 111 112 113 114 115 116 117 118 119]
  [120 121 122 123 124 125 126 127 128 129]
  [130 131 132 133 134 135 136 137 138 139]
  [140 141 142 143 144 145 146 147 148 149]
  [150 151 152 153 154 155 156 157 158 159]
  [160 161 162 163 164 165 166 167 168 169]
  [170 171 172 173 174 175 176 177 178 179]
  [180 181 182 183 184 185 186 187 188 189]
  [190 191 192 193 194 195 196 197 198 199]]

 [[200 201 202 203 204 205 206 207 208 209]
  [210 211 212 213 214 215 216 217 218 219]
  [220 221 222 223 224 225 226 227 228 229]
  [230 231 232 233 234 235 236 237 238 239]
  [240 241 242 243 244 245 246 247 248 249]
  [250 251 252 253 254 255 256 257 258 259]
  [260 261 262 263 264 265 266 267 268 269]
  [270 271 272 273 274 275 276 277 278 279]
  [280 281 282 283 284 285 286 287 288 289]
  [290 291 292 293 294 295 296 297 298 299]]

...

 [[900 901 902 903 904 905 906 907 908 909]
  [910 911 912 913 914 915 916 917 918 919]
  [920 921 922 923 924 925 926 927 928 929]
  [930 931 932 933 934 935 936 937 938 939]
  [940 941 942 943 944 945 946 947 948 949]
  [950 951 952 953 954 955 956 957 958 959]
  [960 961 962 963 964 965 966 967 968 969]
  [970 971 972 973 974 975 976 977 978 979]
  [980 981 982 983 984 985 986 987 988 989]
  [990 991 992 993 994 995 996 997 998 999]]]
[ 11 111 211 311 411 511 611 711 811 911]
[201 211 221 231 241 251 261 271 281 291]
[230 231 232 233 234 235 236 237 238 239]
[[100 101 102 103 104 105 106 107 108 109]
 [110 111 112 113 114 115 116 117 118 119]
 [120 121 122 123 124 125 126 127 128 129]
 [130 131 132 133 134 135 136 137 138 139]
 [140 141 142 143 144 145 146 147 148 149]
 [150 151 152 153 154 155 156 157 158 159]
 [160 161 162 163 164 165 166 167 168 169]
 [170 171 172 173 174 175 176 177 178 179]
 [180 181 182 183 184 185 186 187 188 189]
 [190 191 192 193 194 195 196 197 198 199]]
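np.arange() returns evenly spaced values from a start value up to (but not including) a stop value, with a fixed step.
Number of arguments: np.arange() can be called with one, two, or three arguments.
- One argument: the value is the stop; start defaults to 0 and step defaults to 1.
- Two arguments: the first is the start and the second is the stop; step defaults to 1.
- Three arguments: the first is the start, the second is the stop, and the third is the step. The step may be a decimal (non-integer) value.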
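The three call forms listed above, as a short sketch. For non-integer steps, the NumPy documentation suggests np.linspace, which fixes the number of points instead of the step and includes the endpoint by default.

```python
import numpy as np

print(np.arange(5))            # [0 1 2 3 4]  one argument: stop
print(np.arange(2, 5))         # [2 3 4]      two arguments: start, stop
print(np.arange(0, 1, 0.25))   # values 0.0, 0.25, 0.5, 0.75 (stop 1 is excluded)

# Fixed number of points instead of a fixed step; endpoint 1 is included.
print(np.linspace(0, 1, 5))    # values 0.0, 0.25, 0.5, 0.75, 1.0
```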
import numpy as np
small_array = np.arange(4)
print(np.reshape(small_array, (2, 2), order='C'))  # fill row by row (C order)
print(np.reshape(small_array, (2, 2), order='F'))  # fill column by column (Fortran order)
[[0 1]
 [2 3]]
[[0 2]
 [1 3]]
Numpy math
array_1 = np.array([1, 2, 3, 4])
print(array_1 + 5)
print(array_1 * 5)
print(np.power(array_1, 2))  # element-wise square: array_1[i] ** 2
print(np.log(array_1))  # element-wise natural logarithm (base e)
[6 7 8 9]
[ 5 10 15 20]
[ 1  4  9 16]
[0.         0.69314718 1.09861229 1.38629436]
For sum, mean, average, std, var, etc., you can apply the operation along a chosen axis (or a tuple of axes).
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# print(np.sum(array_2d)) # 45
# print(np.sum(array_2d, axis=0)) # [12 15 18]
# print(np.sum(array_2d, axis=1)) # [ 6 15 24]
array_3d = np.array([i for i in range(8)]).reshape((2, 2, 2))
pp.pprint(array_3d)
print(np.sum(array_3d, axis=0))
print(np.sum(array_3d, axis=1))
print(np.sum(array_3d, axis=(1, 2)))  # sum over axes 1 and 2 together: one total per block along axis 0
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
[[ 4  6]
 [ 8 10]]
[[ 2  4]
 [10 12]]
[ 6 22]
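The same axis argument works for mean, std and var. A sketch on a small 2 x 3 array also shows keepdims=True, which keeps the reduced axis with length 1 so the result broadcasts back against the original array (here, to subtract each row's mean).

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # rows [0 1 2] and [3 4 5]
print(a.mean(axis=0))            # column means: 1.5, 2.5, 3.5
print(a.sum(axis=1))             # [ 3 12]  row sums
print(a.sum(axis=1, keepdims=True).shape)  # (2, 1)

centered = a - a.mean(axis=1, keepdims=True)  # each row minus its own mean
print(centered)                  # rows -1, 0, 1 and -1, 0, 1
```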
The dot product of two 1-D arrays can be written in 4 ways (each line below prints 50 = 1*3 + 2*4 + 3*5 + 4*6):
array_1 = np.array([1, 2, 3, 4])
array_2 = np.array([3, 4, 5, 6])
print(array_1 @ array_2)
print(array_1.dot(array_2))
print(np.dot(array_1, array_2))
print(np.matmul(array_1, array_2))
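For 2-D arrays (matrices), dot() performs matrix multiplication, so the shapes must be compatible: two (1, 4) arrays cannot be passed to dot() directly; transpose one of them first.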
array_1 = np.array([[1, 2, 3, 4]])
array_2 = np.array([[3, 4, 5, 6]])
# print(array_1.shape) # (1, 4)
# print(array_1 * array_2) # [[ 3 8 15 24]]
# print(array_1.dot(array_2.T)) # [[50]]: for 2-D arrays, dot is matrix multiplication
print(array_1.T.dot(array_2))
print(np.matmul(array_1, array_2.T))
print(np.matmul(array_1.T, array_2))
print(np.multiply(array_1, array_2))  # np.multiply is element-wise multiplication
[[ 3  4  5  6]
 [ 6  8 10 12]
 [ 9 12 15 18]
 [12 16 20 24]]
[[50]]
[[ 3  4  5  6]
 [ 6  8 10 12]
 [ 9 12 15 18]
 [12 16 20 24]]
[[ 3  8 15 24]]
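A sketch of the shape rule behind these results: dot on two (1, 4) arrays raises ValueError because the inner dimensions (4 and 1) do not match, (1, 4) @ (4, 1) gives the 1 x 1 inner product, and (4, 1) @ (1, 4) gives the 4 x 4 outer product, which np.outer also computes (it flattens its inputs).

```python
import numpy as np

row_1 = np.array([[1, 2, 3, 4]])   # shape (1, 4)
row_2 = np.array([[3, 4, 5, 6]])   # shape (1, 4)

try:
    row_1.dot(row_2)               # inner dimensions 4 and 1 do not match
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

inner = row_1 @ row_2.T            # (1, 4) @ (4, 1) -> (1, 1)
outer = row_1.T @ row_2            # (4, 1) @ (1, 4) -> (4, 4)
print(inner)                       # [[50]]
print(np.array_equal(outer, np.outer(row_1, row_2)))  # True
```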
op3 = np.array([1, 2, 3])
print(op3)
print(op3.shape)
print(op3.T)  # .T has no effect on a 1-D array
print(op3.reshape(1,-1))  # convert to a 2-D row vector, shape (1, 3)
print(op3.reshape(1,-1).T)  # transpose to a column vector, shape (3, 1)
[1 2 3]
(3,)
[1 2 3]
[[1 2 3]]
[[1]
 [2]
 [3]]
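Indexing with np.newaxis is an equivalent way to add the missing dimension, sketched below next to the reshape calls above.

```python
import numpy as np

op3 = np.array([1, 2, 3])          # shape (3,)
print(op3.T.shape)                 # (3,): unchanged
print(op3[np.newaxis, :].shape)    # (1, 3) row vector, like op3.reshape(1, -1)
print(op3[:, np.newaxis].shape)    # (3, 1) column vector, like op3.reshape(-1, 1)
```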
Tile
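np.tile copies the original matrix horizontally and vertically. "Tile" as in floor tiles: as the name suggests, the function lays the array out repeatedly, like tiles.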
import numpy as np
mat = np.array([[1,2], [3, 4]])
print(np.tile(mat, (1, 3)))  # repeat 3 times along the columns (axis 1); same as np.tile(mat, 3)
print(np.tile(mat, (2, 3)))  # repeat 2 times along the rows (axis 0) and 3 times along the columns (axis 1)
[[1 2 1 2 1 2]
 [3 4 3 4 3 4]]
[[1 2 1 2 1 2]
 [3 4 3 4 3 4]
 [1 2 1 2 1 2]
 [3 4 3 4 3 4]]
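Observe how the tiling result differs when a transpose is involved. op2 initially has shape 1 x 3, so tiling it with reps (1, 5) repeats the second dimension 5 times, producing shape 1 x 15.
Tiling its transpose, which has shape 3 x 1, with (1, 5) also repeats the second dimension 5 times, producing shape 3 x 5.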
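The comparison just described, as a runnable sketch (op2 is defined here the same way as in the next example):

```python
import numpy as np

op2 = np.array([[1, 2, 3]])          # shape (1, 3)
tiled = np.tile(op2, (1, 5))         # second dimension repeated 5 times
tiled_t = np.tile(op2.T, (1, 5))     # op2.T has shape (3, 1)
print(tiled.shape)                   # (1, 15)
print(tiled_t.shape)                 # (3, 5)
print(tiled_t)                       # each row repeats one value of op2 five times
```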
Expand/Squeeze
import numpy as np
op2 = np.array([[1, 2, 3]])
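op_expanded = np.expand_dims(op2, axis=2)  # insert a new axis at position 2: (1, 3) -> (1, 3, 1)
print(op_expanded.shape)
op_expanded2 = np.expand_dims(op2, axis=0)  # insert a new axis at position 0: (1, 3) -> (1, 1, 3)
print(op_expanded2.shape)
op_tiled_1 = np.tile(op_expanded, (15, 1, 5))  # each dimension is multiplied by the matching reps entry: (1, 3, 1) -> (15, 3, 5)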
print(op_tiled_1.shape)
op_tiled_2 = np.tile(op_expanded2, (15, 1, 5))
print(op_tiled_2.shape)
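The block above prints (1, 3, 1), (1, 1, 3), (15, 3, 5) and (15, 1, 15). The "Squeeze" half of this heading is the inverse operation: np.squeeze removes axes of length 1, either all of them or only the one named by axis. A sketch:

```python
import numpy as np

op2 = np.array([[1, 2, 3]])                  # shape (1, 3)
op_expanded = np.expand_dims(op2, axis=2)    # shape (1, 3, 1)
op_expanded2 = np.expand_dims(op2, axis=0)   # shape (1, 1, 3)
print(np.tile(op_expanded, (15, 1, 5)).shape)    # (15, 3, 5)
print(np.tile(op_expanded2, (15, 1, 5)).shape)   # (15, 1, 15)

print(np.squeeze(op_expanded).shape)          # (3,): every length-1 axis removed
print(np.squeeze(op_expanded, axis=0).shape)  # (3, 1): only axis 0 removed
```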