大数据实操课——Python（2.Python基础及函数）-优快云博客

本文链接：https://blog.youkuaiyun.com/QuartusII7/article/details/132263214

2.Python基础及函数

学习Python函数

以下参考本地文件《2.Python基础及函数》
shift+tab键：可以查看函数参数信息
1.内置函数

2.自定义函数

#定义一个函数
def func_name(参数列表):				#func_name：函数名
	函数体
	[return/yield 函数返回值]

3.函数参数

1.无参函数
def show_log():				
	print('i am a log')
show_log()					# i am a log


2.位置函数：传入的参数和定义的参数一一对应
def func_name(arg1,arg2,arg3):
	print(arg1,arg2,arg3)
func_name(val1,val2,val3)


3.关键字参数：直接通过等号传递参数（推荐使用）
def func_name(arg1,arg2,arg3):
	print(arg1,arg2,arg3)
func_name(arg1=val1,arg3=val2,arg2=val3)


4.默认值函数：定义函数时，设置参数的默认值，调用函数时，可以指定通过位置或关键字指定参数的值，如果不指定，参数为默认值。
def func_name(arg1=val1,arg2=val2,arg3=val3):
	print(arg1,arg2,arg3)
#举例：
def func_name(arg1=1,arg2=2,arg3=3):
	print(arg1,arg2,arg3)
func_name(1,2)				# 1 2 3
其实print（）就是一个默认值函数，比如他的分隔符默认是空格，可以设置分隔符号：
def func_name(arg1=1,arg2=2,arg3=3):
	print(arg1,arg2,arg3,sep=',')
func_name(1,2)				# 1,2,3
#默认参数注意：
def func1(x=10,y):		#non-default argument follows default argument
	value = x + y**2
	if value > 5:
		return value
def func1(x,y=10):		#non-default argument follows default argument
	value = x + y**2
	if value > 5:
		return value
func1(1)				# 101


5.不定长参数：
1）参数个数不确定，
2）不定长参数种类：
		1.包裹（packing）位置参数，接受不定长的位置参数，参数被组织成一个元组传入
			1)参数表示为*args
			2)调用参数时，参数自动组成一个元组传入
			3）传入位置参数与组织成的元组索引位置一致
		def func2(*t):	#t is a tuple
			print(t)
		func2()		#no argument   ()
		func2(1,2,3)		#(1, 2, 3)
		func2(1,2,3,4,5)		#(1, 2, 3, 4, 5)
		2.包裹（packing）关键字参数，接受不定长的关键字参数，参数被组织成一个字典传入
			1）参数表示为**kwargs
			2)调用参数时，参数自动组成一个字典传入
			3）传入位置参数与组织成的字典键值对一致
			def func3(**d):		#dictionary
				print(d)
			func3()		#no argument	{}
			func3(a=1,b=2,c=3)
				

6.不同类型的函数参数混用顺序
def func5(x,y=10,*args,**kwargs):  #  位置参数-默认值参数-包裹位置参数-包裹关键字参数
	print(x,y,args,kwargs)
func5(0)
func5(a=1,b=2,c=3,x=4)
func5(1,2,3,4,a=5,b=6)

4.函数是对象

函数可以被引用，即可以赋值给一个变量
函数可以当作参数传递
函数可以做返回值
函数可以嵌套
def factorial(n):
	if n<=2:
		return n 
	return factorial(n-1)*n				#递归吃内存哦
f = factorial(4)		#24
l = [factorial,f]		#
d = {'x':factorial}		#


嵌套函数：
	在函数内部定义新的函数
	内部函数不能被外部直接调用
	函数可以被当作变量赋值，本质是一个对象
def func6():
	def nestedFunc():
		print('Hi')
	return nestedFunc
x = func6()		# x is the nestedFunc
x()				#Hi
func6()()		#Hi

嵌套练习：https://www.cnblogs.com/PortosHan/p/14822258.html



装饰器：
	修改其他函数功能的函数
	函数更简洁
def my_decorator(some_func):
	def wrapper(*args):
		print("i am watching you!")
		some_func(*args)
		print("you are calles.")
	return wrapper
@my_decorator
def add(x,y):
	print(x,'+',y,'=',x+y)
add(5,6)
#
i am watching you!
5 + 6 = 11
you are calles.

如何理解Python装饰器
 python装饰器详解

变量作用域：
1）变量能生效的范围
2）按照作用域划分变量的范围：
	1）全局变量
	定义在模块中的变量，在模块中可见，globals()函数可以返回所有定义在该模块的全局变量，修改全局变量时要想global关键字声明变量
msg = 'created in module'
def outer_1():
	def inner():
		print("print in inner",msg)
	inner()
outer_1()						#print in inner created in module
______________________________________
msg = 'created in module'
def outer():
	def inner():
		global msg
		msg = 'changed in inner'
	inner()
outer()
print(msg)					#changed in inner
见下图：

在这里插入图片描述

	2）局部变量
	定义在函数中变量；，locals()函数，返回所有定义在函数中大的局部变量；自由变量：在函数中使用，但未定义在该函数中的非全局变量
def outer_1():
	msg = 'created in outer'
	def inner():			#闭包
		print(msg)		#msg is a Free variable
	inner()
outer_1()
————————————————————————————————————————————————
nonlocal关键字：修改自由变量时，要先使用nonlocal关键字声明变量
def outer():
	msg = 'created in outer'
	def inner():
		nonlocal msg
		msg = 'changed in inner' 		#msg is a Free varrable
	inner()
	print(msg)
outer()

LEGB规则：
使用legb的顺序来查找一个符号对应的对象
Local > Enclosed > Global > Built-in
local:一个函数或者类方法内部
Enclosed:嵌套函数内
Global:模块层级
Built-in:Python内置符号

type=4
def f1():
    type=3
    def f2():
        type=2
        def f3():
            type=1
            print('type=', type)
        f3()
    f2()
f1()			#type=1

函数的返回值:：
函数无需声明返回值类型
在函数没有返回值时，函数默认返回None
return关键字用于返回返回值

yield关键字:：
当函数使用yield关键字时，函数变为生成器
生成器是Python中的一种可迭代对象
能够使用for循环遍历
生成器每次只被读取一次
生成器有内置方法__next()__，调用后返回下一个元素
yield不会终止程序，返回值之后程序继续运行

求斜边小于n的勾股数组合：
def  list_pythagorean_triples(n) :
    for c in range(n):
        for a in range(1, c):
            for b in range(1, c):
                if a*a+b*b==c*c:
                    yield (a,b,c)
[i for i in list_pythagorean_triples(11)]      #[(3, 4, 5), (4, 3, 5), (6, 8, 10), (8, 6, 10)]  
list(list_pythagorean_triples(11))    #[(3, 4, 5), (4, 3, 5), (6, 8, 10), (8, 6, 10)] 


生成器的使用方法：for循环迭代生成器；next()函数从生成器中取值；构造生成器的结果列表
#使用循环迭代生成器
for i in list_pythagorean_triples(35):
    print(i)
#使用next()方法从生成器中取值
g = list_pythagorean_triples(100)
next(g)
#构造生成器的结果列表
g = list_pythagorean_triples(100)
list(g)

生成器表达式：(x**3  for  x  in  range(10))
列表生成式：[x**3  for  x  in  range(10)]

NumPy基本数据结构

Numpy —— （1）基础数据结构

Pandas-Series

Pandas-DataFrame

Lambda表达式

lambda表达式
lambda表达式是一个匿名的函数
只包含一条语句, 并自动返回这条语句的结果
语法：
lambda param1, param2, … : expression
使用示例：
f=lambda x: x * x
f(3)				#9

# return results of +, -, *, //, % in a list
lambda x,y: [x+y, x-y, x*y, x//y, x%y]
f(5,6)				#[11, -1, 30, 0, 5]

# return max, min, sum in a tuple
lambda *n: (max(n), min(n), sum(n))			#*n表示包裹位置参数，迭代对象
f(1,2,3,5)					#(5, 1, 11)

# sum 1 to n
lambda n: sum(range(1, n+1))
f(3)		#6



filter()函数中使用lambda表达式:：过滤掉不符合条件的元素，返回由符合条件元素组成的新列表
items=[0, 1, 2, 3, 4, 5]
list(filter(lambda x: x%2==0, items))
list(filter(None, items))

map()函数中使用lambda表达式：根据提供的函数对指定序列做映射
i1=[1, 2, 3, 4, 5, 6]
i2=[11, 12, 13, 14]
i3=[111, 112]
list(map(lambda x,y,z: x+y+z, i1, i2, i3))		#[123,126], 123等于1+11+111，126等于2+12+112


max()函数中使用lambda表达式：返回序列中的最大值
max(1,2,3,4)
max([1,2,3,4])
max([1,2,3,4], key=lambda x:-x)


min()函数中使用lambda表达式：返回序列中的最小值
min(1,2,3,4)
min([1,2,3,4])
min([1,2,3,4], key=lambda x:-x)

sorted()函数中使用lambda表达式：对序列进行排序
sorted([1,2,3,4], reverse=True)
sorted([1,2,3,4], key=lambda x:-x)

正则表达式

正则表达式菜鸟教程
Python正则表达式模块：re模块

常用方法：

在这里插入图片描述

import re
if re.match('abc','a1bc123'):	#逐字匹配
	print(TRUE)
else:
	print(false)
# false

import re
if re.match('abc|a1bc','a1bc123'):	#逐字匹配
	print(TRUE)
else:
	print(false)
# TRUE


import re
re.match('[bc]','a4bc123')	#a4bc123的a不在[bc]中，匹配不上
re.match('[abc]','a4c123')	#<_sre.SRE_Match object; span=(0, 1), match='a'>
re.match('[^bc]','a4bc123')	#a不在【bc】中，<_sre.SRE_Match object; span=(0, 1), match='a'>

re.findall(r'\bc','cb cb')	#['c', 'c']
re.findall(r'B\B','BC BC')	#['B', 'B']

正则表达式：
在这里插入图片描述

默认为贪婪匹配：能匹配多少匹配多少
re.match('.{2,6}c', 'cc cc cc') #  match='cc cc c'，除换行符外匹配任意2到6次个字符
？为非贪婪匹配：匹配0个或1个表达式
re.match('.{2,6}?c', 'cc cc cc')    #  match='cc c'，至少匹配两个，下一个是空格，然后C匹配成功
0长度匹配：匹配0个或1个表达式
re.sub('a?', '-', 'bb') #  result: '-b-b-'，匹配0个或1个a,
re.sub('a*', '-', 'bb') #  result: '-b-b-'
re.sub()用法的详细介绍https://blog.csdn.net/qq_52421092/article/details/129246981

使用()分组：
match=re.match(r'(\d+)', '123') 
groups=match.groups() 	#  groups is ('123',)
g1=match.group(1) 		#  g1 is '123'

match=re.match(r'(\d)(\d)(\d)', '123') 
groups=match.groups() 	#  groups is ('1', '2', '3')
g1=match.group(1) 		#  g1 is '1'
g2=match.group(2) 		#  g2 is '2'
g3=match.group(3) 		#  g3 is '3'

\1...\9匹配第n个分组的内容
ma = re.match(r'(\d(\d(\d)))\1\2\3', '123123233')
print(ma.groups())		#('123', '23', '3')

解析
初始分为三组:
组1: (\d(\d(\d))) 123123233
组2: (\d(\d)) 123123233
组3: (\d) 123123233
\1： 123123233
\2： 123123233
\3： 123123233

?:跳过分组
ma = re.match(r'(\d(?:\d(\d)))-\2',  '647-7')
print(ma.groups())

解析
初始分为三组:
组1: (\d(\d(\d))) 647-7
组2: (\d(\d(\d))) 647-7
组3: (\d(\d(\d))) 647-7
?:在组2中，所以组2被跳过了，所以有两个分组
组1: (\d(\d(\d))) 647-7
组2: (\d( (\d))) 647-7

在这里插入图片描述

re.findall(r'\b\w+(?=ing\b)','I'm singing while you'red ancing')	#['sing','dac']
re.findall(r'\d{3}(?!\d)','123sdf dd4567 890')		#['123', '567', '890']
re.findall(r'(?<=\bre)w+\b','reading a book')	#['ading']
re.findall(r'(?<![a-z])d{7}','88001234 b81233 B8001232')	#['8800123',8001232]