[Data Analysis] Pandas Supplementary Notes (Lambda, applymap, merge...)

Article link: https://blog.youkuaiyun.com/NewbieJ_/article/details/136892345

Pandas supplementary notes

1. `lambda` functions

①`f = lambda x:x ** 2`

f = lambda x: x ** 2
print(f(100))	# 10000

②`f = lambda x:fun1(x)`

def fun1(x):
    return str(x) + "hahla"
f = lambda x:fun1(x)
print(f(100))	# 100hahla

③`f = (lambda x,y:x + y)(32,23)`

f = (lambda x,y:x + y)(32,23)
f	# 55

④ Used together with iteration functions

For example, pandas' apply and applymap, and Python's built-in map(function, iterable, ...) and filter, plus reduce (which lives in functools in Python 3)

list(map(lambda x:x + 100,[1,2,3,4]))	# [101, 102, 103, 104]
list(map(lambda x,y:x + y,[1,2,3,4],[100,200,300,400]))	# [101, 202, 303, 404]

map returns an iterator, so wrap it in list() to get a list.
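The list above names filter and reduce without showing them. A minimal sketch of both with lambdas (the list nums is illustrative); in Python 3, reduce has to be imported from functools:

```python
from functools import reduce

nums = [1, 2, 3, 4]

# filter keeps the elements for which the lambda returns True (lazily, like map)
evens = list(filter(lambda x: x % 2 == 0, nums))

# reduce folds the list pairwise from the left: ((1 + 2) + 3) + 4
total = reduce(lambda acc, x: acc + x, nums)

print(evens)  # [2, 4]
print(total)  # 10
```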

2. Differences between `map`, `apply`, and `applymap` (plus `transform()`)

Reference: DataFrame(11): Data transformation, using the map() function (数据分析与统计学之美, 优快云 blog)

2.1 `map` works on a `Series`

(1) Python's built-in map takes two arguments: a function and an iterable (the data).

(2) The function is applied to each element of the iterable in turn, and a new iterable of results is returned.

With Series.map, the Series itself supplies the data: each value of the Series is passed to the function as x.

import numpy as np
import pandas as pd
series = pd.Series(np.random.randint(1,100,5))
print(series)
print(series.map(lambda x: 1 if x%2 == 0 else 0))  # 1 for even values, 0 for odd

0 71
1 23
2 83
3 68
4 18
dtype: int32
0 0
1 0
2 0
3 1
4 1
dtype: int64

df = pd.DataFrame(np.random.randint(1,100,(4,3)),columns = list('abc'),index = range(4))
df['姓'] = ["张","王","李","黄"]
df['名'] = ["哈哈","发发","钉钉","方方"]
df["姓名"] = list(map(lambda x,y:str(x)+""+str(y),df["姓"],df["名"]))
display(df)


	a	b	c	姓	名	姓名
0	15	46	77	张	哈哈	张哈哈
1	85	48	31	王	发发	王发发
2	83	60	65	李	钉钉	李钉钉
3	67	86	52	黄	方方	黄方方

food = ['Apple','Orange','Tomata',"banana","vegetable"]
prices = np.random.randint(5,10,5)
df = pd.DataFrame(data = {'food':food,'price':prices})

kind = {
    'Apple':"fruit",
    "Orange":"fruit",
    "Tomata":"cuisine",
    "banana":"fruit",
    "vegetable":"cuisine"
}
df.food.map(lambda x:kind[x])

     food  price
0      Apple      6
1     Orange      6
2     Tomata      5
3     banana      5
4  vegetable      6

0      fruit
1      fruit
2    cuisine
3      fruit
4    cuisine
Name: food, dtype: object
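Series.map also accepts a dict directly, which avoids the lambda. A sketch, assuming a hypothetical extra item 'grape' with no entry in kind, to show how each form treats a missing key:

```python
import pandas as pd

food = pd.Series(['Apple', 'Orange', 'Tomata', 'banana', 'grape'])
kind = {'Apple': 'fruit', 'Orange': 'fruit', 'Tomata': 'cuisine', 'banana': 'fruit'}

# Passing the dict itself: keys missing from the dict ('grape') become NaN
by_dict = food.map(kind)

# kind[x] in a lambda would raise KeyError for 'grape', so use dict.get with a default
by_lambda = food.map(lambda x: kind.get(x, 'unknown'))

print(by_dict.tolist())    # ['fruit', 'fruit', 'cuisine', 'fruit', nan]
print(by_lambda.tolist())  # ['fruit', 'fruit', 'cuisine', 'fruit', 'unknown']
```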

2.2 `apply` works on a `DataFrame`, operating on rows or columns

df = pd.DataFrame(np.random.randint(1,100,(4,3)),columns = list('abc'),index = range(4))
print(df)
# Default axis=0: the function receives each column (values run down the rows)
print(df.apply(lambda x:x.max() - x.min()))
# axis=1: the function receives each row
print(df.apply(lambda x:x.max() - x.min(),axis = 1))

 a   b   c
0  79  43  45
1  66  25  73
2  95  63  13
3  23  24  38
a    72
b    39
c    60
dtype: int64
0    36
1    48
2    82
3    15
dtype: int64

df["类型"] = ["A","B","A","B"]
print(df.groupby(['类型']).apply(lambda x:x))
print(df.groupby(['类型']).apply(sum))
print(df.groupby(['类型']).apply(sum).sum(axis = 1, numeric_only=True))  # skip the concatenated 类型 strings

 a   b   c 类型
0  79  43  45  A
1  66  25  73  B
2  95  63  13  A
3  23  24  38  B

   a    b    c  类型
类型                   
A   174  106   58  AA
B    89   49  111  BB

类型
A    338
B    249
dtype: int64
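With axis=1, apply hands each row to the function as a Series, so values can be read by column name. A minimal sketch reusing the numbers from the output above (the formula a + b - c is only an illustration), next to the vectorized column arithmetic that is usually faster when it exists:

```python
import pandas as pd

df = pd.DataFrame({'a': [79, 66, 95, 23], 'b': [43, 25, 63, 24], 'c': [45, 73, 13, 38]})

# With axis=1 each row arrives as a Series indexed by column name
row_total = df.apply(lambda row: row['a'] + row['b'] - row['c'], axis=1)

# The same result with vectorized column arithmetic, no Python-level loop
vectorized = df['a'] + df['b'] - df['c']

print(row_total.tolist())  # [77, 18, 145, 9]
```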

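2.3 `applymap` works on a `DataFrame`, operating on every element (pandas 2.1 deprecated `applymap` in favor of the equivalent `DataFrame.map`)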

df = pd.DataFrame(np.random.randint(1,100,(4,3)),columns = list('abc'),index = range(4))
print(df)
print(df.applymap(lambda x:x + 100))

 a   b   c
0  88  50  28
1  85  83  18
2  37  24  35
3  76  42  65
  a    b    c
0  188  150  128
1  185  183  118
2  137  124  135
3  176  142  165
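Since pandas 2.1 the element-wise operation is called DataFrame.map. A version-tolerant sketch (the hasattr fallback is one possible approach, not an official idiom):

```python
import pandas as pd

df = pd.DataFrame({'a': [88, 85], 'b': [50, 83]})

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
result = elementwise(lambda x: x + 100)

print(result)
```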

2.4 `filter`

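Python's built-in filter drops the elements that fail a condition and keeps those that pass (in Python 3 it returns a lazy filter object rather than a list).

It takes two arguments: a function and a sequence. Each element of the sequence is passed to the function, which returns True or False;

the elements for which it returns True are kept.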

lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
 
 
def odd(num):
    if num % 2 == 1:
        return True
    else:
        return False
 
 
ret = filter(odd, lst)
print(ret)                        # <filter object at 0x0000000000860710>
print(list(ret))                  # [1, 3, 5, 7, 9]
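The named odd() helper can be replaced by a lambda, and on a pandas Series the usual equivalent of filter is a boolean mask. A minimal sketch:

```python
import pandas as pd

lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Built-in filter with a lambda instead of a named helper function
odd_list = list(filter(lambda n: n % 2 == 1, lst))

# On a Series, a boolean mask does the same job and keeps the original index
s = pd.Series(lst)
odd_series = s[s % 2 == 1]

print(odd_list)             # [1, 3, 5, 7, 9]
print(odd_series.tolist())  # [1, 3, 5, 7, 9]
```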

2.5 `transform()`

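transform() applies a function to each Series (column) of a DataFrame and returns a result with the same shape as the original.

Its main use is after a groupby: it broadcasts a per-group aggregate back onto every row of that group.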

import pandas as pd
import numpy as np

df = pd.read_csv("data.csv")  # columns: userid, productid, purchase (file not included)
df.head()

	userid	productid	purchase
0	1001	p1	100
1	1001	p2	300
2	1001	p3	200
3	1001	p4	100
4	1002	p2	250

# sum() returns one aggregated value per group
print(df.groupby('userid')['purchase'].sum())
# transform broadcasts each group's value to every member of the group, so the result has the same length as the original Series
print(df.groupby('userid')['purchase'].transform('sum'))
userid
1001     700
1002    1250
1003     400
1004     250
1005     950
Name: purchase, dtype: int64
0      700
1      700
2      700
3      700
4     1250
5     1250
6     1250
7     1250
8     1250
9      400
10     400
11     250
12     950
13     950
14     950
15     950
16     950
Name: purchase, dtype: int64

# Attaching each user's total (购买总量) to every row
# Approach 1: aggregate with sum(), then merge the totals back
sumPurchase = df.groupby('userid')['purchase'].sum().rename("购买总量").reset_index()
merged = df.merge(sumPurchase, on='userid')

# Approach 2: transform() does it in one step
df['购买总量'] = df.groupby('userid')['purchase'].transform('sum')

df.head()

	userid	productid	purchase	购买总量
0	1001	p1	100	700
1	1001	p2	300	700
2	1001	p3	200	700
3	1001	p4	100	700
4	1002	p2	250	1250
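Because data.csv is not included, here is a self-contained sketch that rebuilds only the five rows shown in df.head() above; the column names user_total, share and merged_total are illustrative. It also checks that the merge approach and transform agree:

```python
import pandas as pd

# Stand-in for data.csv: only the five rows shown in df.head() above
df = pd.DataFrame({
    'userid': [1001, 1001, 1001, 1001, 1002],
    'productid': ['p1', 'p2', 'p3', 'p4', 'p2'],
    'purchase': [100, 300, 200, 100, 250],
})

# transform('sum') broadcasts each user's total onto every row of that user
df['user_total'] = df.groupby('userid')['purchase'].transform('sum')

# A typical follow-up: each purchase as a share of its user's total
df['share'] = df['purchase'] / df['user_total']

# The aggregate-then-merge approach attaches the same totals
totals = df.groupby('userid')['purchase'].sum().rename('merged_total').reset_index()
merged = df.merge(totals, on='userid')

print(df)
```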

3.`agg()`

agg(new_column=('column', 'statistic')): note the parentheses, i.e. each value is a tuple; as_index must be True, so the group keys are returned as the index.

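Function	Description
count	Number of non-NaN values in the group
sum	Sum of non-NaN values
mean	Mean of non-NaN values
median	Arithmetic median of non-NaN values
std, var	Standard deviation and variance
min, max	Minimum and maximum of non-NaN values
prod	Product of non-NaN values
first, last	First and last non-NaN values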
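A minimal sketch of the named-aggregation syntax with statistics from the table. It assumes the 进货量 (stock in) and 售价 (price) figures from the example that follows, under the hypothetical English column names kind, stock_in and price; named aggregation needs pandas 0.25 or later:

```python
import pandas as pd

# Figures from the 进货量 / 售价 example, renamed to English columns
df = pd.DataFrame({
    'kind': ['A', 'A', 'A', 'B'],
    'stock_in': [2, 21, 99, 13],
    'price': [94, 39, 76, 34],
})

# Named aggregation: output_name=(source_column, statistic)
summary = df.groupby('kind').agg(
    total_in=('stock_in', 'sum'),
    max_price=('price', 'max'),
    n_rows=('price', 'count'),
)
print(summary)
```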

df = pd.DataFrame(np.random.randint(1,100,(4,3)),columns = ['进货量','出货量','售价'],index = range(4))
df["类型"] = ["A","A","A","B"]
print(df.groupby(['类型'], as_index=False).apply(lambda x:x))
   进货量  出货量  售价 类型
0    2   80  94  A
1   21   81  39  A
2   99   61  76  A
3   13   65  34  B
print(df.loc[:,~df.columns.isin(['类型'])].agg(['sum','max'],axis =1))
   sum  max
0  176   94
1  141   81
2  236   99
3  112   65
# Single column
print(df.groupby('类型')['售价'].agg(['max','sum']))
    max  sum
类型          
A    94  209
B    34   34
# Multiple columns
aggSort = {'进货量':['sum'],'售价':['max']}
# df.groupby('类型')['售价'].agg('max')
data = df.groupby('类型',as_index = False)[['售价','进货量']].agg(aggSort)
print(data)
  类型  进货量  售价
      sum max
0  A  122  94
1  B   13  34
# Rename columns: data.columns is a MultiIndex of (column, statistic) tuples, so flatten each tuple first
data.columns = ['_'.join(x).strip('_').replace('sum','总和').replace('max','最大量') for x in data.columns]
print(data)
  类型  进货量_总和  售价_最大量
0  A     122      94
1  B      13      34

4.`np.random`

Reference: "np.random用法" (np.random usage), Jianshu (jianshu.com)
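Since this section only links out, here is a short sketch of the np.random calls used in this article, plus the newer Generator API (available from NumPy 1.17); the seeds are arbitrary:

```python
import numpy as np

# Legacy global-state API, as used throughout this article
np.random.seed(0)                          # make the legacy calls repeatable
m = np.random.randint(1, 100, (4, 3))      # 4x3 ints in [1, 100): the upper bound is excluded

# Newer Generator API, recommended for new code
rng = np.random.default_rng(42)
ints = rng.integers(1, 100, size=(4, 3))   # same [low, high) convention by default
floats = rng.random(5)                     # uniform floats in [0, 1)
picked = rng.choice(['a', 'b', 'c'], size=2, replace=False)  # draw without replacement

print(m.shape, ints.shape, floats.shape)   # (4, 3) (4, 3) (5,)
```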

5. Common snippets

5.1 Drop rows whose value in a column contains a given substring

df = pd.DataFrame({"A":['a+b+c','pp+dd'],"B":['sd+s','sa+oo+2q']})
df

A	B
0	a+b+c	sd+s
1	pp+dd	sa+oo+2q

df[~ df['A'].str.contains('a')]

A	B
1	pp+dd	sa+oo+2q

5.2 Replace or delete every occurrence of a substring in a column

df['A'].str.replace('+', '', regex=False)  # regex=False: treat '+' literally, not as a regex quantifier

0     abc
1    ppdd
Name: A, dtype: object

5.3 Replace values that contain any string from a list (here `menu`) with that string

df = pd.DataFrame({"A":['aab+c','pp+dd'],"B":['sd+s','sa+oo+2q']})
df

	A	B
0	aab+c	sd+s
1	pp+dd	sa+oo+2q

menu = ['aa','bb','cc','dd']
def change(x):
    # return the first menu item (in list order) found inside x; otherwise leave x unchanged
    for i in menu:
        if i in x:
            return i
    return x
df['A']=df['A'].apply(change)
df

	A	B
0	aa	sd+s
1	dd	sa+oo+2q
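change() loops over menu in Python for every row. A vectorized sketch with str.extract and a single regex alternation; the third row 'xyz' is hypothetical, added to show the no-match case:

```python
import re
import pandas as pd

df = pd.DataFrame({"A": ['aab+c', 'pp+dd', 'xyz'], "B": ['sd+s', 'sa+oo+2q', 'q']})
menu = ['aa', 'bb', 'cc', 'dd']

# One alternation regex; re.escape protects against special characters such as '+'
pattern = '(' + '|'.join(re.escape(m) for m in menu) + ')'

# extract returns the first (leftmost) match per row, or NaN when nothing matches
matched = df['A'].str.extract(pattern, expand=False)

# Keep the original value where nothing matched, as change() does
df['A'] = matched.fillna(df['A'])
print(df['A'].tolist())  # ['aa', 'dd', 'xyz']
```

One semantic difference: the regex returns the leftmost match in the string, whereas change() returns the first menu item in list order, so the two can disagree when a value contains several menu items.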

5.4 str methods on a DataFrame column (5.1-5.3 also use them)

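1. All uppercase: str.upper()
2. All lowercase: str.lower()
3. Swap upper and lower case: str.swapcase()
4. First letter uppercase, the rest lowercase: str.capitalize()
5. First letter of each word uppercase: str.title()
6. Aggressive lowercase for caseless matching (also folds letters such as 'ß'): str.casefold()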

df = pd.DataFrame({"A":['aa b c ','p d d'],"B":[' s d+s','sa+ oo+ 2q']})
df

	A	B
0	aa b c	s d+s
1	p d d	sa+ oo+ 2q

df['A'].str.upper()

0    AA B C 
1      P D D
Name: A, dtype: object
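A quick sketch of the other methods in the list; the German word 'Straße' is an added example showing where casefold() differs from lower():

```python
import pandas as pd

s = pd.Series(['aa b c ', 'p d d'])
cap = s.str.capitalize()    # ['Aa b c ', 'P d d']
tit = s.str.title()         # ['Aa B C ', 'P D D']
swp = s.str.swapcase()      # ['AA B C ', 'P D D']

# casefold() goes further than lower() for some letters, e.g. German 'ß' -> 'ss'
word = pd.Series(['Straße'])
low = word.str.lower()      # ['straße']
fold = word.str.casefold()  # ['strasse']

print(cap.tolist(), tit.tolist(), swp.tolist(), low.tolist(), fold.tolist())
```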

5.5 Selecting rows by date (conditional filtering)

import datetime
df = pd.DataFrame({"日期":[datetime.date(year=2010,month=10,day=22),
                         datetime.date(year=2010,month=10,day=23),
                         datetime.date(year=2010,month=10,day=24),
                        datetime.date(year=2010,month=10,day=25)],"出货量":np.random.randint(50,120,4)})
df

	日期	出货量
0	2010-10-22	61
1	2010-10-23	76
2	2010-10-24	60
3	2010-10-25	76

# Interval (dt1, dt2): an open interval, so both endpoints are excluded
dt1 = datetime.date(year=2010,month=10,day=22)
dt2 = datetime.date(year=2010,month=10,day=25)
df[(df.日期>dt1) & (df.日期<dt2)]

	日期	出货量
1	2010-10-23	76
2	2010-10-24	60
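Converting the column to datetime64 unlocks Series.between, whose inclusive argument controls whether the endpoints are kept. A sketch using the shipment numbers from the output above; inclusive='neither' requires pandas 1.3 or later:

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    "日期": [datetime.date(2010, 10, d) for d in (22, 23, 24, 25)],
    "出货量": [61, 76, 60, 76],
})

# Convert the datetime.date objects to datetime64 so .between works
dates = pd.to_datetime(df["日期"])
start, end = pd.Timestamp(2010, 10, 22), pd.Timestamp(2010, 10, 25)

# inclusive='neither' reproduces the open interval (dt1, dt2); the default 'both' keeps the endpoints
open_rows = df[dates.between(start, end, inclusive='neither')]
closed_rows = df[dates.between(start, end)]

print(open_rows)
print(closed_rows)
```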

5.6 Random sampling

DataFrame.sample(n=None, frac=None, replace=False, ...): n is the number of rows to draw and frac the fraction of rows to draw (pass one or the other).

df = pd.DataFrame(np.random.randint(0,100,(20,50)),index = range(20))
df.sample(frac=0.5)

	0	1	2	3	4	5	6	7	8	9	…	40	41	42	43	44	45	46	47	48	49
3	57	59	61	98	10	79	31	59	35	61	…	46	25	53	0	95	98	22	44	15	40
0	75	33	48	22	70	20	74	61	73	72	…	32	45	8	95	22	94	36	43	40	73
1	19	83	76	39	73	71	80	45	82	26	…	47	40	18	58	77	18	22	34	4	21
2	95	14	98	74	40	42	82	46	61	86	…	55	6	77	28	84	32	50	74	40	31
19	14	31	89	0	47	82	86	58	98	13	…	10	94	14	40	96	49	22	80	66	48
15	55	65	52	81	88	84	8	83	84	92	…	85	86	38	62	67	14	31	54	6	15
11	43	39	26	85	28	72	64	18	13	71	…	6	28	57	5	71	11	70	33	12	71
6	78	14	28	1	75	0	44	58	61	55	…	49	54	62	31	2	15	27	63	14	10
9	46	71	35	60	55	21	4	31	1	83	…	78	31	61	74	6	1	65	79	26	1
5	19	33	29	4	84	22	60	79	62	22	…	15	45	39	52	7	61	15	71	36	64

10 rows × 50 columns
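Passing random_state makes the sample reproducible, and replace=True allows a row to be drawn more than once. A sketch on a smaller illustrative frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40).reshape(20, 2), columns=['x', 'y'])

half = df.sample(frac=0.5, random_state=1)    # 10 of the 20 rows, no repeats
three = df.sample(n=3, random_state=1)        # exactly 3 rows
again = df.sample(frac=0.5, random_state=1)   # same seed, same rows
boot = df.sample(frac=1.0, replace=True, random_state=1)  # bootstrap: rows may repeat

print(len(half), len(three), half.index.equals(again.index))  # 10 3 True
```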

5.7 Duplicate observations

# Detect duplicates
df = pd.DataFrame({'A':[10,12,22,12,10,23],'B':range(6)})
print("Duplicates present: ",any(df.duplicated(subset = 'A')))
Duplicates present:  True

# Remove duplicates
df = pd.DataFrame({'A':[10,12,22,12,10,23],'B':range(6)})
# Drop rows whose value in A repeats; subset picks the columns to compare (default: all columns); keep='first' keeps the first occurrence
df.drop_duplicates(subset='A',keep='first')

	A	B
0	10	0
1	12	1
2	22	2
5	23	5
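keep also accepts 'last' and False. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 12, 22, 12, 10, 23], 'B': range(6)})

last = df.drop_duplicates(subset='A', keep='last')  # keep the final occurrence of each value
none = df.drop_duplicates(subset='A', keep=False)   # drop every value that appears more than once
flags = df.duplicated(subset='A')                   # True marks repeats after the first occurrence

print(last.index.tolist())  # [2, 3, 4, 5]
print(none.index.tolist())  # [2, 5]
print(flags.sum())          # 2
```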

5.8 Handling missing values

df = pd.DataFrame({'A':[np.nan,12,22,12,10,23],'B':[np.nan,12,22,12,10,np.nan]})
df

	A	B
0	NaN	NaN
1	12.0	12.0
2	22.0	22.0
3	12.0	12.0
4	10.0	10.0
5	23.0	NaN

5.8.1 Inspecting missing values

# Total number of missing values
print("Total missing values:",df.isnull().sum().sum())
print("Missing values in B:",df['B'].isnull().sum())
Total missing values: 3
Missing values in B: 2

5.8.2 Dropping missing values

① Dropping rows

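# Drop the rows where column B is missing
print(df[df['B'].notnull()])
'''
axis: 0 drops rows, 1 drops columns
how: 'any' drops a row/column if any value is NA; 'all' drops it only if every value is NA
inplace: whether to modify the original DataFrame (False returns a new one)
'''

# With dropna (thresh is left out: newer pandas raises TypeError when how and thresh are both given)
df.dropna(axis=0, how='any', subset=None, inplace=False)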
      A     B
1  12.0  12.0
2  22.0  22.0
3  12.0  12.0
4  10.0  10.0

	A	B
1	12.0	12.0
2	22.0	22.0
3	12.0	12.0
4	10.0	10.0

② Dropping columns

print(df.drop('B',axis = 1))
      A
0   NaN
1  12.0
2  22.0
3  12.0
4  10.0
5  23.0

5.8.3 Filling missing values

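# 1. Backward fill: take the next valid value (df.ffill() takes the previous one); fillna(method='bfill') is deprecated since pandas 2.1
print(df.bfill())
# 2. Fill with fixed values per column (df.A.mode()[0] is the most frequent value in column A)
print(df.fillna(value = {'A':df.A.mode()[0],'B':1000}))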
      A     B
0  12.0  12.0
1  12.0  12.0
2  22.0  22.0
3  12.0  12.0
4  10.0  10.0
5  23.0   NaN
      A       B
0  12.0  1000.0
1  12.0    12.0
2  22.0    22.0
3  12.0    12.0
4  10.0    10.0
5  23.0  1000.0

5.9 Selecting all columns except a few

df = pd.DataFrame(np.random.randint(1,100,(6,20)))
df.loc[:,~df.columns.isin([1,2])]  # the column labels are ints, so pass ints rather than '1', '2'

	0	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19
0	72	41	39	8	84	80	17	95	19	87	52	23	54	84	36	57	52	4
1	34	68	19	47	16	90	44	44	69	20	98	84	23	65	97	35	56	98
2	54	73	87	56	22	32	57	22	86	80	12	24	21	72	7	98	6	14
3	57	81	36	34	6	66	53	24	71	37	44	77	17	43	77	61	20	49
4	29	88	94	93	55	87	36	3	88	43	76	70	38	10	38	84	18	41
5	66	77	40	71	2	59	91	95	20	92	55	7	61	98	38	17	7	32
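Two equivalent ways to exclude columns, sketched on the same shape of data; note that Index.difference returns the remaining labels sorted:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 100, (6, 20)))

# Boolean mask over the column labels (ints here, not strings)
kept_isin = df.loc[:, ~df.columns.isin([1, 2])]

# Often clearer: drop the unwanted labels directly
kept_drop = df.drop(columns=[1, 2])

# Or select the set difference of labels (result is sorted)
kept_diff = df[df.columns.difference([1, 2])]
```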

5.10 Inspecting a DataFrame

df.info()          # print a concise summary
df.describe()      # descriptive statistics
df.values          # data as <ndarray>
df.to_numpy()      # data as <ndarray> (recommended)
df.shape           # shape (rows, columns)
df.columns         # column labels <Index>
df.columns.values  # column labels <ndarray>
df.index           # row labels <Index>
df.index.values    # row labels <ndarray>
df.head(n)         # first n rows
df.tail(n)         # last n rows
pd.options.display.max_columns = n  # display at most n columns
pd.options.display.max_rows = n     # display at most n rows
df.memory_usage()                   # memory used by each column, in bytes

5.11 Accessing the groups after groupby

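# study_info: the author's course-enrolment DataFrame (not included here)
a = study_info.groupby('course_id')
for i,j in a:
    print(i)  # group key (course_id)
    print(j)  # sub-DataFrame holding that group's rows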

课程0
     user_id course_id    course_join_time learn_process  price
51017   用户18436       课程0 2020-05-22 22:32:05    width: 0%;  199.0
164216  用户36580       课程0 2020-04-27 14:36:47    width: 0%;  199.0
课程1
     user_id course_id    course_join_time learn_process  price
589        用户24       课程1 2020-04-20 17:00:18    width: 0%;  199.0
51020   用户18436       课程1 2020-05-22 22:32:28    width: 0%;  199.0
66435   用户23501       课程1 2020-04-24 11:46:42    width: 0%;  199.0
164219  用户36580       课程1 2020-04-27 14:36:47    width: 0%;  199.0
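Besides iterating, a single group can be fetched with get_group, and per-group summaries come from size() or an aggregation. A sketch on a hypothetical miniature of study_info (the original data is not included):

```python
import pandas as pd

# Hypothetical miniature of study_info; the real dataset is not included
study_info = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u1', 'u3'],
    'course_id': ['c0', 'c1', 'c1', 'c0'],
    'price': [199.0, 199.0, 199.0, 199.0],
})
grouped = study_info.groupby('course_id')

c1 = grouped.get_group('c1')            # rows of one group, original index kept
sizes = grouped.size()                  # number of rows per group
members = grouped['user_id'].agg(list)  # users enrolled in each course, as lists

print(c1)
print(sizes)
print(members.to_dict())  # {'c0': ['u1', 'u3'], 'c1': ['u2', 'u1']}
```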

5.12 Re-indexing a DataFrame after dropping rows

df1.reset_index(drop=True, inplace=True)
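A minimal sketch showing why this is needed and what drop=True changes, on an illustrative column v:

```python
import pandas as pd

df1 = pd.DataFrame({'v': [10, 20, 30, 40]})
df1 = df1[df1['v'] != 20]         # index is now 0, 2, 3: a gap where the row was dropped

kept = df1.reset_index()          # default drop=False moves the old index into an 'index' column
df1 = df1.reset_index(drop=True)  # drop=True discards it and renumbers 0..n-1

print(df1.index.tolist())  # [0, 1, 2]
print(list(kept.columns))  # ['index', 'v']
```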

5.13 Multiple y-axes in Matplotlib (twin y-axis)

https://www.cnblogs.com/dajunma21/p/9001145.html

https://matplotlib.org/2.0.2/examples/api/two_scales.html

#-*- coding:utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

def main():
    plt.rcdefaults()
    plt.rcParams['font.sans-serif'] = ['SimHei'] # default font that can render the Chinese labels
    plt.rcParams['axes.unicode_minus'] = False   # keep the minus sign '-' from rendering as a box in saved figures

    info_list = [(u"小给", 88, 23), (u"小人", 78, 10), (u"小民", 90, 5), (u"小一", 66, 9), (u"小个", 80, 22), (u"小胶", 48, 5), (u"小带", 77, 19)]
    positions = np.arange(len(info_list))
    names = [row[0] for row in info_list]
    scores = [row[1] for row in info_list]
    proges = [row[2] for row in info_list]

    fig, ax1 = plt.subplots()

    # Bar chart of scores
    ax1.bar(positions, scores, width=0.6, align='center', color='r', label=u"成绩")
    ax1.set_xticks(positions)
    ax1.set_xticklabels(names)
    ax1.set_xlabel(u"名字")
    ax1.set_ylabel(u"成绩")
    max_score = max(scores)
    ax1.set_ylim(0, int(max_score * 1.2))
    # Score labels above the bars
    for x,y in zip(positions, scores):
        ax1.text(x, y + max_score * 0.02, y, ha='center', va='center', fontsize=13)

    # Line chart of improvement on a second y-axis that shares the x-axis
    ax2 = ax1.twinx()
    ax2.plot(positions, proges, 'o-', label=u"进步幅度")
    max_proges = max(proges)
    # Improvement labels
    for x,y in zip(positions, proges):
        ax2.text(x, y + max_proges * 0.02, ('%.1f%%' %y), ha='center', va= 'bottom', fontsize=13)
    # Format the secondary y-axis ticks as percentages
    fmt = '%.1f%%'
    yticks = mtick.FormatStrFormatter(fmt)
    ax2.yaxis.set_major_formatter(yticks)
    ax2.set_ylim(0, int(max_proges * 1.2))
    ax2.set_ylabel(u"进步幅度")

    # Legend: combine the handles from both axes into one box
    handles1, labels1 = ax1.get_legend_handles_labels()
    handles2, labels2 = ax2.get_legend_handles_labels()
    plt.legend(handles1+handles2, labels1+labels2, loc='upper right')

    plt.show()


if __name__ == '__main__':
    main()

5.14 Appending a row

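import pandas as pd
res = pd.DataFrame(columns=('a', 'b', 'c'))
# DataFrame.append was removed in pandas 2.0; concatenate a one-row DataFrame instead
res = pd.concat([res, pd.DataFrame([{'a':10.0}])], ignore_index=True)
print(res.head())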

   a   b   c
0  10.0 NaN NaN

5.15 Selecting given rows and columns of a NumPy array

import numpy as np
 
a = np.array([[1,2,3], [4,5,6],[7,8,9]])
print(a)

arr1 = np.array([0,2])
arr2 = np.array([0,2])
c = a[arr1[:, None],arr2]
print(c)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 3]
 [7 9]]
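np.ix_ builds the same row/column mesh as arr1[:, None], arr2. A sketch contrasting it with plain paired fancy indexing, which selects individual elements instead of a block:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = [0, 2]
cols = [0, 2]

# np.ix_ turns the two lists into an open mesh, selecting the 2x2 block
block = a[np.ix_(rows, cols)]

# Without the mesh, the lists are paired: elements (0, 0) and (2, 2) only
paired = a[rows, cols]

print(block.tolist())   # [[1, 3], [7, 9]]
print(paired.tolist())  # [1, 9]
```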

6. Using `merge`, `sub`/`add`, and `concat`

6.1`merge`

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True)
Detailed usage:

import pandas as pd
left = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'Name': ['Smith', 'Maiki', 'Hunter', 'Hilen']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'Sex': [1, 0, 1, 1]})
# Use the on parameter to specify the join key
print(pd.merge(left, right, on='id'))

   id    Name  Sex
0   1   Smith    1
1   2   Maiki    0
2   3  Hunter    1
3   4   Hilen    1
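When the key columns have different names, or some keys do not match, left_on/right_on, how and indicator come into play. A sketch with a hypothetical right table whose key column is named user_id:

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3, 4], 'Name': ['Smith', 'Maiki', 'Hunter', 'Hilen']})
# Hypothetical right table: key column named user_id, id 4 missing, id 5 unmatched
right = pd.DataFrame({'user_id': [1, 2, 3, 5], 'Sex': [1, 0, 1, 0]})

# Different key names on each side: left_on / right_on (how='inner' keeps matches only)
inner = pd.merge(left, right, left_on='id', right_on='user_id')

# how='left' keeps every left row; unmatched rows get NaN in the right-hand columns
left_join = pd.merge(left, right, left_on='id', right_on='user_id', how='left')

# how='outer' keeps everything; indicator=True adds a _merge column saying where each row came from
outer = pd.merge(left, right, left_on='id', right_on='user_id', how='outer', indicator=True)

print(inner['id'].tolist())  # [1, 2, 3]
print(outer[['id', 'user_id', '_merge']])
```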

6.2 `sub` (rarely used)

df1.sub(other, axis='columns', level=None, fill_value=None)
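sub (like its siblings add, mul and div) behaves like the - operator but adds fill_value and an axis argument. A minimal sketch with illustrative data:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['x', 'y'])

plain = a - b                    # 'z' has no partner in b, so it becomes NaN
filled = a.sub(b, fill_value=0)  # treat the missing 'z' in b as 0
added = a.add(b, fill_value=0)

# On a DataFrame, axis chooses what the Series is aligned with
df = pd.DataFrame({'p': [1, 2], 'q': [10, 20]})
centered = df.sub(df.mean(), axis='columns')  # subtract each column's mean
row_diff = df.sub(df['p'], axis='index')      # subtract column p from every column, row by row

print(filled.tolist())             # [-9.0, -18.0, 3.0]
print(centered.values.tolist())    # [[-0.5, -5.0], [0.5, 5.0]]
```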

6.3 `concat`

pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)  (the old join_axes parameter was removed in pandas 1.0)

import pandas as pd
a= pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])
b= pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D1', 'D2', 'D5', 'D6']},
                     index=[2,3,4,5])
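# Concatenate a and b: first keep the original indexes under the keys 'x' and 'y', then renumber with ignore_index=True (which also discards the keys)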
print(pd.concat([a,b],keys=['x','y'],ignore_index=False))
print(pd.concat([a,b],keys=['x','y'],ignore_index=True))

      A   B   C   D
x 0  A0  B0  C0  D0
  1  A1  B1  C1  D1
  2  A2  B2  C2  D2
  3  A3  B3  C3  D3
y 2  A4  B4  C4  D1
  3  A5  B5  C5  D2
  4  A6  B6  C6  D5
  5  A7  B7  C7  D6
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
4  A4  B4  C4  D1
5  A5  B5  C5  D2
6  A6  B6  C6  D5
7  A7  B7  C7  D6
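join controls which columns survive when the frames differ, and axis=1 places frames side by side. A sketch with two small illustrative frames:

```python
import pandas as pd

a = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
c = pd.DataFrame({'B': ['B2', 'B3'], 'C': ['C2', 'C3']}, index=[1, 2])

outer_rows = pd.concat([a, c])                # join='outer': union of columns, gaps become NaN
inner_rows = pd.concat([a, c], join='inner')  # only the shared column 'B' survives
side_by_side = pd.concat([a, c], axis=1)      # align on the index instead of stacking rows

print(outer_rows)
print(inner_rows)
print(side_by_side)
```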
