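Data source:
Scraped data; the network drive contains the scraped data and the stop-word list.
Baidu Netdisk (extraction code: 6666)
Processing steps
Import the data: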
import pandas as pd

# Load the scraped lipstick listings.
data1 = pd.read_excel("kouhong_good.xlsx")
data1.head()
# The comment URL is not needed for the analysis.
data1.drop(['comment_url'], axis=1, inplace=True)
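Classify the stores by type:
def store(e):
    # Map the raw store name to a store type based on the keyword it contains.
    if '天猫' in e:        # Tmall store
        return '天猫店铺'
    elif '旗舰店' in e:    # flagship store
        return '旗舰店'
    elif '专营店' in e:    # franchise store
        return '专营店'
    elif '企业店' in e:    # enterprise store
        return '企业店铺'
    else:                  # everything else is treated as a self-operated store
        return '自营店铺'

data1['store_type'] = data1['store'].apply(store)
data1.drop(['store'], axis=1, inplace=True)
data1.head()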
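Process sales and price:
import re

def delete(e):
    # Strip the "人收货" (people received) suffix; text without it is returned unchanged.
    return e.replace('人收货', '')

def price(e):
    # Despite its name, this parses the sales count, e.g. "1.5万+" becomes 15000.0.
    if '万+' in e:
        # 万 is a unit of 10,000; the plus sign is escaped so it is matched literally.
        num1 = re.findall(r'(.*?)万\+', e)
        return float(num1[0]) * 10000
    elif '+' in e:
        return float(e.replace('+', ''))
    else:
        return float(e)

data1['store_sales'] = data1['sales'].apply(delete).apply(price)
data1.drop(['sales'], axis=1, inplace=True)
data1.head()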
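The sales cleaning step can be tried out on a few hand-written strings. The following is a minimal sketch, assuming the raw sales column holds values shaped like `1.5万+人收货`, `500+人收货` or `37人收货`; the sample strings and the helper name `parse_sales` are hypothetical and are not taken from the dataset.

```python
import re

def parse_sales(text):
    """Convert a raw sales string to a number (hypothetical helper)."""
    # Strip the "人收货" (people received) suffix if it is present.
    text = text.replace('人收货', '')
    # "1.5万+" means "more than 15,000": 万 is a unit of 10,000.
    match = re.match(r'([\d.]+)万', text)
    if match:
        return float(match.group(1)) * 10000
    # "500+" means "more than 500": drop the plus sign.
    return float(text.replace('+', ''))

samples = ['1.5万+人收货', '500+人收货', '37人收货']  # made-up examples
parsed = [parse_sales(s) for s in samples]
print(parsed)
```

Folding both steps into one function avoids passing a missing value from one `apply` call to the next when a string lacks the suffix.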
Classify brands:
def classify(e):
    if 'Mac' in e:
        return 'MAC'
    elif '魅可' in e:      # 魅可 is MAC's Chinese brand name
        return 'MAC'
    elif 'Dior' in e:
        return 'Dior'
    elif 'Givenchy' in e or '纪梵希' in e:  # 纪梵希 is Givenchy's Chinese brand name
        # Each keyword is tested separately; `'Givenchy' or '纪梵希' in e` is always truthy.
        return 'Givenchy'
    else:
        return 'Others'

data1['brand'] = data1['title'].apply(classify)
data1.head(20)
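One detail is worth spelling out for keyword checks like these: an expression of the form `'A' or 'B' in e` does not test both keywords. The sketch below, which uses a made-up product title, shows how Python evaluates it and two ways to write the intended test.

```python
title = 'Dior lipstick 999'  # made-up product title

# This parses as `'Givenchy' or ('纪梵希' in title)`. A non-empty string is
# truthy, so the expression evaluates to 'Givenchy' for any title.
naive = 'Givenchy' or '纪梵希' in title

# Testing each keyword explicitly gives the intended result.
explicit = 'Givenchy' in title or '纪梵希' in title

# any() reads better when a brand has several spellings.
keywords = ('Givenchy', '纪梵希')
with_any = any(k in title for k in keywords)

print(bool(naive), explicit, with_any)
```

With the naive form, every title that is not MAC or Dior would be labeled Givenchy and the `Others` branch would never be reached.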
Process store locations:
def location(e):
    # Keep only the first space-separated part of the location string.
    return e.split(' ')[0]

data1['store_location'] = data1['location'].apply(location)
data1.drop(['location'], axis=1, inplace=True)
data1.head(5)
Process prices and drop unreasonable ones:
# Collect the index labels of rows priced below 51 (avoid naming the variable `list`, which shadows the built-in).
low_price_index = data1[data1['price'] < 51].index.tolist()
print(low_price_index)

# Drop those rows by index label instead of pasting the printed list back in.
data1.drop(low_price_index, inplace=True)
data1
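The same filtering can also be written as a boolean mask, which skips the intermediate index list. This is a minimal sketch on toy data; the DataFrame `df` and its prices are invented for illustration, and only the cut-off of 51 comes from the text.

```python
import pandas as pd

# Toy data; the titles and prices are invented for illustration.
df = pd.DataFrame({'title': ['a', 'b', 'c', 'd'],
                   'price': [19.9, 120.0, 50.0, 310.0]})

# Keep every row whose price is not below 51, mirroring the `price < 51` cut-off.
filtered = df[~(df['price'] < 51)].reset_index(drop=True)
print(filtered)
```

Negating the original condition keeps the behavior identical for rows with a missing price, which a `>= 51` comparison would silently drop.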
data1['store_sales'] = data1['store_sales'].astype(int)
data1['sales_money'] = data1['price']*data1['store_sales']
data1
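Brand share:
# a (brand labels), b (per-brand counts) and m (total count) are assumed to be defined earlier; their definitions are not shown here.
b = [b[0]/m, b[1]/m, b[2]/m]
print(b)
# This import and the constructor arguments follow the pyecharts 0.5.x API.
from pyecharts import Pie
pie = Pie("口红品牌比例", width=600, height=400)  # chart title: "Lipstick brand share"
pie.add("", a, b, is_label_show=True)
pie.render('1.html')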
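Since the definitions of the labels and counts are not shown, here is one way the brand shares could be derived from a brand column. It is a sketch under stated assumptions, not the original computation: the Series `brands` holds invented values, and `labels` and `values` are hypothetical names for the two lists a pie chart would take.

```python
import pandas as pd

# Toy brand column; the labels match the brand classifier, the counts are invented.
brands = pd.Series(['MAC', 'Dior', 'MAC', 'Givenchy',
                    'Others', 'MAC', 'Dior', 'Others'])

# normalize=True returns each brand's fraction of all rows instead of raw counts.
share = brands.value_counts(normalize=True)

labels = share.index.tolist()       # brand names, usable as pie labels
values = share.round(3).tolist()    # matching fractions, usable as pie values
print(dict(zip(labels, values)))
```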

data1['price'].groupby(data1['brand']).sum()
brand_mean = round(data1['price'].groupby(data1['brand']).mean(),1)
brand_mean
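The per-brand sum and mean can also be computed in a single pass with `agg`. A minimal sketch on toy data follows; the DataFrame `df` and its prices are invented for illustration.

```python
import pandas as pd

# Toy data; the prices are invented for illustration.
df = pd.DataFrame({'brand': ['MAC', 'MAC', 'Dior', 'Dior'],
                   'price': [150.0, 170.0, 300.0, 320.0]})

# One groupby produces both statistics, rounded to one decimal place.
summary = df.groupby('brand')['price'].agg(['sum', 'mean']).round(1)
print(summary)
```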
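Lipstick e-commerce data analysis

This blog post collects lipstick e-commerce data with a web scraper and uses Python to clean, process, and analyze it, covering store classification, brand sales, and sentiment analysis, to show the full data analysis workflow.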





