Scraping and Analyzing Taobao Lipstick Product Data

Lipstick E-commerce Data Analysis

Latest recommended article published on 2023-12-18 11:02:59

License: CC 4.0 BY-SA

Tags:

#python #data-analysis #big-data #web-scraping

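This post uses a web crawler to collect lipstick e-commerce data, then uses Python to clean, process, and analyze it. The analysis covers store classification, brand sales, and sentiment analysis, and walks through the full data analysis workflow.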

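Data source:

The data was scraped by the author. The shared network drive folder contains the scraped data and the stop-word list.

Baidu Netdisk, extraction code: 6666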

Processing

Import the data:

import pandas as pd

# Load the scraped product listings
data1 = pd.read_excel("kouhong_good.xlsx")
data1.head()

# The comment URL column is not needed for the analysis
data1.drop(['comment_url'], axis=1, inplace=True)

Classify stores by type:

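def store(e):
    # Checks run in order, so a name containing both '天猫' and '旗舰店' is labeled '天猫店铺'
    if '天猫' in e:
        return '天猫店铺'    # Tmall store
    elif '旗舰店' in e:
        return '旗舰店'      # flagship store
    elif '专营店' in e:
        return '专营店'      # specialty store
    elif '企业店' in e:
        return '企业店铺'    # enterprise store
    else:
        return '自营店铺'    # self-run store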

data1['store_type'] = data1['store'].apply(store)
data1.drop(['store'],axis = 1,inplace = True)
data1.head()
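
Because the classifier returns on the first match, the order of the checks decides the label whenever a store name contains more than one keyword. The following is a minimal sketch of the same first-match logic written as an ordered rule table. `STORE_RULES` and `classify_store` are hypothetical names, and the store names are invented for illustration.

```python
# Ordered (keyword, label) rules; earlier rules take precedence.
STORE_RULES = [
    ("天猫", "天猫店铺"),    # Tmall store
    ("旗舰店", "旗舰店"),    # flagship store
    ("专营店", "专营店"),    # specialty store
    ("企业店", "企业店铺"),  # enterprise store
]


def classify_store(name, default="自营店铺"):
    """Return the label of the first rule whose keyword occurs in the name."""
    for keyword, label in STORE_RULES:
        if keyword in name:
            return label
    return default  # no keyword matched: treated as a self-run store


# Invented store names, for illustration only.
for name in ["某某天猫旗舰店", "某某美妆专营店", "小王的口红铺"]:
    print(name, "->", classify_store(name))
```

With this layout, changing the precedence only requires reordering the list.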

Process sales and prices:

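import re

def delete(e):
    # Strip the '人收货' ("people received") suffix; other values pass through unchanged
    return str(e).replace('人收货', '')

def price(e):
    # Despite its name, this parses a sales count such as '1.5万+' or '300+'
    if '万' in e:
        # '万' is a unit of 10,000, so '1.5万+' becomes 15000.0
        num1 = re.findall(r'([\d.]+)万', e)
        return float(num1[0]) * 10000
    elif '+' in e:
        return float(e.replace('+', ''))
    else:
        return float(e)

data1['store_sales'] = data1['sales'].apply(delete).apply(price)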
data1.drop(['sales'],axis = 1,inplace = True)
data1.head()
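
The sales strings handled in this step take forms such as `1.5万+人收货` or `300+人收货`, where 万 is a unit of ten thousand. As a minimal sketch of the same conversion in a single function, assuming those are the only formats present (`parse_sales` is a hypothetical helper, and the sample values are invented):

```python
import re


def parse_sales(text):
    """Convert a Taobao-style sales string to an integer count (a lower bound)."""
    # Drop the "人收货" ("people received") suffix and the trailing "+".
    cleaned = str(text).replace("人收货", "").replace("+", "").strip()
    match = re.fullmatch(r"([\d.]+)万", cleaned)
    if match:
        # "万" is a unit of 10,000, so "1.5万" means 15,000.
        return int(round(float(match.group(1)) * 10000))
    return int(float(cleaned))


# Invented sample values, for illustration only.
samples = ["1.5万+人收货", "300+人收货", "27人收货"]
parsed = [parse_sales(s) for s in samples]
print(parsed)
```

A function like this could be passed to `Series.apply` in place of the two chained helpers; since a trailing `+` means "at least", the result is best read as a lower bound.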

Classify by brand:

def classify(e):
    if 'Mac' in e:
        return 'MAC'
    elif '魅可' in e:    # MAC's Chinese brand name
        return 'MAC'
    elif 'Dior' in e:
        return 'Dior'
    # Each keyword needs its own `in` test; `'Givenchy' or '纪梵希' in e` is always true
    elif 'Givenchy' in e or '纪梵希' in e:
        return 'Givenchy'
    else:
        return 'Others'

data1['brand'] = data1['title'].apply(classify)
data1.head(20)
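
One pitfall is worth spelling out for this kind of keyword matching. In Python, `'Givenchy' or '纪梵希' in e` is parsed as `'Givenchy' or ('纪梵希' in e)`, and a non-empty string is always truthy, so a branch written that way matches every title. The following is a minimal sketch of a keyword-table classifier that avoids the problem, assuming case-insensitive substring matching is acceptable. `BRAND_KEYWORDS` and `classify_brand` are hypothetical names, and the titles are invented.

```python
# Brand -> lower-case keywords that identify it in a listing title.
BRAND_KEYWORDS = {
    "MAC": ("mac", "魅可"),
    "Dior": ("dior",),
    "Givenchy": ("givenchy", "纪梵希"),
}


def classify_brand(title):
    """Return the first brand whose keyword occurs in the title, else 'Others'."""
    lowered = title.lower()
    for brand, keywords in BRAND_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return brand
    return "Others"


# The pitfall: this expression is truthy for any title at all.
print(bool("Givenchy" or "纪梵希" in "YSL 方管口红"))

# Invented titles, for illustration only.
for title in ["MAC 魅可子弹头口红", "纪梵希小羊皮口红", "YSL 方管口红"]:
    print(title, "->", classify_brand(title))
```

Plain substring matching can still produce false positives (for example, `mac` inside an unrelated word), so the keyword list may need tuning against the real titles.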

Process store locations:

def location(e):
    # Keep only the part before the first space
    return e.split(' ')[0]
data1['store_location'] = data1['location'].apply(location)
data1.drop(['location'],axis = 1,inplace = True)
data1.head(5)

Process prices and drop listings with unreasonable prices:

# Collect the index labels of listings priced below 51
# (named low_price_idx so the built-in `list` is not shadowed)
low_price_idx = data1[data1['price'] < 51].index.tolist()
print(low_price_idx)

# Drop those rows using the computed labels rather than a pasted copy of the printed list
data1.drop(low_price_idx, inplace=True)
data1
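
The same filter can also be written without collecting index labels first, by keeping only the rows that pass a boolean mask. The following is a minimal sketch on an invented three-row frame. The threshold of 51 comes from the step above, while the data and the name `MIN_PRICE` are illustrative.

```python
import pandas as pd

MIN_PRICE = 51  # listings priced below this are treated as unreasonable

# Invented rows, for illustration only.
df = pd.DataFrame({
    "title": ["lipstick A", "sample card", "lipstick B"],
    "price": [320.0, 9.9, 51.0],
})

# Keep rows at or above the threshold; reset the index so it stays contiguous.
filtered = df[df["price"] >= MIN_PRICE].reset_index(drop=True)
print(filtered)
```

Resetting the index is optional; it only matters if later steps rely on positional row labels.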

data1['store_sales'] = data1['store_sales'].astype(int)

data1['sales_money'] = data1['price']*data1['store_sales']
data1

Brand share:

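# a, b and m are not defined in the original post; assumed here to be the brand names,
# the per-brand share of listings, and the total number of listings
counts = data1['brand'].value_counts()
a, m = counts.index.tolist(), counts.sum()
b = (counts / m).tolist()
print(b)

# This import and constructor follow the pyecharts 0.5.x API; 1.0+ uses `from pyecharts.charts import Pie`
from pyecharts import Pie
pie = Pie("口红品牌比例", width=600, height=400)  # chart title: "Lipstick brand share"
pie.add("", a, b, is_label_show=True)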
pie.render('1.html')
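
The values passed to the pie chart appear to be per-brand counts divided by the total. The following is a minimal sketch of that arithmetic using only the standard library, assuming the share is measured by number of listings. The brand labels are invented sample data.

```python
from collections import Counter

# Invented brand labels, one per listing, for illustration only.
brands = ["MAC", "Dior", "MAC", "Givenchy", "MAC", "Dior", "Givenchy", "MAC"]

counts = Counter(brands)
total = sum(counts.values())

names = list(counts)                                   # pie-chart labels
shares = [round(counts[n] / total, 3) for n in names]  # pie-chart values

for name, share in zip(names, shares):
    print(f"{name}: {share:.1%}")
```

The two lists line up by position, which is the shape a label list and a value list need when passed to a chart together.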

# Sum of listing prices per brand
data1['price'].groupby(data1['brand']).sum()

# Mean listing price per brand, rounded to one decimal place
brand_mean = round(data1['price'].groupby(data1['brand']).mean(), 1)
brand_mean
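
A per-brand sum of listing prices depends heavily on how many listings each brand has, so it can help to view the listing count, mean price, and total revenue side by side. The following is a minimal sketch using pandas named aggregation on invented rows. The column names follow the post; the values are made up.

```python
import pandas as pd

# Invented rows, for illustration only.
df = pd.DataFrame({
    "brand": ["MAC", "MAC", "Dior", "Givenchy"],
    "price": [170.0, 180.0, 320.0, 350.0],
    "store_sales": [1000, 500, 200, 100],
})
df["sales_money"] = df["price"] * df["store_sales"]

# One row per brand: number of listings, mean price, and total revenue.
summary = df.groupby("brand").agg(
    listings=("price", "size"),
    mean_price=("price", "mean"),
    revenue=("sales_money", "sum"),
).round(1)
print(summary)
```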
