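Analyzing the data
Now let's explore product sales by asking a series of questions and finding the answers in the data.
Question 1: How many shops are in the dataset?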
dataset_sales %>% select(shop_id) %>% distinct() %>% count()
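As a cross-check, the same count can be obtained with `n_distinct()`. The following is a minimal sketch, assuming a small made-up data frame `toy_sales` (a hypothetical stand-in, not the real `dataset_sales`) that has the same `shop_id` column:

```r
library(dplyr)

# Hypothetical toy data standing in for dataset_sales (an assumption, not the real data)
toy_sales <- data.frame(
  shop_id      = c(1, 1, 2, 3, 3, 3),
  item_id      = c(10, 11, 10, 12, 12, 13),
  item_cnt_day = c(2, 1, 5, 1, 3, 4)
)

# Same pipeline as in the text: keep unique shop ids, then count the rows
n_shops_pipeline <- toy_sales %>%
  select(shop_id) %>%
  distinct() %>%
  count() %>%
  pull(n)

# Shortcut: n_distinct() counts the unique values of a vector directly
n_shops_direct <- n_distinct(toy_sales$shop_id)

print(c(pipeline = n_shops_pipeline, direct = n_shops_direct))
```

Both approaches should agree; `n_distinct()` is simply shorter when only the number is needed.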
Question 2: Which shop is the most popular, and what is the overall sales volume?
# Total units sold per shop, sorted from highest to lowest
most.popular.shop <-
  dataset_sales %>%
  group_by(shop_id) %>%
  summarise(total.sales.by.shop = sum(item_cnt_day)) %>%
  arrange(desc(total.sales.by.shop)) %>%
  ungroup()
# Horizontal bar chart; reorder() sorts the bars by total sales
ggplot(data = most.popular.shop,
       aes(x = reorder(as.factor(shop_id), total.sales.by.shop),
           y = total.sales.by.shop, fill = as.factor(shop_id))) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Most popular shop with most sales", x = "Shop(s)", y = "Total sales",
       fill = "Shop Id")
# Remove the temporary summary table from the workspace
rm(most.popular.shop)
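The chart shows the ranking visually. To read off the top shop and its total as numbers, something like the following sketch could be used. It assumes a made-up data frame `toy_sales` (hypothetical, not the real `dataset_sales`) and relies on `slice_max()`, which is available in dplyr 1.0.0 and later:

```r
library(dplyr)

# Hypothetical toy data standing in for dataset_sales (an assumption, not the real data)
toy_sales <- data.frame(
  shop_id      = c(1, 1, 2, 3, 3, 3),
  item_cnt_day = c(2, 1, 5, 1, 3, 4)
)

# Total units sold per shop, as in the pipeline above
shop_totals <- toy_sales %>%
  group_by(shop_id) %>%
  summarise(total.sales.by.shop = sum(item_cnt_day))

# Keep only the row with the largest total (ties would return several rows)
top_shop <- shop_totals %>%
  slice_max(total.sales.by.shop, n = 1)

print(top_shop)
```

On the real data, the same two steps applied to `dataset_sales` would give the shop id and total that correspond to the longest bar in the chart.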
Question 3: How many distinct items are there across all shops?
dataset_sales %>% select(item_id) %>% distinct() %>%