文章目录
数据生成模块
1. 目标数据
我们要收集和分析的数据主要包括
页面数据
、事件数据
、曝光数据
、启动数据
和错误数据。
1.1 页面数据
页面数据主要记录一个页面的用户访问情况,包括访问时间、停留时间、页面路径等信息。
字段名称 | 字段描述 |
---|---|
page_id | 页面idhome("首页"), category("分类页"), discovery("发现页"), top_n("热门排行"), favor("收藏页"), search("搜索页"), good_list("商品列表页"), good_detail("商品详情"), good_spec("商品规格"), comment("评价"), comment_done("评价完成"), comment_list("评价列表"), cart("购物车"), trade("下单结算"), payment("支付页面"), payment_done("支付完成"), orders_all("全部订单"), orders_unpaid("订单待支付"), orders_undelivered("订单待发货"), orders_unreceipted("订单待收货"), orders_wait_comment("订单待评价"), mine("我的"), activity("活动"), login("登录"), register("注册"); |
last_page_id | 上页id |
page_item_type | 页面对象类型sku_id("商品skuId"), keyword("搜索关键词"), sku_ids("多个商品skuId"), activity_id("活动id"), coupon_id("购物券id"); |
page_item | 页面对象id |
sourceType | 页面来源类型promotion("商品推广"), recommend("算法推荐商品"), query("查询结果商品"), activity("促销活动"); |
during_time | 停留时间(毫秒) |
ts | 跳入时间 |
1.2 事件数据
字段名称 | 字段描述 |
---|---|
action_id | 动作id favor_add("添加收藏") ,favor_canel("取消收藏") ,cart_add("添加购物车"), cart_remove("删除购物车"), cart_add_num("增加购物车商品数量"), cart_minus_num("减少购物车商品数量"), trade_add_address("增加收货地址"), get_coupon("领取优惠券"); 注:对于下单、支付等业务数据,可从业务数据库获取。 |
item_type | 动作目标类型sku_id("商品"), coupon_id("购物券"); |
item | 动作目标id |
ts | 动作时间 |
1.3 曝光数据
曝光数据主要记录页面所曝光的内容,包括曝光对象,曝光类型等信息。
字段名称 | 字段描述 |
---|---|
displayType | 曝光类型promotion("商品推广"), recommend("算法推荐商品"), query("查询结果商品"), activity("促销活动"); |
item_type | 曝光对象类型sku_id("商品skuId"), activity_id("活动id"); |
item | 曝光对象id |
order | 曝光顺序 |
1.4 启动数据
启动数据记录应用的启动信息。
字段名称 | 字段描述 |
---|---|
entry | 启动入口icon("图标"), notification("通知"), install("安装后启动"); |
loading_time | 启动加载时间 |
open_ad_id | 开屏广告id |
open_ad_ms | 广告播放时间 |
open_ad_skip_ms | 用户跳过广告时间 |
ts | 启动时间 |
1.5 错误数据
错误数据记录应用使用过程中的错误信息,包括
错误编号
及错误信息
。
字段名称 | 字段描述 |
---|---|
error_code | 错误码 |
msg | 错误信息 |
2. 数据埋点
2.1 主流埋点方式(了解)
目前主流的埋点方式,有
代码埋点(前端/后端)
、可视化埋点
、全埋点三种。
代码埋点
- 是通过调用埋点SDK函数,在需要埋点的业务逻辑功能位置调用接口,上报埋点数据。
- 例如,我们对页面中的某个按钮埋点后,当这个按钮被点击时,可以在这个按钮对应的 OnClick 函数里面调用SDK提供的数据发送接口,来发送数据。
可视化埋点
- 只需要研发人员集成采集 SDK,不需要写埋点代码,业务人员就可以通过访问分析平台的“圈选”功能,来“圈”出需要对用户行为进行捕捉的控件,并对该事件进行命名。圈选完毕后,这些配置会同步到各个用户的终端上,由采集 SDK 按照圈选的配置自动进行用户行为数据的采集和发送。
全埋点
- 是通过在产品中嵌入SDK,前端自动采集页面上的全部用户行为事件,上报埋点数据,相当于做了一个统一的埋点。然后再通过界面配置哪些数据需要在系统里面进行分析。
2.2 埋点数据日志结构
我们的日志结构大致可分为两类,一是
普通页面埋点日志
,二是启动日志
。普通页面日志结构如下,每条日志包含了,当前页面的
页面信息
,所有事件(动作)
、所有曝光信息
以及错误信息
。除此之外,还包含了一系列公共信息
,包括设备信息,地理位置,应用信息等,即下边的common
字段。
普通页面埋点日志格式
{
"common": { -- 公共信息
"ar": "230000", -- 地区编码
"ba": "iPhone", -- 手机品牌
"ch": "Appstore", -- 渠道
"md": "iPhone 8", -- 手机型号
"mid": "YXfhjAYH6As2z9Iq", -- 设备id
"os": "iOS 13.2.9", -- 操作系统
"uid": "485", -- 会员id
"vc": "v2.1.134" -- app版本号
},
"actions": [ --动作(事件)
{
"action_id": "favor_add", --动作id
"item": "3", --动作目标id
"item_type": "sku_id", --动作目标类型
"ts": 1585744376605 --动作时间
}
],
"displays": [
{
"displayType": "query", -- 曝光类型
"item": "3", -- 曝光对象id
"item_type": "sku_id", -- 曝光对象类型
"order": 1 -- 曝光顺序
},
{
"displayType": "promotion",
"item": "6",
"item_type": "sku_id",
"order": 2
},
{
"displayType": "promotion",
"item": "9",
"item_type": "sku_id",
"order": 3
},
{
"displayType": "recommend",
"item": "6",
"item_type": "sku_id",
"order": 4
},
{
"displayType": "query ",
"item": "6",
"item_type": "sku_id",
"order": 5
}
],
"page": { -- 页面信息
"during_time": 7648, -- 停留时间(毫秒)
"page_item": "3", -- 页面对象id
"page_item_type": "sku_id", -- 页面对象类型
"last_page_id": "login", -- 上页id
"page_id": "good_detail", -- 页面ID
"sourceType": "promotion" -- 页面来源类型
},
"err":{ -- 错误
"error_code": "1234", -- 错误码
"msg": "***********" -- 错误信息
},
"ts": 1585744374423 -- 跳入时间
}
启动日志格式
启动日志结构相对简单,主要包含公共信息,启动信息和错误信息。
{
"common": {
"ar": "370000",
"ba": "Honor",
"ch": "wandoujia",
"md": "Honor 20s",
"mid": "eQF5boERMJFOujcp",
"os": "Android 11.0",
"uid": "76",
"vc": "v2.1.134"
},
"start": {
"entry": "icon", -- 启动入口
"loading_time": 18803, -- 启动加载时间
"open_ad_id": 7, -- 开屏广告id
"open_ad_ms": 3449, -- 广告播放时间
"open_ad_skip_ms": 1989 -- 用户跳过广告时间
},
"err":{ -- 错误
"error_code": "1234", -- 错误码
"msg": "***********" -- 错误信息
},
"ts": 1585744304000 -- 启动时间
}
2.3 埋点数据上报时机
埋点数据上报时机包括两种方式。
方式一
- 在离开该页面时,上传在这个页面发生的所有事情(页面、事件、曝光、错误等)。优点,批处理,减少了服务器接收数据压力。缺点,不是特别及时。
方式二
- 每个事件、动作、错误等,产生后,立即发送。优点,响应及时。缺点,对服务器接收数据压力比较大
3. 代码模拟生成数据
log-collector.rar下载后,idea打开就可以了。
可以查看模拟生成数据的源代码
4. 模拟数据
将
上传到hadoop103的/opt/module/applog目录下
4.1 配置文件说明
application.properteis文件
可以根据需求生成对应日期的用户行为日志。
vim application.properties logging.level.root=info #业务日期 注意:并不是生成日志的日期 mock.date=2021-09-01 #启动次数 mock.startup.count=100 #设备最大值 mock.max.mid=50 #会员最大值 mock.max.uid=500 #商品最大值 mock.max.sku-id=10 #页面平均访问时间 mock.page.during-time-ms=20000 #错误概率 mock.error.rate=3 #日志发送延迟 mock.log.sleep=100 #商品详情来源 用户查询,商品推广,智能推荐, 促销活动 mock.detail.source-type-rate=40:25:15:20
path2.json
该文件用来配置访问路径
根据需求,可以灵活配置用户点击路径。
[ {"path":["home","good_list","good_detail","cart","trade","payment"],"rate":20 }, {"path":["home","good_list","good_detail","login","good_detail","cart","trade","payment"],"rate":50 }, {"path":["home","mine","orders_unpaid","trade","payment"],"rate":10 }, {"path":["home","mine","orders_unpaid","good_detail","good_spec","comment","trade","payment"],"rate":10 }, {"path":["home","mine","orders_unpaid","good_detail","good_spec","comment","home"],"rate":10 }, {"path":["home","mine","orders_undelivered"],"rate":20 }, {"path":["home","mine","orders_unreceipted"],"rate":20 }, {"path":["home","mine","orders_unreceipted","orders_wait_comment"],"rate":20 }, {"path":["home","mine","orders_all","orders_wait_comment"],"rate":20 }, {"path":["home","mine","favor","good_detail","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","mine","favor","good_detail","favor","mine"],"rate":20 }, {"path":["home","cart","good_detail","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","cart","login","top_n","good_detail","home"],"rate":20 }, {"path":["home","login","top_n","good_detail","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","search","good_list","good_detail","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","search","good_list","good_detail","home"],"rate":20 }, {"path":["home","category","activity","good_detail","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","category","activity","category","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","category","activity","category","home"],"rate":20 }, {"path":["home","category","home"],"rate":20 }, {"path":["home","discovery","good_detail","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","discovery","good_detail","good_spec","comment","good_detail","discovery","home"],"rate":20 }, {"path":["home","discovery","home"],"rate":20 }, {"path":["home","activity","good_detail","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","activity","good_detail","good_spec","comment","good_detail","activity","home"],"rate":20 }, {"path":["home","activity","home"],"rate":20 }, {"path":["home","search","top_n","good_detail","good_spec","comment","trade","payment"],"rate":20 }, {"path":["home","search","top_n","good_detail","good_spec","comment","good_detail","top_n","search"],"rate":20 }, {"path":["home","search","good_list","good_detail","good_spec","comment","good_detail","good_list","search"],"rate":20 }, {"path":["home","search","good_list","good_detail","good_spec","comment","trade","payment"],"rate":20 } ]
4.2 日志生成命令
#在/opt/module/applog路径下执行日志生成命令。
java -jar gmall2020-mock-log-2020-04-01.jar
#在/opt/module/applog/log目录下查看生成日志
ll
4.3 集群日志生成脚本lg.sh
#!/bin/bash
for i in hadoop103 hadoop104 hadoop105;
do
echo "========== $i =========="
ssh $i "cd /opt/module/applog/; java -jar gmall2020-mock-log-2020-04-01.jar >/dev/null 2>&1 &"
done
/dev/null代表linux的空设备文件,所有往这个文件里面写入的内容都会丢失,俗称“黑洞”。
- 标准输入0:从键盘获得输入
/proc/self/fd/0
- 标准输出1:输出到屏幕(即控制台)
/proc/self/fd/1
- 错误输出2:输出到屏幕(即控制台)
/proc/self/fd/2