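3. Data Warehouse Project: Data Generation Module

Data Generation Module

1. Target data

The data to be collected and analyzed falls into five categories: page data, event data, exposure data, startup data, and error data.

1.1 Page data

Page data records how users visit a page, including the visit time, dwell time, and page path.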


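Field name        Field description
page_id           Page id, one of:
                    home("Home"),
                    category("Category page"),
                    discovery("Discovery page"),
                    top_n("Top rankings"),
                    favor("Favorites page"),
                    search("Search page"),
                    good_list("Product list page"),
                    good_detail("Product detail"),
                    good_spec("Product specifications"),
                    comment("Review"),
                    comment_done("Review completed"),
                    comment_list("Review list"),
                    cart("Shopping cart"),
                    trade("Order checkout"),
                    payment("Payment page"),
                    payment_done("Payment completed"),
                    orders_all("All orders"),
                    orders_unpaid("Orders awaiting payment"),
                    orders_undelivered("Orders awaiting shipment"),
                    orders_unreceipted("Orders awaiting receipt"),
                    orders_wait_comment("Orders awaiting review"),
                    mine("My account"),
                    activity("Activity"),
                    login("Login"),
                    register("Registration");
last_page_id      Previous page id
page_item_type    Page item type, one of:
                    sku_id("Product skuId"),
                    keyword("Search keyword"),
                    sku_ids("Multiple product skuIds"),
                    activity_id("Activity id"),
                    coupon_id("Coupon id");
page_item         Page item id
sourceType        Page source type, one of:
                    promotion("Product promotion"),
                    recommend("Algorithm-recommended product"),
                    query("Search result product"),
                    activity("Promotional activity");
during_time       Dwell time (milliseconds)
ts                Page entry time

1.2 Event data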


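Field name        Field description
action_id         Action id, one of:
                    favor_add("Add to favorites"),
                    favor_canel("Remove from favorites"),
                    cart_add("Add to cart"),
                    cart_remove("Remove from cart"),
                    cart_add_num("Increase cart item quantity"),
                    cart_minus_num("Decrease cart item quantity"),
                    trade_add_address("Add shipping address"),
                    get_coupon("Claim coupon");
                  Note: business data such as order placement and payment can be obtained from the business database.
item_type         Action target type, one of:
                    sku_id("Product"),
                    coupon_id("Coupon");
item              Action target id
ts                Action time

1.3 Exposure data

Exposure data records the content displayed (exposed) on a page, including the exposed item and the exposure type.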


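Field name        Field description
displayType       Exposure type, one of:
                    promotion("Product promotion"),
                    recommend("Algorithm-recommended product"),
                    query("Search result product"),
                    activity("Promotional activity");
item_type         Exposed item type, one of:
                    sku_id("Product skuId"),
                    activity_id("Activity id");
item              Exposed item id
order             Exposure order

1.4 Startup data

Startup data records information about each launch of the app.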


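Field name        Field description
entry             Startup entry point, one of:
                    icon("Icon"),
                    notification("Notification"),
                    install("Launch after install");
loading_time      Startup loading time
open_ad_id        Splash ad id
open_ad_ms        Ad play time
open_ad_skip_ms   Time at which the user skipped the ad
ts                Startup time

1.5 Error data

Error data records errors that occur while the app is in use, including the error code and the error message.

Field name        Field description
error_code        Error code
msg               Error message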

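2. Data tracking

2.1 Mainstream tracking approaches (background)

There are currently three mainstream tracking approaches: code-based tracking (front end or back end), visual tracking, and full tracking.

Code-based tracking

  • Tracking data is reported by calling the tracking SDK's functions at the points in the business logic that need to be tracked.
  • For example, once a button on a page has been instrumented, the OnClick function for that button can call the data-sending interface provided by the SDK each time the button is clicked.

Visual tracking

  • Developers only need to integrate the collection SDK and do not write any tracking code. Business staff use the "circle selection" feature of the analytics platform to mark the controls whose user behavior should be captured, and give each event a name. Once the selection is complete, the configuration is synchronized to every user's device, and the collection SDK automatically collects and sends user behavior data according to it.

Full tracking

  • An SDK embedded in the product automatically collects every user behavior event on the page at the front end and reports it, which amounts to one uniform tracking point. Which of the collected data is actually analyzed in the system is then configured through a UI.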
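As a rough illustration of code-based tracking, the sketch below wires a click handler to a tracking call. TrackingClient, track, and on_favorite_button_click are hypothetical names invented for this example, not part of any real SDK, and the client only buffers events in memory where a real SDK would send them over the network. The event fields follow the event data table in section 1.2, and ts is assumed to be epoch milliseconds, as the sample logs suggest.

```python
import time


class TrackingClient:
    """Hypothetical stand-in for a tracking SDK; it buffers events in memory."""

    def __init__(self):
        self.sent = []  # a real SDK would send these over the network

    def track(self, action_id, item, item_type):
        # Build an event record using the field names from section 1.2.
        event = {
            "action_id": action_id,
            "item": item,
            "item_type": item_type,
            "ts": int(time.time() * 1000),  # assumed: epoch milliseconds
        }
        self.sent.append(event)
        return event


def on_favorite_button_click(client, sku_id):
    """Click handler: business logic runs first, then the tracking call reports it."""
    # ... business logic for adding the product to favorites would go here ...
    return client.track("favor_add", sku_id, "sku_id")


client = TrackingClient()
event = on_favorite_button_click(client, "3")
print(event["action_id"], event["item"], event["item_type"])
```

The point of the sketch is the placement of the call: the tracking line sits inside the handler, which is why this approach requires a code change for every new tracking point.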
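2.2 Tracking log structure

Our logs fall into two broad types: ordinary page tracking logs and startup logs.

An ordinary page log has the structure shown below. Each log entry contains the page information for the current page, all events (actions), all exposures, and any error information. It also carries a set of common information, including device information, geographic location, and application information, held in the common field.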

Ordinary page tracking log format
{
  "common": {                    -- common information
    "ar": "230000",              -- region code
    "ba": "iPhone",              -- phone brand
    "ch": "Appstore",            -- channel
    "md": "iPhone 8",            -- phone model
    "mid": "YXfhjAYH6As2z9Iq",   -- device id
    "os": "iOS 13.2.9",          -- operating system
    "uid": "485",                -- member id
    "vc": "v2.1.134"             -- app version
  },
  "actions": [                   -- actions (events)
    {
      "action_id": "favor_add",  -- action id
      "item": "3",               -- action target id
      "item_type": "sku_id",     -- action target type
      "ts": 1585744376605        -- action time
    }
  ],
  "displays": [                  -- exposures
    {
      "displayType": "query",    -- exposure type
      "item": "3",               -- exposed item id
      "item_type": "sku_id",     -- exposed item type
      "order": 1                 -- exposure order
    },
    {
      "displayType": "promotion",
      "item": "6",
      "item_type": "sku_id",
      "order": 2
    },
    {
      "displayType": "promotion",
      "item": "9",
      "item_type": "sku_id",
      "order": 3
    },
    {
      "displayType": "recommend",
      "item": "6",
      "item_type": "sku_id",
      "order": 4
    },
    {
      "displayType": "query",
      "item": "6",
      "item_type": "sku_id",
      "order": 5
    }
  ],
  "page": {                      -- page information
    "during_time": 7648,         -- dwell time (milliseconds)
    "page_item": "3",            -- page item id
    "page_item_type": "sku_id",  -- page item type
    "last_page_id": "login",     -- previous page id
    "page_id": "good_detail",    -- page id
    "sourceType": "promotion"    -- page source type
  },
  "err": {                       -- error
    "error_code": "1234",        -- error code
    "msg": "***********"         -- error message
  },
  "ts": 1585744374423            -- page entry time
}
Startup log format

The startup log structure is simpler. It mainly contains the common information, the startup information, and any error information.

{
  "common": {
    "ar": "370000",
    "ba": "Honor",
    "ch": "wandoujia",
    "md": "Honor 20s",
    "mid": "eQF5boERMJFOujcp",
    "os": "Android 11.0",
    "uid": "76",
    "vc": "v2.1.134"
  },
  "start": {
    "entry": "icon",             -- startup entry point
    "loading_time": 18803,       -- startup loading time
    "open_ad_id": 7,             -- splash ad id
    "open_ad_ms": 3449,          -- ad play time
    "open_ad_skip_ms": 1989      -- time at which the user skipped the ad
  },
  "err": {                       -- error
    "error_code": "1234",        -- error code
    "msg": "***********"         -- error message
  },
  "ts": 1585744304000            -- startup time
}
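Because both log types share the common, err, and ts fields, a consumer can tell them apart by checking for the start or page key. The following is a minimal sketch of that check, assuming each log arrives as one plain JSON string without the explanatory -- annotations shown above. classify_log, exposed_items, and the abbreviated sample records are hypothetical and serve only as illustration.

```python
import json


def classify_log(line):
    """Return 'start' or 'page' for a log line, or 'unknown' otherwise."""
    record = json.loads(line)
    if "start" in record:
        return "start"
    if "page" in record:
        return "page"
    return "unknown"


def exposed_items(line):
    """List (item, displayType) pairs from a log line, in exposure order."""
    record = json.loads(line)
    displays = sorted(record.get("displays", []), key=lambda d: d["order"])
    return [(d["item"], d["displayType"]) for d in displays]


# Abbreviated samples based on the two formats above.
page_line = json.dumps({
    "common": {"mid": "YXfhjAYH6As2z9Iq", "uid": "485"},
    "displays": [
        {"displayType": "promotion", "item": "6", "item_type": "sku_id", "order": 2},
        {"displayType": "query", "item": "3", "item_type": "sku_id", "order": 1},
    ],
    "page": {"page_id": "good_detail", "last_page_id": "login", "during_time": 7648},
    "ts": 1585744374423,
})
start_line = json.dumps({
    "common": {"mid": "eQF5boERMJFOujcp", "uid": "76"},
    "start": {"entry": "icon", "loading_time": 18803},
    "ts": 1585744304000,
})

print(classify_log(page_line), classify_log(start_line))
print(exposed_items(page_line))
```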
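2.3 When tracking data is reported

There are two options for when tracking data is uploaded.

Option 1

  • When the user leaves a page, upload everything that happened on that page (page, events, exposures, errors, and so on). Advantage: the data is batched, which reduces the load on the receiving server. Disadvantage: the data is not especially timely.

Option 2

  • Send each event, action, or error as soon as it occurs. Advantage: the data arrives promptly. Disadvantage: the receiving server is put under heavier load.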
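The trade-off between the two options can be sketched with a small reporter that either sends every event at once or holds events until the page is left. Reporter, record, leave_page, and count_requests are hypothetical names, and the send callback stands in for the real upload call.

```python
class Reporter:
    """Hypothetical reporter illustrating the two upload timings."""

    def __init__(self, send, batch_on_leave):
        self.send = send                  # callback standing in for the upload call
        self.batch_on_leave = batch_on_leave
        self.pending = []                 # events held back when batching

    def record(self, event):
        if self.batch_on_leave:
            self.pending.append(event)    # batching: hold until the page is left
        else:
            self.send([event])            # immediate: one request per event

    def leave_page(self):
        # Batching flushes everything that happened on the page in one request.
        if self.batch_on_leave and self.pending:
            self.send(self.pending)
            self.pending = []


def count_requests(batch_on_leave, events):
    """Return how many upload requests one page visit produces."""
    requests = []
    reporter = Reporter(requests.append, batch_on_leave)
    for event in events:
        reporter.record(event)
    reporter.leave_page()
    return len(requests)


events = ["page", "favor_add", "cart_add", "err"]
print(count_requests(True, events))   # batched when leaving the page
print(count_requests(False, events))  # sent immediately
```

For the four sample events, batching produces a single request while immediate sending produces four, which is the server-load difference described above.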

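3. Generating mock data in code

Download log-collector.rar and open the project in IDEA.

The project contains the source code that generates the mock data.

4. Mock data

  1. application.properties
  2. gmall2020-mock-log-2020-04-01.jar
  3. path2.json

Upload these three files to the /opt/module/applog directory on hadoop103.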

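4.1 Configuration files
The application.properties file

This file lets you generate user behavior logs for a chosen date.

vim application.properties

logging.level.root=info
# Business date. Note: this is not the date on which the log is generated
mock.date=2021-09-01
# Number of startups
mock.startup.count=100
# Maximum device id
mock.max.mid=50
# Maximum member id
mock.max.uid=500
# Maximum product id
mock.max.sku-id=10
# Average page visit duration (milliseconds)
mock.page.during-time-ms=20000
# Error probability
mock.error.rate=3
# Log sending delay
mock.log.sleep=100
# Product detail source: user query, product promotion, smart recommendation, promotional activity
mock.detail.source-type-rate=40:25:15:20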
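The value of mock.detail.source-type-rate reads as four colon-separated weights, one per source type in the order given in the comment. A minimal sketch of how such a value could be turned into shares follows. parse_rates is a hypothetical helper, the mapping of positions to the sourceType values query, promotion, recommend, and activity is inferred from the comment, and the generator's real parsing code may differ.

```python
def parse_rates(value, names):
    """Map colon-separated integer weights to names and normalize them to shares."""
    weights = [int(part) for part in value.split(":")]
    if len(weights) != len(names):
        raise ValueError("expected one weight per name")
    total = sum(weights)
    return {name: weight / total for name, weight in zip(names, weights)}


# Order assumed from the comment: user query, promotion, recommendation, activity.
SOURCE_TYPES = ["query", "promotion", "recommend", "activity"]

shares = parse_rates("40:25:15:20", SOURCE_TYPES)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Under this reading, 40% of product detail visits would come from search results, 25% from promotions, 15% from recommendations, and 20% from promotional activities.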
The path2.json file

This file configures the page access paths.

User click paths can be configured flexibly as needed.

[
  {"path":["home","good_list","good_detail","cart","trade","payment"],"rate":20 },
  {"path":["home","good_list","good_detail","login","good_detail","cart","trade","payment"],"rate":50 },
  {"path":["home","mine","orders_unpaid","trade","payment"],"rate":10 },
  {"path":["home","mine","orders_unpaid","good_detail","good_spec","comment","trade","payment"],"rate":10 },
  {"path":["home","mine","orders_unpaid","good_detail","good_spec","comment","home"],"rate":10 },
  {"path":["home","mine","orders_undelivered"],"rate":20 },
  {"path":["home","mine","orders_unreceipted"],"rate":20 },
  {"path":["home","mine","orders_unreceipted","orders_wait_comment"],"rate":20 },
  {"path":["home","mine","orders_all","orders_wait_comment"],"rate":20 },
  {"path":["home","mine","favor","good_detail","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","mine","favor","good_detail","favor","mine"],"rate":20 },
  {"path":["home","cart","good_detail","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","cart","login","top_n","good_detail","home"],"rate":20 },
  {"path":["home","login","top_n","good_detail","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","search","good_list","good_detail","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","search","good_list","good_detail","home"],"rate":20 },
  {"path":["home","category","activity","good_detail","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","category","activity","category","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","category","activity","category","home"],"rate":20 },
  {"path":["home","category","home"],"rate":20 },
  {"path":["home","discovery","good_detail","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","discovery","good_detail","good_spec","comment","good_detail","discovery","home"],"rate":20 },
  {"path":["home","discovery","home"],"rate":20 },
  {"path":["home","activity","good_detail","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","activity","good_detail","good_spec","comment","good_detail","activity","home"],"rate":20 },
  {"path":["home","activity","home"],"rate":20 },
  {"path":["home","search","top_n","good_detail","good_spec","comment","trade","payment"],"rate":20 },
  {"path":["home","search","top_n","good_detail","good_spec","comment","good_detail","top_n","search"],"rate":20 },
  {"path":["home","search","good_list","good_detail","good_spec","comment","good_detail","good_list","search"],"rate":20 },
  {"path":["home","search","good_list","good_detail","good_spec","comment","trade","payment"],"rate":20 }
]
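One plausible reading of this file is that rate is a relative weight (the values above sum to 600, not 100) and that each path is replayed page by page, with every page recording its predecessor as last_page_id. The sketch below follows that reading. pick_path and to_page_records are hypothetical helpers, not the generator's actual code, and only two of the configured paths are inlined to keep the example short.

```python
import json
import random

PATHS_JSON = """
[
  {"path": ["home", "good_list", "good_detail", "cart", "trade", "payment"], "rate": 20},
  {"path": ["home", "category", "home"], "rate": 20}
]
"""


def pick_path(paths, rng):
    """Choose one path at random, treating rate as a relative weight."""
    weights = [entry["rate"] for entry in paths]
    return rng.choices(paths, weights=weights, k=1)[0]["path"]


def to_page_records(path):
    """Turn a path into page records that link each page to the previous one."""
    records = []
    last_page_id = None  # the first page of a path has no predecessor
    for page_id in path:
        records.append({"page_id": page_id, "last_page_id": last_page_id})
        last_page_id = page_id
    return records


paths = json.loads(PATHS_JSON)
rng = random.Random(0)  # seeded so repeated runs pick the same path
for record in to_page_records(pick_path(paths, rng)):
    print(record)
```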
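4.2 Log generation command
# Run the log generation command from the /opt/module/applog directory.
java -jar gmall2020-mock-log-2020-04-01.jar

# View the generated logs in the /opt/module/applog/log directory.
ll /opt/module/applog/log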
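4.3 Cluster log generation script lg.sh
#!/bin/bash
# Start the mock log generator in the background on each node.
for i in hadoop103 hadoop104 hadoop105
do
    echo "========== $i =========="
    # Discard stdout and stderr so the ssh session can return immediately.
    ssh "$i" "cd /opt/module/applog/; java -jar gmall2020-mock-log-2020-04-01.jar >/dev/null 2>&1 &"
done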

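/dev/null is the Linux null device file. Anything written to it is discarded, which is why it is nicknamed the "black hole". In the script, >/dev/null sends standard output (1) there, and 2>&1 then points standard error (2) at the same target.

  1. Standard input (0): reads input from the keyboard, /proc/self/fd/0
  2. Standard output (1): writes to the screen (the console), /proc/self/fd/1
  3. Standard error (2): writes to the screen (the console), /proc/self/fd/2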
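For comparison, the discard-both-streams behavior used in lg.sh can also be expressed from Python. This is only a sketch: run_quietly is a hypothetical helper, and the child command is a placeholder for the jar.

```python
import subprocess
import sys


def run_quietly(command):
    """Run a command with stdout and stderr discarded; return its exit code."""
    completed = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,   # like >/dev/null
        stderr=subprocess.STDOUT,    # like 2>&1: stderr follows stdout
    )
    return completed.returncode


# Placeholder child process that writes to both streams.
child = [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
print(run_quietly(child))
```

Unlike the trailing & in the script, subprocess.run waits for the child to finish before returning.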
