First, a quick introduction to Morphline.
Morphlines provides a set of frequently used, high-level transformation and I/O commands that can be combined in application-specific ways. The official reference gives a short description of each available command and a link to its complete documentation.
For more details, see the official documentation.
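Conceptually, a morphline is a chain of commands. Each command receives a record (a set of named fields, where each field can hold several values), transforms it, and hands it to the next command. If a command fails, the chain stops. The sketch below models that idea in plain Python under simplifying assumptions. It is not Kite's Java API, and the names `run_chain`, `require_message` and `uppercase_message` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A record maps each field name to a list of values, loosely mirroring
# a morphline record, where one field can hold several values.
Record = Dict[str, List[str]]
# A command returns the (possibly new) record, or None to stop the chain.
Command = Callable[[Record], Optional[Record]]


def run_chain(commands: List[Command], record: Record) -> Optional[Record]:
    """Feed the record through each command in order; stop at the first failure."""
    for command in commands:
        result = command(record)
        if result is None:  # comparable to a command reporting failure
            return None
        record = result
    return record


def require_message(record: Record) -> Optional[Record]:
    """Hypothetical command: drop records that have no 'message' value."""
    return record if record.get("message") else None


def uppercase_message(record: Record) -> Optional[Record]:
    """Hypothetical command: upper-case every value of the 'message' field."""
    out = dict(record)
    out["message"] = [v.upper() for v in record.get("message", [])]
    return out


if __name__ == "__main__":
    chain = [require_message, uppercase_message]
    print(run_chain(chain, {"message": ["hello"]}))  # {'message': ['HELLO']}
    print(run_chain(chain, {}))  # None
```

Because every command shares one signature, commands can be reordered or combined freely. That is the "combined in application-specific ways" property described above.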
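The requirement is the same as last time. The only difference is that a Morphline layer is added in the middle to clean the data and extract more fields from it.
For the test data, see the end of the previous post.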
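To make the parsing steps in the morphline.conf concrete, here is a plain-Python sketch of what its three `split` commands, `addValues` and `convertTimestamp` do to a single line. The sample line is hypothetical, with its layout guessed from the field names (the real test data is in the previous post). The sketch also assumes the date and time are interpreted as UTC. It illustrates the field flow only and is not Kite code.

```python
from datetime import datetime, timezone
from typing import Dict, List


def split_field(record: Dict[str, str], input_field: str, output_fields: List[str],
                separator: str, add_empty_strings: bool = False, trim: bool = True) -> None:
    """Rough analogue of a non-regex `split`: literal separator, positional outputs."""
    tokens = record[input_field].split(separator)
    if trim:
        tokens = [t.strip() for t in tokens]
    if not add_empty_strings:
        tokens = [t for t in tokens if t]
    # zip stops at the shorter list; an empty name means "discard this column".
    for name, value in zip(output_fields, tokens):
        if name:
            record[name] = value


def parse_line(line: str) -> Dict[str, str]:
    """Apply the same steps as the morphline.conf in this post to one line."""
    record = {"message": line}
    split_field(record, "message", ["date", "time", "soft", "version"], " ")
    split_field(record, "soft", ["mes", "plat"], ":")
    split_field(record, "mes", ["", "status", "name"], ",")
    # addValues: timestamp = "@{date} @{time}"
    record["timestamp"] = f"{record['date']} {record['time']}"
    # convertTimestamp: "yyyy-MM-dd HH:mm:ss" -> unixTimeInMillis (UTC assumed here)
    parsed = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    record["timestamp"] = str(int(parsed.replace(tzinfo=timezone.utc).timestamp()) * 1000)
    return record


if __name__ == "__main__":
    # Hypothetical sample line; see the previous post for the real test data.
    print(parse_line("2017-01-01 00:00:00 x,start,qq:android 1.0"))
```

With this sample line, `status` becomes `start`, `name` becomes `qq`, `plat` becomes `android`, and `timestamp` becomes `1483228800000`. For brevity, the sketch stores one string per field, whereas a real morphline record can hold a list of values per field. If the log times are local rather than UTC, check the timezone options of `convertTimestamp` in the Kite reference.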
- morphline.conf
morphlines: [
  {
    id: morphline
    importCommands : ["org.kitesdk.**"]
    commands: [
      {
        readLine {
          charset: UTF-8
        }
      }
      # Parse out the fields
      {
        split {
          inputField: message
          outputFields: [date, time, soft, version]
          separator: " "
          isRegex: false
          addEmptyStrings: false
          trim: true
        }
      }
      {
        split {
          inputField: soft
          outputFields: [mes, plat]
          separator: ":"
          isRegex: false
          addEmptyStrings: false
          trim: true
        }
      }
      {
        split {
          inputField: mes
          # An empty name in outputFields discards that column
          outputFields: ["", status, name]
          separator: ","
          isRegex: false
          addEmptyStrings: false
          trim: true
        }
      }
      # Add the timestamp to the header; without it, an error complains that timestamp cannot be found
      {
        addValues {
          timestamp: "@{date} @{time}"
        }
      }
      # Convert the timestamp above to Unix epoch milliseconds
      {
        convertTimestamp {
          field : timestamp
          inputFormats : ["yyyy-MM-dd HH:mm:ss"]
          outputFormat : unixTimeInMillis
        }
      }
      # For testing: log each record
      {
        logInfo {
          format : "timestamp: {}, record: {}"
          args : ["@{timestamp}", "@{}"]
        }
      }
      # Convert the record to Avro using a custom schema
      {
        toAvro {
          schemaFile: /home/training/Desktop/flume-kafka/morphline1/softschema.avsc
        }
      }
      # Specifying containerlessBinary drops the schema header; also specify the codec