1. Consuming Kafka-collected data with Logstash
Data input:
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
Data output:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-file.html
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-sniffing_path
Data filtering:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Grok pattern reference:
https://github.com/logstash-plugins/logstash-patterns-core/blob/main/patterns/ecs-v1/grok-patterns
Logstash script example: https://blog.youkuaiyun.com/Q748893892/article/details/101349888
# Input source: Kafka connection settings
input {
kafka {
bootstrap_servers => ["10.11.12.13:10193","10.11.12.17:10193"]
client_id => "test"
# auto_offset_reset => "latest"
topics => ["test_log_caoke"]
consumer_threads => 5
group_id => "logstash"
}
}
# Filter rules: strip the redundant fields added when Filebeat writes to Kafka and Logstash consumes it
filter {
# 1. Parse the message field of the current event as JSON, merging its fields into the event
json {
source => "message"
}
# 2. Remove the following fields from the event produced in step 1
mutate {
remove_field => ["@version","@timestamp","fields"]
}
# 3. Only the message field remains; split it on & (or ?) and treat each piece as key=value pairs (kv format)
kv {
source => "message"
field_split => "&?"
}
# 4. Remove the original message field
mutate {
remove_field => ["message"]
}
# 5. Drop the event if the content field is missing
if ![content] {
drop {}
}
# 6. URL-decode the content field
urldecode {
field => "content"
}
# 7. Parse content as JSON and store the result back into content
json {
source => "content"
target => "content"
}
}
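The filter steps above (kv split, URL-decode, JSON parse, conditional drop) can be sketched in plain Python to show what happens to a single record; the payload below is a hypothetical example, not real data:

```python
import json
from urllib.parse import parse_qsl

# Hypothetical raw message body as it might arrive from Filebeat via Kafka
raw = "uid=42&action=view&content=%7B%22page%22%3A%20%22home%22%7D"

# Step 3: split on & into key=value pairs (the kv filter);
# parse_qsl also URL-decodes each value, which covers step 6
event = dict(parse_qsl(raw, keep_blank_values=True))

# Step 5: drop the event if content is missing
if "content" not in event:
    event = None
else:
    # Step 7: parse content as JSON and store it back into content
    event["content"] = json.loads(event["content"])

print(event)
# {'uid': '42', 'action': 'view', 'content': {'page': 'home'}}
```

Note that `parse_qsl` performs the URL decoding as part of the split, whereas Logstash does it in two separate filters (kv, then urldecode).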
# Write to Elasticsearch
output {
elasticsearch {
hosts => ["elasticsearch.com:80"] # if no port is given, the default (9200) is used
index => "test_log_caoke_200190924" # index name; without document_type, a type named doc is created by default
}
stdout {
codec => rubydebug
}
file {
path => "/your/log/path/logstash-%{+yyyy-MM-dd-HH}.log" # Logstash's built-in date format; rotates the log file hourly
codec => line { format => "%{message}" }
}
}
2. Implementing a Kafka consumer in Go
https://www.alibabacloud.com/help/zh/sls/user-guide/use-sarama-kafka-go-to-achieve-kafka-consumption
3. Consuming Kafka in Python
"""
读取 kafka 的用户操作数据并打印
"""
from kafka import KafkaConsumer
topic = 'test1'
bootstrap_servers = ['localhost:9092']
group_id = 'group7'
consumer = KafkaConsumer(
    topic, # topic name
    #group_id=group_id, # consumer group this instance joins; optional
    bootstrap_servers=bootstrap_servers, # Kafka broker addresses
    auto_offset_reset='latest', # 'smallest' -> 'earliest', 'largest' -> 'latest'
)
for msg in consumer:
    # msg.value is raw bytes; the UTF-8 round trip validates the encoding,
    # and unicode_escape renders any literal \uXXXX escapes in the payload
    print(msg.value.decode('utf-8').encode('utf-8').decode('unicode_escape'))
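The unicode_escape decode in the loop matters when producers write JSON with \uXXXX escapes instead of raw UTF-8; a small illustration (caveat: this codec is only safe for ASCII payloads, since it mangles raw multi-byte UTF-8):

```python
# A message body where non-ASCII text was written as \uXXXX escapes
value = b'{"msg": "\\u4f60\\u597d"}'

# Plain UTF-8 decoding leaves the escapes as literal text
print(value.decode('utf-8'))           # {"msg": "\u4f60\u597d"}

# unicode_escape turns them back into the actual characters
print(value.decode('unicode_escape'))  # {"msg": "你好"}
```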