Getting Started with Druid

License: CC 4.0 BY-SA

Tags: #big-data #java #server-side


Druid Study Notes
Druid quickstart: single-machine setup
Install Druid
curl -O http://static.druid.io/artifacts/releases/druid-0.9.0-bin.tar.gz
tar -xzf druid-0.9.0-bin.tar.gz
cd druid-0.9.0

Start ZooKeeper
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o zookeeper-3.4.6.tar.gz
tar -xzf zookeeper-3.4.6.tar.gz
cd zookeeper-3.4.6
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start

Start the Druid services
cd /usr/local/java/druid/druid-0.9.0
bin/init

Start the nodes (run each command in its own terminal, from the Druid directory)
java $(cat conf-quickstart/druid/historical/jvm.config | xargs) -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical
java $(cat conf-quickstart/druid/broker/jvm.config | xargs) -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker
java $(cat conf-quickstart/druid/coordinator/jvm.config | xargs) -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
java $(cat conf-quickstart/druid/overlord/jvm.config | xargs) -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord
java $(cat conf-quickstart/druid/middleManager/jvm.config | xargs) -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager

Loading local data
Sample data

{
  "time": "2015-09-12T23:59:59.200Z",
  "channel": "#en.wikipedia",
  "cityName": null,
  "comment": "(edited with [[User:ProveIt_GT|ProveIt]])",
  "countryIsoCode": null,
  "countryName": null,
  "isAnonymous": false,
  "isMinor": false,
  "isNew": false,
  "isRobot": false,
  "isUnpatrolled": false,
  "metroCode": null,
  "namespace": "Main",
  "page": "Tom Watson (politician)",
  "regionIsoCode": null,
  "regionName": null,
  "user": "Eva.pascoe",
  "delta": 182,
  "added": 182,
  "deleted": 0
}

Ingestion spec
cat wikiticker-index.json
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/wikiticker-2015-09-12-sampled.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "added",
          "type" : "longSum",
          "fieldName" : "added"
        },
        {
          "name" : "deleted",
          "type" : "longSum",
          "fieldName" : "deleted"
        },
        {
          "name" : "delta",
          "type" : "longSum",
          "fieldName" : "delta"
        },
        {
          "name" : "user_unique",
          "type" : "hyperUnique",
          "fieldName" : "user"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}
    }
  }
}
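
To make the role of the dimensions and metricsSpec in the spec above more concrete, here is a rough Python sketch of ingestion-time rollup. It is only an illustration under simplifying assumptions, not Druid's implementation: the helper name rollup and the three input rows are hypothetical, only the count and longSum metrics are modelled, and timestamps are compared as-is (which is how "queryGranularity": "none" is assumed to behave here).

```python
from collections import defaultdict


def rollup(rows, dimensions, sum_fields):
    """Group rows by (time, dimension values) and compute count plus long sums."""
    grouped = defaultdict(lambda: {"count": 0, **{f: 0 for f in sum_fields}})
    for row in rows:
        # Rows collapse into one stored row only when the timestamp and
        # every dimension value are identical.
        key = (row["time"],) + tuple(row.get(d) for d in dimensions)
        agg = grouped[key]
        agg["count"] += 1                 # "type": "count"
        for f in sum_fields:
            agg[f] += int(row[f])         # "type": "longSum"
    return dict(grouped)


# Hypothetical input rows, not taken from the wikiticker sample file.
rows = [
    {"time": "2015-09-12T23:59:59.200Z", "page": "A", "user": "u1", "added": 182, "deleted": 0},
    {"time": "2015-09-12T23:59:59.200Z", "page": "A", "user": "u1", "added": 18, "deleted": 5},
    {"time": "2015-09-12T23:59:59.200Z", "page": "B", "user": "u2", "added": 7, "deleted": 0},
]
result = rollup(rows, ["page", "user"], ["added", "deleted"])
for key, metrics in result.items():
    print(key, metrics)
```

Because the stored count metric already holds the number of raw rows behind each rolled-up row, queries sum it (longSum over count) instead of counting stored rows.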

Submit the indexing task
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:8090/druid/indexer/v1/task
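
The same submission could also be scripted. The following is a minimal sketch using only Python's standard library; the helper name build_task_request is hypothetical, the overlord address is assumed to be the quickstart default used in the curl command, and the spec is a placeholder rather than the full wikiticker-index.json. The request is only built here, not sent, so the sketch runs without a Druid instance.

```python
import json
import urllib.request


def build_task_request(spec, overlord="http://localhost:8090"):
    """Build (but do not send) the POST request that submits an indexing task."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        overlord + "/druid/indexer/v1/task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder spec; in practice load it with json.load(open("quickstart/wikiticker-index.json")).
spec = {"type": "index_hadoop", "spec": {}}
req = build_task_request(spec)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually submit the task to a running overlord.
```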

Check the task status in the overlord console: http://192.168.84.132:8090/console.html

The coordinator console shows the progress of loading the data: http://192.168.84.132:8081/#/

Query
cat wikiticker-top-pages.json
{
  "queryType" : "topN",
  "dataSource" : "wikiticker",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "edits",
  "threshold" : 5,
  "aggregations" : [
    {
      "type" : "longSum",
      "name" : "edits",
      "fieldName" : "count"
    }
  ]
}
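
As a rough mental model of what this topN query asks for, the sketch below sums the count metric per page and keeps the largest totals. It is a simplification under stated assumptions: the function top_n and the sample rows are hypothetical, and the real engine works on segments across nodes rather than on an in-memory list.

```python
from collections import Counter


def top_n(rows, dimension, metric_field, threshold):
    """Sum metric_field per dimension value and return the top `threshold` values."""
    totals = Counter()
    for row in rows:
        totals[row[dimension]] += row[metric_field]   # longSum over "count"
    return [
        {"edits": total, dimension: value}
        for value, total in totals.most_common(threshold)
    ]


# Hypothetical rolled-up rows: each carries the ingestion-time "count" metric.
rows = [
    {"page": "A", "count": 3},
    {"page": "B", "count": 1},
    {"page": "A", "count": 2},
    {"page": "C", "count": 4},
]
top_pages = top_n(rows, "page", "count", 2)
print(top_pages)
```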

Query command
curl -L -H 'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2/?pretty
Result
[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [ {
    "edits" : 33,
    "page" : "Wikipedia:Vandalismusmeldung"
  }, {
    "edits" : 28,
    "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
  }, {
    "edits" : 27,
    "page" : "Jeremy Corbyn"
  }, {
    "edits" : 21,
    "page" : "Wikipedia:Administrators' noticeboard/Incidents"
  }, {
    "edits" : 20,
    "page" : "Flavia Pennetta"
  } ]
} ]

Loading streaming data
Install Tranquility
curl -O http://static.druid.io/tranquility/releases/tranquility-distribution-0.7.4.tgz
tar -xzf tranquility-distribution-0.7.4.tgz
cd tranquility-distribution-0.7.4
Start Tranquility
bin/tranquility server -configFile …/druid-0.9.0/conf-quickstart/tranquility/metrcserver.json
Sample data
{"time": "2019-08-08T12:40:06Z", "url": "/foo/bar", "user": "alice", "latencyMs": 32}
Configuration
cat metrcserver.json
{
  "dataSources" : {
    "metrics" : {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "metrics",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "timestampSpec" : {
                "column" : "timestamp",
                "format" : "auto"
              },
              "dimensionsSpec" : {
                "dimensions" : ["page"],
                "dimensionExclusions" : [
                  "timestamp",
                  "value"
                ]
              },
              "format" : "json"
            }
          },
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "hour",
            "queryGranularity" : "none"
          },
          "metricsSpec" : [
            {
              "type" : "count",
              "name" : "count"
            },
            {
              "name" : "value_sum",
              "type" : "doubleSum",
              "fieldName" : "value"
            },
            {
              "fieldName" : "value",
              "name" : "value_min",
              "type" : "doubleMin"
            },
            {
              "type" : "doubleMax",
              "name" : "value_max",
              "fieldName" : "value"
            }
          ]
        },
        "ioConfig" : {
          "type" : "realtime"
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "100000",
          "intermediatePersistPeriod" : "PT10M",
          "windowPeriod" : "PT10M"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1"
      }
    }
  },
  "properties" : {
    "zookeeper.connect" : "localhost",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "http.port" : "8200",
    "http.threads" : "8"
  }
}
Send events
bin/generate-example-metrics | curl -XPOST -H 'Content-Type: application/json' --data-binary @- http://localhost:8200/v1/post/metrics
{"result":{"received":25,"sent":25}}
Query
cat metris.json
{
  "queryType": "groupBy",
  "dataSource": "metrics",
  "granularity": "hour",
  "dimensions": [
    "page"
  ],
  "aggregations": [
    {
      "type": "count",
      "name": "count"
    }
  ],
  "intervals": [
    "2019-08-13T03:30:08Z/2019-08-16T03:30:08Z"
  ]
}
Query command
curl -L -H 'Content-Type: application/json' -XPOST --data-binary @quickstart/metris.json http://localhost:8082/druid/v2/?pretty
Result
[ {
  "version" : "v1",
  "timestamp" : "2019-08-15T11:00:00.000Z",
  "event" : {
    "count" : 1,
    "page" : "/"
  }
}, {
  "version" : "v1",
  "timestamp" : "2019-08-15T11:00:00.000Z",
  "event" : {
    "count" : 1,
    "page" : "/get/25"
  }
}, {
  "version" : "v1",
  "timestamp" : "2019-08-15T11:00:00.000Z",
  "event" : {
    "count" : 1,
    "page" : "/get/4"
  }
}, {
  "version" : "v1",
  "timestamp" : "2019-08-15T11:00:00.000Z",
  "event" : {
    "count" : 1,
    "page" : "/get/40"
  }
}, {
  "version" : "v1",
  "timestamp" : "2019-08-15T11:00:00.000Z",
  "event" : {
    "count" : 1,
    "page" : "/get/68"
  }
}, {
  "version" : "v1",
  "timestamp" : "2019-08-15T11:00:00.000Z",
  "event" : {
    "count" : 1,
    "page" : "/get/83"
  }
}, {
  "version" : "v1",
  "timestamp" : "2019-08-15T11:00:00.000Z",
  "event" : {
    "count" : 1,
    "page" : "/list"
  }
} ]
Native JSON query components
Filters
A filter is a JSON object that selects which rows of data a query includes, similar to the WHERE clause in SQL. The filter types are: Selector filter, Regular expression filter, Logical expression filters (AND, OR, NOT), In filter, Bound filter, Search filter, JavaScript filter, and Extraction filter.
Example: "filter": { "type": "selector", "dimension": <dimension_string>, "value": <dimension_value_string> }
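
To illustrate how selector and logical filters combine, here is a small Python sketch that evaluates a filter object against a single row. It is an assumption-laden model, not Druid code: the function matches and the sample rows are hypothetical, and only the selector, and, or, and not types are covered.

```python
def matches(flt, row):
    """Return True if `row` (a dict of dimension values) satisfies the filter `flt`."""
    kind = flt["type"]
    if kind == "selector":
        # Equivalent to WHERE <dimension> = <value>
        return row.get(flt["dimension"]) == flt["value"]
    if kind == "and":
        return all(matches(f, row) for f in flt["fields"])
    if kind == "or":
        return any(matches(f, row) for f in flt["fields"])
    if kind == "not":
        # The "not" filter wraps a single filter under the key "field".
        return not matches(flt["field"], row)
    raise ValueError("filter type not covered by this sketch: " + kind)


example_filter = {
    "type": "and",
    "fields": [
        {"type": "selector", "dimension": "carrier", "value": "AT&T"},
        {"type": "or", "fields": [
            {"type": "selector", "dimension": "make", "value": "Apple"},
            {"type": "selector", "dimension": "make", "value": "Samsung"},
        ]},
    ],
}
print(matches(example_filter, {"carrier": "AT&T", "make": "Apple"}))
print(matches(example_filter, {"carrier": "AT&T", "make": "Nokia"}))
```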
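Aggregations
Aggregations can be specified at ingestion time, as part of the ingestion spec, to summarize data before it enters Druid. They can also be specified at query time, as part of many queries. The aggregator types are: Count aggregator, Sum aggregators, Min / Max aggregators, Approximate Aggregations, and Miscellaneous Aggregations.
Example:
Sum aggregator: { "type" : "longSum", "name" : <output_name>, "fieldName" : <metric_name> }
Post-aggregators
The arithmetic post-aggregator applies the given function to its fields from left to right. The fields can be aggregators or other post-aggregators. The supported functions are +, -, *, /, and quotient.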
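
The left-to-right evaluation can be pictured with a short Python sketch. This is only a model under assumptions: the function arithmetic is hypothetical and takes already-computed field values, and "/" is written to return 0 when the divisor is 0, which is how the Druid documentation describes that operator.

```python
from functools import reduce


def arithmetic(fn, values):
    """Fold the field values from left to right with the given function."""
    def apply(left, right):
        if fn == "+":
            return left + right
        if fn == "-":
            return left - right
        if fn == "*":
            return left * right
        if fn == "/":
            # Division by zero yields 0 instead of an error.
            return 0.0 if right == 0 else left / right
        if fn == "quotient":
            # Plain division with no zero special case (Python raises on zero here).
            return left / right
        raise ValueError("unsupported function: " + fn)
    return reduce(apply, values)


print(arithmetic("/", [10.0, 4.0]))   # data_transfer / total_usage style average
print(arithmetic("-", [10, 3, 2]))    # evaluated as (10 - 3) - 2
```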
Query examples
Timeseries query
{
  "queryType": "timeseries", // query type
  "dataSource": "sample_datasource", // data source to query (similar to a Hive table)
  "granularity": "day", // granularity at which results are aggregated
  "descending": "true", // whether to return results in descending order
  "filter": { // filter definition
    "type": "and", // logical operator; and, or, and not are supported
    "fields": [
      { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" }, // selector, equivalent to WHERE sample_dimension1 = 'sample_value1'
      { "type": "or", // logical operator
        "fields": [
          { "type": "columnComparison", "dimensions": ["sample_dimension2", "sample_dimension3"] }, // columnComparison, equivalent to WHERE sample_dimension2 = sample_dimension3
          { "type": "columnComparison", "dimensions": ["sample_dimension4", "sample_dimension5"] }
        ]
      }
    ]
  },
  "aggregations": [ // aggregation definitions
    { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" }, // long, double, and float types are supported, as are min, max, and other aggregators
    { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
  ],
  "postAggregations": [ // post-aggregation definitions
    { "type": "arithmetic", // arithmetic post-aggregator
      "name": "sample_divide", // output name
      "fn": "/", // function; +, -, *, / are supported
      "fields": [
        { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" }, // name is the output name; fieldName refers to the name of an entry in aggregations
        { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ] // time range to query
}

TopN query
{
  "queryType": "topN", // query type
  "dataSource": "sample_data", // data source to query (similar to a Hive table)
  "dimension": "sample_dim", // dimension to rank
  "threshold": 5, // number of results to return
  "metric": "count", // metric to rank by
  "granularity": "all", // query granularity
  "filter": { // filter definition
    "type": "and", // logical operator
    "fields": [ // filter conditions
      {
        "type": "selector",
        "dimension": "dim1",
        "value": "some_value"
      },
      {
        "type": "selector",
        "dimension": "dim2",
        "value": "some_other_val"
      }
    ]
  },
  "aggregations": [ // aggregation definitions
    {
      "type": "longSum", // long, double, and float types are supported, as are min, max, and other aggregators
      "name": "count",
      "fieldName": "count"
    },
    {
      "type": "doubleSum",
      "name": "some_metric",
      "fieldName": "some_metric"
    }
  ],
  "postAggregations": [ // post-aggregation definitions
    {
      "type": "arithmetic", // arithmetic post-aggregator
      "name": "average", // output name
      "fn": "/", // function; +, -, *, / are supported
      "fields": [
        {
          "type": "fieldAccess",
          "name": "some_metric",
          "fieldName": "some_metric"
        },
        {
          "type": "fieldAccess",
          "name": "count",
          "fieldName": "count"
        }
      ]
    }
  ],
  "intervals": [
    "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000" // time range to query
  ]
}
groupBy query
{
  "queryType": "groupBy", // query type
  "dataSource": "sample_datasource", // data source to query (similar to a Hive table)
  "granularity": "day", // query granularity
  "dimensions": ["country", "device"], // dimensions to group by
  "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] }, // order by country and data_transfer, return at most 5000 rows
  "filter": { // filter definition
    "type": "and", // logical operator
    "fields": [
      { "type": "selector", "dimension": "carrier", "value": "AT&T" }, // selector, equivalent to WHERE carrier = 'AT&T'
      { "type": "or", // logical operator
        "fields": [
          { "type": "selector", "dimension": "make", "value": "Apple" },
          { "type": "selector", "dimension": "make", "value": "Samsung" }
        ]
      }
    ]
  },
  "aggregations": [ // aggregation definitions
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }, // long, double, and float types are supported, as are min, max, and other aggregators
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "postAggregations": [ // post-aggregation definitions
    { "type": "arithmetic", // arithmetic post-aggregator
      "name": "avg_usage", // output name
      "fn": "/", // function; +, -, *, / are supported
      "fields": [
        { "type": "fieldAccess", "fieldName": "data_transfer" }, // fieldName refers to the name of an entry in aggregations
        { "type": "fieldAccess", "fieldName": "total_usage" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ], // time range to query
  "having": { // having clause
    "type": "greaterThan", // supported types include filter, equalTo, greaterThan, and lessThan
    "aggregation": "total_usage", // refers to the name of an entry in aggregations
    "value": 100
  }
}
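
The having clause is applied to the grouped rows after aggregation, much like HAVING in SQL. The Python sketch below models only the three numeric comparison types; the function apply_having, the lookup table HAVING_OPS, and the grouped rows are hypothetical and exist purely for illustration.

```python
import operator

# Comparison for each numeric having type covered by this sketch.
HAVING_OPS = {
    "greaterThan": operator.gt,
    "lessThan": operator.lt,
    "equalTo": operator.eq,
}


def apply_having(rows, having):
    """Keep only grouped rows whose aggregated value passes the having clause."""
    compare = HAVING_OPS[having["type"]]
    return [
        row for row in rows
        if compare(row[having["aggregation"]], having["value"])
    ]


# Hypothetical output of the grouping and aggregation step.
grouped = [
    {"country": "US", "device": "phone", "total_usage": 250},
    {"country": "DE", "device": "tablet", "total_usage": 80},
]
kept = apply_having(
    grouped, {"type": "greaterThan", "aggregation": "total_usage", "value": 100}
)
print(kept)
```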
Querying with SQL through PlyQL
Basic usage
Command: curl -L -H 'Content-Type: application/json' -XPOST --data-binary '{"sql":"select itemId,count(userId) as totalUser from hdp_lbg_jyfz_userbehavior where __time>=\"2019-08-05T12:00:00.000+08:00\" and __time<=\"2019-08-13T12:00:00.000+08:00\" group by itemId order by totalUser desc limit 10"}' 10.135.8.229:8083/plyql
Result:
{
  "result": {
    "keys": ["behavior", "time"],
    "attributes": [{
      "name": "time",
      "type": "NUMBER"
    }, {
      "name": "behavior",
      "type": "STRING"
    }, {
      "name": "count(userId)",
      "type": "NUMBER"
    }],
    "data": [{
      "count(userId)": 4,
      "time": 1,
      "behavior": "pv"
    }, {
      "count(userId)": 2,
      "time": 2,
      "behavior": "pv"
    }, {
      "count(userId)": 5,
      "time": 3,
      "behavior": "pv"
    }, {
      "count(userId)": 11,
      "time": 5,
      "behavior": "pv"
    }, {
      "count(userId)": 4,
      "time": 9,
      "behavior": "pv"
    }]
  }
}
TIME_FLOOR function
Command: curl -L -H 'Content-Type: application/json' -XPOST --data-binary '{"sql":"select TIME_FLOOR(__time,P1D, \"Etc/UTC\") as time ,behavior,count(userId) from hdp_lbg_jyfz_userbehavior where __time>=\"2019-08-10T12:00:00.000+08:00\" and __time<=\"2019-08-16T12:00:00.000+08:00\" and categoryId=\"1000959\" group by 1,2"}' 10.135.8.229:8083/plyql
Result:
{
  "result": {
    "keys": ["behavior", "time"],
    "attributes": [{
      "name": "time",
      "type": "TIME"
    }, {
      "name": "behavior",
      "type": "STRING"
    }, {
      "name": "count(userId)",
      "type": "NUMBER"
    }],
    "data": [{
      "behavior": "pv",
      "count(userId)": 4,
      "time": "2019-08-12T00:00:00.000Z"
    }, {
      "behavior": "pv",
      "count(userId)": 11,
      "time": "2019-08-14T00:00:00.000Z"
    }, {
      "behavior": "pv",
      "count(userId)": 11,
      "time": "2019-08-15T00:00:00.000Z"
    }]
  }
}
TIME_PART function
Command: curl -L -H 'Content-Type: application/json' -XPOST --data-binary '{"sql":"select TIME_PART(__time, HOUR_OF_DAY, \"Etc/UTC\") as time ,behavior,count(userId) from hdp_lbg_jyfz_userbehavior where __time>=\"2019-08-10T12:00:00.000+08:00\" and __time<=\"2019-08-16T12:00:00.000+08:00\" and categoryId=\"1000959\" group by 1,2"}' 10.135.8.229:8083/plyql
Result:
{
  "result": {
    "keys": ["behavior", "time"],
    "attributes": [{
      "name": "time",
      "type": "NUMBER"
    }, {
      "name": "behavior",
      "type": "STRING"
    }, {
      "name": "count(userId)",
      "type": "NUMBER"
    }],
    "data": [{
      "count(userId)": 4,
      "time": 1,
      "behavior": "pv"
    }, {
      "count(userId)": 2,
      "time": 2,
      "behavior": "pv"
    }, {
      "count(userId)": 5,
      "time": 3,
      "behavior": "pv"
    }, {
      "count(userId)": 11,
      "time": 5,
      "behavior": "pv"
    }, {
      "count(userId)": 4,
      "time": 9,
      "behavior": "pv"
    }]
  }
}

TIME_BUCKET function
Command: curl -L -H 'Content-Type: application/json' -XPOST --data-binary '{"sql":"select TIME_BUCKET(__time,PT12H,\"Etc/UTC\") as time ,behavior,count(userId) from hdp_lbg_jyfz_userbehavior where __time>=\"2019-08-10T12:00:00.000+08:00\" and __time<=\"2019-08-16T12:00:00.000+08:00\" and categoryId=\"1000959\" group by 1,2"}' 10.135.8.229:8083/plyql
Result:
{
  "result": {
    "keys": ["behavior", "time"],
    "attributes": [{
      "name": "time",
      "type": "TIME_RANGE"
    }, {
      "name": "behavior",
      "type": "STRING"
    }, {
      "name": "count(userId)",
      "type": "NUMBER"
    }],
    "data": [{
      "behavior": "pv",
      "count(userId)": 4,
      "time": {
        "start": "2019-08-12T00:00:00.000Z",
        "end": "2019-08-12T12:00:00.000Z"
      }
    }, {
      "behavior": "pv",
      "count(userId)": 11,
      "time": {
        "start": "2019-08-14T00:00:00.000Z",
        "end": "2019-08-14T12:00:00.000Z"
      }
    }, {
      "behavior": "pv",
      "count(userId)": 11,
      "time": {
        "start": "2019-08-15T00:00:00.000Z",
        "end": "2019-08-15T12:00:00.000Z"
      }
    }]
  }
}
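
The difference between the TIME_FLOOR and TIME_BUCKET results above (a single timestamp versus a start/end range) can be sketched in Python. This is an approximation under explicit assumptions: the functions time_floor and time_bucket are hypothetical stand-ins, the timezone is fixed to UTC, and periods are fixed-length timedelta values (so P1D is treated as exactly 24 hours), which is not how PlyQL itself is implemented.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_floor(ts, period):
    """Round a UTC timestamp down to the start of its fixed-length period."""
    whole_periods = (ts - EPOCH) // period   # integer number of elapsed periods
    return EPOCH + whole_periods * period


def time_bucket(ts, period):
    """Return the half-open range [start, end) of the period containing ts."""
    start = time_floor(ts, period)
    return {"start": start, "end": start + period}


ts = datetime(2019, 8, 12, 7, 30, tzinfo=timezone.utc)   # illustrative timestamp
day_start = time_floor(ts, timedelta(days=1))            # like TIME_FLOOR(__time, P1D, "Etc/UTC")
half_day = time_bucket(ts, timedelta(hours=12))          # like TIME_BUCKET(__time, PT12H, "Etc/UTC")
print(day_start.isoformat())
print(half_day["start"].isoformat(), half_day["end"].isoformat())
```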