    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "宝儿姐",
          "tags" : [
            "长生",
            "超能力",
            "道法"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "愚者",
          "tags" : [
            "魔法",
            "超能力",
            "塔罗"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "春生",
          "tags" : [
            "能力者",
            "男",
            "暗"
          ]
        }
      }
    ]
  }
}
##### 2.7.4 Sorting: sort
`order` takes `desc` (descending) or `asc` (ascending).
GET /first/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["age", "name"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "宝儿姐",
          "age" : 18
        },
        "sort" : [
          18
        ]
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "春生",
          "age" : 18
        },
        "sort" : [
          18
        ]
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "愚者",
          "age" : 22
        },
        "sort" : [
          22
        ]
      }
    ]
  }
}
##### 2.7.5 Pagination: from and size
GET /first/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["age", "name"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ],
  "from": 0,  # offset of the first hit to return (0-based)
  "size": 1   # number of hits to return
}
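`from` is a 0-based offset, so page *n* (counting from 1) with page size *s* starts at `(n - 1) * s`. The helper below builds the `from`/`size` part of a request body. It is only a sketch: the name `page_params` is hypothetical, and the 10,000 cap mirrors the default `index.max_result_window`.

```python
from typing import Dict


def page_params(page: int, page_size: int, max_window: int = 10_000) -> Dict[str, int]:
    """Return the from/size pair for a 1-based page number.

    max_window mirrors the default index.max_result_window; Elasticsearch
    rejects requests whose from + size goes beyond it.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    if offset + page_size > max_window:
        raise ValueError(
            f"from + size = {offset + page_size} exceeds {max_window}; use search_after"
        )
    return {"from": offset, "size": page_size}


# Same request as above, asking for the second hit only
body = {
    "query": {"match_all": {}},
    "_source": ["age", "name"],
    "sort": [{"age": {"order": "asc"}}],
    **page_params(page=2, page_size=1),
}
```

Raising early on the client gives a clearer error than the server's "Result window is too large" message.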
##### 2.7.6 Boolean queries: bool
###### must
SQL equivalent: "select age,name from first where from='gu' and age=18"
GET /first/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "from": "gu"
        }},
        {"match": {
          "age": "18"
        }}
      ]
    }
  },
  "_source": ["age", "name"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "春生",
          "age" : 18
        },
        "sort" : [
          18
        ]
      }
    ]
  }
}
###### should
SQL equivalent: "select age,name,from from first where from='gu' **or** age=18"
GET /first/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "from": "gu"
        }},
        {"match": {
          "age": "18"
        }}
      ]
    }
  },
  "_source": ["age", "name", "from"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}
###### must\_not
SQL equivalent: "select age,name,from from first where from!='gu' **and** age!=22"
GET /first/_search
{
  "query": {
    "bool": {
      "must_not": [
        {"match": {
          "from": "gu"
        }},
        {"match": {
          "age": "22"
        }}
      ]
    }
  },
  "_source": ["age", "name", "from"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}
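###### filter: filtering
Clauses under `filter` must match, just like `must`, but they do not contribute to the relevance score and can be cached. Range bounds are written with `range`:
* gt: greater than
* gte: greater than or equal to
* lt: less than
* lte: less than or equal to
SQL equivalent: "select age,name,from from first where from='gu' **and** age>=18 and age<=20"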
GET /first/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "from": "gu"
        }}
      ],
      "filter": [
        {"range": {
          "age": {
            "gte": 18,
            "lte": 20
          }
        }}
      ]
    }
  },
  "_source": ["age", "name", "from"],
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}
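The four clause types combine as follows:

* Every `must` and `filter` clause has to match.
* No `must_not` clause may match.
* When a bool query has only `should` clauses, at least one of them has to match.

The plain-Python sketch below replays the four examples above on the three sample documents. It is an illustration under simplifying assumptions. Values are compared exactly, with no analyzer. Scoring is ignored. The article never shows 宝儿姐's `from` value, so that document simply has no `from` field here.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Doc = Dict[str, Any]
Clause = Callable[[Doc], bool]

# The three sample documents, reconstructed from the results in this article.
DOCS: List[Doc] = [
    {"_id": "1", "name": "春生", "age": 18, "from": "gu"},
    {"_id": "2", "name": "宝儿姐", "age": 18},  # "from" not shown in the article
    {"_id": "3", "name": "愚者", "age": 22, "from": "gu"},
]


def match(field: str, value: Any) -> Clause:
    """Exact-value stand-in for a match clause (no analyzer, no scoring)."""
    return lambda doc: doc.get(field) == value


def range_(field: str, gte: Optional[float] = None, lte: Optional[float] = None) -> Clause:
    """Stand-in for a range clause with inclusive bounds; a missing field never matches."""
    def check(doc: Doc) -> bool:
        v = doc.get(field)
        if v is None:
            return False
        return (gte is None or v >= gte) and (lte is None or v <= lte)
    return check


def bool_query(docs: Iterable[Doc], must: Iterable[Clause] = (), should: Iterable[Clause] = (),
               must_not: Iterable[Clause] = (), filter_: Iterable[Clause] = ()) -> List[str]:
    must, should, must_not, filter_ = list(must), list(should), list(must_not), list(filter_)
    hits = []
    for doc in docs:
        if not all(c(doc) for c in must + filter_):
            continue
        if any(c(doc) for c in must_not):
            continue
        # With no must/filter clauses, at least one should clause has to match.
        if should and not (must or filter_) and not any(c(doc) for c in should):
            continue
        hits.append(doc["_id"])
    return hits


must_hits = bool_query(DOCS, must=[match("from", "gu"), match("age", 18)])
should_hits = bool_query(DOCS, should=[match("from", "gu"), match("age", 18)])
must_not_hits = bool_query(DOCS, must_not=[match("from", "gu"), match("age", 22)])
filter_hits = bool_query(DOCS, must=[match("from", "gu")], filter_=[range_("age", gte=18, lte=20)])
```

The `must` case returns only 春生, which agrees with the result shown for the `must` query.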
##### 2.7.7 Full-text and phrase search (also works on array fields)
###### Loose search: match
GET /first/_search
{
  "query": {
    "match": {
      "tags": "暗 魔"  # separated by spaces; a document matches if any term matches
    }
  }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0732633,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "1",
        "_score" : 1.0732633,
        "_source" : {
          "name" : "春生",
          "age" : 18,
          "from" : "gu",
          "desc" : "念能力,学生,暗属性",
          "tags" : [
            "能力者",
            "男",
            "暗"
          ]
        }
      },
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : 0.9403362,
        "_source" : {
          "name" : "愚者",
          "age" : 22,
          "from" : "gu",
          "desc" : "塔罗",
          "tags" : [
            "魔法",
            "超能力",
            "塔罗"
          ]
        }
      }
    ]
  }
}
###### Exact phrase: match_phrase
GET /first/_search
{
  "query": {
    "match_phrase": {
      "tags": "魔法"
    }
  }
}
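##### 2.7.8 Term-level queries
A `term` query looks up the given term directly in the inverted index; in other words, it is an exact match.
Differences between term and match:
* `match` is analyzed. The query text is first run through the field's analyzer, the same kind of processing the documents went through at index time. The tokens this produces are then matched, so the outcome depends on which analyzer is used.
* `term` is not analyzed. The value is looked up in the inverted index exactly as given.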
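The sketch below shows why `match`, `match_phrase` and `term` behave differently on `tags`. It imitates how the default `standard` analyzer treats Chinese text: each CJK character becomes its own token, and ASCII words are lowercased. It is an approximation written for illustration, not Elasticsearch's tokenizer. It assumes `tags` is a `text` field that uses the default analyzer. An index configured with a Chinese analyzer such as IK would tokenize differently.

```python
import re
from typing import Dict, List


def analyze(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: one token per CJK character,
    lowercased runs of letters/digits otherwise."""
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)]


# tags of the three sample documents; each array element is analyzed separately
TAGS: Dict[str, List[str]] = {
    "1": ["能力者", "男", "暗"],
    "2": ["长生", "超能力", "道法"],
    "3": ["魔法", "超能力", "塔罗"],
}


def match(query: str) -> List[str]:
    """match: analyze the query; a doc hits if ANY token occurs (default operator OR)."""
    q = set(analyze(query))
    return [i for i, tags in TAGS.items() if any(q & set(analyze(t)) for t in tags)]


def match_phrase(query: str) -> List[str]:
    """match_phrase: the query tokens must appear consecutively inside one element."""
    q = analyze(query)

    def contains(tokens: List[str]) -> bool:
        return any(tokens[k:k + len(q)] == q for k in range(len(tokens) - len(q) + 1))

    return [i for i, tags in TAGS.items() if any(contains(analyze(t)) for t in tags)]


def term(value: str) -> List[str]:
    """term: the value is NOT analyzed; it must equal one indexed token."""
    return [i for i, tags in TAGS.items() if any(value in analyze(t) for t in tags)]
```

Under these assumptions, `match("暗 魔")` returns documents 1 and 3, the same two hits as the result above. A plain `match("魔法")` would also return 宝儿姐, because her tag 道法 contains the token 法. `term("魔法")` finds nothing on an analyzed field, because the index holds only 魔 and 法. For whole-value lookups, a `keyword` sub-field (for example `tags.keyword`, if the mapping has one) is the usual target.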
###### 2.7.8.1 Field existence: exists
GET /first/_search
{
  "query": {
    "exists": {
      "field": "from"
    }
  }
}
###### 2.7.8.2 ID lookup: ids
`ids` matches documents by their `_id`.
GET /first/_search
{
  "query": {
    "ids": {
      "values": [3, 1]
    }
  }
}
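###### 2.7.8.3 Prefix: prefix
Finds documents in which a field contains a term that starts with the given prefix.
GET /first/_search
{
  "query": {
    "prefix": {
      "desc": {
        "value": "道"
      }
    }
  }
}
SQL equivalent (roughly): select * from first where desc like '道%'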
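###### 2.7.8.4 Single term: term
This is the most common term-level query: it matches one exact indexed term. On an analyzed `text` field the value has to equal a single indexed token. With the default analyzer, a multi-character Chinese word may therefore need to be looked up in the `keyword` sub-field instead.
GET /first/_search
{
  "query": {
    "term": {
      "tags": "长生"
    }
  }
}
SQL equivalent: select * from first where '长生' in tags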
###### 2.7.8.5 Multiple terms: terms
Matches any of several terms; the values are combined with OR.
GET /test-dsl-term-level/_search
{
  "query": {
    "terms": {
      "programming_languages": ["php", "c++"]
    }
  }
}
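The term-level queries `exists`, `ids` and `terms` reduce to simple checks on stored values. The sketch below spells them out over plain dicts. The documents and their `programming_languages` values are hypothetical. Real `term` and `terms` queries compare against indexed terms, so on analyzed `text` fields they see the analyzer's tokens rather than the original strings.

```python
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]


def values_of(doc: Doc, field: str) -> List[Any]:
    """Treat a single value and an array of values the same way."""
    v = doc.get(field)
    if v is None:
        return []
    return v if isinstance(v, list) else [v]


def exists(doc: Doc, field: str) -> bool:
    """exists: the field has at least one non-null value (null and [] count as missing)."""
    return any(x is not None for x in values_of(doc, field))


def ids(doc: Doc, values: Iterable[Any]) -> bool:
    """ids: _id is a string, so numeric ids in the query are compared as strings."""
    return doc["_id"] in {str(v) for v in values}


def terms(doc: Doc, field: str, values: Iterable[Any]) -> bool:
    """terms: OR semantics, any listed value present in the field."""
    wanted = set(values)
    return any(x in wanted for x in values_of(doc, field))


# Hypothetical documents
DOCS: List[Doc] = [
    {"_id": "1", "programming_languages": ["java", "php"]},
    {"_id": "2", "programming_languages": ["python"]},
    {"_id": "3", "programming_languages": []},
]

terms_hits = [d["_id"] for d in DOCS if terms(d, "programming_languages", ["php", "c++"])]
ids_hits = [d["_id"] for d in DOCS if ids(d, [3, 1])]
exists_hits = [d["_id"] for d in DOCS if exists(d, "programming_languages")]
```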
###### 2.7.8.6 Wildcards: wildcard
GET /first/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "儿*",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
SQL equivalent (roughly): select * from first where name like '儿%'
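`wildcard` supports `*` (any run of characters, including none) and `?` (exactly one character). Like `prefix`, it is evaluated against each indexed term rather than against the whole original string. The sketch below translates a wildcard pattern into a Python regex and applies it term by term. The helper names are hypothetical. The term lists assume that `name` is a `text` field split into single characters by the standard analyzer, with a `name.keyword` sub-field that holds the whole value.

```python
import re
from typing import List


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """'*' -> any run of characters, '?' -> exactly one; everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_hit(terms: List[str], pattern: str) -> bool:
    """A document matches if ANY of its indexed terms matches the whole pattern."""
    rx = wildcard_to_regex(pattern)
    return any(rx.fullmatch(t) for t in terms)


def prefix_hit(terms: List[str], prefix: str) -> bool:
    """prefix behaves like wildcard(prefix + '*')."""
    return any(t.startswith(prefix) for t in terms)


text_terms = ["宝", "儿", "姐"]  # name as an analyzed text field (assumed)
keyword_terms = ["宝儿姐"]       # name.keyword: the whole value is a single term
```

On the `text` field, `儿*` matches 宝儿姐 through the single-character term 儿. On the keyword it does not match, because the keyword value starts with 宝. Patterns that begin with `*` (such as `*儿*`) work, but Elasticsearch has to scan many terms to answer them, so they are slow on large indices.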
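###### 2.7.8.7 Fuzzy matching: fuzzy
The official documentation defines fuzziness in terms of edit distance: the number of one-character changes needed to turn one term into another. These changes can be:
* Changing a character (box → fox)
* Removing a character (black → lack)
* Inserting a character (sic → sick)
* Transposing two adjacent characters (act → cat)
GET /first/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "shong"
      }
    }
  }
}
This can match "sheng", which is one edit away.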
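The edit distance described above covers substitution, deletion, insertion and transposition of adjacent characters. The optimal-string-alignment variant of Damerau-Levenshtein distance computes exactly this, as sketched below. `fuzzy` defaults to `fuzziness: AUTO`, which allows:

* 0 edits for terms of length 1 or 2,
* 1 edit for length 3 to 5,
* 2 edits for longer terms.

So `shong` (5 characters) may match a term that is one edit away. This illustrates the rule only; Lucene itself matches with automata, not with this table.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, deletions and
    transpositions of two adjacent characters (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by fuzziness=AUTO with its default bounds (3 and 6)."""
    n = len(term)
    return 0 if n < 3 else 1 if n < 6 else 2


def fuzzy_match(query: str, indexed_term: str) -> bool:
    return osa_distance(query, indexed_term) <= auto_fuzziness(query)
```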
#### 2.8 Highlighting
GET /first/_search
{
  "query": {
    "match_phrase": {
      "tags": "魔法"
    }
  },
  "highlight": {
    "fields": {
      "tags": {}
    }
  }
}
Result:
{
  "took" : 108,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.390936,
    "hits" : [
      {
        "_index" : "first",
        "_type" : "chunsheng",
        "_id" : "3",
        "_score" : 1.390936,
        "_source" : {
          "name" : "愚者",
          "age" : 22,
          "from" : "gu",
          "desc" : "塔罗",
          "tags" : [
            "魔法",
            "超能力",
            "塔罗"
          ]
        },
        "highlight" : {
          "tags" : [
            "魔法"  # the highlighted fragment; by default ES wraps matched terms in <em></em>
          ]
        }
      }
    ]
  }
}
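#### 2.9 Deep pagination
By default, Elasticsearch rejects requests where `from + size` exceeds 10,000; this limit is the `index.max_result_window` setting. Hits beyond that point cannot be reached with `from`/`size`, so deep paging needs `search_after` or the scroll API instead.
ES deep pagination: https://blog.youkuaiyun.com/weixin_44799217/article/details/127100272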
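With `search_after`, you sort on a combination of fields that is unique per document. For the next page, you pass the `sort` values of the last hit of the current page as `search_after`. Each hit carries these values in its `sort` array. The sketch below imitates that loop over an in-memory list so it runs without a cluster. The sample documents and the unique tie-breaker field `id` are assumptions for illustration.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

Doc = Dict[str, Any]


def sort_key(doc: Doc) -> Tuple[int, str]:
    # Sort on age, then on a unique field so that ties have a stable order
    return (doc["age"], doc["id"])


def search_page(docs: List[Doc], size: int,
                search_after: Optional[Tuple[int, str]] = None) -> List[Doc]:
    """One simulated request: sorted hits strictly after the given sort values."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        ordered = [d for d in ordered if sort_key(d) > search_after]
    return ordered[:size]


def scan_all(docs: List[Doc], size: int) -> Iterator[List[Doc]]:
    after = None
    while True:
        page = search_page(docs, size, after)
        if not page:
            return
        yield page
        after = sort_key(page[-1])  # the last hit's "sort" values


docs = [{"id": str(i), "age": age} for i, age in enumerate([18, 22, 18, 30, 18])]
pages = [[d["id"] for d in p] for p in scan_all(docs, size=2)]
```

Unlike `from`/`size`, each request here only needs the next `size` hits after a known position. This is why the approach is not limited by the result window.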
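#### 3.0 Regular expression syntax
Reference: "Regular expression syntax", Elasticsearch Guide [8.7] (elastic.co).
The following optional operators are enabled through the regexp query's `flags` parameter:
* `@` matches any entire string. Combined with the `&` (intersection) and `~` (complement) operators, it creates "everything except" logic: `@&~(...)` matches everything except what the parenthesized expression matches.
* `<>` matches a numeric range, for example:
foo<1-100>   # matches 'foo1', 'foo2' … 'foo99', 'foo100'
foo<01-100>  # matches 'foo01', 'foo02' … 'foo99', 'foo100'
Example: find all external (non-private) IPs with `@&~(...)`: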
GET /_indexs-20230523*/_search
{
  "query": {
    "regexp": {
      "realip": "@&~((192\\.168\\.<0-255>\\.<0-255>)|(10\\…*)|(172\\.<16-31>\\.<0-255>\\.<0-255>))"
    }
  }
}
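The query above keeps `realip` values that fall outside the private ranges 192.168.x.x, 10.x.x.x and 172.16-31.x.x. To check results on the client side, the same test can be written with Python's standard `ipaddress` module, as sketched below. The network list mirrors the regex and is an assumption about what "external" means here. The regex does not exclude other reserved ranges such as 127.0.0.0/8 either.

```python
import ipaddress

# The three private IPv4 blocks that the regexp excludes
PRIVATE_NETS = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),  # 172.16.0.0 - 172.31.255.255
]


def is_external(ip: str) -> bool:
    """True if the address is in none of the private blocks above."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in PRIVATE_NETS)
```

A `regexp` query only works here if `realip` is stored as a keyword (string) field. If the field were mapped as type `ip`, `term` queries with CIDR notation (for example `"192.168.0.0/16"`) inside a `bool` `must_not` would be the usual way to express the same filter.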
#### 3.1 Aggregations
##### 3.1.1 A single aggregation
GET /test-agg-cars/_search
{
  "size" : 0,
  "aggs" : {
    "popular_colors" : {
      "terms" : {
        "field" : "color.keyword"
      }
    }
  }
}
Sample source document:
{
  "_index" : "test-agg-cars",
  "_type" : "_doc",
  "_id" : "W8W6dYgBfCbtsoUlEOxh",
  "_score" : 1.0,
  "_source" : {
    "price" : 30000,
    "color" : "green",
    "make" : "ford",
    "sold" : "2014-05-18"
  }
},
Response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4
        },
        {
          "key" : "blue",
          "doc_count" : 2
        },
        {
          "key" : "green",
          "doc_count" : 2
        }
      ]
    }
  }
}
select color,count(color) from test-agg-cars group by color
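Conceptually, a `terms` aggregation counts documents per distinct value. It returns buckets ordered by `doc_count`, largest first, and ties are broken by key. The sketch below rebuilds the bucket list from the color counts in the response above (4 red, 2 blue, 2 green). It keeps the `size` cutoff, which defaults to 10 buckets. It ignores the per-shard approximation that `doc_count_error_upper_bound` reports.

```python
from collections import Counter
from typing import Dict, Iterable, List, Union


def terms_agg(values: Iterable[str], size: int = 10) -> List[Dict[str, Union[str, int]]]:
    """Bucket documents by value: doc_count descending, then key ascending, top `size`."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": n} for k, n in ordered[:size]]


colors = ["red"] * 4 + ["blue"] * 2 + ["green"] * 2
buckets = terms_agg(colors)
```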
##### 3.1.2 Multiple aggregations
{
  "aggs": {
    "actionflag_info": {
      "terms": {
        "script": {
          "source": "doc['host'].value + ':' + doc['post'].value",
          "lang": "painless"
        },
        "size": 1000
      }
    }
  }
}
Roughly equivalent to: select host + ':' + post from ttt group by host, post
{
  "size" : 0,
  "aggs" : {
    "popular_colors" : {
      "terms" : {
        "field" : "color.keyword"
      }
    },
    "make_by" : {
      "terms" : {
        "field" : "make.keyword"
      }
    }
  }
}
select color,count(color) from test-agg-cars group by color
select make,count(make) from test-agg-cars group by make
  "aggregations" : {
    "popular_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4
        },
        {
          "key" : "blue",
          "doc_count" : 2
        },
        {
          "key" : "green",
          "doc_count" : 2
        }
      ]
    },
    "make_by" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "honda",
          "doc_count" : 3
        },
        {
          "key" : "ford",
          "doc_count" : 2
        },
        {
          "key" : "toyota",
          "doc_count" : 2
        },
        {
          "key" : "bmw",
          "doc_count" : 1
        }
      ]
    }
  }
GET /test-agg-cars/_search
{
  "size" : 0,
  "aggs": {
    "colors": {
      "terms": {
        "field": "color.keyword"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
select color,count(color),avg(price) from test-agg-cars group by color
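A sub-aggregation runs inside each bucket of its parent, so the request above amounts to a group-by followed by an average per group. The sketch below uses hypothetical (color, price) pairs, because the article shows only one of the eight documents.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def avg_price_by_color(rows: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """terms bucket per color, with an avg sub-aggregation over price."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for color, price in rows:
        groups[color].append(price)
    return {
        color: {"doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
        for color, prices in groups.items()
    }


# Hypothetical sample rows (color, price)
rows = [("red", 10000), ("red", 20000), ("green", 30000), ("blue", 15000), ("blue", 25000)]
result = avg_price_by_color(rows)
```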
##### 3.1.3 Filtered aggregation: filter
GET /test-agg-cars/_search
{
  "size": 0,
  "aggs": {
    "make_by": {
      "filter": { "term": { "make.keyword": "honda" } },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
select make,count(make),avg(price) from test-agg-cars where make='honda' group by make
##### 3.1.4 Numeric range buckets: range
GET /test-agg-cars/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 20000 },
          { "from": 20000, "to": 40000 },
          { "from": 40000 }
        ]
      }
    }
  }
}
Roughly equivalent to the following (each range includes `from` and excludes `to`):
select count(*) from test-agg-cars where price < 20000
select count(*) from test-agg-cars where price >= 20000 and price < 40000
select count(*) from test-agg-cars where price >= 40000
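Each range bucket includes its `from` value and excludes its `to` value. That is why the boundary values 20000 and 40000 each land in exactly one bucket. Below is a small sketch of that bucketing, with hypothetical prices.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[Optional[float], Optional[float]]  # (from, to); None means unbounded


def range_agg(values: List[float], ranges: List[Range]) -> List[Dict[str, object]]:
    """Count values per range: 'from' inclusive, 'to' exclusive."""
    buckets = []
    for lo, hi in ranges:
        count = sum(1 for v in values
                    if (lo is None or v >= lo) and (hi is None or v < hi))
        buckets.append({"from": lo, "to": hi, "doc_count": count})
    return buckets


prices = [10000, 20000, 25000, 40000, 80000]  # hypothetical
buckets = range_agg(prices, [(None, 20000), (20000, 40000), (40000, None)])
```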
##### 3.1.5 Aggregating IP fields: ip_range
GET /ip_addresses/_search
{
  "size": 10,
  "aggs": {
    "ip_ranges": {
      "ip_range": {
        "field": "ip",
        "ranges": [
          { "to": "10.0.0.5" },
          { "from": "10.0.0.5" }
        ]
      }
    }
  }
}
##### 3.1.6 **Grouping by CIDR mask**
Ranges can also be given as CIDR masks:
GET /ip_addresses/_search
{
  "size": 0,
  "aggs": {
    "ip_ranges": {
      "ip_range": {
        "field": "ip",
        "ranges": [
          { "mask": "10.0.0.0/25" },
          { "mask": "10.0.0.127/25" }
        ]
      }
    }
  }
}
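A CIDR mask is shorthand for a `from`/`to` pair. `from` is the network address, and `to` is the first address after the block, exclusive as in the other range aggregations. Python's `ipaddress` module shows the arithmetic. It also shows that `10.0.0.127/25` normalizes to the same /25 network as `10.0.0.0/25`, so both buckets in the request above cover 10.0.0.0 to 10.0.0.127. The helper names are hypothetical.

```python
import ipaddress
from typing import Optional, Tuple


def cidr_bounds(mask: str) -> Tuple[str, str]:
    """Return (from, to) for a CIDR block; 'to' is exclusive."""
    net = ipaddress.ip_network(mask, strict=False)  # host bits are ignored
    first = net.network_address
    after_last = net.broadcast_address + 1
    return str(first), str(after_last)


def in_range(ip: str, lo: Optional[str] = None, hi: Optional[str] = None) -> bool:
    """ip_range bucket test: 'from' inclusive, 'to' exclusive."""
    a = ipaddress.ip_address(ip)
    return ((lo is None or a >= ipaddress.ip_address(lo))
            and (hi is None or a < ipaddress.ip_address(hi)))
```

With `{ "to": "10.0.0.5" }` and `{ "from": "10.0.0.5" }`, the address 10.0.0.5 lands only in the second bucket.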
##### 3.1.7 Aggregating dates: date_range
A range aggregation dedicated to date values.
GET /test-agg-cars/_search
{
  "size": 0,
  "aggs": {
    "range": {
      "date_range": {
        "field": "sold",
        "format": "yyyy-MM",
        "ranges": [
          { "from": "2014-01-01" },
          { "to": "2014-12-31" }
        ]
      }
    }
  }
}
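`date_range` follows the same rule: `from` is inclusive and `to` is exclusive. `format` controls how the dates are rendered in the response. The two ranges above overlap, so a sale in 2014 is counted in both buckets. The sketch below counts `sold` dates against those ranges. The first date comes from the sample document and the other two are hypothetical. `%Y-%m` stands in for `yyyy-MM` when the dates are rendered.

```python
from datetime import date
from typing import List, Optional, Tuple


def date_range_counts(dates: List[date],
                      ranges: List[Tuple[Optional[date], Optional[date]]]) -> List[int]:
    """Count dates per range; 'from' inclusive, 'to' exclusive; None = unbounded."""
    return [sum(1 for d in dates if (lo is None or d >= lo) and (hi is None or d < hi))
            for lo, hi in ranges]


sold = [date(2014, 5, 18), date(2013, 11, 2), date(2015, 1, 9)]
ranges = [(date(2014, 1, 1), None), (None, date(2014, 12, 31))]
counts = date_range_counts(sold, ranges)
month_keys = [d.strftime("%Y-%m") for d in sold]  # "yyyy-MM"-style rendering
```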
#### 3.2 Metric aggregations
##### 3.2.1 avg: average
POST /exams/_search?size=0
{
  "aggs": {
    "avg_grade": { "avg": { "field": "grade" } }
  }
}
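## II. The Python elasticsearch module
### 1. Inserting data
#### 1.1 One document per request (costly, not recommended)
from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # adjust to your cluster address
def create_data():
    """Write 100 documents, one request each"""
    for line in range(100):
        # doc_type is only accepted by 6.x/7.x clients; drop it on 8.x
        es.index(index='second', doc_type='doc', body={'title': line})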
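#### 1.2 Bulk inserts
The `helpers` module provides `helpers.bulk`, which writes large amounts of data in batches. First, build every document as a dict (one bulk action each):
from elasticsearch import Elasticsearch
from elasticsearch import helpers
es = Elasticsearch("http://localhost:9200")  # adjust to your cluster address
def batch_data():
    """Write 100 documents with one bulk call"""
    action = [{
        "_index": "second",
        "_type": "doc",  # only for 6.x/7.x clusters; omit on 8.x
        "_source": {
            "title": i
        }
    } for i in range(100)]
    print(action)
    helpers.bulk(es, action)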
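For larger volumes, `helpers.bulk` also accepts a generator instead of a fully built list. The actions are then produced lazily and sent in chunks; `chunk_size` defaults to 500. The sketch below keeps the index name `second` from above and leaves out `_type` (8.x style). The generator name is hypothetical. The generator can be checked without a cluster; the actual `helpers.bulk` call is shown only as a comment.

```python
from typing import Any, Dict, Iterator


def gen_actions(index: str, n: int) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per document instead of building a full list."""
    for i in range(n):
        yield {"_index": index, "_source": {"title": i}}


# With a live cluster (not executed here):
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch("http://localhost:9200")
#   success, errors = helpers.bulk(es, gen_actions("second", 100_000), chunk_size=1000)
```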

