Single attachment:
1. Create the pipeline attachment
Define the text extraction pipeline for a single attachment
PUT /_ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "content",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "content"
      }
    }
  ]
}
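Before indexing real documents, the pipeline can be tried out through the ingest _simulate endpoint (POST /_ingest/pipeline/attachment/_simulate). The following is a minimal Python sketch that only builds the request body; the function name build_simulate_body and the sample bytes are illustrative assumptions, not part of the original setup.

```python
import base64
import json


def build_simulate_body(raw: bytes) -> dict:
    """Build a request body for the pipeline _simulate endpoint.

    The document shape mirrors the single-attachment example: the
    Base64 text goes into the "content" field that the pipeline reads.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return {"docs": [{"_source": {"content": encoded}}]}


if __name__ == "__main__":
    # Hypothetical sample bytes; a real check would use a Word or PDF file.
    print(json.dumps(build_simulate_body(b"hi"), indent=2))
```

The printed JSON can be pasted as the body of the _simulate request to check what the attachment processor extracts.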
2. Create the index
Create the index:
PUT /knowbase
{
  "mappings": {
    "properties": {
      "esId": {
        "type": "keyword"
      },
      "assortId": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "articleContent": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "viewNum": {
        "type": "long"
      },
      "version": {
        "type": "long"
      },
      "label": {
        "type": "keyword"
      },
      "code": {
        "type": "keyword"
      },
      "tenantId": {
        "type": "keyword"
      },
      "releaseTime": {
        "type": "date"
      },
      "createBy": {
        "type": "keyword",
        "index": false
      },
      "createTime": {
        "type": "date"
      },
      "all": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "attachment": {
        "properties": {
          "content": {
            "type": "text",
            "analyzer": "ik_smart",
            "copy_to": "all"
          }
        }
      }
    }
  }
}
3. Index data
Insert a Word document
POST /knowbase/_doc?pipeline=attachment
{
  "name": "知识库文档2.0",
  "type": "word",
  "content": "Base64-encoded document content"
}
Note: to convert a file to Base64, you can use base64.guru.
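The same conversion can be done locally instead of through a website. The following is a minimal sketch using only the Python standard library; the helper name file_to_base64 is an assumption for illustration.

```python
import base64
from pathlib import Path


def file_to_base64(path: str) -> str:
    """Read a file and return its bytes as a Base64-encoded ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Usage (hypothetical file name):
#   encoded = file_to_base64("document.docx")
#   The result is the value to put in the "content" field of the request.
```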
4. Query
Query all documents
GET /knowbase/_search
{
  "query": {
    "match_all": {}
  }
}
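match_all returns every document. Because the mapping copies title, articleContent, and attachment.content into the all field, a keyword search could target that field instead. The following Python sketch builds such a request body; the function name build_all_field_query, the size parameter, and the sample keyword are assumptions for illustration.

```python
import json


def build_all_field_query(keywords: str, size: int = 10) -> dict:
    """Build a search body that matches keywords against the "all" field.

    "all" is the copy_to target defined in the knowbase mapping, so it
    covers the title, the article body, and the extracted attachment text.
    """
    return {
        "size": size,
        "query": {"match": {"all": keywords}},
    }


if __name__ == "__main__":
    # Hypothetical keyword; send the output as the body of GET /knowbase/_search.
    print(json.dumps(build_all_field_query("知识库"), ensure_ascii=False, indent=2))
```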
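Multiple attachments:
1. Create the pipeline attachment
Define the text extraction pipeline for multiple attachments. This request reuses the pipeline name attachment, so it replaces the single-attachment pipeline defined earlier.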
PUT /_ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "field": "_ingest._value.content",
            "target_field": "_ingest._value.attachment",
            "remove_binary": true
          }
        }
      }
    }
  ]
}
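Note that with multiple attachments, field and target_field must be written as _ingest._value.*, because inside a foreach processor the current array element is exposed as _ingest._value; otherwise the processor cannot resolve the correct fields.
Starting with ES 8.0, removing the binary file content only requires setting remove_binary to true on the attachment processor, so a separate remove processor like the one in the single-attachment pipeline is no longer needed.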
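To make the difference between the two cleanup styles concrete, the following Python sketch builds the multi-attachment pipeline definition either with remove_binary or with a separate remove step. The second variant, a second foreach wrapping a remove processor, is an assumption about how the older style could look for arrays; it does not appear in the original setup. The function name is also hypothetical.

```python
import json


def build_multi_attachment_pipeline(use_remove_binary: bool = True) -> dict:
    """Build the body for PUT /_ingest/pipeline/attachment (multi-attachment)."""
    attachment = {
        "field": "_ingest._value.content",
        "target_field": "_ingest._value.attachment",
    }
    if use_remove_binary:
        # The attachment processor drops the Base64 source itself.
        attachment["remove_binary"] = True

    processors = [
        {"foreach": {"field": "attachments", "processor": {"attachment": attachment}}}
    ]
    if not use_remove_binary:
        # Assumed fallback: strip the Base64 text from each array element
        # with a second foreach that wraps a remove processor.
        processors.append(
            {
                "foreach": {
                    "field": "attachments",
                    "processor": {"remove": {"field": "_ingest._value.content"}},
                }
            }
        )
    return {"description": "Extract attachment information", "processors": processors}


if __name__ == "__main__":
    print(json.dumps(build_multi_attachment_pipeline(), indent=2))
```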
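2. Create the index
Create the knowledge base index for multiple attachments. If the knowbase index from the single-attachment example still exists, delete it first (DELETE /knowbase), because this PUT fails on an existing index.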
PUT /knowbase
{
  "mappings": {
    "properties": {
      "esId": {
        "type": "keyword"
      },
      "assortId": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "articleContent": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "viewNum": {
        "type": "long"
      },
      "version": {
        "type": "long"
      },
      "label": {
        "type": "keyword"
      },
      "code": {
        "type": "keyword"
      },
      "tenantId": {
        "type": "keyword"
      },
      "releaseTime": {
        "type": "date"
      },
      "createBy": {
        "type": "keyword",
        "index": false
      },
      "createTime": {
        "type": "date"
      },
      "all": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "attachments": {
        "properties": {
          "attachment": {
            "properties": {
              "content": {
                "type": "text",
                "analyzer": "ik_smart",
                "copy_to": "all"
              }
            }
          }
        }
      }
    }
  }
}
3. Index data
Insert Word documents (multiple attachments)
POST /knowbase/_doc?pipeline=attachment
{
  "name": ["知识库文档2.0", "test知识库文档2.0"],
  "type": "word",
  "attachments": [
    {"content": "Base64-encoded document content"},
    {"content": "Base64-encoded document content"}
  ]
}
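The request body above can also be assembled programmatically. The following Python sketch builds the same document shape from raw file bytes; the function name build_multi_attachment_doc and its parameters are assumptions for illustration.

```python
import base64
import json
from typing import Sequence


def build_multi_attachment_doc(
    names: Sequence[str], doc_type: str, files: Sequence[bytes]
) -> dict:
    """Build a document whose "attachments" array matches the foreach pipeline.

    Each element carries its Base64 text under "content", the field the
    pipeline reads through _ingest._value.content.
    """
    return {
        "name": list(names),
        "type": doc_type,
        "attachments": [
            {"content": base64.b64encode(raw).decode("ascii")} for raw in files
        ],
    }


if __name__ == "__main__":
    # Hypothetical sample bytes standing in for two Word files.
    doc = build_multi_attachment_doc(["a.docx", "b.docx"], "word", [b"hi", b"hi"])
    print(json.dumps(doc, indent=2))
```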
4. Query
Query all documents
GET /knowbase/_search
{
  "query": {
    "match_all": {}
  }
}
Query result:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "knowbase",
        "_id": "KYfN9oUBJjbl-1BDeeXL",
        "_score": 1,
        "_source": {
          "name": [
            "知识库文档2.0",
            "test知识库文档2.0"
          ],
          "attachments": [
            {
              "attachment": {
                "date": "2023-01-28T02:12:00Z",
                "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "author": "tong shaoqing",
                "modifier": "tong shaoqing",
                "modified": "2023-01-28T02:12:00Z",
                "language": "lt",
                "content": """Document content""",
                "content_length": 30
              }
            },
            {
              "attachment": {
                "date": "2023-01-28T02:12:00Z",
                "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "author": "tong shaoqing",
                "modifier": "tong shaoqing",
                "modified": "2023-01-28T02:12:00Z",
                "language": "lt",
                "content": """Document content""",
                "content_length": 30
              }
            }
          ],
          "type": "word"
        }
      }
    ]
  }
}
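In a response of this shape, the extracted text sits under _source.attachments[].attachment.content. The following Python sketch collects it per hit from an already parsed response; the function name extract_attachment_texts and the sample response in the demo are assumptions for illustration.

```python
def extract_attachment_texts(response: dict) -> dict:
    """Map each hit's _id to the list of extracted attachment texts.

    Array elements without an extracted "content" value are skipped.
    """
    texts = {}
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        texts[hit["_id"]] = [
            item["attachment"]["content"]
            for item in source.get("attachments", [])
            if "content" in item.get("attachment", {})
        ]
    return texts


if __name__ == "__main__":
    # Hypothetical, trimmed-down response with the same nesting as above.
    sample = {
        "hits": {
            "hits": [
                {
                    "_id": "doc-1",
                    "_source": {
                        "attachments": [
                            {"attachment": {"content": "first"}},
                            {"attachment": {"content": "second"}},
                        ]
                    },
                }
            ]
        }
    }
    print(extract_attachment_texts(sample))
```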
References:
https://blog.youkuaiyun.com/catoop/article/details/124611260
https://www.jianshu.com/p/774e5ed120ba