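I. Common operations
1.1 Establishing a connection
from elasticsearch import Elasticsearch

# Replace hostname, port (an integer), username, and password with your own values.
es = Elasticsearch([{"host": "hostname", "port": port}],
                   http_auth=("username", "password"))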
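To keep credentials out of the source file, the connection arguments can be read from the environment. The following is a minimal sketch, not a definitive setup: the variable names `ES_HOST`, `ES_PORT`, `ES_USER`, and `ES_PASSWORD` and the helper `es_connection_kwargs` are hypothetical, and the `http_auth` argument assumes a pre-8.0 Python client like the one used above.

```python
import os


def es_connection_kwargs(env=os.environ):
    """Build keyword arguments for Elasticsearch(...) from environment variables.

    The variable names are illustrative assumptions, not an ES convention.
    """
    host = env.get("ES_HOST", "localhost")
    port = int(env.get("ES_PORT", "9200"))
    kwargs = {"hosts": [{"host": host, "port": port}]}
    user, password = env.get("ES_USER"), env.get("ES_PASSWORD")
    if user and password:
        # Only add authentication when both parts are present.
        kwargs["http_auth"] = (user, password)
    return kwargs


# Usage (requires the elasticsearch package and a reachable cluster):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(**es_connection_kwargs())
# print(es.ping())  # True if the cluster responds
```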
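1.2 Inserting data into ES
def add_one_index(body):
    # Index one document into test_index; ES generates the document id.
    es.index(index="test_index", doc_type="_doc", body=body)

# Insert a single document
add_one_index({'rate_type': 'report_user_rate', 'rate_key': '张三', 'rate_value': 0.23})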
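Inserting documents one at a time costs one request per document. For larger batches, the client's `helpers.bulk` function can be used instead. This is a rough sketch under assumptions: the helper names `build_bulk_actions` and `add_many` are hypothetical, and the `_type` key matches the `doc_type="_doc"` usage above, which only applies to older ES versions.

```python
def build_bulk_actions(docs, index="test_index", doc_type="_doc"):
    """Yield one bulk action per document, in the dict format helpers.bulk accepts."""
    for doc in docs:
        # "_type" mirrors doc_type="_doc" above; drop it on clusters without mapping types.
        yield {"_index": index, "_type": doc_type, "_source": doc}


def add_many(es, docs, index="test_index"):
    """Index the documents through the bulk API; returns (success_count, errors)."""
    # Imported lazily so build_bulk_actions works without the package installed.
    from elasticsearch import helpers
    return helpers.bulk(es, build_bulk_actions(docs, index=index))


# Usage (requires a connected client `es`):
# add_many(es, [
#     {'rate_type': 'report_user_rate', 'rate_key': 'a', 'rate_value': 0.23},
#     {'rate_type': 'report_user_rate', 'rate_key': 'b', 'rate_value': 0.41},
# ])
```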
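1.3 Retrieving all data from ES
def print_all():
    # The four bodies below are alternatives: each assignment overwrites the
    # previous one, so keep only the query you need.

    # Fetch all documents
    body = {
        "query": {
            "match_all": {}
        }
    }
    # Exact match (term): the value is compared as-is, without analysis
    body = {
        "query": {
            "term": {
                "rate_type": "report_user_rate"
            }
        }
    }
    # Full-text match: the query text is analyzed (tokenized) before matching
    body = {
        "query": {
            "match": {
                "sku_name": "卡通可爱包包汽车挂件萌琪琪钥匙扣小礼品创意手机 可妮兔蒙奇奇+白+红白铃"
            }
        }
    }
    # match_phrase: the query text is still analyzed, but the resulting terms
    # must appear adjacent and in the same order in the document
    body = {
        "query": {
            "match_phrase": {
                "sku_name": "卡通可爱包包汽车挂件萌琪琪钥匙扣小礼品创意手机 可妮兔蒙奇奇+白+红白铃"
            }
        }
    }
    # index is the index name, which can be chosen freely when inserting data.
    # Note: search returns at most 10 hits by default; pass size=... for more.
    result = es.search(index="index_name", doc_type="_doc", body=body)
    rate_item_list = []
    for item in result['hits']['hits']:
        rate_item = item['_source']
        rate_item_list.append(rate_item)
    print(rate_item_list)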
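Since the four query bodies share one shape, they can be produced by a small builder, and collecting the hits can be split out so it is testable without a cluster. This is only a sketch; `build_query` and `extract_sources` are hypothetical helper names, not part of the elasticsearch client.

```python
def build_query(kind, field=None, value=None):
    """Return a search body for one of: match_all, term, match, match_phrase."""
    if kind == "match_all":
        return {"query": {"match_all": {}}}
    if kind in ("term", "match", "match_phrase"):
        return {"query": {kind: {field: value}}}
    raise ValueError("unsupported query kind: %s" % kind)


def extract_sources(result):
    """Collect the _source dict of every hit in a search response."""
    return [hit["_source"] for hit in result["hits"]["hits"]]


# Usage (requires a connected client `es`):
# body = build_query("term", "rate_type", "report_user_rate")
# result = es.search(index="test_index", body=body, size=100)
# print(extract_sources(result))
```

Passing `size` explicitly avoids silently receiving only the default first 10 hits.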
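1.4 Deleting data from ES
# match_all selects every document, so this empties the index
body = {
    "query": {
        "match_all": {}
    }
}
es.delete_by_query(index='test_index', body=body, doc_type='_doc')
# If the call times out on a large index, raise request_timeout (in seconds)
es.delete_by_query(index='test_index', body=body, doc_type='_doc', request_timeout=600)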
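Deleting with `match_all` removes everything in the index. A narrower delete can reuse the `term` query from section 1.3. This is a minimal sketch, assuming `rate_type` is indexed so that a `term` query matches it; the helper name `delete_by_rate_type` is hypothetical.

```python
def delete_by_rate_type(es, rate_type, index="test_index", timeout=600):
    """Delete only documents whose rate_type equals the given value.

    Returns the number of deleted documents reported by ES.
    """
    body = {"query": {"term": {"rate_type": rate_type}}}
    response = es.delete_by_query(index=index, body=body, request_timeout=timeout)
    # The delete_by_query response carries a "deleted" counter.
    return response["deleted"]


# Usage (requires a connected client `es`):
# print(delete_by_rate_type(es, "report_user_rate"))
```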
1.5 Querying ES information
# print(es.cat.count())  # total number of documents in the cluster
print(es.cat.count(index='index_name'))  # number of documents in the given index
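`es.cat.count` returns plain text rather than a number. The sketch below pulls the count out of that text, assuming the default output format (a single line with the columns epoch, timestamp, count and no header row); `parse_cat_count` is a hypothetical helper name.

```python
def parse_cat_count(text):
    """Return the document count from the plain-text output of es.cat.count()."""
    # Default output is one line of whitespace-separated columns:
    # epoch, timestamp, count. The count is the last column.
    return int(text.split()[-1])


# Usage (requires a connected client `es`):
# total = parse_cat_count(es.cat.count(index="test_index"))
```

If a number is all that is needed, `es.count(index=...)["count"]` returns it from a JSON response and avoids the text parsing.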
II. PySpark
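Writing a DataFrame to ES from PySpark is usually done through the elasticsearch-hadoop connector. The following is a rough sketch of the write options only, assuming the connector jar is already on the Spark classpath; the helper name `es_write_options` and the host and index values are hypothetical, and the `index/type` form of `es.resource` assumes an older ES version that still uses mapping types.

```python
def es_write_options(nodes, port, index, doc_type="_doc", user=None, password=None):
    """Build the option dict for the elasticsearch-hadoop Spark SQL data source."""
    options = {
        "es.nodes": nodes,
        "es.port": str(port),
        # "index/type" form; on clusters without mapping types use the index name alone.
        "es.resource": "%s/%s" % (index, doc_type),
    }
    if user and password:
        options["es.net.http.auth.user"] = user
        options["es.net.http.auth.pass"] = password
    return options


# Usage (needs a SparkSession, a DataFrame `df`, and the connector jar, e.g. via --jars):
# df.write.format("org.elasticsearch.spark.sql") \
#     .options(**es_write_options("hostname", 9200, "test_index")) \
#     .mode("append") \
#     .save()
```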
III. References
https://blog.youkuaiyun.com/qq_55752792/article/details/125430563
https://www.cnblogs.com/ExMan/p/11323984.html (ES queries)
https://www.cnblogs.com/wangshouchang/p/8029825.html (relatively practical)
https://www.cnblogs.com/maoruqiang/p/11509873.html#61–python%E8%BF%9E%E6%8E%A5-es (viewing configuration information)
https://www.jianshu.com/p/3ccd902f0a03 (writing data to ES from PySpark)
https://xcel.me/pyspark-write-to-elasticsearch/ (PySpark writes to ES, including the required packages)
https://blog.youkuaiyun.com/dianxiang0791/article/details/101604611 (configuring the jar package)
https://www.cnblogs.com/Neeo/articles/10788573.html (bulk writes)
https://cuiyonghua.com/2019/11/database/%E5%A4%A7%E6%95%B0%E6%8D%AE/es%E4%BB%8B%E7%BB%8D%E5%8F%8Apython%E6%93%8D%E4%BD%9Ces/ (content seems fairly useful)