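Elasticsearch
-- A Distributed RESTful Search Engine
Official site: https://www.elastic.co/products/elasticsearch
Elasticsearch is a distributed RESTful search engine built for the cloud. Features include:
- Distributed and highly available search engine.
  - Each index is fully sharded with a configurable number of shards.
  - Each shard can have one or more replicas.
  - Read/search operations are performed on any of the replica shards.
- Multi-tenant.
  - Support for more than one index.
  - Index-level configuration (number of shards, index storage, ...).
- Various sets of APIs.
  - HTTP RESTful API.
  - Native Java API.
  - All APIs perform automatic node operation rerouting.
- Document oriented.
  - No need for an upfront schema definition.
  - A schema can be defined to customize the indexing process.
- Reliable, asynchronous write-behind for long-term persistence.
- (Near) real-time search.
- Built on top of Lucene.
  - Each shard is a fully functional Lucene index.
  - All the power of Lucene is easily exposed through simple configuration and plugins.
- Per-operation consistency.
  - Single-document-level operations are atomic, consistent, isolated and durable.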
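The sharding items in the feature list can be pictured as a hash-and-modulo step: a routing value (by default the document ID) is hashed and mapped onto one of the index's primary shards. The sketch below only illustrates that idea. The helper name `shard_for` is hypothetical, and CRC32 is an assumed stand-in for the hash function, not the one Elasticsearch uses internally.

```ruby
require 'zlib'

# Hypothetical helper: map a routing value (by default the document ID)
# to one of `number_of_shards` primary shards.
# Assumption: CRC32 stands in for the real hash function.
def shard_for(routing, number_of_shards)
  raise ArgumentError, 'number_of_shards must be positive' unless number_of_shards.positive?

  Zlib.crc32(routing.to_s) % number_of_shards
end

# Spread three example document IDs over 5 primary shards.
%w[1 2 3].each do |id|
  puts "document #{id} -> shard #{shard_for(id, 5)}"
end
```

Because the result depends on the shard count, the same ID can land on a different shard if the number of primary shards changes, which is one reason that number is chosen when the index is created.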
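Getting Started
First of all, don't panic. It takes about 5 minutes to get the gist of what Elasticsearch is all about.
Requirements
You need a recent version of Java installed. See the "Setup" page for more information.
Installation
- Download and unzip the official Elasticsearch distribution.
- Run `bin/elasticsearch` on Unix, or `bin\elasticsearch.bat` on Windows.
- Run `curl -X GET http://localhost:9200/`.
- Start more servers...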
Indexing
Let's try to index some Twitter-like information. First, let's index some tweets (the `twitter` index will be created automatically):
```
curl -XPUT 'http://localhost:9200/twitter/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
"user": "kimchy",
"post_date": "2009-11-15T13:12:00",
"message": "Trying out Elasticsearch, so far so good?"
}'
curl -XPUT 'http://localhost:9200/twitter/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
"user": "kimchy",
"post_date": "2009-11-15T14:12:12",
"message": "Another tweet, will it be indexed?"
}'
curl -XPUT 'http://localhost:9200/twitter/_doc/3?pretty' -H 'Content-Type: application/json' -d '
{
"user": "elastic",
"post_date": "2010-01-15T01:46:38",
"message": "Building the site, should be kewl"
}'
```
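The same indexing call can be prepared from a program. The following is a minimal sketch using Ruby's standard `net/http` library. The helper name `index_request` and its `base_url` default are assumptions for illustration, mirroring the URL used in the curl commands. It builds the request without sending it, so it can be inspected without a running node.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) the PUT request that indexes
# `doc` under `id` in `index`. The default base URL is an assumption taken
# from the curl examples.
def index_request(index, id, doc, base_url = 'http://localhost:9200')
  uri = URI("#{base_url}/#{index}/_doc/#{id}")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(doc)
  request
end

tweet = {
  user: 'kimchy',
  post_date: '2009-11-15T13:12:00',
  message: 'Trying out Elasticsearch, so far so good?'
}
request = index_request('twitter', 1, tweet)
puts "#{request.method} #{request.path}"
puts request.body

# To actually send it (requires a node listening on localhost:9200):
#   Net::HTTP.start('localhost', 9200) { |http| puts http.request(request).body }
```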
Now, let's see whether the information was added by GETting it:
curl -XGET 'http://localhost:9200/twitter/_doc/1?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/2?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/3?pretty=true'
Searching
Mmm, search... shouldn't it be elastic?
Let's find all the tweets that `kimchy` posted:
curl -XGET 'http://localhost:9200/twitter/_search?q=user:kimchy&pretty=true'
We can also use the JSON query language that Elasticsearch provides instead of a query string:
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
"query" : {
"match" : { "user": "kimchy" }
}
}'
Let's get all the stored documents (we should see the tweet from `elastic` as well):
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
"query" : {
"match_all" : {}
}
}'
We can also do a range search (`post_date` was automatically identified as a date):
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
"query" : {
"range" : {
"post_date" : { "from" : "2009-11-15T13:00:00", "to" : "2009-11-15T14:00:00" }
}
}
}'
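A range query body like the one above can also be generated rather than typed by hand. This is a minimal sketch. The helper name `date_range_query` is hypothetical, and it assumes inclusive bounds expressed with the `gte`/`lte` parameter names of the range query instead of `from`/`to`.

```ruby
require 'json'

# Hypothetical helper: build a range query body for a date field.
# Assumption: both bounds are inclusive and use the gte/lte parameter names.
def date_range_query(field, from_time, to_time)
  raise ArgumentError, 'from_time must not be after to_time' if from_time > to_time

  format = '%Y-%m-%dT%H:%M:%S'
  {
    query: {
      range: {
        field => { gte: from_time.strftime(format), lte: to_time.strftime(format) }
      }
    }
  }
end

body = date_range_query(
  'post_date',
  Time.utc(2009, 11, 15, 13, 0, 0),
  Time.utc(2009, 11, 15, 14, 0, 0)
)
puts JSON.pretty_generate(body)
```

Of the three sample tweets, only document 1 (posted at 13:12:00) falls inside this one-hour window.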
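There are many more options for performing searches; after all, it's a search product, no? All the familiar Lucene queries are available through the JSON query language or through the query parser.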
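Multi Tenant - Indices and Types
Man, that twitter index might get big (in this case, index size == valuation). Let's see if we can structure our Twitter system a bit differently in order to support such large amounts of data.
Elasticsearch supports multiple indices. In the previous example we used a single index called `twitter` that stored the tweets of every user.
Another way to define our simple Twitter system is to have a different index per user (note, though, that each index has an overhead). Here are the indexing curl commands for this case: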
curl -XPUT 'http://localhost:9200/kimchy/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
"user": "kimchy",
"post_date": "2009-11-15T13:12:00",
"message": "Trying out Elasticsearch, so far so good?"
}'
curl -XPUT 'http://localhost:9200/kimchy/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
"user": "kimchy",
"post_date": "2009-11-15T14:12:12",
"message": "Another tweet, will it be indexed?"
}'
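The above indexes the information into the `kimchy` index. Each user gets their own special index.
Complete control at the index level is allowed. For example, in the case above we would want to change from the default of 5 shards with 1 replica per index to only 1 shard with 1 replica per index (== per Twitter user). Here is how this can be done (the configuration can be in YAML as well):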
curl -XPUT http://localhost:9200/another_user?pretty -H 'Content-Type: application/json' -d '
{
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
}'
Search (and similar operations) are multi-index aware. This means that we can easily search more than one index (Twitter user), for example:
curl -XGET 'http://localhost:9200/kimchy,another_user/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
"query" : {
"match_all" : {}
}
}'
Or on all the indices:
curl -XGET 'http://localhost:9200/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
"query" : {
"match_all" : {}
}
}'
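{One liner teaser}: And the cool part about that? You can easily search across multiple Twitter users (indices), with a different boost level per user (index), making social search so much simpler (results from my friends rank higher than results from friends of my friends).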
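As a rough illustration of per-index boosting, the sketch below builds a multi-index search whose body weights each index differently. The helper name `boosted_search` is hypothetical, and the body assumes the search API's `indices_boost` option in its list-of-objects form; check the reference documentation for the version you run before relying on the exact shape.

```ruby
require 'json'

# Hypothetical helper: build a multi-index search with per-index boosts.
# `boosts` maps index name => boost factor, so that, for example, results
# from my friends rank above results from friends of friends.
# Assumption: the `indices_boost` option accepts a list of
# { index => boost } objects.
def boosted_search(boosts, query = { match_all: {} })
  raise ArgumentError, 'at least one index is required' if boosts.empty?

  {
    path: "/#{boosts.keys.join(',')}/_search",
    body: {
      query: query,
      indices_boost: boosts.map { |index, boost| { index => boost } }
    }
  }
end

search = boosted_search({ 'kimchy' => 2.0, 'another_user' => 1.0 })
puts search[:path]
puts JSON.pretty_generate(search[:body])
```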
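Distributed, Highly Available
Let's face it, things will fail...
Elasticsearch is a highly available, distributed search engine. Each index is broken down into shards, and each shard can have one or more replicas. By default, an index is created with 5 shards and 1 replica per shard (5/1). Many topologies can be used, including 1/10 (improves search performance) or 20/1 (improves indexing performance, with search executed in a map-reduce fashion across shards).
To play with the distributed nature of Elasticsearch, simply bring more nodes up and shut nodes down. The system will continue to serve requests with the latest indexed data (make sure you use the correct HTTP port).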
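To make the shards/replicas notation concrete, the arithmetic can be sketched as follows. The helper name `total_shard_copies` is hypothetical; it simply counts each primary shard plus its replicas.

```ruby
# Hypothetical helper: total shard copies for a "primaries/replicas" topology.
# Each primary shard has `replicas` additional copies.
def total_shard_copies(primaries, replicas)
  raise ArgumentError, 'primaries must be at least 1' if primaries < 1
  raise ArgumentError, 'replicas must not be negative' if replicas.negative?

  primaries * (1 + replicas)
end

# The three topologies mentioned in the text.
[[5, 1], [1, 10], [20, 1]].each do |primaries, replicas|
  total = total_shard_copies(primaries, replicas)
  puts "#{primaries}/#{replicas}: #{total} shard copies in total"
end
```

The default 5/1 layout therefore spreads 10 shard copies over the cluster, 1/10 yields 11 copies of a single shard, and 20/1 yields 40.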
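More Information
We have just covered a very small portion of what Elasticsearch is all about. For more information, please refer to the elastic.co website. General questions can be asked on the Elastic Discourse forum or on IRC on Freenode at #elasticsearch. The Elasticsearch GitHub repository is reserved for bug reports and feature requests only.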
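Building from Source
Elasticsearch uses Gradle as its build system.
To create a distribution, simply run the `./gradlew assemble` command in the cloned directory.
The distribution for each project will be created under the `build/distributions` directory in that project.
See the TESTING file for more information about running the Elasticsearch test suite.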
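Upgrading from Older Elasticsearch Versions
To ensure a smooth upgrade from earlier versions of Elasticsearch, please see our upgrade documentation for more details on the upgrade process.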
Quick Start with Docker
```
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --restart always elasticsearch:6.5.4
```
Quick Start with Vagrant / Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
# This Vagrantfile exists to test packaging. Read more about its use in the
# vagrant section in TESTING.asciidoc.
# Licensed to Elasticsearch under one or more contributor
# license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright
# ownership. Elasticsearch licenses this file to you under
# the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
define_opts = {
autostart: false
}.freeze
Vagrant.configure(2) do |config|
config.vm.provider 'virtualbox' do |vbox|
# Give the box more memory and cpu because our tests are beasts!
vbox.memory = Integer(ENV['VAGRANT_MEMORY'] || 8192)
vbox.cpus = Integer(ENV['VAGRANT_CPUS'] || 4)
# see https://github.com/hashicorp/vagrant/issues/9524
vbox.customize ["modifyvm", :id, "--audio", "none"]
end
# Switch the default share for the project root from /vagrant to
# /elasticsearch because /vagrant is confusing when there is a project inside
# the elasticsearch project called vagrant....
config.vm.synced_folder '.', '/vagrant', disabled: true
config.vm.synced_folder '.', '/elasticsearch'
# Expose project directory. Note that VAGRANT_CWD may not be the same as Dir.pwd
PROJECT_DIR = ENV['VAGRANT_PROJECT_DIR'] || Dir.pwd
config.vm.synced_folder PROJECT_DIR, '/project'
'ubuntu-1404'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/ubuntu-14.04-x86_64'
deb_common config, box
end
end
'ubuntu-1604'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/ubuntu-16.04-x86_64'
deb_common config, box, extra: <<-SHELL
# Install Jayatana so we can work around it being present.
[ -f /usr/share/java/jayatanaag.jar ] || install jayatana
SHELL
end
end
'ubuntu-1804'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/ubuntu-18.04-x86_64'
deb_common config, box, extra: <<-SHELL
# Install Jayatana so we can work around it being present.
[ -f /usr/share/java/jayatanaag.jar ] || install jayatana
SHELL
end
end
# Wheezy's backports don't contain Openjdk 8 and the backflips
# required to get the sun jdk on there just aren't worth it. We have
# jessie and stretch for testing debian and it works fine.
'debian-8'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/debian-8-x86_64'
deb_common config, box
end
end
'debian-9'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/debian-9-x86_64'
deb_common config, box
end
end
'centos-6'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/centos-6-x86_64'
rpm_common config, box
end
end
'centos-7'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/centos-7-x86_64'
rpm_common config, box
end
end
'oel-6'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/oraclelinux-6-x86_64'
rpm_common config, box
end
end
'oel-7'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/oraclelinux-7-x86_64'
rpm_common config, box
end
end
'fedora-27'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/fedora-27-x86_64'
dnf_common config, box
end
end
'fedora-28'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/fedora-28-x86_64'
dnf_common config, box
end
end
'opensuse-42'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/opensuse-42-x86_64'
suse_common config, box
end
end
'sles-12'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = 'elastic/sles-12-x86_64'
sles_common config, box
end
end
windows_2012r2_box = ENV['VAGRANT_WINDOWS_2012R2_BOX']
if windows_2012r2_box && windows_2012r2_box.empty? == false
'windows-2012r2'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = windows_2012r2_box
windows_common config, box
end
end
end
windows_2016_box = ENV['VAGRANT_WINDOWS_2016_BOX']
if windows_2016_box && windows_2016_box.empty? == false
'windows-2016'.tap do |box|
config.vm.define box, define_opts do |config|
config.vm.box = windows_2016_box
windows_common config, box
end
end
end
end
def deb_common(config, name, extra: '')
# http://foo-o-rama.com/vagrant--stdin-is-not-a-tty--fix.html
config.vm.provision 'fix-no-tty', type: 'shell' do |s|
s.privileged = false
s.inline = "sudo sed -i '/tty/!s/mesg n/tty -s \\&\\& mesg n/' /root/.profile"
end
linux_common(
config,
name,
update_command: 'apt-get update',
update_tracking_file: '/var/cache/apt/archives/last_update',
install_command: 'apt-get install -y',
extra: extra
)
end
def rpm_common(config, name)
linux_common(
config,
name,
update_command: 'yum check-update',
update_tracking_file: '/var/cache/yum/last_update',
install_command: 'yum install -y'
)
end
def dnf_common(config, name)
# Autodetect doesn't work....
if Vagrant.has_plugin?('vagrant-cachier')
config.cache.auto_detect = false
config.cache.enable :generic, { :cache_dir => '/var/cache/dnf' }
end
linux_common(
config,
name,
update_command: 'dnf check-update',
update_tracking_file: '/var/cache/dnf/last_update',
install_command: 'dnf install -y',
install_command_retries: 5
)
end
def suse_common(config, name, extra: '')
linux_common(
config,
name,
update_command: 'zypper --non-interactive list-updates',
update_tracking_file: '/var/cache/zypp/packages/last_update',
install_command: 'zypper --non-interactive --quiet install --no-recommends',
extra: extra
)
end
def sles_common(config, name)
extra = <<-SHELL
zypper rr systemsmanagement_puppet puppetlabs-pc1
zypper --non-interactive install git-core
SHELL
suse_common config, name, extra: extra
end
# Configuration needed for all linux boxes
# @param config Vagrant's config object. Required.
# @param name [String] The box name. Required.
# @param update_command [String] The command used to update the package
# manager. Required. Think `apt-get update`.
# @param update_tracking_file [String] The location of the file tracking the
# last time the update command was run. Required. Should be in a place that
# is cached by vagrant-cachier.
# @param install_command [String] The command used to install a package.
# Required. Think `apt-get install #{package}`.
# @param install_command_retries [Integer] Number of times to retry
# a failed install command
# @param extra [String] Additional script to run before installing
# dependencies
#
def linux_common(config,
name,
update_command: 'required',
update_tracking_file: 'required',
install_command: 'required',
install_command_retries: 0,
extra: '')
raise ArgumentError, 'update_command is required' if update_command == 'required'
raise ArgumentError, 'update_tracking_file is required' if update_tracking_file == 'required'
raise ArgumentError, 'install_command is required' if install_command == 'required'
if Vagrant.has_plugin?('vagrant-cachier')
config.cache.scope = :box
end
config.vm.provision 'markerfile', type: 'shell', inline: <<-SHELL
touch /etc/is_vagrant_vm
touch /is_vagrant_vm # for consistency between linux and windows
SHELL
# This prevents leftovers from previous tests using the
# same VM from messing up the current test
config.vm.provision 'clean es installs in tmp', run: 'always', type: 'shell', inline: <<-SHELL
rm -rf /tmp/elasticsearch*
SHELL
sh_set_prompt config, name
sh_install_deps(
config,
update_command,
update_tracking_file,
install_command,
install_command_retries,
extra
)
end
# Sets up a consistent prompt for all users. Or tries to. The VM might
# contain overrides for root and vagrant but this attempts to work around
# them by re-source-ing the standard prompt file.
def sh_set_prompt(config, name)
config.vm.provision 'set prompt', type: 'shell', inline: <<-SHELL
cat \<\<PROMPT > /etc/profile.d/elasticsearch_prompt.sh
export PS1='#{name}:\\w$ '
PROMPT
grep 'source /etc/profile.d/elasticsearch_prompt.sh' ~/.bashrc |
cat \<\<SOURCE_PROMPT >> ~/.bashrc
# Replace the standard prompt with a consistent one
source /etc/profile.d/elasticsearch_prompt.sh
SOURCE_PROMPT
grep 'source /etc/profile.d/elasticsearch_prompt.sh' ~vagrant/.bashrc |
cat \<\<SOURCE_PROMPT >> ~vagrant/.bashrc
# Replace the standard prompt with a consistent one
source /etc/profile.d/elasticsearch_prompt.sh
SOURCE_PROMPT
SHELL
end
def sh_install_deps(config,
update_command,
update_tracking_file,
install_command,
install_command_retries,
extra)
config.vm.provision 'install dependencies', type: 'shell', inline: <<-SHELL
set -e
set -o pipefail
# Retry install command up to $2 times, if failed
retry_installcommand() {
n=0
while true; do
#{install_command} $1 && break
let n=n+1
if [ $n -ge $2 ]; then
echo "==> Exhausted retries to install $1"
return 1
fi
echo "==> Retrying installing $1, attempt $((n+1))"
# Add a small delay to increase chance of metalink providing updated list of mirrors
sleep 5
done
}
installed() {
command -v $1 >/dev/null 2>&1
}
install() {
# Only apt-get update if we haven't in the last day
if [ ! -f #{update_tracking_file} ] || [ "x$(find #{update_tracking_file} -mtime +0)" == "x#{update_tracking_file}" ]; then
echo "==> Updating repository"
#{update_command} || true
touch #{update_tracking_file}
fi
echo "==> Installing $1"
if [ #{install_command_retries} -eq 0 ]
then
#{install_command} $1
else
retry_installcommand $1 #{install_command_retries}
fi
}
ensure() {
installed $1 || install $1
}
#{extra}
installed java || {
echo "==> Java is not installed"
return 1
}
ensure tar
ensure curl
ensure unzip
ensure rsync
installed bats || {
# Bats lives in a git repository....
ensure git
echo "==> Installing bats"
git clone https://github.com/sstephenson/bats /tmp/bats
# Centos doesn't add /usr/local/bin to the path....
/tmp/bats/install.sh /usr
rm -rf /tmp/bats
}
cat \<\<VARS > /etc/profile.d/elasticsearch_vars.sh
export ZIP=/elasticsearch/distribution/zip/build/distributions
export TAR=/elasticsearch/distribution/tar/build/distributions
export RPM=/elasticsearch/distribution/rpm/build/distributions
export DEB=/elasticsearch/distribution/deb/build/distributions
export BATS=/project/build/bats
export BATS_UTILS=/project/build/packaging/bats/utils
export BATS_TESTS=/project/build/packaging/bats/tests
export PACKAGING_ARCHIVES=/project/build/packaging/archives
export PACKAGING_TESTS=/project/build/packaging/tests
VARS
cat \<\<SUDOERS_VARS > /etc/sudoers.d/elasticsearch_vars
Defaults env_keep += "ZIP"
Defaults env_keep += "TAR"
Defaults env_keep += "RPM"
Defaults env_keep += "DEB"
Defaults env_keep += "BATS"
Defaults env_keep += "BATS_UTILS"
Defaults env_keep += "BATS_TESTS"
Defaults env_keep += "PACKAGING_ARCHIVES"
Defaults env_keep += "PACKAGING_TESTS"
SUDOERS_VARS
chmod 0440 /etc/sudoers.d/elasticsearch_vars
SHELL
end
def windows_common(config, name)
config.vm.provision 'markerfile', type: 'shell', inline: <<-SHELL
$ErrorActionPreference = "Stop"
New-Item C:/is_vagrant_vm -ItemType file -Force | Out-Null
SHELL
config.vm.provision 'set prompt', type: 'shell', inline: <<-SHELL
$ErrorActionPreference = "Stop"
$ps_prompt = 'function Prompt { "#{name}:$($ExecutionContext.SessionState.Path.CurrentLocation)>" }'
$ps_prompt | Out-File $PsHome/Microsoft.PowerShell_profile.ps1
SHELL
config.vm.provision 'set env variables', type: 'shell', inline: <<-SHELL
$ErrorActionPreference = "Stop"
[Environment]::SetEnvironmentVariable("PACKAGING_ARCHIVES", "C:/project/build/packaging/archives", "Machine")
[Environment]::SetEnvironmentVariable("PACKAGING_TESTS", "C:/project/build/packaging/tests", "Machine")
SHELL
end
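The retry loop that `retry_installcommand` implements in shell can be sketched in plain Ruby for readers who want to follow its control flow. The name `retry_command` and its `delay` parameter are hypothetical. Unlike the shell function, which sleeps 5 seconds between attempts, the delay here defaults to 0 so the example runs instantly.

```ruby
# Hypothetical Ruby counterpart of the retry_installcommand shell function:
# run the block until it returns a truthy value, giving up after
# `max_attempts` tries. Returns true on success and false otherwise.
def retry_command(max_attempts, delay: 0)
  raise ArgumentError, 'max_attempts must be at least 1' if max_attempts < 1

  attempt = 0
  loop do
    attempt += 1
    return true if yield(attempt)
    return false if attempt >= max_attempts

    sleep(delay)
  end
end

# Simulated installer that fails twice before succeeding.
calls = 0
ok = retry_command(5) do |attempt|
  calls += 1
  attempt >= 3
end
puts "succeeded=#{ok} after #{calls} attempts"
```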