Elasticsearch: A Distributed RESTful Search Engine

Official site: https://www.elastic.co/products/elasticsearch

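Elasticsearch is a distributed RESTful search engine built for the cloud. Features include:

  • Distributed and highly available search engine.
    • Each index is fully sharded with a configurable number of shards.
    • Each shard can have one or more replicas.
    • Read/search operations are performed on any of the replica shards.
  • Multi tenant.
    • Support for more than one index.
    • Index-level configuration (number of shards, index storage, ...).
  • Various sets of APIs
    • HTTP RESTful API
    • Native Java API.
    • All APIs perform automatic node operation rerouting.
  • Document oriented
    • No need for an upfront schema definition.
    • A schema can be defined to customize the indexing process.
  • Reliable, asynchronous write-behind for long-term persistence.
  • (Near) real-time search.
  • Built on top of Lucene
    • Each shard is a fully functional Lucene index
    • All the power of Lucene is easily exposed through simple configuration/plugins.
  • Per-operation consistency
    • Single-document-level operations are atomic, consistent, isolated and durable.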

 

Getting Started

First of all, don't panic. It takes about 5 minutes to get the gist of what Elasticsearch is all about.

 

Requirements

You need to have a recent version of Java installed. See the Setup page for more information.

 

Installation

  • Download and unzip the official Elasticsearch distribution.
  • Run bin/elasticsearch on Unix, or bin\elasticsearch.bat on Windows.
  • Run curl -X GET http://localhost:9200/
  • Start more servers...

Indexing

Let's try to index some Twitter-like information. First, let's index some tweets (the twitter index will be created automatically):

```
curl -XPUT 'http://localhost:9200/twitter/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T13:12:00",
    "message": "Trying out Elasticsearch, so far so good?"
}'

curl -XPUT 'http://localhost:9200/twitter/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "Another tweet, will it be indexed?"
}'

curl -XPUT 'http://localhost:9200/twitter/_doc/3?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "elastic",
    "post_date": "2010-01-15T01:46:38",
    "message": "Building the site, should be kewl"
}'

```

Now, let's see if the information was added by GETting it:

curl -XGET 'http://localhost:9200/twitter/_doc/1?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/2?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/3?pretty=true'

 

Searching

Mmm, search... shouldn't it be elastic?
Let's find all the tweets that kimchy posted:

curl -XGET 'http://localhost:9200/twitter/_search?q=user:kimchy&pretty=true'

We can also use the JSON query language Elasticsearch provides instead of a query string:

curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "match" : { "user": "kimchy" }
    }
}'

Let's search all the stored documents (we should see the tweet from elastic as well):

curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

We can also do a range search (post_date was automatically identified as a date):

curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "range" : {
            "post_date" : { "from" : "2009-11-15T13:00:00", "to" : "2009-11-15T14:00:00" }
        }
    }
}'
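
The range request above spells its bounds as from and to. The Elasticsearch range query also documents the bounds gte, gt, lte and lt, and the from/to spelling is an older form that may not be accepted by every release. The following is a minimal sketch of the same search body written with gte/lte. The helper name range_query is hypothetical, and the host, port and index are the ones assumed throughout this post.

```shell
# Hypothetical helper: print a range query body for one field.
# $1 = field name, $2 = inclusive lower bound, $3 = inclusive upper bound
range_query() {
  printf '{"query":{"range":{"%s":{"gte":"%s","lte":"%s"}}}}\n' "$1" "$2" "$3"
}

body=$(range_query post_date 2009-11-15T13:00:00 2009-11-15T14:00:00)
echo "$body"

# To send it to a running node (assumed to listen on localhost:9200):
# curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' \
#   -H 'Content-Type: application/json' -d "$body"
```

Of the three tweets indexed earlier, only the one posted at 13:12:00 falls inside this window.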

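There are many more options for performing a search; after all, it's a search product, no? All the familiar Lucene queries are available through the JSON query language or through the query parser.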

 

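Multi Tenant - Indices and Types

Man, that twitter index might get big (in this case, index size == valuation). Let's see if we can structure our twitter system a bit differently in order to support such large amounts of data.

Elasticsearch supports multiple indices. In the previous example we used a single index called twitter that stored the tweets of every user.

Another way to define our simple twitter system is to give each user a separate index (note, though, that each index has an overhead). Here are the indexing curl commands for that case: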

curl -XPUT 'http://localhost:9200/kimchy/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T13:12:00",
    "message": "Trying out Elasticsearch, so far so good?"
}'

curl -XPUT 'http://localhost:9200/kimchy/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "Another tweet, will it be indexed?"
}'

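The above will index information into the kimchy index. Each user will get their own special index.

Complete control on the index level is allowed. As an example, in the above case, we would want to change from the default 5 shards with 1 replica per index to only 1 shard with 1 replica per index (== per twitter user). Here is how this can be done (the configuration can be in yaml as well):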

curl -XPUT 'http://localhost:9200/another_user?pretty' -H 'Content-Type: application/json' -d '
{
    "index" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 1
    }
}'
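
The request above passes the index block at the top level of the body. The create-index API also accepts index settings under a settings key. Whether the bare form is accepted depends on the Elasticsearch version, so treat the following as a sketch rather than the required syntax. The helper name index_settings is hypothetical.

```shell
# Hypothetical helper: print a create-index body with explicit settings.
# $1 = number of primary shards, $2 = number of replicas per primary
index_settings() {
  printf '{"settings":{"index":{"number_of_shards":%d,"number_of_replicas":%d}}}\n' "$1" "$2"
}

index_settings 1 1

# Usage against a running node (assumed at localhost:9200):
# curl -XPUT 'http://localhost:9200/another_user?pretty' \
#   -H 'Content-Type: application/json' -d "$(index_settings 1 1)"
```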

Search (and similar operations) are multi-index aware. This means that we can easily search on more than one index (twitter user), for example:

curl -XGET 'http://localhost:9200/kimchy,another_user/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

Or on all the indices:

curl -XGET 'http://localhost:9200/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "match_all" : {}
    }
}'
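
The two requests above differ only in the path: a comma-separated list of index names before _search, or no index name at all. A small sketch of how that URL can be assembled from a list of users follows. The function name search_url is hypothetical, and the host and port are the ones assumed throughout this post.

```shell
# Hypothetical helper: print the search URL for the given index names.
# With no arguments it prints the all-indices form.
search_url() {
  joined=""
  for name in "$@"; do
    # Append a comma only when something is already in the list.
    joined="${joined:+$joined,}$name"
  done
  printf 'http://localhost:9200/%s_search?pretty=true\n' "${joined:+$joined/}"
}

search_url kimchy another_user
search_url
```

The first call prints the kimchy,another_user URL used above, and the second prints the /_search URL that covers every index.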

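{One liner teaser}: And the cool part about that? You can easily search on multiple twitter users (indices), with a different boost level per user (index), making social search so much simpler (results from my friends rank higher than results from friends of my friends).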
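
One way to express per-index boosts is the indices_boost option of the search request body. This is a minimal sketch, assuming the array-of-objects form of that option. The boost values, the helper name boosted_search_body, and the choice of kimchy as the closer friend are illustrative assumptions, not part of the original text.

```shell
# Hypothetical helper: print a match_all search body that boosts two indices.
# $1 = first index, $2 = its boost, $3 = second index, $4 = its boost
boosted_search_body() {
  printf '{"indices_boost":[{"%s":%s},{"%s":%s}],"query":{"match_all":{}}}\n' "$1" "$2" "$3" "$4"
}

boosted_search_body kimchy 2.0 another_user 1.0

# Usage against a running node (assumed at localhost:9200):
# curl -XGET 'http://localhost:9200/kimchy,another_user/_search?pretty=true' \
#   -H 'Content-Type: application/json' \
#   -d "$(boosted_search_body kimchy 2.0 another_user 1.0)"
```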

 

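Distributed, Highly Available

Let's face it, things will fail...

Elasticsearch is a highly available and distributed search engine. Each index is broken down into shards, and each shard can have one or more replicas. By default, an index is created with 5 shards and 1 replica per shard (5/1). Many topologies can be used, including 1/10 (improves search performance) or 20/1 (improves indexing performance, with search executed in a map-reduce fashion across shards).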
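
The shards/replicas notation translates into a count of shard copies the cluster has to place: every primary shard plus its replicas. Below is a small worked calculation for the three topologies named above (5/1, 1/10 and 20/1). The function name total_shard_copies is hypothetical.

```shell
# total copies = primaries * (1 + replicas per primary)
# $1 = number of primary shards, $2 = replicas per primary
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

total_shard_copies 5 1    # default 5/1  -> 10
total_shard_copies 1 10   # 1/10         -> 11
total_shard_copies 20 1   # 20/1         -> 40
```

With 1/10, eleven copies of the same data can serve reads, which is why that layout favours search. With 20/1, writes are spread over twenty primaries, which is why it favours indexing.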

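In order to play with the distributed nature of Elasticsearch, simply bring more nodes up and shut nodes down. The system will continue to serve requests with the latest data indexed (make sure you use the correct HTTP port).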

 

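More Information

We have just covered a very small portion of what Elasticsearch is all about. For more information, please refer to the elastic.co website. General questions can be asked on the Elastic Discourse forum or on IRC on Freenode at #elasticsearch. The Elasticsearch GitHub repository is reserved for bug reports and feature requests only.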

 

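Building from Source

Elasticsearch uses Gradle for its build system.

To create a distribution, simply run the ./gradlew assemble command in the cloned directory.

The distribution for each project will be created under the build/distributions directory in that project.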
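
Because each project gets its own build/distributions directory, the build output ends up spread across the source tree. One way to collect the paths after a build is sketched below. The function name list_distributions is hypothetical, and the sketch assumes only the directory naming described above.

```shell
# Hypothetical helper: list every file that sits under some
# build/distributions directory below the directory given as $1.
list_distributions() {
  find "$1" -type f -path '*/build/distributions/*' | sort
}

# Run from the cloned directory after ./gradlew assemble has finished.
list_distributions .
```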

See the TESTING file for more information about running the Elasticsearch test suite.

 

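Upgrading from Older Elasticsearch Versions

To ensure a smooth upgrade from earlier versions of Elasticsearch, please see our upgrade documentation for details on the upgrade process.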

 

Quick Start with Docker

```

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --restart always elasticsearch:6.5.4

```

 

Quick Start with Vagrant / Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :

# This Vagrantfile exists to test packaging. Read more about its use in the
# vagrant section in TESTING.asciidoc.

# Licensed to Elasticsearch under one or more contributor
# license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright
# ownership. Elasticsearch licenses this file to you under
# the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

define_opts = {
  autostart: false
}.freeze

Vagrant.configure(2) do |config|

  config.vm.provider 'virtualbox' do |vbox|
    # Give the box more memory and cpu because our tests are beasts!
    vbox.memory = Integer(ENV['VAGRANT_MEMORY'] || 8192)
    vbox.cpus = Integer(ENV['VAGRANT_CPUS'] || 4)

    # see https://github.com/hashicorp/vagrant/issues/9524
    vbox.customize ["modifyvm", :id, "--audio", "none"]
  end

  # Switch the default share for the project root from /vagrant to
  # /elasticsearch because /vagrant is confusing when there is a project inside
  # the elasticsearch project called vagrant....
  config.vm.synced_folder '.', '/vagrant', disabled: true
  config.vm.synced_folder '.', '/elasticsearch'

  # Expose project directory. Note that VAGRANT_CWD may not be the same as Dir.pwd
  PROJECT_DIR = ENV['VAGRANT_PROJECT_DIR'] || Dir.pwd
  config.vm.synced_folder PROJECT_DIR, '/project'

  'ubuntu-1404'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/ubuntu-14.04-x86_64'
      deb_common config, box
    end
  end
  'ubuntu-1604'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/ubuntu-16.04-x86_64'
      deb_common config, box, extra: <<-SHELL
        # Install Jayatana so we can work around it being present.
        [ -f /usr/share/java/jayatanaag.jar ] || install jayatana
      SHELL
    end
  end
  'ubuntu-1804'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/ubuntu-18.04-x86_64'
      deb_common config, box, extra: <<-SHELL
       # Install Jayatana so we can work around it being present.
       [ -f /usr/share/java/jayatanaag.jar ] || install jayatana
      SHELL
    end
  end
  # Wheezy's backports don't contain Openjdk 8 and the backflips
  # required to get the sun jdk on there just aren't worth it. We have
  # jessie and stretch for testing debian and it works fine.
  'debian-8'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/debian-8-x86_64'
      deb_common config, box
    end
  end
  'debian-9'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/debian-9-x86_64'
      deb_common config, box
    end
  end
  'centos-6'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/centos-6-x86_64'
      rpm_common config, box
    end
  end
  'centos-7'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/centos-7-x86_64'
      rpm_common config, box
    end
  end
  'oel-6'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/oraclelinux-6-x86_64'
      rpm_common config, box
    end
  end
  'oel-7'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/oraclelinux-7-x86_64'
      rpm_common config, box
    end
  end
  'fedora-27'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/fedora-27-x86_64'
      dnf_common config, box
    end
  end
  'fedora-28'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/fedora-28-x86_64'
      dnf_common config, box
    end
  end
  'opensuse-42'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/opensuse-42-x86_64'
      suse_common config, box
    end
  end
  'sles-12'.tap do |box|
    config.vm.define box, define_opts do |config|
      config.vm.box = 'elastic/sles-12-x86_64'
      sles_common config, box
    end
  end

  windows_2012r2_box = ENV['VAGRANT_WINDOWS_2012R2_BOX']
  if windows_2012r2_box && windows_2012r2_box.empty? == false
    'windows-2012r2'.tap do |box|
      config.vm.define box, define_opts do |config|
        config.vm.box = windows_2012r2_box
        windows_common config, box
      end
    end
  end

  windows_2016_box = ENV['VAGRANT_WINDOWS_2016_BOX']
  if windows_2016_box && windows_2016_box.empty? == false
    'windows-2016'.tap do |box|
      config.vm.define box, define_opts do |config|
        config.vm.box = windows_2016_box
        windows_common config, box
      end
    end
  end
end

def deb_common(config, name, extra: '')
  # http://foo-o-rama.com/vagrant--stdin-is-not-a-tty--fix.html
  config.vm.provision 'fix-no-tty', type: 'shell' do |s|
      s.privileged = false
      s.inline = "sudo sed -i '/tty/!s/mesg n/tty -s \\&\\& mesg n/' /root/.profile"
  end
  linux_common(
    config,
    name,
    update_command: 'apt-get update',
    update_tracking_file: '/var/cache/apt/archives/last_update',
    install_command: 'apt-get install -y',
    extra: extra
  )
end

def rpm_common(config, name)
  linux_common(
    config,
    name,
    update_command: 'yum check-update',
    update_tracking_file: '/var/cache/yum/last_update',
    install_command: 'yum install -y'
  )
end

def dnf_common(config, name)
  # Autodetect doesn't work....
  if Vagrant.has_plugin?('vagrant-cachier')
    config.cache.auto_detect = false
    config.cache.enable :generic, { :cache_dir => '/var/cache/dnf' }
  end
  linux_common(
    config,
    name,
    update_command: 'dnf check-update',
    update_tracking_file: '/var/cache/dnf/last_update',
    install_command: 'dnf install -y',
    install_command_retries: 5
  )
end

def suse_common(config, name, extra: '')
  linux_common(
    config,
    name,
    update_command: 'zypper --non-interactive list-updates',
    update_tracking_file: '/var/cache/zypp/packages/last_update',
    install_command: 'zypper --non-interactive --quiet install --no-recommends',
    extra: extra
  )
end

def sles_common(config, name)
  extra = <<-SHELL
    zypper rr systemsmanagement_puppet puppetlabs-pc1
    zypper --non-interactive install git-core
  SHELL
  suse_common config, name, extra: extra
end

# Configuration needed for all linux boxes
# @param config Vagrant's config object. Required.
# @param name [String] The box name. Required.
# @param update_command [String] The command used to update the package
#   manager. Required. Think `apt-get update`.
# @param update_tracking_file [String] The location of the file tracking the
#   last time the update command was run. Required. Should be in a place that
#   is cached by vagrant-cachier.
# @param install_command [String] The command used to install a package.
#   Required. Think `apt-get install #{package}`.
# @param install_command_retries [Integer] Number of times to retry
#   a failed install command
# @param extra [String] Additional script to run before installing
#   dependencies
#
def linux_common(config,
                 name,
                 update_command: 'required',
                 update_tracking_file: 'required',
                 install_command: 'required',
                 install_command_retries: 0,
                 extra: '')

  raise ArgumentError, 'update_command is required' if update_command == 'required'
  raise ArgumentError, 'update_tracking_file is required' if update_tracking_file == 'required'
  raise ArgumentError, 'install_command is required' if install_command == 'required'

  if Vagrant.has_plugin?('vagrant-cachier')
    config.cache.scope = :box
  end

  config.vm.provision 'markerfile', type: 'shell', inline: <<-SHELL
    touch /etc/is_vagrant_vm
    touch /is_vagrant_vm # for consistency between linux and windows
  SHELL

  # This prevents leftovers from previous tests using the
  # same VM from messing up the current test
  config.vm.provision 'clean es installs in tmp', run: 'always', type: 'shell', inline: <<-SHELL
    rm -rf /tmp/elasticsearch*
  SHELL

  sh_set_prompt config, name
  sh_install_deps(
    config,
    update_command,
    update_tracking_file,
    install_command,
    install_command_retries,
    extra
  )
end

# Sets up a consistent prompt for all users. Or tries to. The VM might
# contain overrides for root and vagrant but this attempts to work around
# them by re-source-ing the standard prompt file.
def sh_set_prompt(config, name)
  config.vm.provision 'set prompt', type: 'shell', inline: <<-SHELL
      cat \<\<PROMPT > /etc/profile.d/elasticsearch_prompt.sh
export PS1='#{name}:\\w$ '
PROMPT
      grep 'source /etc/profile.d/elasticsearch_prompt.sh' ~/.bashrc |
        cat \<\<SOURCE_PROMPT >> ~/.bashrc
# Replace the standard prompt with a consistent one
source /etc/profile.d/elasticsearch_prompt.sh
SOURCE_PROMPT
      grep 'source /etc/profile.d/elasticsearch_prompt.sh' ~vagrant/.bashrc |
        cat \<\<SOURCE_PROMPT >> ~vagrant/.bashrc
# Replace the standard prompt with a consistent one
source /etc/profile.d/elasticsearch_prompt.sh
SOURCE_PROMPT
  SHELL
end

def sh_install_deps(config,
                    update_command,
                    update_tracking_file,
                    install_command,
                    install_command_retries,
                    extra)
  config.vm.provision 'install dependencies', type: 'shell', inline:  <<-SHELL
    set -e
    set -o pipefail
    # Retry install command up to $2 times, if failed
    retry_installcommand() {
      n=0
      while true; do
        #{install_command} $1 && break
        let n=n+1
        if [ $n -ge $2 ]; then
          echo "==> Exhausted retries to install $1"
          return 1
        fi
        echo "==> Retrying installing $1, attempt $((n+1))"
        # Add a small delay to increase chance of metalink providing updated list of mirrors
        sleep 5
      done
    }
    installed() {
      command -v $1 >/dev/null 2>&1
    }
    install() {
      # Only apt-get update if we haven't in the last day
      if [ ! -f #{update_tracking_file} ] || [ "x$(find #{update_tracking_file} -mtime +0)" == "x#{update_tracking_file}" ]; then
        echo "==> Updating repository"
        #{update_command} || true
        touch #{update_tracking_file}
      fi
      echo "==> Installing $1"
      if [ #{install_command_retries} -eq 0 ]
      then
        #{install_command} $1
      else
        retry_installcommand $1 #{install_command_retries}
      fi
    }
    ensure() {
      installed $1 || install $1
    }
    #{extra}
    installed java || {
      echo "==> Java is not installed"
      exit 1
    }
    ensure tar
    ensure curl
    ensure unzip
    ensure rsync
    installed bats || {
      # Bats lives in a git repository....
      ensure git
      echo "==> Installing bats"
      git clone https://github.com/sstephenson/bats /tmp/bats
      # Centos doesn't add /usr/local/bin to the path....
      /tmp/bats/install.sh /usr
      rm -rf /tmp/bats
    }
    cat \<\<VARS > /etc/profile.d/elasticsearch_vars.sh
export ZIP=/elasticsearch/distribution/zip/build/distributions
export TAR=/elasticsearch/distribution/tar/build/distributions
export RPM=/elasticsearch/distribution/rpm/build/distributions
export DEB=/elasticsearch/distribution/deb/build/distributions
export BATS=/project/build/bats
export BATS_UTILS=/project/build/packaging/bats/utils
export BATS_TESTS=/project/build/packaging/bats/tests
export PACKAGING_ARCHIVES=/project/build/packaging/archives
export PACKAGING_TESTS=/project/build/packaging/tests
VARS
    cat \<\<SUDOERS_VARS > /etc/sudoers.d/elasticsearch_vars
Defaults   env_keep += "ZIP"
Defaults   env_keep += "TAR"
Defaults   env_keep += "RPM"
Defaults   env_keep += "DEB"
Defaults   env_keep += "BATS"
Defaults   env_keep += "BATS_UTILS"
Defaults   env_keep += "BATS_TESTS"
Defaults   env_keep += "PACKAGING_ARCHIVES"
Defaults   env_keep += "PACKAGING_TESTS"
SUDOERS_VARS
    chmod 0440 /etc/sudoers.d/elasticsearch_vars
  SHELL
end

def windows_common(config, name)
  config.vm.provision 'markerfile', type: 'shell', inline: <<-SHELL
    $ErrorActionPreference = "Stop"
    New-Item C:/is_vagrant_vm -ItemType file -Force | Out-Null
  SHELL

  config.vm.provision 'set prompt', type: 'shell', inline: <<-SHELL
    $ErrorActionPreference = "Stop"
    $ps_prompt = 'function Prompt { "#{name}:$($ExecutionContext.SessionState.Path.CurrentLocation)>" }'
    $ps_prompt | Out-File $PsHome/Microsoft.PowerShell_profile.ps1
  SHELL

  config.vm.provision 'set env variables', type: 'shell', inline: <<-SHELL
    $ErrorActionPreference = "Stop"
    [Environment]::SetEnvironmentVariable("PACKAGING_ARCHIVES", "C:/project/build/packaging/archives", "Machine")
    [Environment]::SetEnvironmentVariable("PACKAGING_TESTS", "C:/project/build/packaging/tests", "Machine")
  SHELL
end

 

Reposted from: https://github.com/YunWisdom/elasticsearch
