etcd Cluster Deployment Automation: Ansible/Terraform Configuration

etcd: Distributed reliable key-value store for the most critical data of a distributed system. Project address: https://gitcode.com/GitHub_Trending/et/etcd

Overview

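In modern distributed systems, etcd, a highly available key-value store, has become a core component of critical infrastructure such as Kubernetes and other cloud-native platforms. Deploying an etcd cluster by hand is slow and error-prone. This article shows how to automate etcd cluster deployment with Ansible and Terraform so that every deployment is repeatable, consistent, and reliable.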

etcd Cluster Architecture Basics

Cluster Topology

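etcd replicates data with the Raft consensus algorithm. A typical deployment has 3 or 5 members; an odd size is used because Raft needs a majority (quorum) of members to commit a write.

[Figure: cluster topology diagram]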
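The reason for an odd member count is majority arithmetic: with N voting members, floor(N/2)+1 must agree, and the cluster survives N minus that many failures. A small bash sketch of that calculation (the function name quorum_info is only for illustration):

```shell
#!/usr/bin/env bash
# Raft majority arithmetic for an etcd cluster with N voting members:
#   quorum    = floor(N / 2) + 1
#   tolerates = N - quorum  (members that may fail while writes still succeed)
quorum_info() {
  local members=$1
  local quorum=$(( members / 2 + 1 ))
  local tolerates=$(( members - quorum ))
  echo "members=$members quorum=$quorum tolerates=$tolerates"
}

for n in 1 2 3 4 5; do
  quorum_info "$n"
done
```

Both 3 and 4 members tolerate exactly one failure, which is why an even-sized cluster adds cost without adding fault tolerance.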

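Key Configuration Parameters

| Parameter | Description | Example value |
|---|---|---|
| --name | Member name | etcd-node1 |
| --data-dir | Data directory | /var/lib/etcd |
| --listen-client-urls | URLs to listen on for client traffic | http://0.0.0.0:2379 |
| --advertise-client-urls | Client URLs this member advertises | http://<node-ip>:2379 |
| --listen-peer-urls | URLs to listen on for member-to-member (peer) traffic | http://0.0.0.0:2380 |
| --initial-advertise-peer-urls | Peer URLs this member advertises to the other members | http://<node-ip>:2380 |
| --initial-cluster | Initial membership as name=peer URL pairs | node1=http://ip1:2380,node2=http://ip2:2380 |
| --initial-cluster-token | Token that keeps separate clusters apart during bootstrap | etcd-cluster-token |
| --initial-cluster-state | new when bootstrapping, existing when joining a running cluster | new |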
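To see how these flags fit together, here is a bash sketch that prints (rather than runs) the etcd command line for one member, including --initial-advertise-peer-urls and --initial-cluster-state, which a bootstrapping member needs. The function name etcd_command and the example values are illustrative assumptions, and the ports are hard-coded:

```shell
#!/usr/bin/env bash
# Assemble the etcd command line for one member from the key flags.
# Ports 2379 (client) and 2380 (peer) are hard-coded for brevity.
etcd_command() {
  local name=$1 ip=$2 initial_cluster=$3 token=$4
  local args=(
    etcd
    --name "$name"
    --data-dir /var/lib/etcd
    --listen-client-urls http://0.0.0.0:2379
    --advertise-client-urls "http://$ip:2379"
    --listen-peer-urls http://0.0.0.0:2380
    --initial-advertise-peer-urls "http://$ip:2380"
    --initial-cluster "$initial_cluster"
    --initial-cluster-token "$token"
    --initial-cluster-state new
  )
  echo "${args[*]}"
}

etcd_command etcd-node1 192.168.1.101 \
  "etcd-node1=http://192.168.1.101:2380,etcd-node2=http://192.168.1.102:2380,etcd-node3=http://192.168.1.103:2380" \
  etcd-cluster-token
```

Each member gets the same --initial-cluster and token but its own name and advertised URLs; the Ansible and Terraform setups below pass the same settings as ETCD_* environment variables instead of flags.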

Ansible-Based Automated Deployment

Project Structure

etcd-ansible/
├── inventories/
│   ├── production/
│   │   └── hosts
│   └── staging/
│       └── hosts
├── group_vars/
│   ├── all.yml
│   └── etcd_cluster.yml
├── roles/
│   └── etcd/
│       ├── tasks/
│       │   ├── main.yml
│       │   ├── install.yml
│       │   ├── configure.yml
│       │   └── service.yml
│       ├── templates/
│       │   ├── etcd.service.j2
│       │   └── etcd.conf.j2
│       └── handlers/
│           └── main.yml
└── site.yml
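The layout above can be created in one step. A bash sketch (scaffold_etcd_ansible is a hypothetical helper) that creates the empty files and writes an example production inventory; the hosts and IPs are the example values used in this article, and host names are assumed to match the etcd_nodes names:

```shell
#!/usr/bin/env bash
# Create the etcd-ansible directory skeleton under the given root.
scaffold_etcd_ansible() {
  local root=${1:-etcd-ansible}
  mkdir -p "$root"/inventories/{production,staging} "$root"/group_vars \
    "$root"/roles/etcd/{tasks,templates,handlers}
  touch "$root"/inventories/staging/hosts \
    "$root"/group_vars/{all,etcd_cluster}.yml \
    "$root"/roles/etcd/tasks/{main,install,configure,service}.yml \
    "$root"/roles/etcd/templates/etcd.{service,conf}.j2 \
    "$root"/roles/etcd/handlers/main.yml \
    "$root"/site.yml
  # Example inventory: the group name matches group_vars/etcd_cluster.yml and
  # the host names are assumed to equal the names listed in etcd_nodes.
  cat > "$root"/inventories/production/hosts <<'EOF'
[etcd_cluster]
etcd-node1 ansible_host=192.168.1.101
etcd-node2 ansible_host=192.168.1.102
etcd-node3 ansible_host=192.168.1.103
EOF
}
```

Running `scaffold_etcd_ansible etcd-ansible` creates the tree in the current directory; the group name [etcd_cluster] matters because Ansible loads group_vars/etcd_cluster.yml only for hosts in that group.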

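Ansible Playbook Implementation

Global variables (group_vars/all.yml)
# etcd version
# Note: the etcd maintainers advise against v3.5.0 to v3.5.2 in production because
# of a data inconsistency bug fixed in v3.5.3; consider pinning a later 3.5.x release.
etcd_version: "3.5.0"
etcd_download_url: "https://github.com/etcd-io/etcd/releases/download/v{{ etcd_version }}/etcd-v{{ etcd_version }}-linux-amd64.tar.gz"

# System settings
etcd_data_dir: "/var/lib/etcd"
etcd_user: "etcd"
etcd_group: "etcd"

# Network ports
client_port: 2379
peer_port: 2380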

Cluster variables (group_vars/etcd_cluster.yml)
# Cluster settings (applies to hosts in the [etcd_cluster] inventory group)
etcd_cluster_name: "production-etcd-cluster"
etcd_cluster_token: "etcd-cluster-token-2024"

# Members (each name is expected to match the host's inventory name)
etcd_nodes:
  - name: "etcd-node1"
    ip: "192.168.1.101"
    client_url: "http://192.168.1.101:2379"
    peer_url: "http://192.168.1.101:2380"
  - name: "etcd-node2"
    ip: "192.168.1.102"
    client_url: "http://192.168.1.102:2379"
    peer_url: "http://192.168.1.102:2380"
  - name: "etcd-node3"
    ip: "192.168.1.103"
    client_url: "http://192.168.1.103:2379"
    peer_url: "http://192.168.1.103:2380"

# Build the initial-cluster string: name=peer_url pairs joined with commas
etcd_initial_cluster: "{% for node in etcd_nodes %}{{ node.name }}={{ node.peer_url }}{% if not loop.last %},{% endif %}{% endfor %}"
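The Jinja loop joins name=peer_url pairs with commas. The same string can be built in bash, which is handy for checking what the template should render; a minimal sketch in which build_initial_cluster is a hypothetical helper taking name=ip pairs, with PEER_PORT defaulting to 2380:

```shell
#!/usr/bin/env bash
# Build an etcd initial-cluster string from name=ip arguments.
build_initial_cluster() {
  local peer_port=${PEER_PORT:-2380}
  local out="" pair name ip
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    # ${out:+,} adds a comma separator only after the first entry
    out+="${out:+,}$name=http://$ip:$peer_port"
  done
  echo "$out"
}

build_initial_cluster etcd-node1=192.168.1.101 etcd-node2=192.168.1.102 etcd-node3=192.168.1.103
# prints: etcd-node1=http://192.168.1.101:2380,etcd-node2=http://192.168.1.102:2380,etcd-node3=http://192.168.1.103:2380
```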

Install tasks (roles/etcd/tasks/install.yml)
- name: Create the etcd group
  group:
    name: "{{ etcd_group }}"
    system: yes

- name: Create the etcd user
  user:
    name: "{{ etcd_user }}"
    group: "{{ etcd_group }}"
    system: yes
    create_home: no
    shell: /sbin/nologin

- name: Create the data directory
  file:
    path: "{{ etcd_data_dir }}"
    state: directory
    owner: "{{ etcd_user }}"
    group: "{{ etcd_group }}"
    mode: '0700'   # etcd expects a private data directory

- name: Download the etcd release tarball
  get_url:
    url: "{{ etcd_download_url }}"
    dest: "/tmp/etcd-v{{ etcd_version }}-linux-amd64.tar.gz"
    checksum: "sha256:<SHA256 checksum of the release tarball>"

- name: Unpack the etcd binaries
  unarchive:
    src: "/tmp/etcd-v{{ etcd_version }}-linux-amd64.tar.gz"
    dest: "/tmp/"
    remote_src: yes

- name: Install etcd and etcdctl into the system path
  copy:
    src: "/tmp/etcd-v{{ etcd_version }}-linux-amd64/{{ item }}"
    dest: "/usr/local/bin/{{ item }}"
    remote_src: yes   # the files are on the managed host, not the control node
    owner: root
    group: root
    mode: '0755'
  loop:
    - etcd
    - etcdctl
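The checksum placeholder has to be filled from the release's published checksums. One way to check a downloaded tarball by hand is a small bash helper; it assumes the release ships a SHA256SUMS file in `sha256sum` format next to the tarball and that GNU coreutils is installed. The helper name verify_etcd_tarball is hypothetical:

```shell
#!/usr/bin/env bash
# Verify one tarball against the matching line of a SHA256SUMS file.
# DIR must contain both the tarball and SHA256SUMS.
verify_etcd_tarball() {
  local dir=$1 tarball=$2
  (
    cd "$dir" || exit 1
    # Keep only the line whose file name field equals the tarball name;
    # sha256sum -c fails if that line is missing or the hash differs.
    awk -v f="$tarball" '$2 == f' SHA256SUMS | sha256sum -c -
  )
}

# Example (not executed here):
#   VER=3.5.0
#   base="https://github.com/etcd-io/etcd/releases/download/v$VER"
#   curl -fsSLO "$base/etcd-v$VER-linux-amd64.tar.gz"
#   curl -fsSLO "$base/SHA256SUMS"
#   verify_etcd_tarball . "etcd-v$VER-linux-amd64.tar.gz"
```

The hash from the matching SHA256SUMS line is the value that belongs after `sha256:` in the get_url task.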
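
Configuration tasks (roles/etcd/tasks/configure.yml)
- name: Create the etcd configuration directory
  file:
    path: /etc/etcd
    state: directory
    owner: root
    group: root
    mode: '0755'

- name: Render the etcd configuration file
  template:
    src: etcd.conf.j2
    dest: "/etc/etcd/etcd.conf"
    owner: root
    group: root
    mode: '0644'

- name: Render the systemd unit file
  template:
    src: etcd.service.j2
    dest: "/etc/systemd/system/etcd.service"
    owner: root
    group: root
    mode: '0644'

- name: Reload systemd
  systemd:
    daemon_reload: yes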

Systemd unit template (roles/etcd/templates/etcd.service.j2)
{# Look up this host in etcd_nodes (inventory hostnames are assumed to equal the
   etcd_nodes names) so ETCD_NAME and the advertised URLs match the entries in
   ETCD_INITIAL_CLUSTER exactly. #}
{% set node = etcd_nodes | selectattr('name', 'equalto', inventory_hostname) | first %}
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
Type=notify
User={{ etcd_user }}
Group={{ etcd_group }}
Environment=ETCD_DATA_DIR={{ etcd_data_dir }}
Environment=ETCD_NAME={{ node.name }}
Environment=ETCD_LISTEN_PEER_URLS=http://0.0.0.0:{{ peer_port }}
Environment=ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:{{ client_port }}
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS={{ node.peer_url }}
Environment=ETCD_ADVERTISE_CLIENT_URLS={{ node.client_url }}
Environment=ETCD_INITIAL_CLUSTER={{ etcd_initial_cluster }}
Environment=ETCD_INITIAL_CLUSTER_TOKEN={{ etcd_cluster_token }}
Environment=ETCD_INITIAL_CLUSTER_STATE=new

ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
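After rendering, it can be worth confirming on each host that ETCD_NAME and ETCD_INITIAL_ADVERTISE_PEER_URLS together form one entry of ETCD_INITIAL_CLUSTER, since etcd will not bootstrap otherwise. A minimal bash sketch, assuming one `Environment=KEY=value` line per setting and a single peer URL per member; check_unit_consistency is a hypothetical name:

```shell
#!/usr/bin/env bash
# Check that "<ETCD_NAME>=<advertised peer URL>" appears in ETCD_INITIAL_CLUSTER.
check_unit_consistency() {
  local unit=$1 name peer cluster
  name=$(sed -n 's/^Environment=ETCD_NAME=//p' "$unit")
  peer=$(sed -n 's/^Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=//p' "$unit")
  cluster=$(sed -n 's/^Environment=ETCD_INITIAL_CLUSTER=//p' "$unit")
  # Wrap in commas so only whole "name=url" entries can match
  case ",$cluster," in
    *",$name=$peer,"*)
      echo "OK: $name=$peer is listed in ETCD_INITIAL_CLUSTER" ;;
    *)
      echo "MISMATCH: $name=$peer not found in '$cluster'" >&2
      return 1 ;;
  esac
}
```

Usage on a member: `check_unit_consistency /etc/systemd/system/etcd.service`.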

Deployment Workflow

[Figure: deployment workflow diagram]

Terraform Infrastructure-as-Code Approach

Terraform Project Structure

etcd-terraform/
├── main.tf
├── variables.tf
├── outputs.tf
├── modules/
│   └── etcd_instance/
│       ├── main.tf
│       ├── variables.tf
│       ├── outputs.tf
│       └── user_data.sh
└── scripts/
    └── etcd-setup.sh

Terraform Main Configuration (main.tf)

terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.region
}

# Security group for the etcd members
resource "aws_security_group" "etcd_cluster" {
  name_prefix = "etcd-cluster-"
  description = "Security group for etcd cluster"

  ingress {
    from_port   = 2379
    to_port     = 2379
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # restrict to trusted client networks in production
    description = "etcd client communication"
  }

  ingress {
    from_port   = 2380
    to_port     = 2380
    protocol    = "tcp"
    self        = true # peer traffic only between members of this group
    description = "etcd peer communication"
  }

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # restrict to an admin or bastion CIDR in production
    description = "SSH access"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# etcd cluster instances
module "etcd_instances" {
  source = "./modules/etcd_instance"

  count         = var.cluster_size
  name          = "etcd-node-${count.index + 1}"
  instance_type = var.instance_type
  ami           = var.ami_id
  key_name      = var.key_name
  subnet_id     = element(var.subnet_ids, count.index)
  security_group_ids = [aws_security_group.etcd_cluster.id]

  # Member settings
  node_name     = "etcd-node-${count.index + 1}"
  cluster_size  = var.cluster_size
  cluster_token = var.cluster_token
}

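# Load balancer for client traffic. etcd v3 clients use gRPC over HTTP/2, which a
# Classic ELB HTTP listener does not support, so the listener uses plain TCP.
resource "aws_elb" "etcd_lb" {
  name               = "etcd-cluster-lb"
  subnets            = var.subnet_ids
  security_groups    = [aws_security_group.etcd_cluster.id]

  listener {
    instance_port     = 2379
    instance_protocol = "tcp"
    lb_port           = 2379
    lb_protocol       = "tcp"
  }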

  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 3
    target              = "HTTP:2379/health"
    interval            = 30
  }

  instances = module.etcd_instances[*].instance_id
}
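`subnet_id = element(var.subnet_ids, count.index)` spreads the members over the subnets and wraps around when there are more members than subnets. The bash sketch below mimics that wrap-around to show the resulting placement; subnet-a, subnet-b and subnet-c are placeholder IDs, each assumed to sit in a different availability zone:

```shell
#!/usr/bin/env bash
# Mimic Terraform's element(list, index): the index wraps modulo the list length.
subnet_for_node() {
  local index=$1; shift
  local subnets=("$@")
  local n=${#subnets[@]}
  echo "${subnets[$(( index % n ))]}"
}

for i in 0 1 2 3 4; do
  echo "etcd-node-$(( i + 1 )) -> $(subnet_for_node "$i" subnet-a subnet-b subnet-c)"
done
```

With 5 members over 3 zones, two zones host 2 members each; losing one of them leaves 3 of 5, still a quorum. With 3 members over only 2 zones, the zone holding 2 members takes the quorum with it if it fails.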

etcd Instance Module (modules/etcd_instance/main.tf)

resource "aws_instance" "etcd_node" {
  ami                    = var.ami
  instance_type          = var.instance_type
  key_name               = var.key_name
  subnet_id              = var.subnet_id
  vpc_security_group_ids = var.security_group_ids

  tags = {
    Name = var.name
    Role = "etcd"
  }

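  user_data = templatefile("${path.module}/user_data.sh", {
    node_name     = var.node_name
    cluster_size  = var.cluster_size
    cluster_token = var.cluster_token
    # self.private_ip cannot be used here: "self" is not available in ordinary
    # resource arguments, so user_data.sh looks up the private IP at boot instead.
  })

  root_block_device {
    volume_size = 50
    volume_type = "gp3"
  }
}

# Data volume for etcd. user_data.sh does not format or mount it, so it still has
# to be mounted at /var/lib/etcd separately; on Nitro-based instance types the
# /dev/sdf attachment appears as an NVMe device (for example /dev/nvme1n1).
resource "aws_ebs_volume" "etcd_data" {
  availability_zone = aws_instance.etcd_node.availability_zone
  size              = 100
  type              = "gp3"

  tags = {
    Name = "${var.name}-data"
  }
}

resource "aws_volume_attachment" "etcd_data_att" {
  device_name = "/dev/sdf"
  volume_id   = aws_ebs_volume.etcd_data.id
  instance_id = aws_instance.etcd_node.id
}

User Data Script (modules/etcd_instance/user_data.sh)

#!/bin/bash
# This file is rendered by Terraform's templatefile(): node_name, cluster_size and
# cluster_token are template variables, while shell variables are written as
# $VAR without braces so that Terraform leaves them alone.
set -euo pipefail

# Install dependencies (assumes a Debian/Ubuntu AMI)
apt-get update
apt-get install -y curl tar

# Look up this instance's private IP through IMDSv2
TOKEN=$(curl -fsS -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
PRIVATE_IP=$(curl -fsS -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/local-ipv4)

# Download and install etcd
ETCD_VERSION="3.5.0"
ETCD_URL="https://github.com/etcd-io/etcd/releases/download/v$ETCD_VERSION/etcd-v$ETCD_VERSION-linux-amd64.tar.gz"

curl -fsSL "$ETCD_URL" -o /tmp/etcd.tar.gz
tar xzf /tmp/etcd.tar.gz -C /tmp/
cp "/tmp/etcd-v$ETCD_VERSION-linux-amd64/etcd" /usr/local/bin/
cp "/tmp/etcd-v$ETCD_VERSION-linux-amd64/etcdctl" /usr/local/bin/

# Create the etcd user and group
groupadd --system etcd
useradd -r -g etcd -d /var/lib/etcd -s /sbin/nologin etcd

# Create the data directory (etcd expects mode 0700)
mkdir -p /var/lib/etcd
chown etcd:etcd /var/lib/etcd
chmod 700 /var/lib/etcd

# Write the etcd environment file
cat > /etc/etcd.conf <<EOF
ETCD_NAME=${node_name}
ETCD_DATA_DIR=/var/lib/etcd
ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
ETCD_ADVERTISE_CLIENT_URLS=http://$PRIVATE_IP:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://$PRIVATE_IP:2380
ETCD_INITIAL_CLUSTER_TOKEN=${cluster_token}
ETCD_INITIAL_CLUSTER_STATE=new
EOF

# ETCD_INITIAL_CLUSTER is still missing, and etcd will not start until this
# member's name appears in it with its peer URL. Supplying it requires every
# member's address, for example by assigning fixed private IPs (the aws_instance
# private_ip argument) so Terraform can render the full list, by DNS SRV discovery
# (--discovery-srv), or by rendering the configuration with the Ansible role above.

# Create the systemd unit
cat > /etc/systemd/system/etcd.service <<EOF
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
Type=notify
User=etcd
Group=etcd
EnvironmentFile=/etc/etcd.conf
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF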

# Start the etcd service
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
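Because the generated /etc/etcd.conf stays incomplete until ETCD_INITIAL_CLUSTER is supplied, a quick check of the environment file on a node can catch the problem before etcd fails at startup. A bash sketch; check_etcd_env is hypothetical and the key list simply mirrors the settings used in this article:

```shell
#!/usr/bin/env bash
# Report which expected ETCD_* keys are missing from an environment file.
check_etcd_env() {
  local file=$1 key missing=0
  for key in ETCD_NAME ETCD_DATA_DIR ETCD_LISTEN_PEER_URLS ETCD_LISTEN_CLIENT_URLS \
             ETCD_ADVERTISE_CLIENT_URLS ETCD_INITIAL_ADVERTISE_PEER_URLS \
             ETCD_INITIAL_CLUSTER ETCD_INITIAL_CLUSTER_TOKEN ETCD_INITIAL_CLUSTER_STATE; do
    # Anchor on "KEY=" so ETCD_INITIAL_CLUSTER does not match ETCD_INITIAL_CLUSTER_TOKEN
    if ! grep -q "^$key=" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Run as `check_etcd_env /etc/etcd.conf` on a freshly booted instance; with the script above as written it reports ETCD_INITIAL_CLUSTER as missing.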

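Automated Deployment Best Practices

1. Cluster Health Checks

Ansible tasks for verifying cluster health:

- name: Verify etcd cluster health
  shell: |
    etcdctl endpoint health \
      --endpoints={{ etcd_nodes | map(attribute='client_url') | join(',') }}
  register: etcd_health
  changed_when: false
  run_once: true
  # etcdctl exits non-zero if any endpoint is unhealthy; searching stdout for
  # "healthy" would pass as long as a single endpoint answered
  failed_when: etcd_health.rc != 0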

- name: List cluster members
  shell: |
    etcdctl member list \
      --endpoints={{ etcd_nodes | map(attribute='client_url') | join(',') }}
  register: etcd_members
  changed_when: false

- debug:
    msg: "{{ etcd_members.stdout_lines }}"
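Beyond pass or fail, it can be useful to know whether enough members are healthy to keep a quorum. This bash sketch assumes etcdctl's plain-text output, where each healthy endpoint prints a line containing "is healthy" and unhealthy ones print "is unhealthy" (check this against your etcdctl version); count_healthy and check_quorum_healthy are hypothetical names:

```shell
#!/usr/bin/env bash
# Count "<endpoint> is healthy" lines read from stdin.
count_healthy() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the status 0
  grep -c ' is healthy' || true
}

# Succeed if at least floor(N/2)+1 of N members reported healthy.
check_quorum_healthy() {
  local members=$1 healthy quorum
  healthy=$(count_healthy)
  quorum=$(( members / 2 + 1 ))
  echo "healthy=$healthy quorum=$quorum"
  [ "$healthy" -ge "$quorum" ]
}
```

Usage: `etcdctl endpoint health --endpoints=... 2>&1 | check_quorum_healthy 3`, merging stderr so unhealthy lines are read too.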

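2. Backup and Restore Automation

- name: Create an etcd snapshot backup
  shell: |
    set -e
    mkdir -p /backup
    SNAPSHOT=/backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db
    etcdctl snapshot save "$SNAPSHOT" \
      --endpoints={{ etcd_nodes.0.client_url }}
    ln -sf "$SNAPSHOT" /backup/etcd-snapshot-latest.db
  when: inventory_hostname == etcd_nodes.0.name

- name: Verify snapshot integrity
  shell: |
    etcdctl snapshot status /backup/etcd-snapshot-latest.db
  register: snapshot_status
  when: inventory_hostname == etcd_nodes.0.name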
  changed_when: false
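Timestamped snapshots accumulate in /backup, so a retention step usually follows. A bash sketch that keeps the newest KEEP snapshots (KEEP should be at least 1 so etcd-snapshot-latest.db still points at a file); it relies on the etcd-snapshot-YYYYmmdd-HHMMSS.db naming sorting chronologically and leaves etcd-snapshot-latest.db alone. prune_snapshots is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Delete all but the newest KEEP timestamped snapshots in DIR.
prune_snapshots() {
  local dir=$1 keep=$2
  local files=() count i
  # Only regular files named etcd-snapshot-<digits>-<digits>.db, oldest first
  mapfile -t files < <(find "$dir" -maxdepth 1 -type f \
    -name 'etcd-snapshot-[0-9]*-[0-9]*.db' | sort)
  count=${#files[@]}
  for (( i = 0; i < count - keep; i++ )); do
    rm -f -- "${files[i]}"
  done
}
```

For example, `prune_snapshots /backup 7` after each backup keeps a week of daily snapshots.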

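3. Monitoring and Alerting

Prometheus integration: etcd already serves Prometheus metrics at /metrics on its client URLs. To publish them on a dedicated port, the tasks below set --listen-metrics-urls (ETCD_LISTEN_METRICS_URLS) through a systemd drop-in; port 2381 is an example value, and etcd_metrics_enabled is a boolean to define in group_vars. Restart members one at a time (for example with serial: 1 on the play) so the cluster keeps its quorum.

- name: Create the systemd drop-in directory for etcd
  file:
    path: /etc/systemd/system/etcd.service.d
    state: directory
  when: etcd_metrics_enabled

- name: Configure the etcd metrics endpoint
  copy:
    dest: /etc/systemd/system/etcd.service.d/metrics.conf
    content: |
      [Service]
      Environment=ETCD_LISTEN_METRICS_URLS=http://0.0.0.0:2381
  when: etcd_metrics_enabled

- name: Restart etcd to apply the metrics settings
  systemd:
    name: etcd
    state: restarted
    daemon_reload: yes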
  when: etcd_metrics_enabled
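A simple alerting probe can read the metrics endpoint directly. The bash sketch below checks the etcd_server_has_leader gauge (1 when the member sees a leader) in Prometheus text format read from stdin; has_leader is a hypothetical name, and in practice the same metric would usually drive a Prometheus alerting rule:

```shell
#!/usr/bin/env bash
# Succeed only if etcd_server_has_leader is present and equal to 1.
has_leader() {
  awk '$1 == "etcd_server_has_leader" { found = 1; ok = ($2 == 1) }
       END { exit !(found && ok) }'
}
```

Usage: `curl -fsS http://127.0.0.1:2379/metrics | has_leader || echo "no leader"`; comment lines (# HELP, # TYPE) are ignored because their first field is "#".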


Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
