Prometheus in Practice 02 - Prometheus Configuration in Detail

Prometheus Configuration in Detail

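Configuration overview

  • Prometheus is configured through command-line flags and a configuration file
    • Command-line flags: set immutable system parameters (such as the storage location and the amount of data to retain)
    • Configuration file: defines everything related to scraping (jobs and instances) and which rule files to load
  • The configuration can be reloaded at runtime, either by sending a SIGHUP signal to the Prometheus process or by sending an HTTP POST request to the /-/reload endpoint (the HTTP endpoint requires the --web.enable-lifecycle flag)
  • If the new configuration is not well-formed, the changes are not applied and Prometheus keeps running with the previous configuration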

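Core command-line parameters (flags)

The main flags, grouped by function:

  1. Basic configuration

    • --config.file: path to the configuration file, default prometheus.yml
    • --config.auto-reload-interval: how often the configuration file is checked for changes and reloaded automatically, default 30s (applies when the auto-reload-config feature is enabled)
    • --web.listen-address: address the UI and API listen on, default 0.0.0.0:9090
    • --web.enable-lifecycle: enables shutdown and reload via HTTP requests
    • --version: shows the application version
    • -h/--help: shows help
  2. Resource settings

    • --auto-gomaxprocs: automatically sets GOMAXPROCS to match the CPU quota, default true
    • --auto-gomemlimit: automatically sets GOMEMLIMIT to match the memory limit, default true
    • --auto-gomemlimit.ratio: fraction of the memory limit to use for GOMEMLIMIT, default 0.9
  3. Storage

    • --storage.tsdb.path: storage path in server mode, default data/
    • --storage.tsdb.retention.time: how long samples are retained, default 15d (server mode)
    • --storage.tsdb.retention.size: maximum number of bytes that storage blocks may occupy (a unit is required)
    • --storage.agent.path: storage path in agent mode, default data-agent/
  4. Queries and rules

    • --query.timeout: query timeout, default 2m (server mode)
    • --query.max-concurrency: maximum number of concurrent queries, default 20 (server mode)
    • --rules.max-concurrent-evals: upper limit on concurrent rule evaluations, default 4 (server mode)
    • --rules.alert.resend-delay: minimum delay before an alert is resent, default 1m (server mode)
  5. Logging

    • --log.level: log level, default info (one of debug, info, warn, error)
    • --log.format: log format, default logfmt (json is also available)
  6. Other important flags

    • --enable-feature: list of features to enable (for example native-histograms)
    • --agent: runs Prometheus in agent mode
    • --alertmanager.notification-queue-capacity: capacity of the alert notification queue, default 10000 (server mode)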
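
To illustrate how several of these flags combine in practice, the fragment below sketches a container deployment. Docker Compose, the prom/prometheus image, the service name, and the container paths are all assumptions made for this example and are not prescribed by this tutorial; only the flags themselves come from the list above.

```yaml
# Hypothetical docker-compose.yml fragment.
# Service name, image, ports, and paths are illustrative assumptions.
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml   # configuration file to load
      - --storage.tsdb.path=/prometheus                # server-mode storage path
      - --storage.tsdb.retention.time=15d              # keep samples for 15 days
      - --web.listen-address=0.0.0.0:9090              # UI and API listen address
      - --web.enable-lifecycle                         # allow HTTP POST to /-/reload
      - --log.level=info

volumes:
  prometheus-data:
```

Because --web.enable-lifecycle is set here, an HTTP POST to /-/reload on port 9090 would trigger a configuration reload without restarting the container.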

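The configuration file in detail

  • Use the --config.file flag to specify which configuration file to load; the file is written in YAML
  • The reference documentation describes each setting with typed placeholders (such as <boolean>, <duration>, and <host>)
  • The main sections are:
    • global: global settings that serve as defaults for the other sections, including the scrape interval, scrape timeout, and rule evaluation interval
    • runtime: sets the GOGC parameter of the Go garbage collector
    • rule_files: list of file paths from which recording rules and alerting rules are loaded
    • scrape_config_files: list of file paths from which additional scrape configurations are loaded
    • scrape_configs: list of scrape configurations that define targets and scrape parameters
    • alerting: settings related to Alertmanager
    • remote_write: settings for the remote write feature
    • otlp: settings for the OTLP receiver feature
    • remote_read: settings for the remote read feature
    • storage: storage-related settings that can be reloaded at runtime
    • tracing: configures the export of traces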
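
As a smaller starting point than the full sample that follows, here is a minimal sketch that uses only four of the sections listed above. The rule file location, job names, and target addresses are placeholders chosen for illustration, not values taken from this tutorial.

```yaml
# Minimal prometheus.yml sketch; file names and targets are placeholders.
global:
  scrape_interval: 15s      # default interval between scrapes
  scrape_timeout: 10s       # must not exceed scrape_interval
  evaluation_interval: 15s  # how often rules are evaluated

rule_files:
  - "rules/*.yml"           # hypothetical rule file location

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: node          # hypothetical job that overrides a global default
    scrape_interval: 30s
    static_configs:
      - targets: ["localhost:9100"]

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
```

Values set inside a job, such as the scrape_interval of the second job, take precedence over the corresponding global defaults. A file like this can be checked before loading with promtool check config.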

Official example configuration

# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 30s
  body_size_limit: 15MB
  sample_limit: 1500
  target_limit: 30
  label_limit: 30
  label_name_length_limit: 200
  label_value_length_limit: 200
  query_log_file: query.log
  scrape_failure_log_file: fail.log
  # scrape_timeout is set to the global default (10s).

  external_labels:
    monitor: codelab
    foo: bar

runtime:
  gogc: 42

rule_files:
  - "first.rules"
  - "my/*.rules"

remote_write:
  - url: http://remote1/push
    name: drop_expensive
    write_relabel_configs:
      - source_labels: [__name__]
        regex: expensive.*
        action: drop
    oauth2:
      client_id: "123"
      client_secret: "456"
      token_url: "http://remote1/auth"
      tls_config:
        cert_file: valid_cert_file
        key_file: valid_key_file

  - url: http://remote2/push
    protobuf_message: io.prometheus.write.v2.Request
    name: rw_tls
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file
    headers:
      name: value

otlp:
  promote_resource_attributes: ["k8s.cluster.name", "k8s.job.name", "k8s.namespace.name"]

remote_read:
  - url: http://remote1/read
    read_recent: true
    name: default
    enable_http2: false
  - url: http://remote3/read
    read_recent: false
    name: read_special
    required_matchers:
      job: special
    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file

scrape_configs:
  - job_name: prometheus

    honor_labels: true
    # scrape_interval is defined by the configured global (15s).
    # scrape_timeout is defined by the global default (10s).

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    fallback_scrape_protocol: PrometheusText0.0.4

    scrape_failure_log_file: fail_prom.log
    file_sd_configs:
      - files:
          - foo/*.slow.json
          - foo/*.slow.yml
          - single/file.yml
        refresh_interval: 10m
      - files:
          - bar/*.yaml

    static_configs:
      - targets: ["localhost:9090", "localhost:9191"]
        labels:
          my: label
          your: label

    http_headers:
      foo:
        values: ["foobar"]
        secrets: ["bar", "foo"]
        files: ["valid_password_file"]

    relabel_configs:
      - source_labels: [job, __meta_dns_name]
        regex: (.*)some-[regex]
        target_label: job
        replacement: foo-${1}
        # action defaults to 'replace'
      - source_labels: [abc]
        target_label: cde
      - replacement: static
        target_label: abc
      - regex:
        replacement: static
        target_label: abc
      - source_labels: [foo]
        target_label: abc
        action: keepequal
      - source_labels: [foo]
        target_label: abc
        action: dropequal

    authorization:
      credentials_file: valid_token_file

    tls_config:
      min_version: TLS10

  - job_name: service-x

    basic_auth:
      username: admin_name
      password: "multiline\nmysecret\ntest"

    scrape_interval: 50s
    scrape_timeout: 5s
    scrape_protocols: ["PrometheusText0.0.4"]

    body_size_limit: 10MB
    sample_limit: 1000
    target_limit: 35
    label_limit: 35
    label_name_length_limit: 210
    label_value_length_limit: 210

    metrics_path: /my_path
    scheme: https

    dns_sd_configs:
      - refresh_interval: 15s
        names:
          - first.dns.address.domain.com
          - second.dns.address.domain.com
      - names:
          - first.dns.address.domain.com

    relabel_configs:
      - source_labels: [job]
        regex: (.*)some-[regex]
        action: drop
      - source_labels: [__address__]
        modulus: 8
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: 1
        action: keep
      - action: labelmap
        regex: 1
      - action: labeldrop
        regex: d
      - action: labelkeep
        regex: k

    metric_relabel_configs:
      - source_labels: [__name__]
        regex: expensive_metric.*
        action: drop

  - job_name: service-y

    consul_sd_configs:
      - server: "localhost:1234"
        token: mysecret
        path_prefix: /consul
        services: ["nginx", "cache", "mysql"]
        tags: ["canary", "v1"]
        node_meta:
          rack: "123"
        allow_stale: true
        scheme: https
        tls_config:
          ca_file: valid_ca_file
          cert_file: valid_cert_file
          key_file: valid_key_file
          insecure_skip_verify: false

    relabel_configs:
      - source_labels: [__meta_sd_consul_tags]
        separator: ","
        regex: label:([^=]+)=([^,]+)
        target_label: ${1}
        replacement: ${2}

  - job_name: service-z

    tls_config:
      cert_file: valid_cert_file
      key_file: valid_key_file

    authorization:
      credentials: mysecret

  - job_name: service-kubernetes

    kubernetes_sd_configs:
      - role: endpoints
        api_server: "https://localhost:1234"
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file

        basic_auth:
          username: "myusername"
          password: "mysecret"

  - job_name: service-kubernetes-namespaces

    kubernetes_sd_configs:
      - role: endpoints
        api_server: "https://localhost:1234"
        namespaces:
          names:
            - default

    basic_auth:
      username: "myusername"
      password_file: valid_password_file

  - job_name: service-kuma

    kuma_sd_configs:
      - server: http://kuma-control-plane.kuma-system.svc:5676
        client_id: main-prometheus

  - job_name: service-marathon
    marathon_sd_configs:
      - servers:
          - "https://marathon.example.com:443"

        auth_token: "mysecret"
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file

  - job_name: service-nomad
    nomad_sd_configs:
      - server: 'http://localhost:4646'

  - job_name: service-ec2
    ec2_sd_configs:
      - region: us-east-1
        access_key: access
        secret_key: mysecret
        profile: profile
        filters:
          - name: tag:environment
            values:
              - prod

          - name: tag:service
            values:
              - web
              - db

  - job_name: service-lightsail
    lightsail_sd_configs:
      - region: us-east-1
        access_key: access
        secret_key: mysecret
        profile: profile

  - job_name: service-azure
    azure_sd_configs:
      - environment: AzurePublicCloud
        authentication_method: OAuth
        subscription_id: 11AAAA11-A11A-111A-A111-1111A1111A11
        resource_group: my-resource-group
        tenant_id: BBBB222B-B2B2-2B22-B222-2BB2222BB2B2
        client_id: 333333CC-3C33-3333-CCC3-33C3CCCCC33C
        client_secret: mysecret
        port: 9100

  - job_name: service-nerve
    nerve_sd_configs:
      - servers:
          - localhost
        paths:
          - /monitoring

  - job_name: 0123service-xxx
    metrics_path: /metrics
    static_configs:
      - targets:
          - localhost:9090

  - job_name: badfederation
    honor_timestamps: false
    metrics_path: /federate
    static_configs:
      - targets:
          - localhost:9090

  - job_name: 測試
    metrics_path: /metrics
    static_configs:
      - targets:
          - localhost:9090

  - job_name: httpsd
    http_sd_configs:
      - url: "http://example.com/prometheus"

  - job_name: service-triton
    triton_sd_configs:
      - account: "testAccount"
        dns_suffix: "triton.example.com"
        endpoint: "triton.example.com"
        port: 9163
        refresh_interval: 1m
        version: 1
        tls_config:
          cert_file: valid_cert_file
          key_file: valid_key_file

  - job_name: digitalocean-droplets
    digitalocean_sd_configs:
      - authorization:
          credentials: abcdef

  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock

  - job_name: dockerswarm
    dockerswarm_sd_configs:
      - host: http://127.0.0.1:2375
        role: nodes

  - job_name: service-openstack
    openstack_sd_configs:
      - role: instance
        region: RegionOne
        port: 80
        refresh_interval: 1m
        tls_config:
          ca_file: valid_ca_file
          cert_file: valid_cert_file
          key_file: valid_key_file

  - job_name: service-puppetdb
    puppetdb_sd_configs:
      - url: https://puppetserver/
        query: 'resources { type = "Package" and title = "httpd" }'
        include_parameters: true
        port: 80
        refresh_interval: 1m
        tls_config:
          ca_file: valid_ca_file
          cert_file: valid_cert_file
          key_file: valid_key_file

  - job_name: hetzner
    relabel_configs:
      - action: uppercase
        source_labels: [instance]
        target_label: instance
    hetzner_sd_configs:
      - role: hcloud
        authorization:
          credentials: abcdef
      - role: robot
        basic_auth:
          username: abcdef
          password: abcdef

  - job_name: service-eureka
    eureka_sd_configs:
      - server: "http://eureka.example.com:8761/eureka"

  - job_name: ovhcloud
    ovhcloud_sd_configs:
      - service: vps
        endpoint: ovh-eu
        application_key: testAppKey
        application_secret: testAppSecret
        consumer_key: testConsumerKey
        refresh_interval: 1m
      - service: dedicated_server
        endpoint: ovh-eu
        application_key: testAppKey
        application_secret: testAppSecret
        consumer_key: testConsumerKey
        refresh_interval: 1m

  - job_name: scaleway
    scaleway_sd_configs:
      - role: instance
        project_id: 11111111-1111-1111-1111-111111111112
        access_key: SCWXXXXXXXXXXXXXXXXX
        secret_key: 11111111-1111-1111-1111-111111111111
      - role: baremetal
        project_id: 11111111-1111-1111-1111-111111111112
        access_key: SCWXXXXXXXXXXXXXXXXX
        secret_key: 11111111-1111-1111-1111-111111111111

  - job_name: linode-instances
    linode_sd_configs:
      - authorization:
          credentials: abcdef

  - job_name: stackit-servers
    stackit_sd_configs:
      - project: 11111111-1111-1111-1111-111111111111
        authorization:
          credentials: abcdef

  - job_name: uyuni
    uyuni_sd_configs:
      - server: https://localhost:1234
        username: gopher
        password: hole

  - job_name: ionos
    ionos_sd_configs:
      - datacenter_id: 8feda53f-15f0-447f-badf-ebe32dad2fc0
        authorization:
          credentials: abcdef

  - job_name: vultr
    vultr_sd_configs:
      - authorization:
          credentials: abcdef

alerting:
  alertmanagers:
    - scheme: https
      static_configs:
        - targets:
            - "1.2.3.4:9093"
            - "1.2.3.5:9093"
            - "1.2.3.6:9093"

storage:
  tsdb:
    out_of_order_time_window: 30m

tracing:
  endpoint: "localhost:4317"
  client_type: "grpc"
  headers:
    foo: "bar"
  timeout: 5s
  compression: "gzip"
  tls_config:
    cert_file: valid_cert_file
    key_file: valid_key_file
    insecure_skip_verify: true