In InfluxDB node_id and shard_id

In InfluxDB, both node_id and shard_id have specific roles, particularly in the context of distributed or clustered setups. Here's a detailed explanation:


1. node_id:

  • The node_id refers to a specific node in an InfluxDB cluster.
  • InfluxDB clusters typically consist of multiple nodes to distribute the workload and provide fault tolerance.
  • Each node in the cluster is assigned a unique node_id.
  • Responsibilities of a node include:
    • Storing and managing a subset of the data.
    • Handling client requests (writes and queries).
    • Communicating with other nodes to ensure consistency and replication.
Example Scenario:

If you see node_id=6 in the log, it means that node 6 in the cluster is handling (or failing to handle) a specific operation. If this node is unavailable or experiencing issues, it could lead to data write failures or delays.


2. shard_id:

  • A shard in InfluxDB is a time-based partition of data within a database.
  • shard_id uniquely identifies a shard in the database. Shards are further divided into series keys, which help organize the data for efficient querying and storage.
  • Shards are created based on the retention policy and the time range of the data.
    • For example, if a retention policy defines a shard duration of 1 week, then a new shard will be created for each week of data.
  • Each shard is assigned a unique ID (shard_id) for identification.
Shard Responsibilities:
  • Store data for a specific time range.
  • Handle queries targeting its time range.
  • Participate in data replication if the database is in a clustered setup.
Example Scenario:

If you see shard_id=1951 in the log, it refers to a specific shard responsible for storing data for a particular time range. If there are issues with this shard (e.g., it’s corrupted or unavailable), operations targeting it may fail.


Relationship Between node_id and shard_id:

  • In a clustered setup:
    • A shard can be replicated across multiple nodes for fault tolerance.
    • Each node is responsible for storing and maintaining certain shards, depending on the cluster configuration.
    • The node_id in the log indicates which node encountered an issue, and the shard_id indicates the specific shard affected.
  • Example:
    If node_id=6 and shard_id=1951, it means that node 6 has a problem managing shard 1951.

Common Issues Related to node_id and shard_id:

  1. Node Unavailability:

    • If a node is down or unreachable, it may result in hinted handoff queues filling up and failing writes.
  2. Shard Corruption or Overload:

    • If the shard identified by shard_id is corrupted, write or query operations may fail.
    • High write volumes targeting a single shard can cause performance bottlenecks.
  3. Cluster Imbalance:

    • If shards are unevenly distributed among nodes, certain nodes might get overloaded.

How to Troubleshoot and Manage Issues:

  1. Check Node Health:

    • Use the influx CLI or monitoring tools to check the status of all nodes in the cluster.
    • Ensure the node with the reported node_id is running and reachable.
  2. Inspect Shard Details:

    • Use the SHOW SHARDS command in InfluxDB to list all shards, their shard_id, time ranges, and owners (nodes responsible for them).
    
    

    sql

    SHOW SHARDS;

  3. Balance the Cluster:

    • Redistribute shards if one node is overloaded.
    • Use tools or features like shard rebalancing if supported.
  4. Repair or Rebuild Shards:

    • If a shard is corrupted, you may need to recreate it from backups or remove it if the data is non-critical.
  5. Check Retention Policy:

    • Verify that your retention policies and shard durations are appropriately configured to avoid excessive shard creation.

Summary:

  • node_id identifies a specific node in the InfluxDB cluster.
  • shard_id identifies a specific time-based data partition within the database.
  • Together, they help pinpoint where issues are occurring in the system, particularly in clustered or distributed environments.
# influxd --help WARN[0000]log.go:228 gosnowflake.(*defaultLogger).Warn DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null. Start up the daemon configured with flags/env vars/config file. The order of precedence for config options are as follows (1 highest, 3 lowest): 1. flags 2. env vars 3. config file A config file can be provided via the INFLUXD_CONFIG_PATH env var. If a file is not provided via an env var, influxd will look in the current directory for a config.{json|toml|yaml|yml} file. If one does not exist, then it will continue unchanged. Usage: influxd [flags] influxd [command] Available Commands: downgrade Downgrade metadata schema used by influxd to match the expectations of an older release help Help about any command inspect Commands for inspecting on-disk database data recovery Commands used to recover / regenerate operator access to the DB run Start the influxd server upgrade Upgrade a 1.x version of InfluxDB version Print the influxd server version Flags: --assets-path string override default assets by serving from a specific directory (developer mode) --bolt-path string path to boltdb database (default "/root/.influxdbv2/influxd.bolt") --e2e-testing add /debug/flush endpoint to clear stores; used for end-to-end tests --engine-path string path to persistent engine files (default "/root/.influxdbv2/engine") --feature-flags stringToString feature flag overrides (default []) --flux-log-enabled enables detailed logging for flux queries --hardening-enabled enable hardening options (disallow private IPs within flux and templates HTTP requests; disable file URLs in templates) -h, --help help for influxd --http-bind-address string bind address for the REST HTTP API (default ":8086") --http-idle-timeout duration max duration the server should keep established connections alive while waiting for new requests. Set to 0 for no timeout (default 3m0s) --http-read-header-timeout duration max duration the server should spend trying to read HTTP headers for new requests. Set to 0 for no timeout (default 10s) --http-read-timeout duration max duration the server should spend trying to read the entirety of new requests. Set to 0 for no timeout --http-write-timeout duration max duration the server should spend on processing+responding to requests. Set to 0 for no timeout --influxql-max-select-buckets int The maximum number of group by time bucket a SELECT can create. A value of zero will max the maximum number of buckets unlimited. --influxql-max-select-point int The maximum number of points a SELECT can process. A value of 0 will make the maximum point count unlimited. This will only be checked every second so queries will not be aborted immediately when hitting the limit. --influxql-max-select-series int The maximum number of series a SELECT can run. A value of 0 will make the maximum series count unlimited. --instance-id string add an instance id for replications to prevent collisions and allow querying by edge node --log-level Log-Level supported log levels are debug, info, and error (default info) --metrics-disabled Don't expose metrics over HTTP at /metrics --no-tasks disables the task scheduler --overwrite-pid-file overwrite PID file if it already exists instead of exiting --pid-file string write process ID to a file --pprof-disabled Don't expose debugging information over HTTP at /debug/pprof --query-concurrency int32 the number of queries that are allowed to execute concurrently. Set to 0 to allow an unlimited number of concurrent queries (default 1024) --query-initial-memory-bytes int the initial number of bytes allocated for a query when it is started. If this is unset, then query-memory-bytes will be used --query-max-memory-bytes int the maximum amount of memory used for queries. Can only be set when query-concurrency is limited. If this is unset, then this number is query-concurrency * query-memory-bytes --query-memory-bytes int maximum number of bytes a query is allowed to use at any given time. This must be greater or equal to query-initial-memory-bytes --query-queue-size int32 the number of queries that are allowed to be awaiting execution before new queries are rejected. Must be > 0 if query-concurrency is not unlimited (default 1024) --reporting-disabled disable sending telemetry data to https://telemetry.influxdata.com every 8 hours --secret-store string data store for secrets (bolt or vault) (default "bolt") --session-length int ttl in minutes for newly created sessions (default 60) --session-renew-disabled disables automatically extending session ttl on request --sqlite-path string path to sqlite database. if not set, sqlite database will be stored in the bolt-path directory as "influxd.sqlite". --storage-cache-max-memory-size Size The maximum size a shard's cache can reach before it starts rejecting writes. (default 1.0 GiB) --storage-cache-snapshot-memory-size Size The size at which the engine will snapshot the cache and write it to a TSM file, freeing up memory. (default 25 MiB) --storage-cache-snapshot-write-cold-duration Duration The length of time at which the engine will snapshot the cache and write it to a new TSM file if the shard hasn't received writes or deletes. (default 10m0s) --storage-compact-full-write-cold-duration Duration The duration at which the engine will compact all TSM files in a shard if it hasn't received a write or delete. (default 4h0m0s) --storage-compact-throughput-burst Size The rate limit in bytes per second that we will allow TSM compactions to write to disk. (default 48 MiB) --storage-max-concurrent-compactions int The maximum number of concurrent full and level compactions that can run at one time. A value of 0 results in 50% of runtime.GOMAXPROCS(0) used at runtime. Any number greater than 0 limits compactions to that value. This setting does not apply to cache snapshotting. --storage-max-index-log-file-size Size The threshold, in bytes, when an index write-ahead log file will compact into an index file. Lower sizes will cause log files to be compacted more quickly and result in lower heap usage at the expense of write throughput. (default 1.0 MiB) --storage-no-validate-field-size Skip field-size validation on incoming writes. --storage-retention-check-interval Duration The interval of time when retention policy enforcement checks run. (default 30m0s) --storage-series-file-max-concurrent-snapshot-compactions int The maximum number of concurrent snapshot compactions that can be running at one time across all series partitions in a database. --storage-series-id-set-cache-size int The size of the internal cache used in the TSI index to store previously calculated series results. --storage-shard-precreator-advance-period Duration The default period ahead of the endtime of a shard group that its successor group is created. (default 30m0s) --storage-shard-precreator-check-interval Duration The interval of time when the check to pre-create new shards runs. (default 10m0s) --storage-tsm-use-madv-willneed Controls whether we hint to the kernel that we intend to page in mmap'd sections of TSM files. --storage-validate-keys Validates incoming writes to ensure keys only have valid unicode characters. --storage-wal-flush-on-shutdown Flushes and clears the WAL on shutdown --storage-wal-fsync-delay Duration The amount of time that a write will wait before fsyncing. A duration greater than 0 can be used to batch up multiple fsync calls. This is useful for slower disks or when WAL write contention is seen. (default 0s) --storage-wal-max-concurrent-writes int The max number of writes that will attempt to write to the WAL at a time. (default <nprocs> * 2) --storage-wal-max-write-delay storage-wal-max-concurrent-writes The max amount of time a write will wait when the WAL already has storage-wal-max-concurrent-writes active writes. Set to 0 to disable the timeout. (default 10m0s) --storage-write-timeout duration The max amount of time the engine will spend completing a write request before cancelling with a timeout. (default 10s) --store string backing store for REST resources (disk or memory) (default "disk") --strong-passwords enable password strength enforcement --template-file-urls-disabled disable template file URLs --testing-always-allow-setup ensures the /api/v2/setup endpoint always returns true to allow onboarding --tls-cert string TLS certificate for HTTPs --tls-key string TLS key for HTTPs --tls-min-version string Minimum accepted TLS version (default "1.2") --tls-strict-ciphers Restrict accept ciphers to: ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, ECDHE_RSA_WITH_AES_128_GCM_SHA256, ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, ECDHE_RSA_WITH_AES_256_GCM_SHA384, ECDHE_ECDSA_WITH_CHACHA20_POLY1305, ECDHE_RSA_WITH_CHACHA20_POLY1305 --tracing-type string supported tracing types are log, jaeger --ui-disabled Disable the InfluxDB UI --vault-addr string address of the Vault server expressed as a URL and port, for example: https://127.0.0.1:8200/. --vault-cacert string path to a PEM-encoded CA certificate file on the local disk. This file is used to verify the Vault server's SSL certificate. This environment variable takes precedence over VAULT_CAPATH. --vault-capath string path to a directory of PEM-encoded CA certificate files on the local disk. These certificates are used to verify the Vault server's SSL certificate. --vault-client-cert string path to a PEM-encoded client certificate on the local disk. This file is used for TLS communication with the Vault server. --vault-client-key string path to an unencrypted, PEM-encoded private key on disk which corresponds to the matching client certificate. --vault-client-timeout duration timeout variable. The default value is 60s. --vault-max-retries int maximum number of retries when a 5xx error code is encountered. The default is 2, for three total attempts. Set this to 0 or less to disable retrying. --vault-skip-verify do not verify Vault's presented certificate before communicating with it. Setting this variable is not recommended and voids Vault's security model. --vault-tls-server-name string name to use as the SNI host when connecting via TLS. --vault-token string vault authentication token Use "influxd [command] --help" for more information about a command. 结合以上参数,提供一份完整的/etc/default/influxdb2配置文件,用于降低influxdbv2 CPU 的占用率。
11-12
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值