Flink监控及调优

本文介绍了Flink的监控及调优,包括History Server的配置和使用,REST API的概述和开发,以及常用优化策略。重点讲解了如何配置HistoryServer以查询已完成作业的统计信息,并探讨了Flink的资源管理、并行度调整和数据倾斜问题。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Flink监控及调优

History Server
Hadoop MapReduce
Spark
Flink

start/stop-xxx.sh
看一下这些脚本的写法
shell对于bigdata有用吗? lower

配置:
historyserver.web.address: 0.0.0.0
historyserver.web.port: 8082
historyserver.archive.fs.refresh-interval: 10000

jobmanager.archive.fs.dir: hdfs://hadoop000:8020/completed-jobs-pk/
historyserver.archive.fs.dir: hdfs://hadoop000:8020/completed-jobs-pk/

启动:./historyserver.sh start

History Server

Flink has a history server that can be used to query the statistics of completed jobs after the corresponding Flink cluster has been shut down.

Furthermore, it exposes a REST API that accepts HTTP requests and responds with JSON data.

Overview

The HistoryServer allows you to query the status and statistics of completed jobs that have been archived by a JobManager.

After you have configured the HistoryServer and JobManager, you start and stop the HistoryServer via its corresponding startup script:

# Start or stop the HistoryServer
bin/historyserver.sh (start|start-foreground|stop)

By default, this server binds to localhost and listens at port 8082.

Currently, you can only run it as a standalone process.

Configuration

The configuration keys jobmanager.archive.fs.dir and historyserver.archive.fs.refresh-interval need to be adjusted for archiving and displaying archived jobs.

JobManager

The archiving of completed jobs happens on the JobManager, which uploads the archived job information to a file system directory. You can configure the directory to archive completed jobs in flink-conf.yaml by setting a directory via jobmanager.archive.fs.dir.

# Directory to upload completed job information
jobmanager.archive.fs.dir: hdfs:///completed-jobs

HistoryServer

The HistoryServer can be configured to monitor a comma-separated list of directories in via historyserver.archive.fs.dir. The configured directories are regularly polled for new archives; the polling interval can be configured via historyserver.archive.fs.refresh-interval.

# Monitor the following directories for completed jobs
historyserver.archive.fs.dir: hdfs:///completed-jobs

# Refresh every 10 seconds
historyserver.archive.fs.refresh-interval: 10000

The contained archives are downloaded and cached in the local filesystem. The local directory for this is configured via historyserver.web.tmpdir.

Check out the configuration page for a complete list of configuration options.

Available Requests

Below is a list of available requests, with a sample JSON response. All requests are of the sample form http://hostname:8082/jobs, below we list only the path part of the URLs.

Values in angle brackets are variables, for example http://hostname:port/jobs/<jobid>/exceptions will have to requested for example as http://hostname:port/jobs/7684be6004e4e955c2a558a9bc463f65/exceptions.

  • /config
  • /jobs/overview
  • /jobs/<jobid>
  • /jobs/<jobid>/vertices
  • /jobs/<jobid>/config
  • /jobs/<jobid>/exceptions
  • /jobs/<jobid>/accumulators
  • /jobs/<jobid>/vertices/<vertexid>
  • /jobs/<jobid>/vertices/<vertexid>/subtasktimes
  • /jobs/<jobid>/vertices/<vertexid>/taskmanagers
  • /jobs/<jobid>/vertices/<vertexid>/accumulators
  • /jobs/<jobid>/vertices/<vertexid>/subtasks/accumulators
  • /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>
  • /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>/attempts/<attempt>
  • /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>/attempts/<attempt>/accumulators
  • /jobs/<jobid>/plan

REST API

Flink has a monitoring API that can be used to query status and statistics of running jobs, as well as recent completed jobs. This monitoring API is used by Flink’s own dashboard, but is designed to be used also by custom monitoring tools.

The monitoring API is a REST-ful API that accepts HTTP requests and responds with JSON data.

Overview

The monitoring API is backed by a web server that runs as part of the Dispatcher. By default, this server listens at post 8081, which can be configured in flink-conf.yaml via rest.port. Note that the monitoring API web server and the web dashboard web server are currently the same and thus run together at the same port. They respond to different HTTP URLs, though.

In the case of multiple Dispatchers (for high availability), each Dispatcher will run its own instance of the monitoring API, which offers information about completed and running job while that Dispatcher was elected the cluster leader.

Developing

The REST API backend is in the flink-runtime project. The core class is org.apache.flink.runtime.webmonitor.WebMonitorEndpoint, which sets up the server and the request routing.

We use Netty and the Netty Router library to handle REST requests and translate URLs. This choice was made because this combination has lightweight dependencies, and the performance of Netty HTTP is very good.

To add new requests, one needs to

  • add a new MessageHeaders class which serves as an interface for the new request,
  • add a new AbstractRestHandler class which handles the request according to the added MessageHeaders class,
  • add the handler to org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#initializeHandlers().

A good example is the org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler that uses the org.apache.flink.runtime.rest.messages.JobExceptionsHeaders.

API

The REST API is versioned, with specific versions being queryable by prefixing the url with the version prefix. Prefixes are always of the form v[version_number]. For example, to access version 1 of /foo/bar one would query /v1/foo/bar.

If no version is specified Flink will default to the oldest version supporting the request.

Querying unsupported/non-existing versions will return a 404 error.

There exist several async operations among these APIs, e.g. trigger savepoint, rescale a job. They would return a triggerid to identify the operation you just POST and then you need to use that triggerid to query for the status of the operation.

详情如下:V1

Metrics(指标)

Flink exposes a metric system that allows gathering and exposing metrics to external systems.

Flink公开了一个度量系统,该度量系统允许收集度量并将其暴露给外部系统。

详细文档如下:

文档

常用优化

Flink中常用的优化策略

  • 资源
  • 并行度
    默认是1 适当的调整:好几种 ==> 项目实战
  • 数据倾斜
    100task 98-99跑完了 1-2很慢 ==> 能跑完 、 跑不完
    • group by: 二次聚合
      random_key + random
      key - random
    • join on xxx=xxx
      repartition-repartition strategy 大大
      broadcast-forward strategy 大小

链接:Big Data Zone

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值