Monitoring Spark job execution through the Spark REST service

1. REST service

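  To make job monitoring easier, Spark has provided a REST service since version 1.4. By requesting its URLs, users can get the running state of an application.

  The REST API returns JSON, so developers can easily build visual monitoring tools for Spark on top of it. The API is available both for running applications and for the history server, and every request URL contains /api/v1. For the history server, information is available at http://***:18080/api/v1 (the port is configurable); for a running Spark application, it is available at https://***/api/v1.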
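As a rough sketch of the URL layout described above, the helper below builds URLs under /api/v1 and decodes the JSON body. The host name `history-host` and the function names `api_url` and `fetch_json` are assumptions made for illustration; only port 18080 and the /api/v1 prefix come from the text.

```python
import json
import urllib.request

API_PREFIX = "/api/v1"


def api_url(host, port, path=""):
    """Build a REST URL such as http://host:18080/api/v1/applications."""
    url = "http://{}:{}{}/{}".format(host, port, API_PREFIX, path.lstrip("/"))
    return url.rstrip("/")


def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body (needs a reachable server)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # "history-host" is a placeholder, not a real server name.
    print(api_url("history-host", 18080, "applications"))
```

Keeping URL construction separate from the network call makes the former easy to check without a running history server.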

 

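  Main use: with the REST service it is easy to monitor job duration, stages, and so on, and it can be combined with a time-series database to monitor every job on a cluster.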

2. Example code (Python)

  A script first obtains the application ids of our own programs, then requests the REST service with each application id, fetches the result, and analyzes it further.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
    Created by zhangy on Aug 25, 2017
'''
import calendar
import datetime
import json
import os
import time
import urllib.request

# Format of the timestamps returned by the REST API, e.g. 2017-09-04T01:29:53.986GMT
GMT_FORMAT = '%Y-%m-%dT%H:%M:%S.%fGMT'

if __name__ == '__main__':
    # List YARN applications whose line contains "Noce" and keep the first column (the application id).
    command = "yarn application -list |grep Noce |awk -F'\t' '{print $1}'"
    val = os.popen(command).read()
    appids = val.split("\n")
    for pid in appids:
        if not pid:
            continue
        url = "http://th04-znwg-sgi620-001:18088/api/v1/applications/" + pid
        res = urllib.request.urlopen(url).read().decode("utf-8")
        jo = json.loads(res)
        dict1 = jo['attempts'][0]
        st = dict1['startTime']
        sti = datetime.datetime.strptime(st, GMT_FORMAT)
        # startTime is in GMT, so convert it with timegm instead of mktime plus a fixed 8-hour offset.
        startTime = calendar.timegm(sti.timetuple())
        nowTime = int(time.time())
        sub = nowTime - startTime
        # Kill the application once it has been running for more than 4 hours.
        if sub > 4 * 60 * 60:
            killCommand = "yarn application -kill " + pid
            res = os.popen(killCommand).read()
        cc = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(float(nowTime)))
        # Append the elapsed time and the last output (API JSON or kill output) to a per-application log.
        with open("/home/noce1/run_noce/his/monitor/" + pid + ".txt", "a") as f:
            f.write(cc + " : " + "pid : " + pid + "\n" + str(sub) + " seconds")
            f.write(res + "\n")

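In this test example only the run time of the Spark job is monitored: if a job runs longer than the expected duration (4 hours), it is killed and its resources are released.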
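The timeout rule can also be isolated into small functions that are checkable without a cluster. This is only a sketch: the names `start_epoch` and `should_kill` are hypothetical, and it assumes `startTime` always uses the `...GMT` format shown in this article.

```python
import calendar
import datetime

GMT_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"
MAX_RUNTIME_SECONDS = 4 * 60 * 60  # the 4-hour limit used in the script


def start_epoch(start_time):
    """Convert a startTime string to whole epoch seconds (UTC)."""
    parsed = datetime.datetime.strptime(start_time, GMT_FORMAT)
    # timegm interprets the time tuple as UTC, so no manual offset is needed.
    return calendar.timegm(parsed.timetuple())


def should_kill(start_time, now_epoch, limit=MAX_RUNTIME_SECONDS):
    """Return True when the job has run for more than `limit` seconds."""
    return now_epoch - start_epoch(start_time) > limit


if __name__ == "__main__":
    import time
    print(should_kill("2017-09-04T01:29:53.986GMT", time.time()))
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the decision deterministic in tests.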

Result:

For example:
http://132.12*****:18088/api/v1/applications/application_1502706935975_233268/

Response (JSON):
{
  "id" : "application_1502706935975_233268",
  "name" : "FRT3_73",
  "attempts" : [ {
    "startTime" : "2017-09-04T01:29:53.986GMT",
    "endTime" : "2017-09-04T01:31:52.955GMT",
    "sparkUser" : "noce1",
    "completed" : true
  } ]
}
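A response like this can be reduced to a run time per attempt. The sketch below reuses the sample response verbatim; the function names `parse_gmt` and `attempt_durations` are hypothetical, and attempts that are not marked `completed` are skipped on the assumption that their `endTime` is not meaningful yet.

```python
import datetime
import json

GMT_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"

SAMPLE = """
{
  "id" : "application_1502706935975_233268",
  "name" : "FRT3_73",
  "attempts" : [ {
    "startTime" : "2017-09-04T01:29:53.986GMT",
    "endTime" : "2017-09-04T01:31:52.955GMT",
    "sparkUser" : "noce1",
    "completed" : true
  } ]
}
"""


def parse_gmt(value):
    """Parse a timestamp such as 2017-09-04T01:29:53.986GMT."""
    return datetime.datetime.strptime(value, GMT_FORMAT)


def attempt_durations(app):
    """Return the run time in seconds of every completed attempt."""
    durations = []
    for attempt in app["attempts"]:
        if not attempt.get("completed"):
            continue
        delta = parse_gmt(attempt["endTime"]) - parse_gmt(attempt["startTime"])
        durations.append(delta.total_seconds())
    return durations


if __name__ == "__main__":
    print(attempt_durations(json.loads(SAMPLE)))
```

For the sample above the single attempt ran for a little under two minutes.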

3. Other official endpoints (version 2.1.0, http:**** :18080/api/v1/)

     The official documentation lists further REST endpoints; requesting them returns different JSON result sets.

Endpoint
    Meaning

/applications
    A list of all applications.
    ?status=[completed|running] list only applications in the chosen state.
    ?minDate=[date] earliest start date/time to list.
    ?maxDate=[date] latest start date/time to list.
    ?minEndDate=[date] earliest end date/time to list.
    ?maxEndDate=[date] latest end date/time to list.
    ?limit=[limit] limits the number of applications listed.
    Examples:
    ?minDate=2015-02-10
    ?minDate=2015-02-03T16:42:40.000GMT
    ?maxDate=2015-02-11T20:41:30.000GMT
    ?minEndDate=2015-02-12
    ?minEndDate=2015-02-12T09:15:10.000GMT
    ?maxEndDate=2015-02-14T16:30:45.000GMT
    ?limit=10

/applications/[app-id]/jobs
    A list of all jobs for a given application.
    ?status=[running|succeeded|failed|unknown] list only jobs in the specific state.

/applications/[app-id]/jobs/[job-id]
    Details for the given job.

/applications/[app-id]/stages
    A list of all stages for a given application.

/applications/[app-id]/stages/[stage-id]
    A list of all attempts for the given stage.

/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]
    Details for the given stage attempt.

/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskSummary
    Summary metrics of all tasks in the given stage attempt.
    ?quantiles summarize the metrics with the given quantiles.
    Example: ?quantiles=0.01,0.5,0.99

/applications/[app-id]/stages/[stage-id]/[stage-attempt-id]/taskList
    A list of all tasks for the given stage attempt.
    ?offset=[offset]&length=[len] list tasks in the given range.
    ?sortBy=[runtime|-runtime] sort the tasks.
    Example: ?offset=10&length=50&sortBy=runtime

/applications/[app-id]/executors
    A list of all active executors for the given application.

/applications/[app-id]/allexecutors
    A list of all (active and dead) executors for the given application.

/applications/[app-id]/storage/rdd
    A list of stored RDDs for the given application.

/applications/[app-id]/storage/rdd/[rdd-id]
    Details for the storage status of a given RDD.

/applications/[base-app-id]/logs
    Download the event logs for all attempts of the given application as files within a zip file.

/applications/[base-app-id]/[attempt-id]/logs
    Download the event logs for a specific application attempt as a zip file.

/applications/[app-id]/streaming/statistics
    Statistics for the streaming context.

/applications/[app-id]/streaming/receivers
    A list of all streaming receivers.

/applications/[app-id]/streaming/receivers/[stream-id]
    Details of the given receiver.

/applications/[app-id]/streaming/batches
    A list of all retained batches.

/applications/[app-id]/streaming/batches/[batch-id]
    Details of the given batch.

/applications/[app-id]/streaming/batches/[batch-id]/operations
    A list of all output operations of the given batch.

/applications/[app-id]/streaming/batches/[batch-id]/operations/[outputOp-id]
    Details of the given operation and given batch.

/applications/[app-id]/environment
    Environment details of the given application.
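The query parameters listed above can be attached to an endpoint with the standard library. This is a minimal sketch: the base URL `http://history-host:18080/api/v1`, the application id `app-1`, and the helper name `endpoint` are placeholders for illustration, not values from a real cluster.

```python
from urllib.parse import urlencode

# Placeholder base URL; replace it with your own history server address.
BASE = "http://history-host:18080/api/v1"


def endpoint(base, path, **params):
    """Join base and path, then append query parameters if any are given."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    # Keep ',' and ':' readable so quantile lists and timestamps look like the documented examples.
    query = urlencode(params, safe=",:")
    return url + ("?" + query if query else "")


if __name__ == "__main__":
    print(endpoint(BASE, "applications", status="running", limit=10))
    print(endpoint(BASE, "applications/app-1/stages/0/0/taskList",
                   offset=10, length=50, sortBy="runtime"))
    print(endpoint(BASE, "applications/app-1/stages/0/0/taskSummary",
                   quantiles="0.01,0.5,0.99"))
```

The resulting URLs can then be requested in the same way as in the monitoring script in section 2.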