Hadoop YARN - Introduction to the web services REST APIs

Overview

The Hadoop YARN web service REST APIs are a set of URI resources that give access to the cluster, nodes, applications, and application historical information. The URI resources are grouped into APIs based on the type of information returned. Some URI resources return collections while others return singletons.

URIs

The URIs for the REST-based Web services have the following syntax:

  http://{http address of service}/ws/{version}/{resourcepath}

The elements in this syntax are as follows:

  {http address of service} - The http address of the service to get information about. 
                              Currently supported are the ResourceManager, NodeManager, 
                              MapReduce application master, and history server.
  {version} - The version of the APIs. In this release, the version is v1.
  {resourcepath} - A path that defines a singleton resource or a collection of resources. 
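
For example, using the placeholder ResourceManager address that appears in the examples below, the URI for the cluster information resource would be:

  http://rmhost.domain:8088/ws/v1/cluster/info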

HTTP Requests

To invoke a REST API, your application calls an HTTP operation on the URI associated with a resource.

Summary of HTTP operations

Currently only GET is supported. It retrieves information about the resource specified.
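
For example, the following curl command performs a GET on the cluster metrics resource of a ResourceManager (the host name is a placeholder used throughout this document):

  curl -X GET "http://rmhost.domain:8088/ws/v1/cluster/metrics"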

Security

The web service REST APIs go through the same security as the web UI. If your cluster administrators have filters enabled, you must authenticate via the mechanism they specified.

Headers Supported

  * Accept 
  * Accept-Encoding

Currently the only header fields used are Accept and Accept-Encoding. Accept supports XML and JSON as the response types you can request. Accept-Encoding currently supports only the gzip format and returns gzip-compressed output if it is specified; otherwise output is uncompressed. All other header fields are ignored.
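
For example, to request the response in XML rather than the default JSON, set the Accept header on the request:

  curl -H "Accept: application/xml" -X GET "http://rmhost.domain:8088/ws/v1/cluster/info"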

HTTP Responses

The next few sections describe some of the syntax and other details of the HTTP Responses of the web service REST APIs.

Compression

This release supports gzip compression if you specify gzip in the Accept-Encoding header of the HTTP request (Accept-Encoding: gzip).
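
As a minimal sketch, the header can be set directly; because curl does not decompress a body requested this way, the output is piped through gunzip (alternatively, curl's --compressed option sets the header and decompresses automatically, as in the Sample Usage section below):

  curl -s -H "Accept-Encoding: gzip" "http://rmhost.domain:8088/ws/v1/cluster/info" | gunzip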

Response Formats

This release of the web service REST APIs supports responses in JSON and XML formats. JSON is the default. To set the response format, you can specify the format in the Accept header of the HTTP request.

As specified in HTTP Response Codes, the response body can contain the data that represents the resource or an error message. In the case of success, the response body is in the selected format, either JSON or XML. In the case of an error, the response body is in either JSON or XML, based on the format requested. The Content-Type header of the response contains the format requested. If the application requests an unsupported format, the response status code is 500. Note that the order of the fields within the response body is not specified and might change. Also, additional fields might be added to a response body. Therefore, your applications should use parsing routines that can extract data from a response body in any order.
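
Because field order is not guaranteed, clients should extract fields by name rather than by position. A minimal sketch using the jq command-line JSON processor (the use of jq is an assumption of this example, not a requirement of the API):

  curl -s -H "Accept: application/json" "http://rmhost.domain:8088/ws/v1/cluster/info" \
    | jq -r '.clusterInfo.state'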

Response Errors

After making an HTTP request, an application should check the response status code to verify success or detect an error. If the response status code indicates an error, the response body contains an error message. The first field is the exception type; currently only RemoteException is returned. The following table lists the items within the RemoteException error message:

  Item            Data Type   Description
  -------------   ---------   ----------------------------------
  exception       String      Exception type
  javaClassName   String      Java class name of the exception
  message         String      Detailed message of the exception
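
As a sketch of error handling in a shell script, the HTTP status code can be captured with curl's -w option and the RemoteException fields extracted by name (the use of jq is an assumption; any JSON parser works):

  status=$(curl -s -o /tmp/response.json -w '%{http_code}' \
    -H "Accept: application/json" \
    "http://rmhost.domain:8088/ws/v1/cluster/apps/application_1324057493980_9999")
  if [ "$status" != "200" ]; then
    # On error the body carries a RemoteException; pull out its fields by name.
    jq -r '.RemoteException.exception, .RemoteException.message' /tmp/response.json
  fi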

Response Examples

JSON response with a single resource

HTTP Request: GET http://rmhost.domain:8088/ws/v1/cluster/apps/application_1324057493980_0001

Response Status Line: HTTP/1.1 200 OK

Response Header:

  HTTP/1.1 200 OK
  Content-Type: application/json
  Transfer-Encoding: chunked
  Server: Jetty(6.1.26)

Response Body:

{
  app":
  {
    "id":"application_1324057493980_0001",
    "user":"user1",
    "name":"",
    "queue":"default",
    "state":"ACCEPTED",
    "finalStatus":"UNDEFINED",
    "progress":0,
    "trackingUI":"UNASSIGNED",
    "diagnostics":"",
    "clusterId":1324057493980,
    "startedTime":1324057495921,
    "finishedTime":0,
    "elapsedTime":2063,
    "amContainerLogs":"http:\/\/amNM:2\/node\/containerlogs\/container_1324057493980_0001_01_000001",
    "amHostHttpAddress":"amNM:2"
  }
}

JSON response with an error

Here we request information about an application that doesn’t exist yet.

HTTP Request: GET http://rmhost.domain:8088/ws/v1/cluster/apps/application_1324057493980_9999

Response Status Line: HTTP/1.1 404 Not Found

Response Header:

  HTTP/1.1 404 Not Found
  Content-Type: application/json
  Transfer-Encoding: chunked
  Server: Jetty(6.1.26)

Response Body:

{
   "RemoteException" : {
      "javaClassName" : "org.apache.hadoop.yarn.webapp.NotFoundException",
      "exception" : "NotFoundException",
      "message" : "java.lang.Exception: app with id: application_1324057493980_9999 not found"
   }
}

Sample Usage

You can use many tools and languages to access the web services REST APIs. This example uses the curl command-line interface to make the REST GET calls.

In this example, a user submits a MapReduce application to the ResourceManager using a command like:

  hadoop jar hadoop-mapreduce-test.jar sleep -Dmapred.job.queue.name=a1 -m 1 -r 1 -rt 1200000 -mt 20

The client prints information about the job submitted along with the application id, similar to:

12/01/18 04:25:15 INFO mapred.ResourceMgrDelegate: Submitted application application_1326821518301_0010 to ResourceManager at host.domain.com/10.10.10.10:8032
12/01/18 04:25:15 INFO mapreduce.Job: Running job: job_1326821518301_0010
12/01/18 04:25:21 INFO mapred.ClientServiceDelegate: The url to track the job: host.domain.com:8088/proxy/application_1326821518301_0010/
12/01/18 04:25:22 INFO mapreduce.Job: Job job_1326821518301_0010 running in uber mode : false
12/01/18 04:25:22 INFO mapreduce.Job:  map 0% reduce 0%

The user then wishes to track the application. The user starts by getting information about the application from the ResourceManager. Use the --compressed option to request compressed output; curl handles decompression on the client side.

curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/ws/v1/cluster/apps/application_1326821518301_0010" 

Output:

{
   "app" : {
      "finishedTime" : 0,
      "amContainerLogs" : "http://host.domain.com:8042/node/containerlogs/container_1326821518301_0010_01_000001",
      "trackingUI" : "ApplicationMaster",
      "state" : "RUNNING",
      "user" : "user1",
      "id" : "application_1326821518301_0010",
      "clusterId" : 1326821518301,
      "finalStatus" : "UNDEFINED",
      "amHostHttpAddress" : "host.domain.com:8042",
      "progress" : 82.44703,
      "name" : "Sleep job",
      "startedTime" : 1326860715335,
      "elapsedTime" : 31814,
      "diagnostics" : "",
      "trackingUrl" : "http://host.domain.com:8088/proxy/application_1326821518301_0010/",
      "queue" : "a1"
   }
}
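
Since the application is still running, a client might poll this resource until the state changes. A minimal sketch of such a loop (the 10-second interval and the use of jq are assumptions):

  while true; do
    state=$(curl -s -H "Accept: application/json" \
      "http://host.domain.com:8088/ws/v1/cluster/apps/application_1326821518301_0010" \
      | jq -r '.app.state')
    # Keep polling while the application is still queued or running.
    if [ "$state" != "ACCEPTED" ] && [ "$state" != "RUNNING" ]; then
      break
    fi
    sleep 10
  done
  echo "application ended in state: $state"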

The user then wishes to get more details about the running application, and goes directly to the MapReduce application master for this application. The ResourceManager lists the trackingUrl that can be used for this application: http://host.domain.com:8088/proxy/application_1326821518301_0010. This URL can either be opened in a web browser or used with the web services REST APIs. Here the user uses the REST APIs to get the list of jobs this MapReduce application master is running:

 curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs"

Output:

{
   "jobs" : {
      "job" : [
         {
            "runningReduceAttempts" : 1,
            "reduceProgress" : 72.104515,
            "failedReduceAttempts" : 0,
            "newMapAttempts" : 0,
            "mapsRunning" : 0,
            "state" : "RUNNING",
            "successfulReduceAttempts" : 0,
            "reducesRunning" : 1,
            "acls" : [
               {
                  "value" : " ",
                  "name" : "mapreduce.job.acl-modify-job"
               },
               {
                  "value" : " ",
                  "name" : "mapreduce.job.acl-view-job"
               }
            ],
            "reducesPending" : 0,
            "user" : "user1",
            "reducesTotal" : 1,
            "mapsCompleted" : 1,
            "startTime" : 1326860720902,
            "id" : "job_1326821518301_10_10",
            "successfulMapAttempts" : 1,
            "runningMapAttempts" : 0,
            "newReduceAttempts" : 0,
            "name" : "Sleep job",
            "mapsPending" : 0,
            "elapsedTime" : 64432,
            "reducesCompleted" : 0,
            "mapProgress" : 100,
            "diagnostics" : "",
            "failedMapAttempts" : 0,
            "killedReduceAttempts" : 0,
            "mapsTotal" : 1,
            "uberized" : false,
            "killedMapAttempts" : 0,
            "finishTime" : 0
         }
      ]
   }
}

The user then wishes to get the task details about the job with job id job_1326821518301_10_10 that was listed above.

 curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks" 

Output:

{
   "tasks" : {
      "task" : [
         {
            "progress" : 100,
            "elapsedTime" : 5059,
            "state" : "SUCCEEDED",
            "startTime" : 1326860725014,
            "id" : "task_1326821518301_10_10_m_0",
            "type" : "MAP",
            "successfulAttempt" : "attempt_1326821518301_10_10_m_0_0",
            "finishTime" : 1326860730073
         },
         {
            "progress" : 72.104515,
            "elapsedTime" : 0,
            "state" : "RUNNING",
            "startTime" : 1326860732984,
            "id" : "task_1326821518301_10_10_r_0",
            "type" : "REDUCE",
            "successfulAttempt" : "",
            "finishTime" : 0
         }
      ]
   }
}

The map task has finished, but the reduce task is still running. The user wishes to get the task attempt information for the reduce task task_1326821518301_10_10_r_0. Note that the Accept header is not strictly required here, since JSON is the default output format:

  curl --compressed -X GET "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks/task_1326821518301_10_10_r_0/attempts"

Output:

{
   "taskAttempts" : {
      "taskAttempt" : [
         {
            "elapsedMergeTime" : 158,
            "shuffleFinishTime" : 1326860735378,
            "assignedContainerId" : "container_1326821518301_0010_01_000003",
            "progress" : 72.104515,
            "elapsedTime" : 0,
            "state" : "RUNNING",
            "elapsedShuffleTime" : 2394,
            "mergeFinishTime" : 1326860735536,
            "rack" : "/10.10.10.0",
            "elapsedReduceTime" : 0,
            "nodeHttpAddress" : "host.domain.com:8042",
            "type" : "REDUCE",
            "startTime" : 1326860732984,
            "id" : "attempt_1326821518301_10_10_r_0_0",
            "finishTime" : 0
         }
      ]
   }
}

The reduce attempt is still running and the user wishes to see the current counter values for that attempt:

 curl --compressed -H "Accept: application/json"  -X GET "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks/task_1326821518301_10_10_r_0/attempts/attempt_1326821518301_10_10_r_0_0/counters" 

Output:

{
   "JobTaskAttemptCounters" : {
      "taskAttemptCounterGroup" : [
         {
            "counterGroupName" : "org.apache.hadoop.mapreduce.FileSystemCounter",
            "counter" : [
               {
                  "value" : 4216,
                  "name" : "FILE_BYTES_READ"
               }, 
               {
                  "value" : 77151,
                  "name" : "FILE_BYTES_WRITTEN"
               }, 
               {
                  "value" : 0,
                  "name" : "FILE_READ_OPS"
               },
               {
                  "value" : 0,
                  "name" : "FILE_LARGE_READ_OPS"
               },
               {
                  "value" : 0,
                  "name" : "FILE_WRITE_OPS"
               },
               {
                  "value" : 0,
                  "name" : "HDFS_BYTES_READ"
               },
               {
                  "value" : 0,
                  "name" : "HDFS_BYTES_WRITTEN"
               },
               {
                  "value" : 0,
                  "name" : "HDFS_READ_OPS"
               },
               {
                  "value" : 0,
                  "name" : "HDFS_LARGE_READ_OPS"
               },
               {
                  "value" : 0,
                  "name" : "HDFS_WRITE_OPS"
               }
            ]  
         }, 
         {
            "counterGroupName" : "org.apache.hadoop.mapreduce.TaskCounter",
            "counter" : [
               {
                  "value" : 0,
                  "name" : "COMBINE_INPUT_RECORDS"
               }, 
               {
                  "value" : 0,
                  "name" : "COMBINE_OUTPUT_RECORDS"
               }, 
               {  
                  "value" : 1767,
                  "name" : "REDUCE_INPUT_GROUPS"
               },
               {  
                  "value" : 25104,
                  "name" : "REDUCE_SHUFFLE_BYTES"
               },
               {
                  "value" : 1767,
                  "name" : "REDUCE_INPUT_RECORDS"
               },
               {
                  "value" : 0,
                  "name" : "REDUCE_OUTPUT_RECORDS"
               },
               {
                  "value" : 0,
                  "name" : "SPILLED_RECORDS"
               },
               {
                  "value" : 1,
                  "name" : "SHUFFLED_MAPS"
               },
               {
                  "value" : 0,
                  "name" : "FAILED_SHUFFLE"
               },
               {
                  "value" : 1,
                  "name" : "MERGED_MAP_OUTPUTS"
               },
               {
                  "value" : 50,
                  "name" : "GC_TIME_MILLIS"
               },
               {
                  "value" : 1580,
                  "name" : "CPU_MILLISECONDS"
               },
               {
                  "value" : 141320192,
                  "name" : "PHYSICAL_MEMORY_BYTES"
               },
              {
                  "value" : 1118552064,
                  "name" : "VIRTUAL_MEMORY_BYTES"
               }, 
               {  
                  "value" : 73728000,
                  "name" : "COMMITTED_HEAP_BYTES"
               }
            ]
         },
         {  
            "counterGroupName" : "Shuffle Errors",
            "counter" : [
               {  
                  "value" : 0,
                  "name" : "BAD_ID"
               },
               {  
                  "value" : 0,
                  "name" : "CONNECTION"
               },
               {  
                  "value" : 0,
                  "name" : "IO_ERROR"
               },
               {  
                  "value" : 0,
                  "name" : "WRONG_LENGTH"
               },
               {  
                  "value" : 0,
                  "name" : "WRONG_MAP"
               },
               {  
                  "value" : 0,
                  "name" : "WRONG_REDUCE"
               }
            ]
         },
         {  
            "counterGroupName" : "org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter",
            "counter" : [
              {  
                  "value" : 0,
                  "name" : "BYTES_WRITTEN"
               }
            ]
         }
      ],
      "id" : "attempt_1326821518301_10_10_r_0_0"
   }
}
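
Counter groups arrive as arrays, so a specific value is best located by name rather than by position. A sketch that pulls REDUCE_SHUFFLE_BYTES out of the response above (the use of jq is an assumption):

  curl -s -H "Accept: application/json" \
    "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks/task_1326821518301_10_10_r_0/attempts/attempt_1326821518301_10_10_r_0_0/counters" \
    | jq '.JobTaskAttemptCounters.taskAttemptCounterGroup[]
          | select(.counterGroupName == "org.apache.hadoop.mapreduce.TaskCounter")
          | .counter[] | select(.name == "REDUCE_SHUFFLE_BYTES") | .value'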

The job finishes and the user wishes to get the final job information from the history server for this job.

  curl --compressed -X GET "http://host.domain.com:19888/ws/v1/history/mapreduce/jobs/job_1326821518301_10_10" 

Output:

{
   "job" : {
      "avgReduceTime" : 1250784,
      "failedReduceAttempts" : 0,
      "state" : "SUCCEEDED",
      "successfulReduceAttempts" : 1,
      "acls" : [
         {
            "value" : " ",
            "name" : "mapreduce.job.acl-modify-job"
         },
         {
            "value" : " ",
            "name" : "mapreduce.job.acl-view-job"
         }
      ],
      "user" : "user1",
      "reducesTotal" : 1,
      "mapsCompleted" : 1,
      "startTime" : 1326860720902,
      "id" : "job_1326821518301_10_10",
      "avgMapTime" : 5059,
      "successfulMapAttempts" : 1,
      "name" : "Sleep job",
      "avgShuffleTime" : 2394,
      "reducesCompleted" : 1,
      "diagnostics" : "",
      "failedMapAttempts" : 0,
      "avgMergeTime" : 2552,
      "killedReduceAttempts" : 0,
      "mapsTotal" : 1,
      "queue" : "a1",
      "uberized" : false,
      "killedMapAttempts" : 0,
      "finishTime" : 1326861986164
   }
}

The user also gets the final application information from the ResourceManager.

  curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/ws/v1/cluster/apps/application_1326821518301_0010" 

Output:

{
   "app" : {
      "finishedTime" : 1326861991282,
      "amContainerLogs" : "http://host.domain.com:8042/node/containerlogs/container_1326821518301_0010_01_000001",
      "trackingUI" : "History",
      "state" : "FINISHED",
      "user" : "user1",
      "id" : "application_1326821518301_0010",
      "clusterId" : 1326821518301,
      "finalStatus" : "SUCCEEDED",
      "amHostHttpAddress" : "host.domain.com:8042",
      "progress" : 100,
      "name" : "Sleep job",
      "startedTime" : 1326860715335,
      "elapsedTime" : 1275947,
      "diagnostics" : "",
      "trackingUrl" : "http://host.domain.com:8088/proxy/application_1326821518301_0010/jobhistory/job/job_1326821518301_10_10",
      "queue" : "a1"
   }
}