The TaskTracker node's internal HTTP service component provides two main functions: 1) /logtask, which fetches the execution log of a given task, and 2) /mapOutput, which fetches a given task's map output data. From the user's point of view, the /logtask function is optional, but the /mapOutput function is essential to the Map-Reduce framework as a whole, because every Reduce task of every job obtains the data it needs to process (namely, the output of the Map tasks belonging to the same job) through this service. Were it not for this /mapOutput responsibility, the HTTP service component could simply be removed to improve the TaskTracker node's performance. The discussion below therefore focuses on the HTTP service component's /mapOutput function.
During the shuffle phase, a job's Reduce task is responsible for fetching its share of the map output data from the TaskTracker nodes that executed the job's Map tasks, provided of course that those Map tasks have completed successfully. How a Reduce task learns which of the job's Map tasks have finished was covered in detail in an earlier post. When a Reduce task discovers a completed Map task, it sends an HTTP request to the TaskTracker node that executed that Map task, asking for the portion of the map output that belongs to this reducer; in other words, data travels between a job's Map and Reduce tasks over HTTP. The request URL has the format: http://*:*/mapOutput?job=jobId&map=mapId&reduce=partition.
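To make the URL format concrete, here is a minimal sketch of how such a request URL is assembled. The host name and the job/attempt IDs are made-up placeholders (the real reduce-side copier derives them from the TaskCompletionEvents it receives); 50060 is the TaskTracker's default HTTP port in this generation of Hadoop.

```java
import java.net.URL;

// Hypothetical example of the /mapOutput URL a reducer would request.
// Host name and IDs are placeholders, not values from a real cluster.
public class MapOutputUrlExample {
  public static void main(String[] args) throws Exception {
    String jobId = "job_201204160001_0001";
    String mapId = "attempt_201204160001_0001_m_000003_0";
    int partition = 2; // this reducer's partition number

    URL url = new URL("http://tt-host:50060/mapOutput"
        + "?job=" + jobId + "&map=" + mapId + "&reduce=" + partition);
    System.out.println(url);
    // -> http://tt-host:50060/mapOutput?job=...&map=...&reduce=2
  }
}
```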
After the TaskTracker node's HTTP service component receives a /mapOutput request, it hands the request to one of its background threads, which ultimately dispatches it to the corresponding MapOutputServlet. In detail, the servlet's handling looks like this:
```java
private static final int MAX_BYTES_TO_READ = 64 * 1024;

@Override
public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
  String mapId = request.getParameter("map");
  String reduceId = request.getParameter("reduce");
  String jobId = request.getParameter("job");
  if (jobId == null) {
    throw new IOException("job parameter is required");
  }
  if (mapId == null || reduceId == null) {
    throw new IOException("map and reduce parameters are required");
  }
  ServletContext context = getServletContext();
  int reduce = Integer.parseInt(reduceId);
  byte[] buffer = new byte[MAX_BYTES_TO_READ];
  // true iff IOException was caused by attempt to access input
  boolean isInputException = true;
  OutputStream outStream = null;
  FSDataInputStream mapOutputIn = null;
  long totalRead = 0;
  ShuffleServerMetrics shuffleMetrics = (ShuffleServerMetrics) context.getAttribute("shuffleServerMetrics");
  TaskTracker tracker = (TaskTracker) context.getAttribute("task.tracker");
  try {
    shuffleMetrics.serverHandlerBusy();
    // open the response output stream
    outStream = response.getOutputStream();
    JobConf conf = (JobConf) context.getAttribute("conf");
    // local directories on the TaskTracker node that store the
    // intermediate output of Map/Reduce tasks
    LocalDirAllocator lDirAlloc = (LocalDirAllocator) context.getAttribute("localDirAllocator");
    FileSystem rfs = ((LocalFileSystem) context.getAttribute("local.file.system")).getRaw();
    // the jobId and mapId are enough to locate the map task's
    // output file and its index file
    Path indexFileName = lDirAlloc.getLocalPathToRead(TaskTracker.getIntermediateOutputDir(jobId, mapId) + "/file.out.index", conf);
    Path mapOutputFileName = lDirAlloc.getLocalPathToRead(TaskTracker.getIntermediateOutputDir(jobId, mapId) + "/file.out", conf);
    /**
     * Read the index file to get the information about where
     * the map-output for the given reducer is available.
     */
    IndexRecord info = tracker.indexCache.getIndexInformation(mapId, reduce, indexFileName);
    // set the custom "from-map-task" http header to the map task from which
    // the map output data is being transferred
    response.setHeader(FROM_MAP_TASK, mapId);
    // set the custom "Raw-Map-Output-Length" http header to
    // the raw (decompressed) length
    response.setHeader(RAW_MAP_OUTPUT_LENGTH, Long.toString(info.rawLength));
    // set the custom "Map-Output-Length" http header to
    // the actual number of bytes being transferred
    response.setHeader(MAP_OUTPUT_LENGTH, Long.toString(info.partLength));
    // set the custom "for-reduce-task" http header to the reduce task number
    // for which this map output is being transferred
    response.setHeader(FOR_REDUCE_TASK, Integer.toString(reduce));
    // use the same buffersize as used for reading the data from disk
    response.setBufferSize(MAX_BYTES_TO_READ);
    /**
     * Read the data from the single map-output file and
     * send it to the reducer.
     */
    // open the map-output file
    LOG.debug("open MapTask[" + mapId + "]'s output file: " + mapOutputFileName);
    mapOutputIn = rfs.open(mapOutputFileName);
    // seek to the correct offset for the reduce
    mapOutputIn.seek(info.startOffset);
    long rem = info.partLength;
    int len = mapOutputIn.read(buffer, 0, (int) Math.min(rem, MAX_BYTES_TO_READ));
    while (rem > 0 && len >= 0) {
      rem -= len;
      try {
        shuffleMetrics.outputBytes(len);
        outStream.write(buffer, 0, len);
        outStream.flush();
      } catch (IOException ie) {
        isInputException = false;
        throw ie;
      }
      totalRead += len;
      len = mapOutputIn.read(buffer, 0, (int) Math.min(rem, MAX_BYTES_TO_READ));
    }
    LOG.info("Sent out " + totalRead + " bytes for reduce: " + reduce + " from map: " + mapId + " given " + info.partLength + "/" + info.rawLength);
  } catch (IOException ie) {
    Log log = (Log) context.getAttribute("log");
    String errorMsg = ("getMapOutput(" + mapId + "," + reduceId + ") failed :\n" + StringUtils.stringifyException(ie));
    log.warn(errorMsg);
    // the exception was caused by reading the map output, so notify the
    // TaskTracker that this map task's output is broken
    if (isInputException) {
      tracker.mapOutputLost(TaskAttemptID.forName(mapId), errorMsg);
    }
    response.sendError(HttpServletResponse.SC_GONE, errorMsg);
    shuffleMetrics.failedOutput();
    throw ie;
  } finally {
    if (null != mapOutputIn) {
      mapOutputIn.close();
    }
    shuffleMetrics.serverHandlerFree();
    if (ClientTraceLog.isInfoEnabled()) {
      ClientTraceLog.info(String.format(MR_CLIENTTRACE_FORMAT, request.getLocalAddr() + ":" + request.getLocalPort(), request.getRemoteAddr() + ":" + request.getRemotePort(), totalRead, "MAPRED_SHUFFLE", mapId));
    }
  }
  outStream.close();
  shuffleMetrics.successOutput();
}
```
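On the receiving side, a fetcher can use the custom headers set above to sanity-check the response before consuming the body. Below is a minimal, illustrative sketch, not Hadoop's actual reduce-side copier: the header names are the literal string values of the TaskTracker constants used in the servlet, while the URL, host, and IDs are hypothetical placeholders.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch only: fetches one map output segment and checks the
// custom shuffle headers set by MapOutputServlet. The URL below is a
// made-up placeholder; the header names match the servlet's constants.
public class ShuffleFetchSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://tt-host:50060/mapOutput"
        + "?job=job_201204160001_0001"
        + "&map=attempt_201204160001_0001_m_000003_0&reduce=2");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();

    String fromMap  = conn.getHeaderField("from-map-task");        // FROM_MAP_TASK
    long rawLength  = Long.parseLong(conn.getHeaderField("Raw-Map-Output-Length"));
    long partLength = Long.parseLong(conn.getHeaderField("Map-Output-Length"));
    int forReduce   = Integer.parseInt(conn.getHeaderField("for-reduce-task"));

    // Make sure the response matches the map/reduce pair we asked for.
    if (forReduce != 2 || fromMap == null) {
      throw new IllegalStateException("unexpected shuffle response: " + fromMap);
    }

    // Drain exactly partLength bytes of (possibly compressed) map output.
    try (InputStream in = conn.getInputStream()) {
      byte[] buf = new byte[64 * 1024];
      long rem = partLength;
      int n;
      while (rem > 0 && (n = in.read(buf, 0, (int) Math.min(buf.length, rem))) >= 0) {
        rem -= n;
      }
    }
    System.out.println("received " + partLength + " bytes (raw " + rawLength + ")");
  }
}
```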
As the servlet code shows, MapOutputServlet caches the information from each map task's output index file to improve response time. It is worth explaining what the output index file file.out.index actually stores. For the key-value pairs emitted by the map operation, the user typically configures a partitioner suited to the application; the partitioner determines which Reduce task will process each key-value pair the map emits. Key-value pairs destined for the same Reduce task are stored in one contiguous region of the map task's output file file.out, and that region's start offset within file.out, its raw length (the map output may be compressed, depending on the job's configuration), and its actual on-disk length are recorded in the corresponding file.out.index file.

This design is sound for two reasons: 1) file.out.index is a small file, so caching its contents consumes little memory; 2) when one of a job's Map tasks completes, all of the job's Reduce tasks come to fetch their share of its output almost immediately, which fits the principle of locality very well. The cache is also bounded in size, with a first-in-first-out eviction policy whose main purpose is to automatically discard index information that is no longer useful: most jobs have short lifetimes, and once a job finishes, its map outputs (the intermediate data) serve no further purpose and are deleted by the TaskTracker node. The size of this cache can be set in the TaskTracker's configuration file via the property mapred.tasktracker.indexcache.mb.
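To make the index layout concrete: in this generation of Hadoop each index record is three 8-byte big-endian longs (start offset, raw length, part length), so the record for reduce partition r sits at byte offset r * 24 within file.out.index. The sketch below reads one record directly under that assumption; the path is a made-up placeholder, some versions append a checksum to the index file which this sketch ignores, and the real code path goes through tracker.indexCache as shown above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Minimal sketch: read the index record for one reduce partition from
// file.out.index, assuming a plain sequence of three big-endian longs
// per partition (startOffset, rawLength, partLength) and no checksum.
public class IndexRecordSketch {
  static final int RECORD_LENGTH = 3 * 8; // three longs = 24 bytes

  public static void main(String[] args) throws IOException {
    String indexFile = "/tmp/file.out.index"; // hypothetical placeholder path
    int reduce = 2;                           // partition of interest

    try (RandomAccessFile raf = new RandomAccessFile(indexFile, "r")) {
      raf.seek((long) reduce * RECORD_LENGTH); // jump to this partition's record
      long startOffset = raf.readLong(); // where the segment begins in file.out
      long rawLength   = raf.readLong(); // decompressed (raw) length
      long partLength  = raf.readLong(); // bytes actually on disk (maybe compressed)
      System.out.printf("offset=%d raw=%d part=%d%n", startOffset, rawLength, partLength);
    }
  }
}
```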
While MapOutputServlet is sending a Reduce task its share of the map output data, if an exception occurs when reading that map output, the servlet notifies the TaskTracker node; the TaskTracker then treats the Map task as failed and later reports this to the JobTracker node. How the JobTracker handles it from there is described in detail in earlier posts.
Reposted from http://blog.youkuaiyun.com/xhh198781/article/details/7471048