org.apache.hadoop.io.compress Source Code Walkthrough

USTCZYY

License: CC 4.0 BY-SA

Category: hadoop

Article link: https://blog.youkuaiyun.com/ustczyy/article/details/16827319

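This article walks through Hadoop's Compressor interface: what it does and how to use it, including supplying input data, checking the state of the input buffer, and setting a preset dictionary. It also explains how to retrieve the compressed data and manage the compression process.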
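
The call sequence summarized above (supply input, check the buffer state, fetch compressed data, signal the end) can be sketched with java.util.zip.Deflater, which the interface's Javadoc names as its model. This is only an illustration under that assumption, not Hadoop code: the class DeflaterLoopSketch, its helper methods, and the chunk and buffer sizes are hypothetical, and Deflater.deflate() stands in for Compressor.compress().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; it uses only the JDK, not any Hadoop type.
final class DeflaterLoopSketch {

    /** Feeds data in chunks of chunkSize bytes and drains compressed output after every step. */
    static byte[] compress(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Deflater deflater = new Deflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64];
        int pos = 0;
        try {
            while (!deflater.finished()) {
                if (deflater.needsInput()) {
                    if (pos < data.length) {
                        // More data is available: hand over the next chunk.
                        int len = Math.min(chunkSize, data.length - pos);
                        deflater.setInput(data, pos, len);
                        pos += len;
                    } else {
                        // No more data: tell the compressor to wrap up.
                        deflater.finish();
                    }
                }
                // May return 0 while the compressor is still buffering input.
                int n = deflater.deflate(buf, 0, buf.length);
                out.write(buf, 0, n);
            }
        } finally {
            deflater.end(); // release native resources
        }
        return out.toByteArray();
    }

    /** Inflates the result again so the round trip can be checked. */
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64];
        try {
            inflater.setInput(packed);
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or dictionary-dependent stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "hadoop compress hadoop compress hadoop compress"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(input, 16);
        System.out.println("round trip ok: " + Arrays.equals(input, decompress(packed)));
    }
}
```

The loop calls finish() only after all input has been handed over, and it keeps draining output until finished() reports true, which mirrors the contract described in the Javadoc of the listing that follows.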

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hadoop.io.compress;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

/**
* Specification of a stream-based 'compressor' which can be
* plugged into a {@link CompressionOutputStream} to compress data.
* This is modelled after {@link java.util.zip.Deflater}
*
*/
public interface Compressor {
/**
   * Sets input data for compression.
   * This should be called whenever #needsInput() returns
   * <code>true</code> indicating that more input data is required.
   *
   * @param b Input data
   * @param off Start offset
   * @param len Length
   */
   // setInput() hands data to the compressor's internal buffer; it can be called multiple times.
public void setInput(byte[] b, int off, int len);

/**
   * Returns true if the input data buffer is empty and
   * #setInput() should be called to provide more input.
   *
   * @return <code>true</code> if the input data buffer is empty and
   * #setInput() should be called in order to provide more input.
   */

// A return value of false means the internal buffer is full; compressed data must be fetched with compress() before more input is supplied.
public boolean needsInput();

/**
   * Sets preset dictionary for compression. A preset dictionary
   * is used when the history buffer can be predetermined.
   *
   * @param b Dictionary data bytes
   * @param off Start offset
   * @param len Length
   */
public void setDictionary(byte[] b, int off, int len);

/**
   * Return number of uncompressed bytes input so far.
   */
// Total number of uncompressed bytes fed into the compressor so far.
public long getBytesRead();

/**
   * Return number of compressed bytes output so far.
   */
// Total number of compressed bytes output so far.
public long getBytesWritten();

/**
   * When called, indicates that compression should end
   * with the current contents of the input buffer.
   */
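// finish() signals that no more input will follow; the compressor should end the stream with the data already in its input buffer.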
public void finish();

/**
   * Returns true if the end of the compressed
   * data output stream has been reached.
   * @return <code>true</code> if the end of the compressed
   * data output stream has been reached.
   */
// Returns true once the end of the compressed output has been reached, i.e. no compressed data remains to be fetched.
public boolean finished();

/**
   * Fills specified buffer with compressed data. Returns actual number
   * of bytes of compressed data. A return value of 0 indicates that
   * needsInput() should be called in order to determine if more input
   * data is required.
   *
   * @param b Buffer for the compressed data
   * @param off Start offset of the data
   * @param len Size of the buffer
   * @return The actual number of bytes of compressed data.
   */
public int compress(byte[] b, int off, int len) throws IOException;

/**
   * Resets compressor so that a new set of input data can be processed.
   */
// Resets the compressor so it can process a new set of input data.
public void reset();

/**
   * Closes the compressor and discards any unprocessed input.
   */
// Closes the compressor and discards any unprocessed input.
public void end();

/**
   * Prepare the compressor to be used in a new stream with settings defined in
   * the given Configuration
   *
   * @param conf Configuration from which new setting are fetched
   */
// Additionally allows the compressor to be reconfigured through Hadoop's Configuration system.
public void reinit(Configuration conf);
}
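
To illustrate how getBytesRead(), getBytesWritten(), reset() and end() fit together, here is a minimal sketch that wraps java.util.zip.Deflater behind a trimmed-down interface. Everything in it is an assumption made for illustration: MiniCompressor is a hypothetical stand-in that copies only some method names from the listing above (it is not the Hadoop type and omits setDictionary() and reinit()), and DeflaterBackedCompressor is not an actual Hadoop codec.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical demo class; it uses only the JDK so it runs without Hadoop.
final class CompressorReuseSketch {

    /** Hypothetical subset of the Compressor methods shown in the listing. */
    interface MiniCompressor {
        void setInput(byte[] b, int off, int len);
        boolean needsInput();
        void finish();
        boolean finished();
        int compress(byte[] b, int off, int len);
        long getBytesRead();
        long getBytesWritten();
        void reset();
        void end();
    }

    /** Adapter that forwards every call to a java.util.zip.Deflater. */
    static final class DeflaterBackedCompressor implements MiniCompressor {
        private final Deflater deflater = new Deflater();

        @Override public void setInput(byte[] b, int off, int len) { deflater.setInput(b, off, len); }
        @Override public boolean needsInput() { return deflater.needsInput(); }
        @Override public void finish() { deflater.finish(); }
        @Override public boolean finished() { return deflater.finished(); }
        @Override public int compress(byte[] b, int off, int len) { return deflater.deflate(b, off, len); }
        @Override public long getBytesRead() { return deflater.getBytesRead(); }
        @Override public long getBytesWritten() { return deflater.getBytesWritten(); }
        @Override public void reset() { deflater.reset(); }
        @Override public void end() { deflater.end(); }
    }

    /** Compresses one complete buffer; the caller may reset() the compressor afterwards. */
    static byte[] compressAll(MiniCompressor c, byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[128];
        c.setInput(data, 0, data.length);
        c.finish(); // the whole input is already supplied
        while (!c.finished()) {
            int n = c.compress(buf, 0, buf.length);
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        MiniCompressor c = new DeflaterBackedCompressor();
        try {
            byte[] data = "block block block block block".getBytes(StandardCharsets.UTF_8);
            byte[] first = compressAll(c, data);
            System.out.println(c.getBytesRead() + " bytes in, " + c.getBytesWritten() + " bytes out");

            c.reset(); // counters and internal state start over for the next data set
            byte[] second = compressAll(c, data);
            System.out.println("same output after reset: " + Arrays.equals(first, second));
        } finally {
            c.end(); // discard the compressor once it is no longer needed
        }
    }
}
```

Reusing one instance through reset() instead of creating a new compressor per stream is the pattern the reset() and end() comments describe; end() is called exactly once, when the instance is retired.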