Pitfalls when reading zip files with java.util.zip.ZipInputStream

License: CC 4.0 BY-SA


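This post records the problems I ran into when reading a zip file with java.util.zip.ZipInputStream: the extracted files grew abnormally large, their contents were corrupted, and the jars could no longer be unzipped. Reading the source code showed that the root cause was my own misuse of the read API. Adjusting the read/write loop fixed the problem while keeping the read efficient.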


Problem description

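I was recently working on a feature with the following requirement: the web front end uploads a Java application deployment package, in zip format, to a specified directory on a Linux server. I was responsible for the back end.
The zip package has this directory structure:
· conf # configuration files of the app
· lib # jar files used by the app
The requirement is simple, so I started implementing it. The core code looked like this.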

byte[] buf = new byte[1024];
// zipFile is an org.springframework.web.multipart.MultipartFile object
try (ZipInputStream zin = new ZipInputStream(zipFile.getInputStream())) {
	ZipEntry entry;
	while ((entry = zin.getNextEntry()) != null) {
		if (!entry.isDirectory()) {
			try (OutputStream os = sftp.put(sb.toString())) {
				while((zin.read(buf)) != -1) {
					// write buf to Linux server by SFTP
			 		os.write(buf);
				}
				os.flush();
			}
		}
		zin.closeEntry();
	}
}

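At first glance nothing looks wrong with this code, right? Concise, elegant, clear logic.
Not quite! Does it actually do what is expected?
Obviously not, otherwise this post would not exist.
My own testing showed that the files did reach the server, but:
· The uploaded files were about 3 times larger than the originals; a 1 KB file became 3 KB, which is clearly wrong
· Viewing the configuration files under conf with cat showed garbage appended to the end of each file
· After downloading an uploaded jar back to my machine, the zip program failed to extract it and reported that the file had been modified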
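
The size symptom can be reproduced without Spring or SFTP. The following is a minimal sketch, not the original upload code: the class name WholeBufferWriteDemo, the helper names, and the sample entry conf/app.properties are made up for illustration, and a ByteArrayOutputStream stands in for the SFTP stream. It zips a tiny file in memory, copies it back out with the same read and write calls as the loop above, and prints both sizes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class WholeBufferWriteDemo {

    // Builds a zip archive in memory that contains a single entry.
    static byte[] zipOf(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry(name));
            zout.write(content);
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Copies the first entry with the same read/write calls as the upload loop.
    static byte[] copyWholeBuffer(byte[] zip) throws IOException {
        byte[] buf = new byte[1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            zin.getNextEntry();
            while (zin.read(buf) != -1) {
                out.write(buf); // same call as in the upload loop
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "server.port=8080\n".getBytes(StandardCharsets.UTF_8);
        byte[] copied = copyWholeBuffer(zipOf("conf/app.properties", original));
        System.out.println("original: " + original.length + " bytes, copied: " + copied.length + " bytes");
    }
}
```

With a 1024-byte buffer, the copied size always comes out as a multiple of 1024, no matter how small the original entry is.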

Troubleshooting

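I tried the following to track down the bug.
· Ask a search engine
· Try different APIs
· Read the source code
It took quite a few detours.
· Searching turned up no solution to this problem
· Trying the different APIs taught me how they relate to each other
· Reading the source code carefully finally solved the problem
From the source code I found that ZipInputStream has three APIs for reading the data of the current zip entry: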

int java.util.zip.InflaterInputStream.read() throws IOException

int java.io.FilterInputStream.read(byte[] b) throws IOException

int java.util.zip.ZipInputStream.read(byte[] b, int off, int len) throws IOException

These three APIs come from three different classes. Specifically, they span three generations: grandfather, father, and grandson.

java.io.FilterInputStream // the grandfather: a plain subclass of java.io.InputStream

java.util.zip.InflaterInputStream // the father: extends java.io.FilterInputStream

java.util.zip.ZipInputStream // the grandson: extends java.util.zip.InflaterInputStream
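
The inheritance chain can be checked with a few lines of reflection. This is only a sketch; ZipStreamHierarchy and superclassChain are hypothetical names chosen for the example. It is also worth knowing that, according to the JDK documentation, FilterInputStream.read(byte[] b) simply calls read(b, 0, b.length), so both array-based calls end up in the same ZipInputStream method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipInputStream;

class ZipStreamHierarchy {

    // Walks from the given class up to java.lang.Object and collects the class names.
    static List<String> superclassChain(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            names.add(c.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints one class per line, starting with java.util.zip.ZipInputStream.
        superclassChain(ZipInputStream.class).forEach(System.out::println);
    }
}
```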

Solution

I tried reading the data through each of the three APIs in turn. Testing showed that the method from the father class did the job, namely:

int java.util.zip.InflaterInputStream.read() throws IOException

So the following code works.

// zipFile is an org.springframework.web.multipart.MultipartFile object
try (ZipInputStream zin = new ZipInputStream(zipFile.getInputStream());
	ByteArrayOutputStream baos = new ByteArrayOutputStream(2048)) {
	ZipEntry entry;
	while ((entry = zin.getNextEntry()) != null) {
		if (!entry.isDirectory()) {
			// start each entry with an empty buffer so the previous entry's bytes are not carried over
			baos.reset();
			int b;
			// read() returns one byte (0-255), or -1 at the end of the current entry
			while ((b = zin.read()) != -1) {
				baos.write(b);
			}
			baos.flush();
			// write baos to Linux server by SFTP
		}
		zin.closeEntry();
	}
}

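However, reading one byte at a time is far too inefficient for anything that is supposed to be fast. So I kept digging through the source code, and it finally clicked when I saw this method in java.util.jar.JarInputStream (a subclass of ZipInputStream).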

private byte[] getBytes(InputStream is) throws IOException {
	byte[] buffer = new byte[8192];
	ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
	int n;
	while ((n = is.read(buffer, 0, buffer.length)) != -1) {
		baos.write(buffer, 0, n);
	}
	return baos.toByteArray();
}

In fact, my problem was in this one line.

os.write(buf);
// is equivalent to
os.write(buf, 0, buf.length);

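That is clearly wrong: read does not guarantee that it fills all buf.length bytes on every call! It returns the number of bytes it actually stored, and the rest of buf still holds zeros or data left over from earlier reads, so writing the whole array inflated the files and appended garbage to them.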
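
To see how many bytes each call really delivers, the return values can simply be recorded. This is a sketch under assumptions: ReadSizesDemo, sampleZip, readSizes, and the entry name lib/sample.bin are hypothetical, and the test data is pseudo-random bytes zipped in memory rather than a real deployment package.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ReadSizesDemo {

    // Builds an in-memory zip with one entry of pseudo-random bytes.
    static byte[] sampleZip(int contentLength) throws IOException {
        byte[] content = new byte[contentLength];
        new Random(42).nextBytes(content);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("lib/sample.bin"));
            zout.write(content);
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Records the return value of every read(buf) call on the first entry.
    static List<Integer> readSizes(byte[] zip, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        List<Integer> sizes = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            zin.getNextEntry();
            int n;
            while ((n = zin.read(buf)) != -1) {
                sizes.add(n);
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> sizes = readSizes(sampleZip(3000), 1024);
        int total = sizes.stream().mapToInt(Integer::intValue).sum();
        System.out.println("read sizes: " + sizes + ", total: " + total);
    }
}
```

The individual sizes depend on how the inflater hands out data, so they should not be relied on; only their sum matches the entry length.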
So the version that is both correct and efficient is as follows.

byte[] buf = new byte[1024];
// zipFile is an org.springframework.web.multipart.MultipartFile object
try (ZipInputStream zin = new ZipInputStream(zipFile.getInputStream())) {
	ZipEntry entry;
	while ((entry = zin.getNextEntry()) != null) {
		if (!entry.isDirectory()) {
			try (OutputStream os = sftp.put(sb.toString())) {
				int n;
				while((n = zin.read(buf)) != -1) {
					// write only the n bytes actually read to the Linux server by SFTP
					os.write(buf, 0, n);
				}
				os.flush();
			}
		}
		zin.closeEntry();
	}
}

Looks a lot like the getBytes method, doesn't it? With that, the bug was finally solved.
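
As a quick sanity check of the fixed loop, the sketch below round-trips a small archive entirely in memory and compares every extracted entry with its original bytes. It is an illustration under assumptions, not the production code: ZipCopyCheck, copy, extract, zip, and the sample entry names are hypothetical, and ByteArrayOutputStream replaces the SFTP stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipCopyCheck {

    // Copies until read returns -1, writing only the bytes each read delivered.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Extracts every file entry into memory, keyed by entry name.
    static Map<String, byte[]> extract(byte[] zip) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    copy(zin, out); // for a ZipInputStream, -1 marks the end of the current entry
                    files.put(entry.getName(), out.toByteArray());
                }
                zin.closeEntry();
            }
        }
        return files;
    }

    // Builds an in-memory zip from the given name-to-content map.
    static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zout.putNextEntry(new ZipEntry(file.getKey()));
                zout.write(file.getValue());
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> original = new LinkedHashMap<>();
        original.put("conf/app.properties", "server.port=8080\n".getBytes(StandardCharsets.UTF_8));
        original.put("lib/sample.bin", new byte[5000]);
        Map<String, byte[]> extracted = extract(zip(original));
        for (Map.Entry<String, byte[]> file : original.entrySet()) {
            boolean same = Arrays.equals(file.getValue(), extracted.get(file.getKey()));
            System.out.println(file.getKey() + " identical: " + same);
        }
    }
}
```

On Java 9 and later, zin.transferTo(os) performs the same copy for the current entry, because it also reads until read returns -1.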