File Upload: A Walkthrough of the Apache Commons FileUpload Module

Latest recommended article published on 2024-03-31 01:08:21

License: CC 4.0 BY-SA

Tags:

#apache #java #file-upload #http

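This article walks through the file upload process in detail, including the role of multipart/form-data in HTTP and how it is implemented. By reading the source of FileUploadBase and MultipartStream, it shows how file data is extracted from an HTTP request.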

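Preface

Most backend developers have built or at least worked with file upload at some point, and may have wondered about the following.

How does a file upload differ from an ordinary GET or POST request?
How does HTTP handle very large files? Does an upload consume a lot of bandwidth and memory?

If you share these questions, read on.

Overall process

A page with an upload button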

<form method="POST" enctype="multipart/form-data" action="xxx">
  File to upload: <input type="file" name="upfile"><br/>
  Notes about the file: <input type="text" name="note"><br/>
  <br/>
  <input type="submit" value="Press"> to upload the file!
</form>

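Upload-related HTTP protocol

As the page code shows, a file upload is also just an ordinary POST request.
The HTTP request that gets sent can be inspected in the Chrome browser.

The key part is the Content-Type request header.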

Content-Type:multipart/form-data; boundary=----WebKitFormBoundary4jna7XPdbJmDdWgC

The header has two parts: the media type multipart/form-data ("multi-part form data") and the boundary parameter, a standalone string, here "----WebKitFormBoundary4jna7XPdbJmDdWgC".
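To make the header structure concrete, here is a minimal sketch of pulling the boundary parameter out of a Content-Type value. The class and method names (BoundaryParam, extractBoundary) are hypothetical and this is not how Commons FileUpload itself does it; the sketch also assumes the parameter values contain no ';' characters.

```java
import java.util.Locale;

final class BoundaryParam {
    /** Returns the boundary parameter of a multipart Content-Type value, or null if there is none. */
    static String extractBoundary(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type, e.g. "multipart/form-data"
        if (!parts[0].trim().toLowerCase(Locale.ROOT).startsWith("multipart/")) {
            return null;
        }
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            int eq = p.indexOf('=');
            if (eq < 0 || !p.substring(0, eq).trim().equalsIgnoreCase("boundary")) {
                continue;
            }
            String value = p.substring(eq + 1).trim();
            // The value may optionally be wrapped in double quotes
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value;
        }
        return null;
    }

    public static void main(String[] args) {
        String header = "multipart/form-data; boundary=----WebKitFormBoundary4jna7XPdbJmDdWgC";
        System.out.println(extractBoundary(header));
    }
}
```

Note that the four leading hyphens are part of the boundary value itself; the body adds two more in front of it on every delimiter line.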

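The multipart/form-data format (RFC 7578) specifies that the uploaded form fields are carried in the request body and separated by the boundary.

Fields are separated by --boundary\r\n, that is, two hyphens followed by the boundary value from the header;
the request ends with --boundary--.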
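The two rules above can be sketched by assembling a request body by hand for the form shown earlier. The field names upfile and note come from that form; the class name, the file name hello.txt, and the text/plain part type are assumptions made for illustration, and a real file part carries raw bytes rather than a String.

```java
final class MultipartBodyDemo {
    static final String CRLF = "\r\n";

    /** Builds a two-part multipart/form-data body: one file field and one text field. */
    static String buildBody(String boundary, String fileName, String fileText, String note) {
        StringBuilder sb = new StringBuilder();

        // Part 1: the file field, with its own headers, a blank line, then the content
        sb.append("--").append(boundary).append(CRLF);
        sb.append("Content-Disposition: form-data; name=\"upfile\"; filename=\"")
          .append(fileName).append("\"").append(CRLF);
        sb.append("Content-Type: text/plain").append(CRLF);
        sb.append(CRLF);
        sb.append(fileText).append(CRLF);

        // Part 2: the ordinary text field
        sb.append("--").append(boundary).append(CRLF);
        sb.append("Content-Disposition: form-data; name=\"note\"").append(CRLF);
        sb.append(CRLF);
        sb.append(note).append(CRLF);

        // Closing delimiter: the boundary followed by two more hyphens
        sb.append("--").append(boundary).append("--").append(CRLF);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildBody("----WebKitFormBoundary4jna7XPdbJmDdWgC",
                "hello.txt", "hello world", "my first upload"));
    }
}
```

Each part is a small message of its own, with headers, a blank line, and a value, which is the extra level of structure that the boundary makes possible.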

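Why is a boundary needed?

A basic POST request needs no boundary: fields are joined with the & character, for example key1=value1&key2=value2.

Why doesn't that work for file upload?

My own view is that a file is too structured to be expressed as a single level of key=value pairs.
As Figure 2 shows, a file has a name, a type, and content, and multiple files need multiple such groups, so the boundary is introduced to add one more level of nesting.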
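For comparison, a basic form body of this kind can be produced with the JDK alone. The sketch below assumes Java 10 or later (for the URLEncoder.encode overload that takes a Charset); the class name UrlEncodedDemo is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class UrlEncodedDemo {
    /** Encodes fields as application/x-www-form-urlencoded: key=value pairs joined by '&'. */
    static String encode(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // Reserved characters such as '&' and '=' inside keys or values are percent-encoded
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("key1", "value1");
        fields.put("key2", "value2");
        System.out.println(encode(fields));
    }
}
```

Every field here is a flat name and a flat value; there is no place to attach per-field metadata such as a file name or a content type.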

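Commons FileUpload code walkthrough

Once the HTTP protocol is clear, so is the overall upload process; what remains is the server-side implementation.

Understanding the two main classes, FileUploadBase and MultipartStream, is enough to understand the core implementation.

The source is too long to reproduce, so the main logic of FileUploadBase is outlined below.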

class FileUploadBase {
    /**
     * Main entry point: reads the form fields and the file streams from the request.
     * Both kinds of item are returned as FileItem objects.
     */
    List<FileItem> parseRequest(RequestContext ctx) {
        List<FileItem> fileItems = new ArrayList<>();

        FileItemIterator iter = new FileItemIteratorImpl(ctx);
        while (iter.hasNext()) {
            FileItemStream item = iter.next();
            FileItem fileItem = fileItemFactory.create(item.getFieldName(), item.getFileName(), item.isFormField());

            // The value delimited by each boundary must be read right here: the underlying
            // source is a byte stream, so the parts can only be consumed in order.
            Streams.copy(item.openStream(), fileItem.getOutputStream());

            fileItems.add(fileItem);
        }
        return fileItems;
    }

    class FileItemIteratorImpl {    

        class FileItemStreamImpl {

            String fileName;
            String fieldName;
            String contentType;
            boolean isFormField;

            InputStream itemStream;

            FileItemStreamImpl(fileName, fieldName, contentType, isFormField) {
                this.fileName = fileName;
                this.fieldName = fieldName;
                this.contentType = contentType;
                this.isFormField = isFormField;
                itemStream = multi.newInputStream();
            }
        }

        MultipartStream multi;

        FileItemStream currentItem;

        boolean itemValid;

        boolean eof;


        boolean findNextItem() {
            if (eof) {
                return false;
            }
            // Read the boundary; the "--" or "\r\n" that follows it tells us whether another part exists
            boolean nextPart = multi.readBoundary();

            if (!nextPart) {
                eof = true;
                return false;
            }

            // Read the part headers; "\r\n\r\n" separates them from the value
            String headersStr = multi.readHeaders();
            FileItemHeaders headers = getParsedHeaders(headersStr);

            // fileName / fieldName / contentType / isFormField are extracted from headers
            currentItem = new FileItemStreamImpl(fileName, fieldName, contentType, isFormField);
            itemValid = true;
            return true;
        }

        boolean hasNext() {
            if (eof) {
                return false;
            }
            if (itemValid) {
                return true;
            }
            return findNextItem();
        }

        FileItemStream next() {
            if (eof || (!itemValid && !findNextItem()))  {
                throw new Exception();
            }
            itemValid = false;
            return currentItem;
        }
    }
}
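The header step in findNextItem above can be sketched with plain string handling. This is only an illustration under stated assumptions: the class and method names are hypothetical, the library uses its own parameter parser rather than this code, and the sketch assumes parameter values contain no ';' characters.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class PartHeaderDemo {
    /** Splits the raw header block of one part into lower-cased name -> value pairs. */
    static Map<String, String> parseHeaders(String headerBlock) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : headerBlock.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank or malformed lines
            }
            headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                    line.substring(colon + 1).trim());
        }
        return headers;
    }

    /** Returns one parameter of a Content-Disposition value (e.g. name or filename), or null. */
    static String dispositionParam(String disposition, String param) {
        if (disposition == null) {
            return null;
        }
        for (String piece : disposition.split(";")) {
            String p = piece.trim();
            int eq = p.indexOf('=');
            if (eq < 0 || !p.substring(0, eq).trim().equalsIgnoreCase(param)) {
                continue;
            }
            String v = p.substring(eq + 1).trim();
            if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                v = v.substring(1, v.length() - 1); // strip surrounding quotes
            }
            return v;
        }
        return null;
    }

    public static void main(String[] args) {
        String block = "Content-Disposition: form-data; name=\"upfile\"; filename=\"a.txt\"\r\n"
                + "Content-Type: text/plain\r\n\r\n";
        Map<String, String> headers = parseHeaders(block);
        String disposition = headers.get("content-disposition");
        String fieldName = dispositionParam(disposition, "name");
        String fileName = dispositionParam(disposition, "filename");
        // A part without a filename parameter is treated as an ordinary form field
        boolean isFormField = (fileName == null);
        System.out.println(fieldName + " " + fileName + " " + isFormField);
    }
}
```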

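Comparing against the source, FileUploadBase parses the Content-Type request header to obtain the boundary.
It also parses the headers of each FileItem to obtain fieldName, fileName, contentType, and so on.
The actual work of extracting the boundary, the FileItem headers, and the FileItem body from the stream is done by MultipartStream.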

class MultipartStream {
    byte[] boundary;
    int boundaryLength;
    InputStream input;
    byte[] buffer;
    int bufferSize;
    int head;
    int tail;

    boolean readBoundary() {
        byte[] marker = new byte[2];
        head += boundaryLength;
        marker[0] = readByte();
        marker[1] = readByte();
        // "--" after the boundary: the end of the stream has been reached
        if (marker[0] == '-' && marker[1] == '-') {
            return false;
        }
        // "\r\n" after the boundary: another FileItem follows
        if (marker[0] == '\r' && marker[1] == '\n') {
            return true;
        }

        throw new Exception();
    }

    String readHeaders() {
        byte[] headerSeparator = {'\r', '\n', '\r', '\n'};
        int i = 0;
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Collect bytes until the full "\r\n\r\n" separator has been seen
        while (i < headerSeparator.length) {
            byte b = readByte();
            if (b == headerSeparator[i]) {
                i++;
            } else {
                i = 0;
            }
            baos.write(b);
        }
        return baos.toString();
    }

    /**
     * Reads the complete body of the current part into out.
     */
    int readBodyData(OutputStream out) {
        ItemInputStream input = new ItemInputStream();
        return Streams.copy(input, out);
    }

    /**
     * Returns an ItemInputStream.
     * This input stream yields the body up to the next boundary.
     */
    ItemInputStream newInputStream() {
        return new ItemInputStream();
    }


    protected int findSeparator() {
        int first;
        int match = 0;
        int maxpos = tail - boundaryLength;
        for (first = head; first <= maxpos && match != boundaryLength; first++) {
             first = findByte(boundary[0], first);
            if (first == -1 || first > maxpos) {
                return -1;
            }
            for (match = 1; match < boundaryLength; match++) {
                if (buffer[first + match] != boundary[match]) {
                    break;
                }
            }
        }
        if (match == boundaryLength) {
            return first - 1;
        }
        return -1;
    }   


    class ItemInputStream {
        // Number of trailing bytes held back because they could be the start of a
        // boundary (usually equal to boundary.length).
        int pad;

        // Position in the outer buffer at which the boundary was found, or -1.
        int pos;

        ItemInputStream() {
            pos = MultipartStream.this.findSeparator();
            if (pos == -1) {
                pad = boundary.length;
            }
        }

        public int read(byte[] b, int off, int len) {
            int res = available();
            if (res == 0) {
                res = makeAvailable();
                if (res == 0) {
                    return -1;
                }
            }
            res = Math.min(len, res);
            System.arraycopy(buffer, head, b, off, res);
            head += res;
            return res;
        }

        public int available() {
            if (pos == -1) {
                return tail - head - pad;
            } else {
                return pos - head;
            }
        }

        // Only call this method after available() has returned 0
        public int makeAvailable() {
            if (pos != -1) {
                return 0;
            }

            System.arraycopy(buffer, tail - pad, buffer, 0, pad);
            head = 0;
            tail = pad;

            // Network packets may take a while to arrive, so keep reading
            for (;;) {
                int bytesRead = input.read(buffer, tail, bufferSize - tail);
                if (bytesRead == -1) {
                    throw new MalformedStreamException("Stream ended unexpectedly");
                }

                tail += bytesRead;
                pos = findSeparator();
                int av = available();
                if (av > 0 || pos != -1) {
                    return av;
                }
            }
        }
    }
}
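The buffering idea in ItemInputStream is what keeps memory use flat for large files: only a fixed-size buffer is ever held, and the last few bytes are kept back in case they are the start of a boundary. Below is a self-contained sketch of that idea, not the library's code; the class BoundaryScanner and its method names are hypothetical, and it keeps back delimiter.length - 1 bytes, a slight simplification of the pad logic above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class BoundaryScanner {
    private final InputStream in;
    private final byte[] delimiter;
    private final byte[] buffer;
    private int head; // index of the first unread byte
    private int tail; // index one past the last valid byte

    BoundaryScanner(InputStream in, byte[] delimiter, int bufferSize) {
        if (bufferSize < delimiter.length) {
            throw new IllegalArgumentException("buffer must be able to hold one delimiter");
        }
        this.in = in;
        this.delimiter = delimiter;
        this.buffer = new byte[bufferSize];
    }

    /** Returns the index of the delimiter within buffer[head, tail), or -1 if absent. */
    private int findDelimiter() {
        for (int i = head; i <= tail - delimiter.length; i++) {
            int m = 0;
            while (m < delimiter.length && buffer[i + m] == delimiter[m]) {
                m++;
            }
            if (m == delimiter.length) {
                return i;
            }
        }
        return -1;
    }

    /** Copies bytes up to the next delimiter into out, skips the delimiter, returns the count. */
    long copyNextPart(OutputStream out) throws IOException {
        long total = 0;
        for (;;) {
            int pos = findDelimiter();
            if (pos != -1) {
                out.write(buffer, head, pos - head);
                total += pos - head;
                head = pos + delimiter.length;
                return total;
            }
            // No complete delimiter in the buffer. The trailing bytes might be the start
            // of one, so hold them back and flush only the bytes that are certainly data.
            int pad = Math.min(delimiter.length - 1, tail - head);
            int safe = tail - head - pad;
            out.write(buffer, head, safe);
            total += safe;
            System.arraycopy(buffer, tail - pad, buffer, 0, pad);
            head = 0;
            tail = pad;
            int n = in.read(buffer, tail, buffer.length - tail);
            if (n == -1) {
                throw new EOFException("stream ended before the delimiter");
            }
            tail += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] delimiter = "\r\n--B".getBytes(StandardCharsets.US_ASCII);
        byte[] data = "first part data\r\n--Bsecond\r\n--B".getBytes(StandardCharsets.US_ASCII);
        // An 8-byte buffer forces the delimiter to straddle several refills
        BoundaryScanner scanner = new BoundaryScanner(new ByteArrayInputStream(data), delimiter, 8);
        ByteArrayOutputStream part = new ByteArrayOutputStream();
        scanner.copyNextPart(part);
        System.out.println(part.toString("US-ASCII"));
    }
}
```

Because the output can be any OutputStream, such as a file on disk, the size of an upload affects how long the copy runs but not how much memory the parser holds.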