Fixing Chinese text, style, and line-wrapping problems in PDFs generated by Java Flying Saucer

Latest recommended article published 2025-09-24 09:30:00

License: CC 4.0 BY-SA


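This article covers the Chinese line-wrapping and styling problems encountered when generating PDFs with Flying Saucer and iText in a Java project. By examining Unicode blocks, the author fixes Chinese text that overflows the page width and the bug where Chinese punctuation appears at the start of a line. The article also suggests how to patch the source, lists the relevant dependency versions, and warns that Flying Saucer only supports CSS 2.1, so CSS 3 may render incorrectly.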


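In my project I generate PDF files with iText and Flying Saucer. I never ran into the widely reported problem of Chinese text not being displayed at all. What I did hit is that, with a Chinese font, character width is calculated the way it is for Latin letters, so a line of Chinese text runs on without wrapping, extends past the page width, and is cut off. After some analysis and searching I finally found a solution, and I also fixed the problem of Chinese punctuation appearing at the start of a line.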

First, we need to understand what these constants actually mean:

Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS: 4E00-9FFF, CJK Unified Ideographs

Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS: F900-FAFF, CJK Compatibility Ideographs

Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A: 3400-4DBF, CJK Unified Ideographs Extension A

CJK is short for "Chinese, Japanese, Korean". In practice it refers to the Unicode encoding of the ideographs used in those three languages.

Character.UnicodeBlock.GENERAL_PUNCTUATION: 2000-206F, General Punctuation

Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION: 3000-303F, CJK Symbols and Punctuation

Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS: FF00-FFEF, Halfwidth and Fullwidth Forms
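To see how these blocks map onto real characters, here is a small sketch that asks the JDK which block a few sample characters belong to. The class name UnicodeBlockDemo and the helper blockOf are made up for illustration; only Character.UnicodeBlock.of comes from the JDK. The samples are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.lang.Character.UnicodeBlock;

public class UnicodeBlockDemo {
    // Hypothetical helper: returns the name of the Unicode block containing c.
    static String blockOf(char c) {
        return String.valueOf(UnicodeBlock.of(c));
    }

    public static void main(String[] args) {
        // U+4E2D is an ideograph, U+FF0C a fullwidth comma,
        // U+3002 an ideographic full stop, U+201C a left double quotation mark.
        char[] samples = {'\u4E2D', '\uFF0C', '\u3002', '\u201C', 'a'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %s%n", (int) c, blockOf(c));
        }
    }
}
```

Note that common Chinese punctuation is spread over all three punctuation blocks, which is why the code later in this article lists all of them.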

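When Flying Saucer generates a PDF, its source code wraps lines by group and line width. The logic is as follows:

Spaces act as group separators, and the characters between any two adjacent spaces cannot be broken. When the total length of the groups on a line is as close as possible to the line width, that is, when adding one more group would exceed it, the line is wrapped.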
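The following is a minimal, self-contained sketch of that space-based greedy wrapping, not Flying Saucer's actual code. It assumes a simplified width model in which every character is one unit wide (Flying Saucer measures with real font metrics), and the names SpaceWrapSketch, width, and wrap are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SpaceWrapSketch {
    // Assumed width model: each character counts as one unit.
    static int width(String s) {
        return s.length();
    }

    // Greedy wrap that may only break at spaces. A single group wider
    // than avail is kept whole and therefore overflows the line.
    static List<String> wrap(String text, int avail) {
        List<String> lines = new ArrayList<>();
        String line = "";
        for (String group : text.split(" ")) {
            String candidate = line.isEmpty() ? group : line + " " + group;
            if (!line.isEmpty() && width(candidate) > avail) {
                lines.add(line);   // adding this group would overflow: wrap first
                line = group;
            } else {
                line = candidate;
            }
        }
        if (!line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        // English text has spaces, so it wraps within the width of 10.
        System.out.println(wrap("the quick brown fox", 10));
        // Twelve ideographs with no spaces form one unbreakable group.
        String chinese = "\u4E2D\u6587\u6CA1\u6709\u7A7A\u683C"
                + "\u4E2D\u6587\u6CA1\u6709\u7A7A\u683C";
        System.out.println(wrap(chinese, 10).size() + " line(s) for 12 ideographs");
    }
}
```

With a width of 10, the English sentence is split into two lines, while the run of ideographs stays on a single overlong line because it contains no space to break at.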

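This works well for English but not for Chinese, because Chinese text is not delimited by spaces. yye_javaeye improved the source code: it checks whether a character is Chinese and, if so, treats each Chinese character as its own group, then wraps by group and line width as before. He added two methods, isChinese(char c) and getStrRight(String s, int left). The relevant part of the modified source is shown here: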

package org.xhtmlrenderer.layout;  
  
import org.xhtmlrenderer.css.constants.IdentValue;  
import org.xhtmlrenderer.css.style.CalculatedStyle;  
import org.xhtmlrenderer.render.FSFont;  
  
/** 
 * A utility class that scans the text of a single inline box, looking for the  
 * next break point. 
 * @author Torbjörn Gannholm
 */  
public class Breaker {  
  
    // ... (other members omitted)
    public static void breakText(LayoutContext c,   
            LineBreakContext context, int avail, CalculatedStyle style) {  
        // ... (omitted)
        String currentString = context.getStartSubstring();  
        int left = 0;  
//        int right = currentString.indexOf(WhitespaceStripper.SPACE, left + 1);  
        int right = getStrRight(currentString,left);  
        int lastWrap = 0;  
        int graphicsLength = 0;  
        int lastGraphicsLength = 0;  
  
        while (right > 0 && graphicsLength <= avail) {  
            lastGraphicsLength = graphicsLength;  
            graphicsLength += c.getTextRenderer().getWidth(  
                    c.getFontContext(), font, currentString.substring(left, right));  
            lastWrap = left;  
            left = right;  
//            right = currentString.indexOf(WhitespaceStripper.SPACE, left + 1);  
            right = getStrRight(currentString,left+1);  
        }  
  
        // ... (omitted)
    }  
  
    // True for CJK ideographs and for characters in the punctuation blocks.
    // Counting punctuation here makes every punctuation mark a break point.
    private static boolean isChinese(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        return ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION
                || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }

    // Returns the index of the next break point at or after 'left' (a space
    // or a character accepted by isChinese), or -1 if there is none.
    // Index 0 is reported as 1 so the caller's "right > 0" test still passes.
    private static int getStrRight(String s, int left) {
        if (left >= s.length()) {
            return -1;
        }
        for (int i = left; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (isChinese(ch) || ch == ' ') {
                return i == 0 ? 1 : i;
            }
        }
        return -1;
    }
  
}

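This approach does make Chinese characters wrap, but it introduces the "punctuation at the start of a line" problem: it treats punctuation marks as Chinese characters too, so each punctuation mark becomes its own group and can be pushed onto the next line. The fix is to remove the punctuation blocks from the isChinese method. The full source is as follows: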
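To make the difference concrete, here is a standalone sketch that splits a string into unbreakable groups in the same spirit as getStrRight: a new group starts at every space or break character. It is an illustration only; the names CjkGroupingSketch, isIdeograph, isPunctuationBlock, and groups are hypothetical, and the sample text is written with Unicode escapes.

```java
import java.lang.Character.UnicodeBlock;
import java.util.ArrayList;
import java.util.List;

public class CjkGroupingSketch {
    static boolean isIdeograph(char c) {
        UnicodeBlock ub = UnicodeBlock.of(c);
        return ub == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A;
    }

    static boolean isPunctuationBlock(char c) {
        UnicodeBlock ub = UnicodeBlock.of(c);
        return ub == UnicodeBlock.GENERAL_PUNCTUATION
                || ub == UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                || ub == UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }

    // Splits text into unbreakable groups. A new group starts before each
    // space or ideograph, and before punctuation only if punctuationBreaks.
    static List<String> groups(String text, boolean punctuationBreaks) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean breakHere = c == ' ' || isIdeograph(c)
                    || (punctuationBreaks && isPunctuationBlock(c));
            if (breakHere) {
                out.add(text.substring(start, i));
                start = i;
            }
        }
        if (start < text.length()) {
            out.add(text.substring(start));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two ideographs, a fullwidth comma, two ideographs, a full stop.
        String text = "\u4F60\u597D\uFF0C\u4E16\u754C\u3002";
        System.out.println(groups(text, true).size());   // punctuation as break points
        System.out.println(groups(text, false).size());  // punctuation attached
    }
}
```

When punctuation counts as a break point, the comma and the full stop form groups of their own and may start a line. When it does not, each mark stays in the same group as the ideograph before it, so a line can only begin with an ideograph.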

/* 
 * Breaker.java 
 * Copyright (c) 2004, 2005 Torbjörn Gannholm,
 * Copyright (c) 2005 Wisconsin Court System 
 * 
 * This program is free software; you can redistribute it and/or 
 * modify it under the terms of the GNU Lesser General Public License 
 * as published by the Free Software Foundation; either version 2.1 
 * of the License, or (at your option) any later version. 
 * 
 * This program is distributed in the hope that it will be useful, 
 * but WITHOUT ANY WARRANTY; without even the implied warranty of 
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 
 * GNU Lesser General Public License for more details. 
 * 
 * You should have received a copy of the GNU Lesser General Public License 
 * along with this program; if not, write to the Free Software 
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
 * 
 */  
package org.xhtmlrenderer.layout;  
  
import org.xhtmlrenderer.css.constants.IdentValue;  
import org.xhtmlrenderer.css.style.CalculatedStyle;  
import org.xhtmlrenderer.render.FSFont;  
  
/** 
 * A utility class that scans the text of a single inline box, looking for the  
 * next break point. 
 * @author Torbjörn Gannholm
 */  
public class Breaker {  
  
    public static void breakFirstLetter(LayoutContext c, LineBreakContext context,  
            int avail, CalculatedStyle style) {  
        FSFont font = style.getFSFont(c);  
        context.setEnd(getFirstLetterEnd(context.getMaster(), context.getStart()));  
        context.setWidth(c.getTextRenderer().getWidth(  
                c.getFontContext(), font, context.getCalculatedSubstring()));  
          
        if (context.getWidth() > avail) {  
            context.setNeedsNewLine(true);  
            context.setUnbreakable(true);  
        }  
    }  
      
    private static int getFirstLetterEnd(String text, int start) {  
        int i = start;  
        while (i < text.length()) {  
            char c = text.charAt(i);  
            int type = Character.getType(c);  
            if (type == Character.START_PUNCTUATION ||   
                    type == Character.END_PUNCTUATION ||  
                    type == Character.INITIAL_QUOTE_PUNCTUATION ||  
                    type == Character.FINAL_QUOTE_PUNCTUATION ||  
                    type == Character.OTHER_PUNCTUATION) {  
                i++;  
            } else {  
                break;  
            }  
        }  
        if (i < text.length()) {  
            i++;  
        }  
        return i;  
    }      
      
    public static void breakText(LayoutContext c,   
            LineBreakContext context, int avail, CalculatedStyle style) {  
        FSFont font = style.getFSFont(c);  
        IdentValue whitespace = style.getWhitespace();  
          
        // ====== handle nowrap  
        if (whitespace == IdentValue.NOWRAP) {  
            context.setEnd(context.getLast());  
            context.setWidth(c.getTextRenderer().getWidth(  
                    c.getFontContext(), font, context.getCalculatedSubstring()));  
            return;  
        }  
  
        //check if we should break on the next newline  
        if (whitespace == IdentValue.PRE ||  
                whitespace == IdentValue.PRE_WRAP ||  
                whitespace == IdentValue.PRE_LINE) {  
            int n = context.getStartSubstring().indexOf(WhitespaceStripper.EOL);  
            if (n > -1) {  
                context.setEnd(context.getStart() + n + 1);  
                context.setWidth(c.getTextRenderer().getWidth(  
                        c.getFontContext(), font, context.getCalculatedSubstring()));  
                context.setNeedsNewLine(true);  
                context.setEndsOnNL(true);  
            } else if (whitespace == IdentValue.PRE) {  
                context.setEnd(context.getLast());  
                context.setWidth(c.getTextRenderer().getWidth(  
                        c.getFontContext(), font, context.getCalculatedSubstring()));    
            }  
        }  
  
        //check if we may wrap  
        if (whitespace == IdentValue.PRE ||   
                (context.isNeedsNewLine() && context.getWidth() <= avail)) {  
            return;  
        }  
          
        context.setEndsOnNL(false);  
  
        String currentString = context.getStartSubstring();  
        int left = 0;  
//        int right = currentString.indexOf(WhitespaceStripper.SPACE, left + 1);  
        int right = getStrRight(currentString,left);  
        int lastWrap = 0;  
        int graphicsLength = 0;  
        int lastGraphicsLength = 0;  
  
        while (right > 0 && graphicsLength <= avail) {  
            lastGraphicsLength = graphicsLength;  
            graphicsLength += c.getTextRenderer().getWidth(  
                    c.getFontContext(), font, currentString.substring(left, right));  
            lastWrap = left;  
            left = right;  
//            right = currentString.indexOf(WhitespaceStripper.SPACE, left + 1);  
            right = getStrRight(currentString,left+1);  
        }  
  
        if (graphicsLength <= avail) {  
            //try for the last bit too!  
            lastWrap = left;  
            lastGraphicsLength = graphicsLength;  
            graphicsLength += c.getTextRenderer().getWidth(  
                    c.getFontContext(), font, currentString.substring(left));  
        }  
  
        if (graphicsLength <= avail) {  
            context.setWidth(graphicsLength);  
            context.setEnd(context.getMaster().length());  
            //It fit!  
            return;  
        }  
          
        context.setNeedsNewLine(true);  
  
        if (lastWrap != 0) {//found a place to wrap  
            context.setEnd(context.getStart() + lastWrap);  
            context.setWidth(lastGraphicsLength);  
        } else {//unbreakable string  
            if (left == 0) {  
                left = currentString.length();  
            }  
              
            context.setEnd(context.getStart() + left);  
            context.setUnbreakable(true);  
              
            if (left == currentString.length()) {  
                context.setWidth(c.getTextRenderer().getWidth(  
                        c.getFontContext(), font, context.getCalculatedSubstring()));  
            } else {  
                context.setWidth(graphicsLength);  
            }  
        }  
        return;  
    }  
  
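    // True for CJK ideographs only. The punctuation blocks are left out on
    // purpose, so a punctuation mark stays attached to the character before
    // it and never starts a line.
    private static boolean isChinese(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        return ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A;
    }

    // Returns the index of the next break point at or after 'left' (a space
    // or a Chinese character), or -1 if there is none.
    // Index 0 is reported as 1 so the caller's "right > 0" test still passes.
    private static int getStrRight(String s, int left) {
        if (left >= s.length()) {
            return -1;
        }
        for (int i = left; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (isChinese(ch) || ch == ' ') {
                return i == 0 ? 1 : i;
            }
        }
        return -1;
    }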
  
}

Because I manage project dependencies with Maven, the artifact names are somewhat different.

Dependency:

<dependency>
	<groupId>org.xhtmlrenderer</groupId>
	<artifactId>flying-saucer-pdf-itext5</artifactId>
	<version>9.0.1</version>
</dependency>

This pulls in the following artifacts:

flying-saucer-parent-9.0.1.pom

flying-saucer-pdf-itext5-9.0.1.jar

flying-saucer-core-9.0.1.jar

core-renderer-R8.jar

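The code shown above lives in flying-saucer-core-9.0.1.jar; patch it there.

Also note that Flying Saucer currently supports CSS only up to version 2.1. If you use CSS 3 features, they may have no effect and can break the layout.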

http://bettereveryday.iteye.com/blog/611561
