Internationalization in GemFire/Geode (Part 2)

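This article walks through the code of the open-source project Geode to reveal the key points of its internationalization implementation, focusing on the string-handling methods in DataSerializer, HeapDataOutputStream, and UriUtils. It stresses that fields such as display_name and description, which may contain non-ASCII characters, should be written and read with write/read UTF to avoid garbled text.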


Code walkthrough

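We cannot see the source of the commercial product, so we take open-source Geode as the example and look at which areas involve internationalization. (The author used an internally developed syntax-aware code tool here.) First, we turn to the writeString and readString methods in DataSerializer.java.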

public static void writeString(String value, DataOutput out) throws IOException {
    …
    if (value == null) {
      if (isDebugEnabled) {
        logger.trace(LogMarker.SERIALIZER, "Writing NULL_STRING");
      }
      out.writeByte(DSCODE.NULL_STRING);

    } else {
      // Note: to limit the performance cost, the code checks whether each char
      // needs one byte or several, then picks the write method accordingly.
      int len = value.length();
      int utfLen = len; // added for bug 40932
      for (int i = 0; i < len; i++) {
        char c = value.charAt(i);
        if ((c <= 0x007F) && (c >= 0x0001)) {
          // nothing needed
        } else if (c > 0x07FF) {
          utfLen += 2;
        } else {
          utfLen += 1;
        }
      }
      boolean writeUTF = utfLen > len;
      if (writeUTF) {
        if (utfLen > 0xFFFF) {
          if (isDebugEnabled) {
            logger.trace(LogMarker.SERIALIZER, "Writing utf HUGE_STRING of len={}", len);
          }
          out.writeByte(DSCODE.HUGE_STRING);
          out.writeInt(len);
          out.writeChars(value);
        } else {
          if (isDebugEnabled) {
            logger.trace(LogMarker.SERIALIZER, "Writing utf STRING of len={}", len);
          }
          out.writeByte(DSCODE.STRING);
          out.writeUTF(value);
        }
      } else {
        if (len > 0xFFFF) {
          if (isDebugEnabled) {
            logger.trace(LogMarker.SERIALIZER, "Writing HUGE_STRING_BYTES of len={}", len);
          }
          out.writeByte(DSCODE.HUGE_STRING_BYTES);
          out.writeInt(len);
          out.writeBytes(value);
        } else {
          if (isDebugEnabled) {
            logger.trace(LogMarker.SERIALIZER, "Writing STRING_BYTES of len={}", len);
          }
          out.writeByte(DSCODE.STRING_BYTES);
          out.writeShort(len);
          out.writeBytes(value);
        }
      }
    }
  }
 
public static String readString(DataInput in) throws IOException {
    return InternalDataSerializer.readString(in, in.readByte());
  }
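
To make the branching in writeString easier to follow, here is a minimal standalone sketch of the same length calculation and the four-way choice. It does not use Geode: the class name StringEncodingChoice and the Kind enum are hypothetical stand-ins for the DSCODE constants, and only java.io classes from the JDK are called.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class StringEncodingChoice {
  /** Hypothetical labels mirroring the four DSCODE cases in the listing above. */
  enum Kind { STRING_BYTES, HUGE_STRING_BYTES, STRING, HUGE_STRING }

  /** Number of bytes the string needs in the modified UTF-8 form used by DataOutput.writeUTF. */
  static int modifiedUtf8Length(String s) {
    int utfLen = s.length();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= 0x0001 && c <= 0x007F) {
        // one byte, already counted
      } else if (c > 0x07FF) {
        utfLen += 2; // three bytes in total
      } else {
        utfLen += 1; // two bytes in total (this branch also covers U+0000)
      }
    }
    return utfLen;
  }

  /** Same decision tree as writeString: plain bytes only when every char fits in one byte. */
  static Kind choose(String s) {
    int len = s.length();
    int utfLen = modifiedUtf8Length(s);
    if (utfLen > len) {
      return utfLen > 0xFFFF ? Kind.HUGE_STRING : Kind.STRING;
    }
    return len > 0xFFFF ? Kind.HUGE_STRING_BYTES : Kind.STRING_BYTES;
  }

  public static void main(String[] args) throws IOException {
    String s = "caf\u00e9 \u4e2d";
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new DataOutputStream(bytes).writeUTF(s);
    // The JDK writes a 2-byte length prefix followed by the encoded bytes.
    System.out.println(modifiedUtf8Length(s) + " vs " + (bytes.size() - 2) + " -> " + choose(s));
  }
}
```

The point of the pre-scan is that a string made only of chars in 0x0001..0x007F can take the cheaper writeBytes path without losing information, while anything else is routed to writeUTF or writeChars.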

Next, look at the writeUTF method in HeapDataOutputStream.java, where ASCII and non-ASCII characters clearly follow different handling logic.

public void writeUTF(String str) throws UTFDataFormatException {
    if (this.ignoreWrites)
      return;
    checkIfWritable();
    if (ASCII_STRINGS) {
      writeAsciiUTF(str, true);
    } else {
      writeFullUTF(str, true);
    }
  }
 
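private void writeFullUTF(String str, boolean encodeLength) throws UTFDataFormatException {
    int strlen = str.length();
    if (encodeLength && strlen > 65535) {
      throw new UTFDataFormatException();
    }
    // Space is reserved here for up to 3 bytes per char plus the length prefix.
    // 4-byte UTF-8 sequences are never produced: a supplementary character is
    // stored as two surrogate chars, 3 bytes each.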
    {
      int maxLen = (strlen * 3);
      if (encodeLength) {
        maxLen += 2;
      }
      ensureCapacity(maxLen);
    }
    int utfSizeIdx = this.buffer.position();
    if (encodeLength) {
      // skip bytes reserved for length
      this.buffer.position(utfSizeIdx + 2);
    }
    for (int i = 0; i < strlen; i++) {
      int c = str.charAt(i);
      if ((c >= 0x0001) && (c <= 0x007F)) {
        this.buffer.put((byte) c);
      } else if (c > 0x07FF) {
        this.buffer.put((byte) (0xE0 | ((c >> 12) & 0x0F)));
        this.buffer.put((byte) (0x80 | ((c >> 6) & 0x3F)));
        this.buffer.put((byte) (0x80 | ((c >> 0) & 0x3F)));
      } else {
        this.buffer.put((byte) (0xC0 | ((c >> 6) & 0x1F)));
        this.buffer.put((byte) (0x80 | ((c >> 0) & 0x3F)));
      }
    }
    int utflen = this.buffer.position() - utfSizeIdx;
    if (encodeLength) {
      utflen -= 2;
      if (utflen > 65535) {
        // act as if we wrote nothing to this buffer
        this.buffer.position(utfSizeIdx);
        throw new UTFDataFormatException();
      }
      this.buffer.putShort(utfSizeIdx, (short) utflen);
    }
  }
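
The bit manipulation in writeFullUTF can be checked outside Geode. The sketch below is an illustration under assumptions, not Geode's code: the class ModifiedUtf8Sketch and its method names are hypothetical. It reproduces the same three branches on a plain java.nio.ByteBuffer and compares the result with what the JDK's DataOutputStream.writeUTF emits.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

class ModifiedUtf8Sketch {
  /** Encodes str as a 2-byte big-endian length followed by modified UTF-8 bytes. */
  static byte[] encode(String str) {
    int strlen = str.length();
    // Worst case: 3 bytes per char, plus 2 bytes for the length prefix.
    ByteBuffer buf = ByteBuffer.allocate(strlen * 3 + 2);
    buf.position(2); // skip the bytes reserved for the length
    for (int i = 0; i < strlen; i++) {
      int c = str.charAt(i);
      if (c >= 0x0001 && c <= 0x007F) {
        buf.put((byte) c);
      } else if (c > 0x07FF) {
        buf.put((byte) (0xE0 | ((c >> 12) & 0x0F)));
        buf.put((byte) (0x80 | ((c >> 6) & 0x3F)));
        buf.put((byte) (0x80 | (c & 0x3F)));
      } else {
        buf.put((byte) (0xC0 | ((c >> 6) & 0x1F)));
        buf.put((byte) (0x80 | (c & 0x3F)));
      }
    }
    int utflen = buf.position() - 2;
    if (utflen > 65535) {
      throw new IllegalArgumentException("encoded form exceeds 65535 bytes");
    }
    buf.putShort(0, (short) utflen);
    return Arrays.copyOf(buf.array(), buf.position());
  }

  /** Reference encoding produced by the JDK. */
  static byte[] viaJdk(String str) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new DataOutputStream(bytes).writeUTF(str);
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // ASCII, a 2-byte char, a 3-byte char, and a surrogate pair (U+1F600).
    String s = "a\u00e9\u4e2d\uD83D\uDE00";
    System.out.println(Arrays.equals(encode(s), viaJdk(s)));
    System.out.println(encode(s).length);
  }
}
```

Because the loop works on UTF-16 chars rather than code points, a supplementary character goes through the 3-byte branch twice, once per surrogate, so it costs 6 bytes and still round-trips through readUTF.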

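Finally, look at the decode method in UriUtils.java. The code declares public static final String DEFAULT_ENCODING = "UTF-8"; up front, which settles the default-charset question and is worth emulating in our own code.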

public static String decode(final String encodedValue) {
    return decode(encodedValue, DEFAULT_ENCODING);
  }

public static String decode(String encodedValue, final String encoding) {
    try {
      if (encodedValue != null) {
        String previousEncodedValue;
 
        do {
          previousEncodedValue = encodedValue;
          encodedValue = URLDecoder.decode(encodedValue, encoding);
        } while (!encodedValue.equals(previousEncodedValue));
      }

      return encodedValue;
    } catch (UnsupportedEncodingException ignore) {
      return encodedValue;
    }
  }
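
The do/while loop keeps decoding until the value stops changing, which handles input that was percent-encoded more than once. A minimal sketch of that idea using only the JDK's URLDecoder follows; the class RepeatedDecodeSketch and the method decodeFully are hypothetical names, not part of Geode.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class RepeatedDecodeSketch {
  // Naming the charset explicitly avoids depending on the platform default.
  static final String DEFAULT_ENCODING = "UTF-8";

  /** Decodes repeatedly until a pass leaves the value unchanged. */
  static String decodeFully(String value) {
    if (value == null) {
      return null;
    }
    try {
      String previous;
      do {
        previous = value;
        value = URLDecoder.decode(value, DEFAULT_ENCODING);
      } while (!value.equals(previous));
      return value;
    } catch (UnsupportedEncodingException e) {
      // Cannot happen for "UTF-8"; return what we have, as the original does.
      return value;
    }
  }

  public static void main(String[] args) {
    // U+4E2D encoded twice: "%25" is itself an encoded '%'.
    String twice = "%25E4%25B8%25AD";
    System.out.println(decodeFully(twice).equals("\u4e2d"));
  }
}
```

One design consequence to keep in mind: decoding to a fixed point means a value that legitimately contains a literal percent escape after one decode pass will be decoded again, so this pattern suits inputs whose encoding depth is unknown rather than inputs where the depth matters.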

High-risk areas for internationalization

In the earlier post on Redis, we noted two methods that are high-risk for internationalization: serialize and deserialize. Today, let's meet two new faces: toData and fromData.

public class Player implements DataSerializable {
  private int id;
  private String name;
  private Date birthday;
  private FC club;

  @Override
  public void toData(DataOutput out) throws IOException {
    out.writeInt(this.id);
    out.writeUTF(this.name);
    DataSerializer.writeDate(this.birthday, out);
    DataSerializer.writeObject(this.club, out);
  }

  @Override
  public void fromData(DataInput in) throws IOException, ClassNotFoundException {
    this.id = in.readInt();
    this.name = in.readUTF();
    this.birthday = DataSerializer.readDate(in);
    this.club = (FC) DataSerializer.readObject(in);
  }
}

Whenever a field such as display_name or description may contain non-ASCII characters, be sure to call writeUTF / readUTF. Otherwise garbled text will be waiting for you with open arms! (〒︿〒)
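
To see what goes wrong without write/read UTF, the following sketch contrasts DataOutput.writeBytes with writeUTF on the same non-ASCII string. It uses only JDK classes; the class WriteBytesVsWriteUtf and its method names are hypothetical, and the byte-oriented reader decodes with ISO-8859-1 purely to make the damage visible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WriteBytesVsWriteUtf {
  /** Writes with writeUTF and reads back with readUTF. */
  static String roundTripUtf(String s) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new DataOutputStream(bytes).writeUTF(s);
      return new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())).readUTF();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes with writeBytes, which keeps only the low 8 bits of every char. */
  static String roundTripBytes(String s) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new DataOutputStream(bytes).writeBytes(s);
      return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String displayName = "\u4e2d\u6587"; // U+4E2D U+6587
    System.out.println(roundTripUtf(displayName).equals(displayName));
    // The high byte of each char is gone, so the original text cannot be recovered.
    System.out.println(roundTripBytes(displayName).equals(displayName));
  }
}
```

Pure ASCII survives both paths, which is why this kind of bug tends to stay hidden until the first non-ASCII display_name or description arrives.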

 
