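Generally speaking, converting a sequence of plaintext characters into a binary sequence that the computer can understand is called encoding, and converting a binary sequence back into a plaintext string that people can read is called decoding.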
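To make the two directions concrete, here is a minimal sketch using String.getBytes() and the String constructor. The class name EncodeDecodeDemo and the choice of UTF-8 are my own assumptions for illustration, and StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK
class EncodeDecodeDemo {
    // Encoding: characters -> bytes (UTF-8 chosen only as an example)
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: bytes -> characters, using the same charset
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u5f20" is the character 张, written as an escape so the
        // source file's own encoding does not matter
        byte[] bytes = encode("\u5f20");
        System.out.println(Arrays.toString(bytes)); // [-27, -68, -96]
        System.out.println(decode(bytes).equals("\u5f20")); // true
    }
}
```

Decoding with a different charset than the one used for encoding is the usual source of garbled text.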
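JDK 1.4 introduced the Charset class to handle conversion between byte sequences and character sequences. The class contains methods for creating decoders and encoders, as well as methods for querying the character sets it supports. Charset instances are immutable.
Charset provides a static method, availableCharsets(), that returns all the character sets supported by the current JDK. Let's give it a try: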
- import java.nio.charset.Charset;
- import java.util.SortedMap;
- public class Test {
- public static void main(String[] args) throws Exception
- {
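- // Map from charset name to Charset object, sorted by name
- SortedMap<String,Charset> sm= Charset.availableCharsets();
- for(String str:sm.keySet())
- {
- // Printing a Charset shows its name
- System.out.println(sm.get(str));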
- }
- }
- }
The output looks like this (the exact list depends on the JDK version and platform):
- Big5
- Big5-HKSCS
- EUC-JP
- EUC-KR
- GB18030
- GB2312
- GBK
- IBM-Thai
- IBM00858
- IBM01140
- IBM01141
- IBM01142
- IBM01143
- IBM01144
- IBM01145
- IBM01146
- IBM01147
- IBM01148
- IBM01149
- IBM037
- IBM1026
- IBM1047
- IBM273
- IBM277
- IBM278
- IBM280
- IBM284
- IBM285
- IBM297
- IBM420
- IBM424
- IBM437
- IBM500
- IBM775
- IBM850
- IBM852
- IBM855
- IBM857
- IBM860
- IBM861
- IBM862
- IBM863
- IBM864
- IBM865
- IBM866
- IBM868
- IBM869
- IBM870
- IBM871
- IBM918
- ISO-2022-CN
- ISO-2022-JP
- ISO-2022-JP-2
- ISO-2022-KR
- ISO-8859-1
- ISO-8859-13
- ISO-8859-15
- ISO-8859-2
- ISO-8859-3
- ISO-8859-4
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- JIS_X0201
- JIS_X0212-1990
- KOI8-R
- KOI8-U
- Shift_JIS
- TIS-620
- US-ASCII
- UTF-16
- UTF-16BE
- UTF-16LE
- UTF-32
- UTF-32BE
- UTF-32LE
- UTF-8
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1254
- windows-1255
- windows-1256
- windows-1257
- windows-1258
- windows-31j
- x-Big5-HKSCS-2001
- x-Big5-Solaris
- x-euc-jp-linux
- x-EUC-TW
- x-eucJP-Open
- x-IBM1006
- x-IBM1025
- x-IBM1046
- x-IBM1097
- x-IBM1098
- x-IBM1112
- x-IBM1122
- x-IBM1123
- x-IBM1124
- x-IBM1364
- x-IBM1381
- x-IBM1383
- x-IBM33722
- x-IBM737
- x-IBM833
- x-IBM834
- x-IBM856
- x-IBM874
- x-IBM875
- x-IBM921
- x-IBM922
- x-IBM930
- x-IBM933
- x-IBM935
- x-IBM937
- x-IBM939
- x-IBM942
- x-IBM942C
- x-IBM943
- x-IBM943C
- x-IBM948
- x-IBM949
- x-IBM949C
- x-IBM950
- x-IBM964
- x-IBM970
- x-ISCII91
- x-ISO-2022-CN-CNS
- x-ISO-2022-CN-GB
- x-iso-8859-11
- x-JIS0208
- x-JISAutoDetect
- x-Johab
- x-MacArabic
- x-MacCentralEurope
- x-MacCroatian
- x-MacCyrillic
- x-MacDingbat
- x-MacGreek
- x-MacHebrew
- x-MacIceland
- x-MacRoman
- x-MacRomania
- x-MacSymbol
- x-MacThai
- x-MacTurkish
- x-MacUkraine
- x-MS932_0213
- x-MS950-HKSCS
- x-MS950-HKSCS-XP
- x-mswin-936
- x-PCK
- x-SJIS_0213
- x-UTF-16LE-BOM
- X-UTF-32BE-BOM
- X-UTF-32LE-BOM
- x-windows-50220
- x-windows-50221
- x-windows-874
- x-windows-949
- x-windows-950
- x-windows-iso2022jp
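Once you know a charset's name, you can call the static Charset.forName() method to get the corresponding Charset object, for example:
Charset cs=Charset.forName("ISO-8859-1");
or
Charset cs=Charset.forName("GBK");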
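Charset.forName() throws an unchecked exception when the name is illegal or the charset is not available in the running JDK. Below is a minimal sketch of a defensive lookup; the helper name lookup and the fallback parameter are hypothetical, made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper class, not part of the JDK
class CharsetLookup {
    // Returns the named charset, or the fallback when the name is
    // illegal or this JDK does not support it
    static Charset lookup(String name, Charset fallback) {
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        // Names are matched case-insensitively; name() gives the canonical form
        System.out.println(lookup("utf-8", null).name());           // UTF-8
        // "no-such-charset-xyz" is a legal name, but no charset has it
        System.out.println(lookup("no-such-charset-xyz", utf8).name()); // UTF-8
    }
}
```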
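Once you have a Charset object, you can call its newDecoder() and newEncoder() methods to obtain a CharsetDecoder and a CharsetEncoder, which are the decoder and encoder for that Charset. Calling the CharsetDecoder's decode() method converts a ByteBuffer into a CharBuffer, and calling the CharsetEncoder's encode() method converts a CharBuffer into a ByteBuffer (to encode a String, wrap it with CharBuffer.wrap() first).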
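- import java.nio.ByteBuffer;
- import java.nio.CharBuffer;
- import java.nio.charset.Charset;
- import java.nio.charset.CharsetDecoder;
- import java.nio.charset.CharsetEncoder;
- public class Test {
- public static void main(String[] args) throws Exception
- {
- Charset cs=Charset.forName("GBK");
- // Create the decoder and encoder for this charset
- CharsetDecoder cd=cs.newDecoder();
- CharsetEncoder ce=cs.newEncoder();
- CharBuffer cb=CharBuffer.allocate(6);
- cb.put("张");
- cb.put("译");
- cb.put("成");
- // Switch the buffer from writing to reading
- cb.flip();
- // Encode the characters into bytes
- ByteBuffer bb=ce.encode(cb);
- // Print each encoded byte; use limit(), not capacity(), because the buffer may be larger than its content
- for(int i=0;i<bb.limit();i++)
- {
- // get(i) is an absolute read, so the buffer position does not move
- System.out.println(bb.get(i));
- }
- // Decode the bytes back into characters
- System.out.println(cd.decode(bb));
- }
- }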
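One thing worth knowing about encoders and decoders created this way: by default they report errors instead of silently replacing characters, so encode() and decode() throw a CharacterCodingException when a character cannot be mapped. Here is a small sketch of that behavior. The class StrictCodec and its two methods are hypothetical names of mine, and I use UTF-8 and ISO-8859-1 only because every JDK is required to support them.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the JDK
class StrictCodec {
    // Encodes text to bytes and decodes it back with the named charset.
    // A String has to be wrapped in a CharBuffer before encoding.
    static String roundTrip(String text, String charsetName)
            throws CharacterCodingException {
        Charset cs = Charset.forName(charsetName);
        ByteBuffer bytes = cs.newEncoder().encode(CharBuffer.wrap(text));
        return cs.newDecoder().decode(bytes).toString();
    }

    // True if the charset can represent the text without loss
    static boolean canRoundTrip(String text, String charsetName) {
        try {
            return roundTrip(text, charsetName).equals(text);
        } catch (CharacterCodingException e) {
            // Thrown when a character has no mapping in this charset
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "\u5f20\u8bd1\u6210"; // 张译成, written as escapes
        System.out.println(canRoundTrip(name, "UTF-8"));      // true
        System.out.println(canRoundTrip(name, "ISO-8859-1")); // false
    }
}
```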
Charset also provides the following convenience methods for encoding and decoding:
CharBuffer decode(ByteBuffer bb)
ByteBuffer encode(CharBuffer cb)
ByteBuffer encode(String str)
I won't explain these three methods; you can probably guess what they do.
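Still, a short sketch may help. The class CharsetShortcuts and its roundTrip method are hypothetical names of mine. Unlike a freshly created CharsetEncoder, these convenience methods do not throw on unmappable characters; they substitute the charset's default replacement instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the JDK
class CharsetShortcuts {
    // Encodes and decodes directly through the Charset object,
    // with no explicit encoder or decoder
    static String roundTrip(String text, Charset cs) {
        ByteBuffer bytes = cs.encode(text);  // ByteBuffer encode(String str)
        CharBuffer chars = cs.decode(bytes); // CharBuffer decode(ByteBuffer bb)
        return chars.toString();
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        Charset ascii = Charset.forName("US-ASCII");
        String name = "\u5f20\u8bd1\u6210"; // 张译成, written as escapes
        System.out.println(roundTrip(name, utf8).equals(name)); // true
        // US-ASCII cannot represent 张, so it is replaced with '?'
        System.out.println(roundTrip("\u5f20", ascii));         // ?
    }
}
```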