Code point and code unit

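This article looks at "16-bit Unicode characters" and how they show up in Java's String API. It explains how Unicode numbers characters, in particular how characters in the Basic Multilingual Plane (BMP) are represented in 16 bits, and introduces the concepts of code point and code unit.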

Let's start from a piece of API documentation. The Javadoc for String's length() method says:

length

public int length()
Returns the length of this string. The length is equal to the number of 16-bit Unicode characters in the string.

Specified by:
length in interface CharSequence

Returns:
the length of the sequence of characters represented by this object.

It contains this sentence:

The length is equal to the number of 16-bit Unicode characters in the string.

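Taken literally, the length equals the number of 16-bit Unicode characters. More precisely, a Java String is a sequence of UTF-16 code units (char values), and length() counts those 16-bit units, not user-visible characters.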
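The difference between "16-bit units" and "characters" only becomes visible once a character outside the BMP is involved. Below is a minimal sketch assuming Java 5 or later, where String.codePointCount exists. The class name LengthDemo and the choice of U+1F600 (an emoji) as the non-BMP character are illustrative assumptions.

```java
// Sketch: String.length() counts UTF-16 code units (chars), not characters.
class LengthDemo {
    // U+1F600 lies outside the BMP, so UTF-16 stores it as two chars.
    static final String EMOJI = "\uD83D\uDE00";

    // Number of UTF-16 code units, which is what length() reports.
    static int units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmpOnly = "A";
        String mixed = "A" + EMOJI;
        System.out.println(units(bmpOnly) + " " + codePoints(bmpOnly)); // 1 1
        System.out.println(units(mixed) + " " + codePoints(mixed));     // 3 2
    }
}
```

For the BMP-only string the two counts agree. With the emoji appended, length() reports 3 even though the string holds only 2 characters.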

1. Why 16 bits?

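Unicode is a character set that covers almost all of the world's writing systems, and it assigns every character a unique number. The most commonly used part of Unicode is the BMP (Basic Multilingual Plane), the range U+0000 to U+FFFF, which contains a large share of everyday characters. At the time of writing the current Unicode version was 8.0. Beyond the BMP, Unicode also defines supplementary planes that extend the code space up to U+10FFFF.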

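A BMP character's number (U+0000 to U+FFFF) is written with four hexadecimal digits, and each hex digit takes four bits, so a BMP code point fits in exactly 16 bits.

Every BMP character therefore occupies one 16-bit unit. For a string made only of BMP characters, the number of 16-bit units equals the number of characters. Characters outside the BMP, such as many emoji, need two 16-bit units in UTF-16 (a surrogate pair). For such strings, length() is larger than the character count.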
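To check the "four hex digits = 16 bits" argument against the JDK, the sketch below asks Character whether a code point is in the BMP and how many chars it needs. It assumes Java 7 or later, where Character.isBmpCodePoint was added. The class name BmpDemo and the sample code points are hypothetical choices for illustration.

```java
// Sketch: which code points fit in a single 16-bit char?
class BmpDemo {
    // True for U+0000..U+FFFF: at most 4 hex digits, i.e. 4 * 4 = 16 bits.
    static boolean fitsIn16Bits(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    // How many chars (UTF-16 code units) Java uses for the code point: 1 or 2.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = {0x0041, 0x03C0, 0xFFFF, 0x1F600}; // A, π, last BMP value, an emoji
        for (int cp : samples) {
            System.out.printf("U+%04X bmp=%b chars=%d%n", cp, fitsIn16Bits(cp), charsNeeded(cp));
        }
        // A char tops out at U+FFFF; Unicode code points go up to U+10FFFF.
        System.out.printf("largest char: U+%04X, largest code point: U+%X%n",
                (int) Character.MAX_VALUE, Character.MAX_CODE_POINT);
    }
}
```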

See: [Unicode BMP](https://en.wikipedia.org/wiki/Plane_(Unicode)); Unicode and UTF-8 correspondence

2. What does codePoint mean in the String API?

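A code point is the number Unicode assigns to a character, anywhere from U+0000 to U+10FFFF. For a BMP character the code point fits in one 16-bit char, so one char is one code point. A supplementary code point (above U+FFFF) is stored as two chars: a high surrogate followed by a low surrogate. Methods such as codePointAt, codePointCount and codePoints work in terms of code points rather than chars.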
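The surrogate-pair rule can be written out by hand. The standard UTF-16 algorithm subtracts 0x10000 from the code point, then puts the top 10 bits of the 20-bit result into a high surrogate (0xD800 + bits) and the bottom 10 bits into a low surrogate (0xDC00 + bits). The sketch below compares that manual split with the JDK's Character.toChars. The class name SurrogateDemo and the sample U+1F600 are illustrative assumptions.

```java
import java.util.Arrays;

// Sketch: how one supplementary code point becomes two UTF-16 code units.
class SurrogateDemo {
    static char[] toSurrogates(int codePoint) {
        int v = codePoint - 0x10000;                // 20-bit offset into the supplementary planes
        char high = (char) (0xD800 + (v >>> 10));   // top 10 bits -> high surrogate
        char low = (char) (0xDC00 + (v & 0x3FF));   // bottom 10 bits -> low surrogate
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char[] manual = toSurrogates(cp);
        char[] jdk = Character.toChars(cp);         // the JDK's own split
        System.out.printf("%04X %04X%n", (int) manual[0], (int) manual[1]); // D83D DE00
        System.out.println(Arrays.equals(manual, jdk));                     // true
        // Going back: two code units -> one code point
        System.out.println(Character.toCodePoint(manual[0], manual[1]) == cp); // true
    }
}
```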

For the relationship between code points and code units, see:


Wikipedia: Code point

3. What is a code unit?

The code unit size is equivalent to the bit measurement for the particular encoding:

A code unit in US-ASCII consists of 7 bits; a code unit in UTF-8, EBCDIC and GB18030 consists of 8 bits; a code unit in UTF-16 consists of 16 bits; a code unit in UTF-32 consists of 32 bits.

A Java char is exactly one UTF-16 code unit.
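The same three characters take a different number of code units in each encoding. The sketch below encodes a string with String.getBytes and divides the byte count by the code-unit size. It uses the big-endian variants UTF-16BE and UTF-32BE so that no byte-order mark is counted, and it assumes the JDK provides the "UTF-32BE" charset (standard JDKs do). The class name CodeUnitDemo and the sample string are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: count code units of one string under several encodings.
class CodeUnitDemo {
    // Code units = encoded bytes / bytes per code unit of that encoding.
    static int codeUnits(String s, Charset cs, int bytesPerUnit) {
        return s.getBytes(cs).length / bytesPerUnit;
    }

    public static void main(String[] args) {
        // 'A' (U+0041), 'π' (U+03C0) and U+1F600: three code points
        String s = "A\u03C0\uD83D\uDE00";
        System.out.println("UTF-8  units: " + codeUnits(s, StandardCharsets.UTF_8, 1));       // 7 (1 + 2 + 4)
        System.out.println("UTF-16 units: " + codeUnits(s, StandardCharsets.UTF_16BE, 2));    // 4 (1 + 1 + 2)
        System.out.println("UTF-32 units: " + codeUnits(s, Charset.forName("UTF-32BE"), 4));  // 3 (1 + 1 + 1)
        System.out.println("code points:  " + s.codePointCount(0, s.length()));               // 3
    }
}
```

Only UTF-32 has one code unit per code point. The UTF-16 count of 4 is what s.length() returns.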

Summary:

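A code point is a Unicode-level concept: the number assigned to one character, for example U+0041 for A, independent of any encoding. Counting code points gives the number of characters. A code unit is an encoding-level concept: the fixed-size chunk an encoding uses to store code points, which is 16 bits in UTF-16 (Java's char). String.length() counts code units, so it equals the character count only when every character is in the BMP.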
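In practice, the distinction matters when looping over a string. The sketch below walks the same string once per code point with String.codePoints() (Java 8+) and once per char with charAt. The class name IterateDemo and the "U+XXXX" formatting are illustrative assumptions, not part of any API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: iterating by code point vs. by char (UTF-16 code unit).
class IterateDemo {
    // One entry per code point; a surrogate pair comes out as one value.
    static List<String> byCodePoint(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    // One entry per char; the two halves of a surrogate pair appear separately.
    static List<String> byChar(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add(String.format("U+%04X", (int) s.charAt(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE00";         // 'A' followed by U+1F600
        System.out.println(byCodePoint(s)); // [U+0041, U+1F600]
        System.out.println(byChar(s));      // [U+0041, U+D83D, U+DE00]
    }
}
```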

For example:

	public class Demo {
		public static void main(String[] args) {
			String s = "π王A23";
			// π, 王, A, 2 and 3 all lie in the BMP, so each is one code point
			// stored as one 16-bit UTF-16 code unit (one char)
			System.out.println("Length of s: " + s.length());
			// codePointAt takes a char (code unit) index: index 2 holds 'A'
			System.out.println("Code point at index 2: " + s.codePointAt(2));
		}
	}

Output:

	Length of s: 5
	Code point at index 2: 65

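Here 5 means the string holds five 16-bit UTF-16 code units. Since all five characters are in the BMP, that is also five characters. 65 is the code point of the letter A (U+0041), the third character. Note that the argument to codePointAt is a char index (here 2), not a code point index.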
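When a non-BMP character comes first, "char index" and "code point index" no longer line up. The sketch below shows what codePointAt returns at each char index and uses String.offsetByCodePoints to convert "the n-th code point" into a char index. The class name IndexDemo and its helper nthCodePoint are hypothetical names chosen for this illustration.

```java
// Sketch: codePointAt indexes by char, offsetByCodePoints converts from code point positions.
class IndexDemo {
    // Code point at code-point position n (0-based), not at char index n.
    static int nthCodePoint(String s, int n) {
        int charIndex = s.offsetByCodePoints(0, n);
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00A";            // U+1F600 followed by 'A'
        System.out.println(s.codePointAt(0)); // 128512 (0x1F600, the whole pair)
        System.out.println(s.codePointAt(1)); // 56832 (0xDE00, a lone low surrogate)
        System.out.println(s.codePointAt(2)); // 65 ('A')
        System.out.println(nthCodePoint(s, 1)); // 65: the second code point is 'A'
    }
}
```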

The best way to learn more about Unicode is to read the Wikipedia articles on it.

Reposted from: https://my.oschina.net/u/2525142/blog/618823
