Programming Exercise: Cutting a String by Bytes

Latest recommended article published on 2022-05-05 22:56:40

License: CC 4.0 BY-SA

Tags: #java #encoding #string #utf-8 #programming

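In Java, the strings "abcd" and "ab你好" have the same length: four characters each.
Their byte counts differ, however, because one Chinese character occupies two bytes in GBK.
Define a method that takes a substring by a given number of bytes.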

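For example, with "ab你好" in GBK, taking three bytes would yield "ab" plus half of "你"; the half character must be discarded, so the result is "ab". Taking four bytes gives "ab你", and taking five bytes still gives "ab你".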

In GBK a Chinese character takes two bytes; in UTF-8 most Chinese characters take three.
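The difference between character count and byte count can be checked directly. The following is a minimal sketch; the class name ByteLengthDemo and the helper byteLength are illustrative names, not part of the original exercise. The Chinese characters are written as Unicode escapes (\u4F60\u597D is 你好) so the result does not depend on the source file's encoding, and the GBK line is guarded because not every Java runtime ships that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Number of bytes the string occupies when encoded with the given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "ab\u4F60\u597D"; // "ab" followed by two Chinese characters
        System.out.println(s.length());                            // 4 characters
        System.out.println(byteLength(s, StandardCharsets.UTF_8)); // 1 + 1 + 3 + 3 = 8
        if (Charset.isSupported("GBK")) {
            System.out.println(byteLength(s, Charset.forName("GBK"))); // 1 + 1 + 2 + 2 = 6
        }
    }
}
```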

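One fact we need: after string.getBytes(), the bytes of Chinese characters come out negative, because Java's byte type is signed and these bytes have the high bit set. This holds for every byte of a multi-byte UTF-8 character and for the lead byte of a GBK character. A GBK trail byte, however, can be positive (0x40 to 0x7E); 琲 in the test string below is commonly cited as such a character.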
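To see the negative values concretely, here is a small sketch that prints the signed bytes of a string encoded as UTF-8. The class and method names are made up for illustration; \u4F60 is the character 你, whose UTF-8 encoding is E4 BD A0.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SignedByteDemo {
    // Encodes s as UTF-8 and renders the signed byte values as text.
    static String signedUtf8(String s) {
        return Arrays.toString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Converts a signed Java byte to its unsigned value in the range 0..255.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(signedUtf8("a\u4F60")); // [97, -28, -67, -96]
        System.out.println(unsigned((byte) -28));  // 228, which is 0xE4
    }
}
```

The ASCII letter stays positive (97), while all three bytes of the Chinese character are negative.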

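That gives the approach for this problem: encode the string, count the negative bytes that immediately precede the cut position, then decode. For GBK take the count modulo 2, for UTF-8 modulo 3; a nonzero remainder means a character was split, and the split bytes are dropped.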

The encoding can be detected with System.getProperty("file.encoding").equalsIgnoreCase("gbk"), which lets us write one general-purpose method.
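One caveat with comparing the property as a string is that a charset can go by several names; "utf8", for instance, is an alias of UTF-8 but does not match "utf-8" textually. A possible alternative, sketched here under the assumption that resolving names through java.nio.charset.Charset is acceptable for the exercise, is to compare Charset objects instead. The class name CharsetNameDemo and the helper resolvesTo are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetNameDemo {
    // True if name is a supported charset name or alias that resolves to target.
    static boolean resolvesTo(String name, Charset target) {
        return Charset.isSupported(name) && Charset.forName(name).equals(target);
    }

    public static void main(String[] args) {
        // The JVM's default charset, as an object rather than a raw property string.
        System.out.println(Charset.defaultCharset().name());
        System.out.println("utf8".equalsIgnoreCase("utf-8"));           // false
        System.out.println(resolvesTo("utf8", StandardCharsets.UTF_8)); // true
    }
}
```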

package io;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
public class StringCut {
	public static void main(String[] args) {
//		String str = "ab你好a您aj";
		String str = "ab你好a琲琲琲";
		try {
//			byte []buf = str.getBytes("utf-8");
//			byte []buf = str.getBytes("gbk");
			byte []buf = str.getBytes();
			for(int i=0;i<=buf.length;i++){
//				String res = cutStringUTF8(str,i);
//				String res = cutStringGBK(str,i);
				String res = cutString(str,i);
				System.out.println(i+":"+res);
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	// Dispatches on the JVM's default encoding, read from the file.encoding system property.
	private static String cutString(String str, int i) throws IOException {
		String encoding = System.getProperty("file.encoding");
		if("gbk".equalsIgnoreCase(encoding)){
			return cutStringGBK(str,i);
		}
		if("utf-8".equalsIgnoreCase(encoding)){
			return cutStringUTF8(str,i);
		}
		throw new RuntimeException("Unsupported encoding: "+encoding);
	}

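	// Returns the longest prefix of str that fits in the first a bytes of its GBK encoding
	// without splitting a two-byte character. Assumes 0 <= a <= encoded length.
	private static String cutStringGBK(String str, int a) throws IOException{
		byte []buf = str.getBytes("gbk");
		int count=0;
		// Count the run of negative bytes that ends at the cut position. Counting every
		// negative byte from index 0 miscounts, because a GBK trail byte may be positive.
		for(int i=a-1;i>=0;i--){
			if(buf[i]<0){
				count++;
			}else{
				break;
			}
		}
		// An odd run means the last byte is a lead byte without its trail byte: drop it.
		if(count%2==0){
			return new String(buf,0,a,"gbk");
		}else{
			return new String(buf,0,a-1,"gbk");
		}
	}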

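	// Returns the longest prefix of str that fits in the first a bytes of its UTF-8 encoding.
	// Assumes every non-ASCII character is three bytes long (true for most Chinese characters);
	// two-byte and four-byte characters are not handled. Assumes 0 <= a <= encoded length.
	private static String cutStringUTF8(String str, int a) throws IOException {
		byte []buf=str.getBytes("utf-8");
		int count=0;
		// Count the run of negative bytes that ends at the cut position.
		for(int i=a-1;i>=0;i--){
			if(buf[i]<0){
				count++;
			}else{
				break;
			}
		}
		// The remainder is the number of bytes that belong to a split character.
		int x=count%3;
		return new String(buf, 0, a-x,"utf-8");
	}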
}
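
The negative-byte trick depends on the layout of one particular encoding. As a point of comparison, here is a sketch of an encoding-independent variant: it walks the string one code point at a time and adds up each character's encoded size until the byte budget is used up. This is not the method from the exercise; the class name ByteCut and the method cut are hypothetical, and the sketch assumes a charset such as UTF-8 or GBK that writes no byte-order mark when encoding a single character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteCut {
    // Returns the longest prefix of s whose encoding in cs is at most maxBytes long,
    // never splitting a character.
    static String cut(String s, int maxBytes, Charset cs) {
        int used = 0; // bytes consumed so far
        int i = 0;    // index of the next char to examine
        while (i < s.length()) {
            int width = Character.charCount(s.codePointAt(i)); // 1 or 2 chars
            int len = s.substring(i, i + width).getBytes(cs).length;
            if (used + len > maxBytes) {
                break; // the next character would not fit completely
            }
            used += len;
            i += width;
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String s = "ab\u4F60\u597D"; // "ab" followed by two Chinese characters
        for (int n = 0; n <= 8; n++) {
            System.out.println(n + ":" + cut(s, n, StandardCharsets.UTF_8));
        }
    }
}
```

Encoding each character separately is slower than scanning a byte array, but it also copes with two-byte and four-byte UTF-8 characters and with GBK trail bytes that happen to be positive.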