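Overview
- Character streams are designed specifically for text files; they take care of converting between bytes and characters according to a text encoding.
- Character streams can only process text (character) data.
- Every character stream works with an encoding, either the platform default or one you specify.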
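Encoding
1. Code tables
A code table (character set) maps the characters of written human languages to the numeric values that represent them in a computer.
2. Common code tables
- ASCII: letters and common symbols; 1-byte encoding with the highest bit always 0.
- ISO8859-1: Latin letters; 1-byte encoding. Values 0x00 to 0x7F are identical to ASCII, and the additional Latin characters use values whose highest bit is 1.
- GB2312: Simplified Chinese code table with about 6,000 to 7,000 Chinese characters and symbols. Chinese characters take 2 bytes, and the highest bit of both bytes is 1.
- GBK: Chinese code table that contains all of GB2312, about 20,000 Chinese characters and symbols in total. Chinese characters take 2 bytes: the highest bit of the first byte is 1, and the second byte may or may not have its highest bit set.
- GB18030: Chinese code table with about 70,000 Chinese characters plus the scripts of ethnic minorities; uses 1, 2, or 4 bytes per character.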
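- Unicode: the international standard character set, covering most of the world's writing systems. Java represents Unicode text in UTF-16. Most characters take 2 bytes (one char), and the rarer supplementary characters take 4 bytes (two chars).
  In Java, a char is one UTF-16 code unit and a String is a sequence of such chars. The platform default encoding only matters when a String is converted to or from bytes, for example with getBytes() or new String(byte[]).
- UTF-8: a variable-length encoding of Unicode that uses 1 to 4 bytes per character. The leading bits of every byte mark its role: a single-byte character, the first byte of a multi-byte sequence, or a continuation byte. It is an international standard.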
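The sketch below makes the byte counts above concrete by encoding a few characters with different charsets. The class `EncodingLengths` is an illustrative name. GBK is not one of the charsets every Java platform is required to support, so the sketch only uses it if `Charset.isSupported("GBK")` returns true.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengths {
    // Number of bytes needed to encode text in the given charset
    public static int byteLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        String zhong = "\u4e2d";       // U+4E2D, the Chinese character for "middle"
        String emoji = "\uD83D\uDE00"; // U+1F600, a supplementary character stored as two chars

        System.out.println(byteLength(latin, StandardCharsets.UTF_8));      // 1
        System.out.println(byteLength(zhong, StandardCharsets.UTF_8));      // 3
        System.out.println(byteLength(zhong, StandardCharsets.UTF_16BE));   // 2
        System.out.println(byteLength(emoji, StandardCharsets.UTF_16BE));   // 4
        System.out.println(emoji.length());                                 // 2 chars for one character
        // ISO-8859-1 cannot represent U+4E2D, so getBytes substitutes '?' (1 byte)
        System.out.println(byteLength(zhong, StandardCharsets.ISO_8859_1)); // 1
        if (Charset.isSupported("GBK")) {
            System.out.println(byteLength(zhong, Charset.forName("GBK")));  // 2
        }
    }
}
```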
3. Example
Split a GBK-encoded text file by byte ranges without cutting a double-byte character in half.
```java
package io.charstream;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class Text_Code {
    public static void main(String[] args) {
        // Cut the file content by byte ranges without splitting a double-byte character
        File file = new File("temp\\split.txt");
        byte[] buffer;
        try {
            // Read the whole file once, so every index below file.length() is valid
            buffer = Files.readAllBytes(file.toPath());
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }
        for (int i = 0; i < buffer.length; i++) {
            for (int j = i + 1; j <= buffer.length; j++) {
                try {
                    // Bytes [i, j): start inclusive, end exclusive
                    System.out.println(splitString(buffer, i, j));
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static String splitString(byte[] buffer, int i, int j) throws IOException {
        /*
         * How to tell whether a byte belongs to a double-byte character (GBK assumed):
         * In GBK a character takes 1 or 2 bytes. A single-byte character is non-negative
         * as a Java byte, while the first byte of a double-byte character (the lead byte)
         * is always negative; the second byte (the trail byte) can be either.
         * The first byte of a run of consecutive negative bytes is always a lead byte,
         * so the parity of the run length decides whether a byte is a trail byte.
         * If the range starts on a trail byte, extend it left to include its lead byte;
         * if it ends on a lead byte, drop that byte.
         */
        int pos = i, len = j - i;
        if (position(buffer, i) < 0) {
            // Start is a trail byte: move one byte left and grow the length
            pos = i - 1;
            len++;
        }
        if (position(buffer, j - 1) > 0) {
            // The end is exclusive, so check j - 1; if it is a lead byte, drop it
            len--;
        }
        // Decode explicitly as GBK instead of relying on the platform default
        return new String(buffer, pos, len, "GBK");
    }

    // Classify a byte: 0 = single-byte character, 1 = lead byte, -1 = trail byte
    public static int position(byte[] buffer, int i) {
        if (buffer[i] >= 0) {
            // Non-negative: a trail byte if preceded by an odd run of negative bytes, else single-byte
            return negCount(buffer, i - 1) % 2 == 0 ? 0 : -1;
        } else {
            // Negative: a lead byte if the run up to and including it is odd, else a trail byte
            return negCount(buffer, i) % 2 == 0 ? -1 : 1;
        }
    }

    // Number of consecutive negative bytes ending at index i (inclusive)
    public static int negCount(byte[] buffer, int i) {
        int sum = 0;
        while (i >= 0 && buffer[i] < 0) {
            sum++;
            i--;
        }
        return sum;
    }
}
```
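The example above repairs arbitrary byte ranges by inspecting GBK lead and trail bytes by hand. Sometimes all you need is the longest prefix of a string that fits in a byte budget. In that case a `CharsetEncoder` can do the work: it writes whole characters only, and it stops with an overflow result when the next character does not fit. The sketch below assumes a stateless charset such as UTF-8 or GBK and well-formed input. `SafeTruncate` and `truncate` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class SafeTruncate {
    // Longest prefix of text whose encoding fits in maxBytes, never splitting a character.
    // Assumes a stateless charset (e.g. UTF-8 or GBK) and well-formed input text.
    public static String truncate(String text, int maxBytes, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        // encode() writes whole characters only; when the next one does not fit it
        // returns CoderResult.OVERFLOW and leaves that character unconsumed
        encoder.encode(CharBuffer.wrap(text), out, true);
        out.flip();
        return cs.decode(out).toString();
    }

    public static void main(String[] args) {
        // "ab" + two Chinese characters (3 bytes each in UTF-8) + "cd"
        String s = "ab\u4e2d\u6587cd";
        for (int n = 0; n <= 10; n++) {
            System.out.println(n + " bytes -> " + truncate(s, n, StandardCharsets.UTF_8));
        }
    }
}
```

With a budget of 4 bytes the result is just "ab", because the next character needs 3 more bytes; with 5 bytes the first Chinese character fits as well.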
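File character streams (FileReader/FileWriter)
1. Overview
- Target: files; content: characters; direction: input/output.
- They make working with text files convenient, but they can only handle text files.
- As with byte streams, you can read or write one character at a time, or several characters at once through a char array.
- By default, FileReader/FileWriter use the platform's default encoding. Since Java 11 there are also constructors that take a Charset.
- FileReader/FileWriter have a built-in internal buffer.
- All their methods are inherited from their parent classes, InputStreamReader/OutputStreamWriter.
- FileReader/FileWriter are convenience versions of their parent classes InputStreamReader/OutputStreamWriter that open the file for you.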
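Since Java 11, FileReader and FileWriter also accept a Charset, which avoids depending on the platform default encoding. Below is a minimal sketch that writes a file as UTF-8 and reads it back. The class and method names are illustrative.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FileCharsetDemo {
    // Write text to a file as UTF-8, regardless of the platform default encoding (Java 11+)
    public static void writeUtf8(File file, String text) throws IOException {
        try (FileWriter fw = new FileWriter(file, StandardCharsets.UTF_8)) {
            fw.write(text);
        }
    }

    // Read the whole file back, decoding it as UTF-8
    public static String readUtf8(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (FileReader fr = new FileReader(file, StandardCharsets.UTF_8)) {
            int len;
            while ((len = fr.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("char", ".txt");
        tmp.deleteOnExit();
        String text = "hello \u4e16\u754c"; // "hello " plus two Chinese characters
        writeUtf8(tmp, text);
        System.out.println(readUtf8(tmp).equals(text)); // true
        System.out.println(tmp.length());               // 12: 6 ASCII bytes + 2 x 3 bytes
    }
}
```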
2. Constructors
- FileReader:
  FileReader(File file)
  FileReader(String fileName)
- FileWriter:
  FileWriter(File file)
  FileWriter(File file, boolean append)
  FileWriter(String fileName)
  FileWriter(String fileName, boolean append)
3. Common methods
- FileReader:
  void close()
  String getEncoding()
  int read()
  int read(char[] cbuf, int offset, int length)
- FileWriter:
  void close()
  void flush()
  String getEncoding()
  void write(char[] cbuf, int off, int len)
  void write(int c)
  void write(String str, int off, int len)
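4. Notes
- How does the built-in buffer of FileReader/FileWriter differ from a custom buffer?
Using FileReader/FileWriter together with a custom char array gives two layers of buffering. The first layer is the stream's internal byte buffer, which moves raw bytes between the file and memory in large blocks. The second layer is the application's char array, which receives the characters decoded from the first layer (or supplies the characters to be encoded into it). The first layer reduces the number of system calls; the second reduces the number of per-character method calls.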
5. Examples
Reading a file
```java
public static String readFile() {
    StringBuilder str = new StringBuilder();
    char[] buffer = new char[1024];
    int len = -1;
    FileReader fr = null;
    try {
        // Read the file through a custom char buffer
        fr = new FileReader("temp\\char.txt");
        while ((len = fr.read(buffer)) != -1) {
            str.append(buffer, 0, len);
        }
    } catch (IOException e) {
        System.out.println("Failed to read the file!");
        e.printStackTrace();
    } finally {
        // Release the file
        if (fr != null) {
            try {
                fr.close();
            } catch (IOException e) {
                System.out.println("Failed to close the file!");
                e.printStackTrace();
                throw new RuntimeException(e);
            }
        }
    }
    return str.toString();
}
```
Writing a file
```java
public static void writeFile(String str) {
    FileWriter fw = null;
    try {
        // Write the file
        fw = new FileWriter("temp\\char_out.txt");
        fw.write(str.toCharArray());
    } catch (IOException e) {
        System.out.println("Failed to write the file!");
        e.printStackTrace();
    } finally {
        // Release the file
        if (fw != null) {
            try {
                fw.close();
            } catch (IOException e) {
                System.out.println("Failed to close the file!");
                e.printStackTrace();
                throw new RuntimeException(e);
            }
        }
    }
}
```
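Since Java 7, try-with-resources closes a stream automatically, even when an exception is thrown. That removes the nested finally/close blocks above. Below is a sketch of the same two operations in that style. The class name `CharFileIO` and the path parameters are illustrative, and errors are propagated to the caller instead of being printed.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CharFileIO {
    // Read a whole text file through a custom char[] buffer; the reader is closed automatically
    public static String readFile(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        try (FileReader fr = new FileReader(path)) {
            int len;
            while ((len = fr.read(buffer)) != -1) {
                sb.append(buffer, 0, len);
            }
        }
        return sb.toString();
    }

    // Write a string to a text file, replacing existing content; the writer is flushed and closed automatically
    public static void writeFile(String path, String str) throws IOException {
        try (FileWriter fw = new FileWriter(path)) {
            fw.write(str);
        }
    }
}
```

Usage: `CharFileIO.writeFile("temp\\char_out.txt", CharFileIO.readFile("temp\\char.txt"));`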
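Charset conversion streams (InputStreamReader/OutputStreamWriter)
1. Overview
- Target: byte streams; content: characters; direction: input/output.
- InputStreamReader/OutputStreamWriter are the bridge between byte streams and character streams and perform the conversion; they are character streams themselves.
- InputStreamReader/OutputStreamWriter also wrap a byte stream in decorator style, adding the ability to decode and encode characters in a specified charset.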
2. Constructors
- InputStreamReader:
  InputStreamReader(InputStream in)
  InputStreamReader(InputStream in, Charset cs)
  InputStreamReader(InputStream in, CharsetDecoder dec)
  InputStreamReader(InputStream in, String charsetName)
- OutputStreamWriter:
  OutputStreamWriter(OutputStream out)
  OutputStreamWriter(OutputStream out, Charset cs)
  OutputStreamWriter(OutputStream out, CharsetEncoder enc)
  OutputStreamWriter(OutputStream out, String charsetName)
3. Common methods
- InputStreamReader:
  void close()
  String getEncoding()
  int read()
  int read(char[] cbuf, int offset, int length)
- OutputStreamWriter:
  void close()
  void flush()
  String getEncoding()
  void write(char[] cbuf, int off, int len)
  void write(int c)
  void write(String str, int off, int len)
4. Example
Writing with a specified encoding
```java
public static void writeFile_UTF8(String str) {
    OutputStreamWriter osw = null;
    try {
        osw = new OutputStreamWriter(new FileOutputStream("temp\\char_utf8.txt"), "UTF-8");
        osw.write(str.toCharArray());
    } catch (UnsupportedEncodingException e) {
        // Unsupported character encoding
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        // File not found or cannot be created
        e.printStackTrace();
    } catch (IOException e) {
        // Writing the file failed
        e.printStackTrace();
    } finally {
        if (osw != null) {
            try {
                osw.close();
            } catch (IOException e) {
                // Closing the file failed
                e.printStackTrace();
                throw new RuntimeException("Failed to close the file!", e);
            }
        }
    }
}
```
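InputStreamReader handles the reverse direction. The sketch below decodes the same UTF-8 bytes twice. With the correct charset it gets back the two original characters. With ISO-8859-1 every byte becomes a separate character, which is the typical garbled-text effect of decoding with the wrong charset. A ByteArrayInputStream stands in for a file, and `DecodeDemo` is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode a byte array into a String through an InputStreamReader using the given charset
    public static String decode(byte[] bytes, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int len;
            while ((len = r.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8); // 2 characters, 6 bytes
        System.out.println(decode(utf8, StandardCharsets.UTF_8).length());      // 2
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length()); // 6: one char per byte
    }
}
```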
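Buffered character streams (BufferedReader/BufferedWriter)
1. Overview
- BufferedReader/BufferedWriter are decorators for character streams that add buffering, similar to BufferedInputStream/BufferedOutputStream for byte streams.
- The buffer size of BufferedReader/BufferedWriter can be specified.
- BufferedReader/BufferedWriter add line-oriented reading and writing (readLine() and newLine()), which the other character streams covered here do not have.
- Programs that use DataInputStream for text input can be localized by replacing each DataInputStream with an appropriate BufferedReader.
- For top efficiency, consider wrapping an InputStreamReader/OutputStreamWriter inside a BufferedReader/BufferedWriter.
- BufferedWriter provides the newLine() method, which writes the line separator of the current platform.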
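Following the recommendation above, a typical reading stack is: byte stream, then InputStreamReader (decoding with an explicit charset), then BufferedReader (buffering and readLine()). The sketch below uses illustrative names and a ByteArrayInputStream in place of a file. Note that readLine() accepts \n, \r and \r\n as line terminators and strips them from the returned line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BufferedLines {
    // bytes -> InputStreamReader (decodes UTF-8) -> BufferedReader (8192-char buffer, readLine)
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), 8192)) {
            String line;
            while ((line = br.readLine()) != null) { // line terminators are stripped
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first\r\nsecond\nthird".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(data))); // [first, second, third]
    }
}
```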
2. Constructors
- BufferedReader:
  BufferedReader(Reader in)
  BufferedReader(Reader in, int sz)
- BufferedWriter:
  BufferedWriter(Writer out)
  BufferedWriter(Writer out, int sz)
3. Common methods
- BufferedReader:
  void close()
  int read()
  int read(char[] cbuf, int off, int len)
  String readLine()
  long skip(long n)
- BufferedWriter:
  void close()
  void flush()
  void newLine()
  void write(char[] cbuf, int off, int len)
  void write(int c)
  void write(String s, int off, int len)
4. Example
```java
package io.charstream;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CharBufferIODemo {
    public static void main(String[] args) throws IOException {
        String str = null;
        // Read the file line by line
        BufferedReader br = new BufferedReader(new FileReader("temp\\char.txt"));
        while ((str = br.readLine()) != null) {
            System.out.println(str);
        }
        br.close();
        // Copy the file
        copyFileUseBuffer("temp\\char.txt");
    }

    public static void copyFileUseBuffer(String string) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(string));
        // Name the copy by inserting "_copy" before the file extension
        string = new StringBuilder(string).insert(string.lastIndexOf('.'), "_copy").toString();
        BufferedWriter bw = new BufferedWriter(new FileWriter(string));
        String str = null;
        while ((str = br.readLine()) != null) {
            // readLine() strips the terminator, so newLine() writes the platform separator after every line
            bw.write(str);
            bw.newLine();
        }
        br.close();
        bw.close();
    }
}
```
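Case studies
Building a cost-measurement class (time and memory) with the Template Method pattern
GetComplexity.java
```java
package tools;

// Template Method pattern: start() fixes the skeleton of the measurement,
// and the part that varies, run(), is defined by subclasses.
public abstract class GetComplexity {
    private final Runtime s_runtime = Runtime.getRuntime();
    private String name = null;

    public GetComplexity() {
        super();
    }

    public GetComplexity(String name) {
        super();
        this.name = name;
    }

    public final long[] start() {
        // Record the start time (note: the runGC() call below is included in the measured time)
        long startTime = System.nanoTime();
        runGC();
        // Record the memory in use before running
        long startMemory = s_runtime.totalMemory() - s_runtime.freeMemory();
        run();
        // Elapsed time in ns and change in used memory in bytes
        long[] use = {
            System.nanoTime() - startTime,
            s_runtime.totalMemory() - s_runtime.freeMemory() - startMemory
        };
        if (isPrinted()) {
            if (name == null) {
                System.out.println(this.getClass().getName() + " :");
            } else {
                System.out.println(this.name + " :");
            }
            System.out.printf("Estimated Time is %.3f ms.\n", (double) use[0] / 1000000);
            System.out.printf("Used Memory is %.3f KB.\n", (double) use[1] / 1000);
        }
        return use;
    }

    // Request garbage collection a few times for a cleaner memory baseline (System.gc() is only a hint)
    private void runGC() {
        for (int i = 0; i < 4; i++) {
            System.gc();
        }
    }

    // The step left to subclasses: the code being measured
    public abstract void run();

    // A hook that subclasses may override, for example to suppress printing
    public boolean isPrinted() {
        return true;
    }
}
```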
1. Copying a large single-line text file in several ways and comparing time and memory use
A utility class with several ways to copy a text file
CharFileCopy.java
```java
package io.charstream;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CharFileCopy {
    /**
     * Copy a text file with FileReader/FileWriter, one character per call.
     *
     * @param old     the source file
     * @param newFile the destination file
     * @throws IOException if reading or writing fails
     */
    public static void copyCharFile(File old, File newFile) throws IOException {
        FileReader fr = new FileReader(old);
        FileWriter fw = new FileWriter(newFile);
        int ch = -1;
        while ((ch = fr.read()) != -1) {
            fw.write(ch);
        }
        fr.close();
        fw.close();
    }

    /**
     * Copy a text file with FileReader/FileWriter and a custom char buffer.
     *
     * @param old     the source file
     * @param newFile the destination file
     * @throws IOException if reading or writing fails
     */
    public static void copyCharFile_UseArr(File old, File newFile) throws IOException {
        FileReader fr = new FileReader(old);
        FileWriter fw = new FileWriter(newFile);
        int len = -1;
        char[] buf = new char[1024];
        while ((len = fr.read(buf)) != -1) {
            fw.write(buf, 0, len);
        }
        fr.close();
        fw.close();
    }

    /**
     * Copy a text file with BufferedReader/BufferedWriter, one character per call.
     *
     * @param old     the source file
     * @param newFile the destination file
     * @throws IOException if reading or writing fails
     */
    public static void bufCopyCharFile(File old, File newFile) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(old));
        BufferedWriter bw = new BufferedWriter(new FileWriter(newFile));
        int ch = -1;
        while ((ch = br.read()) != -1) {
            bw.write(ch);
        }
        br.close();
        bw.close();
    }

    /**
     * Copy a text file with BufferedReader/BufferedWriter, line by line.
     *
     * @param old     the source file
     * @param newFile the destination file
     * @throws IOException if reading or writing fails
     */
    public static void bufCopyCharFile_UseLine(File old, File newFile) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(old));
        BufferedWriter bw = new BufferedWriter(new FileWriter(newFile));
        String line = null;
        while ((line = br.readLine()) != null) {
            bw.write(line);
            bw.newLine();
        }
        br.close();
        bw.close();
    }

    /**
     * Copy a text file with BufferedReader/BufferedWriter and a custom char buffer.
     *
     * @param old     the source file
     * @param newFile the destination file
     * @throws IOException if reading or writing fails
     */
    public static void bufCopyCharFile_UseArr(File old, File newFile) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(old));
        BufferedWriter bw = new BufferedWriter(new FileWriter(newFile));
        char[] buf = new char[1024];
        int len = -1;
        while ((len = br.read(buf)) != -1) {
            bw.write(buf, 0, len);
        }
        br.close();
        bw.close();
    }
}
```
Main program
Each copying approach is wrapped in its own class. These classes extend GetComplexity to inherit its time and memory measurement.
CharFileCopyDemo.java
```java
package io.charstream;

import java.io.File;
import java.io.IOException;

import tools.GetComplexity;

public class CharFileCopyDemo {
    public static void main(String[] args) {
        File src = new File("temp\\singlelinechar.txt");
        // The directory temp\CopyCharFile must already exist
        File dest1 = new File("temp\\CopyCharFile\\char_copy.txt");
        File dest2 = new File("temp\\CopyCharFile\\char_copy_arr.txt");
        File dest3 = new File("temp\\CopyCharFile\\char_bufcopy.txt");
        File dest4 = new File("temp\\CopyCharFile\\char_bufcopy_line.txt");
        File dest5 = new File("temp\\CopyCharFile\\char_bufcopy_arr.txt");
        new CopyCharFile("Copy with FileReader/Writer", src, dest1).start();
        new CopyCharFile_UseArr("Copy with FileReader/Writer and a custom buffer", src, dest2).start();
        new BufCopyCharFile("Copy with BufferedReader/Writer", src, dest3).start();
        new BufCopyCharFile_UseLine("Copy with BufferedReader/Writer line by line", src, dest4).start();
        new BufCopyCharFile_UseArr("Copy with BufferedReader/Writer and an array", src, dest5).start();
    }
}

class CopyCharFile extends GetComplexity {
    private File src = null;
    private File dest = null;

    public CopyCharFile(String name, File src, File dest) {
        super(name);
        this.src = src;
        this.dest = dest;
    }

    @Override
    public void run() {
        try {
            CharFileCopy.copyCharFile(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class CopyCharFile_UseArr extends GetComplexity {
    private File src = null;
    private File dest = null;

    public CopyCharFile_UseArr(String name, File src, File dest) {
        super(name);
        this.src = src;
        this.dest = dest;
    }

    @Override
    public void run() {
        try {
            CharFileCopy.copyCharFile_UseArr(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class BufCopyCharFile extends GetComplexity {
    private File src = null;
    private File dest = null;

    public BufCopyCharFile(String name, File src, File dest) {
        super(name);
        this.src = src;
        this.dest = dest;
    }

    @Override
    public void run() {
        try {
            CharFileCopy.bufCopyCharFile(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class BufCopyCharFile_UseLine extends GetComplexity {
    private File src = null;
    private File dest = null;

    public BufCopyCharFile_UseLine(String name, File src, File dest) {
        super(name);
        this.src = src;
        this.dest = dest;
    }

    @Override
    public void run() {
        try {
            CharFileCopy.bufCopyCharFile_UseLine(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class BufCopyCharFile_UseArr extends GetComplexity {
    private File src = null;
    private File dest = null;

    public BufCopyCharFile_UseArr(String name, File src, File dest) {
        super(name);
        this.src = src;
        this.dest = dest;
    }

    @Override
    public void run() {
        try {
            CharFileCopy.bufCopyCharFile_UseArr(src, dest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
Results
Copy with FileReader/Writer :
Estimated Time is 1707.23 ms.
Used Memory is 37464.54 KB.
Copy with FileReader/Writer and a custom buffer :
Estimated Time is 229.43 ms.
Used Memory is 3247.94 KB.
Copy with BufferedReader/Writer :
Estimated Time is 950.88 ms.
Used Memory is 1623.88 KB.
Copy with BufferedReader/Writer line by line :
Estimated Time is 309.89 ms.
Used Memory is 178302.42 KB.
Copy with BufferedReader/Writer and an array :
Estimated Time is 238.92 ms.
Used Memory is 1077.44 KB.
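Conclusions
- Copying without an array (one character per call) takes far more time: about 1707 ms with FileReader/Writer and 951 ms with BufferedReader/Writer, versus about 230 to 240 ms when an array is used.
- Copying line by line uses a lot of memory when a single line is very long (about 178 MB here), because readLine() has to hold the whole line as one String.
- The best option in this run was a buffered stream combined with a custom array, i.e. two layers of buffering. It was about as fast as FileReader/Writer with an array and used the least memory. Each figure comes from a single run, so small differences may be noise.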
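The sketch below shows the approach the measurements favor. It is written against Reader/Writer, so it works with any character source or sink; `BufferedCopy` is an illustrative name. Unlike the readLine()-based copy, it keeps the original line terminators, and its memory use does not depend on line length.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class BufferedCopy {
    // Copy all characters from in to out: buffered streams plus a custom char[] (two layers of buffering).
    // Returns the number of characters copied; both streams are closed when done.
    public static long copy(Reader in, Writer out) throws IOException {
        long total = 0;
        char[] buf = new char[1024];
        try (BufferedReader br = new BufferedReader(in);
             BufferedWriter bw = new BufferedWriter(out)) {
            int len;
            while ((len = br.read(buf)) != -1) {
                bw.write(buf, 0, len);
                total += len;
            }
        } // bw is closed (and therefore flushed) first, then br
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        long n = copy(new StringReader("line one\nline two"), sw);
        System.out.println(n + " " + sw.toString().equals("line one\nline two")); // 17 true
    }
}
```

For files, pass `new FileReader(src)` and `new FileWriter(dest)`.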
2. Counting occurrences of a string in a text file
```java
package io.charstream;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import tools.GetComplexity;
import tools.StringTools;

public class Text_GetCount {
    public static void main(String[] args) {
        new GetCountByBuffer("Search with a custom buffer").start();
        new GetCountByLine("Search line by line").start();
    }
}

/**
 * Counts occurrences of a string in a file by reading it line by line with a buffered character stream.
 * Extends GetComplexity so the time and memory used by the algorithm can be measured.
 */
class GetCountByLine extends GetComplexity {
    public GetCountByLine() {
        super();
    }

    public GetCountByLine(String name) {
        super(name);
    }

    public void run() {
        try {
            File file = new File("temp\\singlelinechar.txt");
            // The target "个人文档" ("personal document") matches the Chinese text in the file
            int count = getCountByLine(file, "个人文档");
            System.out.println("Occurrences of \"个人文档\": " + count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public int getCountByLine(File file, String target) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(file));
        String line = null;
        int sum = 0;
        // Read each line, count the occurrences of target in it, and add them up.
        // A very long line must be held in memory as a whole, so this can use a lot of memory.
        while ((line = br.readLine()) != null) {
            sum += StringTools.getCount(line, target);
        }
        br.close();
        return sum;
    }
}

/**
 * Counts occurrences of a string in a file using a character stream and a fixed-size custom buffer.
 * Because the buffer size is fixed, a very long line cannot make memory use explode.
 * Extends GetComplexity so the time and memory used by the algorithm can be measured.
 */
class GetCountByBuffer extends GetComplexity {
    public GetCountByBuffer() {
        super();
    }

    public GetCountByBuffer(String name) {
        super(name);
    }

    // Custom buffer of 1024 chars (2 KB, since a char is 2 bytes)
    private final int BUFFER_SIZE = 1024;

    public void run() {
        File file = new File("temp\\singlelinechar.txt");
        try {
            int count = getCount(file, "个人文档");
            System.out.println("Occurrences of \"个人文档\": " + count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public int getCount(File file, String target) throws IOException {
        FileReader fr = new FileReader(file);
        char[] buf = new char[BUFFER_SIZE];
        // An occurrence of target can be cut in two by a read boundary, so the last
        // target.length() - 1 chars are carried to the front of the buffer before the next read
        int keep = target.length() - 1;
        int valid = 0; // number of carried-over chars at the front of buf
        int len = -1;
        int sum = 0;
        // Fill the rest of the buffer after the carried-over chars and count in everything it holds
        while ((len = fr.read(buf, valid, BUFFER_SIZE - valid)) != -1) {
            int end = valid + len;
            sum += StringTools.getCount(new String(buf, 0, end), target);
            // Carry over the tail of what was actually read: read() may return fewer chars
            // than requested, so the tail is not necessarily at the end of the array
            valid = Math.min(keep, end);
            System.arraycopy(buf, end - valid, buf, 0, valid);
        }
        fr.close();
        return sum;
    }
}
```
Results
Occurrences of "个人文档": 113496
Search with a custom buffer :
Estimated Time is 157.69 ms.
Used Memory is 5902.80 KB.
Occurrences of "个人文档": 113496
Search line by line :
Estimated Time is 188.58 ms.
Used Memory is 178165.50 KB.
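This post does not show tools.StringTools. The sketch below pairs an assumed `getCount`, which counts non-overlapping occurrences with String.indexOf, with the carry-over technique used by GetCountByBuffer. That way the boundary handling can be checked with a StringReader and a deliberately tiny buffer. `ChunkedCount` and its methods are illustrative names. The sketch assumes the target cannot overlap itself, which holds for 个人文档. For a self-overlapping target such as "aa", chunked counting can report more matches than a single non-overlapping pass over the whole text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ChunkedCount {
    // Assumed stand-in for tools.StringTools.getCount: non-overlapping occurrences of target in text
    public static int getCount(String text, String target) {
        if (target.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        int idx;
        while ((idx = text.indexOf(target, from)) != -1) {
            count++;
            from = idx + target.length();
        }
        return count;
    }

    // Count target in everything the reader yields, using a fixed buffer of bufferSize chars.
    // The last target.length() - 1 chars of each chunk move to the front of the buffer,
    // so a match cut by a read boundary is still found. Assumes target cannot overlap itself.
    public static int count(Reader reader, String target, int bufferSize) throws IOException {
        if (target.isEmpty() || target.length() > bufferSize) {
            throw new IllegalArgumentException("target must be non-empty and fit in the buffer");
        }
        char[] buf = new char[bufferSize];
        int keep = target.length() - 1;
        int valid = 0; // carried-over chars at the front of buf
        int sum = 0;
        try (Reader in = reader) {
            int len;
            while ((len = in.read(buf, valid, bufferSize - valid)) != -1) {
                int end = valid + len;
                sum += getCount(new String(buf, 0, end), target);
                valid = Math.min(keep, end);
                System.arraycopy(buf, end - valid, buf, 0, valid);
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // A 4-char buffer forces read boundaries in the middle of matches
        System.out.println(count(new StringReader("abcabcxxabcab"), "abc", 4)); // 3
    }
}
```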
3. Writing your own BufferedReader
```java
package io.charstream;

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

class MyBufferedReader {
    private Reader r;
    private static final int SIZE = 1024;
    // Index of the next char to return from the buffer
    private int pos = 0;
    // Number of chars currently in the buffer (-1 once the end of the stream is reached)
    private int count = 0;
    // The buffer itself
    private char[] buffer = new char[SIZE];

    public MyBufferedReader(Reader r) {
        super();
        this.r = r;
    }

    /**
     * Fill the buffer with data from the underlying stream.
     *
     * @throws IOException if reading fails
     */
    private void fill() throws IOException {
        count = r.read(buffer);
        pos = 0;
    }

    /**
     * Read a single character.
     *
     * @return the character read, or -1 at the end of the stream
     * @throws IOException if reading fails
     */
    public int read() throws IOException {
        if (pos >= count) {
            fill();
        }
        if (count == -1) {
            return -1;
        }
        return (int) buffer[pos++];
    }

    /**
     * Read one line of text. "\n" ends a line and every "\r" is skipped, so both "\n" and
     * "\r\n" line endings work (a lone "\r" is not treated as a line break).
     *
     * @return the line without its terminator, or null at the end of the stream
     * @throws IOException if reading fails
     */
    public String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch;
        while ((ch = read()) != -1) {
            if (ch == '\r') {
                continue;
            } else if (ch == '\n') {
                return sb.toString();
            } else {
                sb.append((char) ch);
            }
        }
        if (sb.length() != 0) {
            return sb.toString();
        } else {
            return null;
        }
    }

    /**
     * Close the underlying stream.
     *
     * @throws IOException if closing fails
     */
    public void close() throws IOException {
        r.close();
    }
}

public class CharMyBufferedDemo {
    public static void main(String[] args) {
        MyBufferedReader mbr = null;
        String str = null;
        try {
            mbr = new MyBufferedReader(new FileReader("temp\\char.txt"));
            while ((str = mbr.readLine()) != null) {
                System.out.println(str);
            }
            mbr.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
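MyBufferedReader skips every '\r', so a file that uses a lone \r as its line break would come back as a single line. java.io.BufferedReader instead accepts \n, \r and \r\n. The sketch below reproduces that behavior on top of the same buffering idea; `LineTerminatorReader` is an illustrative name. After a line ends with '\r', the next call swallows an immediately following '\n', so \r\n counts as one line break.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LineTerminatorReader {
    private final Reader r;
    private final char[] buffer = new char[1024];
    private int pos = 0;            // index of the next char to return
    private int count = 0;          // chars in the buffer; -1 once the stream is exhausted
    private boolean skipLF = false; // true if the previous line ended with '\r'

    public LineTerminatorReader(Reader r) {
        this.r = r;
    }

    // Next char from the buffer, refilling it from the underlying reader; -1 at end of stream
    public int read() throws IOException {
        if (count == -1) {
            return -1;
        }
        if (pos >= count) {
            count = r.read(buffer);
            pos = 0;
            if (count == -1) {
                return -1;
            }
        }
        return buffer[pos++];
    }

    // One line ended by "\n", "\r" or "\r\n", without the terminator; null at end of stream
    public String readLine() throws IOException {
        int ch = read();
        if (skipLF && ch == '\n') {
            ch = read(); // second half of the "\r\n" pair that ended the previous line
        }
        skipLF = false;
        if (ch == -1) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        while (ch != -1 && ch != '\n' && ch != '\r') {
            sb.append((char) ch);
            ch = read();
        }
        if (ch == '\r') {
            skipLF = true; // a '\n' right after it belongs to this line break
        }
        return sb.toString();
    }

    public void close() throws IOException {
        r.close();
    }

    public static void main(String[] args) throws IOException {
        LineTerminatorReader in = new LineTerminatorReader(new StringReader("a\r\nb\rc\n\nd"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println("[" + line + "]"); // [a], [b], [c], [], [d] on separate lines
        }
        in.close();
    }
}
```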