Patterns for reading Excel, XML, and TXT files.
1. Reading Excel.
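import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import jxl.Cell;
import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;

// Parses every sheet of the workbook into a list of Content rows, keyed by sheet name.
public static Map<String, List<Content>> parse(String file) {
    Workbook book = null;
    Map<String, List<Content>> map = new HashMap<String, List<Content>>();
    try {
        book = Workbook.getWorkbook(new File(file));
        int sheets = book.getNumberOfSheets();
        for (int i = 0; i < sheets; i++) {
            Sheet se = book.getSheet(i);
            List<Content> list = new ArrayList<Content>();
            // Start at row 1, skipping row 0 (presumably the header row).
            for (int j = 1; j < se.getRows(); j++) {
                list.add(data(se, j));
            }
            map.put(se.getName(), list);
        }
    } catch (BiffException e) {
        // The file is not a readable .xls workbook.
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Release the workbook's resources whether or not parsing succeeded.
        if (book != null) {
            book.close();
        }
    }
    return map;
}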
// Maps one spreadsheet row to a Content object.
// Columns 0 to 5 hold id, name, author, time, description, and content.
private static Content data(Sheet se, int row) {
    Content contents = new Content();
    // jxl's getCell takes (column, row), in that order.
    contents.setId(se.getCell(0, row).getContents());
    contents.setName(se.getCell(1, row).getContents());
    contents.setAuthor(se.getCell(2, row).getContents());
    contents.setTime(se.getCell(3, row).getContents());
    contents.setDescription(se.getCell(4, row).getContents());
    contents.setContent(se.getCell(5, row).getContents());
    return contents;
}
Note that this requires the Excel parsing library jxl.jar on the classpath. jxl reads the binary .xls format only, not .xlsx.
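The Content class that these parsers fill is never shown in this post. A minimal sketch, assuming it is a plain bean with six String properties named after the setters called above (the getters and the package-private class visibility are assumptions):

```java
// Hypothetical sketch of the Content bean; the post does not show the real class.
class Content {
    private String id;
    private String name;
    private String author;
    private String time;
    private String description;
    private String content;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }

    public String getTime() { return time; }
    public void setTime(String time) { this.time = time; }

    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    // In the XML sample the content field holds a path to a text file, not the text itself.
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}
```

Keeping every field a String matches what the parsers produce: jxl's getContents() and the DOM getNodeValue() both return strings, so no conversion is needed at this stage.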
2. Reading XML
Sample XML layout:
<list desc="news content xml" start="1" size="1" total="1">
  <item type="news">
    <id>00</id>
    <name>Article title</name>
    <author>Who are you</author>
    <time>2008-07-08</time>
    <description>
      Your own description text
    </description>
    <content>src/content/database/news_cbdwt.txt</content>
  </item>
</list>
Code that reads the file (content/database/news.txt):
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Builds one Content object per <item> element of the XML file.
public static List<Content> parse(String file) {
    List<Content> list = new ArrayList<Content>();
    NodeList items = readFile(file, "item");
    for (int i = 0; i < items.getLength(); i++) {
        Element node = (Element) items.item(i);
        Content content = new Content();
        // text(...) returns null when a tag is missing, repeated, or empty.
        content.setId(text(node, "id"));
        content.setName(text(node, "name"));
        content.setAuthor(text(node, "author"));
        content.setTime(text(node, "time"));
        content.setDescription(text(node, "description"));
        content.setContent(text(node, "content"));
        list.add(content);
    }
    return list;
}

// Returns the text of the single element named tag under parent, or null.
// The null check on the first child avoids the NullPointerException that the
// unguarded getFirstChild().getNodeValue() chain throws on an empty element.
private static String text(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    if (nodes.getLength() != 1) {
        return null;
    }
    Node first = nodes.item(0).getFirstChild();
    return first == null ? null : first.getNodeValue();
}
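import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Parses the XML file into a DOM tree and returns every element named item.
// Failures are rethrown: printing the stack trace and continuing would only
// lead to a NullPointerException on the null builder or document.
private static NodeList readFile(String file, String item) {
    try {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new File(file));
        return doc.getElementsByTagName(item);
    } catch (ParserConfigurationException e) {
        throw new IllegalStateException("Cannot create XML parser", e);
    } catch (SAXException e) {
        throw new IllegalStateException("Malformed XML: " + file, e);
    } catch (IOException e) {
        throw new IllegalStateException("Cannot read " + file, e);
    }
}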
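The DOM calls above can be tried without a file on disk. The sketch below is an illustration rather than the post's code: the class ItemTextDemo and its methods are hypothetical names, the XML is passed in as a String, and getTextContent() plus trim() stands in for getFirstChild().getNodeValue(). That swap tolerates empty elements and strips the line breaks around a multi-line <description>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original post.
class ItemTextDemo {

    // Returns the trimmed text of the single element named tag under parent,
    // or null when the tag is absent or appears more than once.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() != 1) {
            return null;
        }
        return nodes.item(0).getTextContent().trim();
    }

    // Collects the given field from every <item> element in an XML string.
    static List<String> fieldOfEachItem(String xml, String tag) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> values = new ArrayList<String>();
            for (int i = 0; i < items.getLength(); i++) {
                values.add(text((Element) items.item(i), tag));
            }
            return values;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }
}
```

Calling ItemTextDemo.fieldOfEachItem(xml, "description") on the sample layout would yield the description text without its surrounding whitespace, whereas the getNodeValue() approach keeps the line breaks.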
3. Reading TXT files.
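import java.io.CharArrayWriter;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Reads the whole text file into a char array; returns null if reading fails.
public static char[] file(String file) {
    // CharArrayWriter grows as needed, so no separate length(file) helper is
    // required and the result has no unused trailing slots when the number of
    // characters is smaller than the number of bytes in the file.
    CharArrayWriter out = new CharArrayWriter();
    FileReader in = null;
    try {
        // FileReader decodes with the platform default charset.
        in = new FileReader(new File(file));
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException.
        e.printStackTrace();
        return null;
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return out.toCharArray();
}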
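Note that I used FileReader directly because the TXT files are GBK-encoded; as long as the encoding is set correctly when the text reaches the web page, no garbled characters appear. FileReader decodes with the platform default charset, so this relies on that default being GBK.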
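Where the platform default cannot be relied on, the charset can be named explicitly. A minimal sketch, assuming the whole file fits in memory; the class CharsetTextReader and its readAll method are hypothetical names, not part of the code above:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original post.
class CharsetTextReader {

    // Reads the whole file as text, decoding with the given charset
    // instead of whatever the platform default happens to be.
    static String readAll(Path path, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = Files.newBufferedReader(path, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Covers missing files as well as bytes that are invalid in the charset.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

For the GBK files discussed here the call would look like CharsetTextReader.readAll(Paths.get(file), Charset.forName("GBK")), provided the running JDK ships the GBK charset.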
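Generally speaking, reading into a byte[] is probably the best option, since it can handle any encoding or content type, but the parsing becomes a bit more work.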
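One way the byte[] approach can pay off is deferring the encoding decision until the bytes are in hand. The sketch below is an assumption about how that might look, not the post's method: the class ByteDecoding and its methods are hypothetical, and the bytes are expected to come from something like Files.readAllBytes. It tries a strict UTF-8 decode first and falls back to a caller-supplied charset such as GBK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
class ByteDecoding {

    // Strict decode: returns null instead of silently substituting
    // replacement characters when the bytes do not fit the charset.
    static String tryDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Tries UTF-8 first and falls back to the given charset when the
    // bytes are not valid UTF-8.
    static String decode(byte[] bytes, Charset fallback) {
        String utf8 = tryDecode(bytes, StandardCharsets.UTF_8);
        return utf8 != null ? utf8 : new String(bytes, fallback);
    }
}
```

The fallback order is a heuristic: bytes that happen to be valid UTF-8 are always read as UTF-8, so this suits a mix of UTF-8 and legacy files rather than guaranteeing a correct guess.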
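When time permits, it would be worth looking into how RAR files are parsed.
On direct stream processing: is there any need for stream-based parsing of the XML files? In the current setting it does not seem necessary.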
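For comparison only, should the files ever grow large enough that building a DOM tree becomes too costly, the JDK's StAX API (javax.xml.stream) can read them as a stream. A minimal sketch under that assumption; the class StreamingNames and its names method are hypothetical, and the XML is passed as a String for brevity:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original post.
class StreamingNames {

    // Collects the text of every <name> element while streaming through
    // the document, without building a DOM tree in memory.
    static List<String> names(String xml) {
        List<String> names = new ArrayList<String>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // getElementText reads up to the matching end tag.
                    names.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return names;
    }
}
```

For a handful of small news items like the sample above, the DOM version stays simpler, which agrees with the conclusion that streaming is not needed here.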