本文主要记录最近的工作内容,使用java实现对Excel(03,07)的上传、解析、验证和入库(PostgreSQL)。
一。上传
实际就是实现文件上传至服务器即可,但是方法有很多,首先要考虑采用何种方法实现上传:
1.参考文章:
主要有以下几种方法:
JSP+Servlet(或纯JSP);Struts2;Struts;FTP;ExtJs;Flex;
2.我采用的是jsp+Servlet的方式来实现的。需要先从apache官网下载fileupload.jar。
UploadFileServlet.java的主要代码:
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/cde5deb62f07e8e0fefe78bd744c5832.jpeg)
testfileupload.jsp的主要代码:
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/bdeb8b20db87a4c12c1c7725a0c0c453.jpeg)
3.需要注意的几个问题
(1)上传文件的大小限制
设置 sfu.setFileSizeMax(1024*1024*10000);单位byte.表示近10G。
测试上传1.5G的文件没问题,2.9G以上的文件上传失败,无错误提示,而是“无法显示该页面”的错误:
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/b197ad6196e53fa90a94bb60cdd12ddd.jpeg)
所以目前不清楚到底支持多大的文件上传。
二。解析
1.java解析excel通常有两种方法:(1)jxl(2)poi.
至于选用哪种那做得看具体的需求,我这次主要是要支持03,07的excel,目前jxl更新慢,尚不支持07,所以只能选用poi;本次只涉及到excel的读操作,没有写操作,jxl的写快于poi,但jxl的读慢于poi.
综合下来,选用poi。这个poi具体是啥意思我不太清楚,但不是常说的感兴趣点(point of interest)哈.
2.是否支持大文件解析
excel03最多只能有65536条记录,65536行*256列;excel07:1048576行*16384列。
excel07文件超过65536条记录,另存为xls时会报错:
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/04098ad048be37875f3ebc9c19b4b816.jpeg)
所以要支持大文件,必须得用excel07格式来操作。
需要考虑这点,因为这决定了是用DOM做,还是用SAX。
就1和2来看,java解析excel比起.net解析excel是麻烦了很多很多啊,.net解析excel主要有三种方法:com组件;oledb;openxml。其中使用oledb最常用也最简单。
3.jar包下载
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/41f23a7c8ad932239027e3a945db09d5.jpeg)
jxl;http://www.andykhan.com/jexcelapi/download.html,最新版本为JExcelApiv2.6.12
poi:http://poi.apache.org/download.html#POI-3.9,最新版本为poi-bin-3.9-20121203.zip
4.poi+SAX,支持excel07大文件解析(EventUserModel)
这种方式应该是最高级别的,能满足大数据量需求,也不会造成oom错误的。我这次需要采用的就是这种实现方式。
主要定义了三个类来实现:
(1)Excel2003Reader.java,操作03Excel。
package com.cbe.excel;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
importorg.apache.poi.hssf.eventusermodel.EventWorkbookBuilder.SheetRecordCollectingListener;
importorg.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
importorg.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener;
importorg.apache.poi.hssf.eventusermodel.dummyrecord.LastCellOfRowDummyRecord;
importorg.apache.poi.hssf.eventusermodel.dummyrecord.MissingCellDummyRecord;
import org.apache.poi.hssf.model.HSSFFormulaParser;
import org.apache.poi.hssf.record.BOFRecord;
import org.apache.poi.hssf.record.BlankRecord;
import org.apache.poi.hssf.record.BoolErrRecord;
import org.apache.poi.hssf.record.BoundSheetRecord;
import org.apache.poi.hssf.record.FormulaRecord;
import org.apache.poi.hssf.record.LabelRecord;
import org.apache.poi.hssf.record.LabelSSTRecord;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.SSTRecord;
import org.apache.poi.hssf.record.StringRecord;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
public class Excel2003Reader implements HSSFListener {
private int minColumns = -1;
private POIFSFileSystem fs;
private int lastRowNumber;
private int lastColumnNumber;
private boolean outputFormulaValues = true;
private SheetRecordCollectingListenerworkbookBuildingListener;
//excel2003工作薄
private HSSFWorkbook stubWorkbook;
// Records we pick up as we process
private SSTRecord sstRecord;
private FormatTrackingHSSFListenerformatListener;
//表索引
private int sheetIndex = -1;
private BoundSheetRecord[] orderedBSRs;
//@SuppressWarnings("unchecked")
@SuppressWarnings("rawtypes")
private ArrayList boundSheetRecords = newArrayList();
// For handling formulas with stringresults
private int nextRow;
private int nextColumn;
private boolean outputNextStringRecord;
//当前行
//private int curRow = 0;
//存储行记录的容器
privateList<String> rowlist = newArrayList<String>();
privateList<List<String>>exceldata=newArrayList<List<String>>();
@SuppressWarnings( "unused")
private String sheetName;
publicList<List<String>>getExcelData(){
return exceldata;
}
public void process(String fileName) throwsIOException {
this.fs = newPOIFSFileSystem(new FileInputStream(fileName));
MissingRecordAwareHSSFListenerlistener = new MissingRecordAwareHSSFListener(
this);
formatListener = newFormatTrackingHSSFListener(listener);
HSSFEventFactory factory = newHSSFEventFactory();
HSSFRequest request = newHSSFRequest();
if (outputFormulaValues){
request.addListenerForAllRecords(formatListener);
} else {
workbookBuildingListener= new SheetRecordCollectingListener(
formatListener);
request.addListenerForAllRecords(workbookBuildingListener);
}
factory.processWorkbookEvents(request,fs);
}
@SuppressWarnings("unchecked")
public void processRecord(Record record) {
//List<String> rowlist = newArrayList<String>();
int thisRow = -1;
int thisColumn = -1;
String thisStr = null;
String value = null;
switch (record.getSid()){
caseBoundSheetRecord.sid:
boundSheetRecords.add(record);
break;
caseBOFRecord.sid:
BOFRecordbr = (BOFRecord) record;
if(br.getType() == BOFRecord.TYPE_WORKSHEET) {
//如果有需要,则建立子工作薄
if(workbookBuildingListener != null&& stubWorkbook == null) {
stubWorkbook= workbookBuildingListener
.getStubHSSFWorkbook();
}
sheetIndex++;
if(orderedBSRs == null) {
orderedBSRs= BoundSheetRecord
.orderByBofPosition(boundSheetRecords);
}
sheetName= orderedBSRs[sheetIndex].getSheetname();
}
break;
caseSSTRecord.sid:
sstRecord= (SSTRecord) record;
break;
caseBlankRecord.sid:
BlankRecordbrec = (BlankRecord) record;
thisRow= brec.getRow();
thisColumn= brec.getColumn();
thisStr= "";
rowlist.add(thisColumn,thisStr);
break;
caseBoolErrRecord.sid: //单元格为布尔类型
BoolErrRecordberec = (BoolErrRecord) record;
thisRow= berec.getRow();
thisColumn= berec.getColumn();
thisStr= berec.getBooleanValue()+"";
rowlist.add(thisColumn,thisStr);
break;
caseFormulaRecord.sid: //单元格为公式类型
FormulaRecordfrec = (FormulaRecord) record;
thisRow= frec.getRow();
thisColumn= frec.getColumn();
if(outputFormulaValues) {
if(Double.isNaN(frec.getValue())) {
//Formula result is a string
//This is stored in the next record
outputNextStringRecord= true;
nextRow= frec.getRow();
nextColumn= frec.getColumn();
}else {
thisStr= formatListener.formatNumberDateCell(frec);
}
}else {
thisStr= '"' + HSSFFormulaParser.toFormulaString(stubWorkbook,
frec.getParsedexpression_r())+ '"';
}
rowlist.add(thisColumn,thisStr);
break;
caseStringRecord.sid://单元格中公式的字符串
if(outputNextStringRecord) {
//String for formula
StringRecordsrec = (StringRecord) record;
thisStr= srec.getString();
thisRow= nextRow;
thisColumn= nextColumn;
outputNextStringRecord= false;
}
break;
caseLabelRecord.sid:
LabelRecordlrec = (LabelRecord) record;
//curRow= thisRow = lrec.getRow();
thisColumn= lrec.getColumn();
value= lrec.getValue().trim();
value= value.equals("")?" ":value;
rowlist.add(thisColumn,value);
break;
caseLabelSSTRecord.sid: //单元格为字符串类型
LabelSSTRecordlsrec = (LabelSSTRecord) record;
//curRow= thisRow = lsrec.getRow();
thisColumn= lsrec.getColumn();
if(sstRecord == null) {
rowlist.add(thisColumn," ");
}else {
value= sstRecord
.getString(lsrec.getSSTIndex()).toString().trim();
value= value.equals("")?" ":value;
rowlist.add(thisColumn,value);
}
break;
caseNumberRecord.sid: //单元格为数字类型
NumberRecordnumrec = (NumberRecord) record;
//curRow= thisRow = numrec.getRow();
thisColumn= numrec.getColumn();
value= formatListener.formatNumberDateCell(numrec).trim();
value= value.equals("")?" ":value;
//向容器加入列值
rowlist.add(thisColumn,value);
break;
default:
break;
}
// 遇到新行的操作
if (thisRow != -1&& thisRow != lastRowNumber){
lastColumnNumber= -1;
}
// 空值的操作
if (record instanceofMissingCellDummyRecord) {
MissingCellDummyRecordmc = (MissingCellDummyRecord) record;
//curRow =thisRow = mc.getRow();
thisColumn =mc.getColumn();
rowlist.add(thisColumn,"");
}
// 更新行和列的值
if (thisRow >-1)
lastRowNumber= thisRow;
if (thisColumn >-1)
lastColumnNumber= thisColumn;
// 行结束时的操作
if (record instanceofLastCellOfRowDummyRecord) {
if(minColumns > 0) {
//列值重新置空
if(lastColumnNumber == -1) {
lastColumnNumber= 0;
}
}
lastColumnNumber= -1;
//每行结束时, 调用getRows() 方法
//rowReader.getRows(sheetIndex,curRow,rowlist);
List<String>rowlistcopy = newArrayList<String>();
for (int i =0; i < rowlist.size(); i++) {
rowlistcopy.add(rowlist.get(i));
}
exceldata.add(rowlistcopy);
// 清空容器
rowlist.clear();
}
}
}
(2)Excel2007ReaderNew.java,操作07excel,这个是核心,之所以有个New,是取代了之前的Excel2007Reader.java,解决有单元格为空的判定处理问题。
参考:http://gaosheng08.iteye.com/blog/624758
package com.cbe.excel;
importjava.io.InputStream;
importjava.util.ArrayList;
importjava.util.Iterator;
import java.util.List;
importorg.apache.poi.xssf.eventusermodel.XSSFReader;
importorg.apache.poi.xssf.model.SharedStringsTable;
importorg.apache.poi.xssf.usermodel.XSSFRichTextString;
importorg.apache.poi.openxml4j.opc.OPCPackage;
importorg.xml.sax.Attributes;
importorg.xml.sax.InputSource;
importorg.xml.sax.SAXException;
importorg.xml.sax.XMLReader;
importorg.xml.sax.helpers.DefaultHandler;
importorg.xml.sax.helpers.XMLReaderFactory;
public classExcel2007ReaderNew extendsDefaultHandler {
private SharedStringsTablesst;
private StringlastContents;
private booleannextIsString;
@SuppressWarnings("unused")
private int sheetIndex =-1;
private List<String> rowlist = newArrayList<String>();
private int curRow =0; //当前行
private int curCol =0; //当前列索引
private int preCol =0; //上一列列索引
private int titleRow = 0; //标题行,一般情况下为0
private int rowsize =0; //列数
privateList<List<String>>exceldata=newArrayList<List<String>>();//整个excel数据
publicList<List<String>>getExcelData(){
return exceldata;
}
//excel记录行操作方法,以行索引和行元素列表为参数,对一行元素进行操作,元素为String类型
// public abstract voidoptRows(int curRow, List<String>rowlist) throws SQLException;
//excel记录行操作方法,以sheet索引,行索引和行元素列表为参数,对sheet的一行元素进行操作,元素为String类型
// publicabstract void optRows(int sheetIndex,int curRow,List<String> rowlist) throwsSQLException;
//只遍历一个sheet,其中sheetId为要遍历的sheet索引,从1开始,1-3
public void processOneSheet(String filename,int sheetId) throwsException {
OPCPackage pkg =OPCPackage.open(filename);
XSSFReader r = newXSSFReader(pkg);
SharedStringsTable sst =r.getSharedStringsTable();
XMLReader parser =fetchSheetParser(sst);
// rId2 found by processing theWorkbook
// 根据 rId# 或 rSheet#查找sheet
InputStream sheet2 =r.getSheet("rId"+sheetId);
sheetIndex++;
InputSource sheetSource = newInputSource(sheet2);
parser.parse(sheetSource);
sheet2.close();
pkg.close();//add by lfc 20130710,解决文件使用后无法删除的问题
}
public void process(String filename) throws Exception{
OPCPackage pkg =OPCPackage.open(filename);
XSSFReader r = newXSSFReader(pkg);
SharedStringsTable sst =r.getSharedStringsTable();
XMLReader parser =fetchSheetParser(sst);
Iterator<InputStream> sheets =r.getSheetsData();
while (sheets.hasNext()){
curRow = 0;
sheetIndex++;
InputStream sheet =sheets.next();
InputSource sheetSource = newInputSource(sheet);
parser.parse(sheetSource);
sheet.close();
}
pkg.close();//add by lfc 20130710,解决文件使用后无法删除的问题
}
public XMLReader fetchSheetParser(SharedStringsTablesst)
throws SAXException {
XMLReader parser =XMLReaderFactory
.createXMLReader("org.apache.xerces.parsers.SAXParser");
this.sst = sst;
parser.setContentHandler(this);
return parser;
}
public void startElement(String uri, String localName, Stringname,
Attributes attributes) throws SAXException{
// c =>单元格
if (name.equals("c")) {
// 如果下一个元素是 SST的索引,则将nextIsString标记为true
String cellType =attributes.getValue("t");
String rowStr =attributes.getValue("r");
curCol =this.getRowIndex(rowStr);
if (cellType != null &&cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
}
// 置空
lastContents = "";
}
public void endElement(String uri, String localName, Stringname)
throws SAXException {
//根据SST的索引值的到单元格的真正要存储的字符串
//这时characters()方法可能会被调用多次
if (nextIsString) {
try {
int idx =Integer.parseInt(lastContents);
lastContents = newXSSFRichTextString(sst.getEntryAt(idx))
.toString();
} catch (Exception e) {
}
}
// v =>单元格的值,如果单元格是字符串则v标签的值为该字符串在SST中的索引
//将单元格内容加入rowlist中,在这之前先去掉字符串前后的空白符
if (name.equals("v")) {
String value =lastContents.trim();
value = value.equals("")?"":value;
int cols = curCol-preCol;
if(cols>1){
for (int i = 0;i <cols-1;i++){
rowlist.add(preCol,"");
}
}
preCol = curCol;
rowlist.add(curCol-1,value);
}else {
//如果标签名称为 row ,这说明已到行尾,调用 optRows()方法
if (name.equals("row")) {
int tmpCols =rowlist.size();
if(curRow>this.titleRow&&tmpCols<this.rowsize){
for (int i = 0;i <this.rowsize-tmpCols;i++){
rowlist.add(rowlist.size(),"");
}
}
// add by lfc 20130710,注释掉
if(curRow==this.titleRow){
this.rowsize =rowlist.size();
}
List<String> rowlistcopy = newArrayList<String>();
for(int i = 0; i < rowlist.size(); i++) {
rowlistcopy.add(rowlist.get(i));
}
exceldata.add(rowlistcopy);
rowlist.clear();
curRow++;
curCol = 0;
preCol = 0;
}
}
}
public void characters(char[] ch, int start, intlength)
throws SAXException {
//得到单元格内容的值
lastContents += new String(ch, start,length);
}
//得到列索引,每一列c元素的r属性构成为字母加数字的形式,字母组合为列索引,数字组合为行索引,
//如AB45,表示为第(A-A+1)*26+(B-A+1)*26列,45行
public int getRowIndex(StringrowStr){
rowStr = rowStr.replaceAll("[^A-Z]","");
byte[] rowAbc =rowStr.getBytes();
int len = rowAbc.length;
float num = 0;
for (inti=0;i<len;i++){
num += (rowAbc[i]-'A'+1)*Math.pow(26,len-i-1);
}
return (int) num;
}
public int getTitleRow(){
return titleRow;
}
public void setTitleRow(int titleRow){
this.titleRow = titleRow;
}
}
(3)ExcelReaderUtil.java
核心代码:
public static final StringEXCEL03_EXTENSION = ".xls"; //excel2003扩展名
public static final StringEXCEL07_EXTENSION = ".xlsx"; //excel2007扩展名
public staticList<List<String>>readExcel(String fileName) throws Exception{
List<List<String>>exceldata=newArrayList<List<String>>();
if(fileName.endsWith(EXCEL03_EXTENSION)){// 处理excel2003文件
Excel2003Readerexcel03 = new Excel2003Reader();
excel03.process(fileName);
exceldata=excel03.getExcelData();
}
else if(fileName.endsWith(EXCEL07_EXTENSION)){// 处理excel2007文件
//Excel2007Readerexcel07 = new Excel2007Reader();
Excel2007ReaderNewexcel07 = new Excel2007ReaderNew();
excel07.process(fileName);
exceldata=excel07.getExcelData();
}
else {
thrownew Exception("文件格式错误,fileName的扩展名只能是xls或xlsx!");
}
returnexceldata;
}
5.poi+DOM,普通模式(UserModel)
相对简单,但数据量大了报OOM错误,所以没有发展前景。
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/4100bee660e7f0b210e384a0a6d3febe.jpeg)
ExcelUtil.java类,代码如下:
import java.io.FileInputStream;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import com.cbe.entity.Index;
public class ExcelUtil {
public staticList<Index> GetDistrictIndex(Stringfilename){
List<Index>indexsList=newArrayList<Index>();
try{
}
catch (Exception ex) {
ex.printStackTrace();
}
return indexsList;
}
public staticList<List<Object>>ReadExcel(String fileName) throws Exception{
Workbook workbook=null;
try {
if (fileName.endsWith(".xlsx")|| fileName.endsWith(".xls")){
//注:add bylfc,20130702,采用这种方式同时支持xls和xlsx,而不必分别用HSSF和XSSF来做
FileInputStreaminputStream = new FileInputStream(fileName);
workbook=WorkbookFactory.create(inputStream);
}
}
catch (Exception e) {
e.printStackTrace();
}
returnanalyzeWorkbook(workbook);
}
public static List<List<Object>>analyzeWorkbook(Workbook workbook){
Sheet sheet =workbook.getSheetAt(0);//第一个表单
List<String>columnNames=newArrayList<String>();//列
List<Object>row=newArrayList<Object>();//一行
List<List<Object>>rows=newArrayList<List<Object>>();//所有行
int rowCount =sheet.getLastRowNum();//行数
intcolumnCount=sheet.getRow(0).getPhysicalNumberOfCells();//列数
for (int i = 0; i<columnCount; i++) {
Cellcell=sheet.getRow(0).getCell(i);
columnNames.add(getCellValue(cell).toString());
}
for(intj=0;j<=rowCount;j++){ //第一行为列名,若只是取数据则从第二行开始
Rowr=sheet.getRow(j);
for(Cell cell : r){
row.add(getCellValue(cell));
}
rows.add(row);
}
return rows;
}
public static Object getCellValue(Cellcell){
int cellType =cell.getCellType();
switch (cellType) {
case Cell.CELL_TYPE_STRING://1
return cell.getStringCellValue();
case Cell.CELL_TYPE_NUMERIC://0
if (DateUtil.isCellDateFormatted(cell)){//日期格式
Date t =cell.getDateCellValue();
SimpleDateFormat format=newSimpleDateFormat("yyyy-MM-dd");
return format.format(t);
}
else{
returncell.getNumericCellValue();
}
case Cell.CELL_TYPE_BOOLEAN://4
return cell.getBooleanCellValue();
case Cell.CELL_TYPE_FORMULA://2
case Cell.CELL_TYPE_BLANK://3
case Cell.CELL_TYPE_ERROR://5
default:
return cell.getStringCellValue();
}
}
}
附:
测试的时候可能要监控一下耗时什么的,可以使用MyEclipse自带的jvisualvm.exe工具(我没测过)。
路径:D:ProgramFilesMyEclipseCommonbinarycom.sun.java.jdk.win32.x86_1.6.0.013bin
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/cb96329a91f006ffc89e9582afadd9df.jpeg)
初次运行时会提示要做jdk的校准标准化(calibration),没细研究是个什么概念,直接点确定就好。
运行后的主界面:
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/8a47053f67fa7ca7e743704a0f954be4.jpeg)
三。验证
这个就根据自己的需求,对特定的excel模板做特定的检查,废话就不多说了。
验证处理还是比较费劲的,所以还是控制好源头“excel模板”,整理好数据,否则问题会很多的,本身excel07采用xml的解析方式就存在不少问题的。
如图,不同的格式会有不同的xml,这个在调试程序的时候我也费了不少周折
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/dafcda2bfd66c306aaf4b82fa14b1821.jpeg)
![[转载]Java对Excel(03,07)进行上传、解析、验证、入库 [转载]Java对Excel(03,07)进行上传、解析、验证、入库](https://i-blog.csdnimg.cn/blog_migrate/6b16261446576c02b81bf00102c96e64.jpeg)
四。入库
这个工作就交个dao和daoImp包下的接口和类来实现了,自己写。