前言
在公司中经常会碰到将不同形式的数据导入到数据库中,目前用的数据库是mongodb,数据的主要格式有Excel、text等等;为了不重复劳动,写了一个很简单的算是框架之类的东西;这只是一个雏形,按照我的思路可以继续扩展来满足更多的需求。
目标
比如Excel或者Text等等这里数据,如果要导入数据库,就要去读取不同来源的数据,最终数据的输入肯定是一行一行的进行的,因为数据库需要一条一条的存储;所以第一步就要将数据映射为实体对象;例如将Excel的不同的列或者Txt文本以逗号分隔的列映射为实体类的字段;接下来需要做的就是在实体类中添加注解,用来标示出映射信息;比如每一个字段对应excel中的哪一列等等,在读取到每一行资源的时候,使用反射自动为实体类赋值,最后将实体类传递给导入数据库的处理类进行数据导入;为了执行起来更加方便,我使用注解扫描指定的包,然后用多线程分别执行,效率更高。
实现
首先按照设想的写出一个实体类的模型
package com.data.entity;
import com.data.annotation.Field;
import com.data.annotation.Ignore;
import com.data.annotation.Resource;
import com.data.annotation.ResourceType;
import com.data.handler.DefaultFieldHandler;
/**
* Created by yamorn on 2015/3/24.
*/
@Ignore
@Resource(type = ResourceType.EXCEL,
location = "E:\\test.xlsx",
fieldHandle = DefaultFieldHandler.class)
public class DemoEntity {
@Field(alias = "name", index = 0, handleMethod = "nameHandle")
String name;
@Field(index = 1)
int age;
@Field(index = 2)
String description;
@Field(index = 4)
String comment;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
public String getComment() {
return comment;
}
public void setComment(String comment) {
this.comment = comment;
}
}
上面的实体类就是初步设想的实体类,实体类上的注解信息包含了资源的位置,对应关系等等;所以只要扫描的该实体类,即可以把它最后一次导入任务进行导入,当然之后就可以使用多线程进行导入,提高效率;
@Ignore 是一个标记类,即不让被程序扫描到; @Resource该注解主要包含了资源的位置信息,资源的类型,资源每个字段的处理方法和导入该实体数据的数据处理类;@Field详细的标注了字段和资源的映射关系,其中combine是联合的意思,比如一个字段是一个数组,可能对应excel中的多个字段,那么可以使用该注解;
下面给出注解的详细实现
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
public @interface Resource {
ResourceType type() default ResourceType.EXCEL;
String location();
Class fieldHandle() default DefaultFieldHandler.class;
Class dataHandle() default DefaultDataHandle.class;
}
public enum ResourceType {
EXCEL,TEXT
}
目前就只开发了这两种类型,以后可以继续扩展
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
public @interface Ignore {
}
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
public @interface Field {
String alias() default "";
int index();
int[] combine() default {};
String handleMethod() default "";
}
这里的alias别名目标是将其作为最终导入数据库是的字段名而不是其标注的字段名,但是这里我还来得及实现;
接下来就是去扫描这些代表着不同资源的实体类了,这里有一个专门负责扫描的类;
package com.data.core;
import com.data.annotation.Ignore;
import com.data.annotation.Resource;
import org.reflections.Reflections;
import java.util.HashSet;
import java.util.Set;
/**
* Created by yamorn on 2015/3/24.
*/
public class EntityScanner {
private String packageName;
public EntityScanner(String packageName) {
this.packageName = packageName;
}
public Set<Class<?>> scan() {
Set<Class<?>> newSet = new HashSet<>();
Reflections reflections = new Reflections(packageName);
Set<Class<?>> set = reflections.getTypesAnnotatedWith(Resource.class);
for (Class<?> clazz : set) {
Ignore ignore = clazz.getAnnotation(Ignore.class);
if (ignore == null) {
newSet.add(clazz);
}
}
return newSet;
}
}
该类将扫描到的实体以集合的方式返回;
接下是考虑的是如何读取资源,资源的位置信息在实体类中注解中,资源读取后需要进行数据导入,所以负责读取资源的类需要被注入一个导入数据库的工具类;由此实现了一个简单的接口
public interface ResourceReader {
public void setMongoManager(MongoManager mongoManager);
public MongoManager getMongoManager();
public void read(String location, Class<?> mateClass) throws Exception;
}
MongoManager是一个工具类,其实这里可以是一个接口等等,但是目前只是一个简单的东西;资源有多重形式,需要对应不同类型的资源读取实现;这里使用了简单工厂模式
public class ResourceReaderFactory {
public static ResourceReader resourceReader(ResourceType type) {
ResourceReader resourceReader;
switch (type) {
case EXCEL:
resourceReader = new ExcelResourceLoader();
break;
case TEXT:
resourceReader = new TextResourceReader();
break;
default:
resourceReader = new ExcelResourceLoader();
}
return resourceReader;
}
}
package com.data.core;
import com.data.annotation.Resource;
import com.data.annotation.ResourceType;
import com.data.utils.MongoManager;
import com.sun.corba.se.spi.orbutil.fsm.Input;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.PropertiesConfiguration;
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
/**
* Created by yamorn on 2015/3/24.
*/
public class ResourceLoader {
protected String packageName;
protected MongoManager mongoManager;
//这里是classpath下的文件,里面主要是写了@Resource注解中的占位符${}中的内容
private static final String RESOURCE_PROPERTIES = "resource.properties";
private Properties properties;
public ResourceLoader(String packageName) {
this.packageName = packageName;
properties = getResourceConfig();
}
public void setPackageName(String packageName) {
this.packageName = packageName;
}
public void setMongoManager(MongoManager mongoManager) {
this.mongoManager = mongoManager;
}
//主要入口
public void dispatcher() {
EntityScanner scanner = new EntityScanner(packageName);
Set<Class<?>> classes = scanner.scan();
//这里可以使用一个线程池,将每一个扫描到的类在线程中进行处理
try{
for (Class<?> clazz : classes) {
parseMetaInfo(clazz);
}
}catch (Exception e){
e.printStackTrace();
}finally {
mongoManager.close();
}
}
private void parseMetaInfo(Class<?> clazz) {
Resource resource = clazz.getAnnotation(Resource.class);
String location = location(resource.location());
assert !location.equals("");
ResourceType resourceType = resource.type();
try {
ResourceReader resourceReader = ResourceReaderFactory.resourceReader(resourceType);
//注入数据库工具类方便在读取资源后进行数据导入
resourceReader.setMongoManager(mongoManager);
resourceReader.read(location, clazz);
} catch (Exception e) {
e.printStackTrace();
}
}
private String location(String placeHolder) {
placeHolder = placeHolder.replaceAll("\\$\\{", "").replaceAll("}", "");
return (String)properties.get(placeHolder);
}
private Properties getResourceConfig() {
Properties properties = new Properties();
InputStream inputStream =null;
try {
inputStream=ResourceLoader.class.getClassLoader().getResourceAsStream(RESOURCE_PROPERTIES);
BufferedReader bf = new BufferedReader(new InputStreamReader(inputStream,"UTF-8"));
properties.load(bf);
} catch (IOException e) {
e.printStackTrace();
}finally {
if (inputStream != null) {
try {
inputStream.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return properties;
}
}
这里实现了一个针对Excel导入的资源读取类
package com.data.core;
import com.data.annotation.Resource;
import com.data.handler.DataHandle;
import com.data.utils.MongoManager;
import org.apache.commons.beanutils.ConvertUtils;
import org.apache.log4j.Logger;
import org.apache.poi.ss.usermodel.*;
import java.io.File;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
/**
* Created by yamorn on 2015/3/24.
*/
public class ExcelResourceLoader implements ResourceReader {
private Logger logger = Logger.getLogger(getClass());
protected MongoManager mongoManager;
@Override
public void setMongoManager(MongoManager mongoManager) {
this.mongoManager=mongoManager;
}
@Override
public MongoManager getMongoManager() {
return mongoManager;
}
@Override
public void read(String location, Class<?> mateClass) throws Exception {
Resource resource = mateClass.getAnnotation(Resource.class);
//通过实体类上的注解获取到对应的字段处理类和实体数据处理类,使用反射实例化它们
Class<?> fieldHandleClazz = resource.fieldHandle(); //字段处理类
Class<?> dataHandleClazz = resource.dataHandle(); //实体类序列化后的数据处理类
Object fieldHandleObject = null;
if (fieldHandleClazz != null) {
fieldHandleObject = fieldHandleClazz.newInstance();
}
DataHandle dataHandleObject = (DataHandle) dataHandleClazz.newInstance();
Field[] fields = mateClass.getDeclaredFields();
Workbook wb = WorkbookFactory.create(new File(location));
int sheets = wb.getNumberOfSheets();
for (int i = 0; i < sheets; i++) {
Sheet sheet = wb.getSheetAt(i);
int rowNum = sheet.getLastRowNum();
int colNum = sheet.getRow(0).getLastCellNum();
for (int j = 1; j < rowNum; j++) { // start from 1, skip the title
try{
Row row = sheet.getRow(j);
Object entity = mateClass.newInstance();
for (Field field : fields) {
Class<?> fieldType = field.getType();
com.data.annotation.Field fieldAnnotation = field.getAnnotation(com.data.annotation.Field.class);
if (fieldAnnotation != null) {
//获取索引位置,将读取到的表格内容赋值给对应的字段
int index = fieldAnnotation.index();
if (index > colNum) {
throw new Exception("Field index out of sheet column length.");
}
int[] combine = fieldAnnotation.combine();
String handleMethod = fieldAnnotation.handleMethod();
Object value = getCellValue(row, index);
Object newValue = value;
if (combine.length > 0) {
Object[] valueArray=new Object[combine.length+1];
valueArray[0]=value;
for (int k = 1; k < valueArray.length; k++) {
valueArray[k]=getCellValue(row,combine[k-1]);
}
//如果配置了字段处理类,就调用对应的字段处理方法;字段处理方法的返回值类型应该和字段类型保持一致
if (fieldHandleObject != null && !"".equals(handleMethod)) {
Method handle = fieldHandleClazz.getDeclaredMethod(handleMethod, field.getType());
newValue = handle.invoke(fieldHandleObject, ConvertUtils.convert(valueArray,fieldType));
}else{
newValue = ConvertUtils.convert(valueArray, fieldType);
}
}else{
if (fieldHandleObject != null && !"".equals(handleMethod)) {
Method handle = fieldHandleClazz.getDeclaredMethod(handleMethod, field.getType());
newValue = handle.invoke(fieldHandleObject, ConvertUtils.convert(value,fieldType));
}
}
//为字段赋值
field.setAccessible(true);
field.set(entity, ConvertUtils.convert(newValue, fieldType));
}
}
//将赋值后的实体类送入数据处理类中进行数据导入操作
// Persist entity
dataHandleObject.handle(entity, mongoManager,rowNum);
}catch (Exception e){
logger.warn("Error at:" + i + " row " + j + " column");
e.printStackTrace();
}
}
}
}
private Object getCellValue(Row row,int index){
Object value;
Cell cell = row.getCell(index, Row.CREATE_NULL_AS_BLANK);
switch (cell.getCellType()) {
case Cell.CELL_TYPE_BLANK:
value = "";
break;
case Cell.CELL_TYPE_NUMERIC:
value = cell.getNumericCellValue();
break;
case Cell.CELL_TYPE_STRING:
value = cell.getStringCellValue();
break;
case Cell.CELL_TYPE_BOOLEAN:
value=cell.getBooleanCellValue();
break;
case Cell.CELL_TYPE_FORMULA:
value = cell.getCellFormula();
break;
default:
value = "error";
}
return value;
}
}
接下来是数据处理接口,即序列化后的数据如何处理,这里是导入数据库中
package com.data.handler;
import com.data.utils.MongoManager;
/**
* Created by yamorn on 2015/3/24.
*/
public interface DataHandle {
public void handle(Object entity,MongoManager mongoManager,int total);
}
抽象类,提供了序列化json的工具package com.data.handler;
import com.mongodb.DBObject;
import com.mongodb.util.JSON;
import flexjson.JSONSerializer;
/**
* Created by louis on 2015/3/25.
*/
public abstract class AbstractDataHandle implements DataHandle{
protected JSONSerializer jsonSerializer = new JSONSerializer();
public String jsonSerializer(Object entity) {
return jsonSerializer.exclude("*.class").deepSerialize(entity);
}
public DBObject convertDBObject(Object entity) {
String json = jsonSerializer(entity);
return (DBObject) JSON.parse(json);
}
}
这里实现了一个简单的demo,即将从excel中读取的数据序列化后,进行坐标转换,最后存储到mongodb中
package com.data.handler;
import com.data.entity.LTEBaseStationEntity;
import com.data.utils.CoordinateConvert;
import com.data.utils.MongoManager;
import com.mongodb.BasicDBList;
import com.mongodb.DBObject;
import org.apache.log4j.Logger;
import java.math.BigDecimal;
/**
* Created by louis on 2015/3/25.
*/
public class LTEBaseStationDataHandle extends AbstractDataHandle {
private static final String collection = "jt_poi_lteBaseStation";
private int n = 0;
private Logger logger = Logger.getLogger(ScanSignalDataHandle.class);
@Override
public void handle(Object entity, MongoManager mongoManager, int total) {
if (entity instanceof LTEBaseStationEntity) {
LTEBaseStationEntity lteBaseStationEntity = (LTEBaseStationEntity) entity;
double[] raw = lteBaseStationEntity.getRaw();
if (validateCoordinate(raw)) {
//设置百度坐标
lteBaseStationEntity.setBd(convertToBaiduCoordinate(raw));
DBObject dbObject = convertDBObject(lteBaseStationEntity);
mongoManager.save(collection, dbObject);
n++;
BigDecimal bigDecimal = new BigDecimal(n * 1.0 / total * 100);
double percent = bigDecimal.setScale(2, BigDecimal.ROUND_HALF_UP).doubleValue();
if (percent % 5 <= 0.1) {
logger.info("total:" + total + ",percent:" + percent);
}
}
}
}
private boolean validateCoordinate(double[] raw) {
return raw.length != 0 && !(raw[0] == 0 || raw[1] == 0);
}
private double[] convertToBaiduCoordinate(double[] raw) {
String coords = raw[0] + "," + raw[1];
BasicDBList list = CoordinateConvert.convert(coords, 1, 5, 3);
if (list == null) {
return new double[]{};
}
//单个坐标转换
DBObject dbObject = (DBObject) list.get(0);
return new double[]{(double) dbObject.get("x"), (double) dbObject.get("y")};
}
}
对于字段处理的handle,有一点需要注意的是方法的参数类型和方法的返回类型应该和字段保持一致;字段处理类没有限制,只是一个POJOpackage com.data.handler;
/**
* Created by yamorn on 2015/3/24.
*/
public class DefaultFieldHandler {
}
最后是主方法的一个实例
package com.data.main;
import com.data.core.ResourceLoader;
import com.data.utils.MongoManager;
import com.mongodb.DB;
import com.mongodb.ServerAddress;
import org.apache.commons.configuration.PropertiesConfiguration;
import java.util.Arrays;
/**
* Created by yamorn on 2015/3/24.
*/
public class Main {
public static void main(String[] args) throws Exception {
PropertiesConfiguration mongoConfig = new PropertiesConfiguration("mongodb.properties");
MongoManager mongoManager = new MongoManager(
Arrays.asList(new ServerAddress(mongoConfig.getString("mongo.host"),
mongoConfig.getInt("mongo.port"))
),
mongoConfig.getString("mongo.database"),
mongoConfig.getString("mongo.username"),
mongoConfig.getString("mongo.password")
);
ResourceLoader resourceLoader = new ResourceLoader("com.data.entity");
resourceLoader.setMongoManager(mongoManager);
resourceLoader.dispatcher();
}
}
到此为止,这个简单的小东西就算是完成了;接下来就是根据需求继续扩展~