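1. If the Table Input step returns garbled characters, try unchecking the option shown here:
2. Loading a cache in Kettle from a Java script step
2.1 Code that reads rows from the database and puts them into the cache: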
import java.util.Arrays;
import java.util.List;
import java.util.HashMap;
import java.util.Map;
// Note: do not forget these imports
import com.ywhz.framework.cache.KettleCacheMap;
import com.ywhz.framework.cache.CacheDto;
import org.pentaho.di.core.database.Database;
import org.pentaho.di.core.database.DatabaseMeta;
import org.pentaho.di.repository.Repository;
import org.pentaho.di.core.Const;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    if (first) {
        first = false;
    }
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Read the dictionary fields that will later be used for translation
    String TYPE = get(Fields.In, "TYPE").getString(r);
    String RES_CODE = get(Fields.In, "RES_CODE").getString(r);
    String CODE = get(Fields.In, "CODE").getString(r);
    String NAME = get(Fields.In, "NAME").getString(r);
    String DATAKEY = get(Fields.In, "DATAKEY").getString(r);
    String ARRCODE = get(Fields.In, "ARRCODE").getString(r);
    // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
    // enough to handle any new fields you are creating in this step.
    r = createOutputRow(r, data.outputRowMeta.size());
    CacheDto dto = new CacheDto();
    dto.setType(TYPE);
    dto.setResCode(RES_CODE);
    dto.setCode(CODE);
    dto.setName(NAME);
    dto.setDatakey(DATAKEY);
    dto.setArrcode(ARRCODE);
    // The cache key is TYPE + "_" + RES_CODE
    KettleCacheMap.kettleCacheMap.put(TYPE + "_" + RES_CODE, dto);
    putRow(data.outputRowMeta, r);
    return true;
}
public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface)
{
    return parent.initImpl(stepMetaInterface, stepDataInterface);
}
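The classes KettleCacheMap and CacheDto imported above come from a custom package (com.ywhz.framework.cache) whose source is not shown in these notes. The following is only a minimal sketch of what such classes might look like, assuming a static map holding Object values (which would explain the cast in the lookup code) and a plain DTO with the six fields used above. The class CacheSketch and its cacheKey helper are hypothetical and exist only for this illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for com.ywhz.framework.cache.CacheDto: one cached dictionary row.
class CacheDto {
    private String type, resCode, code, name, datakey, arrcode;

    public void setType(String v) { type = v; }
    public void setResCode(String v) { resCode = v; }
    public void setCode(String v) { code = v; }
    public void setName(String v) { name = v; }
    public void setDatakey(String v) { datakey = v; }
    public void setArrcode(String v) { arrcode = v; }

    public String getType() { return type; }
    public String getResCode() { return resCode; }
    public String getCode() { return code; }
    public String getName() { return name; }
    public String getDatakey() { return datakey; }
    public String getArrcode() { return arrcode; }
}

// Hypothetical stand-in for com.ywhz.framework.cache.KettleCacheMap.
// A static map is visible to every step running in the same JVM.
class KettleCacheMap {
    public static final Map<String, Object> kettleCacheMap = new ConcurrentHashMap<>();
}

public class CacheSketch {
    // Same key layout as the step code: TYPE + "_" + RES_CODE.
    static String cacheKey(String type, String resCode) {
        return type + "_" + resCode;
    }

    public static void main(String[] args) {
        CacheDto dto = new CacheDto();
        dto.setType("AGENCY_MAP");
        dto.setResCode("001");
        dto.setName("Unit A");
        KettleCacheMap.kettleCacheMap.put(cacheKey("AGENCY_MAP", "001"), dto);

        // A later step looks the entry up by the same key and casts it back.
        CacheDto hit = (CacheDto) KettleCacheMap.kettleCacheMap.get("AGENCY_MAP_001");
        System.out.println(hit.getName());
    }
}
```

Because the map is static, it is shared only between transformations that run inside the same JVM, and the jar containing the real classes has to be on Kettle's classpath for the imports to resolve.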
2.2 Code that reads the cache and transforms the rows of the input stream:
import java.util.Arrays;
import java.util.List;
import java.util.HashMap;
import java.util.Map;
import com.ywhz.framework.cache.KettleCacheMap;
import com.ywhz.framework.cache.CacheDto;
import org.pentaho.di.core.database.Database;
import org.pentaho.di.core.database.DatabaseMeta;
import org.pentaho.di.repository.Repository;
import org.pentaho.di.core.Const;
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
    if (first) {
        first = false;
    }
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
    // enough to handle any new fields you are creating in this step.
    r = createOutputRow(r, data.outputRowMeta.size());
    /* TODO: Your code here. (See Sample)
    // Get the value from an input field
    String foobar = get(Fields.In, "a_fieldname").getString(r);
    foobar += "bar";
    // Set a value in a new output field
    get(Fields.Out, "output_fieldname").setValue(r, foobar);
    */
    String UNIT_CODE = get(Fields.In, "UNIT_CODE").getString(r); // unit code
    // Look up the cached entry; the key is "AGENCY_MAP_" + UNIT_CODE
    CacheDto unit = (CacheDto) KettleCacheMap.kettleCacheMap.get("AGENCY_MAP_" + UNIT_CODE);
    if (null != unit) {
        get(Fields.Out, "UNIT_ID").setValue(r, unit.getDatakey());   // unit ID
        get(Fields.Out, "UNIT_NAME").setValue(r, unit.getName());    // unit name
        get(Fields.Out, "ARRCODE").setValue(r, unit.getArrcode());   // unit arrcode
    }
    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
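3. When Kettle's Insert/Update step is slow, try combining the fields used for the lookup into a single composite key field, e.g. UNIT_CODE|CI_NO|SYS_TYPE, and have the final Insert/Update match on that one field only. This tends to improve throughput.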
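As an illustration of the composite key idea, here is a minimal sketch of how the key value could be built, for example in a Java script step placed before the Insert/Update step. The class name CompositeKey and the helper build are hypothetical. The sketch assumes the "|" separator never occurs inside the field values and treats a null field as an empty string.

```java
public class CompositeKey {
    // Joins the lookup fields with '|'. A null part contributes an empty
    // string, so the resulting key itself is never null.
    static String build(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            if (parts[i] != null) {
                sb.append(parts[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UNIT_CODE, CI_NO and SYS_TYPE values of one row (sample values)
        System.out.println(build("U001", "CI-42", "HR")); // prints U001|CI-42|HR
    }
}
```

The same value has to be stored in a column of the target table so that the Insert/Update step can compare against it. An index on that column is assumed here, since without one the single-field lookup would still scan the table.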
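4. When Kettle finishes executing a job, all resources held by that job (including memory) are released automatically. So if running a transformation in a loop consumes a large amount of resources and causes an out-of-memory error, try wrapping that transformation in its own job.
Reference: https://blog.youkuaiyun.com/m0_69586799/article/details/131085197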
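5. Kettle has two kinds of hops: the normal hop and the copy hop. A copy hop sends a copy of the preceding step's data to each of the following steps. The second kind is shown in the figure below; to get it, change the data movement setting to "copy" mode.
6. To make a particular step run faster, you can increase the number of threads for that step, as shown in the figure below.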
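The difference between the two hop modes can be sketched in plain Java. This is a simulation, not Kettle code, and it assumes that the normal mode hands rows to the target steps in round-robin order, while copy mode gives every target step all rows. The class HopModes and both method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HopModes {
    // Creates one empty row buffer per target step.
    private static List<List<String>> emptyTargets(int targets) {
        List<List<String>> out = new ArrayList<>();
        for (int t = 0; t < targets; t++) {
            out.add(new ArrayList<String>());
        }
        return out;
    }

    // Normal mode (assumed round-robin): row i goes to target i % targets.
    static List<List<String>> distribute(List<String> rows, int targets) {
        List<List<String>> out = emptyTargets(targets);
        for (int i = 0; i < rows.size(); i++) {
            out.get(i % targets).add(rows.get(i));
        }
        return out;
    }

    // Copy mode: every target receives every row.
    static List<List<String>> copy(List<String> rows, int targets) {
        List<List<String>> out = emptyTargets(targets);
        for (List<String> target : out) {
            target.addAll(rows);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4");
        System.out.println(distribute(rows, 2)); // prints [[r1, r3], [r2, r4]]
        System.out.println(copy(rows, 2));       // prints [[r1, r2, r3, r4], [r1, r2, r3, r4]]
    }
}
```

Under that assumption, copy mode is the one to pick when each following step needs the full data set, for example when one branch writes the rows and another branch aggregates them.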