BODS 一个的Job的基本结构
BODS作为BO的ETL工具,为我们提供了便捷的图形化界面来设计ETL,虽然在功能和性能上还有很多不尽如人意的地方,但基本来说,很多的功能还是可以实现的。目前项目是一个电信行业的项目,在ETL过程, Dimension数据主要是customer, service, service offering, Clause,…. ; Fact数据主要是每个SLI (Service Level Indicator)的时间段,Open-Close-Incident的,另外每个SLA(Service Level Agreement)的有效期,实际起止时间,检查结果(onTarget, Breach..).
具体业务逻辑就不再多说了,主要看看对BODS的基本的设计情况:
以下是对每一块的介绍:
图一:
是一个Job的典型结构,SC_Initialise_Var和SC_Post_Process..是两个初始化脚本,一个是开始的初始化脚本,里面包括对一个job的local varable和global varable的定义,还有对job control table的update;Post_process的脚本是对结束时的时间戳update。中间的condition run df是执行过程的控制结构。
SC_Initialise_Var的内容如下:
--- 调用了一个function, 自动定义的:
Fn_Initialise_Job_Variables('Dim Load');
关于这个function的内容如下:
(主要做几件事: 1. 收集job起止时间,2. 记录Job sequence,3. 取得最近的process time, 4. insert last job process到stg表中,5. 检查job dependancy )
#################################################################
$LV_JobStartTime = sysdate(); --- 参数1,Job开始的时间, Local参数
print(' Job execution start time : '||to_char( $LV_JobStartTime , 'YYYYMMDD HH24:MI:SS'));
#注意以下sql的写法: sql(‘Data store name’,’query’)
$GV_runId = sql('SLAM_Target','select SEQ_ETL_RUNID.nextval from dual ');
--- 参数2,Job的sequence, Global参数
print ( ' JOB PARAMETERS AND THEIR PASSED VALUES ');
print ( ' The value of paramter $GV_runId : ' || $GV_runId );
if ( $GV_runId is null )
begin
print (job_name()|| 'expects parameters missing. Please check the no. of parameters passed and their values');
print ( '----------------------------------------------------------------------------');
print ( ' JOB PARAMETERS AND THEIR PASSED VALUES ');
print ( ' The value of paramter $GV_runId : ' || $GV_runId );
raise_exception_ext('Please validate the parameters passed to job. Please check the log file for exact error', -1);
end
$LV_LastProcessedTime = sql('SLAM_Target','SELECT to_char(max(LAST_PROCESS_TIME),\'yyyymmdd hh24:mi:ss\') FROM STG_JOB_LAST_PROCESSED WHERE upper(JOB_NAME)= '|| '\'' || upper(job_name()) || '\' ');
-- 参数3,取得Last Processed Time,即最近的Job的结束时间
print ($LV_LastProcessedTime);
IF ($LV_LastProcessedTime IS NULL)
begin
#print ('null!!!!!!!');
# last processed timestamp is null, that means this job is never executed.
# initializing parameter and db records
$LV_LastProcessedTime = '19000101 00:00:00';
sql('SLAM_Target',
'INSERT INTO STG_JOB_LAST_PROCESSED (JOB_NAME, LAST_PROCESS_TIME, SOURCE_TABLE, SLA_GROUP_ID) '
||'values( '||'\'' || job_name() || '\', to_date(\'' || $LV_LastProcessedTime || '\', \'yyyymmdd hh24:mi:ss\'), null, null)');
-- 把最后一次JOB的Process time, source able,等信息放到Log 表中。
end
$GV_LastProcessedTimestame = to_date($LV_LastProcessedTime, 'yyyymmdd hh24:mi:ss');
print ($GV_LastProcessedTimestame );
#else
# begin
# if ( $P_In_JobType = 'Fact Load' or $P_In_JobType = 'Summary' )
# Fn_Restore_Data( job_name());
# end
# The following line of code check the dependency jobs status.
try
begin
$LV_DependencyStatus = sql('SLAM_Target',
'SELECT 1 FROM stg_job_dependency dp, stg_job_status st ' ||
'WHERE dp.job_name =\''||job_name() || '\' '||
'AND dp.parent_job_name = st.job_name AND (st.status = 1 OR st.status = -1)');
-- Local变量,取得LV_DependencyStatus,取得dependency status “1” or null
#print($LV_DependencyStatus);
if ( $LV_DependencyStatus is not null )
begin
print (job_name()|| 'Dependency checking is not satisfied, please check the parent jobs');
raise_exception_ext('Dependency checking is not satisfied, please check the parent jobs. Please check the log file for exact error', -1);
end
end
catch ( 7 )
begin
raise_exception_ext( ' DB error occured while initialising variables. To know exact error please check etl log file.', -3);
end
# The following line of code assigns the run id for the job and writes log in to the table stg_job_status as job is in progress.
try
begin
sql('SLAM_Target', 'insert into stg_job_status(run_id, job_name, status, start_time, end_time, remarks) values(' || $GV_runId || ',\'' || job_name() || '\', 1, to_date(\'' || to_char( $LV_JobStartTime , 'yyyymmdd hh24:mi:ss') || '\', \'yyyymmdd hh24:mi:ss\'), null, null)');
end
catch ( 7 )
begin
raise_exception_ext( ' DB error occured while initialising variables. To know exact error please check etl log file.', -3);
end
#STG_JOB_VARIABLES
return 0;
##################################################################
结束的时候,调用了Fn_Initialise_Job_Variables('Dim Load');
##################################################################
sql('SLAM_Target', 'update stg_job_status set status = '||to_char($P_In_Job_Status) ||', end_time = sysdate, remarks = \''||$P_In_Remarks ||'\' where run_id = ' || $P_In_RunId );
$
if ($P_In_Job_Status =0 )
begin
# only job execute in sucess then update last process timestamp
sql('SLAM_Target', 'update STG_JOB_LAST_PROCESSED set LAST_PROCESS_TIME = (select start_time from stg_job_status where run_id = ' || $P_In_RunId ||') where upper(JOB_NAME) = '|| '\'' || upper(job_name()) || '\' ' );
end
return 0;
##################################################################
图二:
Try组件,主要是一个容错机制,用来提供一个窗口去run data flow,如果出错,则由后面的cache组件来做exception处理:
Exception组件:
里面可以包含DataFlow,也可以包含Script, Script内容如下:
##################################################################
begin
fn_post_process( job_name(), $GV_RunId, -1, 'DF execution error, check it using the BODI Web Administrator!');
raise_exception_ext( 'Exceptions occurred during the execution of the Job. ',-2);
end
##################################################################
图三:
是一个简单的Data Flow,这是真正的处理数据的过程,头尾是TABLE, 中间是各个处理过程:
Query , 处理所有SQL的Mapping,里面除了列的值,还有where条件。
Table Compare, 处理源表的update,如果有update 或者相同key column的insert,会做一定的处理:两条记录并存,保留最近的纪录。
History_Preserving, 历史结果,主要用于对缓慢变化维二的操作,即设定有效时间范围,并且指定一列作为Flag.
Key_Generation, 用来产生key值。
具体每个组件的用法,将会在接下来的其他文档中介绍。
转载于:https://blog.51cto.com/mengya520/334822