101 uses for ETL job parameters

This is my third post in what I am now calling job parameter week. In future it will be better if I come up with an idea for a theme week at the start of the week instead of at the end. I'll chalk that one up to experience.

Welcome to job parameter theme week, where we bring job parameters to life. (Note to marketing department: please provide better mottos for future theme weeks.) I have tracked back to my previous two job parameter posts from this week: on Monday we looked at Project Specific Job Parameters and saw how they make managing parameters a lot easier, and on Wednesday we looked at the NSW police exposing unencrypted passwords.

Today I am going to talk about what I use job parameters for.

ETL job parameters are values you can change when you run a job, as opposed to properties you can only change by modifying and recompiling the job. I don't know whether I will get to 101 uses, but I thought it made an eye-catching title.

Static Parameters
In programmer speak you could also call these constants. They are values you never expect to change. So why make them a job parameter? Because they might change, or because you don't want to have to try and remember the value every time you use it in a job.

In my current project I have a static parameter called MAX_DATE which holds a high end date value of 2999-12-31 23:59:00. I use it on tables that track time-variant data, such as a Type 2 slowly changing dimension table, where it provides the high end date for an open or current row. This parameter gets copied into all jobs that update dimension tables. The good thing about having it around is that I don't need to remember the value of the field; I just enter MAX_DATE into my transformer.

Another good thing is that come July 2999, when a project team is assembled to address Y3K problems, they will only have to update this date in one place and will not have to recompile any jobs.
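
If MAX_DATE is set up as a project specific environment variable, it can even be created from the command line rather than through all those Administrator screens. A minimal sketch, assuming the dsadmin client and a hypothetical project called dw_project; check the dsadmin options for your DataStage version before relying on them:

```sh
# Sketch only: create MAX_DATE as a project specific environment variable
# so every job can pick it up as a parameter default.
# dw_project is a hypothetical project name.
dsadmin -envadd MAX_DATE -type STRING \
    -prompt "High end date for open rows" \
    -value "2999-12-31 23:59:00" dw_project
```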

Slowly Changing Parameters
The most common example is the database password that **should** be changed every month. At the current project the DBA team change the password on the database and the DataStage support team change the project specific environment variable. I would prefer to see IBM provide a tool that lets users maintain these environment variables without having to go through so many Administrator screens, so passwords could be changed more easily.
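
Scripting the change is one way to take some of the pain out of it. A rough sketch, assuming the dsadmin command line and a hypothetical encrypted project variable called DB_PASSWORD; verify the options against your DataStage version before using them:

```sh
# Sketch: update an existing project specific environment variable after
# the DBA team rotates the database password.
# DB_PASSWORD, NewPassw0rd and dw_project are all hypothetical values.
dsadmin -envset DB_PASSWORD -value "NewPassw0rd" dw_project
```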

Environmental Parameters
These are the job parameters that change value as you move from one environment to another, for example from development into testing and into production. Typical values that change are directory locations and database login details.
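
One way to manage them, sketched below on the assumption that jobs are started from a shell wrapper, is to keep a small settings file per environment and feed its values in as job parameters. The file paths, variable names and project/job names here are all hypothetical:

```sh
#!/bin/sh
# Sketch: source an environment-specific settings file (dev.conf, test.conf
# or prod.conf) defining SRC_DIR and DB_USER, then pass them to the job.
ENV=${1:-dev}
. /opt/etl/conf/${ENV}.conf
dsjob -run -param SourceDir=${SRC_DIR} -param DatabaseUser=${DB_USER} \
    dw_project load_customers
```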

The Customer Keeps Changing Their Mind Parameters
On a previous project I had a QualityStage plugin cleansing names and addresses for delivery of marketing campaigns to mail houses. During testing there were some doubts about the company name and personal name cleansing: sometimes the cleansed name was worse than the raw name.

There was much umming and aahing in meetings as to whether to turn it on. I created a set of job parameter flags that could be set to true or false to turn personal name cleansing, company name cleansing and address cleansing on and off.

These parameters were used in a transformer that wrote out the final fields and had both the original and cleansed fields to choose from. Now, in production, the set of flags defines what type of cleansing the job performs on each run.
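
This means the flags can be flipped for a run without touching the job design. A sketch of starting such a job from the command line, with hypothetical parameter, project and job names:

```sh
# Sketch: turn company name cleansing off for this run while leaving
# personal name and address cleansing on. All names are hypothetical.
dsjob -run \
    -param NameCleanseFlag=TRUE \
    -param CompanyCleanseFlag=FALSE \
    -param AddressCleanseFlag=TRUE \
    dw_project cleanse_customer_names
```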

Dynamic Parameters
These are parameters that get changed with every job run.
- A shell script starts a job with the dsjob command; the script can derive a dynamic parameter value in Unix scripting and set it on the dsjob command line (there is a sketch of this further below).
- A sequence job calls parallel jobs that require a dynamic parameter; the value can be set in a User Variables stage or in the Job Activity stage using BASIC commands or a BASIC routine.

Currently I have two dynamic parameters, PROCESS_ID and PROCESS_DATE. The first is a unique number generated for every execution of a job. It goes into the ODS tables to indicate which job execution loaded the data, so we can roll back loads or identify data volumes. PROCESS_DATE is the date of the delta data we are processing, as we do daily loads.

The PROCESS_DATE is retrieved from a database table via a Unix script. The script uses a simple DB2 command line statement to retrieve the date, then uses the dsjob command to set the parameter and call a sequence job. It also populates PROCESS_ID with an eight character string representation of the PROCESS_DATE.
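
A minimal sketch of that script, assuming the DB2 command line processor and a hypothetical control table; the database, table, project and job names are all stand-ins:

```sh
#!/bin/sh
# Sketch: fetch the delta date from a control table, derive PROCESS_ID as
# an eight character YYYYMMDD string, then start the sequence job with
# both values set as parameters. All names are hypothetical.
db2 connect to dwh_db > /dev/null
PROCESS_DATE=$(db2 -x "SELECT process_date FROM etl_control" | tr -d ' ')
PROCESS_ID=$(echo ${PROCESS_DATE} | tr -d '-')   # 2007-08-24 -> 20070824
dsjob -run \
    -param PROCESS_DATE=${PROCESS_DATE} \
    -param PROCESS_ID=${PROCESS_ID} \
    dw_project seq_daily_load
```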

The sequence job uses a User Variables stage to append a unique two character sequence job code to the PROCESS_ID, making it unique for that sequence job. This user variable is then used in each Job Activity stage with an additional two character code that is unique to each Job Activity stage within the sequence job. As a result, each parallel or server job across the entire project gets a unique PROCESS_ID each time it runs.