Pentaho Data Integration - Java API Examples

This article describes how to use the Pentaho Kettle SDK to build a custom data transformation programmatically, including reading data from a source database, renaming fields, and writing to a target database.

In General

Pentaho Kettle Component

This component is a good example of how to call a transformation or a job from another Java program.

It can be found in the package org.pentaho.plugin.kettle in the Pentaho-BI-Server project.

Get the source code here.

Program your own Kettle transformation

Recommendation for upward compatibility: if you want to create your own transformation dynamically (e.g. from metadata), generate an XML file (KTR) instead of calling the API directly. The XML format has remained compatible from the first version of Kettle until now. The same applies to jobs.
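
A transformation that has been generated and saved as a KTR file this way can later be loaded and executed without touching the step-building API again. The snippet below is a minimal sketch: the file name "copytable.ktr" is a placeholder, and it assumes that the TransMeta constructor of your Kettle version accepts a .ktr filename; the execution calls mirror the ones used further down in this article.

// Sketch: load a previously generated KTR file and run it
// (uses the same imports as TransBuilder.java; the filename is a placeholder)
EnvUtil.environmentInit();
TransMeta transMeta = new TransMeta("copytable.ktr"); // assumption: TransMeta(String) parses the KTR file
Trans trans = new Trans(transMeta);
trans.execute(null);
trans.waitUntilFinished();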

The example described below performs the following actions:

  1. create a new transformation
  2. save the transformation as an XML file
  3. generate the SQL for the target table
  4. execute the transformation
  5. drop the target table to make this program repeatable

The complete source code for the example is included in the distribution zip file (Kettle version 2.1.3 or higher), which is available in the downloads section. After unzipping, you can find the source code in the file "TransBuilder.java" in the "extra" directory.

The documentation for the Kettle Java API can be generated with Javadoc, e.g. in Eclipse via Project / Generate Javadoc or by exporting to Javadoc. When the wizard opens, set the location of the javadoc command to its absolute path (e.g. C:\Program Files\Java\jdk1.5.0_12\bin\javadoc.exe).
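
If you prefer the command line, the same documentation can be produced directly with the javadoc tool. This is only a sketch: it assumes the Kettle sources are checked out under a src directory and use the be.ibridge.kettle package root (the package root differs in later releases).

javadoc -d kettle-api-docs -sourcepath src -subpackages be.ibridge.kettle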

// Generate the transformation.
TransMeta transMeta = TransBuilder.buildCopyTable(
    transformationName,
    sourceDatabaseName,
    sourceTableName,
    sourceFields,
    targetDatabaseName,
    targetTableName,
    targetFields
    );

// Save it as a file:
String xml = transMeta.getXML();
DataOutputStream dos = new DataOutputStream(new FileOutputStream(new File(fileName)));
dos.write(xml.getBytes("UTF-8"));
dos.close();
System.out.println("Saved transformation to file: " + fileName);

// OK, what's the SQL we need to execute to generate the target table?
String sql = transMeta.getSQLStatementsString();

// Execute the SQL on the target table:
Database targetDatabase = new Database(transMeta.findDatabase(targetDatabaseName));
targetDatabase.connect();
targetDatabase.execStatements(sql);

// Now execute the transformation...
Trans trans = new Trans(transMeta);
trans.execute(null);
trans.waitUntilFinished();

// For testing/repeatability, we drop the target table again
targetDatabase.execStatement("drop table " + targetTableName);
targetDatabase.disconnect();

Below is the source code for the method that creates the transformation: 

/**
 * Creates a new transformation using input parameters such as the table name to read from.
 *
 * @param transformationName The name of the transformation
 * @param sourceDatabaseName The name of the database to read from
 * @param sourceTableName The name of the table to read from
 * @param sourceFields The field names we want to read from the source table
 * @param targetDatabaseName The name of the target database
 * @param targetTableName The name of the target table we want to write to
 * @param targetFields The names of the fields in the target table (same number of fields as sourceFields)
 * @return A new transformation metadata object
 * @throws KettleException In the rare case something goes wrong
 */
public static final TransMeta buildCopyTable(String transformationName,
        String sourceDatabaseName, String sourceTableName,
        String[] sourceFields, String targetDatabaseName,
        String targetTableName, String[] targetFields)
        throws KettleException {

    EnvUtil.environmentInit();

    try
    {
        // Create a new transformation...
        //
        TransMeta transMeta = new TransMeta();
        transMeta.setName(transformationName);

        // Add the database connections
        // (databasesXML is a String[] of <connection> XML fragments, defined elsewhere in TransBuilder.java)
        for (int i = 0; i < databasesXML.length; i++) {
            DatabaseMeta databaseMeta = new DatabaseMeta(databasesXML[i]);
            transMeta.addDatabase(databaseMeta);
        }

        DatabaseMeta sourceDBInfo = transMeta.findDatabase(sourceDatabaseName);
        DatabaseMeta targetDBInfo = transMeta.findDatabase(targetDatabaseName);

        //
        // Add a note
        //
        String note = "Reads information from table [" + sourceTableName + "] on database [" + sourceDBInfo + "]" + Const.CR;
        note += "After that, it writes the information to table [" + targetTableName + "] on database [" + targetDBInfo + "]";
        NotePadMeta ni = new NotePadMeta(note, 150, 10, -1, -1);
        transMeta.addNote(ni);

        //
        // Create the source step...
        //
        String fromstepname = "read from [" + sourceTableName + "]";
        TableInputMeta tii = new TableInputMeta();
        tii.setDatabaseMeta(sourceDBInfo);

        String selectSQL = "SELECT " + Const.CR;
        for (int i = 0; i < sourceFields.length; i++) {
            if (i > 0) selectSQL += ", "; else selectSQL += "  ";
            selectSQL += sourceFields[i] + Const.CR;
        }
        selectSQL += "FROM " + sourceTableName;
        tii.setSQL(selectSQL);

        StepLoader steploader = StepLoader.getInstance();

        // (log is a LogWriter instance, defined elsewhere in TransBuilder.java)
        String fromstepid = steploader.getStepPluginID(tii);
        StepMeta fromstep = new StepMeta(log, fromstepid, fromstepname, (StepMetaInterface) tii);
        fromstep.setLocation(150, 100);
        fromstep.setDraw(true);
        fromstep.setDescription("Reads information from table [" + sourceTableName + "] on database [" + sourceDBInfo + "]");
        transMeta.addStep(fromstep);

        //
        // Add logic to rename fields
        // Use metadata logic in SelectValues, use SelectValueInfo...
        //
        SelectValuesMeta svi = new SelectValuesMeta();
        svi.allocate(0, 0, sourceFields.length);

        for (int i = 0; i < sourceFields.length; i++) {
            svi.getMetaName()[i] = sourceFields[i];
            svi.getMetaRename()[i] = targetFields[i];
        }

        String selstepname = "Rename field names";
        String selstepid = steploader.getStepPluginID(svi);
        StepMeta selstep = new StepMeta(log, selstepid, selstepname, (StepMetaInterface) svi);
        selstep.setLocation(350, 100);
        selstep.setDraw(true);
        selstep.setDescription("Rename field names");
        transMeta.addStep(selstep);

        TransHopMeta shi = new TransHopMeta(fromstep, selstep);
        transMeta.addTransHop(shi);
        fromstep = selstep;

        //
        // Create the target step: add the TableOutputMeta step...
        //
        String tostepname = "write to [" + targetTableName + "]";
        TableOutputMeta toi = new TableOutputMeta();
        toi.setDatabase(targetDBInfo);
        toi.setTablename(targetTableName);
        toi.setCommitSize(200);
        toi.setTruncateTable(true);

        String tostepid = steploader.getStepPluginID(toi);
        StepMeta tostep = new StepMeta(log, tostepid, tostepname, (StepMetaInterface) toi);
        tostep.setLocation(550, 100);
        tostep.setDraw(true);
        tostep.setDescription("Write information to table [" + targetTableName + "] on database [" + targetDBInfo + "]");
        transMeta.addStep(tostep);

        //
        // Add a hop between the two steps...
        //
        TransHopMeta hi = new TransHopMeta(fromstep, tostep);
        transMeta.addTransHop(hi);

        // The transformation is complete, return it...
        return transMeta;
    }
    catch (Exception e) {
        throw new KettleException("An unexpected error occurred creating the new transformation", e);
    }
}

### Kettle ETL Tool Job Types: Introduction and Usage

#### Overview of Jobs in Kettle

In Pentaho Data Integration (PDI), commonly known as Kettle, a **Job** is an orchestration mechanism that lets users define workflows by chaining together multiple tasks or transformations. Each type of job entry serves a specific purpose within a data integration process.

#### Commonly Used Job Entries

##### Transformation Entry

A transformation entry represents one execution of a PDI transformation file (.ktr). This enables complex data manipulation such as extracting from various sources, transforming according to business rules, and loading into target systems.

```bash
pan.sh -file=/path/to/transformation.ktr
```

This command-line invocation demonstrates how a single transformation can be executed outside of the Spoon GUI using the `pan` utility shipped with the Kettle distribution.

##### SQL Statements Execution

Dedicated entries for executing arbitrary SQL statements against databases support maintenance activities such as creating tables and indexes or running stored procedures, ensuring that the necessary structures exist before the actual data processing logic runs.

##### Copy Files Task

For scenarios that involve moving files between directories on local filesystems or on remote servers via FTP/SFTP, this task automates what would otherwise require manual intervention, streamlining batch-oriented workloads where source files need relocating after a successful extraction.

##### Mail Notifications

Sending email notifications when certain conditions are met during workflow execution keeps stakeholders informed about critical events inside jobs without requiring constant monitoring by human operators.

##### Conditional Logic Implementation

Implementing conditional branches that depend on the outcome of previous entries provides flexibility for error handling and for decision-making paths influenced dynamically by runtime parameters passed along different stages of the same job definition file (.kjb).
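
Jobs built from these entries are saved as .kjb files and, like transformations, can be run from the command line. The following is a minimal sketch: the path is a placeholder, and the exact option syntax can vary slightly between PDI versions.

```bash
kitchen.sh -file=/path/to/job.kjb -level=Basic
```

Here `kitchen` is the job-execution counterpart of `pan`, and `-level` sets the logging verbosity.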