DataStage Error Collection (Continuously Updated)

This article records in detail the errors encountered while using IBM DataStage and how to resolve them, covering installation, configuration, compilation, and run-time issues: installing and starting DataStage on AIX, configuring ODBC, errors when compiling JOBs and the corresponding strategies, network connection problems with the Oracle Connector, and missing library files at run time. It also walks through troubleshooting steps for JOBs that cannot find specific library files at run time and for JOBs that behave abnormally in a cluster environment.


DataStage series articles

DataStage 1: Installation
DataStage 2: Starting and stopping InfoSphere Information Server processes
DataStage 3: Configuring ODBC

1 Error when running the dsadmin command

$ dsadmin
exec(): 0509-036 Cannot load program dsadmin because of the following errors:
        0509-022 Cannot load module /opt/IBM/InformationServer/Server/DSEngine/lib/libvmdsapi.so.
        0509-150   Dependent module libACS_client_cpp.a(shr.so) could not be loaded.
        0509-022 Cannot load module libACS_client_cpp.a(shr.so).
        0509-026 System error: A file or directory in the path name does not exist.
        0509-022 Cannot load module dsadmin.
        0509-150   Dependent module /opt/IBM/InformationServer/Server/DSEngine/lib/libvmdsapi.so could not be loaded.
        0509-022 Cannot load module .

1.1 Error description

Running the dsadmin command on the AIX 6.0 command line failed with errors about being unable to load the associated .so files, even though the DS environment variables had already been set:


#DataStage
export DSHOME=/opt/IBM/InformationServer/Server/DSEngine
#parallel engine
export APT_ORCHHOME=/opt/IBM/InformationServer/Server/PXEngine
#parallel engine
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/default.apt
export PATH=$PATH:$DSHOME/bin:$APT_ORCHHOME/bin
#AIX LIBPATH,linux LD_LIBRARY_PATH
export LIBPATH=$LIBPATH:$DSHOME/lib:$APT_ORCHHOME/lib
#ASBHome
export ASBHOME=/opt/IBM/InformationServer/ASBNode
#environment
$DSHOME/dsenv

Checking with ldd reports the following errors:

$ ldd /opt/IBM/InformationServer/Server/DSEngine/lib/libvmdsapi.so
/opt/IBM/InformationServer/Server/DSEngine/lib/libvmdsapi.so needs:
         /lib/libc.a(shr_64.o)
         /lib/libpthread.a(shr_xpg5_64.o)
Cannot find libACS_client_cpp.a(shr.so) 
Cannot find libACS_common_cpp.a(shr.so) 
Cannot find libinvocation_cpp.a(shr.so) 
Cannot find libxmogrt-xlC6.a 
Cannot find libIISCrypto.so 
         /lib/libC.a(shr_64.o)
         /lib/libC.a(ansi_64.o)
         /unix
         /lib/libcrypt.a(shr_64.o)
         /lib/libC.a(ansicore_64.o)
         /lib/libC.a(shrcore_64.o)
         /lib/libC.a(shr3_64.o)
         /lib/libC.a(shr2_64.o)

The related libraries cannot be found, yet these files do exist under a subdirectory:

$ ls -l /opt/IBM/InformationServer/ASBNode/lib/cpp/         
-rwxr-xr-x    1 root     system      4117562 Nov 09 2013  libACS_client_cpp.a
-rwxr-xr-x    1 root     system     54572316 Nov 09 2013  libACS_common_cpp.a
-rwxr-xr-x    1 root     system      2010742 Nov 09 2013  libASB_agent_config_client_cpp.a
-rwxr-xr-x    1 root     system     64048316 Nov 09 2013  libinvocation_cpp.a

Echoing some of the environment variables defined in the dsenv file from the command line produced no output at all.

1.2 Solution

The errors above point to an environment configuration problem. The documentation stresses that $DSHOME/dsenv is a critical file and must be referenced in the profile. It had been referenced, but it never took effect because the reference was wrong: rechecking it showed that the leading "." was missing:

#environment
$DSHOME/dsenv

Change it to:

#environment
. $DSHOME/dsenv

How do you know whether the environment variables have taken effect? A simple check is whether UDTHOME and UDTBIN are set in the current environment; both are set in the dsenv shipped with 8.5, 8.7, and 9.1.

#if [ -z "$UDTHOME" ]
#then
UDTHOME=/opt/IBM/InformationServer/Server/DSEngine/ud41 ; export UDTHOME
UDTBIN=/opt/IBM/InformationServer/Server/DSEngine/ud41/bin ; export UDTBIN
#fi
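For example, after correcting the reference you can open a new session and check from the shell whether these two variables are visible; no output means dsenv still has not been sourced (a minimal check, paths will vary with your installation):

$ . $DSHOME/dsenv
$ env | grep UDT
UDTHOME=/opt/IBM/InformationServer/Server/DSEngine/ud41
UDTBIN=/opt/IBM/InformationServer/Server/DSEngine/ud41/bin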

2 Error when stopping WAS

/opt/IBM/InformationServer/ASBServer/bin/MetadataServer.sh  stop
ADMU0116I: Tool information is being logged in file
           /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1/stopServer.log
ADMU0128I: Starting tool with the InfoSphere profile
ADMU3100I: Reading configuration for server: server1

ADMU0509I: The server "server1" cannot be reached. It appears to be stopped.
ADMU0211I: Error details may be seen in the file:
           /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1/stopServer.log

2.1 Error description

When shutting down WAS, the application server could not be stopped. The log file shows:

FFDC Incident emitted on /opt/IBM/WebSphere/AppServer/bin/./client_ffdc/ffdc.4012701407048567577.txt com.ibm.websphere.
management.AdminClientFactory.createAdminClient 275
[1/21/15 10:09:16:236 GMT+08:00] 00000001 WsServerStop  E   ADMU3002E: Exception attempting to process server server1
[1/21/15 10:09:16:236 GMT+08:00] 00000001 WsServerStop  E   ADMU3007E: Exception com.ibm.websphere.management.exception.Conne
ctorException: com.ibm.websphere.management.exception.ConnectorException: ADMC0016E: The system cannot create a SOAP connecto
r to connect to host nhdbtest07 at port 8881.
[1/21/15 10:09:16:237 GMT+08:00] 00000001 WsServerStop  A   ADMU3007E: Exception com.ibm.websphere.management.exception.Conne
ctorException: com.ibm.websphere.management.exception.ConnectorException: ADMC0016E: The system cannot create a SOAP connecto
r to connect to host nhdbtest07 at port 8881.
        at com.ibm.ws.management.connector.ConnectorHelper.createConnector(ConnectorHelper.java:606)
        at com.ibm.ws.management.tools.WsServerStop.runTool(WsServerStop.java:372)
        at com.ibm.ws.management.tools.AdminTool.executeUtility(AdminTool.java:269)
        at com.ibm.ws.management.tools.WsServerStop.main(WsServerStop.java:112)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at com.ibm.wsspi.bootstrap.WSLauncher.launchMain(WSLauncher.java:234)
        at com.ibm.wsspi.bootstrap.WSLauncher.main(WSLauncher.java:95)
        at com.ibm.wsspi.bootstrap.WSLauncher.run(WSLauncher.java:76)

        java.security.cert.CertificateExpiredException: NotAfter: Wed Sep 09 10:51:29 GMT+08:00 2015]
        at com.ibm.ws.management.connector.soap.SOAPConnectorClient.reconnect(SOAPConnectorClient.java:422)
        at com.ibm.ws.management.connector.soap.SOAPConnectorClient.<init>(SOAPConnectorClient.java:222)
        ... 40 more
Caused by: [SOAPException: faultCode=SOAP-ENV:Client; msg=Error opening socket: javax.net.ssl.SSLHandshakeException: com.ibm.
jsse2.util.h: PKIX path validation failed: java.security.cert.CertPathValidatorException: The certificate expired at Wed Sep
09 10:51:29 GMT+08:00 2015; internal cause is:
        java.security.cert.CertificateExpiredException: NotAfter: Wed Sep 09 10:51:29 GMT+08:00 2015; targetException=java.la
ng.IllegalArgumentException: Error opening socket: javax.net.ssl.SSLHandshakeException: com.ibm.jsse2.util.h: PKIX path valid
ation failed: java.security.cert.CertPathValidatorException: The certificate expired at Wed Sep 09 10:51:29 GMT+08:00 2015; i
nternal cause is:
        java.security.cert.CertificateExpiredException: NotAfter: Wed Sep 09 10:51:29 GMT+08:00 2015]
        at org.apache.soap.transport.http.SOAPHTTPConnection.send(SOAPHTTPConnection.java:475)
        at org.apache.soap.rpc.Call.WASinvoke(Call.java:451)
        at com.ibm.ws.management.connector.soap.SOAPConnectorClient$4.run(SOAPConnectorClient.java:372)
        at com.ibm.ws.security.util.AccessController.doPrivileged(AccessController.java:118)
        at com.ibm.ws.management.connector.soap.SOAPConnectorClient.reconnect(SOAPConnectorClient.java:365)
        ... 41 more

[10/9/15 20:09:02:685 GMT+08:00] 00000001 AdminTool     A   ADMU0509I: The server "server1" cannot be reached. It appears to
be stopped.
[10/9/15 20:09:02:685 GMT+08:00] 00000001 AdminTool     A   ADMU0211I: Error details may be seen in the file: /opt/IBM/W
ebSphere/AppServer/profiles/InfoSphere/logs/server1/stopServer.log

2.2 Solution

Go to the profile's server directory, typically /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/bin, then review logs such as SystemErr.log and SystemOut.log under /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1. Note that the paths may differ on your system; the stop command prints the log path in its output. Errors found there have included:

java.sql.SQLException: [IBM][Oracle JDBC Driver][Oracle]ORA-28001:
 the password has expired
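For example, to inspect the stop log and server logs mentioned above (paths follow the default InfoSphere profile layout and may differ on your system):

$ cd /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1
$ tail -100 stopServer.log
$ tail -100 SystemOut.log
$ tail -100 SystemErr.log

In the case above the logs pointed to an expired SSL certificate (CertificateExpiredException) and an expired Oracle password (ORA-28001).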

3 Error when importing table definitions

 An unexpected exception occurred accessing the repository: 
 <JavaException>
  <Type>com/ascential/asb/cas/shared/ConnectorServiceException</Type>
  <Message><![CDATA[An unexpected exception occurred accessing the repository: ]]></Message>
  <StackTrace><![CDATA[com.ascential.asb.cas.shared.ConnectorServiceException: An unexpected exception occurred accessing the repository: 
    at com.ascential.asb.cas.shared.ConnectorAccessServiceBeanSupport.persist(ConnectorAccessServiceBeanSupport.java:5345)
    at com.ascential.asb.cas.shared.ConnectorAccessServiceBeanSupport.discoverSchema(ConnectorAccessServiceBeanSupport.java:3549)
    at com.ascential.asb.cas.service.impl.ConnectorAccessServiceBean.discoverSchema(ConnectorAccessServiceBean.java:3177)
    at com.ascential.asb.cas.service.EJSRemoteStatelessConnectorAccess_6ccddb18.discoverSchema(Unknown Source)
    at com.ascential.asb.cas.service._EJSRemoteStatelessConnectorAccess_6ccddb18_Tie.discoverSchema__com_ascential_asb_cas_shared_util_ConnectionHandle__com_ascential_xmeta_emf_util_EObjectMemento__CORBA_WStringValue__boolean__boolean__boolean__boolean__CORBA_WStringValue(_EJSRemoteStatelessConnectorAccess_6ccddb18_Tie.java:820)
    at com.ascential.asb.cas.service._EJSRemoteStatelessConnectorAccess_6ccddb18_Tie._invoke(_EJSRemoteStatelessConnectorAccess_6ccddb18_Tie.java:355)
    at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:669)
    at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:523)
    at com.ibm.rmi.iiop.ORB.process(ORB.java:523)
    at com.ibm.CORBA.iiop.ORB.process(ORB.java:1575)
    at com.ibm.rmi.iiop.Connection.doRequestWork(Connection.java:2992)
    at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2875)
    at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:64)
    at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118)
    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1783)

3.1 Error description

The error occurred when importing Oracle table definitions into a DS project (note: not via the ODBC import, which still worked at the time). The DS logs showed no errors. For the final ConnectorServiceException message, the official documentation explains that the user attempting the table definition import does not have sufficient privileges for the operation.

3.2 Solution

The cause is that the current user lacks the required privileges. Log in to the console (http://hostname:9080/ibm/iis/console), click the Administration tab, choose Users and Groups => Users, select the user in the middle-right pane, click Open User on the right, grant the Common Metadata Administrator and Common Metadata Importer roles to the user, and save. Then log in again from the client.

4 Errors when compiling a JOB

4.1 Compilation error 1: missing compiler components

Output from transformer compilation follows:
##I IIS-DSEE-TFCN-00001 16:58:21(000) <main_program> 
IBM InfoSphere DataStage Enterprise Edition 9.1.0.6791 
Copyright (c) 2001, 2005-2012 IBM Corporation. All rights reserved

##I IIS-DSEE-TFCN-00006 16:58:21(001) <main_program> conductor uname: -s=AIX; -r=1; -v=6; -n=nhdbtest07; -m=00F725214C00
##I IIS-DSEE-TOSH-00002 16:58:21(002) <main_program> orchgeneral: loaded
##I IIS-DSEE-TOSH-00002 16:58:21(003) <main_program> orchsort: loaded
##I IIS-DSEE-TOSH-00002 16:58:21(004) <main_program> orchstats: loaded
##W IIS-DSEE-TOSH-00049 16:58:21(007) <main_program> Parameter specified but not used in flow: DSPXWorkingDir
##E IIS-DSEE-TBLD-00076 16:58:21(009) <main_program> Error when checking composite operator: Subprocess command failed with exit status 32,256.
##E IIS-DSEE-TFSR-00019 16:58:21(010) <main_program> Could not check all operators because of previous error(s)
##W IIS-DSEE-TFTM-00012 16:58:21(011) <transform> Error when checking composite operator: The number of reject datasets "0" is less than the number of input datasets "1".
##W IIS-DSEE-TBLD-00000 16:58:21(012) <main_program> Error when checking composite operator: Output from subprocess: sh: /usr/vacpp/bin/xlC_r:  not found.

##I IIS-DSEE-TBLD-00079 16:58:21(013) <transform> Error when checking composite operator: /usr/vacpp/bin/xlC_r   -O   -I/opt/IBM/InformationServer/Server/PXEngine/include -O -q64 -qtbtable=full -c /opt/IBM/dsprojects/dstest/RT_BP7.O/V0S9_JoinDataFromTabToTable_Tran_Joined.C -o /opt/IBM/dsprojects/dstest/RT_BP7.O/V0S9_JoinDataFromTabToTable_Tran_Joined.tmp.o.
##E IIS-DSEE-TCOS-00029 16:58:21(014) <main_program> Creation of a step finished with status = FAILED. (JoinDataFromTabToTable.Tran_Joined)

*** Internal Generated Transformer Code follows:
0001: //
0002: // Generated file to implement the V0S9_JoinDataFromTabToTable_Tran_Joined transform operator.
0003: //
0004: 
0005: // define our input/output link names
0006: inputname 0 DSLink15;
0007: outputname 0 Select_tran;
0008: 
0009: initialize {
0010:   // define our control variables
0011:   int8 RowRejected0;
0012:   int8 NullSetVar0;
0013: 
0014: }
0015: 
0016: mainloop {
0017: 
0018:   // initialise the rejected row variable
0019:   RowRejected0 = 1;
0020: 
0021:   // evaluate columns (no constraints) for link: Select_tran
0022:   Select_tran.OBJECT_ID = DSLink15.DATA_OBJECT_ID;
0023:   writerecord 0;
0024:   RowRejected0 = 0;
0025: }
0026: 
0027: finish {
0028: }
0029: 
*** End of Internal Generated Transformer Code
4.1.1 Error description

The error occurred on the Transformer stage when compiling a parallel job containing a Transformer stage on DataStage on AIX 6.0. The official explanation is that the XL C compiler is not installed on the machine. Checking the installed filesets at the time gave the following output:

$lslpp -l |grep -i xlC
  xlC.aix61.rte             11.1.0.1  COMMITTED  XL C/C++ Runtime for AIX 6.1 
  xlC.cpp                    9.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.msg.en_US.cpp          9.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.msg.en_US.rte         11.1.0.1  COMMITTED  XL C/C++ Runtime
  xlC.rte                   11.1.0.1  COMMITTED  XL C/C++ Runtime 
  xlC.sup.aix50.rte          9.0.0.1  COMMITTED  XL C/C++ Runtime for AIX 5.2
$lslpp -l ipfx.rte
lslpp: 0504-132  Fileset ipfx.rte not installed.
$lslpp -ch|grep vac

Moreover, the executable /usr/vacpp/bin/xlC_r did not exist. That file is the default DS compiler; in a newly created project the environment contains the following settings:

APT_COMPILEOPT:-O -q64 -qtbtable=full -c    
APT_COMPILER:/usr/vacpp/bin/xlC_r
APT_LINKER:/usr/vacpp/bin/xlC_r
APT_LINKOPT:-G -q64
4.1.2 Solution

Download the XL_C_C_plus_plus_for_AIX_V11.1 package, unpack it, change into the XL_C_C_plus_plus_for_AIX_V11.1/usr/sys/inst.images directory, and run smitty installp to perform the installation.
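After installation, a quick sanity check (a sketch assuming the default install location that matches APT_COMPILER) is to confirm the compiler exists and reports its version:

$ ls -l /usr/vacpp/bin/xlC_r
$ /usr/vacpp/bin/xlC_r -qversion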

4.1.3 Normal output after installation

$lslpp -l |grep -i xlC
  xlC.adt.include           11.1.0.0  COMMITTED  C Set ++ Application
  xlC.aix61.rte             11.1.0.1  COMMITTED  XL C/C++ Runtime for AIX 6.1 
  xlC.cpp                    9.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.msg.en_US.cpp          9.0.0.0  COMMITTED  C for AIX Preprocessor
  xlC.msg.en_US.rte         11.1.0.1  COMMITTED  XL C/C++ Runtime
  xlC.rte                   11.1.0.1  COMMITTED  XL C/C++ Runtime 
  xlC.sup.aix50.rte          9.0.0.1  COMMITTED  XL C/C++ Runtime for AIX 5.2
$lslpp -l ipfx.rte
lslpp: 0504-132  Fileset ipfx.rte not installed.
[nhsjjhetl01:root]lslpp -ch|grep vac
/usr/lib/objrepos:vac.Bnd:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;59
/usr/lib/objrepos:vac.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;10
/usr/lib/objrepos:vac.aix50.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;59
/usr/lib/objrepos:vac.aix52.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;59
/usr/lib/objrepos:vac.aix53.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;58
/usr/lib/objrepos:vac.html.common.search:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;10
/usr/lib/objrepos:vac.html.en_US.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;09
/usr/lib/objrepos:vac.html.ja_JP.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;09
/usr/lib/objrepos:vac.html.zh_CN.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;08
/usr/lib/objrepos:vac.include:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;30
/usr/lib/objrepos:vac.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;28
/usr/lib/objrepos:vac.lic:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;29
/usr/lib/objrepos:vac.licAgreement:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;08
/usr/lib/objrepos:vac.man.EN_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;04
/usr/lib/objrepos:vac.man.ZH_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;00
/usr/lib/objrepos:vac.man.Zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;01
/usr/lib/objrepos:vac.man.en_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;05
/usr/lib/objrepos:vac.man.zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;02
/usr/lib/objrepos:vac.msg.en_US.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;18
/usr/lib/objrepos:vac.ndi:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;07
/usr/lib/objrepos:vac.pdf.en_US.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;52
/usr/lib/objrepos:vac.pdf.zh_CN.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;51
/usr/lib/objrepos:vacpp.Bnd:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;50
/usr/lib/objrepos:vacpp.cmp.aix50.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;39
/usr/lib/objrepos:vacpp.cmp.aix50.tools:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;39
/usr/lib/objrepos:vacpp.cmp.aix52.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;39
/usr/lib/objrepos:vacpp.cmp.aix52.tools:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;39
/usr/lib/objrepos:vacpp.cmp.aix53.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;38
/usr/lib/objrepos:vacpp.cmp.aix53.tools:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;38
/usr/lib/objrepos:vacpp.cmp.core:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;22
/usr/lib/objrepos:vacpp.cmp.include:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;40
/usr/lib/objrepos:vacpp.cmp.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;40
/usr/lib/objrepos:vacpp.cmp.rte:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;40
/usr/lib/objrepos:vacpp.cmp.tools:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;40
/usr/lib/objrepos:vacpp.html.common:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;38
/usr/lib/objrepos:vacpp.html.en_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;37
/usr/lib/objrepos:vacpp.html.ja_JP:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;36
/usr/lib/objrepos:vacpp.html.zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;36
/usr/lib/objrepos:vacpp.lic:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;35
/usr/lib/objrepos:vacpp.licAgreement:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;34
/usr/lib/objrepos:vacpp.man.EN_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;32
/usr/lib/objrepos:vacpp.man.ZH_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;30
/usr/lib/objrepos:vacpp.man.Zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;28
/usr/lib/objrepos:vacpp.man.en_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;26
/usr/lib/objrepos:vacpp.man.zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;24
/usr/lib/objrepos:vacpp.memdbg.aix50.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix50.rte:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix52.lib:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix52.rte:99.99.9999.9999::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix53.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.aix53.rte:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;23
/usr/lib/objrepos:vacpp.memdbg.lib:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;21
/usr/lib/objrepos:vacpp.memdbg.rte:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;21
/usr/lib/objrepos:vacpp.msg.en_US.cmp.core:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;44
/usr/lib/objrepos:vacpp.msg.en_US.cmp.tools:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;20
/usr/lib/objrepos:vacpp.ndi:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;22
/usr/lib/objrepos:vacpp.pdf.en_US:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;20
/usr/lib/objrepos:vacpp.pdf.zh_CN:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;06;20
/usr/lib/objrepos:vacpp.samples.ansicl:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;31
/etc/objrepos:vac.C:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;17
/etc/objrepos:vacpp.cmp.core:11.1.0.0::COMMIT:COMPLETE:06/12/12:17;07;26

4.2 Compilation error 2: abnormal JOB status

4.2.1 Error description

The error occurred when recompiling a JOB that had stopped abnormally. Before the recompile the JOB had hung and accepted no operations, and it could not be stopped normally or through the Director client, so its process was eventually killed in the background and the DS runtime files under &PH& were deleted. Afterwards the JOB status showed as "Crashed".

4.2.2 Solution

Go to the DSHOME directory, source dsenv, start the uvsh command line, and type LIST.READU EVERY to list the current DS locks:

# . ./dsenv
# uvsh
DataStage Command Language 9.1 Licensed Materials - Property of IBM
(c) Copyright IBM Corp. 1997, 2012 All Rights Reserved.
DSEngine logged on: Thursday, October 29, 2015 15:02

>LIST.READU EVERY

Active Group Locks:                                    Record Group Group Group
Device.... Inode....  Netnode Userno  Lmode G-Address.  Locks ...RD ...SH ...EX
      2064   1208748        0  56802  14 IN       8000      1     0     0     0 
      2064   1198252        0     11  23 IN        800      1     0     0     0 
      2064   1228935        0  36929  44 IN          1      1     0     0     0 

Active Record Locks:
Device.... Inode....  Netnode Userno  Lmode        Pid Login Id Item-ID.............
      2064   1208748        0  36929  14 RL      28607 dsadm    RT_CONFIG11  
      2064   1198252        0  36929  23 RL      28607 dsadm    dstage1&!DS.ADMIN!&  
      2064   1198252        0  64317  23 RL       1219 dsadm    dstage1&!DS.ADMIN!&  
      2064   1198252        0  56802  23 RL       8734 dsadm    dstage1&!DS.ADMIN!&  
      2064   1228935        0  36929  44 RU      28607 dsadm    ClusterMergeDataFromTabToSeqFile.fifo  

Find the relevant lock entries, then type LOGTO UV and release the lock with UNLOCK INODE #Inode USER #User ALL, where #Inode is the value from the Inode.... column above and #User is the value from the Userno column:

>LOGTO UV
>UNLOCK INODE 1228935 USER 36929 ALL
Clearing Record locks.
Clearing GROUP locks.
Clearing FILE locks.
>LIST.READU EVERY

Active Group Locks:                                    Record Group Group Group
Device.... Inode....  Netnode Userno  Lmode G-Address.  Locks ...RD ...SH ...EX
      2064   1208748        0  56802  14 IN       8000      1     0     0     0 
      2064   1198252        0     11  23 IN        800      1     0     0     0 

Active Record Locks:
Device.... Inode....  Netnode Userno  Lmode        Pid Login Id Item-ID.............
      2064   1208748        0  36929  14 RL      28607 dsadm    RT_CONFIG11  
      2064   1198252        0  36929  23 RL      28607 dsadm    dstage1&!DS.ADMIN!&  
      2064   1198252        0  64317  23 RL       1219 dsadm    dstage1&!DS.ADMIN!&  
      2064   1198252        0  56802  23 RL       8734 dsadm    dstage1&!DS.ADMIN!&        

After releasing a lock you can run LIST.READU EVERY again to check the remaining locks. Note: if your JOB contains many stages performing complex operations, DS may create many additional locks whose Item-ID is not the JOB name but something like dstage1&!DS.ADMIN!& above. In that case, releasing only the lock carrying the JOB name will not solve the problem; you must also release the other extra locks, so be careful.
Then try recompiling the job. If it still fails, inspect the engine:

# uv -admin -info

Details for DataStage Engine release 9.1.0.0 instance "ade"
===============================================================================
Install history   : Installed by root (admin:dsadm) on 2015-10-26T15:17:42.766
Instance tag      : ade
Engine status     : Running w/active nls
Engine location   : /disk2/IBM/EngineTier/Server/DSEngine
Binary location   : /disk2/IBM/EngineTier/Server/DSEngine/bin
Impersonation     : Enabled
Administrator     : dsadm
Autostart mode    : enabled
Autostart link(s) : /etc/rc.d/init.d/ds.rc
                  : /etc/rc.d/rc2.d/S999ds.rc
                  : /etc/rc.d/rc3.d/S999ds.rc
                  : /etc/rc.d/rc4.d/S999ds.rc
                  : /etc/rc.d/rc5.d/S999ds.rc
Startup script    : /disk2/IBM/EngineTier/Server/DSEngine/sample/ds.rc
Cache Segments    :0 active
User Segments     :3 active

3 phantom printer segments!
 DSnum Uid       Pid   Ppid  C Stime Tty      Time     Command
 52053 dsadm    13483 13482  0 Oct29 ?        00:00:04 dsapi_slave 7 6 0 4
 52169 dsadm    13367 13123  0 Oct29 ?        00:00:00 phantom DSD.RUN ClusterMer
 52413 dsadm    13123 13122  0 Oct29 ?        00:02:13 dsapi_slave 7 6 0 4
# kill -9 13367

Kill the DSD.RUN process and its dsapi_slave process. Doing so usually terminates them abnormally and leaves the job status as Crashed:

# dsjob -jobinfo dstage1 ClusterMergeDataFromTabToSeqFile
Job Status      : CRASHED (96)
Job Controller  : not available
Job Start Time  : Thu Oct 29 15:22:49 2015
Job Wave Number : 1
User Status     : not available
Job Control     : 1
Interim Status  : NOT RUNNING (99)
Invocation ID   : not available
Last Run Time   : Fri Oct 30 09:08:37 2015
Job Process ID  : 0
Invocation List : ClusterMergeDataFromTabToSeqFile
Job Restartable : 0

Status code = 0 

CRASHED can mean several things in DS: the JOB terminated abnormally, the compilation failed, or an internal error occurred. At this point you can reset the job to return it to its initial state:

#  dsjob -run -mode RESET dstage1 ClusterMergeDataFromTabToSeqFile    

Status code = 0 

# dsjob -jobinfo dstage1 ClusterMergeDataFromTabToSeqFile
Job Status      : RESET (21)
Job Controller  : not available
Job Start Time  : Fri Oct 30 09:37:53 2015
Job Wave Number : 0
User Status     : not available
Job Control     : 0
Interim Status  : NOT RUNNING (99)
Invocation ID   : not available
Last Run Time   : Fri Oct 30 09:37:53 2015
Job Process ID  : 0
Invocation List : ClusterMergeDataFromTabToSeqFile
Job Restartable : 0

Status code = 0 

After the RESET completes, try compiling or running a VALIDATE. If the problem still persists, restart the Engine.

# dsjob -run -mode VALIDATE dstage1 ClusterMergeDataFromTabToSeqFile     

Status code = 0 
# dsjob -jobinfo dstage1 ClusterMergeDataFromTabToSeqFile
Job Status      : RUNNING (0)
Job Controller  : not available
Job Start Time  : Fri Oct 30 09:42:24 2015
Job Wave Number : 1
User Status     : not available
Job Control     : 0
Interim Status  : NOT RUNNING (99)
Invocation ID   : not available
Last Run Time   : Thu Jan  1 08:00:00 1970
Job Process ID  : 25353
Invocation List : ClusterMergeDataFromTabToSeqFile
Job Restartable : 0

Status code = 0 
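If you do need to restart the engine, a typical sequence is sketched below (the directory is the "Engine location" reported by uv -admin -info above; make sure no jobs or client sessions are active before stopping):

# cd /disk2/IBM/EngineTier/Server/DSEngine
# . ./dsenv
# bin/uv -admin -stop
# bin/uv -admin -start
# bin/uv -admin -info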

5 Agent attach error

5.1 Error description

When importing Oracle table definitions via Connector import, the error 31531 not available was reported. Checking the port on AIX 6.0:

$netstat -Ana|grep 31531
f1000e00088eb3b8 tcp        0      0  *.31531               *.*                   LISTEN
f1000e0001048bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33436    CLOSE_WAIT
f1000e0000e1cbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33438    CLOSE_WAIT
f1000e0000b75bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33440    CLOSE_WAIT
f1000e000114ebb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33442    CLOSE_WAIT
f1000e0000b813b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33444    CLOSE_WAIT
f1000e0000b61bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33446    CLOSE_WAIT
f1000e0000ad9bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33449    CLOSE_WAIT
f1000e0000d583b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33452    CLOSE_WAIT
f1000e0000c09bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33454    CLOSE_WAIT
f1000e0000af23b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33456    CLOSE_WAIT
f1000e0000c1ebb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33458    CLOSE_WAIT
f1000e00010813b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33460    CLOSE_WAIT
f1000e0000e493b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33462    CLOSE_WAIT
f1000e0000f553b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33464    CLOSE_WAIT
f1000e0000f87bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33468    CLOSE_WAIT
f1000e0000ad0bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33470    CLOSE_WAIT
f1000e0000cd6bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33472    CLOSE_WAIT
f1000e0000d9abb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33474    CLOSE_WAIT
f1000e0000a793b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33477    CLOSE_WAIT
f1000e0000e5f3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33479    CLOSE_WAIT
f1000e0000f173b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33482    CLOSE_WAIT
f1000e0000b45bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33484    CLOSE_WAIT
f1000e0000dd23b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33486    CLOSE_WAIT
f1000e0000095bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33488    CLOSE_WAIT
f1000e0000ac03b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33490    CLOSE_WAIT
f1000e000011c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33492    CLOSE_WAIT
f1000e0000b24bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33495    CLOSE_WAIT
f1000e0000c18bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33497    CLOSE_WAIT
f1000e0000d0c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33499    CLOSE_WAIT
f1000e0000a7e3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33501    CLOSE_WAIT
f1000e00000c8bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33503    CLOSE_WAIT
f1000e0000b013b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33505    CLOSE_WAIT
f1000e0000a93bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33507    CLOSE_WAIT
f1000e0001094bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33509    CLOSE_WAIT
f1000e0000b313b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33511    CLOSE_WAIT
f1000e0000c16bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33513    CLOSE_WAIT
f1000e0000cd23b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33515    CLOSE_WAIT
f1000e0000ae6bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33517    CLOSE_WAIT
f1000e00001023b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33519    CLOSE_WAIT
f1000e0000b9c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33521    CLOSE_WAIT
f1000e00011d13b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33523    CLOSE_WAIT
f1000e0000d0f3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33525    CLOSE_WAIT
f1000e0000c84bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33528    CLOSE_WAIT
f1000e0000fdebb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33530    CLOSE_WAIT
f1000e0000fc2bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33532    CLOSE_WAIT
f1000e00000c93b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33534    CLOSE_WAIT
f1000e0000ae43b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33536    CLOSE_WAIT
f1000e0000fd73b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33538    CLOSE_WAIT
f1000e00000bbbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33540    CLOSE_WAIT
f1000e0000c103b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33542    CLOSE_WAIT
f1000e000119dbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33544    CLOSE_WAIT
f1000e0000cca3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33546    CLOSE_WAIT
f1000e00000aabb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33548    CLOSE_WAIT
f1000e0000d8abb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33550    CLOSE_WAIT
f1000e0001040bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33552    CLOSE_WAIT
f1000e0000e983b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33555    CLOSE_WAIT
f1000e0000a7dbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33557    CLOSE_WAIT
f1000e0000c43bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33559    CLOSE_WAIT
f1000e0000b8c3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33561    CLOSE_WAIT
f1000e0000a64bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33563    CLOSE_WAIT
f1000e0000b4f3b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33565    CLOSE_WAIT
f1000e0000d5fbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33567    CLOSE_WAIT
f1000e0000199bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33571    CLOSE_WAIT
f1000e0000f56bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33573    CLOSE_WAIT
f1000e00091bfbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33575    CLOSE_WAIT
f1000e0000b17bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33577    CLOSE_WAIT
f1000e0001204bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33579    CLOSE_WAIT
f1000e0000ec4bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33581    CLOSE_WAIT
f1000e0000f143b8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33584    CLOSE_WAIT
f1000e0001096bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33586    CLOSE_WAIT
f1000e0000ab4bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33588    CLOSE_WAIT
f1000e0000f9ebb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33590    CLOSE_WAIT
f1000e0000134bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33592    CLOSE_WAIT
f1000e00010dcbb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33594    CLOSE_WAIT
f1000e0000fd3bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33597    CLOSE_WAIT
f1000e00000b7bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33601    CLOSE_WAIT
f1000e00010d7bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.33603    CLOSE_WAIT
f1000e0008fba3b8 tcp        0      0  192.168.1.12.35035    192.168.1.12.31531    SYN_SENT
f1000e0000b64bb8 tcp        0      0  192.168.1.12.35038    192.168.1.12.31531    SYN_SENT
f1000e0008ee1bb8 tcp        0      0  192.168.1.12.35041    192.168.1.12.31531    SYN_SENT
f1000e000913c3b8 tcp        0      0  192.168.1.12.35044    192.168.1.12.31531    SYN_SENT
f1000e0000fde3b8 tcp        0      0  192.168.1.12.35047    192.168.1.12.31531    SYN_SENT
f1000e0000e17bb8 tcp4       0      0  192.168.1.12.31531    192.168.1.12.35050    CLOSE_WAIT
f1000e0000bcf3b8 tcp        0      0  192.168.1.12.35050    192.168.1.12.31531    FIN_WAIT_2
f1000e00091f43b8 tcp        0      0  192.168.1.12.35053    192.168.1.12.31531    SYN_SENT
f1000e00090113b8 tcp        0      0  192.168.1.12.35056    192.168.1.12.31531    SYN_SENT

There are many connections in the CLOSE_WAIT state. Checking with rmsock shows that some of these CLOSE_WAIT sockets no longer belong to any process; those connections are already closed:

$rmsock f1000e0000f973b8 tcpcb
socket 0xf97008 is removed.

Something caused them to end up in CLOSE_WAIT, for example the client program crashed and exited abnormally, or the network connection between client and server was dropped.

5.2 Solution

On AIX, rmsock (removes a socket that does not have a file descriptor) can remove sockets with no owning process. If a socket is still held by a process, query that process by its ID and then kill it:

rmsock f1000e00088eb3b8 tcpcb
The socket 0xf1000e00088eb008 is being held by proccess 13041736 (RunAgent).
$ps -ef|grep 13041736
    root 13041736 15270118  62 15:13:02  pts/1 60:33 /opt/IBM/InformationServer/ASBNode/bin/RunAgent -Xbootclasspath/a:conf -Djava.ext.dirs=apps/jre/lib/ext:lib/java:eclipse/plugins:eclipse/plugins/com.ibm.isf.client -Djava.class.path=conf -Djava.security.auth.login.config=/opt/IBM/InformationServer/ASBNode/eclipse/plugins/com.ibm.isf.client/auth.conf -Dcom.ibm.CORBA.ConfigURL=file:/opt/IBM/InformationServer/ASBNode/eclipse/plugins/com.ibm.isf.client/sas.client.props -Dcom.ibm.SSL.ConfigURL=file:/opt/IBM/InformationServer/ASBNode/eclipse/plugins/com.ibm.isf.client/ssl.client.props -Dcom.ibm.CORBA.enableClientCallbacks=true -Dcom.ibm.CORBA.FragmentSize=128000 -class com/ascential/asb/agent/impl/AgentImpl run
[nhdbtest07:root]kill -9 13041736

After the CLOSE_WAIT connections were cleared, connections worked normally again:

netstat -Ana|grep 31531
f1000e0008f1fbb8 tcp        0      0  *.31531               *.*                   LISTEN
f1000e00011d33b8 tcp        0      0  192.168.1.12.35538    192.168.1.12.31531    TIME_WAIT

6 Library files not found when running a JOB

At run time the JOB cannot find libccora11g.so and libclntsh.so.11.1:

Error loading connector library libccora11g.so. libclntsh.so.11.1: cannot open shared object file: No such file or directory 

6.1 Error description

The error is raised when running a JOB that contains an Oracle Connector or any other stage that accesses an Oracle database; testing the connection on the stage itself may still succeed.

6.2 Solution

1) First, confirm under the Oracle user that the database can be connected to and accessed normally;
2) Confirm that the Oracle environment variables in dsenv are configured correctly (typical entries are sketched after the commands below);
3) Confirm whether libccora11g.so, or a link to it, exists under $ORACLE_HOME/lib;
4) If all of the above check out, locate libccora11g.so under the Engine installation directory (it is usually at EngineTier/Server/StagingArea/Installed/OracleConnector/Server/linux/libccora11g.so) and create a soft link to it under $ORACLE_HOME/lib:

# find /disk2/IBM/EngineTier -name "libccora11g.so"

# ln -s /disk2/IBM/EngineTier/Server/StagingArea/Installed/OracleConnector/Server/linux/libccora11g.so $ORACLE_HOME/lib
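For step 2, the Oracle-related entries in dsenv typically look like the following (the paths and version are examples only, adjust to your Oracle client installation; on AIX use LIBPATH instead of LD_LIBRARY_PATH):

#Oracle
ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1; export ORACLE_HOME
PATH=$PATH:$ORACLE_HOME/bin; export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib; export LD_LIBRARY_PATH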

Next, locate the install.liborchoracle file, edit it, and find the following content:

install_driver() {
  case $version in
     9 ) VER='9i';;
    10 ) VER='10g';;
     0 ) return;;
  esac

If the database you are using is 11g, change it to:

install_driver() {
  case $version in
     9 ) VER='9i';;
    10|11) VER='10g';;
     0 ) return;;
  esac

Then save, exit, and run the script.
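The script's location varies by release, so it is easiest to locate it first and then run it as the DataStage administrator with dsenv sourced (a sketch; the placeholder directory is whatever the find command returns):

# find /disk2/IBM/EngineTier -name "install.liborchoracle"
# cd <directory containing install.liborchoracle>
# ./install.liborchoracle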

7 Run-time exception in a JOB

main_program: Fatal Error: The set of available nodes for op2 (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc, nulls={value=first}}}}(0)).
is empty.  This set is influenced by calls to addNodeConstraint(),
addResourceConstraint() and setAvailableNodes().  If none of these
functions have been called on this operator, then the default node
pool must be empty.
This step has 5 datasets:
ds0: {op0[] (sequential Select_department)
      eOther(APT_HashPartitioner { key={ value=DEPTNO, 
        subArgs={ desc }
      }
})>eCollectAny
      op3[] (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc}}}(1))}
ds1: {op1[] (parallel Select_employee)
      eOther(APT_HashPartitioner { key={ value=DEPTNO, 
        subArgs={ desc }
      }
})>eCollectAny
      op2[] (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc, nulls={value=first}}}}(0))}
ds2: {op2[] (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc, nulls={value=first}}}}(0))
      [pp] eSame>eCollectAny
      op4[] (parallel Merge_2)}
ds3: {op3[] (parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc}}}(1))
      [pp] eSame>eCollectAny
      op4[] (parallel Merge_2)}
ds4: {op4[] (parallel Merge_2)
      >eCollectAny
      op5[] (sequential APT_RealFileExportOperator1 in Sequential_File_3)}
It has 6 operators:
op0[] {(sequential Select_department)
    }
op1[] {(parallel Select_employee)
    }
op2[] {(parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc, nulls={value=first}}}}(0))
    }
op3[] {(parallel inserted tsort operator {key={value=DEPTNO, subArgs={desc}}}(1))
    }
op4[] {(parallel Merge_2)
    }
op5[] {(sequential APT_RealFileExportOperator1 in Sequential_File_3)
    }

7.1 Error description

I created a JOB like this:
Oracle_Connector1 --> Merge --> Sequential_File
Oracle_Connector2 --^
The JOB runs in a two-node cluster environment; in a non-cluster environment the same JOB runs successfully.

7.2 Error analysis

Looking closely at the error output: after op0 (Oracle_Connector1) and op1 (Oracle_Connector2), DS automatically inserted the op2 and op3 operations (parallel inserted tsort operator), and during execution a sort operation in the cluster environment then triggered the error. In other words, DS sorts the data automatically at run time. Checking the APT_CONFIG_FILE configuration revealed that one node's pools was set to "io":

{
  node "node1"
   {
      fastname "dsconductor01"
      pools "conductor"
      resource disk "/tmp/ds/resource" {pools ""}
      resource scratchdisk "/tmp/ds/scratch" {pools ""}
   }

     node "node2"
   {
      fastname "dscompute01"
      pools "io"
     resource disk "/tmp/ds/resource" {pools ""}
      resource scratchdisk "/tmp/ds/scratch" {pools ""}
   }
}

Setting pools to "io" indicates that the node has better I/O capability.

7.3 Solution

There are two ways to solve this problem:
1) Change the io node in the APT_CONFIG_FILE configuration to a default DS pool node (pools ""), as sketched below. The drawback is that this explicitly discards a resource that is better, useful, and can significantly improve cluster performance.
2) Set the parameter APT_NO_SORT_INSERTION to True for this JOB. This carries some risk: in some cases we do not know when DS will sort the data, and setting this flag explicitly tells it that no automatic sorting is needed because the data is already sorted. In reality, some stages such as Join and Merge depend on that automatic sort, so the consequence can be incorrect data and other follow-on errors.
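For option 1, the adjusted APT_CONFIG_FILE simply gives node2 the default pool, for example (a sketch based on the configuration shown above):

{
  node "node1"
   {
      fastname "dsconductor01"
      pools "conductor"
      resource disk "/tmp/ds/resource" {pools ""}
      resource scratchdisk "/tmp/ds/scratch" {pools ""}
   }
  node "node2"
   {
      fastname "dscompute01"
      pools ""
      resource disk "/tmp/ds/resource" {pools ""}
      resource scratchdisk "/tmp/ds/scratch" {pools ""}
   }
}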

 

Reposted from: https://www.cnblogs.com/lanston/p/datastage_errors.html
