DFS Explained



Documentum Foundation Services (DFS) is the SOA face of Documentum. Remote invocation of DFS is implemented with SOAP-based web services. The beauty of DFS is that although it is exposed as web services for remote use, it can also be called locally through the DFS client libraries. DFS also eases migration by allowing services to be developed on the existing BOF (Documentum Business Object Framework).
Another interesting point is that DFS consolidates numerous interdependent methods into a single service. The DFS data model is expressed both in the Java client library classes and in the service XML schemas, which gives a consistent way to model the data exchanged between business processes.


As mentioned above, DFS clients come in two types:
1. Applications (consumers) written in any language that can use WSDL to interact with DFS.
2. Clients written in Java that use the DFS Java client library.


Data Model of DFS
The data passed to and from the services is encapsulated in the DFS object model. These are some of the important object types in DFS:
 DataPackage
 DataObject
 ObjectIdentity
 Property
 Content
 Permissions
 Relationship

Let's look at what each of these does.


DataPackage
According to the DFS Reference Guide from EMC, the DataPackage class defines the fundamental unit of information that contains data passed to and returned by services operating in the DFS framework.
In other words, a DataPackage is a collection of DataObject instances that is passed back and forth by Object service operations. When you call operations such as Create, Get, or Update, the DataObject instances travel inside a DataPackage.
It is like an envelope holding the DataObject instances. Object service operations process all the DataObject instances in the DataPackage sequentially.


DataObject
A DataObject is the DFS representation of a persistent object in a Documentum repository. A DataObject can potentially carry all the information related to an object in the repository: its content, its properties (both single and repeating), its relationships with other objects in the repository, its permissions, and so on. Because of this, DataObjects can be quite complex.
Optionally, a DataObject instance may also carry instructions telling the service how parts of the object are to be processed.

The type field of the DataObject class holds the name of the underlying repository object type that corresponds to the instance, for example dm_user or dm_document. The default type is dm_document: if no type is specified, the service implicitly assigns dm_document.
[Figure: a DataObject containing a PropertySet, Content, ObjectIdentity, Permission, and Relationship.]
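
The envelope idea and the dm_document default can be sketched in a few lines of Java. This is only a minimal model under stated assumptions: the classes below (DataPackageSketch, SketchDataObject, SketchDataPackage) and the repository name "exampleRepo" are hypothetical stand-ins written for this illustration, not the classes from the DFS SDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the DFS SDK classes.
final class DataPackageSketch {

    // Models one repository object: where it lives and what type it has.
    static final class SketchDataObject {
        static final String DEFAULT_TYPE = "dm_document";
        private final String repositoryName;
        private final String type;

        SketchDataObject(String repositoryName) {
            this(repositoryName, null);
        }

        SketchDataObject(String repositoryName, String type) {
            this.repositoryName = repositoryName;
            // Mirrors the rule in the text: no type given means dm_document.
            this.type = (type == null || type.isEmpty()) ? DEFAULT_TYPE : type;
        }

        String getRepositoryName() { return repositoryName; }
        String getType() { return type; }
    }

    // Models the envelope: an ordered collection of data objects.
    static final class SketchDataPackage {
        private final List<SketchDataObject> dataObjects = new ArrayList<>();

        SketchDataPackage add(SketchDataObject dataObject) {
            dataObjects.add(dataObject);
            return this;
        }

        List<SketchDataObject> getDataObjects() {
            return Collections.unmodifiableList(dataObjects);
        }
    }

    // Walks the package sequentially, in insertion order.
    static List<String> describe(SketchDataPackage dataPackage) {
        List<String> lines = new ArrayList<>();
        for (SketchDataObject dataObject : dataPackage.getDataObjects()) {
            lines.add(dataObject.getRepositoryName() + ":" + dataObject.getType());
        }
        return lines;
    }

    public static void main(String[] args) {
        SketchDataPackage dataPackage = new SketchDataPackage()
                .add(new SketchDataObject("exampleRepo"))              // type defaults
                .add(new SketchDataObject("exampleRepo", "dm_user"));  // explicit type
        for (String line : describe(dataPackage)) {
            System.out.println(line);
        }
    }
}
```

The real DataObject also carries properties, content, permissions, and relationships; the sketch leaves those out to keep the envelope relationship visible.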

 

ObjectIdentity
As the name suggests, an ObjectIdentity represents a unique object in the repository. An instance of this class must hold the repository name and a unique identifier for the object. The value that identifies the repository object can be any of the following:
OBJECT_ID – Contains the r_object_id of the object (can represent either a current or a non-current object).
OBJECT_PATH – Contains a string expression of the object's path in the repository, in the form
[Cabinet]/[Folder]/[File_Name]
(This can represent only a current object. Because multiple objects with the same name can be created in the same folder, it does not guarantee uniqueness.)
QUALIFICATION – A DQL snippet that uniquely qualifies an object.
(The snippet is the part of the full DQL statement that follows the keyword FROM, e.g. dm_document where r_object_id='09xxxxx')
Note: When creating an object, you only need to populate the repository name.
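
The identity forms above can be modelled roughly as follows. This is a sketch under assumptions, not the DFS API: ObjectIdentitySketch, its factory methods, the UNDEFINED value type used for a not-yet-created object, and the example repository name, path, and object name are all hypothetical. The toDql() helper only illustrates that a qualification is "everything after FROM"; it does not claim to show how DFS itself resolves one.

```java
// Hypothetical stand-in for illustration only; this is NOT the DFS SDK ObjectIdentity class.
final class ObjectIdentitySketch {
    enum ValueType { UNDEFINED, OBJECT_ID, OBJECT_PATH, QUALIFICATION }

    final String repositoryName;
    final ValueType valueType;
    final String value;

    private ObjectIdentitySketch(String repositoryName, ValueType valueType, String value) {
        if (repositoryName == null || repositoryName.isEmpty()) {
            throw new IllegalArgumentException("repository name is always required");
        }
        this.repositoryName = repositoryName;
        this.valueType = valueType;
        this.value = value;
    }

    // Identity for an object that is about to be created: repository name only.
    static ObjectIdentitySketch forNewObject(String repositoryName) {
        return new ObjectIdentitySketch(repositoryName, ValueType.UNDEFINED, null);
    }

    static ObjectIdentitySketch byObjectId(String repositoryName, String objectId) {
        return new ObjectIdentitySketch(repositoryName, ValueType.OBJECT_ID, objectId);
    }

    static ObjectIdentitySketch byPath(String repositoryName, String path) {
        return new ObjectIdentitySketch(repositoryName, ValueType.OBJECT_PATH, path);
    }

    static ObjectIdentitySketch byQualification(String repositoryName, String dqlSnippet) {
        return new ObjectIdentitySketch(repositoryName, ValueType.QUALIFICATION, dqlSnippet);
    }

    // Expands a qualification snippet into a full query by prepending the part
    // of the statement up to and including FROM. Illustrative only.
    String toDql() {
        if (valueType != ValueType.QUALIFICATION) {
            throw new IllegalStateException("only QUALIFICATION identities carry a DQL snippet");
        }
        return "select r_object_id from " + value;
    }

    @Override
    public String toString() {
        return repositoryName + " [" + valueType + "]" + (value == null ? "" : " " + value);
    }

    public static void main(String[] args) {
        System.out.println(forNewObject("exampleRepo"));
        System.out.println(byPath("exampleRepo", "/ExampleCabinet/ExampleFolder/example.txt"));
        System.out.println(
                byQualification("exampleRepo", "dm_document where object_name = 'report'").toDql());
    }
}
```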


Property
Each Property represents an object attribute in the Documentum repository. A PropertySet is a collection that acts as a container for Property objects.
Because an object has two kinds of attributes, a property can be either a single property (single-valued attribute) or an array property (repeating attribute).
The Property class is subclassed to accommodate the various data types, as follows:

StringProperty
NumberProperty
BooleanProperty
DateProperty
ObjectIdProperty
ArrayProperty (Abstract Class)
        StringArrayProperty
        NumberArrayProperty
        BooleanArrayProperty
        DateArrayProperty
        ObjectIdArrayProperty

 

Transient Property
These properties are not part of the persistent properties of a repository object. A transient property can be used to send custom data fields to a service for a purpose other than setting attributes on repository objects.


Array Property and Value Action
The repeating attributes of an object type are represented as array properties. As the list above shows, each ArrayProperty subclass is the array counterpart of the corresponding single property.
Another interesting aspect of an array property is the ValueAction. This is an optional list of action-index pairs, each of which says what action is to be performed on a particular element of the ArrayProperty.
The index is the position in the repeating attribute, and the action is the operation to perform on the element at that index.
The possible action types are:
Append
When processing ValueAction[p], the value at ArrayProperty[p] is appended to the end of the repeating attribute list of the persistent repository object. The index of the ValueAction item is ignored.
Insert
When processing ValueAction[p], the value at ArrayProperty[p] is inserted into the repeating attribute list before position index. Note that all items in the list to the right of the insertion point are offset by 1, which must be accounted for in subsequent processing.
Delete
The item at position index of the repeating attribute is deleted. When processing ValueAction[p], the value at ArrayProperty[p] must be set to an empty value.

Set
When processing ValueAction[p], the value at ArrayProperty[p] replaces the value in the repeating attribute list at position index. (From DFS Development Guide)
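
To make the four actions concrete, here is a small sketch that applies a list of action-index pairs to an in-memory list standing in for a repeating attribute. It is an interpretation of the rules quoted above, not code from the DFS SDK: ValueActionSketch, its nested ValueAction class, and the apply method are hypothetical names, and plain strings stand in for property values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the ValueAction rules; NOT the DFS SDK implementation.
final class ValueActionSketch {
    enum ActionType { APPEND, INSERT, DELETE, SET }

    static final class ValueAction {
        final ActionType type;
        final int index;

        ValueAction(ActionType type, int index) {
            this.type = type;
            this.index = index;
        }
    }

    // stored:  current values of the repeating attribute in the repository
    // values:  ArrayProperty contents, where values[p] pairs with actions[p]
    static List<String> apply(List<String> stored, List<String> values, List<ValueAction> actions) {
        if (values.size() != actions.size()) {
            throw new IllegalArgumentException("each value needs exactly one action");
        }
        List<String> result = new ArrayList<>(stored);
        for (int p = 0; p < actions.size(); p++) {
            ValueAction action = actions.get(p);
            String value = values.get(p);
            switch (action.type) {
                case APPEND:
                    result.add(value);               // index is ignored
                    break;
                case INSERT:
                    result.add(action.index, value); // later items shift right by 1
                    break;
                case DELETE:
                    result.remove(action.index);     // value is only a placeholder
                    break;
                case SET:
                    result.set(action.index, value); // replace in place
                    break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("a", "b", "c");
        List<String> values = Arrays.asList("x", "y", "", "z");
        List<ValueAction> actions = Arrays.asList(
                new ValueAction(ActionType.APPEND, 0),
                new ValueAction(ActionType.INSERT, 0),
                new ValueAction(ActionType.DELETE, 2),
                new ValueAction(ActionType.SET, 1));
        System.out.println(apply(stored, values, actions)); // prints [y, z, c, x]
    }
}
```

The example shows why the offset matters: after "y" is inserted at position 0, the original "b" sits at index 2, so the Delete at index 2 removes "b" rather than "c".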

 

Content
An instance of the Content class or one of its subclasses represents the actual file content associated with a DataObject. A DataObject can have zero or more Content objects, and these can include renditions. A Content object can be configured to hold any of the following:
        The whole document
        A page of a document
        One or more pages represented by a particular characteristic


Permissions
A DataObject has a list of Permission objects, which together describe the permissions on the object that the DataObject represents. The permission list gives read access to the permissions that the currently logged-in user has on the repository object.
Note that you cannot change the permissions on a repository object by changing the Permission list and saving the DataObject.
To change the permissions of a repository object, the actual ACL of that object has to be edited or replaced.
As mentioned above, multiple Permission objects make up the permission list. Each Permission object has a field named permissionType that indicates the type of permission. The possible values of this field are BASIC, EXTENDED, and CUSTOM.
Another important point is that if you assign a particular permission on an object to a user, all the lower-level permissions below the assigned permission are also granted to that user. This is known as compound (or hierarchical) permissions.
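
The compound behaviour can be sketched as an ordered enum. The level names used here (NONE, BROWSE, READ, RELATE, VERSION, WRITE, DELETE) are the standard Documentum basic permit levels, which this article does not list, so treat them as an assumption of the example. CompoundPermissionSketch and its methods are hypothetical and are not part of the DFS SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of compound (hierarchical) basic permissions; NOT DFS SDK code.
final class CompoundPermissionSketch {
    // Declared from weakest to strongest, so declaration order encodes the hierarchy.
    enum BasicPermit { NONE, BROWSE, READ, RELATE, VERSION, WRITE, DELETE }

    // Every permit at or below the granted level, excluding NONE.
    static List<BasicPermit> effective(BasicPermit granted) {
        List<BasicPermit> result = new ArrayList<>();
        for (BasicPermit permit : BasicPermit.values()) {
            if (permit != BasicPermit.NONE && permit.compareTo(granted) <= 0) {
                result.add(permit);
            }
        }
        return result;
    }

    // True when the granted level includes the required one.
    static boolean allows(BasicPermit granted, BasicPermit required) {
        return required.compareTo(granted) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(effective(BasicPermit.RELATE));                // prints [BROWSE, READ, RELATE]
        System.out.println(allows(BasicPermit.RELATE, BasicPermit.READ)); // prints true
        System.out.println(allows(BasicPermit.READ, BasicPermit.WRITE));  // prints false
    }
}
```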
