Oracle and RTF data

本文介绍如何使用OracleText将RTF文档转换为纯文本或HTML格式,包括创建索引和使用过滤器的方法。

Yes, Oracle Text can do this.

http://download-west.oracle.com/docs...cpkg.htm#12729

You need a context index with the INSO filter on the RTF document column in
order to use it.

e.g.

create table RTFDOCUMENT
(
ID NUMBER not null,
DOC CLOB not null,
PLAIN_DOC CLOB
);

[insert an RTF document with id=1 using dbms_lob.loadclobfromfile or whatever]
SQL> create index rtfdocument_tx1 on rtfdocument (doc) indextype is
ctxsys.context parameters ('filter ctxsys.INSO_FILTER'); Index created
SQL> declare
2 v_doc clob;
3 v_plain clob;
4 r number;
5 begin
6 select id, doc
7 into r, v_doc
8 from rtfdocument
9 where id = 1 for update;
10
11 dbms_lob.createtemporary(v_plain, true);
12 dbms_lob.append(v_plain, v_doc); 13
14 ctx_doc.filter('RTFDOCUMENT_TX1', to_char(r), v_plain, plaintext =>
true);
15
16 update rtfdocument set plain_doc = v_plain where id = 1;
17 end;
18 /

PL/SQL procedure successfully completed
SQL> commit;

Commit complete

[plain_doc now contains a plain text version of the RTF document].

It seems a bit over-enthusiastic with newlines, but it certainly works.

 

from ask tom

 

Tim -- Thanks for the question regarding "Remove RTF Tags from a Text Field Containing RTF", version 9i

Submitted on 17-Oct-2004 12:44 Central time zone
Tom's latest followup | Bookmark | Bottom
Last updated 2-Nov-2008 16:54

You Asked

Hi Tom,
In our database we have some varchar2(4000) fields (could later be a clob). Where we 
store RTF data. I would like to know if there is anyway I can remove the RTF tags 
returning only the text again. For Example a Function RTFtoTEXT that takes the RTF:

{/rtf1/ansi/deff0{/fonttbl{/f0/fnil/fcharset0 Courier New;}}
{/colortbl ;/red0/green0/blue255;}
{/*/generator Msftedit 5.41.15.1507;}/viewkind4/uc1/pard/lang1033/ul/b/i/f0/fs20 This is 
a test./par
/cf1/ulnone/i0 This is a test./cf0/b0/par
/par
/par
/par
/par
}
 
And Returns:

This is a test.
This is a test.

Can you do this with Oracle Text or Something?

TIA
Tim 

 

and we said...

Yes, we can do this with text -- you can

a) filter to plaintext if you have an index
b) filter to html with or without an index
c) call ctxhx directly from the command line to filter the text and load it

I'll demo a) and b).  you can play with ctxhx from the command line from 
$ORACLE_HOME/ctx/bin if you want (run it, it'll tell you the inputs it takes)



ops$tkyte@ORA9IR2> create table demo
  2  ( id            int primary key,
  3    theblob   blob,
  4    theclob       clob
  5  )
  6  /
 
Table created.
 
ops$tkyte@ORA9IR2> create table filter ( query_id number, document clob );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index demo_idx on demo(theblob) indextype is ctxsys.context;
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create sequence s;
 
Sequence created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace directory my_files as '/home/tkyte/Desktop/'
  2  /
 
Directory created.
 
ops$tkyte@ORA9IR2> @trace
ops$tkyte@ORA9IR2> alter session set events '10046 trace name context forever, level 12';
 
Session altered.
 
ops$tkyte@ORA9IR2> declare
  2          l_blob  blob;
  3          l_clob  clob;
  4          l_id    number;
  5          l_bfile bfile;
  6  begin
  7          insert into demo values ( s.nextval, empty_blob(), empty_clob() )
  8          returning id, theblob, theclob into l_id, l_blob, l_clob;
  9
 10          l_bfile := bfilename( 'MY_FILES', 'asktom.rtf' );
 11          dbms_lob.fileopen( l_bfile );
 12
 13          dbms_lob.loadfromfile( l_blob, l_bfile,
 14                                                     dbms_lob.getlength( l_bfile ) );
 15
 16          dbms_lob.fileclose( l_bfile );
 17
 18          ctx_doc.ifilter( l_blob, l_clob );
 19          commit;
 20          ctx_doc.filter( 'DEMO_IDX', l_id, 'FILTER', l_id, TRUE );
 21  end;
 22  /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set long 500
ops$tkyte@ORA9IR2> select utl_raw.cast_to_varchar2(dbms_lob.substr(theblob,500,1)) from 
demo;
 
UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(THEBLOB,500,1))
-------------------------------------------------------------------------------
{/rtf1/ansi/ansicpg1252/uc1 /deff0/deflang1033/deflangfe1033{/fonttbl{/f0/froma
n/fcharset0/fprq2{/*/panose 02020603050405020304}Times New Roman;}{/f1/fswiss/f
charset0/fprq2{/*/panose 020b0604020202020204}Arial;}
{/f2/fmodern/fcharset0/fprq1{/*/panose 02070309020205020404}Courier New;}{/f23/
froman/fcharset128/fprq1{/*/panose 00000000000000000000}MS Mincho{/*/falt MS ??
};}{/f28/froman/fcharset128/fprq1{/*/panose 00000000000000000000}@MS Mincho;}
{/f29/froman/fcharset238/fprq2 Times New Roman CE;
 
 
ops$tkyte@ORA9IR2> select theclob from demo;
 
THECLOB
-------------------------------------------------------------------------------
<HTML><BODY>
<h1><font size="5" face="Arial"><b>Primary key index in Partitioning</b></font>
</h1>
<p><font size="3" face="Times New Roman"><i>I</i> <i>have a table accounts whic
h has 80 million records (OLTP system). I would like to partition the table by
acct_by_date column.&nbsp;I will be going with range partition and global index
es. My concern is regd the primary key
acct_id. The index that will be created for primary key should it be local or g
lobal and which should I opt for?</i></font></
 
 
ops$tkyte@ORA9IR2> select document from filter;
 
DOCUMENT
-------------------------------------------------------------------------------
 
 
 
 
Primary key index in Partitioning
 
 
 
I have a table accounts which has 80 million records (OLTP system). I would lik
e to partition the table by acct_by_date column.&#65533;I will be going with range par
tition and global indexes. My concern is regd the primary key acct_id. The inde
x that will be created for primary key should it be local or global and which s
hould I opt for?
 
 
 
Well, this is an easy one.&#65533;The primary key index can be local IF and ONLY IF, t
he primary key is in fact the (or part of th



ifilter works without a ctxsys.context index, but only lets you get HTML, filter works 
only with an index -- but lets you get plain text OR html
 


Reviews   
4 stars Thanks but a small question   October 19, 2004 - 10am Central time zone
Reviewer:  Tim from Denmark
Hi Tom,
Thanks for that it put me on the right track.

Just a quick followup question or three :)

1) I am assuming the trace commands are not needed for anything special or is there a reason for 
this?

2) When I do the filter on a clob nothing happens. Why? I'm guessing that this is already 
considered plain text or what? 

create table workflow_temprtf
( id  int primary key,
  theblob blob,
  theclob  clob
);
 
create table workflow_rtffilter (query_id number,document clob);
 
create index workflow_rtfidx on workflow_temprtf(theblob) indextype is ctxsys.context;
create index workflow_rtfidx2 on workflow_temprtf(theclob) indextype is ctxsys.context;

create sequence workflow_rtfseq;

declare
     l_blob  blob;
     l_clob  clob;
     l_id    number;
     l_bfile bfile;
     lrtf varchar2(400);
     lresult clob;
begin
    lrtf:='{/rtf1/ansi/ansicpg1252/deff0{/fonttbl{/f0/fnil/fcharset0 Arial;}}'||CHR(13)||CHR(10);
    lrtf:=lrtf||'/viewkind4/uc1/pard/lang1033/fs17 this is a test'||CHR(13)||CHR(10);
    lrtf:=lrtf||'/par }';
    
    select workflow_rtfseq.nextval into l_id from dual;
    insert into workflow_temprtf values (l_id, utl_raw.cast_to_raw(lrtf) , lrtf );
     
    ctx_doc.filter('WORKFLOW_RTFIDX2', l_id, 'WORKFLOW_RTFFILTER', l_id, TRUE );
    select document into lresult from workflow_rtffilter where query_id=l_id;
    dbms_output.put_line(lresult);
end;

Returns RTF still ? why ?

3) I did succeed in getting it to work with:

create table workflow_temprtf
( id  int primary key,
  theblob blob,
  theclob  clob
);
 
create table workflow_rtffilter (query_id number,document clob);
 
create index workflow_rtfidx on workflow_temprtf(theblob) indextype is ctxsys.context;
create index workflow_rtfidx2 on workflow_temprtf(theclob) indextype is ctxsys.context;

create sequence workflow_rtfseq;

declare
     l_blob  blob;
     l_clob  clob;
     l_id    number;
     l_bfile bfile;
     lrtf varchar2(400);
     lresult clob;
begin
    lrtf:='{/rtf1/ansi/ansicpg1252/deff0{/fonttbl{/f0/fnil/fcharset0 Arial;}}'||CHR(13)||CHR(10);
    lrtf:=lrtf||'/viewkind4/uc1/pard/lang1033/fs17 this is a test'||CHR(13)||CHR(10);
    lrtf:=lrtf||'/par }';
    
    select workflow_rtfseq.nextval into l_id from dual;
    insert into workflow_temprtf values (l_id, utl_raw.cast_to_raw(lrtf) , lrtf );
     
    ctx_doc.filter('WORKFLOW_RTFIDX', l_id, 'WORKFLOW_RTFFILTER', l_id, TRUE );
    select document into lresult from workflow_rtffilter where query_id=l_id;
    dbms_output.put_line(lresult);
end;

However, and as also in your example there is a lot of extra lines in the result. Whats with those 
any idea?

Thanks for the help.


 


Followup   October 19, 2004 - 10am Central time zone:

1) doh, left that in by accident...

2) yes, the clob is just text -- blobs can have pdfs, xls, ppt, docs, whatever.  the inso filter 
can recognize these.  clobs cannot have those objects as they are binary in nature.

3) they are just "there", you could turn them into spaces easily -- but the filters just produce 
TEXT -- that is what gets fed into the indexing engine -- it just needed words.  

No formatting is saved with the text filter, just the text.  If formatting is relevant, the HTML 
output works. 

5 stars   March 1, 2005 - 2pm Central time zone
Reviewer:  Dmitry from Russia
Excellent! But can we do the same without Text?
Oracle CTX produce strange results with russian encodings. 


Followup   March 1, 2005 - 3pm Central time zone:

it should not produce strange results, but this entire answer is predicated on using text 
specifically. 

4 stars Oracle Text on 64 bit Linux   May 5, 2005 - 8am Central time zone
Reviewer:  Dmitry from Russia
Since russian text in rtf looks like: "/'ea/'e0/'ea /'ec/'e8/'ed/'e8/'ec/'f3/'ec" I have to use 
INSO filter for indexes, but INSO filter not implemented on my production 64 bit linux box. Don't 
you know why? I cannot find anything about this on OTN or metalink. 


Followup   May 5, 2005 - 9am Central time zone:

the latest releases use a new filtering technology -- suggest you open an itar (we are dropping the 
inso filters) to get information on this. 

5 stars Embedded images in RTF   October 31, 2008 - 5pm Central time zone
Reviewer:  Reed from Salt Lake City, UT
Sorry to resurrect an old thread but the ctx_doc.ifilter function helped me get past a hurdle in 
migrating old data to a new system but there is one part missing and I may need to deal with is 
outside of oracle. I cant find what I am looking for in the documentation. (Probably just not 
looking in the right place)

I am importing several notes from a legacy system that are stored in varchar2(4000) fields in RTF 
format. If the RTF document is longer than 4000 characters then another record was created with an 
incremented sequence. I appended the notes into a blob field in a new table and used ifilter to 
convert the RTF tags to HTML and stored the results in a clob. This is working great although I had 
to use a slow-by-slow cursor loop to accomplish all this.

What I am now faced with are those notes that have embedded images. Is it possible to extract those 
images and store them somewhere? If so can you point me to some documentation that will help me 
figure this out and maybe provide an example?

Thanks


Followup   November 2, 2008 - 4pm Central time zone:

but this is a question for "ask-about-the-rtf-file-format.com" site.

I have no information about the internal format of a RTF file, sorry.

 

PL/SQL Developer 7.0 用户指南 目录 1.介绍.................................................9 2. 安装....................................................................................................................................................13 2.1 系统需求......................................................13 2.2 工作站安装....................................................13 2.3 基于服务器安装................................................13 2.4 脚本安装......................................................14 2.5 卸载 PL/SQL DEVELOPER...........................................14 3. 编写程序............................................................................................................................................15 3.1 创建程序......................................................15 3.2 保存程序......................................................16 3.3 修改程序......................................................17 3.4 编译程序......................................................17 3.5 置换变量......................................................18 4. 测试程序............................................................................................................................................19 4.1 创建测试脚本..................................................19 4.2 运行测试脚本..................................................20 4.3 变量类型......................................................21 4.4 保存测试脚本..................................................22 4.5 跟踪运行时间错误..............................................23 4.6 包声明和 JAVA 会话声明.........................................23 4.7 查看结果集....................................................24 4.8 查看 DBMS_OUTPUT................................................24 4.9 查看 HTP 输出.................................................24 4.10 调试..........................................................24 4.11 跟踪运行......................................................28 4.12 回归测试......................................................29 5. 优化....................................................................................................................................................30 5.1 使用解释计划窗口..............................................30 5.2 自动统计......................................................31 5.3 PL/SQL 概览图...................................................32 5.4 SQL 跟踪........................................................33 6. 专用 SQL............................................................................................................................................35 6.1 使用 SQL 窗口.................................................35 6.2 结果表格处理..................................................36 6.3 实例模式查询..................................................40 6.4 连接查询......................................................41 6.5 置换变量......................................................42 4 PL/SQL Developer 7.0 用户指南 6.6 更新数据库....................................................43 6.7 查看和编辑 XMLTYPE 列..........................................44 6.8 直接查询导出..................................................44 6.9 保存 SQL 脚本..................................................44 6.10 创建标准查询..................................................45 7. 命令窗口............................................................................................................................................46 7.1 输入 SQL 语句和命令............................................46 7.2 开发命令文件..................................................47 7.3 支持命令......................................................48 8. 创建与修改非 PL/SQL 对象............................................................................................................51 8.1 表定义编辑器..................................................51 8.2 序列定义编辑器................................................63 8.3 同义词定义编辑器..............................................63 8.4 库定义编辑器..................................................64 8.5 目录定义编辑器................................................64 8.6 作业定义编辑器................................................65 8.7 队列定义编辑器................................................65 8.8 队列表定义编辑器..............................................66 8.9 用户定义编辑器................................................67 8.10 角色定义编辑器................................................70 8.11 概要文件定义编辑器............................................71 8.12 数据库连接定义编辑器..........................................71 9. 图表....................................................................................................................................................72 9.1 创建图表......................................................72 9.2 保存和打开图表文件............................................75 9.3 更新图表......................................................75 10. 报告....................................................................................................................................................76 10.1 标准报告......................................................76 10.2 定制报告......................................................77 10.3 变量..........................................................78 10.4 精制版面......................................................83 10.5 风格库........................................................90 10.6 选项..........................................................91 10.7 报告菜单......................................................93 11. 图形....................................................................................................................................................94 12. 工程....................................................................................................................................................97 12.1 创建一个新工程................................................97 12.2 保存工程......................................................98 12.3 添加文件到工程................................................98 12.4 添加数据库对象到工程..........................................98 PL/SQL Developer 7.0 用户指南 5 12.5 用工程项目工作................................................99 12.6 编译工程......................................................99 13. 任务项目..........................................................................................................................................101 13.1 创建任务项目.................................................102 13.2 编辑任务项目.................................................102 13.3 关闭任务项目.................................................103 13.4 删除任务项目.................................................103 14. 窗口、数据库会话和事务.............................................................................................................104 14.1 会话模式.....................................................104 14.2 运行于多路会话或双路会话模式.................................104 15. 浏览对象..........................................................................................................................................105 15.1 使用浏览器...................................................105 15.2 浏览器过滤器.................................................110 15.3 浏览器文件夹.................................................111 16. 首选项..............................................................................................................................................114 16.1 ORACLE - 连接.................................................115 16.2 ORACLE - 选项.................................................116 16.3 ORACLE - 调试器...............................................117 16.4 ORACLE - 输出.................................................118 16.5 ORACLE - 跟踪.................................................119 16.6 ORACLE - 概览图...............................................119 16.7 ORACLE - 登录历史.............................................121 16.8 ORACLE - 提示.................................................122 16.9 用户界面 - 选项.............................................124 16.10 用户界面 - 工具栏...........................................125 16.11 用户界面 - 浏览器...........................................126 16.12 用户界面 - 编辑器...........................................127 16.13 用户界面 - 字体.............................................129 16.14 用户界面 - 代码助手.........................................130 16.15 用户界面 - 键配置...........................................131 16.16 用户界面 - 外观.............................................132 16.17 用户界面 - 日期/时间........................................133 16.18 窗口类型 - 程序窗口.........................................134 16.19 窗口类型 - SQL 窗口.........................................136 16.20 窗口类型 - 测试窗口.........................................138 16.21 窗口类型 - 计划窗口.........................................138 16.22 工具 - 差异.................................................139 16.23 工具 - 数据生成器...........................................139 16.24 工具 - 任务列表.............................................140 16.25 工具 - 重新调用声明.........................................140 6 PL/SQL Developer 7.0 用户指南 16.26 文件 - 目录..................................................141 16.27 文件 - 扩展名................................................142 16.28 文件 - 格式..................................................143 16.29 文件 - 备份..................................................144 16.30 文件 - HTML/XML..............................................145 16.31 其它 - 打印..................................................146 16.32 其它 - 更新与消息............................................147 16.33 首选项集.....................................................148 17. 工具.................................................................................................................................................150 17.1 浏览器.......................................................150 17.2 查找数据库对象...............................................151 17.3 编译无效对象.................................................152 17.4 导出表.......................................................153 17.5 导入表.......................................................156 17.6 导出用户对象.................................................158 17.7 文本导入器...................................................159 17.8 ODBC 导入器..................................................163 17.9 数据生成器...................................................165 17.10 比较用户对象.................................................169 17.11 比较表数据...................................................171 17.12 事件监视器...................................................173 17.13 会话.........................................................174 17.14 自定义工具...................................................176 17.15 测试管理器...................................................181 18. 编辑器.............................................................................................................................................185 18.1 选择功能.....................................................185 18.2 列选择.......................................................185 18.3 指引线.......................................................186 18.4 代码助手.....................................................186 18.5 重新调用语句.................................................187 18.6 专用复制.....................................................188 18.7 前后关系敏感帮助.............................................189 18.8 数据库对象弹出式菜单.........................................189 18.9 解释计划.....................................................189 18.10 宏...........................................................189 18.11 书签.........................................................191 18.12 颜色标记.....................................................191 18.13 代码目录.....................................................192 18.14 代码层次.....................................................193 18.15 超链接导航...................................................193 18.16 导航按钮.....................................................194 PL/SQL Developer 7.0 用户指南 7 18.17 重构.........................................................194 18.18 搜索栏.......................................................195 19. 大数据编辑器..................................................................................................................................197 19.1 编辑纯文本...................................................198 19.2 编辑 RTF.....................................................198 19.3 编辑 XML.....................................................199 19.4 编辑图象.....................................................200 19.5 编辑十六进制数据.............................................200 19.6 调用外部查看器或编辑器.......................................202 20. 查询设计器......................................................................................................................................204 20.1 创建新的 SELECT 语句...........................................204 20.2 修改现有的 SELECT 语句.........................................208 20.3 处理查询定义.................................................208 20.4 查询设计器首选项.............................................209 20.5 查询设计器插件...............................................210 21. PL/SQL 美化器...............................................................................................................................211 21.1 定义选项.....................................................211 21.2 定义规则.....................................................212 21.3 使用美化器...................................................213 22. 模板..................................................................................................................................................214 22.1 模板窗口.....................................................214 22.2 使用模板.....................................................215 22.3 创建和修改模板...............................................216 23. 窗口列表..........................................................................................................................................221 24. 可停放和浮动的工具.....................................................................................................................222 25. 授权..................................................................................................................................................223 25.1 启用授权.....................................................223 25.2 定义授权.....................................................224 25.3 停用授权.....................................................225 26. ORACLE 文件系统 (OFS)...............................................................................................................226 26.1 OFS 管理器...................................................226 26.2 OFS 用法.....................................................228 27. 帮助系统..........................................................................................................................................230 27.1 MS 帮助文件..................................................230 27.2 HTML 手册....................................................230 28. 定制..................................................................................................................................................234 28.1 首选项.......................................................234 28.2 窗口版面.....................................................234 28.3 在线文档.....................................................234 8 PL/SQL Developer 7.0 用户指南 28.4 命令行参数...................................................236 28.5 SQL、PL/SQL、命令、JAVA 和 XML 关键词.........................238 28.6 插件...............
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值