Trying to install several Chinese word segmentation tools for R

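This post describes how to install and test the Chinese word segmentation packages rmmseg4j, rsmartcn and Rwordseg in R. It covers fixes for a missing dependency and a stale lock-directory error, and shows the segmentation results.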


My experiments needed Chinese word segmentation, so I decided to try installing some Chinese segmentation packages for R.

rmmseg4j

install

install.packages("rmmseg4j", repos="http://R-Forge.R-project.org",type='source')

The result:

install.packages("rmmseg4j", repos="http://R-Forge.R-project.org",type='source')
Warning: dependency ‘rJava’ is not available
trying URL 'http://R-Forge.R-project.org/src/contrib/rmmseg4j_0.2-1.tar.gz'
Content type 'application/x-gzip' length 1490057 bytes (1.4 MB)
opened URL
==================================================
downloaded 1.4 MB

ERROR: dependency ‘rJava’ is not available for package ‘rmmseg4j’
* removing ‘/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j’

The downloaded source packages are in
    ‘/private/var/folders/6s/1r0y5cvs1t74zq22wjxj9qw00000gn/T/RtmpqvRiJI/downloaded_packages’
Warning message:
In install.packages("rmmseg4j", repos = "http://R-Forge.R-project.org",  :
  installation of package ‘rmmseg4j’ had non-zero exit status

The rJava dependency is missing, so install it first:

install.packages("rJava")
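You may be able to skip this two-step dance by giving install.packages() both repositories, so that it resolves the rJava dependency itself. This is a sketch under two assumptions: rJava comes from CRAN (https://cloud.r-project.org is just one mirror choice), and rmmseg4j is still hosted on R-Forge. missing_pkgs() is a hypothetical helper, not part of any package. The install calls are wrapped in interactive() so that sourcing the file does not start a download.

```r
# Hypothetical helper: which of `pkgs` cannot be loaded in this R session?
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Assumption: R-Forge hosts rmmseg4j and a CRAN mirror hosts rJava.
repos <- c("http://R-Forge.R-project.org", "https://cloud.r-project.org")

if (interactive()) {
  if (length(missing_pkgs("rJava")) > 0) {
    install.packages("rJava", repos = repos)
  }
  install.packages("rmmseg4j", repos = repos, type = "source")
}
```

rJava still needs a working Java runtime on the machine. The repository setting does not change that.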

Then:

library("rJava")

This pops up the following message:
A Java runtime environment needs to be installed.

Clicking "More Info…" in that dialog leads to
https://support.apple.com/kb/DL1572?locale=en_US
which says that Java for OS X must be installed. It can be downloaded here:
http://supportdownload.apple.com/download.info.apple.com/Apple_Support_Area/Apple_Software_Updates/Mac_OS_X/downloads/031-03190.20140529.Pp3r4/JavaForOSX2014-001.dmg

Once Java is installed, rJava loads:

library("rJava")
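A quick way to confirm that rJava really reaches the Java runtime is to ask the JVM for its version. In this sketch, java_version() is a hypothetical helper built on rJava's .jinit() and .jcall(). It returns NA instead of failing when rJava or Java is missing.

```r
# Hypothetical helper: Java version seen by rJava, or NA if unavailable.
java_version <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) return(NA_character_)
  tryCatch({
    rJava::.jinit()  # start (or attach to) the JVM
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) NA_character_)
}

java_version()
```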

Retrying the rmmseg4j installation then reported this error:

ERROR: failed to lock directory ‘/Library/Frameworks/R.framework/Versions/3.1/Resources/library’ for modifying
Try removing ‘/Library/Frameworks/R.framework/Versions/3.1/Resources/library/00LOCK-rmmseg4j’

So open Terminal and delete the lock directory by hand:

rm -r /Library/Frameworks/R.framework/Versions/3.1/Resources/library/00LOCK-rmmseg4j
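The same cleanup can be done from inside R. In this sketch, remove_stale_lock() is a hypothetical helper. It assumes the lock sits in the first library of .libPaths(), which here would be /Library/Frameworks/R.framework/Versions/3.1/Resources/library. Only run it when no other R process is installing packages into that library.

```r
# Hypothetical helper: delete the 00LOCK-<pkg> directory an interrupted
# install leaves behind; returns TRUE once the lock is gone.
remove_stale_lock <- function(pkg, lib = .libPaths()[1]) {
  lock <- file.path(lib, paste0("00LOCK-", pkg))
  if (dir.exists(lock)) unlink(lock, recursive = TRUE)
  !dir.exists(lock)
}

# e.g. remove_stale_lock("rmmseg4j")
```

An alternative is to pass INSTALL_opts = "--no-lock" to install.packages(). This forwards the option to R CMD INSTALL, which then skips locking altogether. That avoids the error, but it also gives up the protection the lock provides.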

Then reinstall:

install.packages("rmmseg4j", repos="http://R-Forge.R-project.org",type='source')
trying URL 'http://R-Forge.R-project.org/src/contrib/rmmseg4j_0.2-1.tar.gz'
Content type 'application/x-gzip' length 1490057 bytes (1.4 MB)
opened URL
==================================================
downloaded 1.4 MB

* installing *source* package ‘rmmseg4j’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (rmmseg4j)

That completes the installation.

test

library("rmmseg4j")
> mmseg4j("技术、管理等方面的问题需进一步深入分析和验证,事故调查报告的形成仍需要一段时间")
Apr 3, 2015 11:02:08 AM com.chenlb.mmseg4j.Dictionary getDefalutPath
INFO: look up in mmseg.dic.path=null
Apr 3, 2015 11:02:08 AM com.chenlb.mmseg4j.Dictionary getDefalutPath
INFO: look up in classpath=file:/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j/java/lib/mmseg4j-core-1.8.4.jar!/data
Apr 3, 2015 11:02:08 AM com.chenlb.mmseg4j.Dictionary getDefalutPath
WARNING: defalut dic path=file:/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j/java/lib/mmseg4j-core-1.8.4.jar!/data not exist
Apr 3, 2015 11:02:08 AM com.chenlb.mmseg4j.Dictionary loadDic
INFO: chars loaded time=214ms, line=12638, on file=file:/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j/java/lib/mmseg4j-core-1.8.4.jar!/data/chars.dic
Apr 3, 2015 11:02:08 AM com.chenlb.mmseg4j.Dictionary loadDic
INFO: load all dic use time=215ms
Apr 3, 2015 11:02:08 AM com.chenlb.mmseg4j.Dictionary loadUnit
INFO: unit loaded time=1ms, line=22, on file=file:/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j/java/lib/mmseg4j-core-1.8.4.jar!/data/units.dic
[1] "技术 管理 等方面 的 问题 需 进一步 深入分析 和 验证 事故调查 报告 的 形成 仍需 要 一段时间"
Apr 3, 2015 11:02:08 AM com.chenlb.mmseg4j.Dictionary loadDic
INFO: chars loaded time=39ms, line=12638, on file=file:/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j/java/lib/mmseg4j-core-1.8.4.jar!/data/chars.dic
Apr 3, 2015 11:02:08 AM com.chenlb.mmseg4j.Dictionary loadWord
INFO: words loaded time=1ms, line=3, on file=/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j/userDic/words-rmmseg4j.dic
Apr 3, 2015 11:02:09 AM com.chenlb.mmseg4j.Dictionary loadWord
INFO: words loaded time=386ms, line=331374, on file=/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j/userDic/words.dic
Apr 3, 2015 11:02:09 AM com.chenlb.mmseg4j.Dictionary loadDic
INFO: load all dic use time=427ms
Apr 3, 2015 11:02:09 AM com.chenlb.mmseg4j.Dictionary loadUnit
INFO: unit loaded time=0ms, line=22, on file=file:/Library/Frameworks/R.framework/Versions/3.1/Resources/library/rmmseg4j/java/lib/mmseg4j-core-1.8.4.jar!/data/units.dic

rsmartcn

rsmartcn installs the same way, so I won't go through it in detail. The output is below:

install.packages("rsmartcn", repos="http://R-Forge.R-project.org",type='source')
trying URL 'http://R-Forge.R-project.org/src/contrib/rsmartcn_0.1-0.tar.gz'
Content type 'application/x-gzip' length 4866465 bytes (4.6 MB)
opened URL
==================================================
downloaded 4.6 MB

* installing *source* package ‘rsmartcn’ ...
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (rsmartcn)

The downloaded source packages are in
    ‘/private/var/folders/6s/1r0y5cvs1t74zq22wjxj9qw00000gn/T/Rtmp9fgxHm/downloaded_packages’
> library(rsmartcn)
>  smartcn("技术、管理等方面的问题需进一步深入分析和验证,事故调查报告的形成仍需要一段时间")
[1] "技术 管理 等 方面 的 问题 需 进一步 深入 分析 和 验证 事故 调查 报告 的 形成 仍 需要 一 段 时间"

Rwordseg

install

install.packages("Rwordseg", repos = "http://R-Forge.R-project.org",type='source')
trying URL 'http://R-Forge.R-project.org/src/contrib/Rwordseg_0.2-1.tar.gz'
Content type 'application/x-gzip' length 5445754 bytes (5.2 MB)
opened URL
==================================================
downloaded 5.2 MB

* installing *source* package ‘Rwordseg’ ...
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Apr 3, 2015 11:11:36 AM org.ansj.util.MyStaticValue <clinit>
WARNING: not find library.properties in classpath use it by default !
Apr 3, 2015 11:11:36 AM org.ansj.library.UserDefineLibrary loadLibrary
WARNING: init userLibrary  waring :library/default.dic because : not find that file or can not to read !
Apr 3, 2015 11:11:36 AM org.ansj.library.UserDefineLibrary initAmbiguityLibrary
WARNING: init ambiguity  waring :library/ambiguity.dic because : not find that file or can not to read !
Apr 3, 2015 11:11:36 AM org.ansj.library.UserDefineLibrary loadFile
INFO: init user userLibrary ok path is : /Library/Frameworks/R.framework/Versions/3.1/Resources/library/Rwordseg/config/userdic
Apr 3, 2015 11:11:38 AM org.ansj.library.InitDictionary init
INFO: init core library ok use time :2209
Apr 3, 2015 11:11:39 AM org.ansj.library.NgramLibrary <clinit>
INFO: init ngram ok use time :988
* DONE (Rwordseg)

The downloaded source packages are in
    ‘/private/var/folders/6s/1r0y5cvs1t74zq22wjxj9qw00000gn/T/Rtmp9fgxHm/downloaded_packages’
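With all three packages installed, a one-line check of what actually ended up in the library saves reading back through the install logs. installed_version() is a hypothetical helper. It reports NA for anything that cannot be loaded, such as the Java-based packages when rJava is broken.

```r
# Hypothetical helper: installed version of each package, NA if it cannot be loaded.
installed_version <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

installed_version(c("rJava", "rmmseg4j", "rsmartcn", "Rwordseg"))
```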

test (load the package first with library(Rwordseg)):

> segmentCN("今天是星期五,天气阴")
[1] "今天"   "是"     "星期五" "天气"   "阴"  

> segmentCN("技术、管理等方面的问题需进一步深入分析和验证,事故调查报告的形成仍需要一段时间")
 [1] "技术"   "管理"   "等"     "方面"   "的"     "问题"   "需"     "进一步" "深入"  
[10] "分析"   "和"     "验证"   "事故"   "调查"   "报告"   "的"     "形成"   "仍"    
[19] "需要"   "一段"   "时间" 
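The three packages segment the same sentence differently, and they return different shapes. rmmseg4j and rsmartcn return one space-separated string, while segmentCN() returns a character vector. The base-R sketch below puts the outputs shown above on an equal footing. The strings are copied from the test runs, and to_tokens() is a hypothetical helper.

```r
# Outputs copied from the tests above.
mmseg   <- "技术 管理 等方面 的 问题 需 进一步 深入分析 和 验证 事故调查 报告 的 形成 仍需 要 一段时间"
smart   <- "技术 管理 等 方面 的 问题 需 进一步 深入 分析 和 验证 事故 调查 报告 的 形成 仍 需要 一 段 时间"
wordseg <- c("技术", "管理", "等", "方面", "的", "问题", "需", "进一步", "深入",
             "分析", "和", "验证", "事故", "调查", "报告", "的", "形成", "仍",
             "需要", "一段", "时间")

# Hypothetical helper: split a space-separated segmentation into tokens.
to_tokens <- function(s) strsplit(s, " ", fixed = TRUE)[[1]]

tokens <- list(rmmseg4j = to_tokens(mmseg), rsmartcn = to_tokens(smart),
               Rwordseg = wordseg)

lengths(tokens)                            # rmmseg4j 17, rsmartcn 22, Rwordseg 21
setdiff(tokens$rsmartcn, tokens$Rwordseg)  # "一" "段"
```

All three keep the same characters once joined back together. rmmseg4j favours longer words (等方面, 深入分析, 事故调查). rsmartcn splits 一段 into 一 and 段, where Rwordseg keeps it whole.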