The Cora Dataset

Understanding the Cora Dataset
Reposted from: http://blog.sina.com.cn/s/blog_4c98b96001000boc.html (blog: 苯苯的小田园)

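This took a lot of searching, so I am writing it down. Thanks to the paper "Object Identification with
Attribute-Mediated Dependences" for pointing to the source of the Cora dataset:
http://www.cs.umass.edu/~mccallum/data/ (if the copied link does not open, type it into the address bar by hand)
The paper "A Pitfall and Solution in Multi-Class Feature Selection for Text Classification" was also helpful: Cora has 6 top-level categories and 36 subcategories. That finally resolved my relevance problem.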

(a) cora-refs.tar.gz dataset
Cora Citation Matching [reference matching, object correspondence] Text of citations hand-clustered into groups referring to the same paper.
(b) cora-ie.tar.gz dataset
Cora Information Extraction [information extraction] Research paper headers and citations, with labeled segments for authors, title, institutions, venue, date, page numbers and several other fields.
(c) cora-classify.tar.gz dataset
Cora Research Paper Classification [relational document classification]
Research papers classified into a topic hierarchy with 73 leaves. We call this a relational data set, because the citations provide relations among papers.
(d) cora-hmm.tar.gz
Cora HMM is the C implementation of HMMs used for information extraction in Cora. It was written by Kristie Seymore.


Cora readme

Note that in Cora there are two types of papers: those we found on the
Web, and those that are referenced in bibliography sections. It is
possible that a paper we found on the Web is also referenced by other
papers.


FILE SUMMARY:

* The file 'papers' contains limited information on the papers we found
on the Web.

* The file 'citations' contains the citation graph.

* The file 'classifications' contains class labels.

* The directory `extractions' contains the extracted authors, title,
abstract, etc, plus the references (and in some cases surrounding
text) from the postscript papers we found on the Web.


PAPERS

The file `papers' has a list of all the postscript file papers.
Three fields, tab separated:

<id> <filename> <citation string>

There are about 52000 lines in this file, but there are a bunch
of papers that have more than one postscript file. If you eliminate
lines with duplicate ids there are about 37000 papers. Note the
citation string is either (1) an arbitrary bibliography reference to
the paper, if one was made or (2) a constructed entry based on the
authors and title extracted from the postscript file.
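The de-duplication step described above can be sketched in Python. This is a minimal sketch, not part of the original README: the function name `read_papers` and the sample rows are made up for illustration, and it assumes that keeping the first line seen for each id is acceptable.

```python
def read_papers(lines):
    """Map each paper id to (filename, citation string).

    Assumes three tab-separated fields per line, as the README states.
    When an id appears on several lines (one per postscript file),
    only the first line is kept.
    """
    papers = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip lines that do not have exactly three fields
        paper_id, filename, citation = parts
        papers.setdefault(paper_id, (filename, citation))
    return papers


# Hypothetical sample rows; the real file would be opened with open("papers").
sample = [
    "1\thttp:##example.org#a.ps\tSmith. A paper. 1997.\n",
    "1\thttp:##example.org#a.ps.gz\tSmith. A paper. 1997.\n",
    "2\thttp:##example.org#b.ps\tJones. Another paper. 1998.\n",
]
papers = read_papers(sample)
print(len(papers), "unique paper ids")
```

On the real file, `read_papers(open("papers"))` should leave roughly the 37000 entries mentioned above, although the exact count is not verified here.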


CITATIONS

The file 'citations' has the citation graph. Two fields, tab
separated:

<referring_id> <cited_id>

The referring_id is the id of the paper that has the bibliography
section (always one we have postscript for). The cited_id is the
paper referenced (we may or may not have postscript for it). There
are about 715000 citations.
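As a rough illustration of how the two-column file maps onto a graph, the sketch below builds forward and reverse adjacency sets. The function name `read_citations` and the sample ids are hypothetical; only the tab-separated two-field layout comes from the README.

```python
from collections import defaultdict


def read_citations(lines):
    """Build two adjacency maps from '<referring_id>\\t<cited_id>' lines.

    cites[p]    -> set of ids that paper p references
    cited_by[p] -> set of ids whose bibliography references p
    """
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        referring_id, cited_id = fields
        cites[referring_id].add(cited_id)
        cited_by[cited_id].add(referring_id)
    return cites, cited_by


# Hypothetical sample; the real input would be open("citations").
sample = ["10\t20\n", "10\t30\n", "40\t20\n"]
cites, cited_by = read_citations(sample)
print(sorted(cites["10"]))     # papers referenced by paper 10
print(sorted(cited_by["20"]))  # papers that reference paper 20
```

Because a cited paper may have no postscript file, ids that appear only in the second column will not necessarily have an entry in the 'papers' file.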


CITATIONS.WITHAUTHORS

The file 'citations.withauthors' contains another copy of the
citation graph. This time we have also included authors and file
names of each paper in addition to each paper's unique paper_id and
the paper_id's of the references they make. The format of this file
is:

***
this_paper_id
filename
id_of_first_cited_paper
id_of_second_cited_paper
.
.
.
*
Author#1 (of this paper)
Author#2
.
.
.
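One possible reading of this record layout is sketched below in Python. It assumes that a line containing only `***` starts a record, that a line containing only `*` separates the cited ids from the authors, and that the dots in the listing above are ellipses rather than literal lines. The function name `parse_withauthors` and the sample values are invented for illustration.

```python
def parse_withauthors(lines):
    """Parse records of the form shown in the README into dictionaries."""
    records = []
    current = None
    in_authors = False
    for raw in lines:
        line = raw.strip()
        if line == "***":
            # Start of a new record.
            current = {"paper_id": None, "filename": None,
                       "cited_ids": [], "authors": []}
            records.append(current)
            in_authors = False
        elif current is None or not line:
            continue  # text before the first record, or a blank line
        elif line == "*":
            in_authors = True  # everything after this is an author name
        elif in_authors:
            current["authors"].append(line)
        elif current["paper_id"] is None:
            current["paper_id"] = line
        elif current["filename"] is None:
            current["filename"] = line
        else:
            current["cited_ids"].append(line)
    return records


# Hypothetical sample with two records.
sample = [
    "***", "100", "http:##example.org#a.ps", "200", "300",
    "*", "A. Smith", "B. Jones",
    "***", "400", "http:##example.org#b.ps",
    "*", "C. Lee",
]
records = parse_withauthors(sample)
for record in records:
    print(record["paper_id"], record["cited_ids"], record["authors"])
```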

CLASSIFICATIONS

The file `classifications' contains the research topic classifications
for each of the files. The format of the file is:
"filename"+"\t"+"classification". For example:

http:##www.ri.cmu.edu#afs#cs#user#alex#docs#idvl#dl97.ps /Information_Retrieval/Retrieval/


The file name is the URL where the paper came from, translated to a file
name by changing / to #. The classification is the label name in the Cora
directory hierarchy.

Note that the class labels were not perfectly assigned.
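The file-name convention and the label path can be unpacked with a few lines of Python. This is a sketch: the helper names `filename_to_url` and `parse_classification` are hypothetical, and the reverse mapping assumes the original URL contained no literal `#`, which the README does not guarantee.

```python
def filename_to_url(filename):
    """Undo the '/' -> '#' substitution (lossy if the URL itself had a '#')."""
    return filename.replace("#", "/")


def parse_classification(line):
    """Split a 'filename<TAB>classification' line.

    Returns the file name and the label as a list of hierarchy levels.
    """
    filename, label = line.rstrip("\n").split("\t", 1)
    topics = [part for part in label.strip().split("/") if part]
    return filename, topics


# The example line from the README, with an explicit tab between the fields.
line = ("http:##www.ri.cmu.edu#afs#cs#user#alex#docs#idvl#dl97.ps"
        "\t/Information_Retrieval/Retrieval/\n")
filename, topics = parse_classification(line)
url = filename_to_url(filename)
print(url)
print(topics)
```

Keeping the label as a list makes it easy to work at either level of the hierarchy, for example `topics[0]` for the top-level category.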


EXTRACTIONS

The directory 'extractions' contains 52906 files, one for each
postscript paper that we found on the Web. The directory contains so
many files that you probably don't want to 'ls' it. Commands like
`find extractions -print' will probably work more efficiently.

Each filename in the 'papers' file should have a file here. I believe
there are also some (perhaps many?) extra files in this tarball that
are not in paper-data that you can just ignore.
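In Python, a large directory like this can be walked lazily with `os.scandir`, which avoids building and sorting the full listing first. The sketch below also shows one way to spot the extra files mentioned above. The helper name `iter_extraction_files` and the file names in the demo are hypothetical, and the demo runs against a temporary directory rather than the real tarball.

```python
import os
import tempfile


def iter_extraction_files(directory):
    """Yield file names in the directory one at a time, unsorted."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.name


# Demo on a throwaway directory standing in for 'extractions'.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.ps", "b.ps", "extra.ps"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("")
    # File names that would come from the second column of 'papers'.
    known = {"a.ps", "b.ps"}
    extras = sorted(name for name in iter_extraction_files(tmp)
                    if name not in known)

print(extras)  # files present on disk but not listed in 'papers'
```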

Each line of each file corresponds to some bit of data about the
postscript file. Most of the MIME-like field tags are
straightforward and self-explanatory. A few notes:

The fields URL, Refering-URL, and Root-URL are given by the spider.
All other fields are extracted automatically from the text, some by
hand-coded regular expressions and some by an HMM information
extractor.

The fields Abstract-found and Intro-found are binary valued indicators
of whether Abstract and/or Introduction sections were found by some
regular expression matching in the paper.

Each Reference field is one bibliography entry found at the end of the
paper. Note they are marked up using SGML-like tags. Each Reference
field is optionally followed by one (and possibly more?)
Reference-context fields that are snippets of the postscript file
around where the reference was cited.
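The pairing of Reference and Reference-context fields can be sketched in Python as follows. The README does not spell out the exact line syntax, so this sketch assumes each line looks like `Tag: value` in the usual MIME style. The function names, the sample values, and the SGML-like tag names inside the sample are all made up for illustration.

```python
def parse_extraction(lines):
    """Return (tag, value) pairs in file order, assuming 'Tag: value' lines."""
    fields = []
    for raw in lines:
        tag, sep, value = raw.rstrip("\n").partition(":")
        if not sep:
            continue  # no colon, so not a tagged line under this assumption
        fields.append((tag.strip(), value.strip()))
    return fields


def group_references(fields):
    """Attach each Reference-context to the Reference that precedes it."""
    references = []
    for tag, value in fields:
        if tag == "Reference":
            references.append({"reference": value, "contexts": []})
        elif tag == "Reference-context" and references:
            references[-1]["contexts"].append(value)
    return references


# Hypothetical file content.
sample = [
    "URL: http://example.org/a.ps",
    "Abstract-found: 1",
    "Reference: <author> A. Smith </author> <title> A paper </title>",
    "Reference-context: as shown in [1], the method",
    "Reference: <author> B. Jones </author>",
]
fields = parse_extraction(sample)
references = group_references(fields)
print(len(references), "references")
print(references[0]["contexts"])
```

Splitting on the first colon only keeps URL values intact, since everything after `URL:` stays in the value.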