Open Tools for Machine Learning

I. Information Retrieval

1. Lemur/Indri
The Lemur Toolkit for Language Modeling and Information Retrieval
http://www.lemurproject.org/
Indri: Lemur's latest search engine
2. Lucene/Nutch
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
Lucene is a top-level Apache open-source project released under the Apache 2.0 license. It is written entirely in Java and has ports to Perl, C/C++, .NET, and other languages.
http://lucene.apache.org/
http://www.nutch.org/
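At its core, a full-text engine like Lucene builds an inverted index mapping each term to the documents that contain it, and answers conjunctive queries by intersecting those postings. A minimal Python sketch of the idea (a toy illustration, not Lucene's actual API):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-query: return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = ["Lucene is a search engine library",
        "Nutch builds a web search engine on Lucene",
        "Wget retrieves files over HTTP"]
idx = build_inverted_index(docs)
print(search(idx, "search engine"))  # → {0, 1}
```

Real engines add tokenization, stemming, compressed postings lists, and ranked (rather than boolean) retrieval on top of this core structure.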
3. WGet
GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
http://www.gnu.org/software/wget/wget.html


II. Natural Language Processing
1. EGYPT: A Statistical Machine Translation Toolkit

http://www.clsp.jhu.edu/ws99/projects/mt/
Includes four tools, among them GIZA.
2. GIZA++ (Statistical Machine Translation)
http://www.fjoch.com/GIZA++.html
GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the summer workshop in 1999 at the Center for Language and Speech Processing at Johns Hopkins University (CLSP/JHU). GIZA++ adds many additional features; the extensions were designed and written by Franz Josef Och.
Franz Josef Och has worked at RWTH Aachen University in Germany, at ISI (the Information Sciences Institute of the University of Southern California), and at Google. A Windows port of GIZA++ is now available, with good support for IBM Models 1-5.
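The IBM models that GIZA++ implements estimate word-translation probabilities t(f|e) from sentence pairs via expectation-maximization. A toy IBM Model 1 trainer in Python (a didactic sketch of the EM recipe, not GIZA++'s implementation):

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """EM for IBM Model 1. corpus: list of (source_words, target_words) pairs.
    Returns t[(f, e)], the probability of target word f given source word e."""
    t = defaultdict(lambda: 1.0)  # uniform start over co-occurring pairs
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in corpus:
            for f in tgt:
                # E-step: how much each source word "explains" f.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts.
        t = defaultdict(float,
                        {(f, e): count[(f, e)] / total[e] for (f, e) in count})
    return t

corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"]),
          (["a", "book"], ["ein", "buch"])]
t = ibm_model1(corpus)
# "das" co-occurs with "the" twice, so it should align strongly with "the".
print(round(t[("das", "the")], 2))
```

Models 2-5 add alignment positions, fertility, and distortion on top of this lexical model; GIZA++ also uses far more careful initialization and pruning.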
3. PHARAOH (Statistical Machine Translation)
http://www.isi.edu/licensed-sw/pharaoh/
a beam search decoder for phrase-based statistical machine translation models
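A beam-search decoder expands partial translation hypotheses but keeps only the k highest-scoring ones at each step. A generic, heavily simplified Python sketch of the idea, decoding over a per-position candidate table (Pharaoh's phrase-based search over reordering and coverage is far richer):

```python
import math

def beam_search(candidates, beam_width=2):
    """candidates[i]: list of (token, probability) options for position i.
    Returns (best_sequence, log_score) under the beam approximation."""
    beam = [([], 0.0)]  # (partial sequence, log-probability)
    for options in candidates:
        expanded = [(seq + [tok], score + math.log(p))
                    for seq, score in beam
                    for tok, p in options]
        # Prune to the beam_width highest-scoring hypotheses.
        beam = sorted(expanded, key=lambda h: h[1], reverse=True)[:beam_width]
    return beam[0]

table = [[("the", 0.6), ("a", 0.4)],
         [("house", 0.7), ("home", 0.3)]]
seq, score = beam_search(table)
print(seq)  # → ['the', 'house']
```

The beam width trades search accuracy for speed: width 1 is greedy decoding, while an unbounded width is exhaustive search.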
4. OpenNLP:
http://opennlp.sourceforge.net/
Includes more than 20 tools, among them a Maxent package.
btw: these SMT tools all seem to favor Egypt-themed names: GIZA, PHARAOH, Cairo, and so on. Och developed GIZA++ while at ISI, and PHARAOH was written by Philipp Koehn, also of ISI; the connections really are tangled.
5. MINIPAR by Dekang Lin (Univ. of Alberta, Canada)
MINIPAR is a broad-coverage parser for the English language. An evaluation with the SUSANNE corpus shows that MINIPAR achieves about 88% precision and 80% recall with respect to dependency relationships. MINIPAR is very efficient: on a Pentium II 300 with 128 MB of memory, it parses about 300 words per second.
The binary can be downloaded for free after filling out a form.
http://www.cs.ualberta.ca/~lindek/minipar.htm
6. WordNet
http://wordnet.princeton.edu/
WordNet is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
WordNet was developed by the Cognitive Science Laboratory at Princeton University under the direction of Professor George A. Miller (Principal Investigator).
The latest version of WordNet is 2.1 (for Windows and Unix-like OSes), available as binaries, source, and documentation.
An online version of WordNet is available at http://wordnet.princeton.edu/perl/webwn
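The synonym sets are linked by relations such as hypernymy ("is-a"), which is what most WordNet-based similarity measures traverse. A self-contained toy sketch of that structure, using a tiny hand-made graph rather than the real database (NLTK, for instance, ships a reader for the real one):

```python
# Miniature hand-made hypernym ("is-a") graph; real WordNet has ~100k+ synsets.
hypernym = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "cat": "feline",
    "feline": "carnivore",
}

def hypernym_path(word):
    """Walk the is-a chain from a word to the top of this toy taxonomy."""
    path = [word]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path

def lowest_common_hypernym(a, b):
    """First shared ancestor of two words, as WordNet similarity measures use."""
    ancestors = set(hypernym_path(a))
    for node in hypernym_path(b):
        if node in ancestors:
            return node
    return None

print(hypernym_path("dog"))                  # → ['dog', 'canine', 'carnivore', 'mammal']
print(lowest_common_hypernym("dog", "cat"))  # → carnivore
```

Path lengths through the lowest common hypernym are the basis of several classic WordNet similarity scores.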
7. HowNet
http://www.keenage.com/
HowNet is an on-line common-sense knowledge base unveiling inter-conceptual relations and inter-attribute relations of concepts as connoting in lexicons of the Chinese and their English equivalents.
Developed by Zhendong Dong and Qiang Dong of CAS; it is a resource similar in spirit to WordNet.
8. Statistical Language Modeling Toolkit
http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html
The CMU-Cambridge Statistical Language Modeling toolkit is a suite of UNIX software tools to facilitate the construction and testing of statistical language models.
9. SRI Language Modeling Toolkit
http://www.speech.sri.com/projects/srilm/
SRILM is a toolkit for building and applying statistical language models (LMs), primarily for use in speech recognition, statistical tagging and segmentation. It has been under development in the SRI Speech Technology and Research Laboratory since 1995.
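Both toolkits build n-gram language models: counts of word sequences turned into smoothed conditional probabilities. A minimal bigram model with add-one (Laplace) smoothing in Python (a teaching sketch; SRILM and the CMU-Cambridge toolkit implement much better smoothing such as Good-Turing and Kneser-Ney):

```python
from collections import defaultdict

def train_bigram(sentences):
    """Count bigrams and unigram contexts over <s>/</s>-padded sentences."""
    bigram = defaultdict(int)
    unigram = defaultdict(int)
    vocab = set()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(words)
        for prev, cur in zip(words, words[1:]):
            bigram[(prev, cur)] += 1
            unigram[prev] += 1
    return bigram, unigram, vocab

def prob(bigram, unigram, vocab, prev, cur):
    """Add-one smoothed P(cur | prev)."""
    return (bigram[(prev, cur)] + 1) / (unigram[prev] + len(vocab))

sents = ["the cat sat", "the dog sat", "the cat ran"]
bg, ug, v = train_bigram(sents)
print(prob(bg, ug, v, "the", "cat"))  # P(cat | the) exceeds P(dog | the)
```

The sentence probability is then the product of these conditionals, usually reported as perplexity over a test set.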
10. ReWrite Decoder
http://www.isi.edu/licensed-sw/rewrite-decoder/
The ISI ReWrite Decoder Release 1.0.0a by Daniel Marcu and Ulrich Germann. It is a program that translates from one natural language into another using statistical machine translation.
11. GATE (General Architecture for Text Engineering)
http://gate.ac.uk/
A Java Library for Text Engineering
12. NLTK (Natural Language Toolkit)
http://nltk.sourceforge.net/index.php/Main_Page


III. Machine Learning
1. YASMET: Yet Another Small MaxEnt Toolkit (Statistical Machine Learning)

http://www.fjoch.com/YASMET.html
Written by Franz Josef Och. In addition, the OpenNLP project includes a Java MaxEnt tool that estimates parameters with GIS; Le Zhang of Northeastern University (currently studying in the UK) has ported it to C++.
2. LibSVM
Developed by Chih-Jen Lin of National Taiwan University (NTU); interfaces are available for C++, Java, Perl, C#, and other languages.
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.
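LIBSVM (like SVMlight) reads training data in a sparse text format: one example per line, `label index:value ...`, with feature indices starting at 1 and zero-valued features omitted. A small Python helper that writes and parses that format (my own sketch; LIBSVM also ships official Python bindings):

```python
def to_libsvm_line(label, features):
    """features: dict {index: value} with 1-based indices; zeros are omitted."""
    parts = [str(label)]
    parts += [f"{i}:{v:g}" for i, v in sorted(features.items()) if v != 0]
    return " ".join(parts)

def parse_libsvm_line(line):
    """Inverse of to_libsvm_line: recover the label and sparse feature dict."""
    fields = line.split()
    label = float(fields[0])
    features = {int(i): float(v)
                for i, v in (f.split(":") for f in fields[1:])}
    return label, features

line = to_libsvm_line(1, {1: 0.5, 3: 2.0, 2: 0.0})
print(line)  # → 1 1:0.5 3:2
label, feats = parse_libsvm_line(line)
```

The same format is accepted by CLUTO-style tools and many later libraries, which makes it a convenient interchange format for sparse data.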
3. SVM Light
Developed by Thorsten Joachims (now at Cornell) while at the University of Dortmund; after LibSVM it is the best-known SVM package. Open source, written in C, and also applicable to ranking problems.
http://svmlight.joachims.org/
4. CLUTO
http://www-users.cs.umn.edu/~karypis/cluto/
a software package for clustering low- and high-dimensional datasets
The package is distributed only as executables/libraries; source code is not available for download.
5. CRF++
http://chasen.org/~taku/software/CRF++/
Yet Another CRF toolkit for segmenting/labelling sequential data
CRFs (Conditional Random Fields) evolved from HMMs and MEMMs and are widely used in IE, IR, and NLP.
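CRFs, like the HMMs they generalize, label a whole sequence at once, and the best label sequence is recovered with the Viterbi algorithm. A minimal Viterbi decoder over given per-position scores (a didactic sketch; CRF++ learns these scores from feature templates):

```python
def viterbi(emit, trans, labels):
    """emit[t][y]: score of label y at position t; trans[(p, y)]: transition score.
    Scores are additive (log-domain); returns the highest-scoring label sequence."""
    best = {y: emit[0][y] for y in labels}  # best score of a path ending in y
    back = []                               # backpointers per position
    for t in range(1, len(emit)):
        new_best, choices = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + trans[(p, y)])
            new_best[y] = best[prev] + trans[(prev, y)] + emit[t][y]
            choices[y] = prev
        back.append(choices)
        best = new_best
    y = max(labels, key=lambda lab: best[lab])  # best final label
    path = [y]
    for choices in reversed(back):              # follow backpointers
        y = choices[y]
        path.append(y)
    return path[::-1]

labels = ["B", "I"]
emit = [{"B": 2.0, "I": 0.0}, {"B": 0.0, "I": 1.0}, {"B": 1.0, "I": 0.5}]
trans = {("B", "B"): 0.0, ("B", "I"): 1.0, ("I", "B"): 0.5, ("I", "I"): 0.0}
print(viterbi(emit, trans, labels))  # → ['B', 'I', 'B']
```

Training a CRF additionally requires the forward-backward algorithm to compute expected feature counts; decoding alone is just this dynamic program.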
6. SVM Struct
http://www.cs.cornell.edu/People/tj/svm_light/svm_struct.html
Like SVM Light, developed by Thorsten Joachims of Cornell.
SVMstruct is a Support Vector Machine (SVM) algorithm for predicting multivariate outputs. It performs supervised learning by approximating a mapping
h: X --> Y
using labeled training examples (x1,y1), ..., (xn,yn).
Unlike regular SVMs, however, which consider only univariate predictions like in classification and regression, SVMstruct can predict complex objects y like trees, sequences, or sets. Examples of problems with complex outputs are natural language parsing, sequence alignment in protein homology detection, and Markov models for part-of-speech tagging.
SVMstruct can be thought of as an API for implementing different kinds of complex prediction algorithms. Currently, we have implemented the following learning tasks:
SVMmulticlass: Multi-class classification. Learns to predict one of k mutually exclusive classes. This is probably the simplest possible instance of SVMstruct and serves as a tutorial example of how to use the programming interface.
SVMcfg: Learns a weighted context free grammar from examples. Training examples (e.g. for natural language parsing) specify the sentence along with the correct parse tree. The goal is to predict the parse tree of new sentences.
SVMalign: Learning to align sequences. Given examples of how sequence pairs align, the goal is to learn the substitution matrix as well as the insertion and deletion costs of operations so that one can predict alignments of new sequences.
SVMhmm: Learns a Markov model from examples. Training examples (e.g. for part-of-speech tagging) specify the sequence of words along with the correct assignment of tags (i.e. states). The goal is to predict the tag sequences for new sentences.
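All of these instances share one recipe: learn a weight vector w and predict h(x) = argmax over y of w·Ψ(x, y) for a task-specific joint feature map Ψ. A toy sketch with multiclass labels as the "structure" (mirroring SVMmulticlass, the simplest instance), but trained with the structured perceptron update instead of SVMstruct's quadratic program:

```python
from collections import defaultdict

def joint_features(x, y):
    """Psi(x, y): copy x's features into the block belonging to class y."""
    return {(y, i): v for i, v in enumerate(x)}

def predict(w, x, classes):
    """h(x) = argmax_y w . Psi(x, y)."""
    return max(classes,
               key=lambda y: sum(w[k] * v
                                 for k, v in joint_features(x, y).items()))

def train(data, classes, epochs=10):
    """Structured-perceptron training (a stand-in for SVMstruct's QP solver)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(w, x, classes)
            if y_hat != y:
                # Move weights toward the true output, away from the prediction.
                for k, v in joint_features(x, y).items():
                    w[k] += v
                for k, v in joint_features(x, y_hat).items():
                    w[k] -= v
    return w

data = [([1.0, 0.0], "a"), ([0.0, 1.0], "b")]
w = train(data, ["a", "b"])
print(predict(w, [1.0, 0.0], ["a", "b"]))  # → a
```

Swapping in trees or tag sequences for y (with an argmax computed by CKY or Viterbi) gives the SVMcfg and SVMhmm settings; the max-margin training in SVMstruct replaces the perceptron update with constraint generation over the same argmax.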
7. MALLET
MAchine Learning for LanguagE Toolkit
http://mallet.cs.umass.edu/index.php

IV. Misc:
1. WinMerge: a tool for comparing text content and finding the differences between two versions of a program.

http://winmerge.sourceforge.net/
2. OpenPerlIDE: an open-source Perl editor with built-in compilation and line-by-line debugging.
http://open-perl-ide.sourceforge.net/
ps: the best editor I have ever seen is still VS .NET: every function gets a +/- marker supporting expand/collapse, it supports region copy/cut/paste, Ctrl+C/Ctrl+X/Ctrl+V act on a whole line at a time, Ctrl+K+C/Ctrl+K+U comment/uncomment multiple lines, and more... Visual Studio .NET is really cool :D
3. Berkeley DB
http://www.sleepycat.com/
Berkeley DB is not a relational database; it is described as an embedded database: in client/server terms, its client and server share a single address space. Because it originally grew out of the file system, it is closer to a dictionary-style key-value database, and database files can be serialized to disk, so it is not limited by memory size. BDB has a sibling product, Berkeley DB XML, an XML database that stores data as XML documents. BDB has been embedded in products from Microsoft, Google, HP, Ford, Motorola, and other companies.
Berkeley DB (libdb) is a programmatic toolkit that provides embedded database support for both traditional and client/server applications. It includes b+tree, queue, extended linear hashing, fixed, and variable-length record access methods, transactions, locking, logging, shared memory caching, database recovery, and replication for highly available systems. DB supports C, C++, Java, PHP, and Perl APIs.
It turns out that at a basic level Berkeley DB is just a very high performance, reliable way of persisting dictionary style data structures - anything where a piece of data can be stored and looked up using a unique key. The key and the value can each be up to 4 gigabytes in length and can consist of anything that can be crammed in to a string of bytes, so what you do with it is completely up to you. The only operations available are "store this value under this key", "check if this key exists" and "retrieve the value for this key" so conceptually it's pretty simple - the complicated stuff all happens under the hood.
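That store/check/retrieve interface maps directly onto Python's standard-library `dbm` module, a generic front-end over Berkeley-DB-style key-value engines. A small sketch of the same three operations:

```python
import dbm
import os
import tempfile

# Open (and create, mode "c") a persistent key-value store on disk.
path = os.path.join(tempfile.mkdtemp(), "example.db")
with dbm.open(path, "c") as db:
    db[b"alice"] = b"engineer"    # "store this value under this key"
    print(b"alice" in db)         # "check if this key exists" → True
    print(db[b"alice"].decode())  # "retrieve the value for this key" → engineer

# The dictionary was persisted: data survives closing and reopening.
with dbm.open(path, "r") as db:
    assert db[b"alice"] == b"engineer"
```

Everything is bytes in, bytes out; any richer structure (records, objects) is serialized by the application, exactly as the paragraph above describes.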
case study:
Ask Jeeves uses Berkeley DB to provide an easy-to-use tool for searching the Internet.
Microsoft uses Berkeley DB for the Groove collaboration software.
AOL uses Berkeley DB for search tool meta-data and other services.
Hitachi uses Berkeley DB in its directory services server product.
Ford uses Berkeley DB to authenticate partners who access Ford's Web applications.
Hewlett Packard uses Berkeley DB in several products, including storage, security and wireless software.
Google uses Berkeley DB High Availability for Google Accounts.
Motorola uses Berkeley DB to track mobile units in its wireless radio network products.

4. R
http://www.r-project.org/
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.
The R statistics environment is similar to MATLAB; both are used for scientific computing.
