Table of Contents
- 1. Word Segmentation
- 2. Word Prediction
- 3. Textual Entailment
- 4. Automatic Speech Recognition
- 5. Automatic Summarisation
- 6. Text Correction
- 7. Grapheme to Phoneme
- 8. Paraphrase Detection and Question Answering
- 9. Pinyin-To-Chinese
- 10. Sentiment Analysis
- 11. Sign Language Recognition
- 12. Part-of-Speech Tagging (POS), Named Entity Recognition (NER), Syntactic Parsing (parser), Semantic Role Labeling (SRL), and More
- 13. Word Stemming
- 14. Language Identification
- 15. Machine Translation
- 16. Paraphrase Generation
- 17. Relationship Extraction
- 18. Sentence Boundary Disambiguation
- 19. Event Extraction
- 20. Word Sense Disambiguation
- 21. Named Entity Disambiguation
- 22. Humor Detection
- 23. Sarcasm Detection
- 24. Entity Linking
- 25. Coreference Resolution
- 26. Keyphrase Extraction and Social Tag Suggestion
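A note up front: the projects listed here are all demo-level, in the style of paper implementations rather than traditional engineering projects. The methods they use are generally more advanced than what is common in industry, which makes them well suited for practice.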
1. Word Segmentation
GitHub - chqiwang/convseg: Convolutional neural network and word embeddings for Chinese word segmentation. CNN-based Chinese word segmentation; data and code are provided.
The corresponding paper is Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation, IJCNLP 2017.
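To make the task itself concrete before diving into the CNN model, here is a minimal sketch of a much simpler baseline: dictionary-based forward maximum matching. This is not the method used by convseg. The function name and the toy vocabulary are hypothetical and exist only for illustration.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: take the longest vocab word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a vocabulary hit.
        # A single character is always accepted as the fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"自然", "语言", "自然语言", "处理", "很", "有趣"}

segmented = forward_max_match("自然语言处理很有趣", TOY_VOCAB)
print(segmented)
```

Greedy matching fails on ambiguous strings where the longest match is not the right one, which is one reason learned models such as the CNN above are worth studying.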
2. Word Prediction
GitHub - Kyubyong/word_prediction: Word Prediction using Convolutional Neural Networks. CNN-based word prediction; data and code are provided.
3. Textual Entailment
GitHub - Steven-Hewitt/Entailment-with-Tensorflow: Accompanying notebook for the Entailment with Tensorflow article. Textual entailment with TensorFlow; data and code are provided.
4. Automatic Speech Recognition
GitHub - buriburisuri/speech-to-text-wavenet: Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow. Sentence-level speech recognition built on DeepMind's WaveNet and TensorFlow.
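5. Automatic Summarisation
GitHub - PKULCWM/PKUSUMSUM: First publish for PKUSUMSUM. A collection of automatic summarization methods from Prof. Wan Xiaojun's group at Peking University. It implements methods from many of their papers and supports single-document, multi-document, and topic-focused multi-document summarization.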
6. Text Correction
GitHub - atpaino/deep-text-corrector: Deep learning models trained to correct input errors in short, message-like text. Deep-learning-based text correction; data and code are provided.
7. Grapheme to Phoneme
GitHub - cmusphinx/g2p-seq2seq: G2P with Tensorflow. Built on the widely popular Transformer architecture; data and code are provided.
8. Paraphrase Detection and Question Answering
Paralex: Paraphrase-Driven Learning for Open Question Answering. Open-domain question answering based on paraphrase-driven learning.
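9. Pinyin-To-Chinese
10. Sentiment Analysis
Sentiment analysis covers too much ground, and I have not yet found a project that is reasonably comprehensive. Two resources that are good for practice: Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, and SenticNet.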
11. Sign Language Recognition
https://signall.us/, a project whose sign language recognition is very mature.
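12. Part-of-Speech Tagging (POS), Named Entity Recognition (NER), Syntactic Parsing (parser), Semantic Role Labeling (SRL), and More
GitHub - HIT-SCIR/ltp: Language Technology Platform. It includes code, models, and data, comes with detailed documentation, and performs well.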
13. Word Stemming
GitHub - snowballstem/snowball: Snowball compiler and stemming algorithms. The stemming quality is quite good.
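For readers new to stemming, the idea can be sketched with a naive suffix stripper. This is a toy illustration only and is not the Snowball algorithm, which applies many more rules and conditions. The suffix list, the `min_stem` parameter, and the function name are assumptions made for this example.

```python
# Hypothetical suffix list, checked in order; longer suffixes come first.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


for w in ["jumping", "jumped", "jumps", "quickly", "sing", "running"]:
    print(w, "->", naive_stem(w))
```

The last word shows the limits of this approach: "running" becomes "runn", since nothing handles the doubled consonant. Cases like that are what the rule sets in real stemmers are designed to cover.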
14. Language Identification
GitHub - saffsd/langid.py: Stand-alone language identification system. A solid open-source tool for language identification.
15. Machine Translation
GitHub - OpenNMT/OpenNMT-py: Open Source Neural Machine Translation and (Large) Language Models in PyTorch. PyTorch-based neural machine translation, well suited for practice.
16. Paraphrase Generation
GitHub - vsuthichai/paraphraser: Sentence paraphrase generation at the sentence level. TensorFlow-based sentence-level paraphrase generation, suitable for practice.
17. Relationship Extraction
18. Sentence Boundary Disambiguation
GitHub - Orekhov/SentenceBreaking: Sentence boundary disambiguation. An interesting project.
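To show why sentence boundaries need disambiguation at all, here is a minimal rule-based sketch: a period, question mark, or exclamation mark followed by whitespace ends a sentence unless the token it closes is a known abbreviation. This is not the approach of the repository above. The abbreviation list and function name are hypothetical.

```python
import re

# Hypothetical abbreviation list; a real system would need a far larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "etc."}


def split_sentences(text):
    """Split on ., ! or ? followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        end = match.start() + 1  # index just past the punctuation mark
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a boundary
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = ("Dr. Smith met Prof. Lee on Monday. They discussed parsing, "
          "e.g. dependency trees. Was it useful? Yes!")
for sentence in split_sentences(sample):
    print(sentence)
```

A fixed list cannot tell an abbreviation that also ends a sentence from one that does not, which is where statistical or learned approaches come in.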
19. Event Extraction
GitHub - liuhuanyong/ComplexEventExtraction: A concept and obvious expression pattern collection of Chinese compound event extraction which then be evolved into ComplexEventGraph. The project proposes the concept of Chinese compound events together with their explicit expression patterns. It covers extraction of conditional, causal, sequential, and reversal events, and assembles the results into an event logic graph.
20. Word Sense Disambiguation
GitHub - alvations/pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies. The codebase is small and the methods are simple, so it is suitable for practice.
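As an example of how simple a WSD method can be, below is a sketch of simplified Lesk: pick the sense whose dictionary gloss shares the most words with the context. It does not use pywsd's API. The sense inventory, stopword list, and function name are hypothetical and hand-written for illustration; a real system would draw glosses from a resource such as WordNet.

```python
def simplified_lesk(context_sentence, sense_glosses, stopwords=frozenset()):
    """Return the sense whose gloss has the largest word overlap with the context."""
    def normalize(text):
        return {w.lower().strip(".,!?") for w in text.split()} - stopwords

    context = normalize(context_sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & normalize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical two-sense inventory for the word "bank".
TOY_SENSES = {
    "bank.financial": "an institution that accepts deposits and lends money",
    "bank.river": "sloping land beside a body of water such as a river",
}
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "that", "as", "such",
                       "to", "i", "at", "my", "on"})

print(simplified_lesk("I went to the bank to deposit my money", TOY_SENSES, STOPWORDS))
print(simplified_lesk("We sat on the bank of the river and watched the water",
                      TOY_SENSES, STOPWORDS))
```

Exact word overlap is brittle: "deposit" in the context does not match "deposits" in the gloss. Variants of the method typically add stemming or extended glosses to address that.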
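21. Named Entity Disambiguation
GitHub - dice-group/AGDISTIS: AGDISTIS - Agnostic Named Entity Disambiguation. Entity disambiguation is important, especially for entity fusion (for example, merging multi-source data in a knowledge graph) and for entity linking.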
22. Humor Detection
GitHub - pln-fing-udelar/pghumor: Is This a Joke? Humor Detection in Spanish Tweets
23. Sarcasm Detection
GitHub - AniSkywalker/SarcasmDetection: Sarcasm detection on tweets using neural network. Neural-network-based sarcasm detection.
24. Entity Linking
GitHub - hasibi/EntityLinkingRetrieval-ELR: Exploiting entity linking in queries for entity retrieval. Entity linking has a very wide range of uses, and this project is well suited for practice.
25. Coreference Resolution
GitHub - huggingface/neuralcoref: ✨Fast Coreference Resolution in spaCy with Neural Networks. Neural-network-based coreference resolution.
26. Keyphrase Extraction and Social Tag Suggestion
https://github.com/thunlp/THUTag, which implements several methods for keyword/keyphrase extraction and social tag suggestion.
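One common baseline for keyword extraction is TF-IDF ranking, sketched below with the standard library only. This is an illustration of the general idea, not THUTag's code or API. The function name, the whitespace tokenization, and the toy documents are assumptions.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of one document by term frequency times inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_docs = len(tokenized)
    term_freq = Counter(tokenized[doc_index])
    total = sum(term_freq.values())
    scores = {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical toy corpus, for illustration only.
TOY_DOCS = [
    "neural networks segment chinese text",
    "neural networks translate text",
    "stemming reduces words to stems",
]

keywords = tfidf_keywords(TOY_DOCS, doc_index=0, top_k=2)
print(keywords)
```

Terms that occur in several documents ("neural", "networks", "text") get a lower weight than terms unique to the first document, so the unique terms rank first.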