Python spaCy Examples

This article demonstrates, through concrete examples, how the spaCy library is applied to natural language processing tasks: tokenization, sentence segmentation, lemmatization, part-of-speech tagging, named entity recognition, noun chunk extraction, and word-vector similarity.
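Running the examples below requires spaCy plus a downloaded English model. A minimal setup might look like the following (the model name `en_core_web_md` is an assumption; any English model with word vectors works for the similarity example, while `en_core_web_sm` covers the rest):

```shell
pip install spacy
python -m spacy download en_core_web_md
```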


The code is as follows:

import spacy

# The old shorthand spacy.load('en') no longer works in recent spaCy releases;
# load a full model name instead. en_core_web_md ships word vectors, which the
# similarity example in section 7 needs.
nlp = spacy.load('en_core_web_md')
test_doc = nlp("it's word tokenize test for spacy")

# Tokenization
print("\n1. Tokenization")
print(test_doc)
for token in test_doc:
    print(token)

# Sentence segmentation
print("\n2. Sentence segmentation")
test_doc = nlp('Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.')
print(test_doc)
for sent in test_doc.sents:
    print(sent)

# Lemmatization
print("\n3. Lemmatization")
test_doc = nlp("you are best. it is lemmatize test for spacy. I love these books")
print(test_doc)
for token in test_doc:
    print(token, token.lemma_, token.lemma)

# Part-of-speech tagging
print("\n4. Part-of-speech tagging")
print(test_doc)
for token in test_doc:
    print(token, token.pos_, token.pos)


# Named entity recognition
print("\n5. Named entity recognition")
test_doc = nlp("Rami Eid is studying at Stony Brook University in New York")
print(test_doc)
for ent in test_doc.ents:
    print(ent, ent.label_, ent.label)

# Noun chunk extraction
print("\n6. Noun chunk extraction")
test_doc = nlp('Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.')
print(test_doc)
for np in test_doc.noun_chunks:
    print(np)

# Word similarity from word vectors (note the space before "." so that
# "Boots" and "hippos" land at token indices 7 and 9 below)
print("\n7. Word similarity from word vectors")
test_doc = nlp("Apples and oranges are the same . Boots and hippos aren't.")
print(test_doc)
apples = test_doc[0]
print(apples)
oranges = test_doc[2]
print(oranges)
boots = test_doc[7]
print(boots)
hippos = test_doc[9]
print(hippos)

print(apples.similarity(oranges))
print(boots.similarity(hippos))

Output:

/usr/bin/python3.5 /home/wmmm/PycharmProjects/untitled/zstp.py

1. Tokenization
it's word tokenize test for spacy
it
's
word
tokenize
test
for
spacy
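Beyond printing the token text, each `Token` carries attributes set by the tokenizer itself. A small sketch that needs no downloaded model, since `spacy.blank` builds a tokenizer-only pipeline:

```python
import spacy

# A blank English pipeline: tokenizer only, no downloaded model required.
nlp = spacy.blank("en")
doc = nlp("it's word tokenize test for spacy")

for token in doc:
    # text: the token string; idx: character offset; is_punct: punctuation flag
    print(token.text, token.idx, token.is_punct)
```

Contractions such as "it's" are split by the English tokenizer exceptions even in a blank pipeline.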

2. Sentence segmentation
Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.
Natural language processing (NLP) deals with the application of computational models to text or speech data.
Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways.
NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form.
From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.
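The sentence boundaries above come from the loaded model's parser. In spaCy 3.x a rule-based alternative is the `sentencizer` pipe, which splits on sentence-final punctuation and also works without a downloaded model; a minimal sketch:

```python
import spacy

# Blank pipeline plus the rule-based sentence splitter.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("NLP deals with text. It also deals with speech. Both are hard.")
for sent in doc.sents:
    print(sent.text)
```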

3. Lemmatization
you are best. it is lemmatize test for spacy. I love these books
you -PRON- 561228191312463089
are be 10382539506755952630
best good 5711639017775284443
. . 12646065887601541794
it -PRON- 561228191312463089
is be 10382539506755952630
lemmatize lemmatize 4507259281035238268
test test 1618900948208871284
for for 16037325823156266367
spacy spacy 10639093010105930009
. . 12646065887601541794
I -PRON- 561228191312463089
love love 3702023516439754181
these these 6459564349623679250
books book 13814433107111459297
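Two notes on the output above. First, the `-PRON-` lemma for pronouns is spaCy 2.x behavior; spaCy 3 lemmatizes pronouns to the word itself. Second, the large integers in the third column are not counts: `token.lemma` is a 64-bit hash of the lemma string, and the vocab's `StringStore` maps between the two. A sketch (no model needed):

```python
import spacy

nlp = spacy.blank("en")

# lemma_ is the string form; lemma is its hash in the vocab's StringStore.
# StringStore.add stores the string and returns the hash.
h = nlp.vocab.strings.add("book")
print(h)                     # the hash id, a large unsigned integer
print(nlp.vocab.strings[h])  # reverse lookup back to "book"
```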

4. Part-of-speech tagging
you are best. it is lemmatize test for spacy. I love these books
you PRON 94
are VERB 99
best ADJ 83
. PUNCT 96
it PRON 94
is VERB 99
lemmatize ADJ 83
test NOUN 91
for ADP 84
spacy NOUN 91
. PUNCT 96
I PRON 94
love VERB 99
these DET 89
books NOUN 91

5. Named entity recognition
Rami Eid is studying at Stony Brook University in New York
Rami Eid PERSON 378
Stony Brook University ORG 381
New York GPE 382
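The numbers (378, 381, 382) are internal IDs for the label strings, and the labels themselves are abbreviations. `spacy.explain` returns a short human-readable description of a tag or entity label, with no model required:

```python
import spacy

# explain() maps POS tags and entity labels to short descriptions.
for label in ("PERSON", "ORG", "GPE"):
    print(label, "->", spacy.explain(label))
```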

6. Noun chunk extraction
Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.
Natural language processing
the application
computational models
Application areas
NLP
automatic (machine) translation
languages
dialogue systems
a human
a machine
natural language
information extraction
the goal
unstructured text
structured (database) representations
flexible ways
NLP technologies
a dramatic impact
the way
people
computers
the way
people
the use
language
the way
people
the vast amount
linguistic data
electronic form
a scientific viewpoint
NLP
fundamental questions
formal models
example
natural language phenomena
algorithms
these models

7. Word similarity from word vectors
Apples and oranges are the same . Boots and hippos aren't.
Apples
oranges
Boots
hippos
0.518096
0.158362
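`Token.similarity` is the cosine of the two word vectors, so with a model that ships real vectors, `cosine(apples.vector, oranges.vector)` reproduces `apples.similarity(oranges)` (a small model without true word vectors will not match these numbers). A minimal sketch of the cosine computation itself, using numpy:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine(a, b))  # parallel vectors -> 1.0
```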

Process finished with exit code 0