Python spaCy Examples

This article demonstrates, through concrete examples, how the spaCy library is applied to natural language processing tasks: tokenization, sentence segmentation, lemmatization, part-of-speech tagging, named entity recognition, noun chunk extraction, and word-vector similarity.
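Running the examples below requires spaCy plus a downloaded English model. A minimal setup might look like the following (the model name `en_core_web_md` is an assumption; any English model with word vectors works for the similarity example, while `en_core_web_sm` covers the rest):

```shell
pip install spacy
python -m spacy download en_core_web_md
```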


The code is as follows:

import spacy

# The old shorthand spacy.load('en') no longer works in recent spaCy releases;
# load a full model name instead. en_core_web_md ships word vectors, which the
# similarity example in section 7 needs.
nlp = spacy.load('en_core_web_md')
test_doc = nlp("it's word tokenize test for spacy")

# Tokenization
print("\n1. Tokenization")
print(test_doc)
for token in test_doc:
    print(token)

# Sentence segmentation
print("\n2. Sentence segmentation")
test_doc = nlp('Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.')
print(test_doc)
for sent in test_doc.sents:
    print(sent)

# Lemmatization
print("\n3. Lemmatization")
test_doc = nlp("you are best. it is lemmatize test for spacy. I love these books")
print(test_doc)
for token in test_doc:
    print(token, token.lemma_, token.lemma)

# Part-of-speech tagging
print("\n4. Part-of-speech tagging")
print(test_doc)
for token in test_doc:
    print(token, token.pos_, token.pos)


# Named entity recognition
print("\n5. Named entity recognition")
test_doc = nlp("Rami Eid is studying at Stony Brook University in New York")
print(test_doc)
for ent in test_doc.ents:
    print(ent, ent.label_, ent.label)

# Noun chunk extraction
print("\n6. Noun chunk extraction")
test_doc = nlp('Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.')
print(test_doc)
for np in test_doc.noun_chunks:
    print(np)

# Word similarity from word vectors (note the space before "." so that
# "Boots" and "hippos" land at token indices 7 and 9 below)
print("\n7. Word similarity from word vectors")
test_doc = nlp("Apples and oranges are the same . Boots and hippos aren't.")
print(test_doc)
apples = test_doc[0]
print(apples)
oranges = test_doc[2]
print(oranges)
boots = test_doc[7]
print(boots)
hippos = test_doc[9]
print(hippos)

print(apples.similarity(oranges))
print(boots.similarity(hippos))

Output:

/usr/bin/python3.5 /home/wmmm/PycharmProjects/untitled/zstp.py

1. Tokenization
it's word tokenize test for spacy
it
's
word
tokenize
test
for
spacy
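Beyond printing the token text, each `Token` carries attributes set by the tokenizer itself. A small sketch that needs no downloaded model, since `spacy.blank` builds a tokenizer-only pipeline:

```python
import spacy

# A blank English pipeline: tokenizer only, no downloaded model required.
nlp = spacy.blank("en")
doc = nlp("it's word tokenize test for spacy")

for token in doc:
    # text: the token string; idx: character offset; is_punct: punctuation flag
    print(token.text, token.idx, token.is_punct)
```

Contractions such as "it's" are split by the English tokenizer exceptions even in a blank pipeline.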

2. Sentence segmentation
Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.
Natural language processing (NLP) deals with the application of computational models to text or speech data.
Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways.
NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form.
From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.
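The sentence boundaries above come from the loaded model's parser. In spaCy 3.x a rule-based alternative is the `sentencizer` pipe, which splits on sentence-final punctuation and also works without a downloaded model; a minimal sketch:

```python
import spacy

# Blank pipeline plus the rule-based sentence splitter.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("NLP deals with text. It also deals with speech. Both are hard.")
for sent in doc.sents:
    print(sent.text)
```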

3. Lemmatization
you are best. it is lemmatize test for spacy. I love these books
you -PRON- 561228191312463089
are be 10382539506755952630
best good 5711639017775284443
. . 12646065887601541794
it -PRON- 561228191312463089
is be 10382539506755952630
lemmatize lemmatize 4507259281035238268
test test 1618900948208871284
for for 16037325823156266367
spacy spacy 10639093010105930009
. . 12646065887601541794
I -PRON- 561228191312463089
love love 3702023516439754181
these these 6459564349623679250
books book 13814433107111459297
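Two notes on the output above. First, the `-PRON-` lemma for pronouns is spaCy 2.x behavior; spaCy 3 lemmatizes pronouns to the word itself. Second, the large integers in the third column are not counts: `token.lemma` is a 64-bit hash of the lemma string, and the vocab's `StringStore` maps between the two. A sketch (no model needed):

```python
import spacy

nlp = spacy.blank("en")

# lemma_ is the string form; lemma is its hash in the vocab's StringStore.
# StringStore.add stores the string and returns the hash.
h = nlp.vocab.strings.add("book")
print(h)                     # the hash id, a large unsigned integer
print(nlp.vocab.strings[h])  # reverse lookup back to "book"
```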

4. Part-of-speech tagging
you are best. it is lemmatize test for spacy. I love these books
you PRON 94
are VERB 99
best ADJ 83
. PUNCT 96
it PRON 94
is VERB 99
lemmatize ADJ 83
test NOUN 91
for ADP 84
spacy NOUN 91
. PUNCT 96
I PRON 94
love VERB 99
these DET 89
books NOUN 91

5. Named entity recognition
Rami Eid is studying at Stony Brook University in New York
Rami Eid PERSON 378
Stony Brook University ORG 381
New York GPE 382
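The numbers (378, 381, 382) are internal IDs for the label strings, and the labels themselves are abbreviations. `spacy.explain` returns a short human-readable description of a tag or entity label, with no model required:

```python
import spacy

# explain() maps POS tags and entity labels to short descriptions.
for label in ("PERSON", "ORG", "GPE"):
    print(label, "->", spacy.explain(label))
```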

6. Noun chunk extraction
Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.
Natural language processing
the application
computational models
Application areas
NLP
automatic (machine) translation
languages
dialogue systems
a human
a machine
natural language
information extraction
the goal
unstructured text
structured (database) representations
flexible ways
NLP technologies
a dramatic impact
the way
people
computers
the way
people
the use
language
the way
people
the vast amount
linguistic data
electronic form
a scientific viewpoint
NLP
fundamental questions
formal models
example
natural language phenomena
algorithms
these models

7. Word similarity from word vectors
Apples and oranges are the same . Boots and hippos aren't.
Apples
oranges
Boots
hippos
0.518096
0.158362
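`Token.similarity` is the cosine of the two word vectors, so with a model that ships real vectors, `cosine(apples.vector, oranges.vector)` reproduces `apples.similarity(oranges)` (a small model without true word vectors will not match these numbers). A minimal sketch of the cosine computation itself, using numpy:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine(a, b))  # parallel vectors -> 1.0
```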

Process finished with exit code 0