1. xpinyin
A library for converting Chinese characters to pinyin. GitHub:
https://github.com/lxneng/xpinyin
It is very easy to use. Some examples from the GitHub README:
>>> from xpinyin import Pinyin
>>> p = Pinyin()
>>> # default splitter is `-`
>>> p.get_pinyin(u"上海")
'shang-hai'
>>> # show tone marks (newer xpinyin releases spell this tone_marks='marks')
>>> p.get_pinyin(u"上海", show_tone_marks=True)
'shàng-hǎi'
>>> # remove splitter
>>> p.get_pinyin(u"上海", '')
'shanghai'
>>> # set splitter as whitespace
>>> p.get_pinyin(u"上海", ' ')
'shang hai'
>>> p.get_initial(u"上")
'S'
>>> p.get_initials(u"上海")
'S-H'
>>> p.get_initials(u"上海", u'')
'SH'
>>> p.get_initials(u"上海", u' ')
'S H'
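One practical use of the calls above is sorting Chinese strings alphabetically by their pinyin. The following is only a minimal sketch: `sort_by_pinyin` and its `to_pinyin` parameter are hypothetical names made up for this illustration, not part of xpinyin. It assumes xpinyin is installed and relies only on `get_pinyin` with a space splitter, as in the README example.

```python
def sort_by_pinyin(words, to_pinyin=None):
    """Return words sorted by their toneless pinyin spelling.

    to_pinyin is a hypothetical hook: any callable mapping a string to
    its sort key. When omitted, xpinyin is used (assumed installed).
    """
    if to_pinyin is None:
        # Imported lazily so the function can also run with a custom converter.
        from xpinyin import Pinyin
        p = Pinyin()

        def to_pinyin(word):
            # A space splitter keeps syllables apart, e.g. 'shang hai'.
            return p.get_pinyin(word, ' ')

    return sorted(words, key=to_pinyin)


if __name__ == "__main__":
    for city in sort_by_pinyin([u"上海", u"北京", u"广州"]):
        print(city)
```

Passing the converter as an argument keeps the sorting logic independent of the library, so a different romanization function could be swapped in if needed.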
2. jieba
A Chinese word segmentation library that splits a sentence into individual words. GitHub:
https://github.com/fxsjy/jieba
# encoding=utf-8
import jieba
seg_list = jieba.cut("我来到北京清华大学", cut_all=True)
print("Full Mode: " + "/ ".join(seg_list)) # full mode
seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
print("Default Mode: " + "/ ".join(seg_list)) # accurate mode
seg_list = jieba.cut("他来到了网易杭研大厦") # accurate mode is the default
print(", ".join(seg_list))
seg_list = jieba.cut_for_search("小明硕士毕业于中国科学院计算所,后在日本京都大学深造") # search engine mode
print(", ".join(seg_list))
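A common next step after segmentation is counting how often each word occurs. The sketch below is an illustration under stated assumptions: `word_counts`, `tokenize`, and `min_len` are hypothetical names invented here, and the default tokenizer is `jieba.lcut` (the list-returning counterpart of `jieba.cut`), which requires jieba to be installed.

```python
from collections import Counter


def word_counts(text, tokenize=None, min_len=2):
    """Count tokens of at least min_len characters in text.

    tokenize is a hypothetical hook: any callable returning a list of
    tokens. When omitted, jieba.lcut is used (assumed installed).
    """
    if tokenize is None:
        # Imported lazily so other tokenizers work without jieba.
        import jieba
        tokenize = jieba.lcut

    # Strip whitespace and drop short tokens such as single characters
    # and punctuation.
    tokens = (t.strip() for t in tokenize(text))
    return Counter(t for t in tokens if len(t) >= min_len)


if __name__ == "__main__":
    counts = word_counts("小明硕士毕业于中国科学院计算所,后在日本京都大学深造")
    for word, n in counts.most_common(5):
        print(word, n)
```

The `min_len` filter is a crude stand-in for a stop-word list; for real text a proper stop-word list would likely give cleaner results.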