Natural Language Processing with Python - Chapter 0

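This post describes a beginner's experience learning natural language processing with Python. It starts with why Python was chosen as the programming language, introduces NLTK, a third-party library for natural language processing, and walks through setting up Python in PyCharm and installing NLTK together with its data sets.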


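A year ago I would never have dreamed of writing technical notes here. By sheer accident I ended up at a university in southwest Shanghai as an engineering guy in a humanities program, and these days, when I am not trading memes, I am cramming CS. My advisor works in computational linguistics, so my first priority is to teach myself natural language processing and build a solid foundation for research (serious face).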

Down to business. I found a copy of "Natural Language Processing with Python" (reprint edition) in the library. The authors are Steven Bird, Ewan Klein, and Edward Loper. Here is the Douban link for reference: https://book.douban.com/subject/5336893/

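Most readers describe this book as an NLTK handbook ("entry-level" would probably be a more accurate qualifier), and feel that its theoretical coverage could go deeper. For a beginner, though, the point is to work through it once: simple, direct, practical, and quick to pick up is what matters!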

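A word on why I chose Python. For a beginner, time spent agonizing over which programming language to learn is better spent typing a few more lines of code. I am in no position to judge whether Python itself is a good language, but one thing has to be said: Python has many powerful third-party packages, and these packages are the right tools for concrete problems in specific fields. For natural language processing, NLTK (Natural Language Toolkit) is an extremely capable one.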

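Right, this time we really are getting down to business. Chapter 0 can be seen as the preparation before the actual study; as the saying goes, "to do a good job, one must first sharpen one's tools."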

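IDE: PyCharm
I chose PyCharm as my IDE, since it is said to be very good. To download and install:
1. Download Python from the official Python website, open a terminal, and type python; the interpreter prints its version information.
2. Download PyCharm, an IDE for Python development. For an activation code for the Professional edition, try searching Baidu.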
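
As a quick sanity check after step 1, the version information that the terminal prints can also be read from inside Python. This is a minimal sketch; the helper name `describe_interpreter` is my own and does not come from the book.

```python
import sys


def describe_interpreter():
    """Return a one-line description of the running interpreter."""
    major, minor, micro = sys.version_info[:3]
    return "Python {}.{}.{} at {}".format(major, minor, micro, sys.executable)


print(describe_interpreter())
```

Knowing the major version matters for the next section, because Python 2 and Python 3 treat source file encodings differently.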

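About the Python source file encoding declaration
1. Position: it must appear on the first or second line of the Python file.
2. Format:
    a. With an equals sign:
        #coding=<encoding name>
    b. With a colon; this is the most common form and most editors recognize it:
        #!/usr/bin/python
        # -*- coding: <encoding name> -*-
    c. Vim style:
        #!/usr/bin/python
        # vim: set fileencoding=<encoding name> :
3. Purpose: it tells the Python interpreter which encoding to use when decoding the bytes of the source file.
    Without a declaration, Python 2 assumes ASCII, while Python 3 assumes UTF-8.
    So in Python 2, a file that has no declaration but contains non-ASCII characters makes the interpreter fail with a SyntaxError.
4. Example: the first line says that the script is run with Python; the second line sets the file encoding to UTF-8.
        #!/usr/bin/python
        # -*- coding: utf-8 -*-
5. Note: a single Python source file may use only one encoding. Mixing several encodings in one file is not allowed and causes an error!
6. How the Python tokenizer and compiler work (as described in PEP 263):
    a. Read the file
    b. Decode it into Unicode according to the encoding the file declares
    c. Convert it into a UTF-8 string
    d. Tokenize the UTF-8 string
    e. Compile it, creating Unicode objects
7. UTF-8: 8-bit Unicode Transformation Format, a variable-length character encoding for Unicode (the universal character set).
    In short, for a Python 2 program to handle Chinese text in its source, add an encoding declaration like this at the top of the file. In Python 3 the declaration is only needed when the file is saved in something other than UTF-8.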
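
To see the rules above in action, here is a small sketch that asks Python 3 itself which encoding it would pick for a source file. It uses the standard library's `tokenize.detect_encoding`; the sample byte strings and the helper name `declared_encoding` are made up for illustration, and the behavior shown is Python 3 only (Python 2 would fall back to ASCII instead of UTF-8).

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode these source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical sample files, one per declaration style described above.
samples = {
    "equals sign": b"#coding=utf-8\nprint('hi')\n",
    "colon": b"#!/usr/bin/python\n# -*- coding: gbk -*-\nprint('hi')\n",
    "vim": b"#!/usr/bin/python\n# vim: set fileencoding=utf-8 :\nprint('hi')\n",
    "no declaration": b"print('hi')\n",
}

for style, source in samples.items():
    print(style, "->", declared_encoding(source))

# A GBK-encoded file containing Chinese text decodes correctly once the
# declared encoding is used; decoding the same bytes as UTF-8 would fail.
gbk_source = "# -*- coding: gbk -*-\nmsg = '你好'\n".encode("gbk")
text = gbk_source.decode(declared_encoding(gbk_source))
print("你好" in text)
```

The last few lines mirror steps a and b of the tokenizer pipeline: read the raw bytes, then decode them with whatever encoding the file declares.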

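My First Python Program - Hello World!
1. File --> New Project --> choose where to save the project (to me this feels a lot like setting the working directory in R)
2. Right-click the new project --> New --> Python File --> give the file a name (to me this is just a script file, like a script in R)
3. Type in the file encoding declaration (not actually necessary here, since we print the English "Hello World!" rather than Chinese)
4. Hello World
    print("Hello World!")
5. At this point the Run and Debug buttons (the green triangle) are greyed out, because no run configuration has been set up yet.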
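
For comparison, here is a slightly extended version of the same script that also prints a Chinese greeting, which is the case where the encoding declaration from step 3 starts to matter (in Python 2, or whenever the file is not saved as UTF-8). The function name `greeting` is only an illustration.

```python
# -*- coding: utf-8 -*-


def greeting(name="World"):
    """Build the greeting string for the given name."""
    return "Hello {}!".format(name)


print(greeting())        # plain ASCII, works with or without the declaration
print(greeting("世界"))   # non-ASCII text in the source file
```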

 
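Setting up a run configuration in PyCharm
1. Click the black downward triangle next to the Run button to open the Run/Debug Configurations dialog (or use Run --> Edit Configurations)
2. Click the green plus sign to create a new configuration and choose Python (because the source code is a Python program)
3. Enter a name in the Name field, then click the Script option and locate the .py file you just wrote
4. Click OK to return to the editor; the Run and Debug buttons now turn green
5. Click Run and check the output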

Installing Packages in PyCharm - Mac
1. PyCharm --> Preferences --> Project Interpreter
2. + adds a package
    - removes a package
    the arrow button upgrades a package
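
After adding a package through the dialog, it can be worth confirming that the project interpreter really sees it. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is my own.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the current interpreter can import package_name."""
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python; "nltk" is only present after it has been installed.
for name in ("json", "nltk"):
    print(name, "installed" if is_installed(name) else "missing")
```

If `nltk` shows up as missing, the package was probably installed into a different interpreter than the one selected for the project.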

NLTK (Natural Language Toolkit)
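Enter the following code to import the NLTK package and then download the data sets we need (in practice, the corpora used in the book):
    import nltk
    nltk.download()
Run it and the NLTK Downloader will open.
The Collections tab on the downloader shows how the packages are grouped into sets, and you should select the line labeled book to obtain all data required for the examples and exercises in this book.
I have to say the download speed is painful, even though the network at MIT (Minhang Institute of Technology, jokingly rendered in Chinese as "Minhang Men's Vocational and Technical College") is quite fast and free of charge!!!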
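
If the graphical downloader is inconvenient, the same collection can be requested by its identifier. The sketch below is hedged: the helper names are my own, and checking for `corpora/gutenberg` is only an assumed stand-in for "the book data is present", since that corpus is one of the items in the book collection.

```python
def book_data_ready():
    """Return True if NLTK is importable and a corpus from 'book' is on disk."""
    try:
        import nltk
        nltk.data.find("corpora/gutenberg")
    except (ImportError, LookupError):
        return False
    return True


def fetch_book_data():
    """Download the 'book' collection without opening the graphical downloader."""
    import nltk
    return nltk.download("book")


print("book data ready:", book_data_ready())
```

Calling `fetch_book_data()` once (it needs a network connection and may take a while) should make `book_data_ready()` return True on later runs.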

One more meme break before dinner. I'm off.
 

 

Reposted from: https://www.cnblogs.com/timothy1993/p/5881041.html
