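To do the assignments on Coursera I need to learn Python. I had already wanted to learn Python, so I'm starting right now.
This time I'm learning on a different site, dataquest. Compared with the R tutorials on datacamp, I'm not quite used to this one yet. The two sites look like they're made by the same company, but the accounts aren't shared...
Here is the Chinese translation of the API documentation; thanks to the people who put in the work.
A site I found: a foreign big-data site that hosts competitions.
I'm taking notes from the dataquest tutorial, but first let me record the problem I ran into installing Python. I downloaded the latest version, 3.5.0, from the official site, but neither the 32-bit nor the 64-bit installer would install on my Windows 8.1 machine. I tried to fix it without success, so I installed Python 2.7 instead, which went smoothly. I'm not sure whether it's a Windows 8.1 issue...
For the IDE I chose PyCharm; it's smooth and clean to use.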
----------------------------------------------------------------------------
First, following step one of the tutorial:
1. Reading a file:
f = open("E:/PY PROGRAM DIC/test1/story.txt", "r")
story = f.read()
print(story)
f.close()
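From this you can see that the object f is what open("path.txt", "mode") returns (open() is a built-in function, not a method of f; "r" means open for reading). The object story is f.read(), i.e. the text read out of f, which I then print with print().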
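A slightly safer way to do the same read is a with block, which closes the file automatically when the block ends, even if an error occurs. This is a minimal sketch: to keep it runnable anywhere, it writes a made-up sample text to a temporary file instead of using E:/PY PROGRAM DIC/test1/story.txt.

```python
import os
import tempfile

# Hypothetical stand-in for story.txt: write a small sample file to a temp path.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("rat doto best doto\nCN doto best doto")

# "with" closes the file automatically when the block ends.
with open(path, "r") as f:
    story = f.read()  # the whole file as one string, newlines included

print(story)
os.remove(path)  # clean up the temporary file
```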
2. Tokenizing the file
In story2 I wrote:
rat doto best doto
CN doto best doto
f = open("E:/PY PROGRAM DIC/test1/story2.txt", "r")
story2 = f.read()
my_token = story2.split(" ")
f.close()
print(my_token)
The output is interesting:
['rat', 'doto', 'best', 'doto\nCN', 'doto', 'best', 'doto']
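The line break ended up inside a token. So my understanding for now is that read() reads the entire file as one string, line breaks included (they appear as \n), and split(" ") only splits on spaces, so "doto\nCN" stays as a single token.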
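To see the difference, compare split(" ") with split() called without an argument, which splits on any run of whitespace, newlines included. A minimal sketch; the string literal stands in for the contents of story2.txt, assuming the file has no trailing newline (which matches the output above).

```python
story2 = "rat doto best doto\nCN doto best doto"

# split(" ") splits only on single spaces, so the newline stays inside a token.
by_space = story2.split(" ")
# split() with no argument splits on any whitespace: spaces, tabs, newlines.
by_whitespace = story2.split()

print(by_space)
print(by_whitespace)
```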
3. Replacing punctuation
The tutorial's example here is interesting and easy to follow, so I'm pasting it directly:
# We can use the .replace method to replace punctuation in a string.
text = "Who really shot John F. Kennedy?"
text = text.replace("?", "?!")
# The question mark has been replaced with ?!.
print(text)
# We can replace strings with an empty string, meaning that they are just removed.
text = text.replace("?", "")
# The question mark is gone now.
print(text)
no_punctuation_tokens = []
for token in my_token:
    token = token.replace(".", "")
    token = token.replace(",", "")
    token = token.replace("'", "")
    token = token.replace(";", "")
    token = token.replace("\n", "")
    no_punctuation_tokens.append(token)
print(no_punctuation_tokens)
The output is as follows:
Who really shot John F. Kennedy?!
Who really shot John F. Kennedy!
['rat', 'doto', 'best', 'dotoCN', 'doto', 'best', 'doto']
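Removing "\n" inside a token glues the two words together ('dotoCN'). One alternative, sketched here as an assumption rather than the tutorial's method, is to split on whitespace first and only then strip punctuation from each token. The sample text simply joins the tutorial's sentence and the first line of story2, and the PUNCTUATION string and strip_punctuation helper are hypothetical.

```python
# Hypothetical sample: the tutorial's sentence plus one line from story2.
text = "Who really shot John F. Kennedy?\nrat doto best doto"

# Characters to remove: the tutorial strips . , ' ; and "?" "!" are added as an assumption.
PUNCTUATION = ".,';?!"


def strip_punctuation(token):
    """Return token with every character in PUNCTUATION removed."""
    for mark in PUNCTUATION:
        token = token.replace(mark, "")
    return token


no_punctuation_tokens = []
# split() breaks on spaces and newlines, so "Kennedy?" and "rat" stay separate tokens.
for token in text.split():
    no_punctuation_tokens.append(strip_punctuation(token))

print(no_punctuation_tokens)
```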
What I learned:
· how to create a list (Python's array-like type): x = []
· how to add an element to the end: .append()
· the for ... in ... loop
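These three pieces combine into a common pattern: start from an empty list, loop with for ... in ..., and append one result per element. The sketch below uses an illustrative token list and len() as the per-element operation; it also shows the equivalent list comprehension, which builds the same list in one line.

```python
tokens = ["rat", "doto", "best", "doto"]

lengths = []                    # start with an empty list
for token in tokens:            # visit each element in order
    lengths.append(len(token))  # add one result to the end of the list

# The same list built with a list comprehension.
lengths_comp = [len(token) for token in tokens]

print(lengths)
print(lengths_comp)
```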
4. Lowercasing with .lower()
token = token.lower()
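The reassignment in token = token.lower() matters because Python strings are immutable: lower() returns a new string and leaves the original unchanged. A minimal sketch, with a hypothetical token list, that lowercases every token using the same for/append pattern as above:

```python
word = "CN"
word.lower()          # returns "cn", but word itself is unchanged
print(word)           # still CN
word = word.lower()   # reassign to keep the lowercase result
print(word)           # cn

tokens = ["Who", "really", "shot", "John", "F", "Kennedy"]
lowercase_tokens = []
for token in tokens:
    lowercase_tokens.append(token.lower())

print(lowercase_tokens)
```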
11/27