- Use of file
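A file is an abstraction and a collection of data. Files are either text files or binary files; the two differ only in how they are displayed:
Text file: a file whose content uses a single specific encoding, such as UTF-8.
Both .txt and .py files are text files.
Binary file: a file made up directly of 0s and 1s with no uniform character encoding, such as .png and .avi.
Use open("D:/f.txt", "rt") to open a file in text mode and open("D:/f.txt", "rb") to open it in binary mode. (Written as "D:\f.txt", Python would treat \f as a form-feed escape, so use a forward slash or a raw string such as r"D:\f.txt".)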
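A minimal sketch of the difference between the two modes. The file name and sample text are made up for illustration, and a temporary directory is used instead of D:\ so the example runs on any system:

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory (not a path from these notes).
path = os.path.join(tempfile.mkdtemp(), "f.txt")

# Write text that contains two non-ASCII characters, with an explicit encoding.
with open(path, "wt", encoding="utf-8") as f:
    f.write("naïve café")

# Text mode decodes the stored bytes into a str.
with open(path, "rt", encoding="utf-8") as f:
    text = f.read()

# Binary mode returns the stored bytes unchanged.
with open(path, "rb") as f:
    raw = f.read()

print(type(text), len(text))        # length counted in characters
print(type(raw), len(raw))          # length counted in bytes
print(raw.decode("utf-8") == text)  # same data, two views
```

The same file gives a str in text mode and bytes in binary mode; the byte count is larger here because each accented character takes two bytes in UTF-8.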
<f>.read(size = -1): reads the whole file; if size is given, reads at most size characters (bytes in binary mode).
<f>.readline(size = -1): reads one line; if size is given, reads at most size characters of that line.
<f>.readlines(hint = -1): reads all lines of the file into a list, one element per line; if hint is given, reading stops once the lines read so far total more than hint characters (bytes in binary mode), so hint is a size limit, not a line count.
<f>.write(s): writes a string (text mode) or a byte stream (binary mode) to the file.
<f>.writelines(lines): writes a list whose elements are all strings to the file; no line breaks are added between the elements.
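<f>.seek(offset, whence = 0): moves the file pointer to offset, measured from the position chosen by whence: 0 for the start of the file, 1 for the current position, 2 for the end. <f>.seek(0) returns to the start and <f>.seek(0, 2) jumps to the end; in text mode, nonzero offsets are only allowed relative to the start.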
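The methods listed above can be tried together in a short sketch. The file name and the three sample lines are hypothetical, and a temporary directory is assumed so nothing on disk is overwritten:

```python
import os
import tempfile

# Hypothetical demo file; not a path from these notes.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# writelines() adds no line breaks, so every string carries its own "\n".
with open(path, "w", encoding="utf-8") as f:
    f.write("first\n")
    f.writelines(["second\n", "third\n"])

with open(path, "r", encoding="utf-8") as f:
    head = f.read(3)              # at most 3 characters
    rest_of_line = f.readline()   # remainder of the current line
    remaining = f.readlines()     # remaining lines as a list
    f.seek(0)                     # move the pointer back to the start
    everything = f.read()         # whole file again

print(repr(head))
print(repr(rest_of_line))
print(remaining)
print(repr(everything))
```

Each read continues from where the previous one stopped, which is why seek(0) is needed before the file can be read again from the beginning.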
- Formatting and processing one-dimensional data
Lists, sets, and arrays hold one-dimensional data. Two-dimensional data is a combination of several one-dimensional data items, and multi-dimensional data extends one- or two-dimensional data along new dimensions.
If the data is ordered, use the list type; if it is unordered, use the set type.
There are three ways to store one-dimensional data:
Space-separated: separate the items with one or more spaces, without line breaks. The drawback is that the data itself cannot contain spaces.
Comma-separated: separate the items with commas, without line breaks. It has the analogous drawback: the data itself cannot contain commas.
Separated by another symbol: separate the items with some special symbol, without line breaks. The drawback is poor generality, since the symbol has to be chosen to suit the data.
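Storing and loading one-dimensional data with a separator comes down to str.join() and str.split(). The following is a minimal sketch; the file name and the sample values are made up for illustration:

```python
import os
import tempfile

values = ["red", "green", "blue"]  # hypothetical sample data
path = os.path.join(tempfile.mkdtemp(), "data.txt")  # hypothetical file

# Store: join the items with a comma and write them on a single line.
with open(path, "w", encoding="utf-8") as f:
    f.write(",".join(values))

# Load: read the line back and split it on the same separator.
with open(path, "r", encoding="utf-8") as f:
    loaded = f.read().split(",")

# Space-separated data: split() with no argument treats any run of
# whitespace as one separator.
spaced = "red  green   blue".split()

# The drawback: an item that contains the separator is broken apart.
broken = ",".join(["Washington, D.C.", "Paris"]).split(",")

print(loaded)
print(spaced)
print(broken)
```

The last case shows why the separator must not occur in the data: two items were written, but three come back.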
- CSV and saving two-dimensional data
CSV (Comma-Separated Values) is a widely used international format for storing one- and two-dimensional data, with the .csv extension. Each line holds one one-dimensional data item, its values are separated by commas, and there are no empty lines.
Excel can read and write .csv files, and any general-purpose text editor can create one.
Usual indexing: ls[row][column], row first and then column.
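A minimal sketch of saving a two-dimensional list as CSV and loading it back so that ls[row][column] works. The table contents and file name are hypothetical, and the plain join/split approach assumes no value contains a comma:

```python
import os
import tempfile

# Hypothetical two-dimensional data: a list of rows, each row a list of strings.
table = [["name", "score"], ["Ann", "90"], ["Bob", "85"]]
path = os.path.join(tempfile.mkdtemp(), "table.csv")  # hypothetical file

# Save: one row per line, values joined by commas.
with open(path, "w", encoding="utf-8") as f:
    for row in table:
        f.write(",".join(row) + "\n")

# Load: strip the line break, then split each line on commas.
with open(path, "r", encoding="utf-8") as f:
    ls = [line.rstrip("\n").split(",") for line in f]

print(ls[1][0])  # row 1, column 0
print(ls[2][1])  # row 2, column 1
```

For data whose values may themselves contain commas, the standard library's csv module (csv.reader and csv.writer) handles the quoting and is probably the safer choice.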
- Wordcloud Library
$ pip install wordcloud
wordcloud.WordCloud() creates an object that represents the word cloud for a text.
w = wordcloud.WordCloud()
w.generate(txt): loads the text txt into the WordCloud object.
w.to_file(filename): saves the word cloud as an image file, .png or .jpg.
width/height: the width/height of the generated image, e.g. w = wordcloud.WordCloud(width = 600); the default width is 400 and the default height is 200.
min_font_size/max_font_size: the smallest/largest font size used in the word cloud, e.g. w = wordcloud.WordCloud(min_font_size = 5); the default minimum is 4, and the default maximum is determined by the image height.
font_step: the step between font sizes in the word cloud, e.g. w = wordcloud.WordCloud(font_step = 2); the default is 1.
font_path: the path to the font file, e.g. w = wordcloud.WordCloud(font_path = "xxx.ttf"); the default is None.
max_words: the maximum number of words shown, e.g. w = wordcloud.WordCloud(max_words = 300); the default is 200.
stopwords: a set of words to exclude from the word cloud, e.g. w = wordcloud.WordCloud(stopwords = {"Python"}).
background_color: the background color of the image, e.g. w = wordcloud.WordCloud(background_color = "white"); the default is black.
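Using the mask parameter (the mask is an image loaded as an array):
# scipy.misc.imread has been removed from recent SciPy releases; Pillow plus NumPy is one replacement.
import numpy as np
from PIL import Image
mk = np.array(Image.open("pic.png"))
w = wordcloud.WordCloud(mask = mk)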