py7_the formatting of files and data

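This article covers the basic concepts of file operations, including the difference between text files and binary files and how to read and write them in Python. It also explains how to save one-dimensional data using spaces, commas, or other special symbols as separators, and introduces the CSV format for saving two-dimensional data. Finally, it covers the WordCloud library, which generates word-cloud images from text.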

  • Use of files

Text file or binary file (every file is stored as bytes; the two differ only in how the content is interpreted and displayed):
Text file (a file is an abstraction and collection of data): a file whose content uses a single specific encoding, such as UTF-8.
Both .txt and .py files are text files.
Binary file: consists directly of 0s and 1s and has NO uniform character encoding, such as .png and .avi.
Use open("D:/f.txt", "rt") to open a file in text mode and open("D:/f.txt", "rb") to open it in binary mode. (Write the path with forward slashes or as a raw string, because "\f" inside an ordinary string literal is the form-feed escape.)
<f>.read(size=-1): read all the data; if size is given, read at most the first size characters (or bytes in binary mode).
<f>.readline(size=-1): read one line of data; if size is given, read at most size characters of that line.
<f>.readlines(hint=-1): read all the lines of the file into a list, one line per element; if hint is given, reading stops once the total size of the lines read so far exceeds hint.
<f>.write(s): write a string (text mode) or a byte stream (binary mode) to the file.
<f>.writelines(lines): write a list of strings to the file, one after another; no line breaks are added between the elements.
<f>.seek(offset, whence=0): move the file pointer by offset relative to whence, where 0 is the start of the file, 1 is the current position, and 2 is the end. f.seek(0) goes back to the start and f.seek(0, 2) goes to the end.
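
The methods listed above can be tried together in a short round trip. This is only a minimal sketch: the scratch file path and the three sample lines are made up for illustration, and a temporary directory is used so that no real file is touched.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "f.txt")

# write() takes one string; writelines() takes a list of strings
# and adds no line breaks of its own.
with open(path, "wt", encoding="utf-8") as f:
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])

# Text mode ("rt"): everything that is read comes back as str.
with open(path, "rt", encoding="utf-8") as f:
    head = f.read(5)        # the first 5 characters
    rest = f.readline()     # the remainder of the current line
    f.seek(0)               # move the file pointer back to the start
    lines = f.readlines()   # every line as one list element

# Binary mode ("rb"): the same file comes back as bytes.
with open(path, "rb") as f:
    raw = f.read()

print(head)
print(repr(rest))
print(lines)
print(type(raw))
```

Opening the file in a with block closes it automatically, which is why no explicit close() call appears in the sketch.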

  • Formatting and handling one-dimensional data

Lists, sets, and arrays are one-dimensional data. Two-dimensional data is a combination of several one-dimensional data items, and multi-dimensional data extends one-dimensional or two-dimensional data along new dimensions.
If the data has an order, use the list type; if the data is unordered, use the set type.

There are three ways to save one-dimensional data:

Split by space: separate the items with one or more spaces, without line breaks. The disadvantage is that the data itself cannot contain spaces.
Split by comma: separate the items with commas, without line breaks. It has the same kind of disadvantage: the data itself cannot contain commas.
Split by other symbols: separate the items with a special symbol, without line breaks. The symbol has to be chosen to suit the data, so this approach is less general.
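
A minimal sketch of the three separators, using str.join to save and str.split to load. The helper names dump_1d and load_1d and the sample city names are assumptions made for this illustration, not part of the original notes.

```python
def dump_1d(items, sep):
    """Join one-dimensional data into a single line of text."""
    return sep.join(items)


def load_1d(line, sep):
    """Split a single line of text back into a list."""
    line = line.rstrip("\n")
    # split() without arguments treats any run of spaces as one separator
    return line.split() if sep == " " else line.split(sep)


cities = ["Beijing", "Shanghai", "Shenzhen"]   # made-up sample data

by_space = dump_1d(cities, " ")
by_comma = dump_1d(cities, ",")
by_dollar = dump_1d(cities, "$")

restored = load_1d(by_comma, ",")

# The drawback of a space separator: a space inside an item
# is mistaken for a separator when the line is loaded again.
broken = load_1d(dump_1d(["New York", "Paris"], " "), " ")

print(by_space)
print(by_comma)
print(by_dollar)
print(restored)
print(broken)
```

The last line shows two items coming back as three, which is the disadvantage the list above describes.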

  • CSV and saving two-dimensional data

CSV: Comma-Separated Values, an internationally common format (.csv) for saving one-dimensional and two-dimensional data. Each one-dimensional row occupies one line, its items are separated by commas, and there are no empty lines.
Excel can read and write .csv files, and any general text editor can create one.
Once the file is loaded into a list of lists ls, an element is generally indexed as ls[row][column].
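
As a rough sketch, a two-dimensional list can be saved to and loaded from a .csv file with the csv module from Python's standard library. The table contents and the file name are invented for this example; the csv module is used here instead of manual split and join because it also quotes fields that contain commas.

```python
import csv
import os
import tempfile

# Made-up two-dimensional data: one inner list per row.
table = [
    ["city", "2019", "2020"],
    ["Beijing", "101.5", "120.7"],
    ["Shanghai", "101.2", "127.3"],
    ["Xi'an, Shaanxi", "99.0", "100.1"],   # a field that contains a comma
]

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# Save: one row per line, items separated by commas.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(table)

# Load: every value comes back as a string.
with open(path, newline="", encoding="utf-8") as f:
    ls = list(csv.reader(f))

print(ls[1][2])   # row 1, column 2
print(ls[3][0])   # the quoted field survives the round trip
```

Passing newline="" to open() follows the csv module's documented usage and avoids extra blank lines on Windows.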

  • Wordcloud Library

$ pip install wordcloud

wordcloud.WordCloud() represents the word cloud that corresponds to a text.
w = wordcloud.WordCloud()
w.generate(txt): load the text txt into the WordCloud object.
w.to_file(filename): output the word cloud as an image file, .png or .jpg.
width/height: set the width/height of the image, e.g. w = wordcloud.WordCloud(width=600); the default width is 400 and the default height is 200.
min_font_size/max_font_size: set the smallest/largest font size in the word cloud, e.g. w = wordcloud.WordCloud(min_font_size=5); the default minimum is 4, and the default maximum is determined by the image height.
font_step: set the step between font sizes in the word cloud, e.g. w = wordcloud.WordCloud(font_step=2); the default value is 1.
font_path: set the path of the font file, e.g. w = wordcloud.WordCloud(font_path="xxx.ttf"); the default value is None.
max_words: set the maximum number of words displayed, e.g. w = wordcloud.WordCloud(max_words=300); the default value is 200.
stopwords: set the words that must not be displayed, e.g. w = wordcloud.WordCloud(stopwords={"Python"}). Note that the parameter is spelled stopwords, without an underscore.
background_color: set the background color of the image; the default value is "black".

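Using the mask parameter (scipy.misc.imread has been removed from recent SciPy releases, so the image is loaded with Pillow and NumPy here):
import numpy as np
from PIL import Image
mk = np.array(Image.open("pic.png"))
w = wordcloud.WordCloud(mask=mk)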
