Python seems to be pretty popular lately, especially for scraping website data. I followed a guru's template and wrote a simple program of my own; I'm not really familiar with Python syntax yet, so this is just for the record.
1. Installing Python:
The latest Python source code, binaries, documentation, and news can all be found on the official Python website:
Python official site: https://www.python.org/
You can download the Python documentation from the link below, in HTML, PDF, PostScript, and other formats.
Python documentation downloads: https://www.python.org/doc/
Python may not work right after installation; you need to configure the environment variables first. If the pip command still isn't recognized, add the Scripts directory to PATH as well.
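If you're not sure which directories to add, the interpreter itself can tell you (a minimal sketch; the Scripts subfolder is where a standard Windows install keeps pip):
import os, sys

print(sys.executable)  # full path to python.exe; its folder belongs on PATH
scripts = os.path.join(os.path.dirname(sys.executable), 'Scripts')
print(scripts)         # pip lives here, so add this folder to PATH as well
print(scripts in os.environ.get('PATH', ''))  # True once PATH is set correctly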
2. A word about editors: UltraEdit works too, but I never managed to solve its garbled-Chinese problem, so I switched to Eclipse and installed the PyDev plugin:
- Eclipse menu -> Help -> Install New Software... -> Work with (Add..)
- Name: PyDev
- Location: http://pydev.org/updates
Associating Eclipse with Python:
Eclipse menu -> Window -> Preferences -> PyDev -> Interpreters -> Python Interpreter.
Click the New button and select the path to python.exe (the install path from step 1). A window with a lot of checkboxes will open; just click OK to finish.
If you get the error: Unable to create the selected preference page. An error occurred while automatically activating bundle org.python.pydev, try installing an older version of the plugin from Location = https://dl.bintray.com/fabioz/pydev/old/
3. Create a new project:
4. I set up a SQLite3 database; it's tiny and trivial to install. I also downloaded SQLiteStudio 3.1.1 to manage it, created books.db, and created a comments table:
CREATE TABLE comments (
moveId VARCHAR (20),
nickname VARCHAR (1000),
comment VARCHAR (2000),
rate VARCHAR (20),
city VARCHAR (50),
start_time VARCHAR (50)
);
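If you'd rather not use SQLiteStudio, the same database and table can also be created straight from Python's built-in sqlite3 module (a sketch; d:/books.db matches the path used in the scraper below):
import sqlite3

conn = sqlite3.connect('d:/books.db')  # creates the file if it doesn't exist yet
conn.execute('''CREATE TABLE IF NOT EXISTS comments (
    moveId     VARCHAR (20),
    nickname   VARCHAR (1000),
    comment    VARCHAR (2000),
    rate       VARCHAR (20),
    city       VARCHAR (50),
    start_time VARCHAR (50)
)''')
conn.commit()
conn.close()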
I'm scraping Maoyan, which means pulling data from Maoyan's API. I don't know yet how to discover the endpoints myself, so for now I borrowed the guru's, which looks like this:
"http://m.maoyan.com/mmdb/comments/movie/1208282.json?_v_=yes&offset=15"
The program itself is nothing fancy: fetch the JSON, loop through it to pull out the values we need, and store them in the database.
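Before looking at the full program, here's a quick probe of a single page to see the shape of the response (a sketch; the field names are exactly the ones the parser below reads):
import requests

url = "http://m.maoyan.com/mmdb/comments/movie/1208282.json?_v_=yes&offset=15"
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like mac OS X)"})
first = resp.json()['cmts'][0]  # 'cmts' is the list of comment objects
print first['nickName'], first['score'], first['cityName'], first['startTime']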
import requests
import sqlite3
import json
import time
from datetime import datetime

def getMovieInfo(url):
    # Fetch one page of comments; return the raw JSON text, or None on a non-200 response
    session = requests.Session()
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like mac OS X)"
    }
    response = session.get(url, headers=headers)
    if response.status_code == 200:
        #print response.json()
        return response.text
    return None

def saveData(moveId, nickname, comment, rate, city, start_time):
    # Insert one comment row into the comments table
    conn = sqlite3.connect('d:/books.db')
    conn.text_factory = str
    cursor = conn.cursor()
    ins = "insert into comments values (?,?,?,?,?,?)"
    v = (moveId, nickname, comment, rate, city, start_time)
    cursor.execute(ins, v)
    cursor.close()
    conn.commit()
    conn.close()

def parseInfo():
    # Walk backwards in time from now down to end_time, one page of comments per request
    start_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    end_time = '2018-11-20 00:00:00'
    i = 0
    while start_time > end_time:  # both are 'YYYY-MM-DD HH:MM:SS' strings, so string comparison works
        print start_time
        url = 'http://m.maoyan.com/mmdb/comments/movie/1203084.json?_v_=yes&offset=0&startTime=' + start_time.replace(' ', '%20')
        try:
            html = getMovieInfo(url)
        except Exception as e:
            # On a network error, wait briefly and retry once
            time.sleep(0.5)
            html = getMovieInfo(url)
        else:
            time.sleep(1)  # throttle between successful requests
        data = json.loads(html)['cmts']
        for item in data:
            i += 1
            saveData(i, item['nickName'], item['content'], item['score'], item['cityName'], item['startTime'])
            start_time = item['startTime']  # move the time window back to the oldest comment seen

parseInfo()
The requests page backwards through the data by time. To keep from hitting the API too frequently, there's a wait after every request.
For graphical output you need the pyecharts module:
pip install pyecharts
Map packages (install whichever ones you need):
$ pip install echarts-countries-pypkg
$ pip install echarts-china-provinces-pypkg
$ pip install echarts-china-cities-pypkg
$ pip install echarts-china-counties-pypkg
$ pip install echarts-china-misc-pypkg
$ pip install echarts-united-kingdom-pypkg
Reference: https://www.jianshu.com/p/e0b2851672cd
# -*- coding:utf-8 -*-
from pyecharts import Geo
from collections import Counter
from pyecharts import Style

def render():
    # Hardcoded sample data: (city, count) pairs; in the real run these
    # would come from the comments table
    data = [(u'广州', 80), (u'漳州', 180)]
    #handle(cities)
    #data = Counter(cities).most_common()
    style = Style(
        title_color='#fff',
        title_pos='center',
        width=1200,
        height=600,
        background_color='#404a59'
    )
    geo = Geo('位置分布', '数据来源:111', **style.init_style)  # title and subtitle
    attr, value = geo.cast(data)  # split the (name, value) pairs into two lists
    geo.add('', attr, value, visual_range=[0, 3500], visual_text_color='#fff',
            symbol_size=15, is_visualmap=True, is_piecewise=True, visual_split_number=10)
    geo.render('11111111112.html')

render()
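geo.render() writes a standalone HTML file, so after running this you just open 11111111112.html in a browser to see the interactive map.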
Word cloud display:
jieba is a Python library for word segmentation; it has excellent support for Chinese and is quite powerful.
pip install jieba
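For a feel of what jieba.cut does, here's a tiny example (it returns a lazy generator of tokens, so join them into a string; the sample sentence is made up):
# -*- coding:utf-8 -*-
import jieba

tokens = jieba.cut(u"这部电影真的很好看", cut_all=False)  # precise mode, same as in the program below
print ' '.join(tokens)  # prints the sentence split into words, roughly: 这部 电影 真的 很 好看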
Matplotlib is a 2D plotting library for Python that produces high-quality figures; it can quickly generate plots, histograms, power spectra, bar charts, error charts, scatter plots, and more.
pip install matplotlib
wordcloud is a Python library for generating word cloud images.
pip install wordcloud
The program:
# -*- coding:utf-8 -*-
# Import jieba for Chinese word segmentation
import jieba
# Import matplotlib for displaying the image
import matplotlib.pyplot as plt
# Import wordcloud for building the word cloud image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Collect all the comments (hardcoded samples here; the commented-out block
# below reads them from an exported file instead)
comments = []
comments.append(u"很好哦")
comments.append(u"完美")
comments.append(u"很美")
comments.append(u"大美")
print(comments)
# with open('comments.txt', mode='r', encoding='utf-8') as f:
#     rows = f.readlines()
#     for row in rows:
#         comment = row.split(',')[3]
#         if comment != '':
#             comments.append(comment)

# Segment the text (precise mode, cut_all=False). In Python 2, plain str(comments)
# leaves the Chinese as \uXXXX escapes, so decode them back into characters first
comment_after_split = jieba.cut(str(comments).decode("unicode-escape"), cut_all=False)
words = ' '.join(comment_after_split)  # join the tokens with spaces
print(words)

# Words to leave out of the cloud
stopwords = STOPWORDS.copy()
for w in [u'电影', u'一部', u'一个', u'没有', u'什么', u'有点', u'这部', u'这个',
          u'不是', u'真的', u'感觉', u'觉得', u'还是', u'但是', u'就是', u'一出', u'好戏']:
    stopwords.add(w)

# Load the background image that gives the cloud its shape
bg_image = plt.imread('bg.jpg')
# Word cloud parameters: canvas size, background color, mask shape, font,
# stopwords, maximum font size
wc = WordCloud(width=1024, height=768, background_color='white', mask=bg_image,
               font_path='STKAITI.TTF', stopwords=stopwords, max_font_size=400,
               random_state=50)
# Feed the segmented text into the cloud
wc.generate_from_text(words)
plt.imshow(wc)
plt.axis('off')  # hide the axes
plt.show()
# Save the result locally
wc.to_file('词云图.jpg')
One problem I ran into: the Chinese wouldn't display, only its Unicode escape codes. When processing the list you need to convert it first:
str(comments).decode("unicode-escape")
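To see why this is needed: in Python 2, str() on a list of unicode strings renders each element's repr, i.e. \uXXXX escapes, and decode("unicode-escape") turns them back into real characters. A minimal demonstration:
comments = [u"很好哦", u"完美"]
print str(comments)                           # [u'\u5f88\u597d\u54e6', u'\u5b8c\u7f8e']
print str(comments).decode("unicode-escape")  # [u'很好哦', u'完美']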