1. Overview of Multithreading and Multiprocessing
Process: a running program. Every time we run a program, the operating system automatically prepares the necessary resources for it (for example, it allocates memory and creates a thread that can execute).
Processes are independent of each other.
Each process has its own independent memory space.
Thread: an execution flow inside a program that can be scheduled directly by the CPU. It is the smallest unit of execution the operating system can schedule; it lives inside a process and is the unit that actually does the work.
A thread is the smallest unit that runs on the CPU.
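To make the difference concrete, here is a minimal sketch (my own illustration, not from the original notes): a thread modifies the shared global variable of its process, while a child process only changes its own copy.
import threading
from multiprocessing import Process

counter = 0

def bump():
    global counter
    counter += 1

if __name__ == "__main__":
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    print("after thread:", counter)   # 1 -- the thread shares the parent's memory

    p = Process(target=bump)
    p.start()
    p.join()
    print("after process:", counter)  # still 1 -- the child process had its own copy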
1. Using multithreading
Multithreading usage: creating a single thread directly
# Import the threading module
import threading

def func(name):
    print(f"Printing name: {name}")

# Create the thread; args must be a tuple, hence the trailing comma
test_thread1 = threading.Thread(target=func, args=("Jack",))
# daemon=False (the default): the program waits for this thread to finish before exiting
test_thread1.daemon = False
# Start the thread
test_thread1.start()
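The example above starts only one thread. As a minimal sketch of actual multithreading (my own illustration, reusing the same func), several threads can be started and then joined so the main thread waits for all of them:
import threading

def func(name):
    print(f"Printing name: {name}")

threads = [threading.Thread(target=func, args=(n,)) for n in ("Jack", "Tom", "Lily")]
for t in threads:
    t.start()   # all three threads run concurrently
for t in threads:
    t.join()    # wait for every thread to finish
print("all threads finished")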
Multithreading usage: thread pool
Sometimes an action still needs to run after a task in the pool finishes (see the callback sketch after the code below).
from concurrent.futures import ThreadPoolExecutor
import threading

# Create a pool with at most 5 worker threads
ThPool = ThreadPoolExecutor(max_workers=5)

class TestThread(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        print(f"Running thread: {self.name}")

test_thread1 = TestThread("Jack")
# Instead of calling start(), hand the run method to the pool;
# one of the pool's worker threads will execute it
ThPool.submit(test_thread1.run)
# Wait for all submitted tasks to complete and release the workers
ThPool.shutdown(wait=True)
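For the "run something after the task finishes" case mentioned above, submit() returns a Future, and its add_done_callback method registers exactly such an action. A minimal sketch (my own illustration, not from the original notes):
from concurrent.futures import ThreadPoolExecutor

def work(name):
    return f"result for {name}"

def on_done(future):
    # Runs automatically once the submitted task has finished
    print("task finished, got:", future.result())

with ThreadPoolExecutor(max_workers=5) as pool:
    future = pool.submit(work, "Jack")
    future.add_done_callback(on_done)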
2. Using multiprocessing
# Multiprocessing
from multiprocessing import Process

def func(name):
    for i in range(200):
        print(f"Found occurrence {i} of the name {name}")

if __name__ == "__main__":
    # Each Process runs func in its own interpreter with its own memory
    test_process1 = Process(target=func, args=("Jack",))
    test_process2 = Process(target=func, args=("Tom",))
    test_process1.start()
    test_process2.start()
    # Wait for both child processes to finish
    test_process1.join()
    test_process2.join()
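Because processes do not share memory, the hands-on project below passes data between them through a multiprocessing.Queue. A minimal producer/consumer sketch of that pattern (my own illustration, not from the original notes):
from multiprocessing import Process, Queue

def producer(q):
    for i in range(3):
        q.put(i)          # send data to the other process
    q.put(None)           # sentinel: nothing more to send

def consumer(q):
    while True:
        item = q.get()    # blocks until data arrives
        if item is None:  # stop on the sentinel
            break
        print("got", item)

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()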
2. When to use multithreading vs. multiprocessing
- Multithreading: the tasks are relatively uniform and very similar to one another (see the sketch after this list)
- Multiprocessing: the tasks are independent of one another and rarely overlap
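As one illustration of the first case, a thread pool handles many small, similar I/O-bound tasks such as fetching a list of pages. A minimal sketch (my own illustration; the URLs are made-up placeholders, not from the original notes):
from concurrent.futures import ThreadPoolExecutor
import requests

# Placeholder URLs for illustration only
urls = [f"https://example.com/page/{i}" for i in range(1, 6)]

def fetch(url):
    return url, requests.get(url).status_code

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=5) as pool:
        # map runs fetch concurrently and yields results in order
        for url, status in pool.map(fetch, urls):
            print(url, status)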
3. Hands-on project
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Process, Queue
import os
import requests
from lxml import etree

def get_img_src(q, url):
    # Producer: collect the image URLs on one listing page and put them on the queue
    head = {
        "Referer": "https://www.75ll.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
    }
    resp = requests.get(url=url, headers=head)
    resp.encoding = "utf-8"
    tree = etree.HTML(resp.text)
    src_lst = tree.xpath("//div[@class = 'ABox']/a/img/@src")
    for i in src_lst:
        q.put(i)
    # Sentinel telling a consumer that this page has been fully queued
    q.put("page done")

def download(url):
    res_img = requests.get(url=url)
    name = url.split("/")[-1]
    with open(file=f"./img/{name}", mode="wb") as f:
        f.write(res_img.content)
    print(f"{name} downloaded")

def download_img(q):
    # Consumer: keep taking image URLs off the queue and hand them to a thread pool
    with ThreadPoolExecutor(10) as t:
        while True:
            img_src = q.get()
            if img_src == "page done":
                break
            t.submit(download, img_src)

def page_num(n):
    # Build the listing-page URLs for the first n pages
    url = "https://www.75ll.com/meinv/"
    url_lst = []
    for i in range(1, n + 1):
        tmp_url = url + f"list-{i}.html"
        url_lst.append(tmp_url)
    return url_lst

if __name__ == '__main__':
    os.makedirs("./img", exist_ok=True)  # make sure the download directory exists
    q = Queue()
    n = int(input("How many pages do you want to crawl? "))
    processes = []
    # One producer/consumer pair of processes per page, all sharing the same queue
    for i in page_num(n):
        p1 = Process(target=get_img_src, args=(q, i))
        p2 = Process(target=download_img, args=(q,))
        p1.start()
        p2.start()
        processes.extend([p1, p2])
    # Wait for every producer and consumer to finish
    for p in processes:
        p.join()