Concurrency Basics (1): The Differences Between wait, sleep, await, and yield

This article compares sleep, yield, and await/wait. They differ in whether they release locks (sleep and yield do not; await/wait does), in how they resume after the call (sleep becomes runnable once the specified time elapses, yield is runnable immediately, await/wait must be woken by notify/signal), in the classes that declare them, and in their execution environment (await/wait must be called inside a synchronized block).

Whether the lock is released: calling sleep or yield does not release any locks the current thread holds, whereas calling await/wait releases the lock it has acquired and then blocks waiting.
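A minimal sketch of this difference, assuming a shared monitor object (the names and timings here are illustrative): the waiter calls wait() inside a synchronized block, and the main thread can still acquire the same lock while it waits, which would be impossible if the waiter were merely sleeping:

```java
public class LockReleaseDemo {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                System.out.println("waiter: holding lock, calling wait()");
                try {
                    lock.wait();          // releases the lock, then blocks
                } catch (InterruptedException ignored) {}
                System.out.println("waiter: woken up, lock re-acquired");
            }
        });

        waiter.start();
        Thread.sleep(100);                // give the waiter time to reach wait()

        synchronized (lock) {             // succeeds immediately: wait() released the lock
            System.out.println("main: acquired lock while waiter waits");
            lock.notify();                // move the waiter back to the runnable state
        }
        waiter.join();
        // Had the waiter called Thread.sleep() instead of lock.wait(), the
        // synchronized block above would have blocked until the sleep ended.
    }
}
```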

 

When they resume:

- sleep blocks the thread, and it will not run during the specified interval; when the time is up it returns to the runnable state, but it is not necessarily scheduled to execute immediately.

- yield merely puts the current thread back into the runnable state, so it may well be selected and run again right away.

- await/wait blocks on a condition queue until some other thread calls the matching notify/signal method, which returns the waiting thread to the runnable state; even then it does not necessarily execute immediately (see the sketch after this list).
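The last point is easy to verify: a woken thread is only runnable, not running. In this sketch (timings illustrative), the notifier keeps holding the lock for another second after notify(), and the woken thread cannot proceed until that lock is released; the same block also shows sleep holding the lock the whole time:

```java
public class NotifyTimingDemo {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try { lock.wait(); } catch (InterruptedException ignored) {}
                // Runs roughly one second after notify(), not right away.
                System.out.println("waiter: resumed at " + System.currentTimeMillis());
            }
        });
        waiter.start();
        Thread.sleep(100);                // let the waiter reach wait()

        synchronized (lock) {
            lock.notify();                // waiter becomes runnable here...
            System.out.println("main: notified at " + System.currentTimeMillis());
            Thread.sleep(1000);           // ...but sleep keeps the lock held
        }                                 // only now can the waiter actually run
        waiter.join();
    }
}
```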

 

Whose method it is: yield and sleep are both methods of the Thread class, wait is a method of the Object class, and await is a method of Condition, the explicit condition queue.
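The Condition API mirrors wait/notify on an explicit Lock. A minimal sketch, assuming a single boolean flag guards readiness (the names becameReady and ready are illustrative):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionDemo {
    private final Lock lock = new ReentrantLock();
    private final Condition becameReady = lock.newCondition(); // explicit condition queue
    private boolean ready = false;

    public void awaitReady() throws InterruptedException {
        lock.lock();
        try {
            while (!ready) {
                becameReady.await();      // releases `lock` and waits, like Object.wait()
            }
        } finally {
            lock.unlock();
        }
    }

    public void signalReady() {
        lock.lock();
        try {
            ready = true;
            becameReady.signal();         // wakes one waiter, like Object.notify()
        } finally {
            lock.unlock();
        }
    }
}
```

Just as wait requires the object's monitor, await must be called while holding the Lock that created the Condition.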

 

Execution environment: yield and sleep can be called anywhere in a thread, but await/wait must be called inside a synchronized block (for await, while holding the Lock that created the Condition); otherwise an IllegalMonitorStateException is thrown at runtime.
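A minimal sketch of that rule (the variable name lock is illustrative): yield and sleep run fine anywhere, but calling wait on an object whose monitor we do not hold fails at runtime:

```java
public class MonitorStateDemo {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        Thread.yield();     // fine anywhere
        Thread.sleep(10);   // fine anywhere

        lock.wait();        // throws IllegalMonitorStateException:
                            // we are not inside synchronized (lock) { ... }
    }
}
```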

 

|  | await/wait | sleep | yield |
| --- | --- | --- | --- |
| Releases held locks | Yes | No | No |
| When it resumes | Runnable after being woken | After the specified time | Runnable immediately |
| Declared in | Condition / Object | Thread | Thread |
| Execution environment | Synchronized block | Anywhere | Anywhere |
