Error downloading - Could not open CONNECT tunnel

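When a Scrapy spider fails on an HTTPS page with this error, the likely cause is an HTTP-only proxy that cannot open the connection. Check the proxy settings and make sure the proxy supports HTTPS. If changing the URL from 'https' to 'http' does not help, adjust the proxy configuration or crawl with the default configuration instead.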


I wrote a spider to crawl https://tecnoblog.net/categoria/review/, but when I run it I get one error:

2015-05-19 15:13:20+0100 [scrapy] INFO: Scrapy 0.24.5 started (bot: reviews)
2015-05-19 15:13:20+0100 [scrapy] INFO: Optional features available: ssl, http11
2015-05-19 15:13:20+0100 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'reviews.spiders', 'SPIDER_MODULES': ['reviews.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'reviews'}
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled downloader middlewares: ProxyMiddleware, HttpAuthMiddleware, DownloadTimeoutMiddleware, RotateUserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-05-19 15:13:20+0100 [scrapy] INFO: Enabled item pipelines: 
2015-05-19 15:13:20+0100 [tecnoblog] INFO: Spider opened
2015-05-19 15:13:20+0100 [tecnoblog] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-05-19 15:13:20+0100 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6030
2015-05-19 15:13:20+0100 [scrapy] DEBUG: Web service listening on 127.0.0.1:6087
2015-05-19 15:13:25+0100 [tecnoblog] DEBUG: Redirecting (301) to <GET https://tecnoblog.net/categoria/review/> from <GET http://tecnoblog.net/categoria/review/>
2015-05-19 15:13:26+0100 [tecnoblog] ERROR: Error downloading <GET https://tecnoblog.net/categoria/review/>: Could not open CONNECT tunnel.
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Closing spider (finished)
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Dumping Scrapy stats:
    {'downloader/exception_count': 1,
     'downloader/exception_type_count/scrapy.core.downloader.handlers.http11.TunnelError': 1,
     'downloader/request_bytes': 644,
     'downloader/request_count': 2,
     'downloader/request_method_count/GET': 2,
     'downloader/response_bytes': 501,
     'downloader/response_count': 1,
     'downloader/response_status_count/301': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2015, 5, 19, 14, 13, 26, 227904),
     'log_count/DEBUG': 3,
     'log_count/ERROR': 1,
     'log_count/INFO': 7,
     'scheduler/dequeued': 2,
     'scheduler/dequeued/memory': 2,
     'scheduler/enqueued': 2,
     'scheduler/enqueued/memory': 2,
     'start_time': datetime.datetime(2015, 5, 19, 14, 13, 20, 217735)}
2015-05-19 15:13:26+0100 [tecnoblog] INFO: Spider closed (finished)

Any idea why this is happening?

2015-05-19 15:13:26+0100 [tecnoblog] ERROR: Error downloading <GET https://tecnoblog.net/categoria/review/>: Could not open CONNECT tunnel.

I have crawled this website in the past month. How can I fix it? I tried changing the start URL to 'http' instead of 'https', but the site redirects it back to HTTPS.

Tags: scrapy, scrapy-spider

Asked May 19 '15 at 15:33 by Inês Martins

1 Answer


You are probably trying to connect via https with http-only proxies.
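
In Scrapy, the proxy used for a request is whatever ends up in request.meta['proxy'], and the log above shows a custom ProxyMiddleware enabled. A minimal sketch of one way to keep HTTP-only proxies away from HTTPS requests, assuming Python 3 and a recent Scrapy; the class name, the two proxy lists, and the addresses (documentation-range placeholders) are all hypothetical, not the asker's actual middleware:

```python
import random
from urllib.parse import urlparse


class HttpsAwareProxyMiddleware:
    """Hypothetical downloader middleware: choose a proxy that fits the URL scheme."""

    # Placeholder addresses; replace with proxies you have actually verified.
    HTTP_ONLY_PROXIES = ["http://192.0.2.10:80"]
    HTTPS_CAPABLE_PROXIES = ["http://192.0.2.20:3128"]

    def process_request(self, request, spider):
        scheme = urlparse(request.url).scheme
        if scheme == "https":
            # HTTPS downloads need a proxy that accepts CONNECT.
            pool = self.HTTPS_CAPABLE_PROXIES
        else:
            pool = self.HTTP_ONLY_PROXIES + self.HTTPS_CAPABLE_PROXIES

        if pool:
            request.meta["proxy"] = random.choice(pool)
        else:
            # No suitable proxy configured: drop any proxy this request carried.
            request.meta.pop("proxy", None)
        # Returning None lets Scrapy continue processing the request.
        return None
```

Such a class would be registered under DOWNLOADER_MIDDLEWARES in the project's settings in place of the existing proxy middleware.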

You can use an online HTTPS proxy tester to check whether your proxies support HTTPS, or run the Linux curl command through the proxy:

curl -x http://111.222.333.444:80 -L https://myip.ht
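
The same check can be scripted. A minimal sketch, assuming Python 3 and only the standard library: it sends the CONNECT request that an HTTPS download through a proxy starts with and reports whether the proxy answers with a 2xx status. The function names are hypothetical, and the proxy address in the usage note is a placeholder.

```python
import socket


def build_connect_request(host, port=443):
    """Return the raw CONNECT request a proxy must accept to tunnel HTTPS."""
    return (
        f"CONNECT {host}:{port} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n\r\n"
    ).encode("ascii")


def tunnel_opened(status_line):
    """True if the proxy's status line reports a 2xx (tunnel established)."""
    parts = status_line.split(None, 2)
    return (
        len(parts) >= 2
        and parts[0].startswith("HTTP/")
        and parts[1].isdigit()
        and 200 <= int(parts[1]) < 300
    )


def proxy_supports_https(proxy_host, proxy_port, target_host, timeout=10.0):
    """Ask the proxy to open a tunnel to target_host:443 and inspect the reply."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_connect_request(target_host))
        # The status line normally arrives in the first chunk of the reply.
        reply = sock.recv(4096).decode("iso-8859-1")
    return tunnel_opened(reply.split("\r\n", 1)[0])
```

For example, proxy_supports_https("192.0.2.20", 3128, "tecnoblog.net") returning False would suggest that the proxy refuses CONNECT for that target; connection failures propagate as OSError.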

   

Answered Aug 18 '15 at 10:11 by Genjo (edited Aug 19 '15 at 9:02)


Reposted from: https://my.oschina.net/airship/blog/628810
