1. Set a random User-Agent
Edit middlewares.py:
from fake_useragent import UserAgent

class RandomUserAgentMiddleware(object):
    def __init__(self):
        # Build the UserAgent object once rather than on every request
        self.ua = UserAgent()

    def process_request(self, request, spider):
        request.headers['User-Agent'] = self.ua.random
Edit settings.py:
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    # 'csdn_xila.middlewares.CsdnXilaDownloaderMiddleware': 543,
    'csdn_xila.middlewares.RandomUserAgentMiddleware': 543,
}
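If you would rather not depend on fake_useragent, a similar effect can be sketched with a fixed list of User-Agent strings. This is only a minimal sketch: the class name StaticRandomUserAgentMiddleware and the USER_AGENTS list are made up for illustration, and the strings are sample values to replace with your own.

```python
import random

# Illustrative User-Agent strings (assumed values); replace with the ones you want to rotate.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


class StaticRandomUserAgentMiddleware(object):
    """Hypothetical variant: picks a User-Agent from a fixed list."""

    def process_request(self, request, spider):
        # Returning None (implicitly) lets Scrapy continue processing the request
        request.headers['User-Agent'] = random.choice(USER_AGENTS)
```

It would be registered in DOWNLOADER_MIDDLEWARES in the same way as RandomUserAgentMiddleware.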
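2. Set up an IP proxy pool
Test site: http://icanhazip.com. It returns the IP address of the current request, so you can use it to check whether the proxy IP has been set up correctly.
Add the following to middlewares.py: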
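class ProxyMiddleware(object):
    # Placeholder value: replace with your proxy's IP address, including the port
    proxy_address = "127.0.0.1:8080"

    def process_request(self, request, spider):
        request.meta["proxy"] = "http://" + self.proxy_address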
Edit settings.py:
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    # 'csdn_xila.middlewares.CsdnXilaDownloaderMiddleware': 543,
    'csdn_xila.middlewares.RandomUserAgentMiddleware': 543,
    'csdn_xila.middlewares.ProxyMiddleware': 542,
}
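For the proxies themselves, I recommend an open-source free IP proxy pool. Of course, that is only suitable for learning and testing; if you need proxies in volume, buy them from one of the major proxy IP providers.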
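With more than one proxy available, the middleware can pick one at random per request. Below is a minimal sketch of that idea, not a definitive implementation: the class name RandomProxyMiddleware and the PROXY_LIST setting are assumptions made for illustration (PROXY_LIST is not a built-in Scrapy setting), and the proxies are expected as "ip:port" strings.

```python
import random


class RandomProxyMiddleware(object):
    """Hypothetical middleware: assigns a random proxy from a pool to each request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting in settings.py,
        # e.g. PROXY_LIST = ["ip1:port1", "ip2:port2"]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # With an empty pool, leave the request untouched (no proxy is set)
        if self.proxies:
            request.meta["proxy"] = "http://" + random.choice(self.proxies)
```

To try it, list it in DOWNLOADER_MIDDLEWARES in place of ProxyMiddleware and request http://icanhazip.com a few times; the returned IP should vary across requests if the proxies work.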
3. Add a Referer header
default_headers = {
    'referer': 'https://www.baidu.com/',
}
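The snippet above does not show where default_headers is applied. Two options that Scrapy does support are passing it as the headers argument of scrapy.Request, or putting the same entries in the DEFAULT_REQUEST_HEADERS setting. A third possibility, sketched here purely as an assumption and in the same style as the middlewares above, is a small downloader middleware; the class name StaticHeadersMiddleware is hypothetical.

```python
# Same headers as in the snippet above
default_headers = {
    'referer': 'https://www.baidu.com/',
}


class StaticHeadersMiddleware(object):
    """Hypothetical middleware: applies default_headers to every request."""

    def process_request(self, request, spider):
        for name, value in default_headers.items():
            # Plain assignment overwrites any value already set on the request
            request.headers[name] = value
```

If headers set on individual requests should win instead, request.headers.setdefault(name, value) could be used in place of the assignment.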
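Comments and discussion are welcome. As the saying goes, among any three people walking together, there is always one I can learn from!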