from selenium import webdriver
browser = webdriver.Firefox()  # Start a local Firefox session
browser.get("http://www.baidu.com")  # Load the page; the URL must include the scheme
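On an ordinary static page, the information we want to scrape is written directly into the page source, so it is easy to extract with regular expressions. For example: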
rr.firstInit({"data":[{"author":"袁理,翟堃","change":"首次","companyCode":"80116848","datetime":"2016-01-28T08:13:29","infoCode":"APPH2FEzZ2tFASearchReport","insCode":"80000031","insName":"东吴证券","insStar":"3","jlrs":["206000000","259000000","352000000","",""],"rate":"增持","secuFullCode":"002322.SZ","secuName":"理工监测","sratingName":"增持","sy":"","syls":["24.4","19.37","14.19","",""],"sys":["0.5","0.63","0.86","",""],"title":"业绩有望筑底,收购整合加速","profitYear":"2014","type":"1","newPrice":"16.17"},
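As a minimal sketch of that regular-expression approach, the snippet below pulls the JSON argument out of an rr.firstInit(...) call and reads a few fields from it. The variable page_source and the helpers extract_reports and extract_field are hypothetical names introduced for illustration, and the sample string is an abbreviated copy of the data above with an assumed closing "]})", since the original sample is truncated.

```python
import json
import re

# Abbreviated copy of the sample above. The closing "]})" is an assumption,
# because the original sample is cut off after the first record.
page_source = (
    'rr.firstInit({"data":[{"author":"袁理,翟堃","insName":"东吴证券",'
    '"rate":"增持","secuFullCode":"002322.SZ","secuName":"理工监测",'
    '"title":"业绩有望筑底,收购整合加速","newPrice":"16.17"}]})'
)


def extract_reports(source):
    """Return the list stored under "data" in an rr.firstInit(...) call."""
    # Capture everything between the outer parentheses as one JSON object.
    match = re.search(r'rr\.firstInit\((\{.*\})\)', source, re.S)
    if match is None:
        return []
    return json.loads(match.group(1))["data"]


def extract_field(source, key):
    """Return every string value of `key`, using a regex only (no JSON parsing)."""
    return re.findall(r'"%s":"(.*?)"' % re.escape(key), source)


reports = extract_reports(page_source)
for report in reports:
    print(report["secuFullCode"], report["secuName"], report["rate"])
```

Parsing the captured text with json.loads is usually more robust than matching each field separately, but extract_field shows the pure-regex variant for cases where the payload is not valid JSON.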
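For dynamic pages whose content is generated by JavaScript, however, we need to simulate a browser to load the page first and then scrape it: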