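I previously used PhantomJS to scrape JD.com search result pages and found that the last thirty items on each page could not be captured, because JD loads that data dynamically. I later discovered Puppeteer, a fantastic crawler tool that can be used from .NET.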
Here is the code; it is very simple:
First, reference the headless Chrome .NET API.
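The listing that follows builds the search URL by concatenating strings, so that part can be checked in isolation before a browser is involved. Below is a minimal sketch under stated assumptions: `JdSearchUrl.Build` is a hypothetical helper that is not part of the original post, and it uses `Uri.EscapeDataString` from the base class library in place of the post's `HttpUtility.UrlEncode` so that it needs no `System.Web` reference. The `keyword`, `enc` and `page` parameter names are taken from the URL in the post.

```csharp
using System;

static class JdSearchUrl
{
    // Hypothetical helper (not in the original post): builds the JD search URL
    // for a keyword and a page number.
    public static string Build(string keyword, int page)
    {
        // Percent-encode the keyword as UTF-8 so non-ASCII text is safe in a URL.
        string key = Uri.EscapeDataString(keyword);

        // Query parameters are separated by '&'.
        return "https://search.jd.com/Search?keyword=" + key + "&enc=utf-8&page=" + page;
    }
}

class Program
{
    static void Main()
    {
        // prints https://search.jd.com/Search?keyword=%E6%B0%B4%E6%9E%9C&enc=utf-8&page=1
        Console.WriteLine(JdSearchUrl.Build("水果", 1));
    }
}
```

Keeping the URL construction in one small function makes it easy to print the URL and open it by hand when a page does not load as expected.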
//Enabled headless option
var launchOptions = new LaunchOptions { Headless = true };
//Starting headless browser
var browser = await Puppeteer.LaunchAsync(launchOptions);
//New tab page
var page = await browser.NewPageAsync();
//Request URL to get the page
string url;
string key = HttpUtility.UrlEncode("水果");
url = "https://search.jd.com/Search?keyword=" + key + "&enc=utf-8&page=1";
await page.GoToAsync(url);
//Press Space to scroll the page down so the dynamically loaded items are requested
await page.Keyboard.PressAsync("Space");
await page.Keyboard.PressAsync("Space");
awai