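This program scrapes the images from a specified web page and then hands them to a program that changes the desktop wallpaper. The wallpaper-changing code can be downloaded here: https://download.youkuaiyun.com/download/qq_35319925/10416245. (This article is only meant for beginners and for people studying C# web-scraping techniques. It is not well written, so please bear with me.)
How it works: HtmlAgilityPack parses the downloaded HTML into an XML-like node tree, XPath expressions (path queries, not regular expressions) select the nodes that hold the needed data, and the extracted data is then displayed.
Code: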
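To make the idea concrete before the full listing, here is a minimal sketch of the parse-then-query step on an in-memory string, so it needs no network access. The class name `XPathSketch`, the method `ExtractLinks`, and the sample markup are hypothetical and only imitate the structure the XPath in this article expects; the only library assumed is HtmlAgilityPack.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

static class XPathSketch
{
    // Returns (alt text, href) pairs for every link that wraps an image
    // inside <li class="photo-list-padding">.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var result = new List<KeyValuePair<string, string>>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse the string into a node tree

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(
            "//li[@class='photo-list-padding']//a[.//img]");
        if (links == null)
            return result;

        foreach (HtmlNode a in links)
        {
            // The predicate [.//img] guarantees that this lookup succeeds.
            string alt = a.SelectSingleNode(".//img").GetAttributeValue("alt", "");
            string href = a.GetAttributeValue("href", "");
            result.Add(new KeyValuePair<string, string>(alt, href));
        }
        return result;
    }
}
```

For example, calling `XPathSketch.ExtractLinks("<ul><li class='photo-list-padding'><a href='/bizhi/1.html'><img alt='Sample' src='1.jpg'/></a></li></ul>")` should yield a single pair whose key is `Sample` and whose value is `/bizhi/1.html`.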
try
{
    // Scrape the image entries from the page:
    listView1.Items.Clear();
    HtmlWeb webClient = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = webClient.Load(this.textBox1.Text); // download the HTML at the given URL
    // The commented-out block below also runs fine. It is an alternative that keeps HtmlAgilityPack from producing garbled text.
    //string htmlurl = zhkj.httpdown.openweb(this.textBox1.Text, "", "utf-8");
    //HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    //FileStream fs = new FileStream(Application.StartupPath + "//outhtml.html", FileMode.Create, FileAccess.Write);
    //StreamWriter sw = new StreamWriter(fs, Encoding.Default);
    //sw.Write(htmlurl);
    //sw.Close();
    //fs.Close();
    //doc.Load(Application.StartupPath + "//outhtml.html");
    // XPath //div[@class='lb_box']/dl: outputs the image name, date, and other information
    //HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='lb_box']/dl/dd");
    //foreach (HtmlNode item in nodes)
    //{
    //    Console.WriteLine(item.InnerText.ToString());
    //}
    // Image site URL: http://pic.yesky.com/c/6_20771_1.shtml   XPath to scrape: //div[@class='lb_box']/dl//img
    List<string> listalt = new List<string>();
    List<string> listhref = new List<string>();
    // Select only the links that contain an image, so each href stays paired with its alt text.
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//li[@class='photo-list-padding']//a[.//img]");
    // SelectNodes returns null when nothing matches, so guard before looping.
    if (nodes != null)
    {
        foreach (HtmlNode item in nodes)
        {
            listhref.Add(item.GetAttributeValue("href", ""));
            listalt.Add(item.SelectSingleNode(".//img").GetAttributeValue("alt", ""));
        }
    }
    // Display the data
    for (int i = 0; i < listalt.Count; i++)
    {
        ListViewItem listViewItem = new ListViewItem();
        listViewItem.SubItems[0].Text = listalt[i];
        listViewItem.SubItems.Add(listhref[i]);
        listView1.Items.Add(listViewItem);
    }
}
catch (Exception ex)
{
    // Report the failure instead of silently swallowing it.
    MessageBox.Show(ex.Message);
}
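If you use a copy of HtmlAgilityPack downloaded from the internet, the Chinese text in the fetched HTML may come out garbled. This happens because HtmlAgilityPack does not decode the page with Encoding.Default. If you do not know how to modify HtmlAgilityPack itself, you can use the commented-out code above instead, or download a HtmlAgilityPack.dll from here: https://download.youkuaiyun.com/download/qq_35319925/10688398
With the above in place, the program is essentially complete. Below is the code for going to the previous page and the next page: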
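Another way to sidestep the garbled text, without touching HtmlAgilityPack itself, is to download the raw bytes and decode them with an explicitly chosen encoding before parsing. The sketch below is only an illustration: the class name `EncodingSketch` and its methods are hypothetical, and which encoding the target page actually uses is an assumption you would need to check (for example from its `<meta charset>` tag).

```csharp
using System.Net;
using System.Text;
using HtmlAgilityPack;

static class EncodingSketch
{
    // Decodes raw page bytes with the caller-supplied encoding, then parses the result.
    public static HtmlDocument Parse(byte[] rawBytes, Encoding pageEncoding)
    {
        string html = pageEncoding.GetString(rawBytes);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Downloads the page as bytes so that no automatic (possibly wrong) decoding happens.
    public static HtmlDocument Download(string url, Encoding pageEncoding)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            return Parse(raw, pageEncoding);
        }
    }
}
```

Assuming the page is served as GB2312/GBK, a call could look like `EncodingSketch.Download(this.textBox1.Text, Encoding.GetEncoding("gb2312"))`. On .NET Framework that encoding is available out of the box; on .NET Core and later it requires registering `CodePagesEncodingProvider.Instance` first.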
// Previous page
if (pagenumber == 1)
{
    return; // already on the first page
}
pagenumber--;
if (urltypename != null)
    this.textBox1.Text = "http://desk.zol.com.cn/" + urltypename + "/1920x1080/" + pagenumber + ".html";
else
    this.textBox1.Text = "http://desk.zol.com.cn/fengjing/1920x1080/" + pagenumber + ".html";
button1_Click(null, null);
// Next page
pagenumber++;
if (urltypename != null)
    this.textBox1.Text = "http://desk.zol.com.cn/" + urltypename + "/1920x1080/" + pagenumber + ".html";
else
    this.textBox1.Text = "http://desk.zol.com.cn/fengjing/1920x1080/" + pagenumber + ".html";
button1_Click(null, null);
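Both handlers build the page URL with the same if/else, so that logic could be pulled into one helper. This is just a possible refactoring sketch; the names `PageUrlSketch`, `Build`, and `DefaultCategory` are hypothetical and not part of the original program.

```csharp
static class PageUrlSketch
{
    // Category used when none has been chosen, matching the fallback in the code above.
    const string DefaultCategory = "fengjing";

    // Builds the 1920x1080 listing URL for the given category and page number.
    public static string Build(string category, int page)
    {
        string name = category ?? DefaultCategory;
        return "http://desk.zol.com.cn/" + name + "/1920x1080/" + page + ".html";
    }
}
```

Each handler would then shrink to adjusting `pagenumber`, assigning `this.textBox1.Text = PageUrlSketch.Build(urltypename, pagenumber);`, and calling `button1_Click(null, null);`.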