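Since I'm learning Chilkat myself, I'll keep writing about it.
Okay, let's get started!
To follow this post you need to know Python syntax. Python is simple, but what you can do with it is not, which is why I'm learning it. You also need to have Chilkat installed; for the details, see my post
"Getting Started Spidering a Site": a crawler written as an exercise with Chilkat (Python) (from: http://www.example-code.com)
http://blog.youkuaiyun.com/Xiao_Qiang_/archive/2008/08/23/2820293.aspx
Part 1: Source code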

    # Python 2 syntax (print statements, unicode()).
    from extra import chilkat

    # The Chilkat Spider component/library is free.
    spider = chilkat.CkSpider()

    # The spider object crawls a single web site at a time. As you'll see
    # in later examples, you can collect outbound links and use them to
    # crawl the web. For now, we'll simply spider 10 pages of www.vtchina.com
    spider.Initialize("http://www.vtchina.com/")

    # Add the 1st URL:
    spider.AddUnspidered("http://www.vtchina.com/")

    # Begin crawling the site by calling CrawlNext repeatedly.
    for i in range(10):
        success = spider.CrawlNext()
        if success:
            # Show the URL of the page just spidered.
            print spider.lastUrl()
            # The HTML META keywords, title, and description are available in these properties:
            print spider.lastHtmlTitle()
            # The description comes back as a UTF-8 byte string; decode it before printing.
            info = spider.lastHtmlDescription()
            HtmlDescription = unicode(info, "utf-8")
            print HtmlDescription
            print spider.lastHtmlKeywords()
            # The HTML is available in the LastHtml property
        else:
            # Did we get an error or are there no more URLs to crawl?
            if spider.get_NumUnspidered() == 0:
                print "No more URLs to spider"
            else:
                print spider.lastErrorText()
        # Sleep 1 second before spidering the next URL.
        spider.SleepMs(1000)
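
The control flow of the loop above can be tried offline with a stand-in object. The sketch below is written for Python 3 and does not use Chilkat at all: `StubSpider`, its `crawl_next` and `add_unspidered` methods, and the `pages` dictionary are hypothetical names made up for illustration. Only the success / queue-empty / error branching is meant to mirror the script.

```python
from collections import deque


class StubSpider:
    """Hypothetical stand-in for a spider object; not part of Chilkat."""

    def __init__(self, pages):
        # pages maps URL -> title, or None to simulate a failed fetch.
        self.pages = pages
        self.unspidered = deque()
        self.last_url = None
        self.last_title = None
        self.last_error = ""

    def add_unspidered(self, url):
        self.unspidered.append(url)

    def crawl_next(self):
        """Take the next queued URL; return True only if it was fetched."""
        if not self.unspidered:
            self.last_error = ""
            return False
        url = self.unspidered.popleft()
        title = self.pages.get(url)
        if title is None:
            self.last_error = "failed to fetch " + url
            return False
        self.last_url, self.last_title = url, title
        return True


def crawl(spider, max_pages):
    """Apply the same three-way branching as the script and collect a log."""
    log = []
    for _ in range(max_pages):
        if spider.crawl_next():
            log.append("OK " + spider.last_url + " " + spider.last_title)
        elif not spider.unspidered:
            log.append("No more URLs to spider")
        else:
            log.append("ERROR " + spider.last_error)
    return log
```

Like the script, this sketch keeps iterating after the queue runs dry, so the "No more URLs to spider" message can repeat. Adding a `break` in that branch would be one way to stop early.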
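
The lastHtmlTitle wrapper used above decodes the UTF-8 title into a unicode object before returning it:

    def lastHtmlTitle(*args):
        # Call the underlying binding, which returns the title as a UTF-8 byte string.
        utfchar = _chilkat.CkSpider_lastHtmlTitle(*args)
        # Decode the bytes into a unicode object (Python 2).
        info = unicode(utfchar, "utf-8")
        return info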

Since this is a very basic example, there isn't much to say about the code: it just fetches the page title.
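
The decoding step is the one part worth trying on its own. Here is a minimal sketch that runs without Chilkat, on either Python 2 or Python 3. `to_text` is a hypothetical helper name, and replacing undecodable bytes instead of raising is my own assumption, not something the original code does.

```python
def to_text(raw, encoding="utf-8"):
    """Return text for a byte string; pass text through unchanged."""
    if isinstance(raw, bytes):
        # "replace" swaps undecodable bytes for U+FFFD instead of raising.
        return raw.decode(encoding, "replace")
    return raw


# The UTF-8 bytes for a two-character Chinese title decode to two code points.
title = to_text(b"\xe4\xb8\xad\xe6\x96\x87")
assert len(title) == 2
```

With `unicode(info, "utf-8")` as used in the post, malformed input raises `UnicodeDecodeError`; the lenient mode here trades that strictness for a crawler that keeps going on badly encoded pages.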