http://computer.howstuffworks.com/internet/basics/search-engine1.htm How Internet Search Engines Work
http://www.wordtracker.com/academy/learn-seo/technical-guides/google-spider-crawling The Google spider & you: What you need to know to get your site indexed
http://www.lunametrics.com/blog/2014/08/07/bot-spider-filtering-google-analytics/ Understanding Bot and Spider Filtering from Google Analytics
http://searchenginewatch.com/sew/news/2067357/bye-bye-crawler-blocking-parasites Bye-bye, Crawler: Blocking the Parasites
http://www.google.com/insidesearch/howsearchworks/crawling-indexing.html Google documentation on crawling and indexing
http://www.htmlbasictutor.ca/web-crawler-search-engine.htm Web Crawler - Search Engine Robots - Search Engine Spiders
http://www.htmlbasictutor.ca/search-engine-read-web-pages.htm How Search Engines Read Web Pages
http://www.htmlbasictutor.ca/search-engine-submission.htm Search Engine Submissions
http://www.htmlbasictutor.ca/search-engine-web-content.htm Web Page Content Search Engines See
http://www.wisegeek.org/what-is-a-web-crawler.htm What is a Web Crawler?
http://ruby.bastardsbook.com/chapters/web-crawling/ Writing a Web Crawler
http://www.gotomanage.com/help/about/about_crawler About the GoToAssist® Open Source Crawler
Crawlers such as honeyspider and lanspider can also be found online.
http://monstercrawler.com/ A crawler website
http://socscibot.wlv.ac.uk/ Free SocSciBot
https://github.com/yasserg/crawler4j Open Source Web Crawler for Java
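These resources cover how Internet search engines work, how the Google spider crawls pages and why that matters for getting a site indexed, how to filter bot and spider traffic, and how to block malicious crawlers. The list also includes several links on crawler technology and open-source crawler tools.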
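The core loop of a web crawler is short: start from a seed URL, check robots.txt before each fetch, download the page, extract its links, and queue any link not seen before. Below is a minimal breadth-first sketch of that loop, using only the Python standard library. The `fetch` callback, the `ExampleBot` user-agent name, and the example.com pages are hypothetical, chosen so the sketch runs offline. A real crawler would also need politeness delays, error handling, and URL normalization.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, robots_txt, user_agent="ExampleBot", max_pages=50):
    """Breadth-first crawl starting at start_url.

    fetch(url) returns an HTML string, or None if the page is unavailable.
    It is injected (hypothetical) so the sketch runs without network access.
    robots_txt is the site's robots.txt text, checked before every fetch.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt: never fetched
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute = urljoin(url, href).split("#")[0]
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

For example, with a small in-memory site whose robots.txt disallows `/private/`, the crawler visits the public pages in breadth-first order and never fetches the private one:

```python
pages = {
    "https://example.com/": '<a href="/a.html">A</a> <a href="/private/x.html">X</a>',
    "https://example.com/a.html": '<a href="/">home</a> <a href="b.html#top">B</a>',
    "https://example.com/b.html": "no links here",
}
robots = "User-agent: *\nDisallow: /private/\n"
print(crawl("https://example.com/", pages.get, robots))
```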