Here is the list of User-Agent strings that we know are used by robots (such as web crawlers). There are currently three entries ending with an asterisk, which marks the name as a prefix: many different User-Agent strings begin with that prefix, and all of them are robots. For example, there are many different versions of the "FAST-WebCrawler" agent.

| User-Agent: " | Email Collector |
|---|---|
| "DIIbot/1.2 http://www.findsame.com/robot.html" | |

| User-Agent: A | Email Collector |
|---|---|
| AcoiRobot/1.0 libwww/5.3.2 | |
| Acoon Robot v1.01 (www.acoon.de) | |
| AgentName/0.1 libwww-perl/5.50 | |
| AlkalineBOT/1.4 (1.4.0326.0 RTM) | |
| AltaVista Intranet* | |
| AnzwersCrawl/2.0 (anzwerscrawl@anzwers.com.au; http://faq.anzwers.com.au/anzwerscrawl.html) | |
| appie/1.1 | |
| Arachnoidea (arachnoidea@euroseek.com) | |
| ArchitextSpider | |
| asterias/2.0 | |
| Autonomy Spider | |

| User-Agent: B | Email Collector |
|---|---|
| BaiDuSpider | |
| beholder (www.vigiltech.com/esensedisclaim.html) | |
| BlogBot/1.1 | |
| bumblebee@relevare.com | |

| User-Agent: C | Email Collector |
|---|---|
| CherryPickerElite/1.0 | |
| CherryPickerSE/1.0 | |
| cosmos/0.7_(mihai.preda@xyleme.com) | |
| cosmos/0.8_(robot@xyleme.com) | |
| Crawl_Application | |
| Crescent Internet ToolPak HTTP OLE Control v.1.0 | |

| User-Agent: D | Email Collector |
|---|---|
| daypopbot/0.2 | |
| DIIbot/1.2 http://www.findsame.com/robot.html | |

| User-Agent: E | Email Collector |
|---|---|
| EmailCollector/1.0 | |
| EmailSiphon | |
| EmailWolf 1.00 | |
| Explorer www@openxxx.net | |
| ExtractorPro | |

| User-Agent: F | Email Collector |
|---|---|
| FAST-WebCrawler/* | |
| fastlwspider/1.0 | |
| fido/1.0 Harvest/1.4.pl2 | |

| User-Agent: G | Email Collector |
|---|---|
| GAIS Robot/1.0B2 | |
| gazz/2.1 (gazz@nttrd.com) | |
| gigabaz/3.14 (baz@gigabaz.com; http://gigabaz.com/gigabaz/) | |
| Googlebot-for-IDG/2.1 (+http://www.googlebot.com/bot.html) | |
| Googlebot/2.1 (+http://googlebot.com/bot.html) | |
| Googlebot/2.1 (+http://www.googlebot.com/bot.html) | |
| Gulliver/1.3 | |
| Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot) | |

| User-Agent: H | Email Collector |
|---|---|
| hcat/1.0 | |
| HLoader | |
| HomePageSearch(hpsearch.uni-trier.de) | |
| htdig/3.1.2 (webmaster@box.sk) | |
| htdig/3.1.5 (unconfigured@htdig.searchengine.maintainer) | |
| http://pawel.dyndns.org/ pol76@wanadoo.fr | |
| httpGlooton/1.0 | |

| User-Agent: I | Email Collector |
|---|---|
| ia_archiver | |
| IncyWincy data gatherer(webmaster@loopimprovements.com,http://www.loopimprovements.com/robot.html) | |
| InfoNaviRobot(F107) | |
| Infoseek Sidewinder/0.9 | |
| Inktomi Search | |
| InternetAmi IOR/0.2 (http://internetami.se/ior.html) | |
| InternetLinkAgent/3.1 | |
| InternetSeer.com | |

| User-Agent: J | Email Collector |
|---|---|
| JennyBot/0.1 | |
| jokescan/0.08 | |

| User-Agent: K | Email Collector |
|---|---|
| Kenjin Spider | |
| KIT-Fireball/2.0 | |
| KIT_Fireball/2.0 | |
| kobot/5.2.8 libwww/5.2.8 | |

| User-Agent: L | Email Collector |
|---|---|
| larbin_1.2.2 larbin1.2.2@somewhere.com | |
| larbin_1.2.2 nobody@nowhere | |
| larbin_2.1.0 larbin2.1.0@somewhere.com | |
| larbin_2.1.1 larbin2.1.1@somewhere.com | |
| larbin_devel sebastien.ailleret@inria.fr | |
| larbinDeViennot Laurent.Viennot@inria.fr | |
| LEIA/3.01pr (LEIAcrawler; leia@gseek.com; http://www.gseek.com) | |
| LexiBot/1.00 | |
| libWeb/clsHTTP -- hiongun@kt.co.kr | |
| Linkbot | |
| LinkLint-checkonly/2.1 | |
| Links (0.95; NetBSD 1.5U i386) | |
| LinkWalker | |
| LNSpiderguy | |
| lwp-trivial/1.27 | |
| Lycos_Spider_(modspider) | |
| Lycos_Spider_(T-Rex) | |
| Lycos_Spider_(T-Rex)/3.0 | |

| User-Agent: M | Email Collector |
|---|---|
| MARS SV | |
| Mata Hari/2.00 | |
| MedicalMatrix | |
| Mercator-1.0 | |
| Mercator-1.1 | |
| Microsoft URL Control - 6.00.8169 | |
| MIIxpc/4.2 | |
| moget/2.1 (moget@goo.ne.jp) | |
| Mozilla/2.0 (compatible; Ask Jeeves) | |
| Mozilla/2.0 (compatible; EZResult -- Internet Search Engine) | |
| Mozilla/2.0 (compatible; NEWT ActiveX; Win32) | |
| Mozilla/2.0 (compatible; T-H-U-N-D-E-R-S-T-O-N-E) | |
| Mozilla/3.0 (compatible; Indy Library) | |
| Mozilla/3.0 (compatible; MuscatFerret/1.4.1; olly@muscat.co.uk) | |
| Mozilla/3.0 (compatible; MuscatFerret/1.5.2; olly@muscat.co.uk) | |
| Mozilla/3.0 (compatible; MuscatFerret/1.5.3; olly@muscat.co.uk) | |
| Mozilla/3.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| Mozilla/4.0 (compatible; FastCrawler3, support-fastcrawler3@fast.no) | |
| Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft | |
| Mozilla/4.0 (compatible; MuscatFerret/2.0; http://www.webtop.com/) | |
| Mozilla/4.0 compatible ZyBorg/1.0 (ZyBorg@WISEnutbot.com; http://www.WISEnutbot.com) | |
| Mozilla/4.04 [de] (Win95; I ;Kolibri gncwebbot) | |
| Mozilla/4.04 [de] (Win95; I ;Nav; Kolibri gncwebbot) | |
| Mozilla/4.1 | |
| MuscatFerret | |

| User-Agent: N | Email Collector |
|---|---|
| NationalDirectory-WebSpider/1.3 | |

| User-Agent: O | Email Collector |
|---|---|
| Openfind data gatherer, Openbot/3.0+(robot-response@openfind.com.tw;+http://www.openfind.com.tw/robot.html) | |
| Openfind piranha,Shark/0.95h+(tjsheu@gais.cs.ccu.edu.tw;+http://www.openfind.com/) | |

| User-Agent: P | Email Collector |
|---|---|
| PlantyNet_WebRobot_V1.9 dhkang@plantynet.com | |
| PM WebSpyder V1.0 | |
| polybot 1.0 | |
| psbot/0.1 (+http://www.picsearch.com/bot.html) | |
| psbot/0.1 (+http://www.picsearch.org/bot.html) | |

| User-Agent: Q | Email Collector |
|---|---|
| QUOSA | |

| User-Agent: R | Email Collector |
|---|---|
| RaBot/1.0 Agent-admin/webmaster@krivet2.or.kr | |
| RealNamesBot/2.7 | |
| Robozilla/1.0 | |

| User-Agent: S | Email Collector |
|---|---|
| Scooter-3.0.d1 | |
| scooter-3.0.DY | |
| Scooter-3.0.EU | |
| Scooter-3.0.FS | |
| Scooter-3.0.g36-47 | |
| Scooter-3.0.g60-71 | |
| Scooter-3.0.g72-83 | |
| Scooter-3.0.g84-95 | |
| Scooter-3.0.s1 | |
| Scooter-W3-1.0 | |
| Scooter/1.0 | |
| Scooter/1.0 scooter@pa.dec.com | |
| Scooter/1.1 (custom) | |
| Scooter/2.0 G.R.A.B. V1.1.0 | |
| Scooter/2.0 G.R.A.B. X2.0 | |
| search.at V1.2 | |
| sexsearcher | |
| Slurp.so/1.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| Slurp/2.0-condor_daily_west (slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| Slurp/2.0-condor_hourly (slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| Slurp/2.0-Redtail (slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| Slurp/cat (slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| Slurp/si (slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| Slurp/si-emb (slurp@inktomi.com; http://www.inktomi.com/slurp.html) | |
| SlySearch (fitzboy@bmrc.berkeley.edu) | |
| SlySearch (slysearch@slysearch.com) | |
| SlySearch fitzboy@bmrc.berkeley.edu | |
| SlySearch slysearch@slysearch.com | |
| SOFTWING_TEAR_AGENT_1_0 | |
| SpiderMan | |
| Sqworm/2.9.43-BETA | |
| Surfnomore Spider v1.1 | |
| SurfWatchSpider/2.0 | |
| SwishSpider | |
| SwissSearch V1.2 | |

| User-Agent: T | Email Collector |
|---|---|
| The Informant | |
| tivraSpider/1.0 | |
| tivraSpider/1.0 (crawler@tivra.com) | |
| TridentSpider3 | |
| True_Robot/1.0 libwww/5.2.8 | |
| TV33_Mercator_1-1.0 | |

| User-Agent: U | Email Collector |
|---|---|
| Ultraseek | |
| URL Checker | |

| User-Agent: V | Email Collector |
|---|---|
| VbTcpQuery sample (http://www.hexillion.com) | |
| vischeck_spiderBot/0.1libwww-perl/5.48 | |
| vspider | |
| vspider/3.6 | |

| User-Agent: W | Email Collector |
|---|---|
| WebBandit/2.1 | |
| WebBandit/3.50 | |
| Webbandit/4.00.0 | |
| WebCrawler-AddURL/2.0 | |
| WebCrawler/3.0 Robot libwww/5.0a | |
| WebTrends Link Analyzer | |
| WiseDoc-Robot (Google WAP Proxy/1.0) | |
| WiseWire | |
| WiseWire* | |

| User-Agent: X | Email Collector |
|---|---|
| x-spyonit/1.1 | |
| Xenu's Link Sleuth 1.1a | |
| xyro_(xcrawler@cosmos.inria.fr) | |

| User-Agent: Z | Email Collector |
|---|---|
| ZyBorg/1.0 (ZyBorg@WISEnutbot.com; http://www.WISEnutbot.com) | |
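
The prefix convention described at the top of this list (an entry ending in an asterisk matches every User-Agent that starts with that name) can be checked with a few lines of Python. This is only a minimal sketch: `ROBOT_PATTERNS` is a small hand-picked excerpt of the tables above, the function names `split_patterns` and `is_known_robot` are made up for illustration, and the comparison is assumed to be case-sensitive because the list itself keeps entries that differ only in case (for example `WebBandit/3.50` and `Webbandit/4.00.0`).

```python
# Excerpt of the list above; entries ending in "*" are prefixes.
# A real deployment would load every row of the tables instead.
ROBOT_PATTERNS = [
    "AltaVista Intranet*",
    "FAST-WebCrawler/*",
    "WiseWire*",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "ia_archiver",
]


def split_patterns(patterns):
    """Separate exact names from prefix entries (those ending in '*')."""
    exact = {p for p in patterns if not p.endswith("*")}
    # Drop the trailing "*" so the rest can be used with str.startswith.
    prefixes = tuple(p[:-1] for p in patterns if p.endswith("*"))
    return exact, prefixes


def is_known_robot(user_agent, patterns=ROBOT_PATTERNS):
    """Return True if the User-Agent matches an exact entry or a prefix entry.

    Assumption: matching is case-sensitive and ignores surrounding whitespace.
    """
    exact, prefixes = split_patterns(patterns)
    ua = user_agent.strip()
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return ua in exact or ua.startswith(prefixes)


if __name__ == "__main__":
    # The version number below is an invented example, not taken from the list.
    print(is_known_robot("FAST-WebCrawler/9.9 (example)"))
    print(is_known_robot("ia_archiver"))
    print(is_known_robot("Mozilla/5.0 (X11; Linux x86_64)"))
```

Keeping exact names in a set and prefixes in a tuple keeps the lookup simple; since a User-Agent header is trivially forged, a match (or a miss) is best treated as a hint rather than proof.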