Which are the possible/known User-Agent substrings for search robots?

Project: Calais Marmoset
Version: 6.x-1.0
Component: Documentation
Category: support request
Priority: normal
Assigned: Unassigned
Status: active

Issue Summary

Yahoo!'s Slurp is configured out of the box, but which other User-Agents can we use? Doesn't Google look for RDF data too? Yahoo only visits my site about once a week, while Google crawls it all the time, hundreds of pages a day. I could use a little help here.

Comments

#1

Same question, subscribing.

#2

Following up on this, I eventually found a good list in the Firefox User Agent Switcher tool, specifically this XML file:
http://techpatterns.com/downloads/firefox/useragentswitcher.xml

See the section "Spiders - Search":

<folder description="Spiders - Search">
<useragent description="Ask Jeeves/Teoma" useragent="Mozilla/2.0 (compatible; Ask Jeeves/Teoma)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Baiduspider+" useragent="Baiduspider+(+http://www.baidu.com/search/spider.htm)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="FAST-WebCrawler 3.8" useragent="FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Googlebot 2.1 (New version)" useragent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Googlebot 2.1 (Older Version)" useragent="Googlebot/2.1 (+http://www.googlebot.com/bot.html)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Msnbot 1.0 (current version)" useragent="msnbot/1.0 (+http://search.msn.com/msnbot.htm)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Msnbot 0.11 (beta version)" useragent="msnbot/0.11 (+http://search.msn.com/msnbot.htm)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Yahoo Slurp" useragent="Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
</folder>

<folder description="Miscellaneous">

<folder description="Bots - Spiders">
<useragent description="Email Wolf" useragent="EmailWolf 1.00" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Gaisbot 3.0" useragent="Gaisbot/3.0+(robot@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="gulperbot" useragent="Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
</folder>

<folder description="Browsers - Beos">
<useragent description="Net Positive 2.1" useragent="Mozilla/3.0 (compatible; NetPositive/2.1.1; BeOS)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="SeaMonkey 1.5a" useragent="Mozilla/5.0 (BeOS; U; BeOS BePC; en-US; rv:1.9a1) Gecko/20060702 SeaMonkey/1.5a" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
</folder>
</folder>
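
For anyone who wants to pull those strings out programmatically instead of copying them by hand, here is a rough Python sketch. It parses a trimmed inline sample shaped like the excerpt above rather than the real download. The root element name `useragentswitcher` and the helper name `spider_user_agents` are my assumptions for illustration, not something confirmed by the file.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the same shape as the excerpt above.
# The root element name is an assumption made for this sketch.
SAMPLE = """
<useragentswitcher>
  <folder description="Spiders - Search">
    <useragent description="Googlebot 2.1 (New version)" useragent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"/>
    <useragent description="Yahoo Slurp" useragent="Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"/>
  </folder>
  <folder description="Miscellaneous">
    <folder description="Bots - Spiders">
      <useragent description="Email Wolf" useragent="EmailWolf 1.00"/>
    </folder>
  </folder>
</useragentswitcher>
"""


def spider_user_agents(xml_text, folder_name="Spiders - Search"):
    """Return the useragent attribute of every entry directly inside the named folder."""
    root = ET.fromstring(xml_text.strip())
    agents = []
    for folder in root.iter("folder"):
        if folder.get("description") == folder_name:
            # findall() only looks at direct children, so entries from
            # nested sub-folders are not mixed in.
            agents.extend(ua.get("useragent") for ua in folder.findall("useragent"))
    return agents


for agent in spider_user_agents(SAMPLE):
    print(agent)
```

Passing `"Bots - Spiders"` as the second argument would list the entries from that nested folder instead.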

#3

Based on this list, would it make sense to put the following in Calais Marmoset's User Agent Substring List?

Ask Jeeves/Teoma
Baiduspider
Gaisbot
Googlebot
gulperbot
Msnbot
WebCrawler
Yahoo Slurp
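
A quick way to sanity-check a list like this is to test each substring against the full User-Agent strings from the XML in #2. Rough Python sketch below. I don't know how Marmoset actually compares the strings, so this assumes a plain substring test with an optional case-insensitive mode, and `unmatched_substrings` is just a made-up helper name.

```python
# Full User-Agent strings copied from the XML excerpt in #2.
USER_AGENTS = [
    "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)",
    "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
    "FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
    "msnbot/0.11 (+http://search.msn.com/msnbot.htm)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Gaisbot/3.0+(robot@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php)",
    "Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot)",
]

# The proposed substring list from this comment.
CANDIDATES = [
    "Ask Jeeves/Teoma",
    "Baiduspider",
    "Gaisbot",
    "Googlebot",
    "gulperbot",
    "Msnbot",
    "WebCrawler",
    "Yahoo Slurp",
]


def unmatched_substrings(candidates, user_agents, case_sensitive=True):
    """Return the candidates that occur in none of the User-Agent strings."""
    def norm(text):
        return text if case_sensitive else text.lower()

    haystacks = [norm(ua) for ua in user_agents]
    return [c for c in candidates if not any(norm(c) in h for h in haystacks)]


print(unmatched_substrings(CANDIDATES, USER_AGENTS))
print(unmatched_substrings(CANDIDATES, USER_AGENTS, case_sensitive=False))
```

If the comparison is case-sensitive, "gulperbot", "Msnbot" and "Yahoo Slurp" match nothing in the strings above. "Yahoo Slurp" fails either way, because the real string is "Yahoo! Slurp" with an exclamation mark, so "Slurp" on its own may be the safer substring.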

#4

J0nathan, any luck with this list?

#5

WildBill, I was not able to make that module work because of other issues.