Posted by morningtime on August 18, 2009 at 10:09am
| Project: | Calais Marmoset |
| Version: | 6.x-1.0 |
| Component: | Documentation |
| Category: | support request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Issue Summary
Yahoo! Slurp is configured out of the box, but which other User-Agents can we use? Doesn't Google look for RDF data too? Yahoo only visits my site about once a week, while Google crawls it all the time, hundreds of pages a day. I could use a little help here.
Comments
#1
Same question, subscribing.
#2
Following up on this, I eventually found a good list in the Firefox User Agent Switcher tool, in this XML file:
http://techpatterns.com/downloads/firefox/useragentswitcher.xml
See the section "Spiders - Search":
<folder description="Spiders - Search">
<useragent description="Ask Jeeves/Teoma" useragent="Mozilla/2.0 (compatible; Ask Jeeves/Teoma)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Baiduspider+" useragent="Baiduspider+(+http://www.baidu.com/search/spider.htm)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="FAST-WebCrawler 3.8" useragent="FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Googlebot 2.1 (New version)" useragent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Googlebot 2.1 (Older Version)" useragent="Googlebot/2.1 (+http://www.googlebot.com/bot.html)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Msnbot 1.0 (current version)" useragent="msnbot/1.0 (+http://search.msn.com/msnbot.htm)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Msnbot 0.11 (beta version)" useragent="msnbot/0.11 (+http://search.msn.com/msnbot.htm)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Yahoo Slurp" useragent="Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
</folder>
<folder description="Miscellaneous">
<folder description="Bots - Spiders">
<useragent description="Email Wolf" useragent="EmailWolf 1.00" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Gaisbot 3.0" useragent="Gaisbot/3.0+(robot@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="gulperbot" useragent="Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
</folder>
<folder description="Browsers - Beos">
<useragent description="Net Positive 2.1" useragent="Mozilla/3.0 (compatible; NetPositive/2.1.1; BeOS)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="SeaMonkey 1.5a" useragent="Mozilla/5.0 (BeOS; U; BeOS BePC; en-US; rv:1.9a1) Gecko/20060702 SeaMonkey/1.5a" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
</folder>
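For anyone who would rather pull the strings out of that file than copy them by hand, here is a rough Python sketch. It assumes the file is well-formed XML with nested `folder` and `useragent` elements as in the excerpt above. The `<root>` wrapper and the function name `spider_user_agents` are placeholders of mine, not taken from the real file or from the module.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the excerpt above, wrapped in a placeholder <root>
# element (an assumption) so the sample parses as a complete document.
SAMPLE = """<root>
<folder description="Spiders - Search">
<useragent description="Googlebot 2.1 (New version)" useragent="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
<useragent description="Yahoo Slurp" useragent="Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" appcodename="" appname="" appversion="" platform="" vendor="" vendorsub=""/>
</folder>
</root>"""


def spider_user_agents(xml_text, folder_name="Spiders - Search"):
    """Return the useragent attribute of every entry directly inside the named folder."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the folder is found at any nesting depth.
    for folder in root.iter("folder"):
        if folder.get("description") == folder_name:
            return [ua.get("useragent") for ua in folder.findall("useragent")]
    return []


for user_agent in spider_user_agents(SAMPLE):
    print(user_agent)
```

Note that the `description` attribute is only a label. The `useragent` attribute holds the string a crawler actually sends, so that is the one to take substrings from.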
#3
Based on this list, would it be good to put the following in Calais Marmoset's User Agent Substring List?
Ask Jeeves/Teoma
Baiduspider
Gaisbot
Googlebot
gulperbot
Msnbot
WebCrawler
Yahoo Slurp
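A quick way to sanity-check a list like this is to test each substring against the real user-agent strings from the XML in #2. The sketch below does that with plain substring matching. How Calais Marmoset itself compares the strings (case-sensitive or not) is not stated in this thread, so both modes are shown as assumptions. The helper names are mine.

```python
# User-agent strings copied from the useragentswitcher.xml excerpt in #2.
USER_AGENTS = [
    "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)",
    "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
    "FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Gaisbot/3.0+(robot@gais.cs.ccu.edu.tw;+http://gais.cs.ccu.edu.tw/robot.php)",
    "Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot)",
]

# The proposed substring list from this comment.
SUBSTRINGS = [
    "Ask Jeeves/Teoma", "Baiduspider", "Gaisbot", "Googlebot",
    "gulperbot", "Msnbot", "WebCrawler", "Yahoo Slurp",
]


def matches_any(substring, user_agents, ignore_case=False):
    """True if the substring occurs in at least one user-agent string."""
    if ignore_case:
        substring = substring.lower()
        return any(substring in ua.lower() for ua in user_agents)
    return any(substring in ua for ua in user_agents)


def unmatched(substrings, user_agents, ignore_case=False):
    """Return the substrings that match none of the user-agent strings."""
    return [s for s in substrings if not matches_any(s, user_agents, ignore_case)]


print("case-sensitive misses:  ", unmatched(SUBSTRINGS, USER_AGENTS))
print("case-insensitive misses:", unmatched(SUBSTRINGS, USER_AGENTS, ignore_case=True))
```

Against these samples, "Yahoo Slurp" misses in both modes because the real string is "Yahoo! Slurp", and "Msnbot" and "gulperbot" miss when matching is case-sensitive. Shorter entries such as "Slurp" and "msnbot" would be safer if the module compares case-sensitively.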
#4
J0nathan, any luck with this list?
#5
WildBill, I was not able to make that module work because of other issues.