My ongoing fight with bots and spammers
I have had a site up and running on Drupal for a little over a year. It is working great, and I am starting to feel like I have a really strong grasp of how Drupal works. One issue that pops up from time to time is bots and spammers. The first time I came across it was when I told the client to look at the live site and see what he thought. I can't tell you how embarrassing it was when he asked why there were links to gay porn in the forums. But that was easy to overcome: I set the forum so that only registered users could post. Not long after that, I noticed that accounts were getting created that looked bogus, but I didn't see links in the forum, so I ignored it. Well, a few days ago the forums got flooded with spam. I deleted all the spam and changed the settings so that you have to be assigned a Role. I also added CAPTCHA and reCAPTCHA, which I am very pleased with. And now I check both the Forums and the Log Entries every day.
My question is this: should I be concerned about the "page not found" entries in the Log Entries? They all seem to be related to the Calendar. Here are some examples:
warning page not found 06/17/2009 - 8:07am calendar/2002-W43 Anonymous
warning page not found 06/17/2009 - 6:30am calendar/2001-12-30 Anonymous
The first one seems odd because it has "W43", and I don't know what that would be. Also, both of these dates are obviously several years in the past. I am assuming these requests come from web crawlers.

=-=
They may be crawlers, or they may be humans probing for old vulnerabilities or looking for new ones on your site. If Drupal is serving 404s for those addresses, Drupal is doing its job.
There really isn't anything you can do about this beyond keeping Drupal core and all of your modules up to date.
Also of note: bots aren't the only way to get spammed. There are outfits that pay humans to register on sites with free email accounts; they then sit, wait, and spam sparingly, hoping not to get noticed. I see this a lot here on drupal.org as a site maintainer.
In addition to what you're currently doing:
1) Require administrator approval for all new accounts.
2) Don't accept accounts from anyone using free email services (Yahoo, MSN, Hotmail, etc.).
3) Check out the free Mollom service.
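The check behind suggestion 2 is simple. Below is a minimal, language-neutral sketch in Python (not a Drupal module) that takes the domain after the last "@" and compares it against a blocklist. The function name `is_free_email` and the three domains in `FREE_EMAIL_DOMAINS` are hypothetical choices for illustration; a real list would need regional variants such as other Yahoo or Hotmail country domains.

```python
# Hypothetical blocklist of free-mail domains; extend it to match your own policy.
FREE_EMAIL_DOMAINS = {"yahoo.com", "msn.com", "hotmail.com"}


def is_free_email(address: str) -> bool:
    """Return True if the address uses a blocklisted free-mail domain (hypothetical helper)."""
    # rpartition splits on the LAST "@", so the domain is everything after it.
    _, sep, domain = address.strip().rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    # Domains are case-insensitive, so normalise before comparing.
    return domain.lower() in FREE_EMAIL_DOMAINS


if __name__ == "__main__":
    for addr in ["Spammer@Hotmail.com", "client@example.com"]:
        print(addr, "reject" if is_free_email(addr) else "accept")
```

Matching exact domains (rather than substrings) avoids rejecting unrelated domains that merely contain "yahoo" somewhere in the name.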
Like I posted in another thread: you could try a CAPTCHA that is safer than image-based ones, for example the Egglue module.
The 2002-W43 is a reference to week 43 of 2002. Calendar poses a problem with spiders because it generates a potentially infinite number of links to past and future days, weeks, months, etc., so we at least try to control well-behaved spiders with appropriate robots.txt entries.
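To make that answer concrete, here is a small sketch using only the Python standard library. It decodes the two logged Calendar paths (treating "2002-W43" as ISO 8601 week notation) and checks them against a robots.txt rule. The helper name `describe`, the `Disallow` lines, and the host `example.com` are assumptions for illustration, not what Drupal or the Calendar module ships; the `/?q=` line is meant for sites running without clean URLs. Keep in mind that robots.txt is advisory: well-behaved spiders honour it, while bots probing for vulnerabilities will ignore it and keep getting 404s.

```python
from datetime import date
from urllib.robotparser import RobotFileParser

# Paths taken from the "page not found" log entries quoted in the question.
LOGGED_PATHS = ["calendar/2002-W43", "calendar/2001-12-30"]


def describe(path: str) -> str:
    """Translate a Calendar path argument into the dates it refers to (hypothetical helper)."""
    arg = path.split("/", 1)[1]
    if "-W" in arg:
        # ISO 8601 week notation YYYY-Www; weeks run Monday (day 1) to Sunday (day 7).
        year, week = (int(part) for part in arg.split("-W"))
        start = date.fromisocalendar(year, week, 1)
        end = date.fromisocalendar(year, week, 7)
        return f"week {week} of {year}: {start} to {end}"
    return f"single day: {date.fromisoformat(arg)}"


# Assumed robots.txt rules for keeping well-behaved spiders out of the Calendar.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Disallow: /?q=calendar/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

for path in LOGGED_PATHS:
    allowed = parser.can_fetch("*", "http://example.com/" + path)
    print(f"{path} -> {describe(path)}; crawl allowed: {allowed}")
```

Blocking the whole `/calendar/` prefix is the blunt option: it stops compliant crawlers from following the endless chain of previous and next links, at the cost of those pages not being indexed at all.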