We've been using Drupal for a few major versions and have upgraded to the 6.x line. Recently someone noticed that a search result was returning preview text that did not appear to match the page.
If I search for "Which CMS is best", the search results say:
Join BARC
... Club App 2008.pdf 117.27 KB why not cms Which CMS is best?...custom or ? order buy prilosec buy buy nexium ...
Story - kd7yko - 2008-12-18 11:42 - 44 comments - 8 attachments
If you go to the "Join BARC" story, there are no comments, nor anything about CMS, prilosec or nexium. Different keywords pull up even more unsavory terms linked with that page. I have tried clearing the cache from the site admin pages in case the search results were cached from some spammer post years ago, but the same results come back. I don't understand why this says there are 44 comments when the whole site has a fraction of that. I apologize if this is a known issue. I was not able to find it using the drupal.org search box or Google.
Comments
Garbage in, garbage out: Old junk in comments.
I tracked down the offensive text to a set of posts by two users in the comments table. The users were deleted a very long time ago, and I thought the delete message said their posts, etc. would be deleted along with them. As far as I could see from admin/content/comment and admin/content/comment/approval, those posts were gone.
Selecting a count from the comments table, I found 60 records, while only 16 are listed in admin/content/comment. Selecting uid from comments where uid not in users returned 44 rows, which accounts for the difference (60 - 16 = 44). After deleting from comments where uid in (set of deleted users), then resetting and rebuilding the search index (via cron.php), the bad results are gone.
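For anyone who wants to try the same check before touching a live database, here is a rough sketch of the queries described above. It uses Python with an in-memory SQLite database standing in for the real Drupal 6 database. Only the comments and users tables and their uid columns come from the post; the cid, name and comment columns, the sample rows, and the function names are assumptions made up for illustration.

```python
import sqlite3


def find_orphaned_comments(conn):
    """Return (cid, uid) for comments whose uid has no matching row in users."""
    return conn.execute(
        "SELECT c.cid, c.uid FROM comments c "
        "LEFT JOIN users u ON u.uid = c.uid "
        "WHERE u.uid IS NULL "
        "ORDER BY c.cid"
    ).fetchall()


def delete_orphaned_comments(conn):
    """Delete comments whose uid is not in users; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM comments WHERE uid NOT IN (SELECT uid FROM users)"
    )
    conn.commit()
    return cur.rowcount


def build_demo_db():
    """Build a tiny stand-in database (hypothetical schema and sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE comments (cid INTEGER PRIMARY KEY, uid INTEGER, comment TEXT)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "admin"), (2, "member")],
    )
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?, ?)",
        [
            (1, 1, "a real comment"),
            (2, 2, "another real comment"),
            (3, 7, "spam left behind by a deleted user"),
            (4, 9, "more spam left behind"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print("orphaned before:", find_orphaned_comments(conn))
    print("deleted:", delete_orphaned_comments(conn))
    print("orphaned after:", find_orphaned_comments(conn))
```

Running the select first and reading through the rows before running the delete is the safer order. On a real site the search index would still need to be reset and rebuilt afterwards, as described above.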
When a user is deleted, the comment module simply changes the uid on that user's comments to 0 (anonymous).
See http://api.drupal.org/api/function/comment_user/6
So, yes, you can end up with unwanted content lying around.