Through my own configuration mistake, I let a private list of names in my Drupal site become public to the spiders, and some of the names are now in Google's cache. I've deleted the nodes, but they are living on in the cache. Some of the people, who happen to have somewhat unique names, are now finding that the very first link in a Google search on their name brings up the record from my site. Said folks are not happy campers, i.e. mad as H**L.

I've learned that I can get the records purged from Google fairly quickly, but only if the URL throws a 404.
see http://lifehacker.com/software/google/deleting-things-from-googles-cache...

I've read gobs of links in these Drupal Forums, but so far, it looks like they only deal with URLs for files, not Drupal nodes.

Is there a way to make a request for a deleted Drupal node throw a 404 through Apache instead of internal to Drupal?

Many thanks!
Tom

Comments

michelle

They are real 404 errors. When I delete nodes and don't take the time to redirect them, they show up as 404 in Google's webmaster tools just like they should.

Michelle


jraper@groups.drupal.org

If so, I believe it doesn't return a 404 when you call a non-existent URL. You would have to turn it off until everything settles out.

doubtintom

Thanks Michelle and jraper, both comments helped my thought process.

What I've realized is that the originally exposed pages that I now want to purge from the Google cache are now throwing HTTP 403 Forbidden, not 404. That is because I had fixed the access mistake that was the original reason the sensitive info got into the claws of G's robot.

I don't really want to open that access again, because more private info will go into the cache again. Vicious circle.

Some of the pages are nodes, so I have tried using Redirect gone /path/to/page in .htaccess, which makes Apache answer 410 Gone for that path. Works great.
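
For anyone wanting to double-check what a URL really returns after adding Redirect gone, here is a minimal Python sketch using only the standard library. The names fetch_status, is_removable and report are made up for illustration. That 404 qualifies a page for removal comes from the article linked above; treating 410 the same way is an assumption here, so check it against Google's current rules.

```python
import urllib.error
import urllib.request

# Status codes assumed to make a URL eligible for removal.
# 404 is from the linked article; 410 (what "Redirect gone" sends)
# is an assumption in this sketch.
REMOVABLE = {404, 410}


def fetch_status(url, timeout=10):
    """Return the HTTP status code the server sends for url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is on the exception.
        return err.code


def is_removable(status):
    """True if the status is one of the codes in REMOVABLE."""
    return status in REMOVABLE


def report(urls, fetch=fetch_status):
    """Map each URL to a (status, removable) pair."""
    results = {}
    for url in urls:
        status = fetch(url)
        results[url] = (status, is_removable(status))
    return results
```

Calling report() with the list of exposed URLs shows at a glance which ones still answer 403 (not eligible under this assumption) and which already answer 404 or 410. The fetch parameter is only there so the logic can be tried without touching the network.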

But others of the private pages are View URLs or other "per page" URLs that are linked to specific important functionality in the site. :-(

Well, maybe I can turn off that functionality for a couple of days while "Redirect gone" and Google's Webmaster Tools do their work. I'll try that.

Whew, a lot of work to clean up a "one checkbox" access permission error.