I'm looking at the headers returned for a deliberately non-existent page (a forced error):
16:19:30 mike@dev ~$ HEAD http://example.ca/NoURLHere-GoElsewhere
200 OK
Cache-Control: public, max-age=21600
Connection: close
Date: Tue, 27 Nov 2012 21:23:32 GMT
ETag: "1354051412-0"
Server: Apache
Vary: Accept-Encoding
Content-Language: en
Content-Type: text/html; charset=utf-8
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Last-Modified: Tue, 27 Nov 2012 21:23:32 +0000
Client-Date: Tue, 27 Nov 2012 21:23:33 GMT
Client-Peer: 198.72.101.116:80
Client-Response-Num: 1
X-Drupal-Cache: MISS
X-Generator: Drupal 7 (http://drupal.org)
X-Powered-By: PHP/5.3.3
It should clearly return a "404 Not Found" status so that tools like Xenu or Google know the page doesn't exist. Right now it returns 200 OK.
I searched for header calls, but unlike Fast404, I didn't see one in the search404_page() function. Adding the following didn't seem to help either, but I'm pretty sure it needs to be there:
header('HTTP/1.0 404 Not Found');
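A bare `header()` call like the one above only takes effect if nothing later in the request replaces the status. For example, PHP turns the response into a 302 when a `Location` header is sent afterwards, unless a 3xx or 201 status is already set. The following is a minimal sketch for a plain PHP script, not Drupal. It builds the status line from the request's protocol instead of hard-coding HTTP/1.0. The helper names `not_found_status_line()` and `send_not_found()` are hypothetical. Inside Drupal 7 the usual equivalent is `drupal_add_http_header('Status', '404 Not Found')`.

```php
<?php
// Minimal sketch for a plain PHP script (not Drupal): send a 404 status.
// not_found_status_line() and send_not_found() are hypothetical helper names.

/**
 * Build a "404 Not Found" status line that echoes the request's protocol,
 * falling back to HTTP/1.0 when none is known (e.g. when run from the CLI).
 */
function not_found_status_line($protocol = null)
{
    if ($protocol === null) {
        $protocol = isset($_SERVER['SERVER_PROTOCOL'])
            ? $_SERVER['SERVER_PROTOCOL']
            : 'HTTP/1.0';
    }
    return $protocol . ' 404 Not Found';
}

/**
 * Emit the status line. The third argument of header() forces the response
 * code to 404. This must run before any output is sent, and a Location:
 * header sent later in the request would replace the status with a 302.
 */
function send_not_found()
{
    if (!headers_sent()) {
        header(not_found_status_line(), true, 404);
    }
}
```

On PHP 5.4 and later, `http_response_code(404)` is a shorter alternative. The server above reports PHP/5.3.3 in `X-Powered-By`, which predates that function.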
Comment | File | Size | Author |
---|---|---|---|
#21 | search404-applied-1852240.patch | 2.19 KB | anish_zyxware |
#15 | 404_code_on_custom_search_pages-1852240-15.patch | 2.19 KB | simonyeldon |
Comments
Comment #1
zyxware: I am not able to replicate this. Can you please try on a default installation and check this again?
$ wget http://localhost/~user/search404/drupal7/does-not-exist
--2012-11-29 03:17:18-- http://localhost/~user/search404/drupal7/does-not-exist
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2012-11-29 03:17:25 ERROR 404: Not Found.
Comment #2
mgifford: I'll send you the full domain, but this is my result:
15:29:35 mike@dev ~$ wget http://example.ca/en/NoFilesHere2
--2012-11-30 15:30:15-- http://example.ca/en/NoFilesHere2
Resolving example.ca... 198.72.101.119
Connecting to example.ca|198.72.101.119|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://example.ca/en/recherche-search/NoFilesHere2 [following]
--2012-11-30 15:30:15-- http://example.ca/en/recherche-search/NoFilesHere2
Connecting to example.ca|198.72.101.119|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `NoFilesHere2'
[ <=> ] 31,811 --.-K/s in 0.02s
2012-11-30 15:30:17 (1.62 MB/s) - `NoFilesHere2' saved [31811]
Comment #3
zyxware: @mgifford, what menu path is "recherche-search"? There seems to be a redirect to that URL.
Comment #4
mgifford: We'll look into this and get back to you, thanks!
Comment #5
zyxware: Was this problem sorted out?
Comment #6
mgifford: Haven't had a chance, sorry. December can be crazy.
Comment #7
mropanen: I have the same problem when the option "Do a "Search" with custom path instead of a Drupal Search when a 404 occurs" is selected. Without it, everything works and a 404 header is sent.
Comment #8
yang_yi_cn: The custom search page option doesn't seem to be implemented very well.
Comment #9
zyxware: With the custom page option, the module currently redirects the request to the custom page, which is a 301/302 redirect. If we could instead execute the menu handler for that path in place, it should be possible to send a 404 status on custom pages. I will have to check how that would work.
Comment #10
zyxware: Closing this ticket on the assumption that the issue has been clarified. Please feel free to reopen it if you are still facing problems.
Comment #11
Charles Belov: Reopening. The custom page needs an option to be served as the 404 page. That is, it should look to humans like it is providing search results (which it is), but look to search engines like the requested page does not exist (because it doesn't).
We're not giving the site visitor a relocated page; we're giving them a hopefully useful 404 page. However, there is no guarantee that the search actually provides useful results (and often, if the request is for an old URL from a previous, non-Drupal website, the results are totally useless). Therefore we don't want search engines to keep sending people to this URL; we want them to remove the bad URL from their index.
What I would request is that the "Use a 301 Redirect instead of 302 Redirect" option get a companion option: "Use a 404 Not Found instead of 302 Redirect".
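The request above comes down to choosing one of three responses for a missing path when a custom search path is configured. The following minimal PHP sketch writes that choice as a pure function so it can be reasoned about outside Drupal. The mode names (`redirect_302`, `redirect_301`, `serve_404`) and the returned array shape are hypothetical. They are not search404's actual settings or API.

```php
<?php
// Sketch of the behaviour requested in comment #11. The mode names and the
// array shape are hypothetical, not search404's real settings or API.

/**
 * Decide how to answer a request for a missing path when a custom
 * search path is configured.
 *
 * @param string $mode        'redirect_302', 'redirect_301' or 'serve_404'
 * @param string $search_path Custom search URL to show the visitor.
 * @return array              array('status' => int, 'headers' => array, 'render' => string|null)
 */
function search404_plan_response($mode, $search_path)
{
    switch ($mode) {
        case 'redirect_301':
            // Permanent redirect: tells crawlers the missing URL has moved
            // to the search page for good.
            return array('status' => 301, 'headers' => array('Location' => $search_path), 'render' => null);
        case 'serve_404':
            // Render the search results in place (no redirect), but tell
            // crawlers the requested URL does not exist.
            return array('status' => 404, 'headers' => array(), 'render' => $search_path);
        case 'redirect_302':
        default:
            // Temporary redirect: the follow-up search page answers 200 OK,
            // which matches the 302-then-200 sequence in comment #2.
            return array('status' => 302, 'headers' => array('Location' => $search_path), 'render' => null);
    }
}
```

With `'serve_404'`, the function returns status 404 and no `Location` header. That is the combination asked for here: visitors still see search results, while crawlers see the page as missing.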
The only workaround right now with the custom page is to also check "Disable Auto Search". Then the 404 is sent.
Comment #12
GaëlG: This seems to work, at least in my use case (custom search path: recherche?search_api_views_fulltext=@keys).
Comment #13
simonyeldon: Applied the patch in comment #12 and it works perfectly, thanks very much.
Comment #14
Vincenzo: I agree with Charles Belov in comment #11.
In fact, I think the 404 code should always be returned. However, I am happy enough if the new setting gets merged in.
We had to apply patch #12 to a production platform serving 90 sites.
Comment #15
simonyeldon: The patch in #12 didn't work as well as I thought; it resulted in the page being output twice.
I have made a slight modification to the patch to prevent this. Please feel free to review.
Comment #16
simonyeldon
Comment #17
mrded
Comment #18
mrded
Comment #19
Vincenzo: Patch #15 has been used on our 100+ sites for over a year now.
I guess that makes it "reviewed and tested".
Comment #21
anish_zyxware (Zyxware Technologies): The issue is fixed and available in the 7.x-1.x branch. The final patch is attached.
It will be included in the next release (which should be happening soon).