Problem/Motivation

This is a blocker for #2352155: Remove HtmlFragment/HtmlPage.

\Drupal\Core\EventSubscriber\DefaultExceptionHtmlSubscriber hardcodes responses, circumventing the routing system. But it does use @html_fragment_renderer and @html_page_renderer. Whereas it could also just be implemented exactly like \Drupal\Core\EventSubscriber\CustomExceptionHtmlSubscriber, which would make the system easier to understand.

#2352155: Remove HtmlFragment/HtmlPage is removing @html_fragment_renderer and @html_page_renderer, so it'd be great if those services wouldn't be used anymore, since it'd be one step towards their removal.

But, most importantly, this is the only case where we're not rendering "a regular page", but where we do want blocks to be rendered. Hence it eminently makes sense to not duplicate the regular page rendering pipeline just for 403/404 pages, but to just use the routing system, which does use the regular rendering pipeline automatically.

Proposed resolution

Changed \Drupal\Core\EventSubscriber\DefaultExceptionHtmlSubscriber: it now works like \Drupal\Core\EventSubscriber\CustomExceptionHtmlSubscriber already does in HEAD: rather than hardcoding responses, it now performs a sub-request to newly added system.403 and system.404 routes.

Hence \Drupal\Core\EventSubscriber\CustomExceptionHtmlSubscriber now is almost empty; it only needs to subclass \Drupal\Core\EventSubscriber\DefaultExceptionHtmlSubscriber.

This brings more consistency to both exception handling and HTML page rendering: that then leaves only a handful of routes/responses operating in limited environments (the complete list: install.php/update.php/authorize.php/PHP code exceptions (DefaultExceptionSubscriber)/maintenance mode (MaintenanceModeSubscriber)), which is conceptually simpler to understand.

Finally, this removes the 405 handling in \Drupal\Core\EventSubscriber\DefaultExceptionHtmlSubscriber — a 405 is extremely rarely needed, there's no need to render blocks in this case, and it's not handled by \Drupal\Core\EventSubscriber\CustomExceptionHtmlSubscriber, so let's go for simplicity and consistency, and just have both the default and the custom exception HTML subscriber handle 403/404 only.

See the [HTTP-STATUS] section in the parent issue.

Remaining tasks

Review.

User interface changes

None.

API changes

None.

Comments

wim leers’s picture

Status: Active » Needs review
Parent issue: » #2352155: Remove HtmlFragment/HtmlPage
StatusFileSize
new15.71 KB

Status: Needs review » Needs work

The last submitted patch, 1: 403_404_routes-2362517-1.patch, failed testing.

fabianx’s picture

Not reviewed, but a question:

How do modules / contrib override the 403 / 404 page now?

Implement a larger priority event handler?

Or override the route / path?

dawehner’s picture

\Drupal\Core\EventSubscriber\DefaultExceptionHtmlSubscriber hardcodes responses, circumventing the routing system

This perse is not a problem if you ask me. Having the exception being decoupled from routing is a nice thing. Please try to make a point why circumventing the routing system is a bad thing.

@Fabianx
At least in HEAD you can easily subscribe to that exception event and do whatever you want.

But, most importantly, this is the only case where we're not rendering "a regular page", but where we do want blocks to be rendered. Hence it eminently makes sense to not duplicate the regular page rendering pipeline just for 403/404 pages, but to just use the routing system, which does use the regular rendering pipeline automatically.

Is there a reason why we can't just have the page rendering pipeline be not coupled to the routing system?

Finally, this removes the 405 handling in \Drupal\Core\EventSubscriber\DefaultExceptionHtmlSubscriber — a 405 is extremely rarely needed, there's no need to render blocks in this case, and it's not handled by \Drupal\Core\EventSubscriber\CustomExceptionHtmlSubscriber, so let's go for simplicity and consistency, and just have both the default and the custom exception HTML subscriber handle 403/404 only.

I am actually quite sure that in a REST world a 405 is quite common, especially if you deal with POST/GET/PUSH/PATCH/DELETE etc.

Note: This also makes 403/404 pages potentially slower to execute and more brittle against problems in the routing system.

Crell’s picture

I am inclined to agree with dawehner.

1) Skipping 405 for HTML responses is one thing; They probably won't happen much given the way browsers work, although since we now can have routes that are strictly GET or strictly POST they are more likely to happen then they used to. However, not supporting 405 for non-HTML requests is absolutely not OK. That's a required part of a RESTful response mechanism.

2) Keeping page building decoupled from routing is a benefit. If the only way to render a page-like response is to go through the router then we've screwed up. Let's not screw up.

3) Going back through routing a second time does introduce a performance hit. I don't know exactly how much but making 404s faster to avoid inadvertent DDOS attacks has been discussed before many times throughout the years so I would not want to slow that down more than we have to.

4) The whole point of the exception listeners is to generate responses themselves. There's no reason they have to go back through routing to do so. So I don't see why there's a problem here to be solved, other than swapping out the fragment/page renderer services with whatever their render API equivalents are. (If render API doesn't have equivalents that are independent of routing, that's a flaw in render API that should be addressed.)

fabianx’s picture

Issue tags: +needs profiling

There is two use cases here that I see:

a) Fast_404 - We want to generate a 404 as fast as possible. The layout does not matter.
b) Nice 404, we want to give a 404 page, but with complete menus and blocks.

The same for 403s, through there is even more cases with:

a) 403 when logged out
b) 403 when logged in

x CartesianProduct of the two cases before.

In D7 we had neither (for a long time), there have been numerous reports of "menus" (and partially blocks) missing on the 404 page with numerous contrib modules like r4032login and 404_page, etc. and fast_404() handling got introduced very very late in the game and not that fast out of the box (though its a good compromise).

From all that having the 404 handling be a full page response and go via a full rendering is a Win.

Other than performance (and the router can't be that bad for a static route generated - it it is we have other problems ...) why is a sub request for the full blown cases a bad idea overall?

I agree that there should be a mechanism to support the 404 a) case of a message displayed to the user that is not dependent on routing, but I think this would then not need to take the page build (with page variants, etc.) into account but just wrap some HTML around it - similar to how maintenance.html.twig works today.

fabianx’s picture

Regarding performance:

I think we can have the 404 / 403 pages render cached (at least for anonymous users) with the message being potentially rendered via #post_render_cache - if its dynamic.

In this case we would be even faster as the 403 / 404 page is always the same (with perhaps except the message) - which would make sense in general.

A route makes much sense here as the 404 page - and that is again a compromise - would have the new context of the /page-404 page and not the original page, so e.g. a block differing by URL would be the same for the 404 page always - which usually should be okay - as long as its _one_ defined context.

And in the end you can override the response anyway - if you need more flexibility.

Edit: For authenticated users we could cache the whole "page build" at least per @cache_context.user. (and that would work for anon as well, as its one user)

This whole system would be similar to a D7 module forwarding users to /404 and having a custom handler there that caches responses of the whole page per USER. (where anonymous is one user obviously).

Just that this whole process would be internal to Drupal and the original input URL still showing in the address bar.

catch’s picture

Note: This also makes 403/404 pages potentially slower to execute and more brittle against problems in the routing system.

We already use a route if you configure a custom 403/404 page via the UI, this doesn't introduce that at all, what it does is introduce a route for the unconfigured case so it's handled exactly the same whether configured or not.

dawehner’s picture

We already use a route if you configure a custom 403/404 page via the UI, this doesn't introduce that at all, what it does is introduce a route for the unconfigured case so it's handled exactly the same whether configured or not.

This is a good point. This removes the brittleness (we have to assume that its already working) argument.

For the performance argument we have too test that, obviously.

1) Skipping 405 for HTML responses is one thing; They probably won't happen much given the way browsers work, although since we now can have routes that are strictly GET or strictly POST they are more likely to happen then they used to. However, not supporting 405 for non-HTML requests is absolutely not OK. That's a required part of a RESTful response mechanism.

The main question is here, can the user trigger that with just a browser? I think this might be possible given that you might access paths which are POST/... only?

catch’s picture

I think you can potentially trigger those via the browser, but I also think it's OK if you just get a page that says "405: method not allowed". Same as directly accessing 404s for images with fast_404 enabled doesn't get you a fully themed site with blocks either. In that case it really is an 'exception' as opposed to 403/404 which are part of the normal operation of a site.

wim leers’s picture

Status: Needs work » Needs review
Issue tags: -needs profiling
StatusFileSize
new228.97 KB
new17.21 KB
new1.53 KB

Thanks, Fabianx and catch, for already alleviating all raised concerns :)

I don't know what else to add.


This reroll should be green.

The two deleted lines in BlockTest probably look suspect, so here's an explanation: the reason they need to be removed is that it's a 404 page, and it actually revealed a bug in the existing DefaultExceptionHtmlSubscriber: because it doesn't perform a subrequest, unlike CustomExceptionHtmlSubscriber, it would apply the same block visibility rules as if it weren't a 404 (the path is still /user/1), but CustomExceptionHtmlSubscriber wouldn't, since the path would be whatever path is configured to be the custom 404 page.
Possibly worse yet: the test is misleading, because it pretends the routing system treats /node/1 and /NODE/1 as the same, but it doesn't; the latter will be a 404 unless it's an URL alias. So the test was utterly wrong to begin with, and only worked by accident.

Profiling

Note: I think it's important that we apply the same flow for both the default and custom 403/404 pages. Having them work in very different ways like in HEAD, results in subtle problems like the above, where block visibility works differently, depending on whether you use the default or the custom 403/404 page!

Tested as the anonymous user. With APCu enabled. PHP 5.5.11.

ab -c 2 -n 100 http://tres/this-will-be-a-404, median request time, best out of 3 runs, with XHProf disabled
HEAD: 139 ms
with patch: 144 ms
Conclusion: 5 ms slower
XHProf, best out of 3 runs
HEAD patch Diff Diff%
Number of Function Calls 57,351 60,555 3,204 5.6%
Incl. Wall Time (microsec) 191,726 192,396 670 0.3%
Incl. MemUse (bytes) 15,198,952 16,636,240 1,437,288 9.5%
Incl. PeakMemUse (bytes) 15,261,720 16,697,880 1,436,160 9.4%

So, overall: several percents slower. The question is whether this matters.

If we want the performance to be better, I'd argue we do what Fabianx suggests, which will make performance MUCH better, even compared to HEAD:

[…] I think this would then not need to take the page build (with page variants, etc.) into account but just wrap some HTML around it - similar to how maintenance.html.twig works today.

Those sites that want fancy 404 pages can then use a "Fancy 404" module that e.g. searches the site for the keywords extracted from the URL, by inserting their own higher-priority subscriber.

dawehner’s picture

Note: I think it's important that we apply the same flow for both the default and custom 403/404 pages. Having them work in very different ways like in HEAD, results in subtle problems like the above, where block visibility works differently, depending on whether you use the default or the custom 403/404 page!

I really have to agree with this statement. A lot of code which calls for example current_path() is still executed in HEAD, which cannot, in a straightforward way, be replaced
with something coming from the matched route, as there is by default no corresponding route.

In general though we should make it possible to at least theoretically render a HTML page in drupal without actually having routing going on.

+++ b/core/lib/Drupal/Core/EventSubscriber/DefaultExceptionHtmlSubscriber.php
@@ -7,44 +7,31 @@
+ * Exception subscriber for handling core default error pages.
  */
 class DefaultExceptionHtmlSubscriber extends HttpExceptionSubscriberBase {

Can we still mention HTML in the one-line description?

  1. +++ b/core/core.services.yml
    index ae3d944..eb054a0 100644
    --- a/core/lib/Drupal/Core/EventSubscriber/CustomPageExceptionHtmlSubscriber.php
    
    --- a/core/lib/Drupal/Core/EventSubscriber/CustomPageExceptionHtmlSubscriber.php
    +++ b/core/lib/Drupal/Core/EventSubscriber/CustomPageExceptionHtmlSubscriber.php
    

    We don't no longer need the use statement for Request;

  2. +++ b/core/lib/Drupal/Core/EventSubscriber/CustomPageExceptionHtmlSubscriber.php
    @@ -105,44 +85,18 @@ public function on404(GetResponseForExceptionEvent $event) {
    +    }
    +    catch (\Exception $e) {
    +      // If an error happened in the subrequest we can't do much else. Instead,
    +      // just log it. The DefaultExceptionSubscriber will catch the original
    +      // exception and handle it normally.
    +      $error = Error::decodeException($e);
    +      $this->logger->log($error['severity_level'], '%type: !message in %function (line %line of %file).', $error);
         }
    

    I think this logic actually belongs into the parent class. The same kind of exception could be thrown by all kind of stuff in makeSubrequest

  3. +++ b/core/modules/system/src/Controller/Http4xxController.php
    @@ -0,0 +1,41 @@
    +  public function on403() {
    +    return [
    +      '#markup' => $this->t('You are not authorized to access this page.'),
    +    ];
    +  }
    ...
    +    return [
    +      '#markup' => $this->t('The requested page could not be found.'),
    +    ];
    

    OT: Just an idea for the future ... is there a reason we don't just return a '#theme' => 'page_404/403' so people can easily change the output of those pages?

wim leers’s picture

StatusFileSize
new18.05 KB
new6.35 KB

Can we still mention HTML in the one-line description?

I made it consistent with CustomExceptionHtmlSubscriber, which doesn't mention HTML. Now updated both to mention HTML consistently.

  1. Fixed.
  2. Good point, done.
  3. I'd like that, and I bet themers would also. But yeah, definitely out of scope.

Now CustomExceptionHtmlSubscriber is really just a super simple, very clear override of DefaultExceptionHtmlSubscriber. I like it :)

dawehner’s picture

Status: Needs review » Patch (to be ported)

Okay cool.

wim leers’s picture

To address the performance regression, Fabianx and I opened #2362999: Render cache the 403/404 HTML response.

catch’s picture

Note that's only a performance regression for the unconfigured case, nearly all Drupal sites I've ever worked on have had custom 404/403 pages configured and there's no change for that.

fabianx’s picture

Status: Patch (to be ported) » Reviewed & tested by the community

I assume patch to be ported means RTBC ...

I re-read it and yes,with the just moving of makeSubrequest to parent, this is indeed RTBC.

catch’s picture

Component: base system » request processing system

Patch looks good to me, it looks like the change caused some confusion to reviewers so leaving RTBC for a bit longer.

larowlan’s picture

Can we get a follow up for 12.3?

fabianx’s picture

Postponed #2230121: Remove exit() from FormBuilder on this issue. @Wim: Would be great if you could take a look.

#19: Can you create one? ;)

larowlan’s picture

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/lib/Drupal/Core/EventSubscriber/DefaultExceptionHtmlSubscriber.php
@@ -82,40 +78,50 @@ public function on403(GetResponseForExceptionEvent $event) {
+      catch (\Exception $e) {
+        // If an error happened in the subrequest we can't do much else. Instead,
+        // just log it. The DefaultExceptionSubscriber will catch the original
+        // exception and handle it normally.
+        $error = \Error::decodeException($e);
+        $this->logger->log($error['severity_level'], '%type: !message in %function (line %line of %file).', $error);
+      }

I think you mean \Drupal\Core\Utility\Error::decodeException()

Is this possible to test? Not sure.

fabianx’s picture

Yeah, or better yet: Move the use statement from the other file and don't use \ on Exception and Error.

wim leers’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new18.08 KB
new1.2 KB

Fixed; now looks exactly like the original location of that code. The original location of that code used \Exception, so this also still does.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 4838c5d and pushed to 8.0.x. Thanks!

Time to get on with the followups and parent issue :)

  • alexpott committed 4838c5d on 8.0.x
    Issue #2362517 by Wim Leers: Improve default 403/404 exception HTML...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.