Now that we have entity path callback in core, we can go one step further in performance optimization and code cleanup. This was originally motivated by #687712: Optimize rdf.module for displaying multiple comments but it turns out that the rest of core can benefit from it as well. Moshe suggested something similar at #687712-26: Optimize rdf.module for displaying multiple comments and I'd like to investigate a more generic approach on a per entity basis (not only for comments).
Throughout core there are a lot of times where url() calls are used to generate a link to the entity (node, comment, term, etc.). The whole idea of this issue is to centralize these calls during the entity loading phase using the defined path callback, and add this to the entity object so it can be used later in the theme layer or when generating other output formats like RSS, RDF/XML, etc. Each call to url() is expensive, and saving a few of them per page load can help with performance. The RDF module makes extra calls to url() to generate this uri, so having it as part of the entity object allows us to optimize it and only generate the uri only once per entity load.
An example where core can benefit is in template_preprocess_comment():
$variables['title'] = l($comment->subject, 'comment/' . $comment->cid, array('fragment' => "comment-$comment->cid"));
$variables['permalink'] = l('#', 'comment/' . $comment->cid, array('fragment' => "comment-$comment->cid"));
where url is called twice - via l() - to generate the very same link and duplicate the logic of comment_path(). On a page with 50 comments, that would 50 calls to url() saved right away if we were to only call url() once per comment.
Furthermore, having this uri as part of the entity objects allows modules like entitycache to cache it as part of the entity data, which should increased performance even more.
| Comment | File | Size | Author |
|---|---|---|---|
| #101 | entity_uri-use-699440-101.patch | 53.7 KB | effulgentsia |
| #94 | entity_uri-use-699440-94.patch | 53.35 KB | effulgentsia |
| #90 | entity_uri-use-699440-89.patch | 52.23 KB | effulgentsia |
| #85 | entity_uri-use-699440-84.patch | 52.22 KB | effulgentsia |
| #84 | entity_uri-use-699440-84.patch | 52.23 KB | effulgentsia |
Comments
Comment #1
scor commentedA patch should help to understand this issue a bit more! This patch relies and includes the patch I posted at #525622-34: 'as link' formatters need a generic way to build the url of an 'entity'. The entity URI is added in attachLoad() and might be better located somewhere else, but it works as it is (proof of concept).
Comment #3
moshe weitzman commentedI like this a lot. Lets fix up those test failures and move ahead.
Comment #4
scor commentedSometimes the entity does not exist yet (like in a node and comment preview for example). This patch should pass all node and comment tests at least.
Note I left out plenty of calls to url() which are either with the absolute option or hidden behind l(). We should try to catch these suckers as well. I'll open an issue to allow to pass an already existing url to l() unless someone else beats me to it. The absolute option for url() could be reproduced outside url().
Comment #5
scor commentedThe reason for the other tests to fail was that forum_url_outbound_alter() contains a nasty term load, which leads to an infinite recursion when calling url() during the entity load:
Side note: The regular expression feels like a hack and I wonder how this perform on pages with many calls to url() when the forum module is enabled. I'm not sure why this cannot be done in the same fashion as the comment permalinks have been implemented, i.e. via a forum/%forum menu item which loads the term page. For reference, this was introduced in #320331: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks.
For now I've added a condition to prevent infinite recursion, but hopefully we can improve this.
We'll need benchmarks for HEAD, HEAD with entitycache, and number of calls to url() saved on pages like node w/ 50 comments, front page, etc.
Comment #6
catchI'd think that forum.module could hook_entity_info_alter() taxonomy.module to add it's own uri_callback (or whatever we end up calling it) - which then checks for forum vocabulary.
Additionally the check on machine name is wrong, it should use variable_get('forum_nav_vocabulary');
Comment #7
scor commentedQuick benchmark of #5:
Comment #8
scor commentedI did some micro benchmarks of l() and url(). Times were measured by calling
1000 times. times are in microseconds (us). each line removes an element of the return at the end of l() and shows in parenthesis the time each removed element takes.
note: add about 8us to url() when the forum module is enabled.
url() represents 65% (20us) of the l() execution time - that means that if we have already built a url via url() prior to calling l(), we should be able to save a lot when calling l(). Unfortunately, l() does not accept pre-built urls. l() does not alter the input parameters of url() at all. It does alter $options, but only the 'attributes' and 'html' keys which are not used by url(). That means it would be possible to pass a prebuilt url to l() and include it in the return A tag (modulo an API change of course). This would be an alternative to hard-coding the A tag like we're doing in the last patch in template_preprocess_comment(). However using l() will always have more overhead than the hard-coded alternative.
When displaying 50 comments, HEAD calls l() twice per comment. The idea is to prebuild the comment uri and pass it to l() - or hardcode the A tag like we do in #5. That saves 50 calls to url(), which counts for 1ms - compared to a page load of 140ms, it starts to matter. (and they are more call to l()/url() to be saved).
For future reference, other random benchmarks I did outside the context of l(), which confirm the numbers above.
Comment #9
scor commentedprototype patch for l() accepting new 'url' option, which will bypass the call to url(), similar to the 'alias' option which bypasses the call to drupal_get_path_alias().
31.1us
l('Praesent Elit Blandit Qui', 'comment/1990', array('fragment' => 'comment-1990'));11.1us
l('Praesent Elit Blandit Qui', '/d7gitfr/comment/1990#comment-1990', array('url' => TRUE));5.17us
'<a href="' . check_plain('/d7gitfr/comment/1990#comment-1990') . '">' . check_plain('Praesent Elit Blandit Qui') . '</a>';so optimized l() still has an overhead, and we might want to use the hard-coded version in performance critical contexts like template_preprocess_comment(). I've made use of the new l() in template_preprocess_comment() to show how to use it.
Comment #10
scor commentedInvestigating a different alternative here: provide an opt-in caching in url(). I've logged the calls to url() for certain pages, and many of them are redundant. Since many others are not, adding a key to $options triggers the cache.
A few examples:
The 2 last lines look different, but they are in fact treated the same by url(), the extra keys are crap that l() left behind. Once removed, all these 4 can be put in the same cache item.
node with 50 comments as anonymous:
is duplicated 50 times... same for user/register.
Comment #11
scor commentedThis patch introduces url_fast(), a static cache atop url(). By using a separate function as opposed to a key in $options, we ensure all normal calls to url() do not have any extra overhead at all.
Comment #12
scor commentedLet's refocus on the original goal of this issue and continue investigating url caching in a separate issue.
This is a reroll of #9, which introduced a new 'url' option to skip the call to url(). It's used when we already have the value return by url(), aka the entity URI. See micro-benchmark at #9 which showed a gain of about 20us when skipping url() in l().
We're mostly saving on comments here. In HEAD we're calling l() twice and url() once in rdf_preprocess_comment(). With the patch we're only calling url() during the entity load and outputing it thereafter. That's 2 calls to url() saved per comment.
There is also one call to url() saved when viewing a full node (template_preprocess_node and node_page_view).
The alternative to adding this new key to l() is to hard-code the A tag, which is in fact even faster. So we need to choose between a non-breaking API change or hard-coded A tag.
This patch depends on #525622: 'as link' formatters need a generic way to build the url of an 'entity' which hopefully will get fixed soon, it's mostly about agreeing on the right terminology...
Comment #13
scor commentedIn HEAD we're calling l() twice and url() once in rdf_preprocess_comment(). With the patch we're only calling url() during the entity load and outputing it thereafter directly or through l(). That's 2 calls to url() saved per comment, which are about 20us each on my machine, total is 40us / comment. There is also one call to url() saved on a full node view. For 50 comments, that's 20 + 50 x 40 = 2020 us (~2ms). For 300 comments, it's 20 + 300 x 40 = 12ms.
Actual page load benchmarks look good:
Comment #14
scor commentedrerolling to reflect #525622: 'as link' formatters need a generic way to build the url of an 'entity' commit.
Comment #15
catchThis looks great to me except for the hunk which has to account for the brokenness in http://api.drupal.org/api/function/forum_url_outbound_alter/7
1. Doing a regexp on every url.
2. Doing an taxonomy_term_load() on every url that matches.
Instead we should have forum_entity_info_alter() change the entity_path() callback to it's own, handle the forums vocabulary properly, then ensure entity_path() is used in field formatters and other places consistently as the replacement to taxonomy_term_path().
That doesn't have to be dealt with in this issue, but the infinite recursion possibilities this currently has can happen anywhere. Note that's not just a problem with this patch - any module anywhere running url('node/1'); inside a hook_$something_load() could end up triggering infinite recursion.
Comment #17
scor commentedThanks bot for catching that. yes, entity_path() is now entity_uri().
Comment #18
catchOpened #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter().
Comment #19
scor commentedadded a @todo to remove the infinite loop prevention code when #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter() is fixed.
Comment #20
scor commented#19: 699440_entity_uri_19.patch queued for re-testing.
Comment #21
effulgentsia commentedsubscribe
Comment #22
catchI'm bumping this to critical. We have a major performance / infinite recursion regression in trying to use hook_url_outbound_alter() instead of taxonomy_term_path() between D6 and D7. Fixing that mainly means a change from taxonomy_term_path($term) to taxonomy_term_uri($term), and adding bundle support to the entity uri callbacks key, which is only an internal change in entity api which will affect very few, if any, entity modules in the wild. See #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter() which has a workaround for both issues but needs to be fixed fully here, marking that other issue RTBC so we can make progress in this one.
Also retitling.
Comment #23
catchComment #24
webchickHere's a summary of some general discussion tonight in #drupal. It kinda spans multiple issues, but figured I'd stick it here as kind of a central 'bridging' place.
catch first pointed me to #712076: Infinite loop possibility and performance problems in forum_url_outbound_alter() as an "easy" patch to commit. :P (Note to self: never trust catch ;))
What that patch does, however, is take 6 or so lines of easily understandable code and turn them into a 25 line kitten killing monster which includes its own recursion detection. When I see code like that in a one-off module it creates the distinct impression that other modules are going to need similar code to handle that. Ick. So I would actually recommend not removing the recursion detection from this one and instead keeping it there. (and marking the other issue won't fix)
However, this approach is only really going to solve the issue for entities. There are tons of other modules out there (Dave Reid mentioned feedburner) that also will implement hook_url_outbound_alter() and don't get to benefit from this approach. It makes me seriously question whether we ought to roll that patch back since it was fairly late to begin with, and this makes at least 2-3 serious performance regressions as a result...
On this patch in particular, my concern is that lines like this:
- drupal_add_html_head_link(array('rel' => 'canonical', 'href' => url('node/' . $node->nid)), TRUE);
+ drupal_add_html_head_link(array('rel' => 'canonical', 'href' => $node->uri), TRUE);
...are really confusing, because nowhere else do we do this sort of "magic" pre-url()ing of urls, so I could see module developers scratching their head a good bit, and ending up doing double-duty on calling url(). I'm also not real keen on that 'url' parameter to $options. It feels like we ought to be consistent everywhere (but catch informs me this may not be possible).
In the forum issue, Dave Reid mentioned an idea about different modules passing various "contexts" they need in that $options array, to make hook_url_alter() more performant. So for example:
and forum_url_outbound_alter could check for $options['term']
This, however, strikes me as a complete nightmare since there'd be no way to document the possible values in that $options array in the url() documentation, and creates a precedent for any random module sticking whatever random crap they want in a honking array that gets passed around a lot.
So.. not a lot of answers here, I'm afraid, but those are my first impressions.
Comment #25
chx commentedRollback++ apparently http://drupal.org/node/320331#comment-2188534 is not so cached after all.
Comment #26
effulgentsia commentedWhether or not to rollback hook_url_(in|out)bound_alter() should be discussed in a separate issue. But I don't think it's fair to treat forum.module's inappropriate use of the hook as an indicator that the hook is bad. hook_url_outbound_alter() is architecturally sound when used correctly. Correct uses include the locale module. Also, http://drupal.org/project/domain. And to handle proxy issues as described in #535612-14: Fix Drupal 6 url() function to call custom_url_rewrite_outbound for external links too. That there are at least 3 independent legitimate uses of this hook makes me like it as a hook rather than a hard-coded function name like custom_url_rewrite_outbound(), but I can also appreciate chx's perspective that it's not worth the performance cost. Anyway, that cost-benefit analysis belongs in a separate issue. If we decide to keep the hook, we should document it better, so that mistakes like forum_url_outbound_alter() don't happen again in core, and are minimized in contrib.
Regardless whether we rollback that hook, I think we all agree here that forum_url_outbound_alter() needs to go. And for that to happen, we need entity_uri() to support per-bundle 'uri callback'. This is just an abstraction of what D6's taxonomy_term_path() does, so it will give us everything that function gave us, but for all entities.
Some of the code in #19 still needs work and can be made much cleaner once we have per-bundle uri callbacks supported.
Flexible path creation for entities is a problem that Views and Panels have been struggling with for years. Solving it is a huge leap forward that we should be massively proud of. Sure, flexible path creation for non-entities would be great, but we're not gonna get that in D7 at this point. If that leads to some contrib modules using hook_url_outbound_alter() to hack around that limitation, that's the price we pay for not having achieved perfect path architecture yet. But these are contrib modules that are already hacking in this way via the http://drupal.org/project/url_alter module.
Comment #27
catchYeah I think the main thing here is to create a division of responsibility between altering arbitrary paths, and allowing paths which have a relationship to something we already know about, and have their own API attached. So if we can only do entity_path() here then that fixes the regressions, we'll find out during the D7 cycle what crazy implementations are done to hook_url_outbound_alter() which could be handled better via another method. The main purpose of this issue should be to not ship with any of those.
Comment #28
scor commentedThis patch extends entity_uri() to allow for optional bundle specific uri callback. It also adds forum_entity_info_alter() and forum_uri() to forum.module so forum_url_outbound_alter() can be removed. I've fixed a few places in taxonomy.module to take advantage of the new term->uri so it plays well with forum.module, but I might have missed some places. I'm hoping the testbot can help me here.
Comment #29
scor commentedhere goes the patch...
Comment #31
scor commentedpath.test tests the hook_url_outbound_alter() functionality so I moved it from forum.module to a dedicated test module forum_url.module.
Comment #32
effulgentsia commentedHow do you guys (and gals) feel about pulling back a bit on the scope of this issue and taking all this l() optimization out? I'd like to see this issue be about bundle support for entity_uri(), making sure all places that make links to entities use entity_uri() (either directly or via a $entity->uri) instead of hard-coding a path (see for example, template_preprocess_username()), and removing forum_url_outbound_alter(). Maybe once we accomplish all that, we can open a new issue for optimizing url() calls.
If we're getting rid of forum_url_outbound_alter(), do we need all this? Also, as per above, can we make $queried_entity->uri be the result of calling entity_uri() rather than flattening it with a url() call?
module_invoke_all() does array_merge_recursive(). So, can this be forum_entity_info() instead? Alter hooks are for making changes, not for adding the primary info data. Or is there something special about entities that makes doing this in an alter hook preferable?
143 critical left. Go review some!
Comment #33
effulgentsia commented#32 was x-post with #31. CNR for bot.
Comment #34
scor commentedI'd like to keep it in here because it accounts for more than half the performance we gain from centralizing entity uris into the each entity object. If others really want to, we could split it into a different issue...
Unless you also store its flatten, "ready to use" value, that would defeat the initial purpose of this issue which was to save on url calls. If you don't flatten it right away and store it in the entity object, then you have to call url() everytime you want to use it. However, we could also keep the structured value of uri so that when you use a variant of url() like 'absolute' for consistency and good design, but that would not bring any performance benefit (and that was the initial goal of this issue).
I removed the infinite loop prevention code, and changed forum_entity_info_alter() to forum_entity_info() per effulgentsia comments.
Comment #36
yched commentedThe $info['bundles'][$bundle]['uri callback'] property needs to be documented in the phpdoc for hook_entity_info() ?
Nitpick : can we refactor this into :
- some logic ending up with
$function = something;-
return $fuction($entity) + array('options' => array());Powered by Dreditor.
Comment #37
effulgentsia commentedWell, if we have to optimize url(), how about this way? This changes entity_uri() to return an object instead of an array. That's why I'm adding the "API change" tag to this issue. But, there's only one place (image.field.inc) in HEAD that currently calls entity_uri() (this patch adds more), and I doubt that that many contrib modules have started calling it either, so I think it's an acceptable trade-off considering it allows the desired url() optimization in what I think is a nice way.
This patch doesn't include the forum_url module from #34, and instead changes path.test to be caught up to the removal of forum_url_outbound_alter().
This patch converts all the places that #34 converted from hard-coded urls to entity_uri() driven urls, but this is not complete. There's a larger sweep that's needed. But, I'd like to get a check from people here on whether this is looking like a good route before doing that sweep.
Comment #38
scor commentedGood job effulgentsia! Initial micro-benchmarking on
url()is promising:url('comment/' . $comment->cid, array('fragment' => 'comment-' . $comment->cid));HEAD = 26.5usurl(entity_uri('comment', $comment));with #37 = 4.2usand a regular uri defined as an object such as
will also show great performance:
url('user/register');HEAD = 25usurl($register_uri);with #37 = 2.3usThis particular one is not part of #37 but could be part of a larger sweep effulgentsia mentions (#10 contains some examples).
I could not find any improvement on l() upfront. This was due to a typo in one of the unset for the 'attributes' key in url() necessary for the proper merge. With that fixed, this is what we get:
l('Praesent Elit Blandit Qui', 'comment/1', array('fragment' => 'comment-1'));HEAD = 40usl('Praesent Elit Blandit Qui', entity_uri('comment', $comment));with #38 = 18usComment #39
catchmicrobenchmarks look good. I also profiled with xhprof on a node with 48 comments and standard display settings, and while it looks like only about .5% improvement in time, general profile of URL looks much more encouraging. Attached screenshots of the url() detail for comparison.
Overall code looks good too. Only thing which concerns me if the caching in url depends on always passing in the same object and acting on it by resource - if we end up cloning by default in #154859: Document that cached entity objects are not necessarily cloned any more then couldn't we end up with different objects being passed in each time, and loose the benefit of caching there?
Comment #40
catchThought about this some more, and it's extremely likely that the caching would be on the object passed around the various node view / build functions, so even if it's cloned from node_load() eventually, it's still going to be the same object passed by resource in between all those functions when we run URL on it - certainly it would be for comment listings and similar where this optimization is most pressing.
This is a very important patch in the sense we absolutely need to remove forum_url_outbound_alter() (as opposed to simply rolling back hook_url_outbound_alter() altogether), and because RDF can add up to 20% overhead to a node + 50 comment listing, a lot of which comes from url(). While it has API changes, they're either additions or contained to entity_info() which has few implementations outside core (and which will run into the same problems we have here so it's a critical bug for the lack of bundle support just by itself).
I think we should go ahead and commit this asap to get the API change out of the way, then do the much larger sweep which effulgentsia mentions in #37 afterwards. The microbenchmarks look great, the profile looks much better url(entity_uri('taxonomy_term', $term)) is a lot better than other ideas we had like having $options['entity'] or similar. As mentioned above, this pattern could also be used for non-entities if we found suitable places for it, which it looks like there are, so it's a generic solution as well. RTBC.
Comment #41
chx commentedNo. Just no. This can not be. Move that hunk out from url() into a wrapper, say entity_url(). You can't possibly suggest a core function that accepts an entity or an arbitrary path.
do_everything($foo, $bar, $universe);is the next?Comment #42
effulgentsia commentedThat's not what #38 suggests. It suggests that url() and l(), which both take $path and $options as two separate parameters also take a single parameter that is an object containing ->path and ->options properties. That object is not an entity, but a uri (I guess, a "uri" being something that contains enough structured information to become a "url"). The problem trying to be solved is this: how to make it so that when url() is called multiple times for the same 'path' and 'options', that its result can be cached? If overloading the first parameter of url() is unacceptable, do you have any other ideas? Static cache within url() keyed on a serialized combination of $path and $options?
Comment #43
catchi spoke to chx in irc, he suggested an entity_url() wrapper around url(). So keep the caching / is_object() (we wouldn't need a check because could guarantee object) outside url(). Didn't play with that yet though.
Comment #44
effulgentsia commentedThis removes the is_object() from url() that chx disapproves of. This also removes the need for the BC break of changing entity_uri() to return an object instead of an array, so removing the "API change" tag: yay!
Comment #45
chx commentedGetting better! Thanks. But array('fragment' => "comment-$comment->cid") this is gone and i can't see where it's added back.
Comment #46
effulgentsia commentedcomment_uri()
Comment #47
chx commentedOK, one last question , why forum_entity_info and not entity_info_alter?
Comment #48
effulgentsia commentedFrom #32: Alter hooks are for making changes, not for adding the primary info data.
forum.module is the primary module responsible for the forum bundle. If it were making a change to anything about a taxonomy_term entity other than the forum bundle, then alter() would make more sense. Do you agree or am I missing something in our hook_*_info()/hook_*_info_alter() pattern?
Comment #49
bcn commentedEdited to add. Never mind, i didn't know about comment_uri()
Ignore this patch.
Comment #50
effulgentsia commented@noahb: Thanks for the good intentions :). Re-uploading prior patch to clarify that it's the one that's on the table.
Comment #51
yched commentedre effulgentsia #48 : well, taxonomy_entity_info() does take care of exposing info for all taxo bundles (i.e taxo vocabs), including the forum vocab, right ?
Then it does look like forum_entity_info() in the patch only alters what taxonomy_entity_info() defined - for that matter it doesn't declare any of the other 'bundle' properties (label, admin), but relies on those set by taxonomy_entity_info().
Or am I missing something ?
Comment #52
yched commentedAbout the comment in forum_entity_info() - would it make sense to have the 'forum_nav_vocabulary' variable store the vocab machine name instead of the vid ?
Comment #53
yched commentedSorry for the spam - I re-read #32, and I'd tend to disagree. Relying on array_merge_recursive(), and having two separate hook implementations define (partial) primary data for an entry, happens to work but is kind of confusing IMO. forum.module is not defining the primary data, it is tweaking the primary data exposed by taxonomy.module. Sounds very much like an alter hook to me.
Comment #54
catchPatch looks good. I'm not sure why we need to call url() recursively though, why not check at the end of the function if we had a cache miss and add to cache there?
I think we should use forum_entity_info_alter() as well, there's no particular benefit to using either, but if it'd been _alter() in the patch no-one would have mentioned it, so that automatically leans me towards that.
forum_nav_vocabulary() having a machine name instead of vid sounds good, not necessary for this patch though.
Comment #55
effulgentsia commentedChanged to forum_entity_info_alter().
url() returns in 5 different places, so that would need to be refactored or adding to the cache would need to be duplicated 5 times. An extra stack call adds 1us to the cache miss (and only when the cache is being employed), but we're already paying a couple us for the drupal_static(). Is it worth attempting the optimization you suggest as part of this issue?
Comment #56
catchArgh I'd forgotten url() returns so many times, NIH then, let's not get into a completely url() refactoring here, keeping this to less than 20k is nice ;)
Will try to run some benchmarks later today.
Comment #57
effulgentsia commentedAdded an @todo explaining the situation so that someone who's inspired to can attempt a url() refactor in a separate issue.
Comment #58
catchNode with some comments on it. Enabled forum module so we can see the impact of removing forum_url_outbound_alter()
Before patch:
After patch:
In xhprof, total cpu used for url() and the functions it calls went down from 40,000us to 28,000us
RTBC.
Comment #59
dries commentedPersonally, I'm not a big fan of the cache handling in this patch, and the fact that people can turn on/off caching. It adds nuances and complexity to url() that are not easily understood without a lot of benchmarking. I recommend that we remove that for now.
Comment #60
catchurl() is one of the worst CPU hogs in core at the moment, top 5 or so, and the main use for this cache is to be invoked transparently when people use entity_uri().
However, the critical bug and API change here (bundle support, removing forum_url_outbound_alter()) doesn't require the caching to be added, so I'm uploading a patch without that hunk, and I've opened #763574: Add caching for url() so we can work on it separately. Patch is untested.
Comment #61
effulgentsia commentedDiff confirms that the only change from #57 is the removal of any modification to url() and the removal of adding a 'cache' key to what entity_uri() returns. Bot's happy, so RTBC.
Comment #62
effulgentsia commentedTitle change to reflect #60.
Comment #63
dries commentedFails to apply locally. Asking for a re-test.
Comment #64
dries commented#60: entity_uri.patch queued for re-testing.
Comment #66
scor commentedrerolling after #761616: Follow-up to revert node titles as fields; revert $node_title to $title commit.
Comment #67
effulgentsia commentedDiff confirms #66 as equivalent to #60.
Comment #68
dries commentedGreat. Committed to CVS HEAD.
Comment #70
effulgentsia commentedHere's the sweep I promised in #37. It might not be complete, but can this be reviewed? I'd like confirmation that this is in fact what we intended when we introduced a entity_uri() function.
Comment #72
effulgentsia commentedAnother try.
Comment #73
catchTo actually allow rewriting of links, this is exactly what we need to do. Patch looks good, on the @todos, it looks to be only comment module which is causing problems? Generally we only render comments from a single node, although D6 used to call node load 3-4 times for each comment, which with drupal_static() was still needlessly expensive when viewing a lot (now it'd be getting them from the entity loader which is probably not much better), the answer to this was to pass $node around with $comments to the functions that need it - http://api-drupal.pajunas.com/api/function/comment_view_multiple for example. For the admin case, we can probably do something similar, for RDF less so.
Comment #75
effulgentsia commentedOnce again. This also solves the @todos by using entity_create_stub_entity(), though catch's idea (#73) of getting a real $node object into the necessary comment functions may be a better way to go.
Comment #76
fagooh, I really like this. As of now we have entity_uri() and the possibility to override it for bundles, however most places in core hard-code the url. It also catches some hard-coded taxonomy urls, which cleans up the forum-alter-url use case. ++
@array_merge($uri['options'], $options)
Perhaps we should use array_merge_recursive() here? What if entity uri callback makes use of 'attributes' or 'query' and the caller wants to do so too?
Comment #77
effulgentsia commentedI like the idea, but:
array_merge_recursive(array('language' => 'en'), array('language' => 'es'))results inarray('language' => array('en', 'es')). This seems broken. How do we deal with this in other places that we use array_merge_recursive()? Scalars should replace. Only arrays should merge.Comment #78
effulgentsia commentedPlease see #791860: array_merge_recursive() is never what we want in Drupal: add a drupal_array_merge_recursive() function instead. for a solution to the array_merge_recursive() problem.
Comment #79
scor commentednice cleanup to remove all hard-coded entity paths!
this seems like an unrelated hunk.
we should explain the difference between entity_uri() and entity_url()
Comment #81
fago>How do we deal with this in other places that we use array_merge_recursive()? Scalars should replace. Only arrays should merge.
We just use array_merge_recursive() and hope nothing breaks :D Anyway, we can incorporate drupal(_array)?_merge() here once it's fixed.
Comment #82
scor commented@fago: not related to entity_uri, but just in case it interests you, an example is #721082-21: Prevent conflicting namespaces and move hook_rdf_namespaces() invocation into rdf.module where had to work around array_merge_recursive() to prevent namespaces wreck, see the rdf_get_namespaces() hunk in the patch.
Comment #83
effulgentsia commentedThis is a re-roll only. I'll follow-up with implementing #76 and #79 later today.
Comment #84
effulgentsia commentedThis implements #76 (see #78/#81).
Re #79.1: It's related in that the example code includes a hard-coded path to 'user/UID'. This could be fixed by using entity_url(), but the example is wrong: it should be linking to 'user/UID/edit', so that's what the patch does.
Re #79.2: I really struggled with this, because I'm not really sure if we're intending to support entity types whose URIs are not URLs. Since Drupal is about making things available on the web, are we anticipating entity types whose URI is a URN rather than a URL? And if so, does our entity_uri() function handle that? But then we have things like the PHPDoc of entity_uri() saying
@return: An array containing the 'path' and 'options' keys used to build the uri of the entity, and matching the signature of url()implying that we're gonna use the url() function to build a uri. So please see if the PHPDoc I added to entity_url() makes sense.Comment #85
effulgentsia commentedNo, that's not how fragments work :)
83 critical left. Go review some!
Comment #86
effulgentsia commentedChanging issue title to focus attention on the part that hasn't already landed.
Comment #87
aspilicious commentedput a newline between the different doc blocks (in this case between @param and @return)
put newline between @param and @return
Powered by Dreditor.
Comment #88
aspilicious commentedComment #89
effulgentsia commentedthanks.
Comment #90
effulgentsia commentedwith the file.
Comment #91
aspilicious commentedNo patch?
Comment #92
aspilicious commentedDamnit crossposting
Comment #93
catchThis looks great. We need this to get the functionality of hook_term_path() - which was removed and replaced by hook_url_outbound_alter() in a way which caused a whole range of issues, back into Drupal 7.
Only question on the patch so far is do we really need those entity_create_stub_entity() calls in there. It seems like in some cases (comment deletion) we could just do a node_load() instead - those pages aren't in the critical path at all.
Comment #94
effulgentsia commentedAgreed with #93. I left entity_create_stub_entity() in rdf_comment_load(), because that does seem to be critical path. But I commented it. I'm a little confused why we need to generate those variables there at all (do we ever need them other than when rendering a comment, in which case should we move that code to rdf_preprocess_comment()?), but perhaps that should be a separate issue?
Comment #95
scor commented@effulgentisa: those variables are added to the entity during hook_comment_load() simply so they can be cached by modules like entity_cache. It doesn't really provide any performance benefit in core alone, but allow better performance with modules providing specialized caching. With all the work going into entity_url(), this could be revisited and we could move these back into rdf_preprocess_comment(), though this does not affect the speed of the date markup whose time relies on PHP's date().
Comment #96
effulgentsia commented@scor: thanks! Given #95, we should leave rdf_comment_load() as it is in #94. Potentially, someone who feels inspired can open a separate issue about removing those variables from rdf_comment_load(), but the performance analysis of that issue would need to include the considerations you mention. On another topic, can you please comment on #94's PHPDoc for entity_url() and whether it addresses #79.2 without introducing anything inaccurate. Please see #84 about my general concerns about how we use URI and URL. It's beyond the scope of this issue to fix incorrect usage of the terms in HEAD, but I don't want this patch adding any new misinformation. Thanks!
Comment #97
yesct commentedneeds work to address the points in #96
Comment #98
effulgentsia commentedI didn't intend #96 to imply "needs work". I would like a review from scor on the PHPDoc of entity_url(). Other than that, catch may want to review if #94 adequately addresses #93. Unless one of them (or another reviewer) has further changes to recommend, I think this can be RTBC.
Comment #99
catchThe changes + comment in #94 looks good to me. Didn't give the patch another full once-over, but didn't see any other issues last time either.
Comment #100
scor commenteddo you have an example for "context-specific URL that's different than the entity's URI"? re #84 I don't think we should try to support URNs at this stage, if we do, calling entity_url() on this entity would break so we would need to extend entity_url (too late). I'm personally quite happy already with the level we're reached with dealing with non-URN URIs.
I can't think of any case where this function would not be called for an entity. What am I missing?
PS: leaving this needs review. the points above are not major concerns and we need people to review this patch in the meantime.
Comment #101
effulgentsia commentedThanks! Is this better? If so, then given catch's and fago's reviews, I think we're RTBC.
Comment #102
effulgentsia commentedAnyone up for RTBCing this? Sure would be nice to get this into alpha5 to help contrib authors be aware of this change.
Comment #103
catchI think we're good now.
Just a reminder of why this is needed:
* The hook_url_outbound_alter() patch removed hook_term_path(). This was used by forum module. forum_url_inbound_alter() led to a high potential for infinite recursion alongside performance issues.
* Rather than just put back hook_term_path(), we already had a need of entity URI for RDF, so we added the entity_uri() function instead.
* This patch just applies the use of that function consistently across core, introducing entity_l() and entity_url() as additional helpers, after the initial API change went in.
Comment #104
yesct commented#101: entity_uri-use-699440-101.patch queued for re-testing.
Comment #105
scor commentedTo address effulgentsia's question in #94 about why we need to generate those variables in rdf_comment_load(), I've added some documentation as part of #815866: Improve rdf.module documentation.
Comment #106
webchickThanks for the summary. But changes made throughout this patch leave me scratching my head as to when I'm supposed to use which function.
1. Do we strictly need the entity_uri() function? Most people wouldn't be able to tell you what URI vs URL is without consulting Wikipedia (which, to prove the point, is linked in the description of entity_url()), and we don't have a corresponding uri() function for symmetry.
2. I think we need far better, and cross-referenced, documentation as to when to use l() or url() vs. entity_l() or entity_url(). The documentation that exists now is technically accurate, but doesn't help me as a module author to not screw up my links to things.
Comment #107
effulgentsia commentedReverting issue status to as it was in #69. Opened a new issue to continue this discussion (so people can join without wading through all the back story): #823428: Impossible to alter node URLs efficiently.
Re #106:
I wrote in the new issue that I'm leaning towards removing it in D8, but we already have it in D7, and don't have a hook_url_outbound_TAG_alter() system in place that could make entity_uri() obsolete.
Indeed. And I agree #101 didn't help address that. URIs are just a superset of URLs, but for Drupal entities, they're really the same thing, as all Drupal entity URIs are also URLs. So, I think we're just slowly changing terminology within Drupal from URL to URI, because URLs are so 1990s, and all the hip web technologists prefer the word URI. The patch in the new issue tries to make this more clear.