At the moment RDF called _rdf_mapping_load for every bundle of every entity type. That means that every time the entity_info cache is cleared, it runs a bunch of queries that could really be optimized.
I propose the addition of _rdf_mapping_load_multiple, which would query the mappings for multiple bundles for a given type at once.
This change would make RDF's queries dependent on the number of entity types, which I think is very important as more modules start to use entity types in their own way, possibly having lots of bundles. We ran across this issue because we have a custom type with 1500 bundles. Obviously that's a lot and possibly not what was intended by the Entity API, but I think issues like this are more and more likely to arise as more people use the Entity API.
| Comment | File | Size | Author |
|---|---|---|---|
| #48 | interdiff-1338966-48.patch | 640 bytes | lokapujya |
| #48 | 1338966-48.patch | 2.26 KB | lokapujya |
| #24 | rdf-mapping-load-multiple-d8-1338966-23-do-not-test.patch | 549 bytes | geerlingguy |
| #22 | rdf-mapping-load-multiple-1338966-22.patch | 2.38 KB | geerlingguy |
| #19 | rdf-mapping-load-multiple-1338966-19.patch | 2.53 KB | geerlingguy |
Comments
Comment #1
loganfsmyth commentedHere's the patch.
I haven't changed any APIs, so I think should could be backported to D7 too.
Comment #2
loganfsmyth commentedBetter title.
Comment #3
scor commentedGreat idea!
why returning FALSE instead of an empty array?
Comment #4
loganfsmyth commentedGood question! It was a failed attempt at consistency because I was thinking that node_load_multiple and such returned FALSE when there were no results, but that's wrong. Here's a new version returning an empty array.
Also, _rdf_mapping_load returns an empty array instead of FALSE, which is inconsistent with node_load. It would have to stay as is if this is backported to D7, but it'd probably be good to change that for D8, so I've added that in too.
Comment #5
loganfsmyth commentedAs a motivation for backporting to D7, and a general crazy story, I want to mention how this came up :D
We have two entity types, each with ~1500 bundles, so that's ~3000 queries every time the entity_info cache is regenerated. That's a lot of queries all on it's own, but then you have to factor in that the entity_info cache is cleared any time someone runs field_update_instance. So when you press Save in main Field UI 'Manage Fields' page to update weights, for instance, and it updates every instance, that's a lot of cache clears. In our case, there are 32 fields. That lead to ~3000 * 32 = ~96000 queries because of RDF, so the Field UI page was simply timing out trying to run all those queries.
Good times :P
Comment #6
webchickHm. Is there a reason there's a leading underscore on these functions? Elsewhere we do $thingy_load() and $thingy_load_multiple(), not _$thingy_load() and _$thingy_load_multuple(). Maybe fodder for a separate issue, but it'd be nice if these functions were consistent with elsewhere in core. We could introduce them as wrapping functions in D7 and just rename them in D8.
Comment #7
webchickTagging.
Comment #8
scor commented@webchick _rdf_mapping_load() is a helper function for rdf_entity_info_alter(), which hits the database. rdf_mapping_load() does exist, and retrieves its data from the entity info cache. I guess your point is that _rdf_mapping_load() is misleading wrt to the $thingy_load() convention?
Comment #9
loganfsmyth commentedUnfortunately rdf_mapping_load is already a (related) thing.
http://api.drupal.org/api/drupal/modules--rdf--rdf.module/function/rdf_m...
When the entity_info cache gets generated, it runs all the queries for RDF data using _rdf_mapping_load and stores it. Then you can get the data from the entity_info cache by running rdf_mapping_load. It's a bit ugly, but we're stuck with it for now.
Comment #10
scor commented@loganfsmyth we could have put the mapping loading from the db in rdf_entity_info_alter() but left it outside as a function in case another module needs to access the database mapping (bypassing the cache). But I guess a better name for _rdf_mapping_load() is in order (not for this issue though).
Comment #11
loganfsmyth commented@scor Yeah, it makes sense to have it as a separate function.
I haven't really looked through RDF's code much. What was the reasoning behind caching all of that info within the entity_info cache instead of in your own cache? It seems to me like putting it in there just makes things more complicated. If you cached it yourself, you could just query on demand, and have more fine-grained control over managing the cache.
Comment #12
scor commentedit's just that you get the RDF mappings for free along with the other properties of each entity type and bundle. if you need to call them separately, it's extra code that you have to repeat all the time, and you might forget to do it. This solution was suggested at the time by catch and fago. This approach is fairly aligned with the entity API btw.
Comment #13
loganfsmyth commented@scor Thanks for the info. I was just curious. :)
Comment #14
geerlingguy commentedYikes; just noticed that my site is barfing after cache clears and module enabling/disabling, partly due to
_rdf_mapping_load()running more than 500 times, using up a ton of memory while it's at it. Would love to see this go in. (This plus #1040790: _field_info_collate_fields() memory usage would go a long way towards helping sites with more than a few hundred fields + instances + bundles).Comment #15
scor commented@geerlingguy could you try the patch in #4 on D7 and see whether it solves your issue?
Comment #16
geerlingguy commentedClear all caches — before patch:
Clear all caches — after patch:
I ran each clear all caches three times, and cleared APC's caches (restarted apache) in between each one as well, to make things fair. The only nit I have is with:
Why switch to a mixed return variable instead of an empty array if nothing found? Marking RTBC, though, since there's precedent elsewhere for mixed return values, and that definitely shouldn't block this patch (unless we have a policy to return only one type of value?).
Comment #17
geerlingguy commented(Attached is D7 patch I used to test on staging site benchmarked above—only difference is the file structure).
Comment #18
scor commenteds/retreive/retrieve
..., or FALSE if the mapping does not exist.
Comment #19
geerlingguy commentedFixed minor typos.
Comment #20
geerlingguy commentedBack to RTBC.
Comment #21
catchLooks great. Committed/pushed to 8.x, moving to 7.x for backport.
Comment #22
geerlingguy commentedAttached patch should work for D7. I found one typo in the patch that was committed to D8:
Comma should go after word 'structure' instead of after 'or'. I fixed that typo in the attached D7 patch.
Comment #23
scor commented@geerlingguy could you roll a patch for D8 as well for the minor typo please?
Comment #24
geerlingguy commentedPatch for D8 attached. (Should be ignored by testbot.)
Comment #25
mgifford#22: rdf-mapping-load-multiple-1338966-22.patch queued for re-testing.
Comment #27
scor commentedDrupal 8 now uses CMI to load the mappings since #1869600: Refactor RDF mappings to be inline with the new Entity Field API went in, and the comment referenced in patch #24 no longer exists in D8, so this issue only focuses on D7 at this point.
Comment #28
scor commentedComment #30
scor commented#22 looks good to me as long as it passes the tests.
Comment #31
lokapujya#22 applies and passes tests (locally) still.
Comment #33
scor commentedTestbot has become flaky these days. patch to be committed in #22
Comment #35
scor commented#22 still green.
Comment #38
scor commentedComment #43
lokapujyaComment #46
David_Rothstein commentedI think this looks good but not sure about this part:
An earlier comment in this issue said that part would not be backported to D7...
Given that it's a private function, I don't believe it's a huge deal to change it, but would be better not to if there's no good reason. And I actually don't think the change makes much sense? If the function returns an array normally, then it's better for it to always return an array to be consistent, right?
(node_load() and company might do it differently and return FALSE, but that's not the best practice and also, they return objects normally as opposed to arrays, which is a bit different...)
Comment #47
scor commented@David_Rothstein: right, a couple of people actually mentioned that in passing about the D7 patch, this mixed return type should not have been backported... let's limit the API changes for D7. All what needs to be done here is revert the docs change for _rdf_get_default_mapping(), and instead of FALSE, just return array():
Comment #48
lokapujyaDone.
Comment #50
scor commentedThanks Jamie :) patch looks good, and verified interdiff locally, it's good to go. back to RTBC.
Comment #51
David_Rothstein commentedCommitted to 7.x - thanks!
Comment #53
scor commented