At the moment RDF called _rdf_mapping_load for every bundle of every entity type. That means that every time the entity_info cache is cleared, it runs a bunch of queries that could really be optimized.

I propose the addition of _rdf_mapping_load_multiple, which would query the mappings for multiple bundles for a given type at once.

This change would make RDF's queries dependent on the number of entity types, which I think is very important as more modules start to use entity types in their own way, possibly having lots of bundles. We ran across this issue because we have a custom type with 1500 bundles. Obviously that's a lot and possibly not what was intended by the Entity API, but I think issues like this are more and more likely to arise as more people use the Entity API.

Comments

loganfsmyth’s picture

Status: Active » Needs review
StatusFileSize
new2.46 KB

Here's the patch.

I haven't changed any APIs, so I think should could be backported to D7 too.

loganfsmyth’s picture

Title: Reduce queries on entity_info_alter() » Introduce _rdf_mapping_load_multiple to reduce queries.

Better title.

scor’s picture

Great idea!

+++ b/core/modules/rdf/rdf.module
@@ -190,17 +190,36 @@ function _rdf_get_default_mapping($type) {
-  if (!$mapping) {
-    return array();
+  if (!$mappings) {
+    return FALSE;
   }

why returning FALSE instead of an empty array?

loganfsmyth’s picture

Good question! It was a failed attempt at consistency because I was thinking that node_load_multiple and such returned FALSE when there were no results, but that's wrong. Here's a new version returning an empty array.

Also, _rdf_mapping_load returns an empty array instead of FALSE, which is inconsistent with node_load. It would have to stay as is if this is backported to D7, but it'd probably be good to change that for D8, so I've added that in too.

loganfsmyth’s picture

As a motivation for backporting to D7, and a general crazy story, I want to mention how this came up :D

We have two entity types, each with ~1500 bundles, so that's ~3000 queries every time the entity_info cache is regenerated. That's a lot of queries all on it's own, but then you have to factor in that the entity_info cache is cleared any time someone runs field_update_instance. So when you press Save in main Field UI 'Manage Fields' page to update weights, for instance, and it updates every instance, that's a lot of cache clears. In our case, there are 32 fields. That lead to ~3000 * 32 = ~96000 queries because of RDF, so the Field UI page was simply timing out trying to run all those queries.

Good times :P

webchick’s picture

Hm. Is there a reason there's a leading underscore on these functions? Elsewhere we do $thingy_load() and $thingy_load_multiple(), not _$thingy_load() and _$thingy_load_multuple(). Maybe fodder for a separate issue, but it'd be nice if these functions were consistent with elsewhere in core. We could introduce them as wrapping functions in D7 and just rename them in D8.

webchick’s picture

Issue tags: +Needs backport to D7

Tagging.

scor’s picture

@webchick _rdf_mapping_load() is a helper function for rdf_entity_info_alter(), which hits the database. rdf_mapping_load() does exist, and retrieves its data from the entity info cache. I guess your point is that _rdf_mapping_load() is misleading wrt to the $thingy_load() convention?

loganfsmyth’s picture

Unfortunately rdf_mapping_load is already a (related) thing.
http://api.drupal.org/api/drupal/modules--rdf--rdf.module/function/rdf_m...

When the entity_info cache gets generated, it runs all the queries for RDF data using _rdf_mapping_load and stores it. Then you can get the data from the entity_info cache by running rdf_mapping_load. It's a bit ugly, but we're stuck with it for now.

scor’s picture

@loganfsmyth we could have put the mapping loading from the db in rdf_entity_info_alter() but left it outside as a function in case another module needs to access the database mapping (bypassing the cache). But I guess a better name for _rdf_mapping_load() is in order (not for this issue though).

loganfsmyth’s picture

@scor Yeah, it makes sense to have it as a separate function.

I haven't really looked through RDF's code much. What was the reasoning behind caching all of that info within the entity_info cache instead of in your own cache? It seems to me like putting it in there just makes things more complicated. If you cached it yourself, you could just query on demand, and have more fine-grained control over managing the cache.

scor’s picture

it's just that you get the RDF mappings for free along with the other properties of each entity type and bundle. if you need to call them separately, it's extra code that you have to repeat all the time, and you might forget to do it. This solution was suggested at the time by catch and fago. This approach is fairly aligned with the entity API btw.

loganfsmyth’s picture

@scor Thanks for the info. I was just curious. :)

geerlingguy’s picture

Yikes; just noticed that my site is barfing after cache clears and module enabling/disabling, partly due to _rdf_mapping_load() running more than 500 times, using up a ton of memory while it's at it. Would love to see this go in. (This plus #1040790: _field_info_collate_fields() memory usage would go a long way towards helping sites with more than a few hundred fields + instances + bundles).

scor’s picture

@geerlingguy could you try the patch in #4 on D7 and see whether it solves your issue?

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community

Clear all caches — before patch:

265,225 calls to _rdf_mapping_load(), total of about 280ms of total query time, page execution in ~13600ms, ~190MB peak mem usage.

Clear all caches — after patch:

42 calls to _rdf_mapping_load_multiple(), total of about 7ms of total query time, page execution in ~13300ms, ~189MB peak mem usage.

I ran each clear all caches three times, and cleared APC's caches (restarted apache) in between each one as well, to make things fair. The only nit I have is with:

@@ -187,20 +187,36 @@ function _rdf_get_default_mapping($type) {
  *   The bundle the mapping refers to.
  *
  * @return
- *   An RDF mapping structure or an empty array if no record was found.
+ *   An RDF mapping structure or FALSE.

Why switch to a mixed return variable instead of an empty array if nothing found? Marking RTBC, though, since there's precedent elsewhere for mixed return values, and that definitely shouldn't block this patch (unless we have a policy to return only one type of value?).

geerlingguy’s picture

(Attached is D7 patch I used to test on staging site benchmarked above—only difference is the file structure).

scor’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/rdf/rdf.module
@@ -187,20 +187,36 @@ function _rdf_get_default_mapping($type) {
+ * Helper function to retreive a set of RDF mappings from the database.

s/retreive/retrieve

+++ b/core/modules/rdf/rdf.module
@@ -187,20 +187,36 @@ function _rdf_get_default_mapping($type) {
+ *   An RDF mapping structure or FALSE.

..., or FALSE if the mapping does not exist.

geerlingguy’s picture

Status: Needs work » Needs review
StatusFileSize
new2.53 KB

Fixed minor typos.

geerlingguy’s picture

Status: Needs review » Reviewed & tested by the community

Back to RTBC.

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Needs work
Issue tags: +Performance

Looks great. Committed/pushed to 8.x, moving to 7.x for backport.

geerlingguy’s picture

Status: Needs work » Needs review
StatusFileSize
new2.38 KB

Attached patch should work for D7. I found one typo in the patch that was committed to D8:

@@ -187,20 +187,36 @@ function _rdf_get_default_mapping($type) {
  *   The bundle the mapping refers to.
  *
  * @return
- *   An RDF mapping structure or an empty array if no record was found.
+ *   An RDF mapping structure or, FALSE if the mapping does not exist.
  */

Comma should go after word 'structure' instead of after 'or'. I fixed that typo in the attached D7 patch.

scor’s picture

@geerlingguy could you roll a patch for D8 as well for the minor typo please?

geerlingguy’s picture

Patch for D8 attached. (Should be ignored by testbot.)

mgifford’s picture

Status: Needs review » Needs work
Issue tags: +Performance, +Needs backport to D7

The last submitted patch, rdf-mapping-load-multiple-1338966-22.patch, failed testing.

scor’s picture

Drupal 8 now uses CMI to load the mappings since #1869600: Refactor RDF mappings to be inline with the new Entity Field API went in, and the comment referenced in patch #24 no longer exists in D8, so this issue only focuses on D7 at this point.

scor’s picture

Issue summary: View changes
Issue tags: +RDF code sprint

scor’s picture

#22 looks good to me as long as it passes the tests.

lokapujya’s picture

Status: Needs work » Reviewed & tested by the community

#22 applies and passes tests (locally) still.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 22: rdf-mapping-load-multiple-1338966-22.patch, failed testing.

scor’s picture

Status: Needs work » Reviewed & tested by the community

Testbot has become flaky these days. patch to be committed in #22

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 22: rdf-mapping-load-multiple-1338966-22.patch, failed testing.

scor’s picture

Status: Needs work » Reviewed & tested by the community

#22 still green.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 22: rdf-mapping-load-multiple-1338966-22.patch, failed testing.

scor’s picture

Status: Needs work » Reviewed & tested by the community

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 22: rdf-mapping-load-multiple-1338966-22.patch, failed testing.

The last submitted patch, 22: rdf-mapping-load-multiple-1338966-22.patch, failed testing.

The last submitted patch, 22: rdf-mapping-load-multiple-1338966-22.patch, failed testing.

lokapujya’s picture

Status: Needs work » Reviewed & tested by the community

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 22: rdf-mapping-load-multiple-1338966-22.patch, failed testing.

David_Rothstein’s picture

Status: Needs work » Needs review

I think this looks good but not sure about this part:

  * @return
- *   An RDF mapping structure or an empty array if no record was found.
+ *   An RDF mapping structure, or FALSE if the mapping does not exist.
  */
 function _rdf_mapping_load($type, $bundle) {

An earlier comment in this issue said that part would not be backported to D7...

Given that it's a private function, I don't believe it's a huge deal to change it, but would be better not to if there's no good reason. And I actually don't think the change makes much sense? If the function returns an array normally, then it's better for it to always return an array to be consistent, right?

(node_load() and company might do it differently and return FALSE, but that's not the best practice and also, they return objects normally as opposed to arrays, which is a bit different...)

scor’s picture

Status: Needs review » Needs work

@David_Rothstein: right, a couple of people actually mentioned that in passing about the D7 patch, this mixed return type should not have been backported... let's limit the API changes for D7. All what needs to be done here is revert the docs change for _rdf_get_default_mapping(), and instead of FALSE, just return array():

+  return $mappings ? reset($mappings) : FALSE;
lokapujya’s picture

Status: Needs work » Needs review
StatusFileSize
new2.26 KB
new640 bytes

Done.

Status: Needs review » Needs work

The last submitted patch, 48: interdiff-1338966-48.patch, failed testing.

scor’s picture

Status: Needs work » Reviewed & tested by the community

Thanks Jamie :) patch looks good, and verified interdiff locally, it's good to go. back to RTBC.

David_Rothstein’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 7.x - thanks!

  • David_Rothstein committed ebf478b on 7.x
    Issue #1338966 by geerlingguy, loganfsmyth, lokapujya: Fixed Introduce...
scor’s picture

Title: Introduce _rdf_mapping_load_multiple to reduce queries. » Introduce _rdf_mapping_load_multiple() to reduce queries

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.