I use search_api_db, but I guess this is a generic issue and not related to the db implementation.

I have some hierarchical vocabulary and fields linking to this vocabulary indexed using the processor that also index parents for each term.

If I change the parent of a term I would expect that the search engine would reindex every content referring to that term but this is not happening.

Should this be handled by the search_api module? Or am I expected to write my own hooks to intercept that changes and invalidate contents?

CommentFileSizeAuthor
#48 2007692-48--react_to_changes_in_related_entities.patch74.33 KBdrunken monkey
#47 2007692-47--react_to_changes_in_related_entities.patch74.38 KBdrunken monkey
#47 2007692-47--react_to_changes_in_related_entities--interdiff.txt2.13 KBdrunken monkey
#43 2007692-43--react_to_changes_in_related_entities.patch74.31 KBdrunken monkey
#43 2007692-43--react_to_changes_in_related_entities--interdiff.txt87.31 KBdrunken monkey
#42 interdiff-39-42.txt718 bytesbucefal91
#42 2007692-42--react_to_changes_in_related_entities.patch44.41 KBbucefal91
#41 2007692-41--react_to_changes_in_related_entities.patch44.41 KBbucefal91
#41 interdiff-39-41.txt768 bytesbucefal91
#39 interdiff-37-39.txt1.95 KBbucefal91
#39 2007692-39--react_to_changes_in_related_entities.patch44.41 KBbucefal91
#37 interdiff-36-37.txt18.17 KBbucefal91
#37 2007692-37--react_to_changes_in_related_entities.patch43.3 KBbucefal91
#36 2007692-36--react_to_changes_in_related_entities.patch36.58 KBdrunken monkey
#36 2007692-36--react_to_changes_in_related_entities--interdiff.txt1.27 KBdrunken monkey
#34 interdiff-31-34.txt19.16 KBbucefal91
#34 2007692-tracking-foreign-entities-34.patch36.85 KBbucefal91
#31 2007692-tracking-foreign-entities-31.patch26.62 KBbucefal91
#31 interdiff-30-31.txt3.5 KBbucefal91
#30 interdiff-29-30.txt506 bytesbucefal91
#30 2007692-tracking-foreign-entities-30.patch26.3 KBbucefal91
#29 2007692-tracking-foreign-entities-29.patch26.28 KBbucefal91
#28 2007692-tracking-foreign-entities-28.patch16.4 KBbucefal91
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drunken monkey’s picture

Regrettably, it's very hard to well-nigh impossible to keep changes of such second-hand data changes and mark the appropriate items as dirty. You can try to provide a patch if you feel it is (or should be) possible, at least for core fields, but even for those it would, while probably being possible, be a rather complex process. (The Hierarchy data alteration makes this even more complex, and would probably make a exact-match type of query impossible without hard-coding exceptions for it.)
But, I don't know, maybe we should still try to catch this in as many cases as we can … I agree that it is irritating, and could be hard to spot for site builders/admins.

On your own site, apart from using custom code you could also use Rules to fix this.

drunken monkey’s picture

Version: 7.x-1.6 » 7.x-1.x-dev
bago’s picture

I don't know search_api internals, so I may need some guidance...

Here is what I would do:
1) Use hooks to detect when a taxonomy term update involve its relations.
2) In this hook ask search_api for a list of indexed fields of type "taxonomy term"
3) For each field detect if the hierarchical processor is used or not
4) if the hierarchical processor is used run a search to find every content that links my "tid" on one of the fields (using OR search or running one search for each field)
5) mark all of that nodes as changed in the search_api.

I don't think this is easy, but it should work most time: am I missing anything?
Have you any hint about what are the correct ways to ask search_api the informations we need in that steps?

If this issues takes long to be fixed, what about implementing only #1 and add a set_message to tell the user that relations changes in taxonomies are not propagated to the search engine? (my main issue was that I found out that I had out of sync data after a lot of time and it took a while to understand the cause).

drunken monkey’s picture

Yes, the code you describes should work as a custom fix for your site. You can use $fields = $index->getFields() to get a list of all indexed fields on an index, $fields[$field_key]['entity_type'] will then contain the entity type of the field (if it is present).

However, if you mean this as a patch for inclusion in the Search API project, it's much too specific. Changing the taxonomy term name, or a linked node's title, etc., would have to be caught as well. Special-casing for the Hierarchy data alteration would probably be too specific, as well.

If this issues takes long to be fixed, what about implementing only #1 and add a set_message to tell the user that relations changes in taxonomies are not propagated to the search engine? (my main issue was that I found out that I had out of sync data after a lot of time and it took a while to understand the cause).

For that, putting a note prominently in the documentation (I'm currently working on a "common pitfalls" in the Getting started section) should probably suffice, too.

bago’s picture

I'm not sure I follow you about the "generic" vs "specific" stuff.

Isn't search_api indexing only tids? Why should it care if the path or the name of a term is changed?
(is this behaviour depending on the search_api backend of choice?)

What are the other "generic" cases where a taxonomy term change would require to reindex related entities (excluding the taxonomy_term itself)?

drunken monkey’s picture

The problem is that you cannot just index term IDs: you can index the related term's name, the related term's parent's parent's vocabulary's description, a related node's author's mail adress, … – you get the idea. There are an infinite number of possibilities here, so just adding a fix for this tiny portion of use cases (changing the parents when you have the Hierarchy data alteration enabled and are indexing a term reference directly on the item) would do more harm than good.
We'd either need a generic solution, building on generic Entity API functionality, or we should just document this and refer users to using Rules or custom code if this is a common occurrence for their site.

bago’s picture

I didn't even know it was possible, but I see there is a "Add related fields" option in the field tab. Is that feature you are talking about? In that case you should reindex the content whenever a referenced term is changed.

I was proposing a specific "taxonomy term" solution because I thought that taxonomy term was already a special entity type in search api as I was seeing it as field type in my index (I'm not talking about what kind of entities can be indexed but what kind of fields can be indexed). Are you saying that an indexed field can be of any entity type (e.g: a referenced node)? In this case search_api does a lot more than I thought.

drunken monkey’s picture

I didn't even know it was possible, but I see there is a "Add related fields" option in the field tab. Is that feature you are talking about? In that case you should reindex the content whenever a referenced term is changed.

Yes, exactly. You can index any field of any related entity, and even of ones indirectly related. This makes reacting properly here very hard.
Of course, it wouldn't be too hard to keep track of fields from which entity types are indexed. But it would be very hard to reliably figure out what indexed nodes are actually affected by a, say, changed term name. And having to re-index all nodes when you change a single term name is, of course, out of the question.

I was proposing a specific "taxonomy term" solution because I thought that taxonomy term was already a special entity type in search api as I was seeing it as field type in my index (I'm not talking about what kind of entities can be indexed but what kind of fields can be indexed). Are you saying that an indexed field can be of any entity type (e.g: a referenced node)? In this case search_api does a lot more than I thought.

Yes, exactly that is the case. For example, you'll see that the Author field is of type "User". And the same is possible for all other entity types, if there is a reference to them. (You can, e.g., also index the File entity of an image field on the node.)
Taxonomy terms have a bit of special treatment in the Views integration, but other than that the Search API supports all entity types the same way. Which makes it very powerful, but also makes it very hard to solve such issues while keeping it this generic.

drunken monkey’s picture

Title: Changing term parent for a field with the hiearchy processor fails to reindex content » Changes in related entities don't lead to proper re-indexing

Changing the title to reflect the broader issue.

gilsbert’s picture

Hi Drunken!

I'm facing the same challenge: how to reindex contents that had a referenced taxonomy term changed?

I understood all the complexity.
I have two questions.

1 - Are you planning to develop an automatic process for this?
It would be fantastic if we could work on this but I have no idea how to implement this. Perhaps it would be need:
a) a dozen of hooks just to be able to see all possible changes over entities that might be used in search_api indexes (or can we write just one generic hook for it?)
b) a sequence of tests to check: if the changed entity is used in an index and what are the "contents" affected by the change that should be marked as dirty and with that reindexed!
If you have a trial code I'm avaliable to help you with the task.

2 - How can I use "rules" to make it work?
In case there is no better solution how could I use "rules" to integrate changes in an entity with search_api? I would need something similar of what I wrote before right? I would need to trigger all "contents" related to a changed taxonomy term (just an example) as dirty. Am I right?
Finally, how could I make this in a taxonomy with hierarchy option turned on?

Regards,
Gilsberty

drunken monkey’s picture

1 - Are you planning to develop an automatic process for this?

At same point maybe/probably, but not in the near future, no. It's sadly far from trivial to do this, otherwise we'd have implemented it already.
From the top of my head, it might work like this:

  • For all indexes, remember all the indexed related entities' types. E.g., if field_tags and author are indexed, save taxonomy_term and user for this index (in some data strutcture). Probably also save all of the fields directly or indirectly indexed.
  • In hook_entity_update(), don't just check for indexes of that type, but also of indexes who have that type saved as an indexed related entity type.
  • Then it gets complicated. If we don't just want to blindly mark all items as "dirty" (which would, of course, be madness), we have to figure out which items are actually affected by the change, and I'm not really sure how we could best do that. We could of course issue a search to determine items which have the changed item indexed (at least if the related entity's IDs are indexed, too), maybe additionally checking whether a relevant (i.e., indexed) property of the related entity changed.
    However, I guess at some point this would just be a heuristic (not least because of the possible discrepancy between fields on the objects and Entity API metadata properties) and we'd still be sure to miss some edge cases – the "Index hierarchy" data alteration would be a prime example for that.

So, a lot of work and thinking to do here before this has a chance to be fixed. And even then, we'd a) still not catch all cases, probably, and b) it would not work for things other than entities anyways.

To answer your other question, with Rules: I'm honestly not sure what's possible with that and what isn't. If you are using the "Index hierarchy" data alteration, a proper reindexing of affected items sounds very much like it might be in the "isn't" category, though. So if you don't find a way to do it, I guess only some custom code would help you. Implement hook_taxonomy_term_update() (and, probably, the hook_taxonomy_term_delete(), too), check whether relevant properties changed, find the items that are affected by the change and mark them as "dirty" with search_api_track_item_change().

gilsbert’s picture

Hi @drunken monkey!

I would like to start this week a code trying to automatically integrate taxonomy changes and reindexing process.

I'm thinking about to write a code using the two hooks you suggested (term_update and term_delete). At first the code will be very simple and will only test "ids" (vid, tid, etc). We might work on it later and give to the code more smartness to test if the index really use the property/field changed (only in case of term_update of course).

Well please take a look on the pseudo code below.

// The idea is call this function inside both hooks!
function search_api_taxonomy_full_integration( $vid, $tid ) {

  // I would need to check if the vocabulary is in use, but not strongly necessary.
  // I mean: if it is possible the code will be faster but it is not crucial for the logic.
  if ( search_api_vocabulary_in_use ( $vid ) ) {


    // Now we would need a function returning an array with 2 dimensions. The first one will have the "type" of the service and the second will be an array with all "item_ids".
    // In another words a search_api function that will return all items using a given "tid" (term of taxonomy id).
    $search_api_affected_items = search_api_list_items_by_tid( $tid );
    foreach( $search_api_affected_items as $type, $items_id )
      search_api_track_item_change( $type, $items_id );

    // Now we test if the vocabulary has hierarchy option turned on!
    if ( search_api_vocabulary_with_hierarchy( $vid )) {
         
      // I will need to check if there is already a taxonomy function that returns all children for a given tid... but lets suppose there is one.
      $children = list_children_taxonomy( $tid );

      // Recursive call so we can integrate changes over a parent and respect the hierarchy option!
      foreach( $children as $children_tid )
         search_api_taxonomy_full_integration( $vid, $children_tid );
    }
  }
}

Now I have doubts about my approach.

a) Are there those functions?
1 - search_api_vocabulary_in_use ( $vid )
2 - search_api_list_items_by_tid( $tid )
3 - search_api_vocabulary_with_hierarchy( $vid )

b) In case there is not one or more of the mentioned functions can we substitute by a similar one or can we write one?

Special note about "search_api_list_items_by_tid( $tid )": I believe that indexes directly based over taxonomy terms are already integrated with taxonomy changes just like indexes directly based over nodes are already integrated with content changes. Am I right?
In case I'm right this function should not list "items_id" from those indexes because it would be unnecessary.

Regards,
Gilsberty

drunken monkey’s picture

a) Are there those functions?
1 - search_api_vocabulary_in_use ( $vid )
2 - search_api_list_items_by_tid( $tid )
3 - search_api_vocabulary_with_hierarchy( $vid )

No, none of these functions exist. Do the following to emulate them:

  1. Loop over all enabled search indexes and look at their fields. For fields which have entity_type set to taxonomy_term, try to find their Field API field name (for properties directly on the search item, that's the Search API field identifier itself). Then use the Field API functions to find out the field's vocabulary. Use that to build an index mapping the vocabulary IDs to the indexes and indexed fields using them.
  2. Just execute a search filtering for that TID (search_api_query()). Use the information from the previous step to do this: i.e., search for all indexes and then filter for all fields that use this vocabulary.
  3. I don't think you'll need this: if the hierarchy option is activated for a field, the relevant items will already be returned in the search from the last step.

Special note about "search_api_list_items_by_tid( $tid )": I believe that indexes directly based over taxonomy terms are already integrated with taxonomy changes just like indexes directly based over nodes are already integrated with content changes. Am I right?

Yes, changes for the items themselves are already taken into account. However, if you index taxonomy terms along with their parents/ancestors, the same will not be true for changes in those related terms. Only a change on the term object itself will trigger it to be reindexed automatically.

gilsbert’s picture

Okies.

Can you help me with (1) and (2)?
It can be a draft / pseudo code that I will test and make work.
The rest I'm able to handle alone.

Regards,
Gilsberty

drunken monkey’s picture

1.

$vocabulary_fields = array();
foreach (search_api_index_load_multiple(FALSE, array('enabled' => TRUE, 'read_only' => FALSE)) as $index_id => $index) {
  foreach ($index->options['fields'] as $name => $field) {
    if (isset($field['entity_type']) && $field['entity_type'] === 'taxonomy_term') {
      // Review this list once and see how to get the field names for your
      // site.
      dpm($name);
      $field_name = 'something';
      $voc = NULL;
      $field_info = field_info_field($field_name);
      if (isset($field_info['settings']['allowed_values'][0]['vocabulary'])) {
        $voc = $field_info['settings']['allowed_values'][0]['vocabulary'];
      }
      $vocabulary_fields[$voc][$index_id][$name] = $name;
    }
  }
}

2.

if (!empty($vocabulary_fields[$vid])) {
  foreach ($vocabulary_fields[$vid] as $index => $fields) {
    $query = search_api_query($index);
    $or = $query->createFilter('OR');
    foreach ($fields as $field) {
      $or->condition($field, $tid);
    }
    $query->filter($or);
    $results = $query->execute();
    foreach ($results['results'] as $id => $result) {
      $affected_ids[$index][$id] = $id;
    }
  }
}
gilsbert’s picture

Thanks.

I will write a beta code and post it here for your considerations.

It might take sometime but I will do it. :-)

Regards,
Gilsberty

jtbayly’s picture

Issue summary: View changes

Yikes!

I didn't realize that changes to related entities (that are part of the index) didn't make a node get re-indexed.

-Joseph

Poieo’s picture

This also comes into play with Commerce products/displays. If a product entity is updated directly, the product display node is not updated in the index.

Specifically, if I change a product to disabled directly (via admin/commerce/products/[pid]), rather than editing a product display node, the product is not removed from the search results.

drunken monkey’s picture

As mentioned already, you can use Rules for now to work around the problem.
But of course I realize this is less than ideal. It's just a rather hard problem, and I'm chronically short on time.

Poieo’s picture

I've tried Rules. I have a rule that reacts on the event 'After updating an existing commerce product'. It fires the action 'Index and entity'. I have 'commerce-product' as the data selector and 'index immediately' is checked.

Viewing the Rules debug, everything fires correctly. However, the product display is not added/removed from the index. Do I need to add a condition to pull in the product display and use that as the data selector? If so, any thoughts on how to do that?

drunken monkey’s picture

The data selector of course has to match the entity that is indexed in your search index. So if you're indexing product displays, yes, you need to match that in the data selector. No idea how to do that, though, sorry.

Poieo’s picture

I was able to get this working for updating a commerce product. Rule export below.

{ "rules_update_search_index" : {
    "LABEL" : "Update Search Index",
    "PLUGIN" : "reaction rule",
    "OWNER" : "rules",
    "REQUIRES" : [ "commerce", "search_api", "entity" ],
    "ON" : { "commerce_product_update" : [] },
    "IF" : [
      { "entity_exists" : {
          "type" : "node",
          "property" : "field_product",
          "value" : [ "commerce-product:field-product-node:0:nid" ]
        }
      }
    ],
    "DO" : [
      { "search_api_index" : {
          "entity" : [ "commerce-product:field-product-node:0" ],
          "index" : "product_display"
        }
      }
    ]
  }
}
tobiberlin’s picture

This issue is quite old but referenced in https://www.drupal.org/docs/8/modules/search-api/getting-started/common-... - as I run into such a problem now I would like to "ping" this issue. Had there been any progress?

I think of writing a module for this so I would like to know if anyone started to work on this already? And if the maintainer(s) of search API would assist me when it comes to offering settings for each index and so on?!

drunken monkey’s picture

If you can get it working in a clean enough manner, I'd even be interested in adding this to the module itself.
What would have been your approach?
I'm not aware of any work in this direction, no.

tobiberlin’s picture

This is a good question and I still think around this. It seems that the differences between Drupal 7 and Drupal 8 are too big so would you recommend to focus on Drupal 8 for that?

My thoughts about the approach:

- it would be helpful if the settings for this would be done individually on every search index and on bundle/ field level
- On every search index admin page the admin could find a page/ tab "Entity reference updates" (or somethig like that):
-- fieldsets for all bundles of the entity indexed in the search index (or just those bundles indexed)
-- checkboxes for each reference field within this bundle. Referenced entities in the checked fields would cause the parent entity to be re-indexed when updated

I would store these settings in a table in a way that on entity update it is easy to find out if the current entity is referenced from a specific field/ bundle. I will dive into entity reference logic to find out how this should be done without doubled stored data and complex/ big database queries to keep performance in mind.

tobiberlin’s picture

What came into my mind: I just thought of Entity Reference fields (Drupal 7) but this logic would need to respect other relations as well: author, organic groups, terms, entity reference revisions (for paragraphs). Maybe for this something like what is called a plugin in Drupal 8 would be helpful?!

My aim would be to build a logic which could be easily used in Drupal 7 and 8 as I need this for Drupal 7.

drunken monkey’s picture

OK, with such a complex settings and storage infrastructure, a separate module might be best after all. I'd have looked for ways to solve this automatically, but I guess letting the user set all this up themselves makes it a lot easier to implement. Basically, you're making a small-scale "Rules" module clone for just the task of re-indexing items when other entities change.

Supporting all the different reference fields shouldn't be too much of a problem, since they all have to integrate with Entity API anyways. So, you can just check if a property's data type is an entity type (or a list thereof).

Restricting this to just the indexed fields would also be an option. I.e., if you want to re-index based on term changes, you have to index the field_tags field (or whatever other term reference field(s) you have). Then, check it under "Entity reference updates", maybe select that it's of type "Taxonomy term" (although that should be possible to determine automatically) and then every time a taxonomy term is changed you just search the index with condition('field_tags', TERM_ID) and re-index all results.
Otherwise, you have the problem of matching the changed taxonomy term to the items that actually contain it – an entity field query might work for that, too, though, haven't really used those in D7.

bucefal91’s picture

Version: 7.x-1.x-dev » 8.x-1.x-dev
FileSize
16.4 KB

Hello guys!

We also have plenty of 'foreign' entity values participating in the search index, i.e. exactly what this issue encompasses. I would like to give it a shot with implementing at least the most basic general solution to this problem.

We run the 8.x-1.x version, so I am going to switch the version of this issue. I'd like to begin with a failing test (see it attached). It exposes the issue itself in automated/testable way. As you can see, I am targeting only 1 level of nesting and I hope the system should be smart enough to mark for re-index only if the referenced entity did effectively change one of the indexed fields (and should not mark for re-index if only non-indexed fields got changed).

I'll see if I manage to write a patch now that makes this test pass :)

bucefal91’s picture

Status: Active » Needs review
FileSize
26.28 KB

Let's see how it comes.. Here's the patch that includes the test from above but now it also amplifies the search_api.module so it actually fulfills the behavior the test above expects from the module.

I am not going to include the interdiff because it's essentially same test with/without the application code.

I'd like to say a few notes about the implementation. First of all, whoever is the author of FieldsHelper service -- I owe a biiiig thank you!! The task came so easy since I was able to outsource all the low-level analysis of data types and relationships/nesting into that service.
Also, I chose to add it into the TrackerPluginBase. From one hand, this kind of behavior is probably any tracker should be doing (in case some classes inherit from the Basic (or even the base class itself)), so at first I felt tempted to make it part of the TrackerInterface. But on the other hand, I couldn't say it's actually a public interface of a tracker. It feels more like an implementation detail, not part of a public interface. So eventually I settled this dilemma by making it part of the base class (very likely all trackers will have this behavior while still not going as far as making it public interface).

The code came out pretty clean. I really was afraid the outcome will be much uglier. It reads pretty logical. I believe introducing support for more than 1 level of nesting also can be achieved within a reasonable effort. But I stopped here with just 1 level of nesting because I was just afraid of getting myself into a rabbit hole. And I sure would love to get some feedback from the rest of the community before I take it even further.

The best part though -- It really works!! :) On our Drupal instance this piece of code actually even picked up some relationship (and thus necessity to re-index in a spot where I myself wasn't even aware.. I noticed it does pick it while debugging in xdebug).

I ran full set of tests on localhost (except for Javascript). There was only 1 failure due to a module my installation was missing. So I am willing to throw in this patch and I am full of hope it's gonna come out with clean green light.

bucefal91’s picture

Ok :) This one easy.. I just forgot to add the @group to the test class.

bucefal91’s picture

Alright, so here comes the first 'real world' feedback from our instance of using this patch. I made a minor error - I should have accounted for the fact that a field that is part of the search api index belongs to a different bundle than the one that just got updated. Attaching a revision to my last patch where it's taken care of.

drunken monkey’s picture

Wow, great job! Thanks a lot for your work! That would really be a great problem to finally fix, or at least alleviate, and your patch looks like a great step in that direction.

My main concern with the patch in its current form is performance: This new code is gonna take a significant amount of time to run, especially for entities unrelated to the index, and it’s going to run every single time any entity is updated or deleted. For most smaller sites, this will probably be a reasonable trade-off (unless they don’t index any related entities’ properties, in which they wouldn’t benefit at all), but for larger sites it would often be an unreasonable burden on performance. So, at the very least, this functionality should be optional. (Not sure what the default would be, though.)

As almost all of the computations can actually be made beforehand, though, a better solution would probably be to compile in advance (for instance, when the indexed fields change, or maybe also for Field API field config changes) a list detailing changes to which fields of which entity types will lead to the need for re-indexing. That way, at entity CRUD time the check would (hopefully) be much faster – especially for entities that won’t trigger any re-indexing.
Don’t know if you have the time/energy left to try your hand at that as well?

Some more high-level comments (with the sheer size of the patch, there’s of course also lots of coding standards problems, typos, etc.):

+++ b/src/Tracker/TrackerPluginBase.php
@@ -33,7 +41,187 @@ use Drupal\search_api\Plugin\IndexPluginBase;
+                // We've already discovered this entity is affecting our index.
+                // It is pointless to continue examining the rest of the search
+                // index fields as we've already confirm we have to mark the
+                // search items that point to this entity for re-indexing.
+                return;

In theory, this doesn’t have to be the case, if the same entity is referenced via multiple different (Field API) fields.
However, I can’t really think of a common scenario where that would be the case, so it’s probably fine to leave it like that. (This will probably never work for all scenarios anyways, so no need to make it complicated. On the other hand, with the pre-computation as suggested above, this would probably be fixed as well, as a side-effect.)

+++ b/src/Tracker/TrackerPluginBase.php
@@ -33,7 +41,187 @@ use Drupal\search_api\Plugin\IndexPluginBase;
+                $query = $this->entityTypeManager->getStorage($datasource->getEntityTypeId())->getQuery();
+                $query->accessCheck(FALSE);
+                $query->condition($property_path_to_entity, $entity->id());
+
+                $ids_to_reindex = [];
+                $languages = array_keys($this->languageManager->getLanguages());
+
+                foreach (array_values($query->execute()) as $id_to_reindex) {
+                  foreach ($languages as $language) {
+                    $ids_to_reindex[] = "$id_to_reindex:$language";
+                  }
+                }

All of this is specific to content entity datasources and won’t work for (most) others.
The ID generation (which is even specific to our default entity:* datasources) could probably be adapted to work for other datasource plugins as well (via DatasourceInterface::getItemId()) – however, I don’t really have a solution for the entity query. If the datasource doesn’t contain entities, we can’t really find out which items reference this entity (unless the basic reference field is also indexed, in which case we could use a search query, I guess).

So, maybe this would better live in the datasource plugin? We’d move as much as possible to the DatasourcePluginBase class, and implement the part that depends on internals in our default ContentEntity plugin. Other datasource plugins would then be free to implement this, too. (If we commit it like this, without pre-computation and caching, we would just have to make sure we’d make it easy to add that later without breaking any added API.)

Anyways, thanks a lot again for this!

bucefal91’s picture

Hello! :)

Yep, your suggestions make perfect sense. I also felt like doing unnecessary expensive stuff upon each entity change (traversing the index fields looking for any indication on whether any search items need to be rebuilt). Pre-computing it is definitely way better.

I am also much in agreement with your suggestion to split this code between Tracker and the Datasource. So Tracker does most of the abstract logic while data source deals with entity-specific details and thus other data sources at least have some abstract ability to "hook" into this thing and provide their alternative implementation for some stuff related to the inner nature of data that is indexed.

Just a few technical/implementation level questions: where do you prefer to store the "pre-computed" analysis of the search index fields? In \Drupal::state() or somewhere within the index config entity? Also, to be able to determine whenever that 'pre-computed' stuff needs to be recomputed again.. Maybe it's possible to leverage Cache API (cache-tags) in some way? Like if your index is "my_index" and it includes "field_entity_reference" and "field_taxonomies" as part of its fields, then we could listen for invalidation of any of search_api_index:my_index, field_config:field_entity_reference, field_config:field_taxonomies cache-tags and re-compute our bits at that time.
Hm... having that said... Maybe this thing could rather be stored into cache instead of State API? :) Then we'd get the support for smart cache invalidation right out of the box!

Also, thank you for giving such warm welcome to this patch. I understand it affects relatively deeply the tracking layer, so it's a joy to see overall you are in agreement with the direction that I took in the patch. During the work days I am pretty busy at the day job and hardly I can sell "polishing" this patch in the middle of severe battle we are having here with our Drupal 8 instance. But I could give it a shot on the weekend. I'll try to re-distribute this logic better among Tracker and Datasource. I'll also do the pre-computation and will store it as cache (unless you oppose to this idea and/or propose something else). And I'll make sure this code follows coding standards.

bucefal91’s picture

Hello, drunken monkey

Here I am with a new iteration on this patch. Conceptually it is the exactly the same thing, it was just rather restructured/broken down to more logical & manageable pieces. It was pretty clumsy all stuffed into one single method. I also executed on your suggestions. Additionally, I tried to utilize as much of pre-existing search_api functionality/helper methods as I could find (in order not to re-invent the wheel and to maintain the principle of single responsibility).

I've placed a bunch of TODO items. Of course, by no means they all should be committed but I rather just wanted to highlight the pieces I was unsure about the most (and in case you have any feedback/opinion on those, please do let me know). The only TODO I really address to myself and hope to fix any day soon is this one:

      // TODO: You need somehow to figure how to convert cache-contexts into
      // cache tags since potentially you're losing some cacheability here by
      // ignoring them.

Some specific items I took care of:

  • Caching the map of entity relationships -- exactly as you've suggested. Now this stuff gets computed once and gets stored into Cache API (utilizing the cache-tags and the whole "cacheability" notion).
  • While implementing the above, I eventually took away the limitation of exploring just 1 level of depth of entity relationships. Now this patch can go as deep as your index's fields dictate it to be. Here on localhost I've seen it work on the 2nd level of relationships, i.e. smth like field_first_level_entity_reference:entity:field_second_level_entity_reference:entity:field_some_field_that_is_in_the_index
  • With your suggestion that some of this logic was basically "hardwired" to the "content entity" datasource plugin, I've tried my best to split the functionality into 2 pieces: to keep something on the shoulders of the tracker while delegating all the low-level specifics to the datasource plugin. I think it came out pretty nicely. So I introduced an interface that any data source plugin may implement in case it wants to play this game of tracking changes in foreign entities. And of course, I've implemented this interface in the "content entity" datasource.

I hope you still like where this thing is moving :) I haven't cleaned up the coding standards yet since I still have plenty of TODO items but I sure can take care of that once we come to the idea the patch is alright as far as functionality/performance goes.

I am looking forward to your feedback!

drunken monkey’s picture

Status: Needs review » Needs work

Thanks a lot again for the great work!

The location of the code now makes more sense, thanks! However, I’m still not completely convinced. In my mind, determining changed items isn’t the responsibility of the tracker, so I think none of the code should live there. However, it is true that a lot of the code could be reused for other datasources, so putting it all into the ContentEntity datasource plugin also doesn’t make sense.

However, one point to note here is that (at least currently) the search_api_entity_*() hook implementations are (as the doc comments also note) not part of the module’s framework/API, but specific to the ContentEntity datasources.
If the additional code added here will work for other datasources as well, that’s great, but we’d have to note that somehow, and ideally keep those two code blocks separate.

Maybe moving the actual bodies of those hooks to a new service would make sense? Then we could also move the code that’s now in the tracker there, and maybe separate the parts that are specific to ContentEntity datasources into helper methods or something.

Anyways, all this “where should the code live to make sense” is hard for you to guess, since it’s mostly just my personal taste. Therefore, probably better just leave it as-is for now and I’ll later do a refactoring of the patch before committing once you’re satisfied. So, please just tell me when the patch is “done” as far as you’re concerned (or you won’t have any more time to work on it in the near future) and then I’ll take over. (Time I did some work, too, instead of just nagging. ;))

Some more remarks on the patch:

  1. +++ b/src/Tracker/TrackerPluginBase.php
    @@ -33,7 +47,319 @@ use Drupal\search_api\Plugin\IndexPluginBase;
    +    // TODO: Is the "cache.default" a reasonable cache to utilize for our
    +    // purposes?
    +    $tracker->setCacheBackend($container->get('cache.default'));
    

    Probably, yes. Since we’ll only have a handful of entries, a separate cache table would be overkill, I’d say.

  2. +++ b/src/Tracker/TrackerPluginBase.php
    @@ -33,7 +47,319 @@ use Drupal\search_api\Plugin\IndexPluginBase;
    +    $cid = $this->getIndex()->id() . ':foreign_entities_relations_map';
    

    If we use a general-purpose cache bin, we should prefix the key with search_api:, though.

  3. +++ b/src/Tracker/TrackerPluginBase.php
    @@ -33,7 +47,319 @@ use Drupal\search_api\Plugin\IndexPluginBase;
    +   *   0: (string) Entity type $property refers to.
    +   *   1: (array) A list of bundles $property refers to. In case specific
    +   *     bundles cannot be determined or the $property points to all the bundles
    +   *     an empty array will be here.
    

    Please use string keys here, as they make the code easier to read.

  4.       // TODO: You need somehow to figure how to convert cache-contexts into
          // cache tags since potentially you're losing some cacheability here by
          // ignoring them.
    

    I think you misunderstand what cache contexts are for. They are used in some parts of the system (Render API mostly, I think) to determine the cache key ($cid) to be used for an item. So, they are for cache variation, not invalidation. (Unless I’m very mistaken – which can always happen, when caching is concerned.)
    Since I’m pretty sure we need just one cache entry per index, I think we can safely discard the cache contexts.

    However, I’m also actually not sure whether we need to include tags and max-age from the individual fields/properties. I don’t really think we use anything that could change. (I don’t think it’s possible to change the target entity type of a field. And while the target bundles might change, it seems to me the code would also work without them – we’d just run an additional entity query in some cases that could be otherwise avoided. But on the other hand we’d avoid invalidating our cache in a lot of cases – so that might be worth it.)

  5. +++ b/src/Tracker/TrackerPluginBase.php
    @@ -33,7 +47,319 @@ use Drupal\search_api\Plugin\IndexPluginBase;
    +              // Unfortunately this glue of ":" is hardcoded in
    +              // Utility::splitPropertyPath(), ideally we'd like to be some sort
    +              // of constant instead.
    +              // TODO: Maybe we could introduce a helper method into the Utility
    +              // class? Smth like the reverse of Utility::splitPropertyPath(),
    +              // i.e. Utility::gluePropertyPath($seen_path_chunks) or smth along
    +              // these lines.
    

    Yes, I was also surprised to find that this is just hard-coded. We do have a constant for the datasource separator in the item ID, so why not for this?
    Maybe we should just add such a constant. I don’t think we glue property paths back together often enough, though, to warrant a helper method. (I couldn’t, with a quick search, find a single existing instance.)

  6. +++ b/src/Tracker/TrackerPluginBase.php
    @@ -33,7 +47,319 @@ use Drupal\search_api\Plugin\IndexPluginBase;
    +      // TODO: Wouldn't it be nice to let others alter this map so any custom
    +      // logic may be injected into the map too?
    +      // \Drupal::moduleHandler->alterThisThing() :)
    

    Yes, I agree, that could indeed be useful.
    However, we now deprecated all our hooks and are using events instead, so this should use an event, too. (See \Drupal\search_api\Event\SearchApiEvents and #3023704: Convert hooks to events.)
    If you are unsure on events, you can leave that to me, too.

In any case, thanks a lot once more! This is really great work!

drunken monkey’s picture

bucefal91’s picture

Hello!

:) I've addressed all of your input (on top of your #36 patch) and removed the TODO items that you clarified. As a last piece, I've addressed multiple coding standards things (just within the code from this patch).

To be honest, I think yes, it'd be more efficient if you arrange/split the code further per your architecture/vision on the module. Just one additional thing I could suggest is to refactor some of it into a Trait so other trackers/datasources selectively can use the trait instead of 'forcing' this code into the TrackerPluginBase class, for example :) But it's all up to you.
I really enjoy the layout of Search API architecture, very flexible, very concise. Even the fact eventually this patch (that encompasses a pretty big chunk of functionality) took such reasonable (small) amount of code is a strong indicator of a successful design, in my opinion. Without the correct design it would be not flexible enough to easily accommodate this new feature or I wouldn't be able to reuse so much of the existing code. I am quite grateful for having Search API in Drupal, so thank you for maintaining it and bringing the module to such success! =)

Let me just answer a few points from your input to get my point across:

I think you misunderstand what cache contexts are for. They are used in some parts of the system (Render API mostly, I think) to determine the cache key ($cid) to be used for an item. So, they are for cache variation, not invalidation.

Yep, at the moment when I left that TODO item I still wasn't clear how to correctly take into consideration the cache contexts. Today I looked up how Renderer service does it (RenderCache class, to be specific), and eventually they "flat" cache contexts out as additional piece to the $cid (thus your cache effectively varies by cache contexts). So I've implemented this same approach onto our case here. I do not insist on committing it though (I'll cover on it more in the next paragraph).

However, I’m also actually not sure whether we need to include tags and max-age from the individual fields/properties. I don’t really think we use anything that could change.

Yes, especially from practical standpoint your vision makes a lot of sense. My motivation to include it was to "play fair" -- output of this method actively depends on the field config, so let's actively depend on the field config in our cacheability.
Also, about cache max-age and cache contexts... Giving a proper treat to max-age is pretty inoffensive to performance. Anyway in practice max-age is "permanent" for the cacheability of any field config. As for cache contexts, you make valid concern that it will decrease a bit the cache hit ratio. Yet, especially with the addition of the "mapping foreign entities" event, not propagating max-age or cache-contexts into the cache could be a two-sided sword: what if someone's custom logic does depend per language (or expires after certain time), they would report that into the cacheability object, but the search API module would then simply ignore that stuff. Of course, a tiny private patch would solve the problem, but talking theoretically, it's no longer entirely in the hands of Search API module since the appearance of event, so I'd propose to keep max-age and cache-contexts. But it's not a hard requirement, just me trying to think in abstract way :)

Thank you for reverting so quickly on these patches. Sometimes digging through someone else's code is not so much fun :) I guess at this point I do deem the patch as ready and basically invite you to take over from here.

bucefal91’s picture

Status: Needs work » Needs review

.. putting the right issue status ...

bucefal91’s picture

Fixing up the failing test.

Also, while fixing it, I noticed the process of casting cache contexts to cache keys carries some cacheability along with it (you can have a look at the CacheContextsManager::convertTokensToKeys() for extra info). So I made it part of our cacheability too.

Attaching interdiff and the lumpsum patch. :)

s_leu’s picture

Status: Needs review » Needs work

The latest patch breaks the node form for me. I have an index on group_content that also indexes fields of the nodes related in the group_content entities. After saving a node via the form i get this:

Drupal\Core\Entity\EntityStorageException: 'entity_id' not found in Drupal\Core\Entity\Sql\SqlContentEntityStorage->save() (line 846 of core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php).
Drupal\Core\Entity\Query\Sql\Tables->addField('entity_id', 'INNER', NULL) (Line: 52)
Drupal\Core\Entity\Query\Sql\Condition->compile(Object) (Line: 172)
Drupal\Core\Entity\Query\Sql\Query->compile() (Line: 80)
Drupal\Core\Entity\Query\Sql\Query->execute() (Line: 1051)
Drupal\search_api\Plugin\search_api\datasource\ContentEntity->getItemIdsAffectedByReferencedEntityChange(Object, Object, Array) (Line: 229)
Drupal\search_api\Tracker\TrackerPluginBase->trackReferencedEntityCrud(Object, Object) (Line: 124)
Drupal\search_api\Tracker\TrackerPluginBase->entityChanged(Object) (Line: 260)
search_api_entity_update(Object)
call_user_func_array('search_api_entity_update', Array) (Line: 403)
Drupal\Core\Extension\ModuleHandler->invokeAll('entity_update', Array) (Line: 206)
Drupal\Core\Entity\EntityStorageBase->invokeHook('update', Object) (Line: 843)
Drupal\Core\Entity\ContentEntityStorageBase->invokeHook('update', Object) (Line: 535)
Drupal\Core\Entity\EntityStorageBase->doPostSave(Object, 1) (Line: 728)
Drupal\Core\Entity\ContentEntityStorageBase->doPostSave(Object, 1) (Line: 460)
Drupal\Core\Entity\EntityStorageBase->save(Object) (Line: 837)
Drupal\Core\Entity\Sql\SqlContentEntityStorage->save(Object) (Line: 395)
Drupal\Core\Entity\EntityBase->save() (Line: 294)
Drupal\node\NodeForm->save(Array, Object)
call_user_func_array(Array, Array) (Line: 112)
Drupal\Core\Form\FormSubmitter->executeSubmitHandlers(Array, Object) (Line: 52)
Drupal\Core\Form\FormSubmitter->doSubmitForm(Array, Object) (Line: 591)
Drupal\Core\Form\FormBuilder->processForm('node_news_edit_form', Array, Object) (Line: 320)
Drupal\Core\Form\FormBuilder->buildForm(Object, Object) (Line: 91)
Drupal\Core\Controller\FormController->getContentResult(Object, Object)
call_user_func_array(Array, Array) (Line: 123)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}() (Line: 573)
Drupal\Core\Render\Renderer->executeInRenderContext(Object, Object) (Line: 124)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->wrapControllerExecutionInRenderContext(Array, Array) (Line: 97)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}() (Line: 151)
Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object, 1) (Line: 68)
Symfony\Component\HttpKernel\HttpKernel->handle(Object, 1, 1) (Line: 67)
Drupal\simple_oauth\HttpMiddleware\BasicAuthSwap->handle(Object, 1, 1) (Line: 57)
Drupal\Core\StackMiddleware\Session->handle(Object, 1, 1) (Line: 47)
Drupal\Core\StackMiddleware\KernelPreHandle->handle(Object, 1, 1) (Line: 106)
Drupal\page_cache\StackMiddleware\PageCache->pass(Object, 1, 1) (Line: 85)
Drupal\page_cache\StackMiddleware\PageCache->handle(Object, 1, 1) (Line: 49)
Asm89\Stack\Cors->handle(Object, 1, 1) (Line: 47)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle(Object, 1, 1) (Line: 52)
Drupal\Core\StackMiddleware\NegotiationMiddleware->handle(Object, 1, 1) (Line: 23)
Stack\StackedHttpKernel->handle(Object, 1, 1) (Line: 694)
Drupal\Core\DrupalKernel->handle(Object) (Line: 19)
bucefal91’s picture

Unfortunately the error message is not enough to be able to pinpoint the problem in your case. I've reviewed the code trying to find some error in logic comparing to the information contained in your error message, but I couldn't identify anything.

I think we need to collect more information about your set up to be able to help. It seems your layout of fields has something in particular that I did not account for in my patch.
First of all, could you show what fields are part of your search index?

Also, please try running the patch I attach to this comment. It's the only clue I can think of currently. I've commented out a small piece of code that might be causing issue in your set up. So I wanna see if without it it runs, then I have some indirect support that my guess was right.

bucefal91’s picture

Status: Needs work » Needs review
FileSize
44.41 KB
718 bytes

Finally this patch got some internal review here at our company, Ocelot. The only thing that was picked up is a minor misspelling in my English in a comment. Attaching a fix for it.

Since we haven't heard yet from s_leu and I have no further clues on what's wrong in that Drupal installation. I guess I could bump it into "needs review" once again. s_leu, if you do come back with additional insights on your Drupal, I'll gladly review them.

drunken monkey’s picture

Once again, thank you so much for your great work!
And great to hear that you found the architecture nice to work with – that’s of course always the goal.

I finally got time again to work on this myself, and due to my specific ideas about what should go where, how things should be formatted, etc., I fear there is a pretty huge amount of changes to that patch and the interdiff probably almost unusable. Sorry about that!

Anyways, almost all of the code itself looked pretty solid to me, so very few functional changes. (Though I still ended up breaking things and having to debug. Well, serves me well for being so pedantic. ;))
I fear that I don’t fully understand the code for gathering the “entity relationships map”, which might proof tricky going forward (i.e., maintaining this code in the future), but as there is solid test coverage I think it should be fine. It’s also better than for some of the code I myself wrote, and I have managed there, too.

One thing I’m not sure about is whether this covers all the different cases for entity changes that could affect the index:

  1. Indexing field_tags:entity:name and changing a tag’s name: this is obviously covered already, as the test shows.
  2. However, if the tags are on the user entity, you index uid:entity:field_tags:entity:name and then you change the tags for a user: does this also lead to re-indexing? (So, are all “stations in-between” also covered in the property path?)
  3. For reacting to entity deletions: can you confirm that deleting a referenced entity doesn’t already trigger an entity update for the referencing entity? Seems to me like Core should already take care of that, in order to prevent the whole site’s data becoming corrupt (i.e., dangling references).
    I do know that this doesn’t happen in D7, though, so it’s probably the case for D8 as well and another Core bug we have to work around. Just want to make sure we’re not doing more work than necessary.

If all these cases are covered, this should probably also be reflected in the test, to make sure they stay working. Otherwise, we might want to amend that (and then, of course, also add tests) – but maybe in a follow-up, if this would be too much work otherwise.

Speaking of which, I’m not completely convinced of the terminology we’re using: “foreign entities”, “entity relations”, etc. Can we brainstorm different names? E.g., how about we rename getForeignEntityRelationsMap() to getIndirectlyIndexedPropertiesMap() and change the rest of the terminology accordingly? It’s really about the “foreign” entities’ properties being indexed, not the entities themselves. (Or, sometimes the entities themselves, too, I guess.)
Not at all sure, though, what’s best here. All the concepts are a bit tricky to convey, and if you only have the method name to describe it (so it can’t be, like, ten words) it gets really tough to keep it both correct and concise (enough).

Also, one bug that I did find was this:

  1. +++ b/src/Tracker/TrackerPluginBase.php
    @@ -33,7 +52,340 @@ use Drupal\search_api\Plugin\IndexPluginBase;
    +    $cid = 'search_api:' . $this->getIndex()->id() . ':foreign_entities_relations_map';
    +
    +    $cache = $this->cache->get($cid);
    
    +++ b/src/Tracker/TrackerPluginBase.php
    @@ -33,7 +52,340 @@ use Drupal\search_api\Plugin\IndexPluginBase;
    +      $this->cache->set("$cid:" . implode(':', $cache_keys), $data, $cacheability->getCacheMaxAge(), $cacheability->getCacheTags());
    

    OK, it took me a bit to realize this myself, but this can never work: you are always reading cache with a different $cid than writing it, so you will never get a cache hit. If you’d want to use cache contexts, you’d basically need to do a preliminary first pass (every time) just to collect the contexts, and then only generate the data (if necessary) in a second pass. I really don’t think this will be worthwhile here, so I fear we have to abandon cache contexts – even though your reasoning does make sense that this information might depend on other factors.
    However, I don’t think any of this is set in stone. If people complain that they have this use case, we can still think of something. (I do think there would be a couple of options.)

    I totally agree regarding max-age, in any case.

Any other open questions that I forgot to address?

I agree, more information from s_leu would be needed in order to properly investigate the problem they’re experiencing. Simply catching the exception (which would, of course, be much easier if Core correctly documented the exceptions that can be thrown) will at least help prevent this change from breaking the whole site as described, but it would of course be preferrable to fix the problem at the route, to have this new functionality working as well as possible.
The actually executed entity query when the exception occurs would also be very helpful.

In any case, it’s great if as many people as possible actually test this! This is as complex change, so we want to make sure it’s working properly in as many cases as possible, and doesn‘t break anyone’s site.

drunken monkey’s picture

(Oh, I guess this kinda depends on #3044782: Increase PHP version requirement to 7.0, as it already uses some PHP 7 syntax. But probably also no big deal if we jump the gun on that by a week or so.)

drunken monkey’s picture

Does anyone want to give further feedback on this?
This is a much-requested but very complex and tricky feature. The more testers there are, the more solid the end result will be.
(Also, we might still want to change those method names?)

marassa’s picture

Just stumbled upon a similar problem - adding a tag to a Picture node properly reindexes the Picture per se but not the Artefact the Picture refers to. The patch applied to 8.x-1.15 just fine but it did not solve the problem. Only then I realized that I am using Reverse references in this index - it's the updated Picture that refers to the Artefact that should be reindexed and not vice versa.
I tried the patch the other way around (changed the name of an Artefact) and it worked as advertised: both the Artefact and the Pictures pointing to it were reindexed.
So the patch seems to support "direct" references but not the reverse ones...
Also, after updating the Artefact I found the following bizarre message in my dblog:
Could not load the following items on index Search pictures: "entity:node/0", "entity:node/1", "entity:node/2", "entity:node/3", "entity:node/4", "entity:node/5", "entity:node/6", "entity:node/7", "entity:node/8", "entity:node/9", "entity:node/10", "entity:node/11", "entity:node/12", "entity:node/13", "entity:node/14", "entity:node/15", "entity:node/16", "entity:node/17", "entity:node/18", "entity:node/19", "entity:node/20", "entity:node/21", "entity:node/22", "entity:node/23", "entity:node/24", "entity:node/25", "entity:node/26", "entity:node/27", "entity:node/28", "entity:node/29", "entity:node/30", "entity:node/31", "entity:node/32", "entity:node/33", "entity:node/34", "entity:node/35", "entity:node/36", "entity:node/37", "entity:node/38", "entity:node/39", "entity:node/40", "entity:node/41", "entity:node/42", "entity:node/43", "entity:node/44", "entity:node/45".
As you can see it counts from 0 to 45. Why 46 items? It's the exact number of Pictures referencing the Artefact I updated. As I already said, the Pictures were reindexed properly with the new Artefact name, in spite of the error in the log.
Hope this helps...

drunken monkey’s picture

Great, thanks a lot for testing! With this, I was able to spot a bug in the code: apparently we re-use $ids_to_reindex in getAffectedItemsForEntityChange(), which would have led to various other bugs as well.
Should be fixed with the attached revision, please test again!

Also, again, other testers/reviewers would be very welcome! If this isn’t tested thoroughly, on various setups, this is bound to break something!

(Incidentally, this probably means we should have even more test coverage, too. For a $foreign_entity_relationship_map with multiple applicable entries, for example (as that would previously have not worked properly, I think, due to use overwriting $ids_to_reindex) and maybe with a mock logger.channel.search_api logger set that will throw exceptions when used (for anything above INFO, at least, if this would otherwise lead to incorrect fails). That would also have caught the bug.
I fear I’m totally swamped, though, and unable to add those myself, currently.

drunken monkey’s picture

marassa’s picture

Just tested, the errors in the log are gone but can't see any change in behavior related to [not] reindexing reverse references. Am I missing something?

drunken monkey’s picture

Just tested, the errors in the log are gone but can't see any change in behavior related to [not] reindexing reverse references. Am I missing something?

Ah, no, sorry. That is expected, supporting that would be a different problem, I guess. Since we don’t have any (framework-level) data on those references, it would have to be code specifically geared to the ReverseEntityReferences processor. Maybe getForeignEntityRelationsMap() could offer an extension point for enabled processors? Would be a rather complex setup, though, probably with a dedicated interface for processors adding further relationships, that ReverseEntityReferences would then need to implement.
Certainly doable, but maybe out of scope for a first version of this functionality.

marassa’s picture

Quite frankly, I forgot that ReverseEntityReferences is a separate processor within the module - the fields supplied by it are so seamlessly managed in "Manage fields for search index" that I did not realize that they are not "in the framework" in a similar way to "normal" references (admittedly did not dig deep enough into the module internals to verify). I will try to dig into it when I have time and maybe come up with something. I believe that as long as the module supports both "direct" and "reverse" references, both should be (ultimately) covered by this auto-reindexing functionality.

lukasss’s picture

#48 not working for me

drunken monkey’s picture

@ lukasss: Any chance you could be a tad more specific?

@ everyone: Come on, 24 followers and no-one willing to give this a try? If this works properly, it could be a huge step towards making Search API Just Work™. But it’s really pretty complex, so thorough testing is vital before committing this!

drunken monkey’s picture

Status: Needs review » Fixed

OK, then I’ll just commit this and hope for the best (or, rather, wait for people to complain).

Many, many thanks once again, bucefal91!

penyaskito’s picture

+++ b/tests/search_api_test_example_content_references/config/install/node.type.parent.yml
--- /dev/null
+++ b/tests/search_api_test_example_content_references/search_api_test_example_content_references.info.yml

+++ b/tests/search_api_test_example_content_references/search_api_test_example_content_references.info.yml
@@ -0,0 +1,8 @@
+core: 8.x

This broke the tests for 9.x. See https://www.drupal.org/pift-ci-job/1785819

3CWebDev’s picture

Tested patch #46 and it resolved the issue that was driving me crazy for a bit. Now the search index is updated when a commerce product's referenced term is updated. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

sprocketman’s picture

Unfortunately, I think the patch breaks some media library functionality. I've tried both the patch directly on the stable version of search_api and the dev version and both result in the same behavior. This was tested on Drupa 8.9.6.

Scenario:

  1. An entity contains a media reference field that utilizes the Media Library widget
  2. A user opens the media library by clicking on the Add Media button in the field.
  3. Rather than adding an existing media item, the user attempts to upload a new media item.
  4. The file appears to upload, but the media library never updates to allow the user to add the file as a media item reference.

The error generated in the logs is as follows:

Error: Call to a member function getColumns() on bool in Drupal\Core\Entity\Query\Sql\Tables->addField() (line 253 of /var/www/html/public/core/lib/Drupal/Core/Entity/Query/Sql/Tables.php)
#0 /var/www/html/public/core/lib/Drupal/Core/Entity/Query/Sql/Condition.php(52): Drupal\Core\Entity\Query\Sql\Tables->addField('field_my_table...', 'INNER', NULL)
#1 /var/www/html/public/core/lib/Drupal/Core/Entity/Query/Sql/Query.php(172): Drupal\Core\Entity\Query\Sql\Condition->compile(Object(Drupal\Core\Database\Driver\mysql\Select))
#2 /var/www/html/public/core/lib/Drupal/Core/Entity/Query/Sql/Query.php(80): Drupal\Core\Entity\Query\Sql\Query->compile()
#3 /var/www/html/public/modules/contrib/search_api/src/Plugin/search_api/datasource/ContentEntity.php(1074): Drupal\Core\Entity\Query\Sql\Query->execute()
#4 /var/www/html/public/modules/contrib/search_api/src/Utility/TrackingHelper.php(123): Drupal\search_api\Plugin\search_api\datasource\ContentEntity->getAffectedItemsForEntityChange(Object(Drupal\file\Entity\File), Array, Object(Drupal\file\Entity\File))
#5 /var/www/html/public/modules/contrib/search_api/search_api.module(230): Drupal\search_api\Utility\TrackingHelper->trackReferencedEntityUpdate(Object(Drupal\file\Entity\File))
#6 [internal function]: search_api_entity_update(Object(Drupal\file\Entity\File))
#7 /var/www/html/public/core/lib/Drupal/Core/Extension/ModuleHandler.php(403): call_user_func_array('search_api_enti...', Array)
#8 /var/www/html/public/core/lib/Drupal/Core/Entity/EntityStorageBase.php(206): Drupal\Core\Extension\ModuleHandler->invokeAll('entity_update', Array)
#9 /var/www/html/public/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php(843): Drupal\Core\Entity\EntityStorageBase->invokeHook('update', Object(Drupal\file\Entity\File))
#10 /var/www/html/public/core/lib/Drupal/Core/Entity/EntityStorageBase.php(535): Drupal\Core\Entity\ContentEntityStorageBase->invokeHook('update', Object(Drupal\file\Entity\File))
#11 /var/www/html/public/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php(728): Drupal\Core\Entity\EntityStorageBase->doPostSave(Object(Drupal\file\Entity\File), true)
#12 /var/www/html/public/core/lib/Drupal/Core/Entity/EntityStorageBase.php(460): Drupal\Core\Entity\ContentEntityStorageBase->doPostSave(Object(Drupal\file\Entity\File), true)
#13 /var/www/html/public/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php(837): Drupal\Core\Entity\EntityStorageBase->save(Object(Drupal\file\Entity\File))
#14 /var/www/html/public/core/lib/Drupal/Core/Entity/EntityBase.php(395): Drupal\Core\Entity\Sql\SqlContentEntityStorage->save(Object(Drupal\file\Entity\File))
#15 /var/www/html/public/core/modules/file/file.module(284): Drupal\Core\Entity\EntityBase->save()
#16 /var/www/html/public/core/modules/media_library/src/Form/FileUploadForm.php(322): file_move(Object(Drupal\file\Entity\File), 'public://images')
#17 /var/www/html/public/core/modules/media_library/src/Form/AddFormBase.php(505): Drupal\media_library\Form\FileUploadForm->createMediaFromValue(Object(Drupal\media\Entity\MediaType), Object(Drupal\media\MediaStorage), 'field_media_ima...', Object(Drupal\file\Entity\File))
#18 [internal function]: Drupal\media_library\Form\AddFormBase->Drupal\media_library\Form\{closure}(Object(Drupal\file\Entity\File))
#19 /var/www/html/public/core/modules/media_library/src/Form/AddFormBase.php(504): array_map(Object(Closure), Array)
#20 /var/www/html/public/core/modules/media_library/src/Form/FileUploadForm.php(305): Drupal\media_library\Form\AddFormBase->processInputValues(Array, Array, Object(Drupal\Core\Form\FormState))
#21 [internal function]: Drupal\media_library\Form\FileUploadForm->uploadButtonSubmit(Array, Object(Drupal\Core\Form\FormState))
#22 /var/www/html/public/core/lib/Drupal/Core/Form/FormSubmitter.php(114): call_user_func_array(Array, Array)
#23 /var/www/html/public/core/lib/Drupal/Core/Form/FormSubmitter.php(52): Drupal\Core\Form\FormSubmitter->executeSubmitHandlers(Array, Object(Drupal\Core\Form\FormState))
#24 /var/www/html/public/core/lib/Drupal/Core/Form/FormBuilder.php(593): Drupal\Core\Form\FormSubmitter->doSubmitForm(Array, Object(Drupal\Core\Form\FormState))
#25 /var/www/html/public/core/lib/Drupal/Core/Form/FormBuilder.php(321): Drupal\Core\Form\FormBuilder->processForm('media_library_a...', Array, Object(Drupal\Core\Form\FormState))
#26 /var/www/html/public/core/modules/media_library/src/MediaLibraryUiBuilder.php(313): Drupal\Core\Form\FormBuilder->buildForm('Drupal\\media_li...', Object(Drupal\Core\Form\FormState))
#27 /var/www/html/public/core/modules/media_library/src/MediaLibraryUiBuilder.php(164): Drupal\media_library\MediaLibraryUiBuilder->buildMediaTypeAddForm(Object(Drupal\media_library\MediaLibraryState))
#28 /var/www/html/public/core/modules/media_library/src/MediaLibraryUiBuilder.php(130): Drupal\media_library\MediaLibraryUiBuilder->buildLibraryContent(Object(Drupal\media_library\MediaLibraryState))
#29 [internal function]: Drupal\media_library\MediaLibraryUiBuilder->buildUi(Object(Drupal\media_library\MediaLibraryState))
#30 /var/www/html/public/core/lib/Drupal/Core/EventSubscriber/EarlyRenderingControllerWrapperSubscriber.php(123): call_user_func_array(Array, Array)
#31 /var/www/html/public/core/lib/Drupal/Core/Render/Renderer.php(573): Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}()
#32 /var/www/html/public/core/lib/Drupal/Core/EventSubscriber/EarlyRenderingControllerWrapperSubscriber.php(124): Drupal\Core\Render\Renderer->executeInRenderContext(Object(Drupal\Core\Render\RenderContext), Object(Closure))
#33 /var/www/html/public/core/lib/Drupal/Core/EventSubscriber/EarlyRenderingControllerWrapperSubscriber.php(97): Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->wrapControllerExecutionInRenderContext(Array, Array)
#34 /var/www/html/vendor/symfony/http-kernel/HttpKernel.php(151): Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}()
#35 /var/www/html/vendor/symfony/http-kernel/HttpKernel.php(68): Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object(Symfony\Component\HttpFoundation\Request), 1)
#36 /var/www/html/public/core/lib/Drupal/Core/StackMiddleware/Session.php(57): Symfony\Component\HttpKernel\HttpKernel->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#37 /var/www/html/public/core/lib/Drupal/Core/StackMiddleware/KernelPreHandle.php(47): Drupal\Core\StackMiddleware\Session->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#38 /var/www/html/public/core/modules/page_cache/src/StackMiddleware/PageCache.php(106): Drupal\Core\StackMiddleware\KernelPreHandle->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#39 /var/www/html/public/core/modules/page_cache/src/StackMiddleware/PageCache.php(85): Drupal\page_cache\StackMiddleware\PageCache->pass(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#40 /var/www/html/vendor/asm89/stack-cors/src/Asm89/Stack/Cors.php(49): Drupal\page_cache\StackMiddleware\PageCache->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#41 /var/www/html/public/core/lib/Drupal/Core/StackMiddleware/ReverseProxyMiddleware.php(47): Asm89\Stack\Cors->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#42 /var/www/html/public/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php(52): Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#43 /var/www/html/vendor/stack/builder/src/Stack/StackedHttpKernel.php(23): Drupal\Core\StackMiddleware\NegotiationMiddleware->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#44 /var/www/html/public/core/lib/Drupal/Core/DrupalKernel.php(708): Stack\StackedHttpKernel->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#45 /var/www/html/public/index.php(19): Drupal\Core\DrupalKernel->handle(Object(Symfony\Component\HttpFoundation\Request))
#46 {main}

Adding existing media items still works and users can still add new media within /admin/content/media, so there are work-arounds. Restoring the stable version of search_api fixes the issue, but of course leaves us back at not seeing updates in the index when referenced content is updated.

carolpettirossi’s picture

Hi guys,

Thanks for the hard work on this. It seems a great feature and it would save my day(weeks tbh) to get this working properly.

I'm testing the dev version of this module but it doesn't work 100% in my case.

I have a site using group/group content module and the different indexes for each group content bundle.

For example:

  • Employers index: indexes group content of Employer node
    • advertiser_name => entity_id:entity:field_advertiser_name
  • Opportunities index: indexes group content of Opportunity node
    • parent_employer_advertiser_name => entity_id:entity:field_parent_employer:entity:field_advertiser_name
      field_parent_employer can reference Employer nodes only.
  • Events index: indexes group content of Event node
    • organisation_alternative_name => entity_id:entity:field_organisation_name:entity:field_advertiser_name
      field_organisation_name can reference Employers or Institutions. But only employer content type has field_advertiser_name

When I update Advertiser name field, I would expect that all references would be reindexed. However, only the Employer index gets the update.

Initially I thought it could be related to field_advertiser_name being reused by Employer, Campus and Institution content types. So I added a field to the Opportunities index that it's only available and used by Employer content type.
parent_employer_org_statement => entity_id:entity:field_parent_employer:entity:field_org_statement

I edited an Employer node, adding text to the field_org_statement field and I get the same result. Opportunities documents are not updated/re-indexed.

I'm wondering if the scenarios above were considered to be covered and they are failing or if this is additional scope somehow.

Any feedback would be great :)

Thanks,
Carol

borutpiletic’s picture

I have also experienced Error: Call to a member function getColumns() on bool issues described in the #59 comment, but with referenced Paragraphs after the core upgrade to 8.9.7.
Issue came from the query in ContentEntity::getAffectedItemsForEntityChange where reference entity IDs for re-indexing are collected.
Patch from this issue fixed it: https://www.drupal.org/project/drupal/issues/2893747

maximpodorov’s picture

A switch is needed to add the possibility to turn this feature off as it can break existing sites (imaging loading 1 million nodes which reference a single term).
Ah, it has been done in https://www.drupal.org/node/3199906