Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
#2232253: Make the indexing batch process work introduced a default tracker which performs write operations during index for keeping track of indexed items. This should be optimized so that keeping track of indexed items has no heavy impact on large indexes.
Comments
Comment #1
drunken monkeyI think that's neither "in focus"/"sprint" nor "Major" – this can be done at any time, in beta, RC or even after the initial stable release. We can either improve the default tracker, or add a new, optimized one (if there is some trade-off involved), or even decide we are happy with this one and wait for people to complain (or tell them to put it into a different project).
In any case, getting the basic framework/API stable has priority now, everything not impacting that (or making testing/development harder in some way) can wait.
However, if you really want to work on it and can't find something else you'd want to do, you can of course also work on this. But what optimization exactly would you implement, and how (in the existing or in a new tracker)?
Comment #2
freblasty CreditAttribution: freblasty commentedComment #3
drunken monkeyAh, I see, so this is about Nick's complaint in #2232253-40: Make the indexing batch process work, right?
OK, then let's continue the discussion here.
Ah, OK, then I misunderstood you in #2232253-27: Make the indexing batch process work. I thought you had now agreed with me that using a pointer for keeping track of indexing status doesn't offer the same flexibility, clarity and fault-tolerance as our old approach.
I don't really want to rehash my arguments from there, but I really think our current implementation is more than "good enough" for now and the pointer solution causes more problems than it solves. Also, with the pluggable tracker system, anyone out there (including us in this module) can provide a new tracker implementation at any point. So if, out in the wild, companies really start to complain about the additional DB writes for this, we can still easily implement this. For now, I'd rather concentrate on the framework and API itself, getting Views integration, facets and (last, but not least) the service classes ready, etc.
Comment #4
Nick_vhThat's also why I agreed to have it as a follow up so we don't block the framework. I don't necessarily need to see the pointer in there but setting 0 to the changed column when there's an update just doesn't make sense in a system like this. It's a hack, and it works. But it's not the most correct way of doing things. The pointer also has it's flzws as people can change it unwillingly in their code and mess up the tracking system.
It's a good issue to think more about it and as we extend the tests we can think of changing the system to be better or add another tracker as you said.
Doing a pointer + having the priority queue was not working in combination. But there was nothing wrong or broken with the pointer system if all you have is a fifo approach.
Comment #5
drunken monkeyNone of this is really an argument, these are just axioms. Why shouldn't it be correct? The
changed
column reflects the time the item was changed since it was last indexed –0
meaning that it wasn't. (Though I'd have to admitNULL
would be cleaner here, don't know why we're not using it.)If you want cleaner, then we can re-introduce the
status
column, I'd be fine with that, too. Would have at least one benefit (getting all three counts (indexed, remaining, total) in one query) and certainly make things a bit cleaner and hopefully clearer.Yes, there was, as I discussed verbosely in #2232253-26: Make the indexing batch process work. None of these arguments (as far as I can see/remember) were targeted at a priority queue, they were all problems with just the normal functionality of a tracker we have defined.
Comment #6
freblasty CreditAttribution: freblasty commentedActually its the otherway around. As items should be processed according to the added order we use a timestamp to determine that order. Once an item is indexed its marked with
changed
value0
.Comment #7
drunken monkeyI think that's what I wrote. It's the same as in D7, right?
Comment #8
freblasty CreditAttribution: freblasty commentedIndeed it's the same as in D7.
Comment #9
drunken monkeyOK, then that's what I meant, too.
Another thing, maybe OK to discuss here, maybe better in a new issue: Do we really want to catch exceptions in the
track*()
methods? This code should only fail in case of database errors anyways, so the chance of this actually happening (without coding errors, which we should fix then ASAP of course) is pretty small.However, when it happens and there really was some action (e.g., a node got created) which succeeded, then we might want to roll back that initial action, too. Otherwise, we'd be in an invalid state, in which only a complete re-indexing would help (or manually looking through the logs for the action which failed to be tracked and adding it to the tracking table).
In general, I'd say that
SearchApiException
s should never leave the Search API ecosystem (the module itself or any module that explicitly uses it – e.g., for a search). However, catching\Exception
is something completely different, since we don't know what kind of error we would hide there. And since it's in any case not an error in our domain, I'm not sure we should be responsible for handling it.For now, in my #2230931: Supporting multiple item types per index code, I have just removed the boolean return values from those methods (as it's more or less impossible to react to those in a meaningful way anyways) but kept the
try
/catch
.What's everyone else's opinion about that?
Comment #10
Nick_vhI agree with keeping the exceptions in the core of search api only. People working with the APIs should not be expected to deal with the exceptions unless they work really low level.
Comment #11
Nick_vhWe discussed that it would be good to split up the changed column with a changed column and with a state column. The state would have just 0 and 1. 1 for needing to be indexed and 0 for the contrary.
This would make it easier to say, index everything from moment X again.
Comment #12
Nick_vhCommitted a change so the status column is now present.
Comment #13
drunken monkeyComment #14
drunken monkeyNot a bug.
Comment #15
legolasboThis issue has not seen activity in over 2,5 years. I am therefore closing this issue to clean up the issue queue. Feel free to re-open and update this issue if you feel this issue is still relevant and of importance.