Use a queue for node create/update indexing [#504012]

Comment	File	Size	Author
#68	search-node-504012-68.patch	3.96 KB	David_Rothstein
#60	six_months_later.patch	3.46 KB	catch
#25	504012-index-on-creation.patch	3.05 KB	damien tournoud
#23	search_index_content_5.patch	3.19 KB	catch
#19	search_index_content.patch	3.94 KB	catch
#16	search_index_content.patch	4.81 KB	catch
#15	search_index_content.patch	4.44 KB	catch
#13	search_index_content.patch	4 KB	catch
#3	search_index_content.patch	3.81 KB	catch
#2	search_index_content.patch	3.83 KB	catch
	search_index_content.patch	4.51 KB	catch

Comment #1

27 June 2009 at 16:50

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #2

catch

he/him

English

commented 27 June 2009 at 18:05

Status:

Needs work

» Needs review

Status	File	Size
new	search_index_content.patch	3.83 KB

search_index() had a hidden bug after body as field -looks like it's missing some test coverage (this patch at least runs the code through).

Log in or register to post comments

Comment #3

catch

he/him

English

commented 27 June 2009 at 18:13

Status	File	Size
new	search_index_content.patch	3.81 KB

webchick reviewed in irc.

1. remove dpm($node), whoops.

2. Convert the broken search query to dbtng while we're in there.

Otherwise unchanged.

Log in or register to post comments

Comment #4

dries commented 27 June 2009 at 18:39

This patch feels a bit like a one-off.

With this patch, some people might think that they no longer need cron, but the reality is that they still need cron to handle node updates, new comments being posted, taxonomy terms being updated, etc. We _really_ want people to setup cron for search to work correctly. I wonder if there are better ways to educate users by that.

Maybe we should have a very basic poorman's cron in core instead?

Log in or register to post comments

Comment #5

pwolanin commented 27 June 2009 at 18:44

Status:

Needs review

» Needs work

While this is in general a fine idea, this patch totally breaks the framework model of the search module, which basically says that search module itself knows nothing about what kind of data is being indexed by which modules.

However, was discussing the search module in D7 with webchick and she preferred to be able to have search module choose which module's have active search vs. splitting node and user search into separate modules. That makes it hard to delegate this operation to node module.

Log in or register to post comments

Comment #6

webchick

she/they

English

Vancouver 🇨🇦

commented 27 June 2009 at 18:47

@Dries: There's an issue for poormanscron in core at #331611: Add a poormanscron-like feature to core. There were some pretty compelling cases over there not to include it, including from Moshe, but these might not be insurmountable.

Log in or register to post comments

Comment #7

JacobSingh commented 27 June 2009 at 19:06

I'm on the fence about it.

Ideally, Drupal wouldn't need to run cron, and still function, but as Dries says, it is essential. One of the disadvantages of indexing as you save content is that it can introduce load to end users which is not needed. For instance, consider a digg type site which is getting 100 comments a minute. If that node indexes (with all of those comments) every time it is saved, it's going to hurt.

Plus, it puts too great a responsibility on the site to be stable. Since search indexing is asynch, if some part of saving (but not saving itself) fails before search runs, that node would presumably never make it in.

ON the flip side, cron time indexing is slow, and doesn't respond instantly to unpublishing, etc unless a nodeapi, or whatever api hook is hit.

Log in or register to post comments

Comment #8

catch

he/him

English

commented 27 June 2009 at 19:42

Status:

Needs work

» Needs review

@Dries, we already have a cron warning if it's not run after 24 hours. However, even with the poormanscron in core patch, search index would run every three hours. That's a long time for an admin to wait to try out search module.

Webchick's primary concern with enabling search by default in the other issue was that users would 1. install drupal 2. post content 3. search for content 4. post a support request in the forums when that didn't work. If cron runs every 3 hours, one hour, even 30 minutes, that's plenty of time to run into trouble minutes after installing Drupal. So this is about instant gratification in general, not about compensating for lack of cron runs - and the reason I restricted this only to new content.

If we want to add a search box to the 'find content' screen - then again, we need quicker turnaround times - Post my page, lose it, go to 'find content', search, empty result, big fail.

@pwolanin, it doesn't break any search API cleanliness that's not already broken - see http://api.drupal.org/api/function/search_nodeapi/6. That issue needs fixing much more centrally than this one, so I'm marking this back to CNW.

@JacobSingh - please read the patch. This is done on hook_node_insert(), not hook_node_update() - so it's not going to run every time a comment is saved, just the initial node creation so that /something/ is in the index when you immediately try to find that thing you just lost. There's also an option to turn it off which is recommended for high traffic sites.

Log in or register to post comments

Comment #9

gpk commented 27 June 2009 at 20:12

Status:

Needs review

» Needs work

Interesting, I've been experimenting with immediate search index updates on node submit, via a custom action: http://drupal.org/node/216508#comment-1724668.

A few comments:

- search indexing may require a 2nd pass (which generally happens the next time search_cron is invoked), because the bottom of http://api.drupal.org/api/function/search_index/7 may http://api.drupal.org/api/function/search_touch_node/7 various nodes. However, if per #9 the aim is merely to have *something* in the search index then maybe we can live with that. But is the fact that updates to the node (whether comments or directly to the node itself) don't get indexed immediately going to cause more confusion?

- this would likely cause problems with content permissions module since the node content in that case will depend on the current user; simple fix - turn this off. Poormanscron has the same problem, as has the run-cron link on the status report page #431776: Cron should run as anonymous when invoked via the run-cron link on the status report page.

- @2: are we saying that the redirect problem with links to uncacheable nodes has gone away? Or perhaps it was useless anyway since search would in any case take cron down when indexing a PHP node with a drupal_goto() in it?

- @4: as webchick notes, poormanscron in core #331611: Add a poormanscron-like feature to core has currently got the thumbs down. Granted there are technical and usability issues, but there also seem to be 2 perspectives in that debate - those who are thinking from the perspective of "Drupal must be great for big sites" but also the "Drupal must be great for small/blog sites" point of view. 2 different audiences, 2 different requirements. Question: does Drupal need to be great for both audiences? If so, can we manage to achieve it while keeping both audiences happy?

Log in or register to post comments

Comment #10

gpk commented 27 June 2009 at 19:54

Cross posted..

Log in or register to post comments

Comment #11

catch

he/him

English

commented 27 June 2009 at 20:00

On updates - we really can't index every node on update as well because that's going to get slow (especially with lots of comments, that takes ages) - by the time you're updating content, then searching for it, then you've probably been using Drupal slightly longer so should hopefully have found your way to search settings and/or setting up cron by then. I really think it's fine to pre-optimize this just for brand new content and leave the rest as is - it's balance between the immediate feedback and slowing sites down. And it's easy to add more options for this from contrib (I think there's already a Drupal 6 module which does this for new and updated content).

Log in or register to post comments

Comment #12

gpk commented 27 June 2009 at 20:07

@11: can easily distinguish between addition of comments and updating of the node itself... http://api.drupal.org/api/function/hook_comment_insert/7 vs. http://api.drupal.org/api/function/hook_node_update/7. The challenge here is I think to get round the initial 'big fail' without causing confusion for the user on day 2 (why do some things appear in the search index immediately but not others).

Log in or register to post comments

Comment #13

catch

he/him

English

commented 27 June 2009 at 20:17

Status:

Needs work

» Needs review

Status	File	Size
new	search_index_content.patch	4 KB

@gpk, hmm. That's a good point - I'm still worried about performance though. If I add a node, it will be indexed once with or without this patch. If I add a comment, 50 people could add a comment before the next cron run, resulting in 50 times as much work done by the server. It's also less likely that admins are posting comments to their own content and searching for the contents of the comments than it is for nodes, although I agree it's an extremely fine line.

Attached patch uses register_shutdown_function() for node_index_node() and search_update_totals() - this is in line with the logic in search_cron(), and I think it should partially allay Jacob's concerns about node saving getting interrupted.

Log in or register to post comments

Comment #14

gpk commented 27 June 2009 at 20:21

>50 people could add a comment before the next cron run
Sounds like a high traffic site to me ;)

Log in or register to post comments

Comment #15

catch

he/him

English

commented 27 June 2009 at 20:26

Status	File	Size
new	search_index_content.patch	4.44 KB

Agreed, added that to the patch.

Log in or register to post comments

Comment #16

catch

he/him

English

commented 27 June 2009 at 20:34

Status	File	Size
new	search_index_content.patch	4.81 KB

This time in a less broken way.

Log in or register to post comments

Comment #17

TheRec commented 28 June 2009 at 01:41

The patch does what is advertised so far. Tried to reindex multiple times, tested with and without the checkbox enabled and all works as expected. (On a side note... the sentence "100% of the site has been indexed. There are 0 items left to index." is descriptive, but in terms of usability, the user does not know how many items were indexed.)

Indexing content when it is created is a good idea, but this might give a false impression to users that their content is generally indexed on-the-fly and make them think that this will be done on update too, which is not the case (and, as stated before, "should not" or we would end up with huge performance issues). I think that the same people who follow the path described by catch ("1. install drupal 2. post content 3. search for content 4. post a support request in the forums when that didn't work") will just see their problem be delayed until they realize that their content is not updated in the index (because they did not run cron on a regular basis) when they modify it (or it gets modified with comments, etc.).

In all cases, the users who enable search module should be warned that if cron is not ran on a regular basis, their content will not be indexed... or if this patch gets committed that it will be indexed upon creation and not updated (unless cron is ran). We should also give a scope of what is indexed (content, fields, comments, etc.), because this is not clear either and somewhat leave the users in obscurity (unless they read the documentation).

Log in or register to post comments

Comment #18

2 July 2009 at 11:00

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #19

catch

he/him

English

commented 2 July 2009 at 11:38

Status:

Needs work

» Needs review

Status	File	Size
new	search_index_content.patch	3.94 KB

Re-rolled.

Log in or register to post comments

Comment #20

2 July 2009 at 11:55

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #21

Bojhan commented 17 July 2009 at 11:19

So, spend some time looking at the code - it looks like its outdated hence why tests are failing. Other then that, I see no UX reason why this shouldn't be done.

Log in or register to post comments

Comment #22

catch

he/him

English

commented 17 July 2009 at 16:30

Hmm, sadly not stale. I have no idea why the search ranking tests or upload are failing.

Log in or register to post comments

Comment #23

catch

he/him

English

commented 31 July 2009 at 20:55

Status:

Needs work

» Needs review

Status	File	Size
new	search_index_content_5.patch	3.19 KB

Scaling back to just nodes.

Log in or register to post comments

Comment #24

31 July 2009 at 21:25

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #25

damien tournoud commented 20 September 2009 at 14:40

Status:

Needs work

» Needs review

Status	File	Size
new	504012-index-on-creation.patch	3.05 KB

The previous patches failed because the node was indexed before all the hook_node_insert() implementation had the time to kick-in (especially the upload module one). Moved the indexation to a shutdown function instead.

Log in or register to post comments

Comment #26

pwolanin commented 20 September 2009 at 14:45

Seems like this is even less necessary with the poorman's cron in core?

Log in or register to post comments

Comment #27

catch

he/him

English

commented 20 September 2009 at 16:03

The idea here is that you can enter content, then immediately have it return in search results - mainly if we replace the filters at admin/content with a search box, poormanscron wouldn't help with that since it's every three hours rather than immediate. So it's two different things IMO.

Log in or register to post comments

Comment #28

Bojhan commented 20 September 2009 at 16:50

Wow, great - some traction on this topic! I think catch explains it clearly, from almost all of the observations done where the content administration page is flooded (many nodes) it becomes nearly impossible to filter correctly. I spoke about this big usability problem last year, I think its very fundamental to how we should approach filters.

Log in or register to post comments

Comment #29

pwolanin commented 24 September 2009 at 12:39

I'm a bit concerned that anyone trying to do a mass import will have their site crippled by this code - any way to index a max # of nodes in one page load this way?

Log in or register to post comments

Comment #30

catch

he/him

English

commented 24 September 2009 at 13:14

@pwolanin - there's an admin option to switch this off, I imagine sites doing mass imports wouldn't want this enabled anyway.

Log in or register to post comments

Comment #31

pwolanin commented 24 September 2009 at 23:25

@catch - wouldn't want is different than realizing it's an option. It's an edge case, but it would be nice to at least respect the same # per page load as per cron run.

Log in or register to post comments

Comment #32

Bojhan commented 25 September 2009 at 05:07

I think we need to understand that its not the small movement of beign able to search fo a specific item, but current interactions that we can increase tremendously. For example, menu would be able to have a search box. instead of having to remember the url for every menu item.

Log in or register to post comments

Comment #33

pwolanin commented 25 September 2009 at 11:23

@Bojan - what are you talking about and how does it relate to this patch?

Log in or register to post comments

Comment #34

Bojhan commented 25 September 2009 at 21:55

@pwolanin I am kind of getting dissapointed that so code to feature freeze we are getting stuck up on these issues, but basicly as I layed out. The current interaction that you need to know the node url to create a menu item is somewhat broken.

However I am not sure how this is all related to this issue, and why its beign held up. I dont see it beign an edge case that you are trying to find content in less then 3 hours after you have created it. But if node import is the problem, then lets fix it. I dont see any outcome of that hurting the UX at all.

Log in or register to post comments

Comment #35

webchick

she/they

English

Vancouver 🇨🇦

commented 29 September 2009 at 16:55

Bojhan asked me to chime in here.

Poormanscron has fixed a number of cron-related usability problems, including peoples' watchdog entries ballooning to 4,000 GB and bringing down the site. However, the usability issue still remains of when people create a piece of content and immediately lose it, they can't find it through the search box (at least until cron has reset).

What I don't like about this fix though, and believe Dries said the same up in earlier replies, is that it couples search module and node module together, in a way that contrib cannot replicate. It's a one-off for node module and doesn't help any other module who thinks its content ought to be instantly available.

Is it possible to achieve the same results in a more generic manner? For example, node.module triggering that cron needs to be run again on insert/update, so that poormanscron would perform the action in the background when they're redirected to their view page?

Log in or register to post comments

Comment #36

JacobSingh commented 29 September 2009 at 17:11

I totally get how this helps with the obvious problem of time lapse and finding new content.

Couple things to sort out.

The deluge:
Indexing is a dicey operation, and can be quite a drain on the system.
From our experience managing indexing / querying of tons of Drupal sites, there is an issue with indexing on nodeapi. With ApacheSolr, we index immediately in the case of a delete or unpublish (for obvious reasons). But when people do massive deletes or massive inserts, it sends a massive amount of index requests which can be a problem. This would likely kill core search.

Also, take into account a site with a lot of writes, like a blog post that gets dugg or something...

Solution?
I can't think of any other way to do it than to implement a leaky bucket. So the search indexer can handle 5 indexing requests per minute, and if it gets to 10, it will start putting them in the queue to run at cron, if it gets below 5 again, it will start processing them. It adds a check on every page load, but it's likely a small check.

Plugable
Also it needs to be plugable as @webchick mentions. Sadly, we never got a massive overhaul of the search API into D7 as we (speaking to myself as a search guy) should have done. We certainly cannot have node module have anything to do with search module.

Solution? - No, but an idea
I'm thinking in terms of the concept of Entities which we have in fields.

So the psuedo code is:

$entity being created?
search_index($entity);

which actual handler(s) indexes the content and people use to search is defined elsewhere, in a variable, or whatever, and it would be specific per entity. Perhaps they would have a signature something like:

(need not be OOP, just easier to psuedocode)
ApachesolrIndexer ext SearchIndexer
+indexableEntities()
+index($entity)
....

where as entities would have to explain themselves in a common format to Indexers.

Best,
Jacob

Log in or register to post comments

Comment #37

catch

he/him

English

commented 29 September 2009 at 17:13

Search module and node module are already coupled together in horrible, horrible ways, the coupling here is a symptom of that, not its cuase.

Also if other modules are maintaining their own index, and have to provide their own equivalent of node_index_node() to do that, then nothing stops them calling that directly when they insert content. They won't have search module do it for them, but search module doesn't provde an equivalent of hook_node_update_index() for them either.

Log in or register to post comments

Comment #38

catch

he/him

English

commented 29 September 2009 at 17:24

I really think refactoring search module is out of scope for this issue - are we going to deal with http://api.drupal.org/api/function/node_search_execute/7 here too? I agree it's horrible, but needs to be dealt with in its own right since that's not the problem this patch is trying to solve.

If a post gets dugg - then surely the issue is a lot of reads rather than a lot of writes? We're not indexing when comments are posted in the latest patch - so once it's posted in the first place, that's it.

On bulk updates, search_node_index() could either keep a static count, or like Jacob says a leaky bucket across requests - we don't really have a convenient place to store that kind of information across requests though off the top of my head, well, maybe flood control would do it.

Log in or register to post comments

Comment #39

damien tournoud commented 29 September 2009 at 17:31

What I don't like about this fix though, and believe Dries said the same up in earlier replies, is that it couples search module and node module together, in a way that contrib cannot replicate. It's a one-off for node module and doesn't help any other module who thinks its content ought to be instantly available.

This is not really accurate. The search module and the node module *are* tightly coupled together, in the sense that the search modules indexes *nodes*. I already explained in a few other places that this is not wrong per se. Three pieces are, at minimum, required to make any search solution work:

an indexer
one or more data providers
one or more search UIs

The data indexer can be cleanly separated from the other two pieces. It currently lives in the search module.

The data providers provide data from the entities (node, user, etc.) to the data indexer. They cannot be logically separated from either the entity nor the data indexer. In core, the data providers live in their entity (see node_search()). In contrib (see Apachesolr), they are implemented by the search solution itself. None of those architecture make more sense then the other.

The search UIs are kind of a grey area. It could be theoretically possible to cleanly separate them from the indexer (if the data store has a clean query interface), but not from the entities.

All this to say that this patch doesn't add anything to the coupling between node and search.

Log in or register to post comments

Comment #40

xano

he/they

English

Southampton

commented 4 October 2009 at 14:53

This functionality should be in Node and *not* in Search. Let's use Search only for indexing and searching. Other modules should expose themselves to Search so we don't have functions that are specific to Node, User, etc in Search. This patch also stabs #375397: Make Node module optional in the back.

Log in or register to post comments

Comment #41

Bojhan commented 4 October 2009 at 15:39

Status:

Needs review

» Reviewed & tested by the community

After trying to persuade many into looking at this issue, I am going to be bold and marking it RTBC - perhaps to get more eyes, but mostly to get it in. As Damien explained in #39 are webchick's concerns in-exclusive to the implementation direction - and do not increase the coupling of node and search. It seems a lot of arguments, are more related to search.module its weirdness then this actual issue, I understand there is a large gray area - but please don't let that hold up this improvement.

This is important to get in, to fix a major critical (really that critical) issue in creating menu's (searching on a node title, when creating a menu item)

Log in or register to post comments

Comment #42

webchick

she/they

English

Vancouver 🇨🇦

commented 5 October 2009 at 02:24

Status:

Reviewed & tested by the community

» Needs work

Xano makes a good point. Also, the way this is implemented (I think) would result in inconsistent user experience between core search and something like Apache Solr, unless every search module implemented this same code in it (or am I wrong?).

I agree that Jacob's approach is ultimately the best, but too heavy for this late in the development cycle. Is there a middle-ground here, like a more minor "API clean-up" style of refactoring in search.module that can be done to allow this code to be exposed in node rather than search module?

Also, I still need to clear with Dries to see whether he's up for one-offing this behaviour in general, since he was opposed in #4, though those reasons are largely moot now that we do have poormanscron in core. I can see the obvious usability benefits this patch has. But I do not want to sacrifice software architecture for usability.

Log in or register to post comments

Comment #43

catch

he/him

English

commented 5 October 2009 at 03:23

Assigned:	catch	» Unassigned
Status:	Needs work	» Reviewed & tested by the community

I'm slightly perturbed that #38 and #39 have been ignored when marking this CNW, please point out why those comments don't hold up. Xano also admitted on irc that he hadn't read the issue, and that he more or less agreed with me and Damien until there's a significant refactoring after reading the whole thing. This is my last post on this issue, I'm frankly annoyed that stuff like the toolbar gets in with critical bugs all over the place., but small patches fixing critical usability issues get held up on philosophical objections with zero code backing them up.

Oh, and we can do this in node.module by sticking a module_exists('search') in node_save(), that'll be pretty.

Log in or register to post comments

Comment #44

webchick

she/they

English

Vancouver 🇨🇦

commented 5 October 2009 at 03:50

If you can point me to code in menu.module that does an if (module_exists('toolbar')) and/or implements hook_toolbar_whoozit_whatzit(), I will happily eat my misgivings about hard-coding (more) node assumptions into search module to fix a usability issue. Unless I'm terribly mistaken, there is nothing of the sort in there, because it's widely understood that the menu module is an API-related module that should be de-coupled from any specific implementation.

So is search. The fact that it is not currently as de-coupled from node as it ideally needs to be doesn't change that fact.

Is refactoring search out of scope for this issue? Of course. But that doesn't mean it couldn't be a prerequisite for it. We are in API clean-up phase right now, after all. I asked a very simple question about whether or not there was a middle ground which could be explored between gutting the entire thing and de-coupling this one area so that node could implement its own stuff. I didn't say that the lack of this was a complete no-go for this patch, it was simply a question.

Committing this patch anyway would mean committing something with a known flaw in its implementation. Not only does this give general "ooky" feelings to me as a core committer, but it also is totally insufficient for solving this very real usability problem in contributed modules. For example, I'm sure that Advanced Help module would also love to have its content indexed as soon as it's created, but this patch leaves it with no option apart from "modify search module directly, and add these few lines in." :\

So, the question: what would the most bare-bones modification to Search module's APIs to allow both node and Advanced Help module to index their content immediately upon insert/deletion/update? Is it something that could fall within scope of API clean-up? Or is it D8 material? Once we know the answer to that, we can move back to discussing whether we commit this solution anyway.

Log in or register to post comments

Comment #45

webchick

she/they

English

Vancouver 🇨🇦

commented 5 October 2009 at 03:51

Status:

Reviewed & tested by the community

» Needs review

Log in or register to post comments

Comment #46

catch

he/him

English

commented 5 October 2009 at 04:57

node.module is an API module too, and unlike search is currently required, and the order of responsibility currently lies with optional modules to implement any hooks. This is why taxonomy module implements node hooks, and not the other way around - only field API and a generic query builder in core will put an end to those hook implementations / assumptions, a job which belongs to neither taxonomy nor node, and relies on two third party APIs to have proper decoupling. For an example of d7ux patches putting their own specific stuff in API modules, see the latest overlay patch, which adds overlay-specific markup to node module regardless of whether overlay is enabled or not, instead of implementing hook_form_alter(). That code has been in the patch for nearly six months now. toolbar, rather than putting it's own stuff in menu module, or attempting to fix menu module's API, did it's own re-implementation of menu's API functions instead, in the process introducing both bugs and performance issues all over the place, which are only recently starting to get cleaned up.

Back to this patch, there are two ways to index content with search module:

1. Call search_index() within a hook_update_index() to get your content indexed on cron. In fact hook_update_index() is just your own hook_cron() implementation, without the need to do if (module_exists('search')) in there, since it doesn't offer any index updating functionality of its own.

2. Call search_index() at any other time to index content whenever you want.

To index new help as soon as it's available, advanced_help.module could implement hook_modules_enabled() and call search_index() within there for any newly enabled modules, that's equivalent to the node_save()/module_exists('search') option here. So search.module already provides this ability, to the extent that it provides any API for indexing content at all.

However, what's being asked for here, is that search module offer a hook which is run on any possible system event where you might possibly have new indexable content added - to do that, it would not only have to duplicate hook_cron() as it currently does, but also duplicate a few dozen other hooks as well. If you can explain another way it could offer such a service that'd be great, but I don't see it.

We have two modules here - either we put assumptions about search module (which is optional, and has a far less rich API than node does) inside node, or we put assumptions about node inside search.

To avoid either, you need a search_node_sql_index.module which implements both search and node hooks to maintain the {search_index} tables, and would do the trick here on hook_node_insert() - that's not going to get done in 10 days though.

/me fails to not post on this issue again :(

Log in or register to post comments

Comment #47

xano

he/they

English

Southampton

commented 5 October 2009 at 05:47

I'm with catch and Damien that a small search_node_insert() outweighs the downsides in this case. There are plans in the works for Drupal 8 that will make this code (and a lot more) unnecessary for D8, but for now we'll at least have this feature/UX improvement.

Log in or register to post comments

Comment #48

kika commented 12 October 2009 at 20:42

I'd strongly support getting this in. There is a well-documented usability problem: search is broken in early and most tender stage of people first experiencing Drupal. catch has given a thorough explanations why this change do not make things architecturally worse than it is now. We do not have time to rewrite search API for D7 in any case. So -- let's get this in, fix search API in proper way in D8.

Log in or register to post comments

Comment #49

16 October 2009 at 08:25

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #50

catch

he/him

English

commented 16 October 2009 at 13:57

Haven't looked at those exceptions here.

By the way, while everyone was picking on search_node_insert() being such an API horror, we forgot to mention http://api.drupal.org/api/function/search_node_update/7

Which is almost exactly the same anyway except for the 'instant' bit.

Log in or register to post comments

Comment #51

Bojhan commented 29 November 2009 at 14:13

Category:	task	» bug
Priority:	Normal	» Critical

Moving this to a bug-report, it looks to me to be a bug that you can't search content untill 3 hours after its created. Especially if you are running a news site or anything like that usecase, this could be considered a bug. Also looking at our coming freeze, as a sign that we should move this up - its a critical issue that hasn't gotten any activity.

I would love if we could get more feedback from the committers, it still seems somewhat silly to dismiss the small coupling over the large advantages of being able to search content after you created it and simplify the "create a menu item" administration workflow.

Log in or register to post comments

Comment #52

29 November 2009 at 19:52

Status:

Needs work

» Needs review

romedian requested that failed test be re-tested.

Log in or register to post comments

Comment #53

29 November 2009 at 20:11

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #54

catch

he/him

English

commented 30 November 2009 at 01:24

Issue tags:

+Needs committer feedback

Yes we're still stuck between #44 and #46, and no architectural changes to search module have happened in the interim. Tagging with 'needs committer feedback'. If the answer is that this is postponed until search module architecture allows it to be done without a module_exists() in node, or search module implementing a node hook, then this is D8 or D9.

Log in or register to post comments

Comment #55

pwolanin commented 30 November 2009 at 02:00

For D7 we now allow the search module to disable any/all of the search implementations, so it seems like this should possibly live as a node hook in search module - otherwise node module would have to also look at the search settings.

Log in or register to post comments

Comment #56

catch

he/him

English

commented 30 November 2009 at 02:07

In that case the last patch on this issue, #25, still stands as the right approach, and just needs a re-roll.

Log in or register to post comments

Comment #57

pwolanin commented 30 November 2009 at 03:33

the new function search_node_insert needs to have an additional condition like:

if (in_array('node', variable_get('search_active_modules', array('node', 'user')))) {

Log in or register to post comments

Comment #58

sun.core commented 8 January 2010 at 03:57

Version:

7.x-dev

» 8.x-dev

That content filter didn't make it, right?

Log in or register to post comments

Comment #59

Bojhan commented 8 January 2010 at 08:39

Version:

8.x-dev

» 7.x-dev

Didn't make it? It never got the work or feedback it needed to continue, so no this is not yet Drupal 8 given that finding content in the listings of content is a big issue.

Log in or register to post comments

Comment #60

catch

he/him

English

commented 1 February 2010 at 08:24

Status:

Needs work

» Needs review

Status	File	Size
new	six_months_later.patch	3.46 KB

#25 + #57

Log in or register to post comments

Comment #61

dries commented 1 February 2010 at 19:24

I don't think this is critical (it is not really a regression), and actually suggest this is postponed to D8.

Log in or register to post comments

Comment #62

Bojhan commented 1 February 2010 at 19:53

@Dries Please see, #51.

I really strongly disagree, that because the implementation of this is maybe not yet right. We should stall it to Drupal 8? From a user experience perspective this is a very critical bug. And I do not see any arguments to support otherwise.

The main reason we have brought search into Drupal, is because people use it a whole lot. If we dissallow them to search, the first 3 hours - the primary usecase (finding content after created) is not taken care of. Secondly, we still can't enable the search pattern on menu creation (which usability tests shown, people really need). So sorry, but this is really still a critical bug.

I am somewhat annoyed, that even though I have outlined this is a critical bug - and had a implementation direction for 6 months. The only commiter feedback is, lets not do it like this - with no direction how to it otherwise. If you say, we can't let this in - like this fine, but it is still a critical issue.

Log in or register to post comments

Comment #63

sun.core commented 14 February 2010 at 08:39

Priority:

Critical

» Normal

Not critical. Please read and understand Priority levels of Issues, thanks.

Log in or register to post comments

Comment #64

dries commented 14 February 2010 at 09:23

Bojhan, sorry, it is not critical. Critical bugs are those that prevent Drupal from working.

Log in or register to post comments

Comment #65

Bojhan commented 14 February 2010 at 09:50

@Dries still can you atleast outline something, that can get this issue further? I am lost on how to proceed on this, and I don't understand why there is no attention to this.

Log in or register to post comments

Comment #66

jhodgdon

she/her

English

commented 15 February 2010 at 20:48

Status:

Needs review

» Needs work

Bojhan: I don't think I agree with #51. A site that is updating its content all the time could be running cron all the time, if it wants to get things indexed on a timely basis. Right at the top (at least if you have the help module turned on):

The search engine maintains an index of words found in your site's content. To build and maintain this index, a correctly configured cron maintenance task is required. Indexing behavior can be adjusted using the settings below.

Maybe there should be some more explanation, but I don't really think that indexing content at create time is necessary.

Aside from that question, the implementation is very flawed. Not all nodes should be indexed -- for instance, unpublished ones should not be indexed. So just deciding that whenever a node is inserted it should be indexed is not a good idea. Also, if a node is updated, the index needs to be updated -- doing it at insert time only is not a good idea.

Log in or register to post comments

Comment #67

jhodgdon

she/her

English

commented 15 February 2010 at 21:57

As I recently mentioned to Bojhan in IRC: I would support updating the help/UI text on the Search settings page to explain what's going on in D7, and fixing this for D8.

Log in or register to post comments

Comment #68

David_Rothstein commented 16 February 2010 at 03:23

Status:

Needs work

» Needs review

Status	File	Size
new	search-node-504012-68.patch	3.96 KB

Although strictly speaking this might not be a critical issue, it sure does make Drupal look pretty broken out of the box. The search module is in the standard install profile now, yet as soon as you try to use it, it doesn't actually work.

If we want to not release something that makes us look incompetent, we should be fixing this bug in Drupal 7.

***

the implementation is very flawed. Not all nodes should be indexed -- for instance, unpublished ones should not be indexed.

Why shouldn't unpublished nodes be indexed?

Also, if a node is updated, the index needs to be updated -- doing it at insert time only is not a good idea.

I agree. I think we should fix this all the way - otherwise we risk adding as much confusion as we remove.

***

In #35, @webchick suggested this:

Is it possible to achieve the same results in a more generic manner? For example, node.module triggering that cron needs to be run again on insert/update, so that poormanscron would perform the action in the background when they're redirected to their view page?

The attached patch implements that suggestion. So basically, for a site using the poormanscron feature, all we do is manipulate the times that cron actually runs so that it indexes nodes right away, rather than e.g. three hours from now. For sites not using the poormanscron feature, we do nothing - i.e., the behavior is the same as Drupal 6 (they are responsible for running their own cron as often as they want the content indexed).

Note that the attached patch also contains some quick hacks for other bugs that were necessary to get this feature working. Obviously, we would want to get those issues fixed for real before finishing this.

Personally, I'm not so convinced that this solution is better than the one in #60. However, it does meet the goal of not introducing any coupling between the node and search modules. Mostly I'm just posting this to try to move the issue along :)

Log in or register to post comments

Comment #69

catch

he/him

English

commented 16 February 2010 at 12:18

Status:

Needs review

» Needs work

If you force a cron run every time content is created (and edited if it goes that way), then all the other expensive stuff, like cache clearing and the rest, is also going to happen, rather than just the search indexing. That makes the performance concerns put earlier on this issue even more to the forefront, and means that if you have automated cron running, and also a cron job set up to run every 30 minutes, that you end up running cron far, far more than needed.

@jhogdon, the original purpose of this issue was to support search-based content filtering on /admin as well as the default search block. That seems very unlikely for D7, but if we ever want to do that, we need to support unpublished nodes as well as published in the search index.

Personally I think we should stick with the original goal here, which is just to get /something/ in the search index when people install Drupal, post content, then try the search block. I'm pretty sure there's a contrib module which handles updating the search index whenever stuff is posted, for people who want that.

Log in or register to post comments

Comment #70

jhodgdon

she/her

English

commented 16 February 2010 at 14:09

My mistake on not indexing unpublished content. I thought node_update_index() already did that, but it does not -- there is no WHERE in the query on any criterion in the node table:

  $result = db_query_range("SELECT n.nid FROM {node} n LEFT JOIN {search_dataset} d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC", 0, $limit, array(), array('target' => 'slave'));

  foreach ($result as $node) {
    _node_index_node($node);
  }

So, back to the implementation in #60. My opinions:
- Should run on update as well as insert.
- Generally, we don't want to increase the amount of coupling between node and search, so all node-specific stuff should be in node.module rather than search.*
-- The setting for yes/no to index immediately should be in http://api.drupal.org/api/function/hook_search_admin/7 not directly in search.admin.inc
-- The hooks should be in the node module rather than in the search module.
- The hooks need to check whether (a) search is enabled (b) node is an enabled search module, before deciding it's OK to index the node. See search_cron():

  foreach(variable_get('search_active_modules', array('node', 'user')) as $module) {
    // Update word index
    module_invoke($module, 'update_index');
  }

Log in or register to post comments

Comment #71

David_Rothstein commented 17 February 2010 at 05:39

If you force a cron run every time content is created (and edited if it goes that way), then all the other expensive stuff, like cache clearing and the rest, is also going to happen, rather than just the search indexing.

Well, for cache clearing at least I don't think it makes much of a difference, since node_save() already clears the page and block caches anyway, which I believe are the only ones (at least in core) that use CACHE_TEMPORARY... So the extra cache clearing calls shouldn't really do anything.

Overall, though, I'm personally pretty happy with the general approach in #60 plus some of the changes from #70, as a targeted fix for this issue.

Log in or register to post comments

Comment #72

catch

he/him

English

commented 17 February 2010 at 06:33

Generally, we don't want to increase the amount of coupling between node and search, so all node-specific stuff should be in node.module rather than search.*

I still disagree with this, that just increases the coupling between search and node then, while node is a required module, and search isn't.

A likely first step to cleaning up this coupling in D8 is copying all the search_node_*() implementations, and node_search_*() implementations to a search_node_sql.module. That's a lot easier when you copy and paste a hook implementation and rename it, rather than grepping for module_exists('search'). See #46.

Log in or register to post comments

Comment #73

Bojhan commented 17 February 2010 at 09:30

Ok, can someone reroll the patch from #60 with the fixes of #70?

Log in or register to post comments

Comment #74

jhodgdon

she/her

English

commented 17 February 2010 at 16:31

RE #72: Yes, we need to decouple node/user from search better in D8 -- there is an issue for that:
#237748: Decouple core search module implementations from node and user module (and turn search/node and search/user to views).

However, that doesn't mean we should add node hooks to search.module for D7, IMO.

Log in or register to post comments

Comment #75

Bojhan commented 17 July 2010 at 19:56

Version:

7.x-dev

» 8.x-dev

Guess searching content still isn't critical enough for the Drupal CMS, pushing to 8

Log in or register to post comments

Comment #76

jhodgdon

she/her

English

commented 17 July 2010 at 21:30

Version:

8.x-dev

» 7.x-dev

Let's leave this to D7 until we decide it's not going into D7. What makes you think it can't happen? We have all sorts of things going on in search.module lately -- patching anyway, though I can't say the patches are getting committed. Sigh. Anyway, I wouldn't say this couldn't happen for D7.

Log in or register to post comments

Comment #77

Bojhan commented 17 July 2010 at 21:53

I did it because Dries said so in : http://drupal.org/node/504012#comment-2545554

Although to me this is as critical as it gets, its clear Dries disagrees and I think we thus should just postpone to Drupal 8.

Log in or register to post comments

Comment #78

gpk commented 23 July 2010 at 10:43

For almost a year on a D6 site I've been using a tiny custom module that implements an action "index post" (see #216508-8: run cron only on new or updated content rather than view). Works very nicely. I think this also addresses the coupling issue? It could be enabled by default for the relevant triggers on installation of search.module. Also it is potentially very configurable/can be disabled by the site admin etc. if they prefer to do it by cron.

[update] I guess the action would live in node.module, and could be enabled by the install profile (rather than when search.module is installed).

Log in or register to post comments

Comment #79

catch

he/him

English

commented 1 October 2010 at 05:51

Version:

7.x-dev

» 8.x-dev

I don't think this will get into D7 without conifiguration, moving it back to D8.

Log in or register to post comments

Comment #80

damien tournoud commented 1 October 2010 at 08:23

Category:

bug

» feature

Only feature requests and tasks can be assigned to D8 for now, because the Drupal 8 tree is not opened yet.

Fortunately, this is a feature request :)

Log in or register to post comments

Comment #81

jhodgdon

she/her

English

commented 30 October 2014 at 16:55

Version:	8.0.x-dev	» 8.1.x-dev
Issue summary:	View changes

Since 8.0.x-beta1 has been released, our policy at this point is No feature requests until 8.1.x. See #2350615: [policy, no patch] What changes can be accepted during the Drupal 8 beta phase?. Sorry, it's just too late for 8.0.x at this point, so even if we had a viable patch, the core committers would not commit it. So unless we decide this is a Task or a Bug (and I don't think it is), we'll have to delay it.

Log in or register to post comments

Comment #82

30 October 2014 at 16:55

Version:

8.1.x-dev

» 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #83

30 October 2014 at 16:55

Version:

8.2.x-dev

» 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #84

xjm

she/her

English

commented 11 December 2016 at 20:41

Issue tags:

-Needs committer feedback

No longer needs committer feedback, since semver is clear on where this can be added.

Log in or register to post comments

Comment #85

xjm

she/her

English

commented 11 December 2016 at 20:42

No longer needs committer feedback; this is a minor release type of change.

Log in or register to post comments

Comment #86

11 December 2016 at 20:42

Version:

8.3.x-dev

» 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #87

11 December 2016 at 20:42

Version:

8.4.x-dev

» 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #88

11 December 2016 at 20:42

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #89

11 December 2016 at 20:42

Version:

8.6.x-dev

» 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #90

11 December 2016 at 20:42

Version:

8.7.x-dev

» 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Log in or register to post comments

Comment #91

11 December 2016 at 20:42

Version:

8.8.x-dev

» 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Log in or register to post comments

Comment #92

11 December 2016 at 20:42

Version:

8.9.x-dev

» 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Log in or register to post comments

Comment #93

11 December 2016 at 20:42

Version:

9.1.x-dev

» 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Log in or register to post comments

Comment #94

11 December 2016 at 20:42

Version:

9.2.x-dev

» 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #95

11 December 2016 at 20:42

Version:

9.3.x-dev

» 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #96

11 December 2016 at 20:42

Version:

9.4.x-dev

» 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #97

11 December 2016 at 20:42

Version:

9.5.x-dev

» 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #98

11 December 2016 at 20:42

Version:

10.1.x-dev

» 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Log in or register to post comments

Comment #99

catch

he/him

English

commented 16 September 2023 at 09:03

Status:

Needs work

» Postponed

Thirteen years later there might be a way to do this in a way that the search index would be immediately indexed on sites with automated cron, while faster for sites running queues on a different schedule to cron too, but it'll need #1189464: Add an 'instant' queue runner first. Postponing on that.

Log in or register to post comments

Comment #100

14 August 2024 at 08:18

sukr_s made their first commit to this issue’s fork.

Log in or register to post comments

Comment #101

14 August 2024 at 08:18

sukr_s opened merge request !9206

Log in or register to post comments

Comment #102

sukr_s commented 14 August 2024 at 08:30

Status:

Postponed

» Needs review

Changes committed. Tests cases will pass only after #1189464: Add an 'instant' queue runner is accepted / merged to 11.x / and updating the fork.

Setting to NR for reviewing the code changes.

Log in or register to post comments

Comment #103

smustgrave commented 29 August 2024 at 18:10

Status:

Needs review

» Needs work

Not sure I'm seeing the need to create performUpdateIndex, unless just from the queueworker. Any BC concern of this breaking existing code now that this code is in a new function?

Probably can be postponed though on the issue mentioned in #102.

Log in or register to post comments

Comment #104

sukr_s commented 2 September 2024 at 05:40

performUpdateIndex indexes the given nids. Alternate is to use the updateIndex function in the queue. However this function does a left join on node to determine all the new nodes. This is very expensive operation on large sites. To avoid this expensive query updateIndex was refactored and performUpdateIndex was created. The queue runner invokes performUpdateIndex as the nid is already known saving an expensive db call.

I don't see any risk from BC perspective since it accepts the nids from the updateIndex function does the same steps as before.

Log in or register to post comments

Comment #105

catch

he/him

English

commented 2 September 2024 at 12:09

Title:

Index content when created

» Use a queue for node create/update indexing

Splitting the method into two looks good, we need to make sure the queue worker does minimal work.

I left a couple of comments on the MR. For me this could be committed independently of #1189464: Add an 'instant' queue runner. It wouldn't result in instant search indexing until we have the other issue, but the two issues don't have any hard dependencies on each other otherwise.

Log in or register to post comments

Comment #106

sukr_s commented 3 September 2024 at 04:50

Status:

Needs work

» Needs review

Changing the NR for MR responses.

Log in or register to post comments

Comment #107

sukr_s commented 10 September 2024 at 05:46

Related issue #3197264: Node searches fail if index is not complete, and users do not know why. If the node indexing is being moved entirely to queues, then this scenario also needs to be handled.

Log in or register to post comments

Comment #108

catch

he/him

English

commented 10 September 2024 at 06:29

We won't be able to move all node indexing to queues - at least when search is enabled, we need to index every node in the system, which needs the current search index watermark approach, or at least not without a lot of refactoring which would be its own issue.

Log in or register to post comments

Comment #109

smustgrave commented 7 October 2024 at 14:29

Issue summary:	View changes
Status:	Needs review	» Needs work

Tried cleaning up issue summary but could use 2nd pair of eyes. Original post mentioned something about a new setting but believe that's not the case anymore

Current MR is still marked as draft and appears to have a test failure.

Log in or register to post comments

Comment #110

sukr_s commented 8 October 2024 at 04:40

Functionality is dependent on #1189464: Add an 'instant' queue runner as mentioned in #102. Once that's Fixed and tests pass, will remove the draft on MR.

Log in or register to post comments

Comment #111

xjm

she/her

English

commented 29 June 2025 at 14:15

Something saved credits for me on this issue when I've done nothing creditable; what I commented back in 2016 hardly counts as an RM review so not sure how the credit got added. Un-checking it.

Log in or register to post comments

Comment #112

29 June 2025 at 14:15

Version:

11.x-dev

» main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Comment #113

quietone commented 1 April 2026 at 04:01

Status:

Needs work

» Postponed

The Search Module was approved for removal in #3476883: [Policy, no patch] Move Search module to contrib .

This is Postponed. The status is set according to two policies. The Remove a core extension and move it to a contributed project and the Extensions approved for removal policies.

The deprecation work is in #3565780: [meta] Tasks to deprecate the Search module and the removal work in #3565783: [meta] Tasks to remove the Search module.

Search will be moved to a contributed project before Drupal 12.0.0 is released.

Log in or register to post comments

Use a queue for node create/update indexing

Problem/Motivation

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

Issue fork drupal-504012

Comments