Search module should add content to the search index when it's created by default. This avoids search being 'broken' on new sites when cron hasn't been set up (or is running more infrequently than an admin is testing search). I added a checkbox to the search settings form, defaulting to on, for sites which want to disable this - since with a big search index it could make node creation very slow. Node updates are still done on cron - we just want content added to the index in the first place so it shows up at all.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
3.83 KB

search_index() had a hidden bug after body as field -looks like it's missing some test coverage (this patch at least runs the code through).

catch’s picture

FileSize
3.81 KB

webchick reviewed in irc.

1. remove dpm($node), whoops.

2. Convert the broken search query to dbtng while we're in there.

Otherwise unchanged.

Dries’s picture

This patch feels a bit like a one-off.

With this patch, some people might think that they no longer need cron, but the reality is that they still need cron to handle node updates, new comments being posted, taxonomy terms being updated, etc. We _really_ want people to setup cron for search to work correctly. I wonder if there are better ways to educate users by that.

Maybe we should have a very basic poorman's cron in core instead?

pwolanin’s picture

Status: Needs review » Needs work

While this is in general a fine idea, this patch totally breaks the framework model of the search module, which basically says that search module itself knows nothing about what kind of data is being indexed by which modules.

However, was discussing the search module in D7 with webchick and she preferred to be able to have search module choose which module's have active search vs. splitting node and user search into separate modules. That makes it hard to delegate this operation to node module.

webchick’s picture

@Dries: There's an issue for poormanscron in core at #331611: Add a poormanscron-like feature to core. There were some pretty compelling cases over there not to include it, including from Moshe, but these might not be insurmountable.

JacobSingh’s picture

I'm on the fence about it.

Ideally, Drupal wouldn't need to run cron, and still function, but as Dries says, it is essential. One of the disadvantages of indexing as you save content is that it can introduce load to end users which is not needed. For instance, consider a digg type site which is getting 100 comments a minute. If that node indexes (with all of those comments) every time it is saved, it's going to hurt.

Plus, it puts too great a responsibility on the site to be stable. Since search indexing is asynch, if some part of saving (but not saving itself) fails before search runs, that node would presumably never make it in.

ON the flip side, cron time indexing is slow, and doesn't respond instantly to unpublishing, etc unless a nodeapi, or whatever api hook is hit.

catch’s picture

Status: Needs work » Needs review

@Dries, we already have a cron warning if it's not run after 24 hours. However, even with the poormanscron in core patch, search index would run every three hours. That's a long time for an admin to wait to try out search module.

Webchick's primary concern with enabling search by default in the other issue was that users would 1. install drupal 2. post content 3. search for content 4. post a support request in the forums when that didn't work. If cron runs every 3 hours, one hour, even 30 minutes, that's plenty of time to run into trouble minutes after installing Drupal. So this is about instant gratification in general, not about compensating for lack of cron runs - and the reason I restricted this only to new content.

If we want to add a search box to the 'find content' screen - then again, we need quicker turnaround times - Post my page, lose it, go to 'find content', search, empty result, big fail.

@pwolanin, it doesn't break any search API cleanliness that's not already broken - see http://api.drupal.org/api/function/search_nodeapi/6. That issue needs fixing much more centrally than this one, so I'm marking this back to CNW.

@JacobSingh - please read the patch. This is done on hook_node_insert(), not hook_node_update() - so it's not going to run every time a comment is saved, just the initial node creation so that /something/ is in the index when you immediately try to find that thing you just lost. There's also an option to turn it off which is recommended for high traffic sites.

gpk’s picture

Status: Needs review » Needs work

Interesting, I've been experimenting with immediate search index updates on node submit, via a custom action: http://drupal.org/node/216508#comment-1724668.

A few comments:

- search indexing may require a 2nd pass (which generally happens the next time search_cron is invoked), because the bottom of http://api.drupal.org/api/function/search_index/7 may http://api.drupal.org/api/function/search_touch_node/7 various nodes. However, if per #9 the aim is merely to have *something* in the search index then maybe we can live with that. But is the fact that updates to the node (whether comments or directly to the node itself) don't get indexed immediately going to cause more confusion?

- this would likely cause problems with content permissions module since the node content in that case will depend on the current user; simple fix - turn this off. Poormanscron has the same problem, as has the run-cron link on the status report page #431776: Cron should run as anonymous when invoked via the run-cron link on the status report page.

- @2: are we saying that the redirect problem with links to uncacheable nodes has gone away? Or perhaps it was useless anyway since search would in any case take cron down when indexing a PHP node with a drupal_goto() in it?

- @4: as webchick notes, poormanscron in core #331611: Add a poormanscron-like feature to core has currently got the thumbs down. Granted there are technical and usability issues, but there also seem to be 2 perspectives in that debate - those who are thinking from the perspective of "Drupal must be great for big sites" but also the "Drupal must be great for small/blog sites" point of view. 2 different audiences, 2 different requirements. Question: does Drupal need to be great for both audiences? If so, can we manage to achieve it while keeping both audiences happy?

gpk’s picture

Cross posted..

catch’s picture

On updates - we really can't index every node on update as well because that's going to get slow (especially with lots of comments, that takes ages) - by the time you're updating content, then searching for it, then you've probably been using Drupal slightly longer so should hopefully have found your way to search settings and/or setting up cron by then. I really think it's fine to pre-optimize this just for brand new content and leave the rest as is - it's balance between the immediate feedback and slowing sites down. And it's easy to add more options for this from contrib (I think there's already a Drupal 6 module which does this for new and updated content).

gpk’s picture

@11: can easily distinguish between addition of comments and updating of the node itself... http://api.drupal.org/api/function/hook_comment_insert/7 vs. http://api.drupal.org/api/function/hook_node_update/7. The challenge here is I think to get round the initial 'big fail' without causing confusion for the user on day 2 (why do some things appear in the search index immediately but not others).

catch’s picture

Status: Needs work » Needs review
FileSize
4 KB

@gpk, hmm. That's a good point - I'm still worried about performance though. If I add a node, it will be indexed once with or without this patch. If I add a comment, 50 people could add a comment before the next cron run, resulting in 50 times as much work done by the server. It's also less likely that admins are posting comments to their own content and searching for the contents of the comments than it is for nodes, although I agree it's an extremely fine line.

Attached patch uses register_shutdown_function() for node_index_node() and search_update_totals() - this is in line with the logic in search_cron(), and I think it should partially allay Jacob's concerns about node saving getting interrupted.

gpk’s picture

>50 people could add a comment before the next cron run
Sounds like a high traffic site to me ;)

catch’s picture

FileSize
4.44 KB

Agreed, added that to the patch.

catch’s picture

FileSize
4.81 KB

This time in a less broken way.

TheRec’s picture

The patch does what is advertised so far. Tried to reindex multiple times, tested with and without the checkbox enabled and all works as expected. (On a side note... the sentence "100% of the site has been indexed. There are 0 items left to index." is descriptive, but in terms of usability, the user does not know how many items were indexed.)

Indexing content when it is created is a good idea, but this might give a false impression to users that their content is generally indexed on-the-fly and make them think that this will be done on update too, which is not the case (and, as stated before, "should not" or we would end up with huge performance issues). I think that the same people who follow the path described by catch ("1. install drupal 2. post content 3. search for content 4. post a support request in the forums when that didn't work") will just see their problem be delayed until they realize that their content is not updated in the index (because they did not run cron on a regular basis) when they modify it (or it gets modified with comments, etc.).

In all cases, the users who enable search module should be warned that if cron is not ran on a regular basis, their content will not be indexed... or if this patch gets committed that it will be indexed upon creation and not updated (unless cron is ran). We should also give a scope of what is indexed (content, fields, comments, etc.), because this is not clear either and somewhat leave the users in obscurity (unless they read the documentation).

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
3.94 KB

Re-rolled.

Status: Needs review » Needs work

The last submitted patch failed testing.

Bojhan’s picture

So, spend some time looking at the code - it looks like its outdated hence why tests are failing. Other then that, I see no UX reason why this shouldn't be done.

catch’s picture

Hmm, sadly not stale. I have no idea why the search ranking tests or upload are failing.

catch’s picture

Status: Needs work » Needs review
FileSize
3.19 KB

Scaling back to just nodes.

Status: Needs review » Needs work

The last submitted patch failed testing.

Damien Tournoud’s picture

Status: Needs work » Needs review
FileSize
3.05 KB

The previous patches failed because the node was indexed before all the hook_node_insert() implementation had the time to kick-in (especially the upload module one). Moved the indexation to a shutdown function instead.

pwolanin’s picture

Seems like this is even less necessary with the poorman's cron in core?

catch’s picture

The idea here is that you can enter content, then immediately have it return in search results - mainly if we replace the filters at admin/content with a search box, poormanscron wouldn't help with that since it's every three hours rather than immediate. So it's two different things IMO.

Bojhan’s picture

Wow, great - some traction on this topic! I think catch explains it clearly, from almost all of the observations done where the content administration page is flooded (many nodes) it becomes nearly impossible to filter correctly. I spoke about this big usability problem last year, I think its very fundamental to how we should approach filters.

pwolanin’s picture

I'm a bit concerned that anyone trying to do a mass import will have their site crippled by this code - any way to index a max # of nodes in one page load this way?

catch’s picture

@pwolanin - there's an admin option to switch this off, I imagine sites doing mass imports wouldn't want this enabled anyway.

pwolanin’s picture

@catch - wouldn't want is different than realizing it's an option. It's an edge case, but it would be nice to at least respect the same # per page load as per cron run.

Bojhan’s picture

I think we need to understand that its not the small movement of beign able to search fo a specific item, but current interactions that we can increase tremendously. For example, menu would be able to have a search box. instead of having to remember the url for every menu item.

pwolanin’s picture

@Bojan - what are you talking about and how does it relate to this patch?

Bojhan’s picture

@pwolanin I am kind of getting dissapointed that so code to feature freeze we are getting stuck up on these issues, but basicly as I layed out. The current interaction that you need to know the node url to create a menu item is somewhat broken.

However I am not sure how this is all related to this issue, and why its beign held up. I dont see it beign an edge case that you are trying to find content in less then 3 hours after you have created it. But if node import is the problem, then lets fix it. I dont see any outcome of that hurting the UX at all.

webchick’s picture

Bojhan asked me to chime in here.

Poormanscron has fixed a number of cron-related usability problems, including peoples' watchdog entries ballooning to 4,000 GB and bringing down the site. However, the usability issue still remains of when people create a piece of content and immediately lose it, they can't find it through the search box (at least until cron has reset).

What I don't like about this fix though, and believe Dries said the same up in earlier replies, is that it couples search module and node module together, in a way that contrib cannot replicate. It's a one-off for node module and doesn't help any other module who thinks its content ought to be instantly available.

Is it possible to achieve the same results in a more generic manner? For example, node.module triggering that cron needs to be run again on insert/update, so that poormanscron would perform the action in the background when they're redirected to their view page?

JacobSingh’s picture

I totally get how this helps with the obvious problem of time lapse and finding new content.

Couple things to sort out.

The deluge:
Indexing is a dicey operation, and can be quite a drain on the system.
From our experience managing indexing / querying of tons of Drupal sites, there is an issue with indexing on nodeapi. With ApacheSolr, we index immediately in the case of a delete or unpublish (for obvious reasons). But when people do massive deletes or massive inserts, it sends a massive amount of index requests which can be a problem. This would likely kill core search.

Also, take into account a site with a lot of writes, like a blog post that gets dugg or something...

Solution?
I can't think of any other way to do it than to implement a leaky bucket. So the search indexer can handle 5 indexing requests per minute, and if it gets to 10, it will start putting them in the queue to run at cron, if it gets below 5 again, it will start processing them. It adds a check on every page load, but it's likely a small check.

Plugable
Also it needs to be plugable as @webchick mentions. Sadly, we never got a massive overhaul of the search API into D7 as we (speaking to myself as a search guy) should have done. We certainly cannot have node module have anything to do with search module.

Solution? - No, but an idea
I'm thinking in terms of the concept of Entities which we have in fields.

So the psuedo code is:

$entity being created?
search_index($entity);

which actual handler(s) indexes the content and people use to search is defined elsewhere, in a variable, or whatever, and it would be specific per entity. Perhaps they would have a signature something like:

(need not be OOP, just easier to psuedocode)
ApachesolrIndexer ext SearchIndexer
+indexableEntities()
+index($entity)
....

where as entities would have to explain themselves in a common format to Indexers.

Best,
Jacob

catch’s picture

Search module and node module are already coupled together in horrible, horrible ways, the coupling here is a symptom of that, not its cuase.

Also if other modules are maintaining their own index, and have to provide their own equivalent of node_index_node() to do that, then nothing stops them calling that directly when they insert content. They won't have search module do it for them, but search module doesn't provde an equivalent of hook_node_update_index() for them either.

catch’s picture

I really think refactoring search module is out of scope for this issue - are we going to deal with http://api.drupal.org/api/function/node_search_execute/7 here too? I agree it's horrible, but needs to be dealt with in its own right since that's not the problem this patch is trying to solve.

If a post gets dugg - then surely the issue is a lot of reads rather than a lot of writes? We're not indexing when comments are posted in the latest patch - so once it's posted in the first place, that's it.

On bulk updates, search_node_index() could either keep a static count, or like Jacob says a leaky bucket across requests - we don't really have a convenient place to store that kind of information across requests though off the top of my head, well, maybe flood control would do it.

Damien Tournoud’s picture

What I don't like about this fix though, and believe Dries said the same up in earlier replies, is that it couples search module and node module together, in a way that contrib cannot replicate. It's a one-off for node module and doesn't help any other module who thinks its content ought to be instantly available.

This is not really accurate. The search module and the node module *are* tightly coupled together, in the sense that the search modules indexes *nodes*. I already explained in a few other places that this is not wrong per se. Three pieces are, at minimum, required to make any search solution work:

  1. an indexer
  2. one or more data providers
  3. one or more search UIs

The data indexer can be cleanly separated from the other two pieces. It currently lives in the search module.

The data providers provide data from the entities (node, user, etc.) to the data indexer. They cannot be logically separated from either the entity nor the data indexer. In core, the data providers live in their entity (see node_search()). In contrib (see Apachesolr), they are implemented by the search solution itself. None of those architecture make more sense then the other.

The search UIs are kind of a grey area. It could be theoretically possible to cleanly separate them from the indexer (if the data store has a clean query interface), but not from the entities.

All this to say that this patch doesn't add anything to the coupling between node and search.

Xano’s picture

This functionality should be in Node and *not* in Search. Let's use Search only for indexing and searching. Other modules should expose themselves to Search so we don't have functions that are specific to Node, User, etc in Search. This patch also stabs #375397: Make Node module optional in the back.

Bojhan’s picture

Status: Needs review » Reviewed & tested by the community

After trying to persuade many into looking at this issue, I am going to be bold and marking it RTBC - perhaps to get more eyes, but mostly to get it in. As Damien explained in #39 are webchick's concerns in-exclusive to the implementation direction - and do not increase the coupling of node and search. It seems a lot of arguments, are more related to search.module its weirdness then this actual issue, I understand there is a large gray area - but please don't let that hold up this improvement.

This is important to get in, to fix a major critical (really that critical) issue in creating menu's (searching on a node title, when creating a menu item)

webchick’s picture

Status: Reviewed & tested by the community » Needs work

Xano makes a good point. Also, the way this is implemented (I think) would result in inconsistent user experience between core search and something like Apache Solr, unless every search module implemented this same code in it (or am I wrong?).

I agree that Jacob's approach is ultimately the best, but too heavy for this late in the development cycle. Is there a middle-ground here, like a more minor "API clean-up" style of refactoring in search.module that can be done to allow this code to be exposed in node rather than search module?

Also, I still need to clear with Dries to see whether he's up for one-offing this behaviour in general, since he was opposed in #4, though those reasons are largely moot now that we do have poormanscron in core. I can see the obvious usability benefits this patch has. But I do not want to sacrifice software architecture for usability.

catch’s picture

Assigned: catch » Unassigned
Status: Needs work » Reviewed & tested by the community

I'm slightly perturbed that #38 and #39 have been ignored when marking this CNW, please point out why those comments don't hold up. Xano also admitted on irc that he hadn't read the issue, and that he more or less agreed with me and Damien until there's a significant refactoring after reading the whole thing. This is my last post on this issue, I'm frankly annoyed that stuff like the toolbar gets in with critical bugs all over the place., but small patches fixing critical usability issues get held up on philosophical objections with zero code backing them up.

Oh, and we can do this in node.module by sticking a module_exists('search') in node_save(), that'll be pretty.

webchick’s picture

If you can point me to code in menu.module that does an if (module_exists('toolbar')) and/or implements hook_toolbar_whoozit_whatzit(), I will happily eat my misgivings about hard-coding (more) node assumptions into search module to fix a usability issue. Unless I'm terribly mistaken, there is nothing of the sort in there, because it's widely understood that the menu module is an API-related module that should be de-coupled from any specific implementation.

So is search. The fact that it is not currently as de-coupled from node as it ideally needs to be doesn't change that fact.

Is refactoring search out of scope for this issue? Of course. But that doesn't mean it couldn't be a prerequisite for it. We are in API clean-up phase right now, after all. I asked a very simple question about whether or not there was a middle ground which could be explored between gutting the entire thing and de-coupling this one area so that node could implement its own stuff. I didn't say that the lack of this was a complete no-go for this patch, it was simply a question.

Committing this patch anyway would mean committing something with a known flaw in its implementation. Not only does this give general "ooky" feelings to me as a core committer, but it also is totally insufficient for solving this very real usability problem in contributed modules. For example, I'm sure that Advanced Help module would also love to have its content indexed as soon as it's created, but this patch leaves it with no option apart from "modify search module directly, and add these few lines in." :\

So, the question: what would the most bare-bones modification to Search module's APIs to allow both node and Advanced Help module to index their content immediately upon insert/deletion/update? Is it something that could fall within scope of API clean-up? Or is it D8 material? Once we know the answer to that, we can move back to discussing whether we commit this solution anyway.

webchick’s picture

Status: Reviewed & tested by the community » Needs review
catch’s picture

node.module is an API module too, and unlike search is currently required, and the order of responsibility currently lies with optional modules to implement any hooks. This is why taxonomy module implements node hooks, and not the other way around - only field API and a generic query builder in core will put an end to those hook implementations / assumptions, a job which belongs to neither taxonomy nor node, and relies on two third party APIs to have proper decoupling. For an example of d7ux patches putting their own specific stuff in API modules, see the latest overlay patch, which adds overlay-specific markup to node module regardless of whether overlay is enabled or not, instead of implementing hook_form_alter(). That code has been in the patch for nearly six months now. toolbar, rather than putting it's own stuff in menu module, or attempting to fix menu module's API, did it's own re-implementation of menu's API functions instead, in the process introducing both bugs and performance issues all over the place, which are only recently starting to get cleaned up.

Back to this patch, there are two ways to index content with search module:

1. Call search_index() within a hook_update_index() to get your content indexed on cron. In fact hook_update_index() is just your own hook_cron() implementation, without the need to do if (module_exists('search')) in there, since it doesn't offer any index updating functionality of its own.

2. Call search_index() at any other time to index content whenever you want.

To index new help as soon as it's available, advanced_help.module could implement hook_modules_enabled() and call search_index() within there for any newly enabled modules, that's equivalent to the node_save()/module_exists('search') option here. So search.module already provides this ability, to the extent that it provides any API for indexing content at all.

However, what's being asked for here, is that search module offer a hook which is run on any possible system event where you might possibly have new indexable content added - to do that, it would not only have to duplicate hook_cron() as it currently does, but also duplicate a few dozen other hooks as well. If you can explain another way it could offer such a service that'd be great, but I don't see it.

We have two modules here - either we put assumptions about search module (which is optional, and has a far less rich API than node does) inside node, or we put assumptions about node inside search.

To avoid either, you need a search_node_sql_index.module which implements both search and node hooks to maintain the {search_index} tables, and would do the trick here on hook_node_insert() - that's not going to get done in 10 days though.

/me fails to not post on this issue again :(

Xano’s picture

I'm with catch and Damien that a small search_node_insert() outweighs the downsides in this case. There are plans in the works for Drupal 8 that will make this code (and a lot more) unnecessary for D8, but for now we'll at least have this feature/UX improvement.

kika’s picture

I'd strongly support getting this in. There is a well-documented usability problem: search is broken in early and most tender stage of people first experiencing Drupal. catch has given a thorough explanations why this change do not make things architecturally worse than it is now. We do not have time to rewrite search API for D7 in any case. So -- let's get this in, fix search API in proper way in D8.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Haven't looked at those exceptions here.

By the way, while everyone was picking on search_node_insert() being such an API horror, we forgot to mention http://api.drupal.org/api/function/search_node_update/7

Which is almost exactly the same anyway except for the 'instant' bit.

Bojhan’s picture

Category: task » bug
Priority: Normal » Critical

Moving this to a bug-report, it looks to me to be a bug that you can't search content untill 3 hours after its created. Especially if you are running a news site or anything like that usecase, this could be considered a bug. Also looking at our coming freeze, as a sign that we should move this up - its a critical issue that hasn't gotten any activity.

I would love if we could get more feedback from the committers, it still seems somewhat silly to dismiss the small coupling over the large advantages of being able to search content after you created it and simplify the "create a menu item" administration workflow.

Status: Needs work » Needs review

romedian requested that failed test be re-tested.

Status: Needs review » Needs work

The last submitted patch failed testing.

catch’s picture

Issue tags: +Needs committer feedback

Yes we're still stuck between #44 and #46, and no architectural changes to search module have happened in the interim. Tagging with 'needs committer feedback'. If the answer is that this is postponed until search module architecture allows it to be done without a module_exists() in node, or search module implementing a node hook, then this is D8 or D9.

pwolanin’s picture

For D7 we now allow the search module to disable any/all of the search implementations, so it seems like this should possibly live as a node hook in search module - otherwise node module would have to also look at the search settings.

catch’s picture

In that case the last patch on this issue, #25, still stands as the right approach, and just needs a re-roll.

pwolanin’s picture

the new function search_node_insert needs to have an additional condition like:

if (in_array('node', variable_get('search_active_modules', array('node', 'user')))) {
sun.core’s picture

Version: 7.x-dev » 8.x-dev

That content filter didn't make it, right?

Bojhan’s picture

Version: 8.x-dev » 7.x-dev

Didn't make it? It never got the work or feedback it needed to continue, so no this is not yet Drupal 8 given that finding content in the listings of content is a big issue.

catch’s picture

Status: Needs work » Needs review
FileSize
3.46 KB

#25 + #57

Dries’s picture

I don't think this is critical (it is not really a regression), and actually suggest this is postponed to D8.

Bojhan’s picture

@Dries Please see, #51.

I really strongly disagree, that because the implementation of this is maybe not yet right. We should stall it to Drupal 8? From a user experience perspective this is a very critical bug. And I do not see any arguments to support otherwise.

The main reason we have brought search into Drupal, is because people use it a whole lot. If we dissallow them to search, the first 3 hours - the primary usecase (finding content after created) is not taken care of. Secondly, we still can't enable the search pattern on menu creation (which usability tests shown, people really need). So sorry, but this is really still a critical bug.

I am somewhat annoyed, that even though I have outlined this is a critical bug - and had a implementation direction for 6 months. The only commiter feedback is, lets not do it like this - with no direction how to it otherwise. If you say, we can't let this in - like this fine, but it is still a critical issue.

sun.core’s picture

Priority: Critical » Normal

Not critical. Please read and understand Priority levels of Issues, thanks.

Dries’s picture

Bojhan, sorry, it is not critical. Critical bugs are those that prevent Drupal from working.

Bojhan’s picture

@Dries still can you atleast outline something, that can get this issue further? I am lost on how to proceed on this, and I don't understand why there is no attention to this.

jhodgdon’s picture

Status: Needs review » Needs work

Bojhan: I don't think I agree with #51. A site that is updating its content all the time could be running cron all the time, if it wants to get things indexed on a timely basis. Right at the top (at least if you have the help module turned on):

The search engine maintains an index of words found in your site's content. To build and maintain this index, a correctly configured cron maintenance task is required. Indexing behavior can be adjusted using the settings below.

Maybe there should be some more explanation, but I don't really think that indexing content at create time is necessary.

Aside from that question, the implementation is very flawed. Not all nodes should be indexed -- for instance, unpublished ones should not be indexed. So just deciding that whenever a node is inserted it should be indexed is not a good idea. Also, if a node is updated, the index needs to be updated -- doing it at insert time only is not a good idea.

jhodgdon’s picture

As I recently mentioned to Bojhan in IRC: I would support updating the help/UI text on the Search settings page to explain what's going on in D7, and fixing this for D8.

See also #715296: cron help/ui text needs work

David_Rothstein’s picture

Status: Needs work » Needs review
FileSize
3.96 KB

Although strictly speaking this might not be a critical issue, it sure does make Drupal look pretty broken out of the box. The search module is in the standard install profile now, yet as soon as you try to use it, it doesn't actually work.

If we want to not release something that makes us look incompetent, we should be fixing this bug in Drupal 7.

***

the implementation is very flawed. Not all nodes should be indexed -- for instance, unpublished ones should not be indexed.

Why shouldn't unpublished nodes be indexed?

Also, if a node is updated, the index needs to be updated -- doing it at insert time only is not a good idea.

I agree. I think we should fix this all the way - otherwise we risk adding as much confusion as we remove.

***

In #35, @webchick suggested this:

Is it possible to achieve the same results in a more generic manner? For example, node.module triggering that cron needs to be run again on insert/update, so that poormanscron would perform the action in the background when they're redirected to their view page?

The attached patch implements that suggestion. So basically, for a site using the poormanscron feature, all we do is manipulate the times that cron actually runs so that it indexes nodes right away, rather than e.g. three hours from now. For sites not using the poormanscron feature, we do nothing - i.e., the behavior is the same as Drupal 6 (they are responsible for running their own cron as often as they want the content indexed).

Note that the attached patch also contains some quick hacks for other bugs that were necessary to get this feature working. Obviously, we would want to get those issues fixed for real before finishing this.

Personally, I'm not so convinced that this solution is better than the one in #60. However, it does meet the goal of not introducing any coupling between the node and search modules. Mostly I'm just posting this to try to move the issue along :)

catch’s picture

Status: Needs review » Needs work

If you force a cron run every time content is created (and edited if it goes that way), then all the other expensive stuff, like cache clearing and the rest, is also going to happen, rather than just the search indexing. That makes the performance concerns put earlier on this issue even more to the forefront, and means that if you have automated cron running, and also a cron job set up to run every 30 minutes, that you end up running cron far, far more than needed.

@jhogdon, the original purpose of this issue was to support search-based content filtering on /admin as well as the default search block. That seems very unlikely for D7, but if we ever want to do that, we need to support unpublished nodes as well as published in the search index.

Personally I think we should stick with the original goal here, which is just to get /something/ in the search index when people install Drupal, post content, then try the search block. I'm pretty sure there's a contrib module which handles updating the search index whenever stuff is posted, for people who want that.

jhodgdon’s picture

My mistake on not indexing unpublished content. I thought node_update_index() already did that, but it does not -- there is no WHERE in the query on any criterion in the node table:

  $result = db_query_range("SELECT n.nid FROM {node} n LEFT JOIN {search_dataset} d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC", 0, $limit, array(), array('target' => 'slave'));

  foreach ($result as $node) {
    _node_index_node($node);
  }

So, back to the implementation in #60. My opinions:
- Should run on update as well as insert.
- Generally, we don't want to increase the amount of coupling between node and search, so all node-specific stuff should be in node.module rather than search.*
-- The setting for yes/no to index immediately should be in http://api.drupal.org/api/function/hook_search_admin/7 not directly in search.admin.inc
-- The hooks should be in the node module rather than in the search module.
- The hooks need to check whether (a) search is enabled (b) node is an enabled search module, before deciding it's OK to index the node. See search_cron():

  foreach(variable_get('search_active_modules', array('node', 'user')) as $module) {
    // Update word index
    module_invoke($module, 'update_index');
  }
David_Rothstein’s picture

If you force a cron run every time content is created (and edited if it goes that way), then all the other expensive stuff, like cache clearing and the rest, is also going to happen, rather than just the search indexing.

Well, for cache clearing at least I don't think it makes much of a difference, since node_save() already clears the page and block caches anyway, which I believe are the only ones (at least in core) that use CACHE_TEMPORARY... So the extra cache clearing calls shouldn't really do anything.

Overall, though, I'm personally pretty happy with the general approach in #60 plus some of the changes from #70, as a targeted fix for this issue.

catch’s picture

Generally, we don't want to increase the amount of coupling between node and search, so all node-specific stuff should be in node.module rather than search.*

I still disagree with this, that just increases the coupling between search and node then, while node is a required module, and search isn't.

A likely first step to cleaning up this coupling in D8 is copying all the search_node_*() implementations, and node_search_*() implementations to a search_node_sql.module. That's a lot easier when you copy and paste a hook implementation and rename it, rather than grepping for module_exists('search'). See #46.

Bojhan’s picture

Ok, can someone reroll the patch from #60 with the fixes of #70?

jhodgdon’s picture

RE #72: Yes, we need to decouple node/user from search better in D8 -- there is an issue for that:
#237748: Decouple core search module implementations from node and user module (and turn search/node and search/user to views).

However, that doesn't mean we should add node hooks to search.module for D7, IMO.

Bojhan’s picture

Version: 7.x-dev » 8.x-dev

Guess searching content still isn't critical enough for the Drupal CMS, pushing to 8

jhodgdon’s picture

Version: 8.x-dev » 7.x-dev

Let's leave this to D7 until we decide it's not going into D7. What makes you think it can't happen? We have all sorts of things going on in search.module lately -- patching anyway, though I can't say the patches are getting committed. Sigh. Anyway, I wouldn't say this couldn't happen for D7.

Bojhan’s picture

I did it because Dries said so in : http://drupal.org/node/504012#comment-2545554

Although to me this is as critical as it gets, its clear Dries disagrees and I think we thus should just postpone to Drupal 8.

gpk’s picture

For almost a year on a D6 site I've been using a tiny custom module that implements an action "index post" (see #216508-8: run cron only on new or updated content rather than view). Works very nicely. I think this also addresses the coupling issue? It could be enabled by default for the relevant triggers on installation of search.module. Also it is potentially very configurable/can be disabled by the site admin etc. if they prefer to do it by cron.

[update] I guess the action would live in node.module, and could be enabled by the install profile (rather than when search.module is installed).

catch’s picture

Version: 7.x-dev » 8.x-dev

I don't think this will get into D7 without conifiguration, moving it back to D8.

Damien Tournoud’s picture

Category: bug » feature

Only feature requests and tasks can be assigned to D8 for now, because the Drupal 8 tree is not opened yet.

Fortunately, this is a feature request :)

jhodgdon’s picture

Version: 8.0.x-dev » 8.1.x-dev
Issue summary: View changes

Since 8.0.x-beta1 has been released, our policy at this point is No feature requests until 8.1.x. See #2350615: [policy, no patch] What changes can be accepted during the Drupal 8 beta phase?. Sorry, it's just too late for 8.0.x at this point, so even if we had a viable patch, the core committers would not commit it. So unless we decide this is a Task or a Bug (and I don't think it is), we'll have to delay it.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.0-beta1 was released on March 2, 2016, which means new developments and disruptive changes should now be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

xjm’s picture

Issue tags: -Needs committer feedback

No longer needs committer feedback, since semver is clear on where this can be added.

xjm’s picture

No longer needs committer feedback; this is a minor release type of change.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

catch’s picture

Status: Needs work » Postponed

Thirteen years later there might be a way to do this in a way that the search index would be immediately indexed on sites with automated cron, while faster for sites running queues on a different schedule to cron too, but it'll need #1189464: Add an 'instant' queue runner first. Postponing on that.