Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Search module should add content to the search index when it's created by default. This avoids search being 'broken' on new sites when cron hasn't been set up (or is running more infrequently than an admin is testing search). I added a checkbox to the search settings form, defaulting to on, for sites which want to disable this - since with a big search index it could make node creation very slow. Node updates are still done on cron - we just want content added to the index in the first place so it shows up at all.
Comment | File | Size | Author |
---|---|---|---|
#68 | search-node-504012-68.patch | 3.96 KB | David_Rothstein |
#60 | six_months_later.patch | 3.46 KB | catch |
#25 | 504012-index-on-creation.patch | 3.05 KB | Damien Tournoud |
#23 | search_index_content_5.patch | 3.19 KB | catch |
#19 | search_index_content.patch | 3.94 KB | catch |
Comments
Comment #2
catchsearch_index() had a hidden bug after body as field -looks like it's missing some test coverage (this patch at least runs the code through).
Comment #3
catchwebchick reviewed in irc.
1. remove dpm($node), whoops.
2. Convert the broken search query to dbtng while we're in there.
Otherwise unchanged.
Comment #4
Dries CreditAttribution: Dries commentedThis patch feels a bit like a one-off.
With this patch, some people might think that they no longer need cron, but the reality is that they still need cron to handle node updates, new comments being posted, taxonomy terms being updated, etc. We _really_ want people to setup cron for search to work correctly. I wonder if there are better ways to educate users by that.
Maybe we should have a very basic poorman's cron in core instead?
Comment #5
pwolanin CreditAttribution: pwolanin commentedWhile this is in general a fine idea, this patch totally breaks the framework model of the search module, which basically says that search module itself knows nothing about what kind of data is being indexed by which modules.
However, was discussing the search module in D7 with webchick and she preferred to be able to have search module choose which module's have active search vs. splitting node and user search into separate modules. That makes it hard to delegate this operation to node module.
Comment #6
webchick@Dries: There's an issue for poormanscron in core at #331611: Add a poormanscron-like feature to core. There were some pretty compelling cases over there not to include it, including from Moshe, but these might not be insurmountable.
Comment #7
JacobSingh CreditAttribution: JacobSingh commentedI'm on the fence about it.
Ideally, Drupal wouldn't need to run cron, and still function, but as Dries says, it is essential. One of the disadvantages of indexing as you save content is that it can introduce load to end users which is not needed. For instance, consider a digg type site which is getting 100 comments a minute. If that node indexes (with all of those comments) every time it is saved, it's going to hurt.
Plus, it puts too great a responsibility on the site to be stable. Since search indexing is asynch, if some part of saving (but not saving itself) fails before search runs, that node would presumably never make it in.
ON the flip side, cron time indexing is slow, and doesn't respond instantly to unpublishing, etc unless a nodeapi, or whatever api hook is hit.
Comment #8
catch@Dries, we already have a cron warning if it's not run after 24 hours. However, even with the poormanscron in core patch, search index would run every three hours. That's a long time for an admin to wait to try out search module.
Webchick's primary concern with enabling search by default in the other issue was that users would 1. install drupal 2. post content 3. search for content 4. post a support request in the forums when that didn't work. If cron runs every 3 hours, one hour, even 30 minutes, that's plenty of time to run into trouble minutes after installing Drupal. So this is about instant gratification in general, not about compensating for lack of cron runs - and the reason I restricted this only to new content.
If we want to add a search box to the 'find content' screen - then again, we need quicker turnaround times - Post my page, lose it, go to 'find content', search, empty result, big fail.
@pwolanin, it doesn't break any search API cleanliness that's not already broken - see http://api.drupal.org/api/function/search_nodeapi/6. That issue needs fixing much more centrally than this one, so I'm marking this back to CNW.
@JacobSingh - please read the patch. This is done on hook_node_insert(), not hook_node_update() - so it's not going to run every time a comment is saved, just the initial node creation so that /something/ is in the index when you immediately try to find that thing you just lost. There's also an option to turn it off which is recommended for high traffic sites.
Comment #9
gpk CreditAttribution: gpk commentedInteresting, I've been experimenting with immediate search index updates on node submit, via a custom action: http://drupal.org/node/216508#comment-1724668.
A few comments:
- search indexing may require a 2nd pass (which generally happens the next time search_cron is invoked), because the bottom of http://api.drupal.org/api/function/search_index/7 may http://api.drupal.org/api/function/search_touch_node/7 various nodes. However, if per #9 the aim is merely to have *something* in the search index then maybe we can live with that. But is the fact that updates to the node (whether comments or directly to the node itself) don't get indexed immediately going to cause more confusion?
- this would likely cause problems with content permissions module since the node content in that case will depend on the current user; simple fix - turn this off. Poormanscron has the same problem, as has the run-cron link on the status report page #431776: Cron should run as anonymous when invoked via the run-cron link on the status report page.
- @2: are we saying that the redirect problem with links to uncacheable nodes has gone away? Or perhaps it was useless anyway since search would in any case take cron down when indexing a PHP node with a drupal_goto() in it?
- @4: as webchick notes, poormanscron in core #331611: Add a poormanscron-like feature to core has currently got the thumbs down. Granted there are technical and usability issues, but there also seem to be 2 perspectives in that debate - those who are thinking from the perspective of "Drupal must be great for big sites" but also the "Drupal must be great for small/blog sites" point of view. 2 different audiences, 2 different requirements. Question: does Drupal need to be great for both audiences? If so, can we manage to achieve it while keeping both audiences happy?
Comment #10
gpk CreditAttribution: gpk commentedCross posted..
Comment #11
catchOn updates - we really can't index every node on update as well because that's going to get slow (especially with lots of comments, that takes ages) - by the time you're updating content, then searching for it, then you've probably been using Drupal slightly longer so should hopefully have found your way to search settings and/or setting up cron by then. I really think it's fine to pre-optimize this just for brand new content and leave the rest as is - it's balance between the immediate feedback and slowing sites down. And it's easy to add more options for this from contrib (I think there's already a Drupal 6 module which does this for new and updated content).
Comment #12
gpk CreditAttribution: gpk commented@11: can easily distinguish between addition of comments and updating of the node itself... http://api.drupal.org/api/function/hook_comment_insert/7 vs. http://api.drupal.org/api/function/hook_node_update/7. The challenge here is I think to get round the initial 'big fail' without causing confusion for the user on day 2 (why do some things appear in the search index immediately but not others).
Comment #13
catch@gpk, hmm. That's a good point - I'm still worried about performance though. If I add a node, it will be indexed once with or without this patch. If I add a comment, 50 people could add a comment before the next cron run, resulting in 50 times as much work done by the server. It's also less likely that admins are posting comments to their own content and searching for the contents of the comments than it is for nodes, although I agree it's an extremely fine line.
Attached patch uses register_shutdown_function() for node_index_node() and search_update_totals() - this is in line with the logic in search_cron(), and I think it should partially allay Jacob's concerns about node saving getting interrupted.
Comment #14
gpk CreditAttribution: gpk commented>50 people could add a comment before the next cron run
Sounds like a high traffic site to me ;)
Comment #15
catchAgreed, added that to the patch.
Comment #16
catchThis time in a less broken way.
Comment #17
TheRec CreditAttribution: TheRec commentedThe patch does what is advertised so far. Tried to reindex multiple times, tested with and without the checkbox enabled and all works as expected. (On a side note... the sentence "100% of the site has been indexed. There are 0 items left to index." is descriptive, but in terms of usability, the user does not know how many items were indexed.)
Indexing content when it is created is a good idea, but this might give a false impression to users that their content is generally indexed on-the-fly and make them think that this will be done on update too, which is not the case (and, as stated before, "should not" or we would end up with huge performance issues). I think that the same people who follow the path described by catch ("1. install drupal 2. post content 3. search for content 4. post a support request in the forums when that didn't work") will just see their problem be delayed until they realize that their content is not updated in the index (because they did not run cron on a regular basis) when they modify it (or it gets modified with comments, etc.).
In all cases, the users who enable search module should be warned that if cron is not ran on a regular basis, their content will not be indexed... or if this patch gets committed that it will be indexed upon creation and not updated (unless cron is ran). We should also give a scope of what is indexed (content, fields, comments, etc.), because this is not clear either and somewhat leave the users in obscurity (unless they read the documentation).
Comment #19
catchRe-rolled.
Comment #21
Bojhan CreditAttribution: Bojhan commentedSo, spend some time looking at the code - it looks like its outdated hence why tests are failing. Other then that, I see no UX reason why this shouldn't be done.
Comment #22
catchHmm, sadly not stale. I have no idea why the search ranking tests or upload are failing.
Comment #23
catchScaling back to just nodes.
Comment #25
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe previous patches failed because the node was indexed before all the hook_node_insert() implementation had the time to kick-in (especially the upload module one). Moved the indexation to a shutdown function instead.
Comment #26
pwolanin CreditAttribution: pwolanin commentedSeems like this is even less necessary with the poorman's cron in core?
Comment #27
catchThe idea here is that you can enter content, then immediately have it return in search results - mainly if we replace the filters at admin/content with a search box, poormanscron wouldn't help with that since it's every three hours rather than immediate. So it's two different things IMO.
Comment #28
Bojhan CreditAttribution: Bojhan commentedWow, great - some traction on this topic! I think catch explains it clearly, from almost all of the observations done where the content administration page is flooded (many nodes) it becomes nearly impossible to filter correctly. I spoke about this big usability problem last year, I think its very fundamental to how we should approach filters.
Comment #29
pwolanin CreditAttribution: pwolanin commentedI'm a bit concerned that anyone trying to do a mass import will have their site crippled by this code - any way to index a max # of nodes in one page load this way?
Comment #30
catch@pwolanin - there's an admin option to switch this off, I imagine sites doing mass imports wouldn't want this enabled anyway.
Comment #31
pwolanin CreditAttribution: pwolanin commented@catch - wouldn't want is different than realizing it's an option. It's an edge case, but it would be nice to at least respect the same # per page load as per cron run.
Comment #32
Bojhan CreditAttribution: Bojhan commentedI think we need to understand that its not the small movement of beign able to search fo a specific item, but current interactions that we can increase tremendously. For example, menu would be able to have a search box. instead of having to remember the url for every menu item.
Comment #33
pwolanin CreditAttribution: pwolanin commented@Bojan - what are you talking about and how does it relate to this patch?
Comment #34
Bojhan CreditAttribution: Bojhan commented@pwolanin I am kind of getting dissapointed that so code to feature freeze we are getting stuck up on these issues, but basicly as I layed out. The current interaction that you need to know the node url to create a menu item is somewhat broken.
However I am not sure how this is all related to this issue, and why its beign held up. I dont see it beign an edge case that you are trying to find content in less then 3 hours after you have created it. But if node import is the problem, then lets fix it. I dont see any outcome of that hurting the UX at all.
Comment #35
webchickBojhan asked me to chime in here.
Poormanscron has fixed a number of cron-related usability problems, including peoples' watchdog entries ballooning to 4,000 GB and bringing down the site. However, the usability issue still remains of when people create a piece of content and immediately lose it, they can't find it through the search box (at least until cron has reset).
What I don't like about this fix though, and believe Dries said the same up in earlier replies, is that it couples search module and node module together, in a way that contrib cannot replicate. It's a one-off for node module and doesn't help any other module who thinks its content ought to be instantly available.
Is it possible to achieve the same results in a more generic manner? For example, node.module triggering that cron needs to be run again on insert/update, so that poormanscron would perform the action in the background when they're redirected to their view page?
Comment #36
JacobSingh CreditAttribution: JacobSingh commentedI totally get how this helps with the obvious problem of time lapse and finding new content.
Couple things to sort out.
The deluge:
Indexing is a dicey operation, and can be quite a drain on the system.
From our experience managing indexing / querying of tons of Drupal sites, there is an issue with indexing on nodeapi. With ApacheSolr, we index immediately in the case of a delete or unpublish (for obvious reasons). But when people do massive deletes or massive inserts, it sends a massive amount of index requests which can be a problem. This would likely kill core search.
Also, take into account a site with a lot of writes, like a blog post that gets dugg or something...
Solution?
I can't think of any other way to do it than to implement a leaky bucket. So the search indexer can handle 5 indexing requests per minute, and if it gets to 10, it will start putting them in the queue to run at cron, if it gets below 5 again, it will start processing them. It adds a check on every page load, but it's likely a small check.
Plugable
Also it needs to be plugable as @webchick mentions. Sadly, we never got a massive overhaul of the search API into D7 as we (speaking to myself as a search guy) should have done. We certainly cannot have node module have anything to do with search module.
Solution? - No, but an idea
I'm thinking in terms of the concept of Entities which we have in fields.
So the psuedo code is:
$entity being created?
search_index($entity);
which actual handler(s) indexes the content and people use to search is defined elsewhere, in a variable, or whatever, and it would be specific per entity. Perhaps they would have a signature something like:
(need not be OOP, just easier to psuedocode)
ApachesolrIndexer ext SearchIndexer
+indexableEntities()
+index($entity)
....
where as entities would have to explain themselves in a common format to Indexers.
Best,
Jacob
Comment #37
catchSearch module and node module are already coupled together in horrible, horrible ways, the coupling here is a symptom of that, not its cuase.
Also if other modules are maintaining their own index, and have to provide their own equivalent of node_index_node() to do that, then nothing stops them calling that directly when they insert content. They won't have search module do it for them, but search module doesn't provde an equivalent of hook_node_update_index() for them either.
Comment #38
catchI really think refactoring search module is out of scope for this issue - are we going to deal with http://api.drupal.org/api/function/node_search_execute/7 here too? I agree it's horrible, but needs to be dealt with in its own right since that's not the problem this patch is trying to solve.
If a post gets dugg - then surely the issue is a lot of reads rather than a lot of writes? We're not indexing when comments are posted in the latest patch - so once it's posted in the first place, that's it.
On bulk updates, search_node_index() could either keep a static count, or like Jacob says a leaky bucket across requests - we don't really have a convenient place to store that kind of information across requests though off the top of my head, well, maybe flood control would do it.
Comment #39
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis is not really accurate. The search module and the node module *are* tightly coupled together, in the sense that the search modules indexes *nodes*. I already explained in a few other places that this is not wrong per se. Three pieces are, at minimum, required to make any search solution work:
The data indexer can be cleanly separated from the other two pieces. It currently lives in the search module.
The data providers provide data from the entities (node, user, etc.) to the data indexer. They cannot be logically separated from either the entity nor the data indexer. In core, the data providers live in their entity (see node_search()). In contrib (see Apachesolr), they are implemented by the search solution itself. None of those architecture make more sense then the other.
The search UIs are kind of a grey area. It could be theoretically possible to cleanly separate them from the indexer (if the data store has a clean query interface), but not from the entities.
All this to say that this patch doesn't add anything to the coupling between node and search.
Comment #40
XanoThis functionality should be in Node and *not* in Search. Let's use Search only for indexing and searching. Other modules should expose themselves to Search so we don't have functions that are specific to Node, User, etc in Search. This patch also stabs #375397: Make Node module optional in the back.
Comment #41
Bojhan CreditAttribution: Bojhan commentedAfter trying to persuade many into looking at this issue, I am going to be bold and marking it RTBC - perhaps to get more eyes, but mostly to get it in. As Damien explained in #39 are webchick's concerns in-exclusive to the implementation direction - and do not increase the coupling of node and search. It seems a lot of arguments, are more related to search.module its weirdness then this actual issue, I understand there is a large gray area - but please don't let that hold up this improvement.
This is important to get in, to fix a major critical (really that critical) issue in creating menu's (searching on a node title, when creating a menu item)
Comment #42
webchickXano makes a good point. Also, the way this is implemented (I think) would result in inconsistent user experience between core search and something like Apache Solr, unless every search module implemented this same code in it (or am I wrong?).
I agree that Jacob's approach is ultimately the best, but too heavy for this late in the development cycle. Is there a middle-ground here, like a more minor "API clean-up" style of refactoring in search.module that can be done to allow this code to be exposed in node rather than search module?
Also, I still need to clear with Dries to see whether he's up for one-offing this behaviour in general, since he was opposed in #4, though those reasons are largely moot now that we do have poormanscron in core. I can see the obvious usability benefits this patch has. But I do not want to sacrifice software architecture for usability.
Comment #43
catchI'm slightly perturbed that #38 and #39 have been ignored when marking this CNW, please point out why those comments don't hold up. Xano also admitted on irc that he hadn't read the issue, and that he more or less agreed with me and Damien until there's a significant refactoring after reading the whole thing. This is my last post on this issue, I'm frankly annoyed that stuff like the toolbar gets in with critical bugs all over the place., but small patches fixing critical usability issues get held up on philosophical objections with zero code backing them up.
Oh, and we can do this in node.module by sticking a module_exists('search') in node_save(), that'll be pretty.
Comment #44
webchickIf you can point me to code in menu.module that does an if (module_exists('toolbar')) and/or implements hook_toolbar_whoozit_whatzit(), I will happily eat my misgivings about hard-coding (more) node assumptions into search module to fix a usability issue. Unless I'm terribly mistaken, there is nothing of the sort in there, because it's widely understood that the menu module is an API-related module that should be de-coupled from any specific implementation.
So is search. The fact that it is not currently as de-coupled from node as it ideally needs to be doesn't change that fact.
Is refactoring search out of scope for this issue? Of course. But that doesn't mean it couldn't be a prerequisite for it. We are in API clean-up phase right now, after all. I asked a very simple question about whether or not there was a middle ground which could be explored between gutting the entire thing and de-coupling this one area so that node could implement its own stuff. I didn't say that the lack of this was a complete no-go for this patch, it was simply a question.
Committing this patch anyway would mean committing something with a known flaw in its implementation. Not only does this give general "ooky" feelings to me as a core committer, but it also is totally insufficient for solving this very real usability problem in contributed modules. For example, I'm sure that Advanced Help module would also love to have its content indexed as soon as it's created, but this patch leaves it with no option apart from "modify search module directly, and add these few lines in." :\
So, the question: what would the most bare-bones modification to Search module's APIs to allow both node and Advanced Help module to index their content immediately upon insert/deletion/update? Is it something that could fall within scope of API clean-up? Or is it D8 material? Once we know the answer to that, we can move back to discussing whether we commit this solution anyway.
Comment #45
webchickComment #46
catchnode.module is an API module too, and unlike search is currently required, and the order of responsibility currently lies with optional modules to implement any hooks. This is why taxonomy module implements node hooks, and not the other way around - only field API and a generic query builder in core will put an end to those hook implementations / assumptions, a job which belongs to neither taxonomy nor node, and relies on two third party APIs to have proper decoupling. For an example of d7ux patches putting their own specific stuff in API modules, see the latest overlay patch, which adds overlay-specific markup to node module regardless of whether overlay is enabled or not, instead of implementing hook_form_alter(). That code has been in the patch for nearly six months now. toolbar, rather than putting it's own stuff in menu module, or attempting to fix menu module's API, did it's own re-implementation of menu's API functions instead, in the process introducing both bugs and performance issues all over the place, which are only recently starting to get cleaned up.
Back to this patch, there are two ways to index content with search module:
1. Call search_index() within a hook_update_index() to get your content indexed on cron. In fact hook_update_index() is just your own hook_cron() implementation, without the need to do if (module_exists('search')) in there, since it doesn't offer any index updating functionality of its own.
2. Call search_index() at any other time to index content whenever you want.
To index new help as soon as it's available, advanced_help.module could implement hook_modules_enabled() and call search_index() within there for any newly enabled modules, that's equivalent to the node_save()/module_exists('search') option here. So search.module already provides this ability, to the extent that it provides any API for indexing content at all.
However, what's being asked for here, is that search module offer a hook which is run on any possible system event where you might possibly have new indexable content added - to do that, it would not only have to duplicate hook_cron() as it currently does, but also duplicate a few dozen other hooks as well. If you can explain another way it could offer such a service that'd be great, but I don't see it.
We have two modules here - either we put assumptions about search module (which is optional, and has a far less rich API than node does) inside node, or we put assumptions about node inside search.
To avoid either, you need a search_node_sql_index.module which implements both search and node hooks to maintain the {search_index} tables, and would do the trick here on hook_node_insert() - that's not going to get done in 10 days though.
/me fails to not post on this issue again :(
Comment #47
XanoI'm with catch and Damien that a small search_node_insert() outweighs the downsides in this case. There are plans in the works for Drupal 8 that will make this code (and a lot more) unnecessary for D8, but for now we'll at least have this feature/UX improvement.
Comment #48
kika CreditAttribution: kika commentedI'd strongly support getting this in. There is a well-documented usability problem: search is broken in early and most tender stage of people first experiencing Drupal. catch has given a thorough explanations why this change do not make things architecturally worse than it is now. We do not have time to rewrite search API for D7 in any case. So -- let's get this in, fix search API in proper way in D8.
Comment #50
catchHaven't looked at those exceptions here.
By the way, while everyone was picking on search_node_insert() being such an API horror, we forgot to mention http://api.drupal.org/api/function/search_node_update/7
Which is almost exactly the same anyway except for the 'instant' bit.
Comment #51
Bojhan CreditAttribution: Bojhan commentedMoving this to a bug-report, it looks to me to be a bug that you can't search content untill 3 hours after its created. Especially if you are running a news site or anything like that usecase, this could be considered a bug. Also looking at our coming freeze, as a sign that we should move this up - its a critical issue that hasn't gotten any activity.
I would love if we could get more feedback from the committers, it still seems somewhat silly to dismiss the small coupling over the large advantages of being able to search content after you created it and simplify the "create a menu item" administration workflow.
Comment #54
catchYes we're still stuck between #44 and #46, and no architectural changes to search module have happened in the interim. Tagging with 'needs committer feedback'. If the answer is that this is postponed until search module architecture allows it to be done without a module_exists() in node, or search module implementing a node hook, then this is D8 or D9.
Comment #55
pwolanin CreditAttribution: pwolanin commentedFor D7 we now allow the search module to disable any/all of the search implementations, so it seems like this should possibly live as a node hook in search module - otherwise node module would have to also look at the search settings.
Comment #56
catchIn that case the last patch on this issue, #25, still stands as the right approach, and just needs a re-roll.
Comment #57
pwolanin CreditAttribution: pwolanin commentedthe new function search_node_insert needs to have an additional condition like:
Comment #58
sun.core CreditAttribution: sun.core commentedThat content filter didn't make it, right?
Comment #59
Bojhan CreditAttribution: Bojhan commentedDidn't make it? It never got the work or feedback it needed to continue, so no this is not yet Drupal 8 given that finding content in the listings of content is a big issue.
Comment #60
catch#25 + #57
Comment #61
Dries CreditAttribution: Dries commentedI don't think this is critical (it is not really a regression), and actually suggest this is postponed to D8.
Comment #62
Bojhan CreditAttribution: Bojhan commented@Dries Please see, #51.
I really strongly disagree, that because the implementation of this is maybe not yet right. We should stall it to Drupal 8? From a user experience perspective this is a very critical bug. And I do not see any arguments to support otherwise.
The main reason we have brought search into Drupal, is because people use it a whole lot. If we dissallow them to search, the first 3 hours - the primary usecase (finding content after created) is not taken care of. Secondly, we still can't enable the search pattern on menu creation (which usability tests shown, people really need). So sorry, but this is really still a critical bug.
I am somewhat annoyed, that even though I have outlined this is a critical bug - and had a implementation direction for 6 months. The only commiter feedback is, lets not do it like this - with no direction how to it otherwise. If you say, we can't let this in - like this fine, but it is still a critical issue.
Comment #63
sun.core CreditAttribution: sun.core commentedNot critical. Please read and understand Priority levels of Issues, thanks.
Comment #64
Dries CreditAttribution: Dries commentedBojhan, sorry, it is not critical. Critical bugs are those that prevent Drupal from working.
Comment #65
Bojhan CreditAttribution: Bojhan commented@Dries still can you atleast outline something, that can get this issue further? I am lost on how to proceed on this, and I don't understand why there is no attention to this.
Comment #66
jhodgdonBojhan: I don't think I agree with #51. A site that is updating its content all the time could be running cron all the time, if it wants to get things indexed on a timely basis. Right at the top (at least if you have the help module turned on):
Maybe there should be some more explanation, but I don't really think that indexing content at create time is necessary.
Aside from that question, the implementation is very flawed. Not all nodes should be indexed -- for instance, unpublished ones should not be indexed. So just deciding that whenever a node is inserted it should be indexed is not a good idea. Also, if a node is updated, the index needs to be updated -- doing it at insert time only is not a good idea.
Comment #67
jhodgdonAs I recently mentioned to Bojhan in IRC: I would support updating the help/UI text on the Search settings page to explain what's going on in D7, and fixing this for D8.
See also #715296: cron help/ui text needs work
Comment #68
David_Rothstein CreditAttribution: David_Rothstein commentedAlthough strictly speaking this might not be a critical issue, it sure does make Drupal look pretty broken out of the box. The search module is in the standard install profile now, yet as soon as you try to use it, it doesn't actually work.
If we want to not release something that makes us look incompetent, we should be fixing this bug in Drupal 7.
***
Why shouldn't unpublished nodes be indexed?
I agree. I think we should fix this all the way - otherwise we risk adding as much confusion as we remove.
***
In #35, @webchick suggested this:
The attached patch implements that suggestion. So basically, for a site using the poormanscron feature, all we do is manipulate the times that cron actually runs so that it indexes nodes right away, rather than e.g. three hours from now. For sites not using the poormanscron feature, we do nothing - i.e., the behavior is the same as Drupal 6 (they are responsible for running their own cron as often as they want the content indexed).
Note that the attached patch also contains some quick hacks for other bugs that were necessary to get this feature working. Obviously, we would want to get those issues fixed for real before finishing this.
Personally, I'm not so convinced that this solution is better than the one in #60. However, it does meet the goal of not introducing any coupling between the node and search modules. Mostly I'm just posting this to try to move the issue along :)
Comment #69
catchIf you force a cron run every time content is created (and edited if it goes that way), then all the other expensive stuff, like cache clearing and the rest, is also going to happen, rather than just the search indexing. That makes the performance concerns put earlier on this issue even more to the forefront, and means that if you have automated cron running, and also a cron job set up to run every 30 minutes, that you end up running cron far, far more than needed.
@jhogdon, the original purpose of this issue was to support search-based content filtering on /admin as well as the default search block. That seems very unlikely for D7, but if we ever want to do that, we need to support unpublished nodes as well as published in the search index.
Personally I think we should stick with the original goal here, which is just to get /something/ in the search index when people install Drupal, post content, then try the search block. I'm pretty sure there's a contrib module which handles updating the search index whenever stuff is posted, for people who want that.
Comment #70
jhodgdonMy mistake on not indexing unpublished content. I thought node_update_index() already did that, but it does not -- there is no WHERE in the query on any criterion in the node table:
So, back to the implementation in #60. My opinions:
- Should run on update as well as insert.
- Generally, we don't want to increase the amount of coupling between node and search, so all node-specific stuff should be in node.module rather than search.*
-- The setting for yes/no to index immediately should be in http://api.drupal.org/api/function/hook_search_admin/7 not directly in search.admin.inc
-- The hooks should be in the node module rather than in the search module.
- The hooks need to check whether (a) search is enabled (b) node is an enabled search module, before deciding it's OK to index the node. See search_cron():
Comment #71
David_Rothstein CreditAttribution: David_Rothstein commentedWell, for cache clearing at least I don't think it makes much of a difference, since node_save() already clears the page and block caches anyway, which I believe are the only ones (at least in core) that use CACHE_TEMPORARY... So the extra cache clearing calls shouldn't really do anything.
Overall, though, I'm personally pretty happy with the general approach in #60 plus some of the changes from #70, as a targeted fix for this issue.
Comment #72
catchI still disagree with this, that just increases the coupling between search and node then, while node is a required module, and search isn't.
A likely first step to cleaning up this coupling in D8 is copying all the search_node_*() implementations, and node_search_*() implementations to a search_node_sql.module. That's a lot easier when you copy and paste a hook implementation and rename it, rather than grepping for module_exists('search'). See #46.
Comment #73
Bojhan CreditAttribution: Bojhan commentedOk, can someone reroll the patch from #60 with the fixes of #70?
Comment #74
jhodgdonRE #72: Yes, we need to decouple node/user from search better in D8 -- there is an issue for that:
#237748: Decouple core search module implementations from node and user module (and turn search/node and search/user to views).
However, that doesn't mean we should add node hooks to search.module for D7, IMO.
Comment #75
Bojhan CreditAttribution: Bojhan commentedGuess searching content still isn't critical enough for the Drupal CMS, pushing to 8
Comment #76
jhodgdonLet's leave this to D7 until we decide it's not going into D7. What makes you think it can't happen? We have all sorts of things going on in search.module lately -- patching anyway, though I can't say the patches are getting committed. Sigh. Anyway, I wouldn't say this couldn't happen for D7.
Comment #77
Bojhan CreditAttribution: Bojhan commentedI did it because Dries said so in : http://drupal.org/node/504012#comment-2545554
Although to me this is as critical as it gets, its clear Dries disagrees and I think we thus should just postpone to Drupal 8.
Comment #78
gpk CreditAttribution: gpk commentedFor almost a year on a D6 site I've been using a tiny custom module that implements an action "index post" (see #216508-8: run cron only on new or updated content rather than view). Works very nicely. I think this also addresses the coupling issue? It could be enabled by default for the relevant triggers on installation of search.module. Also it is potentially very configurable/can be disabled by the site admin etc. if they prefer to do it by cron.
[update] I guess the action would live in node.module, and could be enabled by the install profile (rather than when search.module is installed).
Comment #79
catchI don't think this will get into D7 without conifiguration, moving it back to D8.
Comment #80
Damien Tournoud CreditAttribution: Damien Tournoud commentedOnly feature requests and tasks can be assigned to D8 for now, because the Drupal 8 tree is not opened yet.
Fortunately, this is a feature request :)
Comment #81
jhodgdonSince 8.0.x-beta1 has been released, our policy at this point is No feature requests until 8.1.x. See #2350615: [policy, no patch] What changes can be accepted during the Drupal 8 beta phase?. Sorry, it's just too late for 8.0.x at this point, so even if we had a viable patch, the core committers would not commit it. So unless we decide this is a Task or a Bug (and I don't think it is), we'll have to delay it.
Comment #84
xjmNo longer needs committer feedback, since semver is clear on where this can be added.
Comment #85
xjmNo longer needs committer feedback; this is a minor release type of change.
Comment #99
catchThirteen years later there might be a way to do this in a way that the search index would be immediately indexed on sites with automated cron, while faster for sites running queues on a different schedule to cron too, but it'll need #1189464: Add an 'instant' queue runner first. Postponing on that.