The search settings page currently gives a textual status of how many items are left to index. This patch keeps the textual status, but adds a progress bar for a better visual.
I made several UI changes in an attempt to make the buttons and text clearer and more accurate.
The Re-index button on the search settings page is confusing because the button doesn't actually perform a re-index. This button clears the index (sort-of), and doesn't do any interactive re-indexing, but rather it clears the flags to that cron will re-index everything. I've tried to cleanup this language.
We are missing a true re-index. This has been available in the reindex.module for a couple years, but only a handful (maybe fewer) people know about it. I've included that code here. This patch adds true search index building using the batchapi. While there's not much too it, it was put into a separate include to minimize it's impact.
@TODO:
- The batchapi is still lacking "Cancel" functionality. I've attempted to add a primitive "Cancel" link, but this link causes errors.
- The SQL needs to be converted to DB TNG
Comment | File | Size | Author |
---|---|---|---|
#48 | 568350.patch | 10.11 KB | douggreen |
#47 | 568350.patch | 10.12 KB | douggreen |
#46 | 568350.patch | 17.13 KB | douggreen |
#43 | Cancel.png | 84.15 KB | douggreen |
#42 | 568350.patch | 16.99 KB | douggreen |
Comments
Comment #1
douggreen CreditAttribution: douggreen commentedUpdated version fixes the #description help text, changes some text, and turns the progress bar from an "active" bar to an "inactive" bar.
Comment #2
douggreen CreditAttribution: douggreen commentedThe previous updates broke the progress bar ... this fixes it. One side effect of this patch is that theme('progress') now has an optional third argument, which is the class. If this class is set to 'inactive', then the progress bar gif is not shown. I don't really like the way this is implemented, because you have to know to use the class 'inactive' but maybe as long as it's document, it's fine.
Comment #4
douggreen CreditAttribution: douggreen commentedTesting failures were due to theme_progress_bar's new argument not having a default value... corrected in attached patch.
Comment #5
douggreen CreditAttribution: douggreen commentedComment #6
Dries CreditAttribution: Dries commentedI always thought it was a feature that the re-indexing happens using cron so I'm not 100% convinced that using the batch API is an improvement. Can't we simply improve people's expectations by changing the name of the label, or adding some extra help text. It seems like that could get the job done with a lot less code.
Comment #7
agentrickard@Dries
Doug and I were discussing this during DC Paris, and the Batch API option is very desirable for developers and power users, especially during site building and testing. It may be that this needs to live in contrib, but we'd like to examine it for core.
/me goes to review the actual patch now.
Comment #8
douggreen CreditAttribution: douggreen commented@dries, yes we could just cleanup the UI, and then leave the batchapi in a contrib module. The contrib module to do this has existed for a couple years and nobody seems to know about it, so that's another problem. We are adding some code to handle the batchapi stuff, but ALL of that code is in a separate include that is only loaded when it is needed, but it's currently 121 lines.
Attached is an updated patch that adds more granular messages based on if all, none, or some things need to be indexed. It also hides the one or the other button when it doesn't make sense. Lastly, I had some help with the messages, and tried to make them more positive, than negative, in tone.
Comment #9
douggreen CreditAttribution: douggreen commentedSee also #394102: Make search module usable
Comment #10
douggreen CreditAttribution: douggreen commentedI've added a hook_requirements for the status page, based on #394102: Make search module usable. I also spent some time with Josh working on the messaging and workflow. We may want to break this up into several patches (usability, batch processing, and hook_requirements), but for now, it's one big patch.
Comment #11
douggreen CreditAttribution: douggreen commentedAnd some screen snapshots ...
Comment #12
douggreen CreditAttribution: douggreen commentedAdding pictures of the status page, Bojhan already tells me to remove the progress bar from the status page.
Comment #13
douggreen CreditAttribution: douggreen commentedAttached patch hopefully has better messaging, as worked out with Bohjan. It also removes the progress bar from the status page. New screenshots also attached.
Comment #14
Bojhan CreditAttribution: Bojhan commentedI sat beside douggreen at Drupalcon to discuss these text. The confirmation messages sound all right to me, obviously the title is somewhat wrong - but I want to fix that in a later patch for all confirmation messages.
I think we don't want to use status bar messages in the status report, since it draws to much attention. The text as displayed in the last images for the status report seem oke to me, clear and concise.
Comment #16
douggreen CreditAttribution: douggreen commentedThere was another commit in system.css that through off the lines here... New patch attached.
Comment #17
agentrickardHm. After I installed the patch and ran a batch update, here is the screen I get. I wonder why the index count is off?
UPDATE: there are 52 items in {search_dataset} but the progress bar says 25. Looking into the code.
Comment #18
agentrickardOK, some of the nodes were set to reindex (or not reset, during the batch process), but I have no idea why. Investigating some more
Comment #19
agentrickardAnd while I have 52 nodes and 52 users, and both search modules are set to index, the search progress indicates I only have 52 items and {search_dataset} only has 52 records, all nodes.
Comment #20
douggreen CreditAttribution: douggreen commentedThanks Ken! I saw the same problem this morning, and forgot about. The problem is that search is noticing internal links on the node to itself, via the "Add new comment" link. I've created a separate issue for this, see #569536: Search links cause many nodes to be indexed twice.
Comment #21
douggreen CreditAttribution: douggreen commentedAttaching a couple minor code cleanups: changed text to remove the word "here" in one place, removed unused code stop the batchapi.
Comment #22
yched CreditAttribution: yched commentedI don't get that part.
Why don't we just have search_batch_index() as one single multipass batch operation that selects the next N nodes to reindex, treats them and returns ?
That would be preferable to adding a potentially large number of batch operations each with a potentially large array of nids. Heavy $batch structures are slower to process.
I'm on crack. Are you, too?
Comment #23
douggreen CreditAttribution: douggreen commented@yched, you're the batch expert! I was concerned that if we just said, "go do the next 100 nodes," that more nodes would be created and/or changed during this time, and that the progress bar would be messed up. I wrote this code two years ago in Barcelona (with your help) when batch api was new. How do we do "have search_batch_index() as one single multipass batch operation"?
Comment #24
yched CreditAttribution: yched commenteddoug: you can look at node_access_rebuild() and _node_access_rebuild_batch_operation() for a single-multipass-op example:
- count the 'total' number of nodes the first time the op is called
- on each pass, grab the next N nodes using a query with a range and an 'order by'.
About nodes being added/updated during the processing: the most important thing is to be sure that you don't report '#finished = 1' before you are sure that all nodes have been processed. When you have treated a number of node equal to 'total', you'll need to check whether there are new ones to process. Thus, the order used in the query that retreives the next N nodes should ensure that such nodes come last.
This might cause the progress bar to jump back a little as you found out 'total' is not your initial estimate, but that's not really an issue IMO.
Then again, those nodes that appear during the processing would be indexed during the next cron, right? Is it that bad if the batch reindex skips them ? They were not here when we started...
Side note: you might want to ensure no cron reindex happens while the batch reindex is running. Is that handled already ? If not, this sounds like a use case for http://api.drupal.org/api/group/lock/7
Comment #25
douggreen CreditAttribution: douggreen commentedI've updated the patch with @yched's batchapi improvements. I still don't like the "cancel" handling. And I wish we could show a progress description using actual node counts instead of "0 of 1".
Comment #26
yched CreditAttribution: yched commentedGreat ! Additional remarks:
- Please use $context['sandbox'] instead of $context['results'] to store processing data between iterations.
$context['sandbox'] persists only during the iterations of a given multipass batch op.
$context['results'] persists through all operations of the batch, and is used to present 'results' (whatever that might mean for the actual processing being batched) at the end of the processing. In this case, this only applies to the 'Finished indexing @total nodes.' message.
- No biggie, but you don't really *need* to use variable_get('search_cron_limit') here. search_cron_limit makes sense for cron, not really for direct 'under your eyes' rebuild. You can chose an arbitrary value like N = 10, 50 or 100. The function indexes N nodes, and will be called repeatedly by batch API until we exceed 1sec processing time, then we report progress and wait for the next HTTP request. Using huge values like 500 (possible value for search_cron_limit) won't really get you done faster, only report progress less frequently to the users. Since search admin proposes values as low as 10 (10, 20, 50, 100, 200, 500), I'd suggest using 10.
- May I suggest _search_index_batch_operation() for the name of the function ? Stays 'consistent' with _node_access_rebuild_batch_operation()
- The code that loads, builds, renders and indexes the node would be best separated in a helper function and reused with 'regular', cron-based reindex. Or there's 99% chances a later change in the one won't be reported in the other.
Note that I'm strictly helping the batch-related code, not reviewing the whole patch or feature.
Comment #27
douggreen CreditAttribution: douggreen commentedThanks @yched! If I changed from $context['results'] to $context['sandbox'], how do I reference my variables in the finished function?
Comment #28
yched CreditAttribution: yched commentedForgot to add : are you sure the query that grabs the next N nodes in _search_batch_index() won't retrieve always the same nodes ?
Typically, we store the nid of the last node treated in the sandbox, and use a WHERE node.nid > $context['sandbox']['last_nid'] SORT BY node.nid in the request that grabs the next N.
If you want to display the number of indexed nodes at the end, you'd store that one in $context['results']['indexed_nodes'].
Comment #29
Dries CreditAttribution: Dries commentedIn the screenshots, a progress bar is shown on the confirmation page. That's confusing me. Is it running? I recommend that we get rid of it, or that we find something better because it is a very unusual and confusing UI pattern to show a non-working progress bar on a confirmation screen.
Comment #30
douggreen CreditAttribution: douggreen commented@dries, We removed the progress bar on the configuration change some time ago based on Bojhan's recommendation. In comment 13 I uploaded new screenshots, but didn't upload one for the config page.
Comment #31
douggreen CreditAttribution: douggreen commentedUpdated version includes:
I still don't like the cancel operation ... but I don't think anything better can be implemented with the existing batchapi, and I think that a cancel is important.
I don't like the extra code to separate $context['results'] and $context['sandbox']
Comment #32
yched CreditAttribution: yched commentedPatch misses search.batch.inc ?
Comment #33
douggreen CreditAttribution: douggreen commentedGrrr :( I already cleared my directory to work on another patch. So here's an updated re-creation. In this version I changed one more thing, which is that I dropped the message in the finished function that needed to know the count from $context['sandbox'], so as to simplify the $context['sandbox'] verse $context['results'] code.
In your review of batchapi, I'm concerned about two things that are a little funky:
Also, please review the use of t() and l() on line 82 of search.batch.inc, and comment on whether this is the best practice way of creating this message. Thanks!
Comment #34
douggreen CreditAttribution: douggreen commentedI'm adding the tag "Needs Usability Review," although Bohjan has already taken one look at this, to see if we can get a "Usability Team Approved" tag.
Comment #35
douggreen CreditAttribution: douggreen commentedAnd I forgot to change the status back to 'needs review'.
Comment #36
yched CreditAttribution: yched commented- 'Completed @current of @total' message:
A bit ugly ;-).
The displayed message is the concatenation of 1) the generic $batch['progress_message'] 2) the specific message set by the last batch op in $context['message'].
I'd suggest you set
$batch['progress_message'] = '';
in search_batch_index_confirm_submit(), and set$context['message'] = t('Completed @current of @total.', ...);
in _search_batch_index().- A generic 'cancel' feature for batch API is a little tricky. *If* in the case of nodes indexing you can safely just break out without any cleanup work (dunno for sure, I leave it up to you to confirm that), then your approach sounds like it should work.
- I still have one concern voiced in #28: It looks like the query
$result = db_query_range("SELECT DISTINCT n.nid FROM {node} n LEFT JOIN {search_dataset} d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC", array(), 0, 100);
will keep retrieveing the 100 same nodes over and over... See #28 for the usual approach.Comment #37
douggreen CreditAttribution: douggreen commentedComment #38
yched CreditAttribution: yched commented- "I don't know what's ugly about the message": Yes, my previous comment was not clear. I was talking about the code:
Use $context['message'] for this, instead of hacking the batch structure. Set
$batch['progress_message'] = '';
when defining the batch in search_batch_index_confirm_submit().- About cancel: the tricky thing is that your search_batch_index_cancel() callback will be executed while the server is still executing the previous batch HTTP request in a different thread. I'm not sure what race conditions can occur there, running _search_batch_update_totals() while some nodes are still being indexed. That's precisely why a 'Cancel' feature for batch API is not easy ;-)
- OK fo the query not returning the same 100 nodes each time. Could use a comment IMO.
Comment #39
douggreen CreditAttribution: douggreen commentedComment #41
douggreen CreditAttribution: douggreen commentedRerolled, included #581594: minor db_query() fix to search for DB TNG because you can't test without it, once it that patch gets committed, I'll remove it from this one.
Comment #42
douggreen CreditAttribution: douggreen commentedRerolled, now that #581594: minor db_query() fix to search for DB TNG was committed.
Comment #43
douggreen CreditAttribution: douggreen commentedAnd while I've worked hard on this patch, I think we're getting close to the time to make a decision. We can NOT commit this as-is. We still need to fix what happens when you hit "cancel". If this can't be fixed, then we need to strip out all of the batch work, and just get the Usability improvements into 7.x. Can anyone help with this error message? See attached.
Comment #44
yched CreditAttribution: yched commented@douggreen:
I'm on vacation and only occasionally getting net access, so I won't be able to help much :-(
Fixing the 'error' on cancel will probably require a tweak around progress.js, so that the ajax error callback is disabled when the user clicks cancel.
Comment #46
douggreen CreditAttribution: douggreen commentedre-rolled to current head
Next I'm going to create a patch that just has the UI improvements and doesn't do the batch changes. I think that we really need to get batchapi pause/resume/cancel working before it's added here. And that's beyond the scope of this patch, and will need to wait until Drupal 8 :(
Comment #47
douggreen CreditAttribution: douggreen commentedI'd rather get these UI improvements in 7 than get no improvements in 7. But as I said above, I just don't think that batchapi without a cancel button is sufficient to use here. And I'm now fairly disappointed with how little this patch adds.
The only thing I can think to make this better, is to add a link to "run cron manually", or even make that a button ... but I have two problems with this: (1) I believe that the word "cron" is still confusing to most users and would prefer to get away from it, (2) if we just redirect to admin/reports/status/cron-run, then after the cron finishes, we've moved from admin/config/search/settings to admin/reports/status, and this workflow seems poor to me ... if we do this, we should stay on the search settings page, and that appears to involve some extra work.
Comment #48
douggreen CreditAttribution: douggreen commentedI missed one change in search.install needed after stripping out the batchapi changes...
Reviewers please look at the new language, especially the messages in _search_get_status_metrics() labeled 'zero', 'none', 'partial', and 'all'.
Comment #50
jhodgdonShould this be bumped up to Drupal 8 at this point?
Comment #51
jhodgdonComment #52
Bojhan CreditAttribution: Bojhan commentedThis improvement is still awesome to happen.
Comment #53
jhodgdonObviously this patch from 2009 cannot be used today...
We have a separate issue about the UI of the settings page that is close to done:
#1559244: Clean up search settings page
It has similar aims to what is here, although it doesn't do a graphical progress bar (which is probably not really necessary -- we mostly I think use progress bars for dynamic things, and this is not dynamic).
Also, the confusing button text was fixed a long time ago, and it now says "Re-index site" and explains reasonably well what will happen.
There is also a separate issue that would show the status on the Status Report page (as is done in this patch):
#1288442: Add search index status to the Status Report page
So at this point I think I'll close this issue as a duplicate.