The search settings page currently gives a textual status of how many items are left to index. This patch keeps the textual status, but adds a progress bar for a better visual.

I made several UI changes in an attempt to make the buttons and text clearer and more accurate.

The Re-index button on the search settings page is confusing because the button doesn't actually perform a re-index. This button clears the index (sort-of), and doesn't do any interactive re-indexing, but rather it clears the flags to that cron will re-index everything. I've tried to cleanup this language.

We are missing a true re-index. This has been available in the reindex.module for a couple years, but only a handful (maybe fewer) people know about it. I've included that code here. This patch adds true search index building using the batchapi. While there's not much too it, it was put into a separate include to minimize it's impact.

@TODO:

  1. The batchapi is still lacking "Cancel" functionality. I've attempted to add a primitive "Cancel" link, but this link causes errors.
  2. The SQL needs to be converted to DB TNG
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

douggreen’s picture

FileSize
13.29 KB

Updated version fixes the #description help text, changes some text, and turns the progress bar from an "active" bar to an "inactive" bar.

douggreen’s picture

FileSize
12.29 KB

The previous updates broke the progress bar ... this fixes it. One side effect of this patch is that theme('progress') now has an optional third argument, which is the class. If this class is set to 'inactive', then the progress bar gif is not shown. I don't really like the way this is implemented, because you have to know to use the class 'inactive' but maybe as long as it's document, it's fine.

Status: Needs review » Needs work

The last submitted patch failed testing.

douggreen’s picture

Status: Needs work » Needs review
FileSize
13.08 KB

Testing failures were due to theme_progress_bar's new argument not having a default value... corrected in attached patch.

douggreen’s picture

Assigned: Unassigned » douggreen
Dries’s picture

I always thought it was a feature that the re-indexing happens using cron so I'm not 100% convinced that using the batch API is an improvement. Can't we simply improve people's expectations by changing the name of the label, or adding some extra help text. It seems like that could get the job done with a lot less code.

agentrickard’s picture

@Dries

Doug and I were discussing this during DC Paris, and the Batch API option is very desirable for developers and power users, especially during site building and testing. It may be that this needs to live in contrib, but we'd like to examine it for core.

/me goes to review the actual patch now.

douggreen’s picture

FileSize
13.52 KB

@dries, yes we could just cleanup the UI, and then leave the batchapi in a contrib module. The contrib module to do this has existed for a couple years and nobody seems to know about it, so that's another problem. We are adding some code to handle the batchapi stuff, but ALL of that code is in a separate include that is only loaded when it is needed, but it's currently 121 lines.

Attached is an updated patch that adds more granular messages based on if all, none, or some things need to be indexed. It also hides the one or the other button when it doesn't make sense. Lastly, I had some help with the messages, and tried to make them more positive, than negative, in tone.

douggreen’s picture

douggreen’s picture

FileSize
15.03 KB

I've added a hook_requirements for the status page, based on #394102: Make search module usable. I also spent some time with Josh working on the messaging and workflow. We may want to break this up into several patches (usability, batch processing, and hook_requirements), but for now, it's one big patch.

douggreen’s picture

FileSize
85.42 KB
93.67 KB
110.06 KB
243.48 KB
231.39 KB

And some screen snapshots ...

douggreen’s picture

FileSize
117.57 KB
237.66 KB
240 KB

Adding pictures of the status page, Bojhan already tells me to remove the progress bar from the status page.

douggreen’s picture

FileSize
239.46 KB
246.29 KB
205.76 KB
218.21 KB
15.52 KB

Attached patch hopefully has better messaging, as worked out with Bohjan. It also removes the progress bar from the status page. New screenshots also attached.

Bojhan’s picture

I sat beside douggreen at Drupalcon to discuss these text. The confirmation messages sound all right to me, obviously the title is somewhat wrong - but I want to fix that in a later patch for all confirmation messages.

I think we don't want to use status bar messages in the status report, since it draws to much attention. The text as displayed in the last images for the status report seem oke to me, clear and concise.

Status: Needs review » Needs work

The last submitted patch failed testing.

douggreen’s picture

FileSize
15.52 KB

There was another commit in system.css that through off the lines here... New patch attached.

agentrickard’s picture

FileSize
44.79 KB

Hm. After I installed the patch and ran a batch update, here is the screen I get. I wonder why the index count is off?

UPDATE: there are 52 items in {search_dataset} but the progress bar says 25. Looking into the code.

agentrickard’s picture

FileSize
18.1 KB

OK, some of the nodes were set to reindex (or not reset, during the batch process), but I have no idea why. Investigating some more

agentrickard’s picture

And while I have 52 nodes and 52 users, and both search modules are set to index, the search progress indicates I only have 52 items and {search_dataset} only has 52 records, all nodes.

douggreen’s picture

Thanks Ken! I saw the same problem this morning, and forgot about. The problem is that search is noticing internal links on the node to itself, via the "Add new comment" link. I've created a separate issue for this, see #569536: Search links cause many nodes to be indexed twice.

douggreen’s picture

Status: Needs work » Needs review
FileSize
15.1 KB

Attaching a couple minor code cleanups: changed text to remove the word "here" in one place, removed unused code stop the batchapi.

yched’s picture

+++ modules/search/search.batch.inc	6 Sep 2009 00:56:05 -0000
@@ -0,0 +1,126 @@
+    $results = db_query("SELECT n.nid FROM {node} n LEFT JOIN {search_dataset} d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC");
+    $nids = array();
+    $limit = (int) variable_get('search_cron_limit', 100);
+    while ($node = db_fetch_object($results)) {
+      $nids[] = $node->nid;
+      if (count($nids) >= $limit) {
+        $operations[] = array('search_batch_index', array($nids));
+        $nids = array();
+      }
+    }

I don't get that part.
Why don't we just have search_batch_index() as one single multipass batch operation that selects the next N nodes to reindex, treats them and returns ?
That would be preferable to adding a potentially large number of batch operations each with a potentially large array of nids. Heavy $batch structures are slower to process.

I'm on crack. Are you, too?

douggreen’s picture

@yched, you're the batch expert! I was concerned that if we just said, "go do the next 100 nodes," that more nodes would be created and/or changed during this time, and that the progress bar would be messed up. I wrote this code two years ago in Barcelona (with your help) when batch api was new. How do we do "have search_batch_index() as one single multipass batch operation"?

yched’s picture

doug: you can look at node_access_rebuild() and _node_access_rebuild_batch_operation() for a single-multipass-op example:
- count the 'total' number of nodes the first time the op is called
- on each pass, grab the next N nodes using a query with a range and an 'order by'.

About nodes being added/updated during the processing: the most important thing is to be sure that you don't report '#finished = 1' before you are sure that all nodes have been processed. When you have treated a number of node equal to 'total', you'll need to check whether there are new ones to process. Thus, the order used in the query that retreives the next N nodes should ensure that such nodes come last.
This might cause the progress bar to jump back a little as you found out 'total' is not your initial estimate, but that's not really an issue IMO.

Then again, those nodes that appear during the processing would be indexed during the next cron, right? Is it that bad if the batch reindex skips them ? They were not here when we started...

Side note: you might want to ensure no cron reindex happens while the batch reindex is running. Is that handled already ? If not, this sounds like a use case for http://api.drupal.org/api/group/lock/7

douggreen’s picture

FileSize
15.05 KB

I've updated the patch with @yched's batchapi improvements. I still don't like the "cancel" handling. And I wish we could show a progress description using actual node counts instead of "0 of 1".

yched’s picture

Great ! Additional remarks:

- Please use $context['sandbox'] instead of $context['results'] to store processing data between iterations.
$context['sandbox'] persists only during the iterations of a given multipass batch op.
$context['results'] persists through all operations of the batch, and is used to present 'results' (whatever that might mean for the actual processing being batched) at the end of the processing. In this case, this only applies to the 'Finished indexing @total nodes.' message.

- No biggie, but you don't really *need* to use variable_get('search_cron_limit') here. search_cron_limit makes sense for cron, not really for direct 'under your eyes' rebuild. You can chose an arbitrary value like N = 10, 50 or 100. The function indexes N nodes, and will be called repeatedly by batch API until we exceed 1sec processing time, then we report progress and wait for the next HTTP request. Using huge values like 500 (possible value for search_cron_limit) won't really get you done faster, only report progress less frequently to the users. Since search admin proposes values as low as 10 (10, 20, 50, 100, 200, 500), I'd suggest using 10.

- May I suggest _search_index_batch_operation() for the name of the function ? Stays 'consistent' with _node_access_rebuild_batch_operation()

- The code that loads, builds, renders and indexes the node would be best separated in a helper function and reused with 'regular', cron-based reindex. Or there's 99% chances a later change in the one won't be reported in the other.

Note that I'm strictly helping the batch-related code, not reviewing the whole patch or feature.

douggreen’s picture

Thanks @yched! If I changed from $context['results'] to $context['sandbox'], how do I reference my variables in the finished function?

yched’s picture

Forgot to add : are you sure the query that grabs the next N nodes in _search_batch_index() won't retrieve always the same nodes ?
Typically, we store the nid of the last node treated in the sandbox, and use a WHERE node.nid > $context['sandbox']['last_nid'] SORT BY node.nid in the request that grabs the next N.

If I changed from $context['results'] to $context['sandbox'], how do I reference my variables in the finished function?

If you want to display the number of indexed nodes at the end, you'd store that one in $context['results']['indexed_nodes'].

Dries’s picture

In the screenshots, a progress bar is shown on the confirmation page. That's confusing me. Is it running? I recommend that we get rid of it, or that we find something better because it is a very unusual and confusing UI pattern to show a non-working progress bar on a confirmation screen.

douggreen’s picture

@dries, We removed the progress bar on the configuration change some time ago based on Bojhan's recommendation. In comment 13 I uploaded new screenshots, but didn't upload one for the config page.

douggreen’s picture

FileSize
10.81 KB

Updated version includes:

  • separates $context['results'] from $context['sandbox']
  • changes the 'progress_message' within the operation, because I thought the default batchapi message of "0 or 1" was unhelpful. You'll now get an actual progress count of the items being indexed.
  • removes use of limit, and just uses 100
  • replaces use of node_load() et.al. with _node_index_node() ... which was separated out in 6.x for just this purpose (thanks @yched, I forgot about that)
  • changed name of _search_batch_index() to _search_index_batch_operations() for consistency
  • changed progress bar to show 100% instead of 0% when 0 of 0 items are indexed

I still don't like the cancel operation ... but I don't think anything better can be implemented with the existing batchapi, and I think that a cancel is important.

I don't like the extra code to separate $context['results'] and $context['sandbox']

yched’s picture

Status: Needs review » Needs work

Patch misses search.batch.inc ?

douggreen’s picture

FileSize
14.67 KB

Grrr :( I already cleared my directory to work on another patch. So here's an updated re-creation. In this version I changed one more thing, which is that I dropped the message in the finished function that needed to know the count from $context['sandbox'], so as to simplify the $context['sandbox'] verse $context['results'] code.

In your review of batchapi, I'm concerned about two things that are a little funky:

  • use of Cancel link in progress_message
  • updating progress message with the search index counts instead of the batch counts, in the operations, using the current_set

Also, please review the use of t() and l() on line 82 of search.batch.inc, and comment on whether this is the best practice way of creating this message. Thanks!

douggreen’s picture

Issue tags: +Needs usability review

I'm adding the tag "Needs Usability Review," although Bohjan has already taken one look at this, to see if we can get a "Usability Team Approved" tag.

douggreen’s picture

Status: Needs work » Needs review

And I forgot to change the status back to 'needs review'.

yched’s picture

- 'Completed @current of @total' message:
A bit ugly ;-).
The displayed message is the concatenation of 1) the generic $batch['progress_message'] 2) the specific message set by the last batch op in $context['message'].
I'd suggest you set $batch['progress_message'] = ''; in search_batch_index_confirm_submit(), and set $context['message'] = t('Completed @current of @total.', ...); in _search_batch_index().

- A generic 'cancel' feature for batch API is a little tricky. *If* in the case of nodes indexing you can safely just break out without any cleanup work (dunno for sure, I leave it up to you to confirm that), then your approach sounds like it should work.

- I still have one concern voiced in #28: It looks like the query $result = db_query_range("SELECT DISTINCT n.nid FROM {node} n LEFT JOIN {search_dataset} d ON d.type = 'node' AND d.sid = n.nid WHERE d.sid IS NULL OR d.reindex <> 0 ORDER BY d.reindex ASC, n.nid ASC", array(), 0, 100); will keep retrieveing the 100 same nodes over and over... See #28 for the usual approach.

douggreen’s picture

FileSize
15.34 KB
1.25 MB
  • I don't know what's ugly about the message. Are we looking at the same thing? I've attached a screen snapshot.
  • Cancel does need some some "finishing", Thanks for catching this :) so I changed the "Cancel" to go to a page that does the cleanup. We're still stuck with an ugly interim message.
  • The query is OK, it does NOT retrieve the same nodes because _node_index_node() calls search_index() which calls search_touch_node() to clear the reindex flag. This is the same query and logic we use on search_cron().
yched’s picture

- "I don't know what's ugly about the message": Yes, my previous comment was not clear. I was talking about the code:

+  // Update progress message.
+  require_once DRUPAL_ROOT . '/includes/batch.inc';
+  $current_set = &_batch_current_set();
+  $current_set['progress_message'] = t('Completed @current of @total.', array('@current' => $context['sandbox']['current'], '@total' => $context['sandbox']['total'])) . ' '. l(t('Cancel'), 'admin/config/search/settings/cancel');

Use $context['message'] for this, instead of hacking the batch structure. Set $batch['progress_message'] = ''; when defining the batch in search_batch_index_confirm_submit().

- About cancel: the tricky thing is that your search_batch_index_cancel() callback will be executed while the server is still executing the previous batch HTTP request in a different thread. I'm not sure what race conditions can occur there, running _search_batch_update_totals() while some nodes are still being indexed. That's precisely why a 'Cancel' feature for batch API is not easy ;-)

- OK fo the query not returning the same 100 nodes each time. Could use a comment IMO.

douggreen’s picture

FileSize
16.99 KB
  • Yes, that code is cleaner. I've incorporated it. I've also turned the "cancel" link to a button.
  • Yes, we do need to handle concurrency. I've added locks around the indexes and the cancel.
  • I added a comment to the SELECT.

Status: Needs review » Needs work

The last submitted patch failed testing.

douggreen’s picture

Status: Needs work » Needs review
FileSize
18.17 KB

Rerolled, included #581594: minor db_query() fix to search for DB TNG because you can't test without it, once it that patch gets committed, I'll remove it from this one.

douggreen’s picture

FileSize
16.99 KB

Rerolled, now that #581594: minor db_query() fix to search for DB TNG was committed.

douggreen’s picture

FileSize
84.15 KB

And while I've worked hard on this patch, I think we're getting close to the time to make a decision. We can NOT commit this as-is. We still need to fix what happens when you hit "cancel". If this can't be fixed, then we need to strip out all of the batch work, and just get the Usability improvements into 7.x. Can anyone help with this error message? See attached.

yched’s picture

@douggreen:
I'm on vacation and only occasionally getting net access, so I won't be able to help much :-(
Fixing the 'error' on cancel will probably require a tweak around progress.js, so that the ajax error callback is disabled when the user clicks cancel.

Status: Needs review » Needs work

The last submitted patch failed testing.

douggreen’s picture

Status: Needs work » Needs review
FileSize
17.13 KB

re-rolled to current head

Next I'm going to create a patch that just has the UI improvements and doesn't do the batch changes. I think that we really need to get batchapi pause/resume/cancel working before it's added here. And that's beyond the scope of this patch, and will need to wait until Drupal 8 :(

douggreen’s picture

Title: UI cleanup of status and re-index button, adds real "Build" button » UI cleanup of status and re-index button
FileSize
10.12 KB

I'd rather get these UI improvements in 7 than get no improvements in 7. But as I said above, I just don't think that batchapi without a cancel button is sufficient to use here. And I'm now fairly disappointed with how little this patch adds.

The only thing I can think to make this better, is to add a link to "run cron manually", or even make that a button ... but I have two problems with this: (1) I believe that the word "cron" is still confusing to most users and would prefer to get away from it, (2) if we just redirect to admin/reports/status/cron-run, then after the cron finishes, we've moved from admin/config/search/settings to admin/reports/status, and this workflow seems poor to me ... if we do this, we should stay on the search settings page, and that appears to involve some extra work.

douggreen’s picture

FileSize
10.11 KB

I missed one change in search.install needed after stripping out the batchapi changes...

Reviewers please look at the new language, especially the messages in _search_get_status_metrics() labeled 'zero', 'none', 'partial', and 'all'.

Status: Needs review » Needs work

The last submitted patch failed testing.

jhodgdon’s picture

Should this be bumped up to Drupal 8 at this point?

jhodgdon’s picture

Version: 7.x-dev » 8.x-dev
Bojhan’s picture

Assigned: douggreen » Unassigned
Issue tags: -Needs usability review +Usability

This improvement is still awesome to happen.

jhodgdon’s picture

Status: Needs work » Closed (duplicate)

Obviously this patch from 2009 cannot be used today...

We have a separate issue about the UI of the settings page that is close to done:
#1559244: Clean up search settings page
It has similar aims to what is here, although it doesn't do a graphical progress bar (which is probably not really necessary -- we mostly I think use progress bars for dynamic things, and this is not dynamic).

Also, the confusing button text was fixed a long time ago, and it now says "Re-index site" and explains reasonably well what will happen.

There is also a separate issue that would show the status on the Status Report page (as is done in this patch):
#1288442: Add search index status to the Status Report page

So at this point I think I'll close this issue as a duplicate.