do_search refactoring: performance, legibility, move validation to validation/submit handlers of form. [#258998]

Comment	File	Size	Author
#33	do_search.patch	10.8 KB	robertdouglass
#32	do_search_no_test.patch	10.4 KB	robertdouglass
#31	do_search_tests_fail.patch	13.3 KB	robertdouglass
#26	do_search.patch	12.77 KB	robertdouglass
#25	do_search.patch	12.75 KB	robertdouglass
#24	do_search.patch	13.74 KB	robertdouglass
#23	258998.patch	12.61 KB	douggreen
#18	search.patch	12.4 KB	robertdouglass
#15	search.patch	10.27 KB	robertdouglass
#14	search.patch	10.24 KB	robertdouglass
#8	258998.patch	10.41 KB	douggreen
#5	do_search_refactor.patch	7.56 KB	robertdouglass
#3	do_search_refactor.patch	7.12 KB	robertdouglass
	do_search_refactor.patch	5.47 KB	robertdouglass

Comment #2

robertdouglass commented 15 May 2008 at 21:21

Status:

Needs review

» Needs work

Oops, looks like I lost part of the patch. Need to try and recover and merge.

Log in or register to post comments

Comment #3

robertdouglass commented 16 May 2008 at 09:58

Status	File	Size
new	do_search_refactor.patch	7.12 KB

This patch includes the changes to the search box form and the search form submit handler which are important for keeping do_search out of the input validation business.

Log in or register to post comments

Comment #4

robertdouglass commented 16 May 2008 at 09:58

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #5

robertdouglass commented 16 May 2008 at 10:49

Status	File	Size
new	do_search_refactor.patch	7.56 KB

Still managed to miss a bit. This version has it all.

Log in or register to post comments

Comment #6

douggreen commented 16 May 2008 at 14:56

Tracking...

I started to do some performance testing on this with the 100k node database, but I had problems with devel module not displaying the query results. I debugged this to the new registry, and devel_boot not getting registered or called.

Log in or register to post comments

Comment #7

douggreen commented 17 May 2008 at 02:06

I get the following warning:

notice: Undefined variable: keyword_arguments1 in ../drupal-7/modules/search/search.module on line 925

When I changed this from "keyword_arguments1" to "keyword_arguments" no results were returned.

I'm also getting warnings when submitting from the search block instead of the search form.

# notice: Undefined index: basic in ../drupal-7/modules/search/search.pages.inc on line 114.
# notice: Undefined index: keys in .../drupal-7/modules/search/search.pages.inc on line 114.
# warning: array_shift() [function.array-shift]: The argument should be an array in .../drupal-7/includes/form.inc on line 1308.

(btw, There's definitely something wrong with the new registry, but that's another issue. I was able to finagle it to register the devel functions to run these tests.)

Performance Tests

I tested search for "commoveo" on the 100k node database with mysql query cache enabled. All of the search rankings were set to 0, which is the same as keyword relevancy set to 5. I suspect that the performance gains would be greater had I used more search rankings.

The pre-patch count query is below. Because I have the mysql query cache enabled, this query is instantaneous when the page is refreshed, and on subsequent pages of the pager query.

pager_query 1243.67 SELECT COUNT(*) FROM (SELECT i.type, i.sid, (4840.64572508 * SUM(i.score * t.count)) AS score FROM search_index i INNER JOIN search_total t ON i.word = t.word INNER JOIN node n ON n.nid = i.sid WHERE n.status = 1 AND (i.word = 'commoveo') AND i.type = 'node' GROUP BY i.type, i.sid HAVING COUNT(*) >= 1) n1

I then RESET QUERY CACHE and applied the patch. The post-patch count queries are:

do_search	874.84	`SELECT COUNT() FROM (SELECT 1 FROM search_index i INNER JOIN node n ON n.nid = i.sid WHERE n.status = 1 AND (i.word = 'commoveo') AND i.type = 'node' GROUP BY i.type, i.sid HAVING COUNT() >= 1) n1`
pager_query	0.53	`SELECT 63733`

I ran each test (pre-patch and post-patch) three times, resetting the query cache each time, with similar results. So, I can confirm Robert's results that this patch results in a faster count query!

Results Testing

I confirmed that the page counts are the same pre-patch and post-patch.

Log in or register to post comments

Comment #8

douggreen commented 17 May 2008 at 03:01

Status:

Needs review

» Needs work

Status	File	Size
new	258998.patch	10.41 KB

I fixed the first error, relating to "keyword_arguments1". I continued the naming of the search_parse_query variables that Robert started, pushing the renaming into that function, and making it return a keyed array. I also modified the creation of the arguments arrays, using array_merge instead of looping through them. I ran simpletest, and the new patched code (as well as the old one) passes all tests.

Log in or register to post comments

Comment #9

Susurrus commented 6 July 2008 at 01:56

subscribe

Log in or register to post comments

Comment #10

moshe weitzman commented 9 July 2008 at 11:42

Great to see this moving along. serscribe

Log in or register to post comments

Comment #11

robertdouglass commented 9 July 2008 at 12:07

Status:

Needs work

» Needs review

I've tested this again and it still applies, still improves the queries and legibility of the search code. Please review #8.

Log in or register to post comments

Comment #12

robertdouglass commented 9 July 2008 at 12:10

I'm also getting warnings when submitting from the search block instead of the search form.

# notice: Undefined index: basic in ../drupal-7/modules/search/search.pages.inc on line 114.
# notice: Undefined index: keys in .../drupal-7/modules/search/search.pages.inc on line 114.
# warning: array_shift() [function.array-shift]: The argument should be an array in .../drupal-7/includes/form.inc on line 1308.

These warnings are dealt with here: http://drupal.org/node/280319

They're unrelated to this patch.

Log in or register to post comments

Comment #13

dries commented 10 July 2008 at 11:03

Status:

Needs review

» Needs work

This patch does not apply.

My only comment is that in the output, we use the term 'positive keyword'. Why don't we simply write 'keyword'? :)

Log in or register to post comments

Comment #14

robertdouglass commented 10 July 2008 at 11:26

Status:

Needs work

» Needs review

Status	File	Size
new	search.patch	10.24 KB

Rerolled to reflect http://drupal.org/cvs?commit=126469

Changed "positive keyword" to "keyword".

Ran tests, did some taste testing manually. Looks ok.

Log in or register to post comments

Comment #15

robertdouglass commented 10 July 2008 at 11:32

Status	File	Size
new	search.patch	10.27 KB

The word "positive" stayed in with that last one.

Log in or register to post comments

Comment #16

catch

he/him

English

commented 10 July 2008 at 11:39

edit: wrong tab.

Log in or register to post comments

Comment #17

Susurrus commented 10 July 2008 at 18:51

Status:

Needs review

» Needs work

The advanced search doesn't seem to work for me. I created a single article title "Blah" with no content. Ran cron. Searched for blah using the basic search and it found the article. Ran it again restricting the search to node types of article and it didn't find it.

This is the query it created "SELECT COUNT(*) FROM (SELECT 1 FROM search_index i INNER JOIN node n ON n.nid = i.sid WHERE n.status = 1 AND (n.type = 'blah' OR n.type = 'node') AND (i.word = '1') AND i.type = '' GROUP BY i.type, i.sid HAVING COUNT(*) >= 0) n1".

Also, has someone tested this on postgre?

Log in or register to post comments

Comment #18

robertdouglass commented 11 July 2008 at 20:46

Status	File	Size
new	search.patch	12.4 KB

I've written tests that check whether the advanced search works. In particular I test the type option. It would be good to add tests to test vocabulary options, language options, etc., but at least I solved the problem of how to test the advanced search form (was a non-trivial effort). I also adjusted do_search so that it is now passing all of the tests. I don't have full confidence in its correctness, though, and would love for others to do some more testing, and perhaps follow the model I set up in the test to test other parts of the advanced search form.

Log in or register to post comments

Comment #19

Susurrus commented 11 July 2008 at 22:36

Status:

Needs work

» Needs review

Needs review now?

Log in or register to post comments

Comment #20

robertdouglass commented 12 July 2008 at 00:33

Yes, please review what is there. Eventually someone needs to add the other tests, but your reviews are very helpful. Thank you.

Log in or register to post comments

Comment #21

Susurrus commented 12 July 2008 at 02:24

Status:

Needs review

» Needs work

Just looking over the code, the last assertion is wrong. Looks like a copy paste error on the description of the last assertText.

I think it'd actually be great to have unit tests for search_parse_query and do_search as well as having a webTest like the tests in this patch currently.

Log in or register to post comments

Comment #22

damien tournoud commented 12 August 2008 at 17:30

Could we also drop the default "WHERE 1 AND" that get inserted when do_search is called without a $where1 parameter, to respond to Dries' concern in #293504: Argument of AND must be type boolean, not type integer?

Log in or register to post comments

Comment #23

douggreen commented 15 August 2008 at 17:55

Status:

Needs work

» Needs review

Status	File	Size
new	258998.patch	12.61 KB

The attached patch fixes the language of the final article search test.

It also removed the additional WHERE 1 problem identified in [293504], although I kinda wish that this was offered on that issue rather than this one, because it is kinda a separate issue.

Log in or register to post comments

Comment #24

robertdouglass commented 1 September 2008 at 08:31

Status:

Needs review

» Needs work

Status	File	Size
new	do_search.patch	13.74 KB

Re-roll. Tests are broken.

Log in or register to post comments

Comment #25

robertdouglass commented 1 September 2008 at 11:02

Status	File	Size
new	do_search.patch	12.75 KB

Some tests still failing.

Log in or register to post comments

Comment #26

robertdouglass commented 1 September 2008 at 15:42

Status	File	Size
new	do_search.patch	12.77 KB

Tests pass.

Log in or register to post comments

Comment #27

robertdouglass commented 1 September 2008 at 15:52

And now for the cool part. It's not in the patch yet. The count query (and I think the search query) can be simplified (for the search query "dolore -sit") in the cases of phrase queries from this:

SELECT COUNT(*) FROM (
  SELECT count(*) FROM 
    search_dataset d 
    INNER JOIN node n ON n.nid = d.sid 
    WHERE n.status = 1 AND 
    d.type = 'node' AND 
    (d.data LIKE '% dolore %' AND 
    d.data NOT LIKE '% sit %') 
    GROUP BY d.type, d.sid 
    HAVING COUNT(*) >= 1
  ) n1;

to this:

SELECT count(*) 
  FROM search_dataset d 
  INNER JOIN node n ON n.nid = d.sid 
  WHERE n.status = 1 AND 
    d.type = 'node' AND 
    (d.data LIKE '% dolore %' AND 
     d.data NOT LIKE '% sit %');

So leaving marked as "needs work" because there is still so much that can be done here.

Log in or register to post comments

Comment #28

robertdouglass commented 1 September 2008 at 15:55

Furthermore, in some cases, the HAVING clause is not needed to get the right result. This needs more investigation:

mysql> SELECT COUNT(*) FROM (SELECT 1 FROM search_index i INNER JOIN node n ON n.nid = i.sid WHERE n.status = 1 AND (i.word = 'dolore') AND i.type = 'node' GROUP BY i.type, i.sid HAVING COUNT(*) >= 1) n1;
+----------+
| COUNT(*) |
+----------+
|        3 | 
+----------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM (SELECT 1 FROM search_index i INNER JOIN node n ON n.nid = i.sid WHERE n.status = 1 AND (i.word = 'dolore') AND i.type = 'node' GROUP BY i.type, i.sid) n1;
+----------+
| COUNT(*) |
+----------+
|        3 | 
+----------+
1 row in set (0.00 sec)

Log in or register to post comments

Comment #29

robertdouglass commented 1 September 2008 at 18:19

I'm testing right now to find out if the query variant without the search_index table is a good idea. It probably causes full table scans which is what Steven wanted to avoid, thus the original structure of the query. Will report.

Log in or register to post comments

Comment #30

robertdouglass commented 2 September 2008 at 09:31

Benchmarks answered the question in #29: the shorter version of the query is faster, and markedly so. I tested three different queries of differing complexity, and just used PHP to execute the database query a number of times. In all cases the shorter query was faster and produced identical results. Here is an example of two of the queries I used, and the results.

SELECT COUNT(*) FROM (
  SELECT 1 FROM search_index i 
    INNER JOIN node n ON n.nid = i.sid 
    INNER JOIN search_dataset d ON i.sid = d.sid AND i.type = d.type 
  WHERE n.status = 1 AND 
    (i.word = 'genitus' OR i.word = 'ratis' OR i.word = 'abigo' OR i.word = 'natu' OR i.word = 'ideo') AND 
    i.type = 'node' AND 
    (d.data LIKE '% os genitus ratis abigo natu ideo %') 
  GROUP BY i.type, i.sid 
  HAVING COUNT(*) >= 5) n1;    

SELECT COUNT(*) FROM search_dataset d 
  INNER JOIN node n ON n.nid = d.sid  
  WHERE n.status = 1 AND 
    d.type = 'node' AND 
    (d.data LIKE '% os genitus ratis abigo natu ideo %');

Test #	Big query	Small query
1	4037.23	2095.99
2	56724.5	2894.56
3	7473.2	1090.48

Log in or register to post comments

Comment #31

robertdouglass commented 2 September 2008 at 09:36

Status	File	Size
new	do_search_tests_fail.patch	13.3 KB

Here's the patch that uses the new query structure for count queries. I haven't gotten to using it for search queries (and haven't seen whether it is possible). However, a certain type of tests fail.

The tests that fail seem to fail for reasons that are only documented in the code comments of the unit tests:

  /**
   * Run predefine queries looking for indexed terms.
   */
  function _testQueries() {
    /*
      Note: OR queries that include short words in OR groups are only accepted
      if the ORed terms are ANDed with at least one long word in the rest of the query.

      e.g. enim dolore OR ut = enim (dolore OR ut) = (enim dolor) OR (enim ut) -> good
      e.g. dolore OR ut = (dolore) OR (ut) -> bad

      This is a design limitation to avoid full table scans.
    */

Even though the tests fail, searching for the same terms in a normal website context works. So I don't know what's going on and have to investigate more.

Log in or register to post comments

Comment #32

robertdouglass commented 14 September 2008 at 10:03

Status	File	Size
new	do_search_no_test.patch	10.4 KB

I've moved the test to a separate issue in the hopes that making a smaller patch will help people review. The test is now here: http://drupal.org/node/308233

Log in or register to post comments

Comment #33

robertdouglass commented 14 September 2008 at 14:00

Status	File	Size
new	do_search.patch	10.8 KB

Tests pass in this. But don't rejoice! It's only because of an ugly hack that proves the concept and also proves that the signature of do_search has to be refactored. Basically it comes down to this: passing in snippets of SQL to inject into queries is too fragile. Since the technique of this patch is to remove the search_index table from the query in some cases, the node module can no longer assume that the search_index i part of the query will be there. So we need to move to a more meta-data driven query builder (like the views query builder!!!)

Log in or register to post comments

Comment #34

moshe weitzman commented 14 September 2008 at 17:02

Er, you mean "like the core query builder", now. the code in this patch begs to use db_select. From there, just add your conditions and joins and so on.

Log in or register to post comments

Comment #35

lilou commented 17 May 2009 at 20:58

Issue tags:

+Performance

Add tag.

Log in or register to post comments

Comment #36

berdir

German

Switzerland

commented 4 June 2009 at 20:32

#394182: DBTNG search.module is converting do_search() to a (sort of) Query Extenders, please review! ;)

I did not actually *change* the resulting sql, if there is anything that can be improved in my patch, please comment on it..

Log in or register to post comments

Comment #37

jhodgdon

she/her

English

commented 18 February 2010 at 22:49

Status:

Needs work

» Postponed

I'm assuming this particular refactoring of the search module is dead now, since the DBTNG code totally redid it. The patches won't do anything... I'm setting to postponed for now -- are there things that you feel still should be fixed in the current D7 code, or for D8?

Log in or register to post comments

Comment #38

jhodgdon

she/her

English

commented 19 February 2010 at 20:30

Status:

Postponed

» Postponed (maintainer needs more info)

Log in or register to post comments

Comment #39

jhodgdon

she/her

English

commented 5 March 2010 at 17:07

Status:

Postponed (maintainer needs more info)

» Closed (won't fix)

Feel free to reopen this issue or file a new one if there are still performance things that you think need fixing with the current search architecture.

Log in or register to post comments

do_search refactoring: performance, legibility, move validation to validation/submit handlers of form.

Comments