DBTNG search.module
mradcliffe - March 7, 2009 - 16:35
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | search.module |
| Category: | task |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | closed |
| Issue tags: | DBTNG Conversion, dc2009 code sprint, Drupal 7 priority, Needs Documentation |
Description
Feel free to change the title and relevant project information if I categorized incorrectly.
This is a placeholder task to avoid collisions of database sprinters and the search sprinters.

#1
#2
mradcliffe says he'll be a while, so this is open again.
#3
Ok, uploading a first version of the patch.
The problematic part is do_search() and everything that is related.
It *seems* to work fine, however, there are a few API changes in do_search() itself, search_parse_query(), _search_parse_query() and hook_rankings() (which is not yet documented, btw)
#4
The last submitted patch failed testing.
#5
I forgot to mention that the tests fail because of #430904: PagerDefault displayes PHP notice when multiple Pagers are used on the same page, not a bug in this patch..
#6
Pager fix has been commited, retesting this...
#7
The last submitted patch failed testing.
#8
Ok, this evolved quite a bit...
The above patch just converted do_search() internally to DBTNG but that doesn't really allow to use what DBTNG provides. I discussed this a bit with DamZ and we agreed that this should probably be a class.
So I went ahead and did the obvious: Converting do_search() and the related functions to a Search DB Extender. However, there were a few problems and limitations, namely:
- The extender only works for the search_index table
- The order of the extenders is now relevant. Because the current implementation of the DefaultPager increases a counter for every call of execute(), it "paged" both the first pass and the second pass query, so while everything worked, theme('pager', NULL) returned nothing. This means that the DefaultPager extender needs to be "outside" of the Search.
The next idea was to extend SelectQuery directly, however, that doesn't work as we don't know which SelectQuery is actually used because each DB Driver can overwrite that class.
Because of this, I made a "special" DB Extender that is created directly (by calling do_search()) and does solve the above two points:
- There is no way to use it on a different table, because the SelectQuery object is created in the constructor
- Because of that, the Search Extender is always the most inner extender
Now, what does the SearchQuery class provide, compared to do_search()
<?php$query->addScore($values['score'], $arguments, $node_rank);
?>
<?phpif ($term = search_query_extract($keys, 'term')) {
$terms = array();
foreach (explode(',', $term) as $c) {
$terms[] = "tn.tid = %d";
$arguments1[] = $c;
}
$conditions1 .= ' AND (' . implode(' OR ', $terms) . ')';
$join1 .= ' INNER JOIN {taxonomy_term_node} tn ON n.vid = tn.vid';
$keys = search_query_insert($keys, 'term');
}
?>
node.module can now simply do:
<?phpif ($query->setOption('term', 'tn.nid')) {
$query->join('taxonomy_term_node', 'tn', 'n.vid = tn.vid');
}
?>
All tests pass on MySQL and PostgreSQL, now that the dwr() bug is fixed.
TODO:
- I don't like how hook_ranking() currently works. Imho, it would be nice if we could pass $query directly to the hooks, so that they could add ranking stuff (especially joins) directly. But I'm not sure how to do that, so that only enabled scores are added.
* According to my tests, search score normalization works for everything except comments but I don't think that's a problem inside SearchQuery.
#9
The last submitted patch failed testing.
#10
more bugs that I forgot to include... The tests should pass with the patch at #467984: DBTNG bugfixes applied.
#11
#467984: DBTNG bugfixes is commited, testing again...
#12
In overall, using the extender facility here IMO is very nice. There are a few thing though that could be refined
=& $and Dries agreed to stick the reference to the variable. Core needs fixing if it does otherwise.do_search()->extend('PagerDefault')I doubt we ever want to run a search unpaged so ->extend('PagerDefault') should be inside do_search. If we want to leave an "out" add an argument defaulting to TRUE, $use _pager. This will give a reason for do_search to exist, too. However, it needs a rename as it no longer does a search.#13
Well, the tests don't need a pager ;)
What about:
<?phpdb_search($use_pager = TRUE, array $options = array())
?>
db_*something* would conform with the other DBTNG functions which return a SomethingQuery object.
Fully agree with the other points, will re-roll asap.
#14
On the contrary we do want the pager to be specified outside of the search extender... The goal is to potentially have several pagers to choose from that use different algorithms. Why hard-code the search extender to just one of them if we don't have to?
#15
Accidently changed the status...
#16
New version...
- changed =& to = &, as this is a de facto standard in core (we should document that somewhere if it isn't already). chx had a good reason for it, =& can be easily confused with &= and that is a completely different thing.
- removed OUTER
- renamed do_search() to db_search(). I tend to agree with Crell regarding the pager. It would also be possible to add more extenders, a TableSort for example so it imho doesn't make sense to handle PagerDefault specially.
- renamed firstPass() to executeFirstPath()
- renamed *query* to *expression*
- updated the example in search.api.php
Questions..
- Still looking for a different approach for hook_rankings()...
- Can someone point me to contrib modules that do implement hook_search() on their own? I'd like to make sure that SearchQuery can be adopted to those too.
#17
The last submitted patch failed testing.
#18
Re-roll, not sure what it did not work.
#19
The last submitted patch failed testing.
#20
D'oh!
Forgot the -N switch in my two latest patches, so SearchQuery was missing...
#21
#22
I ran out of things to complain about.
#23
I'd like Crell to give this a final +1 before committing. But overall, this looks like great work!!
Some minor things as I was glancing through quickly; I'm going to need a good half hour or so to review this patch once it's marked RTBC 'for reals':
+ if (isset($values['join']) && !isset($tables[ $values['join']['alias']])) {Extra space before $values.
+ // Add a count query.+ $inner_query = clone $query;
+ $count_query = db_select($inner_query->fields('i', array('sid')));
+ $count_query->addExpression('COUNT(*)');
Ugh. Really? :\ That syntax is not very intuitive...
+ * This function is normally only called by each module that support thesupports
+ * The used query object has the tag 'search_$type' and can be further extendextended
+ if (!$this->executedFirstPass) {+ $this->firstPass();
+ }
I can tell we are missing test coverage.... ;) It's $this->executeFirstPass();
#24
No, it's not. But neither is the resulting sql query: "SELECT COUNT(*) FROM (SELECT sid FROM....)". DamZ was thinking about making this the default count query: #423888: Use subqueries for ->countQuery(), at least for MySQL.
Good catch. That check isn't needed anymore anyway, as I moved the i.relevance replacement into execute().
I'll update the patch soon. I also wrote Doug Green and he'd like to review the patch too.
#25
Re-roll with the things in #23 fixed.
As I said, search.module does pretty crazy stuff and I don't understand every bit, especially that search expression parsing and I'd like to get Doug Green's review for this.
#26
I barely understand search.module, but I'll try to give this a review tonight at least for the DB coding style bits.
#27
We can leave it for now, the patch is already huge, but what about the following changes to hook_ranking, for a follow-up issue...
- Merge it into SearchQuery
- make it search type agnostic, so that modules can add scores to all search types. the name would change to something like hook_ranking_$type()
- Change how it does handle things like joins and arguments to have better DBTNG compatibility. A possible way would be to move complex things to hook_search_alter() implementations, SearchQuery would just tag the query so that the correct hooks are executed. I'm not sure if that is good DX :)
#28
Another small improvment.
When looking at the implementation of location_search's hook_search, I noticed that options can be more complex than what node.module uses. So I added a new method, extractOption($option) which just extracts (and removes) the value from the search string and simply returns the value.
#29
Great! I'm glad someone stepped up to do this! It will make it easier to try to refactor the code for more extensibility and performance.
#30
The last submitted patch failed testing.
#31
Re-roll.
#32
The last submitted patch failed testing.
#33
Re-roll.
#34
The last submitted patch failed testing.
#35
The last patches missed the statistics.module changes and it wasn't earlier because the search tests weren't run.
#36
- I stared at this patch for 30 minutes and it looks good.
-
This function is normally only called by each module that supports theis no longer valid. It's a class now.- I'd like to brainstorm some more about using the Extenders. I can see why we'd want to use an object, but I'm not sure I 100% buy into using the Extender system. It's not like it cleans up a huge amount of code. Also, it feels like regular database queries and search algorithms are two different categories of problems so it feels slightly awkward to think of search as a query -- it could be a lot more complicated than that, which is already illustrated by the execute() function which executes multiple database queries. It is no longer a single query so it gives me a "forced feeling" -- it is not all that pure, if you will, and not what a user of the API might expect. I feel it goes beyond 'decoration'.
- The documentation of SearchQuery explain the algorithm, but it doesn't show me the potential of being a SelectQuery object. Maybe we could add a paragraph of text to articulate the reason for using an Extender?
#37
I'm neither. And the discussion in #497804-5: Search entitites (nodes, terms, etc.) within the administrative interface shows that we might need a generic search api, that needs to work for other search backends too. However, I have currently no idea on how to do that.
A few points why an Extender might be the right way:
- It still allows access to the actual query object and it can be extended with other Extenders. Given the above discussion, this might not be a good idea, but currently, we need it: http://api.drupal.org/api/function/node_search/7. As I already said, I have no idea on how to make things like the taxonomy options, which require a JOIN, generic enough to work with apache solr and other backends.
- Executing a second query in execute() *does* look strange, however, PagerDefault is doing exactly the same with the count query. And this is very similiar, we check if there are possible matches for our search query and detect the maximal search score.
#38
It is limiting to think of search as a query you can extend or decorate. It's still somewhat different with PagerQuery because for the pager, you're altering/extending a given query -- because the input is a query, and because it doesn't really matter what query you give as input, it correctly feels like decorating to me.
What we're doing in this patch, feels like reducing a search algorithm to a single specific query with some helper queries. Calling that a decoration seems odd.
As you mention, not all search algorithms use (sort of) a single query -- in fact, many of the more advanced search solutions don't use a relational database at all. That said, as far as I can see, this patch does not prevent alternative search algorithms or a generic search API..
#39
The last submitted patch failed testing.
#40
Testbot, I don't believe you :)
#41
The last submitted patch failed testing.
#42
Just a re-roll..
Regarding the approach, as I've said, I'm not sure. The extender approach was the only one that I could get to work without big workarounds. If it would be an own class, we would still need to allow access to the actual query object, and for that, we would have to duplicate the functionality of the extenders. And while maybe not in theory, that's practically a decoration. And that's what I've reused of the extender stuff.
The second thing is that it does not present itself as an Extender to the outside, you invoke it by calling db_search() and not db_select('some_table')->extend('Search').
But I'd like to hear some more opinions too, especially from people who are familiar with search.module and/or DBTNG.
#43
I'm still digesting the architecture discussion but the current code is a massive improvement in terms of legibility and order.
One small note, since it jumped out at me:
<?php- $node->body .= module_invoke('comment', 'node', $node, 'update_index');
+ $node->body .= module_invoke('comment', 'node_update_index', $node);
// Fetch terms for snippet.
- $node->body .= module_invoke('taxonomy', 'node', $node, 'update_index');
+ $node->body .= module_invoke('taxonomy', 'node_update_index', $node);
?>
I think you need to separate the body and the bits you're adding with a space (this is a core bug), else we risk concatenated words (last one in the body and the first one in the comment/taxonomy). This doesn't happen most of the time because of HTML, but it's possible.
#44
The last submitted patch failed testing.
#45
Re-roll, also added the suggested change of #43 as I'm touching these lines anyway...
#46
I spent about 30 minutes going through the patch. I don't understand everything about the code, but I read the comments very carefully and looked for code formatting inconsistencies. It looks like a lot, but they are very minor. Hopefully this helps.
hook_search();
+ // Only continue if the first pass query matches.+ if (!$query->executefirstPass()) {
+ return array();
hook_search(); case "search"
+ // Only continue if the first pass query matches.+ if (!$query->executefirstPass()) {
+ return array();
+ }
+ * advanced text conditions (such negative or phrase matches)."...(such as negative..."
+ /**+ * Type of the search.
+ *
+ * This is maps to the value of the type column in search_index.
+ *
+ * @var string
+ */
"Type of search." (remove the) and "This maps" (remove the "is")
+ /**+ * Positive and negative search keys.
+ * @var <type>
+ */
Missing extra " * " between definition and @var
+ /**+ * Indicates if the first path query has been executed.
+ *
+ * @var boolean
+ */
"if the first pass query..." shouldn't "path" be "pass" ?
+ * TRUE if atleast a value for that option has been found, FALSE if not."atleast" is two words: "at least"
+ $this->conditions = db_and();This function isn't defined in the patch and doesn't look like "DBTNG" so I'm putting it in this list of fixes. Not sure if its a problem, just noting it.
+ if ($warning == 'or') {We're using the variable "$warning" to hold the value of "$match[2]" for the sole purpose of determining if there should be a warning. Sounds like it should at the very least be TRUE or FALSE. At the very most maybe turn it into a generic warning function that can collect other warnings in the future
+ form_set_error('keys', format_plural(variable_get('minimum_word_size', 3), 'You must include at least one positive keyword with 1 character or more.', 'You must include at least one positive keyword with @count characters or more.'));This error message seems redundant.
@@ -295,7 +304,7 @@ function search_update_totals() {// Update word IDF (Inverse Document Frequency) counts for new/changed words
foreach (search_dirty() as $word => $dummy) {
// Get total count
I know this isn't part of the patch, but search_update_totals()'s comments don't include periods.
db_merge('search_index')->key(array(- 'word' => $word,
- 'sid' => $sid,
- 'type' => $type,
- ))->fields(array('score' => $score))->expression('score', 'score + :score', array(':score' => $score))
- ->execute();
+ 'word' => $word,
Shouldn't "->key" be on a newline?
Ok, so that took longer than I expected. This is my first really big code review, hopefully it doesn't sound too nit-picky. And on a side note there was a whole bunch of code in there that was extremely WTF. When people above are saying they "don't get it" I totally understand. I was just like -- what the hell is going on here? Maybe in a follow-up patch we could more thoroughly document the more "this is weird, but we need it because" sections.
Josh
#47
Thanks for the thorough review! Lots of nice catches...
A few short notes, I will re-roll asap.
7:
It is DBTNG, and so are db_or() and db_xor() :) http://api.drupal.org/api/function/db_and/7
8+9:
That error handling stuff is really ugly I know. I just "exploded" ($warning instead of $query[3]) a return key with 10 different things in it (see http://api.drupal.org/api/function/search_parse_query/7). The thing is, this patch is already big and I'm not sure how many changes I should add and what should be done in follow-up patches. #258998: do_search refactoring: performance, legibility, move validation to validation/submit handlers of form. tried to improve these things a bit but got stalled, maybe I can merge some parts of that patch into this.
PS: There are very few who actually understand that search expression parsing and query building stuff inside those functions ;)
#48
Fixed all of #46 except 7, as that is not wrong and 8+9 as I need to think about that. I agree that fapi stuff should be removed from the search query class, but I'm not yet sure how.
#49
Bumping to critical. We really need to get this finished up before code freeze IMO so we can drop the DBTNG backwards compatibility layer.
#50
Tagging.
#51
IMO dropping the BC layer is going to happen anyway, it doesn't have to be before code freeze. The documented API won't be changing in the process.
#52
Maybe, but this patch *will* change search.module API, and it's not a minor change :)
#53
The last submitted patch failed testing.
#54
I spent hours figuring out that this patch works just http://drupal.org/node/496500#comment-1961346 broke it. I have included the rollback of that issue here.
#55
The last submitted patch failed testing.
#56
I suspect a bot fluke. I installed HEAD and worked.
#57
The last submitted patch failed testing.
#58
Stupid me! the search.extender.inc was missing. But why did that fail HEAD ? Curious.
#59
Subscribing.
#60
So yeah,this passes but again: I have rolled back #496500: Remove global placeholder counter inside it.
#61
That has been rolled back.
#62
So, to clarify I changed nada in the patch, I have actually diffed my patch to Berdir's and found that I left in a bit of debug and no other change. I am calling this ready -- I worked several hours with the patch to figure out what's wrong and found no glaring holes it's a very fine piece of work.
#63
I stared at this for a good hour tonight, and had a review written up but lost it when I clicked "cancel" rather than "paste" *smacks self*
But it basically amounted to "this patch hurts my wee head" so I will try looking at this again tomorrow a bit more refreshed, unless Dries wants to jump in and commit it first. Honestly, I don't see this getting many more reviews/progress before code freeze with Berdir out of commission.
#64
Vultures.
#65
#66
Let's try review so the bot picks it up.
#67
I spent another hour or so with this patch tonight, and apart from some confusion around the db_search() function in #62, which was since removed after Larry pointed out that extenders should never instantiate themselves, only be appended to existing queries, I could only find a few minor commenty things.
However, as a result, I'm pretty sure this is now going to fail:
- // Do search.- $find = do_search($keys, 'node', 'INNER JOIN {node} n ON n.nid = i.sid ' . $join1, $conditions1 . (empty($where1) ? '' : ' AND ' . $where1), $arguments1, $select2, $join2, $arguments2);
+ $query = db_search()->extend('PagerDefault');
So will wait for testing bot to come back green before proceeding with a commit.
Here are the minor things which either you could fix or I could. The only one I could use your help on is the regex comment. I know that was copy/pasted and so technically isn't your domain, but I highly doubt anyone touches this again for the next 36 months, so let's fix it here. ;)
+++ modules/search/search.extender.inc 2009-08-29 05:09:49 +0000@@ -0,0 +1,443 @@
+ * This maps to the value of the type column in search_index.
{search_index}.type
+++ modules/search/search.extender.inc 2009-08-29 05:09:49 +0000@@ -0,0 +1,443 @@
+ * These words have to match against search_index.word.
{search_index}.word
+++ modules/search/search.extender.inc 2009-08-29 05:09:49 +0000@@ -0,0 +1,443 @@
+ * @var array()
They rest of these are just "array"
+++ modules/search/search.extender.inc 2009-08-29 05:09:49 +0000@@ -0,0 +1,443 @@
+ preg_match_all('/ (-?)("[^"]+"|[^" ]+)/i', ' ' . $this->searchExpression , $keywords, PREG_SET_ORDER);
comment please.
Beer-o-mania starts in 2 days! Don't drink and patch.
#68
#69
Ok, this addresses all of my feedback, though in reading through the issue I'm not quite sure how to respond to the comments Dries made in #36 and #38. The final approach is not really different from what was done back then. I'm obviously reviewing it at a higher-level than Dries, but I will say that I was mostly able to follow along with what this patch was doing, which was amazing to me given it's search.module. ;) But feels safer to move to RTBC and let Dries have one last look before pressing the button.
Curiously, #66 didn't fail. We're probably lacking test coverage in our search system.
AWESOME work, btw. Once this is committed, it officially marks the end of non-DBTNGified code in core, afaik. :)
#70
My comments still hold, but I'm comfortable committing this for the sake of making good progress. We can have a meta/architecture discussion in the Drupal 8 release cycle.
#71
This just broke HEAD. Can't install anymore. Reversing this patch solves the issue.
#72
Indeed, that solves the problem at install. Now it's time to find what is wrong with the patch... and maybe why testbot did not report this (either bad testing or missing test I guess) ;)
#73
The patch is supposed to add a new file named search.extender.inc, and it has to be added with
cvs add. Registry chokes on non-existant files with empty filectime (maybe a separate bug there) Yay for mad detective skills. :)#74
search.extender.inc is now in CVS, so this can be closed.
#75
Automatically closed -- issue fixed for 2 weeks with no activity.
#76
I'm reopening this -- the search API changed quite a bit with this patch (search modules used to be able to call do_search() and they should do something different now), and this is not currently mentioned in the module upgrade guide. It should have been documented before the issue was allowed to be marked "fixed".
#77
I've added a section to the module upgrade guide - please review.
http://drupal.org/update/modules/6/7#search-api
Also, here's a quick patch to a small part of the search doc, to fix reference to now nonexistent hook_search().
#78
Two things could be improved in the example.. (I can't edit the wiki page myself..)
a) To recieve the same result as in D6, you should call limit(10)
b) execute() just returns the result object, to have an array with nid's as in D6, you should call $query->execute()->fetchCol()
#79
Thanks! Please check if the example is better now.
#80
+++ modules/search/search.module 5 Oct 2009 21:39:54 -0000@@ -830,8 +830,9 @@
+ * implement hook_search_excecute() to perform the search.
That seems to contain a c too much :)
This review is powered by Dreditor.
#81
Good catch Berdir, thanks! A reminder of why we have patch reviews. :)
Here's a fixed patch (I hope with no typos this time).
#82
And Berdir, did you have any other comments on the Module Upgrade Guide section http://drupal.org/update/modules/6/7#search-api or do you think it is OK now?
#83
Last patch is good.
#84
Committed to CVS HEAD. Thanks!
#85
Automatically closed -- issue fixed for 2 weeks with no activity.