Decide on the implemented backend (probably either a database-based simple implementation, or a Solr implementation) and start coding. A SearchApiServiceInterface implentation is needed at minimum, possibly with additional helping classes.

Comments

drunken monkey’s picture

Version: » 7.x-1.x-dev
Status: Active » Needs work

One note first: After consulting my mentor, I decided to implement a database based service, since it is of more immediate value to most users.

I'm almost done, just the actual searching needs to be implemented – and this is where it got tricky.
With my current database layout, I have for each field that should be fulltext-searchable a table with three rows: item_id (the id of the indexed entity object, e.g. the nid), word (one entry for each indexed word per item) and score (a measure of how often and prominently the word appears on that item). Now, when doing a search on "green apple" (let alone "(green apple) OR (yellow banana)"), how do I translate that into a query? I'm not even mentioning filters, yet…
I got as far as this:

SELECT item_id, SUM(score) AS score
FROM (
  SELECT *
    FROM table1
    WHERE word = "green"
    OR word = "apple"
    GROUP BY item_id
  UNION ALL
  SELECT *
    FROM table2
    WHERE word = "green"
    OR word = "apple"
    GROUP BY item_id
) t
GROUP BY item_id
HAVING COUNT(DISTINCT word) = 2
ORDER BY score DESC

It really works, but I'm worried about a) constructing it dynamically, b) extending it to support OR and NOT and c) its performance.

Also I'm really not sure where it would be best, performance-wise, to insert the JOINs and WHERE statements for the filters. At least a filter on the language will probably be quite common, although we could probably optimize that out in single-language settings.

So any help of someone with the mind to think in such relational terms would be greatly appreciated! Suggestions for a better database layout would also be welcome. The only idea I still have is combining single-valued non-fulltext properties into a single table (currently, each of those has its own table).

jhodgdon’s picture

This problem has been solved in D7's core search module with this class:
http://api.scratch.drupal.org/api/drupal/modules--search--search.extende...

Method parseSearchExpression does most of the work of building the query. Your table structure is not that different from what is in D7, I think, so it will probably be helpful ot look at that.

jhodgdon’s picture

Oh. I just remembered that the D7 search query goes against the search_data (or is it search_dataset?) table, which stores the processed text for each item in a large text field, instead of going against the search words table, which has each word as a separate line. Then it joins to the word table to get the scores. The and/or stuff is a lot simpler if you have a field with all the words together, which is undoubtedly why it was done that way. See
#740900: Performance: node search should use search_index for single word queries

One other thing to mention in D7 is that nested and/or conditionals are very nicely handled by the database API - see http://drupal.org/node/310086

drunken monkey’s picture

Thanks for the tips. I already had a look at the core search code (like for almost anything else in the project), but it didn't help much. Firstly, with more than one field to search on, the situation is inherently more complex (or at least I think so, maybe that's one reason I had to fight so hard to get to a solution). Secondly, I couldn't quite follow the whole code in the extender, especially the thing with the two queries.

In any case: as I predicted in my groups post, I now have something that works, at least for simple cases. It should work for all cases, the way it is implemented, but still has some bugs for more complex queries. But I think I'll soon get those, so now trying a completely different approach would be a last ressort.
Also, although my approach uses a huge amount of subqueries, I'd think it will be more efficient than using LIKE. Especially: Does the Schema API provide a way to define a fulltext index? Or is this a MySQL-specific extension altogether?

drunken monkey’s picture

Status: Needs work » Fixed

Done.

Status: Fixed » Closed (fixed)
Issue tags: -gsoc, -gsoc2010, -gsoc2010-drunken_monkey

Automatically closed -- issue fixed for 2 weeks with no activity.