There have been several posts lately about the quality of search results from Drupal's built-in search engine, and about how hard it can be to find targeted results on Drupal.org.

The great news is that our very own Steven has been developing a patch to drastically improve advanced searches by adding a number of features, including filtering results by node type (page, blog, etc.), specifically excluding certain words/phrases, and so on. All it needs now is some people to test it and give feedback.

And the really great news is that YOU can help! Even if you know nothing of foreign terms like "patches" or "CVS," as long as you can install Drupal, YOU can help to test this addition and get it added to Drupal (and thus Drupal.org) faster!

Simply go here to download a copy of Drupal HEAD (the development version of Drupal), with the new search features already added: http://acko.net/dumpx/searchpatched.zip

Test it out, try and break it, see if it works the way you expect. Then, check out the Tips for reviewing patches document and leave feedback about your experience and further ideas!

Comments

travischristopher’s picture

this is a major major improvement!!!

thanks so much

kzeng’s picture

I installed it on my test site and ran into a problem when I tried to do a search in IE. The search results seemed to be "unthemed", so the page looks messy. But when I did the same search in Firefox, everything worked very well. I am not sure what caused this problem.

You may test it at http://test.kzeng.info/search and see whether you get the same problem as I did.

Anyway, it's really a great improvement. Thanks!

--------------------------
http://www.kzeng.info

killes@www.drop.org’s picture

The theming problem is due to some debug code that gets printed out.
--
Drupal services
My Drupal services

tamarian’s picture

One important feature, IMHO, is to offer a selection box or radio buttons to allow searching in the subject/title field only. This would be quite handy in narrowing down the search, and the results.

sami_k’s picture

as well as faster...
--
Please read the handbook, search the forums, then ask...
http://drupal.etopian.net (Drupal Support)

boris mann’s picture

Please do file feature requests directly against the search module.

webchick’s picture

Here's how you should set the options:

Project: Drupal
Version: cvs (there are like 3 of these, just pick one of them :))
Component: search.module
Category: feature request
Priority: usually normal (or minor, depending)
Title and Description: (insert stuff here)

Steven’s picture

Note that this patch requires database updates, so you need to run update.php after unzipping it when using an existing database. Though it is likely that this patch will be part of 4.7, I cannot guarantee the database format or updates will remain the same. So, back up your database before testing the patch, and restore it afterwards.

--
If you have a problem, please search before posting a question.

pfmj2005’s picture

I was wondering if there is a way to implement a "This Site" or "Google" search feature... I know you can and I would love to have that on my site if I could do it myself, but I suck at coding! I am suggesting this as an option that can be turned on and off... I love the improvements done to the search module though!

------------------------------------
Paul Malenke
paul.malenke@gmail.com
AfterDeathGraphics.com

BobT-1’s picture

pfmj2005’s picture

I don't need that... that's easy... That's just a form... I don't think what I said was understood.

-----------------------------------
Paul Malenke
paul.malenke@gmail.com
AfterDeathGraphics.com

kbahey’s picture

I think what you mean is a module that when added would display a Google search box on your site.

Ideally, this would be tied into Google's Adsense for Search as well, so for big sites, it is a revenue source.

https://www.google.com/adsense/ws-overview

Is that what you meant?

--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com

JohnG-1’s picture

I think he means a single search box with a selector to point the search at a) this site, b) google.com, or c) the Yahoo directory... like the Firefox search box?

... this would be integrated into search module.

pfmj2005’s picture

I am actually thinking both! :)

--------------------------------------
Paul Malenke
paul.malenke@gmail.com
AfterDeathGraphics.com

JohnG-1’s picture

Now I'm confused.
Would you care to elaborate?

joel_guesclin’s picture

If all you want is a Google box on your site, that does Google search in your site by displaying a standard Google page, then there's no need for a module - just go to Google, generate the HTML, and stick it in a block.

But here's an example of REAL Google search - which I assume is built on web services to the Google APIs: http://drupal.org/node/10134#comment-59097

In other words, you can display the search results on your site, in your format, and potentially adding other information (eg categories), or including sorts (like, by date), etc....

This looks very promising and I would love to try it out when available

jozef’s picture

I have been experimenting with MySQL 4.1 and found some problems with the search module and the database character set. Please look at my comment:
http://drupal.org/node/15746#comment-47735
I have also found a similar issue:
http://drupal.org/node/12958#comment-27889

tangibre’s picture

What I really miss is a way of sorting results, say by date. It is really annoying to have to look through a lot of pages of search results because the more recent and relevant results are among older nodes.

noid’s picture

Good point. :)

alexandreracine’s picture

+1

:)

theoa’s picture

mod +1

or find in last 30/60 days/six months/one year

joel_guesclin’s picture

Having found that my site was losing all its disk space to search indexes, I am in the process of installing the trip_search module, which has just about all the features mentioned here and more (including a cool interaction with categories). And it can work with either no indexes at all or with fulltext indexes in MySQL (apparently) - I use no indexes and it is already far more reliable and useful than the existing search module.

My instinct would be to scrap any idea of changing the existing search module and pile all of Steven's development effort into the trip_search module, with the aim of bringing that into core, rather than waste effort by reinventing the wheel.

Oh, and by the way... my own contribution will be the translation of the module....

For what it's worth....

JG

boris mann’s picture

But it really pounds your DB for large sites. Also, this is off topic for this forum thread, which is about helping test an improvement to search.

Please create a new forum thread if you would like to gather support for getting trip_search in core...which is not a bad idea, as the majority of Drupal sites are generally smaller.

noid’s picture

What does it mean when trip_search pounds the DB? Am trying out both the default search module and the trip_search module in 4.6, and am interested to know which is more efficient. Thanks.

joel_guesclin’s picture

I disagree that this is off-topic. What we are after is an improvement to the search provided in Drupal, am I correct? Well, I would make the following points:

1) I see no point in having two search facilities in core - therefore my suggestion would not be to keep the existing search and also include trip_search, but to replace the existing search with trip_search

2) The extra functionality suggested in this improvement to search is mostly in trip_search already; why bother to reinvent the wheel? Where is the advantage of having Steven and nedjo working separately on two different projects when they could be combining their forces to provide an even cooler solution even faster?

3) As far as hammering the database goes, I would like to have some detail on that. At the moment I am using trip_search on a DB with 2000 nodes and climbing, with no noticeable loss of performance compared to core search, PLUS I do not have core search indexes eating up all the disk space and forcing me to shell out more dollars to my hosting service.
Anyway, this point would bear more discussion. No matter which way you look at it, a search means building indexes in a DB - so any kind of search will hammer the database. trip_search offers three search methods:
a) Basic (I use this for the moment) - presumably this has a performance hit because it must do sequential searches through the DB without the benefit of indexing.
b) Regular expression - I don't know how this works.
c) SQL Fulltext - this takes advantage of the Fulltext indexing provided natively in MySQL (i.e. it also builds an index, but a much smaller one than core search). This has refused to work for me so far, but since Basic fills my needs, I haven't bothered to find out why yet.
So before we talk about "hammering the database", I would like to see a comparison of the different methods.

It may well be the case that the core search indexing is more efficient (in time, not in disk space), and that it is necessary to provide an indexing method which is database-independent. I would have no objection to adding core-search indexing to the trip_search methods. BUT, I still see no advantage in separate development.

I really would like to see a proper search in place in Drupal. It is at present, IMHO, about the only really weak point in what is otherwise a startlingly good piece of software.

Steven’s picture

The issue of trip_search vs core search is not an easy one. There are various arguments either way.

I'm the author of the core search (search.module), but I have looked to trip_search for ideas and inspiration. Vice-versa, trip_search includes code from core search. It might not seem obvious from the outside, but there is cross pollination going on between the two.

It is true that trip_search doesn't have the index building and space requirements. But, unless you use MySQL fulltext indices, it has to perform a full table scan each time, which means it gets linearly slower as your site grows. This makes trip_search okay for small to medium sites, but a very bad choice for large sites.
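To make the scaling argument concrete, here is a minimal Python sketch. It is not Drupal or trip_search code; the sample nodes and function names are made up for illustration. It contrasts a search that inspects every node on each query with a lookup in a word index that was built once in advance:

```python
from collections import defaultdict

# Hypothetical sample content; node IDs and text are made up for illustration.
NODES = {
    1: "Drupal search module indexes nodes",
    2: "Testing the advanced search patch",
    3: "Theme development tips",
}

def scan_search(nodes, word):
    """Unindexed search: inspects every node on every query (a full scan)."""
    word = word.lower()
    return sorted(nid for nid, text in nodes.items()
                  if word in text.lower().split())

def build_index(nodes):
    """Build an inverted index once: word -> set of node IDs containing it."""
    index = defaultdict(set)
    for nid, text in nodes.items():
        for token in text.lower().split():
            index[token].add(nid)
    return index

def index_search(index, word):
    """Indexed search: one lookup that never visits nodes lacking the word."""
    return sorted(index.get(word.lower(), set()))

INDEX = build_index(NODES)
print(scan_search(NODES, "search"))
print(index_search(INDEX, "search"))
```

Both calls return the same node IDs. The difference is that the scan does work proportional to the total amount of content on every query, while the index moves that cost to build time and pays for it in storage, which is the trade-off described above.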

A lot of core search's benefits are subtle and in the background, so people tend to give the module less credit than it deserves.

Did you know that core search...

  • ...'s query algorithm is implemented entirely in SQL? Trip_search on the other hand always has to fetch all the search results (rather than just the page of results you want to see) and sort them in PHP, which leads to large PHP memory requirements and lots of processing time if there are lots of results.
  • ...indexes a node with its comments and other info (e.g. attachment names) as one result? This means that you search for entire pages in context rather than individual parts of it, like a single comment or the node body.
  • ...understands markup? It uses certain tags (like a header, bold, underlined, ...) to recognize the important parts of a text. It can also recognize links to other Drupal nodes and uses that to give commonly linked nodes a better score. And because it makes use of the filter system, this works for any markup language: HTML, BBcode, Wikimarkup, Textile, etc.
  • ...automatically discovers which words are important for your particular site? This negates the need for noise-word lists or keyword tweaking.
  • ...understands Unicode? This means that it can recognize any character in any language.
  • ...can be tied to an external preprocessor so it can understand related words and synonyms? This also means that search.module can be used to search difficult languages like Chinese or Japanese in a meaningful way.
  • ...is extensible for other modules? This means it is easy to make a module which e.g. indexes the contents of attachments as well.

These are all things that trip_search does not have, and for many professional applications, these are considered essential features of a 'proper search'. And I'd like to point out that it does all of this, while still maintaining the high standards that are expected of a Drupal core module. Many of these features are only possible because of the index, which means that merging them into trip_search's other mechanisms is impossible or impractical.
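As a rough illustration of the first point in the list above (ranking and paging inside the database versus in application code), here is a small Python sketch using SQLite as a stand-in. The table name, columns, and scores are hypothetical and are not the actual search.module schema or query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per matching node with a precomputed score.
conn.execute("CREATE TABLE results (nid INTEGER PRIMARY KEY, score REAL)")
conn.executemany(
    "INSERT INTO results (nid, score) VALUES (?, ?)",
    [(nid, (nid * 37) % 101) for nid in range(1, 501)],
)

PAGE_SIZE = 10

def page_in_sql(conn, page):
    """Let the database rank the matches and return only the requested page."""
    rows = conn.execute(
        "SELECT nid FROM results ORDER BY score DESC, nid ASC LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    )
    return [nid for (nid,) in rows]

def page_in_app(conn, page):
    """Fetch every match, then sort and slice in application code."""
    rows = conn.execute("SELECT nid, score FROM results").fetchall()
    rows.sort(key=lambda r: (-r[1], r[0]))
    start = page * PAGE_SIZE
    return [nid for nid, _ in rows[start:start + PAGE_SIZE]]

print(page_in_sql(conn, 0) == page_in_app(conn, 0))
```

Both functions produce the same page, but the second one has to pull all 500 rows into memory to show 10 of them, and that overhead grows with the number of matches.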

If I didn't think that search.module had a lot going for it, I would not spend my time on it. And if I did not think trip_search was good, I would not be making patches that incorporate some of its features in core search. ;)

--
If you have a problem, please search before posting a question.

tvst’s picture

For usability's sake, maybe trip_search should be renamed to basic_search and made the default.

The regular search module should be renamed to something that implies it being better for larger sites.

The main reason why I think trip_search should be the default search module is that many noobs post a couple of nodes and go try the search feature, but no results come up because cron.php has not run yet. Noobs don't like this. They want their search results to reflect the current state of their site.

People who are managing larger sites usually know much more than the small-site noobs. They can go and choose to use the indexed search module for their sites, and they can set up a cron job for themselves with ease.

Anyway, that's my 2 cents.

BTW, I realize some may see this post as slightly off-topic, but under the assumption that this thread is about "improving Drupal's built-in search", I think it fits right in.

sepeck’s picture

The regular search module works fine for smaller sites, and if it works fine for large sites, then it should end up working even better for small ones. One core search with options, plus contrib search modules for those wanting a different experience.

The new user syndrome is cured through step 6 of INSTALL.txt, where the cron requirement is mentioned prominently. The biggest issue we have is with new users who rely on automated installs via Fantastico, Debian, etc. We do not get that many forum questions about it, as the answer is easy to find, and simply pointing them to it when they ask is quick.

-sp
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain

joel_guesclin’s picture

First of all I would like to thank Steven for taking the time to explain all that - it makes a big difference to my understanding. I'm especially relieved to see that there is cross-pollination going on between core search and trip_search.

It is also clear that having a really solid, fast, underlying engine is the necessary foundation on which to build a truly flexible solution. And for all those in the non-English world (my case) proper Unicode support is also vital.

That said, here are a few suggestions if I may:

1) I think that the changes to user options (which already exist in trip_search) are absolutely critical. I decided to switch off the current core search because a) it was eating disk space (but I could live with that if necessary) and b) most important, it was not producing relevant results: in particular, the default OR contradicts what one usually wants (keyword a AND b AND c), there is no sort-by-date option, no way to restrict dates, etc. So, I am very glad to see the proposals in the "Advanced search feature" (though from the screenshot it does seem to be missing the "select by date" feature, which is very important for searching in more or less recent posts and avoiding the return of obsolete information: for example, when I am looking on the Drupal site for information about a bug or a problem, posts about 4.3 are really not of much interest).

2) It should be possible to configure which choices are presented to the user, especially by excluding any or all items: for example, excluding certain vocabularies, excluding certain content types (no point in searching through webforms for example), even being able to change the names would be nice ("book page" means nothing to my users who are used to thinking in terms of articles). Again, trip_search pretty much has all this.

3) The integration of categories in the search results is very attractive and I would like to see this in core search also:
- results returned for categories as well as nodes,
- ability to filter by category after the results are returned.

4) It would be useful to choose which indexing method one uses: incorporate the ones currently used in trip_search as options for sites which don't need the full indexing power of core search. It would also be good to have some guidelines as to which option to use - what is a "big" database?

5) It would be useful to have the ability to select the default sort order (eg by date, by relevance...)

6) It would be useful to be able to set a limit to the number of results returned.

deekayen’s picture

I know the Drupal core team likes to drag its feet on upgrading base installation requirements, but MySQL 4.1 has been production quality since April 2003. I think that's long enough to use Fulltext in core search.

chx’s picture

It's not like we are against using new software. But up to this day we are getting reports because people do not have PHP 4.3.3 which is from August 2003. Upgrading PHP is a lot easier than upgrading MySQL. Also, we have not dared to drop MySQL 3 (!!) yet, though it's a possibility...
--
Read my developer blog on Drupal4hu.

--
Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.

kbahey’s picture

What about PostgreSQL?

If we do this, then we deprive PostgreSQL users of that feature, which is something that we do not like to do ...

--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com

Gunny-1’s picture

Well explained, Steven. If index size is an issue, then the Google API can be used for quick searching; of course, there is a restriction on the number of queries.

crac’s picture

does this patch also work with version 4.6?

--
my httpclient for uploading whole directories to drupal image galleries (image.module)

neofactor’s picture

In my excitement for the 4.7 release, I tried to search for posts on that topic by entering "Drupal 4.7" in the search box.

I of course get:
The word 47 was not included because it is too short.

It would be nice if I could put quotes around the search and look for the whole word: "drupal 4.7"

NaX’s picture

I did the same thing but what I want to know is why it excludes the dot in 4.7.
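A small Python sketch that would reproduce the behaviour reported above. This is a guess at the kind of preprocessing involved, not the actual search.module code: both the rule that punctuation between digits is dropped and the minimum word length of 3 are assumptions inferred from the "too short" message.

```python
import re

MIN_WORD_LENGTH = 3  # assumed minimum, inferred from the "too short" message

def simplify(text):
    """Lowercase, join digits split by punctuation, break on other punctuation."""
    text = text.lower()
    # Assumed rule: punctuation between two digits is dropped, so "4.7" -> "47".
    text = re.sub(r"(?<=\d)[.,\-](?=\d)", "", text)
    # Any remaining run of non-word characters becomes a word boundary.
    return re.sub(r"\W+", " ", text).split()

def split_keywords(query):
    """Return (kept, dropped) keywords according to the minimum length."""
    words = simplify(query)
    kept = [w for w in words if len(w) >= MIN_WORD_LENGTH]
    dropped = [w for w in words if len(w) < MIN_WORD_LENGTH]
    return kept, dropped

print(split_keywords("Drupal 4.7"))  # (['drupal'], ['47'])
```

Under these assumptions the dot disappears before the length check runs, which would explain why the message mentions "47" rather than "4.7".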

JohnG-1’s picture

Many of these questions have been raised before, and some of them have even been answered. Here are a couple of threads which might help move this discussion on a bit:

2-character search filter thing
http://drupal.org/node/21159

Better search results
http://drupal.org/node/8099

use native database indexing/searching capabilities (FULLTEXT/BOOLEAN/...) for drupal search (2003!)
http://drupal.org/node/1995

catya’s picture

Any plans to be able to search attached files? Or am I missing it someplace?

cherylchase’s picture

I use Confluence for my intranet and the integrated Lucene (http://lucene.apache.org/java/docs/) search is to die for.

What's the status of Lucene in PHP? I see that it is part of the Zend Framework (http://framework.zend.com/manual/en/zend.search.html).