There have been several posts lately surrounding the quality of search results with Drupal's built-in search engine, and how it can be difficult to find targeted results on Drupal.org as a result.
The great news is that our very own Steven has been developing a patch to drastically improve advanced searches by adding a number of features, including filtering results by node type (page, blog, etc.), specifically excluding certain words/phrases, and so on. All it needs now is some people to test it and give feedback.
And the really great news is that YOU can help! Even if you know nothing of foreign terms like "patches" or "CVS," as long as you can install Drupal, YOU can help to test this addition and get it added to Drupal (and thus Drupal.org) faster!
Simply go here to download a copy of Drupal HEAD (the development version of Drupal), with the new search features already added: http://acko.net/dumpx/searchpatched.zip
Test it out, try and break it, see if it works the way you expect. Then, check out the Tips for reviewing patches document and leave feedback about your experience and further ideas!
Comments
Steven You ROCK
this is a major major improvement!!!
thanks so much
a Problem
I installed it on my test site and ran into a problem when I searched in IE: the search results seemed to be "unthemed", so the page looked messy. When I ran the same search in Firefox, everything worked very well. I am not sure what caused this problem.
You may test it at http://test.kzeng.info/search and see whether you get the same problem as I did.
Anyway, it's really a great improvement. Thanks!
--------------------------
http://www.kzeng.info
the theming problem is due
the theming problem is due to some debug code that gets printed out.
--
Drupal services
My Drupal services
One important feature, IMHO,
One important feature, IMHO, is to offer a selection box or radio buttons to allow searching in the subject/title field only. This would be quite handy in narrowing down the search, and the results.
Faster
as well as faster...
--
Please read the handbook, search the forums, then ask...
http://drupal.etopian.net (Drupal Support)
Please file feature requests
Please do file feature requests directly against the search module.
For those who have not filed a feature request before...
Here's how you should set the options:
Project: Drupal
Version: cvs (there are like 3 of these, just pick one of them :))
Component: search.module
Category: feature request
Priority: usually normal (or minor, depending)
Title and Description: (insert stuff here)
Database changes
Note that this patch requires database updates, so if you are using an existing database you need to run update.php after unzipping it. Though it is likely that this patch will be part of 4.7, I cannot guarantee the database format or updates will remain the same. So, back up your database before testing the patch, and restore it afterwards.
--
If you have a problem, please search before posting a question.
Search Engines...
I was wondering if there is a way to implement a "This Site" or "Google" search feature... I know you can and I would love to have that on my site if I could do it myself, but I suck at coding! I am suggesting this as an option that can be turned on and off... I love the improvements done to the search module though!
------------------------------------
Paul Malenke
paul.malenke@gmail.com
AfterDeathGraphics.com
Yes.
http://www.google.com/searchcode.html
_________
bob-thompson.com
ok...
I don't need that... that's easy... That's just a form... I don't think what I said was understood.
-----------------------------------
Paul Malenke
paul.malenke@gmail.com
AfterDeathGraphics.com
Is this what you mean?
I think what you mean is a module that when added would display a Google search box on your site.
Ideally, this would be tied into Google's Adsense for Search as well, so for big sites, it is a revenue source.
https://www.google.com/adsense/ws-overview
Is that what you meant?
--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com
i think he means a single search box with
i think he means a single search box with a selector to point the search at a/ this site, b/ google.com c/ yahoo directory ... like the firefox search box?
... this would be integrated into search module.
No... I am thinking like Adsense for search...
I am actually thinking both! :)
--------------------------------------
Paul Malenke
paul.malenke@gmail.com
AfterDeathGraphics.com
?
Now I'm confused.
would you care to elaborate ?
Here's an example of what it should be
If all you want is a Google box on your site, that does Google search in your site by displaying a standard Google page, then there's no need for a module - just go to Google, generate the HTML, and stick it in a block.
But here's an example of REAL Google search - which I assume is built on web services to the Google APIs: http://drupal.org/node/10134#comment-59097
In other words, you can display the search results on your site, in your format, and potentially adding other information (eg categories), or including sorts (like, by date), etc....
This looks very promising and I would love to try it out when available
Problems with search module and MySQL 4.1
I have been experimenting with MySQL 4.1 and found some problems with search module and database character set. Please look at my comment
http://drupal.org/node/15746#comment-47735
I have found similar issue
http://drupal.org/node/12958#comment-27889
How about sorting?
What I really miss is a way of sorting results, say by date. It is really annoying to have to look through a lot of pages of search results because the more recent and relevant results are among older nodes.
Good point. :)
Good point. :)
mod that
+1
:)
sorting by date is essential
mod +1
or find in last 30/60 days/six months/one year
Why do this?
Having found that my site was losing all its disk space to search indexes, I am in the process of installing the trip_search module, which has just about all the features mentioned here and more (including a cool interaction with categories). And it can work with either no indexes at all or with fulltext indexes in MySQL (apparently) - I use no indexes and it is already far more reliable and useful than the existing search module.
My instinct would be to scrap any idea of changing the existing search module and pile all Steven's development effort into the trip_search module, with the aim of bringing that into core, rather than waste effort by reinventing the wheel.
Oh, and by the way... my own contribution will be the translation of the module....
For what it's worth....
JG
trip_search is great for smaller databases
But it really pounds your DB for large sites. Also, this is off topic for this forum thread, which is about helping test an improvement to search.
Please create a new forum thread if you would like to gather support for getting trip_search in core...which is not a bad idea, as the majority of Drupal sites are generally smaller.
trip_search and default search - w/c is more efficient?
What does it mean when trip_search pounds the DB? Am trying out both the default search module and the trip_search module in 4.6, and am interested to know which is more efficient. Thanks.
I beg to differ
I disagree that this is off-topic. What we are after is an improvement to the search provided in Drupal, am I correct? Well, I would make the following points:
1) I see no point in having two search facilities in core - therefore my suggestion would not be to keep the existing search and also include trip_search, but to replace the existing search with trip_search
2) The extra functionality suggested in this improvement to search is mostly in trip_search already; why bother to reinvent the wheel? Where is the advantage of having Steven and nedjo working separately on two different projects when they could be combining their forces to provide an even cooler solution even faster?
3) As far as hammering the database goes, I would like to have some detail on that. At the moment I am using trip_search on a DB with 2000 nodes and climbing, with no noticeable loss of performance compared to core search, PLUS I do not have core search indexes eating up all the disk space and forcing me to shell out more dollars to my hosting service.
Anyway, this point would bear more discussion. No matter which way you look at it, a search means building indexes in a DB - so any kind of search will hammer the database. trip_search offers three methods of searching:
a) Basic (I use this for the moment) - presumably this has a performance hit because it must do sequential searches through the DB without the benefit of indexing.
b) Regular expression - I don't know how this works
c) SQL Fulltext - this takes advantage of the Fulltext indexing provided native in MySQL (ie it also builds an index, but a much smaller one than core search). This refused to work for me so far, but since Basic filled my needs, I haven't bothered to find out why yet.
So before we talk about "hammering the database", I would like to see a comparison of the different methods.
It may well be the case that the core search indexing is more efficient (in time, not in disk space), and that it is necessary to provide an indexing method which is database-independent. I would have no objection to adding core-search indexing to the trip_search methods. BUT, I still see no advantage in separate development.
I really would like to see a proper search in place in Drupal. It is at present, IMHO, about the only really weak point in what is otherwise a startlingly good piece of software.
Why search.module
The issue of trip_search vs core search is not an easy one. There are various arguments either way.
I'm the author of the core search (search.module), but I have looked to trip_search for ideas and inspiration. Vice-versa, trip_search includes code from core search. It might not seem obvious from the outside, but there is cross pollination going on between the two.
It is true that trip_search doesn't have the index building and space requirements. But, unless you use MySQL fulltext indices, it has to perform a full table scan each time, which means it gets linearly slower as your site grows. This makes trip_search okay for small to medium sites, but a very bad choice for large sites.
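To make the scaling argument concrete, here is a minimal Python sketch. It is not Drupal or trip_search code; `docs`, `scan_search`, `build_index` and `index_search` are made-up names for illustration. It contrasts a scan that reads every document on each query with a lookup in a prebuilt word index, which costs extra space but does not need to touch every document.

```python
from collections import defaultdict

# Hypothetical in-memory "node table": node id -> body text.
docs = {
    1: "drupal search module",
    2: "testing the search patch",
    3: "mysql fulltext index",
}

def scan_search(docs, word):
    """Full scan: examines every document on every query (cost grows with site size)."""
    return sorted(nid for nid, body in docs.items() if word in body.split())

def build_index(docs):
    """Build an inverted index once: word -> set of node ids (costs extra storage)."""
    index = defaultdict(set)
    for nid, body in docs.items():
        for w in body.split():
            index[w].add(nid)
    return index

def index_search(index, word):
    """Indexed lookup: one dictionary access, however many documents exist."""
    return sorted(index.get(word, set()))

index = build_index(docs)
print(scan_search(docs, "search"))
print(index_search(index, "search"))
```

Both calls return the same node ids; the difference is only in how much work each query does and how much storage the index takes, which is the trade-off discussed in this thread.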
A lot of core search's benefits are subtle and in the background, so people tend to give the module less credit than it deserves.
Did you know that core search...
These are all things that trip_search does not have, and for many professional applications, these are considered essential features of a 'proper search'. And I'd like to point out that it does all of this, while still maintaining the high standards that are expected of a Drupal core module. Many of these features are only possible because of the index, which means that merging them into trip_search's other mechanisms is impossible or impractical.
If I didn't think that search.module had a lot going for it, I would not spend my time on it. And if I did not think trip_search was good, I would not be making patches that incorporate some of its features in core search. ;)
--
If you have a problem, please search before posting a question.
that's great, but
For usability's sake, maybe the trip_search should be renamed to basic_search and made default.
The regular search module should be renamed to something that implies it being better for larger sites.
The main reason why i think trip_search should be the default search module is because many noobs post a couple of nodes and go try the search feature, but no results come up due to cron.php not having run yet. Noobs don't like this. They want their search result to reflect the current state of their site.
People who are managing larger sites usually know much more than the small-site noobs. They can go and choose to use the indexed search module for their sites, and they can set up a cron job for themselves with ease.
Anyway, that's my 2 cents.
BTW, i realize some may see this post as slightly off-topic, but under the assumption that this thread is about "improving drupal's built-in search", i think it fits right in.
-1 :one core search
The regular search module works fine for smaller sites, and if it works fine for large sites, it should work even better for small ones. One core search, with contrib search options for those wanting a different experience.
The new user syndrome is cured by step 6 of INSTALL.txt, where cron is mentioned prominently. The biggest issue we have with new users is with those using automated installs via Fantastico, Debian, etc. We do not get that many forum questions about it, as the answer is easy to find and pointing people to it when they ask is quick.
-sp
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain
-Steven Peck
I didn't know all that...
First of all I would like to thank Steven for taking the time to explain all that - it makes a big difference to my understanding. I'm especially relieved to see that there is cross-pollination going on between core search and trip_search.
It is also clear that having a really solid, fast, underlying engine is the necessary foundation on which to build a truly flexible solution. And for all those in the non-English world (my case) proper Unicode support is also vital.
That said, here are a few suggestions if I may:
1) I think that the changes to user options (which already exist in trip_search) are absolutely critical: I decided to switch off the current core search because a) it was eating disk space (but i could live with that if necessary) and b) most important, it was not producing relevant results: especially, the default OR contradicts what one usually wants (keyword a AND b AND c); there is no sort by date option and no way to restrict dates, etc. So, I am very glad to see the proposals in the "Advanced search feature" (though from the screen shot it does seem to be missing the "select by date" feature, which is very important for searching in more or less recent posts and avoiding the return of obsolete information: as an example, when I am looking on the Drupal site for information about a bug or a problem, it is really of not much interest to see posts about 4.3).
2) It should be possible to configure which choices are presented to the user, especially by excluding any or all items: for example, excluding certain vocabularies, excluding certain content types (no point in searching through webforms for example), even being able to change the names would be nice ("book page" means nothing to my users who are used to thinking in terms of articles). Again, trip_search pretty much has all this.
3) The integration of categories in the search results is very attractive and I would like to see this in core search also:
- results returned for categories as well as nodes,
- ability to filter by category after the results are returned.
4) It would be useful to choose which indexing method one uses: incorporate the ones currently used in trip_search as options for sites which don't need the full indexing power of core search. It would also be good to have some guidelines as to which option to use - what is a "big" database?
5) It would be useful to have the ability to select the default sort order (eg by date, by relevance...)
6) It would be useful to be able to set a limit to the number of results returned.
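Several of these suggestions (AND as the default, a date restriction, sorting by date, a result limit) can be sketched together in a few lines of Python. This is only an illustration under assumed names and data, not how search.module or trip_search implement them; `nodes` and `search` are hypothetical.

```python
from datetime import date

# Hypothetical indexed nodes: (title, words in the node, creation date).
nodes = [
    ("Old 4.3 bug", {"search", "bug", "drupal"}, date(2004, 1, 10)),
    ("Search patch", {"search", "patch", "drupal"}, date(2005, 9, 1)),
    ("Theme tips", {"theme", "drupal"}, date(2005, 8, 15)),
]

def search(nodes, keywords, match_all=True, since=None, limit=None):
    """Return titles matching the keywords, newest first.

    match_all=True requires every keyword (AND); False accepts any (OR).
    since drops nodes created before that date; limit caps the result count.
    """
    wanted = set(keywords)
    hits = []
    for title, words, created in nodes:
        matched = wanted <= words if match_all else bool(wanted & words)
        if matched and (since is None or created >= since):
            hits.append((created, title))
    hits.sort(reverse=True)  # newest first
    titles = [title for _, title in hits]
    return titles if limit is None else titles[:limit]

# AND search, restricted to 2005 onwards, so the old 4.3 post is left out.
print(search(nodes, ["search", "drupal"], since=date(2005, 1, 1)))
```

Making `match_all`, `since` and `limit` parameters rather than fixed behaviour mirrors the request that site administrators be able to choose the defaults.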
fulltext index then
I know the Drupal core team likes to drag its feet on upgrading base installation requirements, but MySQL 4.1 has been production quality since April 2003. I think that's long enough to use Fulltext in core search.
it's the world
It's not like we are against using new software. But up to this day we are getting reports because people do not have PHP 4.3.3 which is from August 2003. Upgrading PHP is a lot easier than upgrading MySQL. Also, we have not dared to drop MySQL 3 (!!) yet, though it's a possibility...
--
Read my developer blog on Drupal4hu.
--
Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.
Postgres?
What about PostgreSQL?
If we do this, then we deprive PostgreSQL users of that feature, which is something that we do not like to do ...
--
Drupal development and customization: 2bits.com
Personal: Baheyeldin.com
Well explained Steven. If
Well explained, Steven. If index size is an issue, then the Google API can be used for quick searching; of course, there is a restriction on the number of queries.
I think Boris meant this thread:
http://drupal.org/node/29196
4.6
does this patch also work with version 4.6?
--
my httpclient for uploading whole directories to drupal image galleries (image.module)
searching for number....
In my excitement for the 4.7 release, I tried to search for posts on that topic by entering "Drupal 4.7" in the search box.
I of course get:
The word 47 was not included because it is too short.
It would be nice if I could put quotes around the search and look for the whole word: "drupal 4.7"
I did the same thing but
I did the same thing but what I want to know is why it excludes the dot in 4.7.
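One plausible explanation, sketched in Python rather than taken from search.module's source: if the search preprocessing removes punctuation inside numbers before applying a minimum word length, "4.7" becomes "47" and is then rejected as too short. The `MIN_WORD_LENGTH` of 3 and the `simplify` function below are assumptions for illustration only.

```python
import re

MIN_WORD_LENGTH = 3  # assumed threshold, for illustration only

def simplify(query):
    """Split a query into kept and rejected words (hypothetical preprocessing)."""
    # Join digits separated by a dot or comma: "4.7" -> "47".
    text = re.sub(r"(?<=\d)[.,](?=\d)", "", query.lower())
    words = re.findall(r"\w+", text)
    kept = [w for w in words if len(w) >= MIN_WORD_LENGTH]
    rejected = [w for w in words if len(w) < MIN_WORD_LENGTH]
    return kept, rejected

kept, rejected = simplify("Drupal 4.7")
print(kept, rejected)
```

Under these assumptions only "drupal" survives as a search term, which would match the "too short" message quoted above.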
many of these questions have been discussed before : links
many of these questions have been raised before and some of them even answered. a couple of threads which might help move this discussion on a bit:
2-character search filter thing
http://drupal.org/node/21159
Better search results
http://drupal.org/node/8099
use native database indexing/searching capabilities (FULLTEXT/BOOLEAN/...) for drupal search (2003!)
http://drupal.org/node/1995
Attached files?
Any plans to be able to search attached files? Or am I missing it someplace?
Lucene?
I use Confluence for my intranet and the integrated Lucene (http://lucene.apache.org/java/docs/) search is to die for.
What's the status of Lucene in PHP? I see that it is part of the Zend Framework (http://framework.zend.com/manual/en/zend.search.html).