Hello!
We're running Drupal 4.6.5. The search index is not being rebuilt. Yesterday when I added new content the index went up to 100% but now it's back down to 49% with items left that refuse to be indexed (regular page/post node types). Subsequent cron.php runs do not index the rest of the non-indexed items. cron.php is configured properly, running at various times throughout the day. Running cron.php manually several times does not bring the index up to 100%.
trip_search does a little better job, but it bombs out with...
warning: implode(): Bad arguments. in /export/home1/u03/usr/local/apache/htdocs/kb/modules/trip_search/trip_search.module on line 302.
...which is not exactly user friendly. Besides, trip_search only does node searches, it does not have an interface for accessing the swish index (that I know of). search.module has that. However, search.module is returning document links with a leading slash in front of the filename, like this...
http://myserver/kb/?q=system/files&file=/myfile
...but it really should be ...files&file=myfile (no leading slash before "myfile").
swish-e is properly installed and manual searches on the command line provide the desired results. If you navigate to a node that has an attachment the attachment's link is correct (no leading slash) and clicking on it works.
We're pretty much impressed by the rest of Drupal's features, but what we want to do is create a searchable knowledge base and if the main component behind that requirement (search) is not working properly then we can't use Drupal.
I have done my fair share of searching throughout Drupal.org but I haven't found a patch or solution to search's problems (again, cron.php isn't it). Drupal.org's own search engine represents a challenge to us newbies that are learning to navigate this site; it just doesn't allow you to refine your search and you find yourself reading post after post of stuff not specific to your question(s).
Are there alternatives to Drupal's search.module and trip_search? Or is this pretty much it? Thanks all out there in advance; I understand this is OpenSource so any comments or pointers are greatly appreciated.
Thank you!
ZoneV
Comments
Don't understand
I don't understand your file problem. Search module does not do anything with attached files. The output it indexes is the same as the output that is shown to the user. So if the URL there is messed up, it should be messed up in the normal output as well. Where are these broken links showing up? The search results only link to the nodes themselves, not to any attachments.
As far as not reaching 100% goes, perhaps you could examine the search_index table and compare its sid column to the nid column in the node table. That way, you can see whether there are actually nodes that are not being indexed, or whether it is a reporting error.
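A rough sketch of that comparison, written in Python against an in-memory SQLite database rather than a real Drupal install. Only the table and column names mentioned above (node.nid, search_index.sid) come from this thread; the sample rows, the extra columns, and the helper name unindexed_nodes are made up for illustration, and the real search_index table may need an additional filter on its type column.

```python
import sqlite3

# Simplified stand-ins for Drupal's node and search_index tables.
# Only nid and sid are taken from the thread; everything else is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE search_index (word TEXT, sid INTEGER);
    INSERT INTO node VALUES (1, 'Indexed page');
    INSERT INTO node VALUES (2, 'Missing page');
    INSERT INTO node VALUES (3, 'Another indexed page');
    INSERT INTO search_index VALUES ('indexed', 1);
    INSERT INTO search_index VALUES ('page', 1);
    INSERT INTO search_index VALUES ('another', 3);
""")


def unindexed_nodes(connection):
    """Return (nid, title) for every node that has no rows in search_index."""
    cursor = connection.execute("""
        SELECT n.nid, n.title
        FROM node n
        LEFT JOIN search_index s ON s.sid = n.nid
        WHERE s.sid IS NULL
        ORDER BY n.nid
    """)
    return cursor.fetchall()


# Lists the nodes the indexer never reached.
print(unindexed_nodes(conn))
```

If the list comes back empty while the status page still reports less than 100%, that would point toward a reporting problem rather than missing index entries.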
--
If you have a problem, please search before posting a question.
Let me clarify
Thanks Steven for taking the time to respond. I will look at the search_index table as you have suggested.
If I look at the link of an attachment to a node, the link looks like this:
http://myserver/kb/?q=system/files&file=myfile
That link (above) works. Note that there is no leading slash before myfile (=myfile).
However, if I do a search (I'm now searching for content inside a file, say an MS Word file) and click on the "Uploaded Files" tab, if the search returns something I'll see a file or a list of files having links like this:
http://myserver/kb/?q=system/files&file=/myfile
Note now how myfile has a leading slash (=/myfile). Links of this type do not resolve.
What am I missing? Thanks much for your help!!!
ZoneV
Not search.module
Search.module is simply an enabler for search. The uploaded files tab is generated by some other module. The bug lies there.
Bad Links
Thanks Steven! I'll take a look.
ZoneV
Found the solution to bad links
By default, the swish module places the index right inside the files folder. This causes the index to be indexed too; that took a long time. So I had moved the index out of the files path and had modified the swish module accordingly, but I had left out a leading "/" in front of the search return links.
It's now fixed.
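For anyone hitting the same thing, the idea is roughly the following. This is only an illustration in Python (the swish module itself is PHP), and the function name file_link is hypothetical; the URL pattern is the one quoted earlier in this thread.

```python
def file_link(base_url, rel_path):
    """Build a ?q=system/files&file=... link with no leading slash on the file name."""
    # Moving the index out of the files folder can leave a stray "/" at the
    # start of the path that comes back from the search; strip it here.
    clean_path = rel_path.lstrip("/")
    return base_url + "?q=system/files&file=" + clean_path


# Both forms of the path end up as the same working link.
print(file_link("http://myserver/kb/", "/myfile"))
print(file_link("http://myserver/kb/", "myfile"))
```

Normalizing at the point where the link is built means it no longer matters whether the path stored in the index starts with a slash.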
ZoneV
search_index table
I just recalled (and confirmed by looking at it) that the search_index table is empty. I had emptied it thinking that rebuilding the index would re-populate the table.
Is this not the expected behavior or did I screw it up badly?
ZoneV
Alternative or solved the problem?
One solution would be to go to version 4.7 of Drupal, which actually uses a merged version of search and trip_search.
But if you want to solve your problem, look below.
You said: "but now it's back down to 49% with items left that refuse to be indexed (regular page/post node types). Subsequent cron.php runs do not index the rest of the non-indexed items."
As for me, my website does not index everything at once, since I have put a limit on node indexing (in admin/settings/search).
So when you are manually executing cron.php :
1- do you see the "cron finished" in your logs?
2- Is the % of your index any higher?
Alexandre Racine
www.gardienvirtuel.com Sécurité informatique, conformité, consultation, etc
www.salsamontreal.com La référence salsa à Montréal
Problem not yet solved
Thank you for your quick response, alexandreracine.
Here's the output from watchdog:
... and then...
...and finally...
...and the % of the index stays put @ 49%...
merci!
ZoneV
mmmm... strange...
You could try to re-index everything... just for fun :)
I have a couple of ideas, from access to security, but try this first.
Go to your admin/settings/search config and change the minimum length of words indexed to, let's say, 4. Save. This will wipe and re-index your whole index, so you will go back to 0%. Double-check that you are at 0% before going on. Then run cron to see whether you simply go back to 47% and stay there again.
I agree, 4.7 is for testing only for now :)
Alexandre Racine
Scratch index and start from zero
Ok, I'll try this. I'll get back to you (et al).
ZoneV
Scratch index and start from zero, Part II
Ok, changed the minimum length of words to index to 4. Saved config.
Then I launched cron.php (via IE, http: // mysite /cron.php ; don't want this to be a link so spaces added). Watchdog said it detected cron activity.
Cron run completed successfully.
And here's the indexing status, verbatim:
:(
ZoneV
One more thing...
I'd love to go to 4.7 as soon as it is at least RC1. Right now with 400 bugs and 16 of them critical I am not sure that we want to commit to that version yet.
ZoneV
Still another comment...
Items to index per cron run is 100.
...but my site does not have 100 nodes yet.
ZoneV
Do you have 49 nodes to be exact?
Do you have 49 nodes to be exact?
If this is the case, we might have a bug present=)
I have 27...
Indexing status says:
Running...
So how many nodes do I have, 27? And indexing says that 49% of the site is indexed, but 27 nodes (or items?) are left to index.
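Working through the arithmetic, under the assumption (not checked against the 4.6 source) that the status page computes the percentage as (total - remaining) / total. The function names below are just for illustration.

```python
def percent_indexed(total, remaining):
    """Percentage indexed, assuming percent = (total - remaining) / total."""
    return 100 * (total - remaining) / total


def implied_total(percent, remaining):
    """Total item count implied by a reported percentage and a remaining count."""
    return round(remaining / (1 - percent / 100))


# 49% indexed with 27 items left implies a total well above 27.
total = implied_total(49, 27)
print(total)
print(int(percent_indexed(total, 27)))
```

If that assumption holds, the status page is working from a total of about twice the 27 nodes I count, so the two numbers it reports can't both refer to my 27 nodes.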
ZoneV
Found the solution to indexing problem
The trip_search module creates full text indexes on two tables (node and I can't remember the other one as I post this). That index was corrupted.
I dropped the indexes on both tables and that took care of the problem (now indexing goes to 100%). I have decided to stay with basic trip_search since that does it without complications (yes I know it takes longer, but works....)
ZoneV
can't use
I'm mulling over your comment that a good search function is essential, and that, judging by drupal.org, search in 4.6 has important features missing. I have also thought that drupal.org probably represents the best search function available in 4.6 (not using trip_search). And we know that drupal.org is difficult to search. 4.7 is supposed to be much better, but still misses important functionality. Search this site for "jason" and "smartsearch" (I think) for a discussion on that.
You said you can't use Drupal if the search is not powerful enough. What other CMSs are you considering? I'd be interested to hear which CMSs have better (or much better) search.
FIRST thing FIRST...
Search in Drupal 4.7 is way better than 4.6.
Drupal.org does not run 4.7 yet.
And 4.7 is still not stable. So patience is a virtue.
Alexandre Racine
Can't use.
I'm between a rock and a hard place.
Management has asked me to come up with an alternative to a very expensive (and IMO, clumsy) CMS called Docum... Docummmmmmm... my lips cannot speak it!
We were nastily bit in the rear end by multiple vendors a few years back that came here promising the world of tomorrow today. Money was lavi$hly thrown at it, but the solution required business process re-engineering that our organization was not prepared/willing to do at the time; we wanted the tool to adapt to our processes and not the other way around. After a few attempts, the whole thing was discreetly shelved and project members quietly sailed into the sunset.
Now the tiger is back, and with a vengeance! Upper managers are being wined and dined and are mesmerized once more with the promise of tomorrow today...
My team (about 40-50) does not care about CMS, we just want a searchable repository (better than MS network shares) that's readily accessible over our Intranet. There are a few other things that we see we can do with Drupal, but being able to search quickly and accurately is key!
So we're attempting to launch a grass roots knowledge base movement founded on Drupal (recommended to us by another team; however, this is not a priority for them and they can't commit a resource to show us the ropes); if it catches enough momentum we can tell upper management what a great solution this little thing called Drupal is for what we need it to do, and there won't be a big push toward Docummmm mmmmmmmm pfffft! Whatever.
I haven't looked at other alternatives (this effort started a few weeks ago); recently I ran into Mambo but I don't know if it's more adequate to our needs.
ZoneV
What can we say? Again.
What can we say? Again. 4.7 search is vastly improved. Not a little, a lot. I know that I linked you to the issue where the capabilities exist. You can play with 4.7. It's a risk, but one you may feel you need to take. You can only determine if this suits you through actual testing.
http://scratch.drupal.org/search/node
Supplemental module leveraging the 4.7 base built in.
http://drupal.org/project/porterstemmer
-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain
Thanks!
I understand. I truly appreciate everyone's comments; it certainly takes a lot of commitment to keep a community like this running.
Let me try to find another box where I can give 4.7 a spin. I am using Drupal 4.6.5 on Site5 on another (personal) site I'm setting up over there; Fantastico setup was a breeze and searching seems to be working fine (although I don't have a lot of content yet).
Hopefully as 4.7 matures and I learn more about Drupal I'll be able to return the favor to this community and perhaps contribute some and not only ask.
So once again, thanks! I'll keep you all posted on my progress.
ZoneV
Swish-e not available for 4.7 yet...
Bummer...
:p
4.7 is lacking many modules
Took a look. Yes, it is vastly improved but unfortunately many modules that I'm currently using in 4.6.x have not yet been converted to 4.7, so I decided to stay with 4.6.x.
Thx!
ZoneV