Hello!

We're running Drupal 4.6.5. The search index is not being rebuilt. Yesterday when I added new content the index went up to 100% but now it's back down to 49% with items left that refuse to be indexed (regular page/post node types). Subsequent cron.php runs do not index the rest of the non-indexed items. cron.php is configured properly, running at various times throughout the day. Running cron.php manually several times does not bring the index up to 100%.

trip_search does a little better job, but it bombs out with...

warning: implode(): Bad arguments. in /export/home1/u03/usr/local/apache/htdocs/kb/modules/trip_search/trip_search.module on line 302.

...which is not exactly user friendly. Besides, trip_search only does node searches, it does not have an interface for accessing the swish index (that I know of). search.module has that. However, search.module is returning document links with a leading slash in front of the filename, like this...

http://myserver/kb/?q=system/files&file=/myfile

...but it really should be ...files&file=myfile (no leading slash before "myfile").

swish-e is properly installed and manual searches on the command line provide the desired results. If you navigate to a node that has an attachment the attachment's link is correct (no leading slash) and clicking on it works.

We're pretty much impressed by the rest of Drupal's features, but what we want to do is create a searchable knowledge base and if the main component behind that requirement (search) is not working properly then we can't use Drupal.

I have done my fair share of searching throughout Drupal.org but I haven't found a patch or solution to search's problems (again, cron.php isn't it). Drupal.org's own search engine is a challenge for us newbies who are learning to navigate this site; it just doesn't allow you to refine your search, and you find yourself reading post after post of stuff not specific to your question(s).

Are there alternatives to Drupal's search.module and trip_search? Or is this pretty much it? Thanks all out there in advance; I understand this is OpenSource so any comments or pointers are greatly appreciated.

Thank you!
ZoneV

Comments

Steven’s picture

I don't understand your file problem. Search module does not do anything with attached files. The output it indexes is the same as the output that is shown to the user. So if the URL there is messed up, it should be messed up in the normal output as well. Where are these broken links showing up? The search results only link to the nodes themselves, not to any attachments.

As far as not reaching 100% goes, perhaps you could examine the search_index table and compare its sid column to the nid column in the node table. That way, you can see whether there are actually nodes that are not being indexed, or whether it is a reporting error.
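A rough sketch of that comparison, using Python's sqlite3 module with an in-memory database so it can be run anywhere. The two tables here are simplified, hypothetical stand-ins: only the nid and sid columns come from the suggestion above, and the real tables have more columns and may hold index entries for things other than nodes. The SELECT itself is plain SQL and should be adaptable to the real database.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the node and search_index tables;
# only the nid and sid columns mentioned above are modeled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE search_index (word TEXT, sid INTEGER);
    INSERT INTO node VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO search_index VALUES ('alpha', 1), ('beta', 1), ('gamma', 3);
""")


def unindexed_nids(db):
    """Return the node ids that have no row at all in search_index."""
    rows = db.execute("""
        SELECT n.nid
        FROM node AS n
        LEFT JOIN search_index AS s ON s.sid = n.nid
        WHERE s.sid IS NULL
        ORDER BY n.nid
    """)
    return [nid for (nid,) in rows]


print(unindexed_nids(conn))  # [2]: node 2 has no index rows in this sample data
```

An empty result would point to a reporting error; a non-empty one lists the nodes that really are missing from the index.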

--
If you have a problem, please search before posting a question.

ZoneV’s picture

Thanks Steven for taking the time to respond. I will look at the search_index table as you have suggested.

If I look at the link of an attachment to a node, the link looks like this:

http://myserver/kb/?q=system/files&file=myfile

That link (above) works. Note that there is no leading slash before myfile (=myfile).

However, if I do a search (I'm now searching for content inside a file, say an MS Word file) and click on the "Uploaded Files" tab, if the search returns something I'll see a file or a list of files having links like this:

http://myserver/kb/?q=system/files&file=/myfile

Note now how myfile has a leading slash (=/myfile). Links of this type do not resolve.
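To make the difference between the two links concrete, here is a small Python sketch that rewrites the broken form into the working one. It is only an illustration of the symptom: the function name is made up for this example, and the actual fix belongs in whichever module builds the search result links.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_leading_slash(url, param="file"):
    """Remove leading slashes from one query parameter (hypothetical helper)."""
    parts = urlsplit(url)
    pairs = [
        (key, value.lstrip("/") if key == param else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # safe="/" keeps the slash in q=system/files from being percent-encoded.
    return urlunsplit(parts._replace(query=urlencode(pairs, safe="/")))


broken = "http://myserver/kb/?q=system/files&file=/myfile"
print(strip_leading_slash(broken))
# http://myserver/kb/?q=system/files&file=myfile
```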

What am I missing? Thanks much for your help!!!

ZoneV

Steven’s picture

Search.module is simply an enabler for search. The uploaded files tab is generated by some other module. The bug lies there.

ZoneV’s picture

Thanks Steven! I'll take a look.

ZoneV

ZoneV’s picture

By default, the swish module places the index right inside the files folder. This causes the index to be indexed too; that took a long time. So I had moved the index out of the files path and had modified the swish module accordingly, but I had left out a leading "/" in front of the search return links.

It's now fixed.
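For anyone else who hits this, a quick way to check whether an index file still lives inside the directory being indexed. This is a Python sketch with a made-up helper name; the example paths follow my own setup (the first index path is a hypothetical location inside the files folder), so adjust them to yours.

```python
from pathlib import PurePosixPath


def index_is_inside(index_path, files_dir):
    """True if index_path is files_dir itself or lies somewhere beneath it."""
    index = PurePosixPath(index_path)
    root = PurePosixPath(files_dir)
    return index == root or root in index.parents


files_dir = "/usr/local/apache/htdocs/kb/files"

# Hypothetical location inside the files folder: the index would index itself.
print(index_is_inside(files_dir + "/my_swish_index", files_dir))  # True

# Index moved out of the files path: it is no longer picked up.
print(index_is_inside("/usr/local/apache/htdocs/kb/my_swish_index", files_dir))  # False
```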

ZoneV

ZoneV’s picture

I just recalled (and confirmed by looking at it) that the search_index table is empty. I had emptied it thinking that rebuilding the index would re-populate the table.

Is this not the expected behavior or did I screw it up badly?

ZoneV

alexandreracine’s picture

One solution would be to go to version 4.7 of Drupal, which actually uses a merged version of search and trip_search.

But if you want to solve your problem, look below.

Indexing status
98% of the site has been indexed. There are 11 items left to index.

You said: "but now it's back down to 49% with items left that refuse to be indexed (regular page/post node types). Subsequent cron.php runs do not index the rest of the non-indexed items."

On my side, the website does not index everything at once either, since I have set a limit on node indexing (in admin-settings-search).

So when you run cron.php manually:
1- Do you see "cron finished" in your logs?
2- Does the index percentage go up?

Alexandre Racine

www.gardienvirtuel.com IT security, compliance, consulting, etc.

www.salsamontreal.com The salsa reference in Montréal

ZoneV’s picture

Thank you for your quick response, alexandreracine.

Here's the output from watchdog:

/usr/local/bin/swish-e -c tmp/swishcMaaRw -f /usr/local/apache/htdocs/kb/my_swish_index

... and then...

Indexing Data Source: "File-System"
Indexing "/usr/local/apache/htdocs/kb/files"

Checking dir "/usr/local/apache/htdocs/kb/files"...
Checking dir "/usr/local/apache/htdocs/kb/files/logos"...
Checking dir "/usr/local/apache/htdocs/kb/files/pictures"...
Checking dir "/usr/local/apache/htdocs/kb/files/images"...
Checking dir "/usr/local/apache/htdocs/kb/files/images/temp"...
Checking dir "/usr/local/apache/htdocs/kb/files/images/thumbs"...
Checking dir "/usr/local/apache/htdocs/kb/files/thumbs"...

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 5,504 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: ... Writing word text: Complete
Writing word hash: ... Writing word hash: 10% Writing word hash: 20% Writing word hash: 30% Writing word hash: 40% Writing word hash: 50% Writing word hash: 60% Writing word hash: 70% Writing word hash: 80% Writing word hash: 90% Writing word hash: 100% Writing word hash: Complete
Writing word data: ... Writing word data: Complete
5,504 unique words indexed.
Sorting property: swishdocpath Sorting property: swishtitle Sorting property: swishdocsize Sorting property: swishlastmodified 4 properties sorted.
4 files indexed. 6,533,089 total bytes. 81,453 total words.
Elapsed time: 00:02:14 CPU time: 00:00:02
Indexing done!

...and finally...

Cron run completed

...and the % of the index stays put @ 49%...

Thanks!
ZoneV

alexandreracine’s picture

You could try to re-index everything... just for fun :)
I have a couple of ideas, ranging from access to security, but try this first.

Go to your admin/settings/search config and change the minimum length of indexed words to... let's say 4. Save. This will rebuild your whole index, so you will go back to 0%. Double-check that you are at 0% before going on. Then run cron to see whether you simply go back to 47% and stall there again.

I agree, 4.7 is for testing only for now :)

Alexandre Racine

ZoneV’s picture

Ok, I'll try this. I'll get back to you (et al).

ZoneV

ZoneV’s picture

Ok, changed the minimum length of words to index to 4. Saved config.

Then I launched cron.php (via IE, http: // mysite /cron.php ; don't want this to be a link so spaces added). Watchdog said it detected cron activity.

Cron run completed successfully.

And here's the indexing status, verbatim:

49% of the site has been indexed. There are 27 items left to index.

:(

ZoneV

ZoneV’s picture

I'd love to go to 4.7 as soon as it is at least RC1. Right now with 400 bugs and 16 of them critical I am not sure that we want to commit to that version yet.

ZoneV

ZoneV’s picture

Items to index per cron run is 100.

...but my site does not have 100 nodes yet.

ZoneV

rentex’s picture

Do you have 49 nodes to be exact?

If this is the case, we might have a bug present=)

ZoneV’s picture

Indexing status says:

49% of the site has been indexed. There are 27 items left to index.

Running...

mysql> use drupal;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select count(*) from node;
+----------+
| count(*) |
+----------+
|       27 |
+----------+
1 row in set (0.00 sec)

mysql>

So how many nodes do I have, 27? And indexing says that 49% of the site is indexed, but 27 nodes (or items?) are left to index.
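Working the numbers through in Python, under the assumption (mine, not confirmed from the module code) that the status percentage is simply (total - remaining) / total:

```python
def percent_indexed(total, remaining):
    """Percentage indexed, assuming it is (total - remaining) / total."""
    if total <= 0:
        return 100
    done = max(0, total - remaining)
    return min(100, int(100 * done / total))


def implied_total(percent, remaining):
    """Total item count that would give `percent` with `remaining` items left."""
    return remaining / (1 - percent / 100)


print(percent_indexed(27, 27))       # 0: nothing indexed if all 27 nodes remain
print(round(implied_total(49, 27)))  # 53: total implied by "49%, 27 left"
print(percent_indexed(53, 27))       # 49
```

So if that assumption holds, the status line is working from about 53 items in total, roughly twice the 27 rows in the node table. That would suggest the status is counting something beyond the node table, or counting items twice, but that is a guess.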

ZoneV

ZoneV’s picture

The trip_search module creates full text indexes on two tables (node and I can't remember the other one as I post this). That index was corrupted.

I dropped the indexes on both tables and that took care of the problem (now indexing goes to 100%). I have decided to stay with basic trip_search since that does it without complications (yes I know it takes longer, but works....)

ZoneV

kae’s picture

I'm mulling over your comment that a good search function is essential, and that, judging by drupal.org, search in 4.6 has important features missing. I have also thought that drupal.org probably represents the best search function available in 4.6 (not using trip_search). And we know that drupal.org is difficult to search. 4.7 is supposed to be much better, but still misses important functionality. Search this site for jason and smartsearch (I think) for a discussion on that.

You said you can't use Drupal if the search is not powerful enough. What other CMSs are you considering? I'd be interested to hear which CMSs have better (or much better) search.

alexandreracine’s picture

Search in Drupal 4.7 is way better than in 4.6.

Drupal.org does not run 4.7 yet.

And 4.7 is still not stable. So patience is a virtue.

Alexandre Racine

ZoneV’s picture

I'm between a rock and a hard place.

Management has asked me to come up with an alternative to a very expensive (and IMO, clumsy) CMS called Docum... Docummmmmmm... my lips cannot speak it!

We were nastily bit in the rear end by multiple vendors a few years back that came here promising the world of tomorrow today. Money was lavi$hly thrown at it, but the solution required business process re-engineering that our organization was not prepared/willing to do at the time; we wanted the tool to adapt to our processes and not the other way around. After a few attempts, the whole thing was discreetly shelved and project members quietly sailed into the sunset.

Now the tiger is back, and with a vengeance! Upper managers are being wined and dined and are mesmerized once more with the promise of tomorrow today...

My team (about 40-50) does not care about CMS; we just want a searchable repository (better than MS network shares) that's readily accessible over our intranet. There are a few other things that we see we can do with Drupal, but being able to search quickly and accurately is key!

So we're attempting to launch a grassroots knowledge base movement founded on Drupal. (It was recommended to us by another team; however, it is not a priority for them and they can't commit a resource to show us the ropes.) If it gathers enough momentum, we can tell upper management what a great solution this little thing called Drupal is for what we need it to do, and there won't be a big push toward Docummmm mmmmmmmm pfffft! Whatever.

I haven't looked at other alternatives (this effort started a few weeks ago); recently I ran into Mambo, but I don't know if it's better suited to our needs.

ZoneV

sepeck’s picture

What can we say? Again. 4.7 search is vastly improved. Not a little, A lot. I know that I linked you to the issue where the capabilities exist. You can play with 4.7. It's a risk, but one you may feel you need to take. You can only determine if this suits you through actual testing.

http://scratch.drupal.org/search/node
Supplemental module leveraging the 4.7 base built in:
http://drupal.org/project/porterstemmer

-Steven Peck
---------
Test site, always start with a test site.
Drupal Best Practices Guide -|- Black Mountain

ZoneV’s picture

I understand. I truly appreciate everyone's comments; it certainly takes a lot of commitment to keep a community like this running.

Let me try to find another box where I can give 4.7 a spin. I am using Drupal 4.6.5 on Site5 on another (personal) site I'm setting up over there; Fantastico setup was a breeze and searching seems to be working fine (although I don't have a lot of content yet).

Hopefully as 4.7 matures and I learn more about Drupal I'll be able to return the favor to this community and perhaps contribute some and not only ask.

So once again, thanks! I'll keep you all posted on my progress.

ZoneV

ZoneV’s picture

Bummer...

:p

ZoneV’s picture

Took a look. Yes, it is vastly improved but unfortunately many modules that I'm currently using in 4.6.x have not yet been converted to 4.7, so I decided to stay with 4.6.x.

Thx!
ZoneV