Has anyone worked on integrating a spider or other search engine into Drupal?

I am looking at integrating something like Sphider or another web crawler into one of my sites, so that search can reach beyond our own content to a limited number of outside sites.
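To make the "limited number of sites" part concrete, here is a rough sketch in Python (standard library only, not tied to any particular crawler or to Drupal) of how a crawler could keep only the links that point at an approved list of hosts. The host names in `ALLOWED_HOSTS` and the function name `allowed_links` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical allowlist: the only hosts the crawler may visit.
ALLOWED_HOSTS = {"example.org", "news.example.com"}


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def allowed_links(page_url, html, allowed_hosts=ALLOWED_HOSTS):
    """Return absolute http(s) links in html whose host is on the allowlist."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.hostname in allowed_hosts:
            links.append(absolute)
    return links
```

A real crawler would also need fetching, a visited-URL set, robots.txt handling and rate limiting; this only shows the filtering step.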

The other thing I would like to look into is using the spider functions to provide crawler-to-node integration, similar to the way feedparser puts an RSS feed into a node. This would have a plainer look, something like topix.net.

If anyone has done this or looked into it, I would like to speak with them to see if we could collaborate on something.

thanks

Comments

jsloan

I just left a comment on the Roadmap post concerning the Zend Framework's Lucene search. The Lucene search demo comes with a simple feed retriever and indexer.

This past week there was a little discussion of a PHP spider on the Zend developers mailing list. So in time there may be something in the works from the Zend Framework.

I've looked for a PHP spider in the past and have not seen one. I have been using the spider.pl script from the SWISH-E package, and more recently I have been testing the HarvestMan web crawler, which is developed in Python. Both of these programs return the documents to your server; you then need to analyze and index them.
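As a rough illustration of that analyze-and-index step, here is a minimal Python sketch of a toy inverted index built from already-fetched HTML. It is not how SWISH-E, HarvestMan or Lucene work internally; the names `build_index` and `search` are invented for the example, and the tokenizer is deliberately naive (lowercase ASCII words, no stemming or ranking).

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def tokenize(html):
    """Analyze step: strip markup and split the text into lowercase words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def build_index(pages):
    """Index step: map each word to the set of URLs whose text contains it.

    pages is a dict of {url: html} for documents the spider already fetched.
    """
    index = defaultdict(set)
    for url, html in pages.items():
        for word in tokenize(html):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Even this toy version holds every word of every page in memory, which hints at why a real index needs more resources than a typical shared account offers.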

Running a spider is time- and resource-intensive, so it is a possibility if you have your own server, but it would not be a good idea on shared hosting.

~ jim

robertdouglass

I've created a working group for Lucene and Nutch: http://groups.drupal.org/lucene-and-nutch

- Robert Douglass
