Syndicate from static HTML sites to Drupal portal site

technivant - November 20, 2006 - 18:14

Hi,

We're building a Drupal-based portal that needs to include content from several affiliate websites. The affiliate site admins aren't able to spend time creating RSS/Atom feeds. Instead, we would need to scrape the content from their static pages, format it as we desire, and present it within our new Drupal portal. The key would be automating this process. Does this sound feasible? Any ideas would be greatly appreciated...
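
For illustration, the "scrape, then reformat" step described above could look roughly like the sketch below. It is written in Python with only the standard library, purely as an example; the sample page, the class name ContentExtractor, and the choice to keep only the title and paragraph text are assumptions, not anything taken from the affiliate sites or from Drupal.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects the <title> text and the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            # Start collecting text for a new paragraph.
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_paragraph:
            # Join the collected pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_paragraph:
            self._buffer.append(data)


def extract_content(html):
    """Return the page title and paragraph texts found in an HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "paragraphs": parser.paragraphs}


# Made-up stand-in for a static affiliate page.
SAMPLE_PAGE = """
<html><head><title>Affiliate News</title></head>
<body>
  <div id="nav"><a href="/">Home</a></div>
  <p>First   story, with a <a href="/more">link</a> inside.</p>
  <p>Second story.</p>
</body></html>
"""

result = extract_content(SAMPLE_PAGE)
print(result["title"])
for paragraph in result["paragraphs"]:
    print("-", paragraph)
```

In a real setup the HTML string would be fetched from each affiliate URL on a schedule (for example from a cron job), and the extracted text would then be saved into the portal. Those steps are left out here because they depend on the sites and the Drupal version involved.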

Scraper module

glendac - November 20, 2006 - 19:08

Try the contrib module called scraper; see http://drupal.org/project/scraper. I'd be interested to know how it works for your site, since I also see a lot of websites that have good content but whose owners do not have the resources to create feeds.

You don't need a module, just a few lines of good PHP code

cbabione - February 23, 2007 - 06:01

I had the same need and just wrote a quick-and-dirty HTML-based scraper in PHP. I have it populating my book pages, or at least that is how the end user or Googlebot sees it. It works well. Here it is in action:

http://www.killertux.com/node/28

Just follow one of the bookmarks on that page and you can see it in action.
Notice that I have book pages I named directly, and then a "Misc" page to catch links that I really do not care about.
I wrote this earlier today, so some of the formatting may look a bit odd, but all in all, I am happy.

Let me know if this is what you were looking for, and I can get you the code.
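
The grouping described above (named pages plus a "Misc" page for everything else) can be sketched as follows. This is not the scraper from the post, which was written in PHP and is not shown in this thread; it is a minimal Python illustration. The host names, the page names in PAGE_FOR_HOST, and the rule of grouping by link host are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical mapping from link host to the page it should appear on.
PAGE_FOR_HOST = {
    "news.example.org": "News",
    "docs.example.org": "Documentation",
}


def group_links(html, page_for_host, fallback="Misc"):
    """Group links by target page; unknown hosts go to the fallback page."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    pages = {}
    for href in collector.links:
        host = urlparse(href).netloc  # empty string for relative links
        page = page_for_host.get(host, fallback)
        pages.setdefault(page, []).append(href)
    return pages


# Made-up bookmark list standing in for a scraped page.
SAMPLE_BOOKMARKS = """
<a href="http://news.example.org/a">A</a>
<a href="http://docs.example.org/b">B</a>
<a href="http://other.example.net/c">C</a>
<a href="/relative">D</a>
"""

grouped = group_links(SAMPLE_BOOKMARKS, PAGE_FOR_HOST)
for page, links in grouped.items():
    print(page, links)
```

Relative links have no host, so in this sketch they also end up on the fallback page.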

cbabione
http://www.killertux.com

 
 
