For the record, I have released my Scraper module:
http://drupal.org/project/scraper

Requirements: Drupal 4.7, PHP 5, XML DOM, Tidy, some PHP skill, and the ability to learn XPath.

You would generally use this module when websites contain delimited, non-syndicated data that you want to import as nodes. Scraper can be run on any test Drupal site, and the data it produces can be imported anywhere (MS Excel, Drupal 4.x).
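To make that workflow concrete, here is a minimal Python sketch of the same idea. It is not the module's PHP code, only an illustration under stated assumptions: the page is assumed to be well-formed XHTML already (the cleanup Tidy does for Scraper), the data rows are selected with an XPath-style path, and the result is written as CSV so MS Excel can open it. The sample page, the `prices` table id and the helper names `scrape_rows` and `to_csv` are hypothetical. Note that Python's `xml.etree.ElementTree` supports only a subset of XPath, whereas the module works with PHP's XML DOM.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical page, assumed to be well-formed XHTML already
# (the Scraper module relies on Tidy for that cleanup step).
PAGE = """<html><body>
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>0.50</td></tr>
  <tr><td>Pear</td><td>0.75</td></tr>
</table>
</body></html>"""


def scrape_rows(xhtml: str, row_path: str) -> list[list[str]]:
    """Return the text of each cell in the rows matched by row_path.

    row_path uses ElementTree's limited XPath subset, not full XPath 1.0.
    """
    root = ET.fromstring(xhtml)
    rows = []
    for tr in root.findall(row_path):
        # Keep only header and data cells, stripping surrounding whitespace.
        cells = [(cell.text or "").strip() for cell in tr if cell.tag in ("td", "th")]
        rows.append(cells)
    return rows


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV, a format MS Excel can open directly."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = scrape_rows(PAGE, ".//table[@id='prices']/tr")
    print(to_csv(rows), end="")
```

Separating extraction from export mirrors the point above: the scraped rows are plain data and can be fed to whatever importer you like.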

Some advanced features:

  • submits forms

Enjoy!
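The "submits forms" feature means Scraper can reach pages that sit behind an HTML form. As a hedged illustration only (Scraper does this in PHP, and its internals are not shown here), the Python sketch below builds the kind of URL-encoded POST request a browser sends when a form is submitted. The helper name `build_form_post`, the action URL and the field names are made up, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(action_url: str, fields: dict[str, str]) -> Request:
    """Build (but do not send) a POST request that mimics submitting an HTML form.

    The fields are encoded as application/x-www-form-urlencoded, the default
    encoding browsers use for <form method="post">.
    """
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical search form; pass the request to urllib.request.urlopen() to send it.
    req = build_form_post("http://example.com/search", {"q": "drupal", "page": "2"})
    print(req.get_method(), req.full_url, req.data)
```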

    Comments

    johnchalekson

    Anyone is welcome to test it out there: http://www.4pir2.org/admin/scraper
    I still can't seem to get it to work properly. I get the error on line 506 (I think). Any suggestions?

    Only one top level element is allowed in an XML document. Error processing resource 'http://www.4pir2.org/scraper/action/xm...

    Fatal error: Cannot instantiate non-existent class: tidy in /home/thingsto/public_html/4pir2/m...

    Projects working on: aln, hedgedfund, thingstoday, alc

    dado

    The module lets the admin execute PHP code on your server. A malicious user could delete your entire DB!

    Please post an issue here:
    http://drupal.org/node/add/project_issue/scraper/bug

    dado

    dado

    Note that this module requires the Tidy extension for PHP:
    http://www.php.net/tidy

    That might be your problem.

    johnchalekson

    I was just testing this module out.
    Thanks for the help!