This project is not covered by Drupal’s security advisory policy.

For a modern, working example, see the Example Web Scraper (built on Feeds and Feeds XPath Parser).

@TODO: For Drupal 6, use SimpleXML as done by Nick Lewis.

The currently maintained portion of this module is a very simple scraper that takes a URL plus beginning and ending code (the strings that delimit the content to extract) and displays the result in a block. (This could easily be extended to display on a page.)
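The begin/end approach described above can be sketched in a few lines. This is a minimal illustration in Python, not the module's actual PHP code; the function names `extract_between` and `scrape` and the parameter names are hypothetical and do not come from the project.

```python
from typing import Optional
from urllib.request import urlopen


def extract_between(html: str, begin: str, end: str) -> Optional[str]:
    """Return the text between the first `begin` marker and the next `end` marker.

    Returns None when either marker is missing, so the caller can decide
    what to display instead of the scraped fragment.
    """
    start = html.find(begin)
    if start == -1:
        return None
    start += len(begin)  # skip past the begin marker itself
    stop = html.find(end, start)  # search for the end marker only after it
    if stop == -1:
        return None
    return html[start:stop]


def scrape(url: str, begin: str, end: str) -> Optional[str]:
    """Fetch `url` and return the fragment between the two markers (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_between(html, begin, end)


if __name__ == "__main__":
    page = '<html><div id="news">Hello</div></html>'
    print(extract_between(page, '<div id="news">', "</div>"))
```

Plain string matching like this is fragile when the target page changes its markup, which is one reason the XPath-based approach mentioned earlier may be preferable for anything beyond a simple block.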

It does not currently store the scraped data, so you need to use the blockcache module and replace the scraped_content block with its cached equivalent. Otherwise, the page you are scraping is requested every time the scraped_content block is viewed.
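To show why caching matters here, the following sketch wraps a fetch function in a small time-based cache so that the remote page is requested at most once per interval. It is an illustration in Python only and does not reflect how the blockcache module is implemented; the class name `TTLCache` and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache the result of `fetch(key)` for `ttl_seconds` (illustrative only)."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no remote request is made
        value = self._fetch(key)  # missing or expired: fetch again
        self._entries[key] = (now, value)
        return value
```

With a wrapper like this, repeated views of the block within the interval reuse the stored fragment instead of calling the scraped site each time.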

4.7 only: This module also contains Legacy scraper, code from an industrial-strength scraping module written for Drupal 4.7, which is not maintained.

See also the Import HTML module.

Agaric Design Collective, as the current maintainer of the Scraper project, is quite happy for it to become a collection of web-scraping modules, whether by expanding on simple_scraper or legacy_scraper or by adding new ones.
