This project is not covered by Drupal’s security advisory policy.
For a modern, working example see the Example Web Scraper (built on Feeds and Feeds XPath Parser).
@TODO: For Drupal 6, use SimpleXML as done by Nick Lewis.
The currently maintained portion of this module is a very simple scraper that takes a URL plus beginning and ending code (the markup that delimits the part of the page you want) and displays the result in a block. (This could easily be extended to display on a page.)
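As an illustration of the begin/end approach described above, here is a minimal sketch in Python. It is not the module's own code: the function names `extract_between` and `scrape` are hypothetical, and the sketch assumes the markers are plain strings matched literally in the fetched page.

```python
from typing import Optional
from urllib.request import urlopen


def extract_between(html: str, begin: str, end: str) -> Optional[str]:
    """Return the text between the first `begin` marker and the next `end` marker.

    Returns None if either marker is missing.
    """
    start = html.find(begin)
    if start == -1:
        return None
    start += len(begin)  # skip past the beginning marker itself
    stop = html.find(end, start)
    if stop == -1:
        return None
    return html[start:stop]


def scrape(url: str, begin: str, end: str) -> Optional[str]:
    """Fetch `url` and return the fragment between the two markers (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_between(html, begin, end)


if __name__ == "__main__":
    page = '<html><body><div id="news">Latest headline</div></body></html>'
    print(extract_between(page, '<div id="news">', "</div>"))
```

Literal string matching like this breaks as soon as the target page changes its markup, which is one reason the XPath-based approach mentioned earlier is the more robust option.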
It does not currently store the scraped data, so you need to use the blockcache module and replace the scraped_content block with its cached equivalent. Otherwise the page you are scraping is requested every time the scraped_content block is viewed.
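The reason caching matters here can be sketched with a small time-based cache. This is only an illustration of the idea, not how the blockcache module works: the class name `TTLCache`, its `get_or_fetch` method, and the 300-second lifetime are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep fetched content for `ttl_seconds` so the remote page is not requested on every view."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested without waiting
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no remote request
        value = fetch()  # missing or expired: request the page again
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=300)
    # The lambda stands in for the real scrape of the remote page.
    print(cache.get_or_fetch("http://example.com/", lambda: "scraped fragment"))
    print(cache.get_or_fetch("http://example.com/", lambda: "never called while fresh"))
```

With a cache like this in front of the scraper, repeated block views within the lifetime reuse the stored fragment instead of calling the remote site each time.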
4.7 only: This project also contains Legacy scraper, code from an industrial-strength scraping module written for Drupal 4.7, which is not maintained.
See also the Import HTML module.
Agaric Design Collective, as the current maintainer of the Scraper project, is quite happy for it to become a collection of web-scraping modules, whether by expanding on simple_scraper or legacy_scraper or by adding new ones.
Project information
- Unsupported
Not supported (i.e. abandoned), and no longer being developed. Learn more about dealing with unsupported (abandoned) projects
- Obsolete
Use of this project is deprecated.
- Module categories: Content Display, Import and Export
- Created by dado on , updated
- This project is not covered by the security advisory policy.
Use at your own risk! It may have publicly disclosed vulnerabilities.