Hello,
Is this module ready to be used?

My use case is:
- I have a feed importer which creates a node from an external URL
- I have an external page with several elements, each of them linking to a URL I want to import
- This page has pagination links with "next" / "previous" buttons

Can I use this module to import a feed from every element found on the page, find the "next" page URL, and repeat for each following page?

Comments

tekken’s picture

subscribing

twistor’s picture

Assigned: Unassigned » mitchell

This module handles the latter part: it does pagination, as in following a "next" link. It doesn't, however, grab a set of links from a page.
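
For readers unfamiliar with the idea, following a "next" link can be sketched roughly as below. This is not the module's code (the module is PHP); it is a minimal Python illustration that assumes the "next" link is marked with `rel="next"`. The names `NextLinkFinder`, `follow_next_links`, `fetch`, and the `PAGES` dictionary are all hypothetical, and `PAGES` stands in for real HTTP responses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose rel attribute contains 'next'."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()
        if tag == "a" and "next" in rel and self.next_href is None:
            self.next_href = attrs.get("href")


def follow_next_links(start_url, fetch, max_pages=10):
    """Yield (url, html) for each page reached by following 'next' links.

    `fetch` is any callable mapping a URL to an HTML string (an assumption
    made here so the sketch runs without network access).
    """
    url, seen = start_url, set()
    # Stop on a missing link, a loop, or after max_pages pages.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        finder = NextLinkFinder()
        finder.feed(html)
        # Resolve relative links against the current page URL.
        url = urljoin(url, finder.next_href) if finder.next_href else None


# Hypothetical in-memory pages standing in for HTTP responses.
PAGES = {
    "http://example.com/list/1": '<a rel="next" href="/list/2">next</a>',
    "http://example.com/list/2": '<a rel="prev" href="/list/1">previous</a>',
}

visited = [url for url, _ in follow_next_links("http://example.com/list/1", PAGES.__getitem__)]
```

The `seen` set and `max_pages` cap are there so that a page linking back to itself cannot make the loop run forever.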

summit’s picture

Subscribing. How can I grab a set of links from a page, then, please?
Greetings, Martijn

dmitriy.trt’s picture

Title: How to use this module ? » Crawler for the list of links
Category: support » feature
Status: Active » Needs work
File: status new, size 12.69 KB

The patch implements a new fetcher class, FeedsListCrawler. It uses a listing page as a starting point and imports pages from the links found (the XPath used to get each item link is configurable). It can parse multiple listing pages. For now, the "Next" link can be found using XPath only (the "auto" and $index pattern methods are missing; this part needs work). The class was tested on HTML pages only.
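
To illustrate the general approach (this is not the patch itself, which is PHP), here is a rough Python sketch of crawling listing pages with one XPath for item links and another for the "Next" link. It assumes the pages are well-formed XML/XHTML, because Python's `xml.etree.ElementTree` supports only a limited XPath subset and cannot parse arbitrary HTML. For the same reason the XPath selects the link element and the `href` is read separately. The function name `crawl_listing`, the `fetch` callable, the class names in the XPath expressions, and the `LISTING_PAGES` data are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def crawl_listing(start_url, fetch, item_xpath, next_xpath, max_pages=10):
    """Collect absolute item URLs from a chain of listing pages.

    `fetch` maps a URL to a well-formed XML/XHTML string (an assumption).
    `item_xpath` and `next_xpath` must select <a> elements.
    """
    item_urls, seen, url = [], set(), start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = ET.fromstring(fetch(url))
        # Gather every item link on the current listing page.
        for link in root.findall(item_xpath):
            href = link.get("href")
            if href:
                item_urls.append(urljoin(url, href))
        # Move to the following listing page, if a "Next" link exists.
        next_link = root.find(next_xpath)
        href = next_link.get("href") if next_link is not None else None
        url = urljoin(url, href) if href else None
    return item_urls


# Hypothetical listing pages standing in for HTTP responses.
LISTING_PAGES = {
    "http://example.com/list/1": (
        "<html><body>"
        "<a class='item' href='/node/1'>One</a>"
        "<a class='item' href='/node/2'>Two</a>"
        "<a class='next' href='/list/2'>next</a>"
        "</body></html>"
    ),
    "http://example.com/list/2": (
        "<html><body>"
        "<a class='item' href='/node/3'>Three</a>"
        "<a class='prev' href='/list/1'>previous</a>"
        "</body></html>"
    ),
}

urls = crawl_listing(
    "http://example.com/list/1",
    LISTING_PAGES.__getitem__,
    item_xpath=".//a[@class='item']",
    next_xpath=".//a[@class='next']",
)
```

Each collected URL would then be handed to the importer as a separate source; that step is left out of the sketch.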

The original FeedsCrawler class is refactored a bit to share code with the new class and to give it access to some methods.

The patch also includes a "Source URL" mapping source implementation, because it otherwise becomes quite hard to get the original URL of each item.

There seem to be some problems with the periodic import interval: the job scheduler executes only one job on each cron run. I'm going to solve this problem a bit later, but I can't make any promises about the missing "Next" link extraction methods.

mitchell’s picture

Assigned: mitchell » Unassigned

See also: a 6.x patch in #1431470: Crawl through Links List - patch and a related module, Feeds Spider.

swfindlay’s picture

@Dmitriy.trt Is this patch still working and/or has it been tested further?

dmitriy.trt’s picture

Unfortunately, I'm not working on it anymore.

twistor’s picture

Status: Needs work » Fixed

This is accomplished via Feeds Spider for the moment. Merging the two is a separate issue.

twistor’s picture

Issue tags: -FEED

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.