Closed (fixed)
Project:
Feeds Crawler
Version:
7.x-1.x-dev
Component:
Documentation
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
23 Feb 2011 at 12:53 UTC
Updated:
28 Apr 2013 at 02:40 UTC
Comments
Comment #1
tekken commented
Subscribing.
Comment #2
twistor commented
This module handles the latter part: it does pagination, i.e. it follows a "next" link. It doesn't, however, grab a set of links from a page.
Comment #3
summit commented
Subscribing. How, then, can I grab a set of links from a page?
Greetings, Martijn
Comment #4
dmitriy.trt commented
This patch implements a new fetcher class, FeedsListCrawler. It uses a listing page as the starting point and imports the pages linked from it (the XPath used to find each item link is configurable). It can parse multiple listing pages. For now, the "Next" link can only be found using XPath; the "auto" and $index pattern methods are missing, so this part needs work. The class has been tested on HTML pages only.
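The patch itself is PHP and its code isn't reproduced in this thread. As a rough illustration of the listing-crawl idea only, here is a minimal Python sketch, not the patch's implementation. It assumes well-formed XHTML listing pages (the standard library's ElementTree cannot parse tag-soup HTML and supports only a limited XPath subset), and `crawl_listing`, the `fetch` callable and `max_pages` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def crawl_listing(fetch, start_url, item_xpath, next_xpath, max_pages=10):
    """Collect item URLs from a chain of listing pages (hypothetical helper).

    fetch      -- callable returning the page body for a URL (assumed)
    item_xpath -- ElementTree XPath selecting the item <a> elements
    next_xpath -- ElementTree XPath selecting the "Next" <a> element
    """
    item_urls = []
    seen_pages = set()
    url = start_url
    # Stop when there is no "Next" link, a page repeats, or the page cap is hit.
    while url and url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(url)
        root = ET.fromstring(fetch(url))
        for link in root.findall(item_xpath):
            href = link.get("href")
            if href:
                absolute = urljoin(url, href)  # resolve relative item links
                if absolute not in item_urls:
                    item_urls.append(absolute)
        nxt = root.find(next_xpath)
        url = urljoin(url, nxt.get("href")) if nxt is not None and nxt.get("href") else None
    return item_urls


# Usage with an in-memory "site" standing in for real HTTP fetching.
pages = {
    "http://example.com/list?page=1": """<html><body>
      <a class="item" href="/node/1">One</a>
      <a class="item" href="/node/2">Two</a>
      <a class="next" href="list?page=2">Next</a>
    </body></html>""",
    "http://example.com/list?page=2": """<html><body>
      <a class="item" href="/node/3">Three</a>
    </body></html>""",
}
urls = crawl_listing(pages.__getitem__, "http://example.com/list?page=1",
                     ".//a[@class='item']", ".//a[@class='next']")
print(urls)  # ['http://example.com/node/1', 'http://example.com/node/2', 'http://example.com/node/3']
```

Only the XPath "Next" strategy is sketched here, matching what the comment says the patch supports; the "auto" and $index pattern methods are left out.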
The original FeedsCrawler class has been refactored a bit so that it shares code with the new class and lets the new class access some of its methods.
The patch also includes a "Source URL" mapping source implementation, because otherwise it becomes quite hard to get the original URL of each item.
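The thread doesn't show the patch's mapping-source code. As a hedged Python sketch of why a dedicated "Source URL" source helps: once items come from many linked pages, the URL each body was fetched from has to travel with the item, or the mapping step can no longer recover it. `FetchedItem`, `fetch_items`, `get_source_element`, `map_items` and the target field name `field_original_url` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FetchedItem:
    """One imported page, keeping the URL it was fetched from (hypothetical)."""
    source_url: str
    body: str


def fetch_items(fetch, item_urls):
    """Fetch each item page and pair its body with the URL it came from."""
    return [FetchedItem(source_url=u, body=fetch(u)) for u in item_urls]


def get_source_element(item, source):
    """Hypothetical mapping-source lookup exposing the fetch URL as "Source URL"."""
    if source == "Source URL":
        return item.source_url
    raise KeyError(source)


def map_items(items, mapping):
    """Apply a {source_name: target_field} mapping to every fetched item."""
    return [
        {target: get_source_element(item, src) for src, target in mapping.items()}
        for item in items
    ]


# Usage: the original URL survives all the way to the mapped record.
site = {"http://example.com/node/1": "<html><body>One</body></html>"}
items = fetch_items(site.__getitem__, ["http://example.com/node/1"])
rows = map_items(items, {"Source URL": "field_original_url"})
print(rows)  # [{'field_original_url': 'http://example.com/node/1'}]
```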
There seem to be some problems with the periodic import interval: Job Scheduler executes only one job on each cron run. I'm going to solve this problem a bit later, but I can't make any promises about the missing "Next" link extraction methods.
Comment #5
mitchell commented
See also: a 6.x patch in #1431470: Crawl through Links List - patch, and a related module, Feeds Spider.
Comment #6
swfindlay commented
@Dmitriy.trt Is this patch still working, and/or has it been tested further?
Comment #7
dmitriy.trt commented
Unfortunately, I'm not working on it anymore.
Comment #8
twistor commented
This is accomplished via Feeds Spider for the moment. Merging the two is a separate issue.
Comment #9
twistor commented