Closed (works as designed)
Project: Migrate
Version: 7.x-2.2
Component: Code
Priority: Normal
Category: Feature request
Assigned: Unassigned
Reporter:
Created: 16 Nov 2011 at 16:30 UTC
Updated: 26 Sep 2013 at 01:04 UTC
Hello there,
I have used the Migrate module successfully to operate on a set of HTML files stored on the file system. I was wondering whether anyone here knows if it could also be easily extended to screen-scrape a website. Is there a particular source class I should be extending?
I was thinking it might make sense to use QueryPath or some other library to facilitate this, or should I just be using some basic cURL wizardry? If anyone has advice on how to integrate this feature into the Migrate module, or any pointers in general, it would be greatly appreciated!
Comments
Comment #1
mikeryan commented
No one's done a source plugin along those lines, as far as I know. It would definitely look different from the existing ones, where the list of items to process is statically determined. In this case, as you crawled the site you would be adding new pages to a to-do list and picking them up later. You wouldn't easily be able to get a count up front, so this is another use case for #1341776: Option to skip counting.
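To make the to-do list idea concrete, here is a minimal sketch of that crawl loop. It is written in Python purely for illustration and is not Migrate's API or an existing source plugin; a real source class would be PHP. The names `crawl`, `LinkCollector`, `fetch`, and `FAKE_SITE` are all hypothetical, and `fetch` stands in for whatever actually retrieves a page (cURL, QueryPath, and so on). Note that the total number of pages is only known once the queue runs dry, which is why a count is hard to provide.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Yield (url, html) pairs, discovering new pages while iterating.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    page's HTML as a string, or None if the page cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    todo = deque([start_url])   # pages found but not yet processed
    seen = {start_url}          # every URL ever queued, to avoid repeats
    processed = 0

    while todo and processed < max_pages:
        url = todo.popleft()
        html = fetch(url)
        if html is None:
            continue
        processed += 1
        yield url, html

        # Add same-host links we have not seen yet to the to-do list.
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                todo.append(link)


# In-memory stand-in for a website, so the sketch runs without a network.
FAKE_SITE = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="http://other.org/">x</a>',
    "http://example.com/b": '<a href="/">home</a> <a href="/a#top">A</a>',
}

pages = [url for url, _html in crawl("http://example.com/", FAKE_SITE.get)]
```

Each yielded pair plays the role of one source row. The `seen` set keeps the crawl from looping on pages that link back to each other, and `max_pages` is an assumed safety limit, since the size of the site is not known in advance.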
Comment #2
mikeryan commented
Comment #3
sylus commented