This is an additional class (just a modification of MigrateItemsXML).
I had to get it working with an array of urls instead of a single url.

You can use it, for your own Migrations, but you should test it before you do so.

Usage
Fist iclude the class into your module folder, where your Migration lives (declare in the *.info file)

//get your urls in an array()
$urls = array('http://yourfeedsource1', 'http://yourfeedsource1' ...);

//create the $items_class obj like with the MigrateItemsXML but with the "List" suffix.
$items_class = new MigrateItemsXMLList($urls, $item_xpath, $itemID_xpath);
$this->source = new MigrateSourceMultiItems($items_class, $fields);

Much Thanks for the great and flexible Module, it was very easy to work with it.
dropfen.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dropfen’s picture

After some fixes later...

Mo 20 Mai 2013 04:36:36 CEST
Processed 22334 (22334 created, 0 updated, 0 failed, 0 ignored) in 1065.6 sec (1258/min) - done with 'MatchesXML'      [completed]
Mo 20 Mai 2013 04:54:24 CEST

100 URLs à ~220 items
Test Content with only 3 fields from each item.
22334 Test Nodes imported in about 20min.

What are u thinking about the performance?

dropfen’s picture

Status: Active » Needs review
FileSize
6.85 KB

New File, need tests...

dropfen’s picture

Performance Test:
In this test I fist downloaded the data. And then run the migrate script to import.
Test with 500 xml Files(in sum 170mb).
Processed 85699 (85699 created, 0 updated, 0 failed, 0 ignored) in 1590.8 sec (3232/min) - done with 'TestXML'

So, the script took about 5Gb of Memory, maybe someone have some ideas, how to make it more efficient?

mikeryan’s picture

Status: Needs review » Needs work

It would be better to extend the existing MigrateItemsXML class to handle an array of XML files, rather than introduce an entire new class. See MigrateSourceXML for an example of how to manage a list of files.

Memory-wise, it looks like you're loading all the files at once, you should take one file at a time and explicitly close each one when you're done with it.

dropfen’s picture

Thank you for the tips. I will try it, and in the process get more OOP experience.

If the swp will not filled, the import process should run much faster!!! I hope :)
My last Test runs with only 1500/min..

by the way, what do think about this post?
http://posterous.richardcunningham.co.uk/using-a-hybrid-of-xmlreader-and...

mikeryan’s picture

Re: the "hybrid" post - that's exactly the approach MigrateSourceXML takes, using XMLReader to grab each element identified by the element query (which is a restricted subset of xpath syntax, since we have to implement this search ourselves), then SimpleXML over each element retrieved (enabling you to use full xpath syntax within each element).

dropfen’s picture

ah, ok. then the problem with this, that the item we got is cutted from the whole xml file, so we can't access (in my case, the parent) nodes above?

The MigrateItemsXMLs class thas is an extension of MigrateItemsXML works now. The memory Issue is solved, :) thanks for you suggestion mike. When I have finished the development I'll post it to the Issue.

But I still have some problems, with analyze() Funktion. It doesn't work, but the Migrate process self works fine. Does the the analyze Function not access the same methods?

dropfen’s picture

FileSize
3.94 KB

So, here's the beta version of the class. MigrateItemsXMLs
Beta, because it needs to be tested.

I find it works very well and is fast as the MigrateItemsXML with about 4-5000/min
Maybe some one will find it useful for own Migration. You can use it the same way you use MigrateItemsXML just with the (s) at the end and you can put an array of urls in your Migration, or a singe url as a string, it doesn't matter.

However, download, test, enjoy ;)

dropfen’s picture

Status: Needs work » Needs review
dropfen’s picture

@mikeryan, what do you think about the implementation?
Is it smart enought to contrib?

mikeryan’s picture

Status: Needs review » Needs work

Sorry, I think you misunderstood when I said "extend" the MigrateItemXML class to handle multiple URLs - what I meant was not to define a new class extending it, but to modify that class so it can handle an array of URLs as well as a single URL, similarly to what MigrateSourceXML does. There's no need to introduce another class here, it can be enhanced without breaking existing code.

dropfen’s picture

OK, this is what I did first :)

I will merge my last overrides with the MigrateItemsXML. The last version in #8, seems to be stable.
Can you explain me please why the rollback process (1300/min) takes 3-4 times longer then the migration (6100/min)?

Is there a bug in my implementation?
Thx

mikeryan’s picture

No, at the database level deletion is often slower than insertion, it's not surprising for rollback to be slower.

dropfen’s picture

Status: Needs work » Needs review
FileSize
9.68 KB

here's the patch to get MigrateItemsXML to accept an array of urls,
It works very nice by the greater part.

One bug that I could't fix, is that on the analyze method you will get only values of non imported items.

The last submitted patch, 1998632-use_urls_in_migrate-items-xml.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 1998632-use_urls_in_migrate-items-xml.patch, failed testing.

dropfen’s picture

Version: 7.x-2.6-beta1 » 7.x-2.x-dev
Status: Needs work » Needs review
dropfen’s picture

Status: Needs review » Needs work

The last submitted patch, 1998632-use_urls_in_migrate-items-xml.patch, failed testing.

dropfen’s picture

WTF?

mikeryan’s picture

It definitely breaks the wine.inc role migration. Don't have time to look in detail, I did notice at least one code typo "chache_ids"...

When rerolling, please make sure to adhere to Drupal coding standards (such as a space after "if").

dropfen’s picture

I did some cleanup, and fixed drupal coding standards. Maybe the problem comes because of the xml property.
It's dynamically now and it depend on the $id we give to the getItem method.

dropfen’s picture

Status: Needs work » Needs review
dropfen’s picture

Title: Add $url Array Support for MigrateItemsXML » Add MultiFiles Support for MigrateItemsXML (MIgrateItemsXMLList)
FileSize
9.33 KB

new patch, the last had an horrible performance bug :|

dropfen’s picture

Title: Add MultiFiles Support for MigrateItemsXML (MIgrateItemsXMLList) » Add $url Array Support for MigrateItemsXML
dropfen’s picture

Title: Add MultiFiles Support for MigrateItemsXML (MIgrateItemsXMLList) » Add $url Array Support for MigrateItemsXML
FileSize
9.35 KB

fixed: array_unique($ids);

dropfen’s picture

@mikeryan
the patch works very well now, if you have the time to get a look of it,
however I would be happy to see it in the commits ;)

Thanks, dropfen

mikeryan’s picture

Status: Needs review » Needs work

Don't be scared! The actual code looks good, I just finally got around to installing Dreditor, which makes it easier to get extra-picky about comments and coding conventions...

+++ b/plugins/sources/xml.inc
@@ -299,18 +299,22 @@ abstract class XMLMigration extends Migration {
+   * @array or @string if just a single url

@var array - the constructor parameter could be a string, but the property will always be an array.

+++ b/plugins/sources/xml.inc
@@ -299,18 +299,22 @@ abstract class XMLMigration extends Migration {
+   * $activeUrl just for better understanding there could be more files available.

Huh? Better described as essentially a cursor over the urls array, I think.

+++ b/plugins/sources/xml.inc
@@ -299,18 +299,22 @@ abstract class XMLMigration extends Migration {
+   * Stores the current loaded XML document.

currently

+++ b/plugins/sources/xml.inc
@@ -329,44 +333,49 @@ class MigrateItemsXML extends MigrateItems {
+  public function __construct($urls, $itemXpath='item', $itemIDXpath='id') {

Variable naming convention is lower-case separated by _, please don't change the parameters.

+++ b/plugins/sources/xml.inc
@@ -329,44 +333,49 @@ class MigrateItemsXML extends MigrateItems {
+   * Our public face is the URL list we're getting items from

Add a period at the end.

+++ b/plugins/sources/xml.inc
@@ -329,44 +333,49 @@ class MigrateItemsXML extends MigrateItems {
+    $spaces = '       ';
+    $urls = implode('</br>' . $spaces, $this->urls);
+    return 'urls = ' . $urls . '</br></br>' . $spaces . $this->itemXpath .
+    ' | item ID xpath = ' . $this->itemIDXpath;

Looks kind of hacky (and I think you meant <br />), how about an <ul>?

+++ b/plugins/sources/xml.inc
@@ -404,22 +413,47 @@ class MigrateItemsXML extends MigrateItems {
+   * additional store them in cache_ids;

Additionally store them in cache_ids.

+++ b/plugins/sources/xml.inc
@@ -404,22 +413,47 @@ class MigrateItemsXML extends MigrateItems {
+  protected $cache_ids = NULL;

Declare the property before the constructor. Also, don't forget to declare idsMap.

+++ b/plugins/sources/xml.inc
@@ -404,22 +413,47 @@ class MigrateItemsXML extends MigrateItems {
+      //Make sure, to load new xml.

// Make sure to load new xml.

+++ b/plugins/sources/xml.inc
@@ -404,22 +413,47 @@ class MigrateItemsXML extends MigrateItems {
+  public function computeCount() {

Reordering the functions adds to the size of the patch, and makes it harder to follow what's changed.

+++ b/plugins/sources/xml.inc
@@ -548,11 +563,14 @@ class MigrateItemsXML extends MigrateItems {
+   * if $id is not in the currentItems array, look for it in the idsMap, where ids are mapped
+   * in $url => $ids relations.

Misplaced comment - unnecessary anyway I think, the code comments cover it.

+++ b/plugins/sources/xml.inc
@@ -560,8 +578,22 @@ class MigrateItemsXML extends MigrateItems {
+    // Otherwise, get it fom the right url
+    // First get the rigth url from $idsMap
+    foreach ($this->idsMap as $url => $ids) {

Needs an indent.

mikeryan’s picture

Oh, and $cache_ids should be $cacheIDs (lowerCamel convention for class properties).

dropfen’s picture

Thank you very much for reviewing and it's a good feeling to get instructed by a such dev.
So I made the corrections and after installing phpstorm got some notifications which I fixed on the fly.
So it's not dreditor alone so picky.

Thanks for the idea with the urls list markup. I have to say, that it was very late when I wrote the hack with the spaces ;)

I set the xpath selection rules before the url list, since the list could become very large and this info should be available on page load and not after scrolling.

dropfen’s picture

Status: Needs work » Needs review
dropfen’s picture

What do you think, should the getAllItems() method be overriden maybe with getItems()? Because it's not really All, you know. But for now this method is public so I'm not sure if other classes whants to call it.

mikeryan’s picture

Status: Needs review » Fixed
Issue tags: +Migrate 2.6

I wouldn't want to change the public API unless there's a really compelling reason.

Committed, thanks!

dropfen’s picture

The reason is:
You should never be able to load all the Items at the same time, because of performance see your own comment #4.
And now when you call the getAllItems method you will get just the Items from the currentUrl. This is probably not what you want.

So, if the API should be used in a clean way, there is no reason to call the getAllItems method since you always need the special Item ($this->getItem($id) ) depend on the ID which you can get with $this->getIdList()

Thank you, for reviewing&committing!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

dropfen’s picture

Category: feature » task
Status: Closed (fixed) » Needs review
FileSize
984 bytes

I have added a minimal change to the __toString() method of the MigrateItemsXML class.

Changed the order of the urls and selection rules, because when you deal with more then 50 urls you have to scroll down to see you settings.
So I think selection Rules can be pulled on top.

I think it's not necessary to make an own Issue for that, so I put it here.

Thanks

jcisio’s picture

I think this commit breaks "drush mi --idlist=...". Don't have time to check more.

mikeryan’s picture

Category: Task » Feature request
Issue summary: View changes
Status: Needs review » Closed (fixed)

Restoring status, new patches for a committed feature should be separate issues.