Add $url Array Support for MigrateItemsXML [#1998632]

Comment	File	Size	Author
#36	1998632-36-change_urls_-_selectio_rules_order_in_toString_method.patch	984 bytes	dropfen

#30	1998632-use_urls_in_migrate-items-xml.patch	10.42 KB	dropfen

#26	1998632-use_urls_in_migrate-items-xml.patch	9.35 KB	dropfen

#24	1998632-24-use_urls_in_migrate-items-xml.patch	9.33 KB	dropfen

#22	1998632-use_urls_in_migrate-items-xml.patch	9.3 KB	dropfen

#14	1998632-use_urls_in_migrate-items-xml.patch	9.68 KB	dropfen

#8	MigrateItemsXMLs.inc_.txt	3.94 KB	dropfen
#2	MigrateItemsXMLList.inc_.txt	6.85 KB	dropfen
#1	MigrateItemsXMLList.inc_.txt	7.15 KB	dropfen
	MigrateItemsXMLList.inc_.txt	7.19 KB	dropfen

Comment #1

dropfen CreditAttribution: dropfen commented 20 May 2013 at 03:48

File	Size
MigrateItemsXMLList.inc_.txt	7.15 KB

After some fixes:

Mon 20 May 2013 04:36:36 CEST
Processed 22334 (22334 created, 0 updated, 0 failed, 0 ignored) in 1065.6 sec (1258/min) - done with 'MatchesXML'      [completed]
Mon 20 May 2013 04:54:24 CEST

100 URLs with ~220 items each.
Test content uses only 3 fields from each item.
22334 test nodes imported in about 20 min.

What do you think about the performance?


Comment #2

dropfen CreditAttribution: dropfen commented 20 May 2013 at 03:55

Status:

Active

» Needs review

File	Size
MigrateItemsXMLList.inc_.txt	6.85 KB

New file; it still needs tests.


Comment #3

dropfen CreditAttribution: dropfen commented 20 May 2013 at 16:45

Performance Test:
In this test I first downloaded the data, then ran the migrate script to import it.
Test with 500 XML files (170 MB in total).
Processed 85699 (85699 created, 0 updated, 0 failed, 0 ignored) in 1590.8 sec (3232/min) - done with 'TestXML'

The script used about 5 GB of memory. Does anyone have ideas on how to make it more efficient?


Comment #4

mikeryan


CreditAttribution: mikeryan commented 21 May 2013 at 14:23

Status:

Needs review

» Needs work

It would be better to extend the existing MigrateItemsXML class to handle an array of XML files, rather than introduce an entire new class. See MigrateSourceXML for an example of how to manage a list of files.

Memory-wise, it looks like you're loading all the files at once, you should take one file at a time and explicitly close each one when you're done with it.
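A minimal sketch of the one-file-at-a-time idea, under stated assumptions: the function name `count_items_one_file_at_a_time()` is hypothetical and not part of Migrate, and the sources are inline XML strings so the example runs without network access (for real URLs, `simplexml_load_file()` would be the likely substitute).

```php
<?php
/**
 * Hypothetical helper, not part of Migrate: counts matching items across
 * several XML sources while holding only one parsed document at a time.
 */
function count_items_one_file_at_a_time(array $xml_sources, $item_xpath) {
  $total = 0;
  foreach ($xml_sources as $source) {
    // Parse only this one document.
    $xml = simplexml_load_string($source);
    if ($xml === FALSE) {
      // Skip sources that cannot be parsed.
      continue;
    }
    $total += count($xml->xpath($item_xpath));
    // Explicitly drop the reference so this document can be freed before
    // the next one is parsed.
    unset($xml);
  }
  return $total;
}

// Illustrative data only.
$sources = array(
  '<items><item><id>1</id></item><item><id>2</id></item></items>',
  '<items><item><id>3</id></item></items>',
);
echo count_items_one_file_at_a_time($sources, '//item'), "\n";
```

Peak memory then depends on the largest single file rather than on the sum of all files.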


Comment #5

dropfen CreditAttribution: dropfen commented 21 May 2013 at 22:48

Thank you for the tips. I will try it, and gain more OOP experience in the process.

If the swap no longer fills up, the import process should run much faster, I hope :)
My last test ran at only 1500/min.

By the way, what do you think about this post?
http://posterous.richardcunningham.co.uk/using-a-hybrid-of-xmlreader-and...


Comment #6

mikeryan


CreditAttribution: mikeryan commented 23 May 2013 at 14:47

Re: the "hybrid" post - that's exactly the approach MigrateSourceXML takes, using XMLReader to grab each element identified by the element query (which is a restricted subset of xpath syntax, since we have to implement this search ourselves), then SimpleXML over each element retrieved (enabling you to use full xpath syntax within each element).
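A rough sketch of that hybrid pattern, assuming nothing about the real MigrateSourceXML internals: `read_items()` is a hypothetical function, it matches elements by plain name instead of the restricted element query, and the XML is an inline string for illustration.

```php
<?php
/**
 * Hypothetical sketch of the hybrid approach; not the MigrateSourceXML code.
 * XMLReader streams the document, and each matching element is handed to
 * SimpleXML on its own.
 */
function read_items($xml_string, $element_name) {
  $items = array();
  $reader = new XMLReader();
  $reader->XML($xml_string);
  while ($reader->read()) {
    // Only opening tags with the requested name are of interest.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $element_name) {
      // readOuterXml() returns the element with its markup, which SimpleXML
      // parses as a small standalone document.
      $items[] = new SimpleXMLElement($reader->readOuterXml());
    }
  }
  $reader->close();
  return $items;
}

// Illustrative data only.
$xml = '<feed><title>Matches</title>'
  . '<item><id>7</id><team>A</team></item>'
  . '<item><id>9</id><team>B</team></item></feed>';
$items = read_items($xml, 'item');
foreach ($items as $item) {
  // Full xpath is available inside the element.
  $ids = $item->xpath('id');
  echo (string) $ids[0], "\n";
}
```

Because each element is parsed in isolation, nodes outside it (here `<title>`) cannot be reached from the resulting SimpleXML object.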


Comment #7

dropfen CreditAttribution: dropfen commented 23 May 2013 at 16:05

Ah, OK. So the problem with this approach is that each item we get is cut out of the whole XML file, so we can't access the nodes above it (in my case, the parent)?

The MigrateItemsXMLs class, which is an extension of MigrateItemsXML, works now. The memory issue is solved :) Thanks for your suggestion, Mike. When I have finished the development, I'll post it to this issue.

But I still have some problems with the analyze() function. It doesn't work, although the migration process itself works fine. Doesn't the analyze function use the same methods?


Comment #8

dropfen CreditAttribution: dropfen commented 25 May 2013 at 10:44

File	Size
MigrateItemsXMLs.inc_.txt	3.94 KB

So, here's the beta version of the class, MigrateItemsXMLs.
It's a beta because it still needs to be tested.

I find it works very well and is as fast as MigrateItemsXML, at about 4000-5000/min.
Maybe someone will find it useful for their own migration. You can use it the same way as MigrateItemsXML, just with an "s" at the end of the class name, and you can pass either an array of URLs or a single URL as a string in your migration.

Anyway: download, test, enjoy ;)


Comment #9

dropfen CreditAttribution: dropfen commented 25 May 2013 at 10:45

Status:

Needs work

» Needs review


Comment #10

dropfen CreditAttribution: dropfen commented 29 May 2013 at 09:12

@mikeryan, what do you think about the implementation?
Is it good enough to contribute?


Comment #11

mikeryan


CreditAttribution: mikeryan commented 4 June 2013 at 14:57

Status:

Needs review

» Needs work

Sorry, I think you misunderstood when I said "extend" the MigrateItemsXML class to handle multiple URLs. What I meant was not to define a new class extending it, but to modify that class so it can handle an array of URLs as well as a single URL, similarly to what MigrateSourceXML does. There's no need to introduce another class here; it can be enhanced without breaking existing code.
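One way to accept both forms without breaking existing callers is to normalize the argument in the constructor. This is only a sketch: `ExampleItemsXML` and `getUrls()` are hypothetical names, not the real MigrateItemsXML signature.

```php
<?php
/**
 * Hypothetical stand-in for a source class; names are illustrative only.
 */
class ExampleItemsXML {
  /**
   * Always an array, even when a single URL string was passed in.
   *
   * @var array
   */
  protected $urls;

  public function __construct($urls) {
    // Wrap a single URL in an array; leave an array as it is.
    $this->urls = is_array($urls) ? $urls : array($urls);
  }

  public function getUrls() {
    return $this->urls;
  }
}

// Existing code that passes one URL keeps working.
$single = new ExampleItemsXML('http://example.com/a.xml');
// New code can pass a list.
$multiple = new ExampleItemsXML(array(
  'http://example.com/a.xml',
  'http://example.com/b.xml',
));
echo count($single->getUrls()), ' ', count($multiple->getUrls()), "\n";
```

The rest of the class can then loop over the array unconditionally, with no separate code path for the single-URL case.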


Comment #12

dropfen CreditAttribution: dropfen commented 4 June 2013 at 16:50

OK, this is what I did at first :)

I will merge my latest overrides into MigrateItemsXML. The last version, in #8, seems to be stable.
Can you please explain why the rollback process (1300/min) takes 3-4 times longer than the migration (6100/min)?

Is there a bug in my implementation?
Thanks


Comment #13

mikeryan


CreditAttribution: mikeryan commented 4 June 2013 at 17:18

No. At the database level, deletion is often slower than insertion, so it's not surprising for rollback to be slower.


Comment #14

dropfen CreditAttribution: dropfen commented 12 June 2013 at 01:57

Status:

Needs work

» Needs review

File	Size
1998632-use_urls_in_migrate-items-xml.patch	9.68 KB

Here's the patch that lets MigrateItemsXML accept an array of URLs.
It works well for the most part.

One bug that I couldn't fix is that the analyze method only returns values for items that have not been imported.


Comment #15

12 June 2013 at 02:01

The last submitted patch, 1998632-use_urls_in_migrate-items-xml.patch, failed testing.


Comment #16

12 June 2013 at 02:01

Status:

Needs review

» Needs work

The last submitted patch, 1998632-use_urls_in_migrate-items-xml.patch, failed testing.


Comment #17

dropfen CreditAttribution: dropfen commented 12 June 2013 at 09:09

Version:	7.x-2.6-beta1	» 7.x-2.x-dev
Status:	Needs work	» Needs review


Comment #18

dropfen CreditAttribution: dropfen commented 12 June 2013 at 09:10

#14: 1998632-use_urls_in_migrate-items-xml.patch queued for re-testing.


Comment #19

12 June 2013 at 09:13

Status:

Needs review

» Needs work

The last submitted patch, 1998632-use_urls_in_migrate-items-xml.patch, failed testing.


Comment #20

dropfen CreditAttribution: dropfen commented 12 June 2013 at 09:31

WTF?


Comment #21

mikeryan


CreditAttribution: mikeryan commented 12 June 2013 at 14:56

It definitely breaks the wine.inc role migration. Don't have time to look in detail, I did notice at least one code typo "chache_ids"...

When rerolling, please make sure to adhere to Drupal coding standards (such as a space after "if").


Comment #22

dropfen CreditAttribution: dropfen commented 12 June 2013 at 19:45

File	Size
1998632-use_urls_in_migrate-items-xml.patch	9.3 KB

I did some cleanup and fixed the Drupal coding standards issues. Maybe the problem is caused by the xml property.
It is now dynamic and depends on the $id we pass to the getItem method.


Comment #23

dropfen CreditAttribution: dropfen commented 12 June 2013 at 20:04

Status:

Needs work

» Needs review


Comment #24

dropfen CreditAttribution: dropfen commented 13 June 2013 at 12:34

Title:

Add $url Array Support for MigrateItemsXML

» Add MultiFiles Support for MigrateItemsXML (MIgrateItemsXMLList)

File	Size
1998632-24-use_urls_in_migrate-items-xml.patch	9.33 KB

New patch; the last one had a horrible performance bug :|


Comment #25

dropfen CreditAttribution: dropfen commented 13 June 2013 at 12:33

Title:

Add MultiFiles Support for MigrateItemsXML (MIgrateItemsXMLList)

» Add $url Array Support for MigrateItemsXML


Comment #26

dropfen CreditAttribution: dropfen commented 14 June 2013 at 14:32

Title:

Add MultiFiles Support for MigrateItemsXML (MIgrateItemsXMLList)

» Add $url Array Support for MigrateItemsXML

File	Size
1998632-use_urls_in_migrate-items-xml.patch	9.35 KB

fixed: array_unique($ids);


Comment #27

dropfen CreditAttribution: dropfen commented 16 June 2013 at 00:33

@mikeryan
The patch works very well now. If you have time to take a look at it,
I would be happy to see it committed ;)

Thanks, dropfen


Comment #28

mikeryan


CreditAttribution: mikeryan commented 19 June 2013 at 17:24

Status:

Needs review

» Needs work

Don't be scared! The actual code looks good, I just finally got around to installing Dreditor, which makes it easier to get extra-picky about comments and coding conventions...

+++ b/plugins/sources/xml.inc
@@ -299,18 +299,22 @@ abstract class XMLMigration extends Migration {
+   * @array or @string if just a single url

@var array - the constructor parameter could be a string, but the property will always be an array.

+++ b/plugins/sources/xml.inc
@@ -299,18 +299,22 @@ abstract class XMLMigration extends Migration {
+   * $activeUrl just for better understanding there could be more files available.

Huh? Better described as essentially a cursor over the urls array, I think.

+++ b/plugins/sources/xml.inc
@@ -299,18 +299,22 @@ abstract class XMLMigration extends Migration {
+   * Stores the current loaded XML document.

currently

+++ b/plugins/sources/xml.inc
@@ -329,44 +333,49 @@ class MigrateItemsXML extends MigrateItems {
+  public function __construct($urls, $itemXpath='item', $itemIDXpath='id') {

Variable naming convention is lower-case separated by _, please don't change the parameters.

+++ b/plugins/sources/xml.inc
@@ -329,44 +333,49 @@ class MigrateItemsXML extends MigrateItems {
+   * Our public face is the URL list we're getting items from

Add a period at the end.

+++ b/plugins/sources/xml.inc
@@ -329,44 +333,49 @@ class MigrateItemsXML extends MigrateItems {
+    $spaces = '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;';
+    $urls = implode('</br>' . $spaces, $this->urls);
+    return 'urls = ' . $urls . '</br></br>' . $spaces . $this->itemXpath .
+    ' | item ID xpath = ' . $this->itemIDXpath;

Looks kind of hacky (and I think you meant <br />). How about a <ul>?

+++ b/plugins/sources/xml.inc
@@ -404,22 +413,47 @@ class MigrateItemsXML extends MigrateItems {
+   * additional store them in cache_ids;

Additionally store them in cache_ids.

+++ b/plugins/sources/xml.inc
@@ -404,22 +413,47 @@ class MigrateItemsXML extends MigrateItems {
+  protected $cache_ids = NULL;

Declare the property before the constructor. Also, don't forget to declare idsMap.

+++ b/plugins/sources/xml.inc
@@ -404,22 +413,47 @@ class MigrateItemsXML extends MigrateItems {
+      //Make sure, to load new xml.

// Make sure to load new xml.

+++ b/plugins/sources/xml.inc
@@ -404,22 +413,47 @@ class MigrateItemsXML extends MigrateItems {
+  public function computeCount() {

Reordering the functions adds to the size of the patch, and makes it harder to follow what's changed.

+++ b/plugins/sources/xml.inc
@@ -548,11 +563,14 @@ class MigrateItemsXML extends MigrateItems {
+   * if $id is not in the currentItems array, look for it in the idsMap, where ids are mapped
+   * in $url => $ids relations.

Misplaced comment - unnecessary anyway I think, the code comments cover it.

+++ b/plugins/sources/xml.inc
@@ -560,8 +578,22 @@ class MigrateItemsXML extends MigrateItems {
+    // Otherwise, get it fom the right url
+    // First get the rigth url from $idsMap
+    foreach ($this->idsMap as $url => $ids) {

Needs an indent.


Comment #29

mikeryan


CreditAttribution: mikeryan commented 19 June 2013 at 17:27

Oh, and $cache_ids should be $cacheIDs (lowerCamel convention for class properties).


Comment #30

dropfen CreditAttribution: dropfen commented 20 June 2013 at 20:00

File	Size
1998632-use_urls_in_migrate-items-xml.patch	10.42 KB

Thank you very much for reviewing; it's a good feeling to be taught by such a developer.
I made the corrections, and after installing PhpStorm I got some notifications, which I fixed on the fly.
So Dreditor isn't the only picky one.

Thanks for the idea with the urls list markup. I have to say, that it was very late when I wrote the hack with the spaces ;)

I set the xpath selection rules before the url list, since the list could become very large and this info should be available on page load and not after scrolling.
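The ordering described here could look roughly like the following. This is an illustrative sketch, not the code from the patch: `describe_source()` is a hypothetical standalone function, and it uses PHP's `htmlspecialchars()` for escaping where Drupal 7 code would more typically call `check_plain()`.

```php
<?php
/**
 * Hypothetical helper: describes a source with the xpath selection rules
 * first and the URLs as an HTML list afterwards.
 */
function describe_source(array $urls, $item_xpath, $item_id_xpath) {
  // Selection rules come first so they are visible without scrolling.
  $output = 'item xpath = ' . htmlspecialchars($item_xpath)
    . ' | item ID xpath = ' . htmlspecialchars($item_id_xpath);
  // The URL list can grow long, so it goes last, as a real list.
  $output .= '<ul>';
  foreach ($urls as $url) {
    $output .= '<li>' . htmlspecialchars($url) . '</li>';
  }
  $output .= '</ul>';
  return $output;
}

// Illustrative data only.
$example_urls = array('http://example.com/a.xml', 'http://example.com/b.xml');
echo describe_source($example_urls, 'item', 'id'), "\n";
```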


Comment #31

dropfen CreditAttribution: dropfen commented 20 June 2013 at 20:01

Status:

Needs work

» Needs review


Comment #32

dropfen CreditAttribution: dropfen commented 20 June 2013 at 23:24

What do you think: should the getAllItems() method maybe be replaced with getItems()? It doesn't really return all items, you know. But this method is currently public, so I'm not sure whether other classes want to call it.


Comment #33

mikeryan


CreditAttribution: mikeryan commented 21 June 2013 at 14:58

Status:	Needs review	» Fixed
Issue tags:		+Migrate 2.6

I wouldn't want to change the public API unless there's a really compelling reason.

Committed, thanks!


Comment #34

dropfen CreditAttribution: dropfen commented 22 June 2013 at 20:24

The reason is:
You should never be able to load all the items at the same time, for performance reasons; see your own comment #4.
And now, when you call the getAllItems method, you get just the items from the current URL. This is probably not what you want.

So, if the API is used cleanly, there is no reason to call the getAllItems method, since you always need one specific item ($this->getItem($id)) for an ID that you can get with $this->getIdList().
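A simplified model of that usage pattern, under loud assumptions: `ExampleItemsSource` is hypothetical and not the committed class, each "URL" key maps to an inline XML string so the example runs offline, and the `$idsMap` / `$currentItems` property names only echo the ones mentioned in the review.

```php
<?php
/**
 * Hypothetical, simplified model of an ID-to-URL map; not the committed code.
 */
class ExampleItemsSource {
  protected $documents;
  protected $idsMap = array();
  protected $currentItems = array();

  public function __construct(array $documents) {
    $this->documents = $documents;
  }

  // Parses one document and returns its items keyed by ID.
  protected function loadItems($url) {
    $items = array();
    $xml = simplexml_load_string($this->documents[$url]);
    foreach ($xml->xpath('//item') as $item) {
      $items[(string) $item->id] = $item;
    }
    return $items;
  }

  // Builds the url => ids map and returns the flat list of IDs.
  public function getIdList() {
    $ids = array();
    foreach (array_keys($this->documents) as $url) {
      $this->idsMap[$url] = array_keys($this->loadItems($url));
      $ids = array_merge($ids, $this->idsMap[$url]);
    }
    return array_unique($ids);
  }

  // Returns one item, loading only the document that contains it.
  public function getItem($id) {
    if (!isset($this->currentItems[$id])) {
      foreach ($this->idsMap as $url => $ids) {
        if (in_array($id, $ids)) {
          // Replace the previously loaded items with this document's items.
          $this->currentItems = $this->loadItems($url);
          break;
        }
      }
    }
    return isset($this->currentItems[$id]) ? $this->currentItems[$id] : NULL;
  }
}

// Illustrative data only.
$source = new ExampleItemsSource(array(
  'a.xml' => '<items><item><id>m1</id><name>One</name></item></items>',
  'b.xml' => '<items><item><id>m2</id><name>Two</name></item></items>',
));
foreach ($source->getIdList() as $id) {
  echo $id, ': ', (string) $source->getItem($id)->name, "\n";
}
```

At any moment only the items of one document are held, which is why a "get all items" call on such a class can only ever return the current document's items.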

Thank you for reviewing and committing!


Comment #35

6 July 2013 at 20:30

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.


Comment #36

dropfen CreditAttribution: dropfen commented 15 August 2013 at 18:03

Category:	feature	» task
Status:	Closed (fixed)	» Needs review

File	Size
1998632-36-change_urls_-_selectio_rules_order_in_toString_method.patch	984 bytes

I have added a minimal change to the __toString() method of the MigrateItemsXML class.

I changed the order of the URLs and the selection rules, because when you deal with more than 50 URLs you have to scroll down to see your settings.
So I think the selection rules can be moved to the top.

I don't think it's necessary to open a separate issue for that, so I'm putting it here.

Thanks