The migration is run in a batch, so I can't take advantage of devels query log. Something like that would be useful in the UI.

Comments

Niklas Fiekas’s picture

Category: feature » support

The problem I wanted to debug yesterday got worse, linear (or so) dropping with the number of imported items.

Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 32.4 sec (0/min) - continuing with 'ConsoleroPost'
Processed 1 (1 created, 0 updated, 0 failed, 0 ignored) in 28.9 sec (2/min) - continuing with 'ConsoleroPost'
Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 32.5 sec (0/min) - continuing with 'ConsoleroPost'
Processed 1 (1 created, 0 updated, 0 failed, 0 ignored) in 28.6 sec (2/min) - continuing with 'ConsoleroPost'
Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 34.7 sec (0/min) - continuing with 'ConsoleroPost'
Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 26.2 sec (0/min) - continuing with 'ConsoleroPost'
Processed 1 (1 created, 0 updated, 0 failed, 0 ignored) in 28.5 sec (2/min) - continuing with 'ConsoleroPost'
Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 32.1 sec (0/min) - continuing with 'ConsoleroPost'

Looks like some slow query is doing this. How can I find it or how can I EXPLAIN sql built using the query builder?

Niklas Fiekas’s picture

Category: support » bug

Update: Fixed the query, speed increased dramatically. Then again, dropping. Now I am importing from a copy of the table, deleting now and then the rows that are already imported, which works fine.

That makes it look to me like a bug. (Now this issue was a feature request, support request and a bug report^^).

Might it have something to do with array('map_joinable' => FALSE)?

mikeryan’s picture

Category: bug » feature
Status: Active » Postponed (maintainer needs more info)

Have you examined the performance with xhprof, or drush migrate-import --instrument=timer?

Niklas Fiekas’s picture

I didn't try xhprof yet. As for --instrument-timer, I am using migrate_ui, where that doesn't seam to be available. (Nice to have feature, btw?).

I have the full data on the server where only migrate_ui is available and a subset on my local machine for testing. Server fails with migrate_ui, local machine runs great using drush. I'll try to verify the problem with the full data, but I think (speculation as of now) the problem is somehow specific to migrate_ui.

Niklas Fiekas’s picture

Status: Postponed (maintainer needs more info) » Active

Comparing migrate_ui and drush on the same dataset on my local machine (notebook with a slow harddisk).

Importing 20k nodes (100k total) in migrate_ui works well, then speed drops and importing with item limit 5 in migrate_ui looks like that:

Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 28.7 sec (0/min) - continuing with 'ConsoleroPost'
Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 29.2 sec (0/min) - continuing with 'ConsoleroPost'
Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 28.4 sec (0/min) - continuing with 'ConsoleroPost'
Processed 0 (0 created, 0 updated, 0 failed, 0 ignored) in 28.6 sec (0/min) - continuing with 'ConsoleroPost'
Processed 3 (3 created, 0 updated, 0 failed, 0 ignored) in 27.5 sec (7/min) - continuing with 'ConsoleroPost'
Processed 5 (5 created, 0 updated, 0 failed, 0 ignored) in 27.2 sec (11/min) - done with 'ConsoleroPost'

In drush it looks like that:

$ drush mi ConsoleroPost --limit="5 items" --instrument=timer
Processed 5 (5 created, 0 updated, 0 failed, 0 ignored) in 27.1 sec (11/min) - done with 'ConsoleroPost'                                  [completed]
 Timer                                     Cum (sec)  Count  Avg (msec) 
 page                                      28.106     1      28106.21   
 MigrateSourceSQL next                     25.93      5      5186.038   
 mapRowBySource                            22.773     31707  0.718      
 destination import                        0.452      5      90.356     
 MigrateSourceSQL execute                  0.376      1      376        
 saveIDMapping                             0.297      5      59.324     
 comment_save                              0.228      5      45.664     
 lookupDestinationID                       0.008      10     0.796      
 MigrateFieldsEntityHandler->prepare       0.002      5      0.332      
 MigrateDestinationEntity->prepareFields   0.002      5      0.306      
 MigrateFieldsEntityHandler->complete      0.001      5      0.13       
 MigrateDestinationEntity->completeFields  0          5      0.092      
 MigrateTextFieldHandler->prepare          0          5      0.04       
 MigratePathEntityHandler->prepare         0          5      0.01

I don't know the migrate internals well enough to be sure, but I think the initial work before the actual import looks quite expensive. migrate_ui has to repeat that for every max_execution_time and often no time is left for real importing. Dush can do the initial work once and then run the imports without worries about max_execution_time.

I tried importing from a copy of the source table, deleting rows after they were imported. That helped and increased speed in migrate_ui. I'll go with that for now.

Any other things I could try or could migrate somehow cache the initial work, if that's the point?

mikeryan’s picture

Category: feature » support
Status: Active » Postponed (maintainer needs more info)

D[r]ush can do the initial work once and then run the imports without worries about max_execution_time.

Yep. One reason Drush should be used for migrations whenever possible - max_execution_time means when using the batch API you're paying the overhead over and over and over, using the UI is much less efficient.

So, your problem is in the 31707 calls to mapRowBySource - i.e., as you suggested in comment #2, because map_joinable is FALSE. If the map table can't be joined to the source query, then on each invocation migrate has to look up each source record individually against the map table until it finds 1 (or with --limit="5 items", 5) that can be imported. With a short max_execution_time, it just doesn't catch up to the "next" record in time.

Your options are:

  1. Run migrations in drush.
  2. If your source data is in the same MySQL server as the Drupal db, and accessed with the same credentials, you should be able to set map_joinable=TRUE, so identifying what remains to be migrated is done within the source query itself.
  3. Use a highwater mark. If your query is ordered by, say, modification timestamp, and you set that field to be the highwater field, then Migrate can modify the query to start right where you left off.

Do any of these work for you?

Niklas Fiekas’s picture

Status: Postponed (maintainer needs more info) » Fixed

Thank you mikeryan.

(1) I should probably change hosting, so that I can ssh in and use drush. Otherwise I'll have to use workarounds for now.
(2) That isn't possible as of now, but I could always copy the table over.
(3) Highwater marks didn't speed up the import.

Should I open a seperate issue for a query log or* something like --instrument=timer in the UI? Those would have helped debugging this issue.

* Edit: Maybe not a query log. That would be a lot of queries^^

mikeryan’s picture

Title: Query log (integrate with devel?) » Track timers in UI
Category: support » feature
Status: Fixed » Active

We can repurpose this issue for doing timer input in the UI. Seems like we've set a drupal.org record for category changes (among support request/feature request/bug report) for this issue, but that's OK:-).

Niklas Fiekas’s picture

It was already a feature request, so that's nothing new ;) Thanks again.

pifagor’s picture

Issue summary: View changes
Status: Active » Closed (outdated)