Hi,

I am sure something is wrong with either my Drupal setup or my MySQL, but after importing 250 000 nodes and 2 500 000 terms (around 200 000 terms are hierarchical), my site takes minutes to load a single page view.

I tried creating aliases with Pathauto, which worked fine for just a couple of hundred nodes, but now it is terribly slow.

Here are the details. If any of you have any ideas, I would truly appreciate it.

693012.88 0 node_pathauto_bulkupdate SELECT nid, type, title, uid, created, src, dst, vid FROM drupal_node LEFT JOIN drupal_url_alias ON CONCAT('node/', nid) = src WHERE src IS NULL AND (type = 'forum' OR type = 'image' OR type = 'panel' OR type = 'bio' OR type = 'page' OR type = 'story' OR type = 'uprofile' OR type = 'video_embed' OR type = 'video_upload' OR type = 'video_upload_mm' OR type = 'video_upload_op' OR type = 'video_upload_youtube') LIMIT 0, 50

node table has 250 000 entries
url_alias has 30 000 entries
term data has 405 000 entries
term hierarchy 405 000 entries
term node 2 045 401 (a lot of terms are part of freetagging hence one node can have up to 10-15 terms)
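
One way to see where the time goes in the slow query above is to prefix it with MySQL's EXPLAIN. This is only a sketch: it reuses the table names and type list from the logged query, with the OR chain rewritten as an equivalent IN list, and it makes no claim about what the plan will actually show on this server.

```sql
-- Ask MySQL for the execution plan of the bulk update query
-- instead of running it. Table names and types are copied from
-- the logged query; nothing else is assumed.
EXPLAIN
SELECT nid, type, title, uid, created, src, dst, vid
FROM drupal_node
LEFT JOIN drupal_url_alias ON CONCAT('node/', nid) = src
WHERE src IS NULL
  AND type IN ('forum', 'image', 'panel', 'bio', 'page', 'story',
               'uprofile', 'video_embed', 'video_upload',
               'video_upload_mm', 'video_upload_op',
               'video_upload_youtube')
LIMIT 0, 50;
```

If the drupal_url_alias row of the plan shows type ALL with no key, the alias table is being scanned in full for each node row rather than looked up through an index on src, which would be consistent with the run time reported above.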

cheers,
G

Comments

giorgio79’s picture

I just tested: the bulk update for categories (terms) is very fast; only the node update is suspicious.

greggles’s picture

Status: Active » Postponed (maintainer needs more info)

What do you hope can be achieved as a result of this problem report?

The queries aren't optimal for large sites, but those large sites are somewhat rare and therefore not a priority. There is advice on how to handle this in the handbooks: http://drupal.org/node/236304

I do recommend that, as long as you are running that really long query, you also increase the number of objects to alias per bulk generate to a much higher number, like a few hundred or a few thousand.

giorgio79’s picture

Hi Greg,

My first objective was to try to get some advice from people who have been there. When I joined Drupal, the case studies pitched big companies like MTV, Sony BMG, TeamSugar, etc. that use Drupal, and when I saw that I said, hey, those are massive sites... Although I am starting to think those sites are not out of the box either... So I hope people around here have encountered something like my situation.

Thanks for the link on the cron setup. My concern, though, is more related to speed. It takes about 20 minutes to URL-alias 1000 nodes. It took 5 seconds when I had a couple of hundred nodes, which is the speed I would like to get back :)

I spent this afternoon investigating, and it seems to boil down to all the taxonomy queries that need to be made. It seems Drupal is not really suited to a hierarchy with five levels like mine and a couple of hundred thousand terms.

Anyway, I am kind of halfway solving my issue, and I will post my case study when done :D

Thanks for responding though...

Hopefully someone who has been there will find this post anyway and give some sound advice ;) Or who knows, it may be that there are a lot of us, and then we will chip in for some development fund ;)

Cheers,
G

greggles’s picture

Well, it is possible to run really big sites on Drupal (like those you mentioned) but they require more hardware/work/etc. than just running a typical blog site on a shared host.

Also, the very nature of Bulk Generate is that it is not a "normal" operation. It is something that is done during a massive import or when you need to change a lot of aliases. So not many people are interested in working hard to optimize it, because they only use it infrequently.

If you can find any ways to improve the speed that would be great, but otherwise it will just remain a slow operation that is only painful infrequently...

giorgio79’s picture

Greg, one question regarding:

In pathauto_node.inc, in the node_pathauto_bulkupdate() function, I see this query:

$query = "SELECT nid, type, title, uid, created, src, dst, vid FROM {node} LEFT JOIN {url_alias} ON CONCAT('node/', nid) = src WHERE src IS NULL ". $type_where;

I replaced the query with this, but it is still slow. I am not sure where the bottleneck is; I will try poking around some more...

$query = "SELECT nid FROM {node} WHERE NOT EXISTS (SELECT src FROM {url_alias} WHERE CONCAT('node/', nid) = src)";
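
Note that the replacement query above also drops the content type filter ($type_where) and every column except nid. A variant that keeps the NOT EXISTS form but retains the filter and the node columns might look like the sketch below. The aliases n and a are introduced here for illustration, and the drupal_ prefix and type list are copied from the logged query at the top of the thread. The src and dst columns are left out because they are always NULL for nodes without an alias, so any calling code that reads them would need adjusting.

```sql
-- Sketch only: select unaliased nodes of the listed types.
-- n and a are illustrative table aliases, not names from Pathauto.
SELECT n.nid, n.type, n.title, n.uid, n.created, n.vid
FROM drupal_node n
WHERE n.type IN ('forum', 'image', 'panel', 'bio', 'page', 'story',
                 'uprofile', 'video_embed', 'video_upload',
                 'video_upload_mm', 'video_upload_op',
                 'video_upload_youtube')
  AND NOT EXISTS (
    -- A node counts as aliased if any alias row has src = 'node/<nid>'.
    SELECT 1
    FROM drupal_url_alias a
    WHERE a.src = CONCAT('node/', n.nid)
  )
LIMIT 0, 50;
```

This returns the same set of nodes as the LEFT JOIN ... WHERE src IS NULL form, so it is not guaranteed to be any faster; whether it helps depends on how MySQL plans the subquery.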

greggles’s picture

Well, different folks have done work to try to fix this before ;)

Here is an investigation related to "NOT EXISTS" and a subquery: http://drupal.org/node/212327#comment-722133

greggles’s picture

Status: Postponed (maintainer needs more info) » Fixed

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.