This function is in the categories.inc file.
My company has assigned me the task of converting an website running obselete CMS into Drupal. This current website has about 24,000 news items and about 350 categories (although I'd like to trim them a little). Currently a full import causes the site to grind to a halt.
A sample import of 10,000 articles just about works. However, the problem is (according the the devel module) that the function category_node_get_categories runs a query on the databse which is unbelievably slow.
- Time Taken: 2076.59
- Function: category_node_get_categories
- Query:
SELECT n.nid, r.nid AS node_id, c.*, n.title FROM category c INNER JOIN category_node r ON c.cid = r.cid INNER JOIN category cn ON c.cnid = cn.cid INNER JOIN node n ON c.cid = n.nid INNER JOIN node cnn ON cn.cid = cnn.nid WHERE n.status = 1 AND n.moderate = 0 ORDER BY cn.weight, cnn.title, c.weight, n.title
(I had to reformat the devel output)
2 seconds for one query?! Thats not right - surely.
Upon inspecting the code, I see this:
$result = db_query(db_rewrite_sql('SELECT n.nid, r.nid AS node_id, c.*, n.title FROM {category} c INNER JOIN {category_node} r ON c.cid = r.cid INNER JOIN {category} cn ON c.cnid = cn.cid INNER JOIN {node} n ON c.cid = n.nid INNER JOIN {node} cnn ON cn.cid = cnn.nid WHERE n.status = 1 AND n.moderate = 0 ORDER BY cn.weight, cnn.title, c.weight, n.title'), $nid);
Am I being stupid or is there nowere for $nid to be inserted? Could this be the problem - the query is returning a world of everything rather than something specific to the node in question?
The whole node took about 4.2s to produce which is not going to be acceptable for a live, high traffic site.
This categories module is fantastic, but it has so many bugs and issues. For example, WHATEVERY YOU DO - DO NOT leave the option ticked to make menu items for nodes. That feature cost me days of debugging and hair pulling. It created thousands of menu items and them promtly crashed the site by trying to generate a huge serialised array of the menu's for the cache!
Doea anyone have any idea's about this?
Comments
Comment #1
nicholasthompsonI changed the function from the above query to this:
Note the new "WHERE r.nid=%d".
This reduced the query time from 2 seconds to 0.003 seconds. Thats much better! My concern is the impact on the module.
What will I break by doing this?
Comment #2
nicholasthompsonThe full tweaked function:
Can someone review this?
Comment #3
nicholasthompsonIs this one gonna go the same way as my category counting bug report which I realised wasn't a bug at all?
Why is it static? Surely you can only view one node at a time?!
Comment #4
nicholasthompsonI've just tried to do a full import of my data (about 350 categories + 12,000 articles... This function causes PHP to use 78MB RAM (I had to up the limit from 24 to 128 just to get it to work). The query takes between 12 and 15 seconds and then there is the PHP time on top of that which takes the entire page load time to over 20 seconds on a dual processor Xeon (2x1Ghz PIII) with 1.5Gb RAM.
Surely this isn't the best way to do it. Can someone confirm if my patch breaks anything I dont know about?
Comment #5
hyperlogos commentedWait, are you saying that WITH your patch the import uses up all that, or WITHOUT it?
Comment #6
nicholasthompsonYou know - it was so long ago now, I honestly cant remember. I gave up with the category module and went back to taxonomy. The category module is far to unstable for large public site usage (imho)...
Comment #7
nicholasthompsonI'm pretty sure those stats were without my patch (if memory serves...)
But, like I said, this module was not stable enough for use on our site (Sportbusiness.com) so I moved back to taxonomy which does EVERYTHING we want it to do (some things required a little persuasion...)
Comment #8
JirkaRybka commentedI'm going to close this issue after all these years of silence. As for the performance issue, I was bitten by the same (and more) on 5.x and 6.x versions, and fixed that (more or less) by introducing a cache at #501378: PERFORMANCE! Central caching for category API functions - so this issue might be considered a duplicate of that, given that 4.7.x and 5.x versions are not supported now.