The MySQL code with the change to Drupal 4.7 seems to have made queries more resource intensive. I'm trying to think of ways to avoid that... personally, the whole integration of revision into the node table (or breaking out of the content from the node table for revisions) seems like overkill, but I suppose that doesn't matter too awful much to the core devs... I see that as, perhaps, an issue that is related.

So... is there a way to make the quotes module more refined to handle large numbers of quotes? I thought I had a lot, but then someone else wrote in another thread that they had 47,000! Wow!

I expect quotations to grow on a site... so... I don't know that it could be done, but it's something the 'quotes module' community might want to consider looking at.

CommentFileSizeAuthor
#6 quotes_0.module38.43 KBTaran
#5 quotes.module38.23 KBTaran

Comments

Taran’s picture

Title: A Thought For Larger Sites. » INNER JOIN removal

I'm trying to brainstorm ways to avoid INNER JOIN on 3 separate tables ('SELECT DISTINCT(n.nid), n.sticky, n.created, nr.title, q.author FROM {node} n INNER JOIN {node_revisions} nr ON nr.vid = n.vid INNER JOIN {quotes} q ON q.vid = n.vid WHERE n.type = 'quotes' AND n.status = 1 ORDER BY n.sticky DESC, n.created DESC'), which wasn't necessary in earlier versions because tables node and node_revision added overhead.

Blech.

Does it make sense to integrate the quotes table into nodes or node_revision so that one less inner join could be had?

grcm’s picture

It's better to let the database do the work, not PHP - that's what a DB is designed to do.

The old method of having multiple revisions to a node inside one single row in the database was ghastly- I really couldn't believe it when I saw it.

If MySQL can't handle the load, use PostgreSQL :-)

Seriously though, MySQL should be more than capable of handling most Drupal web sites- perhaps your sort memory is not set high enough?

red_eye-at-patentlystupid.com’s picture

Yes I am the one with 47,000+ quotes and I acutally got an abuse ticket over a join query that appeared only to be doing a count(on the surface with out full access to the logs and not having gone through the code thouroughly I am just guessing). I am hoping something can be possibly done to streamline things.

Taran’s picture

Hmm.

grcm: Nobody said anything about PHP doing the work. I've finally tracked down one of the issues that is causing my site to get CPU overload warnings from my host - and it's the double inner join doing that (through lots of experimentation) and the SELECT Distinct - both of which are used to display quote pages.

PHP is not the issue here. MySQL queries are. A nested join across three tables is netting queries of 45 seconds and MORE in length. That is after much pulling of hair to track down what, of all the wonderful things that came in 4.7, was causing all of this load.

Now - from where I'm sitting, the simple way around this is to incorporate the quotes table directly into the node table (adding the appropriate fields and index) so that the INNER JOIN will only be across two tables. I honestly don't know what the hell the core folk were thinking when they broke out the revisions into a separate table; it certainly makes INNER JOINs necessary for the two tables, but with any module that adds a third table to a query, you can expect problems with all the 'unimportant' folk with shared hosting. I suppose in future Drupal installation step (1) will be: get your own server. :-)

I don't know how many quotes you have, but... with a bit over 10,000 quotes and a total of about 15K nodes at this point, this falls under 'not good'. Again, not a PHP issue, a MySQL issue, and I do not fault anyone over the quotes module. It's just something that needs to be fixed for people with large amounts of quotes.

That said...

redeye - I can only imagine how much 47,000 quotes must be hurting. Over the next week, I'm going to try some stuff along the lines of above - such that the quotes queries will be restricted to two tables, as they were before. I can't promise anything... and I'm probably not the best skilled person to do this... but the nested inner join's additional layer has to be dealt with, at least for 2 people because I bet that's where your problem is as well. More as people start hitting the upper limits.

Looking back now, I really regret upgrading to Drupal 4.7, and if I could roll it back with all the changes I've made, I would. The node table being broken out that way is going to cause problems for people as their sites grow on shared hosting, I think.

Taran’s picture

Title: INNER JOIN removal » OK, I consolidated a bit...
StatusFileSize
new38.23 KB

Based on Killes comments, I took another look at the code... it seems to be a combination of things. So I made a few changes, and while it's not out of the woods yet, it might be of help to others. Basically, I changed the queries for the quotes pages to only one inner join, since the called node_load is doing all the work anyway (and making it do it twice is sort of strange).

It has helped a bit, but not enough to make me happy. The quotes module that I am presently using is attached; use with your own discretion (test it on your own system and don't blame me if it doesn't work!).

I'm toying with the thought of bypassing node_load altogether, as Killes suggested, but I'm not sure how to get the links and quote author into the posts just yet. I'm cutting my teeth on 4.7 code right now. node_load is, for some reason, causing the problem - and I believe it's the piggybacking within node_load that is causing the overhead - since, as Killes pointed out, on the main page this problem does not exist.

Thoughts? Meanwhile, back to the code...

Taran’s picture

Title: OK, I consolidated a bit... » Switched direction
StatusFileSize
new38.43 KB

I went with a different approach, and avoided node_load (as killes suggested). The version attached, a 4.7 quotes, doesn't display the author on the quotes page - but it takes less of a load to put the quotes up.

Since I use the author's name as the title for the quotes, this works for me. It's open to improvement - specifically that - if anyone is interested in tackling it. I'm behind on other work now, so I'm stopping here... for now (?).

jhriggs’s picture

Component: Miscellaneous » Code
Assigned: Unassigned » jhriggs
Category: feature » bug

RDMSes are designed to do joins, so having these three joins should not be a problem, especially since two of the joins are going against unique keys. I guarantee that there are many drupal queries that have a lot more than three joins, especially when using node access modules. Before we go swatting a fly with a sledgehammer by changing structure of tables and the like, let's just try some optimization.

I just committed version 4.7r2 which may address several of these issues. I have optimized several of the queries, changed the PRIMARY KEY on the quotes table, and added a couple of indexes to the quotes table. In my tests, I saw query times and page-creation times drop by 75% or more with these changes. After copying the new version of the module into your modules directory, be sure to run the /update.php script to have the database changes applied.

Note that if you are downloading the .tar.gz file from the web site, it may take several hours for the new version to appear. The .tar.gz files are only re-created every several hours. When you download the file, be sure the top entry in the CHANGELOG.txt file is 2006-05-22.

Taran’s picture

Thanks, jhriggs. As noted elsewehere, the 4.7.1 Drupal core upgrade seems to have addressed things as well.

jhriggs’s picture

Status: Active » Closed (fixed)

Closing.