So i have about 12500 rows in my node table currently. This doesn't seem like a ridiculous amount... I have been on projects with far more and I expect the table to grow on this project. The problem I have is when I have to run SELECT queries on the node table, specifically ones that are general "get the newest content of type x" where I have to order a large result set, the queries are real slow...

For example:

A)

SELECT * FROM `drupal_node` n
JOIN drupal_content_type_news c ON n.nid = c.nid and n.vid = c.vid
WHERE n.type = 'news' AND c.field_news_title_value <> '' 
ORDER BY n.nid DESC

*** Showing rows 0 - 29 (3,548 total, Query took 0.9636 sec)

Or if I want to join for the path alias (it is sometimes faster with the extra join which doesn't make sense, exp since it seems like having to do that join using CONCAT would be slow...)

B)

SELECT * FROM `drupal_node` n
JOIN drupal_content_type_news c ON n.nid = c.nid and n.vid = c.vid
JOIN drupal_url_alias url ON url.src = CONCAT('node/', n.nid)
WHERE n.type = 'news' AND c.field_news_title_value <> '' 
ORDER BY n.nid DESC

*** Showing rows 0 - 29 (3,548 total, Query took 0.8300 sec)

So anyways, all my indexes seem to be working at whatnot... things are just slow because of the number of rows that are being scanned to do the ordering.

If I simply drop the ORDER clauses from above, the queries become fast like I would expect on basically straight SELECT queries:

A) Showing rows 0 - 29 (3,548 total, Query took 0.0635 sec)
B) Showing rows 0 - 29 (3,548 total, Query took 0.0675 sec) - this one is now the slower of the two

So basically I concluded with my limited mySQL optimization knowledge that my DB is slow at ORDERing a large result set... I have created a work around which has been working for me, but I am wondering if there are any serious drawbacks.

Here is what I am doing:

  //get the current highest nid from the db
  $sql = 'SELECT MAX(nid) FROM {node} WHERE 1';
  $result = db_query($sql);
  $max_nid = db_result($result);

  //now choose a safe # here knowing that i don't actually need to search ALL the node
  //the last 1000 are fine since I am always ordering for the newest ones anyways
  $min_nid = $max_nid - 1000;

Then, when I run my queries I simply insert 'AND n.nid > $min_nid' into the query (using %d of course). The results are much better as follows:

A) Showing rows 0 - 29 (560 total, Query took 0.0240 sec)
B) Showing rows 0 - 29 (560 total, Query took 0.1309 sec) - still rather slow here but much better

So my questions are...

- is my solution appropriate? even though it works it seems awfully hacky to me
- is there a simpler and more straight forward way to improve my queries selecting from the node table that have a large result set but I only want to get at say the first 50... LIMIT doesn't work because it still scans the whole table... thoughts?

Bottom line though is that I can't be running nearly 1sec queries on my site and I'm afraid what will happen when I have 100k rows in my node table. Thanks for your time.