according to PDO implementation (http://drupal.org/node/134580), and latest Oracle driver implementation (http://drupal.org/node/39260), it is suggested to remove db_num_rows(). since db_num_rows() is a database-dependent API, e.g. oci8's oci_num_rows() only return AFFECTED rows number (http://php.net/oci_num_rows), and so duplicate SELECT query is required for Oracle driver implementation, which greatly reduce its performance. most of the queries will able be to rewrite as SELECT COUNT(...).

this patch will split into numbers of small patches, in order to reduce overall impact. patches within this issue will always be a completed and integrated version, based on CVS HEAD, and other patches result.

Comments

kbahey’s picture

I think this is a step backwards. Because of Oracle and PDO's limitations, we are now forced to have go lowest common denominator.

This in turn mandates two queries per pager query (one the real query, and the other the COUNT query).

There are 35 places in core that use db_num_rows() already. Some of that can go, but some of them are really needed.

On a large system, this means performance goes down the tube.

So, my recommendation is that the lesser endowed engines work around it, and leave engines that support it with db_num_rows() ability.

Crell’s picture

"Lesser endowed engines" includes the PDO-MySQL driver, if you're using the default forward-only cursor. Most places in core where num_rows is used it should only be one query anyway because that's all it's being used for. Doing a full normal select and getting back all of the data just to get a row count is wasteful.

Core needs to be able to handle all supported databases directly, so that means "working around it" in core for everything. PDO offers a simple solution, though.

$records = $result->fetchAll();

Now it's just an array and count() works. That's no slower than what the mysql driver already does internally, in fact. (That's why mysql_unbuffered_query() exists.) In most cases, though, SELECT COUNT(*) is going to be faster if that's all you need.

hswong3i’s picture

@kbahey: a scientific benchmarking is always the most solid solution, when we are talking about performance impact. according to my benchmarking result, remove db_num_rows() will not reduce overall performance, but even boost up for around 5%. surly that i am not able to perform a "complete" benchmarking for every affected modules, but the result may still consider as solid, based on benchmarking suggestion from Dries and killes :)

besides performance consideration, i guess most of the child-patches of "remove db_num_rows()" should able to commit with no question. some of them may need reviews for better implementation, but that's not such difficult :)

hswong3i’s picture

sorry, the real benchmarking result: http://edin.no-ip.com/html/?q=node/282

gerhard killesreiter’s picture

I've reviewed the patch and pointed out a few errors. The only real problem in core is the user online block and even there you can work around the problem.

hswong3i’s picture

StatusFileSize
new33.75 KB

complete patch with minor bug fixes, and code clear up. progress will split into according small patches :)
P.S. user_block() will be reviewed after this patch, for a better implementation :)

hswong3i’s picture

StatusFileSize
new33.03 KB

complete patch based on latest CVS HEAD, update book.module and user.module (user_block() implementation). progress will split into corresponding small patches :)

dries’s picture

I think it's OK to remove this function as it improves database compatibility. There is a good work-around for it, that shouldn't be too expensive in the vast majority of the cases.

I haven't reviewed the latest patch yet.

moshe weitzman’s picture

-  $number = db_num_rows(db_query("SELECT event FROM {flood} WHERE event = '%s' AND hostname = '%s' AND timestamp > %d", $name, ip_address(), time() - 3600));
+  $number = db_result(db_query("SELECT COUNT(event) FROM {flood} WHERE event = '%s' AND hostname = '%s' AND timestamp > %d", $name, ip_address(), time() - 3600));

shouldn't that be COUNT(*) in order to take advantage of mysql optimization? there is no drawback that i can see. it is (slightly) semantically better as well.

hswong3i’s picture

StatusFileSize
new32.99 KB

according to moshe's suggestion, and his document reference (http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/), this patch is updated. many thanks :)

P.S. seems the full version of this patch will be able to accept. i will not update its child patches at this point. BTW, if we still hope to keep db_num_rows() for backward supporting (where i think it is a bit meaningless, as it is database-dependence), those child patches will be required, as they can improve overall performance. so i will let them keep open, until we are all agreed to remove db_num_rows() supporting from DB API :)

dries’s picture

I'm OK with committing this patch, as it will really help development of other database backends. At the same time, I've no problem with postponing this patch to Drupal 7 either. As we're going to do a lot of database work in D7, we might be better of to postpone this patch. Some feedback would be appreciated.

hswong3i’s picture

it may be a bit pity if we postpone this patch (and also the same case as LOBs patch)... as a lot of improvement is already done for our D6 database API, oracle driver (and some other else, e.g. MSSQL and DB2) is now able to develop with no difficulty.

db_num_rows() (and also LOBs handling) is the last difficulty that i can discover about cross database compatibility. we shouldn't waste our previous effort, overcome these finial difficulty, cleanup the path for other database driver development, and improve the database support ability for D6. i am looking for a pretty feature of Drupal in the field of enterprise, we should catch these chance on time :)

moshe weitzman’s picture

i think it is OK to commit this patch remove the num_rows DB function and either put db_num_rows() in legacy.module or paste it into the upgrade docs for lazy upgraders. the fix is straightforward enough that we won't upset module developers too much.

Crell’s picture

I'm generally in support of lots of little patches in the same direction over one big patch, as it makes it much easier to maintain patches during development. For that reason, I'd favor committing this for D6.

If someone really needs to do old-style num_rows() behavior, the PDO answer would be ->fetchAll(). Do we want to consider implementing a db_fetch_all() now to replace db_num_rows(), or is that a silly addition if it's going to only last one revision?

hswong3i’s picture

i think db_fetch_all() maybe not too useful: we always loop though result set by using foreach , and preform corresponding action. it is usually pointless for a complete result set fetching :)

on the other hand, i will vote for SELECT COUNT(...) and program logic re-factor, rather than implementing db_fetch_all(), as db_num_rows() replacement. SELECT COUNT(...) and program logic re-factor may improve overall performance (according to benchmark result, it really did that), which should also be considered :)

dries’s picture

If that's the general consensus, I'd be happy to commit this patch once it is marked "Ready to be committed". This should happen sooner than late.

I looked at the code and it looks good except for:

+          $authenticated_count = 0;
+          $max_users = variable_get('user_block_max_list_count', 10);
+          $items = array();
+          while ($max_users-- && $account = db_fetch_object($authenticated_users)) {
+            $items[] = $account;
+            $authenticated_count++;
+          }

Now, $authenticated_count willl be smaller or equal to $max_users. On drupal.org, $max_users is 15 but $authenticated_count should return 300 or so.

hswong3i’s picture

StatusFileSize
new33.05 KB

sorry for such careless bug, and here is the updated version.

P.S. i found that DB2 will also face problem when implementing db_num_rows():
http://www-128.ibm.com/developerworks/ibm/library/i-osource9/index.html

hswong3i’s picture

StatusFileSize
new33.05 KB

minor update based on latest CVS.

i am now using it for both oracle and DB2 driver development, and found it is necessary. hope someone can take a final review for this patch :)

gábor hojtsy’s picture

Status: Needs review » Needs work

While reviewing the patch, I found a minor and a major problem:

- the session.inc changes introduce an E_NOTICE error as far as I see, $items is not initilized
- all changes related to node_title_list() calls should be revisited... this seem like passing on the data to theme('node_list') which invokes theme('item_list') which returns HTML output even if there were no rows in the result set... this is not only superfluous HTML output, but cases bugs in cases where it is used as if it returns FALSE if there were no rows in the set, like if ($daytop && ($result = statistics_title_list('daycount', $daytop)) && ($node_title_list = node_title_list($result, t("Today's:")))) {

hswong3i’s picture

Assigned: Unassigned » hswong3i
Status: Needs work » Needs review
StatusFileSize
new34.34 KB

@goba: thanks for your advice, and i have update the patch. now the checking will preform within node_title_list(): if nothing can fetch from input DB result object, this function will return FALSE, without calling theme_node_list(). it is a program logic re-factor, as like as other cases. thank you :)

gábor hojtsy’s picture

This version looks OK now. But this is a significant amount of change, so it would be nice to see independent tests done.

pwolanin’s picture

Status: Needs review » Needs work

minor suggestions on the code. Typo in these code comments:

+ * a SELECT COUNT(*) on the temporary table afterwards. db_affected_rows() do
+ * not give consistent result across different database types in this case.

"do" should be "does", as in "db_affected_rows() does not give"

Also, probably not much of a difference, but db_table_exists() and db_column_exists should be made consistently like this (use db_result rather than db_fetch_object):

 function db_table_exists($table) {
-  return db_num_rows(db_query("SHOW TABLES LIKE '{". db_escape_table($table) ."}'"));
+  return db_result(db_query("SHOW TABLES LIKE '{". db_escape_table($table) ."}'"));
 }

now the usage seems to vary between PostgreSQL and MySQL driver - unless there is a good reason to do so?

Finally, I think the logic change in function menu_tree_page_data() is faulty - $item will always be FALSE at that point in the loop. You need to create a flag instead.

pwolanin’s picture

here's a possible implementation for a correct change to the menu.inc part: http://drupal.pastebin.com/d2a1733d8

Note to see the bug introduced by the patch for menu.inc, you may need to be clearing the cache_menu table by hand regularly.

here's the test: set admin, admin/build, and admin/build/menu to be expanded

On visiting any page, the navigation menu should be expanded down fully to admin/build/menu. With the patch applied that fails to happen.

hswong3i’s picture

StatusFileSize
new34.35 KB

just post the complete and updated version here. and let commit its split version one by one :)

dries’s picture

Let's focus on this patch rather than on the multiple smaller ones.

hswong3i’s picture

ooo sorry about that... seems i have misunderstand Goha's meaning... very sorry :(

hswong3i’s picture

StatusFileSize
new34.42 KB

minor update for standardize coding format. reuse small patches progress.

hswong3i’s picture

Status: Needs work » Needs review
dries’s picture

Status: Needs review » Fixed

I've reviewed this patch once more and spent some time testing it. Everything looks OK. It gives us better cross-database compatibility and better performance.

(It's not clear whether pwoladin's suggestion made it in, the snippet already expired, but I assume it has.)

chx’s picture

Status: Fixed » Needs work

Hold. The patch is not yet committed and it should not be because at least taxonomy_render_nodes is broken.

+  while ($node = db_fetch_object($result)) {
+ $output .= node_view(node_load($node->nid), 1);
+  }
+  if ($node) {

It's guaranteed that $node is FALSE after loop. You wanted if ($output)

chx’s picture

Found a few more of this in the patch. Edited version of the irc chat follows. You need a variable like $is_more for menu.inc find a good name for it make it FALSE before the while ($item = and make it TRUE inside the loop (I will call this control variable later aggregator_parse_feed , if ($item) , wrong, if ($items) . comment_render , tricky, might need use a control variable here as well. node_title_list , return $node ? change to return $items ?

node_revisions( , if ($node) ===> if ($output)

system_update_179 , use control variable

return count($authmaps) ? $authmaps : 0; <= no need for count, php will nicely cast empty array to false , nonempty to true, try it.

pwolanin’s picture

This patch still breaks the menu system and should not be committed - see my pastebinned code above for the fix.

hswong3i’s picture

Status: Needs work » Needs review
StatusFileSize
new35.84 KB

@chx and @pwolanin: thanks for your advice. now i add some counter for extra checking, and should able to due with the problem :)

pwolanin’s picture

Status: Needs review » Needs work

well, at least that won't break menu, but there is no reason to increment a counter - all we care is whether we have 0 or > 0 results. Just use the code with the boolean falg I suggested which will be faster.

hswong3i’s picture

Status: Needs work » Needs review
StatusFileSize
new35.97 KB

@pwolanin: update based on your suggestion. thank you very much :)

P.S. i keep some handling as before, since we are really considering about row numbers.

P.P.S. i am still prefer about using integer counter, for standardize coding format between db_num_rows() => program logic re-factor. anyway, both v0.9 and v0.9.1 are working equally, please feel free to commit any one of this, based on different concern ;p

dries’s picture

Status: Needs review » Active

Committed the last version of this patch. I'm marking this 'active' until this has been documented in the upgrade instructions.

hswong3i’s picture

Category: feature » bug
Priority: Normal » Critical
Status: Active » Needs review
StatusFileSize
new8.43 KB

@Dries: sorry that the version you just commit is a buggy version. according to chx and pwolanin help, we update this thread, and here is the bug fix patch for latest CVS.

hswong3i’s picture

StatusFileSize
new3.1 KB

encore version... change integer counter into boolean flag, based on chx and pwolanin suggestion :(

hswong3i’s picture

the patch in #38 is only a tweaking version of current implementation, it should be able to function without any question, but improve some performance :)

pwolanin’s picture

Status: Needs review » Needs work

patch was not made from the drupal root directory

pwolanin’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new3.2 KB

here's a new patch re-rolled from the root dir (the other applied using patch -p1). Otherwise identical. The logic looks right and at least the taxonomy module tests ok.

gábor hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

If $num_rows does not contain a number, why call it $num_rows? (In all of the files patched)? I think $has_rows or something even simpler would be better.

pwolanin’s picture

Status: Needs work » Needs review
StatusFileSize
new3.2 KB

changed variable names.

gábor hojtsy’s picture

Status: Needs review » Fixed

This looks much better, thanks. Committed!

Anonymous’s picture

Status: Fixed » Closed (fixed)