I am running with Postgres (which seems to be the reason behind many of my problems) and the standard template. After a while of doing things (not sure yet exactly what triggers it - see below), when I try to display the home page as an anonymous user, I get a couple of non-ASCII characters in the browser and nothing else.

Deleting the contents of the cache table clears the problem.

Once it has started to occur, any display of the home page triggers it, except immediately after deleting the cache. Not sure if this is a timing thing or not.

At the time of the simplest occurrence (just selecting Home from the primary menu as an anonymous user), the following three 'cid' records are in the cache.

filter:1:9dd69af2aee6060e935bf68b4d5e6457
archive:calendar:28-8-2004
http://drupal.home/


Comments

AlanChandler’s picture

Title: Selecting "Home" menu from primary selection seems to corrupt cache » Cache problem with Postgres

I think I have a handle on what the problem is, although I don't yet know the solution.

The "data" column of the "cache" table stores its data in compressed form. At the moment it is declared as type TEXT. That is a character-oriented type, and it might be causing Postgres to do some form of conversion back to a string. At least some of the data in the database appears to be uncompressed strings.

I thought the solution was to convert the data column to type bytea, but I just tried this, and although I got a different result it was still not correct. Since the manual talks about potentially escaping some bytes in this form too, I am not so sure.

The result I got was a blank screen with the following (both in the document source and in the browser):

\037\213\010
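For what it's worth, those three escapes match PostgreSQL's textual bytea output escaping of the gzip magic bytes (0x1f 0x8b 0x08). A quick standalone sketch, not part of any patch:

```php
<?php
// Every gzip stream starts with the bytes 0x1f 0x8b 0x08. PostgreSQL's
// escaped bytea output renders non-printable bytes as a backslash plus
// three octal digits, which is exactly the garbage seen in the browser.
$gzip_magic = substr(gzencode('any page content'), 0, 3);
$escaped = '';
foreach (str_split($gzip_magic) as $byte) {
  $escaped .= sprintf('\\%03o', ord($byte));
}
echo $escaped; // prints \037\213\010
```

So the browser was being handed the escaped representation of the compressed page rather than the compressed bytes themselves.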

AlanChandler’s picture

FileSize
1.59 KB

I have changed bootstrap.inc to store the data on disk in uncompressed form, and to compress it for transmission to the browser just before outputting it.

I have no idea whether this is an acceptable approach, but it now seems (with a few minutes of testing) to work fine under Postgres.

Patch attached.
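The idea in the patch is roughly the following (a simplified sketch of the approach, not the actual diff; page_compress() is a made-up helper name, not a real Drupal function):

```php
<?php
// Sketch: keep the cached page uncompressed in the database, and gzip
// it only at output time, when the client says it accepts gzip.
// page_compress() is a hypothetical helper, not part of bootstrap.inc.
function page_compress($html) {
  $accept = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
  if (function_exists('gzencode') && strpos($accept, 'gzip') !== FALSE) {
    header('Content-Encoding: gzip');
    return gzencode($html, 9);
  }
  return $html;
}
```

This sidesteps the TEXT-column corruption because only printable HTML ever reaches the database, at the cost of re-compressing on every cache hit, which is the trade-off killes objects to below.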

killes@www.drop.org’s picture

I don't like your patch because I put some effort into getting the gzipped cache into core. ;)

The idea is not to save bandwidth (we could use mod_gzip for that) but to remove the overhead of creating the gzipped pages anew for each page view.

Can you give me information on both the browser and the server setup? Especially interesting would be anything that uses gzipping techniques itself.

When trying my patch I discovered that I would get problems when changing the data column from "text" to "blob" in MySQL.

AlanChandler’s picture

This is all at home.

I am running Debian Linux (sarge) with Apache 2, PHP 4, and Postgres as the server. The client end is Debian unstable. I generally use Konqueror as the browser, although for this one I also tried Mozilla Firefox. I have access to IE under Windows 98 or XP if needed.

killes@www.drop.org’s picture

FileSize
1.81 KB

I've prepared a patch which excludes PostgreSQL users from this feature. I'd welcome a better solution though.

AlanChandler’s picture

FileSize
1.82 KB

Why don't you just merge my patch and yours? That way, we still get to transmit compressed content when we can. Attached is a patch that shows that.

AlanChandler’s picture

I just realised the real implications of your previous comments. I guess what you are saying is that it is more important to save processing time on the server than to save bandwidth by zipping up the contents on the fly just before transmission (which could be an expensive operation if the server has to do it for every user).

In which case, ignore my last patch.

killes@www.drop.org’s picture

I had a similar problem with MySQL yesterday. It wasn't a PHP or MySQL error, but a faulty database: the anonymous user was missing. Can you check that you have a uid = 0 user in your users table and that it has the right role, "anonymous user" (rid = 1)?

For some reason people lose this anonymous user occasionally. I'd really like to know why.

AlanChandler’s picture

Yes, I have uid 0 in my database, although it doesn't have a name. users_roles links uid = 0 to rid = 1.

adrian’s picture

Assigned: Unassigned » vertice@www.drop.org

Assigning to myself.

adrian’s picture

I have a working fix for this, but it's based on my latest Postgres compatibility patch (most notably the new includes/database.pgsql.inc). I think I'll roll it into the other patch, but I first want to get the OK from Dries.

First, I changed the field to bytea, and then I created a new set of db_api functions:

/**
 * Returns a properly formatted Binary Large OBject value.
 *
 * @param $data
 *   Data to encode.
 * @return
 *  Encoded data.
 */
function db_encode_blob($data) {
  return pg_escape_bytea($data);
}

/**
 * Returns text from a Binary Large OBject value.
 *
 * @param $data
 *   Data to decode.
 * @return
 *  Decoded data.
 */
function db_decode_blob($data) {
  return stripcslashes($data);
}


And then I updated includes/bootstrap.inc to run the data through those functions on save/output.

I also added the functions to database.mysql.inc, but they just return the input. I still need to do the updates.inc change for this, and there probably needs to be more testing with various versions of Postgres/PHP.
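A sketch of how the save/load path would use these (using the MySQL pass-through variants described above, since pg_escape_bytea() needs a live connection; illustrative, not the actual patch):

```php
<?php
// MySQL pass-through versions as described: they just return the input,
// so the cache round trip is byte-for-byte lossless.
function db_encode_blob($data) {
  return $data;
}
function db_decode_blob($data) {
  return $data;
}

// bootstrap.inc would encode on cache_set() and decode on cache_get():
$page   = gzencode('<html>front page</html>');
$stored = db_encode_blob($page);   // goes into the INSERT
$loaded = db_decode_blob($stored); // comes back from the SELECT
assert($loaded === $page);         // compressed page survives intact
```

Under Postgres, db_encode_blob() would instead call pg_escape_bytea() and db_decode_blob() would undo the escaping, but the round-trip contract is the same.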

adrian’s picture

Looks like I spoke too soon. I am getting close to the solution though.

adrian’s picture

There is a patch that fixes this on the Postgres Compatibility Update issue.

JonBob’s picture

adrian’s picture

FileSize
2.66 KB

This part of the update patch didn't make it in completely, since it caused problems with MySQL.

Here is a fix that properly escapes the blob field for MySQL as well as Postgres.

I also neglected to add the new updates.inc function that recreates the cache table with a bytea type on the data field.

Dries’s picture

Really, I don't like the fact that you have to avoid using the %s and %d directives. It is prone to errors.

adrian’s picture

The problem with using %s and %d is that the already-escaped data gets escaped again, which kills Postgres.

When attempting to use %s and %d for everything but the data field, I ran into cases where there was a '%' in the data field, which once again crashed sprintf completely.

I was discussing this on #drupal, and Steven suggested I run the output of db_encode_blob() through str_replace('%', '%%', $data) to make it sprintf-safe. Hence, apart from the data field, all the other fields use %s/%d in my latest patch.

There was a suggestion that we implement our own sprintf for use in db_query, whereby we can have %s, %d and %b (blob) types, and then we can even add the quotes around strings automagically. This would however require us to update every db_query statement, so it was shelved as an idea to revisit for Drupal 4.6.
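Steven's str_replace() trick amounts to the following (minimal sketch; the sample blob is made up):

```php
<?php
// Binary cache data can contain literal '%' bytes, which sprintf()-style
// processing would treat as conversion specifiers. Doubling every '%'
// makes the string safe to embed directly in the query format string.
$blob = "100% \x1f\x8b\x08 compressed";
$safe = str_replace('%', '%%', $blob);
// When the query string is later run through sprintf(), each '%%'
// collapses back to a single literal '%':
assert(sprintf($safe) === $blob);
```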

adrian’s picture

Dries’s picture

Still, it is this kind of special case that, when added up, makes it impossible to understand the code. I'll give it some more thought.

Steven’s picture

I've coded a replacement db_query here:
http://www.acko.net/dumpx/dbquery.phps

As Adrian already said, it automatically adds quotes to blobs and strings, which means we could make queries a bit simpler and avoid double-quote usage. Making quotes part of the %s/%b placeholders really makes sense from a db abstraction point of view: you just signify "I want a string" without worrying about how it gets included.

It doesn't do any of sprintf's advanced formatting (like '%02d'), but I don't think that's used anywhere.

Given that it's 100% certain this code will be cleaned up for 4.6, I think it's okay to include the messier code for now.

adrian’s picture

Yes, what Steven said. Essentially, I don't like the solution either, but it's a critical fix, and it's a special case of exactly two queries.

Here is an updated patch with a comment stating that.

adrian’s picture

This patch created the http://drupal.org/node/10407 bug when using MySQL.

Hence it's not ready for primetime yet.

joemc91’s picture

I just ran into this problem tonight and have not been able to diagnose it. My website (www.joepilot.net) is not updating the cache properly. It updates it, but unserializing the data fails, so somewhere in the encoding process the data is broken.

My assumption is that the addslashes() call in check_query() breaks it, but I cannot imagine why, since I had been using the program for a few days without error until this afternoon.

This problem only began occurring earlier today. It may also be that I added a bad character (read: a single quote) into the variables section. This is actually the only cache item that is broken right now. I can also add nodes, but I cannot delete them (?). I had this problem a few weeks ago, but I don't remember how I fixed it. Thanks for your help.

adrian’s picture

This has been fixed in the latest postgres update.

The problem was that serialized data was being butchered inside the query due to table prefixing (i.e. s:{1:'string'} was turned into s:1:'string' ... thanks Steven).

This one doesn't place the raw data in the query, but uses addcslashes() to make the binary data safe for a text field (not bytea anymore).

Some storage space is wasted, but until we can implement a proper solution (for 4.6), I feel this is the best we are going to get.
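The committed approach can be sketched like this (standalone; the exact escape list below is my assumption, not copied from the patch):

```php
<?php
// addcslashes() turns control bytes, high bytes, quotes, backslashes
// and the characters that confuse db_prefix_tables() / sprintf() into
// \ooo octal escapes, so the gzipped page becomes printable text that
// a plain TEXT column can store. stripcslashes() reverses it exactly.
// The escape list here is a guess at what the patch protects.
$page     = gzencode('<html>front page</html>');
$stored   = addcslashes($page, "\0..\37\177..\377\\'\"%{}");
$restored = stripcslashes($stored);
assert($restored === $page);
```

The "wasted storage" is the price of the octal escapes: each escaped byte occupies four characters instead of one.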

Anonymous’s picture

I have been experiencing this problem on my setup as well, although it is a slightly different configuration than yours.

I am running:

MacOS X 10.3.6
Apache 1.33 (w/ PHP 4.3.9)
MySQL 4.0.18
Drupal 4.5.1 (happened LESS frequently in 4.5.0 btw)

Every once in a while it caches (I believe) a blank page, and the only solution at the moment is to run a "delete from cache;" statement in the MySQL console.

Also, when I had zlib enabled (via php.ini), it would cache a page and display weird ASCII characters (as if editing a compiled program with a text editor). I disabled zlib (via php.ini) to prevent this, but it seems the real problem is with the cache system.

This morning, I disabled the cache in hopes of preventing further outages. If anyone has any suggestions, please make them 8) ..

Cvbge’s picture

Component: other » database system
Assigned: vertice@www.drop.org »
Status: Closed (duplicate) » Active

Hello, the problem with the cache still exists, both in 4.6 and CVS :(
I'll be talking about Postgres, of course :)

Let me describe the problem again: the page_set_cache() function compresses the page and stores it in the cache table. The stored data is ordinary binary data and as such has to be stored in a 'bytea' type column.
Before storing, it has to be escaped, for example using the pg_escape_bytea() function. After retrieving it from the db, it needs to be unescaped with e.g. pg_unescape_bytea().

page_set_cache() uses cache_set() to save the data, which in turn uses normal db_query() to insert it... And here's the problem:

  • I can't put the binary $data into the query, e.g. INSERT ... VALUES ('$data'), because db_query() runs db_prefix_tables() on the query and any {} will be removed, thus breaking the data
  • I can't pass the binary $data for a '%s', e.g. db_query("INSERT ... VALUES('%s')", $data), because db_query() runs db_escape_string() on its arguments, which will also break the data

So, how can this be overcome? I see following solutions:

  • for INSERT, put $data in the body of the query, i.e. '$data', but first change all '%' to '%%', '{' to '%x' and '}' to '%y'. This removes all { and } and makes it safe for db_prefix_tables(). When SELECTing the data back, you have to change %% back to %, %x to { and %y to }.
  • completely encode $data (for example uuencode it), which translates it into a printable string, and keep the data in a normal text column
  • change db_query() so it recognizes e.g. a %b flag, and pass $data in the arguments. This does not look very easy to do...

As I read it, this caching mechanism was intended to save CPU cycles by not compressing the page every time. So I don't know whether encoding the data would defeat the purpose of the cache...

What do you think?
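The first option would look something like this (hypothetical helper names, not from any patch here; note the ambiguity in the comment, which is exactly the doubt raised in the next comment):

```php
<?php
// Encode binary data so it survives both db_prefix_tables() (which eats
// '{' and '}') and sprintf-style substitution (which interprets '%').
// '%' must be escaped first, so the '%x'/'%y' markers added for braces
// are not themselves re-escaped. Caveat: data that already contains a
// literal "%x" or "%y" decodes incorrectly, so the scheme is ambiguous.
function cache_data_encode($data) {
  return str_replace(array('%', '{', '}'), array('%%', '%x', '%y'), $data);
}
function cache_data_decode($data) {
  return str_replace(array('%x', '%y', '%%'), array('{', '}', '%'), $data);
}

$blob = "s:{1:'string'} 100% \x1f\x8b";
assert(cache_data_decode(cache_data_encode($blob)) === $blob);
```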

Cvbge’s picture

for INSERT, put $data in the body of the query, i.e. '$data', but first change all '%' to '%%', '{' to '%x' and '}' to '%y'. This removes all { and } and makes it safe for db_prefix_tables(). When SELECTing the data back, you have to change %% back to %, %x to { and %y to }.

I'm not sure you can do such a conversion unambiguously...

killes@www.drop.org’s picture

Cvbge: I'd prefer to allow %b for db_query. What are the difficulties you see?

Cvbge’s picture

FileSize
2.22 KB

Here's a patch implementing the first approach (putting $data in the query body). The MySQL part probably needs some changes so that it escapes $data correctly.

After applying the patch, everything works for non-anonymous users, as before, because the cache is not used for them.

Now for anonymous users:

a) client does not support compressed content (e.g. wget)
- When getting a page for the first time I get <b>Notice</b>: Undefined index: messages in <b>.../includes/bootstrap.inc</b> on line <b>772</b> at the beginning of the page, then the normal page
- When getting a page for the 2nd or further time (a page is cached) I get:

<b>Notice</b>:  Undefined index:  messages in <b>.../includes/bootstrap.inc</b> on line <b>772</b><br />
<b>Notice</b>:  Undefined property:  cache in <b>.../includes/bootstrap.inc</b> on line <b>290</b><br />
<b>Notice</b>:  Undefined index:  module in <b>.../includes/bootstrap.inc</b> on line <b>151</b><br />
<b>Notice</b>:  Undefined index:  filter in <b>.../includes/bootstrap.inc</b> on line <b>158</b><br />
<b>Notice</b>:  Undefined index:  system in <b>.../includes/bootstrap.inc</b> on line <b>158</b><br />
<b>Notice</b>:  Undefined index:  user in <b>.../includes/bootstrap.inc</b> on line <b>158</b><br />
<b>Notice</b>:  Undefined index:  watchdog in <b>.../includes/bootstrap.inc</b> on line <b>158</b><br />

And also after the page:
<b>Notice</b>: Undefined property: cache in <b>.../includes/session.inc</b> on line <b>45</b><br />

b) if client does support compressed content:
- when getting a page for the first time: everything is ok
- when getting for the 2nd or further:

Notice: Undefined index: module in .../includes/bootstrap.inc on line 151                                                                                                             
Notice: Undefined index: filter in .../includes/bootstrap.inc on line 158                                                                                                             
Notice: Undefined index: system in .../includes/bootstrap.inc on line 158                                                                                                             
Notice: Undefined index: user in .../includes/bootstrap.inc on line 158                                                                                                               
Notice: Undefined index: watchdog in .../includes/bootstrap.inc on line 158                           

Then comes the compressed HTML page.

Cvbge’s picture

Status: Active » Needs review
FileSize
6.46 KB

This one seems to work, for postgresql at least.

Cvbge’s picture

Fixes: added %f (search.module uses it) and %%.

Cvbge’s picture

This version uses preg_replace_callback(). It looks nicer than the previous one, but has ugly $args initialization...
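For readers following along, the callback approach is roughly this (a self-contained sketch of the idea, not the actual patch; all names here are illustrative, and the real patch escapes blobs differently):

```php
<?php
// Sketch: substitute query placeholders with preg_replace_callback(),
// consuming one argument per placeholder. A global holds the argument
// list because the callback only receives the match (hence the ugly
// $args initialization mentioned above).
$GLOBALS['db_sketch_args'] = array();

function _db_query_sketch_callback($match) {
  if ($match[1] === '%%') {
    return '%';               // literal percent consumes no argument
  }
  $arg = array_shift($GLOBALS['db_sketch_args']);
  switch ($match[1]) {
    case '%d': return (string) (int) $arg;
    case '%s': return addslashes($arg);
    case '%f': return (string) (float) $arg;
    case '%b': return addslashes($arg); // stand-in; real blob escaping differs
  }
}

function db_query_sketch($query, $args) {
  $GLOBALS['db_sketch_args'] = $args;
  return preg_replace_callback('/(%d|%s|%f|%b|%%)/', '_db_query_sketch_callback', $query);
}

echo db_query_sketch("INSERT INTO cache (cid, created) VALUES ('%s', %d)", array("it's", 42));
// prints: INSERT INTO cache (cid, created) VALUES ('it\'s', 42)
```

The advantage over sprintf() is that the binary %b argument never passes through format-string parsing, so stray '%' bytes in the data can't crash the substitution.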

Cvbge’s picture

A similar problem, it seems, exists in mysql: http://drupal.org/node/16998

Cvbge’s picture

I've done some speed tests, similar to those for "extend db_query".

My interpretation of the tests is:
1. There is not much difference between the callback and foreach versions. Thus I'd suggest the callback version, as it's nicer to read.
2. The patched version is a bit slower than the unpatched one, but not by much, and as I wrote in http://drupal.org/node/17656#comment-47980, that should not be a reason to drop this patch.

Results:

callback version, mysql:

Queries not using %b:

Fastest one - 1.6ms:
Test 1(1000 runs):	 Testing SELECT last_comment_timestamp, last_comment_name, comment_count FROM {node_comment_statistics} WHERE nid = %d...
Total test time: 174.11ms, avg query time: 0.17ms, std. dev.: 0.02ms

Longest taking - 60ms:
Test 2(1000 runs):	 Testing INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', '%s', %d, %d, '%s')...
Total test time: 3422.20ms, avg query time: 3.42ms, std. dev.: 0.16ms

This one took 13ms:
Test 3(1000 runs):	 Testing SELECT DISTINCT(uid), MAX(timestamp) AS max_timestamp FROM {sessions} WHERE timestamp >= %d AND uid != 0 GROUP BY uid ORDER BY max_timestamp DESC...
Total test time: 191.43ms, avg query time: 0.19ms, std. dev.: 0.01ms

Test 4(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 121.88ms, avg query time: 0.12ms, std. dev.: 0.01ms

Test 5(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 188.32ms, avg query time: 0.19ms, std. dev.: 0.01ms

Test 6(1000 runs):	 Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 337.01ms, avg query time: 0.34ms, std. dev.: 0.03ms

Queries using %b:

Test 7(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30, %b)...
Total test time: 225.07ms, avg query time: 0.23ms, std. dev.: 0.01ms

Test 8(1000 runs):	 Testing UPDATE {node} SET %a WHERE nid = %d AND pid = '%s' OR fid = %d AND foo = %b AND aid = '%d' OR bid = '%d' AND cid = %b OR did = '%s' AND eid = '%s' GROUP BY somethig ORDER BY 1,2...
Total test time: 429.07ms, avg query time: 0.43ms, std. dev.: 0.02ms

callback version, pgsql:

Queries not using %b:

Fastest one - 1.6ms:
Test 1(1000 runs):	 Testing SELECT last_comment_timestamp, last_comment_name, comment_count FROM {node_comment_statistics} WHERE nid = %d...
Total test time: 173.07ms, avg query time: 0.17ms, std. dev.: 0.02ms

Longest taking - 60ms:
Test 2(1000 runs):	 Testing INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', '%s', %d, %d, '%s')...
Total test time: 2461.93ms, avg query time: 2.46ms, std. dev.: 0.13ms

This one took 13ms:
Test 3(1000 runs):	 Testing SELECT DISTINCT(uid), MAX(timestamp) AS max_timestamp FROM {sessions} WHERE timestamp >= %d AND uid != 0 GROUP BY uid ORDER BY max_timestamp DESC...
Total test time: 188.39ms, avg query time: 0.19ms, std. dev.: 0.01ms

Test 4(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 120.95ms, avg query time: 0.12ms, std. dev.: 0.01ms

Test 5(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 180.55ms, avg query time: 0.18ms, std. dev.: 0.05ms

Test 6(1000 runs):	 Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 313.72ms, avg query time: 0.31ms, std. dev.: 0.03ms

Queries using %b:

Test 7(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30, %b)...
Total test time: 213.42ms, avg query time: 0.21ms, std. dev.: 0.02ms

Test 8(1000 runs):	 Testing UPDATE {node} SET %a WHERE nid = %d AND pid = '%s' OR fid = %d AND foo = %b AND aid = '%d' OR bid = '%d' AND cid = %b OR did = '%s' AND eid = '%s' GROUP BY somethig ORDER BY 1,2...
Total test time: 415.68ms, avg query time: 0.42ms, std. dev.: 0.02ms

foreach version, mysql:

Queries not using %b:

Fastest one - 1.6ms:
Test 1(1000 runs):	 Testing SELECT last_comment_timestamp, last_comment_name, comment_count FROM {node_comment_statistics} WHERE nid = %d...
Total test time: 153.75ms, avg query time: 0.15ms, std. dev.: 0.02ms

Longest taking - 60ms:
Test 2(1000 runs):	 Testing INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', '%s', %d, %d, '%s')...
Total test time: 2968.74ms, avg query time: 2.97ms, std. dev.: 0.14ms

This one took 13ms:
Test 3(1000 runs):	 Testing SELECT DISTINCT(uid), MAX(timestamp) AS max_timestamp FROM {sessions} WHERE timestamp >= %d AND uid != 0 GROUP BY uid ORDER BY max_timestamp DESC...
Total test time: 170.20ms, avg query time: 0.17ms, std. dev.: 0.01ms

Test 4(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 70.66ms, avg query time: 0.07ms, std. dev.: 0.01ms

Test 5(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 179.14ms, avg query time: 0.18ms, std. dev.: 0.03ms

Test 6(1000 runs):	 Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 326.67ms, avg query time: 0.33ms, std. dev.: 0.02ms

Queries using %b:

Test 7(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30, %b)...
Total test time: 212.95ms, avg query time: 0.21ms, std. dev.: 0.02ms

Test 8(1000 runs):	 Testing UPDATE {node} SET %a WHERE nid = %d AND pid = '%s' OR fid = %d AND foo = %b AND aid = '%d' OR bid = '%d' AND cid = %b OR did = '%s' AND eid = '%s' GROUP BY somethig ORDER BY 1,2...
Total test time: 415.05ms, avg query time: 0.42ms, std. dev.: 0.06ms

foreach version, pgsql:

Queries not using %b:

Fastest one - 1.6ms:
Test 1(1000 runs):	 Testing SELECT last_comment_timestamp, last_comment_name, comment_count FROM {node_comment_statistics} WHERE nid = %d...
Total test time: 158.75ms, avg query time: 0.16ms, std. dev.: 0.02ms

Longest taking - 60ms:
Test 2(1000 runs):	 Testing INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', '%s', %d, %d, '%s')...
Total test time: 2172.37ms, avg query time: 2.17ms, std. dev.: 0.11ms

This one took 13ms:
Test 3(1000 runs):	 Testing SELECT DISTINCT(uid), MAX(timestamp) AS max_timestamp FROM {sessions} WHERE timestamp >= %d AND uid != 0 GROUP BY uid ORDER BY max_timestamp DESC...
Total test time: 172.14ms, avg query time: 0.17ms, std. dev.: 0.02ms

Test 4(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 70.52ms, avg query time: 0.07ms, std. dev.: 0.00ms

Test 5(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 178.12ms, avg query time: 0.18ms, std. dev.: 0.02ms

Test 6(1000 runs):	 Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 317.08ms, avg query time: 0.32ms, std. dev.: 0.05ms

Queries using %b:

Test 7(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30, %b)...
Total test time: 213.25ms, avg query time: 0.21ms, std. dev.: 0.03ms

Test 8(1000 runs):	 Testing UPDATE {node} SET %a WHERE nid = %d AND pid = '%s' OR fid = %d AND foo = %b AND aid = '%d' OR bid = '%d' AND cid = %b OR did = '%s' AND eid = '%s' GROUP BY somethig ORDER BY 1,2...
Total test time: 406.10ms, avg query time: 0.41ms, std. dev.: 0.01ms

original version, mysql:

Queries not using %b:

Fastest one - 1.6ms:
Test 1(1000 runs):	 Testing SELECT last_comment_timestamp, last_comment_name, comment_count FROM {node_comment_statistics} WHERE nid = %d...
Total test time: 150.24ms, avg query time: 0.15ms, std. dev.: 0.09ms

Longest taking - 60ms:
Test 2(1000 runs):	 Testing INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', '%s', %d, %d, '%s')...
Total test time: 3298.52ms, avg query time: 3.30ms, std. dev.: 1.44ms

This one took 13ms:
Test 3(1000 runs):	 Testing SELECT DISTINCT(uid), MAX(timestamp) AS max_timestamp FROM {sessions} WHERE timestamp >= %d AND uid != 0 GROUP BY uid ORDER BY max_timestamp DESC...
Total test time: 161.17ms, avg query time: 0.16ms, std. dev.: 0.01ms

Test 4(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 70.61ms, avg query time: 0.07ms, std. dev.: 0.01ms

Test 5(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 140.52ms, avg query time: 0.14ms, std. dev.: 0.01ms

Test 6(1000 runs):	 Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 209.02ms, avg query time: 0.21ms, std. dev.: 0.03ms

original version, pgsql:

Queries not using %b:

Fastest one - 1.6ms:
Test 1(1000 runs):	 Testing SELECT last_comment_timestamp, last_comment_name, comment_count FROM {node_comment_statistics} WHERE nid = %d...
Total test time: 135.77ms, avg query time: 0.14ms, std. dev.: 0.03ms

Longest taking - 60ms:
Test 2(1000 runs):	 Testing INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', '%s', %d, %d, '%s')...
Total test time: 2318.39ms, avg query time: 2.32ms, std. dev.: 0.08ms

This one took 13ms:
Test 3(1000 runs):	 Testing SELECT DISTINCT(uid), MAX(timestamp) AS max_timestamp FROM {sessions} WHERE timestamp >= %d AND uid != 0 GROUP BY uid ORDER BY max_timestamp DESC...
Total test time: 150.37ms, avg query time: 0.15ms, std. dev.: 0.02ms

Test 4(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(10, 20, '30')...
Total test time: 70.30ms, avg query time: 0.07ms, std. dev.: 0.00ms

Test 5(1000 runs):	 Testing INSERT INTO {system} (a,b,c) VALUES(%d, '%s', 30)...
Total test time: 122.21ms, avg query time: 0.12ms, std. dev.: 0.02ms

Test 6(1000 runs):	 Testing INSERT INTO {system} (a,b,c,d,e,f,g,h) VALUES(%d, '%s', 30, '%d', '%s', '%d', '%s', '%s')...
Total test time: 179.20ms, avg query time: 0.18ms, std. dev.: 0.01ms

Cvbge’s picture

Improved documentation, fixed db_queryd().

Cvbge’s picture

Clicked submit too fast :(

Cvbge’s picture

_db_query_cb() was missing a return for the init code. This should fix the "unexplained" default calls.

Cvbge’s picture

The weird notices I was getting also occur on a fresh CVS checkout with MySQL. No patches, no Postgres. Just vanilla Drupal HEAD with MySQL.

I'm not sure how to reproduce this, but with cache and clean URLs enabled, when I access the front page with Links or Opera as an anonymous user I get:

   Notice: Undefined index: q in .../includes/bootstrap.inc on line 701                                                                                                                  
   Notice: Undefined index: q in .../includes/bootstrap.inc on line 702                                                                                                                  
   Notice: Undefined index: q in .../modules/statistics.module on line 80                                                                                                                
   Notice: Undefined index: time in .../includes/bootstrap.inc on line 55                                                                                                                
   Notice: Undefined property: cache in .../includes/session.inc on line 45    

When I clicked on a node I didn't get any notices the first time; when I refreshed, I got the notice about the time index. The same when clicking on "Create new account" (user/register). This suggests that it happens when the page is loaded from the cache.

What is weird is that I don't see these messages when I access the page from Firefox!

Disabling clean URLs does not change anything. Applying my patch does not change anything.
Disabling the cache, of course, helps.

As I have shown, the 'weird notices' are connected with neither PostgreSQL nor my patch.

Cvbge’s picture

db_query_temporary() might also need to be changed to support %b, but that should be easy; it's another clone of db_query().
I'll wait with that patch until prefixing of temporary tables is decided, as those two patches would change the same lines.

dopry’s picture

Title: Cache problem with Postgres » -cb_2 tested using mysql...

I did a cursory test using MySQL with a clean HEAD install... Set cache on, loaded some lorem ipsum with the devel module... logged out... didn't see any problems as anonymous, or logged in as an authenticated or super user.

I did note that the login block doesn't change after login until another refresh occurs, but it was consistent after removing the patch, so it's probably user.module-related... I'll head off that way now...

.darrel.

Cvbge’s picture

Title: -cb_2 tested using mysql... » Cache problem with Postgres

I did note that the login block doesn't change after login until another refresh occurs, but it was consistent after removing the patch, so it's probably user.module-related... I'll head off that way now...

I've tested that this bug also happens on fresh CVS with MySQL as the db... and I've also seen previous bug reports about it.

Cvbge’s picture

Final patch I hope.
Changes:

  • added an update.inc update for PostgreSQL
  • moved common mysql and mysqli functions to includes/database.common_mysql.inc. Moved functions: db_next_id(), db_query_range(), db_query_temporary(), db_decode_blob(), db_lock_table(), db_unlock_table().
  • added support for %b to db_query_range() and db_query_temporary()

Because of the new files, you need to apply the patch with patch -p1 < thepath.diff

I've done some simple tests for MySQL and PostgreSQL, and I'm going to test it with the generate scripts from the devel module, but it would be good if MySQL people could test it too.

Cvbge’s picture

Put common functions back in separate files.

Cvbge’s picture

I've done some performance testing for 4.7: with cache/without cache, with patch/without patch, with concurrency 1 and 5. Raw results are at http://cvbge.org/mb/

The test was done on my local machine: Linux, 256 MB memory, Celeron 850 MHz, Apache 1.3.33, MySQL 4.1.11, no other activity, 3 runs of each test.

Preparation:
1. download fresh CVS, put it into 2 dirs, patch one of them
2. create 1 admin user
3. add and enable the devel module
4. run generate-taxonomy.php, generate-content.php
5. enable all modules (except legacy)
6. enable all blocks
7. grant all rights to the anonymous user (except rights for the devel module)

Quick comparison, concurrency 1 only:

            min   mean  [+/-sd]  median   max
Total:      702    715     35.2     706   950  [unpatched, no cache]
Total:      712    725     28.2     716   874  [patched,   no cache]
Total:      133    134      8.8     134   217  [unpatched, with cache]
Total:      134    142     15.2     141   260  [patched,   with cache]

Cvbge’s picture

Forgot to add: only the front page was tested; PHP version 4.3.10.

Dries’s picture

Does anyone disagree with this patch, or with the numbers reported? I think we have no choice but to commit this patch, unless the performance regression is unacceptable.

Dries’s picture

I'd reorder the case statements in the switch block such that the most common path is the fastest path. %d should probably be at the top, followed by %s. %% shouldn't be first. This can cut down the number of comparisons and slightly improve performance.

What is this code for?

+  if (count($args) > 0) {
+    if (is_array($args[0])) { $args = $args[0]; }

Cvbge’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
20.13 KB

Setting to RTBC, maybe more people will look at it.

I've written a small script and counted all %foo modifiers; here are the results:

          'b' => 2,
          '%' => 24,
          'd' => 615,
          'f' => 4,
          's' => 373

If anyone wants the script:

#!/usr/bin/perl -w
use Data::Dumper;
my %count;
while(<>) { 
  next if (/\Wt[(]/);
  $count{$1}++ while(/%(%|s|d|f|b)(?=\W)/go); 
}
print Dumper(\%count);

I've changed the order, but I don't expect a big performance boost... I think I'll do new tests tonight.

The code you cited allows passing an array of all the arguments. The same behaviour exists in the current code, only coded differently, i.e.:

if (count($args) > 1) {
    if (is_array($args[1])) {
      $args = array_merge(array($query), $args[1]);
    }
}

I've changed some of the documentation to match the old wording (different functions had differently worded documentation; not all of it had been updated).

Cvbge’s picture

I've compared cb_6 and cb_7 performance and saw no difference between them... Raw results are at http://cvbge.org/bm2/ if anyone is interested.

Dries’s picture

I guess that means the overhead is due to preg_replace_callback.

Instead of writing

  if (count($args) > 0) {
    if (is_array($args[0])) { $args = $args[0]; }
  }

We can probably write:

  if (isset($args[0]) && is_array($args[0])) {
    $args = $args[0];
  }

It's not tested, but it looks like it might be slightly faster, and a tad more elegant? Depending on PHP's internal implementation of count() (does it store the size in the array header?), this might or might not make a small difference.

Maybe "/(%%|%s|%d|%f|%b)/" (double quotes) can be written as '/(%%|%s|%d|%f|%b)/' (single quotes)?

Cvbge’s picture

Stupid httperf...

Cvbge’s picture

Raw ab logs are at http://cvbge.org/bm3/ ('strpos' and 'reference' are additional versions that check for a '%' before calling preg_replace, and that pass $args to the callback by reference)

Cvbge’s picture

Renamed DRUPAL_DB_QUERY_REGEXP to DB_QUERY_REGEXP; undid the one-liners.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed. Thanks.

Anonymous’s picture

Status: Fixed » Closed (fixed)

chx’s picture

Title: Cache problem with Postgres » The new db_query with preg_replace_callback is slower than printf
Version: » 4.7.2
Status: Closed (fixed) » Active

It's more than a bit late, but we have sacrificed quite some speed here -- for MySQL users, the majority of Drupal users, there is no gain from the new system.

I will submit a patch which removes %b support and moves the necessary SQL statements in cache_set()/cache_get() to ??sql.inc.

beginner’s picture

Version: 4.7.2 » x.y.z
Priority: Critical » Normal

magico’s picture

Version: x.y.z » 4.7.3

@chx: any news about this? Is this a 4.7-only bug? We have some new cache system things in HEAD...

magico’s picture

Version: 4.7.3 » x.y.z

Strange... this was already set to HEAD! But the submit failed...

Gábor Hojtsy’s picture

Version: x.y.z » 4.7.x-dev
Status: Active » Fixed

The performance issues are handled for 6.x here: http://drupal.org/node/171728
I'm setting this to fixed because the original issue was fixed here.

Anonymous’s picture

Status: Fixed » Closed (fixed)