This patch removes the locks from Drupal. Backwards compatibility is kept (for a change).

Comment  File  Size  Author

#192  no_locks_vs_5_7.patch  3.66 KB  felixsmile
#173  db_next_id_4.patch  4.48 KB  david strauss
#172  db_next_id_3.patch  4.48 KB  david strauss
#170  db_next_id_version_check.patch  2.07 KB  david strauss
#153  drupal_next_id_0.patch  1.5 KB  recidive
#151  drupal_next_id.patch  1.26 KB  recidive
#132  no_locks_vs_5.patch  3.66 KB  david strauss
#130  no_locks_vs_head.patch  4.11 KB  david strauss
#119  no_locks_drupal_5.patch  3.69 KB  david strauss
#115  no_locks_5.patch  8.94 KB  david strauss
#114  no_locks_4.patch  8.93 KB  david strauss
#110  no_lock_2.patch  7.54 KB  Crell
#106  no_lock_1.patch  7.91 KB  killes@www.drop.org
#103  includes_4_7_5.zip_.patch  20.42 KB  felixsmile
#93  no_lock_0.patch  7.91 KB  killes@www.drop.org
#89  no_lock.patch  9.46 KB  killes@www.drop.org
#82  NoLock-NoTempTables.dif_0.txt  7.87 KB  peterthevicar
#81  NoLock-NoTempTables.dif.txt  7.9 KB  peterthevicar
#65  no_block.patch.txt  8.52 KB  killes@www.drop.org
#50  locking.png  17.96 KB  dries
#40  no-lock.jpg  210.91 KB  dries
#38  nolock1_0.patch  7.34 KB  chx
#37  nolock1.patch  7.4 KB  chx
#34  no_locking2.patch  15.36 KB  RobRoy
#33  no_locking.patch  15.31 KB  RobRoy
#30  no_locking_47.patch.txt  14.16 KB  killes@www.drop.org
#28  nolock_3.patch  13.09 KB  chx
#25  nolock_2.patch  13.09 KB  chx
#23  nolock_1.patch  12.26 KB  chx
#22  nolock_0.patch  12.11 KB  chx
#21  nolock.patch  12.85 KB  chx
#17  no_locks_for_mysql_2.patch  8.17 KB  chx
#16  no_locks_for_mysql_1.patch  8.8 KB  chx
#12  no_locks_for_mysql_0.patch  8.16 KB  chx
#11  no_locks_for_mysql.patch  8.15 KB  chx
#6  no_locks_3.patch  7.37 KB  chx
#5  no_locks_2.patch  7.19 KB  chx
#2  no_locks_1.patch  7.19 KB  chx
#1  no_locks_0.patch  6.03 KB  chx
no_locks.patch  6.03 KB  chx

Comments

chx’s picture

File: new, 6.03 KB
chx’s picture

File: new, 7.19 KB

mysqli db_next_id was forgotten.

eaton’s picture

Patched the files, ran update.php. I saw 'unspecified error' flicker briefly in the progress bar just before the page reloaded and said things had gone well. Everything SEEMS to be working though, including inserting/updating/etc. in situations that previously locked tables. So far so good; further updates as events warrant.

markus_petrux’s picture

This is practically a dup of the postponed Getting rid of the sequences table issue. Although that one proposed a db_insert_id() to replace db_next_id(), this new approach looks promising.

Would you mind continuing in that issue?

chx’s picture

Title: Get rid of locks » Get rid of locks for mysql
File: new, 7.19 KB

For PostgreSQL, this just won't happen, at least not easily. The problem here is that if we have two requests that want to replace the same row, the second will fail, and that's bad. Cvbge, please comment.

chx’s picture

File: new, 7.37 KB
markus_petrux’s picture

May I ask why you keep the {sequences} table? As far as I can see it is only used by the core itself, and only for MySQL.

chx’s picture

Because I want this in 4.7. For 4.8 the issue you linked is a possibility. For 4.7 this is a huge performance benefit for MySQL users, while PostgreSQL users neither lose nor gain anything. And finally, many contrib modules use db_next_id -- why break BC when you can get away with it?

markus_petrux’s picture

I think I see: db_next_id is kept for backwards compatibility, though I believe this implementation wouldn't work.

function db_next_id() {
  db_query('INSERT INTO {sequences} VALUES (NULL)');
  return mysql_insert_id();
}

That is, db_next_id() is supposed to return the next key for the specified table (the argument $name is removed here), not the next key of the {sequences} table.
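As I read it, the patch's answer is to keep the signature but make the IDs globally unique rather than per-table. A minimal sketch of that idea (my reconstruction, assuming {sequences} is reduced to a single auto-increment column; this is not code from the patch):

```php
function db_next_id($name = NULL) {
  // $name is accepted only for backwards compatibility and ignored:
  // every caller draws from one global auto-increment counter, so the
  // returned IDs are unique across all tables, not per table.
  db_query('INSERT INTO {sequences} (id) VALUES (NULL)');
  return mysql_insert_id();
}
```

That makes the BC question one of semantics, not signature: callers still get a fresh unique integer, just not a dense per-table sequence.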

markus_petrux’s picture

In the PG version of db_query_replace()

$queries['insert'][0] = 'INSERT '. $queries['insert'][0]

1) Is the keyword INTO missing?
2) The semicolon is missing.

chx’s picture

File: new, 8.15 KB

What you need is a unique identifier. And this is unique. It's just that it's unique not just for nodes but for users, menu items, comments etc. But fear not that we will run out of numbers, I am using a 63 bit counter.

And now database.mysql joins the patched files.

chx’s picture

File: new, 8.16 KB

INTO is optional, but why not add it? Yes, that ; was missing, thanks for catching it.

markus_petrux’s picture

Are all the related tables using BIGINTs as well? ;-)

markus_petrux’s picture

Are there performance issues (for joins, sorts, etc.) when using BIGINTs?

markus_petrux’s picture

Maybe you could still keep the existing design of the sequences table and use the REPLACE trick to get rid of the LOCK table in db_next_id?
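For reference, the lock-free per-name counter pattern from the MySQL manual could indeed keep the existing sequences layout; a sketch, assuming sequences(name, id) columns:

```sql
-- Atomically bump the counter for one sequence name. LAST_INSERT_ID(expr)
-- stores the new value per-connection, so no LOCK TABLES is needed and
-- concurrent requests each get a distinct value.
UPDATE sequences SET id = LAST_INSERT_ID(id + 1) WHERE name = 'node_nid';
SELECT LAST_INSERT_ID();
```

The UPDATE is atomic on its own, and the SELECT only reads connection-local state, so two racing requests cannot observe the same value.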

chx’s picture

File: new, 8.8 KB

Well, I could use integers, yes, four billion objects ought to be enough for any Drupal site. (No, we do not support all Google pages as nodes until you donate a server farm comparable to Google's for development purposes.)

chx’s picture

File: new, 8.17 KB

Oops, the user module patch does not belong here.

chx’s picture

Status: Needs review » Closed (fixed)

Some good ideas and some good time wasted.

chx’s picture

Status: Closed (fixed) » Needs work
beginner’s picture

Priority: Critical » Normal

critical?

chx’s picture

Title: Get rid of locks for mysql » Database code cleanup: no lock, no preg_replace_callback (for mysql at least)
Status: Needs work » Needs review
File: new, 12.85 KB

There we go. We have two new functions. _db_query_sprintf is just sprintf for mysql(i), and remains the complex horror it was for pgsql. The blob placeholder is now a define. REPLACE is implemented. The PostgreSQL implementation is based on SQL by Joshua Drake (whee! did you know that he is into Drupal?)

Be wary: completely untested.

chx’s picture

File: new, 12.11 KB

a little cleanup...

chx’s picture

File: new, 12.26 KB

after testing, a new version (thx Eaton).

eaton’s picture

Haven't had a chance to look too deeply at the code, but it appears to be working fine, going through both an update to an existing database and a full fresh install cycle. It's a bit disorienting to create a node with $nid = 47, but it seems to be working smoothly.

Because the IDs are shared between tables, we now need to pay attention to modules that churn through lots of IDs -- aggregator.module, for example, can be cron'ing away in the background creating hundreds of records (and using hundreds of IDs) an hour. Also, it might be worth looking for cases where we can use updates instead of deletes/inserts etc. In tables like {system} it doesn't matter, but in others it might.

chx’s picture

File: new, 13.09 KB

My db_next_id solution is clumsy compared to http://drupal.org/node/77312. I removed the error handling from there, as we never check whether the database has disappeared or not... I added mysqli, though.

moshe weitzman’s picture

chx’s picture

Priority: Normal » Critical

To make it clear: when this gets in, plumbley must be mentioned, as the db_next_id is his. This is critical, as locking tables is really not nice.

chx’s picture

File: new, 13.09 KB

sammys corrected db_query_replace for PostgreSQL: the FOR UPDATE goes not after the SELECT but at the end. It feels so good that so many other people are helping with this patch.

dries’s picture

Anyone care to benchmark this? :)

killes@www.drop.org’s picture

File: new, 14.16 KB

Here's a patch for 4.7 for people who want to test it against a 4.7 install.

killes@www.drop.org’s picture

I've done some preliminary testing and can't see much of a difference.

chx’s picture

I am 100% sure that sprintf is faster than preg_replace_callback... even if you can't benchmark a difference, given enough queries there will be a difference, however small. Ah, and there is no disadvantage as far as I can see.

On another note, db_query_replace is sorely needed. Just look at http://drupal.org/node/77924 .

RobRoy’s picture

File: new, 15.31 KB

The 4.7 patch didn't apply totally cleanly so I re-rolled that patch with a fix for the block.module in there so people could test that out too.

This patch is for DRUPAL-4-7.

RobRoy’s picture

File: new, 15.36 KB

Oops, forgot to remove the DELETE in block.module.

Here is another for DRUPAL-4-7.

robertdouglass’s picture

I can confirm that for high traffic sites, the locking of tables is a severe disadvantage. I'll see if we can test this patch on the site in mind.

gábor hojtsy’s picture

Plus I can confirm that using LOCKs is not possible on Hungarian shared hosts, so those using Drupal on those hosts need to comment these out, which makes for a possibly vulnerable system without the solutions presented here. Unfortunately I don't have the time to test the patch, but I definitely agree that it is critical for the next Drupal release.

chx’s picture

Title: Database code cleanup: no lock, no preg_replace_callback (for mysql at least) » Remove database locking
File: new, 7.4 KB

Per killes' request I mutilated my beautiful patch.

chx’s picture

File: new, 7.34 KB

Reroll with mysqli junk removed.

dries’s picture

Going to benchmark this today (if all goes well). I already spent 2 hours setting up a good benchmark infrastructure for this.

dries’s picture

File: new, 210.91 KB

I spent a couple hours benchmarking this patch, but it doesn't seem to make any difference regardless of my benchmark configuration.

Attached is the result of my last benchmarks. All accesses were done by 20 concurrent anonymous visitors with page caching enabled. One out of the x accesses resulted in a cache miss, which will cause the page to get cached. In order to put a page in the cache, we need to take a lock. So when x is 2, for example, 1 request out of 2 will put a page in the cache and will require the lock to proceed. With 20 concurrent visitors, this means there must be some contention.

I don't think that's the case because (i) the graph is fairly linear (which suggests that lock contention isn't a problem) and (ii) the no-lock patch didn't improve performance.

eaton’s picture

On the other hand, removing database locks from core expands the number of hosts Drupal can run on, doesn't it?

dries’s picture

Drupal.org:

show status;
| Table_locks_immediate          | 648542721    |
| Table_locks_waited             | 12206055     |

On drupal.org less than 2% of the locks have to wait ...

My test setup:

| Table_locks_immediate          | 15392266   |
| Table_locks_waited             | 20863      |

On my test machine less than 0.2% of the locks had to wait ...

Looks like I might have to tweak my configuration setup some more so that at least 2% of the locks have to wait.

robertdouglass’s picture

I wish we could benchmark this on a client's site of mine. There, when we look at the MySQL processes at peak times, all of the page requests are stuck waiting on locks (100%). I can't confirm for sure whether this is due to some shady programming in a contributed module or not, but the queries that we see in the queue are the same ones addressed in this patch.

dries’s picture

I figured out what is wrong with my benchmark setup. Fixed it. The benchmarks are running again but they take 3 hours to complete.

Robert: could you check which table is asking for the lock (instrument db_lock_table and have it write its table names to a file):

  static $file = NULL;

  if (!$file) {
    $file = fopen('/tmp/lock.txt', 'a+'); 
  }
  fwrite($file, "$table\n");

variable_set() asks for a table lock, so if modules are abusing the variables table to store data (e.g. per-page counters), you might spend a lot of time waiting for locks ... if that is the case, you might want to write a stack trace to disk in order to find the culprit.

dries’s picture

chx, robert: looking at the code ... it doesn't make 100% sense to me. If we're going to have backend-specific code, we might as well simplify this and use MySQL's REPLACE statement. Introduce a db_replace() and get rid of all the ugly regexes? The variables, sessions, and cache tables could all take advantage of that function.

If you provide a patch, I'll benchmark it and I'll compare it to the existing no-lock patch! :)
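A db_replace() along these lines might look like the following sketch (the name, signature, and argument handling here are my assumptions, not code from any posted patch):

```php
// Hypothetical db_replace(): build a MySQL REPLACE from a field array.
// REPLACE deletes any row with the same unique key, then re-inserts it,
// in one statement and without LOCK TABLES.
function db_replace($table, $fields) {
  $columns = implode(', ', array_keys($fields));
  // One quoted '%s' placeholder per value; db_query() escapes the arguments.
  $placeholders = implode(', ', array_fill(0, count($fields), "'%s'"));
  return db_query("REPLACE INTO {". $table ."} ($columns) VALUES ($placeholders)",
    array_values($fields));
}
```

Used for the cache, a call could then read db_replace('cache', array('cid' => ..., 'data' => ..., ...)) with no lock and no regex.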

moshe weitzman’s picture

IMO, negligible performance change is a *good* thing. This patch is about removing a big support problem.

RobRoy’s picture

This patch is also about avoiding critical data loss issues, which is why I think it belongs in 4.7. See my post about two users administering blocks at the same time: http://drupal.org/node/77924.

chx’s picture

Rob, data losses can be prevented with locking. We try to avoid that now.

I would like to point out that the "ugly regexp change" I do in db_query_replace is modelled straight on db_query_temporary. But the cost of one regexp change is so minimal that it's dwarfed by writing to the database anyway.

dries’s picture

Well, I think it is a lot of ugly code (regex here, regex there) to work around this problem. I mean, if it gets this ugly, we might as well do a:

if (db_type == 'mysql') {
  // MySQL specific code
}
else {
  // PostgreSQL specific code
}

That is ugly as well, but at least it is (i) faster and (ii) readable.

dries’s picture

File: new, 17.96 KB

Here is another graph. 20 threads are requesting cached pages, and every once in a while (x-axis), the cache is invalidated. This will result in one thread regenerating the page, and grabbing the global table lock to store it in the cache.

The no-lock patch improves performance a little bit.

RobRoy’s picture

Rob, data losses can be prevented with locking.

I thought I asked about locking on IRC for the admin > blocks data loss problem at http://drupal.org/node/77924 and you said that it wouldn't help; that this patch was what needed to get in for that critical (IMHO) data loss issue.

But it seems for my issue of two admins deleting all block data, moving the DELETEs and INSERTs next to each other and locking the blocks table would prevent data loss, no? I will refactor the patch there to work with the current db implementation; let me know if that would fix it.

chx’s picture

Dries, one preg_replace per REPLACE-type query is not bad at all, especially because the regexp is very, very simple: it's almost a str_replace, just a safer one.

What you suggest is not a generic solution -- you will need to write MySQL/PgSQL-specific code for variable_set, for cache_set and so on, for every such function. I provide an API. And when tostinni finishes (if he ever does) his Oracle-specific code, you will need to go over it all again and add that. And so on. I am very much against that solution.

Another solution would be an insert query builder. I am not sure whether I want that or not, but you really do not want that, so it's a moot point.

gábor hojtsy’s picture

OK, we have a small performance improvement, we have a cleaner API, we have better support for free hosts (which disable LOCK for you). What is missing? Is it only some code cleanup?

dries’s picture

Code clean-up would be nice (I don't like the regex -- the code is quite ugly).

Also note that we'll always need table locks. For example, RobRoy's block.module fix that I just committed introduces table locks in block.module. This renders the non-performance argument of this patch moot.

I'll look at this some more as time permits -- I already spent hours testing/benchmarking this patch. I'll try to do some more testing next week. (I think this is very much about performance.)

m3avrck’s picture

The problem with locking is tied more to MyISAM than InnoDB type tables.

There is a patch: http://drupal.org/node/78503 ... it introduces a new function, "db_engine_type". If we use that function, we can see if InnoDB is supported (I'm guessing this will almost always be the case) and then switch the following tables to InnoDB:

variable
cache
blocks
sequences

Those are the only tables that we actually lock in core. As InnoDB tables, we should see a nice performance boost on heavily trafficked sites with lots of concurrent users.
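If db_engine_type reports InnoDB support, the conversion itself would just be one ALTER per table; a sketch, using the plain 4.7 table names without any site prefix:

```sql
-- Convert the lock-prone core tables to InnoDB so row-level locking
-- applies instead of MyISAM's table-level locks.
-- (On MySQL 4.0 the keyword is TYPE=InnoDB rather than ENGINE=InnoDB.)
ALTER TABLE variable ENGINE=InnoDB;
ALTER TABLE cache ENGINE=InnoDB;
ALTER TABLE blocks ENGINE=InnoDB;
ALTER TABLE sequences ENGINE=InnoDB;
```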

drumm’s picture

I think the regexp is okay.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Based on #53 and #56, really, what's missing?

dries’s picture

Status: Reviewed & tested by the community » Needs work

There is a lock in block.module which renders the non-performance argument moot.

killes@www.drop.org’s picture

Sorry, I disagree: the lock in block.module is only executed if somebody saves the block admin page. For the cache_set function it is quite nice to not have the locks. Because of the many times cache_set is executed, quite a lot of lock queries are done. These queries don't take a lot of time, but why not avoid them?

I think that we will never be able to entirely work without db locks.

RobRoy’s picture

Sorry, I disagree, the lock in block.module is only executed if somebody saves the block admin page.

Actually, that is incorrect. The locks in block.module execute when somebody views the block admin page as _block_rehash() is called on each view of that page to refresh the list of blocks for the user to modify.

But I still think that removing some locks is better than not removing any.

m3avrck’s picture

I agree, removing the other locks is much more important -- those are the ones that can affect lots of visitors to your site, which can really slow down a *major* site.

The block lock is certainly needed, but in terms of performance it should have little effect, since it would only be admins visiting that page.

chx’s picture

Status: Needs work » Reviewed & tested by the community

If this goes in I will implement a variable-based semaphore for the block page. But that will cause a big debate, which shall not hold this up.

drumm’s picture

I suspect the whole block rehash madness could be better designed so it doesn't delete everything and re-add it on viewing the page. The point is, these problems can be solved and we should concentrate on single changes and get them done. We have to start somewhere, and every lock removal needs to stand on its own merits anyway.

moshe weitzman’s picture

Status: Reviewed & tested by the community » Needs work

now there is a conflict in cache_set(). sigh.

killes@www.drop.org’s picture

Status: Needs work » Needs review
File: new, 8.52 KB

re-rolled

dries’s picture

I'm going to postpone this patch. We'll work on it later.

killes@www.drop.org’s picture

If your only concern is that this patch doesn't remove all locking instances, look at:

http://drupal.org/node/80963

This removes the last remaining lock in core. I'd prefer to keep the db_lock function nevertheless.

drumm’s picture

Status: Needs review » Postponed

That other one isn't in yet (although it seems like a good idea, and not because of locking).

Can we break this into two patches, since it removes two unrelated uses of locking? It would be nice to review each on its own merits.

On the next_id one: why not just use auto-increment? I believe Postgres has something suitable for this.

On the replace one: that code looks annoying to write, but I can't think of any way to make it better.

chx’s picture

Assigned: chx » Unassigned
Status: Postponed » Closed (won't fix)

this patch missed two releases. i don't care any more. not my problem. if someone wants to fight blatant ignorance masked in unfounded requests, then do it.

gerhard killesreiter’s picture

Status: Closed (won't fix) » Needs review

no comment

drumm’s picture

For the record, I think removing use of database locking from core is a good idea because many smaller hosts don't allow that permission.

I think having a less-clunky way of replacing rows is a good idea. And I think having a better way of dealing with sequences is a good idea (although I'm not sure why we are keeping that table around when, as far as I know, we don't really need it).

mboy-1’s picture

Agree

mboy-1’s picture

After 4 hours, I finally patched the files, but why do I still get the error message even though I applied the patch?

This is the error message when I tried to create a new page.

user warning: Access denied for user: 'xxxxxx@%' to database 'xxxxx' query: LOCK TABLES cache WRITE in /home/content/............/html/includes/database.mysql.inc on line 120.

Please help, thanks!

recidive’s picture

Using auto_increment in the sequences table and returning mysql_insert_id() in db_next_id() also allows us to set up MySQL replication with multiple masters. This sort of replication requires inserting into more than one database at the same time.

We want to set up MySQL for high availability using master-master replication with the following settings:

In the Master A server my.cnf file we would have:

auto_increment_increment=10
auto_increment_offset=1

And on the Master B:

auto_increment_increment=10
auto_increment_offset=2

This way the keys inserted in one database never clash with the other's; i.e. the keys inserted in the Master A database would be 1, 11, 21, etc. and in Master B 2, 12, 22, etc. But for this to work with Drupal, we need the keys to be generated by the auto_increment feature.

-1 for the last patch.

+1 for the patch in comment #23.

guidocecilio’s picture

Category: task » support
Priority: Critical » Normal

Can you tell me which patch to apply? There are many, and I need it since I am a slave of GoDaddy and a lover of Drupal.

killes@www.drop.org’s picture

Category: support » task
Priority: Normal » Critical

don't hijack issues.

FiReaNGeL’s picture

Following this discussion. +1 for getting rid of table locks.

chx’s picture

Status: Needs review » Postponed

Not because I do not care -- I still do not and will not change ever -- but because Dries said so, this is postponed.

Just clearing the Drupal5 queue.

jonaswouters’s picture

Version: x.y.z » 4.7.x-dev

I also agree we should get rid of the table locks. Two of our biggest sites are suffering from this issue at the moment and we have no quick fix for them. @Dries, this is a serious problem for us. Please review this issue again.

webchick’s picture

Version: 4.7.x-dev » 6.x-dev

This won't happen in 4.7.x... 6.x at the earliest.

peterthevicar’s picture

File: new, 7.9 KB

Thanks to chx and niosop it is possible to avoid using LOCK TABLE and CREATE TEMPORARY TABLE.

I know I'm not alone in having a shared hosting package which doesn't allow these statements.

Here is a minimal patch for 4.7.4 which removes both safely. There are still some calls to LOCK TABLES, but these are in admin places, so they will simply give an error message but not cause problems in the usual case of a single admin. The comments in the patch say where the code comes from -- none of it is mine.

Note this only works for MySQL.

HTH, Peter

peterthevicar’s picture

File: new, 7.87 KB

Oops - forgot to name the cache table.

kbahey’s picture

Subscribing to this issue.

felixsmile’s picture

Title: Remove database locking » Version 4.7.5

Is there a patch available for Drupal 4.7.5?

It would be nice if you could also add a patched version of bootstrap.inc, database.mysql.inc, and search.module. I realise this is a rather stupid question, but somewhere on this site it was said, quite correctly, that even experienced users (admittedly w*** users) sometimes have problems patching their files.

If anyone could post the patched 4.7.5 files, it would be really nice.

Thank you very much, great job, I'm impressed,

Felix.

PS: you should publicize your thread more; it's by far not the first one you find when performing a search.

Crell’s picture

Title: Version 4.7.5 » Remove database locking

Restoring title...

felixsmile, please do not change the title of an issue when you comment on it. The title is for the whole thread, not for your comment. Thanks.

m3avrck’s picture

Status: Postponed » Needs review

Doesn't need to be postponed either, now that the 6.x branch exists.

felixsmile’s picture

I'm sorry, I noticed it a bit too late, and then, I thought a double post would be worse. Next time I'll fix it myself. Thank you for telling me.

Caleb G2’s picture

subscribing

killes@www.drop.org’s picture

File: new, 9.46 KB

I've updated chx's latest patch.

wim leers’s picture

Subscribing.

killes@www.drop.org’s picture

Status: Needs review » Needs work

chx has pointed out that my patch has cruft in it and that we should look into using INSERT ... ON DUPLICATE KEY UPDATE:

http://dev.mysql.com/doc/refman/4.1/en/insert-on-duplicate.html
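For comparison, a cache write via that syntax would look roughly like this (a sketch using the 4.7 {cache} columns; the values are placeholders):

```sql
-- One statement, no lock: insert the row, or update it in place if a
-- row with the same primary key (cid) already exists. VALUES(col) refers
-- to the value that would have been inserted for that column.
INSERT INTO cache (cid, data, created, expire, headers)
VALUES ('some-cid', 'serialized-data', 1169766951, 0, '')
ON DUPLICATE KEY UPDATE data = VALUES(data), created = VALUES(created),
  expire = VALUES(expire), headers = VALUES(headers);
```

The hard part chx mentions is mechanical: rewriting an arbitrary INSERT string into this form needs the column list repeated in the UPDATE clause, which a simple prefix regex (INSERT to REPLACE) avoids.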

chx’s picture

We looked at it, and it's hard to convert INSERT syntax into that. Still, I leave the issue at CNW because there is cruft.

killes@www.drop.org’s picture

Assigned: Unassigned » killes@www.drop.org
Status: Needs work » Needs review
File: new, 7.91 KB

Updated the patch and removed the cruft.

I have been running this on a test system of mine. The tests consist of letting 5 rampaging wgets loose on an unsuspecting Drupal site. The filter cache was cleared beforehand. With this patch, the script took 15m15s; without it, 16m16s. About 10000 filter strings were generated and stored.

thierry_gd’s picture

+1 for getting rid of table locks.

felixsmile’s picture

This looks like it's really working great. Please excuse me if this is a naive question, but I couldn't find any answer to this here:

what are the plans for versions 4.7.5 and 5.0?

If I understood all the discussions around the topic, this is particularly interesting since even when a user has the lock tables privilege, Drupal runs faster without locking the tables. You should do more advertising, for sure :)

Thanks

Felix.

gerhard killesreiter’s picture

There are no plans to backport this patch (as usual). However, you can probably safely apply it to your site's installation. I am running it on drupal.org now and haven't yet seen any problems related to it. I'll wait a bit longer and then mark it as rtbc.

matt westgate’s picture

I just had an experience on a site that converted some of its write-heavy tables to InnoDB, which actually caused more overhead, since the LOCK TABLES queries were still run, causing both InnoDB and MySQL to perform locks. Once I commented out the LOCK and UNLOCK queries, since InnoDB was handling its own transactions, we saw a performance gain.

felixsmile’s picture

Maybe I didn't understand what you said; I thought adapting the patch for versions 4.7.5 and 5.0 would not really be backporting but updating. That's what I really meant: does anyone have any plans to update the patches or, even better, has already done it and could post the patch or the "finished products"?

If no one has, I will try and look at it, comparing the files of versions 4.7.5 and 4.7.4. If the lines concerned are identical (or similar), I'll patch it "by hand", line by line :) (if you tell me there is a better method, please feel free to do so =D).

Thank you for sharing the experience about the performance issue. So, rephrasing my previous post: in a "pure" MySQL environment, the patch might/will bring some performance improvements.

Thank you very much for your replies,

Felix

felixsmile’s picture

Update: maybe I've understood your post now: do you mean that now that the task has been assigned to version 6.x, you will no longer think about the versions 4.7.x and 5.x, which would also be the reason why you called it backporting?

I'm no expert in matters of PHP, but I think this is a really useful patch for many of us. If anyone can explain to me roughly what to do to update the patch for 4.7.4 posted above to 4.7.5, I can write a detailed tutorial for the people interested, if you like.

Thank you, and sorry for the double post,

Felix

webchick’s picture

It means that the patch will only be applied to Drupal 6 (if it's accepted), and not previous versions of Drupal. So you will never be able to download a 4.7.7 or a 5.2 or whatever that has this change in it.

However, you can apply the patches to an existing 4.7.x/5.x installation, as long as you know what you're doing and remember to re-apply the patch each time you upgrade (say, from 4.7.5 to 4.7.6). More info is available on applying patches here: http://drupal.org/node/60108

dries’s picture

Still not a fan of the regex ...

Is there a reason we can't clean up db_next_id()? It's in the MySQL-specific database backend, so it should be OK to use MySQL-specific constructs (like REPLACE).

gerhard killesreiter’s picture

preg_replace('/^INSERT /', 'REPLACE ', $insert_query),

This poor regexp? It is so harmless that we could almost convert it to a str_replace, but only almost.

The main concern (performance-wise, of course) of this patch is not db_next_id, btw, but cache_set.

felixsmile’s picture

File: new, 20.42 KB

OK, while you are talking about really interesting stuff, I'll try to describe for newbies like me how to use your nice patch. Please tell me if something is missing, I don't want to be responsible for the destruction of a site =D

This description is valid for 4.7.5, but I'm sure it will work with other versions, too. Don't forget to make tons of BACKUPS before anything you try.

Thank you for your previous reply, I'll repeat it here: every time you update your Drupal installation, you'll have to patch the updated files again.

No-Lock-Patch in 10 easy steps.

1) Since a newbie is more probably a Windows user :), go here http://drupal.org/node/32875 and download Cygwin (otherwise http://drupal.org/node/60108 will be good for you). Don't forget patchutils during installation.

2) If you leave all the options as they are, Cygwin will be in C:\cygwin. There is a home directory in there, with another directory inside named after your username. In there, you copy the entire includes folder of your Drupal installation. (That's just one way of doing it; if you have bandwidth problems, do it differently, copying just the four or five files to be patched.)

3) Copy the patchfile in the same directory.

4) Open cygwin, go to your includes folder (typing cd includes).

5) Then, type patch < no_lock_0.patch

6) It will complain:

can't find file to patch at input line 30
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
|Index: includes/cache.inc
|===================================================================
|RCS file: /cvs/drupal/drupal/includes/cache.inc,v
|retrieving revision 1.5
|diff -u -p -r1.5 cache.inc
|--- includes/cache.inc 10 Nov 2006 07:26:27 -0000 1.5
|+++ includes/cache.inc 16 Jan 2007 14:20:08 -0000
--------------------------
File to patch:

7) Tell it bootstrap.inc; cache.inc does not exist for Drupal 4.7.5, so you will get a little error file later.

8) The rest will work fine, and you will have 5 new little files. On the screen, you might see something like this:

File to patch: bootstrap.inc
patching file bootstrap.inc
Hunk #1 FAILED at 91.
1 out of 1 hunk FAILED -- saving rejects to file bootstrap.inc.rej
patching file database.mysql.inc
Hunk #1 succeeded at 204 with fuzz 1 (offset -50 lines).
Hunk #2 succeeded at 386 (offset -57 lines).
patching file database.mysqli.inc
Hunk #1 succeeded at 196 with fuzz 1 (offset -43 lines).
Hunk #2 succeeded at 378 (offset -50 lines).
patching file database.pgsql.inc
Hunk #1 succeeded at 382 (offset -44 lines).

9) One thing is still missing, though: the error it produced before. The file bootstrap.inc.rej tells you what the problem is. So you have to manually copy and paste the missing code into your new bootstrap.inc. Here is an example with the files I had.

Get rid of this (opening bootstrap.inc with a text editor):

function cache_set($cid, $data, $expire = CACHE_PERMANENT, $headers = NULL) {
  db_lock_table('cache');
  db_query("UPDATE {cache} SET data = %b, created = %d, expire = %d, headers = '%s' WHERE cid = '%s'", $data, time(), $expire, $headers, $cid);
  if (!db_affected_rows()) {
    @db_query("INSERT INTO {cache} (cid, data, created, expire, headers) VALUES ('%s', %b, %d, %d, '%s')", $cid, $data, time(), $expire, $headers);
  }
  db_unlock_tables();
}

and replace it with

function cache_set($cid, $table = 'cache', $data, $expire = CACHE_PERMANENT, $headers = NULL) {
  db_query_replace(
    array("DELETE FROM {%s} WHERE cid = '%s'", $table, $cid),
    array("INSERT INTO {%s} (cid, data, created, expire, headers) VALUES ('%s', %b, %d, %d, '%s')", $table, $cid, $data, time(), $expire, $headers)
  );
}

10) Copy everything back to your site (i.e. ideally your test installation) and pray it's going to work. :)

If any of the more experienced users get a heart attack reading this, please tell me why. Otherwise, I think these files should work well.

I attach the result of my little experiment so that if the experienced users feel my method was OK, someone else might get around the hassle (well, in reality, not that much...) of patching the files. It's a zip file, I'm sorry, please tell me if there is a better way to post the files.

Good luck to everyone, and thank you for that really nice piece of work and your help in the forum,

Felix.

felixsmile’s picture

Update: a little (or big?) problem. If I do this on my live site (don't worry, everything is backed up =D), I get:

user warning: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '; (cid, data, created, expire, headers) VALUES ('bc_18_fr_http://www.mysitewithoutthedotcom' at line 1 query: db_query_replace REPLACE INTO N; (cid, data, created, expire, headers) VALUES ('bc_18_fr_http://www.mysite.com/?q=user/38/edit', '-1', 1169766951, 0, '') in /home.10.6/mydatabase/www/includes/database.mysql.inc on line 121.

I'm running on MySQL 5, and the lines involved must be the following (as there is no db_query_replace() in the "original" non-patched version):

function db_query_replace($delete, $insert) {
  $insert_query = array_shift($insert);
  db_query(preg_replace('/^INSERT /', 'REPLACE ', $insert_query), $insert);
}

Sorry if I'm making a dumb mistake; I just wanted to make sure no one patches as I described :) Nonetheless, I would of course be quite, not to say very, grateful if you could help me out.

Thank you again,

Felix.

Art Morgan’s picture

Is anyone using this patch successfully on 4.7.5? As I explain in the post:
http://drupal.org/node/115634#comment-198815
I applied the patch and don't get any errors, but my cache table is still locking and I still have horrible latency when adding nodes and comments. If anyone has any updates or alternative solutions (aside from moving to 5.0), I would love to know.

killes@www.drop.org’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new7.91 KB

If anybody messes this issue any further up by adding "I want this for Drupal 4.4" I'll get really unfriendly.

I've re-rolled the patch for offsets. It has now been running for two months on drupal.org and we haven't encountered any problems. IMO this is good to go.

kbahey’s picture

OK, how about 4.3 then?

Seriously, here is something I don't understand:

For the sequences, database.mysql.inc, says:

return mysql_insert_id();

However, for database.mysqli.inc, it says:

return db_result(db_query("SELECT LAST_INSERT_ID() FROM {sequences} WHERE name = '%s'", $name));

Is this intentional, or an oversight? I think the mysqli version is good enough for mysql.

killes@www.drop.org’s picture

mysqli_insert_id only exists in PHP5. using mysql_insert_id is the better option, I believe.

recidive’s picture

Status: Reviewed & tested by the community » Needs work

The php docs say mysqli_connect is also PHP5 only, so I guess we can stick with mysqli_insert_id() in database.mysqli.inc.

Crell’s picture

Status: Needs work » Reviewed & tested by the community
StatusFileSize
new7.54 KB

The patch no longer applied. This is just a reroll of #106, but with mysqli_insert_id(), since it is guaranteed to exist on the same PHP version as mysqli itself.

I rather like the side effect that we get an alternative to the delete/insert cycle to use wherever such code is needed. Spiffy.

dries’s picture

I'll think about this patch some more.

david strauss’s picture

A few problems:

  • db_query_replace() breaks any existing transactions that are currently running. If we add transaction calls to Drupal core, we need to support transaction nesting with something like PressFlow Transaction. (Killing existing transactions is also a problem when calling LOCK.)
  • db_next_id() has a rather slow implementation compared to my #145542 patch.
  • At least one call exists to a mysqli_* function without specifying a connection, result, or other context. This works with the basic mysql_* (no "i") functions, assuming you only have one connection active. The mysqli extension, however, requires specifying the connection. The call I noticed is to mysqli_insert_id() in db_next_id(). See mysqli_insert_id and mysql_insert_id. Note that only the latter has the $link as an optional argument.
  • The cache and variable updates should use ON DUPLICATE KEY UPDATE instead of REPLACE. REPLACE actually performs a DELETE and INSERT, with all the side-effects, like CASCADE DELETEs if you use them. ON DUPLICATE KEY UPDATE merely runs an UPDATE on duplicates, which is semantically correct for what we're doing. The PostgreSQL db_query_replace() implementation will need modification if we use ON DUPLICATE KEY UPDATE instead of REPLACE.
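To illustrate that last point, the difference between REPLACE and ON DUPLICATE KEY UPDATE is easy to see in SQL. A minimal sketch (the table and column names are illustrative, not a complete Drupal schema):

```sql
-- REPLACE is really a DELETE plus an INSERT: DELETE triggers fire,
-- and ON DELETE CASCADE foreign keys remove dependent rows.
REPLACE INTO cache (cid, data) VALUES ('foo', 'bar');

-- ON DUPLICATE KEY UPDATE runs a genuine UPDATE on the duplicate row,
-- so the row is never deleted and no cascades are triggered.
INSERT INTO cache (cid, data) VALUES ('foo', 'bar')
  ON DUPLICATE KEY UPDATE data = VALUES(data);
```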
david strauss’s picture

Status: Reviewed & tested by the community » Needs work
david strauss’s picture

StatusFileSize
new8.93 KB

I haven't tested this patch at all, but I'd like some help writing db_query_replace for PostgreSQL.

Right now, I have:

function db_query_replace($insert, $update) {
  // Extract the INSERT query SQL
  $insert_sql = trim(array_shift($insert), ';');

  // Prepare the UPDATE query
  // The UPDATE SQL must be prepared early to handle extra arguments
  $update_sql = trim(array_shift($update), ';');
  $update_sql = _db_query_prepare($update_sql, $update);
  
  db_query($update_sql . '; IF NOT FOUND THEN ' . $insert_sql . '; END IF;', $insert);
}

Will that work? Because it doesn't DELETE, there's no longer a risk of having the database in a briefly funky state, so I've removed the transaction encapsulation.

david strauss’s picture

StatusFileSize
new8.94 KB

Fixes a goof in the MySQL part. This patch remains completely untested.

owen barton’s picture

Subscribing.

I like this approach - can someone confirm how this might map to other databases? If this structure is pretty standard then I think we have a winner...

david strauss’s picture

Because my version of the patch doesn't ever DELETE rows, it doesn't even require transactions for the db_query_replace function. I only take advantage of special capabilities in MySQL and PostgreSQL to improve performance.

david strauss’s picture

For the sake of archiving semi-working versions of these functions that are no longer necessary for the patch:

MySQL(i)

/**
 * Perform a row INSERT or UPDATE.
 *
 * @param $insert
 *   An array, where the first value is a query that inserts a new row,
 *   subsequent values are arguments. Same syntax as db_query.
 * @param $update
 *   An array, where the first value is a query that updates an existing row,
 *   subsequent values are arguments. Same syntax as db_query.
 */
function db_query_replace($insert, $update) {
  // Queries must be prepared separately to properly handle extra supplied args

  // Prepare the INSERT query
  $insert_sql = array_shift($insert);
  $insert_sql = _db_query_prepare($insert_sql, $insert);
  
  // Prepare the UPDATE query
  $update_sql = array_shift($update);
  $update_sql = _db_query_prepare($update_sql, $update);
  
  // Identify the SET clause of the UPDATE query
  preg_match('/SET (.*) WHERE/si', $update_sql, $matches);
  $update_set = $matches[1];
  $sql = $insert_sql . ' ON DUPLICATE KEY UPDATE ' . $update_set;
  
  return db_query($sql);
}

PostgreSQL:

/**
 * Perform a row INSERT or UPDATE.
 *
 * @param $insert
 *   An array, where the first value is a query that inserts a new row,
 *   subsequent values are arguments. Same syntax as db_query.
 * @param $update
 *   An array, where the first value is a query that updates an existing row,
 *   subsequent values are arguments. Same syntax as db_query.
 */
function db_query_replace($insert, $update) {
  // Extract the INSERT query SQL
  $insert_sql = trim(array_shift($insert), ';');

  // Prepare the UPDATE query
  // The UPDATE SQL must be prepared early to handle extra arguments
  $update_sql = trim(array_shift($update), ';');
  $update_sql = _db_query_prepare($update_sql, $update);
  
  db_query($update_sql . '; IF NOT FOUND THEN ' . $insert_sql . '; END IF;', $insert);
}
david strauss’s picture

StatusFileSize
new3.69 KB

This patch is designed to be the minimum necessary to remove the locks. It includes my uber-sequencer for MySQL. I've eliminated the replacement function because it was causing problems. If the replacement function performed deletions, it had semantically incorrect side effects. If it updated, the MySQL queries became excessively long, and PostgreSQL still had to run two queries. Having it update required supplying update and insert versions of the query, which also eliminated the convenience of using the function. Finally, I prefer to avoid string manipulation of queries.

This patch is for Drupal 5.1, and it has been tested on a clean 5.1 installation. Re-rolling against Drupal 6 will be trivial.

moshe weitzman’s picture

the db_next_id() changes look great ... the other changes appear to make variable_set() and cache_set() unsafe. Perhaps you are saying that they aren't safe today either. Please elaborate on why this change to variable_set() and cache_set() is safe.

david strauss’s picture

@moshe weitzman The updates to variables and the cache are safe. Because they both happen the same way, I'll just use variables as the example. The examples show what would happen if another thread reads the value at any time.

Scenario one: a variable already exists

  1. Another thread reads the old value.
  2. variable_set() updates the value.
  3. Another thread reads the new value.

Scenario two: the variable is not set

  1. Another thread reads that the variable is unset.
  2. variable_set() attempts to update the value and fails.
  3. Another thread reads that the variable is unset.
  4. variable_set() inserts the value.
  5. Another thread reads the new value.

This behavior supports atomicity because other threads never see inconsistent or partially updated states. The old variable_set() needed the lock because it deleted the variable and added it again. This created an inconsistent state, in which the variable had no value at all, that could have been visible to other threads without the lock. The lock for cache_set() is simply unnecessary.
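The update-then-insert order described above can be sketched in SQL (a simplified illustration of the pattern, not the exact queries in the patch):

```sql
-- Try to update an existing row first. Readers see either the old
-- value or the new one, never a missing row.
UPDATE variable SET value = 's:3:"bar";' WHERE name = 'foo';

-- Only if the UPDATE affected zero rows (the variable did not exist)
-- do we INSERT. If two threads race here, the second INSERT fails on
-- the primary key; it never leaves a half-written state.
INSERT INTO variable (name, value) VALUES ('foo', 's:3:"bar";');
```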

chx’s picture

After an INSERT / LAST_INSERT_ID(id + 1) why do you check for $id == 0 ??

david strauss’s picture

By the way, we could also decrement all the current values in {sequences} by one and run the following:

function db_next_id($name) {
  global $active_db;
  $name = db_prefix_tables($name); 
  db_query('INSERT INTO {sequences} VALUES ("%s", %d) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1)', $name, 0);
  $id = mysqli_insert_id($active_db);
  return $id + 1;
}
david strauss’s picture

@chx The last_insert_id for MySQL is 0 if the previous query didn't set a value. If the table has an autoincrement column (which {sequences} doesn't), last_insert_id equals the value for that column. So, the INSERT part of the query won't set a value for last_insert_id, leaving it as 0. However, if the INSERT fails and the UPDATE runs, I explicitly set last_insert_id to a value greater than 0.
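This can be checked directly in a mysql session (a sketch assuming a fresh connection, an unprefixed sequences table, and a server where LAST_INSERT_ID(expr) behaves as documented):

```sql
-- On a fresh connection, LAST_INSERT_ID() defaults to 0, and a plain
-- INSERT into a table with no AUTO_INCREMENT column leaves it at 0.
INSERT INTO sequences VALUES ('example', 1);
SELECT LAST_INSERT_ID();  -- 0

-- When the INSERT hits a duplicate key, the UPDATE clause runs and
-- LAST_INSERT_ID(id + 1) explicitly sets the per-connection value
-- to a number greater than 0.
INSERT INTO sequences VALUES ('example', 1)
  ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1);
SELECT LAST_INSERT_ID();  -- 2
```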

david strauss’s picture

Assigned: killes@www.drop.org » david strauss
Status: Needs work » Needs review
recidive’s picture

Status: Needs review » Needs work

Nice approach!

Changing this:

db_query('INSERT INTO {sequences} VALUES ("%s", %d) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1)', $name, 1);

To this:

db_query('INSERT INTO {sequences} VALUES ("%s", LAST_INSERT_ID(%d)) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1)', $name, 1);

will make SELECT LAST_INSERT_ID() return 1 when it is running the INSERT part, so we can get rid of the $id == 0 check.

Is it safe to use mysql_insert_id() instead of SELECT LAST_INSERT_ID() when making complex use of LAST_INSERT_ID() functionality? Does it just run LAST_INSERT_ID() behind the scenes?

moshe weitzman’s picture

nice explanation, david.

i think this is ready for a reroll against HEAD, quick test, and RTBC.

david strauss’s picture

@recidive Nice suggestion; I'll roll it into the patch. last_insert_id is stored per connection, so there's no chance of it getting scrambled by another thread. mysql_insert_id() and SELECT LAST_INSERT_ID() are equivalent.

david strauss’s picture

Here's a re-roll against Drupal 5.1. A re-roll against HEAD is coming in the next few minutes.

david strauss’s picture

StatusFileSize
new4.11 KB

Here's a re-roll against HEAD.

recidive’s picture

Status: Needs work » Reviewed & tested by the community

Tested, works great! Marking as RTBC.

You forgot to attach the 5.1 patch. By the way, is it going to be back ported to Drupal 5?

Great work.

david strauss’s picture

StatusFileSize
new3.66 KB

This patch is impossible to backport to Drupal 5 without breaking MySQL 3 compatibility. (It might be possible if we can detect the MySQL version and only run the new code on MySQL 4 or higher.) Drupal 6 requires MySQL 4 or higher, so there's no problem there.

Here's the latest patch for Drupal 5.

david strauss’s picture

Adding mysql_get_server_info() and version_compare() would allow checking whether the MySQL version used with Drupal 5 supports the new db_next_id(). However, it would probably be better to just leave it as a Drupal 5 patch that high-volume sites can use.
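A check along those lines could look like this (a sketch only; the helper name and the 4.1.0 cutoff for INSERT ... ON DUPLICATE KEY UPDATE support are my assumptions, not part of any posted patch):

```php
<?php
// Hypothetical helper: decide whether the server can run the new
// lock-free db_next_id(). ON DUPLICATE KEY UPDATE requires MySQL 4.1+.
function _db_next_id_supported() {
  static $supported;
  if (!isset($supported)) {
    // mysql_get_server_info() returns e.g. "4.1.21-standard".
    $supported = version_compare(mysql_get_server_info(), '4.1.0', '>=');
  }
  return $supported;
}
?>
```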

kbahey’s picture

From what I understand LAST_INSERT_ID() has issues when using persistent connections. So a site that uses mysql_pconnect can get in trouble.

But this should not stop the patch going in. The solution is to document this caveat/restriction.

david strauss’s picture

@kbahey There's no caveat to document. Drupal doesn't use persistent connections.

chx’s picture

Search using temporary tables already breaks w/ pconnect. So, let's just move on.

dries’s picture

Wasn't the idea to get rid of the sequence table and db_next_id()? That said, this is a step forward. Removing this would probably also fix the pconnect issues.

While thread-safety of variable_set()/variable_get() pairs is not an issue here, it could still happen that we attempt two INSERTs at the same time on the variables table. It might not actually be a big problem, but we'd silently ignore the collision and lose the data of the second insert ... At the core, the proposed patch makes variable_set() unsafe -- you can no longer trust the function to actually store your data. This should probably be documented, and then forgotten about. ;)

Then again, variable_set() shouldn't be in the critical path of the execution -- unless someone is incrementing a counter for each page being generated. Or something like that. This patch would help for situations like that but for most people, this shouldn't eliminate a ton of locks.

Is the primary motivation of this patch, performance?

dries’s picture

(Just to be clear: I support David's patch but it doesn't hurt to talk/brainstorm about this a bit more.)

Crell’s picture

An idea was to remove the sequence table. I don't believe that was ever the intent of this patch. This patch is "explicit locks suck, let's not use them." :-)

I don't think the db_next_id() functions can get any more atomic than this. For variable_set(), the only error case I can see is if two processes try to set the variable for the first time at the same time. Both get 0 affected rows from the update, then both try to insert, and one will fail. The odds of that happening are so remote that I am not concerned about it at this point, especially when it means we get rid of a lock. We already have the case where an existing variable can be overridden immediately after being saved by another process. It just happens a bit slower with a table lock. :-)

The same first-write is a concern for the cache table, and it's slightly more likely to happen, but that's the only potential issue I can see.

david strauss’s picture

@Dries The primary motivation for this patch is performance.

And when I say "safe," I mean the operations are atomic. I agree that they are not durable, but durability is impossible to guarantee without actual transactions. It is true that, should two threads try to store a value for a nonexistent variable or cid at nearly the same time, only the first thread will succeed. This is still atomic because the first operation is fully committed, and the second operation is fully uncommitted. Atomicity in a database is not a guarantee that all operations succeed. (The closest analogue is durability, the guarantee that committed transactions have a permanent effect.) Atomicity is a guarantee that an operation completes fully or not at all. Now, I'd love to wrap those two operations in transactions so the row lock makes the second thread wait and have the final effect, but that would break compatibility with MyISAM.

I could rewrite cache_set and variable_set using either REPLACE or ON DUPLICATE KEY UPDATE. That would solve concurrency issues by locking the table once for the INSERT or UPDATE, but using ON DUPLICATE KEY UPDATE can create excessively long queries for cache_get. REPLACE has the annoying tendency to actually pair a DELETE and INSERT instead of running a proper UPDATE.

What would people think about splitting off InnoDB into a different abstraction layer? Say, mysql_innodb and mysqli_innodb? We could use transactions without the guilt of broken compatibility.

david strauss’s picture

@Crell The other problem with theoretically solving concurrency issues (sans transactions) is that there's no guarantee that the later page load, node save, or anything else actually grabs the later lock. In the end, that's all the database allows us to verify with a non-transactional level of concurrency control: the second query that reaches the database gets the final say. But if two queries occur so close in time that their order of INSERTs and UPDATEs affects the other, then one could just as easily be delayed by slower-running PHP or some other effect.

Crell’s picture

I had suggested mysqli -> innodb on the infrastructure list a few weeks ago in a sleep-deprived stupor. It's a possibility, but has two caveats:

1) It would effectively mandate PHP 5 if one wants to use InnoDB. I consider this a bonus rather than a downside, personally, but that's just me. :-)

2) It would not make mixed-engine databases possible, or at least easy. If all DB calls assume [MyISAM | InnoDB] semantics, then mixing engines could lead to undesired weirdness.

I'm not sure if it's too late in the D6 cycle to implement that now. I haven't looked into the schema API changes enough to say how they would affect it, either. (This is another area where an all-PDO backend would help, as PDO provides a transaction compatibility layer that silently ignores transaction commands that the database can't handle; not perfect, but at least provides a single API.)

As for the long queries, are very long cache_set() calls common enough to be a problem? Most variable_set() calls will be small and rare enough that they can be safely doubled without killing the database. I'm not sure about cache_set().

recidive’s picture

AFAIK, using transactions will not break MyISAM compatibility, it will just have no effect on these tables. I'd like to see those operations making use of transactions too (and some others like node/revisions insertion). We could add db_begin(), db_commit() and db_rollback() functions. Innodb and pgsql admins would love this. I'm not sure these changes fit the purpose of this patch though.

dries’s picture

Status: Reviewed & tested by the community » Fixed

Yep, MySQL versions that don't support transactions should ignore these statements. Either way, we should probably leave that for another patch.

I've committed this patch to CVS HEAD. Thanks all.

I don't feel comfortable backporting this patch to DRUPAL-5. This needs serious testing first.

david strauss’s picture

Adding transaction support is issue #146928.

ChrisKennedy’s picture

Status: Fixed » Active

The change to db_next_id() is causing me to get a bunch of duplicate key warnings for menu_links on MySQL 4.1.21 during a clean installation:

user warning: Duplicate entry '2' for key 1 query: INSERT INTO menu_links ( menu_name, mlid, plid, href, router_path, hidden, external, has_children, expanded, weight, depth, p1, p2, p3, p4, p5, p6, module, link_title, options) VALUES ( 'navigation', 2, 0, 'user', 'user', 4, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 'user', 'Log in', 'a:0:{}')

etc.

If I revert the change to includes/database.mysql.inc the errors don't occur. Any ideas?

david strauss’s picture

@ChrisKennedy I just installed last night's Drupal 6 development snapshot with a MySQL 4.1.20 database and PHP 5.2.2. I'm using mysqli to connect to the database. I cannot replicate this issue. What are you doing when this error occurs?

david strauss’s picture

@ChrisKennedy A few questions:

* Are you using mysql or mysqli to connect?
* Are you using MyISAM or InnoDB?
* What version of PHP are you using?

ChrisKennedy’s picture

This is during installation, with the output displayed during the "Configure site" step: install.php?profile=default&locale=en

I'm using PHP 4.4.6 if that helps. Merlin didn't have the error either on MySQL 4.1.20, so maybe my host has modified some MySQL settings that are causing it only for me.

If no one else is getting these warnings we can change the issue back to fixed.

david strauss’s picture

I'm creating a PHP 4/MySQL 4.1/MyISAM test environment to identify the extent of the problem ChrisKennedy is having.

recidive’s picture

Status: Active » Needs review
StatusFileSize
new1.26 KB

I suspect this problem is related with differences between mysql_insert_id() and LAST_INSERT_ID() on this mysql version.

Attached is a patch that changes the return from mysql_insert_id() to db_result(db_query('SELECT LAST_INSERT_ID()')). I think it's worth testing.

david strauss’s picture

Status: Needs review » Needs work

@recidive If your patch fixes the problem, the global variables should be removed from db_next_id().

recidive’s picture

Status: Needs work » Needs review
StatusFileSize
new1.5 KB

Ok, new patch that also removes the global $active_db;.

@ChrisKennedy: could you test and see if this patch fix the problem?

ChrisKennedy’s picture

I still get the errors unfortunately.

david strauss’s picture

Status: Needs review » Needs work

The patch in #151 doesn't fix ChrisKennedy's issue. (I had him try the patch.) I set up a test installation using PHP 4.4.4 as a CGI and MySQL 4.1.20 (MyISAM). I'm connecting to the database over TCP/IP to a different host. After configuring the database in the setup process, I get sent back to the first screen of installation (language selection). Then, if I click "English," I get the following error:
Fatal error: Cannot redeclare cache_get() (previously declared in /home/.ceylon/fkdemos/d6.fkdemos.com/includes/cache-install.inc:13) in /home/.ceylon/fkdemos/d6.fkdemos.com/includes/cache.inc on line 14

But I do not get the duplicate key warnings that ChrisKennedy reports.

Tested and works:
* PHP 5.2.2 as mod_php + MySQL 4.1.20 Server + InnoDB + MySQLi 5.0.33 Client Library
* PHP 5.2.2 as mod_php + MySQL 4.1.20 Server + MyISAM + MySQL 5.0.33 Client Library
* PHP 5.2.2 as mod_php + MySQL 4.1.20 Server + MyISAM + MySQLi 5.0.33 Client Library
* PHP 5.2.1 as a CGI + MySQL 4.1.20 Server + InnoDB + MySQL 5.0.16 Client Library
* PHP 5.2.1 as a CGI + MySQL 4.1.20 Server + InnoDB + MySQLi 5.0.16 Client Library
* PHP 5.2.1 as a CGI + MySQL 5.0.24a-standard-log Server + MyISAM + MySQLi 5.0.16 Client Library

Tested and fails:
* PHP 4.4.4 as a CGI + MySQL 4.1.20 Server + MyISAM + MySQL 5.0.16 Client Library
* PHP 4.4.4 as a CGI + MySQL 4.1.20 Server + InnoDB + MySQL 5.0.16 Client Library
* PHP 4.4.4 as a CGI + MySQL 5.0.24a-standard-log Server + MyISAM + MySQL 5.0.16 Client Library

Evidently, this is a PHP 4 issue, but I have no idea why.

david strauss’s picture

Status: Needs work » Active

After checking out a fresh CVS HEAD and reverting the file containing db_next_id() to 1.72 (the latest version prior to my patch), I still cannot complete installation on PHP 4. Unless the problem is with variable_set() or cache_set() (highly unlikely to be restricted to PHP 4), my patch is not the cause of the problem I'm experiencing with the tests in #155.

david strauss’s picture

Status: Active » Fixed

The installation from a fresh HEAD checkout finishes perfectly after a suggested change by chx: add return; to the beginning of drupal_unset_globals().

As requested by chx, here's the backtrace as it's called after clicking "Save configuration" for the database installation:

Array
(
    [0] => Array
        (
            [file] => /home/.ceylon/fkdemos/d6.fkdemos.com/drupal/includes/bootstrap.inc
            [line] => 870
            [function] => drupal_unset_globals
            [args] => Array
                (
                )

        )

    [1] => Array
        (
            [file] => /home/.ceylon/fkdemos/d6.fkdemos.com/drupal/includes/bootstrap.inc
            [line] => 857
            [function] => _drupal_bootstrap
            [args] => Array
                (
                    [0] => 0
                )

        )

    [2] => Array
        (
            [file] => /home/.ceylon/fkdemos/d6.fkdemos.com/drupal/install.php
            [line] => 21
            [function] => drupal_bootstrap
            [args] => Array
                (
                    [0] => 0
                )

        )

    [3] => Array
        (
            [file] => /home/.ceylon/fkdemos/d6.fkdemos.com/drupal/install.php
            [line] => 961
            [function] => install_main
            [args] => Array
                (
                )

        )

)
ChrisKennedy’s picture

That doesn't fix the problem I reported as it's unrelated to the db_next_id() change.

david strauss’s picture

As Drupal 6 matures and nears release, we can identify whatever obscure effect is causing ChrisKennedy's issue and make the new code conditional. It's very hard to test now with so many major new changes.

drewish’s picture

Status: Fixed » Active

Using PHP 5.2.2, mysqli, and MySQL 5.0.37, I'm getting the same problem as ChrisKennedy. It's broken the installation for me. On IRC, dmitrig01 reported the same errors. The sequence table is being incremented, but the id isn't being consistently returned by db_next_id().

mysql> select * from sequences;
+-----------------+----+
| name            | id |
+-----------------+----+
| menu_links_mlid | 85 |
+-----------------+----+
1 row in set (0.00 sec)
mysql> select menu_name, mlid, plid, link_path from menu_links;
+------------+------+------+-----------+
| menu_name  | mlid | plid | link_path |
+------------+------+------+-----------+
| navigation |    0 |    0 | admin     |
| navigation |    1 |    0 | batch     |
+------------+------+------+-----------+
2 rows in set (0.00 sec)
david strauss’s picture

Does this function work for people having trouble?

function db_next_id($name) {
  $name = db_prefix_tables($name); 
  db_query('INSERT INTO {sequences} VALUES ("%s", %d) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1)', $name, 1);
  $id = db_result(db_query('SELECT LAST_INSERT_ID()'));
  if ($id == 0) {
    $id = 1;
  }
  return $id;
}

I wonder if the double LAST_INSERT_ID in the current patch is the cause of the trouble. MySQL might have unpredictable behavior if you run the function twice in a query.

drewish’s picture

using the code David Strauss provided gives a different variation on the errors:

user warning: Duplicate entry '1' for key 1 query: INSERT INTO menu_links ( menu_name, mlid, plid, link_path, router_path, hidden, external, has_children, expanded, weight, depth, p1, p2, p3, p4, p5, p6, module, link_title, options) VALUES ( 'navigation', 1, 0, 'user', 'user', -1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 'system', 'Log in', 'a:0:{}') in D:\drupal\drupalhead\includes\database.mysqli.inc on line 154.
mysql> select * from sequences;
+-----------------+----+
| name            | id |
+-----------------+----+
| menu_links_mlid | 85 |
+-----------------+----+
1 row in set (0.00 sec)

mysql> select menu_name, mlid, plid, link_path from menu_links;
+------------+------+------+-----------+
| menu_name  | mlid | plid | link_path |
+------------+------+------+-----------+
| navigation |    1 |    0 | admin     |
| navigation |    7 |    0 | batch     |
+------------+------+------+-----------+
2 rows in set (0.00 sec)
ChrisKennedy’s picture

For me (PHP 4.4.6, MySQL 4.1.21) that code reduces installation to a single warning:

user warning: Duplicate entry '7' for key 1 query: INSERT INTO menu_links ( menu_name, mlid, plid, link_path, router_path, hidden, external, has_children, expanded, weight, depth, p1, p2, p3, p4, p5, p6, module, link_title, options) VALUES ( 'navigation', 7, 2, 'admin/compact', 'admin/compact', -1, 0, 0, 0, 0, 2, 2, 7, 0, 0, 0, 0, 'system', 'Compact mode', 'a:0:{}'

And the menus/sequences are otherwise created correctly:

mysql> select * from sequences;
+-----------------+----+
| name            | id |
+-----------------+----+
| menu_links_mlid | 92 |
| users_uid       |  1 |
+-----------------+----+
2 rows in set (0.00 sec)
david strauss’s picture

I'm almost 100% sure this is a bug in MySQL now.

From drewish's server:

mysql> INSERT INTO sequences VALUES ("testing", LAST_INSERT_ID(1)) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1); SELECT LAST_INSERT_ID();
Query OK, 0 rows affected (0.00 sec)

+------------------+
| LAST_INSERT_ID() |
+------------------+
|                0 | 
+------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO sequences VALUES ("testing", LAST_INSERT_ID(1)) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1); SELECT LAST_INSERT_ID();
Query OK, 0 rows affected (0.00 sec)

+------------------+
| LAST_INSERT_ID() |
+------------------+
|                0 | 
+------------------+
1 row in set (0.00 sec)

mysql> SELECT LAST_INSERT_ID(3); SELECT LAST_INSERT_ID();
+-------------------+
| LAST_INSERT_ID(3) |
+-------------------+
|                 3 | 
+-------------------+
1 row in set (0.00 sec)

+------------------+
| LAST_INSERT_ID() |
+------------------+
|                3 | 
+------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO sequences VALUES ("testing2", LAST_INSERT_ID(1)); SELECT LAST_INSERT_ID();
Query OK, 1 row affected (0.00 sec)

+------------------+
| LAST_INSERT_ID() |
+------------------+
|                1 | 
+------------------+
1 row in set (0.00 sec)

And on my server:

mysql> INSERT INTO sequences VALUES ("testing", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1); SELECT LAST_INSERT_ID();
Query OK, 1 row affected (0.01 sec)

+------------------+
| LAST_INSERT_ID() |
+------------------+
|                0 | 
+------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO sequences VALUES ("testing", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1); SELECT LAST_INSERT_ID();
Query OK, 2 rows affected (0.00 sec)

+------------------+
| LAST_INSERT_ID() |
+------------------+
|                2 | 
+------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO sequences VALUES ("testing", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1); SELECT LAST_INSERT_ID();
Query OK, 2 rows affected (0.00 sec)

+------------------+
| LAST_INSERT_ID() |
+------------------+
|                3 | 
+------------------+
1 row in set (0.00 sec)

It appears that certain versions of MySQL have difficulty working with LAST_INSERT_ID in conjunction with ON DUPLICATE KEY UPDATE. Drewish's server gave the expected, correct results when LAST_INSERT_ID was used with simple SELECT, UPDATE, and INSERT queries. LAST_INSERT_ID() always returns 0 on his server when used in an ON DUPLICATE KEY UPDATE query, regardless of whether LAST_INSERT_ID is set in the INSERT portion, the UPDATE portion, or both.

Now I'm working on isolating the problem to the server or client library.

david strauss’s picture

Okay. Let's try this version:

function db_next_id($name) {
  $name = db_prefix_tables($name);
  // Reset LAST_INSERT_ID so it doesn't have a stale value if INSERT succeeds.
  db_query('SELECT LAST_INSERT_ID(1)');
  db_query('INSERT INTO {sequences} VALUES ("%s", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1)', $name);
  $id = db_result(db_query('SELECT LAST_INSERT_ID()'));
  return $id;
}
ChrisKennedy’s picture

That version works for me.

david strauss’s picture

The problem was a combination:
* ChrisKennedy's MySQL server has a bug where the second LAST_INSERT_ID(x) in ON DUPLICATE KEY UPDATE doesn't work properly. This is easy to work around by calling "SELECT LAST_INSERT_ID(1)" once before incrementing the sequence.
* Drewish's MySQL server has a bug that prevents LAST_INSERT_ID from working properly at all in an ON DUPLICATE KEY UPDATE statement. This does not have a good workaround.
* The client libraries do not have any flaws related to this issue. It is purely a MySQL server one.

Options:
* Blacklist MySQL n, where 5.0.37 <= n < 5.0.40.
* Run the old db_next_id() code for the MySQL servers in question.
* Revert to the old db_next_id() code.

david strauss’s picture

The problem with ChrisKennedy's MySQL server is also a documented bug. For the 4.1 series, it was fixed in 4.1.21, according to the "[11 Nov 2006 4:27] Paul DuBois" comment.

david strauss’s picture

MySQL 5.0.37 was released on March 12. MySQL 5.0.40 was released April 17. We're looking at a month-long window of buggy MySQL releases.

david strauss’s picture

StatusFileSize
new2.07 KB

Here's a patch that should check if a flawed MySQL version is in use.
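For illustration, the shape of such a check might look like the following. This is a hypothetical sketch, not the attached patch; the function name is made up.

```php
<?php
// Hypothetical sketch: flag the MySQL 5.0.x releases where
// LAST_INSERT_ID() misbehaves inside ON DUPLICATE KEY UPDATE.
// 5.0.37 introduced the regression and 5.0.40 fixed it.
function _db_mysql_version_is_flawed($version) {
  return version_compare($version, '5.0.37', '>=')
      && version_compare($version, '5.0.40', '<');
}

// mysql_get_server_info() would supply the real version string.
var_dump(_db_mysql_version_is_flawed('5.0.38')); // bool(true)
var_dump(_db_mysql_version_is_flawed('5.0.45')); // bool(false)
```

The same shape could either raise an installation requirement or switch db_next_id() to the old locking code at runtime.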

david strauss’s picture

Status: Active » Needs review
david strauss’s picture

StatusFileSize
new4.48 KB

This patch provides more help and uses the workaround for MySQL 4.1.21.

david strauss’s picture

StatusFileSize
new4.48 KB

Let's use functions that exist.

dries’s picture

I'm not sure I like this patch. Telling people they have a buggy MySQL version does not seem right, and it likely sets a bad precedent. It's starting to become hack-ish, and I am tempted to roll back the db_next_id() changes in the original patch.

Better would be to get rid of db_next_id() altogether. Why not use auto_increment?

david strauss’s picture

@Dries There's nothing hackish about this patch; MySQL just made some releases with bugs that directly impact the functionality. I asked everyone present in IRC last night if I should use the version to create a workaround or simply add to installation requirements. While I supported a dynamic workaround, everyone else supported an installation requirement. The offer stands to make this a dynamic workaround based on MySQL version number in db_next_id(). I don't think we can afford to revert to the old code for performance reasons, and autoincrement would require a separate table for every sequence.

robertdouglass’s picture

I don't think we should hurt 99% of Drupal installations because of a MySQL bug that affects one month's worth of MySQL version numbers. If I were in charge of releasing this software, I'd add installation requirements that mention the bug and a link to a patch to get around it.

chx’s picture

Title: Remove database locking » Remove database locking -- or not. This issue is doomed.

Code clean-up would be nice (I don't like the regex -- the code is quite ugly).

All this because of two regexps which had exactly one special character each.

Telling people they have a buggy MySQL version does not seem right, and it likely sets a bad precedent. It's starting to become hack-ish

They do have a buggy MySQL -- but David even offered to roll another patch.

chx’s picture

Title: Remove database locking -- or not. This issue is doomed. » Remove database locking -- or not. This issue is cursed.

Sorry, bad English for a second.

kbahey’s picture

Status: Needs review » Needs work

Ubuntu Feisty 7.04 ships with one of the blacklisted versions:

mysql-server-5.0 5.0.38-0ubuntu1 mysql database server binaries

Those who want to stay synced with their distro's version are not going to do a local build from source, since it complicates future updates via aptitude, etc.

So -1 from me on leaving the situation as it is and saying Drupal cannot be used on Feisty.

Here is what I get when I run the set of SQL statements you posted:

INSERT INTO sequences VALUES ("testing", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1);
SELECT LAST_INSERT_ID();
INSERT INTO sequences VALUES ("testing", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1);
SELECT LAST_INSERT_ID();
INSERT INTO sequences VALUES ("testing", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1);
SELECT LAST_INSERT_ID();

And got these:

Query OK, 1 row affected (0.02 sec)
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                0 |
+------------------+
1 row in set (0.00 sec)

Query OK, 2 rows affected (0.00 sec)
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                2 |
+------------------+
1 row in set (0.00 sec)

Query OK, 2 rows affected (0.00 sec)
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                3 |
+------------------+
1 row in set (0.00 sec)

Having a workaround for the bad versions is acceptable, but it complicates the code too, which is not a good thing.

Crell’s picture

What version of MySQL are we considering the minimum for Drupal 6? 4.1, yes, but 4.x.what? Could we handle the 4.1.x case by simply making the required version the one right after that bug was fixed? Then there's no "hole" in the supported versions.

david strauss’s picture

@kbahey Based on the results you've posted, it appears that the Ubuntu version of MySQL in 7.04 is handling LAST_INSERT_ID() properly. The Ubuntu team may have patched the problem even though they're basing their release off version 5.0.37.

webchick’s picture

autoincrement would require a separate table for every sequence.

I think Dries meant (maybe) to ditch the sequences table altogether, and use autoincrement on the PK fields of the various tables. Other database systems would have to implement a sequence or whatever they need in order to mimic the autoincrement functionality of mysql (and, afaik, pgsql).
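A minimal sketch of what that would look like in MySQL (illustrative DDL only; the real node schema has many more columns):

```sql
-- Each table generates its own ids; no shared sequences table needed.
CREATE TABLE node (
  nid INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR(255) NOT NULL DEFAULT '',
  PRIMARY KEY (nid)
);

INSERT INTO node (title) VALUES ('hello');
-- Returns the nid generated for this connection's last insert.
SELECT LAST_INSERT_ID();
```

PostgreSQL would get the same behavior from a SERIAL column, which is backed by a native sequence.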

kbahey’s picture

@David Strauss
Working properly? I am not sure why the first INSERT/UPDATE followed by a LAST_INSERT_ID() returns 0, though. Shouldn't it return 1?

Here is another test:

mysql> INSERT INTO sequences VALUES ("testing", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                0 |
+------------------+
1 row in set (0.00 sec)

mysql> select * from sequences;
+-----------------+----+
| name            | id |
+-----------------+----+
| menu_links_mlid | 98 |
| users_uid       |  1 |
| testing         |  1 |
+-----------------+----+
3 rows in set (0.00 sec)

Note that testing has 1, but LAST_INSERT_ID() returns 0.

Doing it again

mysql> INSERT INTO sequences VALUES ("testing", 1) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id + 1);
Query OK, 2 rows affected (0.00 sec)

mysql> select * from sequences;
+-----------------+----+
| name            | id |
+-----------------+----+
| menu_links_mlid | 98 |
| users_uid       |  1 |
| testing         |  2 |
+-----------------+----+
3 rows in set (0.00 sec)

mysql> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                2 |
+------------------+
1 row in set (0.00 sec)

Works properly, 2 in the table, and 2 returned from LAST_INSERT_ID().

What gives?

The Debian changelog for the mysql-server package is not specific about this bug, but says "Package the Enterprise version again (.37 was a community version), since Debian and we have always done so. This brings in a few more bug fixes and makes functional derivations less likely."

recidive’s picture

@kbahey: It will not return 1 because sequences.id is not an auto_increment column. However, if you run INSERT INTO sequences VALUES ("testing", LAST_INSERT_ID(1)) instead, it will, since we are setting LAST_INSERT_ID to 1 that way. This is what is currently in HEAD.

david strauss’s picture

@recidive HEAD works fine as long as you don't run MySQL 4.1.21, 5.0.37, 5.0.38 (community), or 5.0.39. The modification posted here with the call to SELECT LAST_INSERT_ID(1) works around the 4.1.21 issue. Without a LAST_INSERT_ID(1) in the INSERT part, it becomes necessary to run SELECT LAST_INSERT_ID(1) first to "prime the pump."

recidive’s picture

Yes, I'm aware of this.

Anyway, +1 for the auto_increment (and drop the sequences table) approach. This shouldn't be so hard to do. I did something similar in the past for Drupal 4.7. I'll try to find the patch and will post it here if it is useful.

david strauss’s picture

@recidive How about we abstract this problem with a new db_insert() function? For databases with only sequences, we can increment the sequence first and then do the insert. For databases with only autoincrement, we can just insert the row and retrieve the key from the database. In any case, db_insert() returns the primary key of the new row.

db_insert($table, $primary_key, $fields);

$table is (obviously) the table name.
$primary_key is a string for the primary key column name, or an array.
$fields is an associative array mapping fields to values. Hopefully, we can just analyze the values to determine how to escape them. If not, we could increase the complexity of the $fields array (just as theme_table allows you to have either simple strings or nested arrays in $rows).

If the primary key is a single column and it is unspecified in $fields, then db_insert will either use the sequence or autoincrement.
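A rough sketch of how that wrapper could behave (hypothetical; db_supports_autoincrement() and db_last_insert_id() are invented names here, not existing core functions):

```php
<?php
// Hypothetical sketch of the proposed db_insert() wrapper.
function db_insert($table, $primary_key, $fields) {
  if (!isset($fields[$primary_key]) && !db_supports_autoincrement()) {
    // Sequence-only backends: reserve the id before inserting.
    $fields[$primary_key] = db_next_id($table .'_'. $primary_key);
  }
  $columns = implode(', ', array_keys($fields));
  $placeholders = implode(', ', array_fill(0, count($fields), "'%s'"));
  db_query('INSERT INTO {'. $table .'} ('. $columns .') VALUES ('. $placeholders .')',
    array_values($fields));
  // Autoincrement backends report the generated key after the insert.
  return isset($fields[$primary_key])
    ? $fields[$primary_key]
    : db_last_insert_id();
}
```

Either way, the caller gets the new primary key back without any LOCK TABLES.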

kbahey’s picture

Title: Remove database locking -- or not. This issue is cursed. » Remove database locking

/me waves wand, and removes the curse that chx cast ...

Anyways ...

I like auto increment, since it offloads the work to the database rather than to our code. Down with the sequences table ...

However, for db_next_id() the sequence number has to be known before the columns are populated: for example, assigning a unique nid to a node before the node is inserted.

So, how are we going to address this (getting a unique sequence number before creating the row in the table)? Or do we just insert with some default values and do an update later? (Messy!)
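For callers that don't truly need the id in advance, the standard auto_increment pattern answers the question: insert first, then read the generated key back on the same connection (a sketch; the node columns are simplified):

```sql
-- nid is an AUTO_INCREMENT primary key; no pre-assigned value needed.
INSERT INTO node (title, uid) VALUES ('hello', 1);
-- LAST_INSERT_ID() is per-connection, so this is concurrency-safe.
SELECT LAST_INSERT_ID();
```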

P.S. A side note: InnoDB locks the entire table for the span of a transaction when there is an auto increment column! Not a big deal IMO, but it has to be kept in mind nonetheless.

beginner’s picture

FYI: Debian stable is ok (using 5.0.32-7etch1), but Debian testing uses the buggy version 5.0.38-1.

http://packages.debian.org/cgi-bin/search_packages.pl?searchon=names&ver...
http://packages.debian.org/testing/misc/mysql-server

5.0.38 was only introduced in April, and according to the changelog, the bug has not been patched:
http://packages.debian.org/changelogs/pool/main/m/mysql-dfsg-5.0/mysql-d...

But that's probably irrelevant for all but a few daring developers.

recidive’s picture

Status: Needs work » Fixed

Marking as fixed to avoid confusion. The remaining issues will be fixed with #49836.

Anonymous’s picture

Status: Fixed » Closed (fixed)
felixsmile’s picture

StatusFileSize
new3.66 KB

Hi everyone,

Here is the patch file that should work for 5.7.

Have a nice day,

Felix.

yonailo’s picture

Hi,

What's the problem with LAST_INSERT_ID and persistent connections (cited in #134)? We use persistent connections without any problem, and I have installed this patch on my Drupal 5 + MySQL 5.0.45 site. Everything seems to be working fine, though I have not done heavy testing or concurrency testing.

sinmao’s picture

Version: 6.x-dev » 5.10

Hi, will this work for Drupal 5.10? It sounds like it was made for 5.1, but the top post version says 5.10.

Also, this should remove all of the table locks on my cache tables, right? Thx.