This is NOT a full sequences API, not at all, but it's enough for some uses. actions_aid and simpletest_test_id are duplicate code, not to mention that this logic belongs in the DB layer. This patch solves that. Further work: we can reuse this for users.uid, which should not be serial because of the known problems. The PostgreSQL folks can create a version which uses proper sequences. The MySQL folks can support more than the single serial column MySQL allows -- for example a timestamp and a serial. Sorry for the hasty writeup.
Comment | File | Size | Author |
---|---|---|---|
#109 | 356074comment-2223588-extended.patch | 1.79 KB | Josh Waihi |
#108 | 356074comment-2223588-extended.patch | 1.79 KB | Josh Waihi |
#106 | 356074-cleanup.patch | 487 bytes | Josh Waihi |
#103 | sequences.patch | 1.7 KB | mfb |
#102 | sequences.patch | 1.75 KB | mfb |
Comments
Comment #1
Damien Tournoud CreditAttribution: Damien Tournoud commented
This looks great, and is useful.
I never really understood why we moved to auto-incremented columns in D6. It breaks important things (most notably: split ID-space staging scenarios, and DB import/export using some broken MySQL software) while not being much of a DX improvement.
Comment #3
Crell CreditAttribution: Crell commented
Didn't we spend a month in D6 getting rid of this?
Comment #4
chx CreditAttribution: chx commented
Once again: actions_aid and simpletest_test_id already use this. This patch saves code.
Comment #5
Crell CreditAttribution: Crell commented
So are you seeing this as a secondary system, or are you actually going back to manual sequences instead of auto_inc fields? The former I'm OK with; the latter not so much.
Comment #6
chx CreditAttribution: chx commented
Of course a secondary system. I thought that was clear, as the doxygen says "Use this function if for some reason you can't use a serial field, normally a serial field with db_last_insert_id is preferred.".
Comment #8
Crell CreditAttribution: Crell commented
If a secondary system, then why is it integrated directly into the DB layer? :-)
In any case, db_insert() should never ever be called from a method inside the DB layer itself. Internally everything happens with $connection->insert() (and friends). (See nextId()).
Also in the same method, $sequence is passed in but never used. Methinks that is likely not the intended behavior.
Comment #9
chx CreditAttribution: chx commented
Rerolled against HEAD, made the change in the nextId() method that Crell asked for, doxygen'd the update, fixed the test runner script, added a small test. It's in the DB layer because, for example, the PostgreSQL maintainers might decide that they want to use native sequences for this, and then they will need the name of the sequence. The fact that we provide a degenerate case is a minor implementation detail; users should still name their sequences.
Comment #10
chx CreditAttribution: chx commented
Comment #11
Crell CreditAttribution: Crell commented
Why in the world are you calling Database::getActiveConnection() from within a connection object? $this->insert(). That's all you need.
The singleton calls should also be banished to the outside of the database system. :-)
I don't understand the purpose of the db_affected_rows() call in simpletest_clean_results_table(). Since that function is slated for removal anyway, it shouldn't be used. If you convert the delete query to the new API, that should give you the value you want.
Comment #12
chx CreditAttribution: chx commented
Comment #13
catch
Minor, but we can just drop "Use db_last_insert_id is preferred."
Comment #14
Crell CreditAttribution: Crell commented
Actually that's wrong too, because db_last_insert_id() is deprecated. If using a serial field, the insert->execute() method will return the new ID.
Comment #15
chx CreditAttribution: chx commented
#391340: Job queue API uses this, so rerolled.
Comment #17
chx CreditAttribution: chx commented
system_install function collision.
Comment #19
chx CreditAttribution: chx commented
Keeping up with HEAD.
Comment #21
dww
#391340: Job queue API is no longer using this directly, but it needs to so that we can have sane garbage collection of the consumer id (at least, a pointer to this issue was chx's answer to my objection about the lack of garbage collection over there).
Also, I think there were some spots in project* during porting to D6 where auto_inc wasn't sufficient, and I wished we still just had the {sequences} table -- but of course I don't remember the details right now. ;) Anyway, I'm all in favor in general of an API like this, especially if there are already 2 or 3 spots in core that are basically duplicating this code for various reasons...
That said, some thoughts reading patch #19:
A) Don't we need a schema update to drop the {simpletest_test_id} table, not just remove it from the schema definition?
B) This patch doesn't do any garbage collection, either. ;)
C) The doc changes from #13 still aren't in here.
D) Is there an issue where actions_aid is discussed and why it can't just use an auto_inc directly? I know there *are* cases where that's not good enough and you'd need something like this, but it's certainly not obvious from reading this patch (and its comments). Seems like it'd be helpful to add some of that explanation in the comments, both to justify committing this patch, and to help potential users of this API decide if they need it or not.
Comment #22
chx CreditAttribution: chx commented
A) there was no simpletest_test_id in D6 core.
B) not yet.
C) need fixing
D) we use actions_aid like this, simpletest does too, and soon job queue will. MySQL can't have two auto-columns (like a serial and a timestamp), so there is more than one reason for this patch.
Comment #23
chx CreditAttribution: chx commented
#204411-57: Anonymous user id breaks on MySQL export/import (xID of Zero get's auto-incremented on external xSQL events) remarks that using this for user uids would solve problems on MySQL.
Comment #24
chx CreditAttribution: chx commented
There is cleanup now. Documentation change added. And for motivation I added users.uid to this. To avoid the horrors with importing users, I have added a safeguard.
Comment #25
Damien Tournoud CreditAttribution: Damien Tournoud commented
I would like to extend this API to add the ability to get a particular sequence number. If this number is greater than the current sequence number, the sequence number gets updated.
That would allow us to properly solve the "I need to insert an entry at a particular sequence number" problem. PostgreSQL sequence numbering is completely brain-dead in that regard (see #495956: system.install incorrectly assumes ANON and AUTH RID are always sequential).
Comment #26
chx CreditAttribution: chx commented
A lot, lot better patch.
Comment #28
Damien Tournoud CreditAttribution: Damien Tournoud commented
- fixed the default implementation: we don't need FOR UPDATE (and this is not supported by SQLite anyway), and we can use direct SQL queries there
- fixed the MySQL implementation: we can use direct SQL queries there (untested)
- fixed the logic inside user_save()
Comment #30
chx CreditAttribution: chx commented
Comment #32
chx CreditAttribution: chx commented
Removing the uid 1 placeholder caused the grief.
Comment #34
chx CreditAttribution: chx commented
Hm, so we want user_save() to work with an arbitrary uid passed in? Sure, we can do that. I guess it makes sense when importing or working together with another DB, etc. And as noted in a comment, there is no security risk.
Comment #35
chx CreditAttribution: chx commented
To list a few reasons why we are doing this:
Comment #36
Crell CreditAttribution: Crell commented
I'm feeling scope creep here. Originally this thread was for providing a sequence API for edge cases that just need an ID, but won't necessarily be using it in a primary-key sense. Now it sounds like you want to roll back all of the work done in D6 to switch from Drupal-managed sequences TO auto-increment. What is the goal here now?
Comment #37
chx CreditAttribution: chx commented
A complete rollback is a separate issue. I changed users because they were hacked to make auto-increment work and this seemed like a total fit. And yes, I want to roll back, but that's a followup.
Edit: #204411: Anonymous user id breaks on MySQL export/import (xID of Zero get's auto-incremented on external xSQL events) is the issue I fixed with users.uid moved to this API.
Comment #38
Crell CreditAttribution: Crell commented
From the docblock, it looks like nextId()'s second parameter should be named $minimum_id, not $existing_id. It should also be optional (probably with a default of 0). The default should not live only in the utility function.
Why doesn't the base nextId() use a merge query? For that matter why is it writing insert and update queries literally? (There may be a good reason, but it's not documented so I have no idea what that is.)
There's some stray whitespace tweaks in the patch, it looks like. :-(
The MySQL nextId() implementation relies on insert failing with an exception. That will also break any active transactions, which is not acceptable. This is why Merge queries exist. :-) Please use them, or document why we cannot.
Comment #39
chx CreditAttribution: chx commented
note: we want to use GREATEST instead of that IF.
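The idea chx is pointing at can be sketched roughly as follows. This is an illustrative reduction, not the patch's code: SQLite's upsert clause and two-argument MAX() stand in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE with GREATEST(), and the table layout here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, value INTEGER)")

def bump(name, existing_id):
    # Ensure the stored value is at least existing_id without an IF branch:
    # GREATEST (MAX in SQLite) simply keeps the larger of the stored and
    # supplied values.
    conn.execute(
        "INSERT INTO sequences (name, value) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET value = MAX(value, excluded.value)",
        (name, existing_id),
    )
    return conn.execute(
        "SELECT value FROM sequences WHERE name = ?", (name,)
    ).fetchone()[0]

print(bump("users", 5))  # 5
print(bump("users", 3))  # still 5: the smaller value never wins
print(bump("users", 9))  # 9
```

A single statement covers both the "row missing" and "row behind" cases, which is the point of preferring GREATEST over a conditional.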
Comment #40
chx CreditAttribution: chx commented
Fixed the docblock, explained why we do not use a merge query, moved the whitespace fixes to another issue, and MySQL falls back to the parent implementation if transactions are enabled.
Comment #41
David Strauss
I will think this over. I prefer the (1) get ID, (2) INSERT workflow from a DX perspective. I will, however, probably want table-per-sequence for sequences in MySQL. It reduces the real-time next-value operation to one query; we can clean up old IDs on cron.
Comment #42
Berdir
Just wondering: most db_* functions allow you to specify the database with an $options array, see http://api.drupal.org/api/function/db_select/7 for example.
Would that make sense for db_next_id() too? I'm not sure; maybe if you add stuff to another Drupal installation, for whatever reason....
Comment #43
chx CreditAttribution: chx commented
Well, we had good code for these earlier, so I reintroduced the try-UPDATE-then-INSERT approach and verified that, indeed, if another transaction wants to write the same row, it will wait until this transaction commits.
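The try-UPDATE-then-INSERT pattern described here can be sketched as below. This is an illustrative Python/SQLite reduction, not the patch's code; it omits the duplicate-key handling that two concurrent first uses of a sequence would need, and in a real MySQL/InnoDB setup it is the row lock taken by the UPDATE that makes concurrent callers wait.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, value INTEGER)")

def next_id(name):
    # Try the UPDATE first; after the sequence row exists, this is the only
    # write needed per call.
    cur = conn.execute(
        "UPDATE sequences SET value = value + 1 WHERE name = ?", (name,))
    if cur.rowcount == 0:
        # First use of this sequence: seed the row at 1.
        conn.execute(
            "INSERT INTO sequences (name, value) VALUES (?, 1)", (name,))
    return conn.execute(
        "SELECT value FROM sequences WHERE name = ?", (name,)).fetchone()[0]

print(next_id("actions"))  # 1
print(next_id("actions"))  # 2
```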
Comment #44
chx CreditAttribution: chx commented
Oopsie! Patch mixup :)
Comment #45
David Strauss
One note: this does remove our ability to stagger keys in MySQL circular replication setups. Table-per-sequence would not.
Comment #46
Berdir
Can you convert that to DBTNG while you're changing it anyway? Then I can remove it from #491556: DBTNG the rest and these issues will hopefully not conflict with each other.
Comment #48
chx CreditAttribution: chx commented
Note that staggering the keys can easily be set up in a settings.php override. That's not a problem, not at all.
Comment #49
David Strauss
"note that staggering the keys can easily be set up in a settings.php override"
No, it can't. The key staggering happens on the MySQL server, and IP-based failover systems are transparent to Drupal. Drupal wouldn't be able to know when to switch incrementation schemes.
Comment #50
chx CreditAttribution: chx commented
Huh?? I thought that auto increment increments can be set up uniform -- of course, the starting points of the sequences cannot be, but meh, just set them up manually; it's not like you often have a cluster with an empty database...
Comment #51
David Strauss
@chx I'm not really sure what you're saying, especially by "I thought that auto increment increments can be set up uniform."
Comment #52
chx CreditAttribution: chx commented
I meant that all tables have the same auto-increment increment even if it's not 1. So having a centralized setting for that is enough.
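As a rough illustration of the staggering being discussed (MySQL's auto_increment_increment and auto_increment_offset server settings), two servers sharing the same increment but using different offsets hand out disjoint ID streams, so dual-master replication never collides on generated keys. The arithmetic below is a hypothetical model of that behavior, not actual server code.

```python
def id_stream(offset, increment, count):
    # Model the IDs a MySQL server would generate with
    # auto_increment_offset=offset and auto_increment_increment=increment.
    return [offset + increment * n for n in range(count)]

server_a = id_stream(1, 2, 5)  # [1, 3, 5, 7, 9]
server_b = id_stream(2, 2, 5)  # [2, 4, 6, 8, 10]

# The streams never overlap, which is the whole point of staggering.
assert not set(server_a) & set(server_b)
```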
Comment #53
David Strauss
Just talked with chx. This is far smaller in scope than I thought and is actually a good idea. We do need to prime this to start at "2" for the sake of the user table, so I'll leave it on "needs work" for now.
(I had thought on initial reading that this would take us back to the D5 sequences method, which is very much not the case.)
Comment #54
David Strauss
Tagging.
Comment #55
chx CreditAttribution: chx commented
We do not need to prime it because we take care of imported users there.
Comment #56
chx CreditAttribution: chx commented
Tagging back.
Comment #58
chx CreditAttribution: chx commented
OK, so the current implementation uses a D5-like approach, but far back in time we had an implementation that used a single sequence for everything. We will revert to that, and David Strauss agreed. The advantage is that you can use MySQL auto-increment settings without problems. The only disadvantage is that the IDs will not be continuous, but they are not continuous anyway once you delete rows, so that's not a real problem.
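The single-sequence approach chx describes can be sketched like this, with SQLite's AUTOINCREMENT standing in for a MySQL serial column; the table name and layout are illustrative only, not the committed schema. One auto-increment table serves as the global sequence, and old rows are only bookkeeping that can be pruned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT keeps the counter monotonic even after rows are deleted,
# which is the property a global sequence needs.
conn.execute(
    "CREATE TABLE sequences (value INTEGER PRIMARY KEY AUTOINCREMENT)")

def next_id():
    cur = conn.execute("INSERT INTO sequences DEFAULT VALUES")
    new_id = cur.lastrowid
    # The rows themselves carry no data; prune everything below the newest.
    conn.execute("DELETE FROM sequences WHERE value < ?", (new_id,))
    return new_id

print(next_id())  # 1
print(next_id())  # 2
print(next_id())  # 3
```

Since every consumer draws from the same stream, each consumer sees gaps in its own IDs, which is the trade-off debated later in the thread.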
Comment #59
chx CreditAttribution: chx commented
Comment #61
David Strauss
INSERT IGNORE can mask problems other than duplicate keys. Use a try/catch block and look for an error code/exception specific to duplicate-key insertion failure. Alternatively, use INSERT ... ON DUPLICATE KEY UPDATE [something with no effect].
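David's point can be sketched as follows; catching a driver-specific duplicate-key exception keeps every other failure visible, unlike INSERT IGNORE, which silently swallows a whole class of problems. Here sqlite3.IntegrityError is used as the closest SQLite analogue of checking for MySQL's duplicate-key error; the table and function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, value INTEGER)")

def seed(name, value):
    try:
        conn.execute(
            "INSERT INTO sequences (name, value) VALUES (?, ?)",
            (name, value))
        return True
    except sqlite3.IntegrityError:
        # Only constraint violations (e.g. duplicate key) land here;
        # syntax errors, a missing table, etc. still propagate loudly.
        return False

assert seed("users", 1) is True
assert seed("users", 1) is False  # duplicate key detected, not masked
```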
Also, the code in the user module still specifies a sequence name.
Comment #62
chx CreditAttribution: chx commented
Well, the actions tests were assuming that aid is 1. That's no longer the case.
Comment #63
chx CreditAttribution: chx commented
Rolled in David's request with a lengthy comment about it.
Comment #64
David Strauss
This looks good. Clean, simple, and does what we need. (And it supports key staggering with multiple MySQL servers.)
Comment #65
David Strauss
Wait, system_update_7040() looks like it might need work.
Comment #66
David Strauss
Comment #67
chx CreditAttribution: chx commented
system_update_7040() still needed a little love.
Comment #68
David Strauss
Indeed, RTBC.
Comment #69
Crell CreditAttribution: Crell commented
The MySQL version of nextId() needs its parameter to have a default value.
Shouldn't the base version of nextId() catch any errors and handle the transaction itself? Nothing in the calling routine is going to know what transaction object to roll back in case there's a problem.
It should be noted in docblocks that these are *not* atomic operations.
I've read the MySQL implementation three times now. I still do not grok what's going on there. The comment tells me that INSERT IGNORE can cause problems. OK, great, I would never have thought to use INSERT IGNORE. How does ON DUPLICATE KEY UPDATE help here? WHY are we doing that in the first place? I don't follow.
The $new_id query is doing something quite weird that I don't understand without a comment. Does that even, um, work?
I also read the comment for the delete query several times. I still don't understand what's going on there. Of course we keep the new_id in the table. The auto-inc just set it, didn't it?
Also, all of these long comment blocks would be much easier to read if they have a proper blank line before them.
It also looks like we're merging the keyspace for users, actions, and anything else that uses the sequences system. Eeek! OK, I get that we're not supposed to assume sequential values for a surrogate key, but that just makes me feel wrong inside. Users are a first class entity like Nodes, and now various other things. Why do they all use auto-inc in their own keyspaces... EXCEPT for Users which now suddenly use a different API entirely to get its keys? That feels like a ticking time bomb to me that is going to screw someone up sooner or later. I dislike "clever hacks", as they usually need to be replaced later by something not quite so clever.
Comment #70
chx CreditAttribution: chx commented
Moved the deletion to happen once per page and the existing_id handling to happen only if necessary.
Comment #72
David Strauss
Agreed that some of the comments could stand improvement. Also, chx has posted an updated patch in #70 with the optimizations I suggested offline.
What do you mean by "atomic" here? Sequences don't follow the same rules as normal data changes. Moreover, this sequence API behaves with the same atomicity rules as auto-increment sequences do. If you BEGIN a transaction in MySQL, INSERT a row into a table with a serial column, and ROLLBACK, you'll notice that the auto-increment on the table *keeps* its increase. (And there are good reasons for this.)
I have to object to calling it a shared key space. It's not sharing a key space any more than everything else that uses unsigned integers "shares a key space." You're only objecting to the aesthetics of appearing to skip some values in the sequence for each consumer of the sequence.
We make an exception for users because the Users table does even clever-er things with magic 0 and 1 uid values. See the installer for all the hacks we need to support auto-increment on the Users table in MySQL with these magic values.
The only way to separate the sequences (and preserve support for staggered MySQL sequences) is to use a table for each sequence. This means sequence tables would be created on the fly. *Some* administrators like to lock down DDL events happening in production unless planned. (It's not such a bad idea.) A change to have a table per sequence would require table creation privileges at runtime or, worse, require developers to implement a sequences info hook.
Even more fun is the fact that DDL events kill transactions in MySQL. Getting a sequence value could kill a transaction in progress if we create tables on the fly.
The "clever hack" of skipping keys seems like the lesser evil, especially when it simplifies the code while maintaining correctness.
Comment #73
David StraussCross-post with the test bot.
Comment #74
chx CreditAttribution: chx commented
Aaaaaaaand more comments (and a bugfix).
Comment #78
chx CreditAttribution: chx commented
Comment #79
agentrickard
Crell asked me to give this a look, since I have the same id = 0 auto-increment problem in Domain Access (D6), since I copied the logic from {users} for use there. I also have a module (Domain Prefix) that allows on-the-fly table copying, and can cause issues when dealing with the {users} table. See the patch in #445386: Copying "users" table fails which solves this.
I thought moving away from {sequences} in D6 was a Good Idea, and #35 aside, I still think it's a good idea.
If I read this right, then modules other than user and actions cannot use this table unless we are OK with not "objecting to the aesthetics of appearing to skip some values in the sequence for each consumer of the sequence."
I object to this, both in contrib and in core. It's confusing and counter-intuitive for the developer. (And a seeming regression from the D5 sequences days of key/value pairs.)
That said, I don't have a whole lot of stake here (transactions, multiple unique keys, simply not my call). But this looks like a single-use-case API which breaks expected behavior. As a developer, I expect sequential integrity (no skipped keys) in the IDs for my tables. So if this goes in, I just won't use it for my modules.
I can also see a lot of administrator confusion, when our clients ask us "Why are there users 1, 2, 3, 4 and then nothing until 26? We never deleted any users?" This, I think, could be a big problem.
AFAIK, the auto-increment 0 problem on MySQL only bites us when doing a db copy from one database to another. It may be better (and simpler) to run a check for the absence of UID 0 and adjust the table appropriately. (I have some code in Domain Prefix that handles this when copying tables.)
Comment #80
Crell CreditAttribution: Crell commented
@agentrickard: Is "clients who think they know more than they actually do" the only problem in practice? I don't like the non-sequential keys either, but technically neither MySQL nor Postgres guarantees them in the first place.
Comment #81
David Strauss
I refuse to design around "clients who think they know more than they actually do." We don't make technical decisions based on whether uninformed parties think we've made the correct technical decision.
Comment #82
chx CreditAttribution: chx commented
Seriously, that's not a valid argument at all. I have heard at DC DC that some people develop and roll out in a way that leaves holes between development and production. If a client asks you, then just tell them that the numbers above 1 are meaningless.
Comment #83
agentrickard
Hey, I'll back off that point if it's a naive one. I'm just saying, "this makes me uncomfortable".
I would argue, David, that we in fact _do_ make technical decisions based on "uninformed parties'" comfort all the time. Otherwise, we'd all log in with our uids and force 16-digit randomized passwords, for instance.
The best argument so far is that autoincrement keys themselves can't be trusted to be sequential (or can be altered). That had not occurred to me. If that's the case, consider my objection withdrawn.
Comment #84
agentrickard
However, I still might argue that the scope of this patch is far larger than the problem it tries to solve.
Do we have sequence issues in cases other than table copying, where AUTO_INCREMENT won't allow zeros? I see some mention of this in #35, but I don't get the use case. Automated timestamps + a sequential ID? We fix that now by manually entering the timestamp (which is easier than the patch, isn't it)?
I mean, this is more code than the problem it fixes. Isn't it?
Comment #85
David Strauss
No, that's called a usability issue. Skipping UIDs has never and will never affect the way users interact with the system.
Comment #86
chx CreditAttribution: chx commented
Well, actions has a use case. simpletest used to have a use case where the test got an id and simpletest_test_id was not used for anything else. We have added some data there, so this case is now moot. queue had a similar use case where every consumer got a unique id. Although we have optimized this out, it certainly can be expected that more "i need a unique id in the db" cases similar to simpletest and queue will appear.
Comment #87
Crell CreditAttribution: Crell commented
I am not 100% convinced that this isn't going to introduce as many problems as it solves. However, chx and David have made a compelling argument, so I am not going to block this patch. I also will not fully endorse it, though. Kicking it upstairs for final decision. :-) (Yes, I am going to pass the buck, because as DB maintainer I don't have a job description!)
Comment #88
agentrickard
I'm with Crell. I don't quite get it, but I won't block it, either.
Comment #90
Dries CreditAttribution: Dries commented
I find this comment to be very limited. In #35, a lot of good reasons were mentioned, and they are all lost in the phpDoc. I'm not suggesting we copy them, but I do think we can do a better job helping people understand why and when to use this.
It might also be useful to mention when not to use the sequences API.
I'm not a big fan of abbreviating things to $stmt but it is consistent with what we do elsewhere. Ignore for now.
This comment makes me scratch my head. It should be made more clear.
Comment #91
chx CreditAttribution: chx commented
Comment #92
chx CreditAttribution: chx commented
Comment #94
chx CreditAttribution: chx commented
Bah, system.install update numbers.
Comment #95
cweagans
Good to go.
Comment #96
Dries CreditAttribution: Dries commented
Committed.
Thanks!
Comment #97
chx CreditAttribution: chx commented
The commit never happened.
Comment #98
chx CreditAttribution: chx commented
system_update_7043 conflict fixed.
Comment #99
webchick
It looks like Dries missed this, so I committed it.
Please document changes in module upgrade guide.
Comment #100
webchick
Well, that'll teach me to commit patches without looking at them. :P
1. This patch added an extra system_update_7043, which means test bot failed 10,000 patches in under a second. ;) Fixed.
2. When I ran the update I got:
Failed: SQLSTATE[42000]: Syntax error or access violation: 1075 Incorrect table definition; there can be only one auto column and it must be defined as a key
Comment #101
webchick
...and this also broke SimpleTest's browser-based testing. Batch API spews errors about $max_id = db_select('SELECT MAX(value) FROM {sequences}')->fetchField(); that should be db_query().
So now, we need tests too. :P~
Also, why is only mysql's database.inc changed and not also pgsql and sqlite?
Comment #102
mfb
Fix the system_update_7044() errors.
Comment #103
mfb
Just tested this and noticed it was still broken: db_delete did not call execute(). Re-rolled to make it quite a bit simpler.
Comment #104
Dave Reid
Tested and seems to work fine. HEAD (update.php) is broken without this.
Comment #105
webchick
Thanks, committed to HEAD.
I'd still like to see test coverage of that query that caused SimpleTest to break.
Comment #106
Josh Waihi CreditAttribution: Josh Waihi commented
Yeah, the original nextId() function isn't built correctly. I don't know if my patch is the correct solution; I'm not in the loop here. But this helps PostgreSQL get back to passing tests.
Basically, $new_value isn't set, so it throws notices etc. and the function doesn't work.
Comment #107
Damien Tournoud CreditAttribution: Damien Tournoud commented
The default implementation is broken anyway: we can't, by contract, write a value into a serial column. For the default implementation, I'm ready to settle with:
It will be very slow if $existing is big, but what can you do about it?
For PostgreSQL, I'm considering using sequences, this way:
Comment #108
Josh Waihi CreditAttribution: Josh Waihi commented
I've put in the changes as DamZ suggested. However, the PostgreSQL solution failed:
So I've used RESTART WITH instead. It's the way PostgreSQL is designed. If that is also its flaw, PostgreSQL should address it, not us.
Comment #109
Josh Waihi CreditAttribution: Josh Waihi commented
Resubmitting the patch because you can't access it with the '#' in there.
Comment #110
chx CreditAttribution: chx commented
Please start a separate issue for the PostgreSQL one. There is no way you can pull it off (I asked in #postgresql, and this is listed in #35 as one of the very reasons this patch exists) and we should not let a bugfix be held up by PostgreSQL.
Comment #111
chx CreditAttribution: chx commented
Also note that all the existing_id magic is for automatically catching up with mass imports, and if you loop that will be lots of fun. Or not. In 2006 I helped with a certain site that webchick knows well, which was importing 300 000 users...
Comment #112
Josh Waihi CreditAttribution: Josh Waihi commented
Yes, we should. chx, this is not MSSQL or some other db we don't support. Drupal needs to work on more than just MySQL.
My patch above still stands unless someone has a better idea.
This patch is crucial for PostgreSQL at the moment, as it fixes a bunch of tests. In fact, tests virtually won't run without it.
Comment #113
Crell CreditAttribution: Crell commented
I'd really prefer that follow-on patches happen in new issues if possible, since this was already committed. Not that they shouldn't get addressed, but we shouldn't play ping-pong in a single thread.
As for the best way to support this API in Postgres, I defer to Damien and Josh as they actually know how Postgres works.
Comment #114
chx CreditAttribution: chx commented
Josh, then let's concentrate on making the default implementation work. The PostgreSQL-specific one is a separate issue. The default should work on pgsql too. If we can do a better one, that's good, but if not, oh well, that's why we have a default.
Comment #115
Josh Waihi CreditAttribution: Josh Waihi commented
OK, closing this off and continuing it in #633678: Make sequence API work on non-MySQL databases