Closed (won't fix)
Project:
Content Construction Kit (CCK)
Version:
5.x-1.x-dev
Component:
General
Priority:
Critical
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
29 Dec 2006 at 16:16 UTC
Updated:
23 Jul 2008 at 22:58 UTC
Jump to comment: Most recent file
Comments
Comment #1
karens commentedGood catch! I hadn't thought about this, but now in 5.x where you can create your own content type names this will be a real issue. Not a problem in 4.7 since all content types have names like 'content_example' so they are differentiated.
I'm bumping this to critical since someone who comes up with an unfortunate content type name could really make a mess of things.
This will take some investigation to catch all the places that use the current construct...
Comment #2
karens commentedActually, I'm going to categorize this as a bug, to make sure it gets proper attention.
Should we use something other than 'node_type_'? The node_type table is a core table and it's possible core will do something sometime using other tables like 'node_type', and I don't want another upgrade mess in the future like we had with 5.x where we have table name conflicts.
How about 'node_content_' to make it more clear what module put that table there? That means we'll end up with tables like 'node_content_content_example', but I don't think it really matters what the table is called so long as there is no conflict. Plus then 'node_page' will end up as 'node_content_page', which will be more clearly understood.
Comment #3
yched commentedWhy not "node_data_type_page", "node_data_type_example"
This would play well IMO with the "node_data_field_myfield" pattern used by "per field" storage.
Or maybe (while we're at it) move from "node_data_" to "node_content_" for both field and type tables, if we want to make the name of the "content" module stand out more.
I think we would benefit harmonizing tablenames for "per fields" and "per content types" storage tables anyway.
I'm working on delegating the table names generation in a small helper function, that would stop these sort of things being scattered all around the code. Plus this gives us time to debate about naming conventions :-)
Comment #4
karens commentedGood idea on the helper function. I was just thinking the same thing!
What about ditching the 'node_' prefix completely? Most other modules start their table names with the name of the module, so make everything 'content_', i.e. 'content_field', 'content_group', 'content_type'.
Comment #5
yched commentedDeal :-)
patch coming soon...
Comment #6
yched commentedHere is a first step :
This patch uses a _content_tablename helper function to generate table names for fields and node types, and moves the rest of code to use it.
The current naming conventions are not altered for now, I think we should keep that for a second step, once we're sure that everything works fine.
I tried to separate the cases where :
- the table name is actually generated (content type creation or update) : the new _content_tablename generates the name
- the table name is only "used" (to perform queries, etc...) : a new key 'table' in the content type description (content_types() function) is added, and holds the table name generated for this content type.
There remains one or two spots where I'm uncertain - I added comments for that in the patch
Comment #7
karens commentedI made a few changes. You're right on the fields, might as well use the array rather than doing a sql query. And your question about the orig_type was right -- not needed. It was a leftover from another way I tried to fix that problem.
You are deleting the table only if there were fields, but it needs to be deleted always, so I moved that part outside the 'if'. And I found another place that needed a 'if db_exists'.
I then tried adding, deleting, updating, importing, and exporting types, and I think it is working right.
Comment #8
yched commentedThanks for the validation / correction.
I committed a version including your edits - there are two places where I kept my original code :
- content_crud.inc / content_type_create() :
no need for content_clear_type_cache(); being called twice
- content_views.inc / content_views_field_handler_group()
the 'node_data_' strings that remain are the only db alias for the table - I'm not quite sure what to do with them, but I don't think they _need_ to be changed (plus I think this might alter css id's and break themes.
Do you think this should go in 4.7 as well ? (not required, but would be easier for us to follow...)
Second step : rename :-) (+ update function... I think we'll have to regenarate all the stored queries for the views involving cck fields...)
Comment #9
karens commentedIt might be good to put the help function in 4.7 although I'm not sure we actually need to rename the tables there (I could go either way on that, just wondering if we'll re-create some of our upgrade problems if we do).
I'm thinking that if we have that helper function and the new content array item that identifies the table, other modules may want to use it, and it would be quite confusing if it's there in 5.x but not in 4.7, so consistency would be good.
Not sure either about views. I changed it because I was thinking it was referring to the real table name, but I guess it is the alias, which is OK.
Comment #10
yched commentedHere is the patch that actually does the renaming (5.0 for now) :
- content_field_myfield for "per field" tables
- content_type_mytype for "per content type" tables
general tables are also renamed (+ plural form to avoid clashes with a possible content type named "instances"...)
- content_fields
- content_fields_instances
- content_groups
- content_groups_fields
Attached patch provides update functions for content.install and fieldgroup.install,
and (should) take care of regenerating affected views (was the hardest part, actually...)
This is brain surgery, so of course this needs testing - PLEASE backup your db before applying :-)
PS : a few "remove trailing white spaces" lines might have sneaked in the patch, making it (just a little) bigger than it should be.
Comment #11
karens commentedI am starting to worry about how many contrib modules will break if we do this. The location module had to change its db structure (a much smaller change than this) and ankur emailed everyone who had a module that integrated with it to tell them about the change and point out what would need fixing in the contrib modules, then waited a bit to give them time to react before he committed the changes. I wonder if we should do that, too.
Plus I'm starting to wonder if we're trying to change too much, even though I suggested it and I do like this construct much better.
Since this is something that could potentially produce some instability, maybe we should do our stable release of CCK before moving forward on this change.
Thoughts??
Comment #12
eaton commentedIs there anyone who's directly integrating with the CCK table structures? Or is all the work going on via the APIs?
Comment #13
yched commentedKaren, IMO there is little risk of breaking other contribs - be it contrib cck field module or 'plain' contrib module. As Jeff mentioned, external code is not likely to directly query cck data tables as they are 'constantly' moving by nature (until recent commit in http://drupal.org/node/104259, you could not even be sure there was a 'node_mytype' table for each type field).
This argument is less true for 'fixed' tables (node_field, node_field_instance), but I can't think of any cck-related module that would have a reason to query those directly instead of getting the info from API function content_types().
Fieldgroup does, but being now in ckk-core it is handled by the patch, and I think we'd be aware of other fieldgroup-like modules out there.
This being said, my previous 'helper tablename function' commit _might_ have broken some (internal) stuff, there were two bugs posted today that could be related - I need to investigate those.
Comment #14
yched commentedupdate : the two bugs I mentioned (http://drupal.org/node/106926 and http://drupal.org/node/106900) are not related to the 'helper tablename' refactoring.
Comment #15
eaton commentedThat said, I think that officially branching a CCK 1.0 release WOULD be a great idea. CCK is a pretty solid piece of work at this point... drawing a "This is 1.0" line in the sand would be nice...
Comment #16
karens commentedOK, sounds like everyone is agreeing with changing the names. I just had this panic attack about whether we were trying to do too much. I feel better hearing others say it's the right thing to do. Anyway, I haven't had time to test the patch yet, but I'll try to get to it as soon as I can.
Let's do go ahead with our official release first so we have a nice stable release before we monkey with this. We have a couple other critical bugs outstanding, but I keep waiting for a time when they're all fixed and we never get there. We probably better at least hit the install bug. It would be good to fix the postgres bug, but that may take more time while we figure out what fix is really needed.
So I'll fix the install bug and then get a release out, then try to get back and test this patch.
Comment #17
yched commentedNew version of the patch rerolled after today's cleanup commits
Comment #18
yched commentedRerolled patch against latest 5.0
I think we should get this going pretty fast now.
As was recently pointed in http://drupal.org/node/110371 (marked a dupe of this), this is not really serious when you try to create a node with a conflicting machine name, but hell might open when you _change_ an existing type and set its name to a conflicting table. Having cck columns added to, say, node_access would be... annoying.
FWIW, my test cck install has been running this patch for 10 days now without any problem.
Plus, as I wrote earlier, the massive changes went in the previous 'helper function' patch that is already in.
Comment #19
yched commentedEr, wait. Thinking about it twice, this might be much more complicated because of upgrades.
Upgrade paths for cck + field modules are currently a mess to predict, and I'm having a hard time trying to figure out all the possible cases.
And yet is seems we _have_ to move...
Here are a few ideas for a better future.
- Gather in content.install all db scheme updates for fields that defer their db storage to content.module (that is all currently known field modules).
That includes contrib fields as well, of course.
Currently we have number_update_4, for instance, that updates from ancient 'per field type' to 'per content type' storage
- Ensure that content updates get run before any update for field modules - things are too unpredictable without this.
Updates are ordered by module weight, then by file path.
Weight is used for a lot of other things so it's not really flexible...
Maybe we should force all field modules to be stored in a '/fields' fiolder inside the cck folder ?
Not really install-friendly.
Comment #20
karens commentedHere's a thought. By prefixing all these tables with 'content_' we have another set of possible conflicts -- if someone tries to name their content type 'groups' or 'fields' (or less likely 'fields_instances' or 'groups_fields' or 'data_field_xx'). It would simply this patch greatly and reduce the conflict potential if we only use the 'content_' prefix on content tables and leave node_field and node_group and node_data_ tables alone.
So here's one idea. Let's reduce the number of things that can go wrong by only changing the one table which we know must be changed -- node_[content type] to content_[content type]. We can do something with the other tables later if we like, but don't mix that into this mess right now.
I looked through our core modules and I don't think any of them need any additional update if we leave the node_data_XX tables alone, which takes care of the problem of controlling update order.
Comment #21
yched commentedBy prefixing all these tables with 'content_' we have another set of possible conflicts
Actually, no : the patch uses 'content_type_*' and 'content_field_*' prefix, so there is no way this could conflict with hardcoded 'content_groups*' and 'content_fields*' tables.
only rename node_[content type] to content_[content type]
Yeaah, well. This leaves the db in a 'not-very-clean' state, with some tables in the 'content_' area, and others in the 'node_' area. Really inconvenient when you browse your db in phpMyAdmin, or try to debug a Views query (this concerns us maintainers, primarily :-) ). Later on we'll want to change that eventually and embark in another renaming update.
I really wish we could do that once and for all. But we have a sort of emergency here, so maybe you're right.
I propose allow ourselves 1 or 2 more days to think about it ?
Comment #22
karens commentedI'd say let's make the one critical fix that must be made, wait a day or two to be sure nothing breaks, and then get a new release issued. This and a couple other recent fixes (like the fix to the .install) really need to get into an official release ASAP.
Comment #23
yched commentedOK - I'll submit a new -limited- patch later today for release 1.3.
I propose we "clean the slate" after that, meaning in the next release (1.4 ?) we do not handle upgrade paths for versions <= 1.2. (users from older versions have to update to 1.3 first). How does this sound ?
Comment #24
karens commentedI'm agreeable to drawing some kind of line in the sand for upgrades, especially since there have been so many changes to the database schema over time. Let's make that a separate issue.
Comment #25
yched commentedHere's a trimmed down version : only renames content type tables from 'node_foo' to 'content_type_foo', and should also update views accordingly.
I have to say that in spite of the more limited scope, and after another two hours on the question, I still can't see a way to ensure that the upgrade paths will go smoothly for older 4.7 branches.
Not sure we should do this in two steps and "draw an line" after an intermediate 1.3, or draw the line now, tell 4.7 users to update to 4.7--1.2 and go faster to a nice and consistent db state in 1.3
Well, I guess I'm too tired, my ideas on this right now are kind of blurred.
Comment #26
yched commentedAnd I forgot the patch :-)
Comment #27
karens commentedJust want to get this documented while I'm looking at it. In Drupal 5.x we have three new hooks, one or more of these might be useful here. We hook_enable() -- things that get done anytime a module is enabled, hook_disable() -- things that get done anytime a module is disbled, and hook_requirements() -- which checks for anything that must exist before installing a module (not including dependencies) and which can optionally keep a module from getting installed. Hook enable() and disable() live in the regular module code, and hook_requirements() lives in the install file. I don't know if any of these will help us control update problems, maybe by refusing to install a module that is too far out of date, but something to think about...
Comment #28
yched commentedOK, here's my proposition on this :
Attached patch adds some versioning 'logic' into the _content_tablename function (the helper function that gives the name of dynamic 'per field' and 'per type' tables).
This provides more safety for cck API functions that are called in update functions in content.install and field modules .install : whenever an api function does operations on the tables, it operates on the tables using the correct naming convention (pre or post renaming)
This should eliminate some of the 'irrationalities' and (hopefully) provide for smoother upgrade paths - so this patch goes a little further than my previous 'minimal renaming' one, and moves both 'per type' *and* 'per fields' tables to the 'content_' namespace. It definitely feels too weird having 'per type' on 'content_type_' while 'per field' is still on 'node_field_'.
The patch leaves 'static' tables (node_field, node_field_instance, node_group, node_group_instance) where they are. If we want later on (when we draw that line...), we can move them with the others.
Waiting for some approval before committing :-)
Comment #29
yched commentedPS : thanks for the reminder on hook_enable et al - I can't say I was really aware they were here.
It did not seem as they could be of immediate help on our current issue, but it made me think that maybe we could use hook_enable to perform content_types_rebuild, instead of currently on each module overview submit ?
I opened http://drupal.org/node/112225 about that.
Comment #30
karens commentedIf you move the per-field tables you need to start worrying about update order for older installations that haven't created those table yet, plus don't the individual field module installation files then need updates? That gets us into the whole nightmare we were trying to avoid for now and I'm not sure we've done enough testing of all the possible scenarios and upgrade problems that might create.
I'd still recommend we leave those per-field tables alone until we have a system to ensure we won't run into updgrade issues. We need to get the critical fix committed ASAP without taking more time to be sure we won't break anything else.
Or is there some reason you now think upgrading the node_data tables isn't going to create problems for old outdated installations?
Comment #31
yched commentedI have made a test install with a pre-'per field' version of cck (that's march 26, 2006), and I am currently testing the upgrade process...
Should be useful whichever way we go.
This raised another problem : the 'display settings' column added in node_field_instance by the display settings patch triggers errors in older content.install updates (content_update_5 calls content_fields et al) and breaks the upgrade path.
Well, I'd like to move faster on this too and move on to more interesting (and less tedious...) stuff, but it seems we have to fix this as well :-/
Comment #32
karens commentedWhen I talked about moving ASAP I wasn't suggesting you aren't moving fast enough, I was just suggesting we speed things up by keeping it to the absolute minimum of things that need updating. I know how much time this is taking and it's really a grind.
For the display field problem, I think we could get around that by adding something to the update_5() text to unset the display field in the result that comes back from content_fields();
Several functions like that are used in various updates. I wonder if there are others that need this kind of fix?
What can I do to help at this point?
Comment #33
yched commentedWhen I talked about moving ASAP I wasn't suggesting you aren't moving fast enough (...)
Of course, I know - sorry about that. I guess this stuff starts to get on me :-)
unset the display field in the result that comes back from content_fields()
Unfortunately no, because the error is raised in _content_type_info, that is called in content_fields().
$field_result = db_query("SELECT nfi.field_name, nfi.weight, nfi.label, nfi.widget_type, nfi.widget_settings, nfi.display_settings, nfi.description FROM {node_field_instance} nfi WHERE nfi.type_name = '%s' ORDER BY nfi.weight ASC, nfi.label ASC", $type['type']);I guess either we move the display settings back in the 'widget_settings' column as it was mentioned at one point (although I agree that these ideally deserve a column of their own), or we manage to conditionally load the column somehow ?
What can I do to help at this point?
Well, I'm currently trying to run the update process through a PHP debugger to understand what's going on. Tricky for now, and I will have to stop for today shortly.
I guess I could send you a copy of my install and db dump (it's a 4.7 + 'old cck' db upgraded to 5.0 core, so from there I just have to roll cck 5.0 updates, see errors, drop and re-import the db, roll cck 5.0 updates, etc...). But it's not nice stuff... No obligation, really :-)
Comment #34
karens commentedHere's a really quick idea for the display settings problem, change the code from:
to
Shouldn't break anything and will work no matter what the layout is.
Comment #35
yched commentedSmart :-) That should do the trick indeed...
Er, well apart from the 'display setting' issue, I found something... annoying..., which probably explains much of the upgrade issues we've seen here and there.
_content_type_info, which builds the cached 'content_type_info' array that gathers all the info types and fields, is broken in updates :
it builds types info using something like
So when this is called during contribs update process before our friend content_update_1001 has copied 'old' cck types info into the new core {node_type} table, {node_type} only contains default 'page' and 'story', and thus _content_type_info only has no type info for 'old cck' types, and content_update_5, for instance, is screwed.
I possibly have made things worse during my 'helper tablename function' patch, since I made the choice to generate the tablename once and store it in $type['table'], instead of calling _content_tablename every time we need the table name...
I can try to revert that behaviour (while keeping _content_tablename, i think it's precious right now) and see if it makes things better.
But we may also think again about our upgrade policy... wouldn't we be better off telling 4.7 users 'update to latest 4.7 cck first' ?
Comment #36
karens commentedI would love to tell people they must update to latest 4.7 first but I don't know yet how it can be done. We would need to provide instructions on how to do it (after making sure the instructions work). We need to give the a link to a file they can use that will get them to the right update level for both 4.7 and 5.x (a complete release so all the files are in synch) and make sure that nothing will break if they're currently using a later release and have to back up (because they were using nightly snapshots or cvs and just never got the database updated). We need to check the schema somehow and throw up an error message telling them what must be done. And on what pages is the message displayed and to who (surely not to anyone who views the site, but probably only the site admin). And if they walk past all that and try to update,can we make the program 'refuse' to do it? And even once all that is worked out, what happens if they do the 5.0 update before we have a chance to throw up an error message? And are there any other scenarios where they could end up not getting things updated correctly?
I've been trying to think how this could be done and as you can see, I'm kind of running in circles. I think I need to walk away from this for a bit, then come back at it fresh.
Comment #37
yched commentedAnd even once all that is worked out, what happens if they do the 5.0 update before we have a chance to throw up an error message?
Yes, that's the meanest thing IMO - people will be upgrading core to 5.0, and then find out that we ask them to update their 4.7 cck...
Well, I've made some progress on debugging - the only thing that still have issues (with data loss) are ... per content type fields (the ones we do want to move) ;-)
Comment #38
yched commentedOK, after quite some debugging and db restoring, I think this patch manages to provide a smooth upgrade path from older 4.7 version (starting from antedeluvian 'per field *type*' storage) to new naming convention with 'per field' and 'per content type' tables moved to 'content_'. No warnings, no data loss, correct db scheme at the end.
That includes :
- various fixes in content_update_4/5 and (field module)_update_4 functions,
- an additional check in the part of content_alter_db_field that manages data migration from one storage type to the other
- reverting the 'per type' tablename being stored (and cached) in $type array (I added that in http://drupal.org/cvs?commit=49450)
AAMOF, the 'per field' tables precisely were the ones that did not raise errors or warnings to begin with, so I really think it makes no sense to leave them where they are.
Moving node_field and node_field_instance tables, like I primarily intended to, is, er, absolutely out of question...
Additional tests with 'less antique' 4.7 versions might be done, but I think the current state of the patch is quite close to filling the amount of time I'm willing to spend on this :-), so I think this can be committed.
PS : considering what I found in the process, i'm not sure the pure 4.7 (antique 4.7 to newest 4.7) upgrade is flawless either...
Not willing go deeper tonight...
Comment #39
karens commentedThanks for all the hard work!! I think this is looking pretty good, and I agree it is nice to get both the per field and per content tables moved together, I had just been worried that it was the per field tables that were going to create update problems.
Now the next question is what to do about 4.7. Do we rename the tables there or only in 5.x?
One other small thing I realized. If someone does the update in 4.7 to add the display column they're going to get an error in 5.x when it tries to do it again, but I think we can add a query to see if the column already exists before doing this that will fix that problem.
Comment #40
yched commentedI had just been worried that it was the per field tables that were going to create update problems.
I do understand that. It would really have bugged me, but I would have resigned myself to this as the most reasonable thing to do if I they proved to be troublesome. I'm glad they don't :-)
I'm not sure we have to rename stuff in 4..7. There is no real need as there is for 5.0, so if we folks can deal with different names while doing debugging / support managing in the issue queue, we can leave it as is.
Besides, renaming 4.7 tables also means dealing with the fact that during an upgrade in 5.0, the tables might or might not have been renamed in the 4.7 era (exact same kind of issue than the one 'display_settings' column you're pointing) - I wish we could avoid that.
What I'm wondering is whether the internal stuff (helper table naming function) should be backported (without actually doing any renaming). As I wrote earlier, the upgrade process from old 4.7 to new 4.7 seems buggy as well, and this update function provides some 'security'.
About the display_settings column : that's quite true. I'm not sure how to have an equivalent of db_table_exists for a column (MySQL + PgSQL)... Maybe we could jut add a @ in front of the line that adds the column ?
Comment #41
karens commentedFor update 1003, just add this at the front of it:
I think I agree on not changing table names in 4.7, at least unless we find it becomes a problem. I'm not sure what things you are thinking of backporting, but you're the one who has dug through these problems, so if you have ideas for that, have at it.
Do you want to go ahead and commit this or shall I? I posted an item on groups.drupal.org in the CCK group to forewarn people the table names are changing, and I'm going to add something to the project page and the CCK handbook. Hopefully enough people will see it that we can keep the carnage under control :-)
Comment #42
yched commentedupdate 1003 : right.
Great idea about the post on g.d.o - writing a clean and well phrased public announcement in english always takes me some time, so I'd rather leave it to more adequate people :-)
I'll commit the renaming stuff for 5.0 ASAP. Maybe we can add a request for testing in the g.d.o thread before we release, or do you want to release straight ahead ?
Comment #43
yched commentedcommitted to 5.0 and HEAD :-)
Comment #44
karens commentedI'll mark this to be ported pending whatever you want to propose for that.
As for releasing, I'd like to do it pretty quickly, but let's leave it out here a day or two first just in case anything unexpected happens. I can post asking for reviewers, but testing an update process is really cumbersome (as you well know), so I don't know if we'll get any takers.
Comment #45
yched commentedI agree that we cannot expect too much validation on upgrade process.
But the patch also connects to cck 'neuro system', so I'd be more confident if we could get people to use it, even on plain 5.0 day to day use.
Comment #46
yched commentedeven if that's the right classification, 'to be ported' hides the issue in default issue queue, so i'm setting back to 'needs work'
Comment #47
karens commentedYes, I noticed it disappeared. I posted an issue on the infrastructure issue queue about getting the two new status items (to be ported and needs more info) to show up on the issue queue, so I guess this is as good as anything as an alternative.
Comment #48
karens commentedI'm going to re-categorize this since it isn't a critical bug in 4.7, it's a task.
Comment #49
dopry commentedWe're not going to touch table names in 5.x this later in the cycle I don't think.