continuation of the thread i started on the devel list:
http://lists.drupal.org/pipermail/development/2007-May/024194.html

basically, everything and its dog stuffs whatever it wants directly into $node. there are tons of problems caused by this, e.g. where many modules all think $node->pidis their pid. pid's a great example, since that nails even core (both book.module and path.module think they "own" $node->pid.

attached patch is an example of one way to enforce this. nodeapi('load') gets an array back from each module. instead of array_merge()'ing that into $node, it could put each array into $node->module_name instead.

however, node.module doesn't itself use nodeapi, it does its own thing in node_load(). in order to make this not require a 2000+ line patch, which will basically break every other patch in the queue and be very difficult to get into D6, i was going to use a shortcut, and say that everything in {node} and {node_revisions} that's handled by node_load() gets to stay at the top level of the $node object, so every place in all of drupal that does $node->nid, $node->title, etc, etc, would still work.

unfortunately, this won't work, either, due to comment.module. :( {node} contains a 'comment' column, which is how we know if comments are enabled or disabled on a given node. that'd get clobbered by the $node->comment array when nodeapi loads in comment's additions. :(

comment is the only conflict like this in core, but in IRC, chx pointed out we'd be screwed by a contrib calling itself the promote.module, for example. :(

so, either we have to break *everything* and have $node->node['comment'] (and therefore $node->node['nid']), or we have to give up on this approach for nodeapi's results to get stuffed into an array at $node->module_name...

also, chx was dreaming about having this exactly mirror the schema array, and have $node[table_name][column_name] but that couldn't be enforced at the nodeapi('load') level, since not every DB table matches the module name, and a given module might have N tables. so, if we wanted to go schema-style, we'd have to rely entirely on convention, and we couldn't force modules to get this right.

CommentFileSizeAuthor
node_namespace_0.patch.txt784 bytesdww
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dww’s picture

oh, and maybe a compromise to avoid the worst of the callsite badness, maybe we keep $node->nid, but then put everything else in their respective module-named-arrays. something like:

$node->nid
$node->node['title']
$node->node['body']
$node->node['comment']
$node->node['promote']
...
$node->comment['comment_count']
...
pwolanin’s picture

could hook_nodeapi just check isset($node->modulename) before doing the assignment? This would still penalize the novice developer, but in a way that's perhaps less likely to make their whole site break.

A change of this magnitude also makes a change of $node->comment to $node->comment_setting (or some such) seem reasonable.

nedjo’s picture

Yes, we need to fix this.

In doing so, we should be thinking ahead to the possibilities that the new schema API opens up.

From http://drupal.org/node/145684:

To enable schema-aware handling of a data item array, we need to reflect the relationships somehow in the data structure. The most obvious way is an approach analagous to the Forms API. That is, when loading data (or directly creating a data array), we mark keys in a way to identify them as foreign references. A node, then, would have direct properties from its primary table--the node table. All of its other properties would be in arrays associated with their respective tables of origin. E.g., a book with a parent have, instead of a 'parent' key, a '@book' key, which is an array of properties held in the book table, in this case, the parent.


// Direct properties of a node as held in the node table.
$node['nid']
// Properties in another table.
$node['@node_revision']['body']

With this structure, eventually, we won't need node_load() or node_save(). drupal_load('node', 21) will be enough. We read what needs to be loaded (or saved) from the table relationships as mapped in the schema API.

Yes, those heady days are a bit far off now. But if we're taking the trouble to change our object structure, we may as well get it right.

pwolanin’s picture

this patch: http://drupal.org/node/146425#comment-253153 now puts all the book module data in $node->book, and nothing else in the top-level $node object.

Frando’s picture

This is such a great thing.

I mostly agree with nedjo, and I like his '@tablename' syntax very much.
I think it is better to stick data into arrays named after the tables the data comes from than to just have an array for each module that invokes nodeapi, because this would be a big step towards a data api. There would however also have to be a place to stick data in that is not directly coming from a table (e.g. node links).

So, what we could do is
- change the $node object to a $node array (for consistency with the sub-arrays)
- only fields of the node table get top level keys
- data coming from other tables (like terms or files) lives in a '@tablename'-array
- other data that modules want to insert into the node array are either stored in '#properties' or in a 'modulename' array (this is basically the same discussion as with the node rendering patch - what is an array, and what is a child? There could also be namespace conflicts between module arrays and fields of the node table, as they'd both be top level keys. Hmmm.)

This would be a major step forward.

A fixed syntax of data arrays makes many things possible, a unified data syntax is, now that the schema API is in, probably the most important step towards a consistent and simple CRUD or data API.

pwolanin’s picture

While I see the possible advantages of naming per table, how would you *enforce* this? A major advantage of the suggestion above to have the nodeapi code do this is that it is automatic.

chx’s picture

We will enforce it by documentation and good example (ie. core). Woe upon to you if you do not keep convention. In an upcoming Drupal version your code will suddenly break...

pwolanin’s picture

@chx - I though that that breaking code was the whole point of new Drupal versions? Not much of a threat ;-)

dww’s picture

Glad to see lots of clueful folks participating here, that's a good sign. ;)

After spending time with this yesterday, I'm afraid of allowing *anything* (with the possible exception of nid) at the top level of the $node object/array -- otherwise, we're going to have namespace collisions again (modules and/or tables that conflict with columns from {node} and {node_revisions}, etc). If we had different syntax for table names, module names, and raw column names, these collisions wouldn't necessarily break things (though it could get a little confusing). With enough crazy syntax, we *could* keep some stuff at the top level, but if we're moving it to an array, we're going to have to break *every* call site, anyway, so it doesn't really buy us much to keep anything at the top.

I can certainly see the benefit of mirroring the schema API's array layout, but it's true we also have stuff in $node that's not from the DB (at least not directly) such as links, etc. I'm also very attracted to the nodeapi('load') forcing you into a unique namespace already (since every module must have a unique name). But if core is universally doing the schema-based array thing, and it's documented as such, I can live with nodeapi('load') still providing you with access to the whole $node so you can populate it as appropriate for your module.

Re: syntax. If we're going to special-case tables in here, how about we use {tablename}, which is the universal Drupal visual cue for a DB table? If we end up needing both table names and module names, @ is nice for modules, since it's sort of like an address conceptually ("if I'm not getting this info directly from a table, where must I go to find it? It's over there @ the foo module..."). Then, the actual column names have no special syntax at all. So, a new $node array might look something like this:

$node['nid']
$node['{node}']['type']
$node['{node}']['status']
$node['{node}']['comment']
$node['{node}']['promote']
$node['{node}']['sticky']
$node['{user}']['uid']
$node['{user}']['name']
$node['{user}']['picture']
$node['{user}']['data']
$node['{node_revisions}']['title']
$node['{node_revisions}']['body']
$node['{node_revisions}']['teaser']
$node['{node_revisions}']['format']
$node['{node_revisions}']['log']
$node['{node_revisions}']['timestamp']
$node['{node_comment_statistics}']['last_comment_timestamp']
$node['{node_comment_statistics}']['last_comment_name']
$node['{node_comment_statistics}']['comment_count']
...

It's certainly cleaner/safer this way, and is a step towards the data API Nedjo and others are talking about, which is cool. However, I see a few potential problems with this:

  1. It makes it even harder to ever change the schema, since now you have to change N call sites if, for example, we move stuff from {node_comment_statistics} back into {node} for performance/JOIN-killing reasons. Schema API makes our schema-awareness possible, but too much awareness just makes it harder to change the schema in another way (sure, it's easy to modify the schema now in your hook_update_N(), but you've also gotta patch every call site in core that references your schema via the $node array). :(
  2. What do you really do about stuff like uid? It's a field in {node}, but we want some stuff JOIN'ed in from {users} about it. And, what about the original author of the node (uid in {node}) vs. the author of the current revision (uid in {node_revisions})? If we're JOIN'ing on {users} for the one in {node}, should the name, picture and data fields go under the $node['{node}'] array to indicate it's been joined on *that* particular uid, or do we just put it at the top level like my example above, and somewhere document that the stuff in $node['{users'}] is data JOIN'ed with the uid from {node}?
  3. If we've got the {table_name} and @module_name syntax, what do we actually do about stuff like the $node->links array? This data is the result of a hook invocation, so it's neither from a specific table nor a specific module. However, it's also not a property of the node array in the regular FAPI sense of #property entities. Maybe we also have $node['%hook_name'] entries? such as:
    $node['nid']
    $node['{node_revisions}']['title']
    ...
    $node['%link']
    ...
    
  4. What's currently in $node->taxonomy doesn't strictly follow the schema's table layout. It's also got some more funky JOIN stuff going on (sort of like the problem with {users} above). So, it's not accurate to use:
    $node['nid']
    $node['{term_node}'][...]
    $node['{term_data}'][...]
    

    Perhaps this is a candidate for just saying "this is taxonomy.module's data in its own special format", and treat this as:

    $node['@taxonomy'] = array('tid1' => array(...), 'tid2' => array(...), ...)
    

    However, I fear developers will be confused by this, and it'll introduce this yucky ambiguity of if you should do it table-based using {} or module-based compound data using @.

All that said, I think we're getting close to something worth starting a patch for. I'd just like to have a better top-level view of (and agreement on) where we're headed and how we're going to address these issues before I start writing a script to convert the call sites and hack up node_load(). ;)

eaton’s picture

This is the stuff that was hashed through on the *display* side for the node rendering patch. Is there any chance that, if we're going to the work of reshuffling everything about nodes already, we could consider using renderable formapi style array? that would allow us to attach quite a bit of metadata if we needed it, and it supports constructs like access control on elements, relatively clean nesting of sub-elements, and so on.

I would REALLY hate for us to reinvent the wheel. If we're smart about it, we could even make it possible for the results of node_load() to be directly renderable for simple node types.

pwolanin’s picture

I'll again argue for the module name solution. I think many modules join 2 or 3 tables to add data to a node, and it makes much more sense to me to have that data together in one array.

Also, in terms of forms, this leads pretty naturally to constructs like:

$form['mymodule']['#tree'] = TRUE;
$form['mymodule']['name'] = array('#type' => textfield, '#default_value' => $node['mymodule']['name']);
$form['mymodule']['address'] = array('#type' => textfield, '#default_value' => $node['mymodule']['address']);

And then the form values map directly back into the right parts of the node object for the preview, etc.

eaton’s picture

If we've got the {table_name} and @module_name syntax, what do we actually do about stuff like the $node->links array? This data is the result of a hook invocation, so it's neither from a specific table nor a specific module. However, it's also not a property of the node array in the regular FAPI sense of #property entities. Maybe we also have $node['%hook_name'] entries? such as:

I'm startign to feel a bit concerned about this direction. One of the real advantages of Drupal's system (thus far, at least) is that it gives modules 'equal access' to stuff that's being built. I worry that if we try to enforce a rigid standard like this, we'll just end up pushing more and more complex logic into other areas of the system.

One of the benefits of using FAPI-style arrays is that, in theory at least, #module and #db_table properties could be easily added to any element if needed. Am I missing something? Adding more and more prefix types and naming conventions seems... less than ideal.

pwolanin’s picture

@eaton - I'm not sure I have a clear idea of the method you're suggesting. Can you post a couple lines of example code?

Frando’s picture

I think eaton is talking of something like this:

$node['node'] = array('#module' => 'node', '#table' => 'node');
$node['node']['nid'] = ..
$node['node']['title'] = ..
$node['node']['body'] = ..
$node['author'] = array('#table' => 'user');
$node['author']['uid'] = ..
$node['author']['name'] = ..
$node['taxonomy'] = array('#module' => 'taxonomy')
$node['links'] = array(...)

... and I've to admit that I quite like this... I'll definitely think about this some more.

eaton’s picture

That still leaves the important question of what the *standards* for that structure should be... but goodness if it doesn't feel more familiar to someone who's worked with FAPI before.

The fact that we already have lots of utility functions for working with this style of array, that we can (if we plan things right) render it straightaway, and the fact that moving forward even something like FQuery could be used for complex manipulation... those are bonus.

eaton’s picture

Also, it's worth noting that in this scheme, things like nid and title and body would need to be #properties rather than elements. The final tricky bit is that $node['#type'] should probably be 'node' rather than 'story' or 'blog' etc, for safety's sake. But I digress.

RobRoy’s picture

This sounds awesome. Each step we take to become more consistent across the board, is lowering the barrier to Drupal development, reducing code, and giving us greater flexibility to do some powerful stuff. This is a logical step and gives us one more object in that familiar construct. We already have hook_nodeapi('view') adding pieces to $node->content somewhat like this, so why not construct the whole $node array like that?

Thank goodness for the extra 4 weeks. So what's the game plan now?

Jürgen Depicker’s picture

subscribing

eaton’s picture

I posted my first stab at unifying some of these various issues in a blog post at http://www.lullabot.com/blog/trouble_nodes ... I think if we can figure out a way to load nodes (and any other objects, if possible) as structured arrays we'd gain a lot. The real problem would be the change in syntax. $node->foo would go away, for the most part. $node['foo']['#value'] would be the syntax, instead.

dmitrig01’s picture

I had this idea:
here is some psudo code to help
function drupal_load($type, $load_args) {
call $type_load.
in the node case, this would be drupal_load('node', 234) or drupal_load('node', array('node' => array('nid' => 1234))) (syntax: array('tablename' => array(fieldname => fieldval)))
that would return an array of specially formatted table names and field names to load. e.g.

array(
  'node' => array(
    'nid',
    'uid',
    'title',
    'vid',
    'type',
    'language',
    'created',
    'changed',
    'status',
    'moderate',
    'sticky',
    'promote',
    'language',
  ),
  'node_revisions'...
  '#primary' => 'node.nid', // What field to reference in the where clause if only a number is passed in
)

then pull in data from hook_load('node') which would add extra fields
*build query here*
*execute query*
*parse results*
*return results*
}

dww’s picture

Last week, Eaton and I discussed this at length. We decided:

a) $node shall become a #FAPI-style structured array.
b) Everything at the top-level of $node will either be #properties, like $node['#nid'] or arrays keyed by module name, like $node['book'][...]
c) Eaton was going to draft some guidelines on what should be a #property and what should be a regular element with a #value.
d) Eaton was going to draft some ideas on the proper use of #type inside a $node array.

It'd sure be great to make progress on this...

dww’s picture

Assigned: dww » Unassigned

Oh, and who am I fooling? ;) I'm not going to have time to work on this anytime soon. Please, let's hope someone else can pick up the torch.

pwolanin’s picture

@dww If you and Eaton don't have time to work on the full refactoring for D6, we should consider the very simple version of $node->modulename['key'] enforced via the node module as you proposed above.

dww’s picture

criznach’s picture

One thing to keep in mind that no one has mentioned specifically is how the FAPI rebuilds node objects using only the form tree after submission. With any existing module (I'm looking at project), there will be cases where forms are divided up into fieldsets, widgets, etc in the tree. Simply turning the #tree property on does not work in this case because your generated node structure will mimic the UI, and not the desired module namespace tree. In most current cases, the FAPI will simply generate a flat node structure because #tree is false by default. We can hack our way around this by manually setting the #parents array on each form element so the generated node has a namespace structure, but that's not automatic, and a maintenance nightmare.

beginner’s picture

Version: 6.x-dev » 7.x-dev
Status: Postponed » Active

per message linked in #24.

webchick’s picture

hunmonk brought this patch to my attention tonight.

a) I do think it'd be great to get this resolved in core. I'm all for increased consistency, etc.
b) It's going to break everything.
c) So I need to touch base with Dries about it.
d) However, unless this change is coupled with something that has profound improvements in what it allows Drupal to do (such as the page/node rendering stuff, or data API) I don't think it has much of a shot on its own. :\ It's really hard to justify every module, theme, block, etc. having to change its code around simply because a few contributed modules never name-spaced properly.

In the meantime, I'd suggest contrib modules use the convention proposed by dww in #1.

mdupont’s picture

Version: 7.x-dev » 8.x-dev

Just stumbled upon this old issue. Is it something still interesting for D8+ now we have entities and classed objects, or is it an obsolete issue we can close?

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

andypost’s picture

Status: Active » Closed (outdated)

with new entity api there's no more collisions