Problem/Motivation

There is a certain set of properties that are not entity specific, but are shared across entities (or aren't part of the entity itself, depending on your viewpoint). node.module currently stores when a node was created and when it was updated, for example, while taxonomy.module doesn't do likewise. For modules who need to process this information (comment.module as a prime example) this should be standardized.

Proposed resolution

Standardize on a set of properties that all core entities share.
The current spec is (work in progress):

  1. id: The primary identifier
  2. vid: The current revision identifier
  3. uuid: The UUID of the entity
  4. bundle: The bundle of the entity
  5. langcode:The language code of the language the entity was created in.
  6. uid: The user ID of the user who created the entity. (By definition this only applies to user created entities, which arguably are most of them, though.) (Note that for users, this refers to the person that created the user itself, i.e. an administrator (or the user itself).)
  7. created: The time this entity was created.
  8. changed: The time this entity was last updated/saved.

To support these properties entities should utilize the entity_base_schema() helper function in their hook_schema() implementation. (The parameters and return value of entity_base_schema() have yet to be bikeshedded.)
There is no strict requirement that entities support these properties, but if possible they should. In how far this will apply to remote entities and/or will be made a requirement or be handled automatically by entity.module will be seen when more of the entity API has been fleshed out. For now this is just about standardization of existing (semi-)patterns.

Remaining tasks

Write the patch.

User interface changes

None.

API changes

The following now works:

<?php
if (in_array($entity, array('comment', 'file', 'node', 'taxonomy_term', 'user')) {
 
$entity->created;
 
$entity->changed;
 
$entity->language;
 
$entity->status;
 
// $entity->uid doesn't work with 'user' :(
}
?>

Renaming of existing keys (e.g. 'nid' -> 'id') is held off for #1233394: [Policy, no patch] Agree on a property naming pattern.

Original report by tstoeckler

In #731724: Convert comment settings into a field to make them work with CMI and non-node entities it came up, that comment module needs to know when a given entity was created, or changed and/or by whom. Node module currently maintains that data itself, and comment.module sort of denormalizes it in {node_comment_statistics}. But this information isn't inherently part of the entity information itself. To recapitulate @Crell's famous example, the pizza doesn't have to know when it was made or by whom.

So the idea of this patch is to somehow store and manage these properties on all entities. Note that this is not about Metadata of the Entity itself á la #551406: Centralize entity properties meta data, but about the mentioned "meta-metadata" :). There might be more, but the following came up in #731724: Convert comment settings into a field to make them work with CMI and non-node entities:

  1. Author/Creator of an entity (prev. $node->uid)
  2. Creation time of an entity (prev. $node->created)
  3. Updated time of an entity (prev. $node->changed)

I'm willing to put some work into this, but I've only gotten a glimpse of the whole Entity API picture so far, so I'd like to discuss storage and implementation first.

Related:
#1233394: [Policy, no patch] Agree on a property naming pattern

Files: 
CommentFileSizeAuthor
#6 entity-base-schema.patch27.71 KBtstoeckler
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch entity-base-schema.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

Comments

Issue tags:+API clean-up

Now that we have an entity.module, we could have this in entity.install:

function entity_base_schema() {
  $schema['id'] = array(
    'description' => 'The primary identifier.',
    'type' => 'serial',
    'unsigned' => TRUE,
    'not null' => TRUE,
  );
  $schema['vid'] = array(
    'description' => 'The current revision identifier.',
    'type' => 'int',
    'unsigned' => TRUE,
    'not null' => TRUE,
    'default' => 0,
  );
  $schema['language'] = array(
    'description' => 'The {languages}.language.',
    'type' => 'varchar',
    'length' => 12,
    'not null' => TRUE,
    'default' => '',
  );
  $schema['uid'] = array(
    'description' => 'The {users}.uid that owns this entity; initially, this is the user that created it.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 0,
  );
  $schema['status'] = array(
    'description' => 'Boolean indicating whether the entity is published.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 1,
  );
  $schema['created'] = array(
    'description' => 'The Unix timestamp when the entity was created.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 0,
  );
  $schema['updated'] = array(
    'description' => 'The Unix timestamp when the entity was most recently saved.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 0,
  );
  return $schema;
}

Moving those node only properties to every entities is really a good idea. It should definitely be done!
But because the entities can have a different storage, how do you manage to store it for those who would not use the database?

Issue summary:View changes

Updated issue summary.

After a very enlightening chat with @sun and @fubhy, I added an issue summary, and am now off to see if I can cook something up. :)

We should also be adding uuid to this list too.

Also: bundle. This would also enforce the notion of every entity is published or unpublished which may not be desirable.

Status:Active» Needs work
StatusFileSize
new27.71 KB
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch entity-base-schema.patch. Unable to apply patch. See the log in the details link for more information.
[ View ]

Bundle sounds good, I had pondered about that when I saw {node}.type.
UUIDS I'm not sure about. Currently no entities support UUIDS, and while it might not be that much work to support them, I don't think this issue should depend on it. I'll look into it.

This would also enforce the notion of every entity is published or unpublished which may not be desirable.

Could you give a reason why that might not be desirable? I can understand that it might not be needed in a lot of cases (e.g. disabled taxonomy terms), but I can't see what harm it would do.

Here's an initial patch, which implements the entity_base_schema() function as above and makes all core entities use it. The properties aren't properly maintained yet, though.

I read fubhy's post about entity controllers yesterday and I think in light of that, it might make sense to turn this into a separate "EntityStatistics" controller in the long run (I know statistics isn't the right word, but I haven't come up with the right one yet). We will have to see about that, though, when more of the bigger picture is in place.

Status:Needs work» Needs review

I agree with @Dave Reid, bundle + uuid should be part of the base schema.

For the time being, we could comment out the uuid definition, as I'm not sure whether there's any working code for uuids in core yet.

UUID and bundle definitely. We don't have UUIDs on entities right now, but the Config Initiative has that as part of its roadmap so we should assume that it will happen. I've added those to the summary.

I'm actually not convinced on "status", because it could mean very different things in different entities. Nodes have a 2-state status right now (published, unpublished). Users (should?) have three (active, pending, banned). Taxonomy terms, uh, no idea.

As noted in #1233394: [Policy, no patch] Agree on a property naming pattern I do not support folding all IDs to just "id", but that should be discussed in the linked issue.

Issue summary:View changes

Added proper issue summary

I'm actually not convinced on "status", because it could mean very different things in different entities. Nodes have a 2-state status right now (published, unpublished). Users (should?) have three (active, pending, banned). Taxonomy terms, uh, no idea.

Exactly. So -1 on status.
I have use cases for entities that are not language aware, I'd remove that from entity_base_schema() along with status.
Also, why do we always have "vid" if an entity might not be revisionable?

It's terrible that Core right now has entities with bundles that don't map to columns. It breaks EntityFieldQuery and is just bad.

Standardizations like this are certainly a good idea. I like the idea of adding a schema-helper function for that.

There is no strict requirement that entities support these properties, but if possible they should.

Yep, requirements are a totally different topic. Still, I'd replace *if possible* with *if adequate*. If something doesn't make sense in a certain situation, it shouldn't be added.

@status
I don't like adding status to terms for the sake of standardizations. If terms have no status, that's fine. Adding something without any meaning just confuses people. (see "if adequate" above).

UUID and bundle definitely. We don't have UUIDs on entities right now, but the Config Initiative has that as part of its roadmap so we should assume that it will happen. I've added those to the summary.

I agree we should plan for uuids. But it's not there yet, so we shouldn't include it in patches for now.

Bundle.. well, I'm not sure whether replacing every bundle key with "bundle" is a good idea. Why should something be called differently just because it's used as bundle property? If I have apples and they should have certain fields depending on the colour, that would mean I'd have to call the colour bundle. Uh?

vid: The current revision identifier
uid: The user ID of the user who created the entity.

I think at #1233394: [Policy, no patch] Agree on a property naming pattern we have an agreement that we want to do away with this abbreviated ids , but use properties like "user_id". Analogously, maybe revision_id or later revision_uuid would make sense to me.

It's terrible that Core right now has entities with bundles that don't map to columns. It breaks EntityFieldQuery and is just bad.

Yep, I agree we should enforce bundle keys to be "query-able".

I suppose nothing forbids us from standardizing on "bundle" as a property name in code, and a bundleLabel() method/property/whatever for the user-facing text (eg, nodes would use "Type"). I think the API consistency is more important, personally. The less work it is to support new "stuff" (fields or otherwise) on "entity interface using things", the better.

Still, we'd have to deal with the apple's "bundle" in code instead of its "colour", what makes it harder to write/follow that code. Also, having different UI labels and properties in code is bad for DX as there is no naive association between the UI element and the property any more.

Also, with #1184944: Make entities classed objects, introduce CRUD support we have a method to get the bundle, so why not just use it?

I'm all in for standardization but I think "bundle" is different here than "id", or "language". Id and language are properties that have the same meaning across all entity-types, so using a unique name makes much sense. But "bundle" is an attribute we assign to another property which already has its meaning. Who'd assume that $term->bundle would be the vocabulary?

Re 'status': I think it does make sense as a general entity property. There will always be entities for which this doesn't apply, but in general I think it does. To have one property to look at which tells you whether this entity can be viewed publically make sense in just about any kind of entity listing. This doesn't prevent user.module from having 3 (or 10, or whatever) statuses, we can just do if(!empty($entity->status)).
Even though taxonomy terms right now don't support this behavior, I think they should. On a newspaper website, for instance, if nodes go through a peer-review and moderation process, you could just as well have tags go through that process as well, before they are 'published' and publically viewable.

Not having a 'status' on entities makes something like #731724: Convert comment settings into a field to make them work with CMI and non-node entities just that much weirder. Comment module keeps statistics on all nodes (to be entities), but only published ones...

Using those sorts of flags for access control is vastly more difficult than you make it sound. :-) For one, it has to be done in the query for listings. empty($entity->status) is too late to do that check. It has to be in the SQL a la the node grants system.

Secondly, empty($entity->status) assumes that:
1) status 0 means "do not show to a certain class of user"
2) status !0 (including negative numbers) always implies "DO show this to all users"

Thirdly, a unified status field assumes that all keys have the same meaning across all entities. But which is closer to "unpublished", "user not yet approved" or "user blocked"? And whichever of those gets relegated to ID 2 would implicitly mean "published", yet I wouldn't consider either of those analogous to published (and therefore, by the empty($entity->status) check, accessible to the world).

Of course, an int value here should be used for only one of two things: boolean true/false (because MySQL has no boolean type) or an actual numeric value. If it's being used for an enumeration, then just storing an int is a 100% guaranteed way to introduce keyspace collisions and ugly bugs. As soon as you have the potential for multiple legit values, the potential legit values become infinite. Someone is going to add a user status of 4 to mean something relevant to them, and then some other module will do the same, and poof, those two modules are now not only incompatible but probably create a security risk. Doubleplusungood. :-)

The only way that a shared status column could work is if were a strict boolean, and that boolean had a very clearly defined meaning that we could logically apply to all entities. The only one I can think of is "available to all users, even anonymous" vs. "available only to selected administrators", aka core's node published state. However, we've already seen how that is insufficient as soon as you introduce any sort of robust workflow system so even there it's a bad fit.

Let's not make a standard out of something that doesn't even fit logically now. :-)

I agree, that once/if we have a robust workflow system in core, this whole discussion is moot, and that then 'status' will probably have to go. If there's so much support for dropping 'status', then I'll defer.
Updated the issue summary accordingly.

Issue summary:View changes

Add UUID and bundle to the list

I agree with Crell, status isn't logical nor sufficient enough to make sense at this point. I think we can find another approach for comment.module and #731724: Convert comment settings into a field to make them work with CMI and non-node entities to track published entities.

I'd like to make this a strict requirement. Introducing a vague recommendation won't help much, since we then will end up doing extra checks for everything anyway.

Re #17:
I think when we have more of the Entity CRUD and controllers in place, we will have easy access to id(), vid(), and bundle() anyway. Then, I think, we should/could add something like a EntityStatisticsController (statistics is still the wrong word...) that tracks and gives access to 'created', 'updated', etc. I think establishing a soft standard is a step in the right direction, though, even if it isn't the whole nine yards.

Issue tags:+D8MI

I wholeheartedly support adding language to the base schema. @bojanz: for the use cases where you have language independent entities, that is what LANGUAGE_NONE is for and the language would default to that unless told otherwise I think, so you should not be bothered that it is there. I think the 80% use case is that an entity will easily have some property or field that has information that is in some language, and at that moment, the language property gets meaning. I think it makes sense to have it there in the default schema for once someone decides to add a textfield to your entity, and bingo there goes the need for language tracking.

I wholeheartedly support adding language to the base schema.

Yes, a language callback as the one discussed in #1260640: Improve field language API DX could ignore/override this property in some contexts but it definitely needs to be there most of the time.

@bojanz: for the use cases where you have language independent entities, that is what LANGUAGE_NONE is for and the language would default to that unless told otherwise I think, so you should not be bothered that it is there.

Slightly OT but I think we will need also a LANGUAGE_UNSUPPORTED (zxx) value for this, see http://www.w3.org/International/questions/qa-no-language#nonlinguistic.

On only having the LANGUAGE_NONE property we have #965300: Change LANGUAGE_NONE to LANGUAGE_NOT_SPECIFIED; add LANGUAGE_NOT_APPLICABLE and LANGUAGE_MULTIPLE which might be the issue for it or at least related.

Issue tags:+language-content

Tagging for the content handling leg of D8MI.

I've just submitted "Drupal 8 entity property translation BoF" for Drupalcon Denver. If you are going to be there, please comment there so we can figure out the best time to do it.: http://denver2012.drupal.org/forum/drupal-8-entity-property-translation-bof Thanks!

Issue summary:View changes

Removed 'status' per consensus.

#6: entity-base-schema.patch queued for re-testing.

Status:Needs review» Needs work
Issue tags:+API clean-up, +D8MI, +language-content

The last submitted patch, entity-base-schema.patch, failed testing.

Issue tags:+UUID

Issue tags:-D8MI, -language-content

UUIDs and langcode were introduced elsewhere on all core entities. Removing this off D8MI.

Note that changed time tracking for stored core content entities is being proposed at #2074191: [META] Add changed timestamp tracking to content entities with a common interface to accompany it.

Issue summary:View changes

Updated issue summary.

Status:Needs work» Closed (duplicate)

I think we have this covered now through interfaces and field types (created with #2145103: Provide non-configurable field types for entity created, changed and timestamp fields) ?

Well, the introduction of some of those interfaces did not solve storing that data, eg. menu or taxonomy editing is still not accountable (we don't know users and time information). So I don't think this is solved, but it may be fair to say it is out of scope for D8.

Status:Closed (duplicate)» Active

I agree — we only standardized the supporting code via interfaces and other magic, but we didn't actually change all implementations and their schema/storage.

That said, at the same time, the base schema of e.g. node entities looks vastly different today compared to what is being proposed in the issue summary. Primarily due to architectural changes in the Entity Field API and Content Translation systems.

Out of all entity types, nodes and custom blocks are still the only ones that have revisions, whereas custom block versions do not record any additional information aside from the content itself (i.e., no authoring info, etc).

Configurable fields still have the same revisions as before, and we have converted (or are converting) more "default" fields of entity types into configurable fields, which might help with making more entity types versioned + follow the proposed standard schema here.

Lastly, there is a parallel track/effort that tries to auto-generate the base schema of entity type tables through their base field property definitions for D8 — in case that effort succeeds, the actual database schema would be completely decoupled, and theoretically, I'd assume it would be very simple to add missing standard properties to all entities.

I still think that this is one of the most important issues around for Drupal as a Content Management System product — because without this content entity standardization, we're lying quite a bit on the "management" part...

The concern of potentially missing standard properties on some core entity types just came up in #2060629: Remove hook_schema_alter() from core

Use-case being: (1) Core provides a base/default implementation for a certain entity type. (2) Contrib wants to advance on it. (3) But without having to replace it, its storage, and all data entirely.

The concrete example mentioned: The File entity does not have a 'bundle'. The contributed File Entity module added a 'bundle' column and used hook_schema_alter() to retain and expose it.

Thus, in order to retain extensibility and flexibility for contrib, every entity type in core should support the discussed set of base properties here. That guarantees that no module has to futz with the base entity schema in order to enable entity features that are available by default for other entity types.

What it won't allow is to add arbitrary custom schema fields to existing entity types, but I think there's a good level of agreement that this should not be supported in the first place, because the schema and data cannot be properly maintained — instead, you can implement custom field items for that use-case.

Re #33: With #1498720: [PP-2] Make ContentEntityDatabaseStorage handle changes in the entity schema definition you could just add the additional field (i.e. bundle) in hook_entity_field_info() and the storage controller would adapt the schema automatically.

That said, I would totally support adding common fields to all entity types, i.e. a bundle to the file entity, etc.