Download & Extend

Maintain common properties/statistics about entities

Project:Drupal core
Version:8.x-dev
Component:entity system
Category:task
Priority:normal
Assigned:Unassigned
Status:needs work
Issue tags:API clean-up, UUID

Issue Summary

Problem/Motivation

There is a certain set of properties that are not entity specific, but are shared across entities (or aren't part of the entity itself, depending on your viewpoint). node.module currently stores when a node was created and when it was updated, for example, while taxonomy.module doesn't do likewise. For modules who need to process this information (comment.module as a prime example) this should be standardized.

Proposed resolution

Standardize on a set of properties that all core entities share.
The current spec is (work in progress):

  1. id: The primary identifier
  2. vid: The current revision identifier
  3. uuid: The UUID of the entity
  4. bundle: The bundle of the entity
  5. langcode:The language code of the language the entity was created in.
  6. uid: The user ID of the user who created the entity. (By definition this only applies to user created entities, which arguably are most of them, though.) (Note that for users, this refers to the person that created the user itself, i.e. an administrator (or the user itself).)
  7. created: The time this entity was created.
  8. changed: The time this entity was last updated/saved.

To support these properties entities should utilize the entity_base_schema() helper function in their hook_schema() implementation. (The parameters and return value of entity_base_schema() have yet to be bikeshedded.)
There is no strict requirement that entities support these properties, but if possible they should. In how far this will apply to remote entities and/or will be made a requirement or be handled automatically by entity.module will be seen when more of the entity API has been fleshed out. For now this is just about standardization of existing (semi-)patterns.

Remaining tasks

Write the patch.

User interface changes

None.

API changes

The following now works:

<?php
if (in_array($entity, array('comment', 'file', 'node', 'taxonomy_term', 'user')) {
 
$entity->created;
 
$entity->changed;
 
$entity->language;
 
$entity->status;
 
// $entity->uid doesn't work with 'user' :(
}
?>

Renaming of existing keys (e.g. 'nid' -> 'id') is held off for #1233394: Agree on a property naming pattern.

Original report by tstoeckler

In #731724: Decouple comment.module from node by turning comment settings into a field it came up, that comment module needs to know when a given entity was created, or changed and/or by whom. Node module currently maintains that data itself, and comment.module sort of denormalizes it in {node_comment_statistics}. But this information isn't inherently part of the entity information itself. To recapitulate @Crell's famous example, the pizza doesn't have to know when it was made or by whom.

So the idea of this patch is to somehow store and manage these properties on all entities. Note that this is not about Metadata of the Entity itself á la #551406: Centralize entity properties meta data, but about the mentioned "meta-metadata" :). There might be more, but the following came up in #731724: Decouple comment.module from node by turning comment settings into a field:

  1. Author/Creator of an entity (prev. $node->uid)
  2. Creation time of an entity (prev. $node->created)
  3. Updated time of an entity (prev. $node->changed)

I'm willing to put some work into this, but I've only gotten a glimpse of the whole Entity API picture so far, so I'd like to discuss storage and implementation first.

Related:
#1233394: Agree on a property naming pattern

Comments

#1

Now that we have an entity.module, we could have this in entity.install:

function entity_base_schema() {
  $schema['id'] = array(
    'description' => 'The primary identifier.',
    'type' => 'serial',
    'unsigned' => TRUE,
    'not null' => TRUE,
  );
  $schema['vid'] = array(
    'description' => 'The current revision identifier.',
    'type' => 'int',
    'unsigned' => TRUE,
    'not null' => TRUE,
    'default' => 0,
  );
  $schema['language'] = array(
    'description' => 'The {languages}.language.',
    'type' => 'varchar',
    'length' => 12,
    'not null' => TRUE,
    'default' => '',
  );
  $schema['uid'] = array(
    'description' => 'The {users}.uid that owns this entity; initially, this is the user that created it.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 0,
  );
  $schema['status'] = array(
    'description' => 'Boolean indicating whether the entity is published.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 1,
  );
  $schema['created'] = array(
    'description' => 'The Unix timestamp when the entity was created.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 0,
  );
  $schema['updated'] = array(
    'description' => 'The Unix timestamp when the entity was most recently saved.',
    'type' => 'int',
    'not null' => TRUE,
    'default' => 0,
  );
  return $schema;
}

#2

Moving those node only properties to every entities is really a good idea. It should definitely be done!
But because the entities can have a different storage, how do you manage to store it for those who would not use the database?

#3

After a very enlightening chat with @sun and @fubhy, I added an issue summary, and am now off to see if I can cook something up. :)

#4

We should also be adding uuid to this list too.

#5

Also: bundle. This would also enforce the notion of every entity is published or unpublished which may not be desirable.

#6

Status:active» needs work

Bundle sounds good, I had pondered about that when I saw {node}.type.
UUIDS I'm not sure about. Currently no entities support UUIDS, and while it might not be that much work to support them, I don't think this issue should depend on it. I'll look into it.

This would also enforce the notion of every entity is published or unpublished which may not be desirable.

Could you give a reason why that might not be desirable? I can understand that it might not be needed in a lot of cases (e.g. disabled taxonomy terms), but I can't see what harm it would do.

Here's an initial patch, which implements the entity_base_schema() function as above and makes all core entities use it. The properties aren't properly maintained yet, though.

I read fubhy's post about entity controllers yesterday and I think in light of that, it might make sense to turn this into a separate "EntityStatistics" controller in the long run (I know statistics isn't the right word, but I haven't come up with the right one yet). We will have to see about that, though, when more of the bigger picture is in place.

AttachmentSizeStatusTest resultOperations
entity-base-schema.patch27.71 KBIdleFAILED: [[SimpleTest]]: [MySQL] Unable to apply patch entity-base-schema.patch. Unable to apply patch. See the log in the details link for more information.View details | Re-test

#7

Status:needs work» needs review

I agree with @Dave Reid, bundle + uuid should be part of the base schema.

For the time being, we could comment out the uuid definition, as I'm not sure whether there's any working code for uuids in core yet.

#8

UUID and bundle definitely. We don't have UUIDs on entities right now, but the Config Initiative has that as part of its roadmap so we should assume that it will happen. I've added those to the summary.

I'm actually not convinced on "status", because it could mean very different things in different entities. Nodes have a 2-state status right now (published, unpublished). Users (should?) have three (active, pending, banned). Taxonomy terms, uh, no idea.

As noted in #1233394: Agree on a property naming pattern I do not support folding all IDs to just "id", but that should be discussed in the linked issue.

#9

I'm actually not convinced on "status", because it could mean very different things in different entities. Nodes have a 2-state status right now (published, unpublished). Users (should?) have three (active, pending, banned). Taxonomy terms, uh, no idea.

Exactly. So -1 on status.
I have use cases for entities that are not language aware, I'd remove that from entity_base_schema() along with status.
Also, why do we always have "vid" if an entity might not be revisionable?

It's terrible that Core right now has entities with bundles that don't map to columns. It breaks EntityFieldQuery and is just bad.

#10

Standardizations like this are certainly a good idea. I like the idea of adding a schema-helper function for that.

There is no strict requirement that entities support these properties, but if possible they should.

Yep, requirements are a totally different topic. Still, I'd replace *if possible* with *if adequate*. If something doesn't make sense in a certain situation, it shouldn't be added.

@status
I don't like adding status to terms for the sake of standardizations. If terms have no status, that's fine. Adding something without any meaning just confuses people. (see "if adequate" above).

UUID and bundle definitely. We don't have UUIDs on entities right now, but the Config Initiative has that as part of its roadmap so we should assume that it will happen. I've added those to the summary.

I agree we should plan for uuids. But it's not there yet, so we shouldn't include it in patches for now.

Bundle.. well, I'm not sure whether replacing every bundle key with "bundle" is a good idea. Why should something be called differently just because it's used as bundle property? If I have apples and they should have certain fields depending on the colour, that would mean I'd have to call the colour bundle. Uh?

vid: The current revision identifier
uid: The user ID of the user who created the entity.

I think at #1233394: Agree on a property naming pattern we have an agreement that we want to do away with this abbreviated ids , but use properties like "user_id". Analogously, maybe revision_id or later revision_uuid would make sense to me.

#11

It's terrible that Core right now has entities with bundles that don't map to columns. It breaks EntityFieldQuery and is just bad.

Yep, I agree we should enforce bundle keys to be "query-able".

#12

I suppose nothing forbids us from standardizing on "bundle" as a property name in code, and a bundleLabel() method/property/whatever for the user-facing text (eg, nodes would use "Type"). I think the API consistency is more important, personally. The less work it is to support new "stuff" (fields or otherwise) on "entity interface using things", the better.

#13

Still, we'd have to deal with the apple's "bundle" in code instead of its "colour", what makes it harder to write/follow that code. Also, having different UI labels and properties in code is bad for DX as there is no naive association between the UI element and the property any more.

Also, with #1184944: Make entities classed objects, introduce CRUD support we have a method to get the bundle, so why not just use it?

I'm all in for standardization but I think "bundle" is different here than "id", or "language". Id and language are properties that have the same meaning across all entity-types, so using a unique name makes much sense. But "bundle" is an attribute we assign to another property which already has its meaning. Who'd assume that $term->bundle would be the vocabulary?

#14

Re 'status': I think it does make sense as a general entity property. There will always be entities for which this doesn't apply, but in general I think it does. To have one property to look at which tells you whether this entity can be viewed publically make sense in just about any kind of entity listing. This doesn't prevent user.module from having 3 (or 10, or whatever) statuses, we can just do if(!empty($entity->status)).
Even though taxonomy terms right now don't support this behavior, I think they should. On a newspaper website, for instance, if nodes go through a peer-review and moderation process, you could just as well have tags go through that process as well, before they are 'published' and publically viewable.

Not having a 'status' on entities makes something like #731724: Decouple comment.module from node by turning comment settings into a field just that much weirder. Comment module keeps statistics on all nodes (to be entities), but only published ones...

#15

Using those sorts of flags for access control is vastly more difficult than you make it sound. :-) For one, it has to be done in the query for listings. empty($entity->status) is too late to do that check. It has to be in the SQL a la the node grants system.

Secondly, empty($entity->status) assumes that:
1) status 0 means "do not show to a certain class of user"
2) status !0 (including negative numbers) always implies "DO show this to all users"

Thirdly, a unified status field assumes that all keys have the same meaning across all entities. But which is closer to "unpublished", "user not yet approved" or "user blocked"? And whichever of those gets relegated to ID 2 would implicitly mean "published", yet I wouldn't consider either of those analogous to published (and therefore, by the empty($entity->status) check, accessible to the world).

Of course, an int value here should be used for only one of two things: boolean true/false (because MySQL has no boolean type) or an actual numeric value. If it's being used for an enumeration, then just storing an int is a 100% guaranteed way to introduce keyspace collisions and ugly bugs. As soon as you have the potential for multiple legit values, the potential legit values become infinite. Someone is going to add a user status of 4 to mean something relevant to them, and then some other module will do the same, and poof, those two modules are now not only incompatible but probably create a security risk. Doubleplusungood. :-)

The only way that a shared status column could work is if were a strict boolean, and that boolean had a very clearly defined meaning that we could logically apply to all entities. The only one I can think of is "available to all users, even anonymous" vs. "available only to selected administrators", aka core's node published state. However, we've already seen how that is insufficient as soon as you introduce any sort of robust workflow system so even there it's a bad fit.

Let's not make a standard out of something that doesn't even fit logically now. :-)

#16

I agree, that once/if we have a robust workflow system in core, this whole discussion is moot, and that then 'status' will probably have to go. If there's so much support for dropping 'status', then I'll defer.
Updated the issue summary accordingly.

#17

I agree with Crell, status isn't logical nor sufficient enough to make sense at this point. I think we can find another approach for comment.module and #731724: Decouple comment.module from node by turning comment settings into a field to track published entities.

I'd like to make this a strict requirement. Introducing a vague recommendation won't help much, since we then will end up doing extra checks for everything anyway.

#18

Re #17:
I think when we have more of the Entity CRUD and controllers in place, we will have easy access to id(), vid(), and bundle() anyway. Then, I think, we should/could add something like a EntityStatisticsController (statistics is still the wrong word...) that tracks and gives access to 'created', 'updated', etc. I think establishing a soft standard is a step in the right direction, though, even if it isn't the whole nine yards.

#19

Issue tags:+D8MI

I wholeheartedly support adding language to the base schema. @bojanz: for the use cases where you have language independent entities, that is what LANGUAGE_NONE is for and the language would default to that unless told otherwise I think, so you should not be bothered that it is there. I think the 80% use case is that an entity will easily have some property or field that has information that is in some language, and at that moment, the language property gets meaning. I think it makes sense to have it there in the default schema for once someone decides to add a textfield to your entity, and bingo there goes the need for language tracking.

#20

I wholeheartedly support adding language to the base schema.

Yes, a language callback as the one discussed in #1260640: Improve field language API DX could ignore/override this property in some contexts but it definitely needs to be there most of the time.

@bojanz: for the use cases where you have language independent entities, that is what LANGUAGE_NONE is for and the language would default to that unless told otherwise I think, so you should not be bothered that it is there.

Slightly OT but I think we will need also a LANGUAGE_UNSUPPORTED (zxx) value for this, see http://www.w3.org/International/questions/qa-no-language#nonlinguistic.

#21

On only having the LANGUAGE_NONE property we have #965300: Change LANGUAGE_NONE to LANGUAGE_NOT_SPECIFIED; add LANGUAGE_NOT_APPLICABLE and LANGUAGE_MULTIPLE which might be the issue for it or at least related.

#22

#23

Issue tags:+language-content

Tagging for the content handling leg of D8MI.

#24

I've just submitted "Drupal 8 entity property translation BoF" for Drupalcon Denver. If you are going to be there, please comment there so we can figure out the best time to do it.: http://denver2012.drupal.org/forum/drupal-8-entity-property-translation-bof Thanks!

#25

#6: entity-base-schema.patch queued for re-testing.

#26

Status:needs review» needs work

The last submitted patch, entity-base-schema.patch, failed testing.

#27

Issue tags:+UUID

#28

Issue tags:-D8MI, -language-content

UUIDs and langcode were introduced elsewhere on all core entities. Removing this off D8MI.

nobody click here