Hi all,
uuids for entities are needed in Drupal 8 for content staging and other initiatives.
The generation and validation of uuids has already been implemented by dixon.
What's still missing is to implement the generation of uuid for each core entity.
Unknowing what was awaiting me, i've yesterday been conscripted by sun and fago for this task during the DrupalDevDays Barcelona code sprint(s).
Find (in a moment) below a patch against Drupal 8 dev branch of this morning with a first approach for implementing uuid in nodes for quick evaluation (before i continue with the other node types), including a uuid schema definition and updating in node.install, property definition in Node class and uuid generation in NodeStorageController, as well as a new entity_load_by_uuid function in entity.module.
If this (rather trivial) approach is okay, i continue with the other core entity types (terms, files, users, comments, probably not vocabularies, they already have a machine name).
Please review and comment as quick as possible <:)
Comment | File | Size | Author |
---|---|---|---|
#42 | config.uuid_.42.patch | 21.96 KB | sun |
#40 | config.uuid_.40.patch | 21.96 KB | sun |
#40 | interdiff.txt | 1.97 KB | sun |
#37 | config.uuid_.37.patch | 22.59 KB | sun |
#37 | interdiff.txt | 14.21 KB | sun |
Comments
Comment #1
danielnolde CreditAttribution: danielnolde commentedAttached the patch.
Comment #2
fagoThe data type here should be EntityInterface|FALSE. The sentence should say "entity with the given UUID".
I don't think we need that much @see here. Only include relevant stuff.
Problem is, that won't work with all entity types. So we have to document that. Also, entity types need to be able to "opt-in" to support uuids. So we probably want to add an "uuid" entity key for that to the entity-keys section of hook_entity_info()?
The result is actually no entity, so the variable should not say so. Use e.g. $result.
Indeed, ouch.
Unnecessary indentation.
Comment #3
fagoImproved issue title + added our sprint tags
Comment #4
danielnolde CreditAttribution: danielnolde commentedThanks for the feedback, Wolfgang, the code improvement suggestions and raising the issue of incorporating an additional key type for this via hook_entity_info. Good point!
Attached is a new improved version including all those things - still only for the node entity (will be extended soon to other core entity types => further feedback very welcome!).
Comment #5
fagoLet's better not add this, as we do not have something like that in core yet. Or maybe say it could be used by a content staging mechanism.
Also, I guess we should make more clear it's optional. Maybe best just follow the notion as we use for @parameter definitions:
- uuid: (optional) The name ...
Comment #6
danielnolde CreditAttribution: danielnolde commentedtrue.
Okay, find attached the complete patch with uuid introduction for every core entity type that probably should carry a uuid: node, taxonomy_term, user, comment and file.
Have at it, test it (automated tests are still missing, coming if everything is seen ok).
Comment #7
BerdirComment #8
sunstring
I expected to find this in the generic/universal DatabaseStorageController for all entities.
The uuid property is as custom as the already existing langcode property. We definitely want it on all entities in Drupal by default.
36 sounds too small to me. I think I've seen uuids using 64 to 128 chars already.
We should take into account that a uuid might have been generated by another system than Drupal, possibly not even running on PHP.
I'd therefore go with 128 chars, sounds like a sensible limit.
(and elsewhere):
1) "a" instead of "an".
2) "UUID" is an acronym and should be all-uppercase.
"@todo", all lowercase, without colon.
Let's create a follow-up issue for auto-generic uuids for all existing entities; i.e., the upgrade path. I don't think that has to be done as part of this patch.
The first two arguments, plus the opening "array(" should be all on one line.
Lastly, by executing this code in ::create(), it looks like we do not have to account for the case in which a uuid already exists (which would be the case for ::save()). However, I wonder whether it might be required to add some "failsafe" logic to the save() method as well, in order to ensure that every entity that is saved has a uuid (and if not, create one).
Speaking of unforeseen possibilities, we at least need some basic tests here.
I'd love to see this patch land this weekend. It's ultimately the very very first, most trivial corner-stone for the (much larger and much more complex) content staging efforts.
Comment #9
dixon_I'd actually argue that we should make UUID a requirement for all entity types. Why? Because if we want Drupal to become truly interoperable system built on RESTful principles, we need a unique way of identifying our resources across other systems. And requiring UUIDs for entities would not mean too much of an overhead.
I'll look at the actual patch more in detail soon.
Comment #10
danielnolde CreditAttribution: danielnolde commented@sun: As discussed yesterday, i'll work all those suggestions and corrections in this weekend, so this should be able to be RTBC'd as of tomorrow=sunday.
@dixon: Best wait for tomorrows patch containing sun's input...
thanks for all that prompt and comprehensive input, guys, i learned much once again! :)
Comment #11
danielnolde CreditAttribution: danielnolde commentedFind attached a new patch with sun's input incorporated.
@Dixon: That's a version you now can review ;)
What's missing:
* Tests (sun pointed out that it would be sufficient to extend an existing basic entity test with some lines of code for creation>retrieval>validation of UUIDs)
* More thoughts about how far the automation of UUID creation in DatabaseStorageController should go, e.g. if the "uuid"-type entity key should be considered automatically for saving UUIDs with every entity, and so on...
Comment #13
fagoI agree, we want UUIDs. As discussed, it makes sense to add the uuid property by default to our entity class. However, we cannot enforce an entity type to provide an uuid as some integrated data you cannot control just might not have it. E.g. consider integrating a 3rd party service by exposing some remote entities - we cannot enforce any schema or properties on that.
Comment #14
Damien Tournoud CreditAttribution: Damien Tournoud commentedI think we should make UUID mandatory, and I also think we should remove the internal ID at the same time. The difference in performance is likely going to be negligible for most use cases.
I think we can make it mandatory across the board. There are many ways to generate UUIDs based on another key you have. UUID version 3 and 5 are precisely meant to generate a UUID based on an arbitrary URL or object identifier.
Comment #15
fagoInteresting. However, we still would have to store that UUID in order to be able to remote entity? That wouldn't work as we are not necessarily notified when a new remote entity is added...
Comment #16
Damien Tournoud CreditAttribution: Damien Tournoud commentedYou can easily manage a map of UUIDs to your external object identifier to support loading by UUID. But obviously, you have to discover an external object to be able to generate its UUID and reference it. You cannot reference an object that you have not discovered yet :)
Comment #17
sunWe should do this simple patch first and have the discussion about potentially replacing serial IDs in a separate follow-up issue. Replacing serial IDs with UUIDs has many more implications that need to be discussed and tested than just adding UUIDs.
Comment #18
dixon_I agree with sun. Let's get UUIDs in for core entities first. Then we can create follow-ups.
Comment #19
BerdirSimple re-roll of the patch in #11, the update function conflicted.
Comment #20
danielnolde CreditAttribution: danielnolde commentedThanks for adjusting the conflicting update function, Berdir ... core modules are quite a moving target right now... <:)
Comment #21
sunThanks!
Unless I blatantly overlooked something, we're still missing tests?
Also, a more in-depth review:
I don't really understand why we're defining these separately for each entity class. We already have a universal $langcode property for all Drupal entities and I would treat the $uuid property in the exact same way. By defining the property on the base Entity class, we semi-enforce it for all entities, which I believe is what we want. I'm not sure whether custom entities currently have any way to opt-out of the $langcode property handling, but regardless of whether there is one or not, the way for opting out of or working around the equally universal $uuid property would be identical.
"Loads an entity by UUID."
The first sentence can be dropped.
The second should be turned into:
(and elsewhere) "UUID" should always be all-uppercase.
Extraneous blank line at the end should be removed.
1) I don't think we want the first is_array() check, since that will potentially hide bad configuration and code under the hood.
2) The second isset() condition is sufficient; it implicitly covers the first. isset() does not throw an error if a parent key does not exist.
1) Doesn't this entire loading logic belong onto the entity CRUD controller? The other entity loaders in entity.module are all calling into
entity_get_controller($entity_type)
. It looks like this procedural helper should do that as well.2) In any case, EntityFieldQuery should be imported with a use statement at the top of whichever file it is used, so that this particular line here only calls
new EntityFieldQuery()
.3) Do we actually need EFQ here in the first place? The query runs against the entity's base table only, so I don't really understand why it's needed here and why a simple db_query() or db_select() isn't sufficient. The other CRUD operations on the entity controller are using simple/regular db queries, too. As long as that is the case, I think we should do the same here. It also looks like we can leverage and re-use buildQuery() on the default DatabaseStorageController to execute the query. And of course, that will also automatically add static caching of queried entities when appropriate.
4) Speaking of ::buildQuery() and ::load(), this load by UUID does not invoke any module hooks. As long as the regular entity load methods are invoking hooks, this loadByUuid() should do the same and invoke hooks.
1) (and elsewhere) "ID" should equally always be written all-uppercase.
2) This comment is very cryptic/sparse. I've to read the code to understand what it tries to say.
Even though this code might cease to exist, if we're going to move this code to the CRUD controller:
The array_keys() looks superfluous to me; perhaps even bogus. It looks like what you want is
I'm not sure whether we need separate data types here. An empty string should be sufficient to denote that there is no property/key for a UUID.
It can default to an empty string.
Conditions that are checking the property may use a strict string comparison of
$entity->$uuidKey !== ''
to determine whether there is a key defined. (which would account for the case of a key of "0", even though that would be a very wonky property name in the first place)If the passed in $values already contain a UUID, then we don't want to overwrite that with a new.
If we need to account for URLs as custom auto-generated UUIDs for entities that do not have one as @Damien Tournoud suggests, then we might have to increase the length to 255. We might want to defer that discussion and decision to a follow-up issue though.
Comment #22
danielnolde CreditAttribution: danielnolde commentedwow, slow down sun, and keep it digestible. Hard to adress the individual feedback with this mass and without structure.
Okay, i'll try:
1. $uuid property per entity class
> I don't really understand why we're defining these separately for each entity class.
This follows the comparable language property (which is defined in each Entity class). I don't have any feelings about that.
2. wording
For all remaining the mere wording changes, please provide changed patch yourself, to keep the overhead and unnecessary double-work to a minimum.
3. Use entity_get_controller
> $entity_query = new Drupal\entity\EntityFieldQuery();
> 1) Doesn't this entire loading logic belong onto the entity CRUD controller?
> The other entity loaders in entity.module are all calling into entity_get_controller($entity_type). It looks like this procedural helper should do that as well.
Good point.
Could you elaborate more on how close the new load_by_uuid() method should mimic the load() method?
4. entity_load_by_uuid
a) EFQ makes big sense imho. Remember this is particular function is not about core entities only and has to support _any_ method of entity retrieval/storage possible.
b) EFQ import via use statement: Okay
c) hooks: What hooks do you see are necessary here to invoke?
d) move to CRUD controller: Can you elaborate, i miss out on some background here.
e) reset() instead of array_keys() => your code replacement doesn't work, please state whether you tested suggested code or not.
5. $uuidKey property can be FALSE
This is just cargoculted from $revisionKey (we should have the same mechanic in both cases). So same reasoning, whatever that may be ;-)
6. Respect $uuid in create functions
True.
7. uuid field length
I agree that we wait for some definite input here, and stop for now the (guess)work regarding this.
8. what's more
8a) Shouldn't the module's hook_update_N be somehow doublecheck if the columns to add exist before calling db_add_field ?
Find attached an updated patch, with most of your suggestions incluced. The move load_by_uuid() method in DatabaseStorageController however is work in progress a.t.m. (works, but needs work and your input, see 3. above).
Comment #23
sunOK, looked into this code a bit more myself.
The current situation with entity storage controllers + deprecated $conditions + EFQ is anything but sane and clean, and really makes this patch much more ugly than it would have to be. There's a lot to figure out in that space, but that has absolutely nothing to do with this issue.
Thus, I'm going to punt all of that to separate issues for entity system maintainers to figure out. Attached patch circles back to the previous approach of keeping the logic in the procedural entity_load_by_uuid() function, and still uses EFQ, since the $conditions parameter is deprecated, but intentionally adds a very verbose @todo comment to clarify that this kind of entity API is NOT acceptable in any way.
But as said, all of that is irrelevant for this issue. The clear and focused scope is to add UUIDs to all core entities, and we are going to do that with the current API that is available to us.
Still needs tests.
Comment #24
djdevinWriting tests.
Comment #25
djdevinFound a little bug while testing, using $this->entityType instead of the $entity_type in the parameters.
Comment #26
djdevinTest for saving UUIDs to nodes attached. Do we need tests (probably) for the other core entities?
Comment #27
sunThanks for working on tests!
However, it should go into a new test case class in core/modules/entity/lib/Drupal/entity/Tests and use the entity_test module's "entity_test" entity type instead of Node.
Apparently, we forgot to add the UUID to entity_test's schema ;)
Comment #28
djdevinAh yeah I was wondering what this should go under, that sounds good! Wrapping my head around the test structure :) Will fix and reroll.
Comment #29
BerdirNot sure I see the problem with the amount of lines of code in the load_by_uuid() function. Yes, $conditions vs. EFQ is a mess, but other than that:
- The first few lines check if the entity type supports uuid's, required, in that function, as long as we don't assume that every entity has an uuid.
- We could think about not adding the $reset argument and instead have a separate entity_load_reset($entity_type) function, or just call resetCache() on the controller. That would save the additional argument in all load functions.
- A generic loadbyProperty() method could not assume that the property is unique, so it would have to return an array of results, which we'd have to check again.
- So the only thing that a load function really could do is create an EFQ with the property and then load them all.
Also, the @todo comment that describes how bad this is is almost longer than the actual code ;)
I think what we are trying to do is making EFQ more flexible so that it can return (possibly lazy loaded) entity objects. The problem with having something like loadByProperty() imho is that the arguments would either get very complex (multiple properties, conditions other than =) or just serve the basic use case (single property that equals a given value) and everything else still needs EFQ, so you need to decide which API you want to use and need to learn both.
To improve EFQ in another way, since you use it in 90% of the cases with a given entity type, we could add a simple wrapper function for that, think of something like this:
Not sure about the details, but basically execute() would return something similar to what db_query() returns, a class with methods, not just an array. Or maybe a loadFromEFQ($query->execute()) on the controller.
Anyway, this is getting off-topic :)
Comment #30
fagoEFQ definitely needs some love, but let's discuss that in its own issue. For simplifying the load-by-property case, see #1184272: Remove deprecated $conditions support from entity controller.
I don't think we need to bloat the control with lots of handy-methods like loadbyProperty(), but add an optional? query() method for executing entity-queries. Then, a simple helper for entity_loady_by_property() that just calls that should be fine.
Comment #31
sunAlrighty, looks like this is very close to RTBC, as soon as there are proper tests. Also happy to scale down the added @todo, as long as #1184272: Remove deprecated $conditions support from entity controller stays (at minimum) major.
@djdevin: Any news on the revised tests?
Comment #32
Damien Tournoud CreditAttribution: Damien Tournoud commentedThat looks wrong for two reasons:
As a consequence, I suggest:
(also notice the wording of the description for consistency with the description of the primary key)
Comment #33
djdevin@sun yes will have them up today, stupid me left work at home.
#32 makes sense.
Comment #34
djdevinPatch combined with tests in core/modules/entity/lib/Drupal/entity/Tests/EntityUuidTest.php
Includes entity_load_by_uuid() fix in #25
Includes changes to entity_test to add a uuid column
Does not contain #32, yet.
I kept a Node test because we might actually want to test uuid creation not just on a test entity but also all things that we support in core with this patch (comments, taxonomy, nodes). Thoughts?
Comment #35
dixon_We definitely need
entity_save_by_uuid()
as well. But let's take that in a follow-up issue.I will take a ride with the latest patch later today.
Comment #36
danielnolde CreditAttribution: danielnolde commentedWow, much happened in the meantime (while i was away :) - glad so many people work on this now, cool!
It's not very relevant i guess, but i find all things pointed out and discussed to make very good sense, especially the issues around entity_load_by_uuid() (nice minor improvements there, anyway, cool) and the tests cover pretty much everything that makes sense to be tested a.t.m. regarding UUIDs, great.
@dixon_: Hope all this helps to get content staging a little step forward.
@sun: Can't wait to see this committed to core (oh boy, would be my first core contribution :))
Comment #37
suna16ebc4 Cleaned up test code.
338a118 Changed uuid columns to be unique keys and allow for NULL values.
564839c Rewrote Entity UUID test case.
Comment #38
sunI believe that all concerns have been addressed. Can we move forward here?
Comment #39
fagoI've reviewed the patch, it looks already pretty good to me. Here some minor stuff:
@see should be at the bottom of the comment block.
Should say it throws that with @throws ?
Also, imo the #1184272: Remove deprecated $conditions support from entity controller related comment is a bit verbose - potential solutions belong to the issue instead.
Comment #40
sunThe remark about @see is not true. @see can and should be used where it makes sense within the context of the phpDoc; i.e., where appropriate. Regardless of that, I moved it to the bottom.
On @throws, we only seem to document that on OO methods currently. Nevertheless, I've added it.
Also shortened the @todo.
b75b1a2 Cleaned up phpDoc of entity_load_by_uuid().
Comment #42
sunMerged HEAD.
Comment #43
moshe weitzman CreditAttribution: moshe weitzman commentedCode looks quite solid to me. Great to see this moving ahead.
Comment #44
Dries CreditAttribution: Dries commentedIt's not clear why we create the UUID in the
DatabaseStorageController
. It feels like it has to be created earlier in the stack; the UUID could come in handy before the entity is being stored.In fact, it would make the most sense to create and set the UUID in
Entity::__construct()
.The Entity class could then be extended with a
getUUID()
method, just like Entity already has aid()
method.(The
id()
method should really be calledgetId()
or at some point, evengetDatabaseStorageId()
because that is what it is.)Comment #45
BerdirThat method *is* called when a new entity is created: EntityStorageControllerInterface::create(), called by entity_create(). It is the correct place for that logic.
A method to get the UUID sounds useful, how the id() method is called is obviously off-topic here, but I strongly disagree with getDatabaseStorageId(), the entity class is not supposed to care that it is stored in the database.
Comment #46
effulgentsia CreditAttribution: effulgentsia commentedWhy is generating a UUID specific to storage? If DatabaseStorage is replaced with MongoStorage, do we want the logic for generating the UUID to be different? If so, then #42 is correct. Otherwise, #44's suggestion of Entity::__construct() seems more logical to me.
Comment #47
BerdirEntity::__construct() is called every time an object is instantiated, this includes loading them.
If create() would be the wrong place, then FileStorageController::create(), NodeStorageController::create() and all those others would be wrong as well.
I agree that having a create() method in the *storage* controller is a bit strange, but it's currently the only controller we have and the definition is that the entity should only be about data and all logic should be in the controller(s).
Comment #48
sun@Berdir sufficiently clarified how this code is called already; I've nothing to add to that - other than that the tests are verifying the expected generation of UUIDs already.
There's some hope that the entity system improvements that are happening in parallel currently will allow us to move this code to a better location in the end, but until then, this patch follows the current architectural design of the entity system.
Additionally, friends, this is just the very very first and minimal step to bring UUIDs into core ;) I have no doubt that this code will be revised, potentially even a lot, as we continue to properly integrate UUIDs throughout core and get to the actual, true content staging challenges.
There's a lot of work to do to make this happen.
Comment #49
fagoI agree with berdir and sun, the patch looks good and follows the current way of doing things! :-)
Entity storage controllers contain all the storage logic, what it includes auto-populating properties like $node->created or UUIDs. We might want to consider separating storage-engine related stuff out of the storage-controller though, so that we ease swapping out the storage-engine only. But that's another issue.
Comment #50
Damien Tournoud CreditAttribution: Damien Tournoud commentedI'm fine with this as well. Seems like a good first step.
Comment #51
moshe weitzman CreditAttribution: moshe weitzman commentedOK, lets commit and work in a follow-up to trim DatabaseStorageController to be more focused on storage. Would be great to create some follow-ups so we know next steps for UUIDs (beyond the upgrade path for existing entities).
Comment #52
Dries CreditAttribution: Dries commentedI don't feel like anyone had a solid answer to my question in #44 (or Alex's in #46). At the same time, everyone seem to be ok with the current weird behavior in NodeStorageController::create(). All things considered, I committed the patch.
I have two requests though:
Thanks!
Comment #53
BerdirThis needs a change notice then...
I also just noticed #1717678: Entity::createDuplicate() does not account for uuid property while porting a module to 8.x, crazy timing ;)
Comment #54
fagoAs said, entity storage controllers contain all the storage logic of an entity type. So this is not only for create, pre-populating keys during pre-save also happens there. That's so since we introduced that in #1184944: Make entities classed objects, introduce CRUD support where we discussed having methods in the Entity class vs in the Controller and came to the conclusion that a) we need a place for multiple-handling as well and b) making storage truely swappable involves more than that and probably should happen on another layer.
That said, I think the method is fine where it is, but if that needs more discussion let's have it in a separate issue.
Comment #55
moshe weitzman CreditAttribution: moshe weitzman commentedI notice that entity_load_by_uuid($entity_type, $uuid) requires both the entity_type and the uuid. Ideally, one could identify the entity based on just the uuid. Thats kinda the point of universally unique. One bad side effect is that one can't determine easily which entity corresponds an url like
content/[UUID]
Comment #56
catchFor that to happen you'd need to be able to figure out where a uuid is stored without needing to know the entity type.
We currently assume that entities are stored independently of each other, so $entity_type allows you to know where the storage is. I don't see us moving all entities into a single table.
The other option is a global uuid lookup table with uuid, entity_type, entity_id. That'd be a massive table and is easy enough to implement in contrib if people want the same url for returning completely different and unrelated things.
Comment #57
moshe weitzman CreditAttribution: moshe weitzman commentedYeah, we should consider adding a uuid lookup table to core in addition to the uuid column in the entity base table. Perhaps we wait and see if the need becomes real. Related - http://groups.drupal.org/node/246748 (content deployment use cases)
Comment #58
fagoadded #1720784: Add method for getting an entity's UUID
Comment #59
sunComment #60
BerdirAnother one: http://drupal.org/node/1721572
No idea what I could possibly write more into that. It's simple, there's no before/after code examples. I also added the uuid() method issue, so that we can extend it once that is in.
Comment #61
sunThanks!
I think that's sufficient for now. That gives a nice container, and I'm confident we'll adjust this code through other issues anyway before the release.
Comment #62
moshe weitzman CreditAttribution: moshe weitzman commentedThe global lookup table mapping UUID to entity (in addition to the entity specific column) would be analogous to the global url alias table. That becomes crucial if we change system urls to reference UUID instead of nid, uid, etc. If we do that, we might have to dereference URLs in a service like #1269742: Make path lookup code into a pluggable class.
Comment #63
sunFWIW, I've tagged all related issues with "UUID".
Regarding a meta issue with a more concrete roadmap, there seems to be #1540656: [META] Entity Serialization API for web services (e.g. content staging) — though I'm not 100% sure whether that is "current" and properly maintained. However, I guess it would make sense to add the suggested follow-up issues to its summary.
Thanks all!
Comment #64
btopro CreditAttribution: btopro commentedPointing to one approach to the UUID vs entity ID based URL topic -- https://drupal.org/project/uuid_redirect
Personally would rather see uuid take the place of nid in terms of default path generation. wysiwyg link projects and authors manually referencing addresses within textareas would be easier to transition via feeds and exportables without the need for having to skim content and do the rewrite or have conversations with content authors about how to reference material.
Comment #65
sunCreated the follow-up issue for the rather radical idea that (also) came up here:
#1726734: Replace serial entity IDs with UUIDs in URLs, or even globally?
Comment #67
Jānis Bebrītis CreditAttribution: Jānis Bebrītis commentededit: wrong issue
Comment #68
Jānis Bebrītis CreditAttribution: Jānis Bebrītis commentedComment #69
colanI just created the missing follow-up #3157880: Determine where entities are stored without knowing their types as discussed above from #55 to #62.