Stop using field name in the Field API
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | field system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | bjaspan |
| Status: | won't fix |
In #364620: Allow creating a field with a deleted field's name we changed fields and instances to be identified by a serial primary key instead of their name because it was necessary to allow field names to be created with the same name as a previously deleted but not yet cleaned up field. However, we left most of the Field API working in terms of "field name" as a fake primary key for non-deleted fields.
Now, unsurprisingly, the time has come to go the rest of the way and stop using field name as a fake primary key at all. We have to replace all use of field name in API functions with field id. I'm posting a full explanation of how this affects the bulk delete problem in #367753: Bulk deletion (I haven't written it yet), but for now I'll post a short excerpt that summarizes the problem:
<?php
// Retrieve all deleted field instances.
$instances = field_read_instances(array('deleted' => 1), array('include_deleted' => 1, 'include_inactive' => 1));
foreach ($instances as $instance) {
// Retrieve some pseudo-objects along with field data for the
// instance.
//
// TODO: We can't query by $instance['field_name'] because the
// field and/or instance has been deleted and another with the
// same name may exist. We have to query by instance id,
// but the entire Field API is based on field name. Now we need
// to change that.
$obj_types = field_attach_query($instance['field_name'], array('bundle' => $instance['bundle'], 'deleted' => 1), 10, FIELD_QUERY_RETURN_VALUES);
?>
#1
Here are functions that need to be updated:
<?php% grep -r 'function.*field_name' *
field.api.php:function hook_field_attach_pre_query($field_name, $conditions, $result_format, $age, &$skip_field) {
field.api.php:function hook_field_storage_query($field_name, $conditions, $result_format, $age) {
field.api.php:function hook_field_storage_delete_field($field_name) {
field.api.php:function hook_field_storage_delete_instance($field_name, $bundle) {
field.attach.inc: $results = $function($obj_type, $grouped_objects[$field_name], $field, $grouped_instances[$field_name], $grouped_items[$field_name], $a, $b);
field.attach.inc:function field_attach_query($field_name, $conditions, $limit, $result_format = FIELD_QUERY_RETURN_IDS, $age = FIELD_LOAD_CURRENT) {
field.attach.inc: $results = $function($field_name, $conditions, $result_format, $age, $skip_field);
field.attach.inc:function field_attach_query_revisions($field_name, $conditions, $result_format = FIELD_QUERY_RETURN_IDS) {
field.crud.inc:function field_read_field($field_name, $include_additional = array()) {
field.crud.inc:function field_delete_field($field_name) {
field.crud.inc:function field_read_instance($field_name, $bundle, $include_additional = array()) {
field.crud.inc:function field_delete_instance($field_name, $bundle) {
field.form.inc:function field_add_more_js($bundle_name, $field_name) {
field.info.inc:function field_info_field($field_name) {
field.info.inc:function field_info_instance($field_name, $bundle_name) {
modules/field_sql_storage/field_sql_storage.module:function field_sql_storage_field_storage_delete_field($field_name) {
modules/field_sql_storage/field_sql_storage.module:function field_sql_storage_field_storage_purge($obj_type, $object, $field_name) {
modules/field_sql_storage/field_sql_storage.module:function field_sql_storage_field_storage_query($field_name, $conditions, $result_format, $age) {
modules/field_sql_storage/field_sql_storage.module:function field_sql_storage_field_storage_delete_instance($field_name, $bundle) {
?>
#2
I'm working on this.
#3
Here is my first pass at this. All the tests passed on my system until I ran 'cvs update'. Apparently HEAD is quite broken at the moment and I now get some exceptions, but they look like they are coming from simpletest, so I'm going to assume it is not from this patch for now.
The changes here are pretty straightforward. Basically any API function that used to take a $field_name now takes a $field_id. This includes field_create_field(), field_create_instance() (in the 'field_id' property instead of the 'field_name' property), and field_info_*(). field_info_*() (and field_read_*() also returns fields keyed by id instead of name.) This affects a lot of places in the code and tests, but the changes are basically all one-liners or the equivalent.
The parts of the API that are still based on field name are object properties (e.g. $node->body, not $node->2), form elements, rendering, and perhaps a few things I'm forgetting. There is also some internal code that still uses field name where it could/should use field id, but still field names are in fact still unique among non-deleted fields, it works fine. We can/should clean these up, possibly in this patch or possibly in others.
I realized that this change is not actually *required* for bulk delete to work because we could just make field_attach_query() operate on field id, leaving everything else using field_name. However, that would be confusing and inconsistent, and anyway changing the API to use field id is the correct thing to do, for the same reason the taxonomy API works in terms of term id and not term name.
The effort involved in maintaining this patch across other Field API changes (think "merge conflicts") will be substantially larger than the effort for other Field API code to adapt to using it (think "a few one-line changes"). Therefore, if we're going to do this (and I think we should), I request that we suspend all other Field API patches until this is done.
#4
"We can/should clean these up" refers to places using field name internally when they "should" be using field id (even though both work).
Object properties, form elements, rendering, "and perhaps a few things I'm forgetting" should and need to remain based on field name. Basically, "field name" is now no more or less than "how this field will be attached to objects, forms, and rendered output." It is a property that affects how the field interacts with the outside world, instead of being the effective primary key. So I did not mean to say those uses should be cleaned up, because they are correct.
#5
Ooops, fix a few things I missed, mostly in field.api.inc. Now we have:
<?php% grep -r 'function.*field_name' *
field.attach.inc: $results = $function($obj_type, $grouped_objects[$field_name], $field, $grouped_instances[$field_name], $grouped_items[$field_name], $a, $b);
field.form.inc:function field_add_more_js($bundle_name, $field_name) {
?>
The field.attach.inc case is one of the places that can/should be cleaned up; field_attach_load_multiple() is still using field name internally. field_add_more_js() is a callback getting its args from the URL, so $field_name is correct there (I suppose changing that to use field id would be another possible but not mandatory cleanup).
All other functions taking field_name are updated.
#6
I cleaned up the field_attach_load_multiple() code to use field_id instead of field_name. The includes changing the $options parameter to accept a 'field_id' key instead of a 'field_name' key, so field_attach_query() is updated too.
#7
The last submitted patch failed testing.
#8
oops
#9
As explained in #3, I've asked Dries and webchick not to commit any Field API patches until this one gets in, or until we decide that it should not for some reason. So, let's get those reviews going! :-)
#10
It's a big patch but all the individual changes are very simple. Didn't spot anything obvious but equally don't have time right now for a thorough review. But given the tests pass I can't see an issue. Also I'd really love to see body as field in quick so if someone else can give this a once over let's unblock everything else.
#11
- (minor) field.api.php: $field_name remains in the PHPdoc for hook_field_storage_query()
- We cannot change
function field_info_field($field_name) {function field_info_instance($field_name, $bundle_name) {
Those are the only way to get $field and $instance structures out of a field_name, and field names remain the only entry point to write for predictable and portable code around Field features. You cannot write code relying on field ids, they change from site to site, or between staged sites.
Similarly, I don't think _field_info_collate_fields()'s $info['fields'] and $info['instances'] can be keyed by id.
By definition _field_info_collate_fields() only collects active, non-disabled fields and instances. That's only one per field name, there's no ambiguity here. Those functions are a major part of how "fields interact with the outside world", and thus need to remain name-based.
#12
re #11: The parts you say must remain name-based are exactly the parts that absolutely must switch to ids. ;-) Here's my argument:
- field_attach_query() needs to accept an id because it has to allow querying deleted fields, and deleted fields cannot be identified uniquely by name.
- field_attach_query(RETURN_VALUES), which bulk delete needs, uses field_attach_load(). Therefore, field_attach_load() must take an id, not a name, for the same reason.
- field_attach_load() uses field_info_field() to get the full field record, so field_info_field() must take an id.
I understand the argument that we need a way to retrieve field info by name. Certainly, we can provide versions of field_info_field()/instance() that operate by name and id, either _info_{field,instance}_by_name() or _info_{field,instance}_by_id(). However, id is the natural, correct form for the API, so by_name() should be the convenience function that iterates through the list, not the other way around.
Note that exactly the same situation exists with terms and vocabularies. They are keyed by id, not name, and there are functions to retrieve them by name when necessary.
The only advantage I see with field_info_{fields,instances}() returning arrays keyed by name instead of id is that array_keys() will give you all the field names instead of all the field ids. Unless you want that array of all names, you are going to iterate over the response, and then you have access to the field name in each record anyway. I am not aware that this is critical, however, so it could remain keyed by name. However, I really think it is best for the entire API to be based on either ids or names, and since part of it must be based on id, and id is the natural choice anyway, I lean towards keying by id.
I really think that bulk delete will be very painful to implement without this change.
#13
field_info_*_by_name() sounds right.
"I really think that bulk delete will be very painful to implement without this change."
Oh, I have no doubt about this. But I have doubts we need bulk delete - which for now is the only reason we introduced field ids and de-facto dual identifiers.
I don't see how serial ids can be said natural, BTW. Needed, possibly, for internal reasons tied to bulk delete and the fact that a 'real' field needs to cohabit with a deleted field with the same name. As a consumer of the Field API ("I'm a module that defines a field in my hook_install() and I write supporting code around this", or "I build features upon fields provided by another module, or upon custom fields I created in the UI"), ids are definitely not natural nor portable, names are.
#14
The fact that terms and vocabularies are keyed by id isn't a good argument for doing the same elsewhere. The various retrieve by name functionality in taxonomy.module has been broken for ages - because taxonomy will let you do things like add two fields with the same name with the same parent in the same vocabulary. And you can't get terms keyed by ID for functions like taxonomy_get_tree() because the same actual term can appear twice in the same tree too.
Over at #413192: Make taxonomy terms fieldable we're adding 'machine name' to vocabularies precisely because IDs are crap for things like templates and portability. I can see the value in storing an ID for internal use only, and it seems like that would fix the deletion issue, but that doesn't mean we should force people to deal with numeric IDs everywhere when interacting with the API just for consistency's sake.
#15
Okay, then so far this is what I know we will need for bulk delete (which I still think is mandatory and do not understand yched's argument that it is not):
- field_attach_query() needs to accept $field_id instead of $field_name
- field_attach_load() must accept a field id in $options, perhaps in addition to accepting a field name, though query is currently the only operation that uses that feature
- we need field_info_field_by_id() and instance_by_id($field_id, $bundle)
- field_delete_field() and _instance() need to take ids instead of names
- bulk delete will have to use field_read_fields/instances instead of field_info_fields/instances in order to identify deleted fields and instances
I don't KNOW this list will be sufficient, it is just the requirements I've identified for bulk delete so far.
I suppose in each case of "foo needs to take an id" we can have a foo_by_id() function. Then, the foo() function will actually just be a wrapper that calls field_info_field($field_name) and calls the by_id version. That fact that all functions foo() can be translated from name to id so simply is exactly why I think just switching to ids everywhere is better: it's *almost* always what you want anyway, and in the case where you need to work by name, you can call field_info_field_by_name() first and then you are all set.
So, let's have a vote so we can get all Field API issues unblocked.
(A) Changes as I've outlined in this patch.
(B) Changes as I submitted in my patch, plus adding field_info_field/instance_by_name()
(C) Something else (please specify).
I vote (B).
#16
#367753: Bulk deletion is on a different track now, right ? Is this one a won't fix ?
#17
This one is a "should fix but the community thinks it would be simpler to have more complex code and lots of special cases, so won't fix."