The development of Drupal Field API obligates us to look at taxonomy.module to keep it relevant and modern. The concerns At Drupalcon DC, Benjamin Melançon (ben-agaric) and Benjamin Doherty (bangpound) began to look at this problem and untie it. Here's what we propose.
Module reorganizationTaxonomy.module should primarily contain features for the management of controlled vocabularies: CRUD of vocabularies and terms, describing the hierarchy of terms, and other features that may or may not be fully developed yet: synonymy, term relationships.
A new module called term.module will be the field module that will replace all or most of taxonomy.module's nodeapi functionality. Hopefully it will do more, too.
Field widgets also need some development. Currently, we're using Options.module select widgets, but we shouldn't limit ourselves only to that widget. Tagging autocompletion widget will need to be built.
Tagging features live in taxonomy.module because there had been a significant overlap in the functional, UI and data requirements with taxonomy.module's support for controlled vocabularies. However, this overlap will become much less significant. I'd like to solicit the community's ideas about how to move tagging into a rational place in Drupal 7 and this proposal. We certainly do not propose abandoning tagging at all. We simply need to know where it fits in a Drupal 7 where taxonomy.module lets us manage vocabularies and term.module lets us put terms on things.
Should-be invisible functional changesCurrently, the vocabulary definition contains properties that will need to be expressed in a term field's definition. Without a UI for adding fields to content types, the vocabulary admin form's submit handler needs to call the field functions for creating new fields and field instances. The same choices of multiple values and required translate easily to Field API instance properties.
All the functions in taxonomy.module that load and save term-node relationships are probably going to disappear or be put on a serious diet. All of these data transactions are going to be handled by Field API's own functions or the new term.module's implementation of Field API hook.
taxonomy_node_load
taxonomy_node_view
taxonomy_node_delete
taxonomy_node_delete_revision
taxonomy_node_validate
taxonomy_node_update
taxonomy_node_insert
taxonomy_node_save
taxonomy_node_update_index
taxonomy_node_rss_item
These functions are outliers. They work with node-term relationships, but they don't have an analog in the Field API.
taxonomy_node_get_terms_by_vocabulary
taxonomy_get_tids_from_nodes
taxonomy_term_count_nodes
taxonomy_select_nodes
The current field storage schema is a huge step away from the way term_node table is structured. Most importantly, term_node holds all of these relationships in one table, while Field API defaults to storing field values by bundle. This means that values of term fields of different vocabularies will be stored in separate tables and perhaps even in multiple tables if there are multiple fields made from the same vocabulary.
There's incredible value to easily selecting all nodes with a particular term or all terms for a particular node, so we propose maintaining the term_node table in addition to the Field API storage mechanism. The term_data table would be 100% redundant to the Field API storage table. Dropping term_node entirely would make previously easy tasks much more difficult. This isn't a stubborn commitment to maintaining the term_node table, so please comment if you have some alternatives.
Core dependencies on taxonomy.moduleChanges to forum.module will be necessary.
Windows of opportunismThe initial motivation to undertake this project — rather the "final straw" against bangpound's indolence — was Stéphane Corlosquet's proposals for RDFa in Drupal. In this model, nodes are subjects, terms are objects, and dc:subject is the predicate. Currently, nodes and terms are doing something with each other, but we don't really know what they're doing. The RDFa proposal requires us to infer the nature of the relationship, and dc:subject is a safe default. If fields can be made out of terms, we will have better opportunities in core to define the "predicate" or nature of the relationship between nodes (or other things) and terms.
We will also be able to re-use the same vocabulary multiple times on the same thing. Benjamin Melançon gave us an example where nodes describe some aspect of international movement of people or goods. The same vocabulary of geographic place names will need to be used for fields capturing "country of origin" and "country of destination" information.
If user profiles will be comprised of fields, taxonomy terms have a sensible and exciting role to play there.
Development workSo far, the patch changes the taxonomy_form_vocabulary form and its submit handler to create fields and instances. We also needed to change one line in options.module to support the new term.module.
No tests yet. I know it's a requirement.
| Comment | File | Size | Author |
|---|---|---|---|
| #199 | taxonomy.patch | 121.27 KB | catch |
| #197 | taxonomy.patch | 121.25 KB | catch |
| #193 | taxonomy.patch | 121.25 KB | catch |
| #189 | drupal.taxonomy-field-revamp.patch | 121.39 KB | sun |
| #183 | taxonomy-412518-183.patch | 121.47 KB | Anonymous (not verified) |
Comments
Comment #1
Anonymous (not verified) commentedHere's the first patch.
Comment #2
Anonymous (not verified) commentedComment #3
mlncn commentedGit repo
Because ripping out the guts of taxonomy and putting them back in breaks a lot of things, while we are very committed to not killing kittens and keeping patches as bite-size as possible, for development of this patch we created a git repository:
github.com/benjamin-agaric/drupal-taxonomy-as-fields
Developers interested in participating in developing this patch to a committable state are invited to use or fork the git repo or to post patch files to this issue queue. Thanks!
benjamin, Agaric Design Collective
Comment #4
catchsubscribe.
Comment #5
catchInitial feedback:
Why is vocabulary weight removed? Is this because term display will be per-field-instance? If so I guess we don't need the nasty order by vocabulary weight in taxonomy_node_get_terms() - and possibly could kill taxonomy_node_get_terms() altogether? If that's the case, then great! But want to make sure I understand the reasoning.
An answer to one @TODO - hierarchy is always on, if you add two parents to a taxonomy term it turns on multiple hierarchy - this was a very late change in D6 when drag and drop went in - probably needs re-refactoring.
Also a heads up that I just opened the opposite side of this issue #413192: Make taxonomy terms fieldable.
Comment #6
Jeff Burnz commentedsubscribe.
Comment #7
Anonymous (not verified) commentedTerm display will be per-field-instance.
Many of the properties of vocabularies are like field instance properties, and I've had a difficult time knowing how ruthless I can be in removing things that should/would be managed by any bundle admin tool that might come up. In particular, weight would correlate to the weight property of a field instance. However until now, the weight and other properties have been attached to the whole vocabulary (which kinda correlates to the field itself).
If we remove weight, why not remove "multiple" and "required" too? Why even define the field name for the whole vocabulary? At some point we're faced with building our own field manager inside taxonomy module simply to make sure the features remain neatly consistent OR we agree to prune all these features to the bare minimum because whatever Field API bundle tools will not be saddled with this baggage.
Comment #8
Taxoman commentedRef.
Without having given this transition deep thought, the first question that comes to mind is this:
- How would this affect security (ACL)? I imagine it would pose significant challenges in the intersection between taxonomy based access control and the Views module, for example.
I wonder if the sheer amount of work and discussions and testing needed for this to land properly without jeopardizing security or delaying Drupal 7, might suggest that this is a Drupal 8 task...
(Drupal 7 will not be usable before its security scheme and the Views module works well together, given how crucial the Views module is becoming to many Drupal sites.)
Comment #9
catchre: #7, weight being a field instance property sounds great to me - it's more flexible than vocabulary weight so why keep an inflexible system in place.
@Taxoman: afaik both term_data and term_node aren't going anywhere. It's completely fine for field modules to handle some additional storage by themselves, and term_node is the only thing required for taxonomy based permissions to work. We need to make much bigger changes than this to Drupal 7 before a release, and we've got 5 months before code freeze, plenty of time to get this right. It'd be a real shame to ship Drupal 7 with the Field API in core and no core modules taking advantage of it.
Comment #10
catchRegarding the vocabulary properties like multiple select, required etc. - these should definitely move to field instance properties IMO - however doing that now also means we'll have no core UI for managing that stuff until a Field API patch lands. This is a general issue with any conversion (profile, poll etc.), so I've opened another issue at #414328: Converting core modules to provide fields leaves them with no UI to stop any discussion about that from polluting this one.
Comment #11
xanoSubscribing.
(But WHY the git repo? :-( I am having trouble with CVS already :-P )
Comment #12
Anonymous (not verified) commentedCVS is a lot of trouble, Xano. I don't know what to tell you.
Here's a new patch that:
1. removes all taxonomy_vocabulary_node_type table transactions in taxonomy module.
2. fixes the term field module so it actually works.
3. further refines the awkward process of creating fields and instances from vocabularies.
The next patch coming from me may be a few weeks away. I have a major project to finish this week and then I'm training my client the week after. My next goal is to remove all the Node API stuff from taxonomy.module and start writing the code to either maintain term_node table or make the absence of term_node invisible to Drupal core.
Comment #13
dodorama commentedI'm not sure if this is the right place to ask for this, but since we're rewriting the module from the ground up can it be useful to link terms to the user who created them so that we can have, for example, personal vocabularies as stated here?
Comment #14
Anonymous (not verified) commenteddodazzi this is the wrong queue for that feature request.
you may have some openings for this feature at #413192: Make taxonomy terms fieldable. adding a user reference field to a term object would be a foundation on which to develop this feature.
Comment #15
owen barton commentedExciting stuff - subscribing :)
Comment #16
scor commentedsubscribing
Comment #17
mlncn commentedUpdated git repo to use and work with the latest D7. Trying to get this working for the RDF in core code sprint tomorrow.
benjamin, Agaric Design Collective
Comment #18
dries commentedTaggging. Drupal-asm. :-)
Comment #19
catchBenjamin showed me the latest status of this when he was in London.
I think this is an accurate summary of the remaining issues to resolve:
1. Upgrade path - we probably need to create a new field per vocabulary, and a new field instance per content type it's assigned to.
2. UI. Since there's no field UI in core, we might need to keep the checkboxes on admin/build/types - if it's possible to plug those directly into Field API and only have the minimum bridging code in taxonomy.module to do this that'd be great. Then if you want to do anything sensible, for now you'd have to install CCK HEAD, and we can rip out the bridging code later.
3. All of the term_node stuff in Drupal 6 is really hard to do in D7 without Views in core. I think we need to maintain both the field storage and term_node until patches like scalar searching etc. are committed. Again - just to get the initial patch in, if we can remove some of the term_node stuff later that's great.
Comment #20
Anonymous (not verified) commentedgreat. I'll be able to get back on this patch next week.
Comment #21
skilip commentedSubscribing
Comment #22
moshe weitzman commentedsubscribe ... scalar searching of fields is in.
Comment #23
catchLast patch posted here was a month ago and I'm pretty sure work was done since then - no-one is going to look at a git repo, please post an updated patch if you've got time to do so.
Comment #24
catchI had another look at the last patch and I think we might be able to get this in easier in two issues:
1. Add term.module which provides the field type - that gives us the taxonomy as field feature (attach it to users, attach it to nodes on new sites) in core - works great for new sites
2. Remove taxonomy_node_*() and all the spaghetti code around that, do the upgrade path, deal with the performance issues.
They're really two separate issues, and it'd mean the initial patch being very tiny, and the followup patch being entirely focused on removing stuff and the upgrade path. There'd be no reason not to work on them in parallel, but just a lot less to review at once.
Comment #25
xanoSo we basically split everything in Taxonomy API and Term field modules?
Comment #26
catchThat's what the current patch does, I'm not really sure why we need a separate module though - seems like those functions could just be added to taxonomy.module
Comment #27
Anonymous (not verified) commentedCatch:
I will split the patch in two tomorrow.
Comment #28
Anonymous (not verified) commentedA lot of Field API stuff changed in the last month. Pretty much nothing works in the code now.
I'll be working on it this weekend.
Term and Taxonomy module are separated because there's no requirement to use them both. Term is optional even if Taxonomy module is enabled.
Comment #29
andypostNow terms have fields #413192: Make taxonomy terms fieldable
Is this issue going just to make term_node to be splited into field storage?
Comment #30
Anonymous (not verified) commentedI've brought the patch up to date with HEAD. More or less the same things work as with the previous patch.
I have some questions that are coming up though.
#413192: Make taxonomy terms fieldable brought in a 'machine_name' for vocabularies similar to the node type... even with the same UI. I too had copied this code for defining the field name. Is it redundant now? Is it OK to use (switching to my notes so I get it right) the same name for the vocabulary's term bundle AND for the vocabulary's field name?
Now that I have machine_name stored with the vocabulary, I no longer loop through term field instances to see if the field's vid setting == the vocabulary's vid. Instead I just grab fields whose field name is the vocabulary's machine name. Should I backpedal? I probably should because renaming bundles is possible now, but renaming fields isn't as far as I can tell.
I need some help with formatters and widgets. Right now, I'm using options widgets, but this will be no good for tags. The default formatter makes terms into links as in Drupal 6. Should there be more vocab field formatters than this in core?
Here's a patch. It also includes the patch at #490432: taxonomy_form_vocabulary always returns error when editing existing vocabularies. I'll be working throughout the weekend as well. See you in IRC!
Comment #31
catchCan't comment on the field/bundle rules but that sounds awkward, seems like we need Barry or yched on that bit.
Would be good to have an autocomplete widget, other widgets or formatters might be nice to have but don't see any need to introduce them with the initial patch.
Comment #32
Anonymous (not verified) commentedPart of this task is now posted at #491190: Provide a taxonomy term field type. Please turn your attention there. This issue should remain open, because there's still more code to write.
Comment #33
Anonymous (not verified) commentedComment #34
chx commentedMentioning "vocabularies" in the title is pooling wool over the unsuspecting eyes. If I have not asked in IRC "ok, so vocabularies are fields but what are then terms" turns out that this issue tries to sneak in terms-as-fields under a false title. Do not even try do that. This issue MUST be VERY VERY carefully benchmarked and only then can it be considered.
Comment #35
catchClarifying issue titles between this and #491190: Provide a taxonomy term field type.
Comment #36
Anonymous (not verified) commentedTalk in #drupal is that a term field should be automatically created with each vocabulary. On some level this seems obvious to me, because in almost every use case, you want your vocabulary to relate to some other thing on your site. The place where this relationship happens is the field.
This doesn't really address the extent of the UI and usability questions for this task. With Field API, we get to completely disconnect taxonomy.module from the node system. The "content type" checkboxes on the vocabulary settings form really need to go away! So even if we make a field for each vocabulary, we still have to make field instances for those fields on our node types.
When we create new node types, we don't want to automatically create a node reference field that's constrained to that new node type. Or do we?
Comment #37
Anonymous (not verified) commentedI've created a new issue related to this project. I'd very much appreciate feedback from contrib module maintainers and anyone who has something to say about tagging in Drupal 7.
#495774: Modularize tagging
Any work I do on tagging is going to wait until term.module is committed, so I think you have a good amount of time to think about it and share ideas.
Comment #38
xanoCatch and I discussed Term.module this week and we came to the conclusion there is not much reason to create a separate module next to Taxonomy.module as it may cause confusion for end users ("Do I need _another_ module to be able to categorise my content?!") and with the registry adding it to Taxonomy.module doesn't really cause overhead.
Comment #39
Anonymous (not verified) commentedyched has realized some of the problems we will face as we migrate from nodeapi-terms to fieldapi-terms. I suggest everyone read it. Importantly he reminds that there has already been some discussion about this topic. It's probably better to continue this discussion here, because this issue will remain open until the various tasks for this project are completed.
Generally, I think the semantic changes are vital. We'll be able to name the predicate explicitly, while in the past, the node (subject) -> term (object) relationship was either ambiguous or deduced.
I see this problem having two faces:
This leads into the next problems of what happens to at taxonomy/term/[tid] paths. If all else fails, do we implement hook_field_storage_write so we can continue to maintain a term_node table for these paths? (Yipes!) If users and terms can be classified with taxonomy terms, does that mean they too should appear in these lists? Open questions for discussion.
Answers to these questions are probably dependent on other changes in D7, so rather than answer them all now, I'm keeping an open mind as I learn more about what's going on in D7. As always, I appreciate feedback, suggestions, help, etc.
Comment #40
yched commented"object - term relations can now have different predicates" is a nice way to describe the semantics change of "taxo as field". Let's be clear that those are 'mental' predicates, though, only known by humans (site admins and users). RDF can then be used to apply actual "machine" predicates.
IMO the upgrade path require we create one default field for each vocab, and one instance per node type using the vocab. Those fields will receive the D6 node-term relations. Site admins can then add other taxo fields for a given vocab using Field UI (a bit like what you can do with Content Taxonomy in D6), but the default fields should be good enough for most sites.
Theoretically we *can* bypass the regular field storage (one table per field) and have all data for object-term relations in a single, separate table (see hook_field_storage_pre_*() hooks). This term_object table must then have columns for obj_type, bundle, field_id, delta
Not sure exactly of all pros and cons, so I'm not saying this is what we actually want to do, but this is the only way to keep a "list all [obj_type] objects having term [tid] in any taxo field" page (taxonomy/term/[tid]/[obj_type]). Other than that, the best we can do is "list all [obj_type] objects having term [tid] in field [field_name]" (taxonomy/term/[tid]/[obj_type]/[field_name]) - which might be what we want indeed.
Note that the historical taxonomy/term/[tid] path is dead anyway (or it defaults to 'nodes'). There's no way to grab an ordered, paginated list of mixed object types (nodes + users + contrib entity type...)
Comment #41
catchI reckon we have a few options here. {term_objects} with a type column in addition to field storage to allow some semblance of what we could do in D6. That seems to me like it should probably be an additional table rather than completely replacing field storage, not sure. {term_node} isn't exactly efficient as it is, but as long as we don't make things significantly worse that'd help. We could also consider generating term_$entity tables dynamically. Something like this seems necessary though.
For taxonomy/term/% itself - I think there's a few things we can look at:
Remove the 1+2,3,4,5,6/8 behaviour from core - Views handles that effectively in contrib now as of Views 2, and we have too much special casing in taxonomy_term_page() to try to handle it without having views in core to make it more abstracted. I may open a separate issue for this anyway. So taxonomy/term/% becomes taxonomy/term/%taxonomy_term with a proper menu loader.
For the listings themselves, we could probably go for taxonomy/term/%/content taxonomy/term/%/users taxonomy/term/%/terms etc. etc. - if you only have a vocabulary assigned to content, you get the same as D6 behaviour. Doesn't seem overly hard to implement if we have the dual storage there in the first place. I don't think a mixed page of entities is desirable anyway - at least we shouldn't worry about that.
Then just like now you can override those pages with whatever you want, but the default behaviour core provides actually makes a lot of sense for most cases - doesn't mean that we can't show terms grouped by field etc. on node rendering itself and link to different views in the formatters (maybe replacing hook_term_path())?
Comment #42
Anonymous (not verified) commenteddon't forget depth!
Comment #43
catchbangpound - yeah I think we could consider dropping depth from core as well - just install views if you need it.
Comment #44
scor commentedyes, we should really keep a URI for each term, the RDF patches use it. It took many years to get an instance path for each comment (#26966: Fix comment links when paging is used.), let's not lose it for terms now!!!
Comment #45
catchPosted #503456: Remove multiple tid and depth handling for core taxonomy paths which simplifies all the code around the default term page.
Comment #46
yched commented@scor:
I might very well be missing it, but I don't see how this compares to comments permalinks.
The moment we enable terms on non-nodes, there cannot be a single 'URI' for a term, listing all objects tagged with that term. You cannot build a SQL query that lists heterogeneous objects sorted by "{node}.timestamp or {user}.timestamp or {foo}.bar". It has to be several pages filtering by object type.
To avoid link rot, taxonomy/term/%term would need to default to nodes, but can this path be considered "the" term's URI ? What's the meaning of attaching this URI to the RDF info added to the term when it's applied to a user ?
Comment #47
Anonymous (not verified) commentedThe term is content in its own right, so we can default to listing nodes as is done now, but the term page also shows the description, other fields, and perhaps some representations of hierarchy. Hypothetically, If we listed no nodes and kept taxonomy/term paths otherwise to show only the term and its bundled fields would this suffice for the RDF features?
Comment #48
catchWhen we discussed comment permalinks, someone suggested having a comment/%/view url which just shows the individual comment.
So we could have taxonomy/term/%/content taxonomy/term/%/user, but also taxonomy/term/%/view - the last one doesn't show any lists of associated content by default, but maybe contains information about parents / children / relations / synonyms etc.
Other options are keeping taxonomy/term/% pages and adding a block/column for each object type so it stays as a single page - but that would require block module not to be crap and probably views in core so you could customise easier.
Comment #49
scor commented@yched: we don't necessarily need to have all the content which is tagged with a term (though it would be nice for nodes at least as it is now in D6), but having simply the name of a term and some basic metadata about it would be great (as bangpound and catch said). We can let Views/contrib dealing with listing more complex instances types like users...
Comment #50
yched commentedre #49: well, the default output for 'taxo field' values cannot depend on Views.
When a taxo field is displayed on a user, it makes little sense to display it as a link to 'list of nodes tagged with that term', it would need to be a list of users.
OTOH, 'taxo as field' lets you add taxo terms to any fieldable type, but taxonomy.module cannot be expected to know how to build a list of [arbitrary fieldable type]. That would require some common 'data API': at least #460320: Standarized, pluggable entity loading (nodes, users, taxonomy, files, comments), and probably an equivalent on the 'build / display' side. Tricky.
Other than that, agreed with what bangpound and catch said.
It should be noted, though, that while Fields on terms has awesome use cases, IMO most terms on most sites will still be just a term name and at best a 'description', which makes the common 'only term data, no content listing' page pretty sparse. It's probably useful for RDF 'URI', but I'm not sure it makes much sense to have upfront in the UI.
Comment #51
Anonymous (not verified) commentedNew questions:
hook_field_widget_info() has no documentation. Do widgets have configurable settings?
We'll hopefully be able to define multiple vocabularies/subtrees as allowed values for a taxonomy term field. For tagging widgets where the user has permission to create new terms in a vocabulary, we need to define which vocabulary and parent term for new terms. Will this be a field setting or a field instance setting?
What will replace taxonomy/term/%taxonomy_term paths given the new problems we've created? If we create 1 field per vocabulary to mimic Drupal 6 behavior, will that page list nodes where that field has that term? Or do we want it to include all term fields? (I'm aware that we'd need #460320: Standarized, pluggable entity loading (nodes, users, taxonomy, files, comments) to be truly flexible.) It would be nice to write paths like view/[bundle]/[field]/[value] where [bundle] and [value] could be "all."
$node->taxonomy goes away?
Comment #52
yched commentedYes, look in text_field_widget_info()
I can't think of a reason that would enforce this setting to be the same for all instances, so that could be an instance setting, I guess ?
Or, since it only relevant for widgets that let you create new terms, perhaps a widget setting ?
I think we should definitely create fields and instances to match current D6 behavior. Then, to preserve existing links, taxonomy/term/%taxonomy_term needs to list nodes tagged with this term in the default field.
Comment #53
Anonymous (not verified) commentedI did look at text_field_widget_info but I didn't see the settings. Now I do! Thanks.
I'm leaning toward widget settings, because there will be widgets that support adding terms and widgets that don't (such as options widgets). That effectively makes it like instance settings in that each bundle with that field could be adding terms to different vocabularies for multi-vocab fields.
The taxonomy/term paths then will "break" if people delete the default fields for a vocabulary. Or do we have a way to prevent a field from being deleted?
We'll have to be very careful about the documentation of this.
Comment #54
yched commented"The taxonomy/term paths then will "break" if people delete the default fields for a vocabulary. Or do we have a way to prevent a field from being deleted?"
This area in Field API is still kind of grey, for lack of specific use cases so far.
The main issue is what people are able to do with module-defined fields and instances in Field UI. I'm not sure we can or want to prevent silly uses of field_delete_field() or field_delete_instances().
At most, admins will be able to remove *instances* (== "this vocab doesn't apply to this node type anymore" in D6), but not delete the field itself.
We'll probably have a need for module-defined instances to be 'undeletable through Field UI', but it's not implemented yet.
Comment #55
Anonymous (not verified) commentedRather than lose detail in the issue queue, I've created a short wiki page to document what migration from D6 to D7 taxonomy could look like. Please update it if you have ideas or details.
http://groups.drupal.org/node/24145
Comment #56
Anonymous (not verified) commentedI have a big new question today.
Currently, when terms appear on a node, they are rendered as a link to the term's path. When the term is part of a vocabulary with a MODULE property ≠ 'taxonomy', taxonomy_term_path calls MODULE_term_path. Of course, people override this behavior in two ways: hook_link_alter and hook_term_path.
Now that the term is/should be a field, we have field API structures to reckon with: Fields, field instances and formatters. For example, you can choose a term field formatter that renders a term as plain text! No link at all.
Ignoring the menu callbacks for just a moment, please consider the term's path. Is it:
If you choose either of the last two, a term's path is specific to its context in content, it's a field formatter problem and #113614: Add centralized token/placeholder substitution to core has some interesting relevance.
Now returning to the menu callbacks. Taxonomy module will still have callbacks for taxonomy/term/% list pages. If we make the link to these pages into a property of the field or field instance, taxonomy.module's hook_menu/hook_menu_alter implementation should do something like this:
If all this messing about is done in hook_menu and hook_menu_alter, it's cached. If taxonomy_term_page knows the field value, field name and bundle, it doesn't need to go looking for that information again.
What about field permissions? If a field can only be read by authenticated users, should anonymous users see a node where that field matches in the list?
I see another problem with this. If a term's link is specific to a field, what is a term's path according to hook_term_path for? And who on Drupal's green earth would expect field info to be used in generating menu structure? Is it weird and wrong to think we should be implementing menu callbacks on behalf of fields where their instance values are page arguments?!
Comment #57
catchThe only situation where we could ever think about using multiple field tables to produce taxonomy/term/%taxonomy_term is with materialized views in core, which doesn't seem likely at this point. {taxonomy_term_node} isn't exactly a performance star either but if we regress from there then someone will mark this issue won't fix very quickly and it might even be me.
At the moment, I'm thinking we should use parallel storage - i.e. each term field gets it's own field table as usual, but we also provide {taxonomy_term_$entity} tables with object_id and term_id - these would be created when an instance is attached to the entity type.
taxonomy_menu() could then provide taxonomy/term/%/$entity paths as tabs, defaulting to node. Then any other options we can leave to contrib (Views, Panels and then the various taxonomy_* modules) - but that gets us standard field storage and object structure when displaying entities, but also our legacy tables for looking up object ids.
Comment #58
Anonymous (not verified) commentedCatch:
Got it.
In #52 yched alluded to the possibility of using the one-default-field-per-vocabulary as the source of taxonomy/term/[tid] results. This means that taxonomy/term/[tid] would return entities where this original/default field instance value is [tid].
I think you're suggesting that taxonomy/term/[tid] needs to return all entities (of the same kind) where any field has [tid] for its value.
Terms are only in one vocabulary at a time, so only one field will need to be queried value. If we cache the field names in the menu system and pass them as an argument with the term ID, there's no need for parallel storage if we follow yched's suggestion, is there? It's one query either way.
I think yched's idea is simpler that relies on some sensible default fields and instances that TOGETHER behave similarly to Drupal 6 taxonomy. Your idea (if I understand it correctly) is all about getting Drupal 7's taxonomy/term/[tid] callbacks to behave like they do in Drupal 6 regardless of how many instances a given vocabulary's term field might exist in any given bundle.
Comment #59
catchI could happily live with yched's approach as well (but had sadly forgotten it when thinking of the above). It'd both be simpler and potentially a bit faster since field tables don't hold data for older revisions and we could just query on nid instead of vid - along with smaller table sizes.
One other thing is forum module will have to play nicely with this as well.
Comment #60
catch#491190: Provide a taxonomy term field type is in, so this one is on!
Comment #61
catchOK here's a fresh patch now that terms as fields and the autocomplete widget are in - just working on the upgrade path for now although this makes some assumptions on how things will work in general.
# We use yched's suggestion of a field per vocabulary - also discussed this with merlinofchaos in irc and he'd prefer field per vocab vs. term_node since it simplifies some queries.
# The upgrade path creates a field for each vocabulary, and an instance for each node type the vocabulary is assigned to (maintaining settings where they map to field API settings).
# Doing a direct database query for each vocabulary to migrate the data, unfortunately due to 'delta' we can't do an INSERT INTO SELECT FROM - so this is going to have to be batched at least for the inserts. Batch API is still a TODO - but posting what I have since it works in testing so far.
# Dropping all 'widgety' settings from {taxonomy_vocabulary} and the entire {vocabulary_node_type} table, so far left vocabulary.weight - not sure if there's use cases for that outside field conversion (we order the vocabulary listings by weight and allow you to drag and drop them for example).
There may well be issues with the upgrade path here apart from doing the batch, but the primary job to do here is to move taxonomy/term/% listings to use field_attach_query() - we'll only load terms from the default per-vocabulary field so the query should be as simple as the one in HEAD currently, just via field API instead. Then it's a case of ripping out taxonomy_node_* and supporting code once we have alternatives in place, and fixing tests, and field UI in core...
A few questions which I think have been raised here before, but reiterating:
formatters - the default term formatter points to taxonomy/term/% - but if we only show values there for per-vocabulary fields, that might be a bit strange. Do we default formatters to plain text rather than the link? What to do about taxonomy_term_path()?
taxonomy/term/% only applies to nodes - do we need to add per-entity tabs here? (so taxonomy/term/%/$entity_type) - or do we assume that site builders attaching terms to users will build their own listings with Views or whatever (I'd tend towards the latter).
Comment #62
moshe weitzman commentedI agree that we should let contrib add tabs (ornother UI) for non-node term listings.
batch api example module - http://cvs.drupal.org/viewvc.py/drupal/contributions/docs/developer/exam...
IMO taxonomy_term_path() should go and we should do #320331: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks. not sure we can count on that tho.
Comment #63
webchickSubscriiiiiiiiibe!
Comment #64
Anonymous (not verified) commented@moshe in #62: Links for terms are now open to a whole bunch of interpretations. Term links displayed through term field formatters could possibly be instance-specific; they could even use tokens (?). Otherwise, taxonomy/term/% is simply the term equivalent of node/%: a definitive menu callback. The only uses of hook_term_path I've seen are image module and forum module. I'm sure there are others. With this in mind, I agree that it could be handed off to something like the hook proposed in #320331: Turn custom_url_rewrite_inbound and custom_url_rewrite_outbound into hooks. As it is, it's conceptually muddled for module developers and a reminder of the strange incompleteness of module-owned vocabs.
@catch in #61: The menu callback, field_attach_query business should be documented as an ideal example of this fundamental tactic, and I don't know if it's necessary to push it on users, other terms, whatever bundles yet in core. At some point, it becomes search, doesn't it?
Comment #65
catchKept going with this, approaching a working patch, still very much in progress though and haven't even opened up taxonomy.test yet.
Status so far:
Upgrade path works, but needs batch API added in.
All taxonomy_node_* and supporting code is removed, with the exception of taxonomy_select_nodes() which we leave in modified form to allow for taxonomy/term/% listing pages.
Hit one major snag with #552716: Impossible to join or order by on field tables without using Views - explanation is there on the limitation in field API, we have two options to get around it here:
a. First get all nids attached to a term, then do a pager query on them with IN ($nid). This should work out faster than our current taxonomy_select_nodes() query (which causes a temp table/filesort) - but if a term has 100,000 records associated with it, we're loading all that data into memory , then risking max_allowed_packet errors in the IN() query.
b. Use hook_field_attach_insert() or hook_field_insert() + field schema + field indexes to add $entity->status $entity->timestamp and $entity->sticky columns to vocabulary field tables. This combined with the $pager option to field_attach_query() would let you do the same query, although while writing this, it still doesn't help with addTag('node access') so we're more or less at a dead end there.
c. Maintain term_$entity tables as well as field storage and use those for the listing pages - gets us back to the old-style query which was bad anyway.
The potential memory issues with taxonomy_select_nodes() are less than we had with taxonomy_get_tree() on that same page before that was removed, or some other situations in core, so I'm sticking with (a) until someone comes up with a better idea - and the query will at least be indexed in both cases, whereas it isn't now.pwolanin, bjaspan and bangpound sanity checked all this on irc - both bjaspan and pwolanin also thought about the IN() query which I'd previously discounted, but doesn't seem all that bad considering current alternatives.
Major TODOs:
1. Manual testing of the upgrade path, probably with field UI installed.
2. Benchmark taxonomy_term_page() and normal term listings with a before/after reasonably large dataset.
3. Figure out what to do with term paths/formatters if a term isn't in the default vocabulary fields
4. Actually create default vocabulary fields on vocabulary creation as well as in the upgrade path.
5. Consider whether we want to leave any of the other helper functions in modified for field API, like taxonomy_term_count_nodes().
6. Fix forum.module
7. fix blogapi.module
8. fix tests.
9. Help text needs a complete rewrite (string freeze rather than code freeze please)
Marking to CNR just to see how bad the test breakage is.
Comment #67
catchFixed taxonomy.install and default profile, which meant implementing #4 from the TODOs above. Hopefully test bot will fail a bit more productively this time.
Comment #69
catchFixed up a lot more tests, need to work out the structure for submitting option selects and taxonomy autocomplete forms for the last one.
Comment #70
Anonymous (not verified) commentedCatch:
On your patch:
I'm looking at forum.module. What a sicko.
The forum_nav_vocabulary variable could be more reasonably expressed as a hard coded field name. Or we could stash the name of the field in this variable. The nodes which are eligible to be in a forum are determined by the nodes in a vocabulary object, so I'd really like to see this vocabulary's field get a name like 'forum_'. $vocabulary->machine_name.
_forum_node_check_node_type and all of this jiggery-pokery can go away:
All of the tricks to list 'active topics' happen with joins on node_comment_statistics. Is this the same as #552716: Impossible to join or order by on field tables without using Views? see forum_get_forums, forum_block_view, _forum_topics_unread.
Given the fact that there are no non-SQL field storage engines yet and we have no idea how a non-SQL field storage engine could support something like joins, sorts and conditions, can't we add a dependency on field_sql_storage to forum and use _field_sql_storage_tablename and _field_sql_storage_revision_tablename to construct these queries? Or is this a big lame bangpound--? I shouldn't have said anything, I know...
Comment #71
catchNew patch:
Vocabulary settings completely removed!
Taxonomy tests are all passing apart from notices in taxonomy_field_is_empty() (undefined string offset 0) - I think this is when submitting a multi-value select.
_taxonomy_term_select() we can remove if we convert term->parents to a field (in terms of storage it's more or less the same, but I have a feeling field_attach_query will make it impossible to actually do without crazy refactoring we don't have time for...) - not going to die in this patch either way :(
Forum - yeah let's make it a special forum field referring to whichever vocabulary vocabulary, I'm a bit scared about forum containers too although the existing mechanism for those is disgusting, so it'll be hard to make it worse as long as it works.
Those node_comment_statistics() join queries are evil - so even if we have to replace them with a similar thing to what we're doing in taxonomy_term_page() it might be more scalable (more queries, but all indexed ones). The other option is we make forum.module maintain it's own denormalized table with the latest nodes, comments, authors and etc. for each forum so it can query that again on run-time. David Strauss has something like this already I think for Drupal.org.
Comment #73
Anonymous (not verified) commentedTerm parents are also the sideshow that makes Taxonomy module buckle with deep hierarchies. It's also the only typed relationship in Drupal core. It has its own problems, too... multiple parents is a big one! As much as I'd like to see it be a field, too, I think it's something we need to live with for a release cycle. RDF, object relations, alternative field storage, and all of Field API need to come to bear on this problem in Drupal 8.
Comment #74
catchi'm wondering if we could do a half-arsed conversion to a field, keeping current storage and queries, but re-using the field widgets just to clean up the code / UI. Either way, doesn't need to be done here.
Comment #75
bjaspan commentedsubscribe
Comment #76
catchIncluding changes from #554164: Join on {forum} instead of {term_node} for forum queries for now, a few more bits and pieces cleaned up as well.
Didn't look at providing a forum field yet, but that's probably the next step for reducing failures there. CNR for the bot.
Comment #78
webchickOh, testing bot...
Comment #80
catchForum patch is in, so re-rolled for that change. Also fixed last remaining notices in taxonomy.module.
So taxonomy.module, apart from a couple of minor TODOs is now fullly functional - we just need to fix all the bits of core which are using old APIs, which is unfortunately quite a lot. CNR for the bot only.
Comment #82
catchStarted taking a crack at forum, what a horrible mess it is in there. This is nowhere near working, at all, and I ran out of steam about 10% of the way through.
We probably need to put in shadow handling back after #554164: Join on {forum} instead of {term_node} for forum queries broke it - I'm thinking a boolean 'shadow' field on the forum table, but didn't implement the schema changes or work out exactly how that would work ye, although there's some draft code for handling it everywhere apart from forum listings themselves.
Either we have to keep variable_get('form_nav_vocabulary', 0) or introduce a helper function to get the $vid allowed value from the field definition, thats a bit of a pain at the moment.
forum_containers is kept in a variable?? Who knew!
field_forum_navigation is too long for a fied name and a pain to type, that should probably be $node->field_forum instead.
While I was in there I realised forum_topic_navigation was still in core, had forgotten that existed and got a nasty reminder, so posted #556136: Remove theme_forum_topic_navigation() to just get rid of the thing.
This patch also updates default.profile for the default.install change. I'll probably not be able to look at this again until Sunday at the earliest and not too much after that either.
Comment #83
yched commentedJust wondering why patches #80 and #82 are approx. same size, while #82 is supposed contain lots of additional forum changes ? Are there missing bits in #82 ?
Comment #84
catchDamn, looks like it's missing the hook_field_is_empty() and a couple of other small changes. I didn't touch the taxonomy hunks in the last patch though - so just applying the forum bits from #82 and everything else from #80 gets things as far as they are.
Comment #85
Anonymous (not verified) commentedI did my best to merge the two patches according to what Catch said.
No changes though.
I'm going to be working on this tomorrow.
Comment #86
catchLooks like that fixed the broken patch, sorry :(
Marking CNR to see where we are with test failures.
Comment #88
Anonymous (not verified) commentedI'm blocked on an issue with #367595: Translatable fields. The issues are #557932: Taxonomy term field autocomplete widgets never validate, always lose values. and #557924: Options widget broken.
The patch here begins to fix blogapi module. It's against HEAD as of now (after the removal of the function registry).
It adds a tags module. It's probably not necessary, but I need some replacement for $node->taxonomy, and $node->taxonomy_vocabulary_tags is the winner. It would be great to just have $node->tags.
Comment #89
catchI think node->taxonomy_vocabulary_tags we might be able to get away with shortening those to node->taxonomy_tags (that would also mean some ugly substr code I put in to stop field exceptions could die).
Comment #90
Anonymous (not verified) commentedI just want to get the tests to run again. This patch removes consideration for blogapi which is no longer in core.
Comment #91
Anonymous (not verified) commentedComment #93
Anonymous (not verified) commentedNew patch. Many tests still fail.
Comment #95
Anonymous (not verified) commentedgrrr... one more time.
Comment #97
Anonymous (not verified) commentedI am nearly done with this patch, but I have unfortunately saved the most difficult part for last... the active topics block.
The entity loading patch that went in earlier today ... yipes. I hope I did all this right.
Comment #99
Anonymous (not verified) commentedMaybe all tests pass?
Comment #101
Anonymous (not verified) commentedSimpletest's own tests are failing. When I investigated, the errors are:
The table name simpletest672694simpletest525064field_data_taxonomy_vocabulary_tags_2 sure doesn't seem right, but it also doesn't look like it's really a problem with this patch.
Comment #102
yched commentedfield_data_taxonomy_vocabulary_tags_2 is really long. It's within the length allowed by Field API, but if the field is part of a simpletested install, the simpletest prefix pushes us over MySQL's maw chars for tablenames.
It's strange that we seem to have 2 simpletest prefixes here, but maybe you could try naming the fields just 'a_[vocab_id]', just to see if it makes things better.
Comment #103
yched commentedoh, 2 prefixes is probably 'normal' in tests where simpletest is testing its own behavior :-(. Er, that's problematic...
Comment #104
Anonymous (not verified) commentedI'm removing the string 'vocabulary_' from field names. It seemed needless anyway
Comment #105
Anonymous (not verified) commentedthat last patch is going to fail to apply. this one should be better.
Comment #107
Anonymous (not verified) commentedThose four test failures happen on a clean HEAD install. Someone committed bad code?
Comment #108
catchPutting back for a re-test in the hope it'll pass when HEAD is fixed.
Comment #110
Anonymous (not verified) commentedComment #111
yched commentedCongrats on tests ;-)
Taxonomy and forum changes are largely over my head, I mainly skipped those parts, this review is only Field-oriented. Not much.
leftover ?
FIELD_QUERY_NO_LIMITS can easily explode PHP memory. I(=t's mainly there for testing purposes, but should be avoided by most actual code. You should use the $count and $cursor param to loop through limited sets of results.
Field names are 32 chars max, so we should document that vocabs machine names shouldn't be longer than 32-9 = 21 chars :-(, and add a validation step on the 'create vocab' form.
Leftover ? + we have debug() now ;-) (and devel has dd())
Beer-o-mania starts in 4 days! Don't drink and patch.
Comment #112
catchAddresssed yched's points apart from FIELD_QUERY_NO_LIMIT. 1. I need to check how the $limit and $cursor params work in practice 2. We already have to use that in a couple of other places, where it can't be removed unless we duplicate a lot of data somewhere. We could fix the usage here, but it's not actually going to improve things on a real site, since that site is bound to run into one of the other queries somewhere anyway.
Also found a few more things to cleanup.
I'm not sure how update.php is doing at the moment, but I'll try to generate some data and do benchmarks on taxonomy/term and if possible /forum later on to see how we're doing before and after.
diffstat: 11 files changed, 411 insertions(+), 1248 deletions(-)
Comment #114
Anonymous (not verified) commentedHow do we deal with forum shadow topics?
Here's my idea. It's based in imagination so if it doesn't/can't work, I'll hear your ideas.
The forum table has nid, (node) vid and tid. The tid is the real forum term.
The field for the forum vocabulary (taxonomy_forums) has a cardinality of unlimited values.
The forum for a topic is the tid from the forum table. This part already works. The breadcrumb is made from the tid value in the forum table.
Shadow topics are those term values in taxonomy_forums that are not the tid from the forum table.
The way to handling shadow topics is to keep taxonomy_forums field with unlimited cardinality. When the node form is generated, stash these values in the form. Set the taxonomy_forums element's #multiple = FALSE and #default_value = the tid value from the forum table.
When the form is submitted, compare the stashed values of taxonomy_forums field, the shadow checkbox, and the submitted single value for taxonomy_forums. Change the taxonomy_forums field to have all of its values that were set aside earlier.
If the submitted value of taxonomy_forums is different, set the $node->forum_tid to be this new value.
if the shadow checkbox is FALSE, remove the original forum_tid from the taxonomy_forums values.
Is this making sense?
Comment #115
moshe weitzman commentedIMO, shadow topics are very esoteric and can be dropped as a feature.
Comment #116
catchUnbroke forum tests.
Shadow topics are supported by phpbb, so I'd rather retain support if possible.
Comment #117
Anonymous (not verified) commentedLet's see those test failures... if any.
Comment #118
Anonymous (not verified) commentedThis patch implements shadow topic support. Basically the taxonomy_forums field has multiple cardinality but the widget-element is altered to be single valued. I really don't know why it works, but it does. The eyeball test and the simpletests verify this works...
Comment #120
Anonymous (not verified) commentedLet's try this one.
Comment #121
Anonymous (not verified) commentedComment #123
Anonymous (not verified) commentedOk. Finally!
Comment #124
Anonymous (not verified) commentedThe shadow topic feature of forums is present but only partially functional. You can move a topic from one forum to another and the topic "remembers" its old forum but the shadowed forums won't list the topic. Only the current forum will list the topic.
I'm committed to keeping this feature, and I know what needs to be done (see taxonomy_select_nodes() in the patch).
(bangpound pleads for mercy and the kind, thoughtful discretion of our beloved committers.)
Comment #125
catchJust to clarify on shadow forum topics, they're broken in HEAD due to #554164: Join on {forum} instead of {term_node} for forum queries and the current state of this patch doesn't make them any worse at all. We used to join on both {forum} and {term_node} then use the disparity to work out if a topic was shadow or not - doing that is one of the reasons for forums performance, and I think we can put it back a lot faster (given the constraint of never being able to join on a field storage table). I'm a bit worried about doing that in this patch though given impending deadlines, and the number of API functions this patch changes and removes along with $node structure changes - whereas shadows will be a straight bugfix post-freeze. Ideally, we'd open a separate issue and start working on that now - so there's progress there without adding noise to this issue so the other changes can be reviewed properly.
Comment #126
Anonymous (not verified) commentedI know what ideally means.
#561766: Regression: Shadow topics no longer appear in their previous forums
Comment #127
catchHere's some preliminary benchmarks for taxonomy term pages. Note we're benchmarking both the change in query to get the list of nodes attached to a term, and the change in how the terms on those nodes are loaded and rendered, so a few variables to consider.
Generated 20,000 nodes and 10,000 taxonomy terms. D6 upgrade path and devel generate currently aren't working with HEAD to add term_node records, so did the following manipulating it directly:
Associated all 20,000 nodes with term/1
Associated 5,000 nodes with term/2
Associated 200 nodes with term/3
Associated 10,000 nodes with tids 1-10,000
Then ran ab -c1 -n1000 on each of these pages. Then applied the patch, ran update.php and did the same thing:
Results:
Query log / page execution time / memory usage screenshots attached for all four pages, HEAD and patch.
I'm fairly sure that the HEAD queries will get slower the higher table cardinality (and result sets, due to temp table/filesort), and use less memory. Whereas the patch should only get slower and worse memory usage with high result sets since they're better indexed, but more bloated in terms of result sets.
Comment #128
yched commentedThe performance of the filed_attach_query() part (the hook_field_storage_query() line in the requests table) is dragged down by the fact that indexes on field_sql_storage.module data tables are sub-optimal. Notably missing an index on 'bundle'.
We probably need to index all meta columns (et_id, entity_id, deleted, etc...) separately - indexing of the 'data columns' is left to hook_field_schema().
[edit: happens in _field_sql_storage_schema()]
Comment #129
catchAnd some fixes to upgrade path found when sorting this out.
Comment #130
pwolanin commentedapplying the patch - I find the manage fields tab confusing on taxonomy vocabularies - what am I supposed ot be doing here?
Adding a vocabulray is a bit counter-intuitive - I have to create a field on the node, and then pick the vocabulary. Can I ad the same vocabulary multiple times to the same node type?
Also, apparently I control wither this is free-tagging when adding the field (by picking the widget)?
So really, (guess this is to be expected) there is not much action on the vocabulary admin page.
Comment #131
pwolanin commentedTerm links doesn't seem to work, e.g. taxonomy/term/1
Also, I can indeed add the same vocabulary to a single node type as multiple fields using multiple widgets - I guess this could be useful, but also seems potentially confusing.
Comment #132
catchRan explain on the various queries involved:
HEAD:
taxonomy_get_tids_from_nodes() also causes a filesort.
Patch:
Looks like indexes could be improved in field_sql_storage.module as yched points out.
And there's no taxonomy_get_tids_from_nodes() or taxonomy_term_load_multiple() because that's handled by the field cache.
@pwolanin
Manage fields on the taxonomy vocabulary page is for adding fields to terms - see #413192: Make taxonomy terms fieldable.
Free-tagging is now a widget setting, yes - a vocabulary doesn't need to know itself if it uses an autocomplete/create widget or not.
All vocabularies create a default field for that vocabulary, however bangpound designed it so you can indeed add more than one field per vocabulary - this allows for use cases like "Country I was born in" "Country I live in" "Countries I've visited" with separate form widgets and display settings, but all using the same countries vocabulary - which is handy, and also lends itself to defining RDF triples.
taxonomy/term/1 paths are working for me using upgraded content - however we only support taxonomy/term/1 for the default provided fields per-vocabulary - if you add additional fields, the nodes won't show up in that listing, same as we can't list users, or other terms, or comments there either.
Comment #133
yched commented#562816: Poor performance on field_attach_query() should fix the performance issues for the field_attach_query() part (filesort + missing indexes)
Comment #134
catchI opened #562824: Optimize term_node and forum topic queries - as mentioned there, may or may not be a prerequisite for a commit here, but this issue is already very long, and we need to do that anyway.
Comment #135
catchBumping this to critical - if we don't actually convert taxonomy_node_* stuff to field API this release, then we probably have to revert all the taxonomy_field patches in so far to avoid major usability wtfs.
Comment #137
catchNow now, test bot.
Comment #138
mlncn commentedWhat's the barrier for getting last-minute essential features committed to head?
That setting "Number of values: Maximum number of values users can enter for this field" actually sets a limit on how many autocomplete-style tags are saved is counter-intuitive to anyone used to D6 CCK, where more values mean more form fields, but, well, wow! That's awesome.
A bug that could be handled in a follow-up is that the autocomplete doesn't stop you from entering more terms than this restriction.
Tracking down why the taxonomy/term/[tid] listing pages being broken, a basic backtrace
drupal_set_message('<pre>'. var_export(debug_backtrace(),TRUE) .'</pre>');and a little more looking indicates that field_info_field() isn't returning anything in taxonomy_select_nodes(), and that's causing the error in field_sql_storage_field_storage_query().ben, agaric
Comment #139
mlncn commentedAh, OK-- term listing pages don't throw errors with upgraded, or all-new, terms. However:
* There is no interface for making a new taxonomy the default for a content type?
Very minor: The option of choosing which vocabulary goes with a taxonomy field is given twice.
Comment #140
catchWith #562824: Optimize term_node and forum topic queries we'll have (an improved version of) term_node back again - which will mean taxonomy/term/n could list all nodes attached to a term regardless of vocabulary. That'd solve the issue of content not showing up there because it's not attached to the default field.
Comment #141
mlncn commentedMarking this committable – it does the taxonomy conversion – with crazy follow-up work needed in other issues to make Field API fully up to the task, namely abovementioned #562824.
Comment #142
catchI've updated #562824: Optimize term_node and forum topic queries with an in-progress patch. We should end up with both forum and taxonomy queries properly indexed when that's ready. I'm currently using the patch here as the basis for that patch - depending on whether this gets committed or not, I'll merge this back here, or re-roll afterwards.
Comment #143
webchickThis is a relatively cursory review; we're just on our way out to dinner so I had to cut it short before I could finish. However, I tried to focus on broken stuff and am pretty sure that once these are resolved, I've no other complaints about this patch.
What is this crap? :P
Also, while we're here, can we remove that blank line above the function so the PHPDoc shows up?
Does this not re-introduce #199373: Enabling forums after being disabled loses vocabulary association?
Er. DIE? This is an important forum feature.
I don't like this; it ties the forum to the forum path. Can we derive this another way?
The minus signs in this patch are GLORIOUS*!&@(*&*!(&@(
Ick. The old code was much easier to read. Can you please split them back up to separate lines?
single quotes.
watch your concatenation.
Adding tags to 'page'?
This review is powered by Dreditor.
Comment #144
catchThis is the crap we have to introduce because we dropped this crap:
And this:
However I worked out we only need to introduce one cast to array there, not two, so it's 1000% nicer than HEAD now instead of 500% :p
Yep.
It shouldn't do, because we replaced $vocabulary->nodes with field instances.
Nice feature, but wasn't a very pretty helper function. However it's used in a lot of places, so I tidied up the PHPDOC a bit and removed the DIE.
Not invented here, and again it's an improvement on HEAD:
So can we do it in a separate issue?
We'll have to add some code back in #562824: Optimize term_node and forum topic queries but it's code which should've been there 2-3 releases ago, even with that, it's still a lovely diffstat.
Done.
Done.
Fixed.
Not quite, adding a new autocomplete widget for our test vocabulary to page. I left this how it was.
New patch.
Comment #145
catchForgot status change.
Comment #146
catchI got #562824: Optimize term_node and forum topic queries to a point where it's got less bugs than this version, brings back shadow forum topics, and also solves the performance/memory issues in both HEAD and the current patch, so I think it's in a sufficient state to merge back to here. There'll be some cleanup to do regardless of how the patches go in, but I think if we can do this at once, it'll be straight bugfixing and tweaks more than any more huge changes. #144 is still good to go otherwise, I'll just need to re-roll this if we decide to commit that first.
summary of the new patch:
It creates taxonomy_index and forum_index tables owned by their respective modules.
These tables hold enough data, denormalized, to build the most expensive queries without adding extra joins.
This removes temporary tables, filesorts etc. from taxonomy/term/n, forum blocks, and forum topic lists. There are other potential queries which should be optimized in a followup patch.
In the process, it adds back some node specific code back to taxonomy.module - but with the crucial difference that this is only required to allow taxonomy module to provide out of the box listings of nodes like Drupal 6 does - i.e. we can eventually remove it when either materialized views or views or both is in core. The new code is specific to providing those listings efficiently, and not mixed in with taxonomy API functions like it used to be.. It also fixes long-standing and critical peformance issues with both modules, which have open issues elsewhere, and custom code on d.o to avoid them falling over.
Comment #147
int commentedadd tag
Comment #148
pwolanin commented@catch - if we get access to SQL table info would we be able to remove this extra table info?
Comment #149
catchpwolanin - we can, but keeping denormalized tables like this lets us fix some very nasty queries in the process. Also it doesn't solve the "showing all nodes tagged with a term regardless of which field was used" issue which you discovered on taxonomy/term/%term
Comment #150
Anonymous (not verified) commentedEveryone:
This patch is postponed until #569224: Expose field storage details through field attach. is committed. All the panicked efforts to make this patch "work" before code freeze can be undone and made WAY more elegant and high performing after that.
Comment #151
catchDiscussed this with bangpound and chx last night - while I think adding those hooks for field storage tables is a good thing, they don't help us much here:
1. Our existing queries for term listings and forum topics are some of the worst performing queries in HEAD.
2. To get taxonomy/term/%term listings working the same as D6, we'd have to join across all possible field tables which might reference a term ID (and taxonomy fields can potentially span multiple vocabularies or subtrees of vocabularies, so this would take quite some logic to determine which fields might potentially reference a term) - this would make things even worse than they are now
3. If we special-case to one vocabulary and do a join on just one field table, we get the same usability issue pwolanin ran across in http://drupal.org/node/412518#comment-1982796 of it 'not working'.
However, chx is very concerned that sites which actually do want to use pluggable field storage for term fields, say couchDB, can do so without having built-in overhead in taxonomy.module to maintain the denormalized tables - which is an extra insert on node_save() at least. This is fair enough.
So... new patch, which keeps the schema, has taxonomy_select_nodes() return early if using pluggable field storage (our answer to sites doing that is they should know how to use views or roll their own term listings in some other way), and the table is only updated if field_sql_storage.module is in use, via a conditional include. We don't support data migration between field engines or anything, so no need to do that here either.
While it's not too hard to do the same thing for forum.module, we'd pretty much have to add checks for this everywhere in forum.module - since all it really does is provide term listings and a few templates - so I think a hard requirement for field_sql_storage or at least these denormalized tables is in order there. However, if the same arguments apply, I'll do my best to move it out of there too.
chx and bangpound also mentioned on irc that we shouldn't try to introduce materialized views piecemeal into core without a centralized API. I vehemently disagree with this. Materialized views isn't even in contrib CVS yet, making it inaccessible to the very large majority of Drupal users, addtiionally David Strauss has said he's not yet happy enough with the maturity of the API to try to push it into core, and we've already introduced one lookup table with #105639: Tracker performance improvements. Either taxonomy.module provides indexed listings of nodes, or we remove that capacity from core (and tracker.module) and let mv and views handle it in contrib, but we have to stop shipping core Drupal versions with site-hosing queries once yet get above a few thousand items - especially when fixing them really isn't that hard. Since maintaining the tables is now optional, I hope that's enough of a compromise.
Comment #152
yched commentedcatch: any idea if / how the special 'field_sql_storage' handling applies after #443422: 'per field' storage engine ? (I'm still fighting with the reroll, but there's a general agreement that we should get it in)
Comment #153
catchyched, I'm not sure. We could potentially set/get any variable as to whether we want taxonomy module to denormalize field data or not, as opposed to the field_storage one. That would allow sites with their own materialized views to not use the core one, for example. That's a two-line change to the existing patch though, so would love feedback on the general idea.
Comment #154
yched commented@catch: I'm not sure we're talking about the same thing. #443422: 'per field' storage engine is not about denormalization, but about field_a being stored by field_storage_sql and field_b stored by field_storage_couchdb instead of the current 'one storage backend rules them all'.
Comment #155
catchSo if taxonomy_field_a can be in default storage, but taxonomy_field_b can be in couch_db, then the approach in the latest patch would still apply if we want taxonomy/term/%term to function - since joining between MySQL and couch DB we're not going to support if it's even theoretically possible, but I'd make it a variable_get('taxonomy_maintain_index', TRUE); instead of checking the field storage module variable. I've changed that in this patch, along with renaming the include to taxonomy.index.inc. Now any contrib module or site can stop taxonomy from maintaining this table and do this a different way - whether they're using views to provided different listings, materlaiized views to maintain their own lookup tables, or some other method - but core keeps the listings the same as D6 with much better performance.
Also took a first stab at the upgrade path, but getting batch API notices in 7004 which suggests update.php still doesn't work?
Comment #156
yched commentedOK, so when #443422: 'per field' storage engine lands, functions in taxonomy.index.inc will be wrapped in
if ($field['storage']['type'] == 'field_sql_storage') {? Or will the index include non local fields too ?+ minor: it would be best IMO if all taxonomy_field_*() hooks were kept together in one place, instead of some in taxonomy.module and some loaded conditionally in taxonomy.index.inc.
batch API notices in 7004 ('simple', single pass update) ? or in 7005 (multistep) ?
Comment #157
catchAgain, it depends how much we want to support Drupal 6 style taxonomy/term/%term listings of all nodes associated with a term (IMO they're useful and we probably do, but it'll be nicer when we generate them with default views). If the general agreement is that yes we do want to support those listings, then we need to maintain the index for all possible fields regardless of where the canonical data is stored, and let sites/modules disable this functionality if they have specific needs.
Well we could put that variable check inside all the functions which were moved to taxonomy.index.inc. Hmm, looking at it it's < 10 functions so that might not be too horrible.
Batch API notices were in 7004.
Comment #159
chx commentedI am unhappy with only the variable and no table checks coming from #569224: Expose field storage details through field attach.. However, the Bible says
go ahead.
Comment #160
catchYeah I'm fine adding that additional check in when the patch is in, but don't want that issue to be a dependency.
Attached patch fixes the conflicts in forum.test, moves the taxonomy index stuff back into taxonomy.module, but with a new @defgroup and associated documentation block to explain what's going on, which also seemed an appropriate place to try to document the variable since we don't have obvious ways to do this otherwise.
The forum indexing should also be controlled by a variable probably, and that may need some cron indexing for sites which disable forum module, have people posting comments on the forum content, then switch it on again, which taxonomy doesn't - but webchick indicated in irc that this could probably be done in a followup. There's also a bunch of forum queries which could probably use the new table introduced to forum, but don't yet - however that's also performance tuning/internal factoring which we can do without the API changes here (and forum module works at least as well as in HEAD with this patch applied)
Comment #161
catchSorry, patch.
Comment #163
catchCross posted with the final dbtng patch... Should apply for real this time, and forum/taxonomy tests all pass.
Comment #165
catchRe-rolled.
Comment #166
catchAnd the patch, a week later.
Comment #168
catchrawhide :(
Comment #169
dries commented#443422: 'per field' storage engine has now been committed.
Comment #170
catchAfter a lot of discussion with chx in irc, we agreed that the management of indexed tables needs to be handled by a variable (since you also might not want that if using materialized view API or similar) - however I think chx also wants an
&& $using_sql_storagein the conditions - so presumably we could get that information now per-field storage is in. The behaviour would then be, that we maintain those tables unless you disable it manually, or the field you're using doesn't use SQL storage - which would just be convenience for sites using pluggable storage so they don't need to change the value themselves. However it needs to be noted that it would then not be possible to both use these centralized lookup tables, and also store your fields outside SQL - I can't think of obvious use cases for that but would rather someone else agreed with me before another revision of this patch gets posted.Comment #172
catchNew patch checks for field_sql_storage() on insert and update hooks. Since there's no hook_field_attach_delete(), I left the delete queries in - a no-op is going to be no-worse than looping through every single field to check the storage.
Comment #174
catchFix stupidity.
Comment #175
catch:(
Comment #176
yched commentedAs a followup to #443422: 'per field' storage engine, we'll need to extend hook_field_storage_info() with a boolean 'local_sql' property, and remove the hardcoded checks against
$field['storage']['type'] == 'field_sql_storage'. 'field_sql_storage' might not be the only local SQL storage backend.Doesn't need to be done in this patch, though.
Comment #177
catchThat was one of my arguments against putting these checks in at all, and just putting in the variable. I leave it up to others whether we stick with the current patch and add the local_sql in later, or go back to the variable and at it in later.
Reviews here would be appreciated, less than two weeks of exception time left.
Comment #178
yched commentedJust throwing an explicit reminder that I'm not where I can easily review large patches, and also that I'm probably not the most qualified to greenlight this patch at this point. Last time I checked, it was OK from a Field API point of view.
So *other* reviewers needed :-p
Comment #179
catchYeah I didn't mean yched specifically ;)
Comment #181
andypostLast patch #174 applies with offset
1) setup with normal profile successful, Tags created
2) New vocab created fine, tested text field addition
3) I found no ability to assign vocabulary to node-type
4) Trying to add existing text field to Tags vocab throw error when save field setting (/admin/structure/taxonomy/1/fields/field_test/edit?destinations[0]=admin/structure/taxonomy/1/fields)
5) Editing term form looks strange, theme changed from admin to garland and fields displayed in strange order (screen attached)
6) after saving new term and after editing no redirect and form fields not cleared
Comment #182
catchThis should be at admin/structure/types same as any other field.
That's not good, but adding fields to terms isn't part of what this patch does - can you verify whether that error occurs in HEAD too?
Same thing here - please confirm that these aren't existing bugs in HEAD.
Comment #183
Anonymous (not verified) commented#570900: Destroy remnants of update_sql() meant the hook_update_Ns in this patch were old fashioned and (even worse) big failures. However, in my usage, update.php still showed some meaningless output when the updates succeeded. I don't know if this is the new Drupal way or if this is an issue with this patch. Please let me know.
This patch applies to HEAD and the relevant tests pass. Hopefully testbot will say the other tests pass, too.
Comment #184
andypost@catch - previous test p.4-5 have trouble with core, so not related to patch
Another review #183
1) new vocab creation fine
2) new terms fine, except form fields still have old values (but this is a core bug - no time to search issue)
3) assigning to node type (Page) a field type taxonomy (checkboxes widget) - works but submit on third step (page settings) brings
4) node/add/page - works
5) nodes are created and navigates fine
6) new vocab, addition existing field (taxonomy from node-page) but with autocomplete widget - excelent, no warnings
7) new term - widget for term-field below Save button :(
8) Addition of term field to vocabulary works fine
Everything works fine except warning and notices
Comment #185
bensnyder commentedsubscribe.
Comment #186
Anonymous (not verified) commented@andypost
The issues you're describing (in #184, nos. 3 and 4) are not related to this patch AFAIK. Can you open a new issue about them? I can have a look after this patch is committed as I'm pretty familiar with the widget and the term field.
Comment #187
andypost@bangpound Summarized #595702: Warnings when adding taxonomy term-field with checkboxes wodget
So about this patch +1 to RTBC!
we found more comfortable to assign vocabularies to node-types with fields oposit old-style (editing vocabulary)
Comment #188
moshe weitzman commentedI read through this patch and I found the flow easy to follow and the code high quality. I added/edited/deleted content and terms without problem.
We are in a terrible twilight zone right now with taxonomy and I think it is important to commit this patch. We should performance optimize the upgrade path at a later time.
Comment #189
sunThis is incredible! The patch, indeed, looks totally awesome, and we should tackle any resulting/remaining issues in follow-ups.
Testbot doesn't seem to come back from re-testing. Same patch. Don't credit me.
Comment #190
Anonymous (not verified) commentedI verified that the patch sun posted in #189 is identical to #183. (Excessive caution?) Thanks for the re-roll!
Comment #191
catchJust a note on the upgrade path - it's already multi-pass, the issue is whether we can make it an INSERT INTO .... SELECT FROM instead, which requires finding a way to handle field deltas during that process if so, we should also look at doing the same for the body as field update (and that's a lot simpler since it doesn't have to worry about deltas).
These are the two main follow-up issues: #597736: Do forum indexing on cron, #597742: Optimize forum.module query performance.
Comment #192
catchRemoved a dead comment, a duplicate comment, and an spare apostrophe.
Comment #193
catchRemoved a dead comment, a duplicate comment, and an spare apostrophe.
Comment #194
webchickOk. Spent a good hour doing a deep-dive on this patch. I had a few questions, and found a few very minor things. None were enough to hold up this patch, though. As soon as catch posts a quick re-roll and it gets green from the testing bot, this is going in.
Here were my comments from IRC:
Catch was able to address all of my "big" questions and the re-roll should take care of the tiny issues.
Comment #196
webchickOh, well shoot. In my eagerness to see this committed, I forgot that I haven't yet checked the UI-related changes. And when I did, I found some bugs.
1. After running update.php, my "Tags" vocabulary on the Article type moved from the top of the list to below body. Possibly some kind of weighting issue in the update path.
2. Things get really crazy though, with Forum module:
As you can see, this is horribly broken. :P
The best thing I can figure out is that it somehow thinks "Tags" is the forum vocabulary, since all 3 of those things it's attempting to create links to have that vocabulary assigned to them.
We should limit this list to only node types, not other entities. If this is a huge hassle, we can save it for another patch.
Comment #197
catchI can't reproduce the syntax error in forum.admin.inc locally with php -l. This should fix the "post new $type" thing - for some reason that was checking the tags field, although I didn't yet restrict it to nodes only.
Comment #199
catchGrr, this one should pass.
Comment #200
webchickExcellent. Committed to HEAD!!
We need follow-ups for (at least):
* Limit types of things you can post to forums to nodes only, and add tests for this feature.
* Clean up comment.module so it fires hook_unpublish on single-saves as well.
Comment #201
catch#599018: Clean up comment hook invocations
#599016: Only allow nodes to be posted to forums
Two more followups in #191.
Comment #202
catchDidn't mean to change status!
I also added a note to the upgrade docs. Since we removed hundreds of lines of code here, this just says "we removed a lot of stuff, please look at the issue or field API docs" at the moment.
Comment #203
giorgio79 commentedFantastic stuff! Looking forward to this.
This may not be the best place to ask, but it is organically following this issue so I Thought it is worth asking.
With this upgrade, on taxonomy term edit pages, will we be able to replace the large select boxes for selecting related terms and parent terms (in the advanced options) with CCK autocomplete for example?
At the moment the term edit page just loads all terms from the vocab in the select boxes which can be a performance and usability issue with larger vocabs for example.
Comment #204
catchRelated terms are no longer in HEAD.
Parent items, sadly, we didn't have time to convert to a field in Drupal 7 (or remove the horrible form code which generates that select list). We should look at performance improvements in D7 though, and term hierarchy is going to be a major TODO for Drupal 8.
Comment #205
giorgio79 commentedCool, thanks Catch.
I just found this issue related to to what we just talked about:
#526120: Remove related terms from core now that terms can have a term reference field
Comment #206
Anonymous (not verified) commentedParent items are going to be one tough job for Drupal 8.
I've heard people talk about a hierarchical field type. It took me a while to really understand what this means, and I still can't put it into a brief comment on an issue.
People are very dissatisfied with the performance of deep hierarchies in taxonomy.module vocabularies.
I predict that Drupal 8 taxonomy module needs to provide or use a field that has a different kind of storage pattern that it can somehow impose on the bundles it's attached to. We already have a simple way of expressing hierarchy, but it's inefficient. There are other ways that are more suitable for deep hierarchies, and they don't require recursion or multiple SQL queries. There are database models that are inherently hierarchical, and Field API is supposed to be agnostic about where field values are stored.
However, we still have to finish Drupal 7.
Comment #207
mfbFollow-up bug fixes for forum.install and forum.module at #601938: Forum exceptions
Comment #208
xanoJust poking a bit: a lot of contrib depends on taxonomy_term_count_nodes(). Any chance we can get it back?
Comment #209
catchWe could probably use the term_index table for that yeah. Can you open a new issue? It'd be putting back an old API rather than a new one, so if you can argue it's a regression probably OK for the spit and polish phase.
Comment #210
xanoDone: #602240: Re-add taxonomy_term_count_nodes().
Comment #212
gábor hojtsy@webchick, @catch: seeing that no category information is printed with node feeds currently and no Feed UI exists (that I could find) to set that up, I was curiously looking for the patch that removed this functionality. Looking at this patch that was committed, it looks like the RSS category addition functionality was removed and no facility was added back in to reproduce it. Or am I missing something?
Comment #213
jessebeach commentedHere is the RSS 2.0 specification: http://www.rssboard.org/rss-2-0-1-rv-6
Comment #214
catchGabor, please open a new issue for that if you think it's a regression, no need to re-open 200 comment issues.