THE TERM METATAG

Let's stop using the term metatag. I think that there are several directions that we can go to make Drupal's node attributes make more sense. But first, we need to really understand what we're talking about when we say metatag.

A metatag is ONE single descriptive point in a document's larger description within a metadata framework like the Dublin Core (see link below). Metadata as it is commonly used today (e.g. in the Dublin Core Metadata Element Set) separates information about data into: 1) document descriptions (empirical data such as author, title, publisher, date of publication) and 2) knowledge descriptions (usually subject content and description/abstract). What Drupal attributes do is allow for the latter -- classification of nodes by subject. With this concept in mind there are a few directions we might go.

For more info on metadata elements, see the Dublin Core Metadata Element Set.

SUBJECT CLASSIFICATION

You can think in terms of what Libraries have historically done to get a sense of what is possible. A library catalog is a database holding bibliographic data (metadata) to help you find items in a large collection of data. You can do known item searching by looking for an author name, title, etc -- the document description stuff. But a lot of people want to find similar items, so maybe they find out what an item is about (subject) and browse for other books with the same subject. This is how most people want to use Drupal's attributes, I assume, and they may also want some sense of relationship (perhaps hierarchical, perhaps not) between each subject.

Libraries have used complex library classification schemes that use some sort of syndetic structure. There are hierarchical systems like the Library of Congress Subject Headings and Dewey Decimal System. There are flat and flexible systems like Ranganathan's colon classification system for facets.

What we did in Drupal 3.x is a less complex version of what Verity and Semio are doing in creating subject buckets (or taxonomies/ontologies). They create complex searches for terms and put documents in appropriate buckets when they satisfy their search. Verity, of course, does a bit more than that, but we're not creating Verity for a weblog.

DOING IT IN DRUPAL 3.x

It took me a while to understand what Drupal 3.x's attributes were doing, but I see that what we were doing is this:

  • Creating subject headings
  • Creating a list of index terms (called descriptors by indexers) that we might apply to nodes and grouping them under various subject headings
  • Adding index terms to each node

What didn't make sense to me was that we seem to be creating parent-child relationships with metatags and attributes, but those relationships aren't being used. Here's an example:

  Metatag: Deliverables
  Attributes: Wireframes, Flow charts, Personas

If I entered an attribute of "Wireframes" for a node, I would expect that I would find that node if I browsed by topic under "Deliverables" as well. This isn't happening though. What I think should have happened is that if I entered the attribute "Wireframes" for an article, the links for that article would look something like this:

  Title of article blah blah
  ----------------------------------------------------------

  This is the body of this node.  Blah blah blah blah blah blah blah. 
  Blah blah blah blah blah blah blah.  Blah blah blah blah blah blah blah. 
  Blah blah blah blah blah blah blah. 

  >> Posted by jibbajabba on Friday, November 09, 2001 - 11:35
  >> In topic: Deliverables - Wireframes, Strategy and process, Some other topic

Look at the part that reads "In topic". I would want the text for Deliverables to be a link to the Parent subject heading and the text for Wireframes to be a link to the child subject heading. There are other subject headings there as well.
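To make the expected behaviour concrete, here is a minimal Python sketch. It is not Drupal code: the names `PARENT`, `NODE_ATTRIBUTES`, `nodes_under` and `topic_line` are made up for illustration, and it assumes a two-level scheme where each attribute has at most one parent heading. It shows browsing by a parent heading also returning nodes tagged with its children, and builds an "In topic" line like the one above.

```python
# Hypothetical two-level scheme: each attribute points at one parent heading.
PARENT = {
    "Wireframes": "Deliverables",
    "Flow charts": "Deliverables",
    "Personas": "Deliverables",
}

# Hypothetical node -> attributes assignments.
NODE_ATTRIBUTES = {
    "article-1": {"Wireframes", "Strategy and process"},
    "article-2": {"Personas"},
    "article-3": {"Strategy and process"},
}


def nodes_under(topic):
    """Return nodes tagged with `topic` itself or with any child of `topic`."""
    matches = []
    for node, attributes in sorted(NODE_ATTRIBUTES.items()):
        for attribute in attributes:
            if attribute == topic or PARENT.get(attribute) == topic:
                matches.append(node)
                break
    return matches


def topic_line(node):
    """Build the 'In topic' line, prefixing each child with its parent."""
    parts = []
    for attribute in sorted(NODE_ATTRIBUTES[node]):
        parent = PARENT.get(attribute)
        parts.append(f"{parent} - {attribute}" if parent else attribute)
    return "In topic: " + ", ".join(parts)
```

With this data, `nodes_under("Deliverables")` finds the article tagged only with "Wireframes", which is the behaviour that was missing.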

And what about doing hierarchies that are deeper than 2 levels? Is that desired? Maybe, maybe not. If yes, Drupal would become usable for more than just weblogs, I'd bet. It could function like CMSes such as Vignette do. But I'd argue that 2 levels is enough for what Drupal is supposed to be.

For an idea of what some complex taxonomies look like, see the MeSH system (Medical Subject Headings)

DOING IT WITH MACHINE INDEXING

Another thought that occurred to me. Why bother users with having to enter attributes themselves? Are there methods for indexing the text of stories/blogs/etc. and then applying subject classification to them based on the terms they contain? This might also be valuable, but I don't know what it would mean in terms of programming. Also, the user should be able to change the categories applied if they feel that they're wrong or if other categories should be applied that were not picked by the system.

This comes closer to what Verity and Semio offer in value. They automagically apply some sort of classification when a document is entered into the system. Then, at some later part in the publishing workflow, an administrator can check the documents that are retrieved by their parsing/classifying engine and tweak the searches. This is sort of what is done where I work as well. The searches are rather complex boolean searches.
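A very rough Python sketch of that idea, far simpler than what Verity or Semio do and not based on their products: each bucket is a small boolean search over the words of a post, and the user's corrections are applied on top of the machine's suggestions. The rule format, the bucket contents, and the function names are all assumptions made up for this example.

```python
import re

# Hypothetical bucket rules: a bucket applies when ALL of its "all" words and
# at least ONE of its "any" words appear in the text.
RULES = {
    "Deliverables": {"all": set(), "any": {"wireframe", "wireframes", "persona", "personas"}},
    "Strategy and process": {"all": {"process"}, "any": {"strategy", "planning"}},
}


def suggest_categories(text):
    """Run every bucket's boolean search against the words of the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    suggested = []
    for bucket, rule in RULES.items():
        if rule["all"] <= words and (rule["any"] & words):
            suggested.append(bucket)
    return suggested


def final_categories(text, added=(), removed=()):
    """Machine suggestions first, then the user's corrections on top."""
    chosen = [c for c in suggest_categories(text) if c not in removed]
    for category in added:
        if category not in chosen:
            chosen.append(category)
    return chosen
```

The `added` and `removed` arguments stand in for the user overriding categories that the system got wrong or missed.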

I'm still thinking out loud about much of this, but wanted to bring it back up again because I've found the use of Drupal's metatags to be useful. I'm not sure if my users have or not, but I plan to poll them on that. In any case I want to bring it up again so that the direction for these things might start to show itself for the next release of Drupal. Let's just change the word metatag to something more meaningful. Categories? Topics?

I'm sure I'll have more to add to this as ideas occur to me.

-Michael

Comments

j0e@www.drop.org’s picture

It seems that we are all struggling with this area. Here are the key discussion points that I see in your post and elsewhere.

1. Hierarchies, should we support them? If so, 2 levels or 'n' levels?

2. Classification, who is responsible for classifying a post? Users, software, admin?

I think a reason that this is difficult might be that Drupal is being used for many very different purposes. A personal weblog tool, a community publishing tool, a knowledge management tool, a content management system... you get the idea. Each of these applications has a different set of goals. For example, an 'n' level hierarchy is beyond the needs of a personal weblog and might add unwanted complexity to the application. Automated classification sounds too complex for every application except large scale CMS.

So it seems that a highly flexible but not too complex solution is in order. No wonder we are struggling.

Oh, to your point about dumping the term meta tags. Is your suggestion to use "subject classification"?

Reading maketh a full man, conference a ready man, and writing an exact man. - Francis Bacon
Using Drupal maketh a full, ready and exact man

Anonymous’s picture

1. why 2 levels?
2. since the meta... eh 'subject classification' system is gone at the moment, I guess it's best to keep things simple to start with and go for the classic 'posters classify their own posts', with perhaps the choice of 'admin(s) classifies all posts' (global choice between these two). I guess all other solutions will slow things down considerably.

anonymous polaar

j0e@www.drop.org’s picture

1. Do you mean why start with 2 levels or more, versus 1 level? If so, it's because I got hung up on the word hierarchy.

2. Now why did you have to bring reality into the picture?

Cheers,
- Joe

Anonymous’s picture

1. see my reply to jibbajabbaboy
2. Talking about reality: I was thinking: how about doing a series of discussions (better on drop.org I guess) where participants are explicitly invited NOT to be realistic? Example: a discussion starting with the question 'how to get users to assign metatags?', where everybody just posts whatever they think of, no matter how unrealistic, far-fetched, absurd... That way nobody has to feel embarrassed about posting stupid ideas, some good ideas might actually come out of it, and if not, at least we can just have a good laugh. I was thinking you might like this...

Kjartan’s picture

In the end we have to be realistic. Someone has to code it, and they have to be able to understand what is being said. No harm in a go-wild kind of discussion first though.

I would like a categorizing system that is easy to use, lets me define N levels and is integrated more. Drupal is more than just a blogging tool. In fact out of 5 Drupal sites I run only one of them is a blog.

My main problem with the meta tags ala 3.0.x is that they were not well enough integrated with the rest. You would categorize the content but it would be hard to do anything with it. It really required a lot of knowledge about Drupal innards. The tags/sections/... should be searchable, easily found, etc.

It would be nifty if Drupal could find out the category on its own, but I would like to see the rest work first. Not sure I like users just adding their own sections as on a busy site it would end up being worthless unless the admins like cleaning up every once in a while.

--
Kjartan

j0e@www.drop.org’s picture

What you say makes good sense and is very reasonable. In fact I think in essence we agree. The go-wild discussion has its merits, but many of the ideas wouldn't make sense to implement. And of course if the developers don't understand it, they can't code it.

If you'll humor me a little longer, I'd like to expand on the idea of users creating new categories. Here is what you said.

Not sure I like users just adding their own sections as on a busy site it would end up being worthless unless the admins like cleaning up every once in a while.

Here are a few ways we might handle this.

  • New categories could be pushed through a submission queue or be voted upon. I think Polaar mentioned this somewhere but can't find it to give proper credit.
  • As jibbajabbaboy said, the user taxonomies could be kept separate from the administrators' taxonomy so the administrators' taxonomy stays clean and uncluttered.
  • I was thinking of some sort of ranking though, where a shared taxonomy is self managed by "normal selfish user behavior". To be honest I don't really know how it would work, but here is a really rough sketch.

    John posts an article, and stores it in his own bookmarks with a new category name. Bill reads the article and decides to bookmark it. When he bookmarks it, he sees a list of all categories, with John's new category name highlighted and at the top of the list. It is at the top and highlighted because there is a good chance that Bill will share John's opinion about categorization. Let's say Bill disagrees and puts it under one of the pre-existing categories. Now Jeff bookmarks it, and both Bill's and John's categories are highlighted and at the top of the list. Whichever category has been chosen the most for this post takes top spot.

    That's what a user sees when bookmarking a post. But what does the rest of the viewing audience see? They see a "public directory" that sorts and ranks all the posts and categories based on the user bookmarks. So John's post might start off categorized in a misc bucket, because John's new category is still unproven. However, if enough people use John's category, it then becomes accepted into the public hierarchy, and the post is listed in that category. As soon as Bill categorizes the post in a standard category, the post will also be listed there.

    Since the act of bookmarking can be viewed as a metric of value, we could use that info. Each category in the hierarchy is a rank order list of postings and one of the ranking attributes is how many times it was bookmarked into that category.
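    The John/Bill/Jeff scenario above could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the threshold `ACCEPT_AFTER`, the set of standard categories, and the two function names are not from any Drupal code.

```python
from collections import Counter

# Hypothetical threshold: a user-made category needs this many bookmarks
# on a post before it is accepted into the public directory.
ACCEPT_AFTER = 2
STANDARD_CATEGORIES = {"Design", "Process"}


def ranked_choices(bookmarks, all_categories):
    """Categories already used for this post first (most used on top), then the rest."""
    counts = Counter(bookmarks)
    used = [name for name, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    rest = sorted(c for c in all_categories if c not in counts)
    return used + rest


def public_listing(bookmarks):
    """Where the post appears in the public directory."""
    counts = Counter(bookmarks)
    listed = sorted(
        name for name, n in counts.items()
        if name in STANDARD_CATEGORIES or n >= ACCEPT_AFTER
    )
    return listed or ["Misc"]
```

    Here `bookmarks` is simply the list of category names that readers have filed one post under, in any order. A post with only John's unproven category lands in "Misc" until a second reader picks the same category or someone files it under a standard one.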

I must sound like the guy on the street corner drinking out of a paper bag. But unfortunately, I can't use drugs or alcohol as an excuse for my ramblings.

BTW, I think I'll take Polaar's advice and keep future pie-in-the-sky ramblings on drop.org and leave drupal.org for serious reality based discussion.

Anonymous’s picture

I was thinking about this separation between a user's taxonomy and the administrator's (or global) taxonomy. What if a user can bookmark posts or assign keywords, and an administrator links these to the (static) global taxonomy. That way, the site taxonomy stays clean and uncluttered, a user can do whatever he wants with his own. This still places some burden on an administrator, but he doesn't have to categorize every new post, just every new category/keyword/section a user makes (assuming this happens less often than new posts...)
or something like that...
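
A small Python sketch of the linking described above, under the assumption that the administrator keeps a simple keyword-to-term table. The table contents and the `classify` function are hypothetical. Keywords with no mapping yet are returned separately, so the administrator only has to look at new keywords rather than at every new post.

```python
# Hypothetical admin-maintained mapping: user keyword -> global taxonomy term.
KEYWORD_TO_TERM = {
    "wires": "Wireframes",
    "wireframing": "Wireframes",
    "flows": "Flow charts",
}


def classify(user_keywords):
    """Split a post's user keywords into mapped global terms and unmapped keywords."""
    terms, unmapped = [], []
    for keyword in user_keywords:
        term = KEYWORD_TO_TERM.get(keyword.lower())
        if term is None:
            unmapped.append(keyword)  # waits for an administrator to map it
        elif term not in terms:
            terms.append(term)
    return terms, unmapped
```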

Your ranking idea set me thinking. If you have different categories, you can have a certain value of how many times a post is put in a certain category. A post can belong 'more to this category than to that category'. You can do some weird things with this. Maybe you could draw a 'map' with the different categories on it, and show the post located somewhere in between, closer to this one than that one... Or if you have only three categories, you apply an RGB color value to each post instead of subject headings.

polaar

I know, I really should (have someone) look into my account problem again

jibbajabba’s picture

I think you're all headed in a good direction. Seems like suggestions are: 1) keep global taxonomy and user subject headings/bookmarks separate, 2) allow moderator to map user headings to global taxonomy subject headings.

This is sort of like maintaining a controlled vocabulary with a thesaurus and then mapping the terms in the controlled vocabulary to a taxonomy. This is what we do where I work in the info. services group at Bell Labs. A bunch of IA's are struggling to understand what CV's, thesauri, taxonomies have to do with each other over at Elegant Hack so don't feel like you're all alone in trying to grok this stuff, because that is basically what we're all sort of dancing around at the moment.

Personally, I think the user-contributed stuff is valuable, but the global taxonomy is a higher priority. A big question still remains, "Will people categorize their stories?". I don't think this has been tested in the IR literature. I think we should all avoid that question in any case and just design what we think WE will want to use. Whether someone can code this is another issue. I wish I had chosen Computer Science instead of Art History when I went to undergrad! Damnit.

Kjartan’s picture

I find the meta-tag and related conversations hard to follow as some terms are just dumped on you and not explained in detail. I am good at English, but some of the words used are just beyond my immediate knowledge so I end up using a dictionary at times. Also I am more into how to get something to work at the moment. So I think in terms of "this is what they want, now how do I code it?", which in this discussion is a disadvantage.

At some point someone should do a summary for normal people to understand. We also have to have some terms that anyone can understand. I don't think using "global taxonomy" in a user interface will have the users categorize the content.

--
Kjartan

jibbajabba’s picture

Kjartan, see my post with definitions. Sorry not to have defined the terms I was using.

Kjartan’s picture

Now to read all these posts and links again to try and get some new connections. If you have more info keep it coming so we can create the best system ever.

--
Kjartan

polaar@www.drop.org’s picture

> This is sort of like maintaining a controlled vocabulary with a thesaurus and then mapping the terms in the controlled vocabulary to a taxonomy.

er... yes, that's it... I think...

> A big question still remains, "Will people categorize their stories?". ... I think we should all avoid that question in any case and just design what we think WE will want to use.

Damn, just when I wanted to ask this question...

I guess if you really want this it would be a good idea to implement it in different steps/separate modules.
Step 1: global taxonomy system: get this working without the other stuff and you have the most urgent issue fixed
Step 2: user bookmarks: give users the ability to 'bookmark' or organize posts (independent from the global taxonomy, or even if the site doesn't use a global taxonomy: as an extra feature for the users)
Step 3: find a means of linking the two together

I don't know if this is easy to do, but it would offer a lot of flexibility and doesn't stand in the way of being realistic and getting meta/sections/subject classification back up as soon as possible.

Unfortunately, I'm even worse than you and don't even know a little PHP.

Carl Ditzler’s picture

Step 2 is complete, though it is not included in Drupal and is a contribution. It looks like we are going back to j0e's comments here: Maybe a compromise is to offer the initial set of predefined folders plus the ability to create your own folders. I think it is ok to try to get people to adopt a single taxonomy, but to let them classify elements within that taxonomy in their own way....

j0e@www.drop.org’s picture

I think you've got my number. Although I don't get too embarrassed by my stupid ideas as long as an occasional one has at least some redeeming value.

I think that to be leaders in the community publishing/blogging/knowledge management arena we need free, open and creative dialog without the constraints of reality. If an idea is really worthy, I'm sure we can bend reality a little.

Being an idealist and an optimist, I believe if we foster real creativity, we will develop an exciting vision for Drupal. As that vision becomes clearer, we'll start putting our creative energies into solving the problems of how. First what, then how, and stay open and creative throughout it all.

Who knows though... I'll keep rambling and we'll see what happens.

Cheers,
- Joe

jibbajabba’s picture

> Oh, to your point about dumping the term meta tags. Is your suggestion to use "subject classification"?

That sounds good to me. Or any of the following: subject headings, topics, categories.

> I guess it's best to keep things simple to start with

I would agree that this is best for most uses, but I see Joe's point about uses that would benefit from n-level hierarchies, namely knowledge management blogs. I wish I got to see the "sections with subsections" version of Drupal that I saw mentioned on another thread. This is what really interests me. Being able to classify things in multiple sections/subsections (as we could do with 3.x's attributes) is also interesting.

Anonymous’s picture

Seems you're replying to two comments here...
I suppose your second quote (keep things simple) refers to mine:
I meant for the 'assigning' of metatags, not that you should go for flat sections.
As for the classification, I'm also in favour of the 'sections with subsections' (which were my own words in the other thread), but maybe my comment was a little unclear: I asked 'why 2 levels?' because that seemed a little strange to me: I can understand people choosing flat sections, but if you implement a hierarchy, why restrict it to two levels?

polaar

jibbajabba’s picture

OK. I understand your question. We're on the same page I think, in support of n-level hierarchies.

polaar@www.drop.org’s picture

made a mistake in the above post, I see now. 'sections with sub-sections' weren't my words at all. They appeared in the post Meta's, Sections or something else by barry@alted.co.uk where he asked whether to go for flat sections, sections with sub-sections, or metas. (To which I answered 'metas with submetas'. Now, what exactly did I mean again by that?)

sorry for the false claim

polaar@www.drop.org’s picture

Ok, I'll try and explain what I meant by 'metas with submetas' in that previous post. I also read Natrak's 'Explaining things' comment, and he has some very important points there. (I must admit that I get lost in these technical terms too sometimes, and then I just go on and post what comes to my mind.) It's not the general summary he asked for (and it doesn't even involve the user bookmark stuff), but anyway, it's a summary of what I see as a basic system, and I guess it's not too much 'wild thinking'.

A system of 'keywords' which can be attached to posts, with the possibility of adding multiple keywords to one post.
These keywords can then be used for different purposes (not necessarily all 'activated'):
- they can be shown with the post so that users see at a glance what a post is about
- they can be used in an advanced search tool
- they can be used to generate 'section' pages (most recent posts with this keyword)
- they can be used in the site navigation bar (linking to a page as described above)

This basically gives you the typical slashdot-like 'flat sections', except that a post can belong to different sections. Come to think of it, make it an option to disable multiple keywords, so that people who want the basic system you find in a lot of weblogs can have it.

Then, the hi

jibbajabba’s picture

Thoughts on polaar's approach
Polaar's description is a pretty good one for how to approach this. I like the flexibility of what he suggests. I'm not sure I would want subcategories that can have multiple parents, so I'd be happy to just be able to map children to parents 1-1, but having the ability to create 1-N child-parent relationships might serve some people.

As for display, I think it is rather simple to display when you have a hierarchy where there are only 1-1 child-parent relationships. 1-N relationships are best handled with a thesaurus.

Definitions
Time for definitions to help explain what I have been talking about.

Taxonomy/Ontology -- A categorization of things by some convention. Used typically in science to show hierarchical relationships like "Fruit is the parent of Orange". The best example of a taxonomy I can give you is the MeSH system of medical subject headings. Click the + signs when you get there to expand headings and show children.

Controlled vocabulary -- A collection of preferred terms for describing documents. A thesaurus is a type of controlled vocabulary.

Thesaurus -- A type of controlled vocabulary that shows hierarchical (parent-child), associative/related, and equivalent (synonymous) relationships. A good example of a thesaurus is ERIC. Have a look at the searchable thesaurus and see how it relates terms.

To give one example, a thesaurus is what helps indexers know to USE the term "Information Science" rather than the outdated term "Informatics".
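
For those who think in code, the three relationship types could be sketched like this in Python. This is only an illustration: the "broader" and "related" entries are made up and are not taken from ERIC, and `THESAURUS`, `USE` and `preferred_term` are hypothetical names.

```python
# Hypothetical thesaurus entry showing the three relationship types.
THESAURUS = {
    "Information Science": {
        "broader": ["Sciences"],          # hierarchical (parent), made up
        "related": ["Library Science"],   # associative, made up
        "used_for": ["Informatics"],      # equivalent, non-preferred term
    },
}

# Reverse index: non-preferred term -> the preferred term to USE instead.
USE = {
    alias.lower(): preferred
    for preferred, entry in THESAURUS.items()
    for alias in entry["used_for"]
}


def preferred_term(term):
    """Return the preferred term for `term`, or `term` itself if none is recorded."""
    return USE.get(term.lower(), term)
```

An indexer typing "Informatics" would be pointed at "Information Science"; a term the thesaurus does not know is passed through unchanged.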

Hope some of the above definitions clarify what some of us have been talking about. Sorry to have obfuscated these ideas before by not defining

j0e@www.drop.org’s picture

Polaar and I had an interesting discussion on subject classification not too long ago.

The idea was to distribute the work/responsibility of classifying posts among all community participants. Participants, who like their data classified and have developed their own mental taxonomy will naturally want to organize the posts they care about in ways that make sense to them. Providing them with this functionality, no matter what their role in the site can benefit the rest of the community.

So classification can happen at any level, administrator, author or registered reader.

An administrator can set a recommended taxonomy, meaning that the administrator optionally defines the keywords or hierarchy. This helps define the direction and scope of the community.

As an author submits a post, he/she has the option to choose subject categories. If they don't categorize the post, it is left in a 'catch-all' bucket.

Registered readers can categorize posts. I see this as a basic bookmark type function. "Hmm. I like this post and want to be able to find it again. I'll add it to my bookmarks" When the user goes to bookmark the post, drupal returns a list of categories with the post in the author's category. The reader can then file it in the same category as the author, or use one of the predefined categories or create a new category.

Personal bookmarks categories are then 'averaged' and shared with the rest of the community. If a reader bookmarks a post in "category3", the next reader to bookmark that post sees that information along with the authors categorization and can choose to use either or create a new category. The most popular categorization for a post is used for the public directory.

Writing this, I see it would be ridiculously complex to code and would probably be confusing to users. However, it provides a ton of flexibility and was fun to think about. Maybe it will give the group some useful ideas.

jibbajabba’s picture

YOUR PREVIOUS DISCUSSION
j0e and polaar, I read your past discussion and think you are on the right track. I wonder about user-generated categories. I see the value of letting users select subject headings based on the predefined subject taxonomy. But users adding categories to the taxonomy interests and troubles me at the same time. So I would propose separating any new user-contributed subjects from the taxonomy. Perhaps a system to request a subject be added could go to the moderators? I don't know. Here are some further thoughts on this topic based on previous untested work I've done.

VALUE OF USER-GENERATED INDEXING
A while ago, I wrote a database definition document that proposed a system for indexing images. The paper suggested an access point that used user-generated keywords. This is similar to what you are suggesting here and I am very interested in this type of categorization because it might add value to individual users.

SEPARATE USER KEYWORDS RATHER THAN NODES IN THE SUBJECT TAXONOMY
j0e's idea of using bookmarks is right on the money. It is similar to the idea I suggest in the paper of user-generated index terms/descriptors. The concept was borrowed from the Information Retrieval literature (I can't remember where) and I have not been able to find any literature/research that suggests that this is valuable. But I am interested in a proposal of this sort that is somehow related to the hierarchical subject taxonomy that we are discussing above, but is not exactly a part of it.

I think j0e is suggesting that the bookmarks/keywords would be separated from the subject taxonomy, is that right? I chose the word "keywords", rather than saying that the user-generated descriptions would go in the subject headings, because I think it would be better to have this kind of data fall into an alphanumeric listing of index terms rather than be part of the subject taxonomy. Therefore it would be separated physically in the database, but would be joined back with subject headings somehow in the presentation. What I envisioned at the time was a displayed index that allows you to browse by subject headings OR browse by user descriptors -- they would not be grouped together on screen.

Is this getting hard to follow? It might be because the models I followed were Print Indexes like the Bibliography of the History of Art and the Modern Language Association Index. I was attempting to describe a way of taking the immense strength of the presentation in these indexes and transform it into electronic format. Some of this is very relevant to Drupal.

j0e@www.drop.org’s picture

Neat paper! I've just started reading it, but there are definitely strong similarities.

As far as separating the subject taxonomy from the bookmarks and keywords, I haven't formulated an opinion yet. Though I do agree that in the DB they should be stored separately.

kika’s picture

Sorry for coming late. I posted a lengthy blog entry to drop.org about my experiences and ideas about the meta system.

Here's a little summary:

Current shortcomings:

- loose LIKE when selecting nodes from database by attribute

- theme_index() should be configurable

- instead of index.php?meta=attribute, attributes should be passed to module(s)

- Small textual explanations needed under every collection dropdown - what this selection means, why it's needed (just like on other form fields).

- What happens when you change the tag attribute's name in the "tag" table? Node attributes won't be automatically remapped, so you probably won't find your content anymore :(

- All "tag" table fields are too short - especially "name" (32) and "collections" (32). Remember - English words are *way* shorter than in many other languages.
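
On the first shortcoming in the list above (the loose LIKE), here is a small self-contained illustration using Python's built-in `sqlite3` module. The schema is an assumption made for the example (attributes kept as one comma-separated text column) and is not claimed to be Drupal 3.x's actual table layout. It shows how a substring match picks up unrelated nodes, and one possible way to tighten it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: attributes kept as one comma-separated text column.
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, attributes TEXT)")
conn.executemany(
    "INSERT INTO node (nid, title, attributes) VALUES (?, ?, ?)",
    [
        (1, "Painting basics", "art"),
        (2, "Getting started", "start,howto"),
        (3, "Heart health", "heart"),
    ],
)


def titles(sql, params):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]


# Loose: substring match anywhere in the column, so "start" and "heart" match "art".
loose = titles("SELECT title FROM node WHERE attributes LIKE ? ORDER BY nid", ("%art%",))

# Tighter: wrap the column in commas so only whole attributes match.
exact = titles(
    "SELECT title FROM node WHERE ',' || attributes || ',' LIKE ? ORDER BY nid",
    ("%,art,%",),
)
```

A separate table linking nodes to attributes would avoid the problem altogether, and would also help with the renaming issue, but that is a bigger change than this sketch covers.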

Crazy meta ideas:

- using a system somewhat similar to user permissions and roles

- metatag-based templates

- Dublin Core support

- Distributed controlled vocabularies (categories) (using xmlrpc cloud)

- Localization support

Carl Ditzler’s picture

In case there are some who do not venture to Drop.org, Kika's article, Lessons learned in Drupal meta system, is a good read.

Ah, never mind. Now I see kika's link in the previous comment.

Kjartan’s picture

My idea for meta tags (I don't like that word so from now on they will be known as terms!) is as follows.

There will be a set of collections that have a set of options, and have several terms linked up to them. Each collection will have the following options:

Collection form
Collection name: [textfield]
Relations: [select: disabled | enabled]
Hierarchy: [select: disabled | simple | complex]
Node types: [multiselect: node types]

Term form
Term name: [textfield]
Synonyms: [textarea]
Relations: [textarea]
Parent: [select]
Parents: [textarea]

Details
Collection name
The name of the collection. It should be unique, but that might not be a requirement.

Relations
If enabled, this will let you specify terms that are linked in some way to another term. Could be compared to "Category@" on Yahoo.com

Hierarchy
The simple setting will let you build structures ala Drupal v2. Each term can have one parent (one would have to be root though). The complex setting would let each term have more than one parent. This would allow for complex trees and needs a lot more administration.

Node types
A multiselect of the loaded node types.

Term name
This should be a unique name for a term. If you start making multiples start restructuring or use a complex hierarchy.

Synonyms
Enter words that are closely related to the term, abbreviations, alternate spellings, or common misspellings. These words should not exist as terms. One synonym per line.

Relations
Terms may be related to other terms. Exactly what is a related term is up to the site admin. These terms will show up next to the selected term for a node. This option is only available if the collection has it enabled.

Parent
Drop-down of the other collection terms to select the parent. Only available if hierarchy is set to simple.

Parents
(Note the S.) A textarea to accept the parents of a term. Gotta have some error checking on this. Only available if hierarchy is set to complex.
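
As a rough data-structure sketch of the two forms above, here is one way the collection and term options could be modelled in Python, together with the kind of error checking mentioned for parents. The class names, field names, and the `problems` method are assumptions made for illustration, not a proposed Drupal schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    name: str
    synonyms: List[str] = field(default_factory=list)
    relations: List[str] = field(default_factory=list)
    parents: List[str] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    relations: bool = False        # disabled | enabled
    hierarchy: str = "disabled"    # disabled | simple | complex
    node_types: List[str] = field(default_factory=list)
    terms: List[Term] = field(default_factory=list)

    def problems(self):
        """Return human-readable problems with the terms in this collection."""
        names = {t.name for t in self.terms}
        # None means "no limit", which is the complex setting.
        max_parents = {"disabled": 0, "simple": 1}.get(self.hierarchy)
        found = []
        for term in self.terms:
            if max_parents is not None and len(term.parents) > max_parents:
                found.append(f"{term.name}: too many parents for a {self.hierarchy} hierarchy")
            for parent in term.parents:
                if parent not in names:
                    found.append(f"{term.name}: unknown parent {parent}")
            if term.relations and not self.relations:
                found.append(f"{term.name}: relations are disabled for this collection")
        return found
```

Storing parents as a list in both modes keeps the simple and complex settings on one structure; the setting only changes how many entries are allowed.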

User interface
Users should be able to set a few things in their own options. Like:
Multiple terms: [disabled | enabled]

Depending on how the collections are set up, each of them will need a different view on node submits. If there is no hierarchy or a simple hierarchy, just show a normal drop down.

If it's a complex hierarchy or the user has enabled multiple terms, show a text area and let the user enter terms. Check entered values against terms and synonyms. Post warnings about invalid terms.
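
The text-area check could look something like the following Python sketch. It assumes one entry per line and case-insensitive matching, neither of which is settled above, and check_entered_terms is a made-up name.

```python
from typing import Dict, List, Tuple


def check_entered_terms(
    text: str,
    terms: List[str],
    synonyms: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Resolve one-entry-per-line input against terms and synonyms.

    terms is the list of valid term names; synonyms maps a synonym to
    the term it stands for. Returns (accepted term names, warnings).
    """
    by_lower = {t.lower(): t for t in terms}
    syn_lower = {s.lower(): t for s, t in synonyms.items()}
    accepted: List[str] = []
    warnings: List[str] = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        key = entry.lower()
        # A direct term match wins; otherwise fall back to synonyms.
        term = by_lower.get(key) or syn_lower.get(key)
        if term is None:
            warnings.append(f"Invalid term: {entry}")
        elif term not in accepted:
            accepted.append(term)
    return accepted, warnings
```

Synonyms are folded into the term they belong to, so entering both a term and its synonym stores the term only once.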

Other notes
- have an option to require the user to select a value for the collection.
- have an orphaned terms collection to show terms that don't seem to belong anywhere (no collection, or invalid parents).
- let users specify new terms and store them in the orphaned terms collection until someone categorizes them. Maybe have a special user permission for this or at least a config option.

--
Kjartan

jibbajabba’s picture

This sounds like a great proposal. And I can see from Marco's demo that this is coming together nicely. Thanks for this.
-Michael

polaar@www.drop.org’s picture

You've mentioned this before I think. What exactly is this? Can we view it somewhere?

Carl Ditzler’s picture

This message provides additional information on Marco's demo (in admin) found here.

polaar@www.drop.org’s picture

nice, it even answers the questions I posted in the other comment (questions about....)

Kjartan’s picture

Note that Marco and I seem to have crossed each other. I haven't had the opportunity to look into the test, but he seems to have gotten most of my points written on some plane I was on this week. Gotta learn to check my mail before posting.

--
Kjartan

marco’s picture

you're right, I saw you and was about to greet you but then discovered your notes and preferred to copy them instead
--
Marco

polaar@www.drop.org’s picture

Strange how you explain your proposal from the point of view of the actual (administration) interface. It took me a while to figure it out (or maybe that's just because I'm not thinking straight - think I'm getting the flu)
So tell me if I'm getting things right:
At first sight, this system has the same possibilities I mentioned in a previous comment (multiple terms, children-parent(s)...), with relations and synonyms as an extra (I had thought about synonyms too, but left them out because it seemed like an add-on functionality).
But I'm not sure if I understand the "collections". It looks as if they are different (separate?) hierarchies or sets of categories (so that you could define, say, a hierarchy for subject categories and one that reflects your company hierarchy independent from each other). The properties of a collection define the possible relationships etc. between the terms of that collection. Is that correct? (if so, I like it...)
Then there is the 'nodes'-thing: what do you mean by that? (I'm guessing something like 'this collection is for book entries, that one is for blog entries...' but I'm not sure - maybe it's just that I'm not familiar with drupal terminology)

I have some questions and thoughts about the interface, too, but I'll post them later on in a separate comment.

Kjartan’s picture

But I'm not sure if I understand the "collections".

Basically they are two different thesauruses that have their own world. Similar to how collections work in v3.

The properties of a collection define the possible relationships etc. between the terms of that collection. Is that correct? (if so, I like it...)

Yes, mostly to suit everyone's needs. It's silly for someone who only wants a flat system to have options for everything else. It will make things easier for users, as the default would be very basic and they could grow from there.

Then there is the 'nodes'-thing:

As you said, it means that the collection is only for those node types. Logically you would have different sections for files than for blogs, and books might not have any.

--
Kjartan

polaar@www.drop.org’s picture

As I promised: some remarks and questions about the interface aspects...

User interface
Users should be able to set a few things in their own options. Like:
Multiple terms: [disabled | enabled]

The user disables/enables multiple terms? How is that? I'd imagine this would be an option set by the administrator... What if I assign multiple terms to a post, and another user has disabled multiple terms? It sounds like a global option to me. Or am I not getting the point at all?

Depending on how the collections are set up, each of them will need a different view on node submits. If there is no hierarchy or a simple hierarchy, just show a normal drop down.
If it's a complex hierarchy or the user has enabled multiple terms, show a text area and let the user enter terms. Check entered values against terms and synonyms. Post warnings about invalid terms.

I'm not sure if this is always the best option. What if you have *lots* of terms, but no hierarchy: a normal dropdown? Doesn't seem right... The second sounds nice for complex hierarchies with lots of terms: instead of going through a lot of steps to find the term you want, just enter something in a textarea... But what if you just have 5 'flat sections', but you want to be able to assign more than one section to a post. It would be strange to use this system instead of just checkboxes or multiple selects.

Of course, the more flexible you want a system to be, the more complicated the user (and administration) interface is going to be...
That got me thinking: would it be possible (I don't know how complicated this would be) to define a basic system (database structure and basic functions) and do the interface in different modules, depending on what you want for your site?
Let me explain this with some examples, because I'm not sure if I'm making myself very clear:
You could have a 'slash-like-flat-sections' module, a 'hierarchical categories' module etc... The underlying system of storing and assigning terms would remain the same, but someone who just wants a slashdot-like site could install this module and have the interface that is most suited to this use (both user and administration interface). Someone who wants a more powerful system could install a more powerful module (which will probably be more complicated, but at least doesn't get bloated with options only needed for other uses). To return to the example of the multiple terms: a site with five sections needs a different interface than one with hundreds of terms (even without a hierarchy).

I think you get the point. What I'm not sure of is whether it would make things less complicated or even more complicated (that is, I think it would be a lot less complicated for the end user, and even for the site administrator, but I fear this is not necessarily the case for developing and maintaining the code).
What do you think? Is it possible to make a good general suits-all-purposes system? Is it better (and possible) to make separate modules for the most obvious kinds of sites, and let people design their own if they need something different (or provide an 'advanced' module with lots of options)? Or is there such a thing as wanting too much flexibility?

Kjartan’s picture

The user options were mostly just random thoughts at this point. I didn't think of the case you suggested, and you are right that it poses a challenge.

I have to agree that the more complex the setup, the more complex the user interface becomes. It's the price you pay really. What needs changing and the best approach is hard to decide on without starting to code it.

How to hook all this into Drupal is more what I wanted to talk about. I have played a little with a system to let other modules add fields to a node form. This would go beyond just categorizing data though. You would be able to have any module add a field (say you want to provide an option for file uploads to all files, or a URL, or a special box for related links). This would let any module plug itself into a node form. Taking it to the next level, so to speak. It's easier to implement with the new node system than I had expected.

If people chose to, we could have a dozen meta/structure modules around. Personally I would like a one-size-fits-almost-all module for this at first at least, then possibly extend it later on. Multiple inheritance might be a special module as it adds a lot of complexity and needs more error checking to prevent circular loops (although that might be useful, we wouldn't want it to go into a recursive dead end).
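
To make the circular loop check concrete, here is a minimal Python sketch, assuming parents are stored as a mapping from each term to its list of parents. The function name and data layout are made up for illustration.

```python
from typing import Dict, List, Set


def would_create_cycle(
    parents: Dict[str, List[str]], term: str, new_parent: str
) -> bool:
    """Return True if making new_parent a parent of term closes a loop.

    A loop appears when term is already an ancestor of new_parent,
    or when the two are the same term.
    """
    seen: Set[str] = set()
    stack = [new_parent]
    while stack:
        current = stack.pop()
        if current == term:
            return True
        if current in seen:
            continue  # already walked this ancestor via another path
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False
```

Running this before saving a new parent link keeps the structure free of loops, so later tree walks can recurse without hitting a dead end.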

If only next week is nice to me so I have some time to do some proper coding. Been flying around Scandinavia all week, such a waste of time

--
Kjartan

marco’s picture

(I use "terms" as Natrak did)
I think modules need two things:
- a table with nodes+terms [a]
- an API to get info about terms: basic info such as name, description, etc. and advanced info such as parents, children, synonyms, related terms, etc. [b]

Some examples:
- a story system would probably use some flat structures; they just need collections and their terms; they could filter stories based on a term (get term id, select in [a] joined with nodes); or perhaps they need to know if user asked for a synonym of something else; or they want to show related terms or articles in related terms
- forum: two-level tree structure. each level needs the terms' basic data (name, description); it also needs the term id and its children's ids to see how many new posts there are, etc. photo album is similar
- yahoo-like directory: needs some tree structure, should be able to browse through it (get children list), and show "you are here" links (get parent list); then needs id to show nodes (actual links)

Can you think of an example where [a] and [b] are not enough?

What functions should the API have?

Examples:
- get_term(id) -> [name, description, ..]
- get_term_id(name) -> [id, ...]
- get_parents(id) -> [id, id, ...]
- get_children(id) -> [id, id, ...]
- get_synonyms(id) -> [name, name, name, ...]
- is_synonym(name) -> id (original term) or false
- get_related(id) -> [id, id, ...]
- functions that return form elements
- function to save a node and its terms

Other functions?
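
For discussion, the lookup functions listed above could be sketched like this in Python, with plain dictionaries standing in for the tables. The TermStore class and add_term are hypothetical helpers, and ids are assumed to start at 1 so that is_synonym can return false without clashing with a real id.

```python
from typing import Dict, List, Optional, Union


class TermStore:
    """In-memory stand-in for the term tables (illustrative only)."""

    def __init__(self) -> None:
        self.terms: Dict[int, Dict[str, str]] = {}  # id -> name, description
        self.parents: Dict[int, List[int]] = {}     # id -> parent ids
        self.synonyms: Dict[int, List[str]] = {}    # id -> synonym names
        self.related: Dict[int, List[int]] = {}     # id -> related ids

    def add_term(self, tid: int, name: str, description: str = "") -> None:
        self.terms[tid] = {"name": name, "description": description}

    def get_term(self, tid: int) -> Optional[Dict[str, str]]:
        return self.terms.get(tid)

    def get_term_id(self, name: str) -> List[int]:
        # A list, since the same name may exist in several collections.
        return [tid for tid, t in self.terms.items() if t["name"] == name]

    def get_parents(self, tid: int) -> List[int]:
        return list(self.parents.get(tid, []))

    def get_children(self, tid: int) -> List[int]:
        return [child for child, ps in self.parents.items() if tid in ps]

    def get_synonyms(self, tid: int) -> List[str]:
        return list(self.synonyms.get(tid, []))

    def is_synonym(self, name: str) -> Union[int, bool]:
        # Returns the id of the original term, or False if there is none.
        for tid, names in self.synonyms.items():
            if name in names:
                return tid
        return False

    def get_related(self, tid: int) -> List[int]:
        return list(self.related.get(tid, []))
```

The form-element and save functions are left out because they depend on how the node form hooks end up working.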

The first implementation could have these tables:
collection: id, name, description, relations (on/off), multiple select (on/off), hierarchy(on/off), required (on/off), node types
term: id, name, description, collection.id
term_relationship: term.id1, term.id2, relationship (1=id1 is parent of id2; 2=id1 is related to id2; this field could be used for other relationships)
synonym: term.id, name

or perhaps no term_relationship, and separate term_hierarchy and term_related tables instead?
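
To make the table layout easier to discuss, here is a rough sketch of the single term_relationship variant using Python's built-in sqlite3 module. SQLite is used only because it ships with Python; the column names (term_id1, collection_id and so on) and types are assumptions, since names with dots like term.id1 are not valid as written.

```python
import sqlite3

# Column names and types are guesses based on the field lists above.
SCHEMA = """
CREATE TABLE collection (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    relations INTEGER NOT NULL DEFAULT 0,
    multiple_select INTEGER NOT NULL DEFAULT 0,
    hierarchy INTEGER NOT NULL DEFAULT 0,
    required INTEGER NOT NULL DEFAULT 0,
    node_types TEXT
);
CREATE TABLE term (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    collection_id INTEGER NOT NULL REFERENCES collection(id)
);
CREATE TABLE term_relationship (
    term_id1 INTEGER NOT NULL REFERENCES term(id),
    term_id2 INTEGER NOT NULL REFERENCES term(id),
    relationship INTEGER NOT NULL  -- 1 = id1 is parent of id2, 2 = related
);
CREATE TABLE synonym (
    term_id INTEGER NOT NULL REFERENCES term(id),
    name TEXT NOT NULL
);
"""

PARENT, RELATED = 1, 2


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def get_children(db: sqlite3.Connection, term_id: int) -> list:
    """Return ids of terms whose parent is term_id."""
    rows = db.execute(
        "SELECT term_id2 FROM term_relationship "
        "WHERE term_id1 = ? AND relationship = ? ORDER BY term_id2",
        (term_id, PARENT),
    )
    return [row[0] for row in rows]
```

With one relationship table, parents and related terms share a query shape and differ only in the relationship value; splitting into term_hierarchy and term_related would trade that for simpler individual queries.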

I included synonyms and related terms as Natrak suggested because I think they are great ideas. Multiple parents are probably too complex to start with, but nothing prevents implementing them later (as an updated meta.module or another module). Another idea I'd really like is some kind of "weight" in node-term and term-term relationships, but that's too advanced for a first version.

We could also have hooks and have meta_tree, meta_synonym and meta_related modules... sounds cool but perhaps would make things harder. Perhaps the first version should be "monolithic" and future versions modular?

--
Marco