Metadata and content construction

Last modified: November 19, 2007 - 14:41

Participants

  • John VanDyk (organizer)
  • Jonathan Chaffer (organizer)
  • Dries Buytaert
  • Matt Westgate
  • Moshe Weitzman
  • Neil Drumm
  • Vladimir Zlatanov
  • James Walker (day 1)

Deliverables: next-generation flexible content generation and storage. Optimistically, also an optimization and caching strategy.

Preparation: be familiar with the following:

Potential strategies

Approach | Pros | Cons
1. Separate table per field type | Easy searches across node types; normalized | Many joins
2. All columns in node table | SELECT is easy; structure is transparent | Huge sparse table; ALTER; harder multi-values
3. Mixed 1 and 2 | Flexible but allows optimized common case; single- and multi-value fields searchable | ALTER; more decisions at run-time
4. Table per node type | node_load very easy; easy migration | ALTER; cross-type queries hard; hard multi-values
5. #1 with caching | Probably fast in both cases | Redundant data/sync

Table per node type is rejected because we are interested in across-node-type searches.

Mixed 1 and 2 is rejected because (1) it has the worst characteristics of 1 and 2, (2) it adds too much complexity at runtime, and (3) it is complicated enough that we can't benchmark it without a lot of work. Treat this as a possible future optimization if necessary.

Decision: we will go with choice number 5, which is normalized database tables with node caching.
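
As a rough illustration of choice 5, the sketch below pairs one table per field type with a cache of fully assembled nodes. It is a minimal sketch, not the design agreed at the meeting: the table names (field_text, field_date, node_cache), the column layout, and the helper functions are all hypothetical, and SQLite with JSON stands in for whatever storage and serialization would actually be used.

```python
import json
import sqlite3

# Hypothetical schema: one table per field type, plus a cache of assembled nodes.
SCHEMA = """
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
CREATE TABLE field_text (nid INTEGER, field_name TEXT, delta INTEGER, value TEXT);
CREATE TABLE field_date (nid INTEGER, field_name TEXT, delta INTEGER, value TEXT);
CREATE TABLE node_cache (nid INTEGER PRIMARY KEY, data TEXT);
"""

# Fixed list of field tables; names are interpolated into SQL only from here.
FIELD_TABLES = ("field_text", "field_date")


def build_node(db, nid):
    """Assemble a node from the normalized tables (one query per field table)."""
    row = db.execute("SELECT type, title FROM node WHERE nid = ?", (nid,)).fetchone()
    if row is None:
        return None
    node = {"nid": nid, "type": row[0], "title": row[1], "fields": {}}
    for table in FIELD_TABLES:
        rows = db.execute(
            f"SELECT field_name, value FROM {table} "
            "WHERE nid = ? ORDER BY field_name, delta",
            (nid,),
        )
        for name, value in rows:
            # Multi-value fields simply contribute several rows.
            node["fields"].setdefault(name, []).append(value)
    return node


def load_node(db, nid):
    """Return the cached node if present; otherwise build it and cache it."""
    cached = db.execute("SELECT data FROM node_cache WHERE nid = ?", (nid,)).fetchone()
    if cached:
        return json.loads(cached[0])
    node = build_node(db, nid)
    if node is not None:
        db.execute(
            "INSERT INTO node_cache (nid, data) VALUES (?, ?)",
            (nid, json.dumps(node)),
        )
    return node


def save_field(db, table, nid, field_name, values):
    """Replace a field's values and invalidate the cached copy of the node."""
    if table not in FIELD_TABLES:
        raise ValueError(f"unknown field table: {table}")
    db.execute(f"DELETE FROM {table} WHERE nid = ? AND field_name = ?", (nid, field_name))
    db.executemany(
        f"INSERT INTO {table} (nid, field_name, delta, value) VALUES (?, ?, ?, ?)",
        [(nid, field_name, i, v) for i, v in enumerate(values)],
    )
    db.execute("DELETE FROM node_cache WHERE nid = ?", (nid,))


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO node VALUES (1, 'appointment', 'Dentist')")
save_field(db, "field_date", 1, "when", ["2005-03-10"])
save_field(db, "field_text", 1, "attendees", ["John", "Jonathan"])
node = load_node(db, 1)  # first call builds from the field tables and fills the cache
```

The cache row duplicates what the field tables already hold, which is the "redundant data/sync" cost listed for choice 5; here it is handled by deleting the cached copy on every write.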

Fields and parts

Artti - March 4, 2005 - 12:37

Hi,

I would recommend looking into Daisy CMS approach in this matter:

Daisy CMS: Document Structure

It separates fields and heavier content in document types quite nicely. And it has a well thought out query language too.

...and the point was

Artti - March 4, 2005 - 12:46

...that fields could be in their own table and content in a different one, to prevent performance & indexing problems :)

identical

moshe weitzman - March 4, 2005 - 15:09

their document structure and schema are quite analogous to the CCK that was agreed upon in Antwerp.

Flexinode

JonBob - March 4, 2005 - 15:32

Actually, their structure seems a little closer to the Flexinode approach. Data in a field is all stored in the same table, regardless of field type, and more complicated fields (called "parts" in Daisy) are stored serialized.

We intend instead to have each field type manage its own data storage, so that SQL queries can be made even against complicated data.
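
To make the difference concrete, here is a small sketch comparing the two storage styles. Everything in it is an assumption made for illustration: the "address" field type, the field_data and field_address tables, and the use of JSON as the serialization format are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
-- Serialized style: every field in one table, complex values stored as opaque text.
CREATE TABLE field_data (nid INTEGER, field_name TEXT, serialized TEXT);
-- Per-field-type style: a hypothetical "address" type owns its own columns.
CREATE TABLE field_address (
    nid INTEGER, field_name TEXT, delta INTEGER, city TEXT, country TEXT
);
""")

# Sample data spanning two node types.
nodes = [(1, "event", "Conference"), (2, "venue", "Town hall"), (3, "event", "Sprint")]
addresses = {1: ("Antwerp", "BE"), 2: ("Antwerp", "BE"), 3: ("Amsterdam", "NL")}

db.executemany("INSERT INTO node VALUES (?, ?, ?)", nodes)
for nid, (city, country) in addresses.items():
    db.execute(
        "INSERT INTO field_data VALUES (?, 'location', ?)",
        (nid, json.dumps({"city": city, "country": country})),
    )
    db.execute(
        "INSERT INTO field_address VALUES (?, 'location', 0, ?, ?)",
        (nid, city, country),
    )

# Serialized: the database cannot see inside the value, so every row is
# fetched and decoded, and the filtering happens in application code.
serialized_hits = sorted(
    nid
    for nid, blob in db.execute(
        "SELECT nid, serialized FROM field_data WHERE field_name = 'location'"
    )
    if json.loads(blob)["city"] == "Antwerp"
)

# Per-field-type: the same question is one SQL query across node types.
sql_hits = [
    row[0]
    for row in db.execute(
        "SELECT n.nid FROM node n JOIN field_address a ON a.nid = n.nid "
        "WHERE a.city = ? ORDER BY n.nid",
        ("Antwerp",),
    )
]
```

Both approaches find the same nodes, but only the second lets the database do the filtering and use an index on the city column.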

Field types

Artti - March 4, 2005 - 21:10

Hi,

Do you mean different SQL types (varchar, clob, blob, date, number, etc) or user defined types?

UDT

moshe weitzman - March 5, 2005 - 03:42

user defined types. they sometimes are same as SQL types, but often not.

How many tables will that make?

Artti - March 5, 2005 - 06:34

...and what about performance? Queries become more complex with lots of joins, the database query parser needs to do a lot more work for execution plans, and it needs to use more indexes to get the results.

The first examples that come to mind are searching all fields from the appointment example and rendering a simple page with that same content; what would those queries look like? And how would they perform with 100 simultaneous users?

And a couple of other ideas

Artti - March 6, 2005 - 21:27

Now that you are considering the best possible node architecture, please add support for ...

  • content localization (en, de, fr...)
  • content paging (content over multiple pages: "1, 2, 3, next page")
    • pages stored as different fields in the database for performance, but separated only by comments in the editor
  • excerpt field

the matter at hand

brick - September 28, 2005 - 03:52

i'm glad to see such a deliberate decision-making process on the part of the group meeting at antwerp. i'm inclined to suggest that your circumspection beats the quibblers in this thread. so what has happened since march? i see in the CVS repository that the files cited have hardly been touched since then. was this some ill-advised maverick effort doomed to condemnation, or what?

Behind the scenes work

robertDouglass - September 28, 2005 - 06:13

There is some basic architectural work being done on Drupal core that is designed to support CCK. This includes some extra hooks and a forms API. These things need to go through before CCK can achieve all it has promised. All the active developers I know support CCK as one of the main goals, and I expect it to be fully available by 4.8, with nascent forms emerging for 4.7.

Neil Drumm is giving a talk in Amsterdam in October about the state and roadmap of CCK - I'm sure it will be interesting.

- Robert Douglass

-----
Rate the value of this post: http://rate.affero.net/robertDouglass/
I recommend CivicSpace: www.civicspacelabs.org
My sites: www.hornroller.com, www.robshouse.net

cool deal

brick - September 28, 2005 - 17:27

thanks for the update, robert! the architectural changes you cite make total sense, i just didn't know where to find out about them. i will look forward to reading about neil's report and seeing 4.8. :)

cheers,
aaron.

 
 

Drupal is a registered trademark of Dries Buytaert.