Just asking.

I'm getting through it thanks to a semi automated spreadsheet approach.

But looking for that comfort one get's from knowing others have been through the same pain.

Comments

nevets’s picture

I strongly suggest rethinking your approach, I worked on a project with a CCK with 100 fields give or take. It is a royal pain to administer.

If you get that many fields simply from the columns consider breaking the data up into logic groups and tieing the related nodes together.

If the 650 is from multiplying the number of columns by number of rows, consider making a content type that represents one row and use multiple nodes to represent the rows.

willieseabrook’s picture

Thanks nevets

It's a data collection/research project, so this isn't a case of overengineering. They simply collect this many data points. No way around it.

There are distinct groupings of fields, and I hadn't thought of splitting them into separate node types. Thanks for the idea - i'll definitely consider that.

What kind of royal pains did you experience? I'm trying to pre-empt potential problems in the future.

I'm generating the CCK from a spreadsheet with a little module. Write the field labels, help text, type and so forth into the spreadsheet and run it through the module, out comes a CCK def I can send through the import tool.

----
Willie Seabrook

nevets’s picture

For auto generation, make sure your field names start with 'field_', I can't recall the detail but ran into a module that expected that (I also auto generated the content type initially).

The manage fields page became hard to use, it may not even render with that many fields.

Exporting/Importing had to be done in pieces

WorldFallz’s picture

There can be other ways of organizing the data. Most data with that many fields tends to fall into some level of categories-- you could setup the categories as content types and then create a master content type that brings all the categories together in one place with nodereference fields. The bonus to this approach is the data becomes more manageable and there's a lot of modules that do cool things with nodereferences. Just an idea.

bg1’s picture

I mostly develop web applications (online transaction processing) rather than informational web sites. My databases usually consist of many (sometimes hundreds) of highly interrelated tables. The number of columns is often in the thousands. Even if one could import the data structure into CCK, would that be a good idea? Or is it better to just create the needed tables directly in the database and write custom modules to create, update and retrieve/display the data.

I have similar concerns with Views because my pages often display data that requires many joins (typically 5 - 10) and often need the results of subqueries. Even if I could get views to create the queries the way I need them (such that they would execute with the efficiency I need) it would take 10 - 30 times longer to develop those queries in Views than in native SQL (SQL is my primary language - then comes English ;-).

Does anyone know of any modules that generate grids, tables and other display formats but will take the data (results sets) from any designated source (Inline SQL, Stored Procedures, Web Services, etc.)? Within that, is anyone aware of modules that will display editable grids and tables that will allow users to edit/update multiple rows in one pass?

Thanks for any comments.

dman’s picture

For a start, there is
http://drupal.org/project/matrix
which looks pretty close to what you describe here.

I had a similar but comparatively simpler requirement - to have any number of arbitrary name-value pairs added to a node.

Apparently CCK Link can be mutated to do that job, and so can matrix

But, though I've not tested it, CCK Attribute now advertises this functionality.
ALTHOUGH it may not provide 'descriptions' for each of your 650 attributes, with 650 of them, I'd hope that the column titles are enough.

I'm pretty sure that managing your data as nid-attname-attval triples will be better than one 650 column row.

If you are more comfortable with raw SQL, then maybe you should indeed do your own handlers to your own DB table, though I can't estimate how much fun that will be to get it to talk to all the different aspects of views. It's been done, but I couldn't do it.

bg1’s picture

That was Willieseabrook with the 650 columns. I have several thousand columns spread across more than a hundred content types (if I use CCK). Many of my tables exist only to interrelate 2 tables that have a many-to-many relationships.

I also have many cases, where entering the data for a single entity has to update multiple tables in the database. (E.g., I want users to select a value from a drip down, but they also have the option of selecting "other" and if they select other, then I want them to enter the other value into a field. In that case, when the data goes back to the database, it gets entered into the table that the drop down selects from - along with a lot of other information like who entered that field at what time and what is the status of that data - candidate or approved, depending on the user's role, etc.)

I do not think it is practical to use name/value pairs for several reasons.
1. imagine having to pull together 50 - 100 name value pairs toget a single representation for a node with more than 100000 node instances.
2. dozons of those fields are used in joins to other tables (content types) and it would be very complivcate programming to get the database to understand how to formulate such queries.
3. my applications are actually driven by the entity model from which the statbase structure is derived. The database is defined before any user interfaces are even thought of. Having said that, we are constantly adding to the data model. Having the data model be a model of the real world, provides stability that enables us to easily manage database expansion (new fields). It would be very difficult to try to keep track of thousands of fields and hundreds of tables if one was not able to have the attributes tightly mapped to their entities.

Thank you for your input however and I will take a look at the links you provided.

willieseabrook’s picture

Thanks for the pointers Dan. I've used the matrix module on another project and its great.

I've chosen to shoot for a cck based setup on this project because I need so much basic drupal functionality - single page edit/revisions/lots of lists (views).

A big problem with the key/value pair approach is:
(a) I *do* need significant field descriptions
(b) I need all available fields/attributes displayed to the user by default. Standard node edit does this by default. Easy. Whereas an attribute system requires a user to select each attribute before the can enter it (generally)
(c) Each field has two child fields related directly to that field. Again easy with CCK.

Only issue I'm hitting at the moment is it takes about 120 seconds to create the CCK type in my testing!

I have a feeling this might impact views memory requirements also? I'll see in testing I guess.

----
Willie Seabrook

willieseabrook’s picture

Your after some kind of ajax data grid editor with a extjs type UI?

Haven't seen anything like that in Drupal.

I'm using Table Wizard module for the import side of this project.

I'm *very* new to table wizard but it looks like it leverages the schema api and can inspect any non-drupal table and let you set up joins. Then export that as a view.

Then an extjs UI and link the two together.

Perhaps that might be a starting point for a custom module.

Sounds complex!

As for the practicality of using CCK with this many fields... I've not made up my mind yet.

The data model on my project should change for years. So I'm not so much worried about using the CCK field UI.

But if you were making updates every few weeks???? Not sure how that would fly.

----
Willie Seabrook

bg1’s picture

We build SaaS (Software as a Service) so we spend every day just making enhancements. We follow an extreme agile approach so our change management cycle is every few hours - production line approach so each release is typically just one modification - easy rollout - easy test - easy rollback if needed.

Much of our applications do not involve user interfaces at all (passage of time triggers events and some events trigger other events). We build much of our business logic into the database itself in the form of queries, triggers, database jobs, stored procedures, etc. We add tables and fields to tables continually to support this (including archiving functionality, temporary tables, data warehousing tables, tables that exist solely for efficiency reasons such as precalculating/assembling data periodically to reduce the overall impact on database workload, etc.)

The database is therefore changing many times every day.

ajevans85’s picture

If I'm given a specific schema of the size wanted I'd create a custom module and database tables. CCK could do what is wanted but it'd just feel like a dirty way of doing it to save writing code but instead pointing and clicking.

With a custom module:

- I can tweak the input interface to be as user friendly as I want
- Data is stored in a format I want
- As data is stored in a schema I created I can auto import it from a souce saving manual entry
- As data is stored in a schema I created I can export it as I want
- As data is stored in a schema I want I can do clever reporting much simpler

The above would probably take me just as long as using CCK with the number of fields involved but giving me more power to manipulate the data however I want. If it was a task that I had to do day in day out for 100's of tables I'd make the above generic, create a dsl to map the table to a user interface or create some code generation tool which could reverse engineer a specified database table creating a basic CRUD interface.

shaynl’s picture

I'm now trying implementing this grid to store my main data in my drupal project:
http://www.extjs.com/deploy/dev/examples/grid/edit-grid.html
Did anyone try something like this before?
one way to do it is to build it separately as a custom module with its own database table, but how then I integrate it with drupal if I want to use this data in views?