Do not store serialized arrays in database fields
mustafau - December 25, 2007 - 22:34
| Project: | FeedAPI |
| Version: | 6.x-1.x-dev |
| Component: | Code |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description
I think the parsers and processors fields should be moved out of the feedapi table into their own respective tables.
feedapi_parsers schema:
|nid|parser|
feedapi_processors schema:
|nid|processor|
Another thing is that the settings field could be represented as separate database fields.
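To make the proposal concrete, here is a minimal sketch of the two tables, using Python's sqlite3 module as a stand-in for the Drupal database layer. The column types, the composite primary keys, and the sample parser/processor values are assumptions for illustration; the description above only names the columns.

```python
import sqlite3

# Hypothetical normalized layout based on the two proposed schemas.
# Types and keys are assumptions; the proposal only lists nid plus one name column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feedapi_parsers (
    nid INTEGER NOT NULL,
    parser TEXT NOT NULL,
    PRIMARY KEY (nid, parser)
);
CREATE TABLE feedapi_processors (
    nid INTEGER NOT NULL,
    processor TEXT NOT NULL,
    PRIMARY KEY (nid, processor)
);
""")

# Illustrative sample rows: feed node 1 uses one processor, feed node 2 uses two.
conn.executemany(
    "INSERT INTO feedapi_parsers VALUES (?, ?)",
    [(1, "parser_simplepie"), (2, "parser_common_syndication")],
)
conn.executemany(
    "INSERT INTO feedapi_processors VALUES (?, ?)",
    [(1, "feedapi_node"), (2, "feedapi_node"), (2, "feedapi_aggregator")],
)


def feeds_using_processor(conn, processor):
    """Return the nids of all feeds that have the given processor enabled."""
    rows = conn.execute(
        "SELECT nid FROM feedapi_processors WHERE processor = ? ORDER BY nid",
        (processor,),
    )
    return [nid for (nid,) in rows]


print(feeds_using_processor(conn, "feedapi_node"))
```

With one row per (nid, processor) pair, finding every feed that uses a given processor is a plain WHERE clause rather than a scan over serialized strings.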

#1
Any thoughts on this issue?
#2
Sure. What are the reasons against storing serialized stuff in DB fields? It's not unusual in Drupal (see the {variable} table). I see one reason: it's harder to set the default values (at the install stage and in automated tests).
#3
It is much harder for developers to understand the database and hack into it. Searching for feeds with a specific parser, processor or setting becomes impossible.
#4
I think the idea behind the {variable} table is different. It is there to store different types of variables in one table.
Storing serialized arrays in the database only makes sense if there are a small number of arrays to store. For example, storing node type settings in the {variable} table is acceptable.
#5
The reason there are arrays in the feedapi table is that it was the easiest way of implementing the storage for FeedAPI's settings handling.
Every add-on module can expose settings in the feedapi_settings_form() hook; per content type presets and per node storage are then handled by FeedAPI. The method is as follows: hook_settings_form() builds a Form API array, which FeedAPI stores by smushing it into the feedapi table's settings field.
This method has two disadvantages:
- it does not separate the data model from the UI
- it uses a serialized format that is not searchable and hard to read for humans
but it has one huge advantage:
- it is simple _in comparison_ and pretty robust
If you didn't use a serialized array for storing those values, you would have to write a layer that maps arbitrary form input to your storage model and vice versa. That's not trivial, so we took a detour around it.
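For a sense of what such a mapping layer involves, here is a rough sketch in Python (not FeedAPI code). It flattens a nested settings array into (path, value) rows that could each become a database row, and rebuilds the array from those rows. The function names, the "/" path separator, and the sample settings are all assumptions; keys containing "/" and empty nested arrays would need extra handling.

```python
def flatten(settings, prefix=""):
    """Turn a nested settings dict into a flat list of (path, value) rows."""
    rows = []
    for key, value in settings.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested arrays, extending the path.
            rows.extend(flatten(value, path))
        else:
            rows.append((path, value))
    return rows


def unflatten(rows):
    """Rebuild the nested settings dict from (path, value) rows."""
    settings = {}
    for path, value in rows:
        *parents, leaf = path.split("/")
        node = settings
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return settings


# Hypothetical settings, shaped like a nested per-feed settings array.
settings = {
    "refresh_on_create": 0,
    "update_existing": 1,
    "processors": {
        "feedapi_node": {"content_type": "article", "promote": "0"},
    },
}

rows = flatten(settings)
for row in rows:
    print(row)
```

The round trip works for this simple case, but a real layer would also have to deal with value types, defaults, and settings that add-on modules add or remove over time, which is the non-trivial part mentioned above.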
That said, I would love to simplify and improve this part of FeedAPI. It is hard to understand and sometimes I am not sure whether we actually need such complex settings handling. Thoughts are welcome.
Alex
#6
Dropping per node settings handling from FeedAPI wouldn't be a serious drawback. When a different set of settings is needed, one would create another FeedAPI-enabled content type.
#7
#8
What about letting add-on modules inject their settings form and store their settings variables independently?
Benefits:
* simplify feedapi.module.
* reduce disk space used for {feedapi} table.
* developer friendly
* searchable
* easy to update when settings change (e.g. when an add-on is disabled or uninstalled.)
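As a rough illustration of the idea (Python with sqlite3, not Drupal code): each add-on owns a settings table keyed by nid, and uninstalling the add-on simply drops that table. The helper names, the table name feedapi_node_settings, and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def install_addon(conn, table, columns):
    """Create the add-on's own settings table, one row per feed node.

    Table and column names are assumed to come from trusted module code,
    never from user input, since they are interpolated into the SQL.
    """
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
    conn.execute(f"CREATE TABLE {table} (nid INTEGER PRIMARY KEY, {cols})")


def uninstall_addon(conn, table):
    """Remove all of the add-on's settings by dropping its table."""
    conn.execute(f"DROP TABLE IF EXISTS {table}")


def table_names(conn):
    """List the tables that currently exist in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Hypothetical add-on with two settings stored as real columns.
install_addon(
    conn,
    "feedapi_node_settings",
    {"content_type": "TEXT", "promote": "INTEGER"},
)
conn.execute("INSERT INTO feedapi_node_settings VALUES (1, 'article', 0)")
print(table_names(conn))
```

Nothing belonging to the add-on is left behind in a shared settings column after uninstall, which is the cleanup benefit listed above.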
#9
I'm with mustafau on this. The serialized data is a real drawback to doing batch operations on feeds. Changing a setting on all of a class of feeds is impossible to do at the db level because the settings are buried. There is also duplication of data (url):
select url, settings from feedapi;
| http://zivtech.com/taxonomy/term/6/feed | a:6:{s:11:"feedapi_url";s:39:"http://zivtech.com/taxonomy/term/6/feed";s:17:"refresh_on_create";i:0;
s:15:"update_existing";i:1;s:4:"skip";i:0;s:12:"items_delete";s:1:"0";s:10:"processors";
a:1:{s:12:"feedapi_node";a:4:{s:12:"content_type";s:7:"article";s:9:"node_date";s:4:"feed";
s:7:"promote";s:1:"0";s:8:"x_dedupe";s:0:"";}}}
Robustness is not an argument in favor of serialization and neither is prior art in Drupal. Normalized data wouldn't lead to less robustness, and the prior art for serialization in Drupal (cache and variable, for example) tends towards data that is unique (you never operate on a whole class of variables, and caches never get updated, just read or destroyed). This is a case where normalization could be applied to most of the data, I think. Let processors have their own tables, if they have to.
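To illustrate the batch-operation point, here is a small comparison in Python with sqlite3. It is a sketch under assumptions: JSON stands in for PHP's serialize(), and the table names feedapi_serialized and feedapi_normalized, their columns, and the sample rows are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feedapi_serialized (nid INTEGER PRIMARY KEY, settings TEXT);
CREATE TABLE feedapi_normalized (
    nid INTEGER PRIMARY KEY,
    update_existing INTEGER,
    content_type TEXT
);
""")

# The same three hypothetical feeds, stored both ways.
for nid, content_type in [(1, "article"), (2, "article"), (3, "story")]:
    blob = json.dumps({"update_existing": 1, "content_type": content_type})
    conn.execute("INSERT INTO feedapi_serialized VALUES (?, ?)", (nid, blob))
    conn.execute(
        "INSERT INTO feedapi_normalized VALUES (?, 1, ?)", (nid, content_type)
    )


def disable_updates_serialized(conn, content_type):
    """Serialized storage: load, decode, edit and re-encode every row in code."""
    changed = 0
    rows = conn.execute("SELECT nid, settings FROM feedapi_serialized").fetchall()
    for nid, blob in rows:
        settings = json.loads(blob)
        if settings["content_type"] == content_type:
            settings["update_existing"] = 0
            conn.execute(
                "UPDATE feedapi_serialized SET settings = ? WHERE nid = ?",
                (json.dumps(settings), nid),
            )
            changed += 1
    return changed


def disable_updates_normalized(conn, content_type):
    """Normalized storage: one UPDATE statement at the database level."""
    cur = conn.execute(
        "UPDATE feedapi_normalized SET update_existing = 0 WHERE content_type = ?",
        (content_type,),
    )
    return cur.rowcount
```

Both functions change the same feeds, but only the normalized version can be expressed as a single query; the serialized version has to round-trip every row through application code.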