| Project: | Millennium OPAC Integration |
| Version: | 6.x-2.x-dev |
| Component: | Code |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed (fixed) |
Issue Summary
Here's a really short description of the module: it maps marc metadata into millennium nodes and taxonomies, saving additional metadata and the original marc record into its own database table.
I think currently there is a slight uncertainty about what the correct place for a specific piece of data is and where the data should be handled. Maybe this has to do with the historical Biblio dependency. Let me explain. There are (at least) two places in the code where marc metadata is mapped into other structures:
millennium_marc_to_nodeobject maps chosen marc fields into a php dictionary array / hash map which is then serialized and saved into the database.
millennium_add_taxonomy_to_node maps chosen marc fields into drupal taxonomies.
Additionally, the original marc record is saved.
The problem is: what's the correct place to find a specific piece of metadata? For example, to find out the language, you could (1) parse it from the marc record, (2) look it up in the metadata array or (3) check the vocabulary.
I think some of this confusion leads to repetition in the code: for example, some of the same marc parsing takes place in both the functions mentioned above. Wouldn't it be better, if only millennium_marc_to_nodeobject was responsible for parsing the marc, and millennium_add_taxonomy_to_node would only use the internal "biblio" hash map. I'm basically talking about this: http://en.wikipedia.org/wiki/Don't_repeat_yourself .
So, we could clarify the data flow a bit to make it look more like this linear flow: MARC -> php array -> taxonomies. And, we could disallow using the marc record in other places than millennium_marc_to_nodeobject.
*Or* we could keep the MARC record as the authoritative metadata source, and parse it anew every time (at node load time, etc.)
Personally, I prefer to keep the biblio php array as the authoritative source, because it's easy to manipulate the php array with clear, mnemonic names as keys instead of having to remember all the different marc field numbers and subfields every time. To me, marc is an implementation detail, which should be hidden inside a user-friendly "api", in this case an easy-to-use array.
But maybe moving to CCK will make things even complicated.. Anyway, I think the answer to this issue should at least be documented somewhere to reduce confusion.
Comments
#1
I agree, there should first be parsing into the biblio array, then that could be used by millennium_add_taxonomy_to_node() and other modules if they want it.
We are now storing the serialized biblio array too, so that looks like the authorative source.
#2
How about this for a first try?
* One function does the marc -> Biblio array conversion: millennium_marc_to_biblio()
* millennium_record_to_nodeobject calls it (and currently it's the only caller)
* millennium_add_taxonomy_to_node is untouched by this:: it still parses MARC in a roundabout way (I think) not directly possible with the current biblio array we have.
So this simplifies and clarifies things just a bit. More will come...
Possible next steps:
* Maybe move all conversion code into another file. (the existing millennium.import.inc?)
* Make taxonomy assignment depend on the biblio info and not the MARC record.
#3
It's definitely a good idea to separate marc_to_biblio into its own function.
I tested this patch, and found that the subject and language taxonomy mappings don't work for some reason. I can try to find out why.
About the next steps:
Good idea.
This might be a good idea. Actually, if you think about it, the conversion code isn't even dependent on Millennium, only on the module's internal raw marc representation. So it might be possible to separate all the Millennium-related code (scraping, fetching, parsing the textual marc, etc.) from more generic code -- and this could be a start for a module which supports many different library systems! I know, this is Millennium Integration, but maybe this is one point to consider when thinking about separating functionality to different files: one file for all the Millennium related stuff is one option!
#4
About the taxonomy mappings failure: this fixed it:
millennium_add_taxonomy_to_node($nodeobject, millennium_parse_marc($marc_text));(At the end of record_to_nodeobject).
Of course, this problem will go away when we remove the taxonomy -> marc dependency.
#5
#4: Good catch. I keep breaking stuff =) Definitely need to get down and write some tests. =)
From your comments I realized that millennium_marc_to_biblio() could be further stripped down to only care about MARC and the biblio array (ignoring base_url, record numbers, etc).
New patch for review.
#6
This was committed in http://drupal.org/cvs?commit=318164
Beware, as you can see from the above link, the API keeps changing... but hopefully will freeze soon. =)
#7
Automatically closed -- issue fixed for 2 weeks with no activity.