Assume we have a built-up Drupal site with a mature taxonomy structure (very typical).

We know that some nodes are assigned certain taxonomy terms (actually, the terms' internal DB primary key IDs). So far, fine.

Assume you want to migrate these big vocabularies to another Drupal installation (one that already exists, with its own categories).

Let us assume we use the taxonomy XML export module.
On import into the other Drupal site, I assume new IDs have to be assigned to these incoming vocabularies, so the old IDs are no longer valid.
Thus, the referential integrity between taxonomy terms and nodes has to be rebuilt from scratch.

I hope this feature, being able to handle integrity issues when moving data/taxonomy/nodes between sites, will be seriously looked into.

Thanks in advance. (I am not a PHP programmer, otherwise I would have helped.)

Comments

dman’s picture

It is, as you say, a problem of referential integrity.
And not many database structures can survive having one half of it picked up and paired with another half that's been making its own IDs in the meantime. Which is what you describe.

Yes, it is a major and non-trivial issue. I've done lots of site-sync processes matching partial dev sites with live releases and hit all those issues. Mostly with nodes, which are bad enough.

I've been looking into taxonomy_xml to revamp it a bit, so that it can:
- do search-matches on term strings rather than IDs
- re-use term IDs to match the original if possible
But I have not looked further into renumbering all possible term references in other tables, which it sounds like you'd need.
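
A minimal sketch of the name-matching idea, in Python rather than PHP and not taken from taxonomy_xml itself. It assumes terms can be matched on the pair (vocabulary name, term name), that the node IDs stay the same on the target, and that the node/term links live in simple (nid, tid) rows, roughly like Drupal's term_node table. The function names are hypothetical.

```python
from itertools import count


def build_tid_map(source_terms, target_terms, next_tid):
    """Map each source tid to a target tid, matching on (vocabulary, name).

    source_terms / target_terms: dicts of tid -> (vocabulary_name, term_name).
    next_tid: first unused tid on the target site.
    Source terms with no match are added to target_terms under a fresh tid.
    """
    by_name = {key: tid for tid, key in target_terms.items()}
    fresh = count(next_tid)
    tid_map = {}
    for old_tid, key in sorted(source_terms.items()):
        if key not in by_name:
            new_tid = next(fresh)
            target_terms[new_tid] = key
            by_name[key] = new_tid
        tid_map[old_tid] = by_name[key]
    return tid_map


def remap_term_node(term_node_rows, tid_map):
    """Rewrite (nid, tid) pairs so they point at the target site's tids."""
    return [(nid, tid_map[tid]) for nid, tid in term_node_rows]


# Both sites used tids 1 and 2, but for different terms.
source_terms = {1: ("Topics", "Finance"), 2: ("Topics", "Sports")}
target_terms = {1: ("Topics", "Sports"), 2: ("Topics", "Travel")}

tid_map = build_tid_map(source_terms, target_terms, next_tid=3)
remapped = remap_term_node([(10, 1), (10, 2), (11, 2)], tid_map)
```

Here "Sports" is re-used from the target and "Finance" gets a new tid; any other table holding a tid would need the same `tid_map` applied to it, which is the renumbering work mentioned above.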

However, my advice, now that you can see the problems, is to take steps to ensure your new terms cannot clash.
Manually adjust the sequences table to start your (dev) term numbering (term_data tid) from a nice high number.
Create your new terms which will be counted from there.
Now you can merge this term_data table with your target without breaking things too much.
After that, bump the target (live) term_data tid in sequences to an even higher number and carry on.
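
The steps above can be sketched as follows. This is an illustration in Python with plain dictionaries standing in for the term_data table and for the counter kept in the sequences table; the starting value of 10000, the extra gap of 1000 and the function name are arbitrary assumptions, not Drupal defaults.

```python
def merge_term_tables(live_terms, dev_terms):
    """Merge two tid -> name tables, refusing to merge if any tid clashes."""
    clashes = sorted(live_terms.keys() & dev_terms.keys())
    if clashes:
        raise ValueError(f"tids used on both sites: {clashes}")
    return {**live_terms, **dev_terms}


DEV_START = 10000  # assumed "nice high number" for the dev counter

live_terms = {1: "Sports", 2: "Travel"}

# New dev terms are numbered from the high starting point.
dev_terms = {}
dev_counter = DEV_START
for name in ("Finance", "Weather"):
    dev_counter += 1
    dev_terms[dev_counter] = name

# The ranges are disjoint, so the merge cannot overwrite a live term.
merged = merge_term_tables(live_terms, dev_terms)

# Afterwards, move the live counter past everything that was merged in.
live_counter = max(merged) + 1000
```

If both sites had already created a term under the same tid, `merge_term_tables` would raise instead of silently overwriting one of them.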

Sorry if that's too technical, but the database simply doesn't like being split up, changed at both ends, then put back together again. The above process avoids the issue, but doesn't repair it if you've already created two different terms with the same ID in different places.

I can't tell from your OP, but are you also trying to merge nodes from one site into the other? That will cause plenty of difficulties on its own. If not, you don't have to worry about the tid integrity of the unused term set when doing the import.

bakr’s picture

Mmm,

That was a swift reply, thanks a lot.

I can see what you mean. I have also been doing this (I am a Lasso programmer, lassosoft.com; I wrote some scripts to pump the MySQL vocabulary/term tables with 3000+ terms).

I used Drupal 5, then switched to 6, as the counts table has been eliminated and we rely more on InnoDB (manually changing and overriding the most recent auto-increment value where possible).

So I would say that doing this kind of dirty job is cleaner in Drupal 6... at least one good step forward.

Another challenge was recalibrating the menu tables... mid... Oh my God.

As for my mention of nodes, I really meant moving the content itself (articles, stories, etc.) from one site to another. In fact, we don't mind here if the nodes are assigned new auto-incremented IDs; rather, as far as the intention goes, I would logically assume that the taxonomy ID is the dominant key here...

i.e. no problem if the underlying relation changes, or even if the taxonomy term ID itself changes.

I can further summarize it into two main points so far:

* It is no problem for the IDs of both terms and nodes to change, as long as we maintain the referential integrity.
* We still face a narrow path to a smooth migration, as we have to dig into the realms of the new menu system introduced in D6.

On the other hand, I was thinking of a solution that might be a way out of this tunnel:
* How about creating a new column in the master node table and/or the master taxonomy table?
* This column would be a unique ID, a sort of hash, perhaps composed of a timestamp and a machine ID: a sort of secondary, non-null key, unlike the auto-increment primary key ('ID').
* Even the menu system could possibly rely on that hash. (Oops, I hope it does not hit SEO badly.)

So the way to proceed would be to use that key, whether for taxonomies or nodes, and open up a new era for Drupal by building modules that act on it and bring node/term mobility to the best levels.
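
A rough sketch of how such a key could be used, in Python with dictionaries standing in for the node and term tables. It uses a random UUID rather than the timestamp-plus-machine-ID hash suggested above, and every name in it (`new_record`, `export_links`, `import_records`) is hypothetical; nothing here is existing Drupal API.

```python
import uuid
from itertools import count


def new_record(name):
    """Create a row carrying a universally unique key besides its local ID."""
    return {"uuid": str(uuid.uuid4()), "name": name}


def export_links(term_node, nodes, terms):
    """Express (nid, tid) links as (node uuid, term uuid) pairs."""
    return [(nodes[nid]["uuid"], terms[tid]["uuid"]) for nid, tid in term_node]


def import_records(records, table, next_id):
    """Add records whose uuid is not yet in table; return uuid -> local ID."""
    local = {row["uuid"]: rid for rid, row in table.items()}
    ids = count(next_id)
    for rec in records:
        if rec["uuid"] not in local:
            rid = next(ids)
            table[rid] = rec
            local[rec["uuid"]] = rid
    return local


# Source site: node 1 is tagged with term 1.
src_terms = {1: new_record("Finance")}
src_nodes = {1: new_record("Quarterly report")}
links = export_links([(1, 1)], src_nodes, src_terms)

# Target site already uses ID 1 for a different term and a different node.
dst_terms = {1: new_record("Sports")}
dst_nodes = {1: new_record("Match report")}

term_ids = import_records(src_terms.values(), dst_terms, next_id=2)
node_ids = import_records(src_nodes.values(), dst_nodes, next_id=2)
dst_term_node = [(node_ids[n], term_ids[t]) for n, t in links]
```

The local IDs are regenerated on the target (both records become ID 2 here), but the link survives because it was exported in terms of the unique keys, not the auto-increment IDs.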

Background:

Big financial applications like Microsoft Dynamics Great Plains (an ERP product), which has more than 1300 tables, including the general ledger, etc.,
maintain multiple primary keys. Yes, I see this as the way to go, to easily survive major upgrades and data-migration (mobility) issues.

Back to Drupal: this little thing, an extra, universally unique column that does not depend on the DB, could do the big trick.

We would no longer depend on the underlying DB; the Drupal site and its main assets (taxonomy, nodes, menus) would become fluidly movable. (Backups and restores would be oriented around that unique ID, while the auto-incremented 'ID' column would be handled by the DB and regenerated passively, without affecting the external skin.)

Best Regards,
Bakr

bakr’s picture

Sorry,

I mentioned the counts table (3rd line, see above).

Actually, I meant the sequences table.

Dman, regarding the last sentence in your nice elaboration, where you mentioned not caring about the IDs of unused terms:
I agree, but to identify them visually I have posted another feature request, http://drupal.org/node/202295 , which complements this one as well.

dman’s picture

I always hate hashes where there can be more meaningful information.
I'm drifting towards using URIs for my unique taxonomy term identifiers (as I've been bitten by the RDF bug), and when matching my nodes, I index primarily on path (where appropriate).
Neither is efficient, but this only matters for run-once and background syncing.

... more to come when I get my taxonomy_server project public.

pancho’s picture

If you think that this is a bug, please change to bug report.
If not, please set it to 7.x-dev, as D6 is in feature freeze.

bakr’s picture

Version: 7.x-dev » 6.x-dev
Category: feature » bug
Priority: Normal » Critical

I would rather mark it as a bug (critical), for several reasons:

(I accept your wise opinion as well, and may change it, though.)

* Drupal 6 is highly anticipated by the yearning community. Can't wait for 7 ;-)
* So many existing installations need some housekeeping and dusting-off, especially those with huge numbers of taxonomy terms, but their owners are not brave enough to do it because it is such a fragile process.
* It would give standalone taxonomy export/import modules more room to do the job properly in D6, as they do not really do what they are supposed to do in D5.
* It does not require any substantial changes to the latest Drupal 6; i.e. it can be implemented simply by providing an extra column and leaving everything else as is. That column would be used in the back end by other utility modules, especially for migrating taxonomy categories and any associated nodes between different Drupal installations.
* An early seed implementation may suffice in D6, and it can be extended further in D7.

I leave it to the expertise of the community to help judge whether to put it in core or to implement it in a different way.

Ultimately, and in principle, we are talking about moving away from reliance on DB-generated taxonomy term and node IDs towards an alternative, independent method that maintains clarity, mobility, referential integrity, functionality and clean URLs.

chx’s picture

Title: Taxonomy VS Node - Cross-Installtions Refrencial Integrity Major Issue » Taxonomy VS Node - Cross-Installations Referencial Integrity (Major Issue)
Version: 6.x-dev » 7.x-dev
Priority: Critical » Normal

catch’s picture

Version: 6.x-dev » 8.x-dev
Category: bug » feature
Priority: Critical » Normal

amateescu’s picture

Status: Active » Closed (duplicate)

bakr’s picture

Very promising indeed.