Closed (fixed)
Project:
Family Tree
Version:
5.x-3.x-dev
Component:
User interface
Priority:
Normal
Category:
Feature request
Assigned:
Reporter:
Created:
21 Jun 2008 at 01:58 UTC
Updated:
14 Jul 2012 at 23:19 UTC
Comments
Comment #1
pyutaros commented
Started work on the import feature. The rest of this thread details the progress and thought process behind developing it. The import feature will be based on the one from 5.x-1.x. More detail as I go.
Comment #2
pyutaros commented
For my own information, this is the code of the old import function. This is the part that parses the GED file and inserts entries into the DB. Since we massively revamped the DB in version 5.x-3.x, this is what needs the major rewrite. There are some minor edits already, which I note here.
I split the code up for ease of reference. This section contains the code that makes the DB entries. The funct
Obviously the entire methodology has to be changed here. I'll discuss ideas for accomplishing this in my next post.
Comment #3
pyutaros commented
Okay, once again to aid my thought process, here is an example GED file. We need to get this into the existing DB structure, which will be listed in my next post.
0 HEAD
1 DEST ANSTFILE
1 GEDC
2 VERS 5.5
2 FORM Lineage-Linked
1 CHAR UTF-8
1 SOUR PhpGedView
2 NAME PhpGedView Online Genealogy
2 VERS 4.0.3 stable
1 DATE 26 Sep 2007
2 TIME 07:06:52
1 PLAC
2 FORM City, County, State/Province, Country
0 @I1@ INDI
1 NAME First1 Middle1 /Last1/
2 GIVN First1 Middle1
2 SURN Last1
1 SEX M
1 BIRT
2 DATE 26 JUL 1978
2 PLAC Place1
1 CHAN
2 DATE 07 FEB 2007
3 TIME 17:37:12
1 NCHI 1
1 OBJE @M2@
2 TITL First1
1 FAMS @F1@
1 FAMC @F2@
1 NAME First2 Middle2 /Last2/
2 GIVN First2 Middle2
2 SURN Last2
1 SEX F
1 BIRT
2 DATE 02 FEB 1978
2 PLAC Place2
1 CHAN
2 DATE 29 JAN 2007
3 TIME 11:27:17
1 FAMS @F1@
1 FAMC @F4@
0 @I5@ INDI
1 NAME First3 Middle3 /Last1/
2 GIVN First3 Middle3
2 SURN Last1
1 SEX F
1 BIRT
2 DATE 24 JUL 2006
2 PLAC Place3
1 CHAN
2 DATE 29 JAN 2007
3 TIME 10:17:44
1 FAMC @F1@
0 @F1@ FAM
1 MARR
2 DATE 27 MAY 2006
2 PLAC Place4
2 OBJE @M4@
1 CHAN
2 DATE 31 JAN 2007
3 TIME 18:22:52
0 @F2@ FAM
1 CHAN
2 DATE 07 FEB 2007
3 TIME 16:59:03
1 MARR
2 TYPE Religious
2 DATE 20 FEB 1971
2 PLAC Place5
1 DIV
2 DATE 1994
0 @F4@ FAM
1 CHAN
2 DATE 04 FEB 2007
3 TIME 19:20:38
1 MARR
2 DATE 11 OCT 1975
2 PLAC Place6
0 @M10@ OBJE
1 FILE media/HPIM0631.JPG
1 TITL Title1
0 @M11@ OBJE
1 FILE media/HPIM0601.JPG
1 TITL Title2
0 @M12@ OBJE
1 FILE media/HPIM0566.JPG
1 TITL Title3
0 TRLR
The biggest problem I see out of the gate is creating these relationship entries (e.g. FAMC @F1@) on the fly during import. My next post will outline the new structure. Got to sleep for now.
Comment #4
Microbe commented
Sorry for my bad communication recently; I haven't been able to do any work on the import feature. It looks like you are starting it, though. If you need help with any coding questions, I can find time to answer them, but I don't think I can take on any major tasks at the moment.
Sorry
Peter
Comment #5
pyutaros commented
Peter,
Please, no apologies necessary!!! :) You've been a tremendous help so far! I guess if you can help me get my thoughts organized here, I would greatly appreciate that as well.
Thanks again,
Jonathan
Comment #6
pyutaros commented
Here's a quick sketch of the logic loop I'm thinking of for the new import code. Since the old version just dumped the entire file into the family_facts table, the initial import was easier, but working with the data in Drupal required massive amounts of work, and the output was still not GEDCOM-compliant. This brings to mind a few methods that might be used to import the data.
All that being said, a hybrid approach seems appropriate. I like the idea of the temp DB. Here's the proposed description of how the code will change.
The import feature begins at line 73. Line 132 (while (!feof ($fp))) begins the line-by-line evaluation. I'd like to leave the initial variable assignments in lines 134 through 149 alone. The GED file format hasn't changed, so how we set our initial variables can remain the same.
We will basically be replacing lines 154 to 223. We'll want to skip the header section of the GED file. I think we can accomplish this by NOT doing anything unless the XREFs or fact codes we are looking for come up.
We have five different conditions we are checking for:
On the other hand, it may be better to just monitor what $level the $gedline is at and pass "tokens" from one while loop to the next.
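As an illustration of the level-tracking idea, here is a minimal sketch written in Python rather than the module's PHP. The names `GED_LINE` and `parse_gedcom` are hypothetical and not part of the module. It skips the header simply by ignoring any level-0 record that carries no XREF, and it assumes each input line holds exactly one GEDCOM line.

```python
import re

# One GEDCOM line: level, optional @XREF@, tag, optional value.
GED_LINE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s+(.*))?$")

def parse_gedcom(lines):
    """Group lines into level-0 records; HEAD and TRLR have no XREF and are skipped."""
    records = []
    current = None
    for raw in lines:
        match = GED_LINE.match(raw.strip())
        if not match:
            continue  # ignore blank or malformed lines
        level, xref, tag, value = match.groups()
        level = int(level)
        if level == 0:
            # A new top-level record starts; keep it only if it has an XREF.
            current = {"xref": xref, "type": tag, "lines": []} if xref else None
            if current is not None:
                records.append(current)
        elif current is not None:
            # Sub-lines keep their level so later code can switch on (level, tag).
            current["lines"].append((level, tag, value or ""))
    return records
```

Each record keeps its sub-lines with their level, so a later switch/case over (level, tag) can decide what to store and what to skip.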
Well, out of time again. I'm getting a better idea of how I want to approach this. If anyone has any input, please chime in.
Comment #7
Microbe commented
I think a case/switch-based system would be good, with a structure very similar to what you have added.
Level 3 will be skipped automatically.
Each piece of data can be saved and processed into the right location; it works like a tree because of the variable switches.
The only really difficult part is the family data. To handle it, I think you could save the xref in a database column, which could then be used to reference the individual. Not sure though.
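One way to read the xref-column suggestion is as a two-pass import: first record which local id each xref received, then resolve the family pointers. The sketch below is in Python, not the module's PHP, and `build_xref_map`, `resolve_links`, and the record layout are all assumptions made for illustration.

```python
def build_xref_map(records, first_id=1):
    """Pass 1: give every record a local id, keyed by its GEDCOM xref."""
    return {rec["xref"]: first_id + i for i, rec in enumerate(records)}

def resolve_links(records, xref_map):
    """Pass 2: turn FAMS/FAMC pointers into (person_id, role, family_id)."""
    links = []
    for rec in records:
        if rec["type"] != "INDI":
            continue
        for tag, target in rec["links"]:
            # The family id is None when the file never defines that FAM record.
            family_id = xref_map.get(target)
            links.append((xref_map[rec["xref"]], tag, family_id))
    return links
```

Because the map is complete before any pointer is resolved, the order of INDI and FAM records in the file no longer matters.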
Comment #8
pyutaros commented
I am very slowly making progress on this. Some issues are cropping up, but I will ask questions once I finish the parts I know how to do. Commented lines are snags. I have only created the switch/case code for INDI nodes. I have not yet created the DB insert code. Very slow going due to other distractions.
Comment #9
Microbe commented
A date conversion function that I quickly wrote:
It has been tried and tested, so it should work fine.
You can also use the explode function to split the names.
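The function posted here did not survive in this copy of the thread. As a stand-in, and not a reconstruction of it, here is a minimal Python sketch of converting the date forms that appear in the sample file ("26 JUL 1978", "1994") into ISO-style strings. `gedcom_date_to_iso` is a hypothetical name, and qualified dates such as "ABT 1900" are deliberately left unhandled.

```python
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}

def gedcom_date_to_iso(value):
    """Convert 'DD MON YYYY', 'MON YYYY' or 'YYYY' to an ISO-style string."""
    parts = value.strip().upper().split()
    if (len(parts) == 3 and parts[0].isdigit()
            and parts[1] in MONTHS and parts[2].isdigit()):
        return "%04d-%02d-%02d" % (int(parts[2]), MONTHS[parts[1]], int(parts[0]))
    if len(parts) == 2 and parts[0] in MONTHS and parts[1].isdigit():
        return "%04d-%02d" % (int(parts[1]), MONTHS[parts[0]])
    if len(parts) == 1 and parts[0].isdigit():
        return "%04d" % int(parts[0])
    return None  # qualifiers (ABT, BEF, ...) and ranges are left to the caller
```

Returning None for anything unrecognised lets the importer keep the original text instead of storing a wrong date.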
Comment #10
Microbe commented
Having looked at the GEDCOM source that I have, it doesn't include GIVN and SURN lines after the NAME line, so it may be safer to stick to splitting up the NAME line and not rely on the lines below it. Not sure though. I will send you my GEDCOM source so you can see and decide.
Comment #11
pyutaros commentedPeter,
Thanks for the GED file. It's a big help to see how another program sets up a GED file. We should definitely do the name handling at level 1 instead of level 2. I know the solution in either case involves preg_match, but I still need to read more to understand that. Also, excuse me for being dense, but about your date function: should I just put that in common.inc and then call it with something like $birthdate = family_changeDateFormat($value)?
Thanks,
Jonathan
Comment #12
Microbe commented
You're spot on about how to use the date function :)
I would use two explodes for the name splitting, as preg_match is hard to use (well, I tried and couldn't work it out).
I would use them as follows:
This certainly isn't the most effective way to get it to work, and if you can get preg_match to work it will be more versatile.
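For comparison, here is a sketch of the regular-expression route, written in Python rather than PHP. `NAME` and `split_gedcom_name` are hypothetical names. It assumes the surname is the part enclosed in slashes, as in the sample NAME lines earlier in the thread.

```python
import re

# Given names, then /Surname/, then an optional trailing part (e.g. a suffix).
NAME = re.compile(r"^(?P<given>[^/]*)/(?P<surname>[^/]*)/(?P<suffix>.*)$")

def split_gedcom_name(value):
    """Split a level-1 NAME value into given, surname and suffix parts."""
    match = NAME.match(value.strip())
    if not match:
        # No slashes at all: treat the whole value as the given name.
        return {"given": value.strip(), "surname": "", "suffix": ""}
    return {key: text.strip() for key, text in match.groupdict().items()}
```

Working from the NAME line alone means the import does not depend on GIVN and SURN sub-lines being present.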
Comment #13
pyutaros commented
Well, so much for any kind of standard in GEDCOM. Take a look at how the FAM records are referred to in your file, then look at mine.
YOURS
MINE
Just an observation. It's still workable. I think also what I'm seeing here is that you never entered information like type and date into your original program. It definitely gives me plenty to think about when creating the export file, but that is a whole other story. Anyhow, back to work.
Comment #14
Microbe commented
Oh dear...
I think it's only the positioning of the relations (HUSB, WIFE and CHIL in mine, FAMS and FAMC in yours). I haven't added data like date, place and type, so otherwise they should come out the same.
I'm not entirely sure, but looking at the old family module source code, mine seems closer to what it should be: there are a lot of searches for CHIL and HUSB but none (as far as I have seen) for FAMS and FAMC, which yours uses.
I'm now really confused, because GEDCOM is supposed to be a very well-defined standard. :(
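For what it's worth, GEDCOM 5.5 defines both pointer directions: FAMS and FAMC belong on INDI records, while HUSB, WIFE and CHIL belong on FAM records, so a file may carry either or both. A minimal Python sketch (not the module's PHP) that accepts both styles could look like this. `family_members` and the record layout are assumptions made for illustration.

```python
from collections import defaultdict

def family_members(records):
    """Merge both pointer styles into {family_xref: {"spouses", "children"}}."""
    families = defaultdict(lambda: {"spouses": set(), "children": set()})
    for rec in records:
        for tag, target in rec["links"]:
            if rec["type"] == "INDI" and tag == "FAMS":
                families[target]["spouses"].add(rec["xref"])
            elif rec["type"] == "INDI" and tag == "FAMC":
                families[target]["children"].add(rec["xref"])
            elif rec["type"] == "FAM" and tag in ("HUSB", "WIFE"):
                families[rec["xref"]]["spouses"].add(target)
            elif rec["type"] == "FAM" and tag == "CHIL":
                families[rec["xref"]]["children"].add(target)
    return dict(families)
```

Using sets means a file that states the same relationship from both sides does not produce duplicates.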
Comment #15
pyutaros commented
I'd say you're probably right about your file being closer to the standard. I guess it may just be a statement on how phpGedView handles the data. They obviously have their own "standard" of GEDCOM compliance.
I had to stop mid-coding again. Mostly done. I still have to insert the DB variables and create the node, and I have to look at the node creation code that the import previously used. Then I suppose I'm going to have to figure out what to do with all the relationship data. Here's where it currently stands, with the old node creation routine following. No more until Tuesday.
Comment #16
Microbe commented
My only comment is that the variables you insert into the database need to be the same ones you assign values to, e.g. the gender value goes to $gender, whereas you insert the value of $node->SEX.
Comment #17
Microbe commented
Oops, sorry, you already said this in your post :(
Comment #18
pyutaros commented
Okay. Here's where it stands today. All node creation and table insertion is complete. I believe the only nagging detail is going back and inserting the NID into the ancestor_group field for individuals. It should be a small job, but I'm out of time. I also discovered a few flaws in how I am evaluating things. One such flaw: if children do not share the same name as PARENT1, some relationship data may be skewed.
Finally, I ASSUME that group data will not appear in the GED file until the end. If that is not the case for a file, the import will fail. My head is definitely spinning. Here is the current code:
Thanks,
Jonathan
Comment #19
pyutaros commented
Okay. I haven't even remotely tested this yet. Here is the first draft of the import code. It has been committed to 5.x-3.x-dev. The download should be available after midnight.
Comment #20
pyutaros commented
Did a quick test. The new code creates nodes, but they are of an unknown type. Trying to figure out what the problem could be. I think it has something to do with the node_save function.
Comment #21
pyutaros commented
The problem was with the variable used in the switch/case evaluation structure. $current0record was never being set, so the remaining values were not being set either. I corrected the variable and a few other errors (include statement, duplicate functions). Now the nodes import as the proper type, but no data is coming through. I already ran one test with echo statements, and it looks like all the variables are there. Receiving errors like the following on import.
Here is the import code as it stands currently. Too cloudy to go any further tonight.
Comment #22
Microbe commented
I have found that Drupal saves node variables through node_save(), so they don't have to be inserted afterwards (see import.inc). This still doesn't seem to work for family group nodes (no idea why). I also keep getting errors relating to temporary tables. I'm not sure what these do yet, so maybe you could look at that.
Comment #23
Microbe commented
Made a couple of changes to your import script. It should now work fine. :)
Comment #24
pyutaros commented
Microbe, thank you very much for the update. I can see where I went wrong with node_save and the temp DB. I have updated CVS with version 5.x-3.3. I have also tested it, and it works beautifully. Thanks again. I will update the 6.x branch with the import file you submitted over there as well.
Jonathan
Comment #25
pyutaros commented
No complaints. I am marking this as fixed. I should also add that 6.x is now the official "New Features" version of Family Tree 2. The only feature that will be added to 5.x (barring community backports) will be the pending export feature. 6.x will remain the official branch for new features until the 7.x code freeze, at which point we will begin developing the 7.x version.
Comment #26
Anonymous (not verified) commented
Automatically closed -- issue fixed for two weeks with no activity.