I am working on a real estate site and developing my own custom modules to query the MLS system via SOAP. This portion is working.
To keep the real estate listings up to date, I need to track price changes, what is for sale, and what is no longer for sale. These become my XML feed processing tasks.
My approach is to create a CCK content type for all listings. These are the listings shown to the public. To process new feeds with updated data, I created a second CCK content type with identical fields. This second content type acts as a temporary staging table, used to compare existing listings against the XML feed data that has just arrived. Because MLS Listing Numbers are unique, I can compare listings by that number to identify new listings and existing listings with updated data such as price changes.
Here are my questions:
1.- To insert newly found listings, should I just use an INSERT SQL query within PHP to copy records from the temporary table to the listing table, or should I use the node_save() function? The latter seems like a lot of processing overhead in PHP. However, if node_save() is not used, do I risk skipping other ancillary tasks such as creating node teasers?
2.- If I use the node_save() function, how can I use the MLS Listing Number as an index to reference the nodes? So far they are referenced by nid and vid, correct?
3.- Is the approach of creating an identical table as a temporary staging table the wrong approach? Any thoughts welcome. There are 100s to 1000s of listings that need processing daily with massive amounts of data.
Thanks for your thoughts and comments in advance.
Sandro
Comments
I would not use a temporary
I would not use a temporary table at all. On import I would look for an existing record with the same MLS Listing Number, using a query that returns nid. If a record is found (nid > zero), I would load the node using node_load() and compare it to the XML data, updating any fields that have changed. If any field changed, I would then call node_save().
(There is a module that imports XML and I think will update existing node using a unique field you specify)
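The lookup-and-compare flow described above can be sketched in a language-neutral way. The following is a minimal Python sketch, not Drupal code: the dictionary `existing` stands in for the stored listings keyed by MLS Listing Number, and `plan_import`, the field names, and the sample data are all hypothetical. In Drupal the lookup would be a query returning nid, and the writes would go through node_load() and node_save().

```python
def plan_import(existing, feed_items, key="mls_number"):
    """Split feed items into listings to create and listings to update.

    existing   -- dict mapping MLS number -> dict of currently stored fields
    feed_items -- iterable of dicts parsed from the XML feed
    Returns (to_create, to_update), where to_update maps an MLS number
    to only the fields whose values differ from the stored listing.
    """
    to_create = []
    to_update = {}
    for item in feed_items:
        mls = item[key]
        current = existing.get(mls)
        if current is None:
            # No stored listing with this MLS number: it is new.
            to_create.append(item)
            continue
        # Keep only the fields that actually changed.
        changed = {f: v for f, v in item.items() if current.get(f) != v}
        if changed:
            to_update[mls] = changed
    return to_create, to_update


# Hypothetical sample data for illustration only.
existing = {
    "A100": {"mls_number": "A100", "price": 250000, "status": "active"},
    "A101": {"mls_number": "A101", "price": 310000, "status": "active"},
}
feed_items = [
    {"mls_number": "A100", "price": 245000, "status": "active"},  # price drop
    {"mls_number": "A101", "price": 310000, "status": "active"},  # unchanged
    {"mls_number": "A102", "price": 199000, "status": "active"},  # new listing
]

to_create, to_update = plan_import(existing, feed_items)
print(to_create)
print(to_update)
```

Unchanged listings appear in neither result, so the expensive save step would run only for listings that are new or have actually changed.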
I am currently using Feeds +
I am currently using Feeds + xml parser to import the xml file. The XML data is mapped to a CCK Content type. However, I have not seen a function or setting that would update an existing node by specifying a unique field, hence the need to create a temporary table. So far, each subsequent feed creates duplicate nodes.
I agree with you in the sense of not wanting to create a temporary table if not necessary. This just adds more processing and computation time.
BTW, thanks for your previous comments. I will take a look at the node_load() function to get more insights.
Thanks.
Nevets, Thanks for your
Nevets, thanks for your comments again.
I just want to report back that your comments guided me to look at other clues leading to a solution.
You were right that I could set up one of my XML fields as the unique field by mapping it to the GUID. This solved two of my planned tasks (updating existing CCK content and adding new CCK content without duplicates). The built-in Feeds functionality works beautifully.
Now, I just need to implement on my own a process to purge my nodes.
Once again many thanks.
Sandro.
Feeds does have a built in
Feeds does have a built in method to remove nodes based on age.
Yes, you are right.
Yes, you are right. Unfortunately, MLS listings can persist until the listing contract expires. The contract expiration date is not public, so I must keep the listings active without a node expiration date.
The way it works for our IDX service is that we get a full list of listings (List A). The next time we get a full list of listings (List B) we need to run a join query to compare List A with List B. Any listings in List A not found in List B need to be purged from List A.
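The List A versus List B comparison described above amounts to a set difference on the MLS Listing Number. Here is a minimal Python sketch under that assumption; `find_stale` and the sample IDs are hypothetical, and this is not Drupal or Feeds code. In SQL the same result would typically come from a LEFT JOIN from List A to List B, keeping the rows with no match in List B.

```python
def find_stale(current_ids, new_ids):
    """Return the IDs present in the current list but missing from the new one."""
    return sorted(set(current_ids) - set(new_ids))


# Hypothetical MLS Listing Numbers for illustration only.
list_a = ["A100", "A101", "A102"]  # listings currently on the site
list_b = ["A101", "A102", "A103"]  # listings in the latest full feed

# Listings to purge from List A because the new feed no longer contains them.
print(find_stale(list_a, list_b))
```

Only the listings returned here would need to be deleted, so listings that are still for sale are left in place.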
One way to do this would be to simply empty List A and update it with List B.
I suppose you can set a node expiration date that expires the node just before the new List comes in. That would probably empty the current listings and load the new list to the database.
NOTE: I will give it a try and see how it works. Thanks.
It works perfectly
Just to report back.
I upgraded to the latest Feeds version and it works perfectly. No need to mess around with custom code.
The latest Feeds version has all the functionality I need. Now it is up to me to set up the appropriate feed schedules to get the new feed from the XML query service and to update and expire nodes. This is just a matter of planning my cron runs.
Nevets, thank you very much for guiding me in the right direction. I would have overlooked so many details and gone in the wrong direction.
Best regards,
Sandro.