I'm not really sure if this is a bug as much as a feature request, but I thought I'd start here.
I noticed that every day dRealty would do a complete listing refresh including all images. Subsequent updates on the same day would only update listings changed that day, as expected. The next day all listings would fail to match based on the hash comparison, and every single listing would be re-downloaded, re-geocoded, and all images re-downloaded. After doing some testing I figured out that the hash value used for thumbprinting listing data was changing every day. Of course it took me dumping the listing array and diffing it between two different days to finally spot what was changing, though I should have seen it much sooner:
< [L_DOM] => 20
< [L_DOMLS] => 20
---
> [L_DOM] => 19
> [L_DOMLS] => 19
I had the days on market field mapped to an entity field, which I've since removed since it's not needed... However, I believe that since the RETS query returns all fields available for each listing, and the hash is built on all returned fields (as opposed to only fields that are mapped to an entity field), this is always going to break the update-only-changed functionality: every single day the DOM field will change on every listing, even though in most cases the rest of the listing data will be untouched from the previous day.
So, it seems like there are a couple of possible solutions... One (probably the most correct one, though the hardest) would be to only generate the hash based on RETS fields that are mapped to entity fields. Thus if something changes about the listing that the site doesn't care about, i.e. an unmapped RETS field, then the listing need not be updated. This could have issues with photo updates, though, as the hash is also used to control whether or not photos should be updated... Not sure about that.
The easier solution would be to have a way to manually exclude a specific RETS field from being included in the hash generation. That way we could just exclude DOM (days on market) from that calculation, and the problem would go away. This requires a little more insight on the part of the person setting the site up, but perhaps that could be handled via documentation.
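To make the two options concrete, here is a minimal sketch in Python (illustration only, since dRealty itself is PHP). The function name `listing_hash`, the use of MD5 over sorted JSON, and the `L_ListingID` / `L_AskingPrice` field names are all assumptions made for the example, not dRealty's actual implementation. Only `L_DOM` and `L_DOMLS` come from the diff above.

```python
import hashlib
import json


def listing_hash(listing, mapped_fields=None, exclude=()):
    """Fingerprint a listing dict (hypothetical helper, not dRealty code).

    mapped_fields: if given, only these RETS fields feed the hash (option one).
    exclude: RETS fields that are always left out of the hash (option two).
    """
    fields = listing.keys() if mapped_fields is None else mapped_fields
    relevant = {
        name: listing[name]
        for name in fields
        if name in listing and name not in exclude
    }
    # sort_keys makes the fingerprint independent of field order.
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Same listing on two consecutive days; only days on market moved.
yesterday = {
    "L_ListingID": "1001",       # hypothetical field name
    "L_AskingPrice": "250000",   # hypothetical field name
    "L_DOM": "19",
    "L_DOMLS": "19",
}
today = dict(yesterday, L_DOM="20", L_DOMLS="20")

mapped = ["L_ListingID", "L_AskingPrice"]
dom_fields = ("L_DOM", "L_DOMLS")

hash_all_changed = listing_hash(yesterday) != listing_hash(today)
hash_mapped_same = listing_hash(yesterday, mapped) == listing_hash(today, mapped)
hash_excluded_same = (
    listing_hash(yesterday, exclude=dom_fields)
    == listing_hash(today, exclude=dom_fields)
)
```

Hashing every returned field flags the listing as changed, while either restricting the hash to mapped fields or excluding the DOM fields leaves the fingerprint stable from one day to the next.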
Sorry for the long-winded post, mainly trying to get my thoughts in order ;)
Comments
Comment #1
webavant commented
Nice find. I noticed that every listing seemed to be updating too in my limited testing, and I mapped every field I could including days on market, so hopefully that's my issue. I'll update when I test again soon.
Comment #2
camidoo commented
Wasn't long-winded, and the solution is actually pretty easy, and is how, at one point, the system worked. There is a way to limit the fields returned from the RETS query to just "selected" fields.
By including a field list in the "Select" parameter of the options array passed to the query, I can limit the fields returned to only the fields that are mapped. I'll add this in tonight and test it out. Additionally, I think it's worth asking whether we should include a stopgap to check if photos need to be imported again. In the options section for selecting whether or not to import photos, I think I'll include a config option to select the photo modification field and use it to determine if photos should be reprocessed.
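A rough sketch of the field-limiting idea, in Python for illustration only (dRealty is PHP). RETS searches accept a "Select" parameter holding a comma-separated list of field names. Everything else here is an assumption: the helper name `build_search_options`, the shape of the mapping dict, and the example field names.

```python
def build_search_options(field_mappings, key_field, extra_fields=()):
    """Build a query options dict whose "Select" lists only wanted fields.

    field_mappings: dict of RETS field name -> entity field name (assumed shape).
    key_field: the RETS field that identifies a listing; always requested.
    extra_fields: unmapped fields still needed, e.g. a photo timestamp.
    """
    wanted = [key_field]
    for name in list(field_mappings) + list(extra_fields):
        if name not in wanted:  # keep order, drop duplicates
            wanted.append(name)
    return {"Select": ",".join(wanted)}


# Hypothetical mapping: days on market is simply not mapped,
# so it is never requested and never reaches the hash.
mappings = {"L_AskingPrice": "field_price", "L_Address": "field_address"}
options = build_search_options(mappings, key_field="L_ListingID")
```

With the fields limited at query time, the hash can keep covering "everything returned" without picking up fields the site never uses.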
The same goes for geocoding: if a property has already been geocoded, what are the chances of the address changing and it needing to be geocoded again?
Nice Find indeed
Comment #3
stockliasteroid commented
Yeah, that's what I was hoping you would say ;). I looked at the query builder, but it wasn't immediately apparent to me how or where you could limit the fields returned. That sounds like a perfect solution, though!
As to the photo issue, my MLS does have a special timestamp field for photo_updated_date that's separate from the listing update date, so if I could select that as the means of checking to see if photos need to be updated, that would be awesome! Then we could avoid downloading images when they haven't changed, even if the listing data itself has changed.
For geocoding, you're right, it's unlikely to change... Geocoding itself is such a lightweight process I wouldn't be terribly concerned if it does get re-coded as a result of an unrelated update, as on updates we're never going to hit a rate limiter. But I suppose it would be nice to avoid re-coding if possible :).
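The photo and geocoding checks discussed above could look roughly like this. It is a Python sketch for illustration only (dRealty is PHP), and the helper name `plan_update`, the `L_Address` and `L_AskingPrice` field names, and the timestamp format are assumptions. The `photo_updated_date` name comes from the MLS described above and will differ between MLSs.

```python
def plan_update(stored, incoming, photo_ts_field, address_fields):
    """Decide which expensive steps a changed listing actually needs.

    stored / incoming: dicts of RETS field name -> value for one listing.
    photo_ts_field: the configured photo modification timestamp field.
    address_fields: the fields that feed the geocoder.
    """
    photos_changed = stored.get(photo_ts_field) != incoming.get(photo_ts_field)
    address_changed = any(
        stored.get(name) != incoming.get(name) for name in address_fields
    )
    return {"refresh_photos": photos_changed, "regeocode": address_changed}


stored = {
    "L_AskingPrice": "250000",                    # hypothetical field name
    "L_Address": "12 Main St",                    # hypothetical field name
    "photo_updated_date": "2012-03-01T10:00:00",  # assumed timestamp format
}

# A price drop: listing data changed, photos and address did not.
price_drop = dict(stored, L_AskingPrice="240000")
plan_price = plan_update(stored, price_drop, "photo_updated_date", ["L_Address"])

# New photos uploaded: only the photo timestamp moved.
new_photos = dict(stored, photo_updated_date="2012-03-05T09:30:00")
plan_photos = plan_update(stored, new_photos, "photo_updated_date", ["L_Address"])
```

The listing hash still decides whether the listing is saved at all. These two flags only decide whether the photo download and the geocoder run as part of that save.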
Comment #4
webavant commented
You guys are so awesome. I can't wait until I have time to mess with this some more.
Comment #5
stockliasteroid commented
Camidoo-
Any luck on this? Can I help out at all? Let me know if so! I've been watching the repo, but I didn't want to get started on this on my own if you already had made some headway... No biggie either way, just checking in!
Thanks!
Comment #6
kevinquillen commented
Comment #7
camidoo commented
Ended up going with this suggestion, as I felt it was the more elegant of the two:
http://drupalcode.org/project/drealty.git/commit/2150c7f
Comment #8
camidoo commented
As far as I'm concerned, I have this fixed. If it's giving someone trouble, please re-open this issue.