I'm starting to work on distance searching/views integration in Geofield (see #1469956: [Meta] - Improve Views-powered Geofield proximity searches), and it occurred to me that an exposed filter doing geocoded views searches is likely to run into API limits pretty quickly. Under regular use (saving geocoded data to a geofield) this isn't much of an issue, but with views filters I don't have a place to store the geocoded data. By using Drupal's caching system, this should be relatively easy to solve.
Ideally, I think we could add an option that controls whether or not we cache a result. The data should probably be segmented into its own table (cache_geocode?). For fields it's probably not worth the effort, but it makes sense if we need it for other purposes.
One thing that might be an issue is the various providers' Terms of Service. Looking at Google's, for example (https://developers.google.com/maps/terms#section_10_1_3), it forbids caching results unless you're doing it for performance reasons, which I think could reasonably be argued for something like a views filter search.
Comment | File | Size | Author |
---|---|---|---|
#54 | geocoder-1515372-54.patch | 10.8 KB | hussainweb |
#52 | geocoder-1515372-52.patch | 10.08 KB | rudiedirkx |
#48 | interdiff-geocoder-1515372-41-48.txt | 612 bytes | WidgetsBurritos |
#48 | geocoder-1515372-48.patch | 12.38 KB | WidgetsBurritos |
#41 | geocoder-1515372-41.patch | 12.23 KB | rudiedirkx |
Comments
Comment #1
phayes CreditAttribution: phayes commented
Good idea.
What we should do is base the cache-keys off the md5 hash of the incoming data (that way it works on all data, including KML files and the like). Then the results should be a gzipped serialized geoPHP object.
The geocoder function should take a $cache parameter, and an option should be added to the geocoder widget settings.
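A minimal sketch of what #1 describes. The function names and the bin name here are assumptions for illustration, not the module's actual API:

```php
<?php

// Build a cache ID from the md5 hash of the incoming data, so it works for
// any input - plain addresses as well as whole KML documents.
function geocoder_cache_id($handler_name, $data) {
  return $handler_name . ':' . md5(serialize($data));
}

// Store the result as a gzipped, serialized geoPHP Geometry object in a
// dedicated bin.
function geocoder_cache_store($handler_name, $data, Geometry $geometry) {
  $cid = geocoder_cache_id($handler_name, $data);
  cache_set($cid, gzcompress(serialize($geometry)), 'cache_geocode');
}
```

Hashing the serialized input keeps the cache ID short and uniform regardless of how large the source data is.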
Comment #2
phayes CreditAttribution: phayes commented
I've added static caching; we still need to add persistent caching and make it configurable in the UI.
Comment #3
sdboyer CreditAttribution: sdboyer commented
i'll just weigh in quickly and say +1 to having a separate cache bin, e.g., cache_geocode. in fact, i might even go so far as to recommend *multiple* separate bins, given that different providers have different terms of use. that way you can easily create tailored clearing logic for each provider.
you COULD utilize key prefixing and wildcard clears... but wildcard clears aren't necessarily the sanest thing on all cache backends (*COUGH* memcache), so bins are still the cleanest way to segment. that or cache tags, but you can't rely on those until D8 - they're now in core, iirc.
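With one bin per provider, the tailored clearing logic mentioned above could look roughly like this. This is a sketch; the bin naming convention and function names are assumptions:

```php
<?php

// Flush a single provider's bin without touching the others, e.g. when that
// provider's terms of service require dropping all cached results.
function geocoder_flush_provider($provider) {
  cache_clear_all('*', 'cache_geocoder_' . $provider, TRUE);
}

// Or remove only entries that have passed their expiry timestamp, leaving
// still-valid results in place.
function geocoder_expire_provider($provider) {
  cache_clear_all(NULL, 'cache_geocoder_' . $provider);
}
```

Because each provider gets its own bin, a wildcard clear stays scoped to one service and never disturbs results cached from another.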
Comment #4
phayes CreditAttribution: phayes commented
Relevant IRC discussion cut and paste:
Comment #5
jenlampton
I was just testing the static cache, and wasn't able to get persistent caching working - but it looks like that's just because it's not done yet. :)
Let me know what I can do to help move this forward.
Comment #6
jenlampton
I'm going to take a stab at setting up a cache_geocoder table and storing results in there for views-exposed-filter searches.
*edit* fixed typo, yes cache_geocoder not cache_geocode :)
Comment #7
phayes CreditAttribution: phayes commented
@jenlampton++ !
Oh, be sure to call it cache_geocoder (not cache_geocode).
Comment #8
steinmb CreditAttribution: steinmb commented
We are also interested in some kind of cache. We are currently working on migrating data from an older system into Drupal. One of the external systems contains addresses that we store on the Drupal side in addressfields, and we then let geocoder turn these into geofield values. We rather quickly run into 'OVER_QUERY_LIMIT' at Google.
The migration (migrate module) gets rerun every time we change the code, but the addresses are 'always' the same, so hitting Google this hard does not make any sense. I thought about putting a proxy cache with a long static TTL between us and Google, but solving it here would be a much better, and smarter, solution.
Comment #9
emilyf CreditAttribution: emilyf commented
I'm in the same situation as steinmb. Importing loads of content via a feed importer, not even including a location field in the import, but there is an addressfield and geofield in the node type that I'm importing into. Every import triggers 10,000+ OVER_QUERY_LIMIT errors...
Let me know if there is anything I can test.
Comment #10
nd987 CreditAttribution: nd987 commented
Big +1 on this. Is there any progress towards persistent caching? Seems like a must-have feature to me, especially for those of us who do batch imports daily. Once you have the address geocoded, it ain't going anywhere, so you can cache it indefinitely.
Comment #11
Dean Reilly CreditAttribution: Dean Reilly commented
I would like to see some discussion of using a proxy to cache responses from services instead of implementing this in geocoder. It seems to me that this approach might have some advantages:
Comment #12
nd987 CreditAttribution: nd987 commented
The problem with relying on a proxy to cache geocode results is... you have to have a proxy. While the advantages @Dean Reilly posted are valid, most developers will not have, or be able to set up (due to knowledge, hosting, business, or other limitations), an HTTP proxy for this purpose. It could definitely be supported, but we need something that is easier and more trouble-free for the average developer.
Comment #13
Dean Reilly CreditAttribution: Dean Reilly commented
Hosting limitations could be an issue. However, I'm not sure how many people would be running a site that makes 50,000 geocoding requests a day (Yahoo!'s limit) while also running on a service where they don't have the necessary access to set up a proxy.
Exceptions might be sites hosted on a PaaS such as Acquia or Pantheon. But those are hosted on the Amazon and Rackspace clouds, respectively; the user can always spin up an instance in the same AZ and configure it to act as the proxy.
There's also an argument to be made for the performance considerations of this module on smaller sites without proxies. I'm not sure the technical costs justify the benefits in this case either, though. Smaller sites on shared hosting probably have bigger performance bottlenecks than a geocoding api. (Although that is just a gut feeling and I have no numbers to back up that statement.)
I also believe the average developer could go away, learn how to and proceed to set up a proxy server fairly quickly.
-------------------------------
I'd like to be clear that, all things considered, I'm not terribly opposed to the idea; I just want to point out that there are alternatives, with pros and cons to each.
I also think the issue regarding hitting the API limits probably suggests a queuing system which can spread out the load is also required as caching will only take us so far.
Comment #14
nd987 CreditAttribution: nd987 commented
Keep in mind that Google's API limit is only 2500 per day (for non-business clients). Of course developers could set up a proxy if they really had to, but why make them? Adding a regular caching layer would involve:
Comment #15
Dean Reilly CreditAttribution: Dean Reilly commented
Step 3 would be to implement different caching rules to comply with the different terms and conditions of the various services. See replies #3 and #4.
------------------
To switch sides quickly, I'd like to give some of the reasons I think it would be a good idea for geocoder to cache results even if a proxy is being used.
I think the big advantage to having caching happen at the API level is that it's agnostic of the service. We could make use of multiple service endpoints and have the cached results of one prevent requests to another.
We can also be smarter with the cache. We can provide caching up to a variable level of resolution. E.g. are we only plotting addresses up to the level of a city to provide anonymity to users? Then we don't need to geocode two different addresses from the same city for different users.
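The variable-resolution idea could be sketched like this. The Addressfield component names (country, locality, thoroughfare) are real, but the helper itself and its signature are hypothetical:

```php
<?php

// Hypothetical helper: build a cache ID only from the address components that
// match the requested resolution, so two street addresses in the same city
// share one cache entry when we only plot at city level.
function geocoder_cache_id_by_resolution(array $address, $resolution = 'street') {
  $parts = array($address['country']);
  if (in_array($resolution, array('city', 'street'))) {
    $parts[] = $address['locality'];
  }
  if ($resolution === 'street') {
    $parts[] = $address['thoroughfare'];
  }
  return 'geocoder:' . $resolution . ':' . md5(implode('|', $parts));
}
```

At `$resolution = 'city'`, the thoroughfare is ignored, so different users' addresses in the same city collapse to one lookup, which is also what provides the anonymity mentioned above.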
Finally, APIs aren't always brilliantly implemented. The ToS may be much more permissive towards caching than the http headers suggest. Sure, you can configure the proxy to override these headers but that then means lots of people doing the same job. Easier to have it done by geocoder.
------------------------------
It's probably also worth keeping an eye on the pluggable HTTP client issue for D8. That would allow us to use a tool like Guzzle, which has built-in support for HTTP caching, and let people get the benefits of a proxy without having to install one.
Comment #16
jpstrikesback CreditAttribution: jpstrikesback commented
What if step 3 was simply an opt-in on a config page?
Comment #17
drunken monkey
I've now taken a first shot at implementing this, for now without a config page, but with the caching variant overridable (in code) for each handler. The only problem is: it doesn't work. Seemingly, the PHP serialization functions have some problems with Geometry objects (or at least Points). Direct serialization and unserialization seems to work, but as soon as there is some external storage involved (be it a database, a file, or the clipboard), it fails and only returns a __PHP_Incomplete_Class object. Base64-encoding or similar tricks also don't work, as far as I can see.
So, to get this even basically started, without even considering the ToS and configuration problems, we will have to solve this. As far as I can see, we have the following options there:
Any ideas or comments? What do you think of my approach to the other problems? What we could easily add, too, is some options on the config page to disable caching for some (or all) handlers.
Comment #18
jpstrikesback CreditAttribution: jpstrikesback commented
I'm gonna test this and play with it the moment I get a second; unfortunately that could be two weeks from now, but dang, it's nice to see this thread lives!
Comment #19
attiks CreditAttribution: attiks commented
This should work: I added geophp_load(); to geocoder_cache_get so the classes are known for the unserialize.
Comment #20
drunken monkey
Wow, great! Thanks for your help, I was completely lost with that one.
I don't think it's necessary to serialize the objects ourselves, though. Just move the geophp_load() call above the cache_get() and it's fine.
Other reviews would also be very welcome, along with suggestions on how to proceed with the configuration, etc.
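In other words, the fix from #19/#20 boils down to this ordering. A simplified sketch; the real function takes more parameters:

```php
<?php

function geocoder_cache_get($cid) {
  // Load the geoPHP library first: cache_get() unserializes the stored data,
  // and if the Geometry classes aren't defined yet at that point, PHP can
  // only return a __PHP_Incomplete_Class placeholder.
  geophp_load();
  $cached = cache_get($cid, 'cache_geocoder');
  return $cached ? $cached->data : FALSE;
}
```

Since Drupal's cache layer serializes and unserializes values itself, there's no need for manual serialization; the only requirement is that the classes exist before the read.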
Comment #21
attiks CreditAttribution: attiks commented#20 works as well
Comment #22
drunken monkeyRe-roll of #20.
(Only whitespace changed, so I don't think we need a new review.)
Comment #23
drunken monkey
Improved the patch in several ways regarding the cache_function callbacks:
- Pass the $geometry argument when getting the cache value (in this case, NULL).
- Pass $processor, $data and $options to the caching callback. We don't know whether custom cache implementations will use the same cache ID we create.
- The same applies to geocoder_cache_set().
Comment #24
Sinan Erdem CreditAttribution: Sinan Erdem commented
Is anybody working on this issue? It would be nice to see the patch committed to a release.
Comment #25
drunken monkey
From the looks of it, I'd say nobody is working on this module.
See also #1563648: Taking a break from geocoder development. Some new developer would be needed – otherwise, you'll just have to hope phayes returns soon.
Comment #26
mikey_p CreditAttribution: mikey_p commented
Just a note: while this patch looks great, the widget that geocoder exposes for geofield and location doesn't use this function, and thus doesn't benefit from this caching.
I'm not sure what the simplest solution to this is, outside of re-thinking the plugin system and possibly moving to an object oriented structure.
Comment #27
mikey_p CreditAttribution: mikey_p commented
Here's a slightly hacky patch that allows geocoding through the field widget to be cached as well.
Comment #28
arbee CreditAttribution: arbee commented
I just want to thank you all for your excellent work. So glad I stumbled upon this; I was having query-limit problems with a Views proximity search and this looks like it's going to take care of it.
Comment #29
SocialNicheGuru CreditAttribution: SocialNicheGuru commented
I am getting this now, though:
Notice: Undefined index: cache in geocoder_widget_get_field_value() (line 283 of geocoder/geocoder.widget.inc).
Notice: Undefined index: cache in geocoder_widget_get_field_value() (line 294 of geocoder/geocoder.widget.inc).
Comment #30
quicksketch
Let's unassign @jenlampton. :)
Comment #31
SocialNicheGuru CreditAttribution: SocialNicheGuru commented
If I add the patch in #27, then the cache tables for geocoder are not created when I install using Aegir.
Comment #32
rudiedirkx CreditAttribution: rudiedirkx commented
Doesn't apply to dev correctly anymore. Dev fixed a bug with multiple-value fields, but this patch reintroduces it, so that part should be patched/merged manually.
Comment #33
rudiedirkx CreditAttribution: rudiedirkx commented
Re-roll of #27. With caching, without the multiple-values bug.
Comment #34
Jelle_S
@rudiedirkx I think you forgot something ;-)
Comment #35
rudiedirkx CreditAttribution: rudiedirkx commented
What's that?
Comment #36
Jelle_S
I think you forgot to attach the patch in #33 ;-)
Comment #37
rudiedirkx CreditAttribution: rudiedirkx commented
Haha, crap. Eeeeeeh, let me find that one...
Usually I remove my local patches, because "they're on d.o anyway!". Not this time, thank god!
Comment #38
kerasai CreditAttribution: kerasai commented
Hi there, sorry to step on anyone's toes, but this seems like an exceptionally complicated way to solve a fairly simple problem.
For a recent project build, I rolled a module that provides a geocoder_handler plugin that is essentially a wrapper around the existing Google plugin and adds caching at that layer. No need to touch any other parts of Geocoder, and the integration with fields and Views just keeps on doing its thing.
https://www.drupal.org/sandbox/bmoresafety20/geocoder_google_adv
Take a look, please ping me with questions or input.
Comment #39
rudiedirkx CreditAttribution: rudiedirkx commented
@kerasai But you don't want to cache just Google, or any other single handler. You want to cache ALL handlers, depending on a field's config. I agree it's overcomplicated, but it works perfectly and we REALLY need geocoder caching, so let's get it in!
(It's probably not a good idea to serialize and persist Exception objects. I don't KNOW, but it seems wrong... And why do you save into geocoder_google_adv and into cache_geocoder_google_adv?)
Comment #40
sheijtel CreditAttribution: sheijtel commented
The patch from #23 works fine when just using the geocoder API, great! The patch in #37 does not include the geocoder.install file, which contains the schema.
Comment #41
rudiedirkx CreditAttribution: rudiedirkx commented
#23 has the bug described in #32.
The attached patch should include everything. Thanks @Steven. Please try that one.
Comment #42
blacklabel_tom CreditAttribution: blacklabel_tom at Edo commented
Hi Rudie,
The patch in #41 won't apply for me against the latest dev. I applied it manually and I couldn't get any results through from the module.
At first glance, the cache check here is a bit too aggressive. By default $cache_reset is FALSE and $cache_type is 2, meaning ALL requests will look in the cache before doing the lookup, regardless of whether the query is being run for the first time or not.
I'm assuming this check needs to look at whether this location has been cached, rather than at the caching settings the function defaults to?
Cheers
Tom
Comment #43
blacklabel_tom CreditAttribution: blacklabel_tom at Edo commented
Comment #44
rudiedirkx CreditAttribution: rudiedirkx commented
That's weird. Applies fine for me:
0 lines offset even.
I don't know about the actual code. It's weird that there are levels of cache checking, I agree, but I haven't scrutinized the code. I skimmed it, tested it, and it worked. It's not perfect, but it's been over 3 years now and we really need caching.
You're very welcome to improve on it, or even start over, I'll review and test and RTBC when I'm happy =) but I don't think there'll be anything committed in 2015 then =(
Comment #45
blacklabel_tom CreditAttribution: blacklabel_tom at Edo commentedHi Rudie,
I'm going to leave this one well alone as I'm tied to the 1.2 branch at the moment.
Cheers
Tom
Comment #46
Pol
Patch #41 applied and working flawlessly.
I have a question, though: are we obliged to use the static cache?
Comment #47
lukus
Would it be possible to get this patch committed?
Cheers
L
Comment #48
WidgetsBurritos CreditAttribution: WidgetsBurritos commented
#41 shouldn't be committed, because blacklabel_tom's concern in #42 above is a legitimate problem. I just ran into it as well.
I've attached an updated patch that fixes it, along with the interdiff so you can see the changes made.
Basically the idea is to separate the if/else into two separate steps.
- The first step attempts to retrieve from cache if the settings ask it to.
- Then, in the second step, if the value is still empty (the cache was either not consulted at all, or failed to find a result), it forces a new lookup and re-caches.
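The two steps could be sketched like this. The variable names follow the discussion in #42 ($cache_type, $cache_reset), but the rest is an assumption about the patch's shape, not its exact code:

```php
<?php

$geometry = NULL;

// Step 1: consult the cache only if the widget settings ask for it.
if ($cache_type && !$cache_reset) {
  $geometry = geocoder_cache_get($processor, $data, $options);
}

// Step 2: on a cache miss (or when caching is disabled or being reset),
// do a live lookup and write the result back to the cache.
if (empty($geometry)) {
  $geometry = $processor_function($data, $options);
  if ($cache_type) {
    geocoder_cache_set($geometry, $processor, $data, $options);
  }
}
```

Separating the two steps means a cache miss falls through to a live lookup instead of returning nothing, which is exactly the bug #42 describes.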
Comment #49
rudiedirkx CreditAttribution: rudiedirkx commented
If caching is enabled, do we still want to diff old and new values to see if the field value has changed? If caching is enabled, it doesn't matter anymore, does it? Also, the settings might have changed, so it should do a new lookup. Or the geofield might not have any data yet, in which case a save should fetch it even if the source field hasn't changed. Know what I mean?
Adding this before line 140 in geocoder.widget.inc would do it, so the 'has the source value changed' check only runs for non-cached fields.
If that's a bad idea, I RTBC.
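Since the actual snippet isn't shown above, here is a hedged reconstruction of what #49 seems to propose; the settings key is an assumption:

```php
<?php

// Hypothetical reconstruction of the elided snippet from #49: when caching is
// enabled, skip the "has the source value changed?" diff entirely and let the
// cache decide whether a live lookup is needed.
if (empty($instance['widget']['settings']['cache'])) {
  // No caching: only re-geocode when the source field actually changed.
  if ($field_original == $source_field_values) {
    return FALSE;
  }
}
```

With caching on, re-running the geocode step is cheap (it's a cache hit in the common case), and it correctly handles the cases where the settings changed or the geofield is still empty.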
Comment #50
WidgetsBurritos CreditAttribution: WidgetsBurritos commented
I'm not sure I get what you're asking. Are you sure that's the correct line number in geocoder.widget.inc? Can you give me a little more context about the surrounding code, so I can see what specifically you are referring to?
Comment #51
rudiedirkx CreditAttribution: rudiedirkx commented
Oh, it was line 140 for me. In geocoder_widget_get_field_value(), below that comment, there's a block of code that compares the old source field value ($field_original) against the new source field value ($source_field_values). It only compares the field values, not the widget settings etc. It's not guaranteed the coordinates will be the same (even if they exist), or that they all exist yet. The check is also 'wrong', because not all of the field data is used for geocoding. Only the geocoder knows which data is relevant, so let it handle performance. The cacher knows which data is used, so it knows when a new geocoding is necessary.
With caching enabled, the check is useless, because it'll use the cache if the settings etc. are the same, and will do a live fetch if it has to. If caching is not enabled, it shouldn't always do a live fetch, so the comparison code is still necessary.
The comment is on my line 239. The code block ends on line 253. There's a return FALSE in the middle.
I think caching shouldn't even be an option; it should always be enabled. But that's another issue.
Comment #52
rudiedirkx CreditAttribution: rudiedirkx commented
I've taken another stab at it. It's quite different from the previous patches (sorry!), so no interdiff.
It's much simpler than before: one cache table, always cache results. Removed the static cache. Removed the cache overrides (callback, bin, TTL, etc.). Removed the cache table from the cache hook, because it should be really persistent; coordinates don't change as often as we flush the cache. We could add a button to the settings form to manually flush all or some of the geo cache.
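The single, deliberately persistent cache table described above could be declared with the usual Drupal 7 idiom. A sketch, not the patch's exact code; note that *not* listing the table in hook_flush_caches() is what keeps it out of regular cache clears:

```php
<?php

/**
 * Implements hook_schema().
 */
function geocoder_schema() {
  $schema = array();
  // Clone the core cache table definition for our own bin. Deliberately not
  // exposed via hook_flush_caches(), so a normal cache flush leaves it alone.
  $schema['cache_geocoder'] = drupal_get_schema_unprocessed('system', 'cache');
  $schema['cache_geocoder']['description'] = 'Cache table for geocoded results.';
  return $schema;
}
```

Cloning the core cache schema keeps the bin compatible with cache_set()/cache_get() and any alternative cache backend.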
I only tested geocoding from an Addressfield, and only with Google, but other fields and handlers haven't changed, so they should work the same. I don't know how to test the geocoder() function...
Comment #53
Pol
Could you please re-roll those patches (this one and #2159925: Geocode from virtual fields/entity property instead of just real fields) against the current development version so I can apply them?
thanks!
Comment #54
hussainweb
Here is an attempt to reroll.
Comment #56
Pol
Thanks to all for this :-)
Comment #58
rudiedirkx CreditAttribution: rudiedirkx commented
Yes, finally, fantastic! The 2 most important additions. Thanks @Pol, and @hussainweb for the quick work.
Comment #62
AdamPS CreditAttribution: AdamPS commented
Caching is great, but this commit seems to have accidentally made a significant API change as well.
As an example of the consequences, it has changed the results of a view filtered by geofield proximity when no results are returned:
It's presumably impossible to identify and correct all the consequences of this change in contrib/custom code, so I think we have to put the API back how it was.
I have raised #2689545: API of function geocoder must not change.
Comment #63
AdamPS CreditAttribution: AdamPS commented
Very sorry, I was mistaken. The old code also returned FALSE; it's just that the exception was previously caught in the plug-in and is now caught in the main module file.
However, I am now baffled as to how geofield proximity ever worked correctly with no results; I wonder if perhaps it never did. The Google geocoder is very creative at interpreting text as a location, and often finds somewhere on the globe to return as a result. If there are no database entries near that point, then we do in fact get no results.
So next step, I will raise an issue against geofield proximity filter. Sorry to have taken up everyone's time.
Comment #64
AdamPS CreditAttribution: AdamPS commented
I believe there are some problems with caching in relation to error handling.
In particular, it looks like a temporary error (such as the Google API limit being reached) might be cached as FALSE for certain search data. This means that even after the temporary error has resolved, that particular data will not geocode.
Secondly, the comment on geocoder_cache_get() indicates an intention to cache "no results" entries using FALSE. However, it looks like this wasn't working in some cases, in particular not for the Google geocoder.
I have a patch for these in my issue #2689211: Confusion with errors and logging which attempts to fix these and some other problems with error cases. Any comments from the experts here on my issue would be much appreciated.
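The distinction #64 argues for could be sketched like this (variable and function names assumed): a genuine "no results" answer is safe to cache as FALSE, but a transient failure must not be written to the cache at all:

```php
<?php

try {
  // A live lookup; FALSE/NULL here means the service answered "no such
  // place", which is a stable fact and safe to cache.
  $geometry = $processor_function($data, $options);
  geocoder_cache_set($geometry ? $geometry : FALSE, $processor, $data, $options);
}
catch (Exception $e) {
  // Transient failure (rate limit, network error): log it and cache nothing,
  // so the same data can geocode successfully once the limit resets.
  watchdog('geocoder', $e->getMessage(), array(), WATCHDOG_WARNING);
  $geometry = FALSE;
}
```

This only works if the plug-ins themselves distinguish the two cases (empty result vs. thrown exception), which is exactly the inconsistency discussed in #66 below.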
Comment #65
rudiedirkx CreditAttribution: rudiedirkx commented
Relevant:
They were 'fixed' before 1.3, but I think they (or at least one of them) address your concerns.
Comment #66
AdamPS CreditAttribution: AdamPS commented
@rudiedirkx Thanks for the quick reply.
As far as I can see, #2649096: Fix exception throwing and empty result handling fixed matters so that Google geocoder empty results were cached. However, it also meant that temporary errors (such as the API limit being reached) from Google and the majority of other geocoders were incorrectly cached. The problem is that the Google plugin throws an exception in both cases.
It seems that the plug-ins currently don't handle "no results" and "temporary error" in a consistent manner. Until that is fixed, it's probably not possible to get everything working. #2689211: Confusion with errors and logging provides that fix to the plug-ins.