Original issue:
I am working on a site with 10k+ nodes, all already added, of which only 2500 were geocoded, and I'm planning to run the drush command to gradually chew through the remainder. Unfortunately, the command as it stands runs completely out of memory and will needlessly hit the Google API limit again. Patch forthcoming.
This issue has become a place to add several options to the drush command geocoder-backfill including:
- --limit: A limit to the total number of entities that can be updated with the command, to prevent over-limit requests
- --only-entity: An option to only attempt to geocode a given kind of entity. This may be field collections, nodes, profiles, etc.
- --only-bundle: An option to only attempt to geocode a given bundle. This may be something like "page" or "article" if your entity is a node.
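Putting the options together, a combined invocation might look like the following (the flag values here are illustrative, not taken from the thread):

```shell
# Geocode at most 100 entities, restricted to nodes of type "article".
drush geocoder-backfill --limit=100 --only-entity="node" --only-bundle="article"
```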
Comment | File | Size | Author |
---|---|---|---|
#31 | interdiff-2077317-29-31.txt | 630 bytes | jantoine |
#31 | geocoder-drush-2077317-31.patch | 9.05 KB | jantoine |
#27 | geocoder-drush-2077317-27.patch | 8.32 KB | jessehs |
#20 | interdiff-13-to-20.txt | 445 bytes | mariacha1 |
#20 | geocoder-drush-2077317-20.patch | 6.97 KB | mariacha1 |
Comments
Comment #1
jhedstrom
This patch adds the limit option, and I couldn't help but bring the drush commands up to Drush 5/6 standards (no need to explicitly define a callback function).
Comment #2
jhedstrom
And the patch...
Comment #3
jhedstrom
That last patch was a bit broken. This patch works with the limit. It also sets
$entity->original
to empty, since otherwise code further on in geocoder won't process the data.
Comment #4
mariacha1 CreditAttribution: mariacha1 commented
Unless I'm missing something, the patch in #3 will only read the first X items of a given bundle, even if they've already got geocoding information. So once those first X items are geocoded, no new items will be geocoded in subsequent runs.
I'm adding some more logic to the query to only pull in the first X UNgeocoded items:
I also figured it might be a good idea to order the results from newest to oldest, so if you've got X junk addresses that fail to geocode every time, they don't take over all attempts every time the drush command runs.
$query->orderBy('node.nid', 'DESC');
I also fixed some coding standards issues.
And, of course, this whole thing will need to be reworked to apply to all entities instead of just nodes before it can be marked RTBC.
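The selection logic described in #4 can be sketched in plain PHP (the real patch builds an equivalent db_select() query against the node table; the data here is hypothetical):

```php
<?php
// Hypothetical node list: id => whether it is already geocoded.
$nodes = [
  1 => true,
  2 => false,
  3 => false,
  4 => true,
  5 => false,
];

// Keep only ungeocoded items, newest (highest id) first, capped at $limit.
function pick_backfill_candidates(array $nodes, $limit) {
  $candidates = array_keys(array_filter($nodes, function ($geocoded) {
    return !$geocoded;
  }));
  rsort($candidates); // mirrors ORDER BY node.nid DESC
  return array_slice($candidates, 0, $limit);
}

print implode(',', pick_backfill_candidates($nodes, 2)); // prints "5,3"
```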
Comment #5
mariacha1 CreditAttribution: mariacha1 commented
Here are my latest changes, which allow updating geocodes for any entity.
I also noticed that the $limit was applied on a per-entity/per-field basis. So if I had 3 entities, each with their own geocoded fields, and set the limit to 50, I would send 150 requests to the server, instead of what I'd requested.
I believe I've fixed that.
I'm including geocoder-drush-2077317-05.patch and interdiff2.txt, which shows the difference between patch -05 and -04.
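The shared-limit fix described above amounts to spending a single counter across all entity types and fields rather than resetting it for each one; a minimal standalone sketch (the structure here is hypothetical):

```php
<?php
// Hypothetical backlog: entity type => number of ungeocoded items.
$pending = ['node' => 40, 'user' => 30, 'field_collection_item' => 20];

// Spend one global $limit across every entity type and field, rather
// than restarting the count per entity type (the bug fixed in #5).
function backfill_with_shared_limit(array $pending, $limit) {
  $processed = [];
  foreach ($pending as $type => $count) {
    if ($limit <= 0) {
      break;
    }
    $take = min($count, $limit);
    $processed[$type] = $take;
    $limit -= $take;
  }
  return $processed;
}

print json_encode(backfill_with_shared_limit($pending, 50)); // prints {"node":40,"user":10}
```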
Comment #6
duntuk CreditAttribution: duntuk commented
@mariacha1 So is this the proper command?
UPDATE:
Yup, that patch and command works great! Thanks @mariacha1!
Comment #7
duntuk CreditAttribution: duntuk commented
For the convenience of others, here's the latest dev version with the patch from #5 (geocoder-drush-2077317-05.patch) applied.
Comment #8
jhedstrom
One potential issue: with large amounts of bad data, it is possible that bad requests are always sent first. The response from Google in the case of a bad address:
If there are enough of those, with the current patch, they will always be run first, and if there are more than the amount set in
--limit
, or the daily limit of Google (or another geocoding service), then no progress will ever be made. I wonder if the patch should set an empty row in the event of such a response?
Comment #9
mariacha1 CreditAttribution: mariacha1 commented
I'm worried about potential unintended consequences of adding blank entries into the geocoder field in the database.
At this point my inclination is to add a more meaningful error message in the case of such failures, so they can be cleaned up manually. Perhaps a message in the watchdog table with information on what was passed and where to edit the address that failed?
With some fancy query footwork, we might even be able to cross-reference such geocode errors with these watchdog entries in future geocoder-backfill calls to avoid re-calling them. (Although I'm inclined to make this a configurable option instead of the default behaviour, to avoid encouraging bad data habits.)
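The logging idea might look roughly like this Drupal 7 sketch. A stub watchdog() is defined so the snippet runs standalone (in Drupal, core provides the real function), and the edit-link format is an assumption for illustration:

```php
<?php
// Stub standing in for Drupal 7 core's watchdog(); real module code
// would not define this. The $log array lets the sketch run standalone.
$log = [];
function watchdog($type, $message, array $variables = [], $severity = 5, $link = NULL) {
  $GLOBALS['log'][] = [
    'type' => $type,
    'message' => strtr($message, $variables),
    'link' => $link,
  ];
}

// On a failed lookup, record what was sent and where to edit the entity.
function log_geocode_failure($entity_type, $entity_id, $address) {
  watchdog(
    'geocoder',
    'Geocoding failed for @type @id (address: @address).',
    ['@type' => $entity_type, '@id' => (string) $entity_id, '@address' => $address],
    3, // WATCHDOG_ERROR
    "$entity_type/$entity_id/edit" // hypothetical link format
  );
}

log_geocode_failure('node', 42, '123 Nowhere St');
print $log[0]['message']; // prints "Geocoding failed for node 42 (address: 123 Nowhere St)."
```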
Comment #10
mariacha1 CreditAttribution: mariacha1 commented
Comment #11
jhedstrom
I like the idea of adding watchdog messages, ideally with a link to the offending entity. This way the bad entries could be manually cleaned up.
Comment #12
mpgeek CreditAttribution: mpgeek commented
Watchdog linking has been implemented in #2221663: Add a link to offending entity in watchdog message when geocoding fails. When backfills happen, watchdog gets messages that link to the entity in question.
Towards more backfill control, I wanted to be able to backfill field collections without backfilling the host entities (I have multiple geofields), so I added an
--only-entity
option. It should work with any entity type.
Comment #13
jessehs
This patch adds an --only-bundle flag as well, so you can limit geocoding to a bundle in addition to an entity type. So if you limit to the node entity type, you can also limit to a specific node type.
Usage:
drush geocoder-backfill --only-entity="profile2" --only-bundle="my_profile_type"
Comment #14
mpgeek CreditAttribution: mpgeek commented
Comment #15
mariacha1 CreditAttribution: mariacha1 commented
Comment #16
quicksketch
Hi guys, on a site I run, we've been running into an issue where Google rate-limits us to a certain number of geo-lookups per second, in addition to the per-day limit. As discussed over in #2024809: Allow using a google API key, we're limited to 20 lookups per second (essentially), and it's very predictable behavior for us. Adding a 1-second delay every 20 lookups solves the issue for us (up to the daily limit of 2500 lookups). That problem looks like it would clearly affect users doing drush requests like this, so I wanted to see if anyone else had experienced it here.
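The 1-second-delay workaround quicksketch describes reduces to pausing after every batch of 20 lookups. A standalone sketch of just the batching arithmetic (the sleep itself is commented out so the sketch runs instantly):

```php
<?php
// Pause after every batch of $per_second lookups to stay under the
// per-second rate limit discussed above; returns how many pauses occur.
function throttled_batches($total, $per_second = 20) {
  $sleeps = 0;
  for ($i = 1; $i <= $total; $i++) {
    // ... one geocode lookup would happen here ...
    if ($i % $per_second === 0 && $i < $total) {
      // sleep(1); // commented out so the sketch runs instantly
      $sleeps++;
    }
  }
  return $sleeps;
}

print throttled_batches(50); // prints 2
```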
Comment #17
quicksketch
Sorry, ignore my comment in #16. The issue #1672742: Avoiding OVER_QUERY_LIMIT due to more than 5 requests in a second in google geocode already solved this problem; it's just been more than a year since the last stable release.
Comment #18
jhedstrom
The latest patch is throwing this notice:
Looking at the code, I don't see anywhere that variable is actually set.
Comment #19
jbiechele CreditAttribution: jbiechele commented
Adding the following at about line 38:
$force_reload = drush_get_option('force');
resolves it for me.
The line was deleted in the patch in #13.
Comment #20
mariacha1 CreditAttribution: mariacha1 commented
Oops, it looks like I accidentally removed the --force option way back in patch #4. Re-adding it and providing an interdiff.
Comment #21
paypaul CreditAttribution: paypaul commented
Unless I am missing something, the latest patch is giving me a PDOException:
-------
WD php: PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL [error]
syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '=
'user') AND (geo.entity_id IS NULL )
ORDER BY geo_ent.uid DESC' at line 2: SELECT geo_ent.uid AS uid
FROM
{users} geo_ent
LEFT OUTER JOIN {field_data_field_geolocation} geo ON geo_ent.uid=geo.entity_id
WHERE ( = :db_condition_placeholder_0) AND (geo.entity_id IS NULL )
ORDER BY geo_ent.uid DESC; Array
(
[:db_condition_placeholder_0] => user
)
in drush_geocoder_backfill() (line 85 of
/home/apache/www/newkameleon.dev.modernmedia.ca/html/sites/all/modules/geocoder/geocoder.drush.inc).
-------
I will try and sort out the issue myself I just thought I should let someone know. I am hoping this will be the final piece needed to finish my migration from Drupal 6.
Thanks
Comment #22
dubcanada CreditAttribution: dubcanada commented
The patch in #20 doesn't seem to work properly if you run it on entities attached to users. The $entity->original = array(); line causes issues. If you change
if (empty($diff)) {
to
if (empty($diff) && !drupal_is_cli()) {
In geocoder.widget.inc on line 244 it solves the issue completely while continuing with the regular path for normal usage.
Comment #23
mpgeek CreditAttribution: mpgeek commented
@paypaul, @dubcanada: it would be helpful to know how you are calling the command, i.e. what did you type? Also, I've used this many times on profile2 entities, so I'm wondering if something else is amiss here.
Comment #24
dubcanada CreditAttribution: dubcanada commented
The regular drush command (drush geocoder-backfill). There is an addressfield field added via Account Settings to the user profile.
When geocoder goes to determine whether the entity needs to be re-geocoded (by comparing the original values with the new values to see if the user was modified), it finds no difference (because there isn't one), so it just stops there.
Does that make sense?
Comment #25
jessehs
Adding related issue #2202689: Entities not getting updated programmatically (via drush geocoder-backfill). @dubcanada, try the patch there.
Comment #26
mvc
This works perfectly for me for geocoding nodes, except that I had to patch revisioning: #2276657: Undefined property errors in combination with geocoder. Thanks!
Not marking RTBC because a previous poster mentioned a problem with user entities, which I didn't test.
Comment #27
jessehs
This patch addresses a few differences between how user entities are stored and how geocoder handles users. It also incorporates the patch from the issue mentioned in #25. Please test with geocoder/geofields placed directly on the user object (rather than on a profile2 entity associated with the user).
Comment #28
jessehs
Doh! This fixes an indentation error.
Comment #29
mvc
This patch is so useful, I can't help but add more options :)
I have perhaps 5% of my dataset returning ZERO_RESULTS exceptions, which is fine, except that when I use the --limit flag I keep running over those same nodes again and again. So, inspired by the migrate module, I added optional --idlist, --max-id, and --min-id flags, which add further conditions to the query. The bundle, entity type, and limit flags are still respected, but this will help me poke around in my database and find batches of ~2500 nodes to geocode.
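The extra flags from #29 translate into additional conditions on the candidate id set. A plain-PHP sketch of the filtering (the real patch adds these as query conditions; the function name here is hypothetical):

```php
<?php
// Apply --idlist, --min-id, and --max-id style filters to candidate ids.
function filter_ids(array $ids, $idlist = NULL, $min_id = NULL, $max_id = NULL) {
  if ($idlist !== NULL) {
    // --idlist takes a comma-separated list of ids, as in the migrate module.
    $wanted = array_map('intval', explode(',', $idlist));
    $ids = array_intersect($ids, $wanted);
  }
  if ($min_id !== NULL) {
    $ids = array_filter($ids, function ($id) use ($min_id) {
      return $id >= $min_id;
    });
  }
  if ($max_id !== NULL) {
    $ids = array_filter($ids, function ($id) use ($max_id) {
      return $id <= $max_id;
    });
  }
  return array_values($ids);
}

print implode(',', filter_ids([1, 5, 10, 15, 20], NULL, 5, 15)); // prints "5,10,15"
```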
Comment #30
marcvangend
The patches in #28 and #29 threw errors at me, no matter what I tried...
Going in with a debugger, I found that drush tries to execute the following query:
Of course the users table doesn't have an 'nid' column. Also, I don't get why this query is necessary if I'm restricting to nodes only.
Comment #31
jantoine CreditAttribution: jantoine commented
@marcvangend,
I was receiving the same errors as you when using the patches from #28 and #29. It appears that in patches #27 and #28, the entity base table was replaced with a hard-coded 'users' table (see the interdiff in #27). The attached patch fixes this issue.
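In Drupal 7, the correct base table and id column for any entity type come from entity_get_info() rather than a hard-coded table name. A sketch with a stubbed entity_get_info() so it runs standalone (core provides the real function, covering every installed entity type):

```php
<?php
// Stub for Drupal 7's entity_get_info(); core provides the real one.
function entity_get_info($entity_type) {
  $info = [
    'node' => ['base table' => 'node', 'entity keys' => ['id' => 'nid']],
    'user' => ['base table' => 'users', 'entity keys' => ['id' => 'uid']],
  ];
  return $info[$entity_type];
}

// Resolve the base table and id column for any entity type instead of
// hard-coding 'users' (the regression fixed by the patch in #31).
function backfill_base_table($entity_type) {
  $info = entity_get_info($entity_type);
  return [$info['base table'], $info['entity keys']['id']];
}

print implode(':', backfill_base_table('node')); // prints "node:nid"
```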
Comment #32
minorOffense CreditAttribution: minorOffense commented
On translatable nodes, the following errors occur when trying to geocode:
Comment #33
stevenx CreditAttribution: stevenx commented
I can confirm the issue paypaul reported in #21.
I applied patch #13 running with 7.x-1.x,
geocoding from another field -> addressfield on a normal user,
running drush geocoder-backfill.
Comment #34
stevenx CreditAttribution: stevenx commented
Never mind. I reverted and applied #31 instead, which works.
Thanks.
Comment #35
hockey2112 CreditAttribution: hockey2112 commented
I used the patch in #31, but it doesn't appear to be working. When I run the backfill command via SSH, it returns this almost immediately:
No new locations have been geocoded, and there are no errors in the log. What am I doing wrong?
Also, when I patched the module (latest dev, released 2016-Feb-01), the geocoder.widget.inc portion fails with this message:
Comment #36
Pol
Comment #37
donquixote CreditAttribution: donquixote as a volunteer commented
I made an issue and PR on GitHub:
https://github.com/drupol/geocoder/issues/15
https://github.com/drupol/geocoder/pull/16
The PR currently contains a lot of unrelated commits, because the GitHub repo is behind the one on Drupal.org, but I am sure @Pol will fix this :)
Comment #38
Pol
Fixed!