Original issue:

I am working on a site with 10k+ nodes that were all added, only 2500 were geocoded, and now I'm planning on running the drush command to gradually chew threw the remainder. Unfortunately, the command as it stands now runs completely out of memory, and will needlessly hit the Google API limit again. Patch forthcoming.

This issue has become a place to add several options to the drush command geocoder-backfill including:

  1. --limit: A limit to the total number of entities that can be updated with the command, to prevent over-limit requests
  2. --only-entity: An option to only attempt to geocode a given kind of entity. This may be field collections, nodes, profiles, etc.
  3. --only-bundle: An option to only attempt to geocode a given bundle. This may be something like "page" or "article" if your entity is a node.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jhedstrom’s picture

Title: Add a --limit option to drush » Add a --limit option to drush command
Status: Active » Needs review

This patch adds the limit option, and I couldn't help but bring the drush commands up to drush 5/6 standards (no need to explicitly define a callback function).

jhedstrom’s picture

And the patch...

jhedstrom’s picture

That last patch was a bit broken. This patch is working with the limit. It also sets $entity->original to empty since otherwise code further on in geocoder won't process the data.

mariacha1’s picture

Issue summary: View changes
FileSize
4.65 KB
2.72 KB

Unless I'm missing something, the patch in #3 will only read the first X items of a given bundle, even if they've already got geocoding information. So once those first X items are geocoded, no new items will be geocoded in subsequent runs.

I'm adding some more logic to the query to only pull in the first X UNgeocoded items:

if (!$force_reload) {
                $query->leftJoin('field_data_' . $field_instance['field_name'], 'geo', 'node.vid=geo.revision_id');
                $query->condition('geo.entity_id', NULL, 'IS NULL');
              }

I also figured it might be a good idea to order the results from newest to oldest, so if you've got X junk addresses that fail to encode every time, they don't take over all attempts every time the drush command runs.

$query->orderBy('node.nid', 'DESC');

I also fixed some coding standard.

And, of course, this whole thing will need to be reworked to apply to all entities instead of just nodes before it can be marked RTBC.

mariacha1’s picture

Title: Add a --limit option to drush command » Add a --limit option to drush command, enable for any entity
FileSize
5.91 KB
5.98 KB

Here's my latest changes, which allow updating geocodes for any entities.

I also noticed that the $limit was applied on a per-entity/per-field basis. So if I had 3 entities, each with their own geocoded fields, and set the limit to 50, I would send 150 requests to the server, instead of what I'd requested.

I believe I've fixed that.

I'm including geocoder-drush-2077317-05.patch and interdiff2.txt, which shows the difference between patch -05 and -04.

duntuk’s picture

@mariacha1 So is this the proper command?

drush -d geocoder-backfill --limit=2500

UPDATE:

Yup, that patch and command works great! Thanks @mariacha1!

duntuk’s picture

For convenience to others, here's the latest DEV version with #5 (geocoder-drush-2077317-05.patch) applied patch.

jhedstrom’s picture

One potential issue, with large amounts of bad data, it will be possible that bad requests are always sent first. The response from Google in the case of a bad address:

WD geocoder: Exception: Google API returned bad status.\nStatus: ZERO_RESULTS in      [error]
geocoder_google() (line 58 of

If there are enough of those, with the current patch, they will always be run first, and if there are more than the amount set in --limit, or the daily limit of Google (or other geocoding service), then no progress will ever be made. I wonder if the patch should set an empty row in the event of such a response?

mariacha1’s picture

I'm worried about potential unintended consequences for adding blank entries into the geocoder field in the database.

At this point my inclination is to add a more meaningful error message in the case of such failures, so they can be cleaned up manually. Perhaps a message in the watchdog table with information on what was passed and where to edit the address that failed?

With some fancy query footwork, we might even be able to cross-reference such geocode errors with these watchdog entries in future geocoder-backfill calls to avoid re-calling them. (Although I'm inclined to make this a configurable option instead of the default behaviour to avoid bad data bad habits.)

mariacha1’s picture

Status: Needs review » Needs work
jhedstrom’s picture

I like the idea of adding watchdog messages, ideally with a link to the offending entity. This way the bad entries could be manually cleaned up.

mpgeek’s picture

Watchdog linking has been implemented in #2221663: Add a link to offending entity in watchdog message when geocoding fails, when backfills happen, watchdog gets messages that link to the entity in question.

Towards more backfill control, I wanted to be able to backfill field collections without backfilling the host entities (i have multiple geofields), so i added a --only-entity option. Usage:

drush geocoder-backfill --limit=500 --only-entity=field_collection_item

Should work with any entity type.

jessehs’s picture

This patch adds a --only-bundle flag as well, so you can limit geocoding for bundle type in addition to entity type. So if you limit to the node entity, you can also limit to node-type.

Usage:


drush geocoder-backfill --only-entity="profile2" --only-bundle="my_profile_type"

mpgeek’s picture

Status: Needs work » Needs review
mariacha1’s picture

Title: Add a --limit option to drush command, enable for any entity » Add a --limit,--only-entity, and --only-bundle option to drush command, enable for any entity
Issue summary: View changes
quicksketch’s picture

Hi guys, on a site I run, we've been running into an issue where Google rate-limits us to a certain number of geolookups per second, in addition to the per-day limit. As discussed over in #2024809: Allow using a google API key, we're limited to 20 lookups per second (essentially), and it's very predictable behavior for us. Adding a 1-second delay ever 20 lookups solves the issue for us (up to the daily limit of 2500 look ups). That problem looks like it would clearly affect users doing drush requests like this, so I wanted to see if anyone else had experienced this problem here.

quicksketch’s picture

Sorry ignore my #16 comment. The issue #1672742: Avoiding OVER_QUERY_LIMIT due to more than 5 requests in a second in google geocode already solved this problem, it's just been more than a year since the last stable release.

jhedstrom’s picture

Latest patch is throwing this notice:

Undefined variable: force_reload geocoder.drush.inc:95 

and looking at the code, I don't see anyplace where that variable is actually set.

jbiechele’s picture

Adding at about line 38:
$force_reload = drush_get_option('force');
resolves it for me.

The line was deleted in the patch #13.

mariacha1’s picture

Ooops, looks like I accidentally removed the --force option way back in patch 4. Re-adding it and providing an interdiff.

paypaul’s picture

Unless I am missing something the latest patch is giving me a PDOException:
-------
WD php: PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL [error]
syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '=
'user') AND (geo.entity_id IS NULL )
ORDER BY geo_ent.uid DESC' at line 2: SELECT geo_ent.uid AS uid
FROM
{users} geo_ent
LEFT OUTER JOIN {field_data_field_geolocation} geo ON geo_ent.uid=geo.entity_id
WHERE ( = :db_condition_placeholder_0) AND (geo.entity_id IS NULL )
ORDER BY geo_ent.uid DESC; Array
(
[:db_condition_placeholder_0] => user
)
in drush_geocoder_backfill() (line 85 of
/home/apache/www/newkameleon.dev.modernmedia.ca/html/sites/all/modules/geocoder/geocoder.drush.inc).
-------

I will try and sort out the issue myself I just thought I should let someone know. I am hoping this will be the final piece needed to finish my migration from Drupal 6.

Thanks

dubcanada’s picture

#20 doesn't seem to work properly if you run it on entities attached to users. The $entity->original = array(); causes issues. If you change

if (empty($diff)) {

to

if (empty($diff) && !drupal_is_cli()) {

In geocoder.widget.inc on line 244 it solves the issue completely while continuing with the regular path for normal usage.

mpgeek’s picture

@paypaul, @dubcanada it would be helpful to know how you are calling the command, i.e. what did you type? Also, I've used this many times on profile2 entities, so i'm wondering if something else is amiss here.

dubcanada’s picture

The regular drush command (drush geocoder-backfill). There is an addressfield field added via Account Settings to the user profile.

When geocoder goes to determine if the entity needs to rerun geocoder (via determining if the user was modified, comparing the original with the new values) it determines there is no difference (because there isn't) so it just stops there.

Does that make sense?

jessehs’s picture

mvc’s picture

this works perfectly for me for geocoding nodes, except that i had to patch revisioning: #2276657: Undefined property errors in combination with geocoder thanks!

not marking RTBC because a previous poster mentioned a problem with user entities, which i didn't test.

jessehs’s picture

This patch addresses a few differences between the way user entities are stored and the way geocoder handles users. It also incorporates the patch from the issue mentioned in #25. Please test with geocoder/geofields put directly on the user object (rather than a profile2 entity associated with the user).

jessehs’s picture

Doh! This fixes an indentation error.

mvc’s picture

This patch is so useful, I can't help but add more options :)

I have perhaps 5% of my dataset returning ZERO_RESULTS exceptions, which is fine, except that when I use the --limit flag I keep running over those same nodes over and over again. So, inspired by the migrate module, I added optional --idlist, --max-id, and --min-id flags which add further conditions to the query. The bundle, entity type, and limit flags will still be respected, but this will help me poke around in my database and find batches of ~2500 nodes to geocode.

marcvangend’s picture

Status: Needs review » Needs work

The patches in #28 and #29 threw errors at me, no matter what I tried...

$ drush geocoder-backfill
SQLSTATE[42S22]: Column not found: 1054 Unknown column 'geo_ent.nid' in 'field list'

$ drush geocoder-backfill --only-entity=node --only-bundle=project
SQLSTATE[42S22]: Column not found: 1054 Unknown column 'geo_ent.nid' in 'field list'

Going in with a debugger, I found that drush tries to execute the following query:

SELECT geo_ent.nid AS nid
FROM 
{users} geo_ent
LEFT OUTER JOIN {field_data_field_project_geolocation} geo ON geo_ent.vid=geo.revision_id
WHERE  (type = :db_condition_placeholder_0) AND (geo.entity_id IS NULL ) 
ORDER BY geo_ent.nid DESC

Of course the users table doesn't have an 'nid' column. Also, I don't get why this query is necessary if I'm restricting to nodes only.

jantoine’s picture

Status: Needs work » Needs review
FileSize
9.05 KB
630 bytes

@marcvangend,

I was receiving the same errors as you when using patches from #28 and #29 as well. It appears that in patches #27 and #28, the entity base table was replaced with a hard-coded 'users' table (see the interdiff in #27). The attached patch fixed this issue.

minorOffense’s picture

Status: Needs review » Needs work

On translatable nodes the following errors occur when trying to geocode

drush geocoder-backfill -l spotlight --idlist=9977 --only-entity=node --only-bundle=venue
Attempt to assign property of non-object translation.handler.inc:917                                       [warning]
WD node: EntityMalformedException: Missing bundle property on entity of type node. in entity_extract_ids() [error]
(line 7788 of /var/www/html/ottawa.ca/includes/common.inc).
WD php: EntityMalformedException: Missing bundle property on entity of type node. in entity_extract_ids()  [error]
(line 7788 of /var/www/html/ottawa.ca/includes/common.inc).
EntityMalformedException: Missing bundle property on entity of type node. in entity_extract_ids() (line 7788 of /var/www/html/ottawa.ca/includes/common.inc).
Drush command terminated abnormally due to an unrecoverable error.
stevenx’s picture

I can confirm the issue #21 reported by paypaul
I applied patch #13 running with 7.x-1.x

geocode from other field -> addressfield in normal USER

running drush geocoder-backfill

 
$ drush geocoder-backfill

WD php: PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the[error]
manual that corresponds to your MySQL server version for the right syntax to use near '= 'user') AND (geo.entity_id IS NULL )

ORDER BY geo_ent.uid DESC' at line 2: SELECT geo_ent.uid AS uid
FROM 
{users} geo_ent
LEFT OUTER JOIN {field_data_field_my_geoloc} geo ON geo_ent.uid=geo.entity_id
WHERE  ( = :db_condition_placeholder_0) AND (geo.entity_id IS NULL ) 
ORDER BY geo_ent.uid DESC; Array
(
    [:db_condition_placeholder_0] => user
)
 in drush_geocoder_backfill() (line 85 of /sites/all/modules/geocoder/geocoder.drush.inc).
PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '= 'user') AND (geo.entity_id IS NULL ) 
ORDER BY geo_ent.uid DESC' at line 2: SELECT geo_ent.uid AS uid
FROM 
{users} geo_ent
LEFT OUTER JOIN {field_data_field_my_geoloc} geo ON geo_ent.uid=geo.entity_id
WHERE  ( = :db_condition_placeholder_0) AND (geo.entity_id IS NULL ) 
ORDER BY geo_ent.uid DESC; Array
(
    [:db_condition_placeholder_0] => user
)
 in drush_geocoder_backfill() (Zeile 85 von /sites/all/modules/geocoder/geocoder.drush.inc).
Drush command terminated abnormally due to an unrecoverable error.                                                           [error]
root@server2 [/sites/all/modules/geocoder]# 

stevenx’s picture

Never mind. I reverted and applied now #31 which works.

thx

hockey2112’s picture

I used the patch in #31, but it doesn't appear to be working. When I run the backfill command via SSH, it returns this, almost immediately:

Found command: geocoder-backfill (commandfile=geocoder) [0.63 sec, 37.14 MB] [bootstrap]
Command dispatch complete [2.73 sec, 47.46 MB] [notice]
Timer Cum (sec) Count Avg (msec)
page 2.637 1 2636.81

No new locations have been geocoded, and there are no errors in the log. What am I doing wrong?

Also, when I patched the module (latest dev, released 2016-Feb-01), the geocoder.widget.inc potion fails with this message:

patching file geocoder.drush.inc
patching file geocoder.widget.inc
Hunk #1 FAILED at 231.
1 out of 1 hunk FAILED -- saving rejects to file geocoder.widget.inc.rej

Pol’s picture

Status: Needs work » Closed (outdated)
donquixote’s picture

I made an issue and PR in github.
https://github.com/drupol/geocoder/issues/15
https://github.com/drupol/geocoder/pull/16

Currently contains a lot of unrelated commits, because the github repo is behind Drupal core.. but I am sure @Pol will fix this :)

Pol’s picture

Fixed!