Same request as a similar recent issue, but not directly related & more generic.

Starting a big project using the GeoNames db, and if I get a positive quick response, I'll release as a simple module that maintains and queries this data source (time permitting).

This contains simple lat/long points:

latitude : latitude in decimal degrees (wgs84)
longitude : longitude in decimal degrees (wgs84)

And I'm about to add another column POINT (using Postgres), but I thought that I see if this is possible with Geo to make this more generic.

Due to the size, a node based solution is simply not feasible.

Any suggestions?

Comments

allie micka’s picture

What you really want to do is write a handler for Geocode. I've always wanted a Geonames backend for Geocode, and nobody has had the time to write one.

What the Geocode module does is allow its backends to declare what type of data they're able to accept (e.g. text string), and what type of data they're able to return (point data, text representations of other data, etc.)

It is likely that you want to start with a dependency on http://drupal.org/project/geonames From there, writing a Geocode backend would be pretty short work.

Once that exists, we'll all have access to Openlayers data for CCK formatters and Views filters ("find me things located near [geonames-data-for] my city".

You will also be able to use Geocode to get point data into Geo's CCK fields, your own fields, or into a custom module by simply calling geocode().

Why use Geo to store your data? The easy part is storing values into a column. The tricky parts, which Geo aims to solve, are:

  • Providing a variety of flexible ways of obtaining Geo information from users (via Geocode for Postal/Geonames/etc. conversions, Openlayers, etc.)
  • Relating different geometries to each other. This is the key thing. Geo keeps a record of all datasets available, and what format they use. Because it knows what's what, it can be used to relate different types of items (nodes near "me", points within a political district, cities covered by a road trip's linestring, etc.)

Your code to create a new Geo Dataset would look something like this:


  // Load or create a Geo dataset.
  if (!$geo = geo_load(array('name' => 'unique_name_of_dataset'))) {
    $geo = geo_sql_load();
  }
  $geo->setTitle($title);
  $geo->setSRID($your_srid);
  $geo->setGeoType(GEO_TYPE_POINT);
  $geo->setTableName($table_name);
  $geo->setColumnName($geo_column);

  // Save the dataset to the Geo API.
  geo_save($geo);

  // Schema definition for whatever other columns you want.
  $schema = array(
    $table_name => array(
      'fields' => array(
        'your_column' => array('type' => 'int'),
      ),
    ),
  );

  // Merge Geo's field(s) into your field definition.  
  // There will be one field, called 'geo' of type GEOMETRY.  The array_merge is required to support
  // possible alternatives such as individual lat / lon columns.
  $schema['fields'] = array_merge($schema['fields'], $geo->sqlFieldDefinition('geo'));
alan d.’s picture

Thank you for the very clear and detailed answer.

I'll see if I have the time in the next few weeks to have a play. Currently I have the import in the following table:

CREATE TABLE geonamesdb_data (
  geonameid integer NOT NULL,
  "name" character varying(200) NOT NULL DEFAULT ''::character varying,
  lat numeric(13,10) NOT NULL DEFAULT 0,
  lng numeric(13,10) NOT NULL DEFAULT 0,
  fclass character(1) NOT NULL,
  fcode character varying(5) NOT NULL,
  country character(2) NOT NULL,
  admin1 character varying(20) NOT NULL DEFAULT ''::character varying,
  admin2 character varying(80) NOT NULL DEFAULT ''::character varying,
  admin3 character varying(20) NOT NULL DEFAULT ''::character varying,
  admin4 character varying(20) NOT NULL DEFAULT ''::character varying,
  population bigint NOT NULL DEFAULT 0,
  elevation integer NOT NULL DEFAULT 0,
  gtopo30 integer NOT NULL DEFAULT 0,
  timezone integer NOT NULL DEFAULT 0, // normalised into timezone table
  modified integer NOT NULL DEFAULT 0, // timestamp
  pt geometry,
  CONSTRAINT geonamesdb_data_pkey PRIMARY KEY (geonameid),
  CONSTRAINT enforce_dims_pt CHECK (st_ndims(pt) = 2),
  CONSTRAINT enforce_geotype_pt CHECK (geometrytype(pt) = 'POINT'::text OR pt IS NULL),
  CONSTRAINT enforce_srid_pt CHECK (st_srid(pt) = 4326)
)
WITH (
  OIDS=FALSE
);

Even after filtering out a lot of the unwanted types, I still ended up with 4.5 million rows. The queries for distance within xxx were still acceptable with the latest versions Postgres/Postgris. I think I read that Postgris automatically adds a bounding box limit from v1.5 that simplified the query.

Any module will have to use a batch API, as the import took forever, about 3 hours. It is so much easier to set the memory high with no timeouts!

alan d.’s picture

K, I got this working on the base data install, slight modifications:

<?php
  // Load or create a Geo dataset.
  module_load_include('inc', 'geo', 'includes/geo.api');
  $geo = geo_load(array('name' => 'geonamesdb'));
  // Do this conditionally, otherwise a new one gets created with every schema call
  if (!$geo) {
    $geo = geo_sql_load();
    $geo->setTitle('geonamesdb');
    $geo->setName('geonamesdb');
    $geo->setSRID(GEONAMESDB_WGS84);
    $geo->setGeoType(GEO_TYPE_POINT);
    $geo->setTableName('geonamesdb_data');
    $geo->setColumnName('geo');

    // Save the dataset to the Geo API.
    geo_save($geo);
  }
  $schema['geonamesdb_data']['fields'] = array_merge($schema['geonamesdb_data']['fields'], $geo->sqlFieldDefinition('geo'));

?>

And the data was successfully retrieved using:

<?php
  // Populates $items with all values as wkt, eg: "POINT(1.12 45.3454)"
  $geo = geo_load(array('name' => 'geonamesdb'));
  $res = db_query("SELECT ". $geo->AsBinary($geo) .' AS geo  FROM {'. $geo->tableName() .'} ');
  $items = array();
  while ($row = db_fetch_array($res)) {
    $items[] = geo_value(db_decode_blob($row['geo']), 'wkt', 'wkb') ;
  }
?>

Issues with the definition, no indexes generated.

As I work through the non-field usage, I update the post. As stated above, 3.5 Million nodes is going to kill the site, but I need access to the values!

ahtih’s picture

Things seem to have changed a bit since then, because "if (!$geo)" is never triggered even if the dataset did not exist. The following works (using "geonamesdb" name as an example):

function geonamesdb_install() {
  module_load_include('inc', 'geo', 'includes/geo.api');

  // Create a Geo dataset.
  $geo=geo_sql_load();
  $geo->setTitle('geonamesdb');
  $geo->setName('geonamesdb');
  $geo->setSRID(GEONAMESDB_WGS84);
  $geo->setGeoType(GEO_TYPE_POINT);
  $geo->setTableName('geonamesdb_data');
  $geo->setColumnName('geo');

  // Save the dataset to the Geo API.
  geo_save($geo);

  drupal_install_schema('geonamesdb');

  // Add database spatial index.
  $geo->setIndexed(TRUE, TRUE);
}

function geonamesdb_schema() {
  $schema = ... // define the table for Schema API as usual

  // Add Geo field definition to schema
  module_load_include('inc', 'geo', 'includes/geo.api');
  $geo = geo_load(array('name' => 'geonamesdb'));
  $schema['geonamesdb_data']['fields'] = array_merge($schema['geonamesdb_data']['fields'], $geo->sqlFieldDefinition('geo'));

  return $schema;
}
ahtih’s picture

Actually, there is another problem with all of the above code: it assumes that by hook_install() time, Geo is already properly installed. This assumption is not always correct when Geo is being installed/enabled at the same time as our module; Drupal does not observe module dependency order in this case, and calls their hook_install()'s at arbitrary order.

Other modules use hook_requirements() to get around that, perhaps we should too. But this would mean installing our whole schema in hook_requirements('runtime') rather than in hook_install(). I don't know if this is ok from Drupal core's viewpoint? Does anyone know?