Closed (duplicate)
Project:
Location
Version:
6.x-3.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Reporter:
Created:
14 Oct 2006 at 18:09 UTC
Updated:
20 Nov 2009 at 18:38 UTC
Jump to comment: Most recent file
Comments
Comment #1
geodaniel commentedI second that request, there shouldn't be a dependency on having zipcodes for the proximity search to work
Comment #2
fortytwo commentedI guess, one of the problems is performance.
You'd have to do something like (attention metacode)
SELECT * FROM locations WHERE calc_distance(lat, lon, $slat, $slon) < $max_dist
calc_distance is a somewhat complicated function (there are lots of those functions floating aroung in the internet, this one seems to be one the more accurate):
degrees(acos(sin ( deg2rad ( $slat ) ) * sin(radians(lat)) + cos ( deg2rad ( $slat ) ) * cos(radians(lat)) * cos(radians($slon - lon)))) * 69.172
doing this for every record in the location-table will result in sluggish response-times with high sql-server-load. especially, if you have thousands of records. That's why the developers decided to limit the query to postal codes first and then calculate the distances in php.
a possible solution could be to limit the query by a simpler algorithm:
$lat_range = $max_dist/69.172;
$lon_range = abs($max_dist/(cos($slat) * 69.172));
SELECT * FROM location WHERE lat BETWEEN $slat - $lat_range AND $slat + $lat_range AND lon BETWEEN $slon - $lon_range AND $slon + $lon_range
for these results you can calculate the exact distance via above formula and display those, whose distance is smaller than max_dist
Regards 42
Comment #3
geodaniel commentedGood point about the performance on bigger sites. I like your idea of having a bounding box to limit the results in the SQL (a nice easy calculation on an table with an index for the lat/lon fields) and then further narrowing the results down to the actual radius in PHP afterwards - I've done something similar for projects in the past.
Comment #4
adminfor@inforo.com.ar commentedHow many beers to get it done?
Comment #5
adminfor@inforo.com.ar commentedI´m not an expert but I was looking for more information in the subject.
I noted that in the formula submitted by 42 It´s necessary to compute an absolute value in the second term of the formula, when substracting longitude for the meridians convergence near to the poles.
degrees(acos(sin ( deg2rad ( $slat ) ) * sin(radians(lat)) + cos ( deg2rad ( $slat ) ) * cos(radians(lat)) * cos(radians($slon - lon)))) * 69.172
Should be:
degrees(acos(sin ( radians ( $slat ) ) * sin(radians(lat)) + cos ( radians ( $slat ) ) * cos(radians(lat)) * cos(radians(abs($slon - lon))))) * 69.172
And, for a kilometers calculus, 69.172 should be replaced by it´s equivalence 111.325
Regards,
Gustavo
Comment #6
jamesJonas commentedThe best method I have found is based on a two phase proximity search with support of a spatial index inside your database engine. The first search bounds the area by a box. This is only an estimated distance, but it can be very fast and greatly reduce the set of locations. The second phase refines the area via the more costly “round world” algorithm (similar to the one above). The key is using a database engine that support a spatial index mechanism to perform the first phase (box phase). I have used mysql, but I have heard Postgre also has spatial support.
The speed boost using the technique is impressive and what I have read seems to be how most large GIS systems are built. What does not work for large sites is to look at each row, calculate the distance (algorithm of choice) and kick out the answer. This works great for a small number of locations, but when you move into a large number of locations the performance can bring your system to its knees. The next trick it to create a set of spatial indexes for you most common searches (3, 6, 9, 25, 50 miles).
If there is some interest I can start detailing the algos and sql. I just mocked up a working model but can attest that the speed increase. I can outline how it works but someone else needs to help implement it into the location_views.module. This is where I got stuck.
-james
Comment #7
m3avrck commentedSubscribing... I'd love to see this as well :-)
Comment #8
wolfderby commentedI can't get my basic zip code search to work but this interests me as well.
I know very little about all of this but would converting lat and long into a zip code help?
Comment #9
ankur commentedCurrently, there are a few things necessary to get the zipcode search to work: (1) the search module needs to be enabled, (2) appropriate permissions need to be granted for searching by location (and this only exists for nodes as of now), and (3) you need to import zipcode data for the countries for which you are trying to search data.
Currently, the search function is (un-ideally) dependent on zipcodes. The way the whole zipcode-lat-lon search thing works is as follows:
When a location-enabled node-type is submitted, the location module looks at the zipcodes table to see if the zipcode-country combination is in the table. If it is and is accompanied by an approximate latitude-longitude pair, those are recorded to the location table for this node having these latitude-longitude. However, the search function currently ignores the latitude-longitude data in the location table. When you search for nodes within a certain proximity now, the location module goes through the zipcodes table to see which zipcodes fall within the search radius (sometimes caching the resultant list). Then it goes on to see which nodes in the location table have the same zipcode.
Ideally, we would query the location table directly for nodes that fall within a search radius. This would be beneficial esp. because zipcode data isn't available for all countries and because google now provides free geocoding (which gives more accurate lat/lons for a given address) for a lot more countries than when this search function was originally written.
I just haven't gotten around to it but am open to anyone who wants to submit patches for this issue if I don't get the chance to work on it anytime soon.
-Ankur
Comment #10
m3avrck commentedJames if you can post your logic/psuedo code + SQL code I'd be willing to help write a patch to make that work :-)
Then we could get rid of that giant zip codes database... mmm.
Comment #11
m3avrck commentedNot sure of the implementation details of this, but this new query should be *cached* in a separate table if need be. See http://drupal.org/node/107567 for more details. This is very easy to do in 5.x and keeps the main cache clear of all of these location queries which can fill it up extremely fast.
Comment #12
JohnG-1 commentedi spy with my little eye something beginning with ... a really good idea ... (tracking)
Comment #13
karens commentedJust to clarify, are we talking about the location search tab or the proximity search feature in Location Views? They are different, but either or both could benefit from these kind of changes.
Comment #14
Scott Reynolds commentedSubscribing to this node. I am going to start some small work on it using R-Trees and only lat long
Comment #15
jamesJonas commentedJust completed a very rough cut of a spatial.module using R-tree indexes as a proof of concept.
See: FatAtlas.com (note: this is a rough cut and subject to change)
(1) Input a US Place Name (city, beach, mountain, stream, valley .... make it as obscure as you want). Nothing up my sleeves.
(2) Click on the bubble on the retrieved map and click on the your Place Name.
(3) This will take you to the node for that location. Take a look at the bottom of the page. You will see more than a dozen views that have been generated in real time. If you are using firefox you can hit the 'Reload current page' to insure that the page is a current page and not served up via my edge cache.
Example: http://fatatlas.com/node/1327060 this is New York (in New Mexico!)
(4) Follow any of the 'More' links.
(5) Take a look at the URL (my view form is working but not quite ready, so here is the URL passing arguments into the view)
Example: http://fatatlas.com/atlas/local/Environment/-107.526670/35.058610/10/
Format is longitude, latitude, radius, radius unit (km or mi)
Example of kilometers: http://fatatlas.com/atlas/local/Environment/-107.526670/35.058610/20/km
Change the lat lon to anything you want (US only for right now). Retest.
Results:
My tentative measurements are showing a cost per view of .1 to .15 seconds. Not great, but within an acceptable range for hitting 2 million location enabled nodes.
Alternative:
The alternate method is to draw a box around the point and then use greater than or lesser than for the lat and lon. This is quite acceptable for smaller datasets and where the you don't mind working only with boxes.
The downside of R-trees is that you are now pushing the limits of the database versions that are acceptable to drupal and the implementations will not easily span both mysql and Postgre. But if you need rocket speed, this is the way to go.
James
FatAtlas.com - US Atlas - 2 million places - Thanks Drupal
Comment #16
Scott Reynolds commentedMind sharing your spatial.module with us. R-Tree is the way to go for fast queries.
Comment #17
geograt commentedI think that lat/long search and proximity search are the best way to make the location module more universal.
Localities.module (http://project.para.ro/localities) it's strong and faster doing this two things with support of maxmind database.
Maybe the solution is more simple than to get again together the formulas and harcoding, The more apropriate is to brige localities.module with location.module to work together.
Comment #18
vangorra commentedLat/lon coordinates stored for the node should be the only fields queried when performing a proximity search. Focusing on the proper algorithm to perform this task is important. It is just as important to make sure that lat/lon coordinates are set on all possible location enabled nodes.
If the user knows the coordinates (unlikley), then they can provide them, but what if they do not? (most likely)
In the second condition, automated search for the coordinates based on the address should be done.
Does anybody know if the GMap fellows are working on this? If not, maybe I will. Here is the google api link to perform such a task for anybody who's interested.
http://www.google.com/apis/maps/documentation/#Geocoding_JavaScript
So, I propose that this is a two-sided problem.
1) Proximity search improvement
2) Lat/Lon input improvement
Comment #19
jamesJonas commentedvangorra is correct. There are two areas, the first being simple and easy lat/lon setting inside every node (geocoding when necessary) and then proximity searches using these lat/lon. I have most of this done in Drupal and will setup some demos when the code settles a bit more.
James
FatAtlas.com
Comment #20
Scott Reynolds commentedfor those of you still waiting for this...
Here is the drupal db code I have used to generate proximity searches
Where $lat and $long is the point in the center and $distance_miles is the max distance from the center and $k is the max number to return
Comment #21
ellisgl commentedYeah that would work... But you would really choke your server if you have a lot of records (like a project that I'm working on that has millions of them....).
The almost best way - which I'm still trying to figure out how to code it - is to have you center point - calc the distance to the max lat long for north - east - south - west. so if you point is (15, 15) and your north point would be (20, 15), east would be (15, 10) west would be (15,20) and south would be (10,15)
So we know that would would do something that would inbetween x(10-20) y(10-20).. Of course it wouldn't be radius, but it would be a square area.
Comment #22
ankur commentedI have been working on this and have a version that uses pager_query() instead of db_query() to pull results and break them up over pages. My version of the patch is pretty much complete, but I do need to comment out some debugging code and do some other cleanup. There might be some other details that still need fixing that I don't know about, that further more thorough testing can filter out.
Things that are left out of this patch that can be fixed in later issues include:
(1) Current search still relies on zipcode data for finding the lat/lon of the search-point. The search algorithms now finds results by looking directly at the lat/lon columns in the location table, rather than finding a list of zipcodes from the lat/lon column of the zipcodes table and then using that list of zipcodes to return location rows with zips in that list. This can be remedied by making the code geocode the submitted zipcode via google or whatever might be configured for a certain country, possibly falling back on the zipcode if no geocoding web-service is available.
(2) The current search code only allows a zipcode to be used for a search point. I'd like that to be configurable and would like to allow admins to configure what can be used, or maybe just include an option that allows a full address, but requires either at least a zipcode or at least city, province... kind of like how yahoo search or google search does it.
(3) There are plenty of other things that can make the search more useful, but I want to emphasize the above 2 as being what I really want to add before going on with this list below:
* Allowing users multiple addresses in his/her user account
* Allowing users to select from a drop down as a search point one of his/her own saved locations on his/her account
* Allowing configurations of which node types can be searched for on the location search and/or merging search-by-location w/ the main search page (being able to legitimately merge location search w/ drupal's main search will require patches to core)
* Adding code to search users by location in the admin section
Comment #23
ankur commentedWell, I cleaned up (or just commented out) the ugly debug code and fixed a couple of other things. I've committed this to the DRUPAL-5 branch, but have not given it a release tag yet. I'm going to let this sit for a few days and wait to see if anyone notices anything grossly wrong (as this was a quite huge patch). One area for things to be grossly wrong is my returning location.css, which tries to format the location entry form in a way more easy on the eyes. The new location.css was obviously composed by someone not too sharp with their CSS skills (i.e., me). So if anyone familiar w/ CSS gets a chance to look at the file, please feel free to submit a patch. I'm guessing it should be pretty obvious what I was trying to do w/ my CSS rules.
Leaving this issue open until I'm convinced it's ok... for now.
-Ankur
Comment #24
vangorra commentedAnkur,
Not to be critical, but why not just use a table to structure the fields for the location fieldset?
Comment #25
Mister_Krunchee commentedIdeally you would use a cck address field, allowing blank sections to determine scope. This way, when creating a node in a specific postcode, of a city, in a state of a country, you could choose which level the node spans. EG Is this node relevant only to people in the same city? Or is it relevant to the whole state? Is it relevant just in country X or globally?
For example, San Diego is a city in the State of California in the USA. Therefore:
If (in the node creation) you fill out:
If you also filled out City=San Diego then the node would only be considered relevant in San Diego (not the rest of California). Filling in nothing would presumably mean relevant worldwide.
If someone searches for San Diego then nodes with:
This is also simple for users to fill out as you just tell them to fill in address fields down to the area of relevancy. No need to know lat/long co-ordinates. No need for slow distance mapping calculations. Instead of a huge worldwide postcode DB you just have a city DB with what state & country (and possibly 1 wider area zone EG North America, Europe, Asia, Oceania, Africas etc) each city is in.
Fast. Easy.
Comment #26
ankur commentedTo Vangorra in #24, Are you talking about using tables to format the display of locations, or the location form?
I've themed the location form into a table, but display of locations might be a bit tricky since we got some microformatting thing going on with that and I'm not quite sure if a table formatting will interfere w/ the microformatting. If you could look into this and submit a patch that does the table formatting while preserving the microformats, I'll commit it.
In the meantime, I think the current search function is ready to go. I think I'm going to focus on allowing people to enter more than or (alternatively) just a city/state-province for the search parameters in a location-based search.
-Ankur
Comment #27
vangorra commentedI'm referring to the location form, but sounds like you've taken care of it.
Comment #28
vangorra commentedI'm referring to the location form, but sounds like you've taken care of it.
Comment #29
Scott Reynolds commentedAnkur,
What is the point of the square? It doesn't do you any good you still LOOK at every row in the database. It just made your SQL _FAR_ more complex. I don't see the point to the square at all. It doesn't gain you anything, just makes your life harder by adding more points of failure.
My suggested query: #24 is a commonly used query, I am not the first nor the last to have suggested it.
Comment #30
ankur commented@Scott Reynolds,
I'm not sure if you understand the point of the square. Suppose you have in your query to conditions where you AND the conditions:
In our case, condition1 is easier and quicker to compute than condition2 and condition1 serves to narrow down the qualifying rows to a much smaller subset than all the rows in the table. So, either by MySQL's query optimization (or perhaps the order in with the conditions are listed) the query will eliminate rows quicker for the simpler condition and then check the more complex condition for the remaining rows. In this case, it's easier to filter out undesired lat/lon pairs in the location table using conditions that involve less-than and greater-than queries for the four corners of the lat/lon square and *then* go through the trouble of narrowing down the qualifying lat/lon points using the trignometric calculations in the second condition (since MySQL trig functions take more cpu cycles than the simple less-than and greater-than computations used in the first condition).
Imagine if you hundreds of lat/lon coordinates in the location table that span the entire world or an entire continent or country. If you want to run a query that asks which lat/lon points in the table fall within a small radius of a lat/lon search-point, then running just the trig query is going to take forever because you will be performing the computation for EVERY row in the table. However, if you're search result is just going to be 10 or 20 results that fall within the circle, then it is much quicker to first enforce a condition in the query that asks for the (hypothetically speaking) the 30 or so results within the square defined by the same radius and then just have the trig portion of the query ("condition2") compute which of those points within the square of radius x fall within the circle of radius x.
I encourage you to insert some calls to the time() function and output some drupal_set_message() calls to tell you how long the current queries take and then to change the query to remove the less-than/greater-than portions of the query and do the same. I did this when i first wrote the module a couple years back and noticed the difference. There will be a major difference.
Regarding comment #24, that comment was in regard to how the location input form is formatted, not the query. I have since changed the location input form to use a table.
Comment #31
ankur commentedI should really check my grammar and spelling before clicking submit.
Comment #32
dgtlmoon commentedMy two cents, given drupal's really high amount of SQL queries for url_alias, taxonomy requests etc, i dont know if it's such a big worry for something like http://drupal.org/node/89220#comment-457546 to happen
if you have 2 million nodes in your site, we'll that's a problem i'de like to have so i can generate revenue to setup a server farm todo just this.
Comment #33
gagarine commentedtrack this feature request
Comment #34
gagarine commentedAnd with the Harversine formula?
http://en.wikipedia.org/wiki/Haversine_formula
Here an implementation in PHP:
http://www.pjenkins.co.uk/blog/index.php/2007/04/04/uk_post_code_distanc...
Some discussion about implementation in SQL:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=81360
And a very interesting SQL implementation with bounding box:
http://forum.geonames.org/gforum/posts/list/522.page
Comment #35
bdragon commentedI will be fixing this thoroughly in the future.
Actually, I'm anti "postal code based search" in the first place. Postcodes do NOT map to reality.
The CORRECT solution is to use tables with spatial indexes, with a fallback to a bbox search + a manual distance filter.
Comment #36
gagarine commentedYou are right. I read http://dev.mysql.com/doc/refman/5.0/fr/spatial-extensions.html very interesting...
Comment #37
tanoshimi commentedsubscribing. (sorry - wish there was a better way to do that).
p.s. is there?
Comment #38
markDrupal commented@tanoshimi
you could just make a sly comment and pretend that you are participating in the discussion but in fact you are just subscribing
like i just did
Comment #39
socialnicheguru commentedsubscribing
Comment #40
ankur commentedI believe I made a commit a while back that made the search directly use the lat/lon columns of the location table instead of the lat/lon of the zipcodes table (where the matching postal codes were then used to select rows from the location table).
See http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/location/lo...
and comment #23 above.
Getting this in was important after Google geocoding was added to this module. Google geocoding is much more accurate than using an approximate central lat/lon for a postal code and you can get exact coordinates even without a postal code given an a proper street address.
If the current maintainers have a plan to use spatial indexes or whatever, perhaps that should be opened up as a separate ticket. There's too many people talking about different things on this thread.
-Ankur
Comment #41
aaron commentedwell, we still need a way to do proximity search by postal code instead of lat-lon. for instance, if a site collects only postal codes from users, rather than full addresses, one might want to do a rough proximity search.
for now, i'm going to store location_latlon_rough on the user locations and see if that tricks the system.
Comment #42
aaron commentednm, that should be working, actually, but it's not. looking at location_nodeapi, it should be setting the location's lon/lat with location_get_postalcode_data, but it doesn't seem to be. i'll look more into it, and open a new issue if needed. otherwise, just ignore my rambling :/
Comment #43
gagarine commentedI don't test it for the moment...
But http://drupal.org/project/gproximity provides the ability to enter an address and the closest locations to the address will be listed next to the map.
Comment #44
bdragon commentedMarking as duplicate of #319278: [master][meta] Location_search rewrite.
Comment #45
Cactii1 commentedMy site is based around a country - Mexico. Here in Mexico 1 decimal degree is equal to about 100km so putting together a "Nearby Nodes" block was quite simple. I'll share my code just in case someone can use it. If you can improve on it for me or make it into a mini (cacheable) module I'd appreciate it - just share the code! Right now I have it plugged into the template file for the content type that I want to use it with.
Comment #46
Bill Choy commentedYou can change the DB Engine as I did in my modules/geo/includes/shp2sql.inc line 60-ish
db_create_table($ret, $table_name, $schema);
db_query("ALTER TABLE {$table_name} ENGINE = MyISAM");