Project:Parsing API
Version:6.x-1.8
Component:User interface
Category:support request
Priority:normal
Assigned:Unassigned
Status:closed (fixed)

Issue Summary

I think this is what I need to do some scraping... but which modules support this API?

I don't have the skills to use a bare API like this.

What I want to do is tell my Drupal site to go to a user's submitted URL, scrape the page for location data (a bunch of GPS coordinates), and then put each of those coordinates into an individual node on my site, including its location data, so I can plot the points on a map, etc.

Any advice is appreciated.

Comments

#1

Well, you are definitely in the right place, but unfortunately scraping websites requires a little HTML/PHP knowledge. Your next two steps would be to find a contract developer and to beg me to upload my get_webpage API module, heh. But any real developer could get by without that part anyway. I would offer to do it myself, but I don't do small projects like this. If it's less than $4000, then I wouldn't be able to help you. You could try hiring a low-cost freelance developer instead; they should be more than capable of doing what you need. Just give them the link to this module and tell them you want to use it to scrape websites, and that should be all they need.

#2

This is a formal beg with a cherry on top, for you to upload your get_webpage API :)

Cheers
Craig

#3

+1 for get_webpage API :) Greetings, Martijn

#4

Hmmz ok, here you go :D I guess no one picked up on my lame attempt at a developer joke lol. drupal_http_request() exists for Drupal 4.6 to 7.0. It can handle GET, POST, and PUT requests over both http and https. It's very handy: it does NOT rely on cURL, file_get_contents(), etc., so it should work on any server regardless of its setup, as long as fsockopen() works and outgoing requests are allowed on ports 80 and 443.

http://api.drupal.org/api/function/drupal_http_request/6

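<?php
// Fetch the Google home page into a variable for manipulation.
$request = drupal_http_request('http://www.google.com/');
// On failure ->error is set and ->data may be missing, so check first.
if (empty($request->error) && isset($request->data)) {
  $data = $request->data;
  // between() is not part of PHP or Drupal core; presumably it comes from
  // the Parsing API module, so that module needs to be enabled.
  $title_of_googles_website = between('<title>', '</title>', $data, FALSE);
  echo "$title_of_googles_website ownz!";
}
?>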
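
For the original question (pulling GPS coordinates out of the fetched page), here is a rough sketch in plain PHP. It assumes the coordinates show up in the page text as decimal "latitude, longitude" pairs, which may not match the real pages. The function name example_extract_coordinates() is made up for illustration and is not part of Drupal or the Parsing API module. Inside Drupal, $html would be the ->data property of the drupal_http_request() result.

```php
<?php
/**
 * Hypothetical helper: collects decimal "latitude, longitude" pairs from HTML.
 * Assumes pairs look like "52.3702, 4.8952"; other formats are not handled.
 */
function example_extract_coordinates($html) {
  // Replace tags with spaces so numbers in adjacent elements do not run together.
  $text = preg_replace('/<[^>]*>/', ' ', $html);
  $coordinates = array();
  // Signed decimal, comma, signed decimal. The lookbehind keeps the match
  // from starting in the middle of a longer number.
  $pattern = '/(?<![\d.])(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)/';
  if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
      $lat = (float) $match[1];
      $lon = (float) $match[2];
      // Keep only pairs inside the valid latitude/longitude ranges.
      if (abs($lat) <= 90 && abs($lon) <= 180) {
        $coordinates[] = array('latitude' => $lat, 'longitude' => $lon);
      }
    }
  }
  return $coordinates;
}

// Sample markup standing in for a fetched page; the last pair is out of range.
$html = '<ul><li>52.3702, 4.8952</li><li>-33.8688,151.2093</li><li>999.5, 10.0</li></ul>';
$points = example_extract_coordinates($html);
print count($points) . " points found\n";
```

Each entry in $points could then be turned into its own node; how the latitude and longitude get stored depends on which location module the site uses, so that part is left out here.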

#5

Status:active» closed (fixed)