The business i work for have a real estate database that exports XML files to other large real estate websites but also exports generic XML files which we would like to use to populate and update our Drupal website which uses the rListing module. Apparently an XML file is generated for each property along with copies of jpegs which is then packaged into a zip file (one zip for each property). I don't know how to write PHP or MySQL from scatch but i am able to modify working examples to a certain extent, so where should i start in trying to accomplish this? From what i have read, the first part involves turning the XML data into an array and the second part would involve inserting the data into the appropriate tables. I also read that SimpleXML function might be usable. A new XML file will be generated each time a new property is created or an existing one is updated so it would be great if i could automate this process somehow, maybe with a cron job?

This is a sample of the XML file layout;

<!-- Contact Details -->
<!ELEMENT Contact 	( ContactName,
					BHPhone?,
					AHPhone?,
					Mobile?,
					Email?,
					UpdateStatus,
					UpdateTime )>

<!ATTLIST Contact 		ContactID 		CDATA #REQUIRED >
<!ELEMENT ContactName		( #PCDATA ) >
<!ELEMENT BHPhone		( #PCDATA ) >
<!ELEMENT AHPhone		( #PCDATA ) >
<!ELEMENT Mobile		( #PCDATA ) >
<!ELEMENT Email			 ( #PCDATA ) >

<!-- Property Details -->
<!ELEMENT Property 	( ListingType?,
					SaleAuthority?,
					StreetNumber?,
					StreetName?,
					Suburb?,
					Postcode?,
					State?,
					Municipality?,
					EstateName?,
					EstateStage?,
					Zone?,
					StreetDirectory?,
					MapPage?,
					MapReference?,
					AddressDisplayType?,
					BuildingType?,
					PropertyType?,
					LandFrontage?,
					LandWidthRear?,
					LandDepthLeft?,
					LandDepthRight?,
					LandCrossover?,
					LandArea?,
					BuildingArea?,
					NewlyBuilt?,
					Rent?,
					RentPeriod?,
					DateAvailable?,
					SalePrice?,
					PriceView?,
					DisplayPrice?,
					Outgoings?,
					CommReturn?,
					LeaseEndDate?,
					AuctionDate?,
					AuctionLocation?,
					UnderOffer?,
					SoldPrice?,
					SoldDate?,
					AdvHeading?,
					AdvDescription?,
					AdvNotes?,
					ContactID?,
					Features?,
					Inspections?,
					ExternalLinks?,
					Images?,
					UpdateStatus,
					UpdateTime )>
<!ATTLIST Property 		PropertyID 		CDATA #REQUIRED >

<!-- Listing -->
<!ELEMENT ListingType 		EMPTY >
<!ATTLIST ListingType 		Type 		( Rent |
								Sale ) #REQUIRED >
<!ELEMENT SaleAuthority 	EMPTY > <!-- Sale Only -->
<!ATTLIST SaleAuthority 	Type 		( Auction |
								Sale |
								Exclusive |
								Multilist |
								Conjunction |
								Open |
								Tender |
								EOI ) #REQUIRED >

<!-- Address -->
<!ELEMENT StreetNumber 		( #PCDATA ) >
<!ELEMENT StreetName 		( #PCDATA ) >
<!ELEMENT Suburb 		( #PCDATA ) >
<!ELEMENT Postcode 		( #PCDATA ) >
<!ELEMENT State 		( #PCDATA ) >
<!ELEMENT Municipality 		( #PCDATA ) > <!-- Sale Only -->
<!ELEMENT EstateName 		( #PCDATA ) > <!-- Sale Only -->
<!ELEMENT EstateStage 		( #PCDATA ) > <!-- Sale Only -->
<!ELEMENT Zone 			( #PCDATA ) > <!-- Sale Only -->

Could anyone explain the process involvement in terms a newbie would understand, supply a piece of code to get started with or maybe point me to something existing which might work? I think this job is above my current skill level but would like to give it a try before outsourcing it so any help would be great.

Thank you,

Joshua Howell

Comments

joshua.howell’s picture

Does anyone know if it's even possible to import XML data into a MySQL database using PHP?

joshua.howell’s picture

Could someone please suggest a starting point? Perhaps a module which parses XML? Anything?

summit’s picture

feedapi maybe www.drupal.org/project/feedapi?
Greetings, Martijn

paper_tiger’s picture

I needed to get a newsfeed to import into a database. Unfortunately not to familiar with XML other than its basic structure and xml PHP functions seemed very complex. So I came up with a hack that does the job. Most newsfeeds are based on tags and often this is the data that one wants. So the following basically allows you to collect a newsfeed and store it in your database. I run the program with cron, thus now its automatically done.

<?php
//We need to examine the xml we are parsing to work out the tags. THe one I want looked like this... with everything I needed between <item> and </item>
##tags in order
#<item>
#<title>
#</title>
#<description>
#</description>
#<body>
#</body>
#<link>
#</link>
#</item>
function get_content($url) /// basically opens the page and stores it as a variable. Buggered if I know how it works! 
{
   $ch = curl_init();

   curl_setopt ($ch, CURLOPT_URL, $url);
   curl_setopt ($ch, CURLOPT_HEADER, 0);

   ob_start();

   curl_exec ($ch);
   curl_close ($ch);
   $string = ob_get_contents();

   ob_end_clean();

   return $string;
   $string = NULL;
   $ch = NULL;
   $url = NULL;
}
$url = "http://theurlwewanttoget.com";
$content = get_content($url);

//Here's the hack. Break it into an array.... !

$content_x = explode("<item>", $content);

include('connect to the database.php');

//iterate through each item in the array. Its everything between <item> and </item>!
foreach ($content_x as $item)
{
#tag one is title....
$p1 = strpos($item, '<title>');
$p2 = strpos($item, '</title>');
$l1 = $p2 - $p1;
	$title = substr($item, $p1, $l1); /// pulls out everything between <title>and</title>
	$title = str_replace("<title>", "", $title); ///removes the remaining tag...
	$title = preg_replace( '/[^[:print:]]/', '', $title ); //removes non-printing characters... I needed this as the feed was a bit dirty.
///and so on with each item between <item> and </item>
$q1 = strpos($item, '<description>');
$q2 = strpos($item, '</description>');
$m1 = $q2 - $q1;
	$description = substr($item, $q1, $m1);
	$description = str_replace("<description>", "", $description);
	$description = preg_replace( '/[^[:print:]]/', '', $description );

$q1 = strpos($item, '<body>');
$q2 = strpos($item, '</body>');
$m1 = $q2 - $q1;
	$body = substr($item, $q1, $m1);
	$body = str_replace("<body>", "", $body);
	$body = preg_replace( '/[^[:print:]]/', '', $body );

$q1 = strpos($item, '<link>');
$q2 = strpos($item, '</link>');
$m1 = $q2 - $q1;
	$link = substr($item, $q1, $m1);
	$link = str_replace("<link>", "", $link);

$date = date('Y-m-d H:i:s', (time()+ (5*60*60)));
///I got the retrieval date. Later once in the database I can parse the date data to get a common format... 

$trigger = 0; //$trigger defines exclusion criteria for *not* including the information. Like its not properly formatted....
if (empty($title)) {$trigger =1; echo '<b>IN title</b>';}
if (empty($link)) {$trigger =1; echo '<b>IN link</b>';}
if (empty($description)) {$trigger =1; echo '<b>In descrip</b>';}
if (empty($body)) {$trigger =1; echo '<b>In body</b>';}

$chkq = "SELECT * FROM `News_Archive` WHERE `Link`='".$link."'"; #echo $chkq;
$chkr = mysql_query($chkq);

if (mysql_num_rows($chkr) > 0) {$trigger = 1; } ///....or we've already collected it. You can also match against content or title, if necessary.... 
 


if (empty($trigger))  ///the data is properly formatted and we haven't already collected it.... 
{
$q = "INSERT INTO `News_Archive` (`Title`, `Link`, `Summary`, `Date`, `Source`, `SourceURL`, `Article`)
VALUES
('".addslashes(trim($title))."', '".trim($link)."', '".addslashes(trim($description))."', '".$date."', 'The Source', 'http://www.url.org/', '".addslashes(trim($body))."')";


///insert it into your table....
mysql_query($q);
}


}
///close the foreach loop
mysql_close(); //and close the database... 
?>

I'm into dating sites. This code runs on my black dating site and my filipino dating site. Hope the above code helps someone - who needs a simple solution to what can be very complex!

Alex.