Quickstart Guide, just give me enough to get started!
OK, install the module and paste the following examples somewhere in your own module so you can test it out, and you are ready to roll. The examples lead up to a real-world test on realestate.com. It will look up properties for zip code 80439 and display the featured property (including its HTML), and then it will also list ALL of the properties. This demonstrates its ability to parse out HTML in just a few simple lines. Very cool stuff. The use of this API truly is limitless.

There are going to be THREE example branches here: one for plain text, one for XML, and one for a real-world scraping of a website. The first example set doesn't use the FALSE argument, so ALL results are run through the check_plain() function. This is the default behavior of our API. But in many cases (such as the XML examples), running everything through check_plain() simply isn't feasible. So we add FALSE to the end of the arguments to disable the check_plain() calls. This method is used for the XML examples as well as the real-world HTML example.

Plain Text Examples
Regular text example (this stuff is all check_plain()-able; for stuff that is NOT plain text, please see the XML examples below the text examples)

This one is a very practical use that could almost make the Drupal arg() function obsolete. Try this on for size. Let's say you were using arg(1) to grab an argument from the URL http://yoursite.com/page/1. Pretty simple, and it works, right? Well, now throw a wrench in there: change your URL to http://yoursite.com/new/page/1 and watch your code sadly break.

//Old code, which is now broken because of the new link
$url = 'http://yoursite.com/new/page/1';
$oldpage = arg(1);  //Would output 'page' now instead of '1' like it used to.
$newpage = after('page/', $url); //This still produces '1' no matter how much you stuff into the url before 'page/'.
//So there you have it.  An example of how the Parsing API can directly affect how you do things in Drupal itself.
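
For anyone curious what is going on under the hood, here is a minimal sketch of the idea in plain PHP. The helper name sketch_after() is made up for this illustration and is not the module's code; the module's own after() may treat missing needles and check_plain() differently.

```php
<?php
// Hypothetical stand-in for the module's after(); NOT the module's code.
// Returns everything that follows the first occurrence of $needle.
function sketch_after($needle, $haystack) {
  $pos = strpos($haystack, $needle);
  if ($pos === FALSE) {
    return NULL; // Assumption: report a missing needle as NULL.
  }
  return substr($haystack, $pos + strlen($needle));
}

$old_url = 'http://yoursite.com/page/1';
$new_url = 'http://yoursite.com/new/page/1';

// Both URLs give the same page number, because the search is anchored
// on the text 'page/' rather than on the position of a path segment.
echo sketch_after('page/', $old_url), "\n";
echo sketch_after('page/', $new_url), "\n";
```

The point is simply that searching for a marker string survives changes in how many path segments come before it, which a positional lookup like arg(1) does not.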
//This is the string we will use in all the Text examples
$test_string = "To: Tove
From: Jani
Heading: Reminder
Task: task1
Task: task2
Task: task3
Body: Don't forget me this weekend!";

//Between example.  The raw result of between() ends with \n because the API functions do not trim anything, as trimming could produce undesired results.  I leave that up to the developer.  \n = line break/new line.  Here trim() is used to tidy up the result.
$to = trim(between('To: ', 'From:', $test_string), "\n");
//Output of between() before trim():
//Tove\n
//Output after trim():
//Tove

//After example.
$after_example = after('From', $test_string);
//Output:
//: Jani\n
//Heading: Reminder\n
//Task: task1\n
//Task: task2\n
//Task: task3\n
//Body: Don't forget me this weekend!

//Before example.
$before_example = before('Jani', $test_string);
//Output:
//To: Tove\n
//From:<space>

//The following examples are a little complicated, but hopefully you can follow them.  We want to parse out all 3 tasks, but since our API functions strip out the 2 needle strings, we need to add them back by appending them to the variable as shown below.  This way we can use the first and last tasks to pick out the elusive middle task item.  If these were dynamic tasks, the power of this capability becomes very apparent, because you can't simply search for the word "task2" when it might be "mow the lawn" or "do the dishes".  So you can't rely on its value.  Thus we do some fancy footwork using the snazzy _last() functions and some clever coding.  There are actually SEVERAL ways to do this in PHP, but this is an example, not a best-practice tutorial ;)

//Between_Last example.  Grab the last task item
$last_task = "Task: ".between_last('Task: ', "\n", $test_string)."\n";
//Output:
//Task: task3\n

//Grab the First task item
$first_task = "Task: ".between('Task: ', "\n", $test_string)."\n";
//Output:
//Task: task1\n

//Show the middle task item
$middle_task = between($first_task, $last_task, $test_string);
//Output:
//Task: task2\n
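
If you want to see the first/last/middle trick run without the module installed, here is a rough sketch in plain PHP. The helpers sketch_between() and sketch_between_last() are hypothetical stand-ins written for this page; they assume "first occurrence" and "last occurrence of the opening needle" semantics and skip check_plain() entirely, so the real functions may differ.

```php
<?php
// Hypothetical stand-ins; NOT the module's code.

// Text between the first $start and the next $end after it.
function sketch_between($start, $end, $haystack) {
  $from = strpos($haystack, $start);
  if ($from === FALSE) {
    return NULL;
  }
  $from += strlen($start);
  $to = strpos($haystack, $end, $from);
  if ($to === FALSE) {
    return NULL;
  }
  return substr($haystack, $from, $to - $from);
}

// Same idea, but starting from the LAST occurrence of $start.
function sketch_between_last($start, $end, $haystack) {
  $from = strrpos($haystack, $start);
  if ($from === FALSE) {
    return NULL;
  }
  $from += strlen($start);
  $to = strpos($haystack, $end, $from);
  if ($to === FALSE) {
    return NULL;
  }
  return substr($haystack, $from, $to - $from);
}

$test_string = "To: Tove\nFrom: Jani\nHeading: Reminder\n"
  . "Task: task1\nTask: task2\nTask: task3\n"
  . "Body: Don't forget me this weekend!";

// Re-attach the needles that the helpers cut away.
$first_task = "Task: " . sketch_between('Task: ', "\n", $test_string) . "\n";
$last_task = "Task: " . sketch_between_last('Task: ', "\n", $test_string) . "\n";

// Whatever sits between the first and last task is the middle one.
$middle_task = sketch_between($first_task, $last_task, $test_string);
echo $middle_task;
```

Note that the middle result depends on there being exactly three tasks; with more tasks the same call would return all of the tasks between the first and the last.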

XML Examples
XML is a good, easy-to-understand example of NON-check_plain()-able text. So I've elected to parse out some XML for the simple examples, because check_plain() doesn't like XML. If you were to attempt to run any of these without FALSE for the check_plain argument, it would always return NULL.
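
To get a feel for why markup and check_plain() don't mix, here is a small sketch. It assumes Drupal 7 behavior, where check_plain() is a thin wrapper around htmlspecialchars(); sketch_check_plain() is a stand-in name so the snippet runs outside Drupal. How the module itself arrives at NULL is not shown here; this only illustrates what escaping does to a tag you might search for.

```php
<?php
// Stand-in for Drupal 7's check_plain(), which wraps htmlspecialchars().
// The function name is made up so this runs without Drupal.
function sketch_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$xml = '<to>Tove</to>';
$escaped = sketch_check_plain($xml);

// The angle brackets become entities...
echo $escaped, "\n"; // &lt;to&gt;Tove&lt;/to&gt;

// ...so a literal '<to>' can no longer be found in the escaped text.
var_dump(strpos($escaped, '<to>')); // bool(false)
```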

$test_string = "<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<task>task1</task>
<task>task2</task>
<task>task3</task>
<body>Don't forget me this weekend!</body>
</note>
";

//Between example.
$to = between('<to>', '</to>', $test_string, FALSE);
//Output:
//Tove

//Between example WITHOUT check_plain disabled.  It shows that the basic security of our function is working properly.
$to = between('<to>', '</to>', $test_string);
//Output:
//NULL

//After example.
$after_note_tag = after('<note>', $test_string, FALSE);
//Output:
//\n
//<to>Tove</to>\n
//<from>Jani</from>\n
//<heading>Reminder</heading>\n
//<task>task1</task>\n
//<task>task2</task>\n
//<task>task3</task>\n
//<body>Don't forget me this weekend!</body>\n
//</note>\n

//Before example.
$before_to_tag = before('<to>', $test_string, FALSE);
//Output:
//<note>\n

//Between_Last example.  Grab the contents of the last <task> element.
$last_task = between_last('<task>', '</task>', $test_string, FALSE);
//Output:
//task3

//Multi_Between example.  This function is probably the most useful of all the API functions, and it makes the earlier example of getting the middle task REALLY simple.  Like, 1 line simple.  How can you possibly beat that?!  Multi_Between() returns every match between the two needles as an array, so the middle task is just element [1].  Entire parsing engines could be written with this API.  Literally.  So try it out, you'll fall in love with it.
$all_tasks = multi_between('<task>', '</task>', $test_string, FALSE);
//Output
//Array ( [0] => task1 [1] => task2 [2] => task3 )
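
Here is a rough sketch of how a "collect every match" function like this could work in plain PHP. The name sketch_multi_between() is hypothetical and this is not the module's implementation; it just walks the string with strpos() and skips check_plain() altogether.

```php
<?php
// Hypothetical stand-in for multi_between(); NOT the module's code.
// Collects every piece of text found between $start and $end.
function sketch_multi_between($start, $end, $haystack) {
  $results = array();
  if ($start === '' || $end === '') {
    return $results; // Empty needles would never advance the search.
  }
  $offset = 0;
  while (($from = strpos($haystack, $start, $offset)) !== FALSE) {
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    if ($to === FALSE) {
      break; // An opening needle with no closing needle ends the scan.
    }
    $results[] = substr($haystack, $from, $to - $from);
    $offset = $to + strlen($end); // Continue after this match.
  }
  return $results;
}

$test_string = "<note>\n<to>Tove</to>\n<task>task1</task>\n"
  . "<task>task2</task>\n<task>task3</task>\n</note>\n";

$all_tasks = sketch_multi_between('<task>', '</task>', $test_string);
print_r($all_tasks);

// The middle task is just the second element.
echo $all_tasks[1], "\n";
```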

Real World Example
Now for a real-world advanced example! Let's put this stuff to work. Let's say a client of ours has asked if we could scrape just the 'featured property' for a specific search result on realestate.com. Normally this would be a real headache. But with our new snazzy parsing module this is truly a breeze, and I will prove it with a working example ;). Since we have done our HTML homework, we know that 'RealEstate.com Featured Listing</tpl></div>' is the unique beginning of the featured listing, and the end of it can be uniquely identified by 4 closing div tags followed by the start of the next listing. Since between() (like all of these functions) cuts out the needle strings, we need to re-append 2 closing div tags so that our HTML is closed properly. That is why I've added '</div></div>' onto the end. It gives us 2 nice div blocks that we can do with what we please.

So this 1 line does some pretty cool stuff. It looks at realestate.com for zip code 80439 and snags just the featured listing that is currently being displayed on the page. I have no idea what this is useful for since I am not in real estate, but it's an excellent example of how to use the between() function on a very advanced website that would otherwise be a pain in the tail to parse out using just PHP. Now that we have the whole thing in a nice confined little variable, we could continue to break it down even further, strip out all the HTML if we wanted to, and then start placing each of the elements of the featured item in a database. I won't get into that, since it would turn into a full-fledged application rather than just an example, but with a little work you could do it if you like.

$realestate_com = file_get_contents("http://www.realestate.com/80439/homes-for-sale.aspx?listingtypelist=4,2,3,1");

//We use FALSE as the last argument for the between() function because we need to disable the use of check_plain().  Otherwise our resulting HTML will be converted to HTML entities rather than staying plain HTML.  Most of the time you probably won't need to do this, but when working with HTML it is almost always required.
$featured_listing = between('RealEstate.com Featured Listing</tpl></div>', '</div></div></div></div><div id="re-listing-', $realestate_com, FALSE).'</div></div>';

//We've got our listing, now let's parse out the phone number so we can put it in the database using the queryable variables module (another module I use ;)  )

echo $phone = between('re-listing-phone">', '</div>', $featured_listing, FALSE);
queryable_variables_set('featured_listing_phone', $phone);  //Replace this line with whatever DB saving method you choose, db_query(), variable_set(), queryable_variables_set(), etc.
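
One thing the example above glosses over: file_get_contents() returns FALSE when the fetch fails, and a site can change its markup at any time. Below is a hedged sketch of the same steps with those cases guarded. Everything in it is an assumption made for illustration: sketch_between() and sketch_featured_phone() are made-up helpers (not the module's code), and $fixture is an invented snippet that only imitates the markers used above, not the real page.

```php
<?php
// Hypothetical stand-in for between(); NOT the module's code.
function sketch_between($start, $end, $haystack) {
  $from = strpos($haystack, $start);
  if ($from === FALSE) {
    return NULL;
  }
  $from += strlen($start);
  $to = strpos($haystack, $end, $from);
  if ($to === FALSE) {
    return NULL;
  }
  return substr($haystack, $from, $to - $from);
}

// Returns the featured listing's phone number, or NULL when the page
// could not be fetched or the expected markers are missing.
function sketch_featured_phone($html) {
  if ($html === FALSE) {
    return NULL; // file_get_contents() failed.
  }
  $listing = sketch_between(
    'RealEstate.com Featured Listing</tpl></div>',
    '</div></div></div></div><div id="re-listing-',
    $html
  );
  if ($listing === NULL) {
    return NULL; // Markup changed or no featured listing on the page.
  }
  $listing .= '</div></div>'; // Re-append the two closing tags.
  return sketch_between('re-listing-phone">', '</div>', $listing);
}

// Invented fixture that imitates the markers; not the real page markup.
$fixture = '<div><div><tpl>RealEstate.com Featured Listing</tpl></div>'
  . '<div class="a"><div class="b">'
  . '<div class="re-listing-phone">555-0100</div>'
  . '<div class="price">n/a</div>'
  . '</div></div></div></div>'
  . '<div id="re-listing-2">next listing</div>';

var_dump(sketch_featured_phone($fixture));
var_dump(sketch_featured_phone(FALSE));
```

Checking for NULL before saving anything means a failed fetch or a redesigned page does not quietly write an empty value to your database.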

Comments

rreck’s picture

"Ok, install the module"

-which module?

aaronmfisher86’s picture