Wrapping External Content - What's The Best Way?
In short, I'm looking for the best way to include some externally generated content into a drupal-based website and I haven't been able to find any specific help in the drupal doco.
In more detail, I'm assisting a sporting organisation with making their results, tables, draws, etc. available on their drupal-based website (which is already doing a fine job of news, etc). The raw data for the results, tables, etc. comes from a separate program which exports in CSV format. So I'm going to write a script to convert that raw data into a set of pages.
Now I've done this in the past for a basic html website so I'm ok with this aspect of it. What I want to know is what is the best way to do this for a drupal-based website. I only have two main requirements:
1) We want the pages to have the same layout and menus as the rest of the website. So we'd like to incorporate the default drupal theme in some way.
2) We want the URLs for the pages to be memorable, eg. "http://oursite.com/2006/results/division1". This is because many clubs and teams will link to these pages and they'll struggle with things like "http://oursite.com/node/817".
Other than that we're very flexible. So what's the best way to go about this? Can I generate straight HTML pages and wrap them somehow? Or should I generate PHP pages? Or what?
All suggestions welcome. I'll report back on the winning solution.

Addendum
Just to clarify, I can convert the data from CSV tables into HTML tables. I'm wondering how to get those HTML tables to appear on the website surrounded by the usual header, menu, footer, etc. from the drupal theme.
Some thoughts
There is more than one way to approach this; here are a couple.
Both involve writing a module. While modules are often associated with node-based content, they can also be used to present data from other sources. As part of the module, you determine the path that will be used to call this code. In both cases below a function will read the data (filtering if needed), format the data and then output the page (details differ for 4.6 and 4.7).
The main difference of the two solutions is where the data comes from.
You can read the data directly from the file each time you need to display it. The only real plus is that it is one step: the menu hook defines the path and the callback does the work (though you probably want a settings page to define the file location). The drawbacks are that you probably need to read all the data to figure out what to display and filter out the data you don't need. You also lose any data caching a database might do, and supporting things like paging (for long output) or a table the user can sort becomes more work.
You can also split the work into 1) importing the data and storing it in tables and 2) retrieving the data and displaying it. For any sizeable data set this will give users a faster response, and paths like results/2006/division1 can easily be mapped to a callback that queries the database using 2006 and division1 to limit the data queried. By getting the data from the database you can use the built-in support for paging data and the support for table layouts that allow sorting.
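As a rough illustration of mapping a path like results/2006/division1 to the values a query would filter on, here is a minimal sketch in plain PHP. The function name results_parse_path() and the rule that the year is four digits are assumptions made for this example; inside a real module you would more likely read the path pieces through Drupal itself.

```php
<?php
// Hypothetical helper (not part of Drupal): split a path such as
// "results/2006/division1" into the values a database query would use.
function results_parse_path($path) {
  $parts = explode('/', trim($path, '/'));
  // Expect exactly: results / <4-digit year> / <division name>
  if (count($parts) != 3 || $parts[0] !== 'results') {
    return FALSE;
  }
  if (!preg_match('/^\d{4}\z/', $parts[1])) {
    return FALSE;
  }
  if (!preg_match('/^[A-Za-z0-9_-]+\z/', $parts[2])) {
    return FALSE;
  }
  return array('year' => (int) $parts[1], 'division' => $parts[2]);
}
?>
```

Anything that does not match the expected shape returns FALSE, so the callback can show "page not found" instead of running a query with unchecked input.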
Re: Some thoughts
Hi Steve,
Thanks for the very prompt reply. I forgot to mention I don't know any PHP. How hard would it be for a beginner to write this script? Can you give me a link to a starting point?
I guess I was imagining I would convert (using perl or ruby) the CSV for all clubs/teams into a simple HTML table for each club/team and store each as a file, presumably in a directory structure reflecting the URLs I'd like to have. (This conversion would probably take place every week.) I was hoping drupal could then magically wrap these files in a theme and present them. Each page would be quite short so paging or caching would not be an issue.
Is this even possible? Could I adapt your first suggestion and have the module read from a set of files?
Your second suggestion sounds quite complex (to me anyway). I guess I could add the data to the database, but I'd prefer not to mess with stuff like this. I'm after a fairly simple solution (assuming one exists).
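For reference, the weekly CSV-to-table conversion described above could look something like the sketch below. It is written in PHP only to match the other code in this thread (the post mentions Perl or Ruby, which would work just as well). The function name results_csv_to_table() is made up for this example, and it assumes the first CSV row holds the column headings and that no field contains a line break.

```php
<?php
// Hypothetical converter: turn CSV text into a partial HTML table
// (no <html> or <body> tags), ready to be wrapped by the site theme.
// Assumes the first row is the header row and fields have no newlines.
function results_csv_to_table($csv) {
  $html = "<table>\n";
  $lines = preg_split('/\r\n|\r|\n/', trim($csv));
  foreach ($lines as $i => $line) {
    // The first row becomes header cells, the rest become data cells.
    $tag = ($i == 0) ? 'th' : 'td';
    $html .= '<tr>';
    foreach (str_getcsv($line, ',', '"', '\\') as $cell) {
      // Escape the cell so characters like & and < display correctly.
      $html .= "<$tag>" . htmlspecialchars((string) $cell) . "</$tag>";
    }
    $html .= "</tr>\n";
  }
  return $html . "</table>\n";
}
?>
```

The output is deliberately a bare table, which is the kind of partial file a wrapper module could read and hand to the theme.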
Writing A Module
OK: I've found the tutorial on writing a module:
http://drupal.org/node/17914
You could do it the way you suggest
You could do it the way you suggest, making each page a partial HTML file. I say partial because you only want the part that represents the table (no body tags, and no tags that would normally sit outside the body tags).
Then your module would need (at least) three pieces.
A settings hook to set the "base" of the directory structure.
An administrative piece to process the directory structure (whenever it is updated) and register all the paths.
A function that is called when one of the paths is visited; it opens and reads the file and then (not so magically) wraps it in the page layout. Assuming $output contains the contents of the file, in 4.6 you would print theme('page', $output); in 4.7 you would simply "return $output;" (OK, maybe it is magical).
Now since I know PHP and MySQL, my second approach would be faster for me. With your approach the real work comes in setting up the path aliases. You could simplify it by having your conversion script record all the generated file paths in a file. Then step 2 could read that file and generate the paths needed.
There is information in the handbooks that would be useful, but the examples tend to lean toward modules that generate content based on nodes.
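To make the "record the generated paths in a file" idea a little more concrete, here is a small sketch of the reading side. The manifest format (one relative file path per line) and the function name results_manifest_paths() are assumptions for illustration only; how the resulting paths get registered with Drupal is left out.

```php
<?php
// Hypothetical manifest format: one generated file path per line,
// relative to the results base directory, e.g. "2006/division1".
// Returns the site paths that would need to be registered.
function results_manifest_paths($manifest_text) {
  $paths = array();
  foreach (preg_split('/\r\n|\r|\n/', $manifest_text) as $line) {
    $line = trim($line);
    // Skip blank lines and anything that tries to climb out of the base directory.
    if ($line === '' || strpos($line, '..') !== FALSE) {
      continue;
    }
    $paths[] = 'results/' . ltrim($line, '/');
  }
  return $paths;
}
?>
```

The conversion script would write the manifest each week, and the administrative step would pass its contents to this function.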
Re: You could do it the way you suggest
I've started reading the module HOWTO so I'm getting a feel for what you're talking about.
I agree it would be partial html files. I understand the settings.
Where I'm lost is registering the paths / aliases and why I need them. For example, does the node module have to register all the nodes as paths? I was imagining that
http://mysite.com/node/8564
was intercepted by the node module and it received the 8564 as an argument (and then loaded node 8564 from the DB). How does drupal work in this area? I was hoping that, for example, my results module would be called by links such as
http://mysite.com/results/2006/club
and would get 2006/club as an argument and load the corresponding file. Am I on the right track?
Right track
You register your module (using hook_menu() ) to own the '/results/' path, pass the other arguments through to your modulename_page() render function (or whatever) and do your magic from there.
I've done so for a wrapper for both local static (un-imported) files and remote requests. Sometimes via a screen-scraper sorta algorithm when I didn't have the time or ability to re-write the sources.
.dan.
http://www.coders.co.nz/
Re: Right Track
I've been trying to go down this track. I've created a simple module using the example hooks from the tutorial. I've set up a hook to own the '/results/' path. However whenever I browse to such a path I get an "access denied" message. So I'm getting there, because it's not a "page not found" error, but it's still not right. I can't tell if my render function is being called or whether it's blocked beforehand.
I don't suppose you can post the solution you had working? Or a trimmed version that doesn't give away any IP. Thanks for any tips.
Eg
Did you set 'access' properly? Try setting it to true to test eg.
<?php
// In hook_menu
if ($may_cache) {
  // Adds UID to kill_file by URL 'kill_file/add/[uid]'
  $items[] = array(
    'path' => 'kill_file/add',
    'title' => t('Add to killfile'),
    'callback' => '_kill_file_add',
    'access' => true,
    'type' => MENU_CALLBACK);
  // Redo this in one call_back taking an argument 'remove' or 'add'
}

// Call_back: receives the trailing [uid] part of the URL as $uid
function _kill_file_add($uid) {
  // ... (body omitted in this excerpt)
}
?>
edited to add: get yourself the devel module to allow you to quickly clear the cache.
Weeellll...
It was actually done for a client, but they ended up not using it.
And it seems I don't have the nice, final version on-hand either.
I've posted a very very early proof-of-concept up at
http://coders.co.nz/drupal_development/?q=node/107
You can look at that, but I've enhanced the real thing a lot.
Note also that the first time you develop a module and tweak the code to get the menus right, the menus supplied by your module are cached, so any change you make to them or their parameters is not reflected in the system until you go back to admin/modules and disable/re-enable the module ... or at least just press 'submit' on that page.
This was a killer when I first wondered why my new paths were not showing up.
More thoughts
Ok, so maybe making replies late at night is not the best idea. To simplify the path registration, register the single path results and have it used as results/2006/something (everything after results is the path to a file, including the filename).
Note: when adding or changing the menu hook, either visit admin/menu or disable and re-enable the module.
Here is a quick untested example to help you get going:
<?php
/**
 * Implementation of hook_perm().
 */
function results_perm() {
  return array('access results');
}

/**
 * Implementation of hook_menu().
 */
function results_menu($may_cache) {
  $items = array();
  $access = user_access('access results');
  if ($may_cache) {
    // This path expects the path to the file including the filename to read
    // So if called as http://www.example.com/index.php?q=results/2006/club or with clean urls http://www.example.com/results/2006/club
    // the module expects there is a directory called 2006 in the site root with a file called club
    // WARNING: results/2006/club can not be a valid file on the site or this will fail
    $items[] = array('path' => 'results', 'type' => MENU_CALLBACK, 'access' => $access, 'callback' => 'results_page');
  }
  return $items;
}

/**
 * Menu callback: read the requested file and wrap it in the site theme.
 */
function results_page() {
  // Build the file name from the path
  $path = $_GET['q'];
  // Take off the leading "results/" part (only at the start of the path)
  $filename = preg_replace('|^results/|', '', $path);
  // Open and read the file contents
  $content = file_get_contents($filename);
  // For 4.7 replace the next line with "return $content;"
  // Wrap the contents in the theme
  print theme('page', $content);
}
?>
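One thing the quick example does not do is check the requested path before opening a file, so a path containing ".." could reach files outside the results directory. A possible guard, sketched in plain PHP, is shown below. The function name results_file_for_path() and the rule that each path segment may only contain letters, digits, underscores and hyphens are assumptions for this example, not anything Drupal provides.

```php
<?php
// Hypothetical helper: map a request path like "results/2006/club" to a
// file under $base_dir. Returns FALSE unless the path starts with
// "results" and every following segment is a plain name.
function results_file_for_path($base_dir, $path) {
  $parts = explode('/', trim($path, '/'));
  if (array_shift($parts) !== 'results' || count($parts) == 0) {
    return FALSE;
  }
  foreach ($parts as $part) {
    // Rejects "..", empty segments, and anything with dots or slashes.
    if (!preg_match('/^[A-Za-z0-9_-]+\z/', $part)) {
      return FALSE;
    }
  }
  return rtrim($base_dir, '/') . '/' . implode('/', $parts);
}
?>
```

In the page callback, the result could be passed to file_get_contents() only when it is not FALSE and the file actually exists.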
Solved!
Thanks to everyone who made a contribution. I've managed to prototype a very simple solution. When I've polished it a bit I'll post it.