Schamper: our student newspaper on Drupal

Schamper is the independent student newsmagazine/newspaper of the university of Ghent, the largest university in Flandres, Belgium. We've recently redesigned our website and made the switch from Textpattern to Drupal. The switch enabled us to turn our website from a mere repository of stories into the central hub for a web-to-print workflow and into a general solution for our content management - most of the interesting stuff is actually going on behind the scenes.

Making the transit from Textpattern to Drupal

When I joined the newspaper four years ago the website didn't have any content management, everything was put online directly in HTML. We wanted a CMS with the dual purpose of making online publishing faster, but also to rethink the horrible workflow of our print edition. Articles used to be delivered by mail in a variety of formats, without uniform markup, and proofreading was done on print-outs that were scattered around the newsroom. This was a burden, especially on the DTP-people.

We started out with Textpattern - a simple and elegant system - that we then modified to support a web-to-print workflow. The time needed to produce the paper and update the website was cut almost in half. After using Textpattern for two years, however, it became clear to us that we needed something more sophisticated if we wanted to grow. That concern became even greater when we found out that Textpattern 4 was actually a renamed Textpattern 1, and that new development was proceeding agonizingly slow. So in the summer of 2007 we switched to Drupal.

Two things impressed me immediately. One was the exceptional handling of taxonomies in Drupal. The other was the amazing system of hooks and api's that makes creating modules so easy. Textpattern also had a plugin system, but anything complex required hacking the distribution. I converted all my hacks into Drupal modules in a few days. "Pro Drupal Development" by VanDyck and Westgate did ease the learning process.

Data migration wasn't really an issue. We don't have a big archive, and we'd used few Textpattern-specific features, so writing a few SQL select-and-insert statements was a breeze. However, it did get very messy because of charset problems. The formatting in Textpattern was UTF8, but the database and tables were in latin1, so the data was actually stored as latin1 but could only be read if it was interpreted as UTF8. Textpattern did this correctly, but the SQL exports didn't. This took hours and hours to fix. In the end, I managed to get everything into "real" UTF8 by first converting all special entities to HTML entities and then converting those to UTF8. Whew.

We did continue to use the Textile markup language that comes bundled with Textpattern but also exists as a module for Drupal, simply because it's so easy to use and more flexible than Markdown. We also imported the PDFs of our print edition, and they are searchable thanks to the search_attachments module.

Content Management

The Content Construction Kit (CCK) is very, very handy. It allowed us to radically rethink our content management. We don't just use Drupal for our website, we've also added an internal discussion forum and a lot of documentation, lists of subscribers and advertisers, manuals and reports (which are automatically mailed to all our reporters with the help of Actions). A student newspaper typically loses about a fourth of its workforce every year as students graduate, and ours is no different. That makes effective organisation pretty hard, and a lot of expertise is lost forever when somebody leaves the paper. By having all available information in a central repository and encouraging people to write how-to's, checklists and so on, we've been able to alleviate that situation.

Starting now, we'll also use Drupal to coordinate our photography. This is possible because we have a dedicated server that is located in our newsroom, and is connected to our local network and to a super-speedy university line, so our layout staff doesn't have to download anything because it's all on the local network. Pretty much everything that can be centralized has been centralized.

Integration with InDesign

I've hinted at our web-to-print workflow and will elaborate a bit on that. From the start, our back-end has been more important to us than the front-end website. (That's why our new front-end design lags half a year behind our switch to Drupal itself.) So the first thing I did when we switched to Drupal, was convert my homebrewn automatic xml export code into a module. Every time an article is saved, it updates or creates an xml file on our server with all the articles in it for the edition that article belongs to (which it grabs from the taxonomy). I'll try to make it available as a contributed module this summer.

Our DTP-software is InDesign. We switched over from Quark the moment that InDesign 1 was available. We're currently using CS3, and I must say: it's a great piece of software and the team behind it knows what it's doing and they really listen to users' needs. Anyway, InDesign has advanced XML support, and gives you the possibility to "link" to the XML instead of doing a one-time import. That means that when the XML is updated (because editors have proofread an article) the content that is put on the page in InDesign instantly refreshes as well. So the layout team doesn't have to wait on our proofreaders anymore, but can start working whenever a rough version of an article is available.

This kind of workflow is really worth a look for small papers and magazines that don't require the fancy functions of expensive publishing systems. It's free, it brings enormous timesavings, and people can do more without having to be physically in the newsroom.

Technical details

There is one XML file for each edition, and it contains all the articles for that edition. Upon every save in Drupal the "xmlize" module is activated (with the nodeapi hook, I think). That module grabs the edition number (it's part of the taxonomy) and then selects all of the nodes of that edition in the database. This is with a custom SQL statement tailored to provide the kind of content we want in the XML. The data is then put into XML with a pretty generic sql-to-xml script and saved away to a directory on our server. The messy part about this is that (a) you have to know SQL and how drupal stores things in the database to adapt it to your own situation and (b) it imports the raw data, so the Textile markup filter has to be applied again to turn the body field into valid XHTML. There is probably more future in an approach that is based off Views and a minimalistic xml-makeup of the contents of that view, but that's still mostly talk at the moment.

Various

A few things worth mentioning.

Author handling

Basic author handling in Drupal is not geared toward publications at all. There is no support for multiple authors, it displays usernames instead of full names, and it doesn't support guest authors. Thankfully all of that was pretty easy to fix.

We made a user "Press photographer" and another user "Guest writer". We also added a CCK field "byline" so that editors can override the author information without having to register a new user every time. The theme works so that the user information is ignored every time a byline is filled in. This has the advantage that articles where a user wants another name displayed (e.g. for comical effect in a satire article) are still linked to them and part of their portfolio, and that pieces by guest writers display the name of that writer instead of just "Guest Writer".

We then added a multivalue userreference CCK field to the story to accomodate multiple authors. This is themed so that all authors get equal credit. Somebody surfing the site can't see the difference between the "real" author in Drupal and the additional authors.

We also used a template.php codesnippet to display users' full names everywhere instead of usernames.

Image handling

I decided to use the standard Image functionality instead of imagecache/CCK for two reasons. One: I adopted the attitude not to use a custom module when something very similar is available in core, because of concerns about upgradability. Less modules means less to worry about. Two: images as nodes easily allow commenting, ratings et cetera. Because our images are nodes, the aforementioned handling of authors applies to images too.

(The frontpage contains a lot of photos. The beautiful information.dk website inspired us in this regard.)

Comment display

As you can see on our frontpage under "Commentaar", we've grouped recent comments per article, and display the title of the article and the last reaction to it.

This was actually not that easy to do. Views' "node: distinct" filter didn't work to filter out multiple recent comments to the same article, so I had to cook something up myself. The View now loads about 30 comments and then a .tpl for that view picks out the last comment on each article (a simple PHP loop and conditional) until it has found seven distinct nodes. This is a bit wasteful, and will probably be replaced by a bit of custom SQL sometime soon.

mailing lists

Our sysadmin has set up Postfix to make mailinglists by querying the user profile category 'Mail'. Users can select the lists they want to be part of on their profile page, and the administrator (me) can add users to lists that have restricted access. I can also add new mailinglists directly in Drupal.

Design

Because our student newspaper is actually more of a newsmagazine, we wanted a look that is fresh like the site for a magazine, but with a lot of information clearly and efficiently placed like on newspaper sites. It also couldn't look too serious, because, well, we're not too serious ourselves.

We recently started doing daily news, and made the somewhat bold move - from our perspective - to put that daily news at the top of our frontpage, the user-contributed stuff (comments, feeds) below that, and the content that is available offline (i.e. in our print edition) at the bottom of the page. Editors tend to think of our print content as more prestigious, but there really is no point in giving the content that is already available in our print edition prime coverage on the website as well.

Panels 2 came in very handy, although the complex organisation of the frontpage proved to be a mess to theme for Internet Explorer. As usual. It's still not perfect in IE, but for the moment I've had enough of it's quirks to do additional finetuning.

Challenges and lessons learned

Make sure your OS, php, mysql, the drupal database and all of its fields have UTF8 encoding. Really.

Selecting modules takes a lot of research. There are quite a few unstable modules out there, modules that are no longer actively developed and so on. It is often best to just skim the lists instead of performing a search on drupal.org because it's nearly impossible to guess how the functionality you want will be described and titled. The upside to this approach is that it is sometimes possible to find a module with some interesting features you weren't even looking for.

Another challenge was that I had to do just about everything myself, because we have no real programmers or webdesigners at our newspaper, and hiring external help was prohibitively expensive for our limited budgets. However, knowing a little PHP and mysql goes a long way. Producing the site took a lot of time and effort, but most of the coding that I had to do was relatively simple. If you're looking to start a news site without any experience with coding and with no budget to hire someone, you'll do good to get some basic PHP under your belt. (To be fair: I've been messing with PHP since I was a kid, so don't fool yourself into thinking you'll get up to speed in a day. It won't take years but it may take weeks or months if you're not familiar with scripting.)

Another thing I've learned is that it is worthwhile to take as much or more time thinking about what you want, making sketches (on paper, not in photoshop) and so on, before you start theming and building your site. It allows you to focus on the essentials and not the implementation.

Addendum: partial list of modules used

A few custom ones

  • xmlize: xml export
  • columnize: splits the node entry form in three parts and allows you to put fields in any of those three columns; after I wrote it I found out there is actually another module generally available that gives you complete control over the entry form layout, although I can't recall its name at the moment)
  • Smart Tags for Textile: textile allows you to give classes to a paragraph, but we only want to allow a few like 'introduction', 'quote' and the like. This module converts all classes to the predefined allowed ones based on similarity. That means it also allows shortcut-tags such as 'intro' for introduction. This one is actually perfectly stable, but I haven't put it on d.o. yet because so few Drupalistas seem to use Textile, let alone have use for this specific functionality.
  • HideAuthor: to hide the author when the byline field is filled in

Contributed modules

(not all of them, a selection)

  • CCK and Views. Heavily. More than most people, I think, because we use Drupal for general content management and not only to publish articles, so that requires a whole bunch of CCK types and Views for e.g. the phonebook and so on. Also, the userreference and nodereference submodules of CCK, to refer to additional authors and to link to sub-articles, respectively.
  • Taxonomy Multi Editor: does what the title says, this was especially when we just switched over, so we could tag articles en masse.
  • Formfilter: dumbs down the content entry form (you can select which fields it shouldn't display, e.g. some superfluous log fields and the like that are only interesting for me as an administrator)
  • Contact: basic contact form. I've found that people contact us more this year because they can do it right on the site, and don't need to start up their mail application, so that's why I mention it.
  • Image_attach for image display. Downside is that it only allows a single image per article.
  • Actions, for automatic mailings
  • Checkout: Enables users to lock documents for modification. This is crucial so that proofreaders don't step on each other's toes.
  • Comment Closer
  • Masquerade: man, this thing is great
  • search_attachment to search through pdf and word files.
  • Taxonomy Theme to give more items the admin theme (Garland)
  • Pageroute to be able to rapidly enter multiple nodes.
  • Panels 2 beta
  • Captcha for spam control
  • Tagadelic for tag clouds
  • Fivestar for voting
  • Workflow for, well, workflows ("draft", ..., "second check", ..., "ready for publication")
 
 

Drupal is a registered trademark of Dries Buytaert.