The Augusta Chronicle, the flagship newspaper of Morris Publishing Group, recently relaunched its website on the outstanding Drupal framework.

Morris first began using Drupal in 2005 with the launch of BlufftonToday.com, a blog-centric community website coupled with a free daily newspaper. In 2006 it adopted Drupal for both news and blogs at SavannahNow.com, the website of the Savannah Morning News. Both newspapers won Digital Edge awards for innovation in user participation.

Since then, the digital media arm of Morris Communications, Morris DigitalWorks, has developed a robust digital newspaper platform built on Drupal 6, to eventually power all 13 of its daily newspapers. Morris also uses Drupal for its radio stations and Skirt.com, a national specialty site for women.

Reader Participation

Morris has committed to making its online platform a dynamic arena for reader participation and contributions. Readers are encouraged to comment on stories and blogs and, on some papers, to create their own blogs on the site. Journalists are expected to post news online immediately and to interact with the public, and they need to be able to do it without learning HTML or tools such as FTP. These requirements made Drupal a natural choice.

Front Page and Section Fronts

The front page and section fronts (local news, sports, opinion, etc.) make extensive use of Panels and Views for layout. Stories are promoted based on a mixture of automatic sorts (by date/time and priority fields) and manual sorts (using Nodequeue). The views blocks make extensive use of arguments, limits and offsets, allowing a small number of views to be reused throughout the site.
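
As a rough illustration of how manual and automatic ordering can combine with limits and offsets, here is a minimal Python sketch. It is not the Views or Nodequeue code the site runs; the `Story` class, the `priority` field semantics (larger means more prominent) and the `section_front` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Story:
    nid: int            # node ID
    title: str
    priority: int       # hypothetical field: larger means more prominent
    created: datetime


def section_front(stories, queue, limit, offset=0):
    """Return one block's slice of a section front.

    Stories named in `queue` (a manually ordered list of node IDs, standing
    in for a Nodequeue) come first, in queue order. The rest follow, sorted
    by priority and then newest first.
    """
    by_nid = {s.nid: s for s in stories}
    manual = [by_nid[nid] for nid in queue if nid in by_nid]
    queued = set(queue)
    automatic = sorted(
        (s for s in stories if s.nid not in queued),
        key=lambda s: (s.priority, s.created),
        reverse=True,
    )
    ordered = manual + automatic
    # limit and offset let several blocks share one ordered list.
    return ordered[offset:offset + limit]
```

Calling the same function with a different offset gives the next block on the page its own slice, which is the kind of reuse that arguments, limits and offsets make possible.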

Some minor sections get a simpler treatment with taxonomy view pages that have received minor tweaks using Panels overrides.

Sources of News Stories

Most routine stories are created by staff writers using an internal database system designed for print production. Each night those stories are imported into Drupal through an XML feed. The loader process is custom code. It accommodates images as well as text.
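
The general shape of such a loader can be sketched in a few lines. This is an assumption-laden Python illustration, not the custom Drupal code itself: the `story`, `headline`, `body`, `section` and `image` tags, the `FIELD_MAP` table and the destination field names are all hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mapping from feed tags to destination field names.
FIELD_MAP = {
    "headline": "title",
    "body": "body",
    "section": "field_section",
}


def parse_stories(xml_text):
    """Turn a feed document into a list of plain dicts, one per story."""
    root = ET.fromstring(xml_text)
    stories = []
    for item in root.iter("story"):
        record = {"images": []}
        for tag, field in FIELD_MAP.items():
            node = item.find(tag)
            if node is not None and node.text:
                # Decode entities that were escaped twice upstream.
                record[field] = html.unescape(node.text.strip())
        # Collect image references so a later step can fetch the files.
        for image in item.iter("image"):
            src = image.get("src")
            if src:
                record["images"].append(src)
        stories.append(record)
    return stories
```

Each resulting dict would then be handed to whatever save routine the platform provides; fetching and storing the image files is left out of the sketch.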

Typically, breaking news items and story updates throughout the day are written directly in the Drupal UI. The Drupal system allows any reporter to post news without having to learn any technology skills.

In addition to Drupal's normal RSS output, the site provides iAtom feeds to the Associated Press and NewsML feeds to Yahoo for content enrichment and syndication.

Like most newspapers, parts of the Chronicle's website are hosted by specialty vendors on external servers. To manage the provision of theme-dependent headers and footers for those external pages, a Drupal "third party wrappers" module was created.

The Anatomy of an Editorial Node

There are more than two dozen content types in the system, but the most important is the "editorial node" — the not-so-simple news story that is at the core of the publishing platform.

The editorial node may contain a breaking news story, an opinion piece or just a single image or video. Taxonomy data and other factors are used in the theme layer to make decisions about how a particular editorial item is displayed.

The content type implements more than two dozen fields with the use of CCK and other modules. Publishing a story can be a simple task of completing just a few required fields (headline, body, section) or can be as elaborate as desired (multiple images and videos of varying sizes, breakout boxes, related-item lists, etc.).

Media: We made use of the Imagefield and Emfield modules to add photos and video to editorial content. Multiple images and video are grouped into a tabbed media box near the top of each story. A bit of custom code was used to add caption fields for each photo and video. Clicking on the media box opens large-format versions via Lightbox2. All necessary image sizes are created automatically through Imagecache — a luxury greatly appreciated by the journalists.

Related Content: News stories can be linked to other stories through a multi-value node reference field or by adding custom HTML content to a simple text field.

Authorship: Because the author of a published item may not always be a registered user of the site, a vocabulary of authors is used to contain byline information. This also supports multiple authors per story, a common practice in journalism. The theming layer formats the byline appropriately.
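
To make the byline formatting concrete, here is a small Python sketch of joining several author terms into one line. The `format_byline` function and the "By A, B and C" style are assumptions for illustration; the actual theme code and house style are not described here.

```python
def format_byline(authors):
    """Join author names into a byline such as "By A, B and C"."""
    names = [name.strip() for name in authors if name.strip()]
    if not names:
        return ""
    if len(names) == 1:
        return "By " + names[0]
    return "By " + ", ".join(names[:-1]) + " and " + names[-1]
```

Because the names come from taxonomy terms rather than user accounts, the same routine works whether or not an author is registered on the site.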

Topics Pages

Topics pages allow news, opinion, blog posts, Web links and other content types on a topic to be aggregated into a single page. A Topic content type allows for an editorially crafted summary of each topic, while all other content types are grouped into several river-of-news type formats. A great example for Augusta is the James Brown topic page.

Multi-user Blogs

One of the unique requirements for blogging was the ability for a single user to contribute to multiple blogs and for a blog to allow multiple contributors. For example, several sportswriters might collaborate on a football blog. One of them might blog separately about bowling.

To accomplish this, we created a custom module based loosely on Organic Groups that defines a blog container node and a blog post node. A single user can create multiple blogs (containers) and add blog posts to any of the blogs to which he has access. Integration with Views allows lists filtered by blogs and contributors.
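
The container-and-post relationship can be pictured with a minimal Python sketch. It only models the idea and is not the custom module: the `Blog` class, its `owner` and `contributors` attributes and the `posts_by` helper are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Blog:
    """A blog container that several users may post to."""
    title: str
    owner: str
    contributors: set = field(default_factory=set)
    posts: list = field(default_factory=list)

    def can_post(self, user):
        return user == self.owner or user in self.contributors

    def add_post(self, user, title):
        if not self.can_post(user):
            raise PermissionError(f"{user} cannot post to {self.title}")
        self.posts.append((user, title))


def posts_by(blogs, user):
    """List (blog title, post title) pairs for one contributor."""
    return [(b.title, t) for b in blogs for (u, t) in b.posts if u == user]
```

In the sportswriter example, one writer could be a contributor to the football blog and the owner of a separate bowling blog, and `posts_by` would list that writer's posts across both.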

Blogs also can have titles, images and descriptions.

User Profiles

All users are given the opportunity to personalize their profiles and network with other users on the site. The Flag module is used to support a Twitter-style "follower" model.

Each profile page aggregates a user's comments and blog posts.

Staff members get some additional fields for contact information, and staff writers' stories are listed and linked from their user profiles. A custom module associates staff user accounts with author taxonomy terms. A mini-profile is embedded on each author's taxonomy term page.

Mobile

Most newspapers have mobile sites that are disconnected from their main websites. Using Drupal, the Chronicle has a companion mobile-optimized site for smartphones. This is straightforward for a small site, but not so easy for a high-volume news site that interoperates with a caching server.

If you visit a Chronicle story link using a mobile device — a Blackberry, iPhone, Droid, etc. — your browser will be detected and you'll be redirected to the same relative path on a mobile-optimized domain (m.chronicle.augusta.com). This redirection is instantaneous and from the user's perspective, the site "just works." Since it's running from the same database as the main site, it's fully interactive and supports posting of nodes and comments.

The mobile functionality was built using the Domain Access module, Browscap for browser detection and a unique mobile theme. Domain Access originated as a Morris project for Skirt.com and was contributed to Drupal.org in 2006. For the Chronicle, its primary use is to redefine the theme and override section fronts, replacing Panels pages with simpler lists on the mobile site.
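
The redirect decision itself is simple to sketch. The Python below is an illustration only: the short `MOBILE_MARKERS` tuple is a hypothetical stand-in for Browscap's much larger browser database, and the `mobile_redirect` function is not part of Domain Access.

```python
MOBILE_HOST = "m.chronicle.augusta.com"
# Hypothetical, deliberately short list of user-agent markers.
MOBILE_MARKERS = ("iphone", "blackberry", "android")


def mobile_redirect(host, path, user_agent):
    """Return the mobile URL to redirect to, or None to stay put."""
    if host == MOBILE_HOST:
        return None  # already on the mobile domain
    agent = user_agent.lower()
    if any(marker in agent for marker in MOBILE_MARKERS):
        # Keep the same relative path on the mobile domain.
        return "http://" + MOBILE_HOST + path
    return None
```

On a site fronted by a caching server, this decision has to be made before a cached desktop page is served, which is part of what makes the approach harder for a high-volume site.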

Data Migration

Migration of legacy data for the Chronicle has been the largest to date. Content dating back to 1996 was migrated to the new site, including users, stories, blog posts, comments and images. More than 400,000 stories and 600,000 comments were migrated. Most source content had been stored in a proprietary PostgreSQL database, while blog content came from a Drupal 5 blogging site.

The migration process included creating a custom script (built as a module using the Batch API) that connected to the source data, built content objects ($node, $user, $comment) and used internal Drupal save functions to store the data into the new database. The biggest hurdles included:

  • relating source data to its new destination in Drupal
  • cleansing, splitting and/or merging data as it was migrated
  • maintaining a record of source IDs to destination IDs so that content related in the source data could also be related in the destination (i.e., comments related to users and nodes).
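
The last hurdle, keeping source-to-destination ID records, can be sketched as follows. This is a minimal Python illustration rather than the Batch API module: the `IdMap` class, the `migrate_comment` function and the source keys (`story_id`, `author_id`, `text`) are hypothetical.

```python
class IdMap:
    """Remember which destination ID each source record was saved under."""

    def __init__(self):
        self._ids = {}

    def record(self, kind, source_id, dest_id):
        self._ids[(kind, source_id)] = dest_id

    def lookup(self, kind, source_id):
        return self._ids.get((kind, source_id))


def migrate_comment(source, id_map):
    """Rewrite a source comment's references to point at destination IDs.

    Returns None when the parent story or the author has not been migrated
    yet, so the caller can defer or log the comment.
    """
    nid = id_map.lookup("node", source["story_id"])
    uid = id_map.lookup("user", source["author_id"])
    if nid is None or uid is None:
        return None
    return {"nid": nid, "uid": uid, "comment": source["text"]}
```

Migrating users and stories before comments means every lookup has a chance to succeed, and anything that still returns None points to orphaned source data.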

Performance

The Chronicle gets around 7-8 million page views per month and spikes up to 1.7 million page views per day during the Masters Tournament. The site runs the Pressflow variant of Drupal on a LAMP cluster that is shared with other Morris newspapers. A Squid reverse proxy cache protects Drupal from most anonymous traffic. Drupal cache tables have been moved into Memcache. The Authcache module helps accelerate authenticated-user performance. APC accelerates PHP performance.

Because of the unusually large dataset, a few Views-related performance issues cropped up that needed to be addressed.

By writing some code to alter the Views queries, removing some MySQL date functions and moving some WHERE clauses into subqueries, we were able to sufficiently minimize these performance issues.
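
One common form of the date-function change is sketched below in Python. It is an assumption about the kind of rewrite involved, since the specific functions removed are not named here: wrapping a column in a date function generally keeps MySQL from using an index on it, while comparing the raw `created` timestamp against precomputed constants does not. The `day_bounds` and `stories_for_day_query` functions are hypothetical; `node`, `nid`, `status` and `created` are standard Drupal 6 names.

```python
from datetime import date, datetime, timedelta, timezone


def day_bounds(day):
    """Return [start, end) Unix timestamps for one UTC calendar day."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def stories_for_day_query(day):
    """Build a parameterized query that compares the raw column to constants."""
    start, end = day_bounds(day)
    sql = (
        "SELECT nid FROM node "
        "WHERE status = 1 AND created >= %s AND created < %s "
        "ORDER BY created DESC"
    )
    return sql, (start, end)
```

The date arithmetic happens once in application code, so the database only has to do a range comparison on the stored value.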

Developers and Contributors

Augusta Chronicle

Jonathan Dozier

Morris DigitalWorks

Steve Yelvington, Duane Jennings, Tim Bell, Geoff Maxey, Rick Havill, David Plutado Fugate, Nathan Rambeck, Chris Johnson, Dante Taylor, Roger Soper, Steven Jackson, Ben Holmes, Cameron Guill

Comments

Thomasr976’s picture

Thanks for sharing your experience with us. I see that u encourage Reader Participation, but noticed on the site that one has to register to comment on an article. Have you experimented with allowing comments without registration on the Augusta Chronicle site or any other sister sites? Also are comments moderated or do u allow comments into production once a user is registered?

On my site, comments come to a screeching halt if I require registration, so I no longer do. To encourage comments though and membership, I give users a heads up at the end of each post that if they comment or Ask for Assistance, I'll create an account for them. So far this seems to work.

yelvington’s picture

Yes, we've experimented with comments sans registration. We're not interested in random comments from random people; we're interested in convening a community conversation among people who live in our newspaper markets, so we require registration with (privately) real names and addresses.

Yes, we moderate. We remove objectionable comments that we discover or that are reported by users.

Yes, comments are immediately posted. Any required moderation (removal, notification, banning) happens after the fact.

If you have a low-volume site that's struggling to get comments, it makes sense to allow unregistered commenters. (I allow unregistered commenting on my personal blog, for example.) As an alternative, you might try one of the several Facebook authentication modules, which would make it easy to register without going through the email validation process.

docwilmot’s picture

now officially my 'best drupal news site' winner. well done.

-------------------------------------------------
Always be nice to people on the way up; because you'll meet the same people on the way down.
Wilson Mizner (1876 - 1933)

JBI’s picture

I find it strange that there was nothing to make it easy for your users to register with their Facebook or Twitter accounts.

In the US you "enjoy" much larger coverage than in Europe.

It would make it much easier to register but also leverage your users' networks.

It would allow you to enable cross-posting of comments or content recommendations in your users' networks.

Huffingtonpost is probably the best example. You are able to stay tuned to your Facebook and Twitter lifestream in HP :)

I heard very interesting facts about such integration

The first one was Facebook

http://paidcontent.org/article/419-huffpo-ceo-eric-hippeau-we-are-now-in...

Facebook referral traffic is up 48 percent since the launch—and the already-heavy volume of comments jumped to 2.2 million from 1.7 million in July. Fifteen percent of HuffPo comments now come from Facebook. In September, Facebook referrals accounted for 3.5 million visits, up 190 percent from June and 500 percent from January. Those numbers continue to build, according to HuffPo’s internal stats.

Same with Yahoo:
http://www.huffingtonpost.com/arianna-huffington/do-you-yahoo-we-do-too-...

Yes, there are some concerns, like:
http://www.thebigmoney.com/articles/impressions/2009/08/18/huffington-po...

But imagine what happens if other sites start to integrate so wholly with Facebook. (Already 15,000 have some type of integration, but it’s minor compared with Huffington Post’s.) Once there’s a critical mass of activity coming into Facebook from elsewhere on the Internet, the roles change—Huffington Post isn’t the first-stop news portal on the Web; Facebook is.

This may be the future of journalism, but it doesn’t mean it’s going to save it.

But as I said, a newspaper website can become a hub where you can be connected to your social network.

You are able to stay tuned to your Facebook and Twitter lifestream in HP :)

It's up to the media to build on that and find ways to make the user experience better on their websites than on social networking sites.

http://paidcontent.org/article/419-huffpo-launches-news-sharing-collabor...

yelvington’s picture

The Chronicle already had a large set of registered users through a proprietary system developed at Morris, and they were migrated into Drupal, so there are fewer new signups than a completely new site would experience. The primary advantage of Facebook integration is avoidance of the regular Drupal email confirmation step. We're considering that for the future (already testing it for a different newspaper project).

Also, see my previous answer to Tom Russo on a related question.

JBI’s picture

I agree about avoiding the regular Drupal email confirmation step.

I'm very interested to see if it empowers your users to let their social graph know what they are doing on your newspaper site.
That way you could get more referrals from Facebook. The Huffington Post figures were impressive.

Just tonight there was a very emotional event in Chile with the earthquake. On HP I could follow it, comment on the news story and get my friends to read HP content.

I understand there was a special event in Augusta (the Masters Tournament, about sport?). It may be interesting to see how your local user base could interact among themselves and get you referrals. I think a local newspaper should be the central hub of what's going on in the local community.

JBI’s picture

Also very impressed by the use of http://drupal.org/project/authcache
-What is the ratio of anonymous to authenticated traffic?
-How much time and effort did it take you to set up Authcache, as it's supposed to be difficult to set up?
-How did you avoid having the cache flushed when a user is edited or a node is created/updated/deleted?

Was it not overengineering for a medium-size website?
https://www.google.com/adplanner/site_profile#siteDetails?identifier=aug...

Compared to big sites using Drupal (in France, as far as I know)
http://trends.google.com/websites?q=augusta.com%2C+augustachronicle.com%...

yelvington’s picture

Search graphs are deceptive. Most of a US newspaper's traffic is repeat visits by regular users, not search referrers.

Rue89 has around 10,000 monthly unique users, according to Compete.com. Our Omniture measurements show Augusta with 600,000 to 800,000 monthly users and around 8-10 million pageviews, and in April it will spike extremely high due to the Masters golf tournament.

So, the performance tuning is not overengineering.

Authcache is no problem. Memcache and performance tuning of MySQL and Views-generated queries, on the other hand, is not trivial. Pressflow helps with some of that.

JBI’s picture

I really appreciate that you shared this information.

I especially liked what you said about topic hubs. From an SEO and user point of view it's great.

From an SEO perspective the effect can be huge, as reported for the NYTimes
http://searchwritten.com/topic-pages-big-site-seo.html

Is there any semi-automation with OpenCalais for creating topic hubs for hyperlocal subjects or companies?

yelvington’s picture

We are not using the Calais module for Drupal. We do use Calais to improve the relevancy performance of our FAST search engine, but all the topics-related work on the Augusta Chronicle is the result of editorial tagging. We may in the future use the FAST data or run the Drupal Calais module, but that's not yet settled.

robin1988’s picture

Dude i like your primary link menu in header
Even i want something like this
can u tell which module did You use and help me attain something near to this

robin1988’s picture

Hey even u have a block in right corner of ur header
Which module is that????? is that dynamic display block??

yelvington’s picture

Not sure what you're referring to -- the small promotional Flash item to the right of the 728x90 advertising banner? It's just a promotional SWF embed that's managed manually. If you're creating custom templates, you can create custom regions wherever you want, and custom blocks can be dropped in easily.

The ads themselves aren't managed that way -- all the IAB standard ad units are managed by CASAA, a framework we developed at Morris for centrally managing all our advertising and analytics code, mapping contexts (such as paths) to the insertion of specific Javascript calls. We use Yahoo's APT for display ad management, Yahoo's YCA for context-based text ads, and both Omniture and Google for analytics.

local-search’s picture

Have you worked with any of the other ad networks other than Yahoo? If so, how easy was it for you to run tags outside of the Yahoo network?

NeedToKnow’s picture

Steve,

As others have said, thank you for sharing your hard-earned experiences. I really like your Spotted feature, and I've got some questions.

* Is it built on a module?
* Can users contribute photos?
* Clearly it drives pageviews, but the e-mail and scrapbook features don't appear to do much business. Is that the case?
* How about MyCapture? Do readers use it?
* It appears the photographer doesn't get ID's for all of the photos. Can readers tag people in photos? Or are the IDs unnecessary?

Again, thank you very much for the good work and the spirit of sharing.

yelvington’s picture

The Spotted program goes back several years. At its heart, it's not technology -- it's content and marketing. You send a photographer (usually an intern) to a community event with a digital camera and instructions to shoot hundreds of photos of people attending the event (the audience, not just participants) and hand out business cards for the website. No captions, no IDs. Post them all and watch site traffic take off.

When we began the program, Drupal's image handling sucked, so we wrote custom code -- first a mod_Perl implementation (which the Chronicle still uses), then a PHP version. The latter version of Spotted (which is a Morris trademark) became a commercial product, which we licensed to other newspapers across the USA and in Europe. If we started doing this today, we'd build it straight off using Drupal's much improved image handling rather than custom coding the entire system.

We integrate Spotted with Drupal through a custom authentication module and by pulling in XML (RSS) feeds that refer to specific Spotted galleries or keywords. A custom module caches the data in Drupal and creates several displays we can choose for various locations on the main site.

To improve page performance, we use Javascript to pull in the images as needed.

Adam S’s picture

Brilliant work. How did you build the menu system? I have yet to figure out how to configure proper active trails.

yelvington’s picture

I think the menus are built using Superfish (jquery) scripting at the theme layer. It's not implemented as a Drupal module. I'll check with the developers.

Adam S’s picture

What I would like to know is not what jQuery system was used but how to theme the active menu items with css id's such as #active, #first and #last and to also wrap each line item in a span tag so that a person can create proper tabbed design with rounded corners. If I could just get the id's correct in the <li> tags, I can build the css and implement the jQuery.

I've seen quite a few other websites built on Drupal that have done this and all large scale online newspaper websites do this. However, I have not been able to figure it out and every time I ask in forums or issue queues I don't get much response.

I was thinking of using SuperFish (http://drupal.org/project/superfish) rather than the Persistent Dynamic Menu module on my rebuild. The project page says that SuperFish jQuery has some problems with IE. If you used SuperFish, does your jQuery code have problems with IE? And, if so, how did you solve this?

Your website does it very well and if you could give me a little direction, clues or something to read about it on drupal.org, I would be more than grateful.

timonweb’s picture

I'm a bit concerned about URL performance. The article said the Augusta Chronicle has 400,000... as far as I can see, all stories have their SEO-friendly URLs... so there are more than 400,000 aliases in the url_alias table and growing. How does this impact performance? In my experience, such a large number of URLs (especially when they're long) impacts performance negatively...

NaX’s picture

I think Path Cache is what you're looking for.
http://drupal.org/project/pathcache

nrambeck’s picture

Pulling node aliases from the database is quite a large chunk of total SQL execution time. We never considered any caching mechanism for aliases like the one mentioned above, but it might be worthwhile to add it to our platform. The alias lookup queries, when all added together, sometimes account for more query time on a page load than anything else. This is especially true on a news site where you can have over 100 links on a single page.

--
Nathan Rambeck

saltwaterskin’s picture

We also added an index to the url_alias table to help speed things up

blavish’s picture

Brilliant site you made there.

Can you tell us about the theme /design?

tomsherlock’s picture

Thank you for the review. Impressive work.

Are you using a rich-text (wysiwyg) editor for content created directly in Drupal?
If so, which one?

Did you start with a base theme such as Zen or Fluid Grid? Is your derived theme vastly different from the base theme?

nrambeck’s picture

We use TinyMCE for wysiwyg. Even though it causes its fair share of headaches, we have found it causes fewer headaches than others we've tried.

Our base theme was built from scratch and provides a foundation upon which we can build subthemes for each individual newspaper. One of the difficulties we've had is trying to stick to the sane defaults of our base theme. We find our subthemes many times overriding way too much of the parent theme's templates, CSS, etc., which makes using a base theme less useful.

--
Nathan Rambeck

Sree’s picture

great site & nice write up!

-- Sree --
IRC Nick: sreeveturi

shyamala’s picture

Real nice site! The theming is perfect, your write-up very interesting. I particularly like your links to Don't Miss and Shortcuts!

The Topic page and Author pages are good. I see links to your Author pages from the Article pages, I do not see any related content/ links to Topics on the Article page. Any specific reasons? The right side bar seems repetitive.

FYI: the 'Why sign up?' link on the right hand side top is broken.

gmasky’s picture

Hi,

Is there any reason you chose Authcache over Boost? Does Authcache give you better performance?

Thanks

nrambeck’s picture

Boost is meant to improve performance for anonymous traffic only, while Authcache is primarily intended to improve performance for registered users. Because we use Squid for delivering cached pages to anonymous users, Boost is unnecessary. Our use of Authcache was actually very limited, because it can be quite tricky to use.

--
Nathan Rambeck

gmasky’s picture

Thank You,

Did you consider using PANTHEON Mercury on your server, considering the 1.7 million page views?

Gerry

yelvington’s picture

We established our system architecture long before Mercury was available, and since we run this on our own servers, the AMI isn't particularly valuable to us. It's the same sort of setup, though -- with Squid instead of Varnish.

cmsquickstart’s picture

Handling 1.7 million page views in a day is very impressive. If Tiger plays at the Masters this year, that is likely to double :)

daveslc’s picture

You say " Each night those stories are imported into Drupal through an XML feed. The loader process is custom code. It accommodates images as well as text."

Can you say any more about this?

This seems to be one of the hardest things about Drupal - bulk import and/or update of stories/nodes.

NaX’s picture

I also would like to hear more about this. There is a long forum discussion of a similar nature with regards to drupal_execute. I would love to know if you used node_save and even better see an example.

REF: http://drupal.org/node/131704#comment-270433

yelvington’s picture

Yes, it performs insertions through the API with node_save(). In a configurable way, it supports updating of existing nodes, merging changes with existing Drupal-only fields.

The feed is an Atom-based format with local extensions to meet our needs, which include taking data that originated in a variety of newsroom systems including DTI and Mediaspan (Baseview). Some preprocessing is performed outside of Drupal -- some of it extensive -- to reformat the data, handle some mind-bogglingly bad XML, and put it into a sane format for handing to Drupal. The resulting XML is dropped into a directory that is accessible to the website through HTTP.

The feed processor can be kicked off manually, by cron, or through a "knock on the door" from the data source.

The loader fetches the XML, processes it with SimpleXML, and performs some minor cleanup such as html_entity_decode(), and converts the result into an array for ease of use. Various XML tags are mapped to Drupal CCK field names.

Any time the loader sees an image reference, it reaches out and grabs the image (which is likely on a server not accessible to the outside world) and copies it into a defined place in the local filesystem, and adds an emfield reference to it. This lets Imagecache take over the display work.

We originally tried using FeedAPI, but abandoned that route when Tobby Hagler became frustrated trying to deal with multiple-image processing and just wrote his own loader one weekend.

In the next several months we're going to be working on a Texas project with a vendor of a newsroom workflow system to integrate directly with it. In the process we may go through a steep rewrite of how this works, perhaps even switching to XMLRPC so that the two systems (which have similarly extensible content item structures) can dynamically update one another.

cmgui’s picture

Very nice site - very clean and well-designed -- but i hate to say this -- it is slow.

but u r still faster than another showcased drupal site: http://www.nysenate.gov/

but much slower than this drupal site: http://www.lp.org

yelvington’s picture

See patch at http://drupal.org/node/732864, one of several optimizations we've been adding to the system as a result of monitoring the mysql queries.

nrambeck’s picture

Another Panels-related patch to Ctools improved performance significantly on our latest release of www.jacksonville.com.
http://drupal.org/node/754086

--
Nathan Rambeck

mtcrutch’s picture

Just a quick note that the "Why sign up?" link in the top right corner (http://img.skitch.com/20100325-pchb7hdfyguarsups7ifwm32ys.jpg) points to 'http://help/signup'.

ithacaindy’s picture

A very interesting explanation of the migration to Drupal. Can the author share the custom code used to generate the photo captions?

jochovitch’s picture

Hello,

how did you export the newsml feed? Is there a module that handles this?