Introduction

 

TweenTribune, TeenTribune and TTEspañol provide the teen and tween audience with compelling stories kids won’t find anywhere else. Stories for TweenTribune are selected by tweens working closely with professional journalists. Tweens can submit links to stories they'd like to share, submit their own stories and photos, and comment on the stories they read.

More than 53,000 teachers across the U.S. use TweenTribune in their classrooms.

The site generates more than 5 million page views per month.

10,000 nodes are added every day.

 

Brief History - From WordPress to Drupal

TweenTribune and its sister site, TeenTribune, work through schoolteachers across the U.S. Registered students log onto the site and post comments on selected stories of the day, and teachers review the responses for approval before making them “live” for other students to see.

During Christmas 2008, the founder of TweenTribune, Mr. Alan Jacobson, decided to move the website from WordPress to Drupal, a more capable and flexible content management system. He contacted us on December 24th, 2008 and worked with us to develop an application that would allow tweens aged 8 to 14 to read a variety of interesting content and comment on the news for other kids to see. Teachers can easily use TweenTribune as a teaching tool: the site uses high-interest reading material to engage students with the news.

Teachers can register their classes on the site, which gives them access to special features such as custom-generated pages that show students' comments or the stories the class has commented on. Teachers can print out reports by student; these reports let them see which articles students have read and access individual students’ comments. In this way, teachers can easily grade or comment on students’ writing. There’s even a Faculty Lounge where teachers can interact with each other, sharing ideas and lesson plans.

Using Drupal 6 and a variety of excellent contributed modules, the site Tweentribune.com was launched in March, 2009. Modules used include Views, CCK (both core and imagefield), and Imagecache.

Custom code was written for all the custom features of TweenTribune and integrated into the Drupal content management system in the form of Drupal modules.

TweenTribune is now a success story that has been featured in the LA Times, YPulse.com, KillerStartups, WeMedia and Good Housekeeping, and is getting:

  • more than 5 million page views a month.
  • more than 16 million ad impressions per month.
  • more than 3000 comments and 6000 quizzes

 

SCALING WITH CONFIDENCE

Tweentribune.com had a couple of unique challenges. Traffic peaked during US school hours, with most users logged in and therefore making the maximum number of connections to the database. The web server and database were separated onto 2 different machines on the same network (LAN).

In addition, the following measures were taken to improve Drupal performance:

  1. Optimizing database queries and modules.
  2. Using Memcache for all database caching.
  3. Storing sessions, which Drupal typically keeps in the database, in Memcache as well.
  4. Using the Boost module to serve static HTML content to anonymous users.
  5. Using Lighttpd to serve static files such as CSS, JS and images.
  6. Using APC as the PHP accelerator.
  7. Using the Linux shell, Munin and Nagios for monitoring.

Memcache - way better than cash

Memcache, Squid, APC, etc. were used to make Drupal scale. Memcache, APC and Squid were installed and configured on the server. Memcache was monitored, and its configuration was adjusted over time as traffic grew and the server's RAM was changed.
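To make the caching idea concrete, here is a minimal sketch of the cache-aside pattern that a Memcache layer typically provides. It is not the site's code (the site used Drupal's PHP stack); `TinyCache`, `cached_fetch` and `load_story` are hypothetical names, and an in-memory dictionary stands in for a real Memcache server so the example runs on its own.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TinyCache:
    """In-memory stand-in for a Memcache client (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave like cache misses.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (time.monotonic() + ttl_seconds, value)


def cached_fetch(cache: TinyCache, key: str,
                 load_from_db: Callable[[], str],
                 ttl_seconds: float = 60.0) -> str:
    """Cache-aside: try the cache first, query the database only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db()              # the expensive query
        cache.set(key, value, ttl_seconds)  # later requests skip the database
    return value


db_calls: List[str] = []                    # records every simulated query


def load_story() -> str:
    """Hypothetical slow database query."""
    db_calls.append("SELECT story")
    return "story body"


cache = TinyCache()
first = cached_fetch(cache, "node:1", load_story)
second = cached_fetch(cache, "node:1", load_story)
```

The second call is served from the cache, so the simulated database is queried only once. With most users logged in during school hours, this is the kind of saving that reduces database connections.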

Lighttpd

Lighttpd is a web server that was used to serve static files (images, JavaScript, CSS) to reduce the burden on the Apache web server, as Lighttpd is faster at serving static content.

Apache Solr vs DSS

Drupal Search Sucks because it doesn't cope with large amounts of content: it doesn't scale and gets bogged down. Drupal Search is integrated, meaning it runs and searches on the same database as the site, thus slowing down the system. Apache Solr's advantage for Drupal is that it indexes nodes, not pages. This means it has access to attributes of the node that are not readily parsable from the rendered page, and these attributes can be used to filter the results. Apache Solr provides a faster search experience than the default Drupal search.

Varnish or Squid

 But either is better than getting shellacked, and both are better than Boost.

InnoDB instead of MyISAM - who wants to get locked under a table?

  • InnoDB implements row-level locking for inserts and updates, while MyISAM implements table-level locking.
  • InnoDB inherently takes care of data integrity with the help of relationship constraints and transactions.
  • InnoDB is faster on write-intensive tables (inserts, updates) because it uses row-level locking and only holds up changes to the same row that is being inserted or updated.
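The difference in lock granularity can be illustrated with a deliberately simplified model. This is a sketch, not a description of either storage engine's internals: it assumes every write arrives at the same moment and that a lock is held for the whole write. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def blocked_writes_table_lock(row_ids: Iterable[int]) -> int:
    """Table-level locking: every concurrent write after the first must wait."""
    writes = list(row_ids)
    return max(len(writes) - 1, 0)


def blocked_writes_row_lock(row_ids: Iterable[int]) -> int:
    """Row-level locking: a write waits only behind writes to the same row."""
    per_row = Counter(row_ids)
    return sum(count - 1 for count in per_row.values())


# Five simultaneous updates; only two of them touch the same row (id 7).
concurrent_writes = [3, 7, 7, 12, 20]
table_blocked = blocked_writes_table_lock(concurrent_writes)
row_blocked = blocked_writes_row_lock(concurrent_writes)
```

In this toy model, four of the five writes wait under a table lock, but only one waits under row locks. That is why a comment-heavy site with many simultaneous writers benefits from row-level locking.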

InnoDB buffer pool. How big is too big? We know.

The larger the buffer pool, the more InnoDB acts like an in-memory database, reading data from disk once and then accessing the data from memory during subsequent reads. The buffer pool even caches data changed by insert and update operations, so that disk writes can be grouped together for better performance.
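As a rough illustration of the sizing question, the sketch below computes a buffer pool size as a fraction of the machine's RAM. The fractions are an assumption: giving InnoDB roughly 50 to 80 percent of memory on a dedicated database server is a commonly cited rule of thumb, not the value this project used. `buffer_pool_size_gb` is a hypothetical helper.

```python
def buffer_pool_size_gb(total_ram_gb: float, fraction: float) -> float:
    """Return a candidate buffer pool size as a fraction of physical RAM."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    if not 0.0 < fraction < 1.0:
        # The pool can never take all RAM: the OS and other buffers need room.
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    return total_ram_gb * fraction


# A 12 GB database server, with assumed lower and upper fractions.
conservative = buffer_pool_size_gb(12, 0.50)
aggressive = buffer_pool_size_gb(12, 0.75)
```

For a 12 GB machine this gives a range of 6 to 9 GB. Where "too big" begins depends on what else runs on the server and on how much memory the operating system and per-connection buffers need.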

KeepAlive on or off? Contact us and we'll tell you.

 

THE TEAM

  • Ebizon NetInfo: Ebizon builds the world's fastest growing Drupal site and is the backbone of the project, with the expertise in performance and scalability tuning that is essential for Drupal sites with millions of nodes and users. Ebizon supports TweenTribune's rapid growth of almost 10,000 new nodes every day through multiple layers of content caching in a multi-server environment. Ebizon extends Drupal to meet the unique needs of the site, handling traffic from more than 1 million authenticated users during peak school hours.
  • BrassTacksDesign: The BrassTacksDesign team was responsible for project conceptualization and use cases. They manage and administer all day-to-day operations.
  • Rackspace: The website is hosted on Rackspace.

 

HARDWARE

The underlying hardware included 2 machines on the same Gigabit network:

One ran the Apache web server and Memcache, with the following configuration:

  1. Quad Socket Quad Core Intel Xeon E7440 2.4GHz
  2. 64GB Memory
  3. Operating System: Red Hat Enterprise Linux 5 - 64 bit

The database server had the following configuration:

  1. RAID 5
  2. 12 GB DELL RAM
  3. Single Socket Quad Core Intel Xeon L5520 2.26GHz

 

HOW THE CHALLENGES WERE MET

  • Challenge: Drupal is both resource intensive and database intensive. Its strength is ease of development, extensibility through modules and faster development time. Its downside is that it requires more CPU and RAM than other CMSs.

Solution: From experience we found that a couple of Drupal contributed modules are resource intensive and that optimizing them is necessary in order to scale the system. We monitored SQL queries using the Devel module and identified the queries that consumed the most resources. We then optimized those queries and monitored their performance and the load on the system for a couple of days. The results and improvements were captured in a performance report that was published for the client’s review.

  • Challenge: A "busted page" issue was causing pages to break. It was a much trickier issue, solely because of its intermittent nature.

Solution: The busted page issue was THE MOST important issue, since the site had scaled to 2 million page views a month and we couldn’t risk letting this problem survive any longer. The initial attempt was to disable the Boost module, but to our surprise that did not solve the problem. After 24 hours of rigorous effort and monitoring, it looked like menu paths were being restructured during cron, which was running every hour. The best teams in the world were thinking about it, but no one could get to the root cause. Finally, one of our best technical leads changed cron to run only once a night, at 12 am, instead of every hour. This resolved the busted page problem and was a GREAT success for us and Alan.

  • Challenge: Location based advertisement and headers implementation in Drupal 6.

Solution: The Drupal ad geoip module was customized to implement a feature whereby advertisements and headers are displayed based on the user's location.

  • Challenge: Only the teachers of a classroom should be able to moderate comments, and comments should be published only after they have been approved.

Solution: The Drupal moderate module was customized, and an interface was designed where teachers can see all the comments in a classroom and approve or disapprove them.

  • Challenge: Blocking inappropriate words that students put in their comments.

Solution: Initially the Watchlist module was recommended. It automatically flags a node or comment if it contains any questionable content (defined in the Watchlist settings by adding regular expressions for words that are considered bad). But it flags the word and notifies the admin AFTER the comment is posted, which is TOO LATE. Therefore the Spam module was used to resolve this problem.
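The key point above is timing: the check has to run before a comment is saved, not after. The sketch below shows that idea with a regular-expression word filter. It does not reproduce the behavior of the Watchlist or Spam modules. The function names and the word list are hypothetical.

```python
import re
from typing import Iterable, List


def build_word_filter(blocked_words: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive pattern that matches whole words only."""
    words = [w for w in blocked_words if w]
    if not words:
        raise ValueError("at least one blocked word is required")
    alternatives = "|".join(re.escape(w) for w in words)
    # \b keeps "heck" from matching inside a harmless word like "checked".
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def blocked_words_in(comment: str, word_filter: re.Pattern) -> List[str]:
    """Return every blocked word found in the comment, lowercased."""
    return [m.group(0).lower() for m in word_filter.finditer(comment)]


def may_publish(comment: str, word_filter: re.Pattern) -> bool:
    """Run this BEFORE saving the comment; reject instead of flagging later."""
    return not blocked_words_in(comment, word_filter)


word_filter = build_word_filter(["heck", "darn"])
rejected = blocked_words_in("What the HECK is this?", word_filter)
accepted = may_publish("I checked the story twice.", word_filter)
```

Because the filter runs at submission time, a comment containing a blocked word never reaches the moderation queue or other students.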

  • Challenge: Alan needed a way for the teacher to send every student’s comments to the printer with one click, instead of sending them one at a time with one click per student.

Solution: It was not feasible to require users to have an email address in order to sign up on Tweentribune.com. The team therefore found a way to have the system create an email address automatically from each user's full name instead of asking users to provide one. The contrib module modified for this purpose was “Localemail”; it was made to create email IDs automatically for each user and let them register directly on TweenTribune.
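A minimal sketch of deriving an address from a full name might look like the following. The exact format produced by the modified Localemail module is not given in this write-up, so everything here is an assumption: the `placeholder_email` helper, the numeric suffix used to keep addresses unique, and the `students.example.invalid` domain.

```python
import re


def placeholder_email(full_name: str, user_id: int,
                      domain: str = "students.example.invalid") -> str:
    """Build a synthetic, non-deliverable address from a user's full name."""
    # Lowercase the name and collapse anything that is not a-z or 0-9 to a dot.
    slug = re.sub(r"[^a-z0-9]+", ".", full_name.lower()).strip(".")
    if not slug:
        raise ValueError("full name has no usable characters")
    # The user id suffix (an assumption) keeps two "Mary Jones" accounts apart.
    return f"{slug}.{user_id}@{domain}"


email = placeholder_email("Mary Jones", 42)
```

The generated address only satisfies the account system's requirement for an email field. It is not meant to receive mail.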

  • Challenge: A new workflow for teacher registration was required, in which teachers could register themselves without Alan having to personally verify each registration as in the previous workflow.

Solution: Team worked on a new workflow where:

    1. Teacher can submit information on webform, which is almost identical to existing webform with very minor change. This new form replaced the existing form.
    2. Drupal generates 9 classrooms for teacher, but does NOT use classroom taxonomy. Instead, user profile contains username and classrooms only. Classroom names use teacher's school email address + taxonomy ID. Example: mary.jones@collierschools.com-151365
    3. Drupal generates a new username = teacher's school email address. Role = teacher_private. This role is a clone of the existing role = teacher.
    4. Drupal sends 2 welcome emails with username and password generated by Drupal to 2 email addresses: home email address and school email address. Email includes link to "dashboard" page where teacher can register students. See screenshot, attached. The dashboard is 600px wide, so it fits in the main content area of the current pages.
    5. Teacher logs in and is redirected to /teacher_landing_page or uses link provided in welcome email.
    6. Teacher can do the following on the dashboard:
      • register students
      • see usernames and passwords of students previously registered
      • delete students
      • print out student usernames and passwords
      • change classroom name
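The classroom naming rule in step 2 (school email address + ID) can be sketched as below. The `classroom_names` helper is hypothetical, and the consecutive IDs are an assumption made for the example; only the first name matches the example given in the workflow.

```python
from typing import List

CLASSROOMS_PER_TEACHER = 9  # the workflow generates 9 classrooms per teacher


def classroom_names(school_email: str, ids: List[int]) -> List[str]:
    """Name each classroom as '<school email>-<id>'."""
    if len(ids) != CLASSROOMS_PER_TEACHER:
        raise ValueError(f"expected {CLASSROOMS_PER_TEACHER} ids, got {len(ids)}")
    return [f"{school_email}-{classroom_id}" for classroom_id in ids]


# Hypothetical consecutive IDs starting at the one used in the example above.
ids = list(range(151365, 151365 + CLASSROOMS_PER_TEACHER))
names = classroom_names("mary.jones@collierschools.com", ids)
```

Since the teacher's school email address is also the username, each generated classroom name can be traced back to its owner without a separate lookup.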

 

TWEEN TRIBUNE APPLICATION AND DATABASE ARCHITECTURE

Tweentribune.com is a news site for tweens. It was built around the following core components:

    • CCK
    • Views
    • Webform
    • Taxonomy
    • Imagecache
  • A custom AJAX-based drop-down select was developed as a replacement for the Hierarchical Select module (http://drupal.org/project/hierarchical_select), for selecting a classroom during registration or when posting stories.
  • A custom module was used to allow non-email-based registration on the site, since tweens usually do not have email addresses.
  • Other custom functionality was also developed, such as an interface that lets the administrator easily register teachers from the requests received through webforms. Comment moderation by teachers was integrated into the site using the Modr8 module.

Content Types

  • Stories: This is the main content type around which all Tweentribune.com stories are built.
  • Profile: This content type carries student and teacher profile information, such as the classroom.
  • Your-stories: Using this content type, teachers can post their own news to their classrooms.
  • Quiz: With this content type, teachers can post quizzes on the website for their classroom.
  • Your Entry: This content type allows students to submit short stories and essays.

[Image: content types]

 

Taxonomy

  • Topics for tween: This vocabulary defines the category of a story posted on Tweentribune.com.
  • Classroom: This vocabulary allows users to be assigned to a classroom. It is a parent-child hierarchy of country, state, city, school and then classroom. Certain stories can also optionally be placed in a particular classroom or school.
  • Spanish: This vocabulary is used to post stories in Spanish.
  • Your town: This vocabulary is used to post stories from affiliate partners.
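The Classroom vocabulary's parent-child hierarchy can be pictured as terms that each point to their parent. The sketch below walks from a classroom up to its country. It is not Drupal's taxonomy API. The `Term` class, the `lineage` function and all term names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Term:
    tid: int
    name: str
    parent: Optional[int]  # None for the top-level (country) term


def lineage(terms: Dict[int, Term], tid: int) -> List[str]:
    """Return term names from the country down to the given term."""
    path: List[str] = []
    seen = set()
    current: Optional[int] = tid
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in term hierarchy")
        seen.add(current)
        term = terms[current]
        path.append(term.name)
        current = term.parent
    return list(reversed(path))


# Hypothetical data: country > state > city > school > classroom.
terms = {
    1: Term(1, "United States", None),
    2: Term(2, "Example State", 1),
    3: Term(3, "Example City", 2),
    4: Term(4, "Example Elementary", 3),
    5: Term(5, "Room 12", 4),
}
classroom_path = lineage(terms, 5)
```

Storing only the parent on each term keeps assignment simple: placing a user or a story in a classroom implicitly places it in that classroom's school, city, state and country.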

[Image: taxonomy]

Comments

sanamzaman’s picture

Interesting! Many people talk about scaling these days, and some criticize Drupal for its limitations, saying it is good for small websites only. It is good to see such a high-traffic Drupal website being built. Did you use Pressflow for scaling?

sudeepg’s picture

Well, Drupal has great strengths for both small and large websites. The great many modules available give Drupal all the flexibility that is required, and with that, an expert team is all you need to scale your Drupal website. No, Pressflow was not used. Memcache, Squid, APC, etc. were used for scaling Tweentribune.com.

Prateek Mishra’s picture

Guys, I have recently started following Drupal and found it quite interesting; it's really encouraging to know that Drupal can be scaled to such a level. But along with scaling, my concern is about handling dynamic content effectively. Is this platform capable of managing high-traffic booking sites where data needs to be updated dynamically?

sudeepg’s picture

Drupal offers a dynamic framework right out of the box and is capable of handling dynamic content. Drupal has quite a few modules available for booking systems which can be used to meet your requirements.

shamio’s picture

I just want to know why, when thousands of nodes are added to this website every day, the site does not have a lot of pages and Google doesn't index thousands of pages daily. I think you need to optimize the website to be indexed faster by Google and other search engines. When thousands of nodes are added to a website, it's a great site and should be indexed more.

sudeepg’s picture

This site is NOT like other sites, so a different SEO strategy is used.

Unlike most sites, more than 90 percent of the traffic is direct: teachers and students of a particular school.

The site IS optimized for SEO - based on stories, not the homepage.

But if you click on the top result on Google, it takes you to the homepage. Try it.

shamio’s picture

It's a great strategy to have direct visitors. It's better than waiting for Google algorithm changes :) . But can you please explain a little about the way you did it?

sudeepg’s picture

Basically, our target audience is teachers. We authenticate and verify teachers. Once registered on TweenTribune, they create their own classroom. Teachers bring their students into the classroom, where students:

  • do their homework
  • comment on posts
  • participate in quizzes
  • read articles
  • write articles - and the best one is put on the bulletin of the classroom

The compelling news articles generate interest and build reading habits among students. Tweentribune is No Gossip, No Game, Just news that engages kids.

Being popular among kids, it catches interest of parents as well.

So the whole marketing campaign is carried around teachers. For every teacher there are about 120 students and so on.

linuxpimp’s picture

Google usually takes some time to adapt itself to a certain site; after that, the web spider checks the website based on how frequently updates occur. For instance, the Google spider checks drupal.org for new content every hour or less.

willhowlett’s picture

I think it's fair to say that in terms of scaling most of the options available are for anonymous users only. Any high traffic site that primarily deals with dynamic content will need a lot of customising in order for it to perform adequately. Much as I love Drupal, I would say that for something like a high traffic booking system a bespoke built site, or an off the shelf commercial product (which I'm sure must exist), is likely to be a better option.

That said you might find Drupal good as part of an agile approach for building such a system. Assuming the project is just starting out you may find it financially viable to build it in Drupal initially, but being aware that you will need to rebuild from scratch / buy an off the shelf system once the site gets popular enough for it to be financially viable.

edit: (this was as response to http://drupal.org/node/1480464#comment-5736100 by the way, not sure that's clear from the comment layout)

Prateek Mishra’s picture

Thanks Will, It was really insightful. Can you please elaborate on Bespoke built because as you suggested, I also think that buying off the shelf solution later on is the correct approach to follow.

willhowlett’s picture

Glad to be of use. By bespoke I literally meant finding a good developer and getting them to build exactly what you need (on whatever platform they deem suitable). But obviously that's likely to be too expensive upfront (for a business you'd want to find a reputable development house, not a one person shop who could up and leave you in the lurch). That's why I was suggesting that to start with you either buy an off the shelf product, or piece it together with Drupal yourself. Then as the business grows you could look to invest in a bespoke build.

Honestly, I think that the off the shelf product approach is the best idea. You'd have to invest a lot of time to build it in Drupal, whereas (from a very quick google), there seem to be plenty of cheap ticket booking options available.

Just my thoughts on it. Not bashing Drupal, I love it, I just think it's naive to say that just because it is flexible that it is the best option for everything.

Prateek Mishra’s picture

Hi Will, last weekend I was in conversation with one of the Drupal development companies, and they told me that the website can be developed in a maximum of 2 weeks and then put on pilot; it can be further scaled as the traffic on the site increases.
I am afraid of going at it by myself, as I lack the expertise.

willhowlett’s picture

Well I don't pretend to be an expert (I'm just going on my personal experience), and I'm sure if they say they can do it then they can. Maybe you could ask them for specific info on the techniques they would intend to use as the traffic increases, and then google those techniques to put your mind at ease about their effectiveness.

I personally find Drupal tricky for performance, but there's no doubt that there are people out there who can do it (and like I say, I am in no way an expert).

ppro’s picture

From the Alexa ranking of 233k, I think unique daily visitors should be below 1000, and 67% of the traffic is coming from google.com.

have you checked that? http://www.alexa.com/siteinfo/tweentribune.com

sudeepg’s picture

Hey Raymond,
Alexa depends on the Alexa toolbar plugin for FF/Chrome, which is not always the correct measure of daily visitors because it only counts people who have installed the Alexa toolbar.
All the users in Tweentribune's case are school kids who use it as part of their school assignments and are on Internet Explorer in the school lab.

Regards,
Priyanka

shamio’s picture

You are right, the Alexa ranking is not always correct; it's just an estimate of a website's traffic and is not 100% accurate. Also, the Alexa ranking is an average of traffic over 3 months, so when the traffic of a website increases, it will not show up instantly, because the data from the previous 2-3 months has more effect on it. It takes 3 months to get more accurate information about the traffic of a site.

sudeepg’s picture

That is absolutely TRUE, Shamio!

- Alexa does not provide accurate measurements. Only tag-based, or pixel-based systems that use .js, such as GoogleAnalytics or comScore, provide accurate counts of page views.

- More than 1/3 of our traffic comes from Google and Bing - when people use the following search terms, which is effectively the same as direct traffic:

tween tribune
tweentribune
tweentribune.com
www.tweentribune.com
tweentribune login
tween tribune.com
teentribune.com

- "Uniques" do not measure unique visitors. Instead, "uniques" measure unique DEVICES, such as a home computer, work computer or mobile device. If one person uses all three devices at least once a month, they count as 3 uniques per month. So for MOST sites, uniques are highly inflated. Research has shown that most news sites over-report uniques by 30-40%.

- We have the OPPOSITE problem: our users are students who SHARE computers in school computer labs. Eight different students may use a SINGLE device each day, yet only count as ONE unique because they all use ONE computer. So we have many more users than our metrics show. comScore tells us there is no way to accurately measure the number of unique users on publicly shared computers.

Rgds, Priyanka

shamio’s picture

Yes, you are right. As students at a school share the same IP address, even thousands of visits will be counted as one unique visitor, although many students are using your website and its services. Honestly, your idea is really excellent and unique, and I am sure it will have many more users in the near future.

sudeepg’s picture

Thanks for the appreciation, Shamio.

The best part I like about TT is that the number of page views is VERY HIGH. Students keep hitting the database to post their comments so that their teacher can showcase the best comment on the School NoticeBoard. This keeps everybody excited! The unique thing about it is that the traffic drivers are teachers, who register an average of 52 students.

In fact, TT’s peak traffic, 40,000 page views in one hour, was driven by a story about a $15 Borders gift card for teachers.

Rgds, Priyanka

shamio’s picture

40,000 page views in one hour is really amazing for a website. I am sure this website will be one of the most famous sites among students and teachers. As you think up new features and tools, your visitors will like it more and more.

sudeepg’s picture

Thanks Shamio! I am excited to tell you that we revamped TT to implement a new workflow for teacher registration, where teachers can register themselves without requiring Alan (Founder of TT) to personally verify each registration as in the previous workflow. The new workflow resulted in the registration of 2,000 new teachers within 30 days. And every teacher brought 50-100 students with them. The new workflow increased the website traffic amazingly.

shamio’s picture

It's a great feature to allow teachers to register an account and verify their own accounts themselves without needing site admin approval. So as you said, if you had 2,000 teacher registrations in one day and every teacher had even 50 students, you can expect about 500,000 new registrations yearly. Am I right? Do teachers approve the accounts of students, or will student accounts be verified by the students themselves too?

sudeepg’s picture

Well Shamio, 2000 teachers registered in 30 days and not in 1 day :)
And if we go by the numbers, every month 1,002,000 (1,000,000 students+2,000 teachers) new registration can be expected.
Students do not register by themselves; teachers add them. Since it is illegal to collect email addresses from kids below 13 years, TT does not ask for students' email addresses. Every teacher adds their students, and once added, students are provided with a username and password.

shamio’s picture

Yes, you are right. I meant 2,000 registrations in 30 days instead of one day :) and I estimated about 500,000 new registrations yearly too. But I think not all of the students registered by their teachers on your site are under 13 years old. I mean you could have a way of getting their email address after they turn 13, because the current way doesn't allow you to contact the students directly. Am I right? What do you do if you want to email them directly?

Gomesh’s picture

Really an awesome site, good work Ebizon.

sudeepg’s picture

@Shamio: TweenTribune is for students below 13 years, whereas TeenTribune is for students above 13 years. Students do not connect directly, as it works on the model of a classroom where teachers bring in their students.

@Gomesh: Thanks for the appreciation!

NPC’s picture

Thanks for the overview, very insightful!

Please check the text, looks like the “Varnish or Squid” section lost some paragraphs in the process of editing. Thanks!