Top7news is a news aggregator that operates automatically by gathering and archiving the RSS feeds from several Greek news sites, providing a functional and easily accessible front end for the user to read through the latest news.

Top7news homepage
Why Drupal was chosen: 

We examined and reviewed several popular CMSs, both open source and commercial, looking for a system that provides an efficient abstraction layer over the database. The client required a content creation workflow that could easily handle up to 3000-4000 news items (nodes) per day, filter them (Views) and archive them into the site categories (taxonomy).

With Drupal we were able to build the automated site and dress it in a custom theme that fit our client's aesthetic and vision exactly. The end result justified our choice.

Describe the project (goals, requirements and outcome): 

Goals and requirements

The client required the following functionality for the end product:

  • Ability to automatically download RSS feeds.
  • Automatically scrape the source websites linked in the downloaded feeds and extract information (photos, text, video, category).
  • Automatically archive the created news items in the appropriate categories based on a set of predefined filters.
  • The site has to be easily extensible: in the future, editors must be able to add new feed sources easily from the admin interface.
  • A back end where site editors can easily edit, correct and classify content.
  • An image-based front end with strong aesthetics, offering a tablet-app-like interface for navigating the content. The client's inspiration came from the well-known usatoday.com website.
  • A responsive layout with 5 breakpoints, based on progressive enhancement (more content becomes available as the screen size grows).
  • Fixed-width columns to hold ad units, merged into the responsive layout.
  • Provide up to three different layouts for the user to view the content (grid, list and blog view).
  • Automatically publish content to social media channels (Facebook and Twitter).
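The "predefined filters" requirement can be pictured as a lookup table from source category labels to site categories. Below is a minimal PHP sketch, assuming the category label has already been extracted from the source; the function name `example_match_category`, the filter table and its labels are hypothetical illustrations, not the filters actually used on Top7news.

```php
<?php
// Hypothetical filter table: site category => accepted source labels.
// Labels are English/transliterated examples, not Top7news's real filters.
$category_filters = array(
  'politics' => array('politics', 'politiki', 'government'),
  'economy'  => array('economy', 'oikonomia', 'markets'),
  'sports'   => array('sports', 'athlitika', 'football'),
);

/**
 * Maps a source category label to a site category, or NULL if none match.
 */
function example_match_category($label, array $filters) {
  // Normalize: trim, lowercase, collapse inner whitespace.
  // (strtolower() is byte-based; Greek labels would need mb_strtolower().)
  $normalized = preg_replace('/\s+/', ' ', strtolower(trim($label)));
  foreach ($filters as $category => $aliases) {
    if (in_array($normalized, $aliases, TRUE)) {
      return $category;
    }
  }
  return NULL;
}
```

An unmatched label returns NULL; in a real import such items could be left for editors to classify from the back end.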

Outcome

The final product met all the above goals, and more:

  • Every 15 minutes more than 30 news websites are crawled, and their content is extracted and imported into top7news. Up to 3000 nodes are created every 24 hours.
  • The back end provides an accessible, "office"-like environment where editors review, edit and manage content.
  • Five unique views on each page present the content to the user.
  • The fluid layout provides a pleasant experience on mobile devices.
  • The ad units already placed on the site blend well with the rest of the content.
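As a rough sanity check on the crawl figures, assuming imports are spread evenly over the day (an assumption; real news volume is burstier), the 15-minute interval and the 3000-node daily ceiling work out as follows:

```latex
\[
\frac{24 \times 60\ \text{min/day}}{15\ \text{min/run}} = 96\ \text{runs/day},
\qquad
\frac{3000\ \text{nodes/day}}{96\ \text{runs/day}} \approx 31\ \text{nodes/run}
\]
```

Spread over more than 30 sources, that is on average about one new item per source per crawl.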

Technical specifications

Drupal version: 
Drupal 7.x
Why these modules/theme/distribution were chosen: 

Zen theme is the cornerstone of our responsive layout. It provides a smooth, almost magical workflow for writing the grid system code.

Feeds and Feeds Image Grabber provided us with an API of effective, powerful functions to access and download the content automatically.

Views combined with Panels powers the structure of the content presentation. We really liked how smoothly Views integrates with Panels to classify the archived content.

Rules helped us create triggered actions, easily supervised by the editors, that automatically post content to social media channels without the danger of spamming them.

Project team: 
Top7news - front page
Top7news - blog view
Top7news - smart view
Top7news - news article
Top7news - news video article
Sectors: 
Media

Comments

korn3lius

Hi,

Thanks for this very nice and informative article.
I have a question: how do you filter the feeds into specific taxonomy terms?
I don't see any option in the Feeds module that lets users filter feeds into a specific content type or taxonomy.

Thanks

highvrahos

Hey korn3lius,

Thank you for your nice comment.

There are two ways to do what you are looking for, and it all depends on how much data the feed you are parsing provides:

1) If the feed provides a "categories" element, then all you have to do is go to your Feeds importer -> Mapping and map Source: Categories to a taxonomy term reference field that you've already created. On import, the created nodes will automatically be classified with the incoming terms.

2) If the feed doesn't provide any information to categorize the content, as was the case with Top7news, here is how we do it:
From the feed we take the URL of the original content, and then we use a web scraper to grab the part of the HTML document that hints at which category the article belongs to (and then we strip tags, normalize, etc.).

We have created a set of predefined variables that contain XPaths for each website that provides us content, so the web scraper we built knows that if the incoming item comes from website A, it can find the category element under id/class B in the HTML.

Take a look at Feeds XPath Parser; it provides excellent functionality for this (although we don't use it in Top7news, as we've built our own, case-specific parser).
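Both approaches described above can be sketched in plain PHP with SimpleXML and DOMXPath. This is a minimal illustration, not the Top7news parser (which the team built themselves) nor the Feeds XPath Parser API: the host names, XPath selectors and `example_*` function names are hypothetical. Note that `DOMDocument::loadHTML()` assumes ISO-8859-1 unless the page declares its charset, so UTF-8 Greek pages may need extra handling.

```php
<?php
// Hypothetical per-site rules: host => XPath of the element that names the
// article's section. Hosts and selectors are illustrative, not Top7news's.
$category_xpaths = array(
  'news-a.example' => '//span[@class="section-name"]',
  'news-b.example' => '//*[@id="breadcrumb"]/a[last()]',
);

/** Collapses runs of whitespace and trims; returns NULL for empty text. */
function example_normalize($text) {
  $text = trim(preg_replace('/\s+/', ' ', $text));
  return $text === '' ? NULL : $text;
}

/**
 * Way 1: use the feed's own <category> element if the RSS item has one.
 */
function example_category_from_item(SimpleXMLElement $item) {
  foreach ($item->category as $category) {
    $label = example_normalize((string) $category);
    if ($label !== NULL) {
      return $label;
    }
  }
  return NULL;
}

/**
 * Way 2: scrape the original article's HTML with the XPath for its host.
 */
function example_category_from_html($url, $html, array $xpaths) {
  $host = parse_url($url, PHP_URL_HOST);
  if (!isset($xpaths[$host])) {
    return NULL; // No rule configured for this source.
  }
  $doc = new DOMDocument();
  libxml_use_internal_errors(TRUE); // Real-world HTML is rarely valid.
  $doc->loadHTML($html);
  libxml_clear_errors();
  $nodes = (new DOMXPath($doc))->query($xpaths[$host]);
  if ($nodes === FALSE || $nodes->length === 0) {
    return NULL;
  }
  // textContent drops the tags; normalization tidies the whitespace.
  return example_normalize($nodes->item(0)->textContent);
}
```

A caller would try way 1 first and fall back to way 2 only when the feed item carries no usable category.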

Best regards,
Leonidas

Prochymar

Can I also ask a question? How did you solve caching: internally or externally? Memcache, Varnish or microcaching? Are you running nginx or Apache? Which database do you use, MySQL or MariaDB? I ask because similar Drupal portals, such as aggregators or portals with information about cities or states (different categories of information), need fast responses and good server settings. We are planning a similar product... Thank you.

highvrahos

Ah, my bad, we missed that question. I apologise for that!

Right now top7news has been moved from a VPS, where the Boost and APC caching systems were installed, to a cloud account, mostly for RAM availability. Boost is still installed, although for the moment we had to drop APC, as it's not supported on the cloud servers.

Top7news is a typical Drupal installation: MySQL and Apache.

The main bottleneck slowing down the site right now is unoptimized views (render times are killing us), not so much the database queries.

At some point, if the project's scope allows it, we plan to move the site to a dedicated server and apply some serious caching using Memcache and/or Varnish.

Regards,

ibapi

And from which service do you get the weather information?

highvrahos

Hi ibapi,

It's a third party widget that is loaded inside an iframe container, nothing special.

The widget is developed and provided to us by a Greek weather website.

Best regards,

alinouman

On which pages did you use Views Infinite Scroll? Is there an example or demo?

highvrahos

Hello,

Here you can see a demonstration:

http://www.top7news.gr/videos
http://www.top7news.gr/blogs

We are using it together with a Masonry grid, and they play together quite well, except for one bug.

When the infinite scroller loads more items, the grid isn't always built correctly, so many items end up on top of one another until another scroll is issued. This is a bug in a plugin that JS Masonry uses, the image resizer.

Best regards,
Leonidas

alinouman

Thanks, I liked the site. I wish you good luck.

Kristina Katalinic

I just visited your videos and blogs pages and noticed an issue with Views Infinite Scroll. When I scroll down to the bottom of the first page, the page doesn't expand to show articles from the next page; instead, articles from the next page are loaded on top of the first page, so they overlap and it doesn't look very good.
I'd send you screenshots, but there's no way to attach them to comments; you can email me in case you can't replicate it on your end.
I'm using Google Chrome 31.0.1650.63.

Otherwise, a very impressive-looking website and a great job.

Brisbane Web Design, Development and SEO consulting services.
www.webmar.com.au

highvrahos

Thank you very much Kristina for your nice comments and support!

We are aware of the issue with the Masonry Grid and the Views Infinite Scroll pager.

The issue is that when the infinite scroller loads more items, the grid isn't always generated correctly, so many items end up on top of one another until another scroll is issued. Then everything falls into place.

This is a bug in a plugin that JS Masonry uses, the image resizer. We will fix it at some point.

Thank you,

Anonymous

Hi, I read "Automatically publish content to social media channels (Facebook and Twitter)."

How do you automatically publish content to Facebook here?
Thanks in advance.

highvrahos

Hello, I apologize for the late reply.

There are two ways to accomplish what you are looking for:

1) There are contributed modules that, combined with Rules, can automatically publish to social media channels.

Take a look at the following:

http://drupal.org/project/fb_autopost
http://drupal.org/project/twitter

2) If you don't want to load your server with an extra task, you can use external services that take an RSS feed from your site and publish the latest items to social media channels.

For that you will need to create an RSS feed, which is easy as it's built straight into Drupal; if you want more control, you can use Views and Views RSS.

Then you can use the following:

http://feedburner.google.com/fb/a/myfeeds?gsessionid=JdW5CnhujCbhWtkQwcDaQw
(to publish to Twitter, check the Socialize option after you connect to your feed)

http://www.rssgraffiti.com
(to publish to Facebook)

I hope that helps.

Best regards,
Leonidas

shamio

This site is really nice looking. I want to know:
1. Does it fetch images automatically from news sources?
2. Did you use any special module to do this?
3. Do you use any caching module for this website too? If yes, which one?

highvrahos

Thank you for your nice comment. To answer your questions:

1) Yes, top7news.gr fetches automatically all media found in news items (images/photos and videos).

2) We use Feeds extensively, together with the API from Feeds Image Grabber and a custom module of ours (it's case-specific, mainly to categorize the final content). I'd recommend the very powerful Feeds XPath Parser module if you'd like to build something similar (although we don't use it on top7news.gr, as we've built our own parser).

3) Yes. Given the number of views we use on our site (especially on the front page) and the rapidly growing database (1GB every 3 months), it was imperative to use some caching. We are using the Boost module, which works great alongside PHP APC and the APC module.

The RAM allocated to APC is used not only as a PHP opcode cache, but also to cache all the "cache_" tables from the database in RAM. This functionality is provided by the APC module, and it works great for all those pages that aren't yet cached by Boost.

As traffic grows, we are ready to implement Memcache to improve performance, but for the moment it would be overkill.
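The setup described above, with APC holding the "cache_" tables in RAM, is configured in Drupal 7's settings.php. The fragment below is a sketch assuming the contributed APC module is installed under sites/all/modules/apc (check that module's README for your version); it keeps the form cache in the database, since Drupal 7 expects cache_form entries to persist.

```php
<?php
// settings.php fragment (sketch). The path assumes the contrib APC module
// lives in sites/all/modules/apc; adjust it to your installation.
$conf['cache_backends'][] = 'sites/all/modules/apc/drupal_apc_cache.inc';
// Serve the cache_* bins from APC shared memory by default...
$conf['cache_default_class'] = 'DrupalAPCCache';
// ...but keep cache_form in the database, because form state must persist.
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
```

Pages already served as static files by Boost never reach this layer; it only speeds up the requests Boost cannot answer.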

If you'd like to know anything else, don't hesitate to ask.

Best regards,
Leonidas

najim

I have to say this is really, really great; I'm amazed by what Drupal can do.
On top of the questions above, I can only ask which module you used for the grid and column listing of content on the homepage.

Great work, keep it up.

highvrahos

Hello Najim,

The grid/column listing on the homepage is done with Views, plus some custom JavaScript to switch between the two. Essentially it's a single view pane with two different CSS layouts; the JavaScript just adds/removes classes.

Although it's a grid, we use an unformatted list display in this view. The grid is created using CSS and the Zen grid system. Check out the Zen theme for that; it really simplifies the creation of similar displays.

Best regards,
Leonidas

KWang

Nice! I did not know Top7news was using Drupal...

Kevin

@Kambiyaso

I'm a newbie to Drupal. Gee whiz, is this what Drupal can help you do...? I'm thoroughly impressed... well done, Drupal stars.

simone960

It's just awesome! Thanks for sharing!

mbawazir

The website is wonderful.

I have a question about traffic.

As you said, traffic grows, which is why you used the Boost and APC modules. Has the website ever gone offline?

And how many visitors per minute do you get, or what is the peak number of visitors per minute?

Thank you for the valuable information.

highvrahos

Thank you very much for your kind comment.

Sadly, I can't disclose information regarding the site's traffic yet, as per our agreement with the client.

But I can tell you this: the main reason for implementing caching in top7news isn't so much traffic growth as the ever-growing database.

Around 2000-3000 nodes are added to the database every day (as of today there are 260,000 nodes in total), so in order to serve the site with a modest server setup, it was crucial to off-load the server as much as we could (our main concern was to reduce the RAM needed for page generation).

Best regards,
Leonidas

mbawazir

Thanks a lot, you have been really helpful.

MikeWing

Great work!
How do you insert Google AdSense into views like on your site?

highvrahos

Thank you Mike for your kind comment.

To answer your question, all you have to do is override the views-view-fields.tpl.php template file for the specific view in question. You can do that by copying the original template file from the Views module into the templates folder of the theme you are using and renaming it to match your view's machine name (do a Google search on that; there is plenty of info if you get stuck).

Inside this template you can append the Google AdSense script code to the $field->content variable, wrapped in your custom markup to style/position the ad.

For example:

       // Placeholder markup: paste your own AdSense snippet in place of the dots.
       $adsense = '<div class="google-ad-box"><script ........... google adsense code...... </script></div>';
       $field->content .= $adsense;

If you do only that, you will get a Google ad inside every row, which of course we don't want, so you will need a counter to track the row you want to append the ad to and print the AdSense code only there.
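The row counter mentioned above can live in a small helper, for example in the theme's template.php. This is a hypothetical sketch: `example_append_ad()` and its `$every` parameter are illustrative names, and the template usage in the trailing comment assumes Views exposes the current zero-based row index as `$view->row_index`, which you should verify for your Views version.

```php
<?php
/**
 * Hypothetical helper: appends ad markup after every $every-th row.
 *
 * @param string $content    Rendered content of the current row's field.
 * @param int    $row_index  Zero-based index of the row being rendered.
 * @param string $ad_markup  The AdSense snippet wrapped in your own markup.
 * @param int    $every      Show the ad after rows $every, 2*$every, ...
 *                           (counting rows from 1).
 */
function example_append_ad($content, $row_index, $ad_markup, $every = 4) {
  if ($every > 0 && ($row_index + 1) % $every === 0) {
    return $content . $ad_markup;
  }
  return $content;
}

// Possible use in views-view-fields--VIEWNAME.tpl.php, for ONE field only
// (assumes the current row index is available as $view->row_index):
//   $field->content = example_append_ad($field->content, $view->row_index, $adsense);
```

Applying it to a single field per row (for instance the last one) keeps the ad from being printed once per field.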

If the above sounds too complicated, you could use the Views PHP module, but I'd advise against it, as it's bad practice to store and execute PHP code from the database.

I hope this helps; if you need anything else, don't hesitate to ask.

Best regards,
Leonidas

MikeWing

Thank you very much, Leonidas. You have been really helpful. Good luck!

jimmic

Great site!
I'm very interested in making a similar site.
May I ask how you get the full-text RSS feed?

highvrahos

Hi Jimmi,

We are happy that you liked top7news.

Regarding your question: to get the full text of an article found in an RSS feed, you'll have to implement a web parser that downloads and scrapes the HTML document on which the original article is found.

Take a look at the Feeds XPath Parser module; it provides excellent functionality for this (although we don't use it in Top7news, as we've built our own, case-specific parser).

Best regards,

jimmic

Thanks very much!

klnews

Bonjour,

As a new Drupal user building a content-driven media site, I would like to know how to create an image-based front page like yours, i.e. with the articles shown as images on the front page.

Thank you in advance for your help.

Kind regards

highvrahos

Good morning,

I apologise, but my knowledge of French is elementary.

If you could post your query in English (or email it to info@sevenline.gr), I would be very grateful.

Thank you.

Regards,

klnews

Hello,

I'm a new Drupal user and the owner of www.klnews.fr (a content-driven media site). I would like to know how to create an image-based front page like that of top7news.

Namely, how did you create the front page panel?

Best Regards,

highvrahos

Replied by email.

Thank you,

Sandip Choudhury

From what I understand, this website automatically takes feeds from other websites, i.e. it automatically copies other websites' content. So, have you noticed any effect on your SEO ranking in Google?
Please see the links below on what Google says about duplicating content from other websites:

Webmaster Guidelines - https://support.google.com/webmasters/answer/35769
Automatically generated content - https://support.google.com/webmasters/answer/2721306
Scraped content - https://support.google.com/webmasters/answer/2721312

Google says to avoid:
"Text generated from scraping Atom/RSS feeds or search results"
"Sites that copy and republish content from other sites without adding any original content or value"
"Sites that copy content from other sites, modify it slightly (for example, by substituting synonyms or using automated techniques), and republish it"

However, I have seen in Google search results that when I click some website links, the site's content, when clicked, just redirects me to another website.
For example, http://www.junglee.com/ is an Indian e-commerce website that takes content (products) from other websites and compares their prices. This website ranks well in Google search results.

I don't know why Google says one thing and shows another in its search results, or how Google decides which sites to show and which not. I am a little confused about this.

I have also thought about making this type of website, but I worry that Google may penalize the domain name in the future. I am new to SEO. Please help by letting me know your opinion about this.

Sandip Choudhury
http://hostingultraso.com

onelight

Hi Sandip Choudhury
did you get your answer?

Sandip Choudhury

No, I haven't received an answer from the person who submitted or created this website.
I am eagerly waiting for the answer.

highvrahos

Thank you for your interest in this project. We are more than happy to answer your questions with the knowledge we acquired building top7news.

Firstly, some background: top7news is a special website aimed at a specific target group, with one single purpose: to provide users with an objective view of current news, without the subjective flavour of human editors. Our purpose wasn't to replace or compete with conventional news websites, but to supplement them with this tool.

Now to your question: yes, the way we see it, Google Search today is smart enough to simulate human intelligence. It searches the internet for new, unique, fresh articles to answer search requests. So if you are just copying another site's content without any other measures (backlinks are very important), you aren't going to score high in organic search.

If your site's main traffic comes from search results, then by all means avoid duplicate content, or at least supplement it with new, fresh content.

Top7news is different, as it is promoted through various news channels and social media, so only a small part of its traffic is organic.

I hope I have given you some insight. Feel free to ask any further questions!

Regards,

Sandip Choudhury

Thanks, I have got my answer. I also got the idea of promoting this type of website through social media, since organic traffic will be hampered.

But what about copyright? Although you give backlinks, RSS feeds are meant for individuals to read the news, not for others to earn money (you show advertisements) by redistributing it.

I know that Greek copyright law is different from that of India and other countries, but I'm asking just for knowledge. In India, national and local newspapers probably buy news from international news agencies like Thomson Reuters and credit them in the newspaper.

Are you paying any fees to those news websites for the content?

And how long has the website been online?

highvrahos

Yes, that's something you will have to investigate on a per-country basis, both legally and for each news source you are going to parse.

That's what we did with top7news: a per-case permission request.

Top7news has been online for about 10 months.

Sandip Choudhury

Thanks for sharing your experience.

farcry4

Looks like I got motivated by the Drupal Showcase.

Mackee

The design is too much like usatoday.com. It's like USA Today with just a minor tweak in design.

adamjsmith

Yes, I can see that the design is a USAToday.com clone.

abhishek.imp

Hi,

I am making a site using Drupal 7. There are iOS and Android apps for this site that consume data from the Drupal database through an API. I want to send push notifications to both the iOS and Android apps as soon as a new article is added to the website. I achieved this for Android by using the GCM module and creating rules with the Rules module.
But I am not able to achieve this for the iOS app. Is there an option available in the Rules module for Apple push notifications?

I would be very thankful if you could provide any solution.

Thanking you

highvrahos

Dear abhishek.imp,

I apologise for the late reply.

Sadly, we don't have experience relevant to the issue you are facing. Top7news doesn't have a mobile app reader yet; for the moment, content is available only through the main website.

Best regards,
Leonidas

sibiru

I see you use Panels; how did you create the analog clock?

highvrahos

Hi Sibiru,

Yes, we are using Panels, although the analog clock is just a custom content pane, nothing special Drupal-wise.

Here you can find more info about the plugin we are using on Top7news:

http://joaquinnunez.cl/jquery-clock-plugin/

Best regards,
Leonidas

Jony_Niuqiang

Nice website case study! I studied this case when I first started with Drupal, and it taught me a lot, especially about the Feeds module. Thank you again.