By patrick.gelin on
Hi,
I saw a benchmark on OpenBrick with Zope, SPIP, Templeet, and Drupal:
and the results are very bad for Drupal 4.1.0...
- How does caching in the latest version, 4.4.0, compare to 4.1.0?
- Are any benchmarks or test figures available?
- Is there a performance difference between publishing private and public content?
My goal is to use Drupal for a portal for a community of 30,000 users.
Thanks.
Comments
Scalability
I have always frowned upon that benchmark: it is unclear how Drupal was configured though it is clear that caching must have been disabled (it was disabled by default). Either way, the benchmark is dated and Drupal 4.4.0 will outperform Drupal 4.1.0.
I don't know of any recent benchmarks, though there are Drupal sites with thousands of users. Drupal can handle 30,000 users.
In the contributions repository lives a script that lets you populate your database with thousands of users/posts in case you want to evaluate scalability.
Scalability
The automatic translation from Google loses some of the original text. It is explicitly stated that (my translation) "...No optimization was done, except that the caching available through the admin screens was activated..." (this is the sentence Google renders as '...No optimization was not used, only the mask available via the interface of administration was activated...' :-) )
So caching was apparently definitely enabled.
Nevertheless, as you say, these tests are now one year and 3 Drupal versions old.
load script
Hello Dries,
Could you point me to the script in the contributions repository that you refer to? I looked, but could not find it. It would be nice if such a tool already exists.
devel.module
These scripts are packaged with devel.module, in contributions/modules/devel/generate.
There are very big Drupal sites
There are very big Drupal sites out there, e.g. KDE Developers, so this should be possible I guess.
OK, but what about maximum concurrent users...
OK, maybe you can register 30,000 users (what about LDAP connections?), but that is not the crucial point. What matters is concurrent users, i.e. users who are active at the same time... Furthermore, Drupal's cache is only effective for public content; private content is not cached... (Is it possible to put the admin part of the site on one server and the published part on another server???)
I read that Drupal uses a database to store cached pages. Why not use the file system, like SPIP does? It seems to me this second solution would be more efficient and robust, because the content can still be served even if the database connection is down...
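The file-system cache the poster describes can be sketched in a few lines. This is a minimal illustration of the idea (not SPIP's or Drupal's actual implementation, and written in Python rather than PHP for brevity): rendered HTML is keyed by a hash of the URL, and a fresh copy on disk short-circuits the PHP/SQL page build entirely.

```python
import hashlib
import os
import time

CACHE_DIR = "page_cache"   # hypothetical cache directory
MAX_AGE = 3600             # serve cached copies for up to one hour

def cache_path(url):
    """Map a URL to a cache file name via a hash, as a file-system cache would."""
    return os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest() + ".html")

def get_page(url, render):
    """Return cached HTML if fresh; otherwise render and store it.

    `render` is a callable standing in for the expensive PHP/SQL page build.
    """
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = cache_path(url)
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < MAX_AGE:
        with open(path) as f:          # cache hit: no database involved
            return f.read()
    html = render(url)                 # cache miss: build the page
    with open(path, "w") as f:
        f.write(html)
    return html
```

The design point the poster makes falls out directly: because a hit never touches `render`, cached pages remain servable even when the database behind `render` is down.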
Caching versus publishing
Absolutely. If you want to take caching to its logical conclusion, then why not take the option of publishing the static content to flat files (e.g. on a nightly basis)? That's how HTML was meant to work, after all, and whichever way you look at it I can't see how you would fail to improve performance by cutting out two layers: PHP and the SQL database.
Performance through Caching, Publishing, etc.
I’m curious, has anyone built a publishing system for Drupal, where various blocks, or portions of pages can be saved as rendered HTML and refreshed on some schedule?
Also, has anyone used a product like the Zend Performance Suite in conjunction with Drupal to cache db query results, parts of pages, or entire pages, etc. to enhance performance/reduce load on the various layers?
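What the commenter describes (blocks rendered to HTML and refreshed on a schedule) amounts to a per-block cache with a time-to-live. A minimal sketch follows; the class and method names are mine, not a real Drupal or Zend API, and it is in Python rather than PHP for brevity:

```python
import time

class BlockCache:
    """Per-block HTML cache with a refresh interval (TTL): a sketch of the
    'save blocks as rendered HTML, refresh on a schedule' idea."""

    def __init__(self):
        self._store = {}  # block_id -> (rendered_html, expires_at)

    def get(self, block_id, render, ttl, now=None):
        """Return cached HTML for `block_id`, re-rendering once it goes stale.

        `render` stands in for the expensive block build (queries, theming);
        `now` can be passed explicitly to make expiry deterministic in tests.
        """
        now = time.time() if now is None else now
        entry = self._store.get(block_id)
        if entry is not None and now < entry[1]:
            return entry[0]                       # still fresh: reuse HTML
        html = render()                           # stale or missing: re-render
        self._store[block_id] = (html, now + ttl)
        return html
```

Within the TTL every request reuses the stored HTML, so the underlying queries run only once per refresh interval per block.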
MAXIMUM CONCURRENT USERS... VERY IMPORTANT!
This is probably one of the MOST important discussions I've seen. With the proliferation of massive dating (match.com) and social networking sites (ecademy, friendster, tribe, linkedin, ryze), we are considering DRUPAL as the foundational code to handle more than 10 million users, with the possibility of 2 million concurrent. It's got to do this to get to the next level of where CMS is ultimately heading with the aggregation of human collections online.
If drupal can't do this "today with 4.4.0", what IS needed to get it there?
Scalability
It is hard to say as none of us have deployed Drupal for such a popular site. With two million concurrent users you'd have one of the most popular websites on the internet.
To give you an idea, Slashdot has 2 million visitors/day (not 2 million concurrent users). Their hardware consists of 5 load balanced webservers dedicated to pages, 3 load balanced webservers dedicated to images, 1 SQL server and 1 NFS Server. Clearly, it takes a number of people to administer and tweak both hardware and software on a day-to-day basis.
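The gap between visitors per day and concurrent users in the figures above can be made concrete with some back-of-the-envelope arithmetic. The five-minute average session length below is my assumption, not a measured figure:

```python
# Back-of-the-envelope load arithmetic for the Slashdot figure above.
visitors_per_day = 2_000_000
seconds_per_day = 24 * 60 * 60

# Average arrival rate spread evenly over a whole day.
arrivals_per_second = visitors_per_day / seconds_per_day   # ~23 visitors/s

# By Little's law, average concurrency = arrival rate * average session length.
# A 5-minute session is an assumed figure for illustration only.
avg_session_seconds = 5 * 60
avg_concurrent_users = arrivals_per_second * avg_session_seconds

print(round(arrivals_per_second, 1))   # average arrivals per second
print(round(avg_concurrent_users))     # average concurrent users
```

Even at Slashdot's traffic level this works out to only a few thousand average concurrent users, which is why 2 million *concurrent* users is orders of magnitude beyond "2 million visitors/day".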
Ecademy.com, a Drupal site, has 'only' 30,000 registered users and, from what I've seen, between 50 and 250 concurrent users.
Managing 10 million users is like running a small country -- it is not feasible with a stock Drupal 4.4.0.
Scalability...what does one have to do?
I could not pass this discussion by... it's far too interesting and important. I'm not a computer scientist by profession (sorry!)... I'm a mathematician... but it's still all numbers stuffed into algorithms with computations being spit out by modern-day calculators... our computers... so I will add my two cents for what it's worth. My comments apply to open-source CMSs only, not custom-made multi-million-euro behemoths.
I believe Drupal to be the overall BEST CMS I have ever seen... and I have seen many, certainly not all... but dozens. I personally do not know of ANY CMS, including SPIP, that can handle 10 million users. I have used SPIP for research purposes only... yes, a controlled environment. It's nice, but not Drupal. I don't like its general design, it lacks templates, it lacks THOROUGH documentation, and it lacks the fine-tuning features of Drupal.
What do I mean by fine-tuning features? The web is a viral, self-organizing structure built atop a very organized... though at times it seems not to be... more stable server foundation, which is our beloved internet. When you have the possibility of ANYONE with a computer at their disposal interacting with your site... you have great opportunity and great risk. Interaction is a two-way street on the web... Drupal reduces this risk to your site with such goodies as database page caching, flood control with throttling, and of course clean code. I find clean code the most important because EVERYTHING, including the above-mentioned functionality, is built on top of it.
Take Tiki or PhpNuke... sloppy, sloppy code. Yes, they have all sorts of added gizmos that Drupal may not yet have... who cares... neither of these CMSs can scale on its own WITHOUT problems, if at all. Certainly not out of the box. To me, true scalability is a combination of clean core software running on a massive amount of hardware, properly organized to balance and redistribute incurred loads... nothing more.
It is often forgotten that today's big sites like eBay and Amazon were created in an environment TOTALLY different from today's web. Spam didn't yet exist, and a virus was something you saw a doctor for. The requirements for getting a site up and running were far lower. I agree that storing cached pages in a database may not produce lightning-fast results... but I am VERY HAPPY Drupal does this. Storing information in flat files is a catastrophe waiting to happen if you are talking about 10 million users. Cutting out PHP and SQL is also a catastrophe waiting to happen. No other language in computerdom offers the functionality, flexibility, and ease of use that PHP does. Like it or not... SQL is the lingua franca of databases, and it integrates beautifully with PHP applications... thank you, PHP designers. Drupal offers a solid, SAFE, modular foundation on which to build your 10-million-person site... just not out of the box. Anyone here think that the eBay of today has ANY of its original code left? No, it HAD to scale, which means rewriting code. With Drupal you have this solid foundation to begin with. I'm curious, does anyone know how many CONCURRENT visitors eBay or Amazon gets on an average day?
Thanks for reading,
Larry
--There are no Kangaroos in Austria--
Merging Television and the Internet
Grasping to hold the sky?
In the US a popular TV show will have 20 million+ in its audience for approximately one hour of continuous 'use'. I'm extrapolating a bit... assume that half of these 20 million will actually 'interact' with this popular TV show. Those who want to interact will be required to become 'members.' Why?
In order to create some form of filter, and more importantly to appease the future of what advertising will become, there will need to be a simplistic login process. Simplistic profiles (age, location, etc.) are the most important 'take away' for those paying for the program... So the NET will be utilized and, in some cases, synchronized with this broadcast.
I understand Shirky's thoughts on a community limit of 150. And an audience is certainly NOT a community in the sense that most of us would define it. However, for a "quick BURST" during this one-hour TV show there is a "community of interest", a play off of Dr. Wenger's "Community of Practice."
I'm beginning to envision a form of p2p distribution that links a global audience/community of this massive size. See the article on i2hub, knowing that Drupal sites will interact with one another much like LiveJournal does...
I'm curious if I'm off here Dries...James??
This is good, and yes, I believe Marc Canter is here already with Chris...
"by logging in with your 'drupal' ID into drupal-based sites, you have an automatic registration... i regularly use an enjoy this feature... now, what if my profile was ported in between these various sites, so i did not need to manage multiple profiles... what if those sites i was registered at were pinged back, and posted in my profile." (From: http://drupal.org/node/view/6835 )
PHP Scalability and Friendster
There are some excellent links about PHP and scalability here: http://shiflett.org/archive/46 The excerpt below is from that blog:
"The present discussion is about developing Web applications that scale well, and whether particular languages, technologies, and platforms are more appropriate than others. My opinion is that some things scale more naturally than others, and Rasmus's explanation above touches on this. PHP, when compiled as an Apache module (mod_php), fits nicely into the basic Web paradigm. In fact, it might be easier to imagine PHP as a new skill that Apache can learn. HTTP requests are still handled by Apache, and unless your programming logic specifically requires interaction with another source (database, filesystem, network), your application will scale as well as Apache (with a decrease in performance based upon the complexity of your programming logic). This is why PHP naturally scales. The caveat I mention is why your PHP application may not scale.
A common (and somewhat trite) argument being tossed around is that scalability has nothing to do with the programming language. While it is true that language syntax is irrelevant, the environments in which languages typically operate can vary drastically, and this makes a big difference. PHP is much different than ColdFusion or JSP. In terms of scalability, PHP has an advantage, but it loses a few features that some developers miss (which is why there are efforts to create application servers for PHP). The PHP versus JSP argument should focus on environment, otherwise the point gets lost."