I've been simple testing like crazy over the last couple of days and have found that MySQL seems to finish its tests _WAY_ earlier than PostgreSQL does. This is a critical issue as it will critically change anyone's reason to choose PostgreSQL as the database to run Drupal on. In which case you might as well say goodbye to PostgreSQL in Drupal.
I'm not sure why things are so slow which is why I'm making this issue, I do know however it does suck PHP's memory also as the Javascript simpletest blows out of memory on my machine and I use 128MB of memory for PHP alone.
I'm happy to tackle this issue but would much appreciate help on the topic.
My IRC nick is fiasco in #drupal on freenode, feel free to talk to me if I'm around.
I know DamZ has been promoting things to the PostgreSQL Surge but I think this definately needs to be put up so I did it for him ;p
| Comment | File | Size | Author |
|---|---|---|---|
| #9 | Screenshot.png | 923.67 KB | josh waihi |
Comments
Comment #1
kbahey commentedSubscribe.
Can you please post the time for MySQL vs. time for Postgres? Just to give a baseline against any improvements.
Also, if anyone has contacts for people in the PostgreSQL community, please point them to this issue so those interested can jump in with suggestions, patches, ...etc.
Comment #2
Crell commentedHow do you have postgres configured? Its default configuration on most Linux distros is, as I understand, horrific. SSL is frequently enabled for no reason, the memory settings are all wrong, etc. I don't know the details for how to speed it up, but I'm sure Google does. That is probably worth looking into (and documenting!) before we try to micro-optimize the Drupal Postgres driver.
That said, I suspect most of the extra time in the Drupal Postgres driver is due to the need to hit the schema for all insert/update queries. There's no way around that because of Postgres' requirement for special handling of BLOB fields, at least not without patching PDO itself to do internally what we're doing in userspace. :-)
Comment #3
damien tournoud commentedMy experience of before-DB:TNG times, when I first battled to make simpletest work on PostgreSQL is that Drupal was already slower on PostgreSQL than on MySQL (both in their default Debian configuration). So I'm not sure it comes from the schema overhead.
Anyway, about that, I'm pretty sure we could issue a simply query against the information_schema to grab all the columns of type blob. That would avoid wasting memory with the full schema when we in fact only need to know which columns are blobs.
Comment #4
Crell commentedHm. If the extra query is faster than hitting the schema, I'm fine with it. We can optimize for Postgres specifically in the driver however makes sense. That sounds like it could also at least partially solve #301038: Add a cross-compatible database schema introspection API. Damien, can you open a new issue to give that a try?
Comment #5
alexanderpas commentedWrite queries in PostgreSQL are always slower, when compared to mysql, due to the explicit syncing it does. (in order to protect the data.) (afaik)
Comment #6
alexanderpas commented#11 was already taken ;)
Comment #7
damien tournoud commentedWell of course it would only be fair to compare PostgreSQL vs. Innodb.
Comment #8
josh waihi commentedI used some xdebug to find out what is happening on load, I used the schema api test. attached is what I got back, you'll need kcachegrind (sudo apt-get install kcachegrind) to beable to view it. The biggest thing I noticed is that the PDO execute function is called 1740 times in mysql and takes 26.11 s in total while in postgres it gets called 1845 times and takes 48.06 s. So why does the same simple test use more queries in postgres and for 105 more querys, postgres takes almost twice as long.
.... just tried uploading the files, they didn't come back after I hit attach. so they may not be attached
Comment #9
josh waihi commentedattached screenshot instead
Comment #10
alexanderpas commentedyou might want to use EXPLAIN to find the reason for the slow queries.
http://www.postgresql.org/docs/8.3/static/sql-explain.html
Comment #11
damien tournoud commentedWhen comparing MySQL on MyISAM with PostgreSQL (with fsync = 0, and with a pgpool frontend with query caching) I see consistently that Drupal on PostgreSQL is two times slower than on MySQL (both with caching and without caching).
Comment #12
josh waihi commentedIt does seem like there is very little in the way of PHP optimizing that can be done (that) I can see. The whole 2x slower was comparing MySQL MyISAM to Postgres which of cause will be faster - also its worth noting that simpletests do alot of creating and droping of things which has more over head than inserting, updating, deleteing and selecting. I talked to a PostgreSQL DBA from my company how has helped me spend up the performance of my postgres instance. It seems the simpletests, while testing Drupal, require different optimizations then a normal production or development environment. One of these optimizations being putting PostgreSQL databases on sepereate partitions formated for quicker creates and drops. He kinda whizzed around my computer a little faster than I could keep up. When I understand, maybe I'll write a node about it.
Comment #13
Crell commentedI'm inclined to remove this issue from the surge list. To keep Postgres in core it needs to work without bugs, not work at optimal speed. Vis, I don't think this issue is a blocker for supporting Postgres, especially when it looks like the main issue is not in PHP to begin with but in the server configuration.
Comment #14
miklYeah, any sort of comparison against MyISAM is useless, given the fact that it's hardly the same thing. MyISAM doesn't do transactions, referential integrity, only does table-level locking, is not ACID compliant, etc.
So it does a lot less work and is thus able to do many more queries.
Also, MySQL has a built-in query cache, that, if enabled, will skew the results of a benchmark even more.
So, any sort of testing with less than ~100 concurrent database connections is more or less bound to come out in MyISAMs favour. So this is, in my humble opinion, not a bug.
Comment #15
alexanderpas commented/me agrees, not a surge item.
Comment #16
drewish commentedsubscribing. i was amazed at how slow pgsql is running the tests. it'd be great to speed it up a bit just to keep people willing to test it.
Comment #18
josh waihi commented@drewish: I found when I used a testbed rather than a browser, the tests went alot faster
also seperating the browser, websever and database out to different machines helped alot - more cpu for everyone.
Comment #19
miklA good way to make the tests run a lot faster on on PostgreSQL is to run them inside a transaction, like Django has just started to do. Then we can just rull back whatever we did, instead of having to reset the entire database for each test :)
http://simonwillison.net/2009/Jan/16/fast/
Comment #20
josh waihi commented@mikl, this is a great idea however, I think there a test that want the db queries to fail. if a db query fails in postgres, no further SQL commands can be executed untill a rollback occurs
Comment #21
ivansb@drupal.orgPostgreSQL and MySQL behaviour are quite different on different workloads.
Furthermore PostgreSQL spend some more time taking care of data integrity and fast roll backs.
Anyway a couple of tricks:
- you've to increase shared memory on most default Linux setup and then adjust shared memory in postgresql config too. PostgreSQL and Linux use very conservative setup.
Just hints
/etc/sysctl.conf
kernel.shmmax=134217728
kernel.shmall=2097152
postgresql.conf
shared_buffers = 120MB
work_mem = 32MB
Actual settings depends on memory on the box and quality of workload, but I'd say these are the first things to consider.
- EXPLAIN and EXPLAIN ANALYZE may help to tune queries
- VACUUM ANALYZE may help. I've seen dramatic improvements running queries on a just restored DB and on a just restored DB once I ran VACUUM ANALYZE
- log_min_duration_statement = may help to see where the problems are
Comment #22
josh waihi commentedI think the majority of slowness is the bottle neck of creating and dropping tables, that seems to be the biggest problem. VACUUM and VACUUMN FULL won't do anything since the tables are dropped after tests are run. And tables are essentially files on the file system.
Tests are a unique condition and don't replicate production environements. I think tests have gotten better though, maybe since drupal schema doesn't get loaded for every insert and update
Comment #23
grub3 commentedThe main advantage of PostgreSQL over MySQL is the ability to log long-running queries, have full access to statistics on index/disc use and optimize queries. This is a long issue to discuss. For example, UPDATEs can be written in batches, which boosts performance. Choose:
fsync = offAt first, you need to set shared memory to a good level and tune Postgresql.conf to make sure that indexes are loaded in memory. Then all your queries should run in less than 30 ms. Make sure you are using pgAdmin3 graphical interface. It has good representation of statistics on index/seqential scans.
I am presently migrating a 500.000 user messages board from PhpBB to Drupal. I log every query > 30 ms and there is a bunch of them.
In postgresql.conf, set:
Example:
2009-03-13 16:05:24 CET LOG: durée : 305.646 ms, instruction : SELECT iid FROM aggregator_item WHERE fid = 55 AND link = 'http://www.ouest-france.fr/ofdernmin_-Le-groupe-immobilier-Celeos-repris-avec-une-centaine-de-salaries-en-moins_-855374--BKN_actu.Htm'In this example you can see that the database is querying data using index on fid and then does a sequential scan on link. Looking at the database structure, you find that the table aggregator_item misses an index on link.
So I added the index manually, I hope you can add it to Drupal code:
Drupal is full of these issues. My logs are completely feed-up (Mbytes), which means that a real SQL query design work is needed to improve Drupal SQL queries. This is not a MySQL or a PostgreSQL issue. But using PostgreSQL during design helps and will benefit MySQL.
Also I would like to point out that I manage one site using MySQL with 1000 user and another using PostgreSQL with 500.000 nodes. The problem with MySQL is that it breaks under heavy load and sometimes your page is full of SQL queries with no result.
I would say: design Drupal and run large professional sites using PostgreSQL, and use MySQL for small/medium sites. I am not against MySQL, I can only discourage people from using MySQL in a serious development work like Drupal.
Comment #24
grub3 commentedQuery plan shows full usage of index, no sequential scan i what PostgreSQL and ANY database including MySQL likes :
After setting an index, the query runs in 60 ms:
Could someone add the index on aggregator_item?
Comment #25
grub3 commentedI opened a Drupal query surge index page:
http://drupal.org/node/401040
Edit: I think I found a possibel flood of Drupal forum.
Comment #26
greg1104 commentedI wanted to provide clarification about some of the points raised above.
Any test that includes a heavy dose of table modifications (adding or dropping for example) is extraordinarly unfair to PostgreSQL, due to its transactional DDL capabilities. Basically, any change like that you make can be rolled back, which adds considerable overhead to the whole thing. See http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Compe... for more details. This isn't really an important performance concern in production, where such changes are rare compared to routine queries, but it can really skew performance comparisons for typical regression tests that do lots of table changes.
If there's any common Drupal benchmark that focuses more on the query side of things, that might be a more fair comparison to run. Also, even as a fan I'd never suggest the PostgreSQL is the best choice for trivially sized data sets or user counts. Pop a million rows in and throw over 100 clients at the server if you want to start making a more real-world comparison here. There may very well still be a performance gap, but that would make it more likely you'd be optimizing the important bits here.
Tuning a PostgreSQL server for good performance hasn't really been well documented until around a year ago when I created http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server to collect the best practices here. That covers all the common optimizations now.
In older versions of PostgreSQL, the usual way to turn off the need to commit after every transactions was to set "fsync = off", as suggested above. Starting in PostgreSQL 8.3, the synchronous_commit feature provides similar speed improvements in a way that might even be safe enough to use in some production environments.
Postgres really expects that it has good data about the tables it's working with available, in the form of the statistics it collects via the ANALYZE feature. If you insert a large number of records, any recent version will try to take care of that automatically via the autovacuum daemon; older ones (before 8.2) aren't so good at that. Note that this is independent of the need to VACUUM old tables to get rid of dead rows. The only coupling between the two is that both are routine maintenance features and are therefore often combined into one operation: "VACUUM ANALYZE" is simply VACUUM followed by ANALYZE, and the autovacuum daemon handles both tasks.
Comment #27
damien tournoud commentedWell, there are two different questions here (I voluntarily put aside the general SQL design considerations of jmpoure, which apply the same to all the database engines we support):
Comment #28
kbahey commentedThis is not directed at any particular person in this thread, but rather a common trend that I have seen every time PostgreSQL is discussed.
We get people extolling the virtues of PostgreSQL vs. MySQL, that it is free and not owned by any entity, that it has a superior architecture, that it is more robust, ...etc.
All of the above is true in absolute terms, e..g when you are coding an application from scratch and decided from the start to use PostgreSQL, and optimize your application for it. It is a moot point though when we are talking about an existing application that is widely used such as Drupal, they are moot points.
So, to sum up: Enough preaching: we are on the same side, and we are tired of hearing this again and again. Most importantly, let us see practical solutions (i.e. concrete patches that make Drupal better with PostgreSQL, and testing/porting of existing modules to work with it).
If you care about PostgreSQL, no one will do the work for you. Jump in and join the rest of us ...
Comment #29
Shiny commentedsubscribing
Comment #30
miklClarified title, the original bug report is about performance when running simpletests.
I think there might be a possible solution in truncating the tables between tests instead of having them recreated for each test, and then wrapping each test in a transaction.
Doing that would involve a bit of refactoring of the test system, though, and I wonder if it's worth the trouble.
Comment #31
josh waihi commented@mikl, that wouldn't really work as the schema often changes during a unit tests, also different tests will require the database to being different states, for example, install modules that are not installed by the default profile.
I've thought a lot about this, but because of the diversity between schema possibilities for each unit test, truncating/caching/restoring an existing schema simply won't work. We'd also be doing something that Drupal doesn't do which falsifies the testing process to some extent.
Alternatively, defining optimizations for PostgreSQL for the purpose of running unit tests is definitely an approach worth investing in. I've got a blog post that mentions a few things that can help PostgreSQL move faster during test: http://geek.joshwaihi.com/content/performance-tuning-postgresql-drupal-7...
Comment #32
Crell commentedchx recently noted to me that SimpleTest now has 2 base classes we can inherit from for unit test, one of which does not do a full reinstall. He recommended moving the bulk of the database test classes over to that since they really don't need a full Drupal install; just a working database. I wanted to get a few outstanding unit-test-adding patches committed before we do that sort of refactoring but that should greatly reduce the amount of schema traffic necessary for the DB tests, which currently are some of the slowest. That should help all DBs in test runs, but Postgres more than the others.
Comment #33
drewish commentedCrell — this is sort of tangental to this thread but the one problem with the db-less base classes is that there's no db at all. So you can very easily test functions that don't have any db-calls but sadly there are very few such candidates. Lots of code has a call to watchdog() in the error handling and other functions that require database support.
Comment #34
Crell commentedchx implied that the DB layer itself still works; that is, we can create and populate just the tables that the DB tests use and not the 50 others that Drupal includes by default. If PostgreSQL is slow on the table creation operations, that's a lot of table creations we can skip.
It's tangential, but tangential does mean it touches. :-)
Comment #35
josh waihi commentedI did a quick test to see where the bottle neck was :
The results showed that Setup seems to be the biggest bottleneck taking between 6-12 seconds to complete. While teardown takes around 2 seconds and runtime varies.
Comment #36
josh waihi commentedMoving this to the simpletest component as this isn't, as far as I can see, a DBTNG issue.
Comment #37
tpfeiffer commentedSetting fsync=off in the postgresql.conf *massively* speeds up performance, but since this is a cluster-wide setting, think twice before you enable this on your production server. Some people from #postgresql on irc.freenode.net consider "fsync=off" equivalent to "inappropriate for any scenario where you care even a little bit about your data".
Another consideration might be to use TEMPORARY tables for simpletest, but then AFAIK all statements need to be within the same transaction.
Comment #38
poker10 commentedI am closing this as in the latest release (7.90) the PostgreSQL tests run as fast as MySQL tests.
See the PostgreSQL test: https://drupal.org/comment/14538642 in
#3259739: [meta] Priorities for 2022-06-01 release of Drupal 7
If someone think that the "fixed" status will suit better, feel free to change it.