I've been simple testing like crazy over the last couple of days and have found that MySQL seems to finish its tests _WAY_ earlier than PostgreSQL does. This is a critical issue as it will critically change anyone's reason to choose PostgreSQL as the database to run Drupal on. In which case you might as well say goodbye to PostgreSQL in Drupal.

I'm not sure why things are so slow which is why I'm making this issue, I do know however it does suck PHP's memory also as the Javascript simpletest blows out of memory on my machine and I use 128MB of memory for PHP alone.

I'm happy to tackle this issue but would much appreciate help on the topic.
My IRC nick is fiasco in #drupal on freenode, feel free to talk to me if I'm around.

I know DamZ has been promoting things to the PostgreSQL Surge but I think this definately needs to be put up so I did it for him ;p

CommentFileSizeAuthor
#9 Screenshot.png923.67 KBjosh waihi

Comments

kbahey’s picture

Status: Active » Postponed (maintainer needs more info)

Subscribe.

Can you please post the time for MySQL vs. time for Postgres? Just to give a baseline against any improvements.

Also, if anyone has contacts for people in the PostgreSQL community, please point them to this issue so those interested can jump in with suggestions, patches, ...etc.

Crell’s picture

How do you have postgres configured? Its default configuration on most Linux distros is, as I understand, horrific. SSL is frequently enabled for no reason, the memory settings are all wrong, etc. I don't know the details for how to speed it up, but I'm sure Google does. That is probably worth looking into (and documenting!) before we try to micro-optimize the Drupal Postgres driver.

That said, I suspect most of the extra time in the Drupal Postgres driver is due to the need to hit the schema for all insert/update queries. There's no way around that because of Postgres' requirement for special handling of BLOB fields, at least not without patching PDO itself to do internally what we're doing in userspace. :-)

damien tournoud’s picture

That said, I suspect most of the extra time in the Drupal Postgres driver is due to the need to hit the schema for all insert/update queries.

My experience of before-DB:TNG times, when I first battled to make simpletest work on PostgreSQL is that Drupal was already slower on PostgreSQL than on MySQL (both in their default Debian configuration). So I'm not sure it comes from the schema overhead.

Anyway, about that, I'm pretty sure we could issue a simply query against the information_schema to grab all the columns of type blob. That would avoid wasting memory with the full schema when we in fact only need to know which columns are blobs.

Crell’s picture

Hm. If the extra query is faster than hitting the schema, I'm fine with it. We can optimize for Postgres specifically in the driver however makes sense. That sounds like it could also at least partially solve #301038: Add a cross-compatible database schema introspection API. Damien, can you open a new issue to give that a try?

alexanderpas’s picture

Write queries in PostgreSQL are always slower, when compared to mysql, due to the explicit syncing it does. (in order to protect the data.) (afaik)

alexanderpas’s picture

Title: PostgreSQL surge #11: Performance significantly slower than MySQL » PostgreSQL surge #12: Performance significantly slower than MySQL

#11 was already taken ;)

damien tournoud’s picture

Well of course it would only be fair to compare PostgreSQL vs. Innodb.

josh waihi’s picture

Status: Postponed (maintainer needs more info) » Active

I used some xdebug to find out what is happening on load, I used the schema api test. attached is what I got back, you'll need kcachegrind (sudo apt-get install kcachegrind) to beable to view it. The biggest thing I noticed is that the PDO execute function is called 1740 times in mysql and takes 26.11 s in total while in postgres it gets called 1845 times and takes 48.06 s. So why does the same simple test use more queries in postgres and for 105 more querys, postgres takes almost twice as long.

.... just tried uploading the files, they didn't come back after I hit attach. so they may not be attached

josh waihi’s picture

StatusFileSize
new923.67 KB

attached screenshot instead

alexanderpas’s picture

you might want to use EXPLAIN to find the reason for the slow queries.

http://www.postgresql.org/docs/8.3/static/sql-explain.html

damien tournoud’s picture

When comparing MySQL on MyISAM with PostgreSQL (with fsync = 0, and with a pgpool frontend with query caching) I see consistently that Drupal on PostgreSQL is two times slower than on MySQL (both with caching and without caching).

josh waihi’s picture

It does seem like there is very little in the way of PHP optimizing that can be done (that) I can see. The whole 2x slower was comparing MySQL MyISAM to Postgres which of cause will be faster - also its worth noting that simpletests do alot of creating and droping of things which has more over head than inserting, updating, deleteing and selecting. I talked to a PostgreSQL DBA from my company how has helped me spend up the performance of my postgres instance. It seems the simpletests, while testing Drupal, require different optimizations then a normal production or development environment. One of these optimizations being putting PostgreSQL databases on sepereate partitions formated for quicker creates and drops. He kinda whizzed around my computer a little faster than I could keep up. When I understand, maybe I'll write a node about it.

Crell’s picture

I'm inclined to remove this issue from the surge list. To keep Postgres in core it needs to work without bugs, not work at optimal speed. Vis, I don't think this issue is a blocker for supporting Postgres, especially when it looks like the main issue is not in PHP to begin with but in the server configuration.

mikl’s picture

Yeah, any sort of comparison against MyISAM is useless, given the fact that it's hardly the same thing. MyISAM doesn't do transactions, referential integrity, only does table-level locking, is not ACID compliant, etc.
So it does a lot less work and is thus able to do many more queries.
Also, MySQL has a built-in query cache, that, if enabled, will skew the results of a benchmark even more.

So, any sort of testing with less than ~100 concurrent database connections is more or less bound to come out in MyISAMs favour. So this is, in my humble opinion, not a bug.

alexanderpas’s picture

Title: PostgreSQL surge #12: Performance significantly slower than MySQL » PostgreSQL Performance significantly slower than MySQL
Priority: Critical » Normal

/me agrees, not a surge item.

drewish’s picture

subscribing. i was amazed at how slow pgsql is running the tests. it'd be great to speed it up a bit just to keep people willing to test it.

josh waihi’s picture

@drewish: I found when I used a testbed rather than a browser, the tests went alot faster

also seperating the browser, websever and database out to different machines helped alot - more cpu for everyone.

mikl’s picture

A good way to make the tests run a lot faster on on PostgreSQL is to run them inside a transaction, like Django has just started to do. Then we can just rull back whatever we did, instead of having to reset the entire database for each test :)
http://simonwillison.net/2009/Jan/16/fast/

josh waihi’s picture

@mikl, this is a great idea however, I think there a test that want the db queries to fail. if a db query fails in postgres, no further SQL commands can be executed untill a rollback occurs

ivansb@drupal.org’s picture

PostgreSQL and MySQL behaviour are quite different on different workloads.
Furthermore PostgreSQL spend some more time taking care of data integrity and fast roll backs.

Anyway a couple of tricks:
- you've to increase shared memory on most default Linux setup and then adjust shared memory in postgresql config too. PostgreSQL and Linux use very conservative setup.
Just hints
/etc/sysctl.conf
kernel.shmmax=134217728
kernel.shmall=2097152
postgresql.conf
shared_buffers = 120MB
work_mem = 32MB
Actual settings depends on memory on the box and quality of workload, but I'd say these are the first things to consider.

- EXPLAIN and EXPLAIN ANALYZE may help to tune queries
- VACUUM ANALYZE may help. I've seen dramatic improvements running queries on a just restored DB and on a just restored DB once I ran VACUUM ANALYZE
- log_min_duration_statement = may help to see where the problems are

josh waihi’s picture

I think the majority of slowness is the bottle neck of creating and dropping tables, that seems to be the biggest problem. VACUUM and VACUUMN FULL won't do anything since the tables are dropped after tests are run. And tables are essentially files on the file system.

Tests are a unique condition and don't replicate production environements. I think tests have gotten better though, maybe since drupal schema doesn't get loaded for every insert and update

grub3’s picture

The main advantage of PostgreSQL over MySQL is the ability to log long-running queries, have full access to statistics on index/disc use and optimize queries. This is a long issue to discuss. For example, UPDATEs can be written in batches, which boosts performance. Choose:
fsync = off

At first, you need to set shared memory to a good level and tune Postgresql.conf to make sure that indexes are loaded in memory. Then all your queries should run in less than 30 ms. Make sure you are using pgAdmin3 graphical interface. It has good representation of statistics on index/seqential scans.

I am presently migrating a 500.000 user messages board from PhpBB to Drupal. I log every query > 30 ms and there is a bunch of them.

In postgresql.conf, set:

log_min_duration_statement = 30

Example:

2009-03-13 16:05:24 CET LOG: durée : 305.646 ms, instruction : SELECT iid FROM aggregator_item WHERE fid = 55 AND link = 'http://www.ouest-france.fr/ofdernmin_-Le-groupe-immobilier-Celeos-repris-avec-une-centaine-de-salaries-en-moins_-855374--BKN_actu.Htm'

EXPLAIN SELECT iid FROM aggregator_item WHERE fid = 55 
AND link = 'http://www.ouest-france.fr/ofdernmin_-Le-groupe-immobilier-Celeos-repris-avec-une-centaine-de-salaries-en-moins_-855374--BKN_actu.Htm'
"  Recheck Cond: (fid = 55)"
"  Filter: ((link)::text = 'http://www.ouest-france.fr/ofdernmin_-Le-groupe-immobilier-Celeos-repris-avec-une-centaine-de-salaries-en-moins_-855374--BKN_actu.Htm'::text)"
"  ->  Bitmap Index Scan on aggregator_item_fid_idx  (cost=0.00..8.69 rows=58 width=0)"
"        Index Cond: (fid = 55)"

In this example you can see that the database is querying data using index on fid and then does a sequential scan on link. Looking at the database structure, you find that the table aggregator_item misses an index on link.

So I added the index manually, I hope you can add it to Drupal code:

CREATE INDEX aggregator_item_link
  ON aggregator_item
  USING btree
  (link);

Drupal is full of these issues. My logs are completely feed-up (Mbytes), which means that a real SQL query design work is needed to improve Drupal SQL queries. This is not a MySQL or a PostgreSQL issue. But using PostgreSQL during design helps and will benefit MySQL.

Also I would like to point out that I manage one site using MySQL with 1000 user and another using PostgreSQL with 500.000 nodes. The problem with MySQL is that it breaks under heavy load and sometimes your page is full of SQL queries with no result.

I would say: design Drupal and run large professional sites using PostgreSQL, and use MySQL for small/medium sites. I am not against MySQL, I can only discourage people from using MySQL in a serious development work like Drupal.

grub3’s picture

EXPLAIN SELECT iid FROM aggregator_item WHERE fid = 55
AND link = 'http://www.ouest-france.fr/ofdernmin_-Le-groupe-immobilier-Celeos-repris-avec-une-centaine-de-salaries-en-moins_-855374--BKN_actu.Htm'

Query plan shows full usage of index, no sequential scan i what PostgreSQL and ANY database including MySQL likes :

"Index Scan using aggregator_item_link on aggregator_item  (cost=0.00..8.28 rows=1 width=4)"
"  Index Cond: ((link)::text = 'http://www.ouest-france.fr/ofdernmin_-Le-groupe-immobilier-Celeos-repris-avec-une-centaine-de-salaries-en-moins_-855374--BKN_actu.Htm'::text)"
"  Filter: (fid = 55)"

After setting an index, the query runs in 60 ms:

Total query runtime: 61 ms.
1 rows retrieved.

Could someone add the index on aggregator_item?

grub3’s picture

I opened a Drupal query surge index page:
http://drupal.org/node/401040

Edit: I think I found a possibel flood of Drupal forum.

greg1104’s picture

I wanted to provide clarification about some of the points raised above.

Any test that includes a heavy dose of table modifications (adding or dropping for example) is extraordinarly unfair to PostgreSQL, due to its transactional DDL capabilities. Basically, any change like that you make can be rolled back, which adds considerable overhead to the whole thing. See http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Compe... for more details. This isn't really an important performance concern in production, where such changes are rare compared to routine queries, but it can really skew performance comparisons for typical regression tests that do lots of table changes.

If there's any common Drupal benchmark that focuses more on the query side of things, that might be a more fair comparison to run. Also, even as a fan I'd never suggest the PostgreSQL is the best choice for trivially sized data sets or user counts. Pop a million rows in and throw over 100 clients at the server if you want to start making a more real-world comparison here. There may very well still be a performance gap, but that would make it more likely you'd be optimizing the important bits here.

Tuning a PostgreSQL server for good performance hasn't really been well documented until around a year ago when I created http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server to collect the best practices here. That covers all the common optimizations now.

In older versions of PostgreSQL, the usual way to turn off the need to commit after every transactions was to set "fsync = off", as suggested above. Starting in PostgreSQL 8.3, the synchronous_commit feature provides similar speed improvements in a way that might even be safe enough to use in some production environments.

Postgres really expects that it has good data about the tables it's working with available, in the form of the statistics it collects via the ANALYZE feature. If you insert a large number of records, any recent version will try to take care of that automatically via the autovacuum daemon; older ones (before 8.2) aren't so good at that. Note that this is independent of the need to VACUUM old tables to get rid of dead rows. The only coupling between the two is that both are routine maintenance features and are therefore often combined into one operation: "VACUUM ANALYZE" is simply VACUUM followed by ANALYZE, and the autovacuum daemon handles both tasks.

damien tournoud’s picture

Well, there are two different questions here (I voluntarily put aside the general SQL design considerations of jmpoure, which apply the same to all the database engines we support):

  • How to improve the performance of testing on PostgreSQL? As noted above, testing is very DDL intensive, which strongly penalizes PostgreSQL. That's one part of the problem. Another part is that our testing framework supports running tests in parallel, but that seems to fail hard on PostgreSQL. Any ideas why would be very appreciated.
  • How to improve general Drupal performance on PostgreSQL? At this time, we have no aggregated performance index for Drupal. But you can *feel* the difference between two local development environments running on MySQL and PostgreSQL. This "feeling" needs to be quantified and analyzed. Part of it is probably psychological, but there is definitely room for improvement in several key areas. We need to analyze the performance gap first, then discuss possible solutions.
kbahey’s picture

This is not directed at any particular person in this thread, but rather a common trend that I have seen every time PostgreSQL is discussed.

We get people extolling the virtues of PostgreSQL vs. MySQL, that it is free and not owned by any entity, that it has a superior architecture, that it is more robust, ...etc.

All of the above is true in absolute terms, e..g when you are coding an application from scratch and decided from the start to use PostgreSQL, and optimize your application for it. It is a moot point though when we are talking about an existing application that is widely used such as Drupal, they are moot points.

So, to sum up: Enough preaching: we are on the same side, and we are tired of hearing this again and again. Most importantly, let us see practical solutions (i.e. concrete patches that make Drupal better with PostgreSQL, and testing/porting of existing modules to work with it).

If you care about PostgreSQL, no one will do the work for you. Jump in and join the rest of us ...

Shiny’s picture

subscribing

mikl’s picture

Title: PostgreSQL Performance significantly slower than MySQL » Simpletests run significantly slower on PostgreSQL due to schema recreation

Clarified title, the original bug report is about performance when running simpletests.

I think there might be a possible solution in truncating the tables between tests instead of having them recreated for each test, and then wrapping each test in a transaction.

Doing that would involve a bit of refactoring of the test system, though, and I wonder if it's worth the trouble.

josh waihi’s picture

@mikl, that wouldn't really work as the schema often changes during a unit tests, also different tests will require the database to being different states, for example, install modules that are not installed by the default profile.

I've thought a lot about this, but because of the diversity between schema possibilities for each unit test, truncating/caching/restoring an existing schema simply won't work. We'd also be doing something that Drupal doesn't do which falsifies the testing process to some extent.

Alternatively, defining optimizations for PostgreSQL for the purpose of running unit tests is definitely an approach worth investing in. I've got a blog post that mentions a few things that can help PostgreSQL move faster during test: http://geek.joshwaihi.com/content/performance-tuning-postgresql-drupal-7...

Crell’s picture

chx recently noted to me that SimpleTest now has 2 base classes we can inherit from for unit test, one of which does not do a full reinstall. He recommended moving the bulk of the database test classes over to that since they really don't need a full Drupal install; just a working database. I wanted to get a few outstanding unit-test-adding patches committed before we do that sort of refactoring but that should greatly reduce the amount of schema traffic necessary for the DB tests, which currently are some of the slowest. That should help all DBs in test runs, but Postgres more than the others.

drewish’s picture

Crell — this is sort of tangental to this thread but the one problem with the db-less base classes is that there's no db at all. So you can very easily test functions that don't have any db-calls but sadly there are very few such candidates. Lots of code has a call to watchdog() in the error handling and other functions that require database support.

Crell’s picture

chx implied that the DB layer itself still works; that is, we can create and populate just the tables that the DB tests use and not the 50 others that Drupal includes by default. If PostgreSQL is slow on the table creation operations, that's a lot of table creations we can skip.

It's tangential, but tangential does mean it touches. :-)

josh waihi’s picture

I did a quick test to see where the bottle neck was :

// Iterate through all the methods in this class. 
    foreach (get_class_methods(get_class($this)) as $method) {
      // If the current method starts with "test", run it - it's a test.
      if (strtolower(substr($method, 0, 4)) == 'test') {
        $start = time();
        $this->setUp();
        $setup = (time() - $start);
        try {
          $start = time();
          $this->$method();
          $runtime = (time() - $start);
          // Finish up.
        }
        catch (Exception $e) {
          $this->exceptionHandler($e);
        } 
        $start = time();
        $this->tearDown();
        watchdog('debug', 'Setup Time for ' . $method . ': ' . $setup . ' s');
        watchdog('debug', 'Runtime for ' . $method . ': ' . $runtime . ' s');
        watchdog('debug', 'TearDown Time for ' . $method . ': ' . (time() - $start) . ' s');
      } 
    }

The results showed that Setup seems to be the biggest bottleneck taking between 6-12 seconds to complete. While teardown takes around 2 seconds and runtime varies.

josh waihi’s picture

Component: postgresql database » simpletest.module

Moving this to the simpletest component as this isn't, as far as I can see, a DBTNG issue.

tpfeiffer’s picture

Setting fsync=off in the postgresql.conf *massively* speeds up performance, but since this is a cluster-wide setting, think twice before you enable this on your production server. Some people from #postgresql on irc.freenode.net consider "fsync=off" equivalent to "inappropriate for any scenario where you care even a little bit about your data".

Another consideration might be to use TEMPORARY tables for simpletest, but then AFAIK all statements need to be within the same transaction.

poker10’s picture

Issue summary: View changes
Status: Active » Closed (outdated)

I am closing this as in the latest release (7.90) the PostgreSQL tests run as fast as MySQL tests.

See the PostgreSQL test: https://drupal.org/comment/14538642 in
#3259739: [meta] Priorities for 2022-06-01 release of Drupal 7

If someone think that the "fixed" status will suit better, feel free to change it.