According to nnewton at OSUOSL, the best chance Drupal.org has to scale is by distributing read-only queries to a set of replicated database servers. This is not a new idea; Wikipedia successfully uses this technique to serve enormous numbers of users. Here's my Drupal-style take on MediaWiki's method. This patch does not break compatibility with existing code. Code can simply be updated to run read-only queries on the set of replicated servers.

New functions:
* db_query_read_only()
* db_query_range_read_only()

This patch adds support for a new configuration option in settings.php. Administrators of replicated servers would set $db_slave_url to the slave databases in the same way $db_url is set to the regular databases. $db_slave_url supports an additional twist: administrators can provide one additional level of arrays to list sets of slave servers. If multiple slave servers are available, db_set_active will choose among them randomly. If $db_url identifies database URLs by name, $db_slave_url is expected to do the same.
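As a rough illustration of how such a configuration might look (the exact array shape and URL scheme here are assumptions modeled on how $db_url already works):

```php
<?php
// Hypothetical settings.php sketch. The master connection stays in $db_url;
// $db_slave_url lists a pool of read-only slaves, from which db_set_active()
// is described as picking one at random.
$db_url = 'mysqli://user:password@master.example.com/drupal';

$db_slave_url = array(
  'mysqli://user:password@slave1.example.com/drupal',
  'mysqli://user:password@slave2.example.com/drupal',
);

// Random selection among the configured slaves.
$chosen = $db_slave_url[array_rand($db_slave_url)];
```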

If transaction support gets added to Drupal, this patch will be updated to send all queries to the read/write server until the transaction completes. We may also consider the possibility of read-only transactions.

Comments

david strauss’s picture

Assigned: Unassigned » david strauss

I should add that this patch is completely untested.

kbahey’s picture

We can take that a step further.

We can assume that the only queries that are read only are the ones that start with SELECT, and direct those to a slave db automatically. The UPDATE, DELETE and INSERT ones do not go there.

david strauss’s picture

@kbahey I originally started writing the module to auto-detect, but I decided not to for a few reasons:
* Autodetection is complicated. MySQL queries allow comments even before the main action (like SELECT). We would have to parse out the various forms of comments and make sure to get the main action.
* Autodetection is expensive. Because we'd be using regular expressions and string comparisons to identify SELECT queries, we'd be devoting lots of valuable processing time to a task whose sole purpose is to speed things up. I'm not saying such trade-offs are always bad, just that there's a simple, fast alternative in this case.
* Autodetection can be inaccurate. Unless we basically write an SQL parser, we stand a good chance of making mistakes. Even if SELECT isn't in a string or a comment, its presence still doesn't indicate that the query is read-only: MySQL allows INSERT INTO ... SELECT syntax. We'd have to find the SELECT and verify that the result isn't being used for a write operation.

The manual identification has related benefits:
* We can slowly go through the code and update queries at a convenient pace. If a query is misclassified as read/write, it just results in non-optimal performance.
* Compared to autodetection, manual identification is faster and much easier to write a database layer for.
* Manually identifying read-only queries is extremely easy to learn if you can already write SQL.

kbahey’s picture

Keep it simple. Ignore the case where we have comments. Worst case, they get redirected to the read/write node. Besides, I have not seen any such case in contrib, let alone core. We can document that as a restriction/caveat: "don't use comments before the SELECT, otherwise you will not benefit from replication ...etc.".

We can have the regexp be "/^SELECT /" to eliminate the case of INSERT/SELECT.

Manual is fine, until you realize that you have to change every db_query in a zillion modules, including core, and that will become a heck of a lot of custom code to track and maintain.

david strauss’s picture

@kbahey Several of my contrib modules prefix queries with comments. It's the easiest way to track the source of queries when you're trying to improve performance. It's important that the comments prefix the query because many MySQL interfaces only show the first n characters of a query. Comments at the end tend to be hidden. "/^SELECT /" won't work for the regular expression because it checks every line. It's quite possible for an INSERT INTO ... SELECT query to be split with a newline right before the SELECT. If you want simple, conservative detection, we can just slice the first six characters off the query and compare (case-insensitive) against 'SELECT'.
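The six-character comparison described above could be sketched like this (the function name is illustrative, not from the patch):

```php
<?php
// Conservative read-only detection: compare the first six characters of the
// trimmed query, case-insensitively, against 'SELECT'. A leading /* comment */
// deliberately fails the check, which routes the query to the master.
function db_is_read_only($query) {
  return strcasecmp(substr(ltrim($query), 0, 6), 'SELECT') === 0;
}
```

An INSERT INTO ... SELECT query passes through to the master even when the SELECT sits on its own line, because the string as a whole still begins with INSERT.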

I still don't think the manual system would be difficult to implement and maintain. It would only take a few hours to fix core. The only contrib module authors who would have to care are ones writing modules for high-load sites.

I'd probably go with a hybrid solution right now: very conservative autodetection combined with manual identification. The autodetection should capture most cases, but I'd eventually like to migrate to manual identification for performance and accuracy.

david strauss’s picture

Title: Provide support for replicant databases » Provide support for database replication

Patch updated to include very basic, conservative autodetection of SELECT queries. It should identify the vast majority of read-only queries without affecting performance noticeably. Because of short-circuit evaluation of the conditions, queries manually specified to be read-only aren't even subjected to the string check.

david strauss’s picture

File (new): 11.88 KB

This time, there's an attachment.

david strauss’s picture

File (new): 5.41 KB

Here's a patch with only automatic detection.

david strauss’s picture

Status: Needs review » Needs work

Testing and fixing the code.

david strauss’s picture

I'm not sure automatic detection alone is going to work for this. MySQL replication is asynchronous: it updates the slaves as quickly as possible, but slaves aren't guaranteed to have the very latest changes. This system needs to accommodate replication lags of up to a minute or so. It's basically the equivalent of running all INSERTs and UPDATEs with LOW_PRIORITY on a single server: you're not guaranteed to see the update immediately. The problem with automatic detection is that it sends even queries that require immediately updated data to the replicated slaves. This causes problems with form validation if replication happens more slowly than a person can load a form and submit it.

Some possible solutions to this problem:
* Don't depend on reading information submitted to the database in the same request. Basically, behave like variable_set and variable_get.
* If data needs to be passed to the next page, put the data in the page and cryptographically sign it. You can verify the signature on the next page. This could replace the current Form API method of validating someone's authorization to submit a form and its values.
* Encapsulate a series of writes and dependent reads to the database in a "critical path" like a transaction. Queries in this critical path will always go to the master server. This doesn't solve inter-page dependencies, though.
* Manually identify which queries should go to the slave servers.
* Automatically identify which queries should go to the slave servers, but allow an override to be included in the query that will force the query to be sent to the master server.
* Provide a mechanism where if some check against a slave server fails, check the master to see if it has updated data to allow the check to succeed. This could cause a dangerous domino effect where a replication happens slowly, so more queries hit the master server, which further slows down the replication and causes more queries to hit the master.
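The signing idea in the second bullet could look something like the following; the helper names and the use of an HMAC are assumptions, not anything from the patch:

```php
<?php
// Sign a value before embedding it in the page, verify it on the next request.
// DRUPAL_PRIVATE_KEY stands in for a site-specific secret.
define('DRUPAL_PRIVATE_KEY', 'site-specific secret');

function signed_value($value) {
  return $value . ':' . hash_hmac('sha256', $value, DRUPAL_PRIVATE_KEY);
}

function verify_signed_value($signed) {
  list($value, $mac) = explode(':', $signed, 2);
  // Constant-time comparison of the recomputed MAC against the submitted one.
  return hash_equals(hash_hmac('sha256', $value, DRUPAL_PRIVATE_KEY), $mac) ? $value : FALSE;
}
```

Since the value travels with the client, no database read is needed on the next page, which sidesteps replication lag entirely.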

david strauss’s picture

To give you a real-world example of this phenomenon that everyone here has experienced: the replication lag is what sometimes causes pages you've saved on Wikipedia not to show up immediately after you save them. Sometimes, you can even save a new page and immediately get a screen that says no article exists.

One major architectural change that could resolve the intra-page issues is moving to a monad model for SQL UPDATEs, DELETEs, and INSERTs. Monads are a concept in functional programming; they function as queues for changes. Basically, nothing during the actual processing of the page would be allowed to write to the database. They would only be able to schedule the updates to run after the page is finished processing. That way, nothing during the page load would depend on changes to the database created during that page load. If the next page depends on values from the current page, those values must be passed through GET, POST, or another client-side mechanism.

Phases of page processing:
1. BEGIN transaction on slave server (to ensure consistent snapshot of data).
2. Read-only operations allowed on the database. Write operations must be queued.
3. COMMIT transaction.
4. BEGIN transaction on master server.
5. Queued write operations run.
6. COMMIT transaction.
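A minimal sketch of the queued-write idea behind steps 4-6 (db_queue_write() and db_flush_writes() are hypothetical names, and the $execute callback stands in for a real connection):

```php
<?php
// During page processing, write queries are only scheduled, never run.
$queued_writes = array();

function db_queue_write($query) {
  global $queued_writes;
  $queued_writes[] = $query;  // nothing touches the database yet
}

// After the page finishes: run every queued write in one master transaction.
function db_flush_writes($execute) {
  global $queued_writes;
  $execute('BEGIN');            // step 4: open transaction on the master
  foreach ($queued_writes as $query) {
    $execute($query);           // step 5: queued write operations run
  }
  $execute('COMMIT');           // step 6: commit
  $queued_writes = array();
}
```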

This would make Drupal the most frighteningly scalable CMS in existence. :-)

david strauss’s picture

File (new): 5.85 KB

Assuming replication happens for the private key before any form submission is validated, this new patch works.

david strauss’s picture

I take that back. There are still issues with many parts of Drupal trying to read things they just wrote to the database.

david strauss’s picture

File (new): 7.75 KB

This update ensures all session activity happens on the master server.

drewish’s picture

what does the /* master */ comment do? anything?

david strauss’s picture

@drewish It's a hack to prevent the query from being detected as read-only so sessions perform properly despite asynchronous replication. Now, I'd prefer that everything be manual and explicitly stated about read/write and read-only queries, but everyone else seems to prefer the automatic detection method, which occasionally must be tricked into sending SELECT queries to the master DB.

drewish’s picture

that makes sense, i think you ought to add a comment to that effect so that someone doesn't "optimize" it out.

personally i'm a bit torn, i'd rather that read only queries be explicit but the longer function names are a bit of a turn off. i guess the select filtering is probably a better way to do it.

david strauss’s picture

@drewish The SELECT filtering is sufficiently dumb that a comment before the query trips it up. This is by design. C-style comments are acceptable for both MySQL and PostgreSQL, so they're a convenient (if stylistically questionable) way to send SELECT queries to the master server. I'd add the requested comment and re-roll the patch, but I really want transactions to be in core before this is, anyway.

I still officially prefer manual assignment of queries to the read-only slaves. I think we're in for some big surprises when many modules don't appreciate their SELECT queries reading from servers with delayed replication.

drewish’s picture

David Strauss, by "comment it" I meant put in a PHP comment explaining why that SQL has a comment in it. I've never seen a comment in a query in core; someone would try to "fix" it the next time they updated the query.

Chris Johnson’s picture

Do we care that comments embedded in queries in core might make Drupal not work with a bunch of other databases?

moshe weitzman’s picture

subscribe.

if we are going to revisit tons of queries, i suggest we alias db_query to q() and qr() where qr is a read only query.

david strauss’s picture

I like moshe weitzman's suggestion. I'm going to re-roll the non-autodetection version of the patch with shorter function names.

crackerjackmack’s picture

I'm glad to see this work is being done, but I would like to throw a 'look out' statement into this thread.

http://drupal.org/node/49836

That patch notes that after an insert is made (using auto-increment columns or serial types), most DBs support a currval() or last_insert_id() function to retrieve the last unique ID inserted into the database. This poses a problem for replication in general, as these functions have to be executed on 'default' or 'master' (depending on the naming scheme).

The patch is attempting to remove the need for the sequences table altogether, as it only seems to be used by MySQL right now; PostgreSQL uses currval() instead of the sequences table.

Just FYI, this can get very messy. Maybe db_last_insert_id() needs a db_set_active('default') call or something to prevent it from being called on the slave nodes instead of the master.

or maybe

function db_last_insert_id() {
  global $active_db;
  $prevs_db = NULL;

  if ($active_db != 'default') {
    $prevs_db = db_set_active('default');
  }

  /* DB last ID code here */

  if (isset($prevs_db)) {
    /* Don't waste additional overhead searching for a new slave;
       just reuse the one we already determined for this session. */
    db_set_active($prevs_db);
  }
}

The overhead is minimal as it is a simple string literal comparison in both db_last_insert_id() and db_set_active(). Should this kind of change be suggested here or over on http://drupal.org/node/49836 ?

crackerjackmack’s picture

I like moshe weitzman's suggestion. I'm going to re-roll the non-autodetection version of the patch with shorter function names.

As do I. Maybe we could break out the hacks into CRUD style (I know, it looks like Java's persistence manager):

* qi = query insert
* qm = query merge (updates and maybe inserts)
* qd = query delete
* qr = query read
* qmr = query master read (for db_last_insert_id() or other master-only functions)

Also, in the interim of renaming db_query() to q() and db_query_read_only() to qr(), why not add master-only queries? db_query_master() and db_query_range_master() would just be wrappers for qr() that add in the /* MASTER */ hack or set the appropriate DB accordingly before running. Something along those lines feels more practical than having coders add comments to their queries, where a simple, cleaner wrapper can apply this for them. It also gives us a bit more flexibility when we have to update this implementation, instead of having to go back and update a lot of modules along the way.

david strauss’s picture

@crackerjackmack Based on your suggestions in #23, you should take a closer look at how this patch works. db_set_active() doesn't affect whether a query goes to a master or a slave.

crackerjackmack’s picture

@David Strauss #25 yeah, I see that now, I feel foolish about that db_set_active() comment.

However, my base point is still valid: another patch is working to do away with the sequences table and use SELECT LAST_INSERT_ID() for MySQL and SELECT currval(...) for PostgreSQL, which has the potential to return NULL from a slave instead of the actual value from the master, since LAST_INSERT_ID() is per connection.

The solution to the problem was in #24 because of this particular issue. Whether or not that patch gets implemented, it would be good to have the DB API replication-safe with a db_master_query() for situations such as these. Even when no replication exists in the configuration files, it still performs as normal without too large an impact on existing modules (but they could utilize it where slave queries would produce unwanted results, e.g. INSERT INTO ... SELECT).

david strauss’s picture

@crackerjackmack This patch will always offer the chance to send a query to the master server. db_last_insert_id() will simply use this feature.

owen barton’s picture

I wonder if there is a way we could use automatic detection, but have a more elegant way of forcing a master read than the comment hack.
For example, what about checking the db_query $args array for '#force_master' == TRUE, and if it exists remove it from the array and use the master DB instead?
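That suggestion might be sketched like this; db_choose_target() is a hypothetical helper, and the SELECT check mirrors the patch's conservative autodetection:

```php
<?php
// Route a query: SELECTs go to a slave unless the caller passed
// '#force_master' in the args array, which is stripped before the query runs.
function db_choose_target($query, array $args) {
  $force_master = !empty($args['#force_master']);
  unset($args['#force_master']);  // never let the flag reach the query layer
  $is_select = strcasecmp(substr(ltrim($query), 0, 6), 'SELECT') === 0;
  $target = ($is_select && !$force_master) ? 'slave' : 'master';
  return array($target, $args);
}
```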

If this approach won't work, then I would be fine using function names q(), qr() etc. to delineate this - but auto detection feels cleaner if we can manage it.

A big +1 for a monad type setup (this should be a new issue methinks!) - we could have a qdw() (delayed write) function to collect any query that does not need to be executed immediately, and shunt them all into a single hook_exit transaction.

david strauss’s picture

@Grugnog2 Welcome to my monadic tribe, where our ancient rituals are guaranteed to have no side-effects.

dmitrig01’s picture

here is my take on how autodetection should work: strpos($query, "SELECT") === 0

dries’s picture

I think we should rename db_query() to db_update() and db_select(). That's what they use in the Java world, sort of. I don't think you need more than two such functions.

PreparedStatement statement = connection.prepareStatement("INSERT INTO foo (bar) VALUES (?)");
statement.setString(1, "hello world");
statement.executeUpdate();
kbahey’s picture

What happens if someone puts an INSERT in db_select() or a SELECT in a db_update()? We now have two things that say the same thing (the function name and the SQL DML passed to it)?

If we validate that db_select can only have SELECT, SHOW, ..etc. and that db_update() can only have INSERT, DELETE, UPDATE, then we are OK. But without that validation, we just open the door to a lot of errors.

moshe weitzman’s picture

we don't include parachutes for developers. use the right api and your module works. we are not going to validate every query in order to protect a few clueless devs.

owen barton’s picture

@moshe - I agree in general, but the problem here is that the clueless dev's module would still work *just fine* if they mixed these up. The problem would only show up when it was used on a site with replication, and could appear as one of those extremely tricky bugs that only reproduce occasionally (because it would be a race condition whether the server synced before the next query). Hence, if we are not going to check this in code, that means anyone implementing a site with replication should check that each and every one of these calls is correct before deploying.

kbahey’s picture

@moshe, I hear ya on the performance penalty.

How about this then: we have a setting. Call it "trace replication compliance". It would be turned on only when someone wants to verify that modules comply (or not) with being replication-safe. If it is on, then parsing is on, and non-compliant queries are logged to the watchdog.

It could even be in devel or anywhere else, but it will save a lot of grief in situations that grugnog2 mentioned.

moshe weitzman’s picture

sure, some compliance checker outside of core is fine. it can be a CLI script checked into /scripts, if desired.

sure, it would be a tricky bug to find. kinda like when a module doesn't enclose table names in braces or doesn't call db_rewrite_sql(). we accept those as bugs today, and no one cries foul. i think this is analogous.

neclimdul’s picture

I'm with moshe and kbahey on that: a debug database include or devel module to check compliance. You could probably even make a quick compliance script to put in the scripts directory. The point is, if we're going to validate that input, we might as well use a single function, because the check nullifies the gains of splitting it.

Btw, the script might be a good thing anyways to help devs with large scripts convert if this goes through.

hass’s picture

subscribe

ewlloyd’s picture

I'm all for splitting the read and write queries, but the autodetection, no matter how quick, simple, or well-intentioned, is over-engineering what should be a simple, elegant solution.

How about this: db_read() and db_update() become the new supported methods for database access (as others have suggested). Then, db_query() is removed. That way, the places in core that need attention are obvious the minute you try to start the system. And module developers will have no trouble at all finding where they need to fix their code! ;-)

I just searched for "db_query" in a recent 6.x snapshot, and came back with 871 instances (including the definition, I assume). With some planning, you could visit, evaluate and convert all of them fairly quickly.

RobRoy’s picture

I agree with separating into two functions: either db_select/db_update or db_read/db_write (don't think we should mix and match though). I actually prefer the latter pair as it is the clearest.

And I'm sure if we entice Doug Green he could write in some checks to coder.module to replace any db_query('SELECT ...') and db_query('INSERT/UPDATE...') along with identifying unknown db_query()s as well.

So once we have a consensus, is the last piece for this patch to go in and replace all those instances?

david strauss’s picture

The problem with db_update and db_read is that there are a number of read-only queries that need to run on the master server to get perfectly up-to-date data. Yes, we could run them through db_update, but that's a semantic error.

david strauss’s picture

Here's how this can be safely added to Drupal 6: we need to re-roll the latest version of the patch with autodetection removed and manual functions to signal the master/slave target. The earliest versions of the patch demonstrate this signaling but have bugs. The latest versions of the patch function but have oh-so-nasty autodetection.

Another issue with the db_update and db_read functions is that module developers might inappropriately use db_read when they need to use db_update. I think it's better to just supplement db_query with a slave-access version so that most developers don't need to care, but ones who do can optimize their modules to use slave DBs. It's better to have the degraded state be slow performance rather than failure.

Tresler’s picture

subscribe.

Also, it seems to me that we should think about naming the functions what they actually are. db_query_master() and db_query_slave() Call them something shorter if you like.

The point is:
* This eliminates any confusion as to which db you are targeting a query towards.
* All of core can be patched with a find/replace to db_query_master - then updated ASAP to use db_query_slave where efficient, without any concern that we are mis-naming a bunch of things (i.e. there is nothing wrong with querying the master - just preferred to query the slave where possible)
* New devs who may not know ANYTHING about what replication is don't need to worry cause the rule is to use db_query_master if you have any doubts.
* No semantic questions when a query needs up-to-the-second data such as the sessions table
* These names reflect the true intent of these functions - not what we anticipate the intent will be.

I'll jump in and test as soon as I can.

hass’s picture

Why do you like to rename all db_query() and others? I think it is much better to add additional params to the sql functions. This gives more control.

db_query($query, $server = all | session | search, $mode = read-write | read-only | write-only) (each parameter defaults to its first listed option)

This gives us a chance to add configuration to settings.php that can have an array of slave servers (if they are not load-balanced with hardware), and it additionally gives you fine control over what job each server is doing. For example, the session part: it is a very good idea to move this to an extra server that only manages 'session' variables, but this server is *not* read-only! The same goes for a CPU-consuming 'search' server.

After this, we create an array of servers in settings.php. This contains the "keys" above as a param, or falls back to "all" and "read-write" mode if not specified. Developers can think about performance, but are not forced to; many will not understand or think about these complex high-traffic scenarios.

I don't know how we can ever do auto detection for complex SQL tasks that are required to run one after another and may be sprinkled over hooks... this requires the developers to think about how a module works and whether it is acceptable for something to be out of sync or not.

hass’s picture

After reading all the above and taking a look at the last patch in #14... i don't like this way of adding /* Master */ to all queries. Additionally, this strtoupper(substr(ltrim($query), 0, 6)) == 'SELECT' does not look very speedy and seems error-prone.

This cannot be the right way to scale DB speed and get a handle on complex synchronous SQL logic...

Tresler’s picture

Could be I misunderstand the replication scenario we are going for, but as I understand it, once you get into 'balancing' as opposed to 'distributing', you don't want to split things up onto separate DBs manually as hass suggests above; rather, only differentiate between read and read/write, or distribute the queries and leave the actual balancing to a piece of hardware or a box running hardware-mimicking software.

The idea is that in order to scale properly the load balancer needs to be able to do a 'quick' Q&A with the db servers to see who is busy and who is not and route accordingly. This means that a 'good' server system would be able to also route db_query_slave to the master if the slave is back-logged. That happens on the hardware level and isn't a drupal configurable event. Doing that Q&A through drupal is, I think, beyond the scope of this patch.

Don't get me wrong, I think having the ability to force a query to go to a specific box might be useful, but it is a tricky one and I think outside of the scope of what we can achieve for this in D6, and can be filed as a whole separate feature request. Whereas a basic delineation between queries that can go to a slave and queries that can go to a master, isn't and is HIGHLY useful to scaling drupal to multiple db servers.

Tresler’s picture

Also, realized that in my thinking from #43 above, it would make even more sense to name the functions db_query() for the one that must hit the master and db_query_slave() for the one that *can* read from the slave or the master, with the decision made at the balancer level.

More points:

* In the absence of a load balancer, a query-distribution setup can be achieved by simply forcing db_query_slave to use the slave box.

* By just modifying db_query to be a must-use-master query, we don't need to touch core... this just adds the ability for people to go through and patch every module around, but not the necessity to.

hass’s picture

@Tresler: what i suggested does not look very difficult... it is no more work than any other solution. It is nearly impossible to build autodetection of whether something needs to be routed to a master or slave. This requires *deep* knowledge of how the app works, of when logic requires synchronous SQL requests, and so on. That is much more out of scope and requires a complete Drupal review, as opposed to redirecting some well-known performance killers to a different server with less load... and finally, it does not change the current API fallback, which makes it much easier to patch step by step.

Let's look at things that are problematic for performance in general. Well known: full-text search / search in general, and DB-based client variables (session variables), among others. These are just two examples of well-known heavy stress makers.

Nobody is forced to use this server task splitting. If you add the additional params and know a request can be out of sync, route it to a read-only server (slave); let's do it. If my searches are too heavy, reroute them to a read-only DB (slave), but don't try to identify them automatically. The sessions table is very independent of other tables, so it's very easy to move it to a different machine. I'm sure you cannot scale enough with master/slave logic alone, where you have only *one* writing master; you need to reduce the write requests, too.

This patch shouldn't be done half-heartedly, only reducing some problems a little bit. It should solve the problems in general and allow scaling with some minor config changes. It will take long enough to get D6 out; that should be enough time to review the performance killers (like search) and move them to read-only (slave) servers. Additionally, the session requests (read-write) can be moved to a different server to reduce the senseless "write" operations on the "master". I'm sure one problem will be sessions... after all these small adjustments are done, it is time to review all of core and its logic for D7: what can be routed to "read-only" and what needs to run synchronously on the one and only "read-write" server (master). Sessions are not "important" user or project data - they will be regenerated on new requests if we're talking about a loss of data.

I think it's much better to extend the current DB API with additional params that fall back to defaults if not specified; this will not break current code and allows us to touch only the performance killers for now and the near future. This will help now and solve the problem in general. For example, if we find out there is a performance issue with many logins, we can alter the user module only and reroute it to a read-only DB (slave), but we are not forced to do so today.

Tresler’s picture

@hass - What I'm saying is that these are two very separate issues. Both are ultimately useful. Eventually, I think the idea would be to have both the ability to split the servers in a read, read/write capability and target specific servers for specific queries.

But this thread is about splitting queries in a read, read/write fashion. When it comes down to it, adding support for splitting out specific parts of drupal to individual servers is less useful because it pre-supposes that we know what people are using drupal for.

If you are operating a heavy hit site like a blog with few logins - but heavy commenting, splitting out the sessions table doesn't do much. Similarly, not everyone wants or needs a full text search.

But adding support for a dedicated read/write server and multiple read-only servers can be almost universally useful, as it doesn't presuppose that we know what a site is about - and you can have a master-master configuration with multiple read-only slaves if the master gets overheated. There is a replication delay, but we are dealing with that already.

hass’s picture

But this thread is about splitting queries in a read, read/write fashion.

Yes, and this is what i'm talking about. The server param should be optional - see below. Remember, you cannot solve the auto detection problems: if you redirect a "read" operation to a slave server and that server is not up to date, the next transaction based on that result will fail!! This is what I meant by saying auto detection is impossible. So, since auto detection is impossible and splitting read and write in an error-free way is impossible, the only way to solve this may be the way I described. We need to know what mode ("read-write" or "read-only") a module needs to scale heavily. Without this knowledge you cannot optimize anything. Additionally, we can use the server param to scale much more...

If you are operating a heavy hit site like a blog with few logins - but heavy commenting, splitting out the sessions table doesn't do much. Similarly, not everyone wants or needs a full text search.

Well, not everyone needs a search... but this is what Drupal needs, isn't it? With the proposed approach, every module maintainer is able to extend his module in the way that makes it as powerful/speedy as possible. If I need such a module and it is not optimized, I can patch the module and give it back to d.o. Drupal.org itself is able to upgrade all the modules it uses or that produce performance issues. And keep in mind, if someone doesn't need an extra search server, he will not specify it in settings.php and we fall back to the master server.

I'm talking about every module that can be optimized... for whatever "mode" it can work in. It doesn't matter whether mirrored servers are used or not.

But adding support for dedicated read/write and multiple read only servers can be almost universally useful. As it doesn't pre-suppose that we know what a site is about - and you can have a master-master configuration with multiple slave read-only servers if the master gets over heated - there is a replication delay, but we are dealing with that already.

This way we are able to split when a split is needed, and skip it when it isn't appropriate for a site. We give admins the means to optimize their Drupal with "connection strings" (server), and/or leave them to use the master (read-write) / slave (read-only) logic only.

Let's put the params of my suggestion in the correct order to show that the server param is not required in my approach. I ordered them wrongly in my first posting. I hope this makes the idea, and its extensibility, easier to understand.
db_query($query, $mode = read-write | read-only | write-only, $server = all | session | search)
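The dispatch implied by that signature could look roughly like this. This is a hypothetical sketch, not from any posted patch; the constant names and `_db_pick_connection()` are invented for illustration:

```php
// Hypothetical sketch of routing on the proposed $mode/$server params.
// All names below are illustrative assumptions, not existing Drupal API.
define('DB_MODE_READ_WRITE', 'read-write');
define('DB_MODE_READ_ONLY', 'read-only');

function _db_pick_connection($mode, $server = 'default') {
  global $db_url, $db_slave_url;
  // Writes, and reads that must see current data, always hit the master.
  if ($mode != DB_MODE_READ_ONLY || empty($db_slave_url)) {
    return isset($db_url[$server]) ? $db_url[$server] : $db_url['default'];
  }
  // Read-only queries may be served by any slave configured for $server.
  $slaves = isset($db_slave_url[$server]) ? $db_slave_url[$server] : $db_slave_url['default'];
  return is_array($slaves) ? $slaves[mt_rand(0, count($slaves) - 1)] : $slaves;
}
```

A module that only needs stale-tolerant reads would then pass DB_MODE_READ_ONLY and never touch the master.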

Tresler’s picture

I've not mentioned auto-detection at all. Splitting db_query into two functions is the solution that avoids auto-detection. Which server a query hits depends on which querying function a dev uses.

I see your code and as I understand it, I am talking about leaving it just at the $mode parameter. I suggest making it two functions as this will make search and replace hunting a little easier, but that might not be worth the overhead.

However, adding both the $mode and the $server param is too much for a single patch. It makes sense to keep this simple. These are two different methods of setting up replication - there is no reason to try to execute both in one patch. This confuses the issue.

Crell’s picture

I'm coming in a bit late on this issue (I've been otherwise occupied :-) ), but I want to chime in on the API front. I agree with Tresler that we should keep the changes small at this point. To that end, I would recommend keeping db_query() as-is, operating against a master server only. Then add a function db_query_readsafe() (or something) that hits a slave database (or pool of slave databases, potentially) if one is specified and falls back to the master server if not. Then we can go through core and swap db_query() to db_query_readsafe() where it is safe to do so.

  • This minimizes the API change. (We are post-freeze, after all, even if this issue gets a special exception).
  • The fallback for developers who do nothing is still always valid, if possibly sub-optimal.
  • We can't do this as a function parameter without making it required, because db_query*() already takes a variable number of arguments. db_query_range() gets away with it because it can assume the last 2 parameters are always the range; db_query() can't do that.
  • One of my main goals for Drupal 7, now that it's PHP5-allowed, is to try and move the database backend to PDO. That will drastically change the internals of the database system and therefore change what is easy or hard to do in this regard. So for now let's do something simple that gets us a boost in some areas and then revisit it again later as part of a more complete database overhaul. (See the first point.)
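A minimal sketch of that fallback behavior. This is hypothetical: db_query_readsafe() is only a proposed name, and db_set_active_slave() is an invented helper for switching connections:

```php
// Hypothetical sketch of the proposed fallback: use a slave when one
// is configured, otherwise behave exactly like db_query() on the master.
// db_set_active_slave() is an assumed helper, not an existing function.
function db_query_readsafe($query) {
  global $db_slave_url;
  $args = func_get_args();
  if (empty($db_slave_url)) {
    // No slaves configured: fall through to the normal master path.
    return call_user_func_array('db_query', $args);
  }
  db_set_active_slave();  // hypothetical: switch to a (random) slave
  $result = call_user_func_array('db_query', $args);
  db_set_active();        // switch back to the master connection
  return $result;
}
```

Developers who do nothing keep working code; developers who opt in per-query get the slave pool for free.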
peterx’s picture

I suggest using the PDO style separation of reads from updates even if replication support does not make it into Drupal 6.
PDO separates update, insert and delete from select because the first three return a count of changed rows while the last returns a result set containing rows. One group uses exec() and the other execute(). The same approach could be emulated in changes that separate reads from everything else, although the workload might be too much for Drupal 6. The change would set up Drupal for both replication and PDO.
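For illustration, the exec()/execute() split looks like this against an in-memory SQLite database. This shows plain PDO semantics, not Drupal code, and assumes the pdo_sqlite extension is available:

```php
// Illustration of the PDO read/write split, using SQLite in memory
// (assumes the pdo_sqlite extension is available).
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE node (nid INTEGER, title TEXT)');

// exec(): used for INSERT/UPDATE/DELETE, returns a count of changed rows.
$changed = $db->exec("INSERT INTO node (nid, title) VALUES (1, 'hello')");

// prepare()/execute(): used for SELECT, returns a result set.
$stmt = $db->prepare('SELECT title FROM node WHERE nid = ?');
$stmt->execute(array(1));
$title = $stmt->fetchColumn();
```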

Once you split reads into a separate function, there is another consideration because of the time delay with replication. Suppose I update something and then read it immediately: I need a read_immediate which has to go to the master database. You need query(), query_read_only(), and query_read_master(). Using three functions makes more sense than directing some selects back to query() because you would have to change again for PDO.

When do you need read_master? When I update content and the next page shows me the preview: I need to see the master, not an out-of-date replica. Ajax makes read_master more important because one script might update a node and retrieve it in a script that has none of the normal page-load overhead. I set up a site with one master, many slaves, a separate replication network, and a very light workload. The slaves could not keep up with back-to-back pages. An Ajax write-read would be impossible.

Adding a read_master function for those occasions when I specifically want to see the results of an update on the current or immediately previous page gives me the chance to document the intent of my selects by choosing read_only or read_master. Perhaps the intent of the changes could be shown by calling the new functions db_select_recent() and db_select_current().

petermoulding.com/web_architect

Crell’s picture

Actually while I rather like PDO, I don't like the way it separates ->query() and ->execute() from ->exec(). SELECT statements can be prepared, but there is no prepared statement support for ->exec(), which sucks. :-( I'm actually strongly considering not using it at all for that reason, unless you know of a way to coax it into supporting prepared statements the same way as ->execute(). (Of course, I'd like to separate off insert/update/delete so that we can use a proper array syntax for them, which I find a lot easier, but now we're getting off topic. :-) )

For the rest, what you're suggesting is basically the same as I suggested. I just have db_query_read_master() named db_query() because there's no reason to make life more complicated than it has to be. :-) It seems to be agreed that we need to be able to read against the master server, so let's just make that the default and only hit the slave server(s) when we know it's safe to do so.

david strauss’s picture

Status: Needs work » Needs review
File: new patch (12.81 KB)

I've modified the latest version of my patch to enable manual slave query selection and backported the patch to Drupal 5.1. I haven't had time to test it yet.

david strauss’s picture

I should note that this patch only provides the infrastructure for slave server support. Code will have to be updated to use db_query_slave() and db_query_range_slave(). That process can occur gradually.

moshe weitzman’s picture

it would be very helpful if the $queries global variable contained information about whether the query was dispatched to the master or a slave (and which slave, if possible). This will be displayed in the query log for sites that use master/slave and will help us port queries to the slave model.

david strauss’s picture

File: new patch (13.79 KB)

Patch updated to record whether the query went to the master or a slave.

dries’s picture

David: I'd _really_ like to see us alter the API to have at least a db_select()/db_read() and db_update()/db_write(). It can still go into D6 as I granted an exception for this patch. What are your thoughts on that? Anyone willing to put some effort into it? :)

Crell’s picture

@Dries: As discussed above, just because something is a read query doesn't mean it's necessarily safe to be sent to a slave server. Read/write does not necessarily imply slave/master. (I'd still like to see a structured db_insert(), db_update(), etc. but that's a separate issue. :-) )

@David: I haven't tested #58 yet but it looks good visually from a quick code read through. I'll see if I can squeeze in some time tonight to go over it in more detail. My only issue is that I'm not sure db_query_slave() is the best name. It implies the query will go against the slave instead of "is safe to be run on the slave" (and the PC part of me isn't wild about the word 'slave' sprinkled throughout the code :-) ). I'm not sure that I have a better name, though. db_query_readsafe? db_query_lazy? Not sure here.

david strauss’s picture

@Crell The problem is that there are no abstract semantics for sending a query to a MySQL slave: it behaves how MySQL implements it, whatever that means for your DB server. Calling the slave query function "read_only," "read," or anything else that seems friendlier or more PC puts false semantics on this system, and the only thing worse than no semantics is false semantics. I'm not sure there's a concise way to convey that a function runs a query on a (slightly) delayed, read-only replica.

Crell’s picture

What about db_query_delaysafe()? (I.e., this query is safe to run against a delayed server.) That seems a more accurate description of the use-case. Or db_delaysafe(), if we want something shorter?

david strauss’s picture

"Delay safe" conveys half of the slave server limitations. The slave server is also read-only. We could do another clever thing with delay-safe for UPDATE and DELETE queries: we could send them to the master server with DB-specific priority changes. MySQL allows us to queue low-priority changes. This causes the query to return immediately without waiting for completion. If you know you won't be using the changed information until the next page request, it may be useful.
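As a sketch, the priority change David mentions might look like this when the query is sent to the master. This is hypothetical; no patch here implements it, and LOW_PRIORITY only affects table-locking engines such as MyISAM:

```php
// Hypothetical "delay-safe write": the statement goes to the master with
// MySQL's LOW_PRIORITY modifier, so it queues behind pending reads and
// returns control quickly. MySQL-specific; MyISAM-style table locks only.
db_query('UPDATE LOW_PRIORITY {node_counter} SET totalcount = totalcount + 1 WHERE nid = %d', $nid);
```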

Crell’s picture

I agree INSERT DELAYED and friends may be useful, but then we're getting into read/write questions again. Since Drupal is read-heavy, I'd say for now just deal with the read case and we can revisit INSERT DELAYED in D7.

db_query_delaysafe()? db_query_slavesafe()? db_query_secondarysafe()? Something 'safe' I think is a good indicator of what it does. Dries, feel free to decide on a name by fiat and tell me to shut up. :-) The functional design at this point I think is fine.

david strauss’s picture

Good overview of MediaWiki's MySQL slave support: http://www.scmca.org/docs/database.txt

It includes many of the more monad-like features I suggested.

And the API used (which is admittedly not Drupalish): http://www.mediawiki.org/wiki/Manual:Database_access

I think "safe" is redundant. Why would you send a query to a slave server if it's not safe to do so?

dries’s picture

I spent 2-3 hours researching other systems/platforms. I posted the results as a blog post at http://buytaert.net/scaling-with-mysql-replication, along with some recommendations for Drupal 6. Of course, these are all open for discussion and feedback.

bjaspan’s picture

subscribe

gerhard killesreiter’s picture

Did anybody actually test the patch in #58?

gerhard killesreiter’s picture

File: new patch (13.84 KB)

I've updated the patch to apply to D5-CVS. No excuse not to test it...

Crell’s picture

Assigned: david strauss » Crell
File: new patch (14.42 KB)

Apropos of this, here's an alternate implementation of the same general concept. In addition to the API additions, it also modifies node_page_default() just a bit to exercise the changes. That can be committed or not as the committers prefer.

See the link above for how to setup $db_url to test it. If you don't want to actually set up a replicated environment, setting all of the query strings to the same value should at least provide a good API debugging setup.

This patch is for Drupal 5.x-dev. I'll roll an alternate version for HEAD tomorrow when I'm awake.

david strauss’s picture

Assigned: Crell » david strauss

Crell's patch breaks existing settings.php files by requiring even a single DB url to be named. This will create a massive support burden.

The API change for the database functions makes code harder to read by requiring an extra array in the parameters and using TRUE/FALSE to specify slave safety. We should at least be using constants like DB_MASTER and DB_SLAVE to specify slave safety if this API is going to be used all over Drupal.

The API change is likely to introduce bugs in code because it uses a clever trick to detect the purpose of the final argument. A human isn't going to step through each argument and apply conditional criteria (like checking if the second argument is an array) to understand the effects of other arguments. This API also makes it impossible to scan the arguments from right to left and still understand them.

Can you tell, at a glance, how these behave? This isn't even using the more complex parameter structure of db_query_range().

db_query('SELECT * FROM {node} WHERE status = %d', TRUE);
db_query('SELECT * FROM {node} WHERE nid = %d AND status = %d AND changed > %d', array(123, TRUE, (time() - 123)), FALSE);
db_query('SELECT * FROM {node} WHERE status = %d', array(TRUE));
db_query('SELECT * FROM {node} WHERE status = %d AND sticky = %d', array(TRUE, TRUE));
db_query('SELECT * FROM {node}', array(), TRUE);
db_query('SELECT * FROM {node}', FALSE);
db_query('SELECT * FROM {node} WHERE status = %d', TRUE, FALSE);

The API is inconsistent because it allows the old-style arguments for master-only queries but requires complete reformulation of the arguments to use the slave server. By allowing developers to use two APIs -- one of which has long been in use and sees no threat of removal -- we will see limited adoption of the new API.

Finally, and least importantly, the use of array_rand() produces a less-even distribution than mt_rand() because it uses a slower, less random algorithm. See the documentation on PHP.org for array_rand() and mt_rand().
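As a standalone illustration of the selection method David prefers (the connection URLs are placeholders):

```php
// Picking a slave by numeric index with mt_rand(), the faster Mersenne
// Twister generator, instead of array_rand().
$slaves = array(
  'mysql://user:pass@slave1/drupal',
  'mysql://user:pass@slave2/drupal',
  'mysql://user:pass@slave3/drupal',
);
$chosen = $slaves[mt_rand(0, count($slaves) - 1)];
```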

dries’s picture

  return _db_query($quer, FALSE, TRUE);

That should be '$query' instead of '$quer'.

I spent the better part of this evening setting up a master-slave configuration to start testing this patch. More to come.

The API proposed in "repli_D5.patch" is probably OK for now (it gets the job done on drupal.org), but it doesn't seem like a good candidate for inclusion in core.

Crell’s picture

Assigned: david strauss » Crell
File: new patch (15.55 KB)

Actually no, an unnamed $db_url still works. I just tested it and it works fine.

Also, please see the linked article. I am planning to remove the variable-parameter-count format in Drupal 7, for reasons stated. That's why it's better to use that format for slave-based queries now, as it reduces the API changes later. I don't really think array-based is any harder to read, especially as you can also write it "down", the way we write associative arrays all over Drupal, which is arguably easier to read than some of the very-long-line queries we have now.

That said, I agree that named constants would be cleaner. Attached patch switches to that, and renames the parameter to $replication_type. (Better suggestions welcome.) Actually, looking at it now using constants is more flexible because it opens up the potential to have more keys than just master and slave in the future. I'm not entirely sure why you'd want to, but it's an option for later. :-)

I didn't see anything on php.net to say that mt_rand() was faster, just "more random", but this switches to it anyway.

gerhard killesreiter’s picture

I think we should start to collect slave safe queries.

1) on s.d.o I've moved all pager queries to the slaves, this seems to work.
2) I've also moved a lot of queries in forum.module. The count query in forum_get_forums apparently doesn't work. No idea why.

david strauss’s picture

mt_rand() performance vs. the default random number generator (taken from php.net/mt_rand):

Many random number generators of older libcs have dubious or unknown characteristics and are slow. By default, PHP uses the libc random number generator with the rand() function. The mt_rand() function is a drop-in replacement for this. It uses a random number generator with known characteristics using the Mersenne Twister, which will produce random numbers four times faster than what the average libc rand() provides.

Considering the history and naming of array_rand, I have every reason to believe it uses the rand() algorithm for picking values.

Crell: Regarding connection URL names, I assumed your patch required them to avoid ambiguity and inconsistency. I must also have missed the code implementing an unnamed default connection URL.

To illustrate the connection URL ambiguity with examples:
$db_url = "mysql://user:password@example.com/database";

$db_url['default'] = "mysql://user:password@example.com/database";

$db_url['master'] = "mysql://user:password@example.com/database";
$db_url['slave'] = "mysql://user:password@example.com/database";

$db_url['default']['master'] = "mysql://user:password@example.com/database";
$db_url['default']['slave'] = "mysql://user:password@example.com/database";

If you allow unnamed databases, the third option cannot be allowed because it becomes unclear whether the databases are named master and slave or are simply master and slave pairs for the default database.

My criticisms of readability weren't related to arrays directly, but the possibility of using arrays or a variable number of arguments combined with an optional final parameter indicating slave-safety. I agree that arrays are better than the current syntax, but I think the transitional syntax you're suggesting is confusing.

Oh, and thank you for updating your patch to include the constants I suggested. It considerably improves readability.

Crell’s picture

@killes: I suspect most of statistics and tracker modules would be slave-safe, too, but haven't actually tried it.

@David: I'm really not sure what array_rand()'s pedigree is, so absent any benchmarks I'll take your word for it.

I see what you mean now regarding unnamed connections. Yes, form #3 results in a database error because there is no key "default". I don't think that's a problem, though. Right now no one is using $db_url for replication (at least not in a standard way), so there is no existing master/slave syntax to break. Given that replication is going to be a rare case, I don't think it's that much of a burden to require that drupal.org and similar sites specify 'default' as the lone key. The only BC break I could see here would be if someone had an existing multi-database config and one of their databases had a key named "master" or "slave", and I'm not even certain that would cause a problem.

As for the array syntax, just to clarify since I'm not sure if it's what you mean, the syntax options are:

1) variable parameter count
OR
2) array for field values with optional server specifier.

With the intent being that #1 goes away in Drupal 7. What part of the transitional state is confusing?

dries’s picture

A new day, some more testing and reviewing:

1. We don't use the 'and' syntax; we write '&&'.

2. $replication_type is an awkward name -- why not call it $db_target? If we use it for anything else than a $db_target, we can still rename the variable. Let's not rename it prematurely.

3. I have to agree with David that the way you fiddle with function parameters is rather ugly. I don't see a reason why we'd want an ugly API in Drupal 6. The fact that we're going to clean up the API in Drupal 7 doesn't seem like a good reason. We'll have to significantly change the API in Drupal 7 anyway. I'd prefer to write db_query_slave() or something; that's a lot easier to parse with the eyes.

4. It's not clear to me why we need the 'default' array. Can't we just do:

  $db_url['master'][] = 'master 1 settings';
  $db_url['master'][] = 'master 2 settings';
  $db_url['slave'][] = 'slave 1 settings';
  $db_url['slave'][] = 'slave 2 settings';
  $db_url['slave'][] = 'slave 3 settings';
  $db_url['slave'][] = 'slave 4 settings';

5. I've tested this patch on my local master-slave configuration and it seems to work.

6. I wonder if there are any best practices for testing? It's somewhat difficult to see whether it all works and how the queries are distributed -- it required me to add debug snippets to the replication-related code. It would be nice if non-developers could somehow confirm that it works without having to make changes to the code. I know this isn't really important right now, but I figured I'd raise the issue so we can let it develop in the backs of our heads.

dries’s picture

Gerhard: note that in your patch in #69, there still was a typo:

+  return _db_query($quer, FALSE, TRUE);

should be:

+  return _db_query($query, FALSE, TRUE);

(Note the missing 'y'.)

david strauss’s picture

I wonder if there are any best practices for testing?

You can enable full query logging on the MySQL server, or you can view the statistics tables that MySQL provides. phpMyAdmin has a very nice tool for examining server statistics.

It's not clear to me why we need the 'default' array.

At least with the syntax provided in Crell's patch, directly hopping to ['master'] and ['slave'] is ambiguous because you could, theoretically, use those as DB connection names. Crell gets around it by requiring a minimum array depth to use master and slave. This solves the ambiguity problem for existing installations, but it requires ['default'] to be specified for installations using master and slave server. I get around the same problem by having a separate $db_slave_url value/array that is assumed to be the same depth as $db_url. (If $db_url specifies connection names, $db_slave_url is also expected to specify connection names.)

There doesn't seem to be a nice solution that doesn't break everyone's settings.php configurations.
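Concretely, the parallel-depth convention David describes might look like this in settings.php (host names are placeholders):

```php
// $db_slave_url mirrors the shape of $db_url: if $db_url names its
// connections, $db_slave_url uses the same names, with an optional
// extra array level listing several slaves per connection.
$db_url = array(
  'default' => 'mysql://user:password@master.example.com/drupal',
);
$db_slave_url = array(
  'default' => array(
    'mysql://user:password@slave1.example.com/drupal',
    'mysql://user:password@slave2.example.com/drupal',
  ),
);
```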

Gerhard: note that in your patch in #69, there still was a typo

I assume he's at least fixed it on the scratch installation. That typo would make every slave query fail. Of course, it's important to reflect those updates here, lest we end up testing something quite different from the posted patch.

moshe weitzman’s picture

@Dries - devel.module will soon show whether the slave or master handled a given query. this info is in $GLOBALS['queries']

Crell’s picture

1) Wow. I think that's the first time I've ever actually used "and" instead of &&. I blame lack of sleep. Will be fixed in next version.

2) I like $db_target. That wouldn't even need to be renamed if we add more targets besides master and slave. Will be changed in next version.

3) The ugliness is not exposed at the API level. Unless you mean that

db_query("SELECT ...", array(1, 2, 3), DB_SLAVE);

is ugly. That's the intended API for D7. If it's too ugly, then say so now so I can revise that before I start writing anything D7-targeted. :-) Which function signature is ugly and in what way? (Same question for David; which part, specifically, is confusing?) Or do you mean the internal argument parsing? If so, is that something that could be handled with better in-line docs? (And I go back to my previous statement that variable-parameter functions suck. :-) )

In terms of quick visual scan, I don't think db_query_slave() or db_query(..., DB_SLAVE) are any easier or harder than the other to notice at a glance.

db_query_slave(), db_query_range_slave(), and pager_query_slave() would do the job, I suppose, but I'm wary about introducing an API function that we know is only going to last one version, even with Drupal's policy on backward compatibility. That would also be another set of functions to temporarily emulate when implementing the new system. (See comment in the aforementioned blog post.)

4) What David said. However, neither $db_url syntax is going to break existing settings.php files unless someone has defined a global by the name of $db_url_slave in their own code or someone is already using a second-level array in $db_url. The odds of either one are so remote that I'm comfortable calling either variant fully backward compatible.

moshe weitzman’s picture

i don't know if i should be testing crell's patch or strauss patch.

david strauss’s picture

@moshe weitzman
Right now, s.d.o is running my patch, so mine is closer to what's running on Drupal.org and in more urgent need of testing. Crell's model is more likely to find its way into Drupal 6 or 7.

BioALIEN’s picture

Developers may find it confusing if we introduce a new API for D6 and then change it considerably for D7.

gerhard killesreiter’s picture

File: new patch (19.6 KB)

Attached is the patch that is currently running on drupal.org. It contains some unrelated stuff.

hswong3i’s picture

subscribe :)

sun’s picture

Although it might last for one major version only, I'd vote for db_query_slave(), too. It's easier to recall. Personally, I'm always in favor of clear function names and fewer function arguments.

re: 4) I'd vote for a separate $db_url_slave array. Placing all master/slave information into one array looks quite cumbersome and might not be easy to explain to novice users in the documentation. Furthermore, wouldn't that giant $db_url array break support for additional (non-Drupal) database connections? Currently, I'm able to define and switch to alternative databases by declaring

$db_url = array(
  'default' => 'mysql://username:password@localhost/databasename',
  'myapp' => 'mysql://user:pass@localhost/mydb',
);

Thus, how would I define master and slave databases in $db_url? This way?

$db_url = array(
  'default' => array(
    'master' => 'mysql://username:password@localhost/databasename',
    'slave' => 'mysql://username:password@otherhost/databasename',
  ),
  'myapp' => 'mysql://user:pass@localhost/mydb',
);

...which looks rather unwieldy.

hswong3i’s picture

just for brainstorming: as we now have drupal_write_record() within the latest D6 CVS HEAD (http://drupal.org/node/169982), which builds INSERT and UPDATE queries dynamically, what if we moved query building down into our database abstraction layer (e.g. db_query_insert(), db_query_update() and db_query_delete(), similar to http://drupal.org/node/53488 but simpler), so we can handle the switching of master/slave servers within these wrapper functions? Maybe we can have something similar to:

drupal_write_record():
  - serves both insert and update actions
  - collects schema information
  |
  |-> db_query_insert() or db_query_update():
      - builds the INSERT/UPDATE query
      - MySQL: additional database replication handling
      - Oracle: additional LOBs handling
      |
      |-> db_query()

surely we would need some prerequisites:
1. we need to use drupal_write_record() (or at least db_query_insert() and db_query_update()) in all core modules
2. we need to use db_query_delete() in all core modules

on the other hand, these 3 additional APIs may be too late for the D6 development cycle: beta2 should be coming soon. BTW, it shouldn't be too far from official MySQL database replication support: INSERT/UPDATE are now wrapped by drupal_write_record(), so it is just one step more :)
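A rough sketch of what such a wrapper could look like. This is entirely hypothetical: db_query_insert() does not exist, and a real implementation would also force the master connection before issuing the write:

```php
// Hypothetical db_query_insert(): builds the INSERT from a record array,
// so the database layer (not the caller) can route all writes to the
// master and apply driver-specific handling (replication, LOBs, ...).
function db_query_insert($table, array $record) {
  $columns = implode(', ', array_keys($record));
  // One quoted %s placeholder per value, Drupal 5/6 style.
  $placeholders = implode(', ', array_fill(0, count($record), "'%s'"));
  $sql = "INSERT INTO {" . $table . "} ($columns) VALUES ($placeholders)";
  // A real implementation would switch to the master connection here.
  return db_query($sql, array_values($record));
}
```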

david strauss’s picture

@hswong3i No. Please read the earlier discussion on mapping queries to master and slave servers.

hswong3i’s picture

@David: thanks for your reply. Sorry if this is a bit off-topic, but I really hope to learn more about this issue. After studying the previous comments (e.g. #10, #11 and #86), I hope to ask some questions:

  1. so the key difficulty of MySQL replication is the time lag? Which also means the core difficulty is not how to change existing data on the master (BTW, I guess we still need to consider that), but when and how to fetch correct, up-to-date information from a slave? e.g. during form submission.
  2. do we need to consider some core revamp (at least in part) if we hope to deal with MySQL master/slave replication correctly? e.g. some changes in FAPI.
  3. will the above changes make development of other database drivers more difficult, as we would need a lot of special handling for MySQL's replication limitations?
  4. should we consider MySQL 5.1 Clustering (http://www.mysql.com/products/database/cluster/faq.html, http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-overview.html) rather than the existing master/slave model? It works as a "resource pool" rather than as separate roles, so the application seems to need no special changes/handling/etc and just works as in the normal single-backend-server mode. Surely it will require more hardware when compared with database replication, but we would not need to change our existing core implementation. I guess MySQL 5.1 should rock in the coming year; its clustering feature could be a good alternative choice :)

Regards,
Edsion

Crell’s picture

Version: 6.x-dev » 7.x-dev

@hswong3i: The PDO-based database API I'm building for Drupal 7 uses the methodology from my post in #70 above to differentiate master/slave. That is, it's up to the developer to say which queries are slave-safe and the database layer routes the query appropriately based on whether or not there is a slave server defined.

I don't know if foregoing master/slave configuration in favor of clustering is a good idea, as I have no experience with clustering. I invite nnewton from OSL to tell me what would be best for Drupal.org and sites of its scale. :-)

I think it's also a foregone conclusion at this point that Drupal 6 core is not going to have any more database API changes before release, so bumping this to Drupal 7.

hswong3i’s picture

@Crell: yes, this issue belongs in D7. BTW, what I am really concerned about is not how to implement it, but whether we should implement it within core at all (or at least, outside the individual database driver implementations). Scalability should be the concern of the database vendor, not our duty. e.g. MySQL is now working on a clustering solution similar to Oracle RAC (http://www.mysql.com/products/database/cluster/faq.html). This is the field the database vendors are fighting over, and they will always propose a suitable solution if the market requests it :)

should we implement MySQL-specific database replication handling within core? I guess we shouldn't (it is somewhat hard-coded). If the lag of database replication is a MySQL-specific problem that can be solved by MySQL's clustering solution (or some other database vendor's solution), I can't see the point of implementing it within Drupal core. It may not be a good idea if we need contrib developers to understand the need to choose between master and slave servers :(

nnewton’s picture

My apologies that it took me this long to get to this. The topic of clustering vs. replication is quite interesting; clustering has real advantages and huge weaknesses. One of its main problems is youth: as of 5.0 it is entirely an in-memory DB. It has no file backend while the cluster is up. The upshot of this is that your entire DB system has to fit into memory. Not really practical for most people. This will be fixed in 5.1, but that is a rather large change to the engine and isn't even marked stable yet. I'd wait quite a while before putting that into production.

Speaking of waiting on production, if you check out the MySQL changelogs for every version release of 5.0, you will see that many are dominated by NDB (the cluster engine) bug fixes. So clearly it's being tested, but some of those bugs are rather terrifying. There is a very real stability and maturity argument against NDB (MySQL Cluster).

There are other problems. The design of Cluster is such that the data nodes (which actually store the DB) and the SQL nodes (which execute the queries) are separate. They _can_ be o
My apologies that it took me this long to get to this. The topic of clustering vs replication is quite interesting, clustering has real advantages and huge weaknesses. One of its main problems is youth, as of 5.0 it is entirely an in-memory DB. It has no filebackend while the cluster is up. The upshot of this is that your entire DB system has to fit into memory. Not really practical for most people. This will be fixed in 5.1, but that is a rather large change to the engine and isn't even marked stable yet. I'd wait on putting that into production for quite awhile.

Speaking of waiting on production, if you check out the MySQL changelogs for every version release of 5.0, you will see that many are dominated by NDB (the cluster engine) bug fixes. So clearly its being tested, but some of those bugs are rather terrifying. There is a very real stability and maturity argument againstNDB (MySQL cluster).

There are other problems. The design of Cluster is such that the data nodes (which actually store the DB) and the SQL nodes (which execute the queries) are separate. They _can_ be on the same machine, but usually arn't....as thats kind of the point of cluster. The data nodes are configured to insure that every piece of data exists on at least two NDB instances, thus giving it redundancy. However, this leads to the obvious issue of speed between the nodes. Consider that most DB systems are IO bound and your IO is now going over the network. The interconnects need to be at least gigabit and I have heard of fiber between the nodes. Now, this isn't _so_ bad as not all of the SQL statements is executed at the SQL nodes. Some where conditions are "pushed down" to the data nodes. However, this brings me to another point...joins are not pushed down. I'll leave it as an exercise to the reader to consider a full table scan over the network :).


So, there are real issues here. That is not to say it's not a cool technology or that it won't some day be a great solution. I'd just argue that it's not a solution now.
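The push-down point above can be made concrete with two illustrative queries against familiar Drupal tables (the exact set of predicates NDB could push down varied by version, so treat this as a sketch of the general idea, not a precise statement about the engine):

```sql
-- A simple WHERE condition can be pushed down to the NDB data nodes:
-- each node filters its own rows locally, and only matching rows
-- cross the network to the SQL node.
SELECT nid, title FROM node WHERE status = 1;

-- A join is NOT pushed down: the SQL node must pull candidate rows
-- from the data nodes over the network and match them itself, which
-- can degenerate into exactly the network-bound scan described above.
SELECT n.title, u.name
FROM node n
JOIN users u ON n.uid = u.uid;
```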

hswong3i’s picture

@nnewton: Thank you for your detailed explanation, which gives a very clear picture of the real cases.

From my point of view, MySQL's database replication seems more suitable as a "hot swap, hot spare" backup solution than as a fix for scalability. It is a good idea if we want a best-effort, real-time backup that doesn't require a shutdown -> backup -> restart cycle.

Surely MySQL's clustering solution isn't the best solution today (it is too young and buggy), but it should be a better choice than replication from the point of view of application development. By using clustered backend database servers, we don't need to change any existing programming logic; we can focus on our main duty and simply leave the scalability issue to the DBA :)

I am not trying to treat MySQL's replication limitation as some sort of "bug" that needs to be overcome, but as a "fate" of its design and implementation. BTW, it would be another story if we were discussing implementing MySQL-specific replication handling within our core. We need this for the case of today's drupal.org, but that doesn't mean we should standardize it as a core feature, which will introduce complexity and increase the difficulty of other database driver development :(

Moreover, this is also the main reason I am trying to introduce Oracle/DB2/MSSQL backends into Drupal's core: if database B is more suitable than database A based on cost/difficulty/maintenance/scalability/security/etc. concerns, simply choose database B rather than trying to recreate the missing feature within database A with a lot of extra effort.

I would like to suggest leaving the MySQL-specific database replication solution as an "always active" issue rather than merging it into core, if it requires some sort of "hard-coded branching" or "automatic detect-and-branch" solution implemented outside the database-specific driver. I would vote for "wait for the MySQL clustering solution to improve" or "introduce other database backends for Drupal, rather than being tied to MySQL" :)

david strauss’s picture

"From my point of view, MySQL's database replication seems more suitable for 'hot swap, hot spare' backup solution, rather than solving the problem of scalability."

Wikipedia uses it for scalability.

hswong3i’s picture

Anyway, back to the main concern of this issue. I did some study of the #86 implementation, and would like to propose a minor suggestion: should we also treat cache_get() as a slave-safe function? As it is widely used for anonymous users, branching its query requests to a slave server should be a good idea for load sharing.
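For concreteness, here is a minimal, self-contained sketch (not the actual patch code) of what routing cache_get()'s lookup through the patch's db_query_read_only() could look like. The db_query_read_only() below is a stand-in that just records which server class a query was sent to; in the real patch it would run the query against a randomly chosen slave from $db_slave_url.

```php
<?php
// Stand-in for the patch's db_query_read_only(): records the routing
// decision instead of hitting a real database.
function db_query_read_only($query) {
  global $query_log;
  $query_log[] = array('target' => 'slave', 'query' => $query);
  return FALSE; // No real database here, so behave like a cache miss.
}

// Simplified cache_get(): the lookup is a plain SELECT, so it is a
// candidate for slave routing -- at the cost of possibly serving a
// slightly stale (or missing) entry when the slave lags the master.
function cache_get($key, $table = 'cache') {
  return db_query_read_only(
    "SELECT data, expire FROM {" . $table . "} WHERE cid = '$key'");
}

global $query_log;
$query_log = array();
cache_get('variables');
print $query_log[0]['target'] . "\n"; // slave
```

Whether stale cache entries are acceptable is exactly the kind of per-query judgment the explicit *_read_only() API is meant to leave with the caller.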

hswong3i’s picture

@David: As you mentioned in #11, this is not a good solution or a good implementation. It comes with a lot of drawbacks (BTW, Wikipedia simply left all of this behind and forgot about it...).

It is for sure that MySQL's replication is much more reliable than its clustering solution, so we would always prefer replication over clustering. But is this the best solution to the scalability problem? Seemingly not, when compared with Oracle's RAC solution (please don't question this: MySQL's clustering FAQ also uses Oracle's RAC for comparison: http://www.mysql.com/products/database/cluster/faq.html). It is easy to hit a "deadlock" if we always solve different problems with a single solution ;p

Crell’s picture

There's nothing necessarily wrong with master/slave replication. And given that 99% of the Drupal world right now uses MySQL, supporting multi-server setups under MySQL is rather important. Telling them to just move to Oracle is not an option, either. For one thing, Oracle support in core is not guaranteed at this point. For another, many organizations have no interest in moving from MySQL, which is free, to Oracle, which is very much not.

Asking organizations to move from MySQL to Oracle in order to use Drupal is going to get even more resistance than telling Oracle organizations to install a MySQL box, and with good reason. Telling someone they have to buy an expensive, proprietary application in order to use a Free Software application effectively just does not compute.

hswong3i’s picture

@Crell: I am not trying to move the discussion toward "MySQL Replication vs MySQL Clustering vs Oracle RAC", but to point out a relatively simple idea: if there is a better choice, consider it; don't use a toolkit that is not suitable for solving a critical problem. If replication is more suitable for "hot swap, hot spare" backup than for solving scalability problems, don't use it the wrong way or force ourselves to overcome the difficulty. We may try our best to solve the problem the way Wikipedia does (which the INSERT/UPDATE/DELETE API should help with), but we shouldn't try to do more than that. This is not the application developers' duty but the database vendors' :)

Oracle RAC is not the only example of a clustering solution (though, BTW, arguably the best). Wikipedia also lists some Oracle RAC competitors (http://en.wikipedia.org/wiki/Oracle_RAC#Competition), some of which use PgSQL as the backend database server. There are also hybrid commercial clustering solutions that use MySQL as a backend without dealing with replication lag (I found one through Google some days ago...). JAVA + PHP could even be one of our choices, too (http://buytaert.net/scaling-with-mysql-replication#comment-2446). The main point is: there will seem to be no choice IF MySQL is our only choice, but we always have a number of solutions to the same problem if we think with an open mind. So should we implement some sort of hard-coded solution within core? Seemingly we shouldn't; let it remain an open-ended question :)

On the other hand, EnterpriseDB also provides a replication feature (http://www.enterprisedb.com/products/enterprisedb_replication.do). Even though they promote it as something that "replicates data across the enterprise in near real time to meet a wide array of business challenges," you can also see that they position it as a tool for a "hot swap, hot spare" backup solution, not as a target for scalability:

# Single-master, multi-slave replication, which enables the deployment of multiple warm standby databases, either locally or remotely
# Snapshot replication or scheduled incremental replication, which provides increased flexibility

I am not opposing the needs of this issue: at least it has proved to be required in the case of drupal.org (and so I am also trying to improve it if possible: http://drupal.org/node/147160#comment-316036). Until the day MySQL's clustering solution becomes our best choice in OSS, this issue is still very meaningful. BUT I am questioning whether we should be so tightly merged with MySQL, implementing a MySQL-specific feature within our core as an officially supported feature, when there are still a lot of other solutions.

I am not trying to discuss this issue in terms of "existing market share" or "commercial concerns", but logically and technically :)

Crell’s picture

How does implementing database targeting for master/slave queries in a non-DB-specific way preclude the use of clustering if it happens to work on the database du jour? It doesn't. There's nothing fundamentally MySQL-specific about it (nor is there anything fundamentally fail-over-only about master/slave replication, as drupal.org is now showing). You should be able to just toss a cluster behind one of the connections without the code knowing the difference.

hswong3i’s picture

So let's make a simple assumption: I am a normal PHP programmer working on a huge project that keeps scaling up. My single MySQL database backend is no longer able to handle the client requests, so my boss orders me to propose a suitable solution. If I choose the MySQL replication solution, I will need to:

  1. Understand the difference between a single server and master/slave servers
  2. Understand what slave-safe means: how it happens, and what I should care about
  3. Figure out what is/isn't slave-safe, and change my programming logic for this branching
  4. Negotiate with my DBA (as I am a normal PHP programmer, not a DBA...), request master/slave testing servers, and perform UAT
  5. etc...

Do I need to consider ALL OF THESE ISSUES if we are able to use a clustering solution? NO, I can just pass these nightmares on to my DBA: they will research a suitable solution, set up the backend for me, and even perform the DB migration (if we are trying to toss away MySQL...). Even if this is not a MySQL-specific issue (e.g. if we choose the PgSQL/EnterpriseDB replication solution), we can't escape this special handling at the programming level if we choose database replication as our ONE AND ONLY ONE solution :(

The main point is: replication is more suitable for "deployment of multiple warm standby databases, either locally or remotely" (http://www.enterprisedb.com/products/enterprisedb_replication.do) than for solving scalability issues (take Wikipedia as a poor example: http://drupal.org/node/147160#comment-250439). I guess even JAVA + PHP (http://buytaert.net/scaling-with-mysql-replication#comment-2446) may be simpler and more suitable for this issue.

We are application developers, not DBAs (for the clustering solution) nor system administrators (for the JAVA + PHP solution), right? There is no point in us taking away their duties ;p

hass’s picture

@hswong3i: I think you are right about normal PHP developers and their experience, but master/slave is often the only available solution. As an example, with MSSQL there is no active/active/n-active configuration available; the only way is master/slave replication. Other DBs might have such limits, too. Additionally, as you might know, cluster versions are often Enterprise products with additional fees, while replication is not. This may be out of reach for small or medium companies. An MSSQL2K5 Enterprise CPU license is about $25K and Oracle is not cheaper... but replication requires only the Standard edition, at a fifth of the cost or less. We should care about this and not just say "DBA, it's your job to solve this problem." Acting this way is no solution for problems (like overload) a DBA cannot work around. Bad programming style makes every SQL server overloaded. Throwing hardware at every performance problem is nearly always the wrong idea, and only stupid companies do this.

I can only talk about my experience here, and I assure you: if the server is new and fast, devs do not care about anything... you can replace the hardware every half year with such "normal" programmers. The only solution is to find the source with analysis, profiling and so on; you will find bugs, they can be solved, and you can replace the machines every 3-5 years *G*. Every year we have this replacement discussion when the seasonal business quadruples the prior traffic and the machine gets overloaded... which has ended every time with better SQL code and no new hardware.

I'd like to say: thinking is not forbidden... normal developers must care about system limits as well, however hard this is for them to learn.

david strauss’s picture

@hswong3i Please stop pushing your ideals of application/database abstraction on us. There are tons of things applications do to improve interaction with database servers that technically fall under the DB's burden, like every query rewrite to accommodate quirks of the particular RDBMS's optimizer.

Multi-master has intense scalability issues of its own. Unlike with the master/slave model we're working on, a multi-master system has no idea whether reading the very latest update is critical. The RDBMS simply has the additional burden of locking everything that could read the update across the cluster until the update replicates. In our case, many of the things that could be slightly stale but still function (search results, forum listings, etc.) are treated as though they must be fresh. And don't forget all the issues with multi-server transactions!

Crell’s picture

hswong3i:

... if we are choosing database replication as our ONE AND ONLY ONE solution :(

We're not. It's the only one that requires application-level code changes, so we're making those changes possible. As you have repeatedly pointed out, the others do not require application-level code changes, so they are completely and totally irrelevant to this discussion.

And yes, if you're going to be running an SQL-based app that is big enough to require multiple servers, you'd damn well better know something about SQL and how SQL works and what the internal delays are that affect replication. If not, you'll be writing a crap system anyway, no matter what your scalability solution is.

Please either stay on topic or at least don't distract the issue from the topic. Thanks.

Tresler’s picture

Speaking of staying on topic.... what is it currently?

I'm assuming it is review of this in a master/slave configuration? http://drupal.org/node/147160#comment-284037

Are there any specific use cases you would like to see? I think I can maybe procure three servers with which to do some testing, but if I'm provisioning them from scratch, I'd like to know what the best use case is.

Crell’s picture

I think for the moment this issue is stalled, since it's not going into D6 and D7 isn't getting heavy development yet. I am working on integrating replication support into the Drupal 7 database API, so when HEAD opens up again it will be great to have something to test on besides Drupal.org. :-)

If you can offer a trio of servers to test with when the time comes, that would be awesome. Probably the best configuration for that would be one web server, one master DB server, and one slave DB server. A second slave server running on the web server (its location shouldn't matter for functionality and it's not a live site) could be useful for testing multi-slave configurations for functionality, although not performance.
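A settings.php fragment for the trio described above might look like the following, using the $db_slave_url convention from the patch (hostnames and credentials are placeholders; this assumes the flat-array form for a single set of slaves, from which db_set_active() picks one at random):

```php
<?php
// Master (read/write) connection, as usual.
$db_url = 'mysqli://drupal:password@db-master.example.com/drupal';

// Read-only slaves: the dedicated slave box, plus the extra slave
// running on the web server for multi-slave functional testing.
$db_slave_url = array(
  'mysqli://drupal:password@db-slave.example.com/drupal',
  'mysqli://drupal:password@localhost/drupal',
);
```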

hass’s picture

I wonder why this isn't integrated into D6. Isn't d.o currently running an unofficial, custom-hacked D5 version? If this master/slave logic doesn't get integrated, a d.o running D6 must be manually patched again and no one other than d.o can benefit from this. Additionally, all the modules d.o runs need to be patched only for d.o. This sounds very strange to me and not easy to maintain...

Crell’s picture

I agree, hass, it's not optimal. Unfortunately for whatever reason this issue didn't get enough attention to get into Drupal 6 before it closed. Beyond that, it's up to Dries and killes to decide how to handle it. They're welcome to overrule anything I've said above. :-)

peterx’s picture

MySQL master/slave versus Oracle RAC (based on personal experience):

MMS works, is easy to implement, and is easy to expand. You can expand the slaves around the world which means you can have mirrors everywhere.

OR is expensive, in people time, people skills, hardware, and software. OR does not work for geographical expansion so forget about using OR for mirrors around the world.

Community organisations cannot afford OR and the big dollar companies can implement OR without changes to Drupal, apart from completely rewriting all the SQL to fit the parts of Oracle that are weird. I support a change that makes Drupal suitable for community organisations that happen to become popular.

MMS lets you put a slave database in every school in the world so they can access the same courses. The Moodle package for Drupal makes the delivery of one course worldwide an easy project. When backed by Drupal with MMS, every school or school district can select the right hardware to serve their students.

There is something in Drupal 6 that might help make MMS easier. I ran up one Web site in Drupal 6 beta 1 to see how it fits and discovered the brilliant code division feature, similar to the one I promoted for Drupal 5 but with finer control, that lets you load only the code you are going to use for a given page. Something similar could apply to requests for data from the master, where regular code is served from a slave and special code, including admin code, is served from the master.

Take the example of a module that updates a node type in administration but only reads the node for a regular page and updates an access log. The access log does not have to be up to date for a regular page read but has to be accurate in the admin page. The MMS code could default to supplying the regular code from a slave. When the user enters the admin update code, something could be set to say "use the master for this page". Everything on that page then works from the master because it is an administration page.

example.module defaults to using slaves. example.admin.inc is switched to the master and all the db access within that file is served from the master.

You could have a function, drupal_use_master_for(), that lets you specify a list of tables; from that point on in the page, all SQL requests that mention one of the specified tables will use the master. If I call drupal_use_master_for('example_book', 'example_page'), then any occurrence of {example_book} or {example_page} will trigger a switch to the master for that database request.
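The hypothetical drupal_use_master_for() could be sketched as follows. None of this is in the patch: the function and table names follow the comment's proposal, and the _db_target_for() router is invented purely for illustration.

```php
<?php
// Register tables that must be read from the master for the rest of
// this page request. With no arguments, returns the current set.
function drupal_use_master_for() {
  static $tables = array();
  foreach (func_get_args() as $table) {
    $tables[$table] = TRUE;
  }
  return $tables;
}

// Hypothetical router: send the query to the master if it mentions
// any table previously registered with drupal_use_master_for(),
// using Drupal's {table} placeholder convention to spot table names.
function _db_target_for($query) {
  foreach (array_keys(drupal_use_master_for()) as $table) {
    if (strpos($query, '{' . $table . '}') !== FALSE) {
      return 'master';
    }
  }
  return 'slave';
}

print _db_target_for('SELECT * FROM {example_book}') . "\n"; // slave
drupal_use_master_for('example_book', 'example_page');
print _db_target_for('SELECT * FROM {example_book}') . "\n"; // master
print _db_target_for('SELECT * FROM {node}') . "\n";         // slave
```

As the comment notes, registering tables this way behaves much like opening a transaction scope: the database calls in between need no changes of their own.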

The selection of the master would almost be the same as starting and ending a transaction. The database requests in the middle would not have to change.

petermoulding.com/web_architect

birdmanx35’s picture

Status: Needs review » Needs work

I just tested the latest version of this patch and it failed against core:

$ patch -p0 < D5_replication_0.patch
patching file includes/database.inc
Hunk #1 succeeded at 67 with fuzz 1 (offset 23 lines).
Hunk #2 FAILED at 129.
Hunk #3 FAILED at 166.
Hunk #4 FAILED at 230.
3 out of 4 hunks FAILED -- saving rejects to file includes/database.inc.rej
patching file includes/database.mysql.inc
Hunk #1 FAILED at 142.
Hunk #2 succeeded at 112 (offset -46 lines).
Hunk #3 succeeded at 218 (offset -78 lines).
Hunk #4 succeeded at 231 (offset -78 lines).
1 out of 4 hunks FAILED -- saving rejects to file includes/database.mysql.inc.rej
patching file includes/database.mysqli.inc
Hunk #1 FAILED at 121.
Hunk #2 succeeded at 112 (offset -25 lines).
Hunk #3 succeeded at 229 (offset -56 lines).
1 out of 3 hunks FAILED -- saving rejects to file includes/database.mysqli.inc.rej
patching file includes/database.pgsql.inc
Hunk #1 succeeded at 127 (offset 13 lines).
Hunk #2 succeeded at 143 (offset 13 lines).
Hunk #3 succeeded at 260 (offset -6 lines).
Hunk #4 succeeded at 273 (offset -6 lines).
patching file includes/pager.inc
patching file modules/node/node.module
Hunk #1 FAILED at 2418.
Hunk #2 FAILED at 2432.
2 out of 2 hunks FAILED -- saving rejects to file modules/node/node.module.rej

Crell’s picture

Status: Needs work » Closed (duplicate)

Let's continue this in the new database thread: http://drupal.org/node/225450

Swampcritter’s picture

Category: task » support
Status: Closed (duplicate) » Active

I got this patch to work on the D5.11 platform after a little bit of trial and error.

I do have a question about $replication_type. Do I need to set it to DB_SLAVE if I want the 'select' queries to be passed over to the slave server, or can I leave it as DB_MASTER -- will it detect that a slave server is available via settings.php and attempt the query via the slave that way?

-- Michael

Crell’s picture

Status: Active » Closed (duplicate)

Please do not re-open long-closed issues. Replication in D5 and D6 is an unsupported hack.

david strauss’s picture

@Swampcritter You're not even using the standard version of the patch.