Currently system module provides a schema definition for the default database backend, and this then gets re-used by both schema module, core modules and contrib modules to create cache bins. These may or may not correspond 1-1 with hook_flush_caches() (there's nothing in core to enforce this or even recommend it), and also it means you end up with (hopefully empty) cache_* tables in your Drupal install even if you don't use them.

Instead of this I'd like to propose the following:

Add two methods to the cache interface - createBin($bin) and dropBin($bin)

Alongside these we may need updateBin($bin) and binExists($bin)

Core's cache backend would then keep the schema for cache bins, implement the two methods with db_create_table() and db_drop_table() - update would be drop + create, binExists would use db_table_exists().

Memcache (for example) would be a no-op for all of these - possibly treat bin updates as a full cache flush though.

Then we add a new hook - hook_cache_bins(). Modules implement that hook, probably in their .install file, and core calls the relevant method for whichever is the default cache backend when modules are installed. This hook would also allow us to support proper garbage collection instead of abusing hook_flush_caches() like we do now.

This would allow sites to be installed with memcache as the backend right from the start, and core would never make the SQL tables. It'd also make it easier for contrib backends that need some kind of explicit bin creation to do that.

If you swapped cache backends in settings.php, then you might run into issues - probably we could add something to drupal_flush_all_caches() to get all cache bins, check if they exist, create them if not so that this could be done at the same time as deploying the change.

If we cared about having the cache tables in schema, system module could still implement hook_schema() - but only for cache tables that are using the database backend.

There's no explicit dependency, but I'd like to get #81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers) a bit further along before starting on this.

This issue is part of #679112: Time for system.module and most of includes to commit seppuku.

Tagging with performance, there isn't much of a performance win here (tiny bit less in the MySQL table cache and Drupal's schema cache maybe) so please excuse the spammy tag.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Xano’s picture

This issue is (in part) a duplicate of #496248: Refactor cache API. If we expose cache bins in a hook, we can use that hook to flush caches (and replace hook_flush_caches()) and we might use it to create and drop bins dynamically, depending on the backend used for each particular bin.

Crell’s picture

Wouldn't adding a hook here make a late-bootstrap dependency for the cache system, which is by nature needed very early in bootstrap? What's the behavior when you try to read/write a cache bin that doesn't exist?

catch’s picture

@Xano: yes it's a partial duplicate of that issue, hook_cache_bins() is 100% nicked from that patch, however I want to break this cleanup into targeted patches so they can be reviewed more easily.

@Crell: There's no hook for cache backends, only for modules that want to have their own cache bins - this is no different from now, except the hook is currently called hook_schema() and only works for the specific cache backend that core ships with.

Nothing in the proposal here would inherently require having a hook implementation to use the cache system - but you'd have to handle cache bin creation and garbage collection yourself - or have an inherent dependency on system module and use the cache/cache_bootstrap bins. This patch does not fix that, but it doesn't make it any worse either. One step at a time...

"What's the behavior when you try to read/write a cache bin that doesn't exist?"
I'm still figuring that out, right now that can happen anyway - for example if you use the MongoDB cache backend, we offer exactly zero ways for MongoDB to add cache bins when modules are installed (I haven't actually looked at that cache backend for over a year so can't remember if it tries to handle it or not).

The only advantage that the current situation gives you now, is that if you give up on a pluggable cache backend, core can fallback to database caching and the database tables are there waiting (although they could easily contain stale cache content that was never cleared and break your site anyway - say a three month old system_list(), module_implements() and registry cache.

One thing we could do is put cache_get()/cache_set()/cache_*() inside a try/catch - the catch could then check for bin existence, try to create it if it doesn't exist, then throw an exception if that fails - this would allow some runtime fallback without adding (noticeable) overhead to every call.

pounard’s picture

I don't like the createBin() and dropBin() functions. What's would be really good is #1 suggestion, forcing modules to set their bins in a declarative manner, so the core would be able to create/drop/clear them with or without the intervention of god, and without the implementation of hook_flush_caches() and such (which need to exists for side tasks like clearing some other stuff like cache itself).

The need to have a database table under the hood for each bin, whatever is the cache backend using it is really important because it allows a runtime fallback on database if needed.

createBin() and dropBin() makes sense, but as API protected functions, because maybe some backends would need some kind of initialization, while core doesn't care about specific implementations therefore shouldn't mess with it: core doesn't have to run the create order, the backend specific implementation must complete its init by itself if needed, probably in some kind of lazzy algorithm.

@#2, because core has a hook for bins doesn't mean that it needs to be run at early bootstrap. Actual system already needs the module to create their cache tables at install time, which means that, whatever we proceed to this declarative system or not, core will end up creating the tables itself instead of letting the module doing it. The result in case of hitting a non existing bin would be exactly the same: an ugly warning/error in watchdog.
Another solution could even have an exception based error handling that throws some kind of NonExistingBinException when trying to, but this going way further than the initial post IMHO.

Crell’s picture

catch: No, I mean if we're using a hook to define cache bins, how do we initialize the basic bins (cache, cache_bootstrap, etc.) initially? Especially if the installer runs with its own custom no-op cache system in the first place?

(I'm overall fine with this cleanup in general, just trying to deal with the implications.)

pounard’s picture

For core related, they are basically set up at system module install time (site installation) it could stay there, set up them doesn't mean use them at install time. The "registry of bins" could have those per default (as hardcoded values) as start before finding out a more elegant solution for early usages?

catch’s picture

@Crell - if the try/catch idea works then we may not need to explicitly create bins. That would also solve the fallback that pounard wants to the db.

The installer would definitely be an issue with this otherwise, would need to think how to handle that if the try/catch is a dead end.

By the way if this ends up working, I'd like to do similar for session and lock back-ends as well.

As pounard mentions, even if it's a bad idea overall to have createBin()/dropBin(), we could still have hook_cache_bins() and then have system module or the install system handle bin creation and deletion on module install/uninstall. Just that would be a step forward - not sure if that exact code is in Xano's issue or not, either way it would be a good part 1 patch for this regardless so it may be worth splitting this further.

pounard’s picture

I totally vote for the declarative hook! It would avoid so much typo errors (and we could log non existing bin usage attempts somewhere) while definitely get rid of SQL errors when trying on a non existing table! I thinks this is the really first step to improve the overall cache handling.

Any other improvement is welcome I think, and I really don't care if the error handling is exception based or not, basically it just have to be really performant because it a essential/critical and early bootstrap related code.

Xano’s picture

Bin creation was not part of my original patch.

In my patch I think I not only considered DB tables as bins, but compressed CSS and JS files as well. It would be cool to have one uniform way of handling cached data.

pounard’s picture

@#9 Compressed CSS and JS files are not cache, they are resources (as called in most frameworks out there). Assets are public static files, once they have been generated they are meant to be spread over CDN's and stuff, and therefore shouldn't be treated as volatile cache entries.

Even if CSS and JS aggregated files have a shorter lifetime than a real asset (such as user uploaded file), if a particular frontend starts to cache full pages (Varnish, Akamai, ESIGate, etc..) the static outdated files should remain alive too, in order that a client that reach an outdated cache entry never ever see his site without CSS applied.

Cache bin are mostly serialized binary data only used at runtime in order to achieve the greater good, which is compute the final rendered page, while compressed CSS and JS are 100% part of the page itself therefore are not volatile anymore (on the opposite to cache, once cache is destroyed, the underalying API will lazzy build it again, and the user won't see nothing) while CSS and JS files, once destroyed, cannot be restored because they mostly are being called from an external source (and not while building the page).

I disagree with the fact those two really different problems should be mutualised.

Crell’s picture

The try part of a try-catch block is super cheap. Creating and throwing an exception is expensive, because the exception has to create a backtrace. The question is what we define as the API for cache_set().

1) "If the requested bin doesn't exist, an CacheBinNotFoundException is thrown." - That means you have to have your own try-catch to avoid Drupal dying.

2) "If the requested bin doesn't exist, a watchdog entry will be logged and the cache will respond as if it was simply a normal cache miss." - This means Drupal will "almost silently fail".

Either way the DB cache would likely be faster with a try-catch block internally than with an explicit table_exists() check, since 99% of the time it will succeed.

pounard’s picture

Mostly agree with #11 on this. Exceptions probably boot performances if you have to unstack 100 calls, but it make you loose time if it's only two (this needs to be verified, but if the JIT/VM/? is properly implemented, it should probably use a setjmp()/longjmp() alike system, which is ultra fast exception like error handling in C).

That's why I proposed the isHealthy() check, that every implementation would give (database based cache would just tell TRUE anyway, because a missing table here is not something that should happen, you can run your site on a downgraded env. without memcache, you cannot do the same without the database).

EDIT: This is why I came here: http://drupal.org/node/1164484

Anonymous’s picture

subscribe.

Fabianx’s picture

Subscribe

Crell’s picture

catch: Your thoughts on #11, vis, the API of a "bin miss"? I'm inclined toward option B, "silently fail with a log entry", as forcing everyone to wrap try-catch blocks around every cache get is going to be a losing battle.

catch’s picture

@Crell: I still think we should look at try/catch and do a db_table_exists() in the catch block, could happen in DrupalDatabaseCache->getMultiple(). If that db_table_exists()/db_create_table() fails then you'd still have to deal with it though.

Of the two choices in #11, I'd go for watchdog too.

pounard’s picture

Missing table would be a module install problem, not a cache problem. These kind of database errors should explode at the developer's face, not downgrade silentely an innocent man's site. Database cache is not depending on any external dependency, therefore a missing table is a real error that should not happen.

Crell’s picture

A db_table_exists() in the catch is fine, as that's already the edge case (presumably).

So you're expecting the cache system to be self-healing in this case? Something like:

DrupalDatabaseCache {
  function getMultiple(&$cids) {
    try {
      // ...
      $result = db_query('SELECT cid, data, created, expire, serialized FROM {' . db_escape_table($this->bin) . '} WHERE cid IN (:cids)', array(':cids' => $cids));
      // ...
      return $cache;
    }
    catch (Exception $e) {
      if (!db_table_exists($this->bin)) {
        db_create_table(...);
        watchdog(...);
        return $this->getMultiple($cids);
      }
    }
  }
}

I am not against it in concept, but I worry about cleanup when uninstalling modules. (And actually, the bin check may make more sense to do in the constructor rather than on get, but that's an implementation detail.)

pounard’s picture

Yes, at least it keeps the specific error handling into the specific implementation!

catch’s picture

@Crell: the idea is to decouple the cache system from a specific implementation of caching in SQL - right now there is a lot of hard-coded schemas etc. which assume you use core's implementation. The same is true for lock, sessions and queue.

What I'd like is for it to be possible (not necessarily recommended) to install core, and if you are using pluggable storage then core would not create empty database tables that are never going to be used in the first place. This is currently true for field storage, but not for anything else.

We don't currently have a way for pluggable includes to install things or do updates, the same exists for db drivers #1010188: Add support for database-driver updates.

In the case of caching, because the bins are by definition disposable, the self-healing seems like a reasonable way to deal with this.

While I agree with pounard that it may be nice to be able to fall back, the approach here doesn't necessarily preclude this.

Doing the check in the constructor is tricky, since there is no attempt to fetch from the cache bin there, so the try/catch trick for creating bins would incur overhead every request (per-bin) rather than only happening in failure scenarios. A module like memcache where it sets up an initial connection to the bin in __construct (iirc) may be able to use that though.

Since hook_cache_bins() is a much simpler change, probably needs to be backported to D7 in some way, I'll try to get that into a a new issue to deal with separately, once that's in it'd be less work to do here anyway.

Crell’s picture

If the cache object is self-healing, why would we need a hook? Just start throwing stuff into cache_foo, and it creates it as needed. The only issue then is module uninstall where we need to clean up such tables. Perhaps we can get away with just a ->deleteBin() method (and corresponding createBin() method for internal use), and modules are expected to clean up after themselves?

Xano’s picture

This issue actually is about two different, but related things:
- How do modules expose/create their cache bins?
- How do cache backends handle cache bins?

Modules exposing cache bins is not mutually exclusive to cache backends being able to dynamically create or delete bins of their type. If a bin is set to use backend X, but X does not have that bin yet, it needs a fallback. This is common to all backends, since DB caching is just the default backend and should not be used as a fallback.

Xano’s picture

//Edit: double post

pounard’s picture

Modules exposing cache bins is not mutually exclusive to cache backends being able to dynamically create or delete bins of their type. If a bin is set to use backend X, but X does not have that bin yet, it needs a fallback. This is common to all backends, since DB caching is just the default backend and should not be used as a fallback.

More precisely, the database bins are the only one for which the logic is not handled by the cache specific implementation itself. The clean way to proceed is to let the "bin" abstraction into core but remove all database code (from hook_schema() and other places) from core and modules and let only the database backend do whatever it has to do with it (creating and removing tables).

The only thing that can/should sweat from the core is "module X that handles the Y bin has been removed, please all cache backends remove it!".

pounard’s picture

The correct issue title should not be "Allow cache backends to create and drop bins" but "Make each specific cache backends responsible of their bins transparently".

catch’s picture

I'd be fine with that issue title :)

The reason you'd still have a hook is full cache flushes and garbage collection. Although we could make cache backends entirely responsible for the latter too.

pounard’s picture

Yes agree for the flush and garbage collection hooks.

Crell’s picture

If we want to allow bins to handle their own GC, then don't we need a global list of known bins?

Each bin may be on a different driver, and each driver may not know for certain all of the bins assigned to it. So somewhere we need a master list of "bins that exist", and then tell each one "bin, clear thyself!" (And also, possibly, "bin, kill thyself!")

catch’s picture

Right that would be hook_cache_bins().

Xano’s picture

I suggest hook_cache_bin_info(). It complies with the standards of keeping names singular and adding the "_info" suffix to info hooks.

pounard’s picture

+1 to #28, #30

Crell’s picture

I'm still not a huge fan of Yet-Another-Hook (henceforth YAH), but a proposal:

function hook_cache_bin_info() {
  // Deliberately over-verbose for later extension.
  $bins['page'] = array(
    'bin' => 'page',
  );
  return $bins;
}

interface CacheInterface {

  // A cache object knows its bin.
  public function __constructor($bin, ...);

  // Create and initialize the bin.  Usually only called internally but a legit public operation.
  public function createBin();

  // Remove a bin from the system.  This is probably a no-op on many non-SQL implementations.
  // Return TRUE if the bin was removed, FALSE if it wasn't there to begin with.
  // (Removing a non-existent bin is not an error condition; that's consistent with the DB schema API.)
  public function removeBin();

  // If the bin doesn't exist, call $this->create() and make a non-error watchdog note.
  public function getMultiple();

  // If the bin doesn't exist, call $this->create and make a non-error watchdog note.
  public function set();

  // Remove a single cache key, or subset.
  public function clear($key);

  // Clear the entire bin.
  public function clearAll();
}

- On module install... do nothing. All bins lazy-instantiate. The schema for the DB cache implementation is not even in schema hook, just in the SQL cache implementation.

- On module uninstall, check for all bins declared by that module. Load whatever cache object is set for that bin, and call remove().

- If a module created a bin on one backend and then it was reassigned to another backend, the admin is responsible for cleaning up the stale backend, not us.

- If you try to write to a cache bin that isn't declared, throw an exception. I'd rather that to watchdog as it is a sign of a significant bug in your code to begin with, probably a dumb typo. If you're using a dynamic bin name, then you're abusing the cache system and need to stop doing that.

Thoughts?

catch’s picture

@Crell, if you think of the new hook as splitting hook_flush_caches() (which is wrongly named and completely abused) does it make it any better? ;)

I agree with just about all of that.

A few things:

Couldn't we just do $bins['page'] = TRUE? Or change the API when there's a need to make this more verbose?

It would be nice to eventually work towards a point where the cache API (and backends) could be used more or less as their own unit. Throwing an exception if a bin isn't defined in a hook will rule that out (although easy to change later).

More of a concern, it means there is a hard dependency of the cache system on the module system. This could also make it more fragile than we'd like as well - for example, switch the cache backend, make a request to Drupal, the first thing it needs to do is write to {cache_bootstrap}, which won't exist, and neither will system module which is defining {cache_bootstrap}. I would rather leave the (hopefully small) risk that someone mis-names a cache bin than add a new circular dependency. While we'll be using the hook system either way, scheduled garbage collection and site-wide cache flushes are things that Drupal does, but the cache system is ignorant of.

The schema for the DB implementation should definitely be in the cache implementation (probably a public function returning the schema just in the db backend). I was thinking we'd add a hook_schema_alter() in system.module based on hook_cache_bins() + cache backend settings so that modules like schema module would report tables correctly. That'd also be a very small protection against the dynamic cache bins issue.

I may have mentioned it already, and it may be obvious if not but a follow-up goal would be to do a similar thing for queue, lock and session.

pounard’s picture

Seems rather good to me, and I like the fact you dissociated clear($key) and clearAll() (maybe clear all could be called empty() by the way).

Appart from this totally off-topic remark, yes, seems a good way, only one thing is bothering me, is that the fact the core is responsible for giving the remove() order, but not the create() order, I think the layer that is responsible for it (either the backend itself, or the core) should be responsible for calling both of these functions. It would make a lot of sens that core triggers the create(). Usage of a non existing bin would then raise an exception (not lazy creation) and alert the developer he is doing it wrong (it definitely makes the error evident).

pounard’s picture

Oups cross post removed the tags, and I can put them back, sorry.

catch’s picture

Issue tags: +Framework Initiative

Oh on clear vs. clearAll(), I missed that bit, please take a look at #81461: Clean up the cache API (stop overloading function arguments, remove procedural wrappers) which splits cache_clear_all()/clear(). Part of the reason I didn't do any patch here yet is 'cos I'm hoping we could get that one in first (current patch there leaves the clear() method for temporary bc).

@pounard the main issue with lazy creation of bins is allowing for different cache backends with bin creation requirements to be swapped in and out seamlessly. If we throw an exception always we can't do that (nor lazily create cache bins during the install process when not all modules might be loaded). For a possible compromise how about having core create the bins when the module is installed, but have the backend log to watchdog for lazy creation instead of an exception?

catch’s picture

Issue tags: +API addition
Crell’s picture

catch: Fair point on the circular dependency in the installer. My concern, though, is that it effectively makes the hook optional. Nothing *actually* breaks if you don't declare your cache bin, so it can never be properly garbage collected. That said, if we decide that's the lesser of two evils (like all those modules that forgot to clean up their variables in hook_uninstall()) then I can live with that.

I would love if we can go as far as even removing watchdog() function calls from cache objects. Any time we can push as far as an object that has zero external function calls, that is the gold-standard to shoot for. (Not always achieveable, yet, but that's where we want to end up for modularity and testability reasons.)

For the structure of the hook, we could but I prefer to leave ourselves an opening for later. We did the same for hook_hook_info() in expectation that sometime in the future we could add hook weights there, and it would be a backward-compatible change. I could see us adding things like default expiry ages to a given bin in this hook, in which case I'd rather not have to change that API even more times.

I may have mentioned it already, and it may be obvious if not but a follow-up goal would be to do a similar thing for queue, lock and session.

I don't believe you did, but that makes a lot of sense to me.

Xano’s picture

- Lazy bin instantiation might be good, so bins can easily switch backends, but we will have to document that well to prevent WTFs.
- Because bins can change backends at any moment, we should call removeBin($bin_name) for every backend when a module is uninstalled, to make sure no unused bins remain.
- As far as I can see the hook_cache_bin_info() proposal in #32 requires developers to set the bin name twice: both as key in the parent array and as a value in the bin definition. This is duplicate data and when used or altered incorrectly or inconsistently, this means trouble.

catch’s picture

The hook being optional is pretty much where we are now, so in terms of trade-offs I'd be happy leaving that for a bit.

I'm happy to just allow lazy bin instantiation without warnings, it's not really that different to dynamic table creation which we do quite a lot now with field API.

Cleaning up old bins is tricky, Xano's idea might work except that's going to be quite a big operation on module uninstall, but we do a lot there anyway so maybe.

I agree setting the bin name twice in hook_cache_bin_info() isn't great, would much rather use the array keys as suggested in #36.

pounard’s picture

This issue has a pretty good shape, and I think it's time to integrate this one: http://drupal.org/node/1164484#comment-4728356 as well. Having an isHealthy() check over cache backends allows to implement the lazy table creation in it, and return FALSE if creation didn't succeed. What do you think?

sun’s picture

Issue tags: +API clean-up
pillarsdotnet’s picture

catch’s picture

Status: Active » Needs review
Issue tags: +Needs change record
FileSize
17.16 KB

Here's a patch, it does not include everything discussed in this issue but Drupal installs so that's something.

Nor did I try to incorporate #1164484: Cache backend consistency runtime check and graceful downgrade mecanism but there's so far no conflict with that patch at all either.

Changes:

Modules declare the cache bins they will use in hook_cache_bin_info(). Currently this looks like:

hook_cache_bin_info() {
  return array(
    'foo' => array
        'description' => 'What we used to put in schema description',
     ),
    );
}

DrupalDatabaseCache adds the createBin() and dropBin() methods, it also wraps cache CRUD operations in a try catch to lazy create bins if they don't exist (to allow people to switch back to the database cache and have it still work).

Most cache backends will be a no-op for this. This would be a good reason to have a genuine null cache backend so they can extend that for no-op create/drop methods, but there's an option issue for that.

Image module does not use the cache bins that it creates, that was a head desk moment so I just nuked those lines, but needs its own issue since we should dump those in 7.x too.

We probably want to make hook_cron() call hook_cache_bin_info() here as well - however the cache API update patches are touching that code and I'm trying to avoid conflicts all over the place.

catch’s picture

Title: Allow cache backends to create and drop bins » Replace hard coded database schema for cache bins with hook_cache_bin_info()
catch’s picture

Issue tags: -Needs change record

Taking change notification tag off until this is actually committed.

sun’s picture

Status: Needs review » Needs work

Aside from the initial and minimal review that follows, I don't like this change proposal.

It mixes high-level (hook/module system) logic into low-level, pluggable cache system logic. I can't provide factual evidence for where and when this will blow up, but my gut feeling on these changes tells me that it will blow up.

It's not only the interdependency between application layers, but also the idea of moving storage container/facility logic within and into data/CRUD controller logic -- these are fundamentally different tasks happening on entirely different times and levels of the application stack. A data/CRUD controller is dumb. It must be able to rely on the fact that the storage it attempts to read from or write to already exists. The approach taken here looks way too "dynamic" and not reliable nor maintainable to me.

+++ b/includes/cache.inc
@@ -464,60 +476,89 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+    foreach ($this->schema() as $table => $definition) {
...
+    foreach ($this->schema() as $table => $definition) {

$this->schema() doesn't return a multi-dimensional array

+++ b/includes/cache.inc
@@ -464,60 +476,89 @@ class DrupalDatabaseCache implements DrupalCacheInterface {
+      if (!db_table_exists($bin)) {

$bin is undefined,

-12 days to next Drupal core point release.

catch’s picture

It mixes high-level (hook/module system) logic into low-level, pluggable cache system logic.

That's not accurate, in fact it's the opposite to what's actually happening in the patch. Currently there's a hard dependency between system_schema() and cache bins, this patch decouples that.

pounard’s picture

I think it's time to wake up this issue.

hook_cache_bin_info() is a good idea.

On the fly table creation doesn't sound a good idea: the cache table should be created per default during the module install (and removed during the module uninstall).

The cache bin info must live in .module file, it will be a very lightweight hook and can be run each request without being itself stored in a persistent cache (static cache will be enough).

It will fix this http://drupal.org/node/636454#comment-6128574

Having a registration based info hook for bins allows us to get rid of most hook_flush_caches() implementations (in favor of hook_cache_bin_info()) calls.

It also enable possibility to create full UI for cache analysis/handling etc.

Non existing bin should raise logic exceptions when requested: more robust code, early errors detection.

Xano’s picture

@pounard: See my patch in #496248: Refactor cache API for most of what you suggest. We shouldn't forget a hook_cache_bin_info_alter() so websites can easily switch a particular bin from DB cache to memcache, for instance.

Also, if we go forward with this issue I'm willing to pick up where I left off.

pounard’s picture

@Xano sounds good, why wouldn't we migrate your patches here then? @catch's opinion as some value here too, he was the one at the initiative of a series of issues about cache including this one.

I briefly overlooked at your patch, maybe we should start without the UI additions and only with the bin registration mecanism, then open a new issue for the UI?

claudiu.cristea’s picture

Assigned: Unassigned » claudiu.cristea

Picked this up. Will start from #44.

claudiu.cristea’s picture

Title: Replace hard coded database schema for cache bins with hook_cache_bin_info() » Registering cache bins via hook_cache_bin_info()
claudiu.cristea’s picture

Status: Needs work » Needs review
FileSize
25.85 KB

Here's a patch for testing.

Summary:

  • Cache bin tables are registered now in module .install file using hook_cache_bin_info().
  • When a module is enabled it automatically installs the cache bin table(s). On uninstall will drop the tables.
  • Cache bin tables are consistent as naming convention (pattern: 'cache_bin_' . $bin).
  • Main 'cache' table was renamed to 'drupal' as bin (table name: 'cache_bin_drupal') for consistency reasons and API cleanup.

Status: Needs review » Needs work

The last submitted patch, drupal-cache-bins-registering-1167144-54.patch, failed testing.

Crell’s picture

A recent Twitter conversation leads me to this interesting question...

We originally added separate cache bins to the cache API back in 4.7 or so, to avoid cache clears wiping out too much at once. Vis, we didn't want a page cache flush to also wipe a filter cache or block cache, or vice versa. And we moved each bin to separate tables in MySQL. Or rather, we moved caching to multiple tables and have since started calling those separate tables "bins" as we try to pretend that the cache API started as an abstraction rather than just an SQL implementation. :-) At the time it was a good move, as it allowed Drupal sites to get bigger without suffering from cache stampedes as badly, and it meant that the SQL query cache for one cache bin/table was not affected when we blast away a different table.

However, that was then, this is now. These days, sites of any respectable size, those large enough to care about optimizing the SQL query cache, are not using SQL caching anyway. They're running memcache, or Redis, or something better performing than SQL. And those systems don't have, generally, as much manual setup around cache tables that the SQL implementation does.

So why do we still carry around the complexity of such SQL optimization?

Rather, if we changed the SQL implementation back to being single-table, with a "bin" column, then we completely eliminate the need to manage/coordinate creating/deleting tables at different times. The "bin" is implicit, and we don't need to create anything new. That simplifies the implementation and allows us to remove a fair bit of race conditions. (Can I use this table before it's been created?) The downside is that the SQL cache implementation becomes a bit less performant, but... if you care about cache performance that much you're not using the SQL cache implementation anyway. Times have changed and what made sense to optimize 6 years ago no longer makes sense to optimize. We should treat the SQL cache implementation as a fallback, much like the in-PHP HTTP caching we're inheriting from Symfony (http://drupal.org/node/1597696), on the assumption that sites that really care will have a real Varnish in front of them rather than a simulated Varnish.

Would that sort of approach not greatly simplify (or even eliminate?) this issue entirely?

pounard’s picture

I think that the bin registration is still something we want, because we want a unified single list of all registered bins so we can flush on cache clear, provide listings in admin UI, etc...

Regarding high-volume-performance-wise sites using external cache store, I don't agree, I'm myself working on sites that are quite big enough but still using database cache due to technical restrictions on the environment it's running in, so I'd like database cache to remain performant as it is, separate tables is the right way to keep this.

What we could do is get rid of createBin() and dropBin() from the cache interface, and keep the SQL tables creation in the system module, system module then acting as the provider of the database cache backend, preparing the environment as it should in an encapsuled fashion, reacting to hooks.

Regarding the patch, we shouldn't do the try/catch workarround, because try/catch as some kind of cost on runtime, and it is totally unnecessary since the first thing you do after each upgrade is hitting update.php, and not the site itself, site should never be hit directly after files upgrade without hitting the update.php first.

I'm not fond of renaming all 'cache_*' to 'cache_bin_*' because 'cache_' is explicit enough and a well known convention in Drupal.

catch’s picture

We already have the bin registration in hook_flush_caches(), one of the goals of this issue was to decouple it from that hook (which is massively abused, #891600: Page and block caches are cleared the same way on cron as on content change is related too) and we could split that part out to just do that.

The other part of it was trying to get rid of large hunks of system_schema().

@pounard, I'm not sure what you mean by "keeping SQL tables creation in the system module", if a core or contrib module provides their own cache bin, they're responsible for creating and removing the cache bin themselves - i.e. maintaining their own schema entry for the database backend and hoping that core doesn't change it (or that core finds a way to update it, we're attempting that in Drupal 8 by seeing which tables start with cache_ then checking which columns are in them. It doesn't make sense to me that one, pluggable, implementation of database caching needs to be supported by every contributed module that provides their own cache bin, and I'd really like to see that coupling gone.

@Crell's idea of having a single table with a bin column is interesting. It doesn't let us gut system_schema() as much as I'd like, but it would completely nuke the proliferation of cache schema entries around core and contrib. I'm not sure that performance is that much of an argument against this, since it should be easy (if we have hook_cache_bin_info() as long as that's treated as a .install hook) for a contrib module to provide an SQL storage backend that does what core does currently. We're still left with the issue that pluggable storage implementations currently have no way to manage their own schema but that applies to far more than caching, and most systems don't have the dynamic table creation issues that caching does (i.e. lock and queue).

pounard’s picture

I mean, having one table per cache bin doesn't sound that bad too me: the only downside is that modules must take care of creating and removing their own table. This sounds inconsistent to me, because one module just declare a bin for the cache system to provide a new namespace. Since backend is an implementation detail and the module just need to use an interface, the module should not care about the table itself, it's only a SQL implementation implementation detail. Having the system module managing the tables instead of custom module doing it just ensure this encapsulation. If we consider that the system module provides the SQL backend, and because the backend internals is an implementation detail that should be encapsulated, then it's normal to say that the system module *must* ensures itself that those tables exists when a new bin is requested. Core itself doesn't really care about SQL, core brings the interfaces for modules to use. The fact that the SQL implementation is inside the core namespace and not inside the component namespace confirms that the SQL implementation is not part of the publicly exposed API but is an specific encapsuled implementation: as such, any implementation detail must be hidden from modules, tables included. I don't care then who needs to create/suppress them, but I would naturally say that it's the system module responsability, because it's also the only module that creates and manage system tables.

Crell’s picture

pounard: With a bin column (which we didn't have in 4.6), we can still clear each bin independently. All we loose then is the query cache, which is not that big of a deal, and at some point we need to cut our losses and focus on an enabling API, not micro-optimizing a degenerate case. And it means the create/drop table management question goes away completely; the whole "whose responsibility is it?" question ceases to exist.

sun’s picture

I like @Crell's proposal.

Ideally, we'd leverage k/v for it and simply turn 'bin' into 'collection'.

Second, if we manage to still bear in mind the use-case of a DatabaseStorage, then we perform the necessary plumbing on the application layer (not cache layer) in a way that notifies the system about bins that need to be added or removed.

In turn, people on environments that don't support alternate stores/backends, but yet have a decent amount of data to cache, can swap out the cache implementation with an alternative one that uses separate tables per cache bin (provided by contrib, not core).

That would keep core simple, but still allows for alternate implementations.

pounard’s picture

I'm kinda against relying on contrib. And I don't see the k/v fiting for this purpose since it doesn't carry half the business of cache backends.

pounard’s picture

@#60

micro-optimizing a degenerate case

And all the people that uses Drupal on shared hosting too.

And it means the create/drop table management question goes away completely; the whole "whose responsibility is it?" question ceases to exist.

This is certainly not true, because when you uninstall a module that declares a bin, you still have to clean up after it.

I understand this idea, but let's face it, the cache system is now working very well this way, and if this is only the matter of creating and dropping tables, I'm not concerned about keeping that.

function system_modules_enable($modules) {
  foreach ($modules as $module) {
    if (module_implements($module, 'cache_bin_info')) {
      foreach (module_invoke($module) as $bin => $info) {
        db_create_table('cache_' . $bin, some_function_that_gives_me_the_schem());
      }
    }
  }
}

function system_modules_disable() {
  // Same as upper with "drop" instead.
}

Doesn't sound too hard for me.

Crell’s picture

That assumes you're using the SQL backend. If not, then you have system.module creating and dropping unnecessary tables. It's also harder to unit test.

The cleanup process with a single table is easy: Meh, don't worry. The next time you run a flush-all-caches everything gets wiped anyway, regardless of bin, so that will clear anything left over from the previous module. Which will likely happen on module-disable anyway, so it's not even a delay.

sun’s picture

I think I agree with the code snippet of #63, and I'd address #64 through this:

-        db_create_table('cache_' . $bin, some_function_that_gives_me_the_schem());
+        cache($bin)->create();

In words: The built-in DatabaseBackend would probably be a no-op, an alternate SeparateBinDatabaseBackend would create a {$bin} table.

catch’s picture

cache($bin)->create();

That's very close to what my original patch here did except we assume that people switching between cache backends will run createBin() themselves (i.e. no try/catch trickery), and they'll need to delete stale bins themselves too.

I'd be fine with that assumption and shifting the bin creation to a system_modules_enabled(). This also means that if we keep the multiple tables for the database backend it doesn't matter that much - since sites using alternate backends won't run this with the DatabaseBackend anyway.

claudiu.cristea’s picture

Status: Needs work » Needs review
Issue tags: +Needs change record
FileSize
18.63 KB

More than 40% of my Drupal sites are still using standard table caching. And if I need to quickly build a new website having some specific functionality but not being under stress, I would not implement Memcache because I'm happy with table cache while there is no problem with the performance.

Going to "one-table-cache". Relying on SQL cache doesn't mean that performance is not an issue at all. If I remember well the reason for multiple cache tables was that queries against small tables are faster. I'm for keeping multiple cache tables. What I would review is if we want to keep update and form temporary storage in cache (see comment bellow).

Meanwhile I'm coming with a new patch. What is different:

  • Removed methods createBin(), dropBin() from interface while we are using the only for standard SQL cache tables.
  • Even I'm for renaming tables to cache_bin_* pattern (because cache bin tables need to be named different from others like cache_tags), I had to quit doing this. Renaming those tables it's a nightmare when trying to test. This is because of how it's build the temporary storage of update.module. That particular one is implementing its own API, directly to the DB backend cache, accessing tables directly with SQL. In my opinion update cache is not really a cache! It's something that we might want to name "temporary storage" and (like form cache) it should be moved out from cache. Cache is something that can be wiped out in any moment without affecting functionality but only performance. Update tests are crashing there because the code is using cache_bin_update but the tested SQL snapshot is still having cache_update. It's too much to workaround for a such a small gain. So, back to cache & cache_* patterns.
pounard’s picture

@#65 I would not add those methods on the backend interface, because I often switch from one implementation to another during site life, and if I do this, the hooks won't be run a second time, and I start with a memcache backend, then change per configuration to the SQL backend, the table will never be created. I'd prefer to see those empty tables even if unused, because they can always serve in case of system downgrade to SQL because something else went wrong.

cache($bin)->createBin($info['description']);

Is wrong, because the backend might be another one per configuration than SQL, and once the module enabled we cannot re-trigger the table creation upon configuration change. I think the table schema is not an instance method, but a class method (static) of the database backend. It is a utility function which is used only @enable/install time and has no sense to live as an instance method.

Aside of that, I like this patch.

catch’s picture

I would not add those methods on the backend interface, because I often switch from one implementation to another during site life, and if I do this, the hooks won't be run a second time, and I start with a memcache backend, then change per configuration to the SQL backend, the table will never be created.

Yes this will happen, but

foreach (cache_get_backends() as $cache) {
  $cache->createBin();
}

Is not hard to run I think. Or we put the try/catch back in.

pounard’s picture

I don't see why should we complexify the runtime code by adding try/catch blocks, install procedures and such whereas having 10 empty tables in the tablespace won't hurt a bit and actually works very well.

EDIT: The only cons I can see to this is that maybe some other backends will need init too, but if they follow the system module principle, it will work gracefully.

catch’s picture

Let's say you have a multi-site installation with 1,000 sites on one MySQL database.

Certain schema operations involve loading every table, even just SHOW TABLES does this I think. An extra dozen tables means an extra 12,000 to load and unload into and out of the table cache for that SHOW TABLES call, despite them never being used if you're using memcache for caching. That can cause the server to fall over. This is not a made up example.

pounard’s picture

Fair enough, but when a site has 200+ tables (real use case too) where cache tables are only 5% of them, I don't think that cache bins are the right vector to attack to try to fix the problem.

I do understand your point here, and I agree, but I want the solution to be graceful when changing the backends on a live site, and I want it performant enough for sites that don't use anything else than the database.

On a commerce site I'm actually working on, with separate tables everything is fine, but if I put everything in the same table, I will expect to quickly reach more than 10 million rows (fields, blocks, pages, menus, leaving some other outside): in that case, if my bootstrap cache for example ends up in the same table as the others, loosing some bits of query cache and force all lookups on a 10 million rows table is not something the site will support, at all.

On the other side, the same site uses some apc backend cache (bootstrap and cache only), if apc fails (sometime we have to deactivate it for debugging, I bet it could happen later on the live site if we need to put in downgrade mode) I need to be able to switch the backends without writing any code, nor running anything that might alter the site configuration or data outside of the settings.php file.

Do you think there is a compromise between all that was proposed in this issue? The try/catch runtime check would work in the end, but is this performance wise? Wouldn't it be too aggressive and catch unrelated exceptions and make the site behave wrongly? If you feel it's OK, I'm would be OK to switch to this method and test it.

Off-topic note: In the end, I don't want to bet on the "contrib will fix that" speach, it's a very wrong speach, saying this means "I don't care about what we do, we're not responsible for the code we provide" and that's something I'm generally frustrated about in Drupal, people don't care about other real use cases than their own.

catch’s picture

Fair enough, but when a site has 200+ tables (real use case too) where cache tables are only 5% of them, I don't think that cache bins are the right vector to attack to try to fix the problem.

Most of those are from field API. We have pluggable storage for field API, and when you use that, you don't get empty SQL tables created. So it's exactly the right vector to attack I think - if you want to remove something, remove something that is never going to be used. Lock and Queue should be the same.

There's a specific SchemaNotExists exception or similar we could catch, that should be specific enough.

Also I'm 100% sure that simply having a try/catch block in code is not expensive if you never actually get an exception thrown during normal operation. If you think the try {} itself is adding overhead profiling/benchmarks is welcome of course. (Also we do now throw exceptions for 404s/403s as a matter of course which is not going to be that uncommon due to HttpKernel).

I'm generally inclined to agree that one table for the database cache doesn't really buy us anything (and a big focus of this issue was trying to get us away from system module being responsible for things in /lib) - but that means solving the table creation issue and devolving responsibility to the backends.

pounard’s picture

Ok, indeed I know have a better understanding of your problem. Let's try the try/catch solution then, after all that was the kind of things I proposed a long time ago regarding broken backend detection, let's do that.

sun’s picture

The try/catch approach is wrong IMHO -- that was the main offender in the original patch already.

The for loop in #69 looks acceptable to me. (Heck, we could even provide a UI for switching backends. In the end, the implementation to use needs to be saved in the config/container anyway.)

Also, to clarify: I'm especially eager on the single-table implementation, since that would eliminate a lot of needless DDL for tests.

pounard’s picture

The single table implementation will drop the query cache and cause varchar indexes problems, for both read and write operations, in tables that will grow potentially exponentially depending on all the user, page, field and entity count. Single table is a no go because even on small sites it will be a huge performance regression, especially using MySQL.

I'm not fond of the try/catch approach, but right now it's the easiest solution that take care of both performance and tablespace bloat problems exposed by me and catch.

I'd prefer the pre create table by the system module, but it will cause catch huge problems because of his tablespace bloat problems, using a single table will kill the site I'm working on (and potentially *a lot* of others). Do you have any path we didn't explore?

EDIT: Considering providing a UI, I'm not fond of this idea neither, even if it's in the container, the container is not a miracle box: cache backends are early instanciated beasts and will need some love throught a static configuration such as the actual $settings.php, just like the database.

Crell’s picture

sun: You need to explain why try-catch inside the DB cache implementation is wrong, because it solves the coupling problem, performance concern, and "use the cache before the table is there, boom!" problem.

I can live with try-catch or with single-table, because in both cases it pushes the cache schema management exclusively into the DB cache backend driver, which is where it belongs.

fubhy’s picture

This whole issue is currently tied to the DatabaseBackend. However we actually need a way to register cache bins for all backend types anyways. So IMO this issue has to be solved at a more generic level. We need to know which bins we currently have in our system (even non-DatabaseBackend bins) to run ->invalidateTags(), ->garbageCollection() (dunno when tho :P), etc. properly.

pounard’s picture

I agree, the bin info list is necessary for us, the database table creation problem is actually more like a parallel issue of this one.

fubhy’s picture

catch’s picture

Title: Registering cache bins via hook_cache_bin_info() » Make cache backends responsible for their own storage
FileSize
19.07 KB

Coming back to this one. We already have hook_cache_bin_info(), it's just called hook_cache_flush() and abused for horrible things. So this patch goes back to doing two things:

1. Move the cache bin declaration out of hook_cache_flush()

2. Make the cache backends responsible for maintaining their bins.

Once this passes, it could potentially shave quite a bit of time off the test bot since it'll stop creating all these tables when installing core.

Status: Needs review » Needs work

The last submitted patch, cache_bins.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
19.07 KB

Without that syntax error.

Status: Needs review » Needs work

The last submitted patch, cache_bins.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
21.5 KB

update.module still defines its own cache bin.

Status: Needs review » Needs work

The last submitted patch, cache_bins.patch, failed testing.

sun’s picture

Status: Needs work » Needs review
+++ b/core/modules/system/system.module
@@ -3496,7 +3496,9 @@ function system_cron() {
+  // hook_cache_bin_info() lives in .install files, so load them.
+  module_load_all_includes('install');
+  $cache_bins = module_invoke_all('cache_bin_info');

Is there any reason to put the hook into .install files though?

Unless I'm mistaken, we're loading .module files early enough already?

+++ b/core/modules/views/views.install
@@ -15,16 +15,10 @@ function views_install() {
-function views_schema() {
-  $schema['cache_views_info'] = drupal_get_schema_unprocessed('system', 'cache');
...
+function views_cache_bin_info() {
+  return array('views_info', 'views_results');

Eliminating cache bin schema definitions from hook_schema() has an interesting consequence though: Doing so technically requires us to make update.php drop/destroy all [previously known] cache bins (entirely), and re-create them [the current] with their current schema definitions.

Crell’s picture

+++ b/core/includes/module.inc
@@ -710,6 +710,13 @@ function module_uninstall($module_list = array(), $uninstall_dependents = TRUE)
+    // Remove any cache bins defined by the module.
+    if ($bins = module_invoke($module, 'cache_bin_info')) {
+      foreach ($bins as $bin) {
+        cache($bin)->removeBin();
+      }
+    }

As discussed in meatspace, I'd rather eliminate the hook and let modules be responsible for removing their own bins on uninstall. There's only a handful of modules that would have their own cache bins, and most of them either are already in core or should be using Http caching anyway. It's not a hard thing to ask of a module that's defining a cache bin.

Although, I don't know how we handle cache-clear-all without it. I don't know if #1764474: Make Cache interface and backends use the DIC would help there or not.

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -298,4 +299,108 @@ public function isEmpty() {
+    catch (Exception $e) {
+    }

Silently catching an exception is almost never appropriate. If there's a valid reason for it, it should be documented inline because otherwise this is a bug.

sun: How does this require dropping and readding tables?

catch’s picture

Assigned: claudiu.cristea » Unassigned
Status: Needs review » Needs work

We need the list of cache bins so system_cron() can run garbage collection, so we can't remove either hook_cache_flush() or hook_cache_bin_info() until those are in the DIC (assuming you can get to all of them together via the DIC in a foreach()able way) so that system_cron() can continue to do that. Having said that I'm not opposed to manual removal in hook_uninstall() - if we do that then the hook doesn't need to be an .install hook and it won't matter whether we have the list or not for this.

The silent exception catching is inherited from the existing code, there's an existing issue for that in #1792536: Remove the install backend and stop catching exceptions in the default database cache backend. Mostly it's there in case of a race condition where two processes try to create the same cache table at the same time - for which it'd be useful to have a comment though yeah.

I think we would need to drop all cache tables if updating the cache schema.

Updating the cache schema in Drupal 8 required a db_find_tables() and hoping that only valid cache tables started with cache_ and had cid + data columns - because we have zero API for updating those. If we want to let other cache backends update their schemas when we update the core ones, then dropping the tables (to be recreated later) is the best option.

This is only the same problem we have for pluggable field storage, for which there's no way for a field type schema change to trigger an update in those field storage modules anyway.

Back to CNW since this is obviously breaking things badly at the moment.

Crell’s picture

For foreaching(): If we tag cache backends then we can iterate them in the compiler, but not afterward since the tags are lost in the compiled container. However, we should be able to also store a list of service IDs in the DIC as well, and then grab that and iterate the service IDs that way.

For the empty catch: Gotcha. Yeah, we should at least document it for now; if we can refactor it later, bonus.

Table deletion: Hrm. I don't know that a cache schema would ever change, but with this approach dropping tables on update.php wouldn't be all that catastrophic. There's a possible race condition there, but only if you're running update.php in non-maintenance mode. Which... update.php forces you into maintenance mode temporarily, doesn't it? If so, cycling the bins would be a bit expensive but manageable and wouldn't break anything. So, sure, let's do. We'd probably want to createBin() while still in maintenance mode to avoid a race condition. (So foreach($bins as $bin) { $bin->dropBin(); $bin->createBin(); }

sun’s picture

I'm not opposed to manual removal in hook_uninstall()

I think I am totally opposed to that, because I cannot make sense of it :) Let's presume I was the maintainer of admin_menu module which has its own cache table. So the scenario I'm facing is:

1) I need a cache bin.
2) I am not the site operator.
3) Only the site operator decides which cache backend is used for which cache bin.
4) Alrighty, I register my bin.
5) Since I'm not in charge and shouldn't be, registering it is reasonably all I can do. I also cannot maintain it, because I don't know what backend you're actually using. Nor can I uninstall it, because, again, I don't know what backend you're actually using. And maintaining a database schema of a table that isn't used to begin with is pretty much pointless. ;)

Perhaps I'm missing something. :)

Crell’s picture

What you're missing is that as the module author you know that the bin is called foo. Thus there is a service ID of cache.foo. And you know that you can call $container->get('cache.foo')->deleteBin();

And you do that in hook_uninstall(). That method will either delete a DB table if it's a DB backend, or do nothing if it's memcache, or do whatever it is you do on Redis, etc. You don't care. You are just saying "this cache bin is no longer necessary. Do what must be done."

Berdir’s picture

The current implementation of the cache in DIC issue does the same as the key value, it does not register every bin, it registers the factory, which calls the backend specific factory which returns the actual backend implementation. So there is nothing to iterate on.

Also, note that the backend might have been switched between install and uninstall. install might have created the database table, and when you call dropBin() on it, it might use the memcache bin. So we would have to inform every possible backend that we know of (we don't do that yet) that the bin no longer exists, so that it can remove it if it was at any point in there.

Also, I hope that the need for custom bins will go away for most cases after #1194136: Re-organise core cache bins because then core will provide a handful of bins for most use cases instead of simply adding one to every module that needs the cache.

catch’s picture

OK so for now we need hook_cache_bin_info() - remember it's only splitting the double duty of hook_cache_flush() into a separate hook, not doing anything we didn't already do before. If we can eventually get that information 'for free' from a separate system of cache bin registration that's fine of course.

@Crell we changed the cache schema this release to add the tags column. We changed it in Drupal 5 or 6 to add the serialized column, there may have been an extra index added in D6 as well. So it's rare but it does get changed and probably never during a stable release, however we're not really making the problems with that any worse than they currently are.

Will have to see how things are going tomorrow but if I have time I'll try to get it passing the bot (or at least locally) then.

catch’s picture

Status: Needs work » Needs review
FileSize
23.07 KB

Added some comments, and I think the test bot should at least get to the point of running tests now.

Status: Needs review » Needs work

The last submitted patch, cache_bins.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
22.72 KB

Re-rolled, checkout was out of date, can't think who might have been committing patches...

Status: Needs review » Needs work

The last submitted patch, cache_bins.patch, failed testing.

catch’s picture

Status: Needs work » Needs review
FileSize
23.45 KB

Update module is doing direct queries of the cache bin, bypassing the try/catch logic, so it needs to explicitly define the cache bin in schema - this will be fixed with the update module key/value conversion though.

Status: Needs review » Needs work

The last submitted patch, cache_bins.patch, failed testing.

chx’s picture

Assigned: Unassigned » chx

I like this issue and I will make it happen.

Berdir’s picture

See also #1194136: Re-organise core cache bins, which will probably overlap quite a bit with this.

chx’s picture

Status: Needs work » Needs review
FileSize
19.84 KB

Definitely not my prettiest code ever but it'll do. We might want to consider moving module_enable/disable/uninstall into a separate file (module.install.inc?) which is not loaded always.

Crell’s picture

+++ b/core/includes/module.inc
@@ -578,6 +579,38 @@ function module_uninstall($module_list = array(), $uninstall_dependents = TRUE)
+                catch (Exception $e) {
+                  // This is a best effort removal. At least we tried.

We should at least log something in this case so an admin knows to go clean up manually.

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -188,11 +214,16 @@ public function deleteTags(array $tags) {
     $tag_cache = &drupal_static('Drupal\Core\Cache\CacheBackendInterface::tagCache');

I know these were not introduced by this patch, but allow me to cry for a moment that we're still calling drupal_static() inside a class. I mean, really. These belong in an object property.

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -243,19 +289,31 @@ public function invalidateTags(array $tags) {
+    catch (\Exception $e) {
+      // If the table does not exist, it surely does not have garbage in it.
+      // If the table exists, the next garbage collection will clean up.
+      // There is nothing to do.
+    }

While logical, if GC fails, wouldn't the next GC fail, too? If GC always fails, shouldn't we know about it?

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -329,9 +387,147 @@ public function isEmpty() {
+  protected function catchException(\Exception $e, $table_name = NULL) {
+    if ($this->connection->schema()->tableExists($table_name ?: $this->bin)) {
+      throw $e;
+    }
+  }

What happens if the tableExists() call dies? That gets us an Exception-ception situation, which is not a fun place to be for debugging.

If I read this right, set() will lazy-create a table if it's missing (yay!) but setMultiple will not (boo!). Both should be doing so, presumably.

Status: Needs review » Needs work

The last submitted patch, 1167144_103.patch, failed testing.

chx’s picture

Let me quickly just address setMultiple -- that's a key-value store method and cache has no such thing.

catch’s picture

Yep setMultiple() is here and not in yet #1167248: Add CacheBackendInterface::setMultiple().

Crell’s picture

Ah! I was reading too fast misread getMultiple(), which just swallows the exception.

In that case, we should be doing something other than swallowing the exception. At least logging, if not creating the table there. (I can see the argument for not creating it on get, only set, but silent exceptions are not a good idea.)

chx’s picture

> silent exceptions are not a good idea.

Why. Is this another of those pretty theories that we use to murder the performance of our subsystems? If so, I am not interested. Here the exception is irrelevant -- the question is whether we have anything in the cache, and we don't. Whether because the cache is inaccessible or because it's empty is irrelevant. The code makes sure that when a relevant exception is raised it gets propagated. I am writing practical code. You can keep your theories if they are part of what made Drupal 8 3-5 times slower than Drupal 7.

chx’s picture

Status: Needs work » Needs review
FileSize
1.2 KB
20.27 KB

As for logging, I have no idea where to log. Someone said that we shouldn't add function calls to our classes so I can't add watchdog() can I?

Hopefully, this fixes the test failures. I simply forgot to update the schema definitions since #99.

chx’s picture

FileSize
951 bytes
20.32 KB

I am surprised/amused that the discussion centers around eating exceptions instead of the "arrow shaped" (almost every line opens a new block) code in module_uninstall. Can we do something better? As for logging in there, yes we could watchdog there, that's true. However, what do we want to log? Here is one version.

Status: Needs review » Needs work
Issue tags: -Performance, -Framework Initiative, -API addition, -API clean-up, -Needs change record

The last submitted patch, 1167144_111.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
Issue tags: +Performance, +Framework Initiative, +API addition, +API clean-up, +Needs change record

#111: 1167144_111.patch queued for re-testing.

chx’s picture

Assigned: chx » Crell

Moving back to Crell for another (hopefully, final) review.

Crell’s picture

Two minor nitpicks, after which I think this is RTBC.

+++ b/core/lib/Drupal/Core/Cache/BackendChain.php
@@ -222,4 +222,13 @@ public function isEmpty() {
+  /**
+   * Implements Drupal\Core\Cache\CacheBackendInterface::removeBin().
+   */
+  public function removeBin() {

These "Implements" lines should now be just "{@inheritdoc}". I think we've switched over fully at this point, at least for new patches...

+++ b/core/lib/Drupal/Core/Cache/DatabaseBackend.php
@@ -329,9 +387,159 @@ public function isEmpty() {
+    // If another process has already created the cache table, attempting to
+    // recreate it will throw an exception. In this case just catch the
+    // exception and do nothing.
+    catch (\Exception $e) {
+    }

Fair, but I think we should swallow only SchemaObjectExistsException here, which is what's thrown for creating a table that already exists (aka, the race condition in question here). That makes sense to just drop. Any other exception (eg, database going away, syntax error, etc.) should either get handled or allowed to propagate.

Crell’s picture

Assigned: Crell » chx
chx’s picture

FileSize
2.38 KB
20.65 KB

Pointless to {@inheritdoc} when everything else is Implements... -- we need to eventually script that anyways.

Changed that exception.

Alas, no interdiff because the old patch no longer applies.

chx’s picture

Status: Needs review » Closed (won't fix)
Crell’s picture

Assigned: chx » Unassigned
Status: Closed (won't fix) » Needs review

Sigh.

chx’s picture

FileSize
475.67 KB

Do or do not, there is no try.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Anyways #117 was said to be RTBC before we got hung up on doxygen. If that's a problem, commit #120 instead.

chx’s picture

FileSize
20.47 KB

Here's #117 with {@inheritdoc}

Crell’s picture

#122 specifically is RTBC.

Anonymous’s picture

this looks great! one thing that caught my eye:

+    // This is the first try.
+    $try = 1;
+    // Maximum one tries.
+    $max = 1;
+    do {
+      try {
+        $this->doSet($cid, $data, $expire, $tags);
+      }
+      catch (\Exception $e) {
+        if ($this->ensureBinExists()) {
+          // If the bin was just created, try one more time but only one more.
+          // By using try - max we avoid an infinite loop.
+          $max = 2;
+        }
+      }
+    } while ($try++ < $max);

that loop is weird. maybe it could just be:

    try {
      $this->doSet($cid, $data, $expire, $tags);
    }
    catch (\Exception $e) {
      if ($this->ensureBinExists()) {
        try {
          // If the bin was just created, try one more time.
          $this->doSet($cid, $data, $expire, $tags);
        }
        catch (\Exception $e) {}
      }
    }
catch’s picture

beeejeebus' suggestion for the loop looks good.

Patch itself looks great, lovely to see this RTBC!

However there's several cache bins (at least system_schema()) not in the diff?

pounard’s picture

I really do hate the try/catch statement in every cache backend method, but that's probably just a matter of taste. Anyway it just makes it terrible to read, but at least it should have no performance penalty. One huge problem thought, it doesn't check the error type, so what happens if the database is down? The catchException() will surely rethrow a new exception once again, but its stack trace will tainted and the original exception will be lost.

catch’s picture

The try/catch pattern is only going to be necessary for the database backend and one or two others - i.e. memcache won't need that at least. I've seen both a PHP framework (some time ago though and can't remember which atm) and a certain high profile open-sourced webperf site (very recently) that check if the table exists before running the queries doing similar things, and that's horrific compared to the try/catch.

The cache backend already ignores exceptions in several places but it'd be good to have the most useful error messages possible if something does actually go wrong there. OK with this being a follow-up though given the current state (but not really system_schema() unless there's a very good reason not to touch that here).

chx’s picture

FileSize
28.13 KB

Re #124, nope, you'd get an exception thrown within the catch and that'd be horrible to debug. I added a comment to that end.

I wonder how it's possible I didn't catch all those bins?

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 1167144_128.patch, failed testing.

chx’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
6.76 KB
27.29 KB

Opsie, that included another patch :) Disregard.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 1167144_129.patch, failed testing.

chx’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
935 bytes
27.98 KB

D'oh.

catch’s picture

Status: Reviewed & tested by the community » Needs work

Couple more things I failed to note first round.

+      $definitions = $parser->parse(file_get_contents($service_yaml_file));
+      if (isset($definitions['services'])) {
+        foreach ($definitions['services'] as $id => $definition) {
+          if (isset($definition['tags'])) {
+            foreach ($definition['tags'] as $tag) {
+              // This works for the default cache registration and even in some
+              // cases when a non-default "super" factory is used. That should
+              // be extremely rare.

This is ugly, and it'd want a helper if we do it anywhere else, but it'll also be unecessary when #1199946: Disabled modules are broken beyond repair so the "disable" functionality needs to be removed is fixed assuming we enforce enabled -> uninstalled with no intermediary stage. So fine with it as is.

Except:

+                catch (Exception $e) {
+                  watchdog(WATCHDOG_ERROR, 'Failed to remove cache bin defined by the service %id.', array('%id' => $id));
+                }
+              }
+            }
+          }
+        }
+      }
+    }

WATCHDOG_ERROR isn't a type ;)

Also why not watchdog_exception() instead/as well?

chx’s picture

Status: Needs work » Needs review
FileSize
531 bytes
27.98 KB

Here.

Status: Needs review » Needs work

The last submitted patch, 1167144_134.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
895 bytes
28.84 KB

Status: Needs review » Needs work

The last submitted patch, 1167144_136.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
983 bytes
29.76 KB

Ouch. FilterAdminTest is going for the jugular: it runs a query against the cache_filter table. Good that we have APIs. Also, interdiff is behaving so badly, I might need to go for a git workflow :(

chx’s picture

FileSize
1.68 KB
29.63 KB

New set() code suggested by beejeebus, commented by me.

Anonymous’s picture

ok, if that comes back green, RTBC.

Anonymous’s picture

Status: Needs review » Reviewed & tested by the community
webchick’s picture

Assigned: Unassigned » catch

Looks like catch has this one.

egger’s picture

#136: 1167144_136.patch queued for re-testing.

egger’s picture

Looks like it is ready to land.

catch’s picture

Title: Make cache backends responsible for their own storage » Change notice: Make cache backends responsible for their own storage
Assigned: catch » Unassigned
Priority: Normal » Critical
Status: Reviewed & tested by the community » Active

Committed/pushed to 8.x, thanks!

This needs a change notice - no more hook_schema() if you define your own cache table, and cache backends that care about bins need to properly implement the new method to avoid blowing up.

yched’s picture

chx’s picture

Title: Change notice: Make cache backends responsible for their own storage » Make cache backends responsible for their own storage
Status: Active » Fixed
Issue tags: -Performance, -Framework Initiative, -Needs change record +Framework InitiativePerformance

Automatically closed -- issue fixed for 2 weeks with no activity.