Core is (mostly) serializing cache data in modules and inc files. This should be updated to be in cache_get and cache_set so that the pluggable cache system can work with memcache in the future. Memcached provides built in serialization so when overriding cache_get and cache_set with memcache.inc you get a double serialization / unserialization case.

This patch patches core to fix core and cache.inc.

steve

Comments

RobRoy’s picture

Version: 5.x-dev » 6.x-dev
Status: Needs review » Needs work

What about if you don't want to store serialized data using the cache, just plain text? Maybe it should be an optional param for cache_get/set. Also, you have some .project cruft in patch and features/API changes go into HEAD.

slantview’s picture

Status: Needs work » Needs review
StatusFileSize
new4.45 KB

Re-rolled patch without the .project junk and for HEAD.

Memcached does serialization by default. The serialization in php is unnecessary overhead that will not take full advantage of the power of memcached. The only way to remove it is to move it into cache_get and cache_set by default and allow memcached to do it's own serialization internally when the memcached module is installed. The conversation about removing this serialization dates back to November 2005 (please see http://lists.drupal.org/pipermail/development/2005-November/011774.html for a detailed discussion). The memcached module is in development now and will be updated very soon. This is a major blocking issue in order to get it to have any real advantage.

The plan thus far is to make memcached work with overriding the cache_get and cache_set. This has been agreed upon by everyone including Dries as a good thing. This was also explained at the Performance and Scalability seminar. (please see a recent conversation from March of 2007 at http://lists.drupal.org/pipermail/development/2007-March/thread.html#22864 for a detailed discussion)

Further, the amount of overhead will be very minimal as there is only one place in core that is not already serializing, and that is in the filter module. It is not really adding anything, it is just moving it to be handled internally in cache_get and cache_set and removing unnecessary duplicated code.

steve

robertdouglass’s picture

I don't see any harm in adding a SERIALIZE flag to control whether the cached data needs serialization. This would allow strings or primary types to be stored without the serialization overhead. It would also not interfere with the ability to memcache.

This patch is important housecleaning that needs to happen. Having the serialization happen outside of cache_get and cache_set significantly reduces our claim to having a pluggable caching system.

+1 for the patch with or without a SERIALIZE flag.

RobRoy’s picture

@slantview, yeah I guess a flag isn't a huge deal. Most of the time we're going to be serializing, so taking the extra hit of a serialize isn't that bad. Just trying to be forward looking and see if there is any case where we won't want to serialize, but I'm fine without it.

dries’s picture

Rather than adding a flag, can't we make the cache a tad smarter?

<?php
function cache_set(... $data ...) {
if (is_object($data) || is_array($data)) {
$data = serialize($data);
}
}

We'd need an extra field in the cache tables so that cache_get() knows how to retrieve that data ...

slantview’s picture

StatusFileSize
new5.26 KB

Ok, I updated the patch to have some knowledge of what we are doing. If you are submitting an array or object, it will serialize the data and store it with a flag that says to unserialize when you get the object.

The only thing now is that I don't know where the updates go for the cache and cache_* tables go.

There will obviously need to be some documentation updates so contrib developers understand how to use the cache_get and cache_set properly to avoid any double serialization.

steve

slantview’s picture

StatusFileSize
new5.27 KB

re-rolled with minor coding style change:

changed } else }
to } else

slantview’s picture

StatusFileSize
new5.19 KB

re-rolled again with minor change to placement of db_decode_blob. (thanks killes!)

slantview’s picture

StatusFileSize
new8.19 KB

re-rolled one more time adding required fields to the system.install file and db update #6009

dries’s picture

Rather than adding a second data field, could we add a small status field? I'd propose to call it "serialized" and it could be an "unsigned int(1)". I think that will provide a better memory/table layout.

Otherwise, this looks very nice.

And yes, we'll need to document this in the module upgrade instructions once this gets committed (see module developers handbook).

Keep up the great work slantview.

dries’s picture

Also, we should add one or two sentences to the PHPdoc code of cache_set/cache_get. Extend the explanation of the $data parameter and state that $data can be a mixed type. Explain what will happen when you pass in an object or array. Also mention that this won't happen when you pass in a string.

Good documentation is key, so let's work on that some more as well! :)

slantview’s picture

StatusFileSize
new8.97 KB

Here is the latest re-roll. I actually was using the db column "serialize_data" as a flag, not storage for serialized data. However I realize that term could be confusing. I changed the column name to 'serialized' and changed the type to int(1). I also changed the default to '0' instead of '1', I think that makes more sense because we are only serializing the data of objects and arrays, and everything else is being stored without being serialized.

I also updated the cache_get and cache_set PHPdoc to mention the new functionality and changed the db column names in those SQL queries.

dries’s picture

These else clauses are redundant:

+      else {
+        $cache->data = $cache->data;
+      }

Otherwise this patch looks good. Feel free to mark it RTBC after it has been tested/checked by someone else. Thanks slantview!

slantview’s picture

StatusFileSize
new8.9 KB

Ok, final re-roll. This is ready for prime time. I can' t believe I missed that. Anyway, let's see some people test this and get this committed! Thanks Dries!

steve

eaton’s picture

This is really a big deal, and makes the cache system just that much easier and more straightforward for people to interact with. The fact that it makes the cache system easier to replace is an even bigger bonus. :-)

Major +1.

m3avrck’s picture

Steve, great patch -- looked it over and it looks good to me.

The upgrade docs will have to be updated since this patch affects modules like Views & CCK and more.

+1 for consistency and consolidation and memcache goodness :-D

dries’s picture

Status: Needs review » Needs work

I've committed this patch to CVS HEAD. Good job.

I'm marking this 'code needs work' because the module upgrade instructions in the handbook need to be updated. Please update those, and mark as 'fixed' afterwards. Thanks!

slantview’s picture

Status: Needs work » Fixed

Alright, thanks to Heine for updating the handbook!!

slantview’s picture

Status: Fixed » Closed (fixed)