Closed (fixed)
Project:
Drupal core
Version:
6.x-dev
Component:
base system
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
18 Apr 2007 at 19:59 UTC
Updated:
9 May 2007 at 01:51 UTC
Jump to comment: Most recent file
Comments
Comment #1
RobRoy commentedWhat about if you don't want to store serialized data using the cache, just plain text? Maybe it should be an optional param for cache_get/set. Also, you have some .project cruft in patch and features/API changes go into HEAD.
Comment #2
slantview commentedRe-rolled patch without the .project junk and for HEAD.
Memcached does serialization by default. The serialization in php is unnecessary overhead that will not take full advantage of the power of memcached. The only way to remove it is to move it into cache_get and cache_set by default and allow memcached to do it's own serialization internally when the memcached module is installed. The conversation about removing this serialization dates back to November 2005 (please see http://lists.drupal.org/pipermail/development/2005-November/011774.html for a detailed discussion). The memcached module is in development now and will be updated very soon. This is a major blocking issue in order to get it to have any real advantage.
The plan thus far is to make memcached work with overriding the cache_get and cache_set. This has been agreed upon by everyone including Dries as a good thing. This was also explained at the Performance and Scalability seminar. (please see a recent conversation from March of 2007 at http://lists.drupal.org/pipermail/development/2007-March/thread.html#22864 for a detailed discussion)
Further, the amount of overhead will be very minimal as there is only one place in core that is not already serializing, and that is in the filter module. It is not really adding anything, it is just moving it to be handled internally in cache_get and cache_set and removing unnecessary duplicated code.
steve
Comment #3
robertdouglass commentedI don't see any harm in adding a SERIALIZE flag to control whether the cached data needs serialization. This would allow strings or primary types to be stored without the serialization overhead. It would also not interfere with the ability to memcache.
This patch is important housecleaning that needs to happen. Having the serialization happen outside of cache_get and cache_set significantly reduces our claim to having a pluggable caching system.
+1 for the patch with or without a SERIALIZE flag.
Comment #4
RobRoy commented@slantview, yeah I guess a flag isn't a huge deal. Most of the time we're going to be serializing, so taking the extra hit of a serialize isn't that bad. Just trying to be forward looking and see if there is any case where we won't want to serialize, but I'm fine without it.
Comment #5
dries commentedRather than adding a flag, can't we make the cache a tad smarter?
<?php
function cache_set(... $data ...) {
if (is_object($data) || is_array($data)) {
$data = serialize($data);
}
}
We'd need an extra field in the cache tables so that cache_get() knows how to retrieve that data ...
Comment #6
slantview commentedOk, I updated the patch to have some knowledge of what we are doing. If you are submitting an array or object, it will serialize the data and store it with a flag that says to unserialize when you get the object.
The only thing now is that I don't know where the updates go for the cache and cache_* tables go.
There will obviously need to be some documentation updates so contrib developers understand how to use the cache_get and cache_set properly to avoid any double serialization.
steve
Comment #7
slantview commentedre-rolled with minor coding style change:
changed } else }
to } else
Comment #8
slantview commentedre-rolled again with minor change to placement of db_decode_blob. (thanks killes!)
Comment #9
slantview commentedre-rolled one more time adding required fields to the system.install file and db update #6009
Comment #10
dries commentedRather than adding a second data field, could we add a small status field? I'd propose to call it "serialized" and it could be an "unsigned int(1)". I think that will provide a better memory/table layout.
Otherwise, this looks very nice.
And yes, we'll need to document this in the module upgrade instructions once this gets committed (see module developers handbook).
Keep up the great work slantview.
Comment #11
dries commentedAlso, we should add one or two sentences to the PHPdoc code of cache_set/cache_get. Extend the explanation of the $data parameter and state that $data can be a mixed type. Explain what will happen when you pass in an object or array. Also mention that this won't happen when you pass in a string.
Good documentation is key, so let's work on that some more as well! :)
Comment #12
slantview commentedHere is the latest re-roll. I actually was using the db column "serialize_data" as a flag, not storage for serialized data. However I realize that term could be confusing. I changed the column name to 'serialized' and changed the type to int(1). I also changed the default to '0' instead of '1', I think that makes more sense because we are only serializing the data of objects and arrays, and everything else is being stored without being serialized.
I also updated the cache_get and cache_set PHPdoc to mention the new functionality and changed the db column names in those SQL queries.
Comment #13
dries commentedThese else clauses are redundant:
Otherwise this patch looks good. Feel free to mark it RTBC after it has been tested/checked by someone else. Thanks slantview!
Comment #14
slantview commentedOk, final re-roll. This is ready for prime time. I can' t believe I missed that. Anyway, let's see some people test this and get this committed! Thanks Dries!
steve
Comment #15
eaton commentedThis is really a big deal, and makes the cache system just that much easier and more straightforward for people to interact with. The fact that it makes the cache system easier to replace is an even bigger bonus. :-)
Major +1.
Comment #16
m3avrck commentedSteve, great patch -- looked it over and it looks good to me.
The upgrade docs will have to be updated since this patch affects modules like Views & CCK and more.
+1 for consistency and consolidation and memcache goodness :-D
Comment #17
dries commentedI've committed this patch to CVS HEAD. Good job.
I'm marking this 'code needs work' because the module upgrade instructions in the handbook need to be updated. Please update those, and mark as 'fixed' afterwards. Thanks!
Comment #18
slantview commentedAlright, thanks to Heine for updating the handbook!!
Comment #19
slantview commented