If and when Drupal raises its PHP requirement to 5.2 or above, it would be nice if cached data were serialized with json_encode() and json_decode() instead. These produce a smaller string than serialize() does, and should preserve the same structural integrity. From some simple tests of my own, there is even potential for a performance increase when encoding the data to a string.

Any thoughts as to pros and cons concerning this idea are welcome.

Comments

Susurrus’s picture

Drupal 7 requires PHP 5.2, so the json_* functions would be available.

Cache stored in the table is currently stored as binary data. I think this makes a lot of sense and is also much more compact than any string serialization. While this would be a good move if the cache was still serialized into a string, I don't think we should move from a binary storage method to any string-based storage method.

-1 if I've correctly understood all site caching to be in binary.

fractile81’s picture

Ah, I hadn't realized that the cache was stored as binary data in Drupal 7. I agree that binary data is the better route, and was requesting based off the Drupal 6 implementation (stored as a serialized string). Do you happen to have a link for the patch/issue this was added in?

dalin’s picture

Issue tags: +Performance

Even though the data is stored in a binary field in the cache table, it's just a string. serialize() and unserialize() are known to be slow and are called several times to render almost any page. Switching from serialize() to JSON has the potential to give a nice performance improvement on both the PHP side (the functions are faster) and the SQL side (the data is smaller).
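
To make the size claim concrete, here is a minimal sketch that compares the two encodings on a small array. The `$data` contents are made up for illustration and are not real cache data, so the numbers it prints say nothing about actual cache entries.

```php
<?php
// Hypothetical sample data; real cache entries are usually much larger.
$data = array('nid' => 42, 'title' => 'Example', 'tags' => array('a', 'b'));

$serialized = serialize($data);
$json = json_encode($data);

// Compare the length of each encoded string in bytes.
printf("serialize:   %d bytes\n", strlen($serialized));
printf("json_encode: %d bytes\n", strlen($json));

// Both encodings round-trip this plain array.
// Passing TRUE asks json_decode() to return arrays instead of objects.
var_dump(unserialize($serialized) === $data);
var_dump(json_decode($json, TRUE) === $data);
```

For a plain array of scalars like this one, the JSON string is shorter because it does not carry the type and length prefixes that serialize() writes.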

dalin’s picture

Though JSON won't allow you to store objects that aren't stdClass. Wonder if that's a problem.
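
A short sketch of that limitation, using a hypothetical `ExampleNode` class that exists only for this illustration:

```php
<?php
// Hypothetical class, used only to illustrate the point.
class ExampleNode {
  public $title = 'Example';
}

$node = new ExampleNode();

// serialize() records the class name, so the object comes back as ExampleNode.
$fromSerialize = unserialize(serialize($node));
var_dump($fromSerialize instanceof ExampleNode);

// json_encode() keeps only the public properties, and json_decode()
// returns a stdClass object, so the original class is lost.
$fromJson = json_decode(json_encode($node));
var_dump($fromJson instanceof ExampleNode);
var_dump(get_class($fromJson));
```

The first check is true and the second is false: after the JSON round trip the data is still there, but any code that depends on the object's class or methods would break.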

casey’s picture

http://php.net/json_decode seems to have problems with deep nesting, so it looks like a non-option to me.

http://www.phpdevblog.net/2009/11/serialize-vs-var-export-vs-json-encode...
We could have a look at var_export(). But I am not so sure about nested arrays. And besides var_export() does not handle circular references.

Overall I think we should stick to serialize().

catch’s picture

Version: 7.x-dev » 8.x-dev
Status: Active » Closed (won't fix)

Agreed. Also this is only for the default database caching backend.

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Closed (won't fix) » Closed (works as designed)

Anonymous’s picture

if you really want to speed up serialization, here's where to go:

http://ilia.ws/archives/211-Igbinary,-The-great-serializer.html

would require a core patch, as i don't see 'making serialization swappable' happening for D7.

casey’s picture

It is swappable, for cache at least; D7 has a nice OO caching solution. You could write a custom DrupalCacheInterface implementation.

andy inman’s picture

For what it's worth, I've been doing some module development (D6) and experimenting with json_encode() vs serialize() - yes, json_encode() (and json_decode()) is faster, but the real-life performance gain is small (ok, if you needed to loop 1000 times it would be worth using JSON). There are definitely some data structures that don't encode properly, not sure why, so switching to json_encode() might open a can of worms. I don't have test data to hand, but I tried, for example, using a customised cache.inc which calls json_encode() instead of serialize() - the resulting change in page display times is negligible.
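
One possible cause of structures not surviving the round trip (an assumption; I haven't confirmed which structures actually failed) is that JSON has a single "object" type, so PHP associative arrays and PHP objects cannot be told apart when decoding. A minimal sketch with made-up data:

```php
<?php
// Hypothetical structure mixing an associative array and an object.
$settings = new stdClass();
$settings->enabled = TRUE;
$data = array('options' => array('weight' => 5), 'settings' => $settings);

// Default decoding: every JSON object becomes stdClass, so the outer
// array and 'options' are no longer arrays.
$asObjects = json_decode(json_encode($data));
var_dump(is_array($asObjects));

// Associative decoding: every JSON object becomes an array, so
// 'settings' is no longer an object.
$asArrays = json_decode(json_encode($data), TRUE);
var_dump(is_object($asArrays['settings']));

// serialize() preserves both kinds.
$restored = unserialize(serialize($data));
var_dump(is_array($restored['options']) && is_object($restored['settings']));
```

Whichever decoding mode is chosen, one of the two kinds changes type, whereas serialize() keeps both.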

Further thought: The cache_router module seems to work well. Some cache "engines" such as APC allow direct storage of binary data, so serializing becomes an unnecessary overhead (more to the point, unserializing, since we suppose that cache data is write once/read many). So a suggestion would be: cater for alternative caching systems such as cache_router and let them handle serialization (or whatever encoding they might require) rather than doing it outside of the cache interface.

catch’s picture

serialization is already done in the cache interface in D7, so custom backends don't need to do what core does at all.