As far as I understand, lines:

if (!empty($cache) && $cache->created > REQUEST_TIME - 24*60*60) {
  $result = $cache;
}

in function piwik_stats_get_xml_data means, that the module will update data from Piwik statistics server only once a day, am I right? What if I want to make a block with today most popular articles? In this case I have to have updated data at least once an hour.

May be this parameter have to be moved to the field settings? Because, I also want to make a block with most popular articles for last month and for this block it's enough to update data once a day.

Comments

Romka’s picture

StatusFileSize
new1.3 KB

I've made a small patch for this task.

Romka’s picture

And also I think it'll be good idea to provide users opportunity to select queue type for field. On high-traffic sites solutions like RabbitMQ (http://drupal.org/project/rabbitmq) are more convenient than internal Drupal queues.

If it necessary, I can make this patch.

Romka’s picture

StatusFileSize
new1.47 KB

Updated patch from first comment.

patrickd’s picture

Status: Active » Needs review

Hi Romka, thanks for this, I'm going to evaluate it after drupalcon

patrickd’s picture

Status: Needs review » Closed (works as designed)

Hi, sorry for the late reply,

The reason for this code snippet with the hardcoded cache lifetime is, that on large sites it can take a whole day to import the cached xml file due to queued-cron-processing. Removing it, or lowering it could cause that always only eg. 10% of statistics are imported. These 10% are always new, fresh data but all the other nodes would never get any data.

For smaller sites this is no problem as the cache is only valid for one whole import. That means if your data is small enough to be imported within 1 cron-run, next time new data will be used and the cached data will be thrown away.

Therefore I see no need to take action here as I'm pretty sure it works as designed,

if you have any concerns, feel free to reopen

best regards

Romka’s picture

Sorry, but I haven't understand your reasons. Here we have two independent tasks:

  1. download XML-file with statistics data from Piwik-server,
  2. parse XML and put data to fields.

I understand, that the statistics XML-file downloading is a heavy operation. On my site I have more than 100'000 nodes, but Piwik will send you only statistics for nodes which was viewed in requested period, in my case this XML-file will contain much less than 100'000 nodes. And I'm sure, even if this file will have size more than several megabytes, downloading of this file will not take long, for example, in my case Drupal-site and Piwik server are connected by 1Gbit/s network. And I think that the site administrator have to have choice, how long time statistics cache will live.

As I said earlier, if I need to update most popular block every hour it won't work with this hardcoded cache lifetime.

Now I working on backport of this module to Drupal 6 (it almost ready) and second task I have solved by adding new parameter to field settings, which is responsible on how much statistics records have to be extracted from XML-file.

P.S.

Why do you use this two constructions:

$cache = cache_get('piwik_stats:xml:' . $field['settings']['period']);
if (!empty($cache) && $cache->created > REQUEST_TIME - 24*60*60) {

and

cache_set('piwik_stats:xml:' . $field['settings']['period'], $result->data);

I would have done so:

$cache = cache_get('piwik_stats:xml:' . $field['settings']['period']);
if (!$cache) {

and

cache_set('piwik_stats:xml:' . $field['settings']['period'], $result->data, 'cache', 24*60*60);
patrickd’s picture

As I said earlier, if I need to update most popular block every hour it won't work with this hardcoded cache lifetime.

I'm actually pretty sure it would! Have tested whether this is really the case ?

The reason why it indeed should work are these 'strange' usage of the cache functions:

cache_set('piwik_stats:xml:' . $field['settings']['period'], $result->data, 'cache', 24*60*60);
THIS would REALLY cache the data for 24 hours.

$cache = cache_get('piwik_stats:xml:' . $field['settings']['period']);
if (!empty($cache) && $cache->created > REQUEST_TIME - 24*60*60) {

THIS will make sure it is not possible to parse data older than 24 hours.

Everytime before a new "downloading and parsing round" happens the cache WILL be cleared. It is kind of impossible that the cache will exist for 24 hours.

The reason why I implemented it this way is to make sure that I don't have to pull the XML file for each cron/batch-queue operation.
The cached data will ONLY remain as long as data is still beeing parsed. After the XML file is parsed once - the cached data will be thrown away and new data will be filled into the cache.

if this is not the case - this is really an issue. Otherwise this works as designed for me.

(please keep each issue per issue - meaning, don't tell me about backporting here, use the existing backport issue)