This cron hook has a bug causing its query to return 0 hits (no matter what) on every other call.

Code, for reference:

  // We need to collect any metadata that is missing from node fields.
  $limit = media_brightcove_variable_get('cron_limit');
  $start = media_brightcove_variable_get('cron_start');
  $end = 0;
  $nodes = array();
  $results = db_query_range("SELECT nid, vid, field_name, delta, uri FROM {media_brightcove_fields} WHERE data_collected = 0 ORDER BY vid, field_name, delta", $start, $limit);

  // First collect any missing metadata.
  while ($result = db_fetch_object($results)) {
    $data = media_brightcove_data($result->uri, TRUE);

    // If we now have data, then load the node and add it.
    if (!empty($data)) {
      if (!isset($nodes[$result->nid])) {
        $nodes[$result->nid] = array();
      }
      // Load this revision if we haven't already.
      $nodes[$result->nid][$result->vid] = isset($nodes[$result->nid][$result->vid]) ? $nodes[$result->nid][$result->vid] : node_load($result->nid, $result->vid);
      if (isset($nodes[$result->nid][$result->vid]->{$result->field_name}[$result->delta]['value'])) {
        // Set the data for this field. Note this is only applicable to emvideo.
        $nodes[$result->nid][$result->vid]->{$result->field_name}[$result->delta]['data'] = $data;
        $nodes[$result->nid][$result->vid]->{$result->field_name}[$result->delta]['duration'] = media_brightcove_duration($result->uri);
        $nodes[$result->nid][$result->vid]->{$result->field_name}[$result->delta]['status'] = media_brightcove_video_available($result->uri);
        $nodes[$result->nid][$result->vid]->{$result->field_name}[$result->delta]['version'] = MEDIA_BRIGHTCOVE_DATA_VERSION;
      }
    }
  }

  // Now go through each revision and save it.
  foreach ($nodes as $revisions) {
    foreach ($revisions as $node) {
      // Save the newly collected data into the revision.
      node_save($node);
      // Store the current progress.
      $end = $node->vid;
      media_brightcove_variable_set('cron_start', $end);
    }
  }

  // Ensure we set back to 0 after reaching the end of it all.
  media_brightcove_variable_set('cron_start', $end);

The last line stores a node vid in the "cron_start" variable at the end; the next time through the function, this vid gets passed as the $from parameter to db_query_range(). That's not how $from works. Maybe this last line was supposed to save a 0; the comment seems to imply that.

Forgetting this bug, why is cron_start needed at all? Every time you update a node, its "data_collected" field will get set to 1 (during node_save()), so it won't get selected again during the next cron hook (right?). So this function should never need to pass non-zero as $from to db_query_range()....

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

adamdicarlo’s picture

Title: Media Brightcove's hook_cron() slacks off on every other call. » _media_brightcove_cron() publishes old node revisions, misuses db_query_range().
Status: Active » Needs review
FileSize
4.57 KB

Besides misusing db_query_range() (since cron_start is a VID rather than a count of rows processed), calling node_save() on old revisions of nodes can cause them to be marked as the current revision, apparently as an unintended interaciton with the Revisioning module.

So this has the side effect of publishing old content revisions. Yikes.

I don't think updating Brightcove metadata in old content revisions is worth it, anyway, so I'm removing that functionality in my patch for this.

adamdicarlo’s picture

Issue summary: View changes

Fixed broken em tag.