Hello Arthur,
1) I create some node with attached files.
2) I run cron and MM stores the processed files as a CCK filefield.
3) Later, I change some text in the node body or the title and save the node.
4) I upload a new video node and run cron. Here is the problem: Media Mover re-encodes my already processed files. I don't think this should happen, because I did not modify the attached files; I only changed some text in the node.
Is there any way to avoid re-encoding already processed files, and to re-encode only when the attached files have changed?
Thanks in advance.
| Comment | File | Size | Author |
|---|---|---|---|
| #18 | harvest_timestamp-64506-17.patch | 747 bytes | jsit |
| #16 | harvest_timestamp-64506-8.patch | 752 bytes | jsit |
| #7 | mm_node.diff | 439 bytes | arthurf |
| #4 | mm_cck.diff | 1.07 KB | arthurf |
Comments
Comment #1
delykj commented
Is there any hint about it?
Another option is to add a checkbox to the node editing form to not reprocess the MM configuration at cron run.
There is a similar solution in the Blue Droplet Video module (Retranscode this video):
http://drupal.org/node/422088
Comment #2
jordanmagnuson commented
Subscribing, as this seems somewhat similar to my issue: http://drupal.org/node/707038
Comment #3
delykj commented
I solved this problem with the Flag module, Rules and Actions.
I created a special global flag ("Convert this video") and hacked Media Mover to check the node's flag during cron processing. If the flag is set, MM converts the video; otherwise it skips it. When I create a node I set the flag initially, so that the node is converted at the next cron run. MM publishes the node when the conversion process has completed. I created a rule that executes an unflag action when a node is published. If you want to reconvert a node, check the flag, save the node and run cron again.
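The flag workflow described above can be modeled roughly as follows. This is a minimal Python sketch of the logic only, not the actual change made to Media Mover; the `Node` class and the `run_cron` and `on_publish` helpers are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a node carrying the global 'convert' flag."""
    nid: int
    convert_flag: bool = True   # set initially so the first cron run converts it
    published: bool = False
    conversions: int = 0


def on_publish(node: Node) -> None:
    """Models the rule that unflags a node once it has been published."""
    node.convert_flag = False


def run_cron(nodes: List[Node]) -> None:
    """Models one cron run: only flagged nodes are converted."""
    for node in nodes:
        if not node.convert_flag:
            continue  # flag not set: skip, even if the node text was edited
        node.conversions += 1   # stands in for the real conversion step
        node.published = True   # the node is published when conversion completes
        on_publish(node)


node = Node(nid=1)
run_cron([node])          # first run converts the node and unflags it
run_cron([node])          # later runs skip it
node.convert_flag = True  # re-check the flag to request a reconversion
run_cron([node])
print(node.conversions)   # 2
```

Editing the node body or title never touches the flag in this model, so a plain edit does not trigger a reconversion.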
Comment #4
arthurf commented
What is happening is that the node created date changes, making it later than the last time cron ran. Media Mover now thinks it does not have the file, so it processes it again.
The way to fix this is in the CCK harvest query: the harvested FID timestamp needs to be looked at and treated as a uniqueness check, something like
I think this should do it. Though what seems weird to me is that:
is not preventing this.
Comment #5
arthurf commented
@delykj - would you mind sharing the flag code? This would be awesome to get into a module. Note that you could use media_mover_api_event_trigger() to do this :)
Comment #6
delykj commented
I didn't write a module for it. I simply hacked mm_node.module, so this is hard-coded ('convertablefiles' is the flag name).
Comment #7
arthurf commented
So I think the fix here is the same:
And then $configuration->last_start_time is passed in to f.timestamp... Can you try out the diff and see if it works for you?
Comment #8
jordanmagnuson commented
The diff in #7 seemed to fix my issue, but I had to apply it to mm_cck.module.
Changed mm_cck.module, line 345 from:
to:
A bit confused as to why the initial value was $job->stop_time, as that variable doesn't seem to exist.
Comment #9
jordanmagnuson commented
Never mind. After further testing, it looks like MM is STILL trying to harvest my files after they have been processed and moved to Amazon S3. At least some of them...
Comment #10
jordanmagnuson commented
Okay, more testing, and I've determined that the $job->last_start_time control is *sort of* working. What happens is that MM wants to process all of my files exactly twice, where I want it to process each file once.
Comment #11
jordanmagnuson commented
I doubt this is a good solution, but here's what I've done so that my S3 files are not processed twice by Media Mover. I changed line 347 of mm_cck.module from:
to:
This prevents MM from trying to harvest files after they have been moved to S3, since the S3 files are not readable.
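The idea of skipping files that are not readable locally can be illustrated with a small Python sketch. This is not the change made to mm_cck.module; the `filter_harvestable` helper is a hypothetical name, and the check shown (a regular local file that the process can read) is only an assumption about what "readable" means here.

```python
import os
import tempfile
from typing import List


def filter_harvestable(paths: List[str]) -> List[str]:
    """Keep only paths that are local regular files readable by this process."""
    return [p for p in paths if os.path.isfile(p) and os.access(p, os.R_OK)]


# A local file that has not been moved yet.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    local_path = tmp.name

# A file that now lives on a remote host; it is not a readable local path.
remote_path = "http://documents.example.com/somedocument"

kept = filter_harvestable([local_path, remote_path])
print(kept == [local_path])  # True

os.remove(local_path)  # clean up the temporary file
```

A filter like this hides the symptom (remote files being picked up again) without explaining why they were selected for harvesting a second time.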
Comment #12
mrwhizkid commented
Is #11 a good solution? I am having the same problem. Everything works, but every time cron runs, I get the following in my logs:
Harvested file is not readable, check permissions: http://documents.example.com/somedocument
Thanks.
Comment #13
mrwhizkid commented
Thank you, Thank you, and Thank You!!
I was having the same problem...and this little change completely solves it. MM is no longer trying to harvest my S3 files which not only was filling up my logs with errors, but was also causing problems for subsequent 'run' operations.
If there is a better way to do this, I would like to know, but for now, this seems to solve my problem!
Comment #14
arthurf commented
I believe that the issue described here is this: http://drupal.org/node/917656 While #11 may prevent the double harvest, it doesn't solve the root of it, which I am now convinced is #917656. This fix has been applied to 6.2.x and 6.1.x. It would be good to hear if this issue is fixed.
Comment #15
carteriii commented
I know this is a bit old, but it appears #8 is still relevant and needs to be fixed. $job->stop_time simply doesn't exist and I agree it should be replaced with $job->last_start_time.
That seems easy enough, but I'm sorry that I don't know how to create an official patch to be submitted, tested, etc. etc. If someone cares to point me to some instructions for doing that properly, I will do it, but otherwise can someone with the proper knowledge & authority simply get this moving? I see that #917656 has had some new discussion and there is hope it could make it into the next beta (or something) and it would be nice to also get this fix included at the same time.
Comment #16
jsit commented
I'm getting this same problem; here's a patch based on comment #8 that, after some brief testing, worked in 1 out of 2 instances. Not sure why it isn't consistent -- might have just been too many files and fields bouncing around in there while I tested -- but I'm hoping it'll start behaving.
Update: the problem just reoccurred on one of the two nodes I'm using to test it. Don't know why the other is unaffected.
Comment #17
jsit commented
My mistake, and sorry for all the edits here, BUT --
I think in comment #8, the line should be changed to $job->start_time, not $job->last_start_time. Using start_time has (so far, and I think reliably) worked for me.
Comment #18
jsit commented
Here's the new patch; please disregard the one from #16 and use this instead.
This changes the timestamp comparison from $job->stop_time to $job->start_time.
Comment #19
jsit commented
Correction: that patch doesn't work either. Sorry folks, I think we're out of luck for now.
Comment #20
arthurf commented
That ORDER BY clause is just there to try to process things in order. I don't think it would be the issue. You could try removing it completely from the query, but I don't think that would change what is going on. You might want to try getting the query that is being run and running it directly in MySQL to see if you're getting any results.
Comment #21
jsit commented
Again, apologies for all the posts. I was stuck in a debugging hole last night running down blind alleys, thinking I was onto something, when really the misbehavior is just inconsistent enough to trick you briefly into thinking it has been resolved.
Anyway, now, without trying to interpret these results, I'm just going to present some raw data of what Media Mover is doing when it fails.
For this test, I have changed mm_cck.module to use $job->last_start_time on line 342, instead of $job->stoptime.
Before testing, these were the values in the media_mover_config_list table:
I then uploaded a file to a node and saved the node, and here are the numbers that came out:
You can see already that if I were to manually run the configuration again, it would compare the file's timestamp (233270) to the last_start_time (233220), find that the timestamp is larger, and re-harvest the file. And indeed this is what happened, and these are the numbers that came out:
This is why I experimented with changing $job->last_start_time to $job->start_time -- and changing the comparison from >= to simply > -- but still ran into similar problems (duplicate harvesting).
I don't know if Media Mover is writing to the files table's timestamp field, or if it's recording its last_start_time incorrectly, but there is something crucially flawed about the DB query it executes when harvesting.
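To make the comparison concrete, here is a minimal Python sketch using the two values quoted above (233270 for the file's timestamp and 233220 for last_start_time). It is not the module's query; `would_harvest` is a hypothetical helper, and it assumes the harvest check simply selects files whose timestamp is at or after the job's reference time.

```python
def would_harvest(file_timestamp: int, reference_time: int, strict: bool = False) -> bool:
    """Return True if a check of this shape would pick the file up again.

    Assumption: a file is selected when its timestamp is newer than the
    reference time (strict=True, i.e. '>') or newer-or-equal (default, '>=').
    """
    if strict:
        return file_timestamp > reference_time
    return file_timestamp >= reference_time


# Values as quoted in the comment above.
file_timestamp = 233270
last_start_time = 233220

print(would_harvest(file_timestamp, last_start_time))               # True
print(would_harvest(file_timestamp, last_start_time, strict=True))  # True
```

With these numbers, switching from >= to > makes no difference: the file's timestamp is later than the recorded reference time either way, so the file is selected again.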