I regularly see huge hosting_task_log tables in Aegir installs. This table alone can easily be 100x the size of the rest of the frontend site's database combined, and is over 1GB on at least one Aegir I work on regularly. The hosting_task_arguments and hosting_task tables come in 2nd (100MB) and 4th (24MB) in terms of size. The only other table that's even in the same order of magnitude is node_revisions (3rd at 39MB).
There's already a contrib module that adds a queue to clean up Task data from deleted sites. I've just suggested that it allow for retention policies per task type and status (#2053915: Allow for retention policies per task type and status). This would make it much more aggressive in cleaning up these tables. I'm going to experiment in contrib and see whether there are any negative consequences to this approach. Depending on the results there, maybe we can bring some of that into core. Hence marking this as 'postponed'.
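To make the idea of a retention policy per task type and status concrete, here is a rough sketch in Python against an in-memory SQLite database. Everything in it is an assumption for illustration: the `task` and `task_log` tables are a simplified stand-in rather than the real Aegir schema, and the `RETENTION` mapping, `apply_retention` and `demo` are hypothetical names, not anything from the contrib module.

```python
import sqlite3

DAY = 86400

# Hypothetical policy: (task_type, status) -> maximum log age in seconds.
# Combinations that are not listed are never purged.
RETENTION = {
    ("verify", "success"): 7 * DAY,
    ("backup", "success"): 30 * DAY,
}


def apply_retention(conn, now, retention=RETENTION):
    """Delete log rows of tasks older than their policy; return rows deleted."""
    deleted = 0
    for (task_type, status), max_age in retention.items():
        cur = conn.execute(
            "DELETE FROM task_log WHERE task_id IN ("
            " SELECT id FROM task"
            " WHERE task_type = ? AND status = ? AND executed < ?)",
            (task_type, status, now - max_age),
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted


def demo():
    """Build a tiny simplified schema (not Aegir's) and run the policy once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task (id INTEGER PRIMARY KEY, task_type TEXT,
                           status TEXT, executed INTEGER);
        CREATE TABLE task_log (id INTEGER PRIMARY KEY, task_id INTEGER,
                               message TEXT);
        """
    )
    now = 100 * DAY
    conn.executemany(
        "INSERT INTO task VALUES (?, ?, ?, ?)",
        [
            (1, "verify", "success", now - 10 * DAY),  # past 7 days: purged
            (2, "verify", "success", now - 1 * DAY),   # recent: kept
            (3, "verify", "error", now - 50 * DAY),    # no policy: kept
        ],
    )
    conn.executemany(
        "INSERT INTO task_log (task_id, message) VALUES (?, ?)",
        [(1, "a"), (1, "b"), (2, "c"), (3, "d")],
    )
    deleted = apply_retention(conn, now)
    remaining = conn.execute("SELECT COUNT(*) FROM task_log").fetchone()[0]
    return deleted, remaining
```

In this sketch, failed tasks have no policy entry, so their logs are kept indefinitely; that is one way to keep the valuable entries while trimming the redundant ones.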
Don't get me wrong... I firmly believe that the data in these tables is just about the most valuable in Aegir. I just think there's a bunch of redundant entries that don't bring any value. Having the Aegir frontend itself be one of the biggest sites on a given install just seems wrong :)
Comment | File | Size | Author |
---|---|---|---|
#9 | trim_hosting_task_tables-2053929-9.patch | 4.02 KB | helmo |
#5 | hosting_gc.patch | 3.81 KB | chertzog |
Comments
Comment #1
j0nathan commented: In our installation:
Comment #2
omega8cc commented: We have been using the aforementioned Hosting task garbage collection module in BOA for over a year already. It really should be in core. There is no reason to keep these logs for deleted sites, platforms and any other orphans.
Rebuilding hosting_package_instance is another, separate thing, discussed in some other issue, I recall.
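As an illustration of what purging task data for deleted sites and other orphans could look like, here is a minimal Python sketch using an in-memory SQLite database. The `site`, `task` and `task_log` tables and the `deleted` flag are a simplified, hypothetical model, not the actual Aegir schema or the module's implementation; `purge_orphans` and `demo` are likewise made-up names.

```python
import sqlite3

# Tasks whose site row is missing entirely, or is flagged as deleted.
# Table and column names here are assumptions, not the Aegir schema.
ORPHAN_TASKS = """
    SELECT t.id FROM task t
    LEFT JOIN site s ON s.id = t.site_id
    WHERE s.id IS NULL OR s.deleted = 1
"""


def purge_orphans(conn):
    """Remove orphaned tasks and their logs; return (logs, tasks) deleted."""
    task_ids = [row[0] for row in conn.execute(ORPHAN_TASKS)]
    logs = tasks = 0
    for task_id in task_ids:
        logs += conn.execute(
            "DELETE FROM task_log WHERE task_id = ?", (task_id,)
        ).rowcount
        tasks += conn.execute(
            "DELETE FROM task WHERE id = ?", (task_id,)
        ).rowcount
    conn.commit()
    return logs, tasks


def demo():
    """Populate a tiny simplified schema and purge it once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE site (id INTEGER PRIMARY KEY, deleted INTEGER);
        CREATE TABLE task (id INTEGER PRIMARY KEY, site_id INTEGER);
        CREATE TABLE task_log (id INTEGER PRIMARY KEY, task_id INTEGER,
                               message TEXT);
        """
    )
    conn.executemany("INSERT INTO site VALUES (?, ?)", [(1, 0), (2, 1)])
    conn.executemany(
        "INSERT INTO task VALUES (?, ?)",
        [(1, 1), (2, 2), (3, 99)],  # site 99 does not exist at all
    )
    conn.executemany(
        "INSERT INTO task_log (task_id, message) VALUES (?, ?)",
        [(1, "live"), (2, "gone"), (2, "gone again"), (3, "orphan")],
    )
    purged = purge_orphans(conn)
    remaining = conn.execute("SELECT COUNT(*) FROM task_log").fetchone()[0]
    return purged, remaining
```

The sketch collects the orphaned task ids first and deletes afterwards, which avoids modifying a table while selecting from it; a queue-based cleanup could process those ids in small batches instead.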
Comment #3
omega8cc commented: Another really useful module we have been using for a long time already: Revision Deletion
Comment #4
anarcat commented: i think it's a good idea to merge this into core, or at least add it to the makefile, but i'd keep this to 3.x.
patch anyone?
Comment #5
chertzog: Here is a patch that 1.) ports hosting garbage collection to D7, and 2.) adds it to Hosting.
Comment #6
chertzog
Comment #7
helmo commented: The other hosting sub-modules don't have 'hosting_' as directory prefix. So I guess we should add this as the 'task_gc' directory.
My very quick test just now failed to run the queue ... I'll try to look into that next week.
Comment #8
helmo commented: I got it working ... here's an updated patch with some cleanup.
One TODO could be to also reduce the number of node revisions on tasks for sites that are not deleted. But maybe we could use an existing module for that .... https://drupal.org/project/node_revision_delete (I have not tried this module)
Comment #9
helmo commented
Comment #10
helmo commented: Added #2217745: Merging into Aegir 7.x-3.x to the hosting_task_gc queue to let Dane Powell know.
A next step is to look at feature/thermonuclear in the hosting_task_gc repo, borrowing from #2066179: Dealing with platform logs too, or: the thermonuclear option
Comment #13
helmo commented: merged to 7.x-3.x