I just noticed, while following the timings on a site install, that immediately after git_deploy was enabled during a drush site install, the time to enable the next modules jumped from about 25ms to about 4000ms, until the end of the install.
Sure enough, removing git_deploy from my deployment cut the drush site install time from about 2 minutes to 18 seconds. Same if I rename git_deploy_system_info_alter() to disable it.
While not a deal breaker since the install still work normally, it would be good if that function could notice it is being run during a site install and just skip its processing, which is AIUI useless during install anyway.
Comments
Comment #1
fgmMore details: this hook turns out to be invoked more than once per module or theme. For instance, on a small site I am currently working on, it is invoked over 700 times when displaying the admin/modules page, apparently 3 times for each module, so some static caching is probably in order anyway.
Comment #2
tobiasbSee #1354756: Prevent git parsing on every other admin page load: implement caching
Comment #3
fgmNot entirely duplicate, I think.
Your other issue and patch addresses the lack of static caching, but there is also the issue of not invoking it /at all/ during a site-install operation.
Comment #4
eMPee584 commentedPlease try installation with the linked patch applied, then report back.
Comment #5
blueyed commentedAre you referring to the patch from #1354756: Prevent git parsing on every other admin page load: implement caching? This does not apply (to the 7.x-2.x branch) (anymore).
Comment #6
blueyed commentedHere is a patch implementing naive caching: instead of using a single cache object, it uses a cache object for every module's data (and includes
filemtimein the key name / cid).I do not know about best practice here, but this way Drupal will take care of the garbage collection automatically.
btw: please note that it would be easier to provide patches, if there's a
returninstead of a level of indentation/block, e.g.vs.
(This can be seen with the patch linked at #1354756: Prevent git parsing on every other admin page load: implement caching).
Comment #7
hswong3i commentedI give a cleanup of above patch for cache, and reference with other else Drupal core cache_get()/cache_set() coding style.
Also remove the cache lifetime since if we flush all cache, typically this info cache will also cleanup, too ;-)
P.S. Sorry that by git format-patch, I merge patch from #1471178: [D7] Fix insecure handling of shell arguments to here, too.
Comment #8
hswong3i commentedSorry #7 will not works with git submodule since filemtime() can't get the correct last modify info as we original expected.
As an counter solution I use GIT HEAD hash as unique key by invoke "git rev-parse --verify", which give us a relatively fast response + solid, unique info for generating cache key.
Comment #9
hytse6c commentedIt works! ty
Comment #10
Freso commentedPlease have the patch only patch this one issue.
http://webchick.net/please-stop-eating-baby-kittens
Comment #11
hswong3i commentedPatch updated for this issue only.
Comment #12
hswong3i commentedSync coding style.
Comment #13
hytse6c commentedTried. Looks good. ty
Comment #14
hswong3i commentedAny else update or information required?
Comment #15
Freso commentedIf fgm would reply back whether this works or not, then I think it's ready to go in. I just tested it myself on a live install, and it seems to work. I didn't test it during an install though, which is what the original issue was (partly) about.
Comment #16
fgmChecking right now.
Comment #17
fgmOK, so here are some hard numbers about this "install" part, which is indeed the main topic of this issue. After initial inconclusive quick tests, I ran three series of 10 installs of Drupal 7 core and the "development" install profile (because it contains a larger number of modules than just core, amplifying the problem over just core), with default caching (DB):
To summarize, whatever its possible interest for other reasons, the patch does not help at all for this issue.
Comment #18
Freso commentedAnd back to "needs work" it goes. Thanks for the report, FGM! :)
Comment #19
fgmIf you wonder why the durations are so very different from the original issue, that's because the latest results have been taken on a SSD while the original ones were on a HD, which has a much greater sensitivity to disk activity.
Comment #20
hswong3i commentedRefer to #17, something may not totally correct: git_deploy not ONLY massively slow down module installation, but also ALL admin/* pages loading, see #1354756: Prevent git parsing on every other admin page load: implement caching for more information.
IMHO, we can't/no way to prevent initial .git scanning for every new module, or else version data may not up-to-date; BTW, we may prevent all other else unnecessary console git query unless folder content do changed.
Some suggestion: what if also compare other else admin/* page loading speed for with/without #12 patch? Refer to my local testbed, the speed can improve from ~30sec down to >1sec, which similar as non-git module management style with tar.gz.
Comment #21
hswong3i commentedIssue fork to github as https://github.com/pantarei/drupal-git_deploy/tree/7.x-2.x-1511112
Retouch file with format-patch, and only focus on cache_set().
Comment #22
luksakDoes this patch need performance testing again?
Comment #23
hswong3i commentedFor testing (or experience) with the patch effect, please kindly try with my testbed on github. Installation instruction can found from DruStack project page.
I have around 100 git submodule under sites/all/modules/contrib, and already coming with a patched version of git_deploy.
Feel free to replace the patched version of git_deploy with latest drupal.org official version, then we can "feel" the different when browsing admin/* pages, e.g. reduce from >30sec to <1sec ;-)
Comment #24
fearlsgroove commentedI can understand the need for more testing, but is there any chance you could push your testing branch into drupal's git repo, rather than a github mirror?
I'm using the patched version on a few projects and it helps considerably with the performance issues. I have yet to see any significant adverse effects.
Comment #25
adixon commentedThis looks useful, thanks.
I use git deploy on a multi-site install, so it occurs to me that some kind of shared cache for all installs (at least for the remote git repository status information) would make sense, though I'm not sure there's a standard in Drupal to do this.
A simple solution would be a file based cache within the module folder, though it wouldn't fly as a standard Drupal solution, and you'd have to consider the file permissions issues.
Comment #26
luksakIn my opinion it works as well. What is there remaining to review? Can we mark this as RTBC?
@adixon I guess this is a separate issue. Lets get the basic caching functionality committed.
Comment #27
fgm@adixon separate issue indeed, but just for reference there is something like this in Drush Make, which caches downloads globally.
Comment #28
wim leers+1, I noticed this too. Also for update checking.
Same for the 6.x-2.x-dev branch.
Comment #29
adixon commentedThanks for your notes about shared file caching.
Your D7 file applies almost perfectly to the 6.x-2.x-dev branch, and I added another simple layer underneath of shared file caching in /tmp, by adding two cover functions for cache_get/set, as below. I'll post it as a new issue if anyone else would like to use it.
Will test the D7 next.
Comment #30
adixon commentedOkay, confirmed that D7 improves my modules page load time from about 60 seconds to 15 seconds (on my production machine).
The other admin pages are also much better.
It sure is a nice and simple patch.
My only question is - would it be worth putting these cache entries into their own bin? As it is, those cache entries are going to get pulled out and sit in memory for all your visitors, and they're only useful for a handful of visitors, typically.
Not a big thing, but the other advantage would be that it's own backend caching could be assigned appropriately.
Comment #31
adixon commentedFor the 6.x branch, I've just posted a new issue #2009094: 6.x-2.x caching with a patch that includes the above code and two other enhancements (shared file cache + separate cache bin). Requires a db_update to create the new cache table.
Comment #32
adixon commentedAnd here's my revised patch for this issue, it includes:
1. An optional shared file cache [as per 29 above].
2. Separated cache bin [as per 30 above].
It's almost identical to the 6.x patch in #2009094: 6.x-2.x caching, just revised for D7. It should be applied to 7.x-2.x.
I've tested it on a new install and an upgrade to an existing one, working nicely on a production machine.
Comment #33
codycraven commentedI was able to apply #32 to latest 7.x-2.x and page performance is significantly improved, after running update.php.
Note there are a couple end of line whitespace issues in #32.
Comment #34
luksakThen lets fix these whitespace issues.
Comment #35
luksakThe patch doesn't apply to latest dev. The README.txt contains changes that have already been committed and the .module file has some other issues. @adixon, can you please re-roll it?
Comment #36
sweetchuckFor the latest dev.
Comment #37
sweetchuckThis patch:
Move the static cache into hook_system_info_alter().
Add hook_flush_cache().
I do not know what will happened with the outdated files in directory "git_deploy_tmp_dir".
We need to describe this in the README.txt. We can not figure out programmatically what files need to be deleted in the "git_deploy_tmp_dir", because more than one Drupal instance can use this directory.
Possible improvements
I think the cache file format is not optimal.
current: $cache_id = "git_deploy:$directory:$head_hash";
maybe it could be: git_deploy:<project_type>:<project_name>:<head_hash>
Examples:
git_deploy:module:git_deploy:6ee53636697afc3a0b7377a06ffe6ecb744207ad
Comment #38
adixon commentedNice ...
I haven't got any contribution to your cache_id proposal, except that from looking at my own cache files and watching the behaviour, I expect you're right - i.e. the directory shouldn't be in the cache id because I think if the same git deployed module exists in separate directories, we don't need separate caches of the git information.
On the question of the tmp dir - my assumption was that the OS would deal with cleaning up the tmp dir, however that is set up. For my CentOS machine, it gets cleaned up with a daily cron unless you put in an exception. As far as I know, Drupal doesn't clean up its tmp directories in other modules.
The only improvement I added to my own code since my last patch was to make the implementation of the static file cache less of a hassle by providing a simple way to hack the module and provide a single default git_deploy_tmp_dir, rather than have to put it in each sites config file [since typically, you want all installs using your code to share the same git_deploy_tmp_dir]. In other words a little change like the below.
I can roll a patch, but it's getting so complicated I'll probably get it wrong ...
And finally - I do get irritating errors from drush, because the default permissions in the tmp directory don't let me create/access the static file cache. That's really just a file permissions error for me to figure out, but if there's something in there I don't really understand, maybe the static file cache should be disabled for drush usage.
Comment #39
sweetchuckI have a site with 150 contrib module. (Git submodule)
The cost of git_deploy_system_info_alter() in millisec with this patch:
26897- Empty cache.6500- File cache.4000- DB cache.30- Second run in one page request.@Todo
Better PhpDoc
testing
code review
Comment #40
adixon commentedThose look like excellent numbers. I assume the last number is related to how the hook gets run more than once for a single page load [which makes sense given that a module might be altering the system info at various stages]. Since this module's altering isn't going to be affected by any other module's changes, I had been wondering if some static variable cache might improve the second and third invocations of the hook, but 30 milliseconds makes me thing that it's not worth it.
Conclusion: thanks for those numbers.
Comment #41
sweetchuck@adixon: Yes, it is strange why run more than once the
_system_rebuild_module_data(), but some contrib module trigger the _system_rebuild_module_data() by itself.In most cases this happening on
?q=admin/modulespage.In my test environment the trigger functions are:
Comment #42
sweetchuckSmall modifications.
Comment #43
sweetchuckChanges:
$additions) the key changed to$git_dir, because more than one module can live in one Git repository. See the Panels, OG or Commerce.In this case to read the versioning informations from Git history will give us same result all the time. This is faster. If a repository contains more than one modules then we need to read the version information only once.
Side effect: Only one cache entry is written in the local cache and the shared file cache. If the git_deploy_system_info_alter() called with different order than previously then the cache is not usable.
_git_deploy_file_cache_write().Comment #44
sweetchuckChanges:
The format of the shared file cache changed to Windows INI like file format, because WebServer writable files can not be trusted and sending untrusted data unserialize() is dangerous.
If this issue will be fixed then we do not need to use the file cache any more. #467226: Support for per-bin key prefix
Comment #45
luksakI confirm that the patch in #44 works. I have a performance increase from 41718 ms to 12893 ms.
What is missing to get this committed?
Comment #46
sweetchuckThe dependency cannot be resolved if the *.info file contains a line like this:
dependencies[] = libraries (>=2.0)Try the git_deploy together with colorbox and flexslider.
Comment #47
luksakThere is a different issue for this: #1905736: Version numbers not exposed to update.php I tried to merge them without success. The two patches change code at the same place...
Comment #48
sweetchuckAdd
hook_enable()to move git_deploy module to beginning of the module list.Comment #49
OnkelTem commentedMy two cents.
Time measurements of loading
admin/modulespage:• Latest dev (08.11.2013), 5 runs:
∘ 10289
∘ 12337
∘ 10135
∘ 10573
∘ 11542
∘ avg: 10975
• With git_deploy-1511112_5.patch, 5 runs:
∘ 5290
∘ 5954
∘ 5636
∘ 5104
∘ 4002
∘ avg: 5197
• Without git_deploy, 5 runs:
∘ 1662
∘ 1549
∘ 1492
∘ 1371
∘ 1450
∘ avg: 1504
Conclusions:
Comment #50
donquixote commentedThe current patches, and the existing module, miss out on a much simpler optimization:
Currently a module repository that contains more than one module will be scanned multiple times, for no reason. This can be avoided by introducing a static variable cache that caches per .git directory, not per module.
The same can then be done with the persistent cache.
Attached is a patch (with
git format-patch) which- refactors and splits up
git_deploy_system_info_alter().- uses dirname() instead of the substr() logic, to go to the parent directory.
- introduces a static cache per .git directory.
- introduces the
escapeshellargs()check, which I suppose is needed.It does NOT introduce a persistent cache like the patches above. This can be done later.
No interdiff, it would be pointless.
The splitting-up / refactoring allows to test and benchmark the discovery logic without the cache.
I developed this change before I found this issue, which explains why it is so different.
Comment #51
donquixote commentedComment #52
donquixote commentedReview of #48:
Is this still needed nowadays?
Really?
Oh, so the static cache actually uses
$git_diras a key. But the persistent cache uses$file->name.Comment #53
donquixote commentedHere is a combined patch (with
git format-patchto distinguish the steps) which combines my changes in #50 with the changes from #48 by Sweetchuck.So it introduces the same persistent cache.
I am not really convinced this is useful, because this cache is wiped on
drush cc all. My own goal was to speed up clearing caches, so this does not help. But the static variable *does* help.Maybe we can simplify the git commands?
Comment #54
donquixote commentedSee also #2775011: Remove redundant git commands.
Comment #55
andypostComment #56
darren ohComment #57
darren oh