The main repo synchronizer (as in, what's run by calling VersioncontrolGitRepository::fetchLogs()) could stand to have its performance improved. A fair bit. I've attached the cachegrind output generated by parsing a Panels repo using drush, which is around 1450 commits. I'd post screenies, but Skitch seems incapable of capturing anything in an X11 window, so it can't see my KCachegrind.
There are a few big things that jump out:
We are making way, way WAY too many system calls to the binary. No big surprise there, but it's a problem: 7000 calls to exec(), taking 40% of execution time (inclusive). There is NO reason to make 7000 git shell calls for 1450 commits. And, pointing to a real flaw in our logic, there are also 7000 calls to shell_exec(), which we only use when testing the git binary's location. That's another 15% of execution time (inclusive). Yikes. So there are some easy wins right there.
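For illustration, here's a rough sketch (in Python rather than PHP, and not based on the module's actual code) of the two obvious fixes: resolve the git binary's location once per process instead of on every command, and pull the metadata for every commit out of a single `git log` invocation with machine-parseable separators instead of shelling out per commit. The function names, the separator choice and the set of fields kept are all assumptions; per-commit file changes (e.g. via `--name-status`) would need extra parsing that isn't shown here.

```python
import functools
import shutil
import subprocess

# Assumed separators: ASCII unit separator (0x1F) between fields and record
# separator (0x1E) after each commit; both are unlikely in commit metadata.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"

# Standard `git log --format` placeholders: %H commit hash, %P parent hashes,
# %an author name, %at author timestamp (Unix seconds), %s subject line.
LOG_FORMAT = FIELD_SEP.join(["%H", "%P", "%an", "%at", "%s"]) + RECORD_SEP


@functools.lru_cache(maxsize=None)
def git_binary():
    """Hypothetical helper: locate git once per process, then reuse the result."""
    return shutil.which("git")


def parse_log(output):
    """Split the output of one `git log --format=LOG_FORMAT` call into commits."""
    commits = []
    for record in output.split(RECORD_SEP):
        # git terminates each entry with a newline, so records after the
        # first start with "\n"; strip it and skip the empty tail.
        record = record.strip("\n")
        if not record:
            continue
        sha, parents, author, timestamp, subject = record.split(FIELD_SEP)
        commits.append({
            "revision": sha,
            "parents": parents.split(),  # empty list for a root commit
            "author": author,
            "date": int(timestamp),
            "message": subject,
        })
    return commits


def fetch_all_commits(repo_path):
    """Hypothetical helper: fetch every commit's metadata with ONE git process."""
    out = subprocess.run(
        [git_binary(), "-C", repo_path, "log", "--all", "--format=" + LOG_FORMAT],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_log(out)
```

Called as `fetch_all_commits("/path/to/panels")`, this spawns a single git process for the whole history no matter how many commits there are, and the binary lookup is only ever done once.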
The next bit of low-hanging fruit is under VersioncontrolEntityController::load(). Seems we spend 18% of execution time in there, 16.94% of it in the PHP function itself rather than in anything it calls. That suggests the culprit is probably the really inefficient sorting algorithms in there right now: they do NOT scale well, and get much worse as the number of in-memory-cached objects grows. (That would also explain why this process gets really _REALLY_ slow when we run multiple repositories through it.) In fact, I think fixing this stuff up could be what really nails down massive speed increases.
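As a rough sketch of the general fix, assuming the slowdown really is sorting or scanning over the entire cache on every load: keep cached entities in a map keyed by id, plus an index on frequently filtered properties, so each load() costs roughly the number of entities requested rather than something that grows with everything already cached. This is Python, not the actual VersioncontrolEntityController API; `EntityCache`, `repo_id` and the dict-shaped entities are hypothetical.

```python
class EntityCache:
    """Hypothetical in-memory entity cache with O(1) id lookups.

    A secondary index maps (property, value) -> set of ids, so filtered
    loads on indexed properties never sort or scan the whole cache.
    """

    def __init__(self, indexed_properties=("repo_id",)):
        self._by_id = {}
        self._indexed = tuple(indexed_properties)
        self._index = {}  # (property, value) -> set of entity ids

    def add(self, entity):
        entity_id = entity["id"]
        # If this id was cached before, drop its stale index entries first.
        old = self._by_id.get(entity_id)
        if old is not None:
            for prop in self._indexed:
                if prop in old:
                    self._index.get((prop, old[prop]), set()).discard(entity_id)
        self._by_id[entity_id] = entity
        for prop in self._indexed:
            if prop in entity:
                self._index.setdefault((prop, entity[prop]), set()).add(entity_id)

    def load(self, ids=None, conditions=None):
        """Return {id: entity} for cached entities matching ids and conditions."""
        candidates = None
        if ids is not None:
            candidates = {i for i in ids if i in self._by_id}
        for prop, value in (conditions or {}).items():
            if prop in self._indexed:
                matched = self._index.get((prop, value), set())
            else:
                # Unindexed property: fall back to a linear filter, but only
                # over the current candidates when we already have some.
                pool = candidates if candidates is not None else self._by_id
                matched = {i for i in pool if self._by_id[i].get(prop) == value}
            candidates = matched if candidates is None else candidates & matched
        if candidates is None:
            candidates = set(self._by_id)
        return {i: self._by_id[i] for i in candidates}
```

Usage would look like `cache.load(conditions={"repo_id": 1})`. The design choice is simply trading a little memory for the index so that lookup cost no longer depends on how many repositories' worth of objects are already sitting in the cache.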
After that, I think we're into micro-optimization territory, or we'd need a fairly significant refactor to get any further gains. Not advocating that latter option, just saying.
Note that I'm tagging this sprint 9, because the full repo-sync system isn't nearly as important as the incremental update that'll be triggered by pushes (which still needs to be written... oi).