Project:Version Control API -- Subversion backend
Version:6.x-1.0-beta2
Component:Code
Category:task
Priority:normal
Assigned:sdboyer
Status:closed (fixed)

Issue Summary

Log fetching in the Subversion backend is currently dog slow. That's because 'svn log' doesn't tell us much of the information that we want to provide to the Version Control API:

  • It doesn't tell us if a path item is a file or a directory.
  • It doesn't tell us the revision in which an item was last changed.
  • If an item was deleted or modified and its parent directory has been moved since the last change, it doesn't tell us the previous path of that item.

That makes it necessary to call 'svn info' multiple times, which makes the log fetching process really slow. We don't really want to miss out on any of the information, but there are a few ways it could be made faster:

  • Given a start state, we could track the item information ourselves instead of querying 'svn info'. That wouldn't work in general, only when the log starts from revision 1; in that case, only "A" (added) entries would need to be queried. However, starting from revision 1 is likely the most expensive fetch (the initial import), so the gains matter most there anyway. Also, even if only some tracked items can be reused instead of calling 'svn info', the procedure would still be faster. In order to track items, it would be necessary to find out the sort direction of the input revisions (ascending or descending), sort them for processing, and possibly reverse the sorting when processing is finished.
  • It would be possible to make svnlib_info() take more detailed parameters: instead of one fixed $url_revision parameter, it could accept a $repository_urls array of the form array(array('url' => $url, 'url_revision' => $url_revision), ...), and all of those would be passed to a single 'svn info' invocation. Of course, that doesn't make sense in combination with the '-r' option (which is used to retrieve previous item states), but we could bundle all 'current item' states in one go. I would guess those make up a third to (at most) half of the 'svn info' calls, so this should speed up every log fetch considerably.
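The two approaches above can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP of svnlib-deluxe.inc, and every name in it (ItemState, LogEntry, track_items, query_info, build_info_commands) is hypothetical. It assumes that query_info stands in for one 'svn info' call, that log entries are reduced to (action, path) pairs as listed by 'svn log -v', and that url_revision is used as a peg revision (URL@REV). track_items detects the input order, processes revisions in ascending order, queries only "A"/"R" entries plus any item it isn't already tracking, and restores the original order. build_info_commands bundles all 'current item' requests into one 'svn info' command, while requests that need '-r' still get one command each.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ItemState:
    path: str
    kind: str          # "file" or "dir"
    last_changed: int  # revision in which the item was last changed


@dataclass
class LogEntry:
    revision: int
    changes: List[Tuple[str, str]]  # (action, path) pairs, e.g. ("M", "/trunk/a.txt")
    # Filled in by track_items(): state of each changed path *before* this revision.
    previous: Dict[str, Optional[ItemState]] = field(default_factory=dict)


# Hypothetical stand-in for one 'svn info' invocation: (path, revision) -> ItemState.
QueryInfo = Callable[[str, int], ItemState]


def track_items(entries: List[LogEntry], query_info: QueryInfo) -> List[LogEntry]:
    """Approach 1 (sketch): track item states ourselves, querying only when needed."""
    if not entries:
        return entries
    # Remember the input direction so it can be restored at the end.
    descending = entries[0].revision > entries[-1].revision
    ordered = sorted(entries, key=lambda e: e.revision)
    tracked: Dict[str, ItemState] = {}

    for entry in ordered:
        for action, path in entry.changes:
            if action == "A":
                entry.previous[path] = None  # nothing existed here before
            else:
                old = tracked.get(path)
                if old is None:
                    # Not tracked (log did not start at r1, or the item arrived
                    # inside a copied directory): ask svn about the prior state.
                    old = query_info(path, entry.revision - 1)
                entry.previous[path] = old

            if action in ("D", "R"):
                # Forget the item and everything below it.
                for p in [p for p in tracked if p == path or p.startswith(path + "/")]:
                    del tracked[p]
            if action in ("A", "R"):
                # 'svn log' does not say whether a new item is a file or a
                # directory, so added items still cost one query each.
                tracked[path] = query_info(path, entry.revision)
            elif action == "M":
                kind = entry.previous[path].kind
                tracked[path] = ItemState(path, kind, entry.revision)

    return ordered[::-1] if descending else ordered


def build_info_commands(
    requests: List[Tuple[str, int, Optional[int]]]
) -> List[List[str]]:
    """Approach 2 (sketch): requests are (url, url_revision, operative_revision).

    operative_revision is None for 'current item' states; those are bundled into
    a single 'svn info' call using peg revisions (URL@REV). Requests that need
    '-r' (previous item states) still get one invocation each.
    """
    current = [f"{url}@{rev}" for url, rev, op in requests if op is None]
    commands = [["svn", "info", "--xml", *current]] if current else []
    for url, rev, op in requests:
        if op is not None:
            commands.append(["svn", "info", "--xml", "-r", str(op), f"{url}@{rev}"])
    return commands
```

Because untracked items fall back to query_info, the tracking sketch still produces previous states when the log doesn't start at revision 1; it just saves fewer calls. The batching function only builds argument lists, so they can be handed to whatever process runner the backend uses.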

Those two approaches combined will probably make log fetches reasonably fast... I think so, at least. Other good ideas appreciated, and if someone wants to take a shot at this, you can find the sources in versioncontrol_svn/svnlib/svnlib-deluxe.inc. It's independent of Drupal so you can easily test it with the versioncontrol_svn/svnlib/bin/svnlib-detailed-log.php script. Your chance for instant rockstardom!

Comments

#1

Sam Boyer has taken on a complete rework of the SVN invocation part, which can be found at http://github.com/sdboyer/svnlib/tree/master. The principles from here mostly still apply, though: some will be implemented directly there, while others still require extra effort.

#2

Version:5.x-1.1» 6.x-1.0-beta2
Assigned to:Anonymous» sdboyer

For the sake of avoiding possible duplication of work, let's mark this issue as assigned to Sam.

#3

Just FYI, this is nearly done. I'm most of the way through refactoring it to stack as many data requests into as few invocations as possible.

Expect it next week, as this week is finals. I'll be committing the changes to a new branch, as this pretty radically overhauls things.

#4

Status:active» fixed

The new changes are in the new branch, although it's not quite production-ready yet. It's at least 3-4x faster, and memory usage is more or less constant (as opposed to increasing linearly with the size of the fetch operation).

#5

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
