Currently, there are two major entities that are solely handled by the backends (for the relevant part, at least):

  • Items (= files and directories), including everything that has to do with commit actions. Those are handled by the backends because when designing the API, we decided to keep the possibility of writing backends that don't store items and commit actions in the database but retrieve them on the fly.
  • Branches. Not branch operations (the action that is performed when a branch is created, moved or deleted) but the branch itself, as a "state" of an item or when assigned to a commit. Those are handled by the backends mainly because branches are a virtual concept in Subversion, and are best evaluated from the directory (e.g. /branches/kde/3.5/blablabla).
  • ...right, it's the same thing with tags, but that doesn't concern us here.

So, Version Control API has this core function named versioncontrol_get_commits() which retrieves a set of commits given a set of commit constraints. The set of possible constraints also include 'path' and 'branch', as those are pretty important things to look for.

The problem with the current approach is that, in order to filter by path and/or branch, versioncontrol_get_commits() needs to call versioncontrol_get_commit_actions() and/or versioncontrol_get_commit_branches() for each and every commit that has passed the other criteria that have already been filtered on in the SQL query. So if versioncontrol_get_commits() is called with a path constraint only, it will get the commit actions array for a lot of commits in the database in order to figure out if they affected a file that satisfies the path constraint. And that is an incredibly inefficient approach, or at least it has the potential to be very inefficient in certain conditions - the 'directory' column in {versioncontrol_operations} helps to pre-filter the commits in order to minimize the impact of this implication, but I assume that this will be a major hit on very large repositories like the one on drupal.org.

There are a few options on how this issue could be handled:

  1. Don't provide 'path' and 'branch' constraints. Well, ...no. Not a feasible option.
  2. Centralize all data where we want to provide constraints. Actually, that will probably be a good idea for items and commit actions (see Major Issue #1 for more info on that), but branches might still stick on a backend-level, plus it's not extensible for non-backend modules. This approach can only be part of the solution.
  3. Open up the internal query builder to backend and non-backend modules, so for example versioncontrol_git can do a (left) join on {versioncontrol_commit_branches} and filter out commits on Git repositories where a commit doesn't exist in the commit branches table. Also, versioncontrol_project (assuming it already has the planned nid/vc_op_id mapping table) could do an (inner) join so that it doesn't need to retrieve commit ids and pass them as constraint but rather lets the database do that work.
  4. As a fourth option, I could also imagine a mix of 2. and 3. - centralization and no extensibility for backend modules (items and branches move into the API module, and we don't need left joins) while providing hooks for higher-level modules so that inner joins with other tables are possible.

I'll look whether option 4. can be done, if not then option 3. is the only feasible one (even if the most complex of all).

Comments

jpetso’s picture

Component: API module » Code

If #216371: MI#1: Move item handling and commit branches into the API module is implemented, this will be an optional improvement and can be downgraded to "normal" priority.

ezyang’s picture

Component: Code » API module

For any particular commit or range of commits, we should be able to call a function and retrieve that commit, as well as all commit actions associated with it. I'm not sure which item this corresponds to for yours, but with a simple JOIN we should be able to retrieve all of the necessary information for the commit and commit actions.

To get the commit actions into our internal format, we can factor out the post-processing code [versioncontrol_vcs]_get_commit_actions so that it doesn't call the database; we pass the commit actions data to it and it crunches it into the proper form.

(At times like this, I really wish versioncontrol was OOP. :-)

jpetso’s picture

Retrieving the commit actions directly by the search query is close to impossible because the items that need to be gathered are also a filter criterium, thus a search query will only retrieve part of those commit action items.

As for the original issue, it's now close to trivial to have option 4. implemented. The query stuff should all be working (although HEAD is currently broken as the port to the new database scheme is still in progress) and can easily be extended for external modules like versioncontrol_project. All we need is a documented format for extending the query - with the format preferably being close to Views and/or Crell's sandbox query builder for Drupal 7, and a decision whether we want external modules to include fields in the operation array or not. The coding part of this will be very easy now.

jpetso’s picture

Version: 5.x-1.0-rc4 » 5.x-2.x-dev
Component: Code » API module
Priority: Critical » Normal

Version Control API 5.x-2.x has centralized items and branches/tags (labels), and the port is finally done. (Funny to see how the comment #3, in *March*, stated how little more would be missing, haha.)

The issue is thus reduced to letting extension modules do inner joins on the query - more specifically, versioncontrol_project with its nid/vc_op_id mapping, so that we can filter efficiently by project, and more importantly, so that we can keep old associations even if the project moves. 5.x-2.x-alpha1 has a new condition modification hook, all we need is to replace (or extend) this with a hook that can specify tables to be joined. Currently, I'm leaning towards a less-beautiful-but-also-less-work solution that just exposes the current internal condition format to implementers. For Drupal 7, we could still use Crell's query builder which has landed in core in the meantime.

jpetso’s picture

Version: 5.x-2.x-dev » 6.x-1.0-beta4
Status: Active » Fixed

Done! Committed in commit 171682. Whoo!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

  • Commit 6e0a6d4 on 6.x-1.x, repository-families, drush-vc-sync-unlock by jpetso:
    #216373 by jpetso: Direct hook access to the operations query. That...

  • Commit 6e0a6d4 on 6.x-1.x, repository-families by jpetso:
    #216373 by jpetso: Direct hook access to the operations query. That...