As more backends are being implemented, it becomes clearer that 1. backends are able to retrieve and manage commit log data in the database, and 2. it's unlikely that backends appear that don't store their item revisions in the database, and that this would probably be a major performance hit.

As pointed out by ezyang in the Mercurial backend's README.txt file, we could make things easier for backends if some of the item handling code and tables is moved into the API module. As an additional bonus, we could simplify and speed up queries for the retrieval of commits and maybe branch/tag operations. Given the similarity between the different backends' item revision tables, I could imagine to move the tables to versioncontrol.install like this (MySQL version):

db_query("CREATE TABLE {versioncontrol_item_revisions} (
  item_revision_id int unsigned NOT NULL default 0,  /* random iterative number */
  vc_op_id int unsigned NOT NULL default 0,  /* id of the associated commit */
  path varchar(255) NOT NULL default '',
  type tinyint NOT NULL default 0,  /* file, directory or deleted file */
  PRIMARY KEY (item_revision_id),
  UNIQUE KEY (vc_op_id, path)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_item_source_revisions} (
  item_revision_id int unsigned NOT NULL default 0,
  source_item_revision_id int unsigned NOT NULL default 0,
  PRIMARY KEY (item_revision_id, source_item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

The 'action' property can be left out of {versioncontrol_item_revisions} as it can be determined from the set of source items and the current item, and leaving it out has no impact on performance. Having those tables in the API module would also make it possible to retrieve item history within the API module, taking this burden off the backend modules. It would also encourage backend modules to go the extra mile for providing correct item source revisions, because incomplete or wrong source revisions will show earlier (in form of action descriptions and broken history logs) and can't be hidden by easy workarounds as is the case now.

So, all good stuff, right? The only drawback is that this *again* is a major change on the database tables in both API and backend modules, and as such needs to be done before Version Control API is rolled out on drupal.org. Thus (and also because of the performance improvements) this is pretty much a critical issue, and one that should be done as soon as possible.

Comments

jpetso’s picture

Oh baby! I just had a great idea: now that we have the same kind of identifier for all operations (commits, branch ops, tag ops), we can actually unify the table format for associating any kind of operation with their items - so instead of specifying the vc_op_id of the commit in the item revisions table and ignoring that id when referencing it from branch or tag ops (that was a bit unclean anyways), we can leave the id out of the item revisions table and have all mappings in one single operation/items mapping table.

Let's revise the schema given above, and make it more like this:

db_query("CREATE TABLE {versioncontrol_operation_items} (
  vc_op_id int unsigned NOT NULL default 0,
  item_revision_id int unsigned NOT NULL default 0,
  PRIMARY KEY (vc_op_id, item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_item_revisions} (
  item_revision_id int unsigned NOT NULL default 0,  /* random iterative number */
  repo_id int unsigned NOT NULL default 0,
  path varchar(255) NOT NULL default '',
  type tinyint NOT NULL default 0,  /* file, directory or deleted file */
  PRIMARY KEY (item_revision_id),
  UNIQUE KEY (repo_id, path)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_item_source_revisions} (
  item_revision_id int unsigned NOT NULL default 0,
  source_item_revision_id int unsigned NOT NULL default 0,
  PRIMARY KEY (item_revision_id, source_item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

That way, we can get rid of tables like {versioncontrol_*_commit_items}, {versioncontrol_*_item_tags} and {versioncontrol_cvs_item_branch_points}, plus everything is managed by the API module and backends don't need to care about any of those. Yay. Now it just needs to be implemented...

jpetso’s picture

Renaming, it's a two-part issue now (this is the other part).

jpetso’s picture

Title: Major issue #1: Move item handling into the API module » Major issue #1a: Move item handling into the API module

I said, *renaming*!

jpetso’s picture

Component: Code » API module

I think there should also be the third column in the {versioncontrol_item_revisions} table that is specified for items by the API, namely this one: revision varchar(255) NOT NULL default ''

It's only being actively used by CVS, but it shouldn't really matter for performance and makes things a bit easier still, having all "core properties" in the table that is managed by the API module.

jpetso’s picture

Ok, we need one another revision - Subversion manages to mess up a lot of expectations that apply to other version control systems. That includes the following points:

  • Replaced items: a commit action can not only contain current and source items but also a replaced item, saying that this item was deleted and added (or copied/moved) in the same revision. On the API side, I want this to be just another element in the commit actions array ('replaced item') which will of course be optional, and in the database we need to treat it separately from the source items so that the replaced item can be restored separately on loading the commit action.
  • When moving directories, child items are only moved implicitely in Subversion. This means that modified and deleted items don't necessarily need to have similar source and current item paths, and that in turn makes it impossible for us to calculate the 'action' based on the contents of a commit action array, it needs to be stored explicitely.

Both 'action' and 'replaced item' are properties of the commit action, so the solution might be a {versioncontrol_commit_actions} array in addition to the other ones that were mentioned before.

db_query("CREATE TABLE {versioncontrol_commit_actions} (
  vc_op_id int unsigned NOT NULL default 0,
  action tinyint NOT NULL default 0,
  current_item_revision_id int unsigned NOT NULL default 0,
  replaced_item_revision_id int unsigned NOT NULL default 0,
  PRIMARY KEY (vc_op_id, current_item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

However, taking ezyang's input from the other issue into account, we might not want to introduce extra tables for specific types of operations (commit, in this case) and rather have some denormalization for performance and less table join complications. It would be just as possible to merge {versioncontrol_operation_items} and {versioncontrol_commit_items} into one table, leaving 'action' and 'replaced_item_revision_id' unused for branches and tags. Together with the 'revision' column for {versioncontrol_item_revisions} mentioned in the previous comment, we would get something like this (listing all item related tables as current table scheme proposal):

db_query("CREATE TABLE {versioncontrol_operation_items} (
  vc_op_id int unsigned NOT NULL default 0,
  item_revision_id int unsigned NOT NULL default 0,
  action tinyint NOT NULL default 0, /* only used for commit operations */
  replaced_item_revision_id int unsigned NOT NULL default 0,  /* only used for commit operations */
  PRIMARY KEY (vc_op_id, item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_item_revisions} (
  item_revision_id int unsigned NOT NULL default 0,  /* random iterative number */
  repo_id int unsigned NOT NULL default 0,
  path varchar(255) NOT NULL default '',
  revision varchar(255) NOT NULL default '',
  type tinyint NOT NULL default 0,  /* file or directory (either normal or deleted) */
  PRIMARY KEY (item_revision_id),
  UNIQUE KEY (repo_id, path)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_item_source_revisions} (
  item_revision_id int unsigned NOT NULL default 0,
  source_item_revision_id int unsigned NOT NULL default 0,
  PRIMARY KEY (item_revision_id, source_item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

That would probably be the better solution... I'll try this out by implementing versioncontrol_svn with a similar structure (which will also be beneficial when updating to the centralized structure later).

jpetso’s picture

Title: Major issue #1a: Move item handling into the API module » Move item handling and commit branches into the API module

Progress! Or better even - I'm now pretty sure that I've developed the best schema that is possible for this purpose. Features:

  • Virtually no redundancies or unused columns.
  • Increases the number of "core" tables only by one - 3 tables removed by centralizing into {versioncontrol_operations} (commits, branch_ops, tag_ops), one table repurposed (branches -> labels), and 4 tables newly introduced (operation_labels handles commit branches as well as branch and tag name associations, the other three are for item storage).
  • Handles both items and commit branches (therefore also obsoleting Major issue #1b).
  • can be joined in the same ways for commits, tags and branches (given a NULL item and label - or 0, technically - so also operations without items or branches/tags can be joined)
  • and generally handles all use cases that I could think of. Please try to find stuff that I didn't think of.

Commit actions are a combination of operation items and item relations - but they'll disappear anyways in the API, a more detailed set of items can (and will) replace any commit action. So we'll have just versioncontrol_get_operation_items($operation, $include_source_items = FALSE) instead of versioncontrol_get_commit_actions(), versioncontrol_get_branched_items() and versioncontrol_get_tagged_items().

Yes, I am stoked about this schema. You may not like the fact that it's very abstract, but we'll just have the API a bit more concrete (not many changes are planned anyways, compared to now) and considering that even backends hardly have to care about item management anymore, the abstraction shouldn't hurt anyone but hunmonk who does the cvslog -> versioncontrol porting script (sorry, dude).

Without further ado, I proudly present the new "operations" portion of versioncontrol.install. (I am inclined to say "my best piece of architecture yet", but maybe I'm just temporarily overenthusiastic, let's wait and see ;)

(...other actions...)
define('VERSIONCONTROL_ACTION_REPLACED', 6);
define('VERSIONCONTROL_ACTION_SPECIAL',  7); // e.g. SVN's revprop changes, in case there's only those and no other modifications (which would be preferred over "special")

db_query("CREATE TABLE {versioncontrol_operations} (
  vc_op_id int unsigned NOT NULL default 0,
  type tinyint unsigned NOT NULL default 0,
  repo_id int unsigned NOT NULL default 0,
  date bigint NOT NULL default 0,
  uid int unsigned NOT NULL default 0,
  username varchar(64) NOT NULL default '',
  revision varchar(255) NOT NULL default '',
  message text,
  PRIMARY KEY (vc_op_id),
  KEY type (type),
  KEY repo_id (repo_id),
  KEY date (date),
  KEY uid (uid),
  KEY username (username),
  KEY revision (revision)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_operation_labels} (
  vc_op_id int unsigned NOT NULL default 0,
  label_id int unsigned NOT NULL default 0,
  action tinyint NOT NULL default 0, /* VERSIONCONTROL_ACTION_{ADDED,DELETED,MOVED} for branches and tags, VERSIONCONTROL_ACTION_MODIFIED for commits (doesn't matter anyways) */
  PRIMARY KEY (vc_op_id, label_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_labels} (
  label_id int unsigned NOT NULL default 0,
  repo_id int unsigned NOT NULL default 0,
  name varchar(255) NOT NULL default '',
  type tinyint NOT NULL default 0, /* VERSIONCONTROL_OPERATION_BRANCH or VERSIONCONTROL_OPERATION_TAG */
  PRIMARY KEY (label_id),
  UNIQUE KEY (repo_id, name, type)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_operation_items} (
  vc_op_id int unsigned NOT NULL default 0,
  item_revision_id int unsigned NOT NULL default 0,
  PRIMARY KEY (vc_op_id, item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_source_items} (
  item_revision_id int unsigned NOT NULL default 0,
  source_item_revision_id int unsigned NOT NULL default 0,
  action tinyint NOT NULL default 0,  /* now stored per item relation, not per commit. that makes it possible to combine "modified" and "replaced" actions in one single structure, and extensible for further actions even */
  PRIMARY KEY (item_revision_id, source_item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

db_query("CREATE TABLE {versioncontrol_item_revisions} (
  item_revision_id int unsigned NOT NULL default 0,
  repo_id int unsigned NOT NULL default 0,
  path varchar(255) NOT NULL default '',
  revision varchar(255) NOT NULL default '',
  type tinyint NOT NULL default 0,  /* file or directory (either normal or deleted) */
  PRIMARY KEY (item_revision_id),
  UNIQUE KEY (repo_id, path)
) /*!40100 DEFAULT CHARACTER SET utf8 */");
jpetso’s picture

Title: Move item handling and commit branches into the API module » MI#1: Move item handling and commit branches into the API module

Cutting off the "Major Issue #1:" completely makes the issue queue look odd, as this issue doesn't get the importance that it deserves and the major issues #2 and #3 lack a first one. Trying to find the golden middle way. ("Mission impossible"? ...well, not quite, but close.)

jpetso’s picture

Coding is now in progress, the first part (database changes in the API module) has been committed as commit 103150.

jpetso’s picture

Version: 5.x-1.0-rc4 » 5.x-1.2

More progress:

  • The search query has been unified and ported in commit 103866. I tend to find this very cool :)
  • For more performant queries I decided to have searchable source items "cached" in the {versioncontrol_operations} table, see commit 103646 for the exact changes and explanation.

Todo:

  • Merge versioncontrol_get_commit_actions(), versioncontrol_get_branched_items() and versioncontrol_get_tagged_items() into one single function versioncontrol_get_operation_items(). Commit actions will be an array of items (instead of actions) with source and replaced items being just another property. Those should be able to be retrieved by a new function, versioncontrol_add_source_items(&$operation_items), and probably a TRUE/FALSE parameter in versioncontrol_get_operation_items() which causes it to automatically call this function (or combine the queries and add source/replaced items by itself, whatever is better). The Subversion backend has the closest implementation of what will become get_operation_items() - still named get_commit_actions() there, of course - perhaps we should start from there.
  • versioncontrol-backend.inc needs to be adjusted as well, so that versioncontrol_insert_operation() inserts the operation items by itself. It should also sort the items alphabetically before inserting them.
  • That should be the two major tasks for the API module itself - afterwards, we need to port the VCS backends (starting with the CVS one) and port all get_commit_actions() usages to get_operation_items().
jpetso’s picture

The first and second point on the above todo list are essentially done - versioncontrol_get_operation_items() and related functions work, and Commit Log shows items again. versioncontrol-backend.inc has been adapted to include versioncontrol_insert_operation() which replaces the former commit/branch/tag insertion functions, so I was able to get rid of lot of code :)

Some code in the CVS and SVN backends were also deleted, but porting those is not yet done, plus Commit Log is currently plastered with TODO items originating from the unification of commits, branches and tags into operations as well as the introduction of labels and the removal of the commit directory item. Those need to be tackled, parts of Commit Log might need a major overhaul. versioncontrol_project is in a bit of an unknown state - it should essentially work, but needs a review once API module, backends and Commit Log are done.

Here's an action plan on how we might be able to get it working again:

  1. Items need a 'selected_label' property (similar to a single $operation['labels'] element) so that the get_current_item_{branch,tag}() functions can keep working. (In fact, with the label attached to the item, those functions will be unneeded and can be deleted.) Background: Due to the fact that items are now not created by the backends anymore, the backends also cannot attach the branch/tag to operation items anymore, but those will still be needed for navigating in repository browsers. So instead of doing get_current_item_{branch/tag}() on caller request, we might rather attach that 'selected_label' property on loading the item and let the backend provide them via callback (given either the operation in get_operation_items() or the successor item in get_source_items() as context). Done.
  2. Related to that, I just noticed that versioncontrol-backend.inc still has its original branch management functions (near the end of the file) *and* a newer _versioncontrol_ensure_label(). Reduce that to only one version, and determine if the insertion functions still need to be public. (The retrieval functions will at least be necessary in the new callback mentioned above. Done.
  3. Implement the mentioned callback in the CVS backend, so that we can delete versioncontrol_cvs_get_commit_actions() and versioncontrol_cvs_get_{branched,tagged}_items() with a good conscience. For the SVN backend, a stub implementation (returning NULL) should be sufficient as it doesn't (yet?) support branch and tag emulation anyways. Done.
  4. Hide the 'selected_label' item property from both callers and backends, and implement lazy loading by referencing either the operation (versioncontrol_get_operation_items()) or source item in item-based navigation (e.g. versioncontrol_add_source_items(), versioncontrol_get_directory_items() or the yet-to-be-written versioncontrol_add_successor_items()) if the 'selected_label' has not yet been determined by the backend. Operation or item should be recursively queried for the selected label up to the start of the navigation when the selected label is known in any case (or not, with NULL). Done.
  5. Add versioncontrol_add_successor_items() as the opposite of versioncontrol_add_source_items(). Implement versioncontrol_get_item_history() with the 2.x API (using versioncontrol_add_source_items() or the {versioncontrol_source_items} table directly, probably also needs some functionality that goes into the successor item direction). Also, rework versioncontrol_get_all_item_{branches,tags}() so that they provide requirements (for the backends) and documentation that plays nicely with all that label stuff. Done.
  6. Move the lines-changed information in the {versioncontrol_source_items} table so that we can scrap get_commit_statistics(), whose API is not nice anyways. That way, we can provide the information where it makes sense (in the item arrays) and get rid of the {versioncontrol_cvs_item_revisions} table, which only contains the lines-changed columns by now. Port the (two, if I remember correctly) usages get_commit_statistics() in Commit Log and versioncontrol_project to whatever it looks afterwards. Done.
  7. Finish the database update for CVS (needs to move commit branch associations to {versioncontrol_operation_labels}). Done.
  8. Port the CVS backend's log parser and xcvs-* trigger scripts to the new operation array format. See the versioncontrol_get_operations() function docs and the 2.x API changes page. Test and have a look if data gets transferred into the database correctly. (Commit Log should be able to show most of the stuff correctly, too.) Special areas of interest - having changed in the 2.x branch - are labels (previously branches/tags) and source items, and of course commit actions have been replaced by arrays of operation items (so the layout might be slightly different here, too). Done.
  9. Rework Commit Log so that it displays all three kinds of operations - commits, branches and tags - correctly and in a way that they make sense. Might also involve changing the output format a bit, as that is a) inflexible and too crowded anyways, and b) only suited for commits in the current form. Done.
  10. Review Commit Restrictions and versioncontrol_project if they need porting to the 2.x API (the existence of commit specific functions is a nice indicator for that), and if so, get them ported and working. I think the documentation (bits in hook_versioncontrol.php and versioncontrol_fakevcs.module) also needs some updating. Done.
  11. Done! Version Control API is now restored to the 1.x branch's fully working state. (Right?) Proceed with #216373: Major issue #2: Direct hook access to the query.

I'm going to become a big fan of small-chunked checklists. Never leave your house without knowing what exactly to do next!

jpetso’s picture

Ok, I implemented point 1, 2 and 3 on the above list (now updated with shiny number ordering), but it kicks up a few new questions. Essentially, I'm still not really happy with the way that Version Control API tackles labels (= branches/tags), although after my commit today it's a lot better than it was before. Needs more work still, see point 4 and 5. That's the final architectural challenge, when we get that one right, everything else is pretty much a breeze.

Essentially, as long as we need to query the backends for items, it's not yet done... at the moment we're currently stuck between "store lots of item data and operate independently on it" on the one hand and "still leave some important data to the backends" on the other one, which is a bit awkward in terms of consistency and clarity of the used concepts. Handing over all item data to the backends is not an option (that's what 1.x did, and had severe issues in terms of performance and queryability) so I think we should really take the final step and store all label associations for all items too. We might need better item/operation management functions - update functions, that is - but I think it's the only way this thing will be able to work elegantly.

So, next up: a final database table for {versioncontrol_item_labels}. Anyone want to try coming up with a schema? (Hint: should be easy!)

jpetso’s picture

Ok ok ok, now I remember why I decided not to have that {versioncontrol_item_labels} table formerly - it's just too much data to manage. Like, Mercurial stores a line that goes something like
0425331c06ea63efa363df4c26078cf45a460197 2.0.3
which specifies the 2.0.3 tag. Starting from that commit (represented by the SHA-1 id), we would need to retrieve all items in the repository at that given commit, and write a tag association for each of those items in the database. Plus for branches it's even worse, because all items *and* all source revisions of those items would need to be associated to a branch when a new one is created. And when a label is being deleted (or a branch being merged), all of those revisions would have to get that label removed again. Might be a bad idea after all, so let's better leave it at that.

Still, the thing that bothers me is that we need to determine and propagate a selected label, and currently that's done at the time of item retrieval. That's bad, because a lot of items can be retrieved at a given point in time, while the (mostly unused) cost for determining the selected label might not be negligable. For example, if Commit Log retrieves an operation via versioncontrol_get_operation() together with its items via versioncontrol_get_operation_items(), and decides to call versioncontrol_add_source_items() for all those operation items in the 10 commits shown by default, that's already a sizable amount of items. Should a backend decide to back up its branch/tag assumptions by directly invoking the VCS executable, that will be a slowdown that we really don't want.

Of course, the solution to that is lazy loading of the 'selected_label' property. Then again, when determining the selected label coming from another (source or successor) item, that item also needs to know either its own selected label or at least how it can get to that. For a file log, we need to propagate that label finding stuff, and it's unrealistic to keep a record of how to get from any given item to the whole selected item history. (That would perform badly, too.) Rather than that, each item should only need to hold the operation or item that it got the selected label from, and query that one, which in turn queries its predecessor, and so on. In other words, by-reference storage of predecessor items, otherwise the same item would determine its selected label multiple times (thanks to PHP's copy-on-write semantics for arrays).

Now this whole thing, lazy loading and reference-copying instead of value-copying of items, that just screams for objects. But it's not yet time for that, needs to wait until the next major rework of Version Control API. So for the time being, as a sufficient workaround for not having objects, let's store the origin of an item (operation, source/successor item, or none of that required if there's already a selected label) by reference in each item, add the label information by altering that item, and hope that the caller doesn't copy around items all too much before the information is being retrieved. For that and lazy loading to work, 'selected_label' (xor its origin information until 'selected_label' is retrieved) needs to be private to the API module, and the selected branch/tag retrieval functions have to be revived as versioncontrol_get_selected_label(&$item).

With that in place, the whole thing should work at last. Sorry for the confusion spread afterwards, and I'll update the bullet list to reflect the current state of things. Now for real. Dude!

jpetso’s picture

The SVN backend is now fully ported, and back at its 1.x feature set. Which is not all that much (no hook scripts, no branch/tag emulation support, no functional account/password export), but it's enough to populate 2.x databases with new commits again. Which means that you're not dependant on my database dumps anymore. Way to go! I updated the bullet list and removed the SVN tasks from there.

jpetso’s picture

I just saved the world with a rather large commit solving the bulk of action point 5 from the list in comment #10. That was the last major hurdle, everything is now going to be fine. Really. The vast majority of the API, except for the three functions remaining on the list (get_all_item_{branches,tags}() and the statistics stuff) should now be more or less stable, and more importantly, works again for Commit Log purposes (and others) if I'm correct.

I also tried out how easy it is to write an (experimental) repository viewer, and apart from the get_item() function that I also added, I'd like to add one more wishlist item for the 2.0 release:

- Add a function called versioncontrol_fetch_item_operations() which searches the database for operations associated to a given set of item revisions (say, pretty similar to versioncontrol_fetch_source_items()) and adds those as an 'operations' element to the item arrays. Use cases: 1. for a given item history (which only contains the raw item arrays), we need the associated operations so that an item history log viewer can display the messages for each revision. and 2. pretty much every repository viewer displays the time of last modification and the associated author next to the item, some also display the commit message - of course, those are all operation properties and will not go into into an item array natively. So we need to retrieve operations for a given set of items, hence the aforementioned function. Done.

jpetso’s picture

Implemented versioncontrol_fetch_item_commit_operations() (note the slight rename for correctness purposes), so the additional point from the last comment is done already. The repository viewer is now out in the wild - although without a tarball release, for good reasons - and, as of today, makes use of this functionality. Which means it looks great and is nearly on par with other repository viewers, apart from the tons of features that it doesn't have.

Anyways, versioncontrol_get_all_item_{branches,tags}() really needs to be redefined now, see bullet point no. 5 in the above comment #10. Enough feature fun, let's get back to the Real Work (tm)!

sdboyer’s picture

So. Finally trying to ACTUALLY help out :)

I did a quick patch that added the versioncontrol_backend_implements() calls and updated some of the docblocks accordingly. It's my impression that, since we're abandoning the branch/tag distinction in favor of labels (yar for that), there may be value in collapsing them into the same function sometime later, but I suspect that a) raises other issues and more importantly b) is pretty easy to do when the time comes, so I just left it.

Yeah, I definitely had some moments of screaming for OO. I also had the funny reaction that, since I'm now used to the stupid-crazy speed of git, that git might actually be slower if we move it into the database than just querying the backend directly :P Although tbh, that's actually something that I think might be easily-ish done when the OO reworking is done. I dunno though, I've not grokked enough of this yet.

sdboyer’s picture

I'm not clear on how #6 is supposed to work. It seems to me like we should only be moving that data into the generic API table if it's something that all the backends provide. Or, put another way, there's other data that the other backends are still providing in their own tables that's more than just the lineschanged data. Which means to me that we're not eliminating those tables/that method of garnering operation data entirely - so what are we gaining by moving the lineschanged info into the API table?

jpetso’s picture

The reasoning behind #6 (moving lines-changed data into the API module) was as follows:

  • This data is similar in structure throughout all version control systems, and in wide use among log viewers and notification mails.
  • If this data is stored in a central table, backends don't need to duplicate code for the same functionality.
  • Also, we can eliminate at least one database query per backend, maybe even more (depending on the API).
  • It might come in handy for Views integration, although that's a hypothetic guess atm.

We already provide capabilities that not all backends can handle (action types that don't map to CVS or SVN, log messages for branches and tags, ...anything else? ...) so the reason for backends storing their own "proprietary" information is more like "this data can't be (directly) used by callers or the API module". Lines-changed info, however, is directly usable without further requirements on the backends.

I thought we could get around the "not supported by backend X" issue by having a column that says if lines-changed info has been recorded, and two columns that capture the actual "lines added" and "lines deleted" values. (The "has been recorded" flag could also help to add that information if it's implemented at some later point and exists for some items already but not for all of them.)

That was my reasoning... I could have missed something though, do tell me if you disagree.

jpetso’s picture

I committed a different approach to versioncontrol_get_all_{branches,tags}() - also different to comment #16, see the associated commit message for more info. The CVS log parser is now also ported, though completely untested at the moment. (That's a bit of ignoring the original task order, but I just felt like it. Seems like we're making progress - slow, but steady :)

jpetso’s picture

There were no objections against moving the lines-changed information into the API module, so that's now in there too. Updating the xcvs-* scripts is nearly done, too.

jpetso’s picture

The CVS database update is complete! Now all data is accessible by the API module. Next up: porting (and testing) the xcvs-* scripts.

jpetso’s picture

The xcvs trigger scripts for commitinfo, loginfo and taginfo are now functional again. Unfortunately I forgot to port the posttag script last time - and today is late already, I'm outta here for now - so this one's pending still.

jpetso’s picture

xcvs-posttag is working again, tags are being recorded correctly. Which means the xcvs-* scripts are back to their previous state. Before proceeding to the Commit Log "unified commit/branch/tag display" rework, I'll try to implement tag (and branch) deletions properly in the xcvs scripts. We'll need that anyway for the release node integration.

jpetso’s picture

Tag (and branch) deletions are now properly recorded in the xcvs-* scripts, yay. Also, reviewed Commit Restrictions and versioncontrol_project: I fixed a couple of smaller issues, but overall there was little left to port.

So now, one issue left: Commit Log! (and some bits of documentation.) I'd say we're getting there? :D

jpetso’s picture

Commit Log is now working like it did before. That doesn't mean that it's perfect (some more modularization and refactoring would be cool) but in regard to this issue it's probably far enough to be marked as Done. Leaves a little bit of documentation, and then we can close off this era, er, I mean issue.

jpetso’s picture

Status: Active » Fixed

I completed the last part (documentation updates) of the last bullet point just now. Version Control API is fully operational again. This issue was in the works long enough, time to say goodbye and take on new challenges. Hurray for zero gravity!™

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

  • Commit 9ef86a0 on 5.x-2.x, 6.x-1.x, repository-families, drush-vc-sync-unlock by jpetso:
    First part of #216371: Update of the API module's database schema,...

  • Commit 9ef86a0 on 5.x-2.x, 6.x-1.x, repository-families by jpetso:
    First part of #216371: Update of the API module's database schema,...