As some of the ideas in #879600: Meta: introduce an activity stream separate from commit logs allude to, tracking branch and tag "operations" is just wrong, or at least improperly executed at present. But before we even clean out that table, we can replace all the VersioncontrolOperation stuff at the PHP level with VersioncontrolCommit logic. That'll make things MUCH more understandable for new people coming in to the code, in addition to removing a layer of almost-unnecessary abstraction.

Doing this basically entails renaming VersioncontrolOperation to VersioncontrolCommit, VersioncontrolOperationController to VersioncontrolCommitController, and updating the internals of each of those so that they make hard-coded assumptions that commits are being dealt with (e.g., 'type' = VERSIONCONTROL_OPERATION_COMMIT). Yes, that means screwing branch/tag operations completely. I'm fine with that, I've already detailed why I think they're a waste of time, and there's _nothing_ in our current code base (as far as I can see) that actually really utilizes them.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

marvil07’s picture

Priority: Normal » Critical

Marking as critical since this is going to be done before a critical.

dww’s picture

Release nodes need to know about branches and tags.

sdboyer’s picture

Sorry, let me clarify, since this post is now much more public world-facing...VCAPI 1.x was built to record branches/tags ("labels" is the abstracted term it uses) in two places. The important record - the one that Project*, etc. cares about - is the one that's represented by VersioncontrolBranch and VersioncontrolTag. Those are built from the {versioncontrol_labels} table, and this issue has nothing to do with those. What we're talking about eliminating here is the branch/tag-related entries that are made in the operations table - those are used to indicate that a given branch or tag was created, deleted, moved, whatever, not whether or not it actually exists. That's why this is related to the activity stream question - that information is perfectly fine to record, but as part of an activity stream. It's immutable historical log data, completely detached from the real

We have this problem in the first place because somewhere WAY back along the line, some wires got crossed. I think jpetso's original vision for the {versioncontrol_operations} table actually was as an activity stream. At least, that's how it behaves in a centralized VCS setting - and for a non-atomic system like CVS, it's the only way we're at all capable of referring to a single network operation as a "commit". Problem is, for a system where a commit is NOT a network operation, the table's role gets confusing - it's no longer a nonvolatile activity log, but becomes part of the repository content-tracking system. Really, if you look at the table schema, it's far better suited to being a content tracker than a pure log - generally (snarkily) speaking, you're doing it wrong if your application logic has foreign keys in the logging table.

dww’s picture

sdboyer and I hashed this out in more detail in IRC. To summarize:

- VersioncontrolBranch and VersioncontrolTag don't currently know anything about datestamps when the branch/tag they represent were created.

- Having tags and branches interleaved in the commit logs would be a major UI win when viewing the VCS activity for a given project. In fact, cvs.module would be doing this already if there were more of me and/or I had more time. We already collect all the data we need (there's a timestamp field in our existing {cvs_tags} table that's accurately maintained), I just never had time to fix the commit log pages to include the tags at the right spots. :( #339054: Store datestamps for tags so that CVS commit listings can interleave tags, too. for the interested reader.

- Although VCAPI has historically suffered from over-abstraction and over-generalization, the idea that every tag/branch has a date when it was created is both universally true across all VCS systems and useful VCAPI-wide. So, punting this into the backends would be a bad idea.

- Sure, in Git every tag just points to a commit, and we already know the date of that commit, so it's duplication to store the datestamp again, but that's really a backend implementation detail that the core API shouldn't have to care about. Different VCSs have different low-level implementation details of how they're stored. However, conceptually, every tag always has (at least) 3 things: 1) a name, 2) a set of files and versions it's pointing to, 3) a time it was created. CVS has 1 stupid way of storing this (writing another line to every single file that the tag touches). Svn has a different stupid way of storing this. Git has a pretty smart way. Bzr has a different way which I don't as much of the details about, but I bet it's relatively close to Git. None of this matters to VCAPI itself. We just need to know the name and the date. I don't think VCAPI itself cares about all the revision info that the tag points to. That's for the VCS itself to care about. We're not trying to build our own repo browser here. But the name and date are useful and important.

- The real problem Sam is getting at with this issue is that currently, due to identity crisis (and perhaps poorly named variables/functions/classes) some of the VCAPI backends are using this whole "operations" thing to actually store the primary data. That's no good. I agree with Sam we want the primary data about commits in a commit-specific place, and branches and tags in their own place, too. The kinds of data we care about for these are different, and it's confusing to try to overly abstract them in this way.

- But, we still need to know when tags/branches were committed, so either we have to add datestamp info to the VersioncontrolBranch and VersioncontrolTag classes, or we need to keep something like the "operation" subsystem. If you're trying to build a view of per-project VCS activity, it's going to be a pain in the neck to have a single view that interleaves commits and branches/tags in any kind of performant way. To prevent DB server killing queries from extremely complicated views (or having two separate views that you try to merge in PHP), you're going to end up needing a denormalized table somewhere.

We all agree that commits live in VersioncontrolCommit, branches in VersioncontrolBranch and tags in VersioncontrolTag.

The question is:

A) Kill VersioncontrolOperation entirely, but ensure that the branch and tag classes/tables store datestamps so we can still do reasonable activity streams.

B) Keep a single table in the core API to record the activity stream, but ensure no one is storing primary data in there and the whole API makes it clear this is basically a denormalized place to just keep the history timestamps in an easy-to-access format.

Ultimately, the denormalized table we're going to want to actually build most of the activity stream views is also going to need a project nid key in it. So that should probably live in vc_project, not vcapi itself. That'd be an argument in favor of (A). Don't complicate the core API with a premature optimization that doesn't actually solve the end problem we're going to face...

marvil07’s picture

Thanks for the great summary!

I just want to add a little comment that IIRC Sam and I are more less in consensus about this:

B.1) lazy loading data from external tables(activity/operation) on the tag class through some getters methods sounds reasonable to me

But, like mentioned, it would have problems with direct views integrations, I mean yeah, we would need to add a join to get the right timestamp from the external table, and I really do not know how much ould cost us in performance.

eliza411’s picture

Issue tags: +git phase 2, +git sprint 8

Tagging git phase 2 and git sprint 8 since it's cited as a pre-requisite for a vc_api release.

sdboyer’s picture

This has evolved quite a bit since the summary dww posted in #4. Short summary:

We're going to retain the versioncontrol_operations table, but move the VersioncontrolOperation class over to be VersioncontrolCommit. So it's basically just a bunch of string updates.

As for activity streams, that's going to be done by integration with Activity module. That makes sense because it also gets us integration with Rules, which we used to have (and we want anyway). That's being done over in #879600: Meta: introduce an activity stream separate from commit logs.

I'm working on this patch now.

sdboyer’s picture

Status: Active » Postponed
Issue tags: -git sprint 8 +git sprint 9

OK, I just grepped - around 360 references to 'operation' and 40 references to VersioncontrolOperation. That makes this more trouble than it's worth right now, there other things that really NEED doing.

Consequently, postponing to sprint 9. That may or may not mean delaying an official release of versioncontrol until sprint 9, but if it's JUST for this issue, then I'd be OK with that.

sdboyer’s picture

Issue tags: +git low hanging fruit

This could reasonably be called low-hanging fruit. It's got some grok time and whoever does it would definitely need to ask me questions about it, but it's still doable.

Jonathan Webb’s picture

Assigned: Unassigned » Jonathan Webb
Issue tags: -git low hanging fruit

Taking this issue. Some additional notes from sdboyer on IRC:

one important thing that's not noted in the issue, and will make a little more sense when you look through more: we actually probably need to retain the VersioncontrolOperation class, and create a VersioncontrolCommit class that extends it, rather than simply replacing VersioncontrolOperation with VersioncontrolCommit... there are still cases where we need to record/represent operation data for branches/tags, and we'll have no way to do it in the entity system if we simply switch VersioncontrolCommit in for VersioncontrolOperation

Jonathan Webb’s picture

Status: Postponed » Needs review
Issue tags: -git sprint 9 +git low hanging fruit, +git sprint 8
FileSize
24.23 KB

This patch adds VersioncontrolCommit, VersioncontrolCommitControler, and VersioncontrolCommitUnitTestingTestCase classes. The VersioncontrolOperation* classes are still in, but where things looked relevant I switched them from operation-centric to commit-centric. VcOp's hooks are switched to more generic "operation" hooks (presently not documented in the api file, because I wasn't sure if they are really needed or not) and the "commit" hooks have been moved into the VcCommit class. The drush command is now commit-centric. The submodules haven't been touched.

Please let me know what additional work it needs!

sdboyer’s picture

Status: Needs review » Needs work
FileSize
3 KB
19.8 KB

Wow, you got on that quick. Awesome. Some notes:

  • VersioncontrolOperation::getItems() should probably stay where it is so that there's an implementation for tags, too. For non-atomic systems like CVS (ugh...) i spose it's legit to want to retrieve a set of files that are affected by a branch or tag op. Of course, our datastructure doesn't really support that very well at ALL right now, but the method should still stay there as a backend needing that functionality would presumably add to the datastructure and include that additional data in their own controller.
  • I made VersioncontrolCommitController just extend VersioncontrolOperationController to reduce some code duplication. Also rejiggered some of the logic there - the generic operation won't need to attach label data in extendData() the same way because it's already guaranteed to be working on a label.
  • We maybe should keep VersioncontrolUserMapperInterface using VersioncontrolOperation, but...meh, screw it. I'm OK with saying that VCAPI only really accredits information about commits. I think. Gotta ponder that one a bit more.
  • Ahh, you updated theme_versioncontrol_user_statistics_table()...one of those really, REALLY crufty and decrepit old pieces of code from the 1.x days. Well, that's good anyway, we can delete it later if really need be.
  • Patch is kinda missing a rather crucial file - includes/VersioncontrolCommit.php :P
  • I'm still counting some ~350 references to 'operations' throughout the project when I grep. That isn't necessarily wrong, since we're not getting rid of it, but it does seem kinda like commit should be taking over a bit more.
  • In line with the last point, updating the docs/api.php would also be pretty excellent, though don't kill yourself on it, they're already horrendously out of date.

Good start, in any case. I've posted a counterpatch with some of these items taken care of. Both an incremental version against your last patch and a full patch against master.

Jonathan Webb’s picture

Status: Needs work » Needs review
FileSize
30.84 KB

Whoops I remembered to fakeadd the new files, but forgot to do the diff arg to include them in the patch. Below is the corrected patch, including your mod to the controller.

As for the grep of operation - For making my changes I ran `grep -RiI "VersioncontrolOp" /path/to/vcapi` to update the class references and `grep -RiI "[\$'>l]operation" /path/to/vcapi` to search for other operation related references (and generally scanned files for "operation" while editing. Reasons you're seeing operation in there:

  • I haven't touched commitlog, commit_restriction, or versioncontrol_fakevcs code (I figured these could wait until the class change was approved).
  • VcOp classes & respective tests were retained
  • Backend adds 'commit' entity & controller but doesn't remove 'operation'
  • Tables: versioncontrol_operation* were not renamed
  • There is some batch processing code in there where the term "operation" is unrelated to vcapi
marvil07’s picture

I have made a first pass to the last patch, and it looks fine. Thanks Jonathan!

I am going to do a second pass looking for more places where renaming/using commit class makes sense.

BTW We should not have to forget that we need another issue or at least another commit in git backend after this change got in.

sdboyer’s picture

Yes, very true - we'll need to add a VersioncontrolGitCommit class, and will also need to update the parser logic for branch/tag creation so that they continue to use VersioncontrolGitOperation, but it all still works properly...ugh.

sdboyer’s picture

I'm running this on staging now, just to give it a testbed. It seems to be working. Kinda miraculous, since we don't also have the updates to versioncontrol_git.

sdboyer’s picture

We started having some problems yesterday, but it's not entirely clear whether or not this patch running was to blame. I'll give this a window of running again probably tomorrow, see if it passes muster.

sdboyer’s picture

Status: Needs review » Postponed
Issue tags: -git phase 2 +git phase 2 leftovers

Yeah, no time for this now. We'll revisit (hopefully pretty much immediately) after launch.

sdboyer’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev
Issue tags: +vc-next

this remains important, especially given that we have the event system now, but it's still not NECESSARY. so, vc-next.