Problem/Motivation

When a label is removed, the commits that end up orphaned(without an association) are not being removed.

Proposed resolution

Avoid indexing by labels by name (details on comment 4).

Remaining tasks

  • Figure out why hook_update_N() is not finishing correctly on big datasets.

Original report by solotandem

Problem/Motivation:
I accidentally pushed a whole bunch of commits to a new 7.x-2.x branch of grammar parser. My intention was to simply push the first one.

Proposed resolution:
Please delete all commits to this branch. My understanding is this is allowed/doable as a release has not been made from this branch.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

solotandem’s picture

Before filing this issue, I had executed: git push origin :7.x-2.x. The response seemed successful, but refreshing the d.o. page seemed like nothing had happened. After several minutes, the branch disappeared at this page, but the commits still show at this page. Perhaps these will disappear too after some time. If not, please correct my error.

marvil07’s picture

Title: Delete commits to 7.x-2.x branch of grammar parser » One-lable operations are not removed after the label is removed
Project: Drupal.org infrastructure » Version Control API
Version: » 6.x-2.x-dev
Component: Drupal.org module » API module
Category: support » bug

Thanks for reporting!

Uhm, there could be two reasons:
- some cache
- some bug in versioncontrol

I would try to reproduce this, but I guess it will be tomorrow,

Self note: to reproduce I think it will be:

  • create brach with new commits
  • parse
  • remove branch
  • Verify if versioncontrol operations removed?
marvil07’s picture

Title: One-lable operations are not removed after the label is removed » One-label operations are not removed after the label is removed

typo

marvil07’s picture

Priority: Normal » Major
Status: Active » Needs work
FileSize
1.05 KB

And there was a bug in the code, actually versioncontrol_git assumes that VersioncontrolOperation::labels array is indexed by label_id, instead of by name.
But, since some vcs(e.g. git) allows the same name for a tag and a branch, we should not index by label name, so, if we use something to index by, it should be by label_id.

I have changed that, and I have seen that it works as expected(details on the patch) and also all test are passing correctly.

Anyway, we still need to:

  • Make a full pass to the code, to see if we are assuming that operation labels data member is indexed by name, and if so, change the code.
  • Create a hook_update_N() to remove orphaned operations. Hint: use sandbox(we can have lots of operations).
sdboyer’s picture

Assigned: Unassigned » sdboyer

ooh, yeah. good call, we can't safely index by name there, there really is no guarantee of uniqueness. and if all tests pass, then i'm happy - this is exactly the sort of thing where i'd want to make sure that all of the push logic still works after we make this core api change. SO glad we have the tests, or i'd lose sleep over this. at the same time, this is a case that we should have caught in those tests - i wonder if we didn't because it's one of the ones that i only wrote docblocks for, but we never actually got to writing properly.

i'll try to get to this as soon as i can, the fact that it's causing commits to erroneously hang around is problematic. hopefully friday.

tim.plunkett’s picture

Status: Needs work » Needs review

#1216596: Deleting feature branch confuses history synchronizer was closed as a duplicate (even though it was open before this :))

marvil07’s picture

Status: Needs review » Needs work

@tim.plunkett: Yep, sorry about that, I just used this issue because there's already code here :-)

Back to NW, as we still need to do what's indicated on #4

@sdboyer: Did we forget to mark this as blocker?

marvil07’s picture

Assigned: sdboyer » Unassigned
Issue tags: +versioncontrol-6.x-2.0-release-blocker

I guess it should be a blocker.

marvil07’s picture

Assigned: Unassigned » marvil07
marvil07’s picture

Make a full pass to the code, to see if we are assuming that operation labels data member is indexed by name, and if so, change the code.

After taking a closer look, I can see we are not assuming it on the code at vc. But inside vc_git there are some places, but only at tests, so it's OK after some minor changes there.

This patch only adds some documentation changes.

Still pending:

Create a hook_update_N() to remove orphaned operations. Hint: use sandbox(we can have lots of operations).

marvil07’s picture

Version: 6.x-2.x-dev » 7.x-1.x-dev
Status: Needs work » Needs review

Patches on d7 first, then d6. NR only for bot.

marvil07’s picture

The new patch contains the hook_update_N() to remove orphaned commits.

I was trying some ways to group commits, but by repository seems to be the most natural.

From git6staging(as why I was thinking grouping can be relevant):

mysql> SELECT COUNT(DISTINCT o.vc_op_id) FROM versioncontrol_operations o LEFT JOIN versioncontrol_operation_labels ol ON o.vc_op_id = ol.vc_op_id WHERE ol.label_id IS NULL;
+----------------------------+
| COUNT(DISTINCT o.vc_op_id) |
+----------------------------+
|                     138799 | 
+----------------------------+

@sdboyer: can you take a look at this before adding it to upstream?

marvil07’s picture

Sorry, some errors in the last patch.

marvil07’s picture

Assigned: sdboyer » marvil07
Status: Needs review » Needs work

This works, but it seems to not be enough for d.o(memory exhaustion), current plan to change the approach:

  • Disable caching on every backend. See if that's enough.
  • Do not store all vc_op_ids on sandbox. Use the query every time, with a limit clause, storing last processed vc_op_id on sandbox.
marvil07’s picture

Status: Needs work » Needs review
Issue tags: +Needs backport to D6
FileSize
3.53 KB
5.24 KB

This patch includes both changes mentioned on the last comment. It should be enough, tried locally and it's ok, pending to backport to D6, that should be trivial.

Current batch: 250 commits. let's see if that is good for d.o, if not decrease/increase it. Pending for testing on git7staging or git6staging.

marvil07’s picture

Actually, remove debug code and add a d6 version of the patch too.

marvil07’s picture

+++ b/versioncontrol.install
@@ -1585,3 +1585,69 @@ function versioncontrol_update_7102() {
+      $commit_ids = array_values($commits);
+      $commits = $repository->loadCommits($commits_ids);

:-/ typo bug was causing to load all operations, which blew up memory, naturally.

+++ b/versioncontrol.install
@@ -1588,3 +1588,71 @@ function versioncontrol_update_6325() {
+  $sandbox['#finished'] = empty($sandbox['max']) ? 1 : ($sandbox['progress'] / $sandbox['max']);

D7 changed the API related to incremental hook_update_N's, on D6 we need to set '#finished' on the return array, not on the sandbox as in D7. ughr

These patches seems like ready, will try too find a good value for the batch process limit(currently 250) on git6staging, but these should be working.

marvil07’s picture

Issue tags: -Needs backport to D6

I have run this hook_update_N with different batch sizes, and I think I will go with 5000, see what I could test on git6staging:

+----------------------+----------------+--------+
|batch| commit range   |delta| time to  |commits |
| size| runned         |     | run      |per sec |
+----------------------+-----+----------+--------+
|  250| 138532 -132431 | 6101| 5m24.609s| ~18,79 |
|  500| 132431 -114934 |17497|10m50.437s| ~26,90 |
| 1000| 114934 -105708 | 9226| 3m35.272s| ~42,85 |
| 5000| 105708 - 95440 |10268| 4m24.545s| ~38,81 |
| 5000|  95440 - 62384 |33056| 9m35.458s| ~57,44 |
|10000|  62384 - 37619 |24765| 9m41.473s| ~42,59 |
+-----+----------------+-----+-------------------+

php's memory_limit is 256M on git6staging, so I think 5000 will be just fine for any environment.

After the run finished on git6staging and verify it completely I will add last patches(changing batch size) to both 7.x-1.x and 6.x-2.x

marvil07’s picture

Status: Needs review » Needs work
Issue tags: +Needs backport to D6

I have tried this on git7staging too and there is still one bug pending to be solved, I can replicate the bug in local with git7site versioncontrol data.

There is always some operations that are not being removed, a really small amount, 34 of 129389, but still needs to be fixed to let the hook_update_N finish correctly, or change end condition in some way.

marvil07’s picture

Issue summary: View changes

Add link to project.

marvil07’s picture

marvil07’s picture

Assigned: marvil07 » Unassigned
Issue summary: View changes

It has passed a long time without news here, so I have added the patch from comment 10: basic fix without fixing existing data.

  • Commit 0e48342 on 7.x-1.x, drush-vc-sync-unlock by marvil07:
    Issue #1704322: Use label_id as the index value for operation labels...

  • Commit 0e48342 on 7.x-1.x by marvil07:
    Issue #1704322: Use label_id as the index value for operation labels...

  • marvil07 committed 0e48342 on d8port
    Issue #1704322: Use label_id as the index value for operation labels...
marvil07’s picture

Issue tags: -Needs backport to D6