Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
When a label is removed, the commits that end up orphaned(without an association) are not being removed.
Proposed resolution
Avoid indexing by labels by name (details on comment 4).
Remaining tasks
- Figure out why hook_update_N() is not finishing correctly on big datasets.
Original report by solotandem
Problem/Motivation:
I accidentally pushed a whole bunch of commits to a new 7.x-2.x branch of grammar parser. My intention was to simply push the first one.
Proposed resolution:
Please delete all commits to this branch. My understanding is this is allowed/doable as a release has not been made from this branch.
Comment | File | Size | Author |
---|---|---|---|
#17 | label_id-as-op-labels-index-1704322-17.patch | 6.07 KB | marvil07 |
#17 | label_id-as-op-labels-index-1704322-17-D6.patch.txt | 6.21 KB | marvil07 |
Comments
Comment #1
solotandem CreditAttribution: solotandem commentedBefore filing this issue, I had executed:
git push origin :7.x-2.x
. The response seemed successful, but refreshing the d.o. page seemed like nothing had happened. After several minutes, the branch disappeared at this page, but the commits still show at this page. Perhaps these will disappear too after some time. If not, please correct my error.Comment #2
marvil07 CreditAttribution: marvil07 commentedThanks for reporting!
Uhm, there could be two reasons:
- some cache
- some bug in versioncontrol
I would try to reproduce this, but I guess it will be tomorrow,
Self note: to reproduce I think it will be:
Comment #3
marvil07 CreditAttribution: marvil07 commentedtypo
Comment #4
marvil07 CreditAttribution: marvil07 commentedAnd there was a bug in the code, actually versioncontrol_git assumes that VersioncontrolOperation::labels array is indexed by label_id, instead of by name.
But, since some vcs(e.g. git) allows the same name for a tag and a branch, we should not index by label name, so, if we use something to index by, it should be by label_id.
I have changed that, and I have seen that it works as expected(details on the patch) and also all test are passing correctly.
Anyway, we still need to:
Comment #5
sdboyer CreditAttribution: sdboyer commentedooh, yeah. good call, we can't safely index by name there, there really is no guarantee of uniqueness. and if all tests pass, then i'm happy - this is exactly the sort of thing where i'd want to make sure that all of the push logic still works after we make this core api change. SO glad we have the tests, or i'd lose sleep over this. at the same time, this is a case that we should have caught in those tests - i wonder if we didn't because it's one of the ones that i only wrote docblocks for, but we never actually got to writing properly.
i'll try to get to this as soon as i can, the fact that it's causing commits to erroneously hang around is problematic. hopefully friday.
Comment #6
tim.plunkett#1216596: Deleting feature branch confuses history synchronizer was closed as a duplicate (even though it was open before this :))
Comment #7
marvil07 CreditAttribution: marvil07 commented@tim.plunkett: Yep, sorry about that, I just used this issue because there's already code here :-)
Back to NW, as we still need to do what's indicated on #4
@sdboyer: Did we forget to mark this as blocker?
Comment #8
marvil07 CreditAttribution: marvil07 commentedI guess it should be a blocker.
Comment #9
marvil07 CreditAttribution: marvil07 commentedComment #10
marvil07 CreditAttribution: marvil07 commentedAfter taking a closer look, I can see we are not assuming it on the code at vc. But inside vc_git there are some places, but only at tests, so it's OK after some minor changes there.
This patch only adds some documentation changes.
Still pending:
Comment #11
marvil07 CreditAttribution: marvil07 commentedPatches on d7 first, then d6. NR only for bot.
Comment #12
marvil07 CreditAttribution: marvil07 commentedThe new patch contains the hook_update_N() to remove orphaned commits.
I was trying some ways to group commits, but by repository seems to be the most natural.
From git6staging(as why I was thinking grouping can be relevant):
@sdboyer: can you take a look at this before adding it to upstream?
Comment #13
marvil07 CreditAttribution: marvil07 commentedSorry, some errors in the last patch.
Comment #14
marvil07 CreditAttribution: marvil07 commentedThis works, but it seems to not be enough for d.o(memory exhaustion), current plan to change the approach:
Comment #15
marvil07 CreditAttribution: marvil07 commentedThis patch includes both changes mentioned on the last comment. It should be enough, tried locally and it's ok, pending to backport to D6, that should be trivial.
Current batch: 250 commits. let's see if that is good for d.o, if not decrease/increase it. Pending for testing on git7staging or git6staging.
Comment #16
marvil07 CreditAttribution: marvil07 commentedActually, remove debug code and add a d6 version of the patch too.
Comment #17
marvil07 CreditAttribution: marvil07 commented:-/ typo bug was causing to load all operations, which blew up memory, naturally.
D7 changed the API related to incremental hook_update_N's, on D6 we need to set '#finished' on the return array, not on the sandbox as in D7. ughr
These patches seems like ready, will try too find a good value for the batch process limit(currently 250) on git6staging, but these should be working.
Comment #18
marvil07 CreditAttribution: marvil07 commentedI have run this hook_update_N with different batch sizes, and I think I will go with 5000, see what I could test on git6staging:
php's memory_limit is 256M on git6staging, so I think 5000 will be just fine for any environment.
After the run finished on git6staging and verify it completely I will add last patches(changing batch size) to both 7.x-1.x and 6.x-2.x
Comment #19
marvil07 CreditAttribution: marvil07 commentedI have tried this on git7staging too and there is still one bug pending to be solved, I can replicate the bug in local with git7site versioncontrol data.
There is always some operations that are not being removed, a really small amount, 34 of 129389, but still needs to be fixed to let the hook_update_N finish correctly, or change end condition in some way.
Comment #19.0
marvil07 CreditAttribution: marvil07 commentedAdd link to project.
Comment #20
marvil07 CreditAttribution: marvil07 commentedComment #21
marvil07 CreditAttribution: marvil07 commented10: label_id-as-op-labels-index-1704322-10.patch queued for re-testing.
Comment #22
marvil07 CreditAttribution: marvil07 commentedIt has passed a long time without news here, so I have added the patch from comment 10: basic fix without fixing existing data.
Comment #26
marvil07 CreditAttribution: marvil07 as a volunteer commented