Hi,

I found, that my project's commit page (http://drupal.org/node/400662/commits) does not contains last days' commits, however I see them at drupalcode.org: http://drupalcode.org/project/xc.git/shortlog/refs/heads/6.x-1.x.

I used the commands available at git instructions page under the project page, and use
git push origin 6.x-1.x

Since the most recent commits' page (the /cvs) is destroyed, now I can not see whether it is a problem with other projects as well or not.

Regards,
Péter

Comments

eliza411’s picture

From inside your project directory, if you type 'git config -l' does the value of user.email match a confirmed e-mail address in your drupal.org account? See http://drupal.org/node/636570 for details.

pkiraly’s picture

Hi,

yes, the email is the same, it matches the registered email. More strange that I have commited into an other project (http://drupal.org/project/xc_installation), and everything is OK there. In that project I pushed the changes into master (the only branch in that project). It is possible, that pushing to a branch other than master cause this problem? I use the same email and user settings in both projects.

eliza411’s picture

Assigned: Unassigned » sdboyer

Thanks for checking the e-mail address and for the additional information. I'll see if @sdboyer can take a look.

By the way, the equivalent of http://drupal.org/cvs is, I believe, http://drupal.org/commitlog

marvil07’s picture

Sounds like some problem in the workers, I mean the parsing not triggered, or triggered and failed.

Fetching logs manually should solve the problem(breaking lock?), but that's a consequence solver, not really the problem solution, but should help here.

I do not have enough privileges on d.o to check this, so yeah, please take a look at this Sam.

pkiraly’s picture

Unfortunatelly my commits for XC project do not seem to appear in http://drupal.org/commitlog

pkiraly’s picture

Priority: Normal » Critical

Hi, I found another issue: I don't see my newly created tags at drupal.org only in drupalcode.org (see links in my first post), and because of this I am not able to publish a new release.

I set the priority to "critical" because it is critical for me, since today was my deadline to create this release. If you estima, that the issue is not critical, only from my point of view, please change its priority to another level.

sdboyer’s picture

Priority: Critical » Normal

@pkiraly - that definitely indicates a critical issue. I'll be looking into this as soon as I can.

sdboyer’s picture

Looks like a worker died in the middle of parsing your push, somehow; the repo was locked. Generally speaking this is a known problem, though it happens quite rarely - based on current push stats (around 6k since launch) and the number of locked repos (2 that I cleared), ~0.03% of the time. Something goes wrong with a worker, the parse job crashes, and our unsophisticated locking system never clears the lock. Though we have a bunch of issues open for means of resolving this in a more automated way, we haven't gotten to them yet - they fell to the back burner since this is a pretty rare occurrence.

In any case, I've cleared the lock, so the next push you make should fix everything up. When I have some time, I'll try to investigate what could have caused the problem.

sdboyer’s picture

Status: Active » Fixed
pkiraly’s picture

Dear Sam,

Thanks for your work, but I sadly report, that the problem is still alive.

I commited new changes, but I don't see them in drupal.org. I have removed the tab 6.x-1.1-beta1 I created friday, and recreated it after the commits. I see both my commits and the tag in drupalcode.org, but on drupal.org, I am still not able to create a new release based on this tag. When I click on add new release link, I get this page: http://drupal.org/node/add/project-release/400662, which says:

"No valid branches or tags found.
Your release needs to have a tag or branch that follows the naming conventions and have no other release attached to it."

I don't know why 6.x-1.1-beta1 is not mathcing the naming conventions. I guess, that the problem is, that d.o is still don't seeing the tag I created (and which is a valid name - I think).

So I still have the same problem, and the fix you created does not work in my case unfortunatelly...

Regards,
Péter

sdboyer’s picture

Status: Fixed » Active

Yeah, parse jobs on your project (the things that read the incoming pushes you make) keep blowing up somewhere midway through parsing your data. Something in it is causing an error. Hopefully I can set marvil07 on your repo and figure out what it is...in the meantime, sorry, please bear with us :/

sdboyer’s picture

Project: Git on Drupal.org » Version Control API -- Git backend
Version: » 6.x-2.x-dev
Component: User interface » Git interaction
Assigned: sdboyer » marvil07
Priority: Normal » Critical

moving & reassigning

pkiraly’s picture

Thanks! I just found, that one of the machines I use I misstyped the email address, and instead of .eu I typed ,eu. The commits are:
058ffa55019cf597817a54f0630f4c441e7951db
d5edb5fba5a9d059813bbf1b2efa3487b04a73aa
aa3aefca7959c55163cc29d26e1bbe0550f42dd6

I don't know whether it helps you, but hope so.

sdboyer’s picture

Interesting. No reason the commas would cause a problem, though.

The fact that marvil07 could parse your repo from clean indicates something probably got messed up partway through one parse job that's causing problems in all the subsequent ones. Not a big problem to fix - we purge it and re-parse from scratch - but I'm not 100% confident in the logic path that does that work just yet, so I'd rather wait until we fix it properly before I try it out on your stuff. It's the current thing I'm working on, but it's got several stages, so probably by the end of the week. Given that this isn't actually preventing you from using Git (even though it is preventing you from rolling releases), I'm thinking that's an acceptable delay.

pkiraly’s picture

Hi, that's OK. Thanks!

pkiraly’s picture

Hi, it is the next week, and I am still not able to publish a release, and still do not see the commits at the project page. As I said, it is a critical for me, and the project supporting my work. Please tell me is that anything I can do on my side which would help you to fix this issue.

sdboyer’s picture

Yep, I'm working on it. Not a lot anyone can do but me atm, unfortunately, but you're not the only one waiting. I'm still trying for the end of the day today.

sdboyer’s picture

Issue tags: +git repo reparse

tagging so i remember to reparse when i've finished this

sdboyer’s picture

Now that I finally got the changes in and can reparse from scratch...this repo still fails. Tried it locally, and it fails. And here's the error:

WD php: PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xE1ly P\xE9...' for column 'author_name' at row 1: INSERT INTO {versioncontrol_git_operations} (vc_op_id,        [error]
author_name, committer_name, parent_commit, merge) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4); Array
(
    [:db_insert_placeholder_0] => 1722
    [:db_insert_placeholder_1] => Kir?ly P?ter
    [:db_insert_placeholder_2] => Kir?ly P?ter
    [:db_insert_placeholder_3] => 81afa8ca70104125b081b8c1170e662a8d1a0b3e
    [:db_insert_placeholder_4] => 0
)
 in VersioncontrolGitOperation->backendInsert() (line 22 of /Users/sdboyer/ws/dd/do/sites/all/modules/versioncontrol_git/includes/VersioncontrolGitOperation.php).

So, well, there's yer problem. Now we gotta figure out how to take care of it.

sdboyer’s picture

For the record, the problem commit is 7c78114258d837147129527052e2b9547ac102d7:

 commit 7c78114258d837147129527052e2b9547ac102d7
 Merge: 9dbd264 347e129
 Author: Kir<E1>ly P<E9>ter <pkiraly@tesuji.eu>
 Date:   Tue Mar 1 19:59:41 2011 +0100
 
     Merge branch '6.x-1.x' of git.drupal.org:project/xc into 6.x-1.x
 

That's UTF-8 incorrectly encoded with ISO-8859-1. Ugh.

Well, calling utf8_encode() on the offending values solves *this* problem, but not necessarily the whole problem. It highlights that we clearly CLEARLY need to be more robust in our input filtering. We're pretty insulated with this case (since it can only kill workers), but this is actually an attack vector. I don't understand string encoding well enough to know what the right solution is here, but we need to figure it out quick and apply it to everything that can is a potential entry point for these encoding mismatches.

marvil07’s picture

Title: different list of commits at drupal.org and drupalcode.org » Avoid non UTF-8 input from Git
Status: Active » Needs review
StatusFileSize
new1.28 KB

I have been trying to solve this, but I actually could not find the general way Sam mentioned.

Let me explain a little more.

Git is encoding agnostic by design(mentioned in git manuals), so several encodings are accepted if the user decide to use them. Naturally by default Git uses utf-8.

Several git sub-commands have a --encoding option that try to re-encode as the encoding passed in the parameter, but sadly it only applies to the commit message, not to the other values included on the custom formats. It is exactly the same for the %e placeholder available from several git sub-commands, since it refers to the encoding of the commit message.

In this specific case where this bug has been found, the encoding used for user.name seems to be ISO-8859-1, then using utf8_encode() works fine converting it to utf-8, but I did not tried with other encodings, so I am not completely sure if we can trust completely on utf8_encode() since I do not really know how that behaves.

In the other side I also investigated a little about drupal functions dealing with these problems, but the only candidate I found could be useful is drupal_convert_to_utf8(), but it requires that we pass it the encoding of the string we want to convert, and we really do not have a way to detect the encoding, so we can not use it(again %e placeholder only refers to commit message).

So, I am attaching the patch that fix the original bug here, but probably do not solve it if a non ISO-8859-1 and non UTF-8 string is passed.

Any suggestion is welcome.

marvil07’s picture

Another option, that do not really do the job, but can prevent us to attempt to insert unreadable characters on the database is using transliteration_get() from transliteration module, that will leave any character we decide instead of the non-utf8-encoded character, by default a ?.

Please note that transliteration_get() expects a UTF-8 string as the first parameter, so probably this is a bad idea, but not completely sure, so I just wanted to mention it.

sdboyer’s picture

re: detecting encoding: mb_detect_encoding() is the sort of thing we want...but it isn't a by-default enabled extension. iconv is enabled by default, but doesn't do detection. Sorry, I should have linked to that in the first place. I'm ok with making us just require it tbh, though if we do that we need to add a hook_requirements() thing to note the requirement.

If we solve the problem the way you've proposed in the patch, we end up stacking in thousands, even millions of additional function calls in large repos since we have to call it per line of output. A very good reason to switch over to stream-based output from our commands, instead of this rather icky line-by-line approach. That doesn't need to be a blocker to an immediate fix necessarily (especially given upcoming improvements to the sync system), but it is really a good reason to want to improve it.

pkiraly’s picture

I am thinking whether it would be solved if I created a new branch, and delete the old branch? Will the bad commits be removed?

sdboyer’s picture

Just doing that alone won't help; you have to excise the offending commit. The proper order would be:

1) delete the branch on d.o
2) excise the commits
3) re-push the fixed branch to d.o

You have to do it that way because we don't allow non-ff updates. I could help you with that at some point in IRC, but I'm pretty swamped right now, it'd be better if someone else could do it.

rfay’s picture

Note that currently, the 6.x-1.x dev release is not associated with *any* branch. Yuck.

But that means that we could create a new dev release; also means that we could possibly delete and recreate 6.x-1.x

pkiraly’s picture

With rfay's help, I can remove the problematic commits from 6.x-1.x branch. The problems are still existing. I have created another issue based on the #6 comment here http://drupal.org/node/1112384.

sdboyer’s picture

Title: Avoid non UTF-8 input from Git » Manage incorrect/mismatched string encoding gracefully

Better title

marvil07’s picture

Assigned: marvil07 » Unassigned
Status: Needs review » Needs work

Un-assigning this until I actually start working on it. Please feel free to continue it.

pkiraly’s picture

I have the same problem again, but now I don't know what caused this error. The sympthoms: I don't see the project's commits, and the newly created tag since March 31th on drupal.org (http://drupal.org/project/xc), however I see them on the drupalcode.org site (http://drupalcode.org/project/xc.git).

pkiraly’s picture

For the record: I see the following commits with wrongly formatted accents:

256399ce77b0220d5d407f351b68adb2485a7dfe Kir�ly P�ter pkiraly@example.com 2011-03-07 23:05:14 +0100
d597f03e089ae66698a4732e927588576dc7b629 Kir�ly P�ter pkiraly@example.com 2011-03-04 01:09:30 +0100
1c269ce42ba9d18727348dac7adba965f44e39b3 Kir�ly P�ter pkiraly@example.com 2011-03-02 15:43:40 +0100
7c78114258d837147129527052e2b9547ac102d7 Kir�ly P�ter pkiraly@example.com 2011-03-01 19:59:41 +0100
65c1ae49c8b74fb5c14f0fc55e2b0746b08f6584 Kir�ly P�ter pkiraly@example.com 2011-02-28 22:28:14 +0100

One of them -- 7c78114258d837147129527052e2b9547ac102d7 -- were mentioned previously, and then the code was rebased, and cleared out (with other commits as well). I don't know how did them get back. I have hade to make a mistake somewhere.

marvil07’s picture

chx’s picture

Just drop non-utf8 characters. The way utf-8 is built you can always recover from non utf-8 sequences.

Edit: iconv("UTF-8", "UTF-8//IGNORE", $text)

sdboyer’s picture

Committed a fix that incorporated iconv. It'll work at least for now, we can improve it more later.

sdboyer’s picture

Status: Needs work » Fixed
sdboyer’s picture

Issue tags: +needs drupal.org deployment

tagging for deployment

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

marvil07’s picture

Issue tags: -needs drupal.org deployment

This is now deployed.

  • Commit cb62f0c on 6.x-2.x, fix-invalid-default-branches, fullsync-memory by sdboyer:
    Issue #1078392: Fixed Manage incorrect/mismatched string encoding...