Closed (fixed)
Project: Localization server
Version: 6.x-3.x-dev
Component: Code
Priority: Critical
Category: Task
Assigned:
Reporter:
Created: 31 Oct 2013 at 15:07 UTC
Updated: 13 Feb 2014 at 15:30 UTC
Comments
Comment #1
SebCorbin commented
It seems that we lost some taxonomy terms regarding translations (e.g. https://drupal.org/project/fr) in the d.o upgrade: the term ID was 29 and the term is now gone (https://drupal.org/taxonomy/term/29).
This is holding up the first two steps listed here for release fetching.
The only other way I see now is to filter out projects both marked as "Obsolete" and "Unsupported".
Is that ok?
Comment #2
gábor hojtsy
I think it's fine to filter those out.
Comment #3
tvn commented
Comment #4
gábor hojtsy
Closed #2139775: Commerce economic not listed on l.d.o as a duplicate of this one... Now with so much time passed, this is getting to be a problem.
Comment #5
cheatlex commented
How can one help?
Comment #6
gábor hojtsy
@SebCorbin: I looked at the query before/after your patch (http://drupalcode.org/project/l10n_server.git/commitdiff/583b6eae7c8109f...) and it looks like our db query user does not have access to the taxonomy_index table. There is no term_node table anymore (I assume due to the D7 upgrade), so there is no way to get the data from there either. Trying to get access to that table now :)
Comment #7
gábor hojtsy
Opened #2148907: Give read access to localize_ro user to file_managed.
Comment #8
gábor hojtsy
Comment #9
gábor hojtsy
Also closed #2151225: The project d4os is not listed on localize.drupal.org as duplicate.
Comment #10
gábor hojtsy
Ok, the permission is now granted on taxonomy_index. I did some more digging and improved sebcorbin's patch. The project usage sync will not work, as the timestamp max query runs into a filesort and MySQL resets the connection on us for that, heh :D So I commented that out. That would be an acceptable sacrifice if at least the release syncing worked, but that does not work either :/ Looks like the release files are not in the files table anymore, or some other table is not tracking them anymore... That needs more looking into.
I also provisionally set the last sync to the Oct 31 last release timestamp, so it will pick up all releases even though our last erroneous attempt knocked that number over on the live site :)
Unfortunately I am out of steam for tonight but will look into this on Monday.
Comment #11
gábor hojtsy
Just got an IRC report from @csakiistvan that Drupal 7.24 with l10n_update will not download proper translations either since 7.24 is not on l.d.o yet :/
Comment #12
gábor hojtsy
Also marked #2153015: Drupal core 7.24 not listed on l.d.o as a duplicate.
Comment #13
gábor hojtsy
Did more debugging. Figured out we cannot get results from our join on [files] since that is a D6 leftover table. D7 has file_managed. Doh. Updated my issue to ask for access to that too: https://drupal.org/comment/8293461#comment-8293461
Comment #14
gábor hojtsy
Pinged @nnewton again about https://drupal.org/comment/8293461#comment-8293461
Comment #15
hass commented
For about 3 months we have had no updated po files on l.d.o. Is there nobody who can fix these issues, PLEASE? :-(((
Comment #16
gisle
I also find the present situation problematic. Since there has been no progress for more than a month, I suggest that, as a stopgap solution, project maintainers be allowed to manually upload po files for specific releases.
Comment #17
dydave commented
I've also been following this issue with a lot of interest and am getting anxious to see some progress to unblock this annoying situation.
It seems there have been some developments lately with #2148907-9: Give read access to localize_ro user to file_managed, and Gábor should now have proper access rights, which should perhaps allow this issue to start moving again.
@Gábor Hojtsy
With the recent update of the access to file_managed, would you be able to get this issue to move forward again?
Would there be anything we could potentially work on to assist you, to allow you to do the work we can't?
Thanks very much in advance.
Comment #18
gábor hojtsy
I share your frustrations. I spent several hours yesterday trying to untangle this only to be more depressed about the situation :( The extent of the changes we need is way bigger than I thought. I thought I had fixed several queries and only needed access to the file_managed table, and then figured out yesterday that the tables we have access to stopped collecting data on October 31st, and data like project shortnames or dev/stable designations now live in entirely different places. E.g. data we used to get from the project_release_nodes and project_projects tables is now in field_data_field_project_machine_name and field_data_field_release_build_type. :/
Earlier I also found out that the project usage data reached a threshold where our queries would resort to filesort, which would bring down the database.
So I think if we want to get this fixed sooner rather than later, we need to give up most of the syncing functionality that localize.drupal.org used to have (e.g. ordering projects by usage or having real project titles). Which is pretty sad. I mean, finding some (old) projects under their human titles and new ones under machine names only is quite odd. But the data provided publicly by drupal.org is not sufficient to get this type of information with any reasonable amount of resources.
The offered solution from the drupal.org upgrade team was/is a single TSV file at https://drupal.org/files/releases.tsv which provides all releases ever made. This can be used to create the release info on l.d.o, with some assumptions about where the files are put by the packager (there are no filenames in this dump). It cannot be used to collect real project names (project node titles), project usage data, and it cannot be used to tell if a project / release was unpublished (eg. security unsupported). These are things the current code does.
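Since the dump carries no filenames, any consumer has to guess where the packager puts each release archive. A minimal sketch of that guess follows; the base URL and the `project-version.tar.gz` naming pattern are assumptions for illustration, not something confirmed in this thread, and `guess_tarball_url` is a hypothetical helper name.

```python
# Hypothetical helper: derive a release archive URL from TSV data alone.
# ASSUMPTION: archives live under one base directory and are named
# "<machine_name>-<version>.tar.gz". Verify against the real packager.
ASSUMED_BASE_URL = "https://ftp.drupal.org/files/projects"


def guess_tarball_url(project: str, version: str,
                      base_url: str = ASSUMED_BASE_URL) -> str:
    """Build the assumed download URL for one release row."""
    return f"{base_url}/{project}-{version}.tar.gz"
```

For example, `guess_tarball_url("views", "7.x-3.7")` builds a URL ending in `views-7.x-3.7.tar.gz`. If the packager ever changes its layout, this guess is the single place that would need updating.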
At least as a stopgap, the https://drupal.org/files/releases.tsv data could be used to save raw project info (basically the machine name) to l.d.o's db, so people could search for their projects with that, and it can be used to set up minimal info about releases as well. We can keep using old project usage data for a while until people start complaining about that too... Not having that is probably a smaller problem at the moment than not having any new projects or releases available.
Since we will not get to where we wanted to be by asking for db table access one by one, I think everybody who wants to dedicate some time and knows PHP can help move this forward :) The task would be to update the logic in http://drupalcode.org/project/l10n_server.git/blob/refs/heads/6.x-3.x:/c... based on data from https://drupal.org/files/releases.tsv and gut all the stuff that cannot be done with this data (i.e. project status tracking, real project titles, tracking of project usage, tracking of file hashes).
Who wants to help?
Comment #19
SebCorbin commented
I've taken the skeleton from #2100597: Add a connector for www.drupal.org’s REST API and adapted it as per #18.
Note that for now, I put
WHERE p.connector_module IN ('l10n_project_drupalorg', 'l10n_drupal_rest_restapi')
in the parsing process, but this may be useless. Thoughts?
Comment #20
SebCorbin commented
Patch applied on http://syncing-localize.redesign.devdrupal.org/, accessible with drush uli and devwww.drupal.org access
Installed l10n_drupal_rest module and enabled connector (the other one is not visible), ran cron (this took a while)
Took this example #2176591: Project jQuery Nicescroll not listed on l.d.o which was not listed before and then http://syncing-localize.redesign.devdrupal.org/admin/l10n_server/project...
Cron after cron, you can see parsed releases, and they become available in the translate interface http://syncing-localize.redesign.devdrupal.org/admin/reports/dblog
Comment #21
barraponto
I'd love to chime in, but how do I start working on that?
Should I set up a l10n_server instance and try the patch above to see if it works?
(would it require an awful lot of CPU/RAM to replicate l.d.o locally for development?)
Comment #22
gábor hojtsy
Thanks sebcorbin for jumping on this. At least this will get us a minimum level to move forward :) Some things to fix before we deploy:
I think this is the kind of stuff that we cannot do anymore, at least I would not parse this multi-megabyte file in its entirety, sounds like a recipe for problems... :/ Also, if we don't remove/disable anything in this code, since we don't really know the project status, the rollout of this would be safer :D
I think downloading the whole file and parsing the whole thing would be problematic. IMHO we should read the tsv line by line and stop after the last sync timestamp minus a day. That should mean we only read a few hundred lines at most (except the first run now that we need to pick up all our missing stuff).
Add a @todo that titles need to be grabbed from somewhere *later*. Not blocking this patch at all.
I don't think this is relevant if we only consider new releases, since there are no new translation releases allowed.
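The "read line by line and stop early" idea from the review above could look roughly like the following. This is only a sketch: it assumes the TSV is ordered newest release first and has a header row with a Unix timestamp column, and the column name `created` and the function `new_releases` are hypothetical, not taken from the actual file or patch.

```python
import csv
from typing import Dict, Iterable, Iterator

ONE_DAY = 24 * 60 * 60  # safety margin in seconds


def new_releases(lines: Iterable[str], last_sync: int) -> Iterator[Dict[str, str]]:
    """Yield rows newer than (last_sync - one day), then stop reading.

    ASSUMPTION: rows are sorted newest first, so the first row older
    than the cutoff means nothing further down is of interest.
    """
    cutoff = last_sync - ONE_DAY
    # DictReader pulls lines lazily, so the rest of the file is never parsed.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if int(row["created"]) < cutoff:  # "created" is a hypothetical column
            break
        yield row
```

Because `lines` can be any iterable of strings, the same function works on an open file or on a streamed HTTP response, and only the first run after a long outage would walk a large part of the file.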
Comment #23
baluertl
I also want to help your heroic efforts, guys. Currently I'm blessed with dozens of hours of free time, but with limited technical possibilities. Please count on me for any browser-based (e.g. clicking through l.d.o for testing) or text-file editing (e.g. processing .csv/.tsv dumps) tasks.
Comment #24
SebCorbin commented
Here's the updated patch as per #22
Comment #25
gábor hojtsy
Looks good. If you find it works well on staging, it looks good to me to deploy. Thanks for jumping on this so fast.
Comment #26
SebCorbin commented
"So fast" => 3 months late ;)
Unfortunately, I get this error on the server
Thank my company for working on this, this would not have been possible without them (btw, if you have spare clients, we are open :p)
Comment #27
gábor hojtsy
Oh, well, sorry for the detour then. Let's get back to the drupal http request code BUT don't disable projects and only look at the first needed part of the file (to be quicker and use less memory). Hopefully the file will be OK size for a while.
Comment #28
gábor hojtsy
In the meantime the tsv changed to include the project full name, so we can create it with that. See http://drupalcode.org/project/infrastructure.git/commitdiff/311211d. Also with this the size of the tsv changed from 2.9MB to 4.4MB, so a pretty huge increase :/ Hope this will not mean problems for downloads for a while...
Comment #29
SebCorbin commented
I don't have the courage to update existing project titles, so they will be updated as soon as they have new releases.
Also, I've switched to using column headers for data, since the patch to release-list.sh in infra changed the order of the columns.
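Looking fields up by header name rather than by position is what makes the parser survive a column reorder. A rough sketch of that approach is below; the column names used in the example are made up, and `rows_by_header` is a hypothetical function, not the code from the patch.

```python
from typing import Dict, Iterable, Iterator


def rows_by_header(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Map each TSV row to {column name: value} using the header line."""
    it = iter(lines)
    # The first line names the columns, so their order can change freely.
    header = next(it).rstrip("\r\n").split("\t")
    for line in it:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(header):
            continue  # skip malformed rows instead of misreading them
        yield dict(zip(header, fields))
```

With this shape, a row is read as `row["version"]` regardless of whether the version sits in the second or the fourth column, so a later change to the export script only breaks the parser if a column is renamed or removed.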
Comment #30
gábor hojtsy
I think we can call this fixed. @Sebcorbin amazingly rolled this out and thousands of releases are now in the queue to parse. It will take some time to catch up with parsing, but now it's running. There are still 751 releases in the queue, but yesterday it was above 2000, so it seems to be running well :)
Comment #31
tvn commented
SebCorbin++ !
Comment #32
baluertl
@SebCorbin, I hope we can meet at a Drupal event someday, so I can say Thank You personally :)