as part of the new release system (http://drupal.org/node/77562), we need access to info about all CVS tags and branches in each project. the attached patch adds a {cvs_tags} table to cvs.module and changes the xcvs-* scripts so that, as cvs tag commands happen, we record them in the table for each directory that's part of the tag command. we remove entries from the table when tags are deleted. we also keep track of whether each tag is a branch tag, since we'll need to know that in various places, too.
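a rough sketch of the bookkeeping described above, in python with sqlite rather than the actual PHP/MySQL code in the patch. the `nid` and `tag` columns are mentioned later in this thread; the `branch` column name, the function names, and the sample values are assumptions for illustration only.

```python
import sqlite3


def create_table(db):
    # nid and tag appear in the thread; "branch" is a hypothetical column name
    # for the "is this a branch tag?" flag.
    db.execute(
        "CREATE TABLE cvs_tags ("
        " nid INTEGER NOT NULL,"
        " tag TEXT NOT NULL,"
        " branch INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (nid, tag))"
    )


def record_tag(db, nid, tag, is_branch):
    # One row per (project, tag); re-recording an existing tag overwrites it.
    db.execute(
        "INSERT OR REPLACE INTO cvs_tags (nid, tag, branch) VALUES (?, ?, ?)",
        (nid, tag, int(is_branch)),
    )


def delete_tag(db, nid, tag):
    # Called when a tag is removed from the repository.
    db.execute("DELETE FROM cvs_tags WHERE nid = ? AND tag = ?", (nid, tag))


def project_tags(db, nid):
    # The per-project view: every tag for one project, with its branch flag.
    rows = db.execute(
        "SELECT tag, branch FROM cvs_tags WHERE nid = ? ORDER BY tag", (nid,)
    )
    return [(tag, bool(branch)) for tag, branch in rows]
```

with this shape, listing every tag and branch for a project is a single query on `nid`, with no need to look at per-file data.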

the only thing missing here is a way to import all the existing branches and tags. ;) at this point, i'm just planning to do that as a separate, stand-alone script that you'd run once. to be really complete, i suppose i should fix the automatic log parsing code in cvs.module to also populate the {cvs_tags} table. but, even if i did that, it wouldn't help us on d.o, so i'm going to need the separate script, anyway.

but, if someone else wants to take a look here and review what i've got so far, that'd be swell. ;)

thanks,
-derek

Comments

dww’s picture

"what about the 'branch' field in the {cvs_files} table?" you might ask... ;)

well, a few problems:

  1. that's only branches, not regular tags -- we'd only see stuff when folks commit to a given branch, not when they create the branch, or when they add non-branch tags (i.e. to tag a specific release). this is the primary reason why we need this new table, and the new way to populate the data as the tags are coming in.
  2. that's per file, and we need to know the global view, per project. this is more of a minor thing, but having the new table will simplify a bunch of related code, too.

... just in case you were wondering. ;)

dww’s picture

Attachment: status new, size 10.44 KB

another step closer to RTBC -- now the automatic log importing code knows about {cvs_tags} and will populate that table, too.

so, in theory, the update path on drupal.org could be to just run all that stuff. however, i'll probably want to just split out this part of the functionality into a separate function or script as a 1-time thing for d.o...

dww’s picture

Attachment: status new, size 10.36 KB

new patch that applies cleanly after a little cleanup in nearby code

dww’s picture

Priority: Normal » Critical
Attachment: status new, size 15.08 KB

new patch that provides:

  • an initial import method for slurping in the tag data, in addition to the stuff i already wrote to dynamically push tag info into the DB via the xcvs-taginfo.php script as the tags are created/removed.

i ran xcvs-import-tags.php on a complete snapshot of the d.o cvs repositories on my laptop, and it took about 7 minutes. not ideal, and we might be able to optimize, but i'm not sure i really care. for example, one thing we could do is use the following workflow:

  1. install the new cvs.module and run the update to add the {cvs_tags} table
  2. install the new xcvs-taginfo.php script (along w/ the change to xcvs-config.php and CVSROOT/taginfo, as described in the xcvs/README.txt file)
  3. at this point, we'll be collecting data for any new tags that are added
  4. take a snapshot of the repo and d.o DB (e.g. on scratch.d.o) and run xcvs-import-tags.php there
  5. merge the two {cvs_tags} tables
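the merge in step 5 could look roughly like this. it's a sketch in python/sqlite, not part of the patch: the `cvs_tags_imported` table name and the `branch` column are assumptions, and it assumes that when the same (nid, tag) exists in both tables, the row collected live by the taginfo script should win.

```python
import sqlite3

# Hypothetical schema shared by the live table and the imported snapshot table.
SCHEMA = (
    "CREATE TABLE {name} ("
    " nid INTEGER NOT NULL,"
    " tag TEXT NOT NULL,"
    " branch INTEGER NOT NULL DEFAULT 0,"
    " PRIMARY KEY (nid, tag))"
)


def merge_imported_tags(db):
    """Copy rows from cvs_tags_imported into cvs_tags, skipping duplicates.

    Returns the number of rows that were actually added.
    """
    before = db.execute("SELECT COUNT(*) FROM cvs_tags").fetchone()[0]
    # OR IGNORE leaves existing (nid, tag) rows untouched, so live data wins.
    # The MySQL spelling of the same idea is INSERT IGNORE.
    db.execute(
        "INSERT OR IGNORE INTO cvs_tags (nid, tag, branch)"
        " SELECT nid, tag, branch FROM cvs_tags_imported"
    )
    after = db.execute("SELECT COUNT(*) FROM cvs_tags").fetchone()[0]
    return after - before
```

the PRIMARY KEY on (nid, tag) is what makes this safe to run: duplicates are rejected by the table itself rather than by extra checks in the script.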

i'm not sure how easily we could optimize these 7 minutes; it's just an expensive operation on a ton of data. we basically have to dump the entire cvs log history on all files, and parse through all of it.
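to give a feel for what "parse through all of it" involves, here is a minimal python sketch that pulls tags out of the "symbolic names:" section of `cvs log` output. it is not the code from xcvs-import-tags.php, and the function names are made up. it relies on the usual CVS convention that a branch tag maps to a "magic" revision with a 0 in the second-to-last position (e.g. 1.3.0.2), or to an odd number of components for vendor branches (e.g. 1.1.1).

```python
def is_branch_revision(rev):
    # Regular branches look like 1.3.0.2 (magic ".0." before the last part);
    # vendor branches have an odd number of components, like 1.1.1.
    parts = rev.split(".")
    if len(parts) >= 3 and len(parts) % 2 == 1:
        return True
    return len(parts) >= 4 and parts[-2] == "0"


def parse_symbolic_names(log_text):
    """Return {tag_name: is_branch} for every tag seen in the log text."""
    tags = {}
    in_names = False
    for line in log_text.splitlines():
        if line.startswith("symbolic names:"):
            in_names = True
            continue
        if not in_names:
            continue
        if line[:1] in (" ", "\t") and ":" in line:
            # Indented "NAME: revision" lines belong to the tag list.
            name, rev = line.strip().split(":", 1)
            is_branch = is_branch_revision(rev.strip())
            # A tag counts as a branch if any file has it as a branch.
            tags[name] = tags.get(name, False) or is_branch
        else:
            # First non-indented line ends this file's tag list.
            in_names = False
    return tags
```

every file's log repeats its own tag list, which is why the full import has to read so much data even though the resulting table is small.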

i suppose i could try a more crafty approach where we only query all the existing nodes in {cvs_projects} (which includes the cvs-related data for each project), make some assumptions about what the files will be called, and try to only slurp log info on a subset of the files. but, i don't think it's worth my time to write/test all that code, when i've already tested this, and there's a fairly easy work-around for the 7 minute update problem. also, the update will probably go about twice as fast on the d.o hardware, so we're only really talking like 3 or 4 minutes...

anyway, i've tested this, it's good to go, and it's a prerequisite for the new release system, so getting this committed and installed on d.o is a top priority...

can i get a final review from someone before i proceed?

thanks!
-derek

dww’s picture

Status: Needs review » Reviewed & tested by the community
Attachment: status new, size 17.66 KB

final patch with a few enhancements:

  • xcvs/README.txt talks about xcvs-import-tags.php
  • added a setting in xcvs-config.php to disable tags during the import
  • bumped the internal version number of xcvs scripts, since this is really a new version.

after "a passing look..." by killes, this is RTBC. ;)

dww’s picture

Status: Reviewed & tested by the community » Fixed

committed to TRUNK and 4.7. i'm working with killes in IRC to get this installed and imported on d.o right now... i'll set this to closed when that's all done.

dww’s picture

i was curious about the KEYs i defined, and asked killes... i thought if you have a PRIMARY KEY(nid, tag), but then you try to select on just nid, you can't use the key, since the key is specifically on both values. i know, for example, that this primary key allows you to add 2 rows with the same nid and different tags, but not 2 rows with the same nid and same tag...

upon further investigation, selecting on nid works fine. the problem comes when you want to select on tag:

http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

"If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). MySQL cannot use a partial index if the columns do not form a leftmost prefix of the index."

so, i removed the extra KEY(nid), but left the one on tag... committed to TRUNK and 4.7.
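the leftmost-prefix rule is easy to see for yourself. the sketch below uses python with sqlite as a stand-in for MySQL (the planners differ in detail, but both follow the same rule for this case); the `branch` column and the index name `cvs_tags_tag` are assumptions, not taken from the patch.

```python
import sqlite3


def query_plan(db, sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql))


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cvs_tags ("
    " nid INTEGER NOT NULL,"
    " tag TEXT NOT NULL,"
    " branch INTEGER NOT NULL DEFAULT 0,"
    " PRIMARY KEY (nid, tag))"
)

# nid is the leftmost column of the primary key, so the key can be used.
by_nid = query_plan(db, "SELECT * FROM cvs_tags WHERE nid = 1")

# tag alone is not a leftmost prefix of (nid, tag), so the key cannot help.
by_tag_before = query_plan(db, "SELECT * FROM cvs_tags WHERE tag = 'x'")

# A separate index on tag fixes that, which is why that KEY stays.
db.execute("CREATE INDEX cvs_tags_tag ON cvs_tags (tag)")
by_tag_after = query_plan(db, "SELECT * FROM cvs_tags WHERE tag = 'x'")

print("by nid:              ", by_nid)
print("by tag (no index):   ", by_tag_before)
print("by tag (with index): ", by_tag_after)
```

the exact wording of the plan varies between sqlite versions, but the nid lookup should show up as an index search, the first tag lookup as a full scan, and the second tag lookup as a search on the new index.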

dww’s picture

Status: Fixed » Closed (fixed)

everything is now installed on d.o, and the historical tag data has all been imported.
one step closer to the new release system -- done! ;)