I've become interested in getting a number of Drupalcamp sites archived. In the UK we've had a lot of Drupalcamps in the past year, and we're losing valuable information because the sites are just 'disappearing'.
From reading around, I can see that a number of sites have been archived on the Drupal.org infrastructure and are available at *.camps.drupal.org. This is discussed in, for example, #1693890: Set up hosting for static copy of NJ 2012 camp site.
I'd like to help get sites archived and document the process, beyond the existing documentation here: http://drupal.org/node/27882
In #1693890: Set up hosting for static copy of NJ 2012 camp site there seems to be some discussion about domain names, specifically which one to use. In summary I think that:
- Having *.drupalcamp(s).org would be preferred and actually people could host the original site at this domain, and then cut over to a static version of the site after the camp without having to change the domain name.
  Those domains are being squatted by people unwilling to give them to the Drupal community, at least this is the impression that I get from reading the issues about archiving sites.
  The drupalcamp.org domain has now been given to the Drupal community!
- As a fallback for now, the *.camps.drupal.org domains are being used, with the actual sites hosted on the Drupal.org infrastructure.
When reading all this and working out what was going on and what had been done, I did a couple of searches for domain names. drupalcamparchive.org turned out to be available, so I bought it. I'd like to make it clear that I'm super happy to transfer this domain to whomever within the Drupal project/association if it's wanted.
So my main proposal has two main points:
- Use the drupalcamp.org domain to serve static copies of the Drupalcamp sites, and have a nice directory of the sites archived at the top level domain.
- Move the hosting off of the Drupal.org servers, to github.
I see the primary advantage of point 1 being security. If these sites are on the same domain as drupal.org, then obviously they will have access to the cookies, and so any code deployed to the static sites must be reviewed for security issues thoroughly. If they are being run from a different domain, then they won't have to be reviewed so thoroughly. I'm really happy to be told: "No, we're keeping them on camps.drupal.org" though.
My second proposal would be to move the actual hosting of the static sites to use github pages. Now, I know that hosting static websites is significantly easier than maintaining the infrastructure for drupal.org itself, but it seems that if we can offload the sysadmin for these sites entirely, then we should do so.
Each site would be contained in its own github repo owned by a 'drupal-static' organisation. This would make it incredibly easy for people to set up new sites, and we can easily have multiple administrators for setting up the sites.
If we get this right, then we should be expecting a lot of requests to archive static sites, and so I'd be interested in making this as simple as possible for people involved.
Please be kind and gentle with your feedback. If we're totally happy with the current setup, then I'd just like to document all of that.
| Comment | File | Size | Author |
|---|---|---|---|
| #28 | camps_host_static_site.patch | 3.28 KB | ricardoamaro |
Comments
Comment #1
killes@www.drop.org
The only really important thing is that these sites should *not* use *.drupal.org as a domain name due to our bakery configuration.
The current usage is a temporary solution that is way too work intensive since all the archives have to be looked through for potentially dangerous JS.
I don't care much _which_ other domain is used, the one you bought sounds fine to me.
I also don't care much where the static sites end up, on our infrastructure, on github, or elsewhere. Hosting the archive on our infrastructure is an offer that camp organizers can take up or not.
I am quite sure that I don't want to host live camp sites on our infrastructure. There are too many of them, even though the process is currently much better than a year ago.
Comment #2
greggles
My sense is that the domain should be owned by the DA. Narayan has the account and passwords to do so and IMO should go ahead (even if the domain he chooses seems "bad", it will help motivate folks to get us one of the good domains).
I agree that the main holdup has been the review and security concerns and think that getting a new domain should eliminate those.
There is some more documentation/discussion of the process at http://groups.drupal.org/node/84644 which references http://drupal.org/node/871948#comment-3284576 - but those should probably be made into a real handbook page (though I'm not sure where in the hierarchy it would go).
Thanks for your enthusiasm for this idea and your action so far!
Comment #3
Steven Jones
If you want the drupalcamparchive.org domain, then I can set it up to be transferred.
Comment #4
Steven Jones
@greggles Thanks for the documentation that you've added, it's exceptionally helpful.
I've mocked up a simple listing site that could live at:
http://drupalcamparchive.org
Which you can see at:
http://darthsteven.github.com/drupalcamp-archive/
It's easy to add new sites to this listing, as it's generated with Jekyll. I've just added some of the Drupalcamps that have happened to give some idea of what the site might look like. I plan to add documentation to the site too, and link into the d.o handbooks where needed. I'm keen not to duplicate or move documentation off of d.o, so there might have to be some duplication.
Comment #5
Steven Jones
I've now moved the site to an organisation on github: http://drupalcamp-archive.github.com/
Comment #6
greggles
Hi Steven. So, the drupalcamparchive.info (or something similar) has been purchased. It should be easier to get these sites onto that domain now.
I would appreciate it if you remove the Colorado site from your domain or maybe point the link on the index page to http://colorado2010.camps.drupal.org/drupalcampcolorado.org/index.html - we don't want a duplicate content penalty for having that content in two places.
Comment #7
Steven Jones
@greggles sorry, I was just showing what an index page might look like, rather than actually making the index page work. I've not set up hosting for any of the sites yet, as I figured that would be really rude if nothing else.
I will adapt my listing concept so that we can link off to existing domains too.
Comment #8
Steven Jones
Also, to clarify, I do own the drupalcamparchive.org domain, but would be absolutely happy to transfer to the DA.
Comment #9
coltrane
This is good news https://association.drupal.org/node/17278
Comment #10
Steven Jones
Awesome!
Comment #11
Steven Jones
Great, so we have better domain names now: drupalcamp.org, drupalcamp.com
So the next question is: Do we have an issue hosting these static sites on github, or are we sticking with hosting them on our own infrastructure? I've roughly documented how it would work if they were hosted on github here: http://drupalcamp-archive.github.com/get-site-archived.html
Comment #12
coltrane
Thanks for documenting that Steven.
I don't think we should offload the administration of these archived sites just because we can. Or at least until there's a case against doing so. While I could understand that the infrastructure team is overburdened I'd like to better understand how so and seek solutions first before jumping to a third-party.
Comment #13
greggles
AFAIK, the biggest blocker here was concern about xss. That's now gone :)
Comment #14
killes@www.drop.org
We should discuss how we would want to work the creation of such archive sites.
1) Narayan already said that he wants to create a wildcard DNS entry, so there will be no need to create new apache configs.
2) We could create a jenkins job that would download and untar a tarball at a defined place on the server. The download could be from our own server, ie an upload to the issue queue.
This would allow us to replace any buggy tarballs/incomplete archives by simply re-uploading.
As always, I am open for alternative suggestions.
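The unpacking step of the job described in point 2 could look roughly like the following. This is only a sketch, not a script that exists in the infrastructure repo: the function name `deploy_static_tarball`, the docroot layout and the rule for valid subdomain names are all assumptions made for illustration, and the download step is left out.

```shell
#!/bin/sh
# Hypothetical sketch: unpack a camp tarball into <docroot_base>/<subdomain>.
# Function name, layout and naming rule are assumptions, not the real setup.
deploy_static_tarball() {
    tarball=$1
    subdomain=$2
    docroot_base=$3

    # Accept only simple names such as "colorado2010" to avoid path tricks.
    case "$subdomain" in
        ''|*[!a-z0-9-]*) echo "invalid subdomain: $subdomain" >&2; return 1 ;;
    esac
    [ -f "$tarball" ] || { echo "no such tarball: $tarball" >&2; return 1; }

    # Unpack into a staging directory first, so a broken archive never
    # replaces a working copy.
    staging=$(mktemp -d "$docroot_base/.staging.XXXXXX") || return 1
    if ! tar -xzf "$tarball" -C "$staging"; then
        rm -rf "$staging"
        return 1
    fi

    # mktemp creates the directory as mode 700; open it up for the web server.
    chmod 755 "$staging"

    # Re-running the job with a corrected tarball replaces the old copy.
    rm -rf "$docroot_base/$subdomain"
    mv "$staging" "$docroot_base/$subdomain"
}
```

Because the old copy is removed only after the new archive has unpacked cleanly, re-uploading a fixed tarball and re-running the job is enough to replace a buggy or incomplete archive.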
Comment #15
killes@www.drop.org
So, we've had a discussion in IRC and the upshot is
1) Narayan needs to create the zone. He'll do so on Monday.
2) Somebody else can create the vhost in puppet, Narayan can review. Care needs to be taken to disable PHP, .htaccess and all other scripting.
3) Initially we'll move the existing camps by hand. We need to take care to do a redirect for these.
4) We can then start to upload any backlog, also by hand.
5) Then a jenkins job will need to be developed.
6) Running that job will need to be entrusted to volunteers. Maybe Steven would be interested.
7) We've discussed using our own git infra, however there are issues with it: a) all the contents would become GPL which may not be wanted. b) git is currently a bit of a bottleneck c) versioning of these static tarballs isn't needed.
8) nobody saw the need for external hosting.
Comment #16
Steven Jones
Thanks for the discussions that you had.
Some points:
Comment #17
alesr
Hi guys,
@jredding pointed me here to ask you about the *.drupalcamp.org domain.
We are organizing a Drupal Camp Alpe-Adria in 2013 http://drupal.org/project/dcaa and while thinking of which domain name to choose the good news came out https://association.drupal.org/node/17278
The idea of having a consistent domain name for Drupal Camps is just awesome and we would be very pleased to be a part of it. Where do we sign? ;)
Comment #18
greggles
@alesr - I think that idea has some merit, but at this point it's not a high enough priority. If you want to help it become a priority it will take whittling down the infrastructure queue in any way you (and other interested parties) can.
Comment #19
Steven Jones
Also, just want to add that if the domain was being used for 'live' sites then we'd be back into the security issue arena of shared cookies etc. so I'd imagine that we may not want to host live sites at all? (Even though the domain is indeed great)
Comment #20
killes@www.drop.org
The shared cookie issue is mainly due to drupal.org's use of bakery for SSO. For a single site using e.g. foo.drupalcamp.org and having set their cookie domain to the same this wouldn't be an issue.
However, I am not sure we planned for assigning particular subdomains to individual camps not hosted on our hardware. We do not want to host live camps on our hardware.
Comment #21
alesr
Got your point.
Is it then possible to get a subdomain (alpeadria.drupalcamp.org) pointed to our IP and hosted on our hardware?
I don't like the domain name inconsistency of all Drupal Camps around. To have *.drupalcamp.org ready for this purpose seems a really good idea.
Imagine all camps using this and we can have a www.drupalcamp.org with links to all sub-domains + a form where new camps could apply for a sub-domain.
After the event we can put it in .html and archive it on your server.
Comment #22
killes@www.drop.org
I need to discuss this with Narayan. The problem is that only a limited number of people have access to our DNS config.
Comment #23
Steven Jones
But one camp's site could read the cookies from another camp's site, so we'd have to absolutely trust the people running the sites, otherwise people could start stealing sessions etc.
Comment #24
killes@www.drop.org
No, if I set foo.drupalcamp.org's cookie domain as foo.drupalcamp.org your browser will not (unless broken) send it to bar.drupalcamp.org. And I believe that's still the standard Drupal setting.
Comment #25
Steven Jones
That's true, but you'd need to ensure that people didn't change the defaults. Just raising a potential issue, not really saying it's a blocker.
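The rule being discussed in the last few comments is the cookie domain match: a browser sends a cookie to a host only if the host equals the cookie's domain or is a subdomain of it. A simplified sketch of that rule follows. `cookie_sent_to` is a made-up name, and the check ignores paths, the Secure flag, IP addresses and the public-suffix handling that real browsers also apply.

```shell
#!/bin/sh
# Simplified cookie domain match, in the spirit of RFC 6265 section 5.1.3.
# Usage: cookie_sent_to <request-host> <cookie-domain>
# Hypothetical helper for illustration only.
cookie_sent_to() {
    host=$1
    cookie_domain=${2#.}   # a leading dot on the cookie domain is ignored
    case "$host" in
        "$cookie_domain"|*."$cookie_domain") return 0 ;;
        *) return 1 ;;
    esac
}

# Cookie scoped to one camp: not sent to a sibling camp.
cookie_sent_to bar.drupalcamp.org foo.drupalcamp.org \
    || echo "foo's cookie is not sent to bar"

# Cookie scoped to the shared parent domain: sent to every camp.
cookie_sent_to bar.drupalcamp.org drupalcamp.org \
    && echo "a drupalcamp.org cookie is sent to bar"
```

The second case is the one the thread worries about: it only arises if a camp changes its cookie domain from the per-site default to the shared parent domain.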
Comment #26
alesr
Any news about getting *.drupalcamp.org linked to our server or should we build up the site first?
Comment #27
greggles
I think we should assume it's not happening until there's news otherwise.
Comment #28
ricardoamaro
Hello all!
Per what we have been debating, here is my contribution to this work.
Attached you will find a patch to the infrastructure repo with a script for automation of static camps sites hosting.
This script is for automatic hosting of static Camp sites in HTML, and it assumes that everything is static, ignoring any PHP or funky .htaccess files.
For testing I created a tgz file using the content of:
wget --recursive --no-clobber --page-requisites --html-extension \
--convert-links --restrict-file-names=windows \
--domains colorado2010.camps.drupal.org --no-parent colorado2010.camps.drupal.org/
That created the whole site as static files.
I did: tar czvf colorado2010.tgz .
and used the script with:
./host_static_site.sh ./colorado2010.tgz colorado2010
1 - script , 2 - tgz with site , 3 - subdomain for vhost
I would appreciate all reviews and suggestions.
Thank you,
Ricardo
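Before a tarball like the one above is handed over, it can be listed to see whether it still contains anything other than static files. A minimal sketch, assuming a hypothetical helper named `list_script_files`; the extension list is only a guess at what "php or funky .htaccess files" should cover, and the helper is not part of the attached patch.

```shell
#!/bin/sh
# Hypothetical helper: print archive members that look like server-side
# scripts or per-directory Apache config.
# Exit status: 0 = something suspicious found, 1 = clean, 2 = unreadable archive.
list_script_files() {
    listing=$(tar -tzf "$1") || return 2
    printf '%s\n' "$listing" \
        | grep -Ei '\.(php|phtml|cgi|pl)$|(^|/)\.htaccess$'
}
```

Running `list_script_files ./colorado2010.tgz` prints any offending paths, so an empty result with exit status 1 means the archive looks static.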
Comment #29
Steven Jones
@ricardoamaro Thanks for helping to move this issue on.
I don't think you need to delete the .htaccess files, as you've got an
AllowOverride None
directive in there, so they won't get executed anyway.
Could the same be done with PHP files in these vhosts, so PHP wasn't actually enabled? This would be safer than trying to find and remove suspect files I think?
Comment #30
ricardoamaro
Removing them is an extra security measure that some people, like killes and nnewton, approved.
Why should we let those pass if they are not needed?
Better safe than sorry, I would say. ;)
Comment #31
ricardoamaro
All requests are done and committed.
closing
Comment #32
greggles
@ricardoamaro - what is the process for someone to request a site be archived? Because http://2013.drupalcampcolorado.org/ :)
Comment #33
ricardoamaro
Comment #34
ricardoamaro
@greggles - The process is to:
- Either use the instructions on https://drupal.org/node/27882 and create the files with it, sending to Infra for production
- Or just request with a ticket on the infra queue so that we can convert it on util and put it in production.
We are working on simplifying this if possible. Do you have any other cool suggestion?
For http://2013.drupalcampcolorado.org/ we can actually use this ticket because it's part of the ticket history.
Finishing that now.
Comment #35
ricardoamaro
colorado2013.drupalcamp.org files are now in place
Comment #36
ricardoamaro
created ticket #23080 on osuosl
Comment #37
ricardoamaro
http://colorado2013.drupalcamp.org/ done!
Comment #37.0
ricardoamaro
Updated issue summary.