I've become interested in getting a number of Drupalcamp sites archived. In the UK we've had a lot of Drupalcamps in the past year, and we're losing valuable information because the sites are just 'disappearing'.

From reading around, I can see that a number of sites have been archived on the Drupal.org infrastructure and are available at *.camps.drupal.org. For example, this is discussed here: #1693890: Set up hosting for static copy of NJ 2012 camp site.

I'd like to help get sites archived and document the process, beyond the existing documentation here: http://drupal.org/node/27882

In #1693890: Set up hosting for static copy of NJ 2012 camp site there seems to be some discussion about domain names, specifically which one to use. In summary I think that:

  • Having *.drupalcamp(s).org would be preferred. People could host the original site at this domain and then cut over to a static version of the site after the camp without having to change the domain name.
  • Those domains are being squatted by people unwilling to give them to the Drupal community; at least, this is the impression that I get from reading the issues about archiving sites. The drupalcamp.org domain has now been given to the Drupal community!
  • As a fallback for now, the *.camps.drupal.org domains are being used, with the actual sites hosted on the Drupal.org infrastructure.

When reading all this and working out what was going on and what had been done, I did a couple of searches for domain names. drupalcamparchive.org turned out to be available, so I bought it. I'd like to make it clear that I'm super happy to transfer this domain to whomever within the Drupal project/association if it's wanted.

So my main proposal has two main points:

  1. Use the drupalcamp.org domain to serve static copies of the Drupalcamp sites, and have a nice directory of the sites archived at the top level domain.
  2. Move the hosting off of the Drupal.org servers, to github.

I see the primary advantage of point 1 being security. If these sites are on the same domain as drupal.org, then they will have access to its cookies, and so any code deployed to the static sites must be reviewed thoroughly for security issues. If they are being run from a different domain, then they won't have to be reviewed so thoroughly. I'm really happy to be told: "No, we're keeping them on camps.drupal.org" though.

My second proposal would be to move the actual hosting of the static sites to github pages. Now, I know that hosting static websites is significantly easier than maintaining the infrastructure for drupal.org itself, but it seems that if we can offload the sysadmin for these sites entirely, then we should do so.
Each site would be contained in its own github repo owned by a 'drupal-static' organisation, which would make it incredibly easy for people to set up new sites. We could also easily have multiple administrators for setting up the sites.

If we get this right, then we should be expecting a lot of requests to archive static sites, and so I'd be interested in making this as simple as possible for people involved.

Please be kind and gentle with your feedback, if we're totally happy with the current setup, then I'd just like to document all of that.

Files:
Comment #28: camps_host_static_site.patch (3.28 KB), by ricardoamaro

Comments

The only really important thing is, that these sites should *not* use *.drupal.org as a domain name due to our bakery configuration.

The current usage is a temporary solution that is way too work intensive since all the archives have to be looked through for potentially dangerous JS.

I don't care much _which_ other domain is used, the one you bought sounds fine to me.

I also don't care much where the static sites end up, on our infrastructure, on github, or elsewhere. Hosting the archive on our infrastructure is an offer that camp organizers can take up or not.

I am quite sure that I don't want to host live camp sites on our infrastructure. There are too many of them, even though the process is currently much better than a year ago.

My sense is that the domain should be owned by the DA. Narayan has the account and passwords to do so and IMO should go ahead (even if the domain he chooses seems "bad" it will help motivate folks to get us one of the good domains).

I agree that the main holdup has been the review and security concerns and think that getting a new domain should eliminate those.

There is some more documentation/discussion of the process at http://groups.drupal.org/node/84644 which references http://drupal.org/node/871948#comment-3284576 - but those should probably be made into a real handbook page (though I'm not sure where in the hierarchy it would go).

Thanks for your enthusiasm for this idea and your action so far!

If you want the drupalcamparchive.org domain, then I can set it up to be transferred.

@greggles Thanks for the documentation that you've added, it's exceptionally helpful.

I've mocked up a simple listing site that could live at:
http://drupalcamparchive.org

Which you can see at:

http://darthsteven.github.com/drupalcamp-archive/

It's easy to add new sites to this listing, as it's generated with Jekyll. I've just added some of the Drupalcamps that have happened to give some idea of what the site might look like. I plan to add documentation to the site too, and link into the d.o handbooks where needed. I'm keen to not duplicate or move documentation off of d.o, but there might have to be some duplication.

I've now moved the site to an organisation on github: http://drupalcamp-archive.github.com/

Hi Steven. So, the drupalcamparchive.info (or something similar) has been purchased. It should be easier to get these sites onto that domain now.

I would appreciate it if you remove the Colorado site from your domain or maybe point the link on the index page to http://colorado2010.camps.drupal.org/drupalcampcolorado.org/index.html - we don't want a duplicate content penalty for having that content in two places.

@greggles sorry, I was just showing what an index page might look like, rather than actually making the index page work. I've not set up hosting for any of the sites yet, as I figured that would be really rude if nothing else.

I will adapt my listing concept so that we can link off to existing domains too.

Also, to clarify, I do own the drupalcamparchive.org domain, but would be absolutely happy to transfer to the DA.

Awesome!

Great, so we have better domain names now: drupalcamp.org, drupalcamp.com

So the next question is: Do we have an issue hosting these static sites on github, or are we sticking with hosting them on our own infrastructure? I've roughly documented how it would work if they were hosted on github here: http://drupalcamp-archive.github.com/get-site-archived.html

Thanks for documenting that Steven.

I don't think we should offload the administration of these archived sites just because we can. Or at least until there's a case against doing so. While I could understand that the infrastructure team is overburdened I'd like to better understand how so and seek solutions first before jumping to a third-party.

AFAIK, the biggest blocker here was concern about xss. That's now gone :)

We should discuss how we would want to work the creation of such archive sites.

1) Narayan already said that he wants to create a wildcard DNS entry, so there will be no need to create new apache configs.

2) We could create a jenkins job that would download and untar a tarball at a defined place on the server. The download could be from our own server, ie an upload to the issue queue.
This would allow us to replace any buggy tarballs/incomplete archives by simply re-uploading.
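As a rough illustration of the "download and untar" step, here is a minimal sketch of what such a job might run once the tarball has been fetched. The function name `deploy_tarball` and the staging-directory approach are assumptions made for illustration, not anything agreed in this issue. It unpacks into a temporary directory first, so re-uploading a fixed tarball replaces the old copy, while a broken tarball leaves the current copy alone.

```shell
#!/bin/sh
# Hypothetical helper: unpack a camp tarball into its own directory,
# replacing whatever an earlier upload left there.
# Usage: deploy_tarball <tarball> <destination-dir>
deploy_tarball() {
    tarball=$1
    dest=$2

    # Refuse to run without both arguments or with a missing tarball.
    [ -n "$tarball" ] && [ -n "$dest" ] || return 2
    [ -f "$tarball" ] || return 1

    # Unpack into a staging directory first, so a broken tarball
    # never clobbers the copy that is currently in place.
    staging="${dest}.new.$$"
    rm -rf "$staging"
    mkdir -p "$staging" || return 1
    if ! tar -xzf "$tarball" -C "$staging"; then
        rm -rf "$staging"
        return 1
    fi

    # Swap the fresh copy into place.
    rm -rf "$dest"
    mv "$staging" "$dest"
}
```

The swap at the end is not atomic (there is a brief moment with no directory), which seems acceptable for rarely updated archives.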

As always, I am open for alternative suggestions.

So, we've had a discussion in IRC and the upshot is

1) Narayan needs to create the zone. He'll do so on Monday.

2) Somebody else can create the vhost in puppet, and Narayan can review it. Care needs to be taken to disable PHP, .htaccess and all other scripting.

3) Initially we'll move the existing camps by hand. We need to take care to do a redirect for these.

4) We can then start to upload any backlog, also by hand.

5) Then a jenkins job will need to be developed.

6) Running that job will need to be entrusted to volunteers. Maybe Steven would be interested.

7) We've discussed using our own git infra, however there are issues with it: a) all the contents would become GPL which may not be wanted. b) git is currently a bit of a bottleneck c) versioning of these static tarballs isn't needed.

8) nobody saw the need for external hosting.
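For point 2, a vhost with scripting switched off might look roughly like the output of the sketch below. This is only an illustration: the function name `static_vhost`, the `/var/www/camps` path and the per-camp layout are assumptions, the real configuration would live in puppet, and access-control directives are left to the server defaults.

```shell
#!/bin/sh
# Hypothetical sketch: print an Apache vhost for one archived camp.
# The /var/www/camps path is an assumption made for illustration.
# Usage: static_vhost <subdomain>
static_vhost() {
    sub=$1
    cat <<EOF
<VirtualHost *:80>
    ServerName ${sub}.drupalcamp.org
    DocumentRoot /var/www/camps/${sub}

    <Directory /var/www/camps/${sub}>
        # Ignore any .htaccess files shipped inside the archive.
        AllowOverride None
        # No CGI scripts and no server-side includes.
        Options -ExecCGI -Includes
        # Switch the PHP engine off if mod_php happens to be loaded.
        <IfModule mod_php5.c>
            php_admin_flag engine off
        </IfModule>
    </Directory>
</VirtualHost>
EOF
}
```

Running `static_vhost colorado2010` prints a block for colorado2010.drupalcamp.org that could be reviewed before being turned into a puppet template.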

Thanks for the discussions that you had.

Some points:

  1. I'd happily be the one to 'push the button' in Jenkins or whatever.
  2. If it's just me that can push the button, then this process doesn't scale.
  3. In terms of git or not, I can't say I'm a fan of passing tarballs around, and I'd likely untar and put it straight into a git repo, even if that repo just stayed on my machine, but hey, that's personal preference. Also, we don't have to make people upload stuff to our git, just git somewhere if they didn't want to upload tarballs.
  4. Were there any thoughts about the listing page at the root of drupalcamp.org at all? Seems like getting something there should happen relatively soon too.

Hi guys,

@jredding pointed me here to ask you about the *.drupalcamp.org domain.
We are organizing a Drupal Camp Alpe-Adria in 2013 http://drupal.org/project/dcaa and while thinking of which domain name to choose the good news came out https://association.drupal.org/node/17278
The idea of having a consistent domain name for Drupal Camps is just awesome and we would be very pleased to be a part of it. Where do we sign? ;)

@alesr - I think that idea has some merit, but at this point it's not a high enough priority. If you want to help it become a priority it will take whittling down the infrastructure queue in any way you (and other interested parties) can.

Also, just want to add that if the domain was being used for 'live' sites then we'd be back into the security issue arena of shared cookies etc. so I'd imagine that we may not want to host live sites at all? (Even though the domain is indeed great)

The shared cookie issue is mainly due to drupal.org's use of bakery for SSO. For a single site using e.g. foo.drupalcamp.org and having set their cookie domain to the same this wouldn't be an issue.

However, I am not sure we planned for assigning particular subdomains to individual camps not hosted on our hardware. We do not want to host live camps on our hardware.

Got your point.
Is it then possible to get a subdomain (alpeadria.drupalcamp.org) pointed to our IP and hosted on our hardware?
I don't like the domain name inconsistency of all Drupal Camps around. To have *.drupalcamp.org ready for this purpose seems a really good idea.
Imagine all camps using this and we can have a www.drupalcamp.org with links to all sub-domains + a form where new camps could apply for a sub-domain.
After the event we can put it in .html and archive it on your server.

Is it then possible to get a subdomain (alpeadria.drupalcamp.org) pointed to our IP and hosted on our hardware?

I need to discuss this with Narayan. The problem is that only a limited number of people have access to our DNS config.

The shared cookie issue is mainly due to drupal.org's use of bakery for SSO. For a single site using e.g. foo.drupalcamp.org and having set their cookie domain to the same this wouldn't be an issue.

But one camp's site could read the cookies from another camp's site, so we'd have to absolutely trust the people running the sites, otherwise people could start stealing sessions etc.

No, if I set foo.drupalcamp.org's cookie domain as foo.drupalcamp.org your browser will not (unless broken) send it to bar.drupalcamp.org. And I believe that's still the standard Drupal setting.

That's true, but you'd need to ensure that people didn't change the defaults. Just raising a potential issue, not really saying it's a blocker.
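The cookie-domain point in this exchange can be checked with a small sketch. It is a simplified version of the domain-matching rule browsers apply (in the spirit of RFC 6265), not a full implementation: it ignores IP addresses and public-suffix handling, and the function name `cookie_sent_to` is made up for illustration.

```shell
#!/bin/sh
# Simplified cookie domain matching: a cookie is sent when the request
# host equals the cookie domain or is a subdomain of it.
# A leading dot on the cookie domain is ignored.
# Usage: cookie_sent_to <request-host> <cookie-domain>
cookie_sent_to() {
    host=$1
    domain=${2#.}
    case "$host" in
        "$domain")   return 0 ;;   # exact match
        *."$domain") return 0 ;;   # host is a subdomain of the cookie domain
        *)           return 1 ;;
    esac
}
```

With this rule, `cookie_sent_to bar.drupalcamp.org foo.drupalcamp.org` fails, matching the claim that one camp's default cookie is not sent to another camp. `cookie_sent_to bar.drupalcamp.org .drupalcamp.org` succeeds, which is the changed-default case raised as a risk.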

Any news about getting *.drupalcamp.org linked to our server or should we build up the site first?

I think we should assume it's not happening until there's news otherwise.

Assigned: Unassigned » ricardoamaro
Status: Active » Needs review
Attached file (new): 3.28 KB

Hello all!

Per what we have been debating, here is my contribution to this work.
Attached you will find a patch to the infrastructure repo with a script that automates the hosting of static camp sites.
The script hosts camp sites as static HTML and assumes that everything is static, ignoring any PHP or funky .htaccess files.

For testing I created a tgz file using the content of:
wget --recursive --no-clobber --page-requisites --html-extension \
--convert-links --restrict-file-names=windows \
--domains colorado2010.camps.drupal.org --no-parent colorado2010.camps.drupal.org/

That created the whole site as static files.
I did: tar czvf colorado2010.tgz .
and used the script with:

./host_static_site.sh ./colorado2010.tgz colorado2010
1 - script , 2 - tgz with site , 3 - subdomain for vhost
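The mirror and packaging steps above could be wrapped up along these lines. This is a sketch only: the function names `mirror_site` and `package_site` are made up here, and the attached host_static_site.sh may expect something different. The wget options are the ones quoted above.

```shell
#!/bin/sh
# Mirror a site with the same wget options as quoted above.
# Usage: mirror_site <hostname>   (wget writes into ./<hostname>/)
mirror_site() {
    host=$1
    wget --recursive --no-clobber --page-requisites --html-extension \
         --convert-links --restrict-file-names=windows \
         --domains "$host" --no-parent "$host/"
}

# Pack a mirrored directory so its files sit at the top of the tarball.
# Usage: package_site <mirror-dir> <output.tgz>
package_site() {
    src=$1
    out=$2
    [ -d "$src" ] || return 1
    tar -czf "$out" -C "$src" .
}
```

For example, `mirror_site colorado2010.camps.drupal.org` followed by `package_site colorado2010.camps.drupal.org colorado2010.tgz` would give a tarball equivalent to the one used in the test above.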

I would appreciate all reviews and suggestions.

Thank you,
Ricardo

@ricardoamaro Thanks for helping to move this issue on.

I don't think you need to delete the .htaccess files, as you've got an AllowOverride None directive in there, so they won't be processed anyway.
Could the same be done with PHP files in these vhosts, so PHP wasn't actually enabled? This would be safer than trying to find and remove suspect files I think?

Removing them is an extra security measure that some people, like killes and nnewton, approved.
Why should we let those pass if they are not needed?
Better safe than sorry, I would say. ;)
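For reference, the kind of cleanup being debated here can be sketched as follows. This is not the code from the patch: the function name `strip_dynamic_files` is hypothetical and the patterns are an assumption about which files count as "not needed".

```shell
#!/bin/sh
# Hypothetical cleanup step: delete .htaccess and PHP files from an
# unpacked archive and print each path that was removed.
# Usage: strip_dynamic_files <site-dir>
strip_dynamic_files() {
    dir=$1
    [ -d "$dir" ] || return 1
    find "$dir" -type f \( -name '.htaccess' -o -iname '*.php' \) \
        -print -exec rm -f {} +
}
```

Pages saved with an added .html suffix (for example index.php.html) do not match these patterns and are left in place. Combining this with a vhost that has PHP disabled would cover both positions in this exchange.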

Status: Needs review » Fixed

All requests are done and committed.
Closing.

@ricardoamaro - what is the process for someone to request a site be archived? Because http://2013.drupalcampcolorado.org/ :)

Status: Fixed » Needs work

@greggles - The process is to:
- Either follow the instructions on https://drupal.org/node/27882 to create the files, and send them to Infra for production
- Or just open a ticket in the infra queue so that we can convert the site on util and put it in production.

We are working on simplifying this if possible. Do you have any other cool suggestions?

For http://2013.drupalcampcolorado.org/ we can actually use this ticket because it's part of the ticket history.

Finishing that now.

The colorado2013.drupalcamp.org files are now in place.

Created ticket #23080 on osuosl.

Status: Needs work » Closed (fixed)


Updated issue summary.