If we look around the internet we see several Drupalcamps hosted on groups.drupal.org, where their content should live on forever, and many hosted externally. Unfortunately, many of the externally hosted camp sites get lost over time and their resources are no longer available.

Drupalcamp Colorado 2010 recently wrapped up. Our webforms are submitted, some of the videos that were recorded are linked from the session pages, etc. We are done making edits to the site and I created a static archive using webhttrack. It mirrored content from external sites only where that content was linked and was something like a pdf or png that is important to the site.

The static archive is 64MB, I think most of which is images.

I request that we host this site at 2010colorado.drupalcamp.drupal.org - the benefit of a subdomain over a subdirectory is that all of the links on the site are built relative to / and would be screwed up if it were hosted in a subdirectory. Doing it as a subdomain of drupalcamp.drupal.org seems like a good idea to encourage other organizations to host their site there.
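
To illustrate why root-relative links break in a subdirectory, here is a minimal Python sketch. The subdirectory URL and the asset path are hypothetical examples, not the actual layout of the archive.

```python
from urllib.parse import urljoin

# A root-relative link as it might appear in the archived HTML (hypothetical path).
ASSET_LINK = "/sites/default/files/logo.png"


def resolve(page_url: str, link: str) -> str:
    """Resolve a link the way a browser would, relative to the page it is on."""
    return urljoin(page_url, link)


# Hosted on its own subdomain, the link stays inside the archive.
subdomain_result = resolve(
    "http://2010colorado.drupalcamp.drupal.org/index.html", ASSET_LINK
)

# Hosted in a subdirectory (hypothetical URL), the link escapes to the
# parent site's root and no longer points into the archive.
subdirectory_result = resolve(
    "http://drupalcamp.drupal.org/2010colorado/index.html", ASSET_LINK
)

print(subdomain_result)
print(subdirectory_result)
```

In the subdirectory case the resolved URL drops the 2010colorado/ prefix entirely, which is the breakage described above.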

At this point, all I'd like is a little discussion of the ideas (killes gave a thumbs up to the idea in irc a few weeks ago). Assuming there is agreement, we would just need someone to create the DNS entries and tell me where to stick the files on util.drupal.org and I can move them there.

Edited the first sentence for clarity.

Comments

Damien Tournoud’s picture

The Drupalcons websites are named [city][year].drupal.org.

As a consequence, I would suggest [city/area][edition].camps.drupal.org for camps. Drupalcamp Colorado would then be colorado-2010.camps.drupal.org.

Some camps can be named slightly differently (for example: in Paris, we named them sequentially paris2.camps.drupal.org, paris3.camps.drupal.org, etc.), but I think we can afford the slight inconsistency.

Gerhard Killesreiter’s picture

I have no problem with this.

Damien's naming scheme seems sound but I am open to other suggestions as well.

greggles’s picture

I think Damien's idea on [location][number].camps.drupal.org is good as well. The word "camp" vs. "drupalcamp" vs. "summit" is potentially sticky. I'm not sure what we should do, but I think we should do something rather than let that block us. I like "camp" or "camps" since it is obviously related to Drupal. We don't have drupalapi.drupal.org, so why have drupalcamp.drupal.org? "camps" seems better since these are indeed plural.

greggles’s picture

Status: Active » Fixed

Awesome, now fixed http://colorado2010.camps.drupal.org/drupalcampcolorado.org/index.html :)

Thanks, @basic!

As a process for future camps who want to take advantage of this:

  1. Someone needs to create a DNS entry manually. Relatively few people can do that, but I think filing an infrastructure issue is the best way to ask for it.
  2. The files need to be placed in a subdirectory of /var/webroot/camps.drupal.org/, like /var/webroot/camps.drupal.org/colorado2010.camps.drupal.org
  3. The permissions on that directory are such that the groups.drupal.org admins (i.e. moshe, josh, me) can upload files on behalf of camps, or anyone with sudo can do it as well.
  4. Since this is running on *.drupal.org there is a potential for some security shenanigans. We should do a bit of a review of all js, html, zip, etc. files prior to uploading. All js should be hosted locally on drupal.org (i.e. not on an external server) OR should be from well known sources like google. I did that and also ran clamav on the files.
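
Part of the review in step 4 could be automated. The following is only a rough Python sketch, not the process that was actually used: it lists script tags whose src points at a host outside an allowlist. The TRUSTED_HOSTS entries and the function names are illustrative assumptions, and the check covers only script tags, so it does not replace the manual review or the clamav scan.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse

# Assumption: which external hosts count as "well known" is a policy choice;
# these entries are only illustrative.
TRUSTED_HOSTS = {"ajax.googleapis.com", "www.google-analytics.com"}


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def untrusted_scripts(html_text: str) -> list:
    """Return script URLs that point at a host outside TRUSTED_HOSTS."""
    collector = ScriptSrcCollector()
    collector.feed(html_text)
    flagged = []
    for src in collector.sources:
        host = urlparse(src).netloc  # empty for relative (locally hosted) paths
        if host and host not in TRUSTED_HOSTS:
            flagged.append(src)
    return flagged


def scan_archive(root: str) -> dict:
    """Map each HTML file under root to its untrusted script URLs, if any."""
    report = {}
    for path in Path(root).rglob("*.html"):
        flagged = untrusted_scripts(path.read_text(errors="replace"))
        if flagged:
            report[str(path)] = flagged
    return report
```

Calling scan_archive() on the directory holding the static copy before uploading would give a short list of files to look at by hand.
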
Damien Tournoud’s picture

greggles, could you document the way to build the static copy itself?

I need to do that for the Drupalcon Paris website.

greggles’s picture

I used a tool called webhttrack which runs on Ubuntu inside of a browser. Frankly, it feels to me like a weird way to do this but it worked reasonably well.

I forget which options I changed, except for one: "Maximum external depth: 0" so that it doesn't go off site for content.

Last year I tried to use wget --mirror and found that it didn't work very well for the css and images. Perhaps I didn't research it long enough.

webchick’s picture

Status: Fixed » Needs work

Let's also ensure these docs are located somewhere other than some random issue in the infrastructure queue.

greggles’s picture

The only place I can think is infrastructure.d.o which is not really accessible. Any other suggestions?

webchick’s picture

I guess under here? http://drupal.org/about/drupal.org-FAQ

"How do I get my local Drupalcamp website archived at Drupal.org?" or some-such.

Gerhard Killesreiter’s picture

Please add it to infra.

greggles’s picture

It seems I'm getting 403 forbidden errors depending on which webnode I get.

@webchick @killes - can't we decide on one place to document this?

Gerhard Killesreiter’s picture

When do you get 403s?

greggles’s picture

Intermittently, but http://colorado2010.camps.drupal.org/drupalcampcolorado.org/sponsors/mon... is the url where I first noticed it. Sometimes it works. Sometimes *some* of the css/images are missing. Sometimes it's a 403. Here are the HTTP headers. I can just refresh that page a few times to reproduce it. Is there some way to know which webhead/varnish server is having the problem?

Response Headers
Server: Apache
Content-Type: text/html; charset=iso-8859-1
Content-Length: 335
Date: Fri, 06 Aug 2010 13:59:49 GMT
X-Varnish: 1749925004
Age: 0
Via: 1.1 varnish
Connection: keep-alive

Request Headers
Host: colorado2010.camps.drupal.org
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.4) Gecko/20100413 Firefox/3.6.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.google.com/search?q=site%3Adrupal.org%20monarch%20drupalcamp%...
Cookie:
If-Modified-Since: Mon, 02 Aug 2010 19:28:36 GMT
If-None-Match: "3dac3-5aee-48cdc34f4e100"
Cache-Control: max-age=0

Damien Tournoud’s picture

$ node drupal-org-test.js 
www1.drupal.org 200
www2.drupal.org 200
www3.drupal.org 403
www4.drupal.org 200
www5.drupal.org 403
www6.drupal.org 200
www7.drupal.org 200
Damien Tournoud’s picture

FileSize
1.61 KB

A small nodejs tool to test webnode response codes for a given URL. Might be useful to someone.
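
For readers without the attachment, the same kind of check can be sketched in Python. This is not the attached nodejs tool, just a minimal illustration of the idea: ask each webnode for the same path while sending the site's own Host header, then compare status codes. The webnode names come from the output above; that they can be reached directly and that they answer on plain HTTP are assumptions.

```python
import http.client
from urllib.parse import urlparse

# Webnode names taken from the output above; whether they are reachable
# directly from outside is an assumption.
WEBNODES = [f"www{n}.drupal.org" for n in range(1, 8)]


def http_status(webnode: str, url: str, timeout: float = 10.0) -> int:
    """Request url's path from one webnode, sending the site's own Host header."""
    parts = urlparse(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(webnode, timeout=timeout)
    try:
        conn.request("GET", target, headers={"Host": parts.netloc})
        return conn.getresponse().status
    finally:
        conn.close()


def check_webnodes(url: str, fetch=http_status, webnodes=WEBNODES) -> dict:
    """Return a {webnode: status} mapping for the given URL."""
    return {node: fetch(node, url) for node in webnodes}


# Example (performs real network requests):
# for node, status in check_webnodes("http://colorado2010.camps.drupal.org/").items():
#     print(node, status)
```

The fetch parameter exists so the loop can be exercised without network access by passing in a stand-in function.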

greggles’s picture

Status: Needs work » Fixed

Fortunately there is a good document on this topic in general in the handbooks already:

Creating a static archive of a Drupal site.

Pasqualle’s picture

but there is no information about how to get it to d.o
and questions like:
1. is it for English content only?
2. where to store videos and presentation slides?
3. what to do when inappropriate (static) content (like spam) is found?
4. where is the list of all drupalcamps (stored on d.o)?

Gerhard Killesreiter’s picture

1) No, we host content in any language.

2) I don't know where you store them now. Presentation slides are ok, but videos should be hosted externally.

3) Spam should be removed before you make a static html copy of your site.

4) I don't know if such a list exists, having one would be good. I think currently it is only the Colorado camp.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

aem34’s picture

Status: Closed (fixed) » Active

hello,

this is a great idea for all newcomers who want to learn,
but there is one thing to overcome:

ok, we can see the websites, but how would people know how to find those pages?

could it be possible to have a mini-site (like camps.drupal.org or something like that)
with some categorization of sessions, like:
- city/country,
- topic of sessions,
- year,
- author/company

with the results linked to the session pages on all those html subsites?

it could be a great resource for guiding all newcomers through drupal!

greggles’s picture

That seems like a great idea to me. So far there is just one camp site archived, but soon there will be two so something like this will make sense.

pwolanin’s picture

created a separate issue for NJ camp archive: #1693890: Set up hosting for static copy of NJ 2012 camp site

greggles’s picture

Title: Host static copies of Drupalcamp Websites: Start with Colorado 2010 » Host static copies of Drupalcamp Websites: provide an index page

Updated title for the task that actually remains here.

I'll volunteer to edit that page and, if someone wants, they can make it prettier than the version I'll come up with ;)

killes@www.drop.org’s picture

We've discussed this and have agreed that such an index would be best hosted on d.o itself and that we need a redirect from the to-be-created vhost camps.drupal.org to that page.

We need to create that page and to think of an alias before that can happen.

webchick’s picture

drupal.org/camps sounds fine to me.

webchick’s picture

And it could be a landing page that includes not only the list of hosted sites, but also a link off to http://groups.drupal.org/events?field_event_type_value_many_to_one%5B%5D... and maybe even if we're feeling extra spunky, a block of upcoming camps.

greggles’s picture

It's a start http://groups.drupal.org/camps

Now just needs a vhost and a redirect, I think.

pwolanin’s picture

I'm archiving our 2013 site with these options:

httrack http://www.drupalcampnj.org --update -w -O . -%v --robots=0 -c10 -%e0

Note that -c10 (the default, I believe) runs up to 10 requests in parallel. At e.g. -c1 it's damn slow. You might even go to e.g. -c20.

There is also -A for the throttle rate (bytes/sec, I think). If you have big images, you may want something like -A10000000 or more.
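
For anyone scripting this, the command above can be assembled in Python so that the -c and -A values are easy to vary between runs. This is only a sketch: the function name and its parameters are made up for illustration, the flags are copied from the command in this comment, and the inline comments reflect my reading of the httrack manual rather than anything tested here.

```python
import shlex
from typing import Optional


def httrack_command(site_url: str, output_dir: str = ".",
                    connections: int = 10,
                    max_rate: Optional[int] = None) -> list:
    """Build the httrack argument list from the comment, with tunable -c and -A."""
    args = [
        "httrack", site_url,
        "--update",            # update an existing mirror
        "-w",                  # mirror mode
        "-O", output_dir,      # where the mirror is written
        "-%v",                 # verbose progress output
        "--robots=0",          # do not follow robots.txt rules
        f"-c{connections}",    # number of parallel connections
        "-%e0",                # external link depth 0: stay on the site
    ]
    if max_rate is not None:
        args.append(f"-A{max_rate}")  # transfer rate limit
    return args


# Print the command line instead of running it.
print(shlex.join(httrack_command("http://www.drupalcampnj.org")))
```

The resulting list can be handed to subprocess.run() once the printed command looks right.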

tvn’s picture

Assigned: Unassigned » ricardoamaro

Now that we have the drupalcamp.org URL, it needs a landing page, which should link to the sites we currently host and to the guidelines on how to add new sites.

ricardoamaro’s picture

FileSize
131.51 KB

I suggest we use something along these lines.
With your comments to improve it, of course...

:)

webchick’s picture

Though it's probably out of scope, personally, I'd prefer this landing page to look a lot more like http://central.wordcamp.org/ than a mere index of sites available. The fact that you archived DrupalCamp New Jersey in 2012 isn't really relevant to anyone but New Jerseyians, but the fact that there was a session there from Foo Contributor on Bar Baz API and there are video/slides available is much more so.

Barring that, though, even something like adding links directly to the sessions / attendees / etc. rather than just the index.php of each one would be more useful I think.

webchick’s picture

In fact, if you really wanted to change the game, in addition to static site archives, a central repo of all Drupal Camp sessions/presenters/presentation slides/videos with ratings and a big honking search box would be the way to do it. I actually don't really get the value provided by static archives of sites, beyond not breaking old URLs (which, granted, is valuable, just a lot less so than a dynamic archive of Drupal knowledge).

But now we're definitely getting off-topic. ;) Is there a better place for ideas like that?

killes@www.drop.org’s picture

I think that we'd ideally integrate the static files with our solr instance and provide integration through that. Failing that, google will pick up the data and people can ask google...

drumm’s picture

webchick - We're building a better ideation process for Drupal.org as part of the Software Working Group.

Looks like greggles already made a landing page at https://groups.drupal.org/camps, see comment #24. We can keep it simple and redirect to there.

https://drupal.org/events should be updated to mention this. (Maybe even move the page from groups.drupal.org.)

ricardoamaro’s picture

Assigned: ricardoamaro » Unassigned
ricardoamaro’s picture

I clearly missed the point of what was needed, sorry.
This is clearly not only a landing page, as I had been told.

drumm’s picture

Let's keep this issue limited to the scope of the title. Just an index page.

webchick's idea comes up occasionally, and I'm sure we might already have an issue open for it somewhere. It is certainly huge scope creep for this issue.

ricardoamaro’s picture

Assigned: Unassigned » ricardoamaro

@drumm, thank you for clarifying. :)

Indeed, the proposal is still a static page that keeps the layout of drupal.org, in order to keep things simple.
If we would like to have all of the search, taxonomy and aggregation features, we would probably need another Drupal site that someone would need to support.

Since we currently have "null", a static page is better than nothing. Therefore I'm very excited that this is one of the topics we are taking to the live meeting today. ;)
A redirect to https://groups.drupal.org/camps is also a fair idea.

Feel free to sketch over the proposal, because it's really basic.

Steven Jones’s picture

As a number of camps have a distinct visual style for each camp, and often each year, it might be nice to highlight this with screenshots of the archived sites in a grid instead of just a list of URLs.
Might make it a little less text heavy too.

ricardoamaro’s picture

Hey Steven!

You mean, to have some thumbnails of each site's homepage?
That's really a good idea, however it will mean more manual work when adding the sites to the vhosts.
I guess some kind of automation for building those "screenshots" would need to be in place.
Currently the (ul) list is generated by an automated script, but then again we can improve it.
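
To give an idea of what such list generation can look like, here is a minimal Python sketch. It is not the script currently in use. It assumes that each archived site lives in a directory named after its hostname, as in the /var/webroot/camps.drupal.org/ layout described earlier in this issue; the function name is made up.

```python
import html
from pathlib import Path


def camp_index_html(webroot: str) -> str:
    """Render a <ul> linking to every archived site directory under webroot."""
    # Assumption: each subdirectory is named after the hostname it serves.
    sites = sorted(p.name for p in Path(webroot).iterdir() if p.is_dir())
    items = [
        f'  <li><a href="http://{html.escape(name, quote=True)}/">'
        f"{html.escape(name)}</a></li>"
        for name in sites
    ]
    return "\n".join(["<ul>", *items, "</ul>"])
```

Thumbnails would need a separate step that renders each homepage, which this sketch does not attempt.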

Thank you

patcon’s picture

On the d.o infra call, where drupalcamp.org is part of the agenda. Thoughts are related to the higher-level goal of "Sort out a way to make sure camp data doesn't go away", but not necessarily using drupalcamp.org. (Consider this comment to be out-of-flow "scratch notes".)

I spent the last week trying to bring some old drupalcamp toronto camp sites back from the grave.

Pantheon had some nice input on promoting camp site longevity:
http://help.getpantheon.com/pantheon/topics/more_liberal_hosting_policy_...

Also, going to drop this here for now:
http://lanyrd.com

Lanyrd was just acquired by Eventbrite, and it's a great place to centralize content where lots of communities can find it, and speaker data can be collated.

webchick’s picture

Very interesting. Looks like they have some rudimentary web services too http://lanyrd.com/services/ which we could use to pull data back into Drupal.org and display how we like.

ricardoamaro’s picture

From the Infrastructure meeting call, these are the main decisions for this ticket:

a) We are going to redirect the domain drupalcamp.org to a drupal.org index page for the drupalcamp context.

b) We are going to create a node type “drupalcamp” on drupal.org and add a MAIN page for drupalcamp content searching and organizing.

c) We are going to convert all *.camps.drupal.org -> *.drupalcamp.org

ricardoamaro’s picture

As discussed with @tvn:
right now, we'll just need to finish a), update the thumbnails and we are done on this ticket.

ricardoamaro’s picture

this is done

ricardoamaro’s picture

Issue summary: View changes
Status: Active » Closed (fixed)
Steven Jones’s picture

Yey, awesome work everyone!

tim.plunkett’s picture

Status: Closed (fixed) » Fixed

(Status changed so other people following don't miss this like I almost did)

So, where is the new index page? I don't see it in the issue summary.

Steven Jones’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Component: Webserver » Servers