When attempting to get to any URL like http://barcelona2007.drupalcon.org/ or http://boston2008.drupalcon.org/ you get this Apache password prompt with "drupalcon security lockdown" on the label.

What's up with that? There's lots of really good information on these old sites. :(

Comments

gerhard killesreiter’s picture

The password is available on the infrastructure site.

We need a volunteer to convert these sites into static html since they aren't maintained.

jpmckinney’s picture

Where is this "infrastructure site" that has the password?

gerhard killesreiter’s picture

at infrastructure.drupal.org :)

However, it is a site with restricted access.

We are still waiting for volunteers to volunteer to make these sites static html copies.

ensignavenger’s picture

Why not disable the webform module on these sites? I can't imagine why it is still necessary for the conference archives. As a long-term fix, perhaaps an archive site could house all the session archives?

I've never tried converting a Drupal website into static HTML (I've always done things the other way round), but converting each page manually would be a lot of work. Any suggestions on tools to automate this? Is there a discussion on this topic elsewhere that I should look at?

gerhard killesreiter’s picture

There are probably other insecure modules there as well. The sites aren't maintained and thus cannot be publicly accessible.

Instead of complaining about that I suggest somebody come up with a way to produce static html from the sites...

pwolanin’s picture

webform is not the only issue - all dynamic behavior needs to be shut off.

There is an issue from greggles with notes on how to do this - http://drupal.org/node/871948

basically, you disable all login blocks/links

then try one of the following:

wget -r
HTTrack
add boost + crawl + save boost pages
etc
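For instance, the wget option might look something like this (just a sketch; the hostname is a placeholder and the flags would need tuning per site):

```shell
# Hypothetical invocation (placeholder hostname): mirror a site into
# static HTML with wget. --adjust-extension appends .html to pages
# served as text/html, --convert-links rewrites links to point at the
# local copies, --page-requisites pulls in CSS/JS/images, and --wait
# rate-limits so we don't hammer the server.
wget --mirror --convert-links --adjust-extension --page-requisites \
     --no-parent --wait=1 http://example2008.drupalcon.org/
```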

populist’s picture

Status: Active » Needs review

I worked with a few folks here to get the Drupalcon SF site updated. It is now running secure versions of all modules in production.

Might it be possible to remove the .htpasswd rule on the site so people can watch the videos?

gerhard killesreiter’s picture

I really would have preferred if you had spent your time on making the sites static...

Nevertheless, I have requested that the password for that site be removed.

greggles’s picture

Status: Needs review » Active
jerdavis’s picture

I was directed here by greggles; sorry I missed this in my search! Archiving the sites to static HTML makes a lot of sense. Currently the CPH site is on lockdown, and I haven't yet gone in to see if it's functional for creation of a static archive. The Paris site, however, is broken at present, which would make the creation of a static archive impossible for anyone who didn't have access to fix its security problem.

I'm unsure if I can find the time to create a functional archive but I'd like to try or at least have the possibility for someone else to do it available.

killes@www.drop.org’s picture

What we need is not an archive as such (hard to do with the sites unavailable) but a recipe that is proven to work on Drupal sites.

The Paris site isn't hosted on d.o. infra.

jerdavis’s picture

#871948 linked in #7 above points to this handbook page:

"Creating a static archive of a Drupal site"
http://drupal.org/node/27882

On the surface it seems correct, although I haven't personally tried it yet. The suggestion of boost seems worth considering as well.

So perhaps the question then becomes how do volunteers help to archive the currently locked down Drupalcon site(s) and how do we create a process for future Drupalcon site developers or D.O webmasters to follow to preserve future Drupalcon sites?

killes@www.drop.org’s picture

It would be useful if somebody could wrap this up as e.g. a drush script and try it on some other (simple) site and make both sites available for inspection.

BMDan’s picture

Would an alternative be to simply block POST at the web server level? I can't think of any modules offhand which have security issues exploitable via GET. This would effectively create a static site, but without any of the headache associated with the #871948 approach.
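A rough sketch of what I mean, in Apache 2.2-era config (untested, and the directory path is a placeholder):

```apache
# Sketch: refuse every method except GET and HEAD for the whole docroot.
# This would sit alongside the existing auth rules in the vhost.
<Directory /var/www/example>
    <LimitExcept GET HEAD>
        Order deny,allow
        Deny from all
    </LimitExcept>
</Directory>
```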

populist’s picture

The recent Webform SQL Injection (http://drupal.org/node/1021210) was exploitable through GET (although not on the Drupalcon sites) and it seems plausible future exploits could follow a similar vector.

joachim’s picture

Possibly as a result of changes made in this issue, the schedule page on DC Paris, http://paris2009.drupalcon.org/content/program-multifaceted-event-divers..., gives a fatal PHP error.

heine’s picture

From #12: "The Paris site isn't hosted on d.o. infra."

BMDan’s picture

@populist Fair enough. So, go for a defense-in-depth approach: block POSTs, give the DB user only SELECT rights, make sure the web server user can't write to anything (even "files"), and use open_basedir for good measure if you aren't already. The only way to exploit that would then be to do something strange that got written out to memcache, if memcache was even in use, and this theoretical exploit would have to work entirely via GET parameters.

It doesn't seem reasonable to say, in one breath, that Drupal is powerful and extensible and community-supported, and in the next that, if you don't have a developer constantly touching the site and upgrading it, you can't have a simple, static site that actually runs Drupal.

drumm’s picture

Assigned: Unassigned » drumm

I'll look at using HTTrack as documented at http://drupal.org/node/27882. The same tool was used for #871948: Host static copies of Drupalcamp Websites: provide an index page. This tool isn't on our infrastructure, so I can't make a Jenkins job yet.

drumm’s picture

Never mind about Jenkins: "...runs on Ubuntu inside of a browser." Our servers don't have browsers. Maybe we could set up a VM at the OSL or AWS to automate this sort of thing in the future.

webchick’s picture

Even if you could just come up with some easily reproducible steps that we could document in the handbook, I'm sure our volunteers would happily run this in a browser if they knew what options to choose.

drumm’s picture

I think this tool will not be happy about the password itself. I'll need an exception to the password. I can proxy through something on our infrastructure, probably util. And/or my office's IP is 173.164.238.54.

drumm’s picture

Specifically, I'd like Allow from 173.164.238.54 and/or Allow from (however www* see util) added to /etc/apache2/vhosts.d/cph2010.drupal.org.conf on www*.

(I assume this goes through cfengine, so I can't make this change. Just CPH for now, but the others will follow.)

basic’s picture

I've added allow from rules to the vhost configuration in Cfengine, and will reload httpd with the changes in the morning.

BMDan’s picture

Not to suggest you don't already know to do this, but just in case: you'll also need a satisfy clause, since you're intermixing require and allow.

http://httpd.apache.org/docs/current/mod/core.html#satisfy

drumm’s picture

Yep, do need that, or Apache hasn't been restarted. I don't do enough Apache configuration to actually remember all the details every time.

drumm’s picture

And #1110224: Final spam clearing for Boston 2008 & Paris 2009 needs to happen and I could use help there.

basic’s picture

I may have messed this up... I had added the following:

       Require valid-user
       Order allow,deny
       Allow from 173.164.238.54 140.211.10.
       Satisfy any

and reloaded apache on the nodes, but am still getting a 401.

BMDan’s picture

I admit I've never tried "140.211.10."; I'd consider trying "140.211.10", instead.

Failing that, if you don't have RPAF or similar installed, then since Varnish is in front of the server, you won't actually see the client's IP address. Thus, you'll need to do something like:

SetEnvIf X-Forwarded-For ^140\.211\.10\. is_auth_user
SetEnvIf X-Forwarded-For ^173\.164\.238\.54 is_auth_user
Require valid-user
Order allow,deny
Allow from env=is_auth_user
Satisfy any
drumm’s picture

SF is being archived now, rate limited to 1 connection per second. I'll stick around for a few minutes to make sure it doesn't go wild, and let it run overnight. The site often has offline errors, but it should be a start.

basic’s picture

@BMDan, good catch on mod_rpaf -- It is installed but was not enabled via OPTIONS= (with the Gentoo way of IfDefine'ing everything).

I've enabled it across all of the nodes via Cfengine, and will reload them shortly.

basic’s picture

Okay, so access should work now. I ended up going with the most inclusive set of access controls, combining @BMDan's is_auth_user/X-Forwarded-For magic with functioning mod_rpaf and a standard IP-based Allow from:

SetEnvIf X-Forwarded-For ^140\.211\.10\. is_auth_user
SetEnvIf X-Forwarded-For ^173\.164\.238\.54 is_auth_user
Require valid-user
Order allow,deny
Allow from 173.164.238.54 140.211.10 env=is_auth_user
Satisfy any

I tested from util.d.o and was able to 'curl http://dc2009.drupalcon.org' w/o a user login.

Thanks for the help Dan :)

-Rudy

drumm’s picture

SF took under 3 hours. #1113468: Change comment status of archived DrupalCon sites to "Read only" rather than "Disabled" got fixed in the middle, and there were a lot of 503 errors, but we should have something in the next few days while starting on CPH.

It would be nice to keep hosting a dynamic site for archeological work, like getting attendee lists and other random requests. I'm guessing those would work best in separate vhosts, always with a lockdown password.

nnewton’s picture

Separate vhosts with a lockdown password would be good, maybe even IP limits. At some point I'll start growing concerned about disk space, though. Not sure what the solution will be there, but perhaps we could implement a sliding window for keeping sites around or look into compression.

-N

drumm’s picture

My static copy of sf2010 is in my home directory on util. I think now is a good time to rearrange the vhosts for at least sf2010. The other sites can just get a quick splash page pointing here or http://association.drupal.org/node/774.

DrewMathers’s picture

Old Drupalcon sites are browsable on the Wayback Machine. Here is Drupalcon Barcelona. Drupalcon Paris is broken, even on the Wayback Machine.

drumm’s picture

13:56 < greggles> drumm: it seems your form disabler is a little over
aggressive - http://sf2010.drupal.org/community/attendees
13:56 < greggles> i.e. it shouldn't show that dsm on that page

drumm’s picture

http://sf2010.drupal.org/ is now fully static. I put the Drupal installation in htdocs-dynamic, which is currently not served from anywhere. We do want to keep that and the DB around for drush, if nothing else. We still are getting random bits of data.

killes@www.drop.org’s picture

Thanks a lot!

What does "We still are getting random bits of data." mean? Do you still need the DB for some time? That should be OK as long as we don't run into a disk-space issue.

drumm’s picture

Yes, like session reviews. Lots of random things to inform how future conferences run.

drumm’s picture

CPH is now static and the password should be lifted soon.

Please see #1110224: Final spam clearing for Boston 2008 & Paris 2009 if you are interested in DC being online.

drumm’s picture

basic or nnewton - dc2009 is now static. Please take off the HTTP password when you get a chance.

nnewton’s picture

dc2009 static up

drumm’s picture

Jacob mentioned that with URLs like http://dc2009.drupalcon.org/sessions4658.html?page=2, the added .html breaks old links. The only option I can see that controls this is -K; we are using the default. For the next site, I can try those out. If another option works significantly better, re-archiving a site requires a vhost change to get the dynamic site hosted again. Otherwise, Apache configuration, mod_rewrite or otherwise, might solve this.

httrack http://dc2009.drupalcon.org/ -w -O . -%v --robots=0 -c1 %e0 is what I have been using.

drumm’s picture

The -K option doesn't work, but this .htaccess does

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.*)/$ $1.html [L,R=301]

I'm trying it out on sf2010.

drumm’s picture

Szeged is now archived. The password can be removed.

I noticed that my local Apache loads up foo.html for foo, but Drupal.org does not. Whatever does that would be good to have on these sites. (I don't have time to investigate right now.)
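My guess is that's mod_negotiation's MultiViews option, which lets Apache serve foo.html in response to a request for /foo. Something like this might do it (untested here; the directory path is a placeholder):

```apache
# Assumed fix: enable content negotiation so /foo resolves to /foo.html.
# Requires mod_negotiation to be loaded; the path below is hypothetical.
<Directory /var/www/szeged2008>
    Options +MultiViews
</Directory>
```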

nnewton’s picture

szeged back up

drumm’s picture

nnewton - barcelona was a static site, it never needed the password, it can be removed.

nnewton’s picture

Removed the lockdown on barcelona

webchick’s picture

Awesome, looks like we're now just down to DrupalCon Boston? http://boston2008.drupalcon.org/

http://drupal.org/drupalcon-2005-media - on d.o; no archiving necessary
http://2006.oscms-summit.org/ - dead. :( http://drupal.org/node/46559 has some stuff
http://brussels2006.drupalcon.org/ - dead. :( http://drupal.org/node/77404 has some stuff
http://2007.oscms-summit.org/ - dead. :( http://drupal.org/events/oscms2007 has some stuff
http://barcelona2007.drupalcon.org/ - Archived & Accessible
http://boston2008.drupalcon.org/ - LOCKDOWN
http://szeged2008.drupalcon.org/ - Archived & Accessible
http://dc2009.drupalcon.org/ - Archived & Accessible
http://paris2009.drupalcon.org/ - Archived & Accessible
http://sf2010.drupal.org/ - Archived & Accessible
http://cph2010.drupal.org/ - Archived & Accessible
http://chicago2011.drupal.org/ - LIVE & Accessible

drumm’s picture

Paris is not archived (yet). It is not on our main hardware. The archive is in place on our server and needs a vhost. We will wait until http://paris2009.drupalcon.org/drupliconroadtrip/ is dealt with, which is a separate site, and I believe should be NodeOne's responsibility. They are on it.

Boston is now archived and the password can be removed.

BMDan’s picture

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.*)/$ $1.html [L,R=301]

As written, this is a security hole—albeit simply an unwieldy and unlikely information-disclosure one—because it allows an attacker to go to "http://www.somesite.com/%2e%2e%2fsecretfile" and determine whether a file named "secretfile.html" exists one level above the DocumentRoot by whether or not the response from the server is a 301 or a 404. Also, there appears to be a missing "!" before the "-d".

If we want to do this in a .htaccess, I'd use a set of rules like:

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.*) $1.html? [L,R=301]

A few notes:

  • It allows access to "/foo.ext", even if "/foo.ext.html" exists. This may not be desirable, but made sense to me. Easy enough to change by killing the second RewriteCond, of course.
  • This rule should probably not apply to the files directory, lest we open a future security hole. That said, I think the caveat in the previous note actually prevents that concern.
  • It chops off any query string as a result of the terminal "?". If there's some fancy AJAX going on that needs to read its own query string, drop the "?" and add a "QSA" flag to the RewriteRule.
  • Standard disclaimers about .htaccess vs. httpd.conf in a <Directory> stanza vs. httpd.conf outside of <Directory> stanza go here.
  • Blatantly untested. At minimum, scan for typos before using, please.
killes@www.drop.org’s picture

Boston is now accessible and archived, thanks Neil!

Paris remains to be dealt with.

drumm’s picture

The static sites are nice, but not searchable or at all integrated with Drupal.org. See #1238508: Create a permanent home for DrupalCon presentations for the next step.

michelle’s picture

It's been nearly a year... Is this still a problem?

pwolanin’s picture

TNR Global is using https://webarchive.jira.com/wiki/display/Heritrix/Heritrix, which is developed and used by archive.org, to put static sites into Solr. I suspect it's via something like this wrapper: http://youseer.sourceforge.net/

Given the relatively limited scope of those sites, we could probably get away with something simpler.

As long as we set the right site hash and other metadata, we should be able to put those in the same index and have multi-site search work.

greggles’s picture

Status: Active » Fixed

This seems fixed to me. I suggest a new issue that's focused on search for anyone who wants to work on that part.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.