Closed (fixed)
Project: Drupal.org infrastructure
Component: Other
Priority: Normal
Category: Bug report
Assigned:
Reporter:
Created: 11 Jan 2011 at 03:00 UTC
Updated: 12 Sep 2012 at 20:01 UTC
When attempting to visit any URL like http://barcelona2007.drupalcon.org/ or http://boston2008.drupalcon.org/, you get an Apache password prompt labeled "drupalcon security lockdown".
What's up with that? There's lots of really good information on these old sites. :(
Comments
Comment #1
drumm commented: Mitigation for http://drupal.org/node/1021210.
Comment #2
gerhard killesreiter commented: The password is available on the infrastructure site.
We need a volunteer to convert these sites into static HTML, since they aren't maintained.
Comment #3
jpmckinney commented: Where is this "infrastructure site" that has the password?
Comment #4
gerhard killesreiter commented: At infrastructure.drupal.org :)
However, it is a site with restricted access.
We are still waiting for volunteers to make static HTML copies of these sites.
Comment #5
ensignavenger commented: Why not disable the Webform module on these sites? I can't imagine it is still needed for the conference archives. As a long-term fix, perhaps a single archive site could house all the session archives?
I've never tried converting a Drupal website into static HTML (I've always done things the other way round), but converting each page manually would be a lot of work. Any suggestions on tools to automate this? Is there a discussion on this topic elsewhere that I should look at?
Comment #6
gerhard killesreiter commented: There are probably other insecure modules there as well. The sites aren't maintained and thus cannot be publicly accessible.
Instead of complaining about that, I suggest somebody come up with a way to produce static HTML from the sites...
Comment #7
pwolanin commented: Webform is not the only issue; all dynamic behavior needs to be shut off.
There is an issue from greggles with notes on how to do this: http://drupal.org/node/871948
Basically, you disable all login blocks and links,
then try one of the following:
wget -r
HTTrack
add Boost, crawl the site, and save the Boost pages
etc.
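For the wget -r route, a minimal sketch, assuming GNU wget is installed and that login blocks and links have already been disabled as described above. The helper names (`wget_mirror_cmd`, `mirror`) are hypothetical; the flags are standard GNU wget options, and the one-second wait between requests is an arbitrary politeness choice, not something this thread specifies. By default it only returns the command line instead of running it.

```python
import shlex
import subprocess


def wget_mirror_cmd(url: str, wait_seconds: int = 1) -> list[str]:
    """Build a GNU wget command that mirrors `url` as static files (hypothetical helper)."""
    return [
        "wget",
        "--mirror",            # recursive, timestamping, unlimited depth
        "--convert-links",     # point links in saved pages at the local copies
        "--adjust-extension",  # save HTML responses with a .html suffix
        "--page-requisites",   # also fetch the CSS, JS and images each page uses
        "--no-parent",         # never climb above the starting directory
        f"--wait={wait_seconds}",  # pause between requests to spare the server
        url,
    ]


def mirror(url: str, dry_run: bool = True) -> str:
    """Return the command as a shell string; only run it when dry_run is False."""
    cmd = wget_mirror_cmd(url)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    print(mirror("http://boston2008.drupalcon.org/"))
```

Note that `--adjust-extension` renames `foo` to `foo.html`, which is the same kind of renaming that later breaks old links in this thread, so the server still needs a rewrite for old URLs.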
Comment #8
populist commented: I worked with a few folks here to get the DrupalCon SF site updated. It is now running secure versions of all modules in production.
Might it be possible to remove the .htpasswd rule on the site so people can watch the videos?
Comment #9
gerhard killesreiter commented: I really would have preferred that you had spent your time making the sites static...
Nevertheless, I have requested that the password for that site be removed.
Comment #10
greggles
Comment #11
jerdavis commented: I was directed here by greggles; sorry I missed this in my search! Archiving the sites to static HTML makes a lot of sense. Currently the CPH site is on lockdown, and I haven't yet gone in to see whether it's functional enough to create a static archive from. The Paris site, however, is broken at present, which would make creating a static archive impossible for anyone who doesn't have access to fix its security problem.
I'm unsure whether I can find the time to create a functional archive, but I'd like to try, or at least make it possible for someone else to do it.
Comment #12
killes@www.drop.org commented: What we need is not an archive as such (hard to do with the sites unavailable) but a recipe that is proven to work on Drupal sites.
The Paris site isn't hosted on d.o. infra.
Comment #13
jerdavis commented: #871948, linked in #7 above, points to this handbook page:
"Creating a static archive of a Drupal site"
http://drupal.org/node/27882
On the surface it seems correct, although I haven't personally tried it yet. The suggestion of Boost seems worth considering as well.
So perhaps the question becomes: how can volunteers help archive the currently locked-down DrupalCon site(s), and how do we create a process that future DrupalCon site developers or d.o webmasters can follow to preserve future DrupalCon sites?
Comment #14
killes@www.drop.org commented: It would be useful if somebody could wrap this up as, e.g., a drush script, try it on some other (simple) site, and make both sites available for inspection.
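This is not a drush command, but as a rough starting point, here is a minimal Python wrapper that runs one httrack invocation per site. It assumes httrack is on the PATH and reuses, as posted, the flags from the command drumm reports using later in this thread (`-w -O . -%v --robots=0 -c1 %e0`), except that each site gets its own output directory instead of `.`. The names `httrack_cmd` and `archive_sites` and the output root are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from urllib.parse import urlsplit


def httrack_cmd(site_url: str, out_dir: Path) -> list[str]:
    """Build the httrack command for one site (flags copied from the thread)."""
    return ["httrack", site_url, "-w", "-O", str(out_dir),
            "-%v", "--robots=0", "-c1", "%e0"]


def archive_sites(sites: list[str], root: Path, dry_run: bool = True) -> list[str]:
    """Archive each site into root/<hostname>; return the commands as shell strings."""
    commands = []
    for url in sites:
        out_dir = root / urlsplit(url).hostname  # e.g. root/dc2009.drupalcon.org
        cmd = httrack_cmd(url, out_dir)
        commands.append(shlex.join(cmd))
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop at the first failing site
    return commands


if __name__ == "__main__":
    sites = ["http://cph2010.drupal.org/", "http://dc2009.drupalcon.org/"]
    for line in archive_sites(sites, Path("/tmp/drupalcon-archives")):
        print(line)
```

Dry-run is the default so the command lines can be reviewed before anything hits the servers; disabling login blocks and links (#7) still has to happen separately.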
Comment #15
BMDan commented: Would an alternative be to simply block POST at the web server level? I can't think of any modules offhand that have security issues exploitable via GET. This would effectively create a static site, but without any of the headache associated with the #871948 approach.
Comment #16
populist commented: The recent Webform SQL injection (http://drupal.org/node/1021210) was exploitable through GET (although not on the DrupalCon sites), and it seems plausible that future exploits could follow a similar vector.
Comment #17
joachim commented: Possibly as a result of changes made in this issue, the schedule page on DC Paris, http://paris2009.drupalcon.org/content/program-multifaceted-event-divers..., gives a fatal PHP error.
Comment #18
heine commented: From #12: "The Paris site isn't hosted on d.o. infra."
Comment #19
BMDan commented: @populist Fair enough. So go for a defense-in-depth approach: block POSTs, give the DB user only SELECT rights, make sure the web server user can't write to anything (even "files"), and use open_basedir for good measure if you aren't already. The only way to exploit that would then be to do something strange that got written out to memcache (if memcache is even in use), and that theoretical exploit would have to work entirely via GET parameters.
It doesn't seem reasonable to say, in one breath, that Drupal is powerful, extensible, and community-supported, and in the next that, without a developer constantly touching and upgrading the site, you can't have a simple, static site that actually runs Drupal.
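The "block POSTs" layer, sketched as Python WSGI middleware rather than the Apache-level rule BMDan has in mind (the DrupalCon sites are PHP behind Apache, so this only illustrates the logic). `read_only` is a hypothetical name. It answers any method other than GET or HEAD with a 405 before the wrapped application sees the request; the other layers (SELECT-only database grants, an unwritable filesystem, open_basedir) are not covered here.

```python
from typing import Callable, Iterable

SAFE_METHODS = frozenset({"GET", "HEAD"})


def read_only(app: Callable) -> Callable:
    """Wrap a WSGI app so only GET and HEAD requests ever reach it."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if method not in SAFE_METHODS:
            # Refuse POST, PUT, DELETE, etc. without touching the wrapped app.
            start_response("405 Method Not Allowed",
                           [("Allow", "GET, HEAD"), ("Content-Type", "text/plain")])
            return [b"This archive is read-only.\n"]
        return app(environ, start_response)
    return wrapper
```

As populist points out in #16, GET-based exploits exist, so this narrows the attack surface rather than making an unmaintained site safe on its own.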
Comment #20
drumm commented: I'll look at using HTTrack as documented at http://drupal.org/node/27882. The same tool was used for #871948: Host static copies of Drupalcamp Websites: provide an index page. This tool isn't on our infrastructure, so I can't make a Jenkins job yet.
Comment #21
drumm commented: Never mind about Jenkins: "...runs on Ubuntu inside of a browser." Our servers don't have browsers. Maybe we could set up a VM at the OSL or AWS to automate this sort of thing in the future.
Comment #22
webchick commented: Even if you could just come up with some easily reproducible steps that we could document in the handbook, I'm sure our volunteers would happily run this in a browser if they knew which options to choose.
Comment #23
drumm commented: I don't think this tool will cope with the password. I'll need an exception to the password. I can proxy through something on our infrastructure, probably util, and/or my office's IP is 173.164.238.54.
Comment #24
drumm commented: Specifically, I'd like
Allow from 173.164.238.54
and/or
Allow from (however www* see util)
added to /etc/apache2/vhosts.d/cph2010.drupal.org.conf on www*. (I assume this goes through Cfengine, so I can't make this change myself. Just CPH for now, but the others will follow.)
Comment #25
basic commented: I've added Allow from rules to the vhost configuration in Cfengine and will reload httpd with the changes in the morning.
Comment #26
BMDan commented: Not to suggest you don't already know to do this, but just in case: you'll also need a Satisfy clause, since you're intermixing Require and Allow. http://httpd.apache.org/docs/current/mod/core.html#satisfy
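To make the Satisfy point concrete: in Apache 2.2-style access control, `Satisfy All` (the default) requires a client to pass both the `Allow from` host check and the `Require` authentication check, while `Satisfy Any` lets either one suffice, which is what letting util in without the password needs. Below is a minimal Python model of that decision, not Apache itself; `AccessPolicy` is hypothetical, and CIDR notation stands in for Apache's partial-IP form (`Allow from 140.211.10` covers the same addresses as `140.211.10.0/24`).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class AccessPolicy:
    allowed_networks: list[str]  # stands in for the "Allow from" lines
    satisfy_any: bool            # True models "Satisfy Any"; False models "Satisfy All"

    def host_ok(self, client_ip: str) -> bool:
        ip = ip_address(client_ip)
        return any(ip in ip_network(net) for net in self.allowed_networks)

    def allows(self, client_ip: str, authenticated: bool) -> bool:
        if self.satisfy_any:
            return self.host_ok(client_ip) or authenticated
        return self.host_ok(client_ip) and authenticated
```

This also shows why the host check must see the real client address: behind a proxy, no `Allow from` line will match and every request falls back to the password.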
Comment #27
drumm commented: Yep, we do need that, or Apache hasn't been restarted yet. I don't do enough Apache configuration to remember all the details every time.
Comment #28
drumm commented: And #1110224: Final spam clearing for Boston 2008 & Paris 2009 needs to happen, and I could use help there.
Comment #29
basic commented: I may have messed this up... I had added the following:
and reloaded Apache on the nodes, but I'm still getting a 401.
Comment #30
BMDan commented: I admit I've never tried "140.211.10."; I'd consider trying "140.211.10" instead.
Failing that: since Varnish is in front of the server, if you don't have RPAF or something similar installed, you won't actually see the client's IP address. Thus, you'll need to do something like:
Comment #31
drumm commented: SF is being archived now, rate limited to 1 connection per second. I'll stick around for a few minutes to make sure it doesn't go wild, and then let it run overnight. The site often returns offline errors, but it should be a start.
Comment #32
basic commented: @BMDan, good catch on mod_rpaf. It is installed but was not enabled via OPTIONS= (the Gentoo way of IfDefine'ing everything).
I've enabled it across all of the nodes via Cfengine, and will reload them shortly.
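What mod_rpaf fixes: behind Varnish, Apache's REMOTE_ADDR is the proxy, and the real client address only arrives in the X-Forwarded-For header (assuming the proxy sets it), so `Allow from` rules never see it. A minimal sketch of the trusted-proxy logic, loosely in the spirit of mod_rpaf and mod_remoteip but not their exact algorithms: trust the header only when the request came from a known proxy, then take the right-most address that is not itself a proxy. `client_ip` and the proxy address `127.0.0.1` are assumptions for illustration.

```python
from ipaddress import ip_address
from typing import Optional


def client_ip(remote_addr: str, x_forwarded_for: Optional[str],
              trusted_proxies: set[str]) -> str:
    """Best guess at the real client address for a request that may have come through a proxy."""
    if remote_addr not in trusted_proxies or not x_forwarded_for:
        return remote_addr  # direct request, or no header: nothing to rewrite
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):  # right-most entries were appended by our own proxies
        try:
            ip_address(hop)
        except ValueError:
            return remote_addr  # malformed header: fall back to the proxy address
        if hop not in trusted_proxies:
            return hop
    return remote_addr
```

Checking `remote_addr` first matters: trusting X-Forwarded-For from arbitrary clients would let anyone spoof an allowed address.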
Comment #33
webchick commented: Oh, I didn't see #31. In that case, #1113468: Change comment status of archived DrupalCon sites to "Read only" rather than "Disabled" might be relevant.
Comment #34
basic commented: Okay, access should work now. I ended up going with the most inclusive set of access restrictions, combining @BMDan's is_auth_user/X-Forwarded-For magic with a functioning mod_rpaf and a standard IP-based Allow from:
I tested from util.d.o and was able to 'curl http://dc2009.drupalcon.org' without a user login.
Thanks for the help Dan :)
-Rudy
Comment #35
drumm commented: SF took under 3 hours. #1113468: Change comment status of archived DrupalCon sites to "Read only" rather than "Disabled" got fixed in the middle, and there were a lot of 503 errors, but I should have something in the next few days, while starting on CPH.
It would be nice to keep hosting a dynamic site for archaeological work, like getting attendee lists and other random requests. I'm guessing those would work best in separate vhosts, always with a lockdown password.
Comment #36
nnewton commented: Separate vhosts with a lockdown password would be good, maybe even with IP limits. At some point I'll start growing concerned about disk space, though. Not sure what the solution will be there; perhaps a sliding window for how long we keep sites around, or looking into compression.
-N
Comment #37
drumm commented: My static copy of sf2010 is in my home directory on util. I think now is a good time to rearrange the vhosts for at least sf2010. The other sites can just get a quick splash page pointing here or to http://association.drupal.org/node/774.
Comment #38
DrewMathers commented: Old DrupalCon sites are browsable on the Wayback Machine. Here is DrupalCon Barcelona. DrupalCon Paris is broken, even on the Wayback Machine.
Comment #39
drumm commented:
13:56 < greggles> drumm: it seems your form disabler is a little overaggressive - http://sf2010.drupal.org/community/attendees
13:56 < greggles> i.e. it shouldn't show that dsm on that page
Comment #40
drumm commented: http://sf2010.drupal.org/ is now fully static. I put the Drupal codebase in htdocs-dynamic, which is currently not served from anywhere. We do want to keep that and the DB around for drush, if nothing else. We still are getting random bits of data.
Comment #41
killes@www.drop.org commented: Thanks a lot!
What does "We still are getting random bits of data." mean? Do you still need the DB for some time? That should be OK as long as we don't run into a disk-space issue.
Comment #42
drumm commented: Yes, like session reviews. Lots of random things to inform how future conferences run.
Comment #43
drumm commented: CPH is now static, and the password should be lifted soon.
Please see #1110224: Final spam clearing for Boston 2008 & Paris 2009 if you are interested in DC being online.
Comment #44
drumm commented: basic or nnewton: dc2009 is now static. Please take off the HTTP password when you get a chance.
Comment #45
nnewton commented: dc2009 static is up.
Comment #46
drumm commented: Jacob mentioned that URLs like http://dc2009.drupalcon.org/sessions4658.html?page=2, with the added .html, break old links. The only option I can see that controls this is -K, and we are using the default. I can try the alternatives on the next site. If another one works significantly better, re-archiving a site requires a vhost change to get the dynamic site hosted again. Otherwise, Apache configuration (mod_rewrite or otherwise) might solve this.
httrack http://dc2009.drupalcon.org/ -w -O . -%v --robots=0 -c1 %e0
is what I have been using.
Comment #47
drumm commented: The -K option doesn't work, but this .htaccess does:
I'm trying it out on sf2010.
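The .htaccess itself isn't reproduced in this thread. Going by the discussion in #54 (a `-d` test, and a 301 when `foo.html` exists), the intended behavior seems to be roughly: serve an existing file or directory as-is, otherwise redirect `foo` to `foo.html` if that file exists. Here is a minimal Python model of that decision over an in-memory listing of the archive. `archive_response`, its arguments, and the example paths are hypothetical; because it never touches the filesystem, the path-safety issue raised in #54 does not arise here, but a real rule must handle it.

```python
def archive_response(path: str, files: set[str], dirs: set[str]) -> tuple[int, str]:
    """Decide how a static archive should answer `path` (model only, no filesystem access).

    `files` and `dirs` list the archive contents relative to the DocumentRoot.
    """
    rel = path.strip("/")
    if rel in files or rel in dirs:
        return 200, path  # exact match: serve as-is
    if rel + ".html" in files:
        return 301, path.rstrip("/") + ".html"  # old Drupal URL: send to the saved page
    return 404, path
```

A 301 rather than silently serving the file keeps one canonical URL per archived page.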
Comment #48
drumm commented: Szeged is now archived. The password can be removed.
I noticed that my local Apache loads up foo.html for foo, but Drupal.org does not. Whatever does that would be good to have on these sites. (I don't have time to investigate right now.)
Comment #49
nnewton commented: Szeged is back up.
Comment #50
drumm commented: nnewton: Barcelona was a static site and never needed the password, so it can be removed.
Comment #51
nnewton commented: Removed the lockdown on Barcelona.
Comment #52
webchick commented: Awesome, looks like we're now down to just DrupalCon Boston? http://boston2008.drupalcon.org/
http://drupal.org/drupalcon-2005-media - on d.o; no archiving necessary
http://2006.oscms-summit.org/ - dead. :( http://drupal.org/node/46559 has some stuff
http://brussels2006.drupalcon.org/ - dead. :( http://drupal.org/node/77404 has some stuff
http://2007.oscms-summit.org/ - dead. :( http://drupal.org/events/oscms2007 has some stuff
http://barcelona2007.drupalcon.org/ - Archived & Accessible
http://boston2008.drupalcon.org/ - LOCKDOWN
http://szeged2008.drupalcon.org/ - Archived & Accessible
http://dc2009.drupalcon.org/ - Archived & Accessible
http://paris2009.drupalcon.org/ - Archived & Accessible
http://sf2010.drupal.org/ - Archived & Accessible
http://cph2010.drupal.org/ - Archived & Accessible
http://chicago2011.drupal.org/ - LIVE & Accessible
Comment #53
drumm commented: Paris is not archived (yet). It is not on our main hardware. The archive is in place on our server and needs a vhost. We will wait until http://paris2009.drupalcon.org/drupliconroadtrip/, a separate site that I believe should be NodeOne's responsibility, is dealt with. They are on it.
Boston is now archived, and the password can be removed.
Comment #54
BMDan commented: As written, this is a security hole, albeit an unwieldy and unlikely information-disclosure one: it allows an attacker to go to "http://www.somesite.com/%2e%2e%2fsecretfile" and determine whether a file named "secretfile.html" exists one level above the DocumentRoot by whether the server responds with a 301 or a 404. Also, there appears to be a missing "!" before the "-d".
If we want to do this in a .htaccess, I'd use a set of rules like:
A few notes:
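BMDan's rules aren't reproduced in this thread, but the principle behind the fix can be shown: decode percent-escapes and normalize the path first, and refuse anything that lands outside the DocumentRoot before ever checking whether `<path>.html` exists, so the 301/404 difference can't be used as a probe. A minimal Python sketch of that check, not his .htaccess; `safe_html_target` and the docroot path are hypothetical, and since it works on strings only, symlinks inside the docroot are not considered.

```python
import posixpath
from typing import Optional
from urllib.parse import unquote


def safe_html_target(docroot: str, request_path: str) -> Optional[str]:
    """Return the .html file request_path may map to, or None if it escapes docroot."""
    decoded = unquote(request_path)  # "%2e%2e%2f" becomes "../"
    if "\x00" in decoded:
        return None  # NUL bytes are never legitimate in a path
    root = posixpath.normpath(docroot)
    candidate = posixpath.normpath(posixpath.join(root, decoded.lstrip("/")) + ".html")
    if not candidate.startswith(root + "/"):
        return None  # resolved outside the DocumentRoot: answer 404 without probing
    return candidate
```

Comparing against `root + "/"` rather than `root` also rejects sibling directories that merely share a prefix, such as `/var/www/sf2010x`.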
Comment #55
killes@www.drop.org commented: Boston is now accessible and archived, thanks Neil!
Paris remains to be dealt with.
Comment #56
drumm commented: The static sites are nice, but they are not searchable or at all integrated with Drupal.org. See #1238508: Create a permanent home for DrupalCon presentations for the next step.
Comment #57
michelle commented: It's been nearly a year... Is this still a problem?
Comment #58
pwolanin commented: TNR Global is using Heritrix (https://webarchive.jira.com/wiki/display/Heritrix/Heritrix), the crawler developed and used by archive.org, to put static sites into Solr. I suspect it's via something like this wrapper: http://youseer.sourceforge.net/
Given the relatively limited scope of those sites, we could probably get away with something simpler.
As long as we set the right site hash and other metadata, we should be able to put those in the same index and have multi-site search work.
Comment #59
greggles commented: This seems fixed to me. I suggest a new issue focused on search for anyone who wants to work on that part.