Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Attached is a patch from David Strauss to add reverse proxy support to drupal.org. I have not reviewed this patch yet, but it should be reviewed and tested for possible merging into our webroot. Reverse proxy support would be extremely useful for performance and mitigation of large spikes in traffic.
Comment | File | Size | Author |
---|---|---|---|
my_unidiff.patch | 35.61 KB | nnewton |
Comments
Comment #1
Amazon CreditAttribution: Amazon commentedhttp://en.wikipedia.org/wiki/Reverse_proxy
What does this do for us? We have two load balancers and distribution among the webservers already. I am sure it's useful, but could you be more specific.
ieran
Comment #2
nnewton CreditAttribution: nnewton commentedRight, sorry, used too much lingo there. This would add support for Reverse *Caching* Proxies such as Squid and Varnish. These not only distribute requests to the backend, but cache the returned results for a specified time period. This patch would build in support for sane caching headers to support such reverse proxies. It is the caching we are interested in and not the actual front-end to back-end distribution. We currently run with Squid as a reverse caching proxy, for static files only. This would expand that to anonymous page views. (or that is my understanding based on other such patches, I have not reviewed this one yet, only the previous incarnation)
Comment #3
dwwDare I say it, but can we get this into D7 HEAD first, along with all the reviews and possible improvements that would entail, and then backport that onto d.o? I'd hate to see such a huge patch on d.o that ended up not getting into D7, and then having to keep a similarly large patch for the next 2+ years while d.o runs D7... Plus, the act of getting this into HEAD is going to almost certainly involve making this a better patch. At the very least, it's going to get a lot more reviews and attention as a core issue than here as an infra issue. Maybe. ;)
The flip side is that it'll (hopefully) be easier to justify committing this to HEAD if it's already running on d.o. However, I think if the original post of the core issue makes it clear this is a patch we're intending to run on d.o, that'll be just about as good in terms of motivating it.
Quick skim of the patch turned up a few questions/issues:
A) I see you changed cache.inc. Seems like we're going to need a corresponding change for memcache's cache.inc for this to work on d.o, right?
B) It seems weird that drupal_get_session() on an undefined variable returns FALSE not NULL. Makes it harder to distinguish undefined from defined as FALSE.
C) Since the patch contains no context for what functions each hunk belongs to, it's hard to tell what the hunk in openid.module is for. A comment there would help.
D) Direct references to "Pressflow" in the comments should probably be removed. ;)
E) A fair number of the function comments don't start with 1 line summaries.
F) "If the client send a session cookie" s/send/sent/
G) It's not immediately obvious why inside drupal_set_message() we have to test for and conditionally use drupal_set_session(). When would drupal_set_message() ever be called when drupal_set_session() doesn't exist? If there are cases where drupal_set_session() might not exist, that could use a comment in the code explaining it. If drupal_set_session() always exists, we should just use it.
H) The interaction of drupal_session_start() and drupal_session_is_started() seems slightly clumsy. Not sure how to make it any better, but something about it doesn't sit well.
I) "Sun, 19 Nov 1978 05:00:00 GMT" has special significance for the Drupal project. ;) Is there some technical reason you changed it to "Sun, 11 Mar 1984 12:00:00 GMT", or is that someone else's birthday?
J) All contribs running on d.o need to use drupal_set_session() instead of directly touching $_SESSION, right? That's sort of a pain. I guess if we already assume a logged in user, we can assume drupal_session_start() was already called and can touch $_SESSION directly. So, we just have to audit every occurrence of $_SESSION on d.o that could be hit via an anonymous user, right? Still ugh...
K) It's not clear why the API and default behavior of page_get_cache() changes with this patch. The comment explains the new API, and mentions "Default of TRUE added to maximize Drupal 5 compatibility", but this is D6 we're patching, not D5. We should match D6 more closely, not D5.
There's probably a lot more -- I'm not doing a super-careful review here, but just some initial thoughts on reading the patch. Haven't tried applying it, and certainly haven't tried testing any of this.
Comment #4
Damien Tournoud CreditAttribution: Damien Tournoud commentedNaranyan should have mention that this is mostly a back-port of the work that was done for Drupal 7, especially #147310: Implement better cache headers for reverse proxies and #201122: Drupal should support disabling anonymous sessions.
Some more improvements should be discussed and go in first in D7 (for example, as far as I understand this right, the LOGGED_IN cookie is really not necessary, so are CACHE: hit and CACHE: miss ones). For D6, the main issue that remains is that this patch requires modifications in contrib modules that are using $_SESSION to store data for anonymous users, because this data would be lost in drupal_set_session() is not used. I worked on an automatic, lazy-starting of the session, but this will need to be discussed in D7 first, too.
Comment #5
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedWhile I agree that first getting this into D7 would make sure it can be run without on d.o without an upgrading nightmare, this is in no way certain, ie the code might be broken in suble ways. Historically, d.o has been used to test patches for inclusion into the next Drupal version and I have no objections for reviving this. I'd totally would like to cache node objects fwiw.
However, Derek has addressed multiple issues and we should investigate them before deploying this.
Comment #6
nnewton CreditAttribution: nnewton commentedThis was originally posted for review after a short discussion in #drupal-infrastructure and my assumption was that this was somewhat similar to the reverse proxy patches I've seen before. However, this patch is much larger and more intrusive than the previous patches I've seen.
David would you mind commenting on what this patch provides us with that the older patch doesn't? I can see some areas where this would be more technically correct, but would like your view on that and dww's issues.
Comment #7
David StraussMost of this patch involves the back-port of lazy session creation for anonymous users. The reverse proxy caching support leverages this functionality by sending "Vary: Cookie" headers. So, instead of the common hack of adding a special "logged in" cookie and configuring non-standard cache behavior, we can rely on the lack of session cookies (and other cookies) to indicate that pages are cacheable.
This is a nearly unchanged backport of the D7 functionality. I had to change a handful of things, but those were the result of other D7 -> D6 differences.
Comment #8
dwwThanks for the clarifications. I haven't been following D7 development particularly closely, and had no idea about these other issues. That said, comparing this patch to http://drupal.org/files/issues/sessions_1.patch which is what Dries committed to HEAD:
(A) Is still a concern.
(B) is indeed "broken" in D7 HEAD. I'll open a new issue about that, I guess.
(C) Still a concern: the hunk for openid.module in this patch doesn't look much like the one in HEAD.
(D) Still a concern.
(E) Partly "broken" in HEAD, partly made worse in this patch.
(F) Still a concern: HEAD doesn't have that broken comment.
(G) Still a concern: HEAD doesn't conditionally invoke drupal_set_session() inside drupal_set_message().
(H) Just as clumsy in HEAD, not really worth opening a new issue for that.
(I) Still a "concern": HEAD uses the older date. ;)
(J) Still a (big) concern, but just as bad in HEAD. Damien protests about this at #201122-74: Drupal should support disabling anonymous sessions and suggests there might be a better approach that doesn't hurt DX as badly. Said he was going to open a new issue, but a quick search doesn't reveal an issue nid. @Damien, any news?
(K) Still a concern: The D7 patch doesn't do this weird thing with the API for "D5 compatibility".
...
Comment #9
David Strauss@dww
(A) Yes. Headers are managed (and must be managed) in an array in this patch, and it was easier to provide a backwards-compatible D6 API and fundamentally change internal header handling to use arrays.
(B) Agreed. I was just staying true to D7's implementation.
(C) I'll look into this.
(D) and (I) This patch was pulled directly from the Pressflow repository, specifically the reverse-proxy-support branch. That's why the comments make references to Pressflow, and that's why the "expires" header is my birthday. I have to re-brand Pressflow for trademark reasons, though I try to keep those changes in a separate rebranding branch when possible. These changes aren't intended for Drupal or Drupal.org use, and I can ensure they're safely relegated to the rebranding branch before merging this patch into the Drupal.org core branch I also maintain.
(E) and (F) I don't think cosmetic and API indexing issues should be a concern for this proposal, which is primarily patching Drupal.org. These ought to be raised directly against D7.
(G) The installer in D6 sets messages without fully bootstrapping. If there's a more elegant solution, I'm interested.
(H) DamZ has taken the lead in improving the DX for this. Most of the current DX-friendly proposals seem a little hack-ish to me.
(J) If we come up with a more transparent solution, I'd be all for it.
(K) This is merely a leftover from the initial D5 backport I did of this. Agreed that it should be removed.
Comment #10
dwwRe: (A) I'm not doubting the change. Managing headers internally as an array makes a lot of sense. I'm just wondering where's the corresponding patch for d.o's copy of memcache, which we're also going to need before we can deploy this.
Re: (G) ahh, i see. A comment to that effect would be helpful. However, why doesn't D7 core have to do that?
Comment #11
David Strauss@dww Regarding (G), D7 core might really need to do that. I haven't checked. The message only gets set in D6 if you don't have settings.php in place with proper permissions.
Comment #12
David StraussI've updated the Bazaar branch to make the patch compatible with memcache's standard cache.inc.
Comment #13
nnewton CreditAttribution: nnewton commentedAfter the massive search-bot hit on saturday that caused d.o to be unstable/slow, we tested and deployed David's latest patch on d6.drupal.org. It may not be perfect, but the ability to serve out of the squid cache for these bots would save us in situations like we just had.
Please test d6.drupal.org and thanks for the code review dww!
Note: for those wanting to look at the code deployed, its in d.o svn.
Comment #14
David StraussThe Bazaar branch has been updated to include Damien's refinements. I'm updating trunk and branches/drupal6 in Subversion and deploying to d6.drupal.org.
Comment #15
David StraussComment #16
David StraussWhile I was investigating the reverse proxy changes necessary for production, I noticed that some security updates were only partially deployed to drupal.org. To avoid extracting and manually applying the security changes, I simply updated d.o, including the latest version of the reverse proxy changes.
Most real browsers won't encounter the proxy cache after the first page load because Google Analytics sets cookies we don't have Squid ignoring yet. The cache should help with spiders, though.
Comment #18
markus_petrux CreditAttribution: markus_petrux commentedWould it be possible to manage this patch as if it was a community managed project?
This would make possible for others to benefit from it, as well as to keep contributing back based on experience, fact that could be easily shared.
Comment #19
David Strauss@markus_petrux
Patches are publicly hosted on the Four Kitchens Bazaar server. This "patch" is actually managed as a fork of Drupal 6 core; our CVS tools on Drupal.org are completely inadequate for this purpose (lots of branching and merging). Using Bazaar, you can branch, make changes, commit, and submit your changes back to folks on the infrastructure team. You don't need our permission or any special access to contribute.
Comment #20
David Strauss@markus_petrux
If you'd like to work this way, download and install Bazaar. Then run
bzr branch bzr://vcs.pressflow.org/pressflow/6-patches/reverse-proxy-support
. From that copy, you can commit, merge from upstream, andbzr send -o mychanges.patch
to create a portable copy of your work for us to merge back in.Comment #21
Damien Tournoud CreditAttribution: Damien Tournoud commentedYay! Marking as fixed.
Comment #22
markus_petrux CreditAttribution: markus_petrux commented@David: Thanks for the hints. I'll take a look and if we can enhance anything, we'll post back, maybe at g.d.o.
Comment #24
tomlobato CreditAttribution: tomlobato commentedI got some errors when patching drupal 6.19 with this file. is there a 6.19 compatible version?
Comment #25
m.sant CreditAttribution: m.sant commentedSubscribing
Comment #26
jthomasbailey CreditAttribution: jthomasbailey commentedSubscribing
Comment #27
geekgirlweb CreditAttribution: geekgirlweb commented+1 subscribing
Comment #28
jguffey CreditAttribution: jguffey commentedSub, and bump. Can we get this updated for 6.19?
Comment #29
adanelova CreditAttribution: adanelova commentedfor 6.19 or 6.20?
Comment #30
mstiwhere can u find a 6.19 patch?
Comment #31
_-. CreditAttribution: _-. commentedSubscribing
Comment #32
velocis CreditAttribution: velocis commentedJust tried to apply this patch on a 6.20 version of Drupal with the following results (see below)
The failure to patch default.settings.php is expected due to the fact I am patching my core files which are shared by few drupal installs and thus the sites/default directory is kept seperately in each site.
Any tips or explanations regarding the following would be helpful
patching file includes/bootstrap.inc
Hunk #1 FAILED at 1.
Hunk #3 FAILED at 723.
2 out of 7 hunks FAILED -- saving rejects to file includes/bootstrap.inc.rej
patching file includes/common.inc
Hunk #4 FAILED at 2630.
1 out of 4 hunks FAILED -- saving rejects to file includes/common.inc.rej
Patching output below::
I will now try and go through the rejected patches and see if I can massage the changes in
Comment #33
velocis CreditAttribution: velocis commentedI manually patched the rejected code on bootstrap.inc, common.inc and .htaccess file.
The site appears to be running well now (now when you log out admin, it does not cache the devel toolbar when you re-view the page.)
I am however still getting the following errors. Any help in this final bit would be much appreciated and then I will upload the patch file for the 6.20
Comment #34
velocis CreditAttribution: velocis commentedClearing the cache fixed any of the unserialise and invalid argument issues.
The only error (NOTIVE) I have left is the following, what should CACHE_NONE equal ???
Comment #35
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedPlease stop spamming the infra queue.
The patch has been applied tothe pressflow Drupal distribution, I suggest you download that.