Problem/Motivation
Drupal's current outgoing-HTTP capability is, to be polite, minimal. We have one small function with a lousy API that can do basic requests, but that's it. If we want to be serious about web services we need strong bidirectional HTTP support.
Issues depending on a better API for HTTP requests (check these and move down to follow-up if correct, close if redundant now).
#1189464: Add an 'instant' queue runner
#1014086: Stampedes and cold cache performance issues with css/js aggregation - Put AdvAgg in core
#229905: Batch API assumes client's support of meta refresh - Can be fixed with something like Background Process
#106506: drupal_http_request() does not handle 'chunked' responses - Make it support HTTP 1.1
#786074: Allow hooks for drupal_http_request() request and responses
#205969: drupal_http_request() assumes presence of Reason-Phrase in response Status-Line
#386384: drupal_http_request does not support gz compressed data
#289820: drupal_http_request() could support digest authentication
#436240: Add a cURL replacement for drupal_http_request()
#164365: drupal_http_request() does handle (invalid) non-absolute redirects (RFC 7231)
All of these need something better than drupal_http_request() in order to be done well.
Proposed resolution
Writing a solid HTTP client is hard. Fortunately, there's plenty of existing ones out there that we can leverage, just like we pulled in Symfony.
Mikeytown2 did some extensive research into existing libraries, and the end conclusion was that Guzzle was the most full-featured and capable option available. It's only missing feature was asynchronous HTTP requests, which the author, Michael Dowling, added based on our feedback months ago.
The primary concern expressed about Guzzle is its size, which was decidedly not-small. (Multiple megabytes.) The size is justified for a fully capable HTTP client, because HTTP is complex, but not everyone was comfortable with a vendor library that was 2-3 MB in size.
Based on those concerns, Guzzle has been refactored in 3.x to be a suite of components, much the same way Symfony is. That allows us to just pull in 3-4 base components for core, and contrib modules that have need of the more advanced Guzzle components can pull those in when they need to. Problem solved.
The other concern was that Guzzle requires cURL, whereas drupal_http_request() does not. However, based on feedback so far it appears that cURL is far more common than it used to be, so making it a requirement for outgoing requests is not an onerous requirement. Arguably it's less onerous than Drupal's performance requirements, which effectively mandate APC already. The recommended resolution here is "meh, OK, you need cURL. Such is the modern web."
Therefore, Guzzle. Let's do.
Remaining tasks
- Get a green light from Dries on this issue. Dries?
- Add the necessary Guzzle components to core via Composer. Convention has been to do this in a separate patch from this issue to keep patch size down.
- Expose Guzzle to core via the Container.
- Replace calls to drupal_http_request() with Guzzle. (Don't bother wrapping an extra layer around it. There's no benefit to doing so.)
User interface changes
None.
API changes
drupal_http_request() goes away, instead you use Guzzle directly. Other systems like Simpletest could be refactored to use Guzzle's browser as well, but that's for follow-ups.
Follow-ups
- #1862536: Improve documentation of SimpletestHttpRequestSubscriber
- #1599622: Run poormanscron via a non-blocking HTTP request
- #1862398: [meta] Replace drupal_http_request() with Guzzle
- #1875818: Update composer.json to use the latest stable 3.0.x version of Guzzle
- #1875792: Standardize Guzzle exception handling
Comment | File | Size | Author |
---|---|---|---|
#168 | core--adopt-guzzle--1447736--168.diff | 654.38 KB | Fabianx |
#162 | 1447736-162-adopt-guzzle.patch | 643.49 KB | effulgentsia |
#157 | 1447736-157-adopt-guzzle.patch | 643.48 KB | effulgentsia |
#140 | 1447736-140-adopt-guzzle.patch | 644.76 KB | kim.pepper |
#102 | drupal-1447736-101-guzzle.patch | 583.67 KB | effulgentsia |
Comments
Comment #1
boombatower CreditAttribution: boombatower commentedThe two recommended implementations of the Client are:
https://github.com/fabpot/Goutte
https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Htt...
Based on http://www.garfieldtech.com/blog/refocusing-wscci we will already have the HttpKernel component available so I will start with that.
Comment #2
yched CreditAttribution: yched commented"goutte" means "drop" in french. Just sayin' :-)
Comment #3
Crell CreditAttribution: Crell commentedI've also been told that Buzz (https://github.com/kriswallsmith/Buzz) is popular in Symfony circles. We should look at that, too.
For the record, I'm 100% in favor of using a separate library for outgoing requests. One that plays nice with Symfony's HttpFoundation (which does incoming requests) would be better for DX.
Since Web Services involve more than just receiving requests but contacting other systems, too, this is relevant to WSCCI. So, tagging.
Comment #4
Crell CreditAttribution: Crell commentedFrom the readme, it looks like Goutte is more of a scraper, not a generic HTTP client object. That's more what we need, I think. The scraper would only be useful for Simpletest, for things that we shouldn't be testing that way anyway. :-) I don't know that it would work for general purpose REST communication.
Comment #5
boombatower CreditAttribution: boombatower commentedmsonnabaum suggested http://mink.behat.org/, also looking into it.
Comment #6
catchMink would be really handy if we ever want proper browser acceptance tests in core, as well as real browser performance testing. I've not looked at it yet but based on the description it sounds encouraging. We discussed those quite a bit at #1194824: Pick a load test runner and scripting language but didn't know about Mink at the time.
Comment #7
boombatower CreditAttribution: boombatower commentedIf we end up implementing this we should close #64866: Pluggable architecture for drupal_http_request().
Comment #8
catchComment #9
catchRe-titling. Also see recent discussion on #64866: Pluggable architecture for drupal_http_request().
Comment #10
catchAnd #1551600: DBehave!.
Comment #11
mikeytown2 CreditAttribution: mikeytown2 commentedSubscribe from #64866: Pluggable architecture for drupal_http_request()
My original plan was to create a 2.x branch of HTTPRL that could be the base for a pluggable architecture #1593862: 2.x branch and target for core inclusion. Move extra stuff to sub-modules.. Replacing drupal_http_request() with a 3rd party library seems to be the winner so I've come up with a list of things that a HTTP client should have.
List of requirements based off of what I've seen/need.
Whats in core:
- Set Headers
- Method (GET, POST, etc)
- Max Redirects
- Data
- Timeout
- Error handling
Nice things to have (whats in httprl):
- Parallel HTTP streams.
- Non Blocking Requests.
- Set callback function with arguments.
- Domain Connections: Maximum number of simultaneous connections to a given domain name.
- Global Connections: Maximum number of simultaneous connections that can be open in this process.
- Global Timeout: A float representing the maximum number of seconds the call may take.
- Option to alter all streams mid execution (example: request 20 urls & break after at least 5 return).
- Chunk Size (needed for writing to REST server on IIS)
- HTTP Version.
- Add new requests to stack mid execution.
- Cookie Parsing.
- Proxy Support.
- Async Connect.
- Handle chunked encoding.
- Handle gzip and deflate.
- Handle more error conditions.
- Automatically handle given data (if not a string use http_build_query and set application/x-www-form-urlencoded content type).
Things that are not in HTTPRL (usually found in curl)
- Get things from FTP.
- Sending Files.
- Full HTTP 1.1 compliance.
- Support protocols other than HTTP and FTP.
Modules that use context (stream_context_create()) in drupal_http_request():
Acquia Network Connector in acquia_agent_stream_context_create()
Shorten URLs in _shorten_googl()
Both deal with SSL. All other uses for stream_context_create in contrib do not use drupal_http_request (mainly used to send headers when using file_get_contents()).
Other projects like HTTPRL in Drupal land:
Webclient
cURL HTTP Request
Async Jobs
Background Process
Other PHP HTTP Clients
https://github.com/guzzle/guzzle
https://github.com/kriswallsmith/Buzz
https://github.com/jmathai/php-multi-curl
https://github.com/LionsAd/rolling-curl <- closest to HTTPRL from what I can tell
Comment #12
RobLoachBuzz, Buzy or Guzzle are on the top of my list. If they don't satisfy our requirements, we could easily work with those libraries to add the functionality we need.
Comment #13
Crell CreditAttribution: Crell commentedAnother nice-to-have would be request/response objects that are compatible with the Symfony ones. That's not a hard requirement, but it would make DX a bit nicer.
mikeytown, are you volunteering to drive this forward, I hope? :-)
Comment #14
mikeytown2 CreditAttribution: mikeytown2 commentedA comparison of the various php http clients can be found here in this wiki: http://groups.drupal.org/node/233173
Comment #15
mtdowling CreditAttribution: mtdowling commentedHi! I'm the author of Guzzle. I've updated the matrix wiki posted by mikeytown2 with more information on Guzzle. I see that Mink and Goutte have both been mentioned. Guzzle is now the HTTP client used by Goutte, and Goutte is actually one of the browser implementations used in Mink. Let me know if you have any questions or need clarification on anything related to Guzzle.
-Michael
Comment #16
tassoman CreditAttribution: tassoman commentedOn Symfony2 documentation the chapter abt "Testing" explains you can use browserkit with the DomCrawler component and phpUnit for testing easily your code. Running with that framework I think no more 3rd parties software is needed to get things done.
Comment #17
catchI've opened #1599622: Run poormanscron via a non-blocking HTTP request for once this is in (assuming we get non-blocking http request support).
Comment #18
rbayliss CreditAttribution: rbayliss commentedIn reference to Goutte (mentioned above). Yes, Goutte is a scraper, but it's literally a single class wrapper (OK a class and a compiler class we wouldn't need) around Guzzle and DomCrawler, BrowserKit, and a couple other Symfony components. So you could still use Guzzle for HTTP requests, and Goutte for testing, all in the same package.
Comment #19
mikeytown2 CreditAttribution: mikeytown2 commentedGuzzle+Goutte/Guzzle look like the right way forward. What's the next step?
Comment #20
Crell CreditAttribution: Crell commented1) Adding the necessary libraries to the composer.json file, and pulling them in.
2) Replace one or two uses of drupal_http_request() with whatever the new setup would be, so we can see what it looks like and if we need to extend anything or if we can use it directly as-is. (I'd prefer the latter if it works.)
3) Post a patch here and let's have a look.
Assuming people don't hate it entirely we'll probably open a new issue to just add the libraries to core to keep the patch size here manageable.
If you want to work out of a branch, I'm happy to give you access to the wscci sandbox to spin off a new branch for this.
Comment #21
boombatower CreditAttribution: boombatower commentedSo we are still committing libraries even with composer files?
Comment #22
RobLoachEspecially with composer.json files... Add "kriswallsmith/buzz" or "guzzle/guzzle" to the "require" section of
core/composer.json
, and run:It will install the new library accordingly. This patch would help as well: #1663404: Use Composer's defined namespaces to ease maintenance.
Comment #23
boombatower CreditAttribution: boombatower commentedNot really an explanation, kinda like using drush make files and committing all the modules and their dependencies into your own repo...doesn't make a whole lot of sense.
Comment #24
Crell CreditAttribution: Crell commentedboombatower: What you're looking for is #1475510: Remove external dependencies from the core repo and let Composer manage the dependencies instead.
Comment #25
rbayliss CreditAttribution: rbayliss commentedHere's a first pass at using Guzzle for a couple things. I switched the new default aggregator fetcher plugin and system_retrieve_file() to use Guzzle instead of drupal_http_request(). It seems like we can easily replicate all the behavior of drupal_http_request() using subscribers and configuration. The trick is in applying the settings to every client Drupal produces. In this patch, I've only created a default client, but to use the full extent of Guzzle, we would probably want to have custom clients (service clients in Guzzle terms) for more advanced applications. A couple of ideas that came to mind on how we can normalize settings across various clients:
1. Create an event dispatcher specifically for HTTP requests, load it up with subscribers in the container, and make sure a cloned version is attached to every client that's created.
2. Create a client factory that handles attaching a bunch of subscribers and common configuration.
3. drupal_alter().
Comment #26
RobLoachVery nicely done! This is looking absolutely amazing. Don't quite have that much time to evaluate thoroughly currently, but just wanted to say it looks really good.
Do we want to turn drupal_http_request() into a wrapper around Guzzle?
Comment #27
RobLoachAlso curious what the bot says.
Comment #28
Crell CreditAttribution: Crell commentedCan you post a version that doesn't include the Guzzle source as well, so that it's Dreditor-able? :-) I am scared to click Review on a 1.22 MB patch...
Rob: I think we want to eliminate drupal_http_request() entirely and just let objects pull http_client from the DIC directly. The wrapper function is completely unnecessary, and just makes testing harder.
Comment #29
rbayliss CreditAttribution: rbayliss commentedYeah, here's the patch less Guzzle.
Comment #30
Crell CreditAttribution: Crell commentedJust a note that this will change a bit once #1599108: Allow modules to register services and subscriber services (events) gets in.
For now this works, but it should be moved to the Simpletest module as soon as modules have access to register subscribers, per the issue linked above.
I'd rather skip this wrapper function entirely. Let's just expose Guzzle and be done with it.
Is this all subsumed into Guzzle directly? I like. :-)
Then shouldn't we be modifying aggregator to expect Guzzle's headers and then not need this comment?
Should be 'use'd rather than a full namespace here.
Minor nitpick. $result is usually used for a DB result object. $response for an HTTP response object. I'd rather risk confusion with the Symfony response object than with a PDOStatement object. :-)
Ibid.
-15 days to next Drupal core point release.
Comment #31
sunGuzzle looks absolutely fantastic to me. Especially love its extensive documentation.
Would love to see further progress here. (It is possible/likely that we're going to use the new library for the new testing framework.)
That said, if anyone is confused like me, the comparison matrix in #14 is outdated - http://groups.drupal.org/node/233173 is way more complete and some facts have been corrected. @mikeytown2, can you remove the misleading comparison table from your comment here (or replace it with the updated/current one)?
Comment #32
mikeytown2 CreditAttribution: mikeytown2 commentedOld table removed; all that's left is a link to the wiki.
Comment #33
Crell CreditAttribution: Crell commentedI think we can retitle this now.
Comment #34
sunHm... given that we're likely going to NOT "update" WebTestBase for the revised testing framework, but rather "replace" it (with a new/separate class to extend from, and which will then tie into Guzzle), I'm really not sure whether this patch here should "waste" any time on the Simpletest browser...
No, actually, please ignore WebTestBase entirely -- it does not use drupal_http_request() at all (just grepped). WebTestBase uses cURL directly.
The other parts to support the WebTestBase architecture need to be retained though. But it looks like @rbayliss mastered those already :)
That said, I might have to disagree with @Crell ;) but only on a detail... unless I'm mistaken on how and when exactly it is involved, the Simpletest subscriber needs to work even if Simpletest module is not installed.
Comment #35
rbayliss CreditAttribution: rbayliss commentedYes, unless we're changing something drastic, the Simpletest subscriber has to be available outside of the Simpletest module. It's needed to modify the HTTP requests outbound from the site under testing, which typically doesn't have Simpletest enabled. I'll be rerolling this patch very soon.
Comment #36
Crell CreditAttribution: Crell commentedIn either case, I'm fine with replacing drupal_http_request() first, getting this committed, and then we can see what if any impact we want it to have on Simpletest. That can easily enough be a follow-up.
sun: Why does core need the ability to set a header on an outgoing request to be "from simpletest" when simpletest is not enabled? That makes no sense to me at all.
Comment #37
sunThis is done to allow a web test to "recursively" perform a HTTP request against the site under test.
I.e., to show a possible call chain (actual class/function names differ):
All of the issued HTTP requests need to use the User-Agent HTTP header, so that the site under test uses the correct database and configuration.
Without this TestSubscriber, the "nested" HTTP request between 2. and 3. (from the site under test to the site under test) would not use the required HTTP header, so the requested endpoint in 3. would effectively hit the database and configuration of the test runner (or "parent site"), which does not have the test configuration and might not have Aggregator module installed at all.
EDIT: This part will stay, even with the new testing framework.
Comment #38
Crell CreditAttribution: Crell commentedOh yuck. That's gross, but I can see the use case, ugly as it is. I'd still rather refactor our tests to not have to do that sort of inception-esque call chain, but that's not happening any time soon.
Comment #39
boombatower CreditAttribution: boombatower commentedHow is the something we can avoid short of removing the web tests that call that execute code that calls the drupal site? This is also necessary for xmlrpc tests and possibly other service oriented tests (in contrib if not core).
Comment #40
rbayliss CreditAttribution: rbayliss commentedHere's a reroll of the patch in #29, after taking @Crell's comments into account. The do-not-test version is the one that only includes changes outside of core/vendor. Assuming this still looks good and we want to proceed, here's a list of places that currently use drupal_http_request() and still need to be switched to Guzzle.
Comment #41
rbayliss CreditAttribution: rbayliss commentedComment #42
sunWould it make sense to commit the Guzzle library in a separate issue first, so we can focus on all the conversions in this issue?
Comment #43
Dries CreditAttribution: Dries commentedAt first glance, it seems like an enormous library to replace a small function in Drupal.
Also, Guzzle seems to require the PHP CURL extension.
I'm not sure we want this in core and definitely want to brainstorm about this more.
I don't think this is in the critical path for getting WSCCI done so let's not focus on it for now?
Comment #44
mikeytown2 CreditAttribution: mikeytown2 commentedIf curl is a major issue we can have a contrib module that replaces guzzle with one that doesn't use curl, right?
@Dries
Issue is a lot of modules already use/require curl #64866-43: Pluggable architecture for drupal_http_request()
Comment #45
Sheldon Rampton CreditAttribution: Sheldon Rampton commented@Dries: I think Guzzle or something like it would be a significant thing to get added into Drupal. Right now the Services module for Drupal is exploding in popularity, and Drupal 8 will have a lot of Services-like functionality built into core. This will make it much easier to serve up Drupal content in non-HTML forms such as JSON, but there is no corresponding functionality in Drupal that makes it easy to RETRIEVE that content. Guzzle provides a fully-featured HTTP Client package which does more and does it more easily than the current drupal_http_request() function. I think this is important functionality strategically to make Drupal play well with other applications in the future as we move away from a focus on HTML-based web pages and become more focused on Drupal as an importer/exporter of data services.
Comment #46
effulgentsia CreditAttribution: effulgentsia commentedPerhaps an updated issue summary would be helpful. drupal_http_request() is currently 300 lines of ugliness. Whereas Guzzle is 20,000 lines (40,000 if you include tests) of what is probably much more modern, powerful, and pretty code. But what does it actually provide for core? What do we envision adding to core to warrant including such a large library? If part of the Web Services initiative roadmap includes specific service consumption features needed in core, please state what those are in the issue summary. Or, if our testing system, aggregator fetching, OpenId logic, or something else currently in core is running into a limitation that Guzzle solves, please add that to the summary too. If there's not enough benefit for core to justify the weight, why not let it be a contrib project?
Comment #47
Sheldon Rampton CreditAttribution: Sheldon Rampton commentedThe big thing in my view is that there are quite a few Drupal contrib modules which consume some external API via REST queries, and right now there is no standardization about how they make those API calls. For example, the D6 version of the Redmine API module relies upon a separate contrib module called rest_api_query. Dave Reid and I have had some conversations related to updating the Redmine module to D7, and the main area of difficulty/disagreement between our approaches had to do with what approach we should take to do the REST queries, because we both felt that the rest_api_query module was wrong for the task. Dave therefore started writing his own library for that purpose, while I found a PHP library for REST queries that was recommended on Redmine.org. Judging from what people are saying here about Guzzle, it sounds to me like we would have been better off (and would have found it easier to agree quickly upon an approach) if Guzzle or something like it already existed as a standard for Drupal.
If we take the approach of making Guzzle a contrib module for D8, that will be a step forward, but it will also mean that there is still no standard recommended and defined in core which the entire community of Drupal module developers can rely on for guidance on how they should make REST queries. We'll therefore have a proliferation of methods and will probably be trying to sort this out in Drupal 9.
The main value of this issue ticket is that multiple people have come together to discuss and evaluate different PHP library options for HTTP request handling in Drupal. It seems that a consensus has formed in favor of Guzzle as the best option. If that consensus is not yet solid, then maybe we should postpone putting Guzzle into core for the time being. However, if the consensus is solid, I think it should go into core.
As an aside: I think creating a contrib module for Guzzle in Drupal 7 would be a good idea in any case, regardless of what happens with Drupal 8.
Comment #48
Crell CreditAttribution: Crell commentedOne big use case is deployment:
http://groups.drupal.org/node/237443
http://groups.drupal.org/node/244158
http://groups.drupal.org/node/246748
(Incidentally, the latter two still need feedback.)
We want to do a lot more Drupal-to-Drupal communication. That means Drupal sending outgoing requests to other Drupal sites, and sanely handling redirects, recursive connections, POST/PUT/DELETE, Accept headers, etc. Those are all places that we're weak right now. Web services means being strong on outgoing requests, not just incoming.
Comment #49
webchick#40: 1447736-drupal_http_request_guzzle-40.patch queued for re-testing.
Comment #51
mikeytown2 CreditAttribution: mikeytown2 commentedCreated a session to present on this subject at the Pacific Northwest Drupal Summit.
http://2012.pnwdrupalsummit.org/sessions/issuing-http-requests-drupal-8
I believe mtdowling is Seattle based, so I'll see if we can team up on the presentation.
Comment #52
mikeytown2 CreditAttribution: mikeytown2 commentedAnother bug that can be fixed by using guzzle
#229905: Batch API assumes client's support of meta refresh
Found a sandbox just for guzzle: http://drupal.org/sandbox/kimpepper/1723292
Comment #53
pwolanin CreditAttribution: pwolanin commentedI don't think we should include guzzle or any other library for 8 - the security implications alone of adding extra libraries is something that hasn't been addressed.
Instead, let's define a good interface and get this issue done for 8: #64866: Pluggable architecture for drupal_http_request()
In general I think we should be working to define good interfaces with minimal implementations in core and allow them to be swapped for those sites that need something more.
Comment #54
boombatower CreditAttribution: boombatower commentedDrupal runs on the web...modules want to use the web...to do so we need a browser...Having one makes loads of sense for all the reasons above...and having core come with one lowers the barrier and allows for a default choice.
http://drupal.org/node/64866#comment-6382120 sums it up well.
Comment #55
pwolanin CreditAttribution: pwolanin commentedSo, I think the top priority D8 for now should be #64866: Pluggable architecture for drupal_http_request() which I'm re-opening. I think this should be closed or postponed given the lack of concrete answers to Alex's questions in #46.
Comment #56
mtdowling CreditAttribution: mtdowling commentedGuzzle has a lot of features and is extremely extendable. This is not code you had to write nor is it code you will need to maintain (though more contributors are always welcome). I'd love it if Drupal decided to add Guzzle to core, but I understand that some of you have concerns. The issues I'm seeing being raised are the following:
Antithesis to "it's not lightweight"
According to phploc, Guzzle currently is 11,156 non comment lines of code, but I'm curious as to why the lines of code of a project is significant. Are you working on an embedded system with PHP? Is the 1MB of source code (which includes the liberal use of docblocks for each method, class, and property) going to use up all of your disk space on a web server? (even most shared hosts claim to offer unlimited disk space these days).
157,775 - that's the number of non comment lines of code in Symfony2 according to phploc. Does that make Symfony2 a bad candidate for Drupal? Nope; it means Drupal is choosing a sophisticated framework to power its server side framework.
HTTP is not "lightweight" protocol. Heck, the book on HTTP is 656 pages and weighs 2.3 pounds. Does that mean I don't like the book? Nope; it means it can tell me almost everything I need to know about HTTP. Do I have to read the entire book for it to be useful? Nope; I can skip to the parts that I need right now, and if I ever needed to refer to other parts of the book, I would know where to look. </allegory>
While Symfony2 receives requests and powers the server side of HTTP, Guzzle works with the other side of HTTP -- the client. With the rise in web services, the client is just as important as the server in a HTTP transaction because the client is likely a critical part of your application. These days people are using HTTP clients to power their databases, file storage, queuing, emailing, and even their sessions.
Guzzle's aim is to provide access to every feature of HTTP/1.1, be as fast as possible, and provide some abstractions and patterns to make it easier to consume a web service. If you want a client that can handle almost every aspect of HTTP/1.1, then Guzzle is a great option. If you want a client that's 1-2k lines of code, then I don't know how you could come close to the feature set of Guzzle. mikeytown2 wrote a great comparison of PHP HTTP client libraries that can be found here: http://groups.drupal.org/node/233173
If anything, that should be a good sign that the library is extremely well tested. As a matter of fact, Guzzle has 99.98% code coverage -- that's 4,366 tested lines of code out of 4,367 total testable lines of code.
Undefined security implications
Can you elaborate on the security implications? I'd be happy to try to address them.
Requiring cURL
From what several people are posting, there are currently several places in core that already use cURL, so this would not be a new requirement.
It's extremely common for cURL to be installed on shared hosts. A quick Google search shows that GoDaddy, hostgator, dreamhost, and ServerGrove come with cURL. If a shared host does not support cURL, I would guess that they would enable if asked. Additionally, installing the PHP cURL bindings on Windows is trivial: http://curl.haxx.se/libcurl/php/install.html
Using Guzzle vs defining interfaces
This is a great idea in theory, but I'm a bit of a pessimist when it comes to this sort of thing.
The problem is that you would need to abstract every feature that you like in Guzzle into various interfaces (e.g. Request, Response, Client, CookieJar, Cookie, Stream, EntityBody, Url, QueryString, RequestFactory, etc). Making these interfaces flexible and generic enough to be implemented by multiple providers while still providing all of the features Guzzle provides will be a monumental effort. Assuming you get past the task of agreeing on a set of potentially 20+ interfaces (which based on the related PSR discussion is not easy), you would then need to provide implementations of these interfaces for one or more clients. At this point, I would estimate that it's months later and you're using a stripped down version of Guzzle (and likely much slower due to the fact that every call would need to be proxied through an adapter).
-Michael
Comment #57
Dave ReidAs for undefined security implications, can we get answers for what the Guzzle project team's stance or position is on all of the following questions from http://groups.drupal.org/node/250578:
I'm not sure we've also tested as to how secure/dummy-proof the code is from a security standpoint. Are there any active security issues (private or known)? What are the common gotchas (like XSS in Drupal)?
Comment #58
mtdowling CreditAttribution: mtdowling commentedHi Dave,
> The project should have a simple, well documented way to report bugs
Like most open source projects, issues are reported on GitHub.
> The project should follow Responsible Disclosure
> The project should make security releases on a consistent schedule (e.g. every tuesday, every X day of the month)
Sure, but there aren't many security concerns for an HTTP client. A client serializes an HTTP request, sends it over the wire, parses the response, then returns the response. As long as you use SSL and don't disable peer and host certificate verification, there isn't much more a client should be responsible for. Did you have any specific things in mind that might raise concern?
I typically tag a release once a week, but I don't think there's ever been a need for a security release.
> The project will ideally allow interested members of our security team to join their private mailing list/issue tracker so we can have some advance knowledge and coordination around releases.
I don't think this is necessary for Guzzle, but if it's a requirement, we can set something up. Is this something that can be handled on drupal.org somehow if needed?
> The version of the libraries being incorporated into Drupal will be supported by their security process for as long as we reasonably expect Drupal Core to be supported (around 5 years)
I'm sure we could work out a similar approach to what will be taken with Symfony2.
> The project should have active maintainers so that security vulnerabilities can be fixed in a reasonable amount of time (X weeks).
I'm the lead, and at this point in time, there have been 25 contributors: https://github.com/guzzle/guzzle/graphs/contributors
> I'm not sure we've also tested as to how secure/dummy-proof the code is from a security standpoint.
Guzzle has been thoroughly tested both in unit tests and in production systems, but I'd love for someone to do this.
> Are there any active security issues (private or known)? What are the common gotchas (like XSS in Drupal)?
None that I know of.
-Michael
Comment #59
catchI've added two issues depending on this one to the summary. Those aren't enough in themselves to justify adding the library, so hopefully some other people will update the summary too.
Comment #59.0
catchUpdated issue summary.
Comment #60
mikeytown2 CreditAttribution: mikeytown2 commentedI've added a couple more issues to the top. The biggest benefit I see is in contrib. Having something as powerful as guzzle in core allows for a lot of cool things to happen. Looks like a decent sized project is now requiring httprl.
Comment #61
webchickJust to play devil's advocate though, couldn't those contrib modules just add a contributed Guzzle module to their list of dependencies..?
Comment #62
RobLoachJust chiming in here. Blown away by how much work and dedication mtdowling has put into helping us out. This should not be ignored.
mtdowling has split Guzzle into different components. If we need just the HTTP interface, we can do that. Composer will handle its dependencies for us.
Yes, but that's contrib and we wouldn't gain benefit out of it out-of-the-box in Drupal Core. I'm all for keeping Drupal Core light, but the benefits shown in Guzzle extremely out-weight the cons.
Comment #63
pwolanin CreditAttribution: pwolanin commented@Rob Loach - the issue summary seems out of date above, but I'm agree with webchick's suggestion that modules that need it add it as a dependency. I still think this is unreasonable core bloat and overhead of coordinating core releases with this project.
The http component alone is still ~300k of code (~10k LOC).
Comment #64
mtdowling CreditAttribution: mtdowling commentedIn order to address some of the concerns that have been raised on this thread about the size of Guzzle, I've been working on a 3.0 release of Guzzle that will break the project further into components. The end result would be somewhere around 5k LOC that would need to be required by core. Here's the pull request if anyone is interested: https://github.com/guzzle/guzzle/pull/133
The plan is to break Guzzle into the following components via composer and possibly PEAR:
A barebones installation would require guzzle/http, which requires guzzle/common and guzzle/stream. You will be able to install all of the plugin using the guzzle/plugin repository, or install individual plugins using guzzle/plugin-[name]. You will be able to install the entire project by using guzzle/guzzle. (Note that using these components will not pull in the tests)
These changes will allow a cleaner way to separate plugins and the ability to build more robust plugins. For example, the BackoffPlugin and the CachePlugin will be much easier to extend and modify in 3.0. Breaking the project into components like this will mean that contributors can build awesome plugins and new components without worrying about the lines of code. If you don't need a feature, you won't have to pull it into your project. If Guzzle's core were provided in Drupal's core, this setup would make it easier for contrib modules to pull in additional Guzzle modules as needed.
These are breaking changes, but the majority of the changes are just find and replace. I'll provide a small migration script when I release to help ease the migration. These changes are a long time overdue. I can promise that after this release, Guzzle will have much fewer BC changes between versions, and if Guzzle were added to Drupal core, I would try to coordinate with Drupal on BC changes.
Do these changes help to address some of the concerns? Is there something else that could be done?
-Michael
Comment #65
cosmicdreams CreditAttribution: cosmicdreams commentedThe naming conventions here make the inclusion of this sounds like a happy marriage of Web service client in core and Feeds in core.
This would be my wildest dream come true.
Comment #66
effulgentsia CreditAttribution: effulgentsia commented@mtdowling: that sounds awesome. My only suggestion is to really try to be aggressive in cutting things out of guzzle/http and moving anything not essential to the architecture or very simple http requests into a separate component. For example, looking at https://github.com/guzzle/guzzle/tree/modular/src/Guzzle/Http, I'm wondering if:
- Cookie and CookieJar can move to a Cookie component
- Message/PostFile can move to a File component
- Curl can move to a Curl component
I'm just thinking that really minimizing what we need to pull into Drupal core will help a lot in getting this in, but these are just rough suggestions: of course, you also need to keep your project architecture sane and not over-componentized.
Comment #67
pwolanin CreditAttribution: pwolanin commentedIt would seem guzzle still requires cURL, which would then become a requirement for Drupal. That seems like the most likely reason not to pull an external library into core, in addition to possible maintenance issues mentioned above. A smaller library lessens those concerns, but doesn't remove them.
Comment #68
effulgentsia CreditAttribution: effulgentsia commentedClientInterface has setCurlMulti() and getCurlMulti() methods. Would be good for those to go away.
Client has send() and prepareRequest() methods that use the Curl classes. Would be good if those could be abstracted, though if not, we could potentially extend the class and override those methods.
Note that as a last resort, we could swap out CurlHandle to not actually use cURL, though I hope we don't have to resort to such hackery.
Comment #69
Crell CreditAttribution: Crell commentedWhy is cURL such an evil thing to require, exactly? It's already required if you want to use Simpletest...
Comment #70
mikeytown2 CreditAttribution: mikeytown2 commentedIn #56 mtdowling explains that curl is widely supported now. I would agree that it's rare to find a host that does not support cURL. I think that all we would need to do is add a page like the one we did for PDO.
Comment #71
effulgentsia CreditAttribution: effulgentsia commentedThat's not a compelling argument, since many production sites don't need to run Simpletest. However, one can make the argument that a site also doesn't need to run aggregator, openid, or xmlrpc. But, "Update Status" is the one core feature that I would find unfortunate to not be working on sites without cURL, if such sites actually exist.
Comment #72
webchickIt might indeed be more true now than it was when I started with Drupal that a cURL requirement is more standard. I know drupal_http_request() saved my bacon on DreamHost back in 2005 by the fact that it used sockets rather than cURL, but I noticed phpinfo there now shows libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6.
The only other host I have access to atm is Acquia and they have 7.19.7.
This might be worth a webform.com survey piped through g.d.o/core and see if we can get some data on this.
Comment #73
mtdowling CreditAttribution: mtdowling commented@effulgentsia Thanks for the feeback!
I'll think about moving the Guzzle\Http\Cookie object and Guzzle\Http\CookieJar namespace to Guzzle\Plugin\Cookie.
I think these should stay because they're required for sending POST requests.
I'm not sure about this one, but I'll consider it.
I could potentially add a Guzzle\Transport namespace, and update Guzzle\Http\Client to use a TransportInterface. This would open up the ability to support multiple transports in the future (socket, file_get_contents, etc).
The problem here is that it would be incredibly difficult to achieve or even emulate feature parity with a cURL transport, and the amount of effort required to implement additional transports would only benefit a very small percentage of users that do not have cURL installed. Using a socket based approach that can even come close to cURL, including parallel requests, would be a massive undertaking and tens of thousands of lines of code. Each adapter would hopefully support events for transferring data out, in, progress events, and true streaming of HTTP request and response bodies.
It might be a good idea to add the necessary interfaces regardless of if/when alternatives are ever implemented though.
@webchick: It would be awesome to get some data around this!
-Michael
Comment #74
Crell CreditAttribution: Crell commented+1 to collecting data rather than guessing.
Comment #75
pwolanin CreditAttribution: pwolanin commentedimho, it's not worth adding more indirection and interfaces if Guzzle is only ever going to use cURL. The code is already hard to follow.
Comment #76
mtdowling CreditAttribution: mtdowling commented@pwolanin I evaluated the possibility yesterday, but will not be abstracting away cURL. The reason Guzzle is so powerful is because it is so tightly integrated with cURL.
Comment #77
effulgentsia CreditAttribution: effulgentsia commentedSome more thoughts about cURL:
- WordPress does not require it. They developed their own transport abstraction API in order to be compatible on as many servers as possible.
- ZenCart and Magento do require it. My guess is most big shared hosting companies want the business of small shop owners and make their hosting plans compatible with these.
- Per #71, a Drupal site doesn't need to consume web services, so cURL would only be needed for sites that choose to do so (e.g., Aggregator or OpenId). Even Update Status is optional (though a very nice security helper for site maintainers who don't subscribe to the security announcements mailing list).
- Per #44, if sites without cURL that want to consume web services is a small enough edge case, it might not make sense for core to address it. Contrib could provide a replacement implementation that conforms to Guzzle's interface. This would likely only come about if there's a large enough website that wants it. Until then, small website owners would need to switch to one of the many hosting companies that does offer cURL.
Comment #78
sunClosely related: #1322890: Add a property to .info files to allow modules to specify required php extensions
Comment #79
mtdowling CreditAttribution: mtdowling commentedThat's some great insight, effulgentsia. Including what effulgentsia listed, these popular PHP projects all require cURL: Magento, ZenCart, Facebook PHP SDK, AWS SDK for PHP, Twitter OAuth, ThinkUp, Fuel (appears to only offer a cURL client), and the majority of web service clients I've worked with. I'm sure we could come up with a much larger list if we kept looking.
Comment #80
tunic#77, about Drupal consumption of webservices: the point may be what things could be done with a good HTTP client implementation like Guzzle in core instead of actual needs. If Guzzle is included other core and contrib components will be using it (check mentioned cases in the issue body).
Comment #81
cosmicdreams CreditAttribution: cosmicdreams commented#80: In my opinion, if Drupal had Guzzle AND we accomplish everything we're targetting in the WSCCI initiative then we'd have web-service driven, platform agnostic Outputs and inputs.
Once we connect Guzzle with data import logic (Migrate, Feeds, Import/Export API, etc.) we can allow Drupal to simply act as a datastore in projects that
In my opinion, whenever I read about the progress of the Services module in D6 and D7, I always thought it would allow Drupal to act as a web client that connects to others, instead of acting like a Server that provided it's data.
Being able to pull in data from outside sources so that content can be curated is exciting to me because I love data warehousing. That's a whole category of stuff we can approximate also long as we had good clients and servers.
Comment #82
catchIt seems OK to add a cURL dependency to modules like aggregator and even update status. We have a nag when update_status is disabled, so while it'll be really annoying to have Drupal nag you about a module that it won't also let you enable, that's probably the right kind of annoying.
I also can't really think of a feature we'd need in core using outgoing http requests that would have to be required - we removed system_check_http_request() as well as the strange clean urls check which are the only places I can think of in core that were doing this and neither will be missed. Given that I can't see it creeping up to a full dependency.
Our PHP version and PDO requirements have been a lot more stringent than this and that's worked out OK. And the release is almost a year out so there's plenty of time for both hosting companies and small sites to prepare.
Comment #83
cweagansIn the last two months, I think that there's been a pretty compelling argument for adding Guzzle to core, so I'm going to reopen this.
Answers were given to #46, and the project maintainer seems very amicable to making changes that we need made.
For the record, I'm okay with adding a curl dependency to all of core - I mean, what host doesn't have curl available? If it does turn out to be a problem, then is there any reason that Guzzle can't be extended with another class that uses sockets instead of curl? I don't see anything particularly difficult about that.
Comment #84
pwolanin CreditAttribution: pwolanin commented@cweagans - I don't think you can do that to guzzle itself. See #73.
I think the main question is whether core ships w/ some interface that overlays guzzle, or whether we depend on and ship the minimal parts of it and depend on some factory methods to swap out a piece if someone really has a use case that cannot be satisfied.
Comment #85
kim.pepperI agree with comments by @mtdowling in #56. We would have to do a load of work to abstract every Guzzle interface
Comment #86
mtdowling CreditAttribution: mtdowling commentedI just released Guzzle 3.0. Guzzle is now broken into the many components listed earlier in the thread. This would allow Drupal to pull in only the bare-bones requirements of Guzzle. You can read more about the release and the component installation options here: http://mtdowling.com/blog/2012/10/16/guzzle-3-dot-0-better-service-descr...
Comment #87
podarok#86 nice update
Comment #88
Crell CreditAttribution: Crell commentedNice work, mtdowling!
pwolanin: Does the revised Guzzle aussage your concerns? Would you be comfortable with putting the base guzzle/guzzle package (and maybe one or two others as needed) in core, and saying "if contrib wants more, they can grab the other components with composer"? (That's pretty much what we're doing with Symfony.)
catch, how about you?
Comment #89
effulgentsia CreditAttribution: effulgentsia commentedI'm guessing that catch will care about this statement from #64:
With Symfony, fabpot formalized this a bit more into components that will keep strict BC across 2.x point releases and ones that won't, and so far in Drupal core we've only added the stricter set of components, though #1810472: Add Symfony's Serializer component to core despite Symfony potentially releasing BC-breaking updates after 2.3. may become our one exception. @mtdowling: are you thinking of doing something similar for Guzzle 3.x, where BC is treated differently per component, or in general what are your thoughts of what things would look like for Guzzle in Drupal 8.x in 2017/2018?
Comment #90
catch@Crell, I've been generally in favour of this, and the updated structure of Guzzle looks like a really good change. It'd be good to see what the answer to effulgentsia's question about BC over the next 5-6 years is, but restructuring the library so it's easier for us to commit it suggests it should be easy to resolve issues in the future as well.
If we use Guzzle, we'd hopefully be able to unpostpone #1599622: Run poormanscron via a non-blocking HTTP request which would cheer me up for one. In this case most of the pushback has been from Dries, so unassigning from me.
Comment #91
effulgentsia CreditAttribution: effulgentsia commentedTo move this along, let's get a patch up here that just adds Guzzle 3.0 core and the other 1 or 2 components we expect to need (I'm hoping Guzzle is setup to allow this to be done via Composer). Then, pwolanin, Moshe, Dries, and others who have been concerned about code weight / complexity can look it over (yes, they can also look it over on Guzzle's github, but having it here would put it in their face a little bit more). Perhaps it also makes sense to change drupal_http_request() to use Guzzle as part of this patch to demonstrate what a simple Guzzle consumer might look like, but I don't think it's worth changing any consumers of drupal_http_request() yet, since Dries hasn't agreed to Guzzle yet.
Comment #92
rbayliss CreditAttribution: rbayliss commentedI definitely agree we should put a patch together that includes the library. A couple simple examples of how Guzzle (pre-3.0, anyway) can be used is found in #40 above. That patch was applying and passed tests at the time it was provided.
Comment #93
mtdowling CreditAttribution: mtdowling commented#89:
> are you thinking of doing something similar for Guzzle 3.x, where BC is treated differently per component
I don't have anything written up like what fabpot has, but I'd be happy to put something together next week.
In short, Drupal would only require guzzle/http, which in turn requires guzzle/common, guzzle/parser, guzzle/stream. These core packages in Guzzle will have the same promise Fabien made for his core packages. However, other packages, i.e. guzzle/service, could potentially have minor BC changes introduced over time. Further, Guzzle 3 will strictly follow semantic versioning.
> what things would look like for Guzzle in Drupal 8.x in 2017/2018
If between now and 2018 Guzze 4 is released, then Guzzle 3 will be maintained in a separate branch. This is the approach I'm taking with Guzzle 2. While I'm not actively maintaining Guzzle 2 anymore, I'll accept pull requests for it and will address any security concerns.
Comment #94
mikeytown2 CreditAttribution: mikeytown2 commentedTrying to manually merge in changes from #40 into latest dev. Got some questions. Hopefully the answers are easy :)
Looks like core/lib/Drupal/Core/DependencyInjection/ContainerBuilder.php has changed a lot. Where can I register
http_default_client
? Would it belong in core/lib/Drupal/Core/DependencyInjection/Compiler/RegisterKernelListenersPass.php?Note: This has NOT been added to this patch. I don't know where to put it.
The missing bit of code is sorta important thus this patch won't work without it.
With this patch applied, I believe all one needs to do is run
and composer will do the rest. Or run
if composer.phar isn't in the
core
directory but drush is installed.Comment #95
effulgentsia CreditAttribution: effulgentsia commentedcore/lib/Drupal/Core/CoreBundle
Comment #96
rbayliss CreditAttribution: rbayliss commentedWe might want to wait on a reroll though, since #1706064: Compile the Injection Container to disk is RTBC, and that will change where the service is registered again.
Comment #97
mikeytown2 CreditAttribution: mikeytown2 commentedPatch with http_default_client registered. Still not ready for test, will be doing that next (
php composer.phar update
).Comment #98
mikeytown2 CreditAttribution: mikeytown2 commentedfull patch attached.
Comment #99.0
(not verified) CreditAttribution: commentedAdded in more issues that could be solved by using guzzle
Comment #100
Crell CreditAttribution: Crell commentedI've updated the issue summary. If I missed anything, please add it. Assigning to Dries for a green light.
Comment #100.0
Crell CreditAttribution: Crell commentedAdd proper issue summary.
Comment #101
mikeytown2 CreditAttribution: mikeytown2 commentedI was able to meet up with mtdowling at the Pacific North West Drupal Summit this weekend. Long story short guzzle has been heavily profiled for speed (overhead of guzzle), so if that is a concern, I wouldn't worry about it; the code is fast.
With things like service descriptions in guzzle, putting guzzle in core is starting to make more and more sense when WSCCI is taken into account.
Comment #102
effulgentsia CreditAttribution: effulgentsia commentedWe only need guzzle/http and its dependencies.
Useless line.
I'm not clear why this is an improvement, so I took it out of this patch. We can do it in a follow up. The aggregator example demonstrates some clearer improvements.
6 days to next Drupal core point release.
Comment #103
effulgentsia CreditAttribution: effulgentsia commentedBy the way, in #102, the composer.lock file grew a lot compared to HEAD. Is there a switch to
php composer update
for generating a less verbose lock file? I don't know how we have such a terse one in HEAD.Comment #104
Crell CreditAttribution: Crell commentedComposer changed its .lock file format a few weeks ago in preparation for an RC, I believe. If it works, I wouldn't worry about it.
Comment #105
cosmicdreams CreditAttribution: cosmicdreams commentedThis doesn't seem related to using Guzzle but that looks nice.
Comment #106
effulgentsia CreditAttribution: effulgentsia commentedThat was automatically updated by Composer.
Comment #107
RobLoach#102: drupal-1447736-101-guzzle.patch queued for re-testing.
Comment #108.0
(not verified) CreditAttribution: commentedFixed typo.
Comment #109
Fabianx CreditAttribution: Fabianx commented#102: drupal-1447736-101-guzzle.patch queued for re-testing.
Comment #110
Crell CreditAttribution: Crell commentedWe can flesh out usage further once this is in. Let's get the big patch in, then we can spin off other issues.
Comment #111
hass CreditAttribution: hass commentedAs Link checker heavily relies on Drupal core and how it works today and as I know that cURL cannot work properly - has someone tested if Guzzle has implemented some workarounds or solved the inabilities of cURL. Just from memory.
Stop redirecting after the first request (e.g. the first HEAD answers with 301). cURL in this case will follow all redirects and only gives me the last 200 what break will link checker completely...
I'm sorry, but I've forgotten the details of these major limitations I had with cURL, but it was a show stopper for sure I've found with http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blo.... Maybe this have been issues with multi_curl, I tried to use for boosting the link check speed... but the redirect limit we have in Core is something that cURL cannot handle correctly. Have you really tested this status codes?
Comment #112
RobLoachGuzzle can send requests via curl_multi. If their implementation is not enough for us, then it's still possible to inherit from it and swap out the definition we use using drupal_container().
Comment #113
mtdowling CreditAttribution: mtdowling commentedI'm trying to understand what you're talking about, but you haven't provided enough details.
> As Link checker heavily relies on Drupal core and how it works today and as I know that cURL cannot work properly
Please elaborate with evidence.
> Stop redirecting after the first request (e.g. the first HEAD answers with 301). cURL in this case will follow all redirects and only gives me the last 200 what break will link checker completely...
You can turn off redirects if needed, or set the maximum number of allowed redirects. If you do allow redirects, you can get the chain of request and response objects that created the final response by calling getPreviousResponse() on a Response object.
I should also mention that I moved away from cURL powered redirects to redirect plugin that can work around some issues with redirects and non-repeatable entity bodies / cookies being added on a redirect. These changes are present in master and will be tagged and released likely this weekend.
> the redirect limit we have in Core is something that cURL cannot handle correctly
Can you elaborate?
-Michael
Comment #114
sunYes, we should definitely do this.
Also, to give an update from the testing system side: Meanwhile, discussions resulted in a concrete battle plan:
#1567500: [meta] Pave the way to replace the testing framework with PHPUnit and possibly rewrite Simpletest
We will start to work on this for reals after feature freeze in:
#1801176: Deploy a PHPUnit system with a bottom-up approach
This effort will also require Guzzle.
Comment #115
mikeytown2 CreditAttribution: mikeytown2 commentedHeads up: Amazon's AWS SDK for PHP now uses Guzzle
https://github.com/aws/aws-sdk-php/
Comment #116
RobLoachTagging.
Comment #117
RobLoach#102: drupal-1447736-101-guzzle.patch queued for re-testing.
Comment #118
Crell CreditAttribution: Crell commentedSo what are we still waiting for here? Dries, catch, please either commit or mark back to CNW with an explanation.
Comment #119
catchI already posted in #90, but new libraries need agreement from Dries as well, which is why it's assigned to him.
Comment #120
klausiWhile writing Simpletests for rest.module I really started to hate cURL. Cannot even access the Location response header without parsing the response myself. So let's please use Guzzle to have readable and maintainable tests for rest.module.
Not sure this qualifies as a critical task, but at least major.
Comment #121
Dries CreditAttribution: Dries commentedI looked into this and am +1 for this issue. I don't think we should be in the business of maintaining an HTTP library. My biggest concerns is around security. We need the Guzzle team to do security releases well. This can be committed once we're below the commit threshold.
Comment #122
mtdowling CreditAttribution: mtdowling commentedThat's great!
If Guzzle eventually goes beyond a 3.x release in the distant future, then security fixes will always be backported to a 3.x branch.
Is there more feedback or a specific process in mind for security fixes?
Comment #123
klausiThe Drupal community encourages responsible disclosure: http://en.wikipedia.org/wiki/Responsible_disclosure
When I search the web for "guzzle report security problem" the first hit is a link to a Drupal module ;-). I could not find a hint on reporting security problems on the Guzzle homepage.
Furthermore the Drupal security team has a security advisory process with RSS feeds and a mailing list for notifications: http://drupal.org/security
And the security release process follows a predictable timing scheme, i.e. security releases are done in the middle of the week on work days so that people are available to react accordingly.
So from Drupal's point of view it would be good to know when Guzzle schedules security releases, so that we can release on the same day or as close as possible.
Comment #124
effulgentsia CreditAttribution: effulgentsia commented#102: drupal-1447736-101-guzzle.patch queued for re-testing.
Comment #125
effulgentsia CreditAttribution: effulgentsia commentedWe are currently 2 major tasks over thresholds and this is one of them. I don't know whether that means we should commit this or reclassify as a feature request. I queued a bot check just in case the answer is the former.
Comment #126
RobLoachmtdowling stated above:
Sure, but there aren't many security concerns for an HTTP client. A client serializes an HTTP request, sends it over the wire, parses the response, then returns the response. As long as you use SSL and don't disable peer and host certificate verification, there isn't much more a client should be responsible for. Did you have any specific things in mind that might raise concern?
I typically tag a release once a week, but I don't think there's ever been a need for a security release.
So, as long as we roll the latest stable from Guzzle, then we'll be solid. No reason to hold a Guzzle release for a Drupal release. Their release cycle is a lot more aggressive than ours.
Comment #127
scor CreditAttribution: scor commented@mtdowling said in #58:
What about security issues? These are usually not reported / discussed publicly. Is there a email address to report these? or do we contact you personally? I couldn't find any indication on http://guzzlephp.org/ or github.
The Drupal Security team would prefer to follow the same policy for all third party libraries, no matter the size and/or risk associated with a library, that's why we're asking you these questions...
Comment #128
mtdowling CreditAttribution: mtdowling commentedThanks for the feedback. I've create an email address for security issues, security@guzzlephp.org, and updated the documentation accordingly: http://guzzlephp.org/guide/contributing.html#reporting-a-security-vulner...
Comment #129
David_Rothstein CreditAttribution: David_Rothstein commentedSo are we actually in a situation where this issue is blocking itself from being committed? :)
I think to resolve that catch-22 we can just demote this issue's priority. #120 bumped it to major based on a shortcoming in cURL, but the patch is more about replacing drupal_http_request() which I don't think has that same shortcoming. (Also, having to parse a response in a simpletest to work around a cURL limitation really doesn't seem like a major issue to me, just an annoyance.)
Comment #130
David_Rothstein CreditAttribution: David_Rothstein commentedActually, I just tried this patch out on a server with cURL disabled and got all sorts of "call to undefined function" fatal errors when using the Aggregator module.
This patch needs a strategy for how it's going to handle servers that don't have cURL, whether to allow the modules to be enabled but handle the failed requests gracefully, or make each individual module that depends on cURL declare it as a dependency (as Simpletest already does), or just make all of core depend on cURL, etc.
Comment #131
David_Rothstein CreditAttribution: David_Rothstein commentedMysteriously disappearing tag.
Comment #132
mikeytown2 CreditAttribution: mikeytown2 commented@David_Rothstein
The server with cURL disabled, is cURL available on it? Is this a simple matter of turning it on or do you have a rare host that doesn't have any cURL support?
Comments about cURL #56, #66 - #83. Long story short we are okay with making cURL a core dependency (from my understanding). Vast majority of hosts can enable/install cURL if cURL isn't available.
Comment #133
Crell CreditAttribution: Crell commentedLet's just add it as system_requirements() check and call it a day. Maybe we can push that out to specific modules later, but for now it sounds like it's not anymore of an onerous requirement than we have already.
Comment #134
sunI agree with that — haven't seen a host on which cURL wasn't available in the past years.
Essentially, as @Crell says, the check has to be moved into
system_requirements()
, but off-hand, I'm not 100% sure whether that is sufficient for the Drupal installer as well as update.php. 100% confidence can most likely only be achieved by temporarily disabling cURL via php.ini and manually testing those requirements checks.That said, I wouldn't mind this patch from landing without that manual testing. Like it or not, we'll run into PHP extensions issues anyway with D8, since the Symfony components we're actively using have a range of undocumented dependencies. (E.g., it is more than possible that our entire
includes/unicode.inc
is completely obsolete, because we're indirectly relying on the multibyte extension in various of subsystems already and unicode.inc #1 kicks in too late and #2 isn't even known to standalone vendor components in the first place. Just to name the first battle-field.)Comment #135
effulgentsia CreditAttribution: effulgentsia commentedBack to needs work for some kind of requirements check added to this patch.
My preference would be that we release D8 with the ability to run without cURL, and only make Aggregator, OpenId, Update Status, Testing, etc. depend on it. But I'm happy to take that up in a follow up if it's easier to add it to system_requirements() for the time being. However, #102 only uses Guzzle for Aggregator, so I would consider it sufficient to add a check to aggregator_requirements() instead if that's just as easy.
Comment #136
catchI'd also prefer to just add the requirement where it's needed, and just for aggregator for now. The worst that happens is you install a module that doesn't declare it and get some notices if someone forgets to add the dependency, as opposed to not being able to install in the first place.
Comment #137
David_Rothstein CreditAttribution: David_Rothstein commentedNope, it's very easy to turn on in my case - I was just pointing out that we need a better way to tell people that than a white screen of death :)
Comment #138
kim.pepperI applied the patch at #102 but there are lots of composer.lock differences.
If I run composer update, it updates all components including symfony.
How do I re-roll this patch without updating everything?
Thanks!
Kim
Comment #139
effulgentsia CreditAttribution: effulgentsia commentedYou should be able to run
composer update guzzle/http
to just update that.Comment #140
kim.pepperI have:
Comment #141
mikeytown2 CreditAttribution: mikeytown2 commentedTest Passes. Moving this back to RTBC as curl has been added to the system requirements thus preventing unknown WSOD (main objection in #130).
Comment #142
cosmicdreams CreditAttribution: cosmicdreams commentedAh I see. The tests pass and this patch is green but the comment in this issue doesn't annotate it as so.
Comment #143
Fabianx CreditAttribution: Fabianx commented#140: PASSED: [[SimpleTest]]: [MySQL] 48,751 pass(es).
Comment #144
catchWe should be adding the requirement for individual modules that require cURL, not the entire system.
Comment #145
cweagansDoes it really matter? cURL is almost universally available so almost nobody will hit that check anyways.
Comment #146
catchDavid hit it but I don't know if he turned it off especially for testing this patch.
I don't really mind adding cURL as a dependency in general, but it feels like an artificial limitation if we don't actually require it.
Comment #147
cweagansCan we revisit that after feature freeze and commit the patch as-is for now?
Comment #148
Dave ReidSince Guzzle would be available as a system library, it feels like it makes more sense to add cURL as a system dependency. As a contrib author, adding requirements for cURL for each module that would need to use the API would be frustrating, especially since I can't easily 'extend' the list of PHP required extensions and I have to write a new hook_requirements() for each module each time.
Comment #149
effulgentsia CreditAttribution: effulgentsia commentedOne benefit of adding the hard dependency now is we can get feedback from the wild (to the extent that people testing D8 is sufficiently wild) for who doesn't have cURL. I agree we should revisit the artificially hard dependency before release though. Back to catch for comment.
Comment #150
David_Rothstein CreditAttribution: David_Rothstein commentedIn my case I turned it off to test the patch. I'm pretty sure people do sometimes run Drupal in restricted environments though (where outgoing HTTP requests are forbidden); whether that translates to not even wanting to or being able to install cURL itself (regardless of whether its requests are going to be blocked), that I'm not sure.
As far as I can tell, the only places in core that call drupal_http_request() are optional modules, so making it a strict requirement seems a little overboard. The one exception is system_retrieve_file(), which is an API function in System module but isn't actually called from anywhere outside the Update module.
I would think the right way to handle that is sort of like http://drupal.org/project/curl - core could move all the Guzzle and HTTP request code into its own independent module, then any module which relies on HTTP requests can just declare that module as a dependency (and the module itself can take care of hook_requirements). It's a better experience for end users also.
However, it's too big of a change to the patch at this point, so just adding the code to aggregator_requirements() (copied from the Simpletest module) like @catch suggested seems pretty reasonable, and anything else could be a followup... I would have done that myself but wasn't sure what to do about system_retrieve_file().
Either way, the above patch isn't quite right since adding the code to system_requirements() means the simpletest_requirements() code is obsolete, but it's still there.
Comment #151
David_Rothstein CreditAttribution: David_Rothstein commentedCrosspost. Setting RTBC for @catch's feedback is good.
Although like I said, there's still a problem with the patch no matter what:
But it's a minor one.
Comment #152
David_Rothstein CreditAttribution: David_Rothstein commentedComment #153
catchFor contrib modules there's #1322890: Add a property to .info files to allow modules to specify required php extensions which would avoid having to write system_requirements() every time. I don't think saving contrib modules from having to write even a system_requirements() though is worth completely blocking installs for anyone on a restricted environment unless they hack core.
Also iirc several linux distros (ubuntu is one I think) don't have php5-curl out of the box - you need to install that separately. Most people will have no restrictions on doing so if it's their desktop but it's still an instant failure when you try to install.
Comment #154
sunIMHO, #1322890: Add a property to .info files to allow modules to specify required php extensions is the best and ultimate answer, but for both core and contrib modules.
I think we should commit this as-is with the system requirement for now, and migrate that into individual module .info files as soon as the capability exists.
Comment #155
Crell CreditAttribution: Crell commented+1 to #154. Let's get the library in so we can use it; we can fiddle with the requirements flags for months without changing the API after that.
Comment #156
Sylvain Lecoy CreditAttribution: Sylvain Lecoy commentedCome on even IBM mainframe system support curl!
Comment #157
effulgentsia CreditAttribution: effulgentsia commented#140 no longer applied, so I needed to reroll.
I also noticed that despite saying in #102 that I removed the change to system_retrieve_file() to cover that in a follow up, that I hadn't actually done so, but in this patch, I did.
And finally, I removed the change to system.install, and added an aggregator_requirements(). I copied the relevant bits from simpletest_requirements(). Since we have #1322890: Add a property to .info files to allow modules to specify required php extensions in the queue, I'm not concerned about the 10 lines of code duplication that each Guzzle consuming module will need until that issue lands.
Comment #158
mikeytown2 CreditAttribution: mikeytown2 commentedLast patch passed. #144 has been addressed with this latest patch thus moving this back to RTBC.
On a side note I'm getting married December 1st (yes code freeze day). This getting in would be a really nice gift. Also means I'm pretty busy and can't help much with pushing this through.
Comment #159
Crell CreditAttribution: Crell commentedCongratulations, Mike! Let's commit this as an early wedding gift. :-)
Comment #160
catch#157: 1447736-157-adopt-guzzle.patch queued for re-testing.
Comment #162
effulgentsia CreditAttribution: effulgentsia commentedJust a reroll for HEAD drift. No actual changes.
Comment #164
catch#162: 1447736-162-adopt-guzzle.patch queued for re-testing.
Comment #166
Fabianx CreditAttribution: Fabianx commented#162: 1447736-162-adopt-guzzle.patch queued for re-testing.
Comment #168
Fabianx CreditAttribution: Fabianx commentedAnd another re-roll to land on a different testbot ...
Comment #170
mikeytown2 CreditAttribution: mikeytown2 commented#168: core--adopt-guzzle--1447736--168.diff queued for re-testing.
Testbot went fatal on line #30. Thinking this should still pass
Comment #172
Fabianx CreditAttribution: Fabianx commented#168: core--adopt-guzzle--1447736--168.diff queued for re-testing.
Comment #174
jthorson CreditAttribution: jthorson commentedThe testbots have a hard cap at 90 minutes per test ... with everything we've dumped into core, I've seen testing of D8 + this patch exceeding that at least twice today.
Requeued ... if it comes back as failed during invocation of run-tests.sh in 90 minutes again, we're going to have to i) track down what's slowing down the tests (which may be outside of this patch ... *everything* is slow right now), or ii) increase the timeout on the bots.
Comment #175
jthorson CreditAttribution: jthorson commentedPassed.
Comment #176
jthorson CreditAttribution: jthorson commentedOoops ... RTBC as per #162.
Comment #177
webchickK, committing the crap out of this while it's still green. Taking a raincheck from earlier when we were at thresholds.
Committed and pushed to 8.x. Thanks!
Comment #178
ParisLiakos CreditAttribution: ParisLiakos commentedAwesome!
I guess a change notice will be needed
Comment #179
twistor CreditAttribution: twistor commentedFantastic!
Comment #180
catchIs there an issue to actually convert drupal_http_request() to use this yet?
Comment #181
Sutharsan CreditAttribution: Sutharsan commentedI like to see an example too. Need to convert the drupal_http_request in #1804688: Download and import interface translations. And like to find a way to route the calls to http://ftp.drupal.org made by its test script back to the calling test domain. The test domain should serve files as if they come from an external domain. A http request and a redirect in one.
Comment #182
Berdirgit show 5058fa3 core/modules/aggregator/lib/Drupal/aggregator/Plugin/aggregator/fetcher/DefaultFetcher.php
Or look for that file in the patch.
For testing, look for Mock Plugin on http://guzzlephp.org/guide/plugins.html.
According to a quick grep, we're not yet including the Mock plugin, so you will also have to update composer.json to fetch it. https://packagist.org/packages/guzzle/plugin-mock, so the line in composer.json should look like "guzzle/plugin-mock": "3.*" I guess.
EDIT: @catch I think the goal is to directly use guzzle everywhere and then just drop drupal_http_request() once everything is converted.
Comment #183
catch@Berdir, leaving drupal_http_request() as is then just dropping it when it's not used in core works for me, but it'd still be good to know where the conversion issues are.
Comment #184
BerdirThis resulted in an interesting "bug": #1859536: Testing/Aggregator module warning: Requirements check comparison output two 'strings' (cURL) as an array.
Comment #185
BerdirRight, filled #1862398: [meta] Replace drupal_http_request() with Guzzle.
Comment #186
BerdirWill try to get started with a change notice here.
Comment #187
BerdirHere is a start: http://drupal.org/node/1862446.
Please review and complete.
While writing the example, I noticed that there are some possibly relevant things that we lost during the conversion:
- The old code updated the feed url in case of a 301 response.
- The old code explicitly only accepted the response for a given set of response codes. I'm not really sure when the BadResponseException is actually thrown, what happens if the site returns a 5xx response code? Are we going to happily return this to the parser? Does Guzzle provide anything to differentiate a "good" response from a unexpected one? Or a feature to set expected status codes and throw an exception on an unexpected one?
Comment #188
BerdirAnother thing, SimpletestHttpRequestSubscriber is currently added by CoreBundle, but it should be in simpletest.module, as @Crell already pointed out after the first patches here.
But, that means that the event will only be executed in tests that have simpletest.module enabled, which is kinda pointless ;) So we either need to keep it there or add a TestBundle or something like that? I'm not exactly sure what the point of that subscriber is, anyway, is it just for requests done by drupalPost()/drupalGet()? Another follow-up, I guess...
Comment #189
BerdirOk, I did some testing with the BadResponseException and a) it throws an exception on e.g. not found (which is going to be interesting in drupalPost()/Get() because there sure are a lot of cases where this is the expected behavior, just like access denied and more. b) The current error handling in DefaultFetcher.php can result in a fatal error because there isn't always a response, for details see #1862524: Convert drupal_http_request usage in aggregator.module to Guzzle, I suggest we deal with that there. @mtdowling, it would be great if you could comment over there in regards to the recommended error/exception handling with Guzzle.
Also opened #1862536: Improve documentation of SimpletestHttpRequestSubscriber.
Will add these follow-ups to the summary and try to stop spamming this issue ;)
Comment #189.0
BerdirAdded a related issue to drupal_http_request()
Comment #190
sunThat was explained in #37 already.
Comment #191
effulgentsia CreditAttribution: effulgentsia commentedThe change notice looks fine to me. Perhaps it can be improved as we follow through on remaining conversions, but it already identifies the key change, provides some example code, and links to documentation.
Comment #192
BerdirComment #193
Sheldon Rampton CreditAttribution: Sheldon Rampton commentedWhile running SimpletestHttpRequestSubscriber, my website met an earlier version of itself and accidentally shared information that altered its own timeline. It then became self-aware and began recursively sending User-Agents back to eliminate Sarah Connor.
Comment #193.0
tim.plunkettUpdated issue summary.
Comment #194
David_Rothstein CreditAttribution: David_Rothstein commentedSorry for the noise; I'm told that slashes in tags can break the autocomplete on drupal.org.
Comment #195
David_Rothstein CreditAttribution: David_Rothstein commentedAnd again...
Comment #197
arafathhashmi CreditAttribution: arafathhashmi commentedSeems not fixed yet, however my host has solved few of the bugs.
Comment #198
bappa.sarkar CreditAttribution: bappa.sarkar commentedcheck https://drupal.org/node/2082353
Comment #199
bappa.sarkar CreditAttribution: bappa.sarkar commentedComment #200
bappa.sarkar CreditAttribution: bappa.sarkar commentedComment #201
BerdirThere is no need to re-open this issue for that?
Comment #201.0
BerdirUpdated issue summary.
Comment #202
dgtlmoon CreditAttribution: dgtlmoon commentedIs this worth backporting? (For those searching - yes adding Guzzle means it now supports HTTP/1.1 and with chunked encoding correctly)
Comment #203
mikeytown2 CreditAttribution: mikeytown2 commentedhttps://www.drupal.org/project/composer_manager fills in the 7.x space