Problem/Motivation

Drupal provides a way to convert a relative url to an absolute url, but no way to convert an absolute url to a relative url. Having the (limited) ability to convert an absolute url to a path on the current site is necessary in order to parse HTML messages for inline image attachments. (See #46).

bfroehle said in #15, a function like url_to_path() would be useful to the Central Authentication Service module..

It may also be helpful in a future 7.x/8.x port of filter_html_secure_image_process(), which duplicates much of this functionality.

Proposed resolution

it turns out that most of the necessary logic is contained within the conf_path() function, but that generally-useful part needs to be segregated from the rest of the function in order to be re-usable.

The patch under review modifies includes/bootstrap.inc by moving the re-usable code from conf_path() to a new find_conf_path() function.

The remainder of the needed functionality may be provided by a contrib module.

Remaining tasks

The API change node needs to be reviewed.

User interface changes

None.

API changes

No changes to existing code is needed. A new function is made available for developers who need it.

Original report by pillarsdotnet:

Drupal provides a way to convert a relative url to an absolute url, but no way to convert an absolute url to a relative url. This would be helpful, for example, when parsing an HTML mail body for image urls that could be converted to inline attachments.

Comments

pillarsdotnet’s picture

Title: Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function. » Provide url_is_local() function.
Version: 8.x-dev » 7.x-dev
Status: Needs review » Active
StatusFileSize
new4.04 KB

(EDITED to reflect the status of the latest patch submission)

Re-factor the conf_path() function to create find_conf_path(), url_to_path(), and url_to_realpath() functions.

/**
 * Find the appropriate configuration directory for a given host and path.
 *
 * @param $http_host
 *   The hostname and optional port number, e.g. "www.example.com" or
 *   "www.example.com:8080".
 *
 * @param $script_name
 *   The part of the url following the hostname, including the leading slash.
 *
 * @return
 *   The path of the matching configuration directory.
 *
 * @see conf_path()
 */
function find_conf_path($http_host, $script_name, $require_settings = TRUE);
/**
 * Returns the local relative path corresponding to a given URL, if possible.
 *
 * Finds the configuration directory matching the given URL, and compares it
 * to the current configuration directory. Returns a path relative to the drupal
 * install if they match, and FALSE if they don't.
 *
 * In a properly-configured multisite installation, this function helps answer
 * the question, "Could a given URL match a file or path within my site?"
 *
 * Whether the URL does in fact resolve to this site, or at all, cannot be
 * determined within Drupal itself.  This is only a sanity check.
 *
 * @param $url
 *   The internal path or external URL being linked to, such as
 *   "drupal/node/34" or "http://example.com/drupal/foo".
 * @return
 *   FALSE if $url contains a host/port/path that does not match the current
 *   site, or else a drupal path such as "node/34" or "foo".
 *
 * @see conf_path(), find_conf_path()
 */
function url_to_path($url);
/**
 * Returns an absolute local path corresponding to a given URL, if possible.
 *
 * For example, when looking for image URLs within an email message body for
 * possible conversion to inline attachments, the following code might be used:
 * @code
 *   if ( !url_is_external($url)
 *     && ($path = url_to_realpath($url))
 *     && file_exists($path) ) {
 *     // Attach $path and replace $url.
 *     ...
 *   }
 * @endcode
 *
 * @param $url
 *   The internal path or external URL being linked to, such as "node/34" or
 *   "http://example.com/drupal/foo".
 * @return
 *   FALSE if $url contains a host/port/path that does not match the current
 *   site, or else an absolute local path such as "/path/to/drupal/node/34".
 *
 * @see url_to_path()
 */
function url_to_realpath($url);
pillarsdotnet’s picture

Status: Active » Needs review
damien tournoud’s picture

Version: 7.x-dev » 8.x-dev

Triaging.

pillarsdotnet’s picture

StatusFileSize
new3.93 KB

@Damien -- Of course I know this is 8.x but the patch applies cleanly to both and I wanted a testbot check.

Revised patch fixes a couple of whitespace errors.

pillarsdotnet’s picture

Version: 8.x-dev » 7.x-dev

Can we please leave at 7.x until the testbot returns something?

Status: Needs review » Needs work

The last submitted patch, url_is_local-1113588-4.patch, failed testing.

pillarsdotnet’s picture

Status: Needs work » Needs review
StatusFileSize
new3.93 KB

Another typo, sigh.

pillarsdotnet’s picture

Version: 7.x-dev » 8.x-dev

Cool. Testbot passed. Once we have a working 8.x testbot, I'll re-roll and re-submit, but for now the code is identical.

joachim’s picture

Status: Needs review » Needs work

Seems a good idea in principle.

Just a few niggles...

+++ b/includes/bootstrap.inc
@@ -381,6 +381,27 @@ function conf_path($require_settings = TRUE, $reset = FALSE) {
+  return ($conf = _conf_path($script_name, $http_host, $require_settings));

Ew.

+++ b/includes/bootstrap.inc
@@ -381,6 +381,27 @@ function conf_path($require_settings = TRUE, $reset = FALSE) {
+function _conf_path($http_host, $script_name, $require_settings = TRUE) {

There's now conf_path and _conf_path? What's the difference?

+++ b/includes/common.inc
@@ -2197,6 +2197,55 @@ function url_is_external($path) {
+ * to the current configuration directory.  Returns a relative URL path if they

Only one space after a full stop.

Powered by Dreditor.

pillarsdotnet’s picture

Version: 7.x-dev » 8.x-dev
Status: Needs review » Needs work
StatusFileSize
new3.97 KB

Suggestions taken.

P.S. -- Joachim: Please help me understand your "ew" response. Standards? Coding style? Documentation? -- Thanks.

P.P.S. --The function should probably be named url_to_path() since it takes a (possibly absolute) URL and returns a local path, or FALSE if no equivalent local path could exist.

pillarsdotnet’s picture

Version: 8.x-dev » 7.x-dev
Status: Needs work » Needs review

Go, testbot!

joachim’s picture

Version: 8.x-dev » 7.x-dev
Status: Needs work » Needs review

ew was to the assigning and returning in the single line. It's quite hard to understand what's going on.

I think this is clearer:

$foo = complex($expression);
return $foo;

although unless $foo was a reference, just this will do:

return complex($expression);

Though the first version makes debugging easier! ;)

bfroehle’s picture

Title: Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function. » Provide url_is_local() function.
Version: 8.x-dev » 7.x-dev
Assigned: pillarsdotnet » Unassigned
Issue tags: -Needs backport to D6, -Needs committer feedback, -Needs backport to D7

A function like url_to_path() would be useful to the Central Authentication Service module.

Currently we have a cas path which redirects to the single sign-on server and then back to Drupal. We add a destination parameter to the return address so we can use drupal_goto() to redirect to a useful page after the login. Using url_to_path(referer_uri()) would be great for getting the destination parameter based upon the HTTP Referrer.

We found that refactoring the CAS code and passing a destination=... query string provided much better results in general. I think including this function in core would only encourage developers to use it.

pillarsdotnet’s picture

Issue summary: View changes

Use issue summary template.

bfroehle’s picture

Status: Needs review » Needs work
+++ b/includes/common.incundefined
@@ -2197,6 +2197,57 @@ function url_is_external($path) {
+  $local = &drupal_static(__FUNCTION__, '');

Shouldn't the default be array(), given the later use of $local?

+++ b/includes/common.incundefined
@@ -2197,6 +2197,57 @@ function url_is_external($path) {
+  $url = drupal_parse_url($path);
+  $http_host = $url[PHP_URL_HOST];
+  if (isset($url[PHP_URL_PORT])) {
+    $http_host .= ':' . $url[PHP_URL_PORT];
+  }
+  $script_name = $url[PHP_URL_PATH];

drupal_parse_url() does not function the same as parse_url(). In particular, for external URLs it does not return the host, port, scheme, etc separately.

Powered by Dreditor.

bfroehle’s picture

Also, what if Drupal is installed in a subdirectory. It's not clear to me that this would return the proper local path.

pillarsdotnet’s picture

Title: Provide url_is_local() function. » Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function.
Status: Needs work » Needs review
StatusFileSize
new4.12 KB

Suggestions implemented and name changed.

pillarsdotnet’s picture

StatusFileSize
new4.18 KB

Found and fixed a bug, and wrapped the return value in drupal_realpath() for good measure.

bfroehle’s picture

+++ b/includes/common.incundefined
@@ -2197,6 +2197,59 @@ function url_is_external($path) {
+ * @return
+ *   A local path, such as "/node/34" or "/foo", or FALSE if $url could not

What's the logic for this return? In particular, why is there a leading slash?

Based upon the name, I would have assumed that
$path == url_to_path(url($path)) for most $paths.

Powered by Dreditor.

pillarsdotnet’s picture

StatusFileSize
new4.22 KB

Better doxygen comments.

The returned value should be directly usable by fopen(), file_exists(), and the like.

Perhaps this should use realpath() instead of drupal_realpath()?

Perhaps a better name would be url_to_local_path()?

pillarsdotnet’s picture

StatusFileSize
new4.17 KB

Did some testing and found I was using parse_url() incorrectly.

Replaced drupal_realpath() with realpath() because I can't imagine a need for stream wrappers with this function.

@#20 -- Almost. Actually, for most paths, url_to_path(url($path)) == realpath('./' . $path)

So maybe this function should be called url_to_realpath()?

pillarsdotnet’s picture

Version: 7.x-dev » 8.x-dev
StatusFileSize
new4.17 KB

Do we have a working D8 testbot now?

pillarsdotnet’s picture

Title: Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function. » Provide url_to_path(), url_to_realpath(), and find_conf_path() by refactoring existing conf_path() function.
StatusFileSize
new2.72 KB
new4.73 KB

Alternatively, why not supply both url_to_path() and url_to_realpath() functions?

damien tournoud’s picture

Title: Provide url_to_path(), url_to_realpath(), and find_conf_path() by refactoring existing conf_path() function. » Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function.
Status: Needs review » Needs work

This needs to take into account the base root directory, Drupal is not always install as the site root.

Also:

+    ? realpath('.' . $script_name) : FALSE;

This looks like DRUPAL_ROOT . '/' . $script_name to me.

pillarsdotnet’s picture

Status: Needs review » Needs work
StatusFileSize
new4.76 KB

@Damien -- I would expect that realpath($path) == DRUPAL_ROOT . '/' . $path for most relative paths, but calling realpath() has the advantage of resolving symlinks to a canonical path.

Attached patch changes substr($path, 1) to ltrim($path, '/') because $path may or may not start with a leading slash.

Here's the goal. If Drupal is installed at http://example.com/drupal and conf_path() is 'drupal/sites/example.com' then:

  1. url_to_path('http://external.com/drupal/foo') returns FALSE
  2. url_to_path('http://www.example.com/drupal/foo') returns 'drupal/foo'
  3. url_to_path('http://example.com/foo') returns 'foo'
  4. url_to_path('drupal/foo') returns 'drupal/foo'
  5. url_to_path('/foo') returns 'foo'

#3 and #5 should perhaps return FALSE, but that would require logic not provided by the existing conf_path().

pillarsdotnet’s picture

Status: Needs work » Needs review
pillarsdotnet’s picture

Issue tags: -Needs tests
StatusFileSize
new4.83 KB

@Damien -- You're right, it only adds one line of code to take base_path() into account.

Revised patch so that if conf_path() is 'drupal/sites/example.com' as before,

  1. url_to_path('http://external.com/drupal/foo') returns FALSE
  2. url_to_path('http://www.example.com/drupal/foo') returns 'drupal/foo'
  3. url_to_path('http://example.com/foo') returns FALSE
  4. url_to_path('drupal/foo') returns 'drupal/foo'
  5. url_to_path('/foo') returns FALSE
  6. url_to_path('/drupal/foo') returns 'drupal/foo'
pillarsdotnet’s picture

Status: Needs work » Needs review
Issue tags: +Needs tests
StatusFileSize
new4.94 KB

I really need to write some tests.

As an aside,

function conf_path($require_settings = TRUE, $reset = FALSE) {
  $conf = &drupal_static(__FUNCTION__, '');

Why drupal_static() rather than plain static? Surely the value can't change within a single page request, can it?

pillarsdotnet’s picture

According to rfay it isn't (currently) possible to test this on Drupal testbots, so removing "needs tests" tag.

See #1117308: It should be possible to test things that depend on base_path() and conf_path()

pillarsdotnet’s picture

StatusFileSize
new4.98 KB

Found and fixed another minor logic error.

catch’s picture

Title: Provide url_is_local() function. » Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function.
Version: 7.x-dev » 8.x-dev
Category: feature » task
Status: Active » Needs review

This needs both profiling of the modified functions, and ideally a comparison of strace output from apache before and after. See #1055862: Require sites.php to opt-in for multi-site support/functionality and #1055856: Optimize settings.php discovery for background.

At a quick look it doesn't look like it would conflict with those patches, but I need to look properly and haven't got time for that this morning.

pillarsdotnet’s picture

Category: task » feature
StatusFileSize
new4.87 KB
new5.05 KB
new5.05 KB

Updated doxygen to acknowledge that realpath( string $path ) does a stat and returns FALSE if the $path does not exist. Also moved url_to_realpath() after url_to_path(). No other changes.

@catch -- No profiling is necessary. The only core change is that part of conf_path() moves to a new function called find_conf_path(). If it kills performance to add just one function call, then I need to switch programming languages.

pillarsdotnet’s picture

catch’s picture

Why is url_to_realpath() in bootstrap.inc compared to common.inc? As far as I can see it's not called anywhere in core.

pillarsdotnet’s picture

@catch -- huh? What are you looking at?

EDIT: -- Okay, I can see where, looking at the issue description and (only) the top of the patch, you'd get confused.

The patch modifies both bootstrap.inc and common.inc.

In bootstrap.inc, most of conf_path() is moved to a new function, find_conf_path(). The patched conf_path() calls find_conf_path().

In common.inc, two new functions are added: url_to_path() and url_to_realpath().

When writing url_to_path(), I needed part of the conf_path() function. I didn't want to copy part of conf_path() into url_to_path(), so I used that part to make a new find_conf_path(), instead.

Better?

pillarsdotnet’s picture

Assigned: Unassigned » pillarsdotnet
StatusFileSize
new5.2 KB

It turns out that I can't use DRUPAL_ROOT in conjunction with base_path().

Updated patch for 8.x:

pillarsdotnet’s picture

Note that the code in this issue has been used by Emogrifier since April 5, 2011:

That's 172 users, according to the project usage statistics.

Does that qualify as "Reviewed and Tested by the Community" ?

pillarsdotnet’s picture

Status: Needs review » Reviewed & tested by the community
pillarsdotnet’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs committer feedback

Apparently not, (sigh). Tagging instead.

geerlingguy’s picture

Subscribe, can be useful. Don't have time to test right now though.

pillarsdotnet’s picture

Issue tags: -Needs backport to D6, -Needs committer feedback, -Needs backport to D7

#38: url_to_path-1113588-38.patch queued for re-testing.

pillarsdotnet’s picture

Issue tags: +Needs backport to D6, +Needs committer feedback, +Needs backport to D7

#38: url_to_path-1113588-38.patch queued for re-testing.

damien tournoud’s picture

Status: Needs review » Needs work

Quick review:

- The code style needs to be fixed everywhere (docblocks are not correctly formatted, ternary operators should not go to the line, etc.)
- url_to_path() is buggy as it uses $base_path which is the base path *of the current site* (ie. it is a part of $base_url). The base path could be different for other sites
- url_to_realpath() has very limited added value and should just go away

Other then that, this looks like a useful refactoring to me.

pillarsdotnet’s picture

The code style needs to be fixed everywhere (docblocks are not correctly formatted, ternary operators should not go to the line, etc.)

I'll review and try to fix (I've learned some things since writing that code), but if I fail to see the same things you're seeing, I'd appreciate some specific examples.

url_to_path() is buggy as it uses $base_path which is the base path *of the current site* (ie. it is a part of $base_url). The base path could be different for other sites

Well, that's exactly what it's supposed to do. The object is to answer the question, "Could this url resolve to a file on the current site?" I'm not concerned with other sites that coincidentally share the same server.

- url_to_realpath() has very limited added value and should just go away

Actually, that's the only function that is necessary for my use-case.

Quoting my own post:
The relevant code snippet is found in mailmime.inc:
protected function attachRegex($matches) {
    if ( ($url = filter_xss_bad_protocol($matches[4], FALSE))
      && ($path = url_to_realpath($url))
      && is_file($path)
      && $this->addHTMLImage($path)
    ) {
      // The parent method will replace this with the actual cid: string.
      $matches[4] = $path;
    }
    return implode('', array_slice($matches, 1));
  }n

So if:

then the file is attached.

pillarsdotnet’s picture

StatusFileSize
new5.49 KB

Okay, I've tried to address your concerns, but I don't understand what is wrong about the formatting of my doc blocks. Can you point me to a specific example that violates coding standards?

pillarsdotnet’s picture

Status: Needs work » Needs review
bfroehle’s picture

It is hard for me to imagine other (legitimate) use cases except the one you provide. In other circumstances the developer would often be much better off refactoring their code to avoid this kludge.

pillarsdotnet’s picture

Title: Provide url_is_local() function. » Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function.
Version: 7.x-dev » 8.x-dev
Assigned: Unassigned » pillarsdotnet
Issue tags: +Needs backport to D6, +Needs committer feedback, +Needs backport to D7

In other circumstances the developer would often be much better off refactoring their code to avoid this kludge.

I'm sorry that you find my solution inelegant. Can you suggest an alternative way for me to safely resolve a URL to a file within the current site? Note that unlike the solution @sun provides in his Local image input filter, I want to include the possibility that several domains point to the same Drupal site.

It is hard for me to imagine a case where copying 90% of a core function into module code just to avoid running the other 10% would be preferable to refactoring that core function into reusable parts.

Nevertheless, I am maintaining two such copies, both in Emogrifier and in Mail MIME.

I suppose that I could write an entire module just to supply the missing function, but that seems ... ugly.

pillarsdotnet’s picture

Issue summary: View changes

Acknowledge bfroehle's redaction and add another possible use-case.

lars toomre’s picture

@pillarsdotnet - re: docblock formatting -

My understanding is that is that there should be no blank lines between two or more @param definitions. Also, there always should be a blank line before an @return definition. Finally, the first line of a docblock should begin with an active verb.

Hence, the docblock for find_conf_path() should include 'Finds ' and deletion of the blank line after first @param. The other two docblocks need a blank line before @return.

damien tournoud’s picture

url_to_path() is buggy as it uses $base_path which is the base path *of the current site* (ie. it is a part of $base_url). The base path could be different for other sites

Well, that's exactly what it's supposed to do. The object is to answer the question, "Could this url resolve to a file on the current site?" I'm not concerned with other sites that coincidentally share the same server.

I see.

Well, given that limited use case and the comments above, I recommend only getting in the bootstrap.inc part.

pillarsdotnet’s picture

Issue summary: View changes

Corrected awkward grammar.

pillarsdotnet’s picture

Title: Provide find_conf_path() function by re-factoring reusable code from existing conf_path() function. » Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function.
StatusFileSize
new2.17 KB

Well, given that limited use case and the comments above, I recommend only getting in the bootstrap.inc part.

Fair enough. Re-rolled. (Thanks, Lars!)

Note that it would be nice to consider other sites that coincidentally share the same server, but doing so with a reasonable level of confidence would require (at least) a DNS lookup starting from the global root servers, and possibly some assumptions regarding the webserver and filesystem configuration.

pillarsdotnet’s picture

Title: Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function. » Provide find_conf_path() function by re-factoring reusable code from existing conf_path() function.

Updated title and summary.

damien tournoud’s picture

Title: Provide url_to_path() function by re-factoring and re-using code from existing conf_path() function. » Provide find_conf_path() function by re-factoring reusable code from existing conf_path() function.
Status: Needs review » Reviewed & tested by the community

Ok here.

damien tournoud’s picture

Issue summary: View changes

Updated in response to Damien's suggestion in #52.

pillarsdotnet’s picture

Issue summary: View changes

Linked to bfroehle's redacted comment.

pillarsdotnet’s picture

Issue summary: View changes

Updated to reflect RTBC status.

pillarsdotnet’s picture

Issue tags: -Needs committer feedback

Thanks. Added an API change node (please review -- I don't know what belongs in most of the fields), updated the summary, and removed the "Needs committer feedback" tag.

bfroehle’s picture

In other circumstances the developer would often be much better off refactoring their code to avoid this kludge.

I'm sorry that you find my solution inelegant. Can you suggest an alternative way for me to safely resolve a URL to a file within the current site?

Obviously I was referring to your use in the one "non-other" circumstances. Nonetheless, it is too bad that there isn't another more robust way to accomplish the same thing --- like a custom theme_image which would alter the URL and record that the image should be attached.

pillarsdotnet’s picture

Nonetheless, it is too bad that there isn't another more robust way to accomplish the same thing --- like a custom theme_image which would alter the URL and record that the image should be attached.

Are you referrring to this function?

catch’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 8.x, thanks!

jessebeach’s picture

Status: Fixed » Needs work

I noticed some strange behavior with multisite while installing a fresh site off of d8 head (commit 8157c28).

I have two sites in my sites directory:

d8.drupal.dev
default

When I tried to reinstall at http://d8.drupal.dev/install.php, after dropping the tables in my drupal_d8 database, I got a message from the installer that my site was already installed. On a whim, I changed the name of my default directory to default2 and tried installing. This brought me to the installer.

Looking further, it seems that this patch (commit b93f2bc), might be causing this behavior where the default site is being used in lieu of the more specific site I've indicated.

This is what I see in bootstrap.inc in debug

https://skitch.com/jesse.beach/fh6b1/netbeans-ide-7.0

The for loop is never entered and it jumps right to default:

https://skitch.com/jesse.beach/fh6nf/drupal-dev-d8-netbeans-ide-7.0

The more specific site is not used. The default site settings.php file gets loaded.

pillarsdotnet’s picture

Status: Needs work » Closed (fixed)

@jessebeach:
Please open a new issue and link it here. Also please verify whether reversing the commit patch solves your problem, as I do not see any possible way that your problem could be related to the change made in this issue.

Note that none of the code lines referenced by your screenshots have been changed; they were just moved without modification from one place to another.

EDIT: Retracted -- see next comment, and thanks for the report!

pillarsdotnet’s picture

Category: feature » bug
Priority: Normal » Critical
Status: Closed (fixed) » Needs review
StatusFileSize
new910 bytes

Bah! Now I see the problem. The parameters $script_name and $http_host somehow got reversed. (Scrambling to find/fix collateral breakage...)

ksenzee’s picture

Status: Needs review » Reviewed & tested by the community

That patch fixes the problem for me. I'm wondering how it got past testbot. Do we not have any multisite tests? And if not, how the heck would we write any?

jessebeach’s picture

Status: Reviewed & tested by the community » Needs review

Re #62:

That did it! I'm able to install from the specific site settings.php file now. Thanks @pillarsdotnet!

pillarsdotnet’s picture

Do we not have any multisite tests?

No.

And if not, how the heck would we write any?

Dunno. I don't see any open issues. Feel free to open one.

EDIT: Nevermind. Opened #1294974: PIFR should provide a way to test multisite installations.

catch’s picture

Status: Needs review » Reviewed & tested by the community

I have a feeling we don't and potentially may never have multi-site tests - can someone open a followup issue to confirm that is really true.

I'm heading off so can't commit tonight, but if it's still here in the morning will rollback or commit whatever is here unless webchick beats me too it.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Cool, committed and pushed #62 to 8.x. I'll be around for the next 12-14 hours if anything else happens.

Thanks for the quick turnaround on the fix, pillarsdotnet!

webchick’s picture

#1295046: Determine testing status of multisite installs (and if it is possible) filed for the follow-up. I didn't know if it should be major or not, so I left it at normal.

Status: Fixed » Closed (fixed)
Issue tags: -Needs backport to D6, -Needs backport to D7

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Added API change node.