GHOP #53: add input filter to convert #12345 into an issue link

dww - October 17, 2006 - 08:11
Project:Project issue tracking
Version:5.x-2.x-dev
Component:Issues
Category:feature request
Priority:normal
Assigned:hunmonk
Status:closed
Description

it'd be nice to have an input filter for the description textfield in a release node that automatically converted something like:

#83339

into:

#83339: make releases real nodes

so, it turns it into a valid link, and also extracts the title of the pointed-to issue and appends that.
this will make it much easier for folks to write decent changelog/description text for their releases, which will help end users.

chx pointed out in IRC that we can add such a filter but restrict it only to project_release nodes by judicious use of form_alter(). basically, the filter would be installed and enabled site-wide, but in project_release.module (assuming that's what provides the filter), we'd just form alter *every* form that comes our way to strip this radio out of the valid options whenever there's an "input format" fieldset, *except* for the project_release node add form...

patch, anyone? ;)

thanks,
-derek

#1

moshe weitzman - December 19, 2006 - 02:30

i think any site running project would want this filter running everywhere. one of trac's finest points is a filter just like this one. that filter is really smart and will show the issue as crossed out if the issue is fixed/closed which we could do too if we use the nocache feature of filtering. but nocache might be a bit controversial on drupal.org due to resources. we should ask steven/gerhard what they think.

#2

Gerhard Killesreiter - December 19, 2006 - 09:33

sounds like a good idea

#3

Gerhard Killesreiter - December 19, 2006 - 09:51

Should have read it all: I don't think a nocache filter is currently a good idea (unless somebody shows benchmarks that it isn't all that bad). The filtered strings are currently stored for 24h, I guess we could live with some delay regarding crossed out issues.

#4

moshe weitzman - December 21, 2006 - 21:35
Project:Project» Project issue tracking
Version:4.7.x-1.x-dev» 5.x-2.x-dev
Component:Releases» Issues

steven reminded me that nocache is for the whole input format so this is basically not solvable within the filter system (real time status). project_issue might be able to solve it itself within the issue system. or we accept 24 hour delay.

#5

moshe weitzman - December 21, 2006 - 21:36

steven reminded me that nocache is for the whole input format so this is basically not solvable within the filter system (real time status). project_issue might be able to solve it itself within the issue system. or we accept 24 hour delay.

#6

dww - December 28, 2006 - 07:33
Category:bug report» feature request

i wonder how this got classified as a bug... i'm pretty sure i submitted it as a feature request originally, and no one seems to have changed this with a followup. weird. must be a bug. ;) (needs a different issue, though)...

#7

moshe weitzman - January 5, 2007 - 18:24
Title:add input filter to convert #12345 into a link to an issue» add input filter to convert #12345 into an issue link and [789] into a commit link

a related filter which is nice would be [x] where x is a commit number as per cvslog module. for example, http://drupal.org/cvs?commit=49762 would be [49762] ... this syntax is open for debate but this is how trac does it, and feels quite natural. yes, it would occasionally mishyperlink stuff but thats a small price to pay for the easy linking when you want it.

#8

mwrochna - December 2, 2007 - 11:04
Assigned to:Anonymous» mwrochna
Status:active» active (needs more info)

I'm working on it as a part of GHOP, tell me about every mistake - it's easier to change bad habits at the beginning :)
The patch includes the implementation of hook_filter('process'):
-should it be a part of project_release? I think project_issue would be more appropriate.
-should only project_issue node ids be turned into links? (isset($node->sid))
-should the title be separated? Usually #1234 turns into a link+title in the middle of a sentence, so i think the title should be put in parentheses or into the link.
-it's disabled by default in input formats - should it be enabled by default? (, how?)
and the main question:
how should the array of striked issue state ids be customizable? An additional checkbox in Administer>>Project administration>>Project issue status options would be the best, but this would involve changing issue.inc: theme_project_issue_admin_states_form() and a new column in table {project_issue_state} (add a project_release_update_5002() to .install) - is it ok?
I don't understand what "chx pointed out in IRC" (in issue description)- should something be done with input formats and forms?

AttachmentSize
project_release-filter.patch2.28 KB

#9

aclight - December 2, 2007 - 17:44
Status:active (needs more info)» patch (code needs work)

Nice work so far! Here are some comments I had regarding the code itself:

  1. +    .......Issues with certain statuses are crossed out');
    needs a period at the end of the second sentence:
    +    .......Issues with certain statuses are crossed out.');
  2. I would change
    +      return array(0 => t('Issue to link filter'));
    to
    +      return array(0 => t('Project Issue to link filter'));
  3. +      return t('Converts references to project issues (in the form of #12345) into links'); needs a period at the end of the sentence.
  4. +        if (!is_object($node) || !isset($node->sid)) continue; does not follow the Drupal coding standards. Specifically, this needs to be broken up into a multiline block enclosed by curly braces. See http://drupal.org/coding-standards for more details.
  5. +        if(in_array($node->sid, $striked)) $link = '<strike>'.$link.'</strike>'; also doesn't follow coding standards.
  6. For +        if(in_array($node->sid, $striked)) $link = '<strike>'.$link.'</strike>';, I think I would break this out into a (very simple) theme function. This allows individual site administrators to override how links to all issues are displayed. In general, it's best to try to avoid adding any HTML outside of theme_functionname() type functions. Project* does not always follow this rule of course, but in this case I think we gain a lot by creating a theme function to handle this (see below).
  7. I think your regular expression needs to be a little smarter. For example, if someone added a URL like http://example.com/node/123#12345 we don't want the #12345 part to be turned into a link. However, we do want people to be able to include a list of project issues like #12345, #123456,#1234567 and have those be turned into links. (see end for a caveat to this comment).

Now, for your specific questions:

-should it be a part of project_release? I think project_issue would be more appropriate.

I think project_issue is the best place to put this.

-should only project_issue node ids be turned into links? (isset($node->sid))

Yes, I think so. However, I think allowing this filter to work from within any node type is what we want. So, for example, a forum posting could have #12345 turned into a link with the title, if 12345 was a node with $node->type=='project_issue'

-should the title be separated? Usually #1234 turns into a link+title in the middle of a sentence, so i think the title should be put in parentheses or into the link.

I think I would recommend:
#89673: add input filter to convert #12345 into an issue link and [789] into a commit link. This is another place where having a theme function might be nice.
So, off the top of my head, I might do something like this:

<?php
function theme_project_issue_issue_link($node) {
 
$output = '';
  if (isset(
$node->sid)) {
   
$class = "project-issue-status-$node->sid";
   
$output .= "<span class=$class>"
   
$output .= l(check_plain($node->title), "/node/$node->nid");
   
$output .= '</span>';
  }
  return
$output;
}
?>

Then, you'd include in your patch changes to project_issue.css that added styles for .project-issue-status-2, .project-issue-ststus-3, etc. I haven't tested this to see if it will actually work, however. I'm not sure that span is the right thing to use here either.

-it's disabled by default in input formats - should it be enabled by default? (, how?)
and the main question:

No, I think disabled by default is what we want. Administrators can enable this if they would like.

how should the array of striked issue state ids be customizable? An additional checkbox in Administer>>Project administration>>Project issue status options would be the best, but this would involve changing issue.inc: theme_project_issue_admin_states_form() and a new column in table {project_issue_state} (add a project_release_update_5002() to .install) - is it ok?

I think my theming suggestion above removes the need to add any additional options in the administrative interface. dww or hunmonk may have a different idea, however, but if you include a theme function that can be overridden by a sites template.php file and/or style.css files, I think we're providing sufficient opportunity for customization by site admins without cluttering up the project_issue admin UI.

I don't understand what "chx pointed out in IRC" (in issue description)- should something be done with input formats and forms?

Don't worry about what he said. I don't think it's relevant since output of this filter will be cachable.

I haven't actually tested this patch out yet. So, my comment about your regexp needing to be smarter could be wrong. I'll try to test it later today, but if you can get an updated patch before I get around to testing it that would be great.

Again, nice start on this--I think this will be a nice feature to have.

#10

hunmonk - December 2, 2007 - 19:06
Title:add input filter to convert #12345 into an issue link and [789] into a commit link» add input filter to convert #12345 into an issue link

@moshe: the commit filter is out of scope for this patch -- i believe if we add that it actually belongs as part of cvs module or it's successor. feel free to open another issue for that as part of cvs.module or versioncontrolAPI.

my vote is also that this work goes in project_issue module. i'll wait for the updated patch and aclight's testing before i dig into this further.

#11

dww - December 2, 2007 - 19:20

@mwrochna: thanks for taking on this task, and for the patch!

@aclight: thanks for the great review and comments. i agree entirely with almost everything you said. my one hesitation is that i'm a little torn about the strike-out functionality only living in the .css at the theme layer, but i suppose that's fine. we already make them chose colors for each row in issue listings based on status via hard-coding it in the theme's .css, so i guess it's reasonable to put this logic in the theme, as well.

@all: I just had a related idea: it'd be nice to give the href a title attribute (for mouse-over) that includes the current status of the issue. So, when you mouse-over one of these crossed-out links, you can see that it's either "closed", "won't fix", "duplicate", etc. Something like:

Title of issue (current_status)

So, for example:

add input filter to convert #12345 into an issue link and [789] into a commit link (patch (code needs work))

Or, maybe that's silly, and the mouse-over should just have the current status? Either way, I think it'd be a nice touch...

@moshe, hunmonk: yes, I agree the commit filter is out of scope, and belongs elsewhere, but it'd be worth doing. mwrochna: if you're interested in working on that, too, we can write it up as another GHOP task. ;)

@all: yes, if this is going to be a global filter for any node types to use, then this code belongs in project_issue. i had originally just imagined it for release nodes, but it's true, this auto-linking could be handy in other places. however, my concern is that we're always referring to comments in issues via "patch in #23 is RTBC", etc. so, we're going to get *a lot* of false links in the issue queues, exactly the place it's going to be the most confusing. :( not sure what to do about this... a few possible ideas:

A) Use [#12345] as the syntax if you want the linking.

B) Another possible alternate: #12345#. Maybe slightly wonky, but no one refers to comments by "just tested #12#, but i get fatal errors...", etc.

C) Have a setting somewhere that controls the minimum nid you need for the issue-linking to trigger. This would work on d.o, where we'd set the minimum to something like 500. Yes, there are a handful of issues in the queue somewhere below that, but if they don't get auto-linking, no big deal. and i don't know of any issue that had 500 replies. ;) so, that'd work for us, but wouldn't work at all on other sites with fewer nodes, and more active issues below around nid 200-500. :(

D) Have the filter only target specific node types, and we disable the auto-linking from within issue nodes. I think this would be a shame, since it'd be handy to have this feature in the issue queue, but I know we're going to get tons of false positives if we don't solve this some other way.

E) Some other bright idea someone else comes up with. ;)

Thanks everyone,
-Derek

#12

mwrochna - December 2, 2007 - 21:27
Status:patch (code needs work)» patch (code needs review)

Thanks for every suggestion!
Periods added, ifs multilined, moved to project_issue, themed.
I think the regex does what it should (back- and forward-asserts there's no word char, asserts string isn't directly in <a>), it won't match issues that are already linked.

I'll be happy to do the [commit] links too, not necessarily as another GHOP task (I guess it's just a copy-paste and some small changes).
@dww
A) I think [#1234] is the best solution, though potentialy good matches in old comments won't change.
C) I also thought about a minimum nid setting, it's easy and it would prevent the most common #1 mismatches.
E) Maybe a #1234 link, which will show the title and status only in an on-hover:visible, floating css box ;) ? (that's probably not to everybody's taste, but similar boxes work well on my forum)
P.S: a.title=status added.
P.P.S: minlink.patch for C) added

AttachmentSize
project_issue-filter-minlink.patch1.8 KB
project_issue-filter.patch3.31 KB

#13

aclight - December 2, 2007 - 20:56

Or, maybe that's silly, and the mouse-over should just have the current status? Either way, I think it'd be a nice touch...

I think just having the status in the href title attribute is the best. Otherwise it might be too long, if the issue title is long, and it will be truncated by some browsers.

however, my concern is that we're always referring to comments in issues via "patch in #23 is RTBC", etc. so, we're going to get *a lot* of false links in the issue queues, exactly the place it's going to be the most confusing.

Can't we just change:

+        if (is_object($node)){

to this:
+        if (is_object($node) && $node->type=='project_issue'){

That'll prevent your example from being a link, right? We'd need to add another line to the code to return the unfiltered text if this isn't true though, I think. If we're checking for node type in project_release_filter() then we can take that same check out of the theme function.

I just got a new computer this weekend and am still getting my development environment set up. Hopefully I'll be able to test this tonight.

Again, great work so far.

#14

dww - December 2, 2007 - 21:23
Status:patch (code needs review)» patch (code needs work)

@aclight: Thing is, for example, both http://drupal.org/node/8 and http://drupal.org/node/9 are valid issue nodes (and #8 is still open, in fact!). :( So, anytime someone refers to "#8 is RTBC" in an issue, #8 will become a link. :( Behold:

mysql> select count(*) from node where nid<50 and type='project_issue';
+----------+
| count(*) |
+----------+
|       22 |
+----------+

mysql> select count(*) from node where nid<100 and type='project_issue';
+----------+
| count(*) |
+----------+
|       51 |
+----------+

mysql> select count(*) from node where nid<200 and type='project_issue';
+----------+
| count(*) |
+----------+
|      117 |
+----------+

As you can see, there are a *lot* of issue nodes on d.o in the under 200 nid-space. :( So, restricting to issue nodes doesn't help us. We either need a different syntax for the auto-linking (which is probably best), or a threshold, or something else.

[#12345] seems good to me, but then i'm concerned about the syntax to use for the cvs commit ids. maybe [cvs:12345] ? Kinda off topic for this thread, but we might as well have a clear plan for both so there's no conflicts/confusion.

Patch needs work:
A) Some functions are still named project_release_*
B) Needs a solution to the above problems, probably [#12345]

Thanks,
-Derek

#15

dww - December 2, 2007 - 21:30

@mwrochna: Please just add new comments, instead of editing old ones to add more patches or text. This makes it easier to follow what's going on and see what changed. Thanks.

#16

dww - December 2, 2007 - 21:34

Oh, and I don't think minlink is a good solution to this, honestly. It's not going to work well at all for most other sites, and it'll be annoying when you actually do want to link to nid #8. ;) I think [#8] would be better.

Speaking of which, I didn't check, but does your regexp correctly ignore stuff already inside certain HTML tags? For example, stuff wrapped in <code> tags (as above) should never be converted. I think you mentioned it already ignores <a>. It should probably also ignore <pre>, too...

Thanks,
-Derek

#17

aclight - December 2, 2007 - 22:46

@dww: ah...I see your point. Sorry for being dense :)

In that case, I agree that something like [#12345] is best.

#18

mwrochna - December 2, 2007 - 23:01

Changed project_release_* to project_issue* (remember to recheck your input formats)
Changed format to [#1234], I think everybody agreed.
Oh, <code> and <pre> tags... I added a workaround (it forward-asserts the next '<' isn't '</code>', just like how it did with </a>) - but this will match anything like
<code>[#1234] ... < ... </code>.
I don't know how this could be improved, I think it would be simpler (and faster) for the <code> filter to prepare and process things, so that they couldn't be changed by other filters.

Should $restrict be set to true when calling project_issue_state()?

AttachmentSize
project_issue-filter2.patch3.35 KB

#19

webchick - December 3, 2007 - 07:55
Title:add input filter to convert #12345 into an issue link» GHOP #53: add input filter to convert #12345 into an issue link

#20

aclight - December 3, 2007 - 12:26
Status:patch (code needs work)» patch (code needs review)

@mwrochna--make sure to change the status of the issue back to code needs review when your patch is ready for review. This helps get eyes on the issue and helps us keep track of things better.

#21

aclight - December 3, 2007 - 16:03
Status:patch (code needs review)» patch (code needs work)

Nice work. Here are some comments on the code:

  1. In project_issue.css, you're setting postponed (sid=4) to be the same style as fixed, closed, etc. I think it should be displayed like active, cnw, etc. since a postponed issue isn't really closed--it's just postponed.
  2. +  if ($node->type=='project_issue') {
    should have spaces on each side of the '==' as per drupal coding standards.
  3. It occurs to me that this filter is opening a potential access elevation if users who are not permitted to view issues (or all issues) can view, say, a forum post where there is a reference to an issue. They wouldn't be able to see the actual issue, but would be able to see if an issue with a certain nid existed and what the title and status of the issue was. If I'm missing some access checking somewhere, let me know. The best way to handle this is probably to change:
    +        if (is_object($node)) {

    to:
    +        if (is_object($node) && project_issue_access('view', $node)) {
  4. Remove the leading '/' in the URL for the link here:
    +                 "/node/$node->nid",
  5. There's a problem with filter ordering here. If I put the project_issue filter in this patch to be at the top of my filters, so that the order, from lightest to heaviest, is:
    1. Project Issue to link filter
    2. URL filter
    3. HTML filter
    4. Line break converter

    then I get the results seen in the attached screenshot named project_issue_filter_light.png. Issue #329 is fixed, but is not crossed out. I think this is because is not an allowed tag in the filtered HTML filter, and so it's getting taken out when the HTML filter filters the text after the project issue filter adds the span elements.

    On the other hand, if I move the project issue filter to be the heaviest, so that my filter order is:

    1. URL filter
    2. HTML filter
    3. Line break converter
    4. Project Issue to link filter

    then the results are shown in the attached image named project_issue_filter_heavy.png. In that case, the span elements come through, but now a reference to an issue within a <code></code> block is also made into a link, which is not what we want. See below for more on this.

Should $restrict be set to true when calling project_issue_state()?

Probably. I don't see how this would hurt anything. dww?

One other thing (related to the filter ordering problem I mentioned above):
I think the problem where references to issues within <code></code> tags getting highlighted is that the line break filter inserts <br /> tags, and your regexp isn't ignoring those. I'm not quite sure how to ignore them, but I would guess it's possible.

For example, here's the $text variable on my test page before it's processed by preg_match(). Note that I'm using the php filter instead of the code filter here so that I don't have to escape some of the tags. Also note that the code below doesn't quite match the screenshots, but I didn't feel like remaking the screenshots:

<?php
<p>issue [#329] should be linked<br />
issue #329 should not be linked<br />
issue [#329a], [abc#329], [#32abc9] should not be linked<br />
issues [#329],[#217],[#214] should all be linked<br />
issues [#329][#217][#214] should all be linked<br />
project release [#326] should not be linked since it's not an issue<br />
it would be cool if [#204#comment-1133] was also linked but it's probably not<br />
maybe [#329,#217,#214] should be linked?  probably not<br />
<div class="codeblock"><code>&lt;br /&gt;issue #329 should not be linked&lt;br /&gt;issue [#329] should not be linked because it's inside code tags&lt;br /&gt; // This does not work and is linked</code></div><br />
<code>issue [#329] should not be linked because it's inside code tags</code><br />  // This works and is not linked
this time it <code>should be linked because the issue reference is outside the code tags</code>[#329]<br />
<a href="">this time it should not be linked  [#329] because it's in an &lt;a&gt; tag</a></p>
?>

Let me just say that this filter is really sweet. This is going to be really nice to have, especially if it's installed and enabled on d.o.

AttachmentSize
project_issue_filter_heavy.png12.72 KB
project_issue_filter_light.png12.47 KB

#22

dww - December 3, 2007 - 17:53

Re: $restrict for project_issue_state() -- no you don't really want that, and yes, it would "hurt" since it complicates the job that project_issue_state() has to do, costing some additional performance load. That option just says "should the array of choices passed back be restricted to the issue states the current user is allowed to use?". In our case, we don't care about that, we just want the name of the current state.

Keep on rocking, this is looking great. Can't wait to deploy it on d.o. ;)

#23

mwrochna - December 3, 2007 - 23:15
Status:patch (code needs work)» patch (code needs review)

I'm not sure why some of the <code>ed ids got linked, but this patch (3) should help.
(Now it forward-asserts there's no match for: any non< chars + any non<code> tags, and then a '</code>').
Sorry if it's still wrong, everything appears correct on my install.

Added support for [#204#comment-1133] - maybe these should look shorter?

by the way - shouldn't issue.inc: project_issue_access('view') return TRUE when user_access('access project issues') is true? (it just break;s and returns nothing instead)

AttachmentSize
project_issue-filter3.patch3.61 KB

#24

dww - December 3, 2007 - 23:29

re: project_issue_access('view') -- no, it most definitely should not return TRUE. That'd clobber all other access control modules (e.g OG, node_access_by_role, etc, etc). By returning NULL, it means "i don't care what happens, let other modules that want to reject it do so". if no one returns FALSE, and the user has permission to view content at all, it will be allowed. for example, if we returned TRUE, it'd mean that issues posted to private groups wouldn't be private for anyone with 'access project issues' permission.

sorry, no time to review this closer now, just wanted to answer your question...

#25

aclight - December 3, 2007 - 23:44
Status:patch (code needs review)» patch (code needs work)

Ok--you've fixed all of the concerns I raised in #21. Unfortunately, I think I wasn't thinking through the access elevation issue clearly--we want to use node_access(), not project_issue_access(). Sorry about that.

While you're re-rolling the patch, you might as well go ahead and add a bit of documentation about the filter and mention that the weight of the filter needs to be heavier than that of the HTML filter. It might even be wise to imple hook_requirements() to make sure that this is the case. If you want an example of how to do this, I know that the GeSHi filter module implements hook_requirements() for this purpose. Of course, you'd want to check in hook_requirements to make sure that the project issue input filter is enabled, because otherwise there's no need to produce an error message.

A few sentences of documentation can go in INSTALL.txt.

After you reroll the patch I'll test it one last time, but this is looking very good. great work!

#26

moshe weitzman - December 3, 2007 - 23:50

is this supposed to be a cacheable filter? if so, i don't see how it can be compatible with node_access() modules. whomever happens to view the text when it is evaluated will determine the access control that gets cached.

if we make it non cacheable, it negates all caching for the whole input format so you make this whole effort virtually worthless from a performance perspective. sorry to bear bad news.

i think we need a big status report warning if a node access module is found ... we should still use node_access as dww suggests since some admins are ok with the problem i describe.

#27

mwrochna - December 4, 2007 - 19:21
Status:patch (code needs work)» patch (code needs review)

regex matching: I wrote hook_requirements(), but when I tested it turned out that (after the regex change in (3)?) both weight arangments work. I left the function in the patch in case I'm wrong, but if it works for you too, then it should be removed.

documentation: I only mentioned about enabling the filter in INSTALL.txt. Should I add something in UPGRADE/README/filter_tips?

security: If we removed node_access() any user could check all issue titles by previewing comments. Now (with node_access() used) in the majority of cases, when an author will write [#1234] in a node, he will cache it and make it's title available to all who read the node - I think it's not worth a mention, just like if he used the title directly. Links will be sometimes randomly disabled by users who can see the comment, but not the issue - but I don't think this happens in reality. There's still a little probability that a user could post an unauthorized [#1234] and wait until some admin recaches it, but again - imho it's not worth warning about it.

AttachmentSize
project_issue-filter4.patch5.33 KB

#29

aclight - December 4, 2007 - 22:40
Status:patch (code needs review)» patch (code needs work)

It looks like you're creating this patch against project_issue.module v. 1.38.2.5. Ideally, you should be creating this against HEAD, which is at 1.73. The patch still applies, though with fuzz:

File to patch: project_issue/project_issue.module
patching file project_issue/project_issue.module
Hunk #1 succeeded at 948 with fuzz 2 (offset 74 lines).

I tested your most recent patch and you are right--it doesn't seem like the order of the filter matters any more. So I think it's safe to remove the hook_requirements(). Sorry about that!

Moshe reminded us of a good point above. The way you've written this so far is to allow the results of this filter to be cached. That's good performance wise, but as Moshe said this won't really work if a node access module is installed. I don't think this will be a problem on d.o, since as far as I know there are no node access modules installed, and all users can view project issues. However, certain sites might want to restrict viewing of project issues to certain users, and this filter needs to respect this, or at least make it pretty clear when it isn't respecting this.

So, here's my suggestion:
A. Create a filter setting that determines if the results of the filter are cached or not. In the description, it needs to be made clear that filter caching is by input format, not by filter. In other words, if the project issue filter is part of the Filtered HTML format, all text that uses the Filtered HTML format will NOT be cached. I think the default of this filter should be NOT to cache. The description should also mention that sites using node access modules should not cache because they will not get the desired results.

B. If caching is turned on, and if the site runs a node access module, I think we should implement hook_requirements and return a message of REQUIREMENT_WARNING level suggesting that this is not a good idea because:
a) Results will be unexpected
b) Issue titles will be revealed to users who may not otherwise have access to the issues. However, users will not be able to read or otherwise access the issues. Instead, they will get an "Access denied" page.

So, if we're going to do B (other suggestions are welcome, of course), that brings up the question of how we know if any node access control modules are enabled. I don't know how to check this, though. Anyone else?

#30

moshe weitzman - December 5, 2007 - 01:08

it occurs to me now that we can just change our filter declaration based on whether we see any node access module. here is some code for hook_filter('no cache').

<?php
if (empty(module_implements('node_grants'))) {
  return
TRUE
}
else {
  return
FALSE;
}
?>

#31

aclight - December 5, 2007 - 02:09

@mwrochna: I chatted with dww briefly about this today in #drupal-ghop, and he is behind the general plan in #29 and #30. We agreed that moshe's suggestion is much simpler than my suggestion in #29 B, so let's go with that. Let's add some help text that makes it very clear that any site with a node access control module that uses this filter will give up caching on any node with a content type that includes the filter. But I don't expect this to be a huge problem.

#32

mwrochna - December 5, 2007 - 18:53
Status:patch (code needs work)» patch (code needs review)

Updated from v1.38 to HEAD.
When testing the order I forgot the problem was about <span>, not <code>, so the requirement is necessary.
Added moshe's code, a REQUIREMENT_WARNING with same condition and a note in $op='description'.
I understand that issue information is never restricted when there is no hook_node_grants() (because there's no project_issue_access()), yes?

AttachmentSize
project_issue-filter5.patch6.28 KB

#33

aclight - December 5, 2007 - 23:15
Status:patch (code needs review)» patch (code needs work)

A. moshe's code is reversed. We really want this:

    case 'no cache':
      $grants = module_implements('node_grants');
      if (empty($grants)) {
        return FALSE;
      }
      else {
        return TRUE;
      }

The double negatives are confusing.

B. We still have a problem now if not all users can view project issues. In a sense, project_issue acts like a node access control module with respect to project_issue nodes, but since it doesn't implement hook_node_grants() the input filter will be cached if there are no node access control modules but if all users can't view all project_issues.

So, I initially thought that we could just cache the filter results if anonymous users have access project issues permission. But then I thought that this will be a little too restrictive, because we really only need to not cache the filter results if the access to any project_issue node is more restrictive than access to any node where the filter could be used. So, for example, we'd be fine caching the filter results if anonymous users did not have access nodes privs. but authenticated users had both access nodes and access project issues privs.

Since this is getting pretty complicated, I would be fine if you just added a simple check in the 'no cache' operation that checked to see if anonymous users had 'access project issues' privs. If so, and if there are no node access control modules installed, then the filter results can be cached. Otherwise, they are not.

C. I installed simple_access (a node access control module) after I installed and configured the project issues input filter. However, I don't think that project_issue_filter() was called with $op='no cache' until I clicked the "Save configuration" button at admin/settings/filters/1.

I'm not sure if it's possible to do this, but we probably want to force the 'no cache' operation to be called whenever a module is enabled or disabled so we make sure we're doing the right thing regarding caching.

#34

moshe weitzman - December 6, 2007 - 03:08

A: oops

B: there can be no access control like 'access project issues'. if it exists, it is a bug. no core node types have such a permission because it is incompatible with the node access control system. thus B is moot (but we might have uncovered a different problem) ... i am not too familiar with project codebase so sorry for the high level POV.

C: good catch. i would never have thought of that.

#35

aclight - December 6, 2007 - 03:17

@moshe:

B: there can be no access control like 'access project issues'. if it exists, it is a bug. no core node types have such a permission because it is incompatible with the node access control system. thus B is moot (but we might have uncovered a different problem) ... i am not too familiar with project codebase so sorry for the high level POV.

Maybe they shouldn't be there, but they're there....

From project.module

<?php
function project_perm() {
 
$perms = array(
   
'administer projects',
   
'maintain projects',
   
'access projects',
   
'access own projects',
  );
  return
$perms;
}
?>

and from project_issue.module:

<?php
function project_issue_perm() {
 
$perms = array(
   
'create project issues',
   
'access project issues',
   
'edit own project issues',
   
'access own project issues'
 
);
 
$states = project_issue_state();
  foreach(
$states as $key => $value) {
   
$perms[] = "set issue status ". str_replace("'", "", $value);
  }
  return
$perms;
}
?>

#36

dww - December 6, 2007 - 04:37

@moshe: and they've been around a *long* time. :( and, coincidentally, something of a giant pain in the neck: http://drupal.org/node/168760

#37

mwrochna - December 6, 2007 - 16:33

A. Oops, sorry, corrected.
B. So.. do I add || !node_access('access project issues', drupal_anonymous_user())?
C. 'no cache' are only called by filter.module:filter_admin_format_form_submit(), then the & result is saved into the database per input format. Anyway, there's no callback function when enabling a module or anything, so we'd have to check often or modify the core. I added a check and a REQUIREMENT_ERROR.
Maybe adding a setting for caching and informing about it would be better?

Tomorrow and the day after I'm away, sorry for any delay.

AttachmentSize
project_issue-filter6.patch7.24 KB

#38

mwrochna - December 6, 2007 - 16:43
Status:patch (code needs work)» patch (code needs review)

#39

moshe weitzman - December 6, 2007 - 18:23

B: there is no right way to do this beyond ripping out the insecure permission 'access project issues' and a'ccess projects' and any code that depends on it. and then warn any sites when the release comes out that the broken feature that they are depending on is no longer available and they should install content_access() or similar instead.

we need to involve dww at this point.

#40

dww - December 6, 2007 - 20:27

removing these perms is *WAY* outside the scope of this issue. i don't see why you call them "insecure", either. they're just additional restrictions. no where does hook_access() return TRUE, subverting other node access modules. they're just another way to return FALSE...

that said, we could certainly consider removing them at some point, just not now...

and, *that* said, i'm fine with cutting corners a little in this patch if the edge cases caused by these perms are causing grief. something like don't cache the filter if (!user_access('access project issues', array('uid' => 0))).

i really don't have much time to ponder this more, but i wanted to share my initial impressions.

#41

moshe weitzman - December 6, 2007 - 20:42

i call them insecure becuase anyone can see project_issues in listings like home page, search_results, etc regardless of their 'access issues permission. Thats misleading on a very important point.

I agree it isn't especially in scope here.

#42

dww - December 6, 2007 - 20:56

@moshe: not true. behold project_db_rewrite_sql(). That's exactly what the SA I pointed you to above was about.

#43

aclight - December 9, 2007 - 01:45

dww and I talked about this again on IRC and came up with the following plan:

1. Add a setting in the filter where the user can choose whether the filter results should be cached or not. So, for example,

<?php
   
case 'settings':
     
$formats = filter_formats();
     
$form['project_issue'] = array(
       
'#type' => 'fieldset',
       
'#title' => t('Project Issue to link filter'),
       
'#collapsible' => TRUE,
       
'#collapsed' => FALSE,
      );
     
$form['project_issue']['project_issue_filter_nocache'] = array(
       
'#type' => 'radios',
       
'#title' => t('Filter caching'),
       
'#default_value' => variable_get('project_issue_filter_nocache', 'prevent'),
       
'#callback' => 'project_issue_filter_settings_submit',
       
'#options' => array(
         
'prevent' => t('Prevent caching'),
         
'allow' => t('Do not prevent caching')
          ),
       
'#description' => t('If you are using any node access control modules on your site,
          <em>or</em> if anonymous users on your site do not have access to view project issues,
          you should choose "Prevent caching". Otherwise, users may have the ability to see the
          titles of project issues that they do not otherwise have access to.
          <p><strong>Please note the following:</strong>
          <ol>
            <li>If you prevent caching any input format in which the Project issue to link filter is enabled
            will not be cached.  This could decrease performance on your site.</li>
            <li>If you choose "Do not prevent caching", output of this input format will not be cached
            if the other enabled filters for this input format prevent caching.</li>
            </ol>'
, array('%input_format' => $formats[$format]->name))
      );
      return
$form;

    case
'no cache':
     
$nocache = variable_get('project_issue_filter_nocache', 'prevent') == 'prevent';
      return
$nocache;
?>

2. Use hook_requirements() to warn the administrator in the following circumstances:

if (!nocache)
  // see if anonymous users have 'access project issues' perms
  $allowed_roles = user_roles(FALSE, 'access project issues');
  if (!isset($allowed_roles[DRUPAL_ANONYMOUS_RID])) {
        --> warning <--
  }

  // see if there are any node access control modules enabled
  $grants = module_implements('node_grants');
   if (!empty($grants)) {
       --> warning <--
    }
}

The warning would advise the administrator to prevent caching for the project issue input filter.

I spent some time trying to figure out how to do this, and I don't think it's even possible.

hook_filter() is only called with $op == 'no cache' when the settings for the input format (admin/settings/filters/#) are saved. When the configuration for the input filters is saved (admin/settings/filters/#/configure) 'no cache' is not reevaluated. I can't figure out a w ay to do this, either. So, right now, I can't think of a way to make the filter caching conditional based on a filter configuration option.

Any suggestions?

@mwrochna: Unless there are suggestions about how to address this, why don't you concentrate on getting the filter to work properly (in other words, the regular expressions). Here's the problem:
I hadn't tried enabling the codefilter module before, so I did that this time, and it also conflicts with the project_issue filter. The project_issue filter needs to be weighted below the codefilter filter, or otherwise issue references within code tags are linked and references between &lt?php ?> tags are not made into links but instead the HTML for the link is displayed. I believe this is because the codefilter module prepares text it processes by replacing <code> and <?php tags with \xFEcode\xFF and \xFE?php tags. So we either need to fix this in the regexp or add another requirement that project_input filter be the heaviest filter. An alternative might be to respond to the 'prepare' op and replace [#XXXXX] project issue references with \xFE#XXXXX\xFF references. Then, in process, if the reference isn't valid or otherwise isn't converted into a link switch it back to [#XXXXX]. I haven't thought this idea through carefully so there could be reasons that this would be a bad idea.

Also, it's Drupal coding convention to make all comments true sentences (unless they are really short or it doesn't make sense to make them a sentence). So, capitalize the first letter, and put a period at the end.

If we can't figure out a way to get the caching to work properly with node access control modules, etc., then for the purposes of the GHOP task I think we'll just say make the patch for a filter that never caches and you'll get credit for that. But hold off on changing any of the caching stuff yet--let's see if anyone has good ideas for how to get around this problem.

#44

moshe weitzman - December 9, 2007 - 02:32

once again, a filter which is non cacheable will never get installed on drupal.org or any other performance sensitive site.

basically, i think this module should cache by default and if a node access module is recognized, throw a status report warning. i don't even think we should implement a non cache option. it is worthless.

#45

dww - December 9, 2007 - 02:38

@moshe: i mostly agree. however i disagree with the final point: not every site running project_issue has the scalability concerns that d.o itself has. for example, i could imagine wanting this feature, even without caching (since we have a bunch of node privacy stuff there) on sec.d.o. therefore, i think there's value in an option to enable/disable caching with clear "more accurate for sites with access control and fresher status/title results" vs. "actually scales well but only works in simpler configurations" sort of choices/instructions.

but, fundamentally i agree that caching should be on by default, and we should just generate a warning if we think we detect a configuration that doesn't play nicely with caching.

#46

aclight - December 9, 2007 - 03:47

@moshe: I agree with you that a non-cacheable filter has no chance of being installed on d.o. But at this point, I'm less concerned about whether this can be installed on d.o but more with getting to a point where we can call this ghop task completed so mwrochna can start working on another (hopefully Drupal) task.

Unless I'm missing something, there's really no clean way to allow the user to select whether or not to make the filter results cacheable. Drupal just doesn't have the necessary hooks, etc. to do this in an intuitive way.

Here's an idea--why don't we create two filters. Project issue links (cached) and Project issue links (not cached). An administrator would enable one, but not both, of the filters. The two filters would use almost the same code--only the 'may cache' op would be different. This way it's clear to the user if caching is happening, and it's easy to effectively turn caching on and off. It's not perfect, because ideally we would need only one filter to do this, but this is the best way I can think of to get what we want without going to great lengths to get there. What do others think?

#47

dww - December 9, 2007 - 03:50

@aclight: I guess that's reasonable. However, I'd hate to see a big function forked to change a single line in 1 case. Some care would need to go into how to share the code as much as possible, given the constraints core places on us with the somewhat whacky API we're running into. :(

#48

aclight - December 9, 2007 - 04:27
Status:patch (code needs review)» patch (code needs work)

@dww: There's no need for function forking. Here's an example of what the two functions would like. I haven't tested the filtering itself to make sure that it works as expected, but I think it should.

<?php
/**
* Implementation of hook_form_filter()
* @ingroup project_issue_filter
*/
function project_issue_filter($op, $delta = 0, $format = -1, $text = '') {
  switch (
$op) {
    case
'list':
      return array(
       
0 => t('Project Issue to link filter (not cached)'),
       
1 => t('Project issue to link filter (cached)'),
      );

    case
'no cache':
      switch (
$delta) {
        case
0:
          return
TRUE;
        case
1:
          return
FALSE;
      }

    case
'description':
      switch (
$delta) {
        case
0:
          return
t('Converts references to project issues (in the form of [#12345]) into links. Use this filter when using node access control modules or if access to project issues is not given to all users (both anonymous and authenticated).');
        case
1:
          return
t('Converts references to project issues (in the form of [#12345]) into links. Use this filter if no node access control modules are used on your site and if all users can access all project issues.');
      }

    case
'prepare':
      return
$text;

    case
'process':
     
$regex = '(?<!\w)(\[#(\d+)(#comment-(\d+))?\])(?![^<]*<\/a>|([^<]|(<(?!code>)))*<\/code>|([^<]|(<(?!pre>)))*<\/pre>|\w)';
     
$offset = 0;
      while(
preg_match('/'.$regex.'/', $text, $preg_matches, PREG_OFFSET_CAPTURE, $offset)) {
       
$offset++;
       
$match = $preg_matches[1];
       
$nid = $preg_matches[2][0];
       
$comment_id=$preg_matches[4][0];
       
$node = node_load($nid);
        if (
is_object($node) && node_access('view', $node)) {
         
$link = theme_project_issue_issue_link($node, $comment_id);
         
$text = substr_replace($text, $link, $match[1], strlen($match[0]));
         
$offset = max($offset, $match[1]+strlen($link));
        }
        else {
         
$offset = max($offset, $match[1]+strlen($match[0]));
        }
      }
      return
$text;
  }
}

/**
* Implementation of hook_requirements()
* @ingroup project_issue_filter
*/
function project_issue_requirements($phase) {
 
$requirements = array();
 
$input_formats = array();
  if (
$phase == 'runtime') {
    foreach (
filter_formats() as $format => $input_format) {
     
$filters = filter_list_format($format);
      if (isset(
$filters['project_issue/0']) && isset($filters['project_issue/1'])) {
       
// Put up an error when both Project issue link (cached and non-cached) are enabled.
         
$incorrect_formats['!input_format'] = l($input_format->name, "admin/settings/filters/$format");
         
$requirements[] = array(
           
'title' => 'Project Issue to link filter',
           
'value' => t('Filter conflicts were detected.'),
           
'description' => t('The Project issue to link (not cached) and Project issue to link (cached) filters are both enabled for the !input_format input format.  Only one Project issue to link filter should be enabled for any input format.', $incorrect_formats),
           
'severity' => REQUIREMENT_ERROR,
          );
        }
       
//error when HTML filter's weight is higher
        // @todo: look for position of both filter/0 and filter/1
        // @todo: may need to check position with respect to other filters such as codefilter
       
if (isset($filters['filter/0']) && $filters['project_issue/0']->weight <= $filters['filter/0']->weight) {
         
$description_names['%issuefilter'] = $filters['project_issue/0']->name;
         
$description_names['%htmlfilter']  = $filters['filter/0']->name;
         
$requirements[] = array(
           
'title' => 'Project Issue to link filter',
           
'value' => t('Some filter conflicts were detected.'),
           
'description' => t('%issuefilter should come after %htmlfilter to prevent loss of layout and highlighting.', $description_names)
               .
' '.l(t('Please rearange the filters.'), "admin/settings/filters/$format/order"),
           
'severity' => REQUIREMENT_ERROR,
          );
        }
      }
  }
  return
$requirements;
}
?>

#49

moshe weitzman - December 9, 2007 - 13:24

well, if we are going to tr to support both cacheable and non cacheable then i think we should do so dynamically. we can just add a submit handler (this is a form api feature) to the modules form which checks for node access and if found, calls a 'rebuild filter formats' function, whatever that function may be.

#50

aclight - December 9, 2007 - 16:35

@moshe: Though making the caching/non-caching status dynamic would be nice, I think the logic to do so will be to complicated and error prone. Quoting myself from comment #33:

So, I initially thought that we could just cache the filter results if anonymous users have access project issues permission. But then I thought that this will be a little too restrictive, because we really only need to not cache the filter results if the access to any project_issue node is more restrictive than access to any node where the filter could be used. So, for example, we'd be fine caching the filter results if anonymous users did not have access nodes privs. but authenticated users had both access nodes and access project issues privs.

Even if we could cleanly work the logic out with regards to when caching should be enabled, I'm not sure how we'd do what you mention above. There's not really a rebuild filter formats function--it's just a submit handler. So I think we'd have to call the form building function and then pass that off to the submit handler. Isn't that considered hackish?

As for adding a submit handler to the modules page, I'm not sure how to do that, so I can't really speak to how practical that suggestion is.

#51

dww - December 9, 2007 - 21:46

Adding the submit handler to the modules page is easy. I don't know about the rest of this, since I don't know the filter system that well and I haven't spent the time really groking this issue.

#52

mwrochna - December 10, 2007 - 20:18

The regex will work correctly if all code-escaping filters output a <code> or <pre> before the regex acts (tags like <php do so, when they have a lower weight). Trying to make the regex more general (ex. to permit some filters to have a higher weight) will just make it slower and help in few cases. I added an array ($low_filters) of code-escaping filters (filter/0,codefilter/0,filter/1), an error will be put when one of them has a higher weight. Should I add every appropriate filter to $low_filters?

ad #49-51: I tried to do this dynamically before, but as aclight said, there's no clean way to 'rebuild filter formats', we'd have to call the form building function and call the input_format's form submit handler, or change the database manually. I ended up putting a requirement_error redirecting to this form (in patch 6).

I think it's a good idea to give the user the choice (by 'forking' with $delta) - many admins prefer publishing all issue titles and statuses (still hiding the issue's content) to disabling cache.

AttachmentSize
project_issue-filter7.patch7.42 KB

#53

aclight - December 10, 2007 - 22:30

I think I prefer that we create two separate filters (one cached, one not cached) as I suggested in #48. But ultimately it's going to be dww or hunmonk that will have to make the call. I think there are arguments for both sides, but I think it just gets too complicated if we try to have the caching be determined dynamically by the code itself.

If you're in a hurry to get this closed, you might want to create two separate patches, one that implements #48 and one that implements what was suggested in #49 and #51. Unfortunately, I can't be of much help with regards to how to implement #49/#51, since I'm not very familiar with FAPI.

If you're not in so much of a hurry, then I guess you should just wait and see what dww and/or hunmonk think.

As for the regexp stuff, I think it's fine as is. I think it's fine to just require that the filter be weighted heavier than other code filters.

#54

hunmonk - December 11, 2007 - 16:21

my feeling is that we've entered territory that is out of scope for the original task, which was to write the filter. these other considerations are important in terms of getting this code properly committed to project*, but i feel that this is something that the project* maintainers should work out.

so what i'd like to propose is this: implement the no-cache filter in the drupalorg.module for now. this accomplishes three important things:

  1. it allows mwrochna to actually finish this task :)
  2. it allows the drupal community to immediately benefit from his work.
  3. it allows the project* maintainers to work out a more complete solution over the long haul, without anyone feeling like we're committing a hack. when we figure out a best implementation, we can move it from drupalorg.module to project_issue, or wherever it belongs.

i'd like dww to weigh in on this idea before we go forward with it. if he's ok with it, then i'll post some more details about how exactly to implement/test this functionality in drupalorg.module

#55

dww - December 11, 2007 - 17:00

I'm fine with splitting it. However, I think hunmonk means "implement the caching-version of the filter now", and let's worry about the no-cache configurations later. The drupal community can't benefit from a no-cache filter...

Thanks, all.

#56

hunmonk - December 11, 2007 - 17:08

right, sorry. caching filter it is.

@mwrochna: i'm going to let aclight fill you in on how to implement this in drupalorg.module, since he's more familiar w/ the