words being replaced in links...

Bartezz - December 8, 2008 - 13:07
Project: Wordfilter
Version: 6.x-1.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: closed
Description

Hi,

I've set up the Wordfilter module to replace certain words in the body text.
Yet within the body text there are hyperlinks, some of which also contain these words.
The module replaces these words as well, so the links are changed to non-existent URLs.

Checking the stand-alone option could fix this, but then not all words to be filtered are actually filtered,
for instance when a word ends a sentence and so has a period attached to it.

Would it be possible to change the module to filter the words in the body text but NOT when they are part of a link?

Regards

#1

jaydub - January 17, 2009 - 11:14

That's a tough one, actually. The current implementation uses preg_replace to do a global search and replace on the source text in one go, which is nice and fast. I'm not enough of a regex expert to work out whether it's possible to write one pattern that catches all words to filter except those that are part of a URL href. If anyone has a suggestion, I'd like to hear it.
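To make the problem concrete, here is a minimal sketch of a global, HTML-unaware search and replace. It is written in Python rather than the module's PHP, and the function name, the word pair and the URL are hypothetical; it illustrates the behaviour described above, not the module's actual code.

```python
import re

def naive_filter(text, replacements):
    """Replace every occurrence of each word, ignoring HTML structure."""
    for word, replacement in replacements.items():
        # The lambda keeps the replacement literal (no backslash escapes are interpreted).
        text = re.sub(re.escape(word), lambda m: replacement, text, flags=re.IGNORECASE)
    return text

# Hypothetical input: the filtered word also appears inside an href.
sample = 'Read about foo at <a href="http://www.example.com/foo/page">this page</a>.'
filtered = naive_filter(sample, {"foo": "bar"})
print(filtered)
```

The word is replaced in the sentence and in the href alike, so the link now points to a different, probably non-existent URL.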

#2

tignux - March 25, 2009 - 13:38

Subscribe

#3

AppleBag - April 7, 2009 - 21:27

I'd REALLY love to see this fixed as well. I use it to replace words people type with my own affiliate links, and for that it works great. However, when someone pastes a regular URL into the site, it breaks it. For example:

Someone types the word Norton and it'll replace it with my affiliate link to norton av (perfect), but if they add a link (or just paste a URL right into the post) like this: http://www.norton.com/free it'll think the word norton in the link needs replacing and screw the link up no matter what I do. Even setting the word as standalone doesn't help.

#4

jaydub - April 14, 2009 - 09:21
Version: 5.x-1.x-dev » 6.x-1.x-dev
Status: active » needs review

Ok I've tried to put something together for this. This is a patch off of the latest -dev snapshot of the 6.x branch (should be dated April 14th).

This is pretty complex and I'm not so sure it will work in all cases but please do try it out. You should make sure that wordfilter is set below other filters that transform text into HTML links as the regex used here is basic in that it only looks for and recognizes as a link something that looks like an HTML link complete with <a> tags. This means that if you have a filter that turns www.example.com into <a href="http://www.example.com">www.example.com</a> you should make sure that wordfilter comes after that filter.
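The attached patch is not reproduced in this thread, so the following is only a sketch of one common way to leave complete <a>...</a> links alone: match whole links as one alternative and hand them back unchanged. It is in Python rather than PHP, and the pattern and function name are assumptions, not the patch's code.

```python
import re

# Assumed, deliberately simple link pattern: an opening <a ...> tag up to the next </a>.
LINK = r"<a\b[^>]*>.*?</a>"

def filter_outside_links(text, word, replacement):
    """Replace `word` everywhere except inside complete <a>...</a> links."""
    pattern = re.compile("(" + LINK + ")|" + re.escape(word),
                         re.IGNORECASE | re.DOTALL)

    def swap(match):
        # Group 1 is set when a whole link matched: return it untouched.
        if match.group(1) is not None:
            return match.group(1)
        return replacement

    return pattern.sub(swap, text)

text = 'Visit www.example.com or <a href="http://www.example.com">www.example.com</a>'
print(filter_outside_links(text, "example", "sample"))
```

The bare www.example.com is still rewritten because nothing marks it as a link, which is why the filter that creates the <a> tags has to run first.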

Attachment                          Size
wordfilter.module.344287.patch      2.51 KB

#5

AppleBag - April 14, 2009 - 10:44

Jaydub, tyvm for this. I should have mentioned that I am using D5 (very sorry). Is there a chance of getting a similar patch for D5?

#6

jaydub - April 14, 2009 - 13:47

I'd like to see this tested and committed to the D6 branch first before porting to D5. As you can see, D6 usage is about 3:1 over D5.

#7

AppleBag - April 15, 2009 - 05:29

Ok, thanks. Hopefully someone will test it soon. :)

#8

Bartezz - April 15, 2009 - 09:18

Hey Jaydub,

Thanx a million for this. Unfortunately I'm on D5 as well so I can't test it either.
Will keep an eye on this thread tho!

Cheers

#9

jaydub - April 17, 2009 - 22:32
Status: needs review » needs work

I'm going to have to take another pass at this since I realized that HTML tags other than <a> are going to be a problem as well.

#10

tignux - May 7, 2009 - 18:22

A good patch for D5 would be appreciated.

Cheers

#11

tignux - May 27, 2009 - 13:19

I'm not a PHP developer, but I badly need a D5 version of this module without the problem with links.
I will be glad to contribute in any way (money included, of course) to get a working one.

Cheers
Andrea

#12

jaydub - June 5, 2009 - 09:10
Status: needs work » needs review

Can you guys bang on either of these two patches to d5 & d6 wordfilter? I'm looking for tests of content with any HTML tags in them. The regex attempts to bypass filtering on any text between an open tag < and close tag >. In theory that should make hrefs, src, class, id and any other tag attributes safe.
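As a rough illustration of the idea (not the patch itself, which is not shown in this thread), the sketch below skips everything between an opening < and the next >, so attribute values are never touched. It is in Python rather than PHP; the pattern, function name and sample markup are assumptions.

```python
import re

# Assumed tag pattern: anything from "<" up to the next ">".
TAG = r"<[^>]*>"

def filter_outside_tags(text, word, replacement):
    """Replace `word` only where it is not inside an HTML tag."""
    pattern = re.compile("(" + TAG + ")|" + re.escape(word), re.IGNORECASE)

    def swap(match):
        # A matched tag is returned as-is, attributes included.
        if match.group(1) is not None:
            return match.group(1)
        return replacement

    return pattern.sub(swap, text)

html = '<img src="/foo.png" class="foo"> foo and <a href="/foo">foo</a>'
print(filter_outside_tags(html, "foo", "bar"))
```

In this sketch the src, class and href values survive, while visible text, including the text between <a> and </a>, is still filtered. A literal > inside an attribute value would end the protected span early, so it is not a full HTML parser.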

Attachment                          Size
wordfilter.module.344287.d6.patch   1.65 KB
wordfilter.module.344287.d5.patch   1.64 KB

#13

muirgein - June 5, 2009 - 15:22

Tested in 6. No longer replaces text inside links.

However, general replacement is now broken. Any space before the replaced word is lost.

Example:
If "toMAYto|toMAHto", then "I like toMAYto on my sandwich." replaces to "I liketoMAHto on my sandwich."
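The thread does not show the patched code, so the cause here is a guess. One common way a replacement loses the preceding space is a pattern that consumes the leading delimiter while the replacement does not put it back. A Python sketch of that assumed mechanism, using the example above:

```python
import re

word, replacement = "toMAYto", "toMAHto"
text = "I like toMAYto on my sandwich."

# The pattern consumes the whitespace before the word as part of the match.
pattern = re.compile(r"(\s)" + re.escape(word))

# Buggy: the captured whitespace is dropped from the output.
buggy = pattern.sub(lambda m: replacement, text)

# Fixed: the captured whitespace is emitted again in front of the replacement.
fixed = pattern.sub(lambda m: m.group(1) + replacement, text)

print(buggy)
print(fixed)
```

Whatever the real cause was, the general remedy is the same: anything the pattern matches besides the word itself has to be written back into the result.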

#14

jaydub - June 6, 2009 - 18:22

Ok try this re-roll to address #13

Attachment                          Size
wordfilter.module.344287.d5.patch   1.67 KB
wordfilter.module.344287.d6.patch   1.68 KB

#15

muirgein - June 11, 2009 - 02:45

Tested in 6.

Initially received a parse error. There is a missing period before $2 on the last line added in the patch.

When the above error is corrected, the patch works as expected and text inside links is not filtered.

#16

jaydub - June 11, 2009 - 03:36

re-rolled patches based on #15. thanks for picking that up @muirgein

Attachment                          Size
wordfilter.module.344287.d6.patch   1.68 KB
wordfilter.module.344287.d5.patch   1.67 KB

#17

AppleBag - June 11, 2009 - 11:22

Thanks for working on this, Jaydub. I just tried the patch on the latest dev for D5 and it still seems to be causing an issue. I'll send you a message through your contact form with details on how to reproduce the issue.

#18

jaydub - June 14, 2009 - 04:12

@AppleBag I have responded to your email but for the sake of this thread, the filtering worked as advertised based on what you sent me in the message.

#19

jaydub - June 15, 2009 - 07:29

I've received a few questions outside the issue queue about the filtering of text in links as a result of the patch above.

To anyone who wants to avoid having any text in a link (or anything in an HTML tag) replaced it's best to follow this advice:

* If you will have links in your text that are not already turned into HTML links such as something like www.example.com or http://www.example.com it's a good idea to make use of Drupal's URL filter to turn those into <a> tags so the patch above (or the module once the patch is finalized and committed) will see the HTML tags and bypass the text from processing. The URL filter is in Drupal 6 core and is available as a contrib module in Drupal 5. This filter is used on Drupal.org so it's likely that you will see the above links which I typed in without the requisite HTML have been dynamically turned into HTML links.

* Arrange the Wordfilter filter so that it sits below (runs after) most or all other filters, so that any text that -could- be part of an HTML tag will already have been turned into a tag by the time the Wordfilter filter is called, and the text now inside the tag is safe. Examples are filters that turn some form of markup into HTML, such as Bbcode, Markdown, Inline, Insert View and similar modules. Basically, any filter that would not work as intended if a Wordfilter replacement altered a word or part of a word should run before the Wordfilter filter.
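The effect of filter order can be sketched with two toy filters in Python. Both are assumptions made for illustration: url_filter is a much simplified stand-in for Drupal's URL filter, and word_filter skips complete <a>...</a> links in the spirit of the advice above; neither is the real implementation.

```python
import re

# Toy stand-in for a URL filter: wraps bare http:// or www. addresses in <a> tags.
BARE_URL = re.compile(r"\b((?:https?://|www\.)[^\s<]+)")

def url_filter(text):
    def link(match):
        url = match.group(1)
        href = url if url.startswith("http") else "http://" + url
        return '<a href="%s">%s</a>' % (href, url)
    return BARE_URL.sub(link, text)

def word_filter(text, word, replacement):
    """Replace `word` except inside complete <a>...</a> links."""
    pattern = re.compile(r"(<a\b[^>]*>.*?</a>)|" + re.escape(word),
                         re.IGNORECASE | re.DOTALL)
    return pattern.sub(lambda m: m.group(1) or replacement, text)

sample = "Try norton today, see www.norton.com/free"

# Word filter first: the URL is rewritten before it ever becomes a link.
wrong = url_filter(word_filter(sample, "norton", "N-AV"))

# URL filter first: the link exists by the time the word filter runs.
right = word_filter(url_filter(sample), "norton", "N-AV")

print(wrong)
print(right)
```

Only the second ordering keeps the pasted address intact while still replacing the word in the running text.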

#20

tignux - June 16, 2009 - 10:47

jaydub, thank you so much for the patches

Will there be a way, in the future of course, to limit the filter only to links?

#21

jaydub - June 17, 2009 - 03:37

#20 @tignux I'd say that's unlikely. It's a lot less safe to filter items inside HTML tags. You could easily break an href or an image src tag or class and id attributes on HTML tags. All the time and effort put into this patch was precisely _because_ it's a good thing to try and avoid altering text inside an HTML tag since that can have unintended consequences.

Can I ask what sort of example you have in mind?

Wordfilter is not really meant to be something that rewrites your HTML for you. At its most elemental level, it's meant to be used for things like filtering out bad language or linking instances of a string to an HTML link.

#22

tignux - June 18, 2009 - 10:05

@jaydub, I'm really sorry, I worded that wrongly

My intent is to filter the text in all other HTML tags (B, STRONG, I, etc.), excluding links (the A tag)

I apologise for bothering you again

#23

jaydub - June 18, 2009 - 17:26

#22 @tignux how do you intend to filter the HTML tags you mentioned? If you just want to remove tags, that would be done by the base Drupal HTML filter.

#24

tignux - June 19, 2009 - 12:59

Let me try to explain my intent better with an example.

Word to filter: BOYS
Word to show: GUYS

If I have this code

The people over there look like they'd be good bad <b>BOYS</b>. If you think they are good BOYS, write us.
Watch them with our <a href="BOYS-webcam">BOYS webcam</a>

I want to obtain

The people over there look like they'd be good bad <b>GUYS</b>. If you think they are good GUYS, write us.
Watch them with our <a href="BOYS-webcam">BOYS webcam</a>

I hope it is clearer now
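For illustration only, the behaviour requested in this example can be sketched in Python by protecting two things: complete <a>...</a> elements (tags and link text) and the inside of every other tag. The function name and pattern are hypothetical and this is not part of the module or its patches.

```python
import re

# Protected spans: a whole <a>...</a> element, or any single tag "<...>".
PROTECTED = r"<a\b[^>]*>.*?</a>|<[^>]*>"

def filter_except_links(text, word, replacement):
    """Replace `word` in text and inside non-link elements, never in links or tags."""
    pattern = re.compile("(" + PROTECTED + ")|" + re.escape(word),
                         re.IGNORECASE | re.DOTALL)
    return pattern.sub(
        lambda m: m.group(1) if m.group(1) is not None else replacement, text)

source = ("The people over there look like they'd be good bad <b>BOYS</b>. "
          "If you think they are good BOYS, write us.\n"
          'Watch them with our <a href="BOYS-webcam">BOYS webcam</a>')

print(filter_except_links(source, "BOYS", "GUYS"))
```

On the sample above, the word inside <b> and in the plain sentence is replaced, while both the href and the link text are left as written.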

#25

Bartezz - June 19, 2009 - 15:16

I don't even wanna know what kind of website that is ;)
Anyways, I think your request is rather too specific to become part of the module.

#26

muirgein - August 28, 2009 - 02:12

Will you please add the tested 6.x patch to the repo so I can get a snapshot I don't have to keep re-patching? Thanks.

#27

jaydub - September 11, 2009 - 05:34
Status: needs review » fixed

#26 committed to CVS. sorry not done sooner.

#28

System Message - September 25, 2009 - 05:40
Status: fixed » closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 
