Checker Allowing Duplicates
Starminder - September 12, 2009 - 12:44
| Project: | Web Links |
| Version: | 6.x-2.x-dev |
| Component: | Contrib: Checker |
| Category: | bug report |
| Priority: | normal |
| Assigned: | NancyDru |
| Status: | closed |
Description
I specifically have the checker set to not allow duplicates, yet a bunch of duplicate links have been added. Using 2.x-dev from Sep 5. Thanks.

#1
OK, I see what is going on -
If link a exists: www.example.com and someone comes along and puts
link b as: http://www.example.com/
the checker allows this.
#2
Which would be more correct, with or without the trailing slash? It wouldn't be that hard to change it either way.
#3
Beauty is in the eye of the beholder? :) I prefer without the slash, and if that domain is already there without a slash, not to let another one come along and make a duplicate just by adding a slash. Or vice versa: if a link already exists with a slash at the end, don't allow the non-slash version to be added. In any case, to me www.example.com and www.example.com/ are duplicates of each other, so if one exists the other should not be allowed.
#4
I've just had to read up on Apache, and it looks like if the / is not there, it will add one under the covers. On the other hand, the Path module tells you not to use a trailing slash. Hmm, which to do?
#5
Don't allow the site with the trailing slash :) Just my opinion. However, there will be instances where a very specific link is required such as www.example.com/examples/new.html so it's a tough call. If there is something after the slash, would it then be allowable?
I'm running into dupes such as www.example.com exists, and someone else comes along and adds www.example.com/index.html. How annoying.
I am one by one removing all the duplicates my overzealous users added. It is slow and tedious :) I just noticed something else - if I edit a link and remove the trailing slash and the checker then recognizes it as a duplicate, I then hit the delete button, but the entry doesn't really get deleted.
If you need that as a new issue let me know.
Thanks!
#6
Yes, that should be a new issue. Also check if the "delete link" link works, please.
Robert and I discussed this last night. We don't mind so much disallowing trailing slashes, but we don't really want to do an update to check for existing ones (for the very reasons you are finding). And there is a limit, because of performance, to how much "butt-saving" we want to do; for example, checking the "www" on the front (www.example.com vs example.com). As for example.com/index.html, that's a different URL, IMHO.
At any rate, we want to get a new release out momentarily, so we'd like to wait until after that.
#7
The "delete link" link works, but the delete button on the edit page does not (opened a new issue).
Whatever you want to do on this one is fine with me, mainly just wanted to bring up that a function of the checker is to disallow duplicate submissions, but it does allow it if the url is different in any way. So, if there is a way to be more....stringent...with the checker, it would be a good thing.
Since some sites are in subfolders, www.examplesites.com/site1 and www.examplesites.com/site2 should be considered different. I also still think either www.example.com or www.example.com/ should be allowed, but if one exists the other should not be allowed to be added. Not sure how hard that is, but I think that would be a method for reducing duplicate entries.
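To make the rule above concrete, here is a rough Python sketch (not the module's actual code, which is PHP). The helper names `duplicate_key` and `is_duplicate` are made up for illustration, and the sketch assumes that a missing scheme means http and that the scheme itself is ignored when comparing.

```python
from urllib.parse import urlsplit


def duplicate_key(url):
    """Build a comparison key that ignores the scheme and a trailing slash."""
    # Assumption: a bare "www.example.com" is meant as "http://www.example.com".
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # Host names are case-insensitive; paths are kept case-sensitive.
    host = parts.netloc.lower()
    path = parts.path.rstrip("/")
    return (host, path, parts.query)


def is_duplicate(new_url, existing_urls):
    """Return True if new_url matches any existing URL under duplicate_key."""
    key = duplicate_key(new_url)
    return any(duplicate_key(existing) == key for existing in existing_urls)
```

With this key, www.example.com and http://www.example.com/ collide, while www.examplesites.com/site1 and www.examplesites.com/site2 stay distinct. www.example.com/index.html is still treated as a different URL.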
Thanks!!
#8
Well, this was interesting: http://www.alistapart.com/articles/slashforward/
There should be a trailing slash unless the URL ends with a file extension. Sigh, I guess this means a hook_update_N.
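A minimal sketch of that rule in Python, for illustration only (the module itself is PHP, and this is not the patch). The function name `add_trailing_slash` is hypothetical, and the sketch assumes every URL carries a scheme; a bare www.example.com would be parsed as a path and its ".com" mistaken for a file extension.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url):
    """Append a trailing slash unless the last path segment has an extension."""
    parts = urlsplit(url)
    # An empty path (http://www.example.com) is treated as the root path.
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    has_extension = posixpath.splitext(last_segment)[1] != ""
    if not path.endswith("/") and not has_extension:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Under these assumptions http://www.example.com becomes http://www.example.com/, while http://www.example.com/examples/new.html is left alone.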
#10
As I said, I'd like to hold off on this until we get the new release out, but here's a patch for you to test.
#11
BTW, there is still some room for duplicates. http://www.example.com and http://www.example.com:80 will be "different." There is also a hook_update_N that will put the trailing slash into the latest revision, if appropriate. The update itself will not check for duplicates.
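For anyone who wants to close the port gap as well, one possible approach is to drop the port when it is the default for the scheme. This is only a Python sketch and is not part of the patch: `strip_default_port` and `DEFAULT_PORTS` are hypothetical names, and the sketch discards any user:password@ part and does not handle IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only http and https are of interest here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url):
    """Remove an explicit port that matches the scheme's default port."""
    parts = urlsplit(url)
    # hostname is already lowercased and excludes the port and userinfo.
    host = parts.hostname or ""
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

With this, http://www.example.com:80 and http://www.example.com compare equal, while a non-default port such as :8080 is kept.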
#12
Committed to 6.x-2.x-dev.
#13
Included in 6.x-2.3.
#14
How about long URLs? For some reason the link checker fails to detect a duplicate of an example link like: http://www.lowes.com/lowes/lkn?action=categorySelect&Ne=4294967294&categ... . I can keep posting this link over and over without errors.