In the same vein as this thread (http://drupal.org/node/105078), I need to implement some sort of link checker for submitted links. I agree with the other threads that this should not go into this module and should be an external module, which is exactly what I'm doing.
The link checking, managing the states of checks, etc. are not a problem. What I'm curious about is the best way of determining, via code, which nodes have link fields and pulling a list of all the links a particular node has.
Could you point me in the right direction, please?
Comments
Comment #1
quicksketch
Sure! Thanks for taking up this cause (sorry for taking so long to respond).
The best way to find what fields a node has is to use the content_fields() function. This will return a list of all fields that a node currently uses. Then you can iterate through the list and find which fields are links.
Note: totally untested code. Hope this helps.
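The iteration described above can be sketched roughly as follows. This is a minimal Python illustration, not Drupal code: the `FIELDS` mapping is a hypothetical stand-in for whatever content_fields() returns, and the `type`, `url`, and node structure shown here are assumptions made for the example.

```python
# Hypothetical stand-in for a field registry: field name -> field definition.
# The exact shape of the real data is an assumption.
FIELDS = {
    "field_homepage": {"field_name": "field_homepage", "type": "link"},
    "field_references": {"field_name": "field_references", "type": "link"},
    "field_summary": {"field_name": "field_summary", "type": "text"},
}


def link_field_names(fields):
    """Return the names of all fields whose type is 'link', sorted by name."""
    return sorted(name for name, info in fields.items() if info.get("type") == "link")


def links_for_node(node, fields):
    """Collect (field_name, delta, url) for every non-empty link on a node.

    `node` is assumed to map each field name to a list of items, where the
    list position is the delta and each item is a dict with a 'url' key.
    """
    links = []
    for name in link_field_names(fields):
        for delta, item in enumerate(node.get(name, [])):
            url = item.get("url")
            if url:
                links.append((name, delta, url))
    return links


# Example node (hypothetical data).
node = {
    "nid": 12,
    "field_homepage": [{"url": "http://example.com"}],
    "field_references": [{"url": "http://drupal.org"}, {"url": ""}],
    "field_summary": [{"value": "not a link"}],
}
```

Calling `links_for_node(node, FIELDS)` on the example data yields one entry per non-empty link, each tagged with its field name and delta.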
Comment #2
jredding commented
Not a problem about the late response (at least you responded). Cool, I figured it was something straightforward; I had just never looked into it. Thanks for making my life really, really easy ;)
I'll post back here when I whip something up. It'll probably be a few weeks.
Comment #3
allie micka
Subscribing
Comment #4
chadchandler commented
Looks very exciting! Any progress? Or have the plans to create this module been canceled?
Comment #5
jredding commented
Yes, I'm working on it... I'll post a new project soon and then we'll figure out what the best course for the module is (i.e. always separate or worked into this module).
Give me one or two days to get it into its own project space.
Comment #6
jredding commented
http://drupal.org/project/link_checker
Give it some time before it shows up for download.
Comment #7
quicksketch
Looks good :)
You mention on the project page that you'd like to merge with link, but I'm still not sure this would be a good idea. Right now you're leveraging modr8 to handle listing of invalid links, which is a smart move to reduce the amount of code your module needs to provide. However, if this were part of link.module, I'm sure people wouldn't want to have to use modr8 just to see what links are dead. Then we get into adding a lot of functionality into link.module to remove the dependency, and then people want link click-through tracking because link.module now does more than just a CCK field... you can kind of see where this is going. Adding any features besides a CCK field means ultimately adding all the features of a complete links package, which is what I want to avoid.
I absolutely don't want to discourage the creation of a CCK-based links package however. I'd love if a solution were available so that people could stop posting the requests here :) If there's anything I can do to help facilitate your module short of implementing it directly in link, just let me know.
Comment #8
allie micka
One happy medium might be adding a link status to the link fields, defaulting it to "good" (or even the expected status code). The 3rd-party module would be responsible for maintaining the check-queue and updating the status as required.
This might permit the link module to expose link status via Views (thus deferring the how-to-handle-broken-links question to administrators), and any display formatter - link.module or otherwise - could handle display according to status.
Comment #9
jredding commented
eh.. whatcha talking willis?
modr8... there is no such dependency. The module will simply "unpublish" and flag the "moderate" switch in the DB all of which are part of Drupal core.
The only dependencies the module has are (1) link, (2) CCK (because of link) and (3) Drupal core.
I'd like to merge this in with your module because of 1 reason, table structure ;)
Right now I'm making a lot of assumptions in my code to find the correct tables to query in order to pull out the links. Once I have those tables, I query all the links and then put the resulting status back into my own table. The only thing I need is 2 fields: last checked and status code.
But in order to match this up with X number of tables and/or fields (depending on whether it's a single value or multiple values), I have to store nid, vid, delta and field_name. These 4 make up the primary key that links it back to the original link... which is just a pain in the butt.
So this is what I'd like in the link module: 2 new fields (last checked and status code). The link checking code and the administrative code (tables, etc.) can all be kept in a separate module.
So in short, I agree with you: let's not bloat the link module. But if we could get one or two new fields into the link type (only DB fields), then the link checker module could do its job much more effectively.
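The bookkeeping described in this comment, a separate table keyed on nid, vid, delta and field_name that stores only the last-checked time and status code, might look something like the sketch below. It uses Python with SQLite purely for illustration; the table name `link_status`, the column types, and the helper names are assumptions, not the module's actual schema.

```python
import sqlite3


def create_status_table(conn):
    """Create a status table keyed on the four values that identify one link."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS link_status (
            nid          INTEGER NOT NULL,
            vid          INTEGER NOT NULL,
            delta        INTEGER NOT NULL,
            field_name   TEXT    NOT NULL,
            last_checked INTEGER NOT NULL,  -- Unix timestamp of the last check
            status_code  INTEGER NOT NULL,  -- HTTP status from the last check
            PRIMARY KEY (nid, vid, delta, field_name)
        )
        """
    )


def record_check(conn, nid, vid, delta, field_name, status_code, checked_at):
    """Insert or overwrite the check result for a single link."""
    conn.execute(
        "INSERT OR REPLACE INTO link_status "
        "(nid, vid, delta, field_name, last_checked, status_code) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (nid, vid, delta, field_name, checked_at, status_code),
    )


def get_status(conn, nid, vid, delta, field_name):
    """Return (status_code, last_checked) for a link, or None if never checked."""
    return conn.execute(
        "SELECT status_code, last_checked FROM link_status "
        "WHERE nid = ? AND vid = ? AND delta = ? AND field_name = ?",
        (nid, vid, delta, field_name),
    ).fetchone()
```

The four-column primary key is what makes every lookup verbose; with status columns stored next to the link itself, no separate key would be needed at all.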
Comment #10
chadchandler commented
Oh, a way to check for duplicates seems to be pretty popular as well. Awesome to see this!
Comment #11
quicksketch
I'm not sure we can sneak by with just adding 2 database columns. Look at the links package, which contains 3 tables:
Nearly immediately after mentioning adding a column to link module, Prodigy jumped in to say that we could add another feature (or database column) for unique checking. This is the sort of feature creep that I'd like to avoid.
If we did add these columns, we'd probably denormalize these tables of course, meaning that the columns needed would be:
last checked
status code
fail_count (threshold unpublishing)
clicks (needed for click counting)
url_md5 (needed for duplicate checking)
last_click_time (recent clicks? not sure what links package really needs this one for)
So now adding a link field (with all the features people commonly request) takes 9 database columns: the above 6 plus the 3 it already uses (url, title, attributes). I don't know that I want to go down that path.
There's no need to make assumptions about table structure, just ask content.module what tables link uses:
As for modr8, yes you're right. It's not actually a dependency.
Comment #12
jredding commented
Points taken and I agree...
Thanks for the code on asking content which DB fields it uses. I was unaware of those functions and wrote some assumptions into my code. Good to know that I can rip out this piece of code and replace it with something cleaner.
OK next question. Currently I'm uniquely identifying each link by using the nid, vid, delta and the field_name. Yes I have a PK index consisting of 4 fields.
The field_name identifies the table in which the link resides, and the nid, vid and delta identify the exact link I'm referring to.
Do you know of a better way of uniquely identifying each link? From my vantage point that seems like too many points and seems too shaky of a foundation.
Thanks for all the help so far; it's invaluable!
Comment #13
quicksketch
Hmm, yeah that's a bit troublesome. I can't really think of a way to shorten that key down any. Because link data can be spread across multiple database tables, this is a bit of a tricky situation.
As a (slightly wild) attempt, maybe you could use an md5 hash to identify links, then give them individual lids (pretty much like links package). Then build out a second table that contains the nid, vid, field_name, and delta information. Interestingly, if you take this approach, you could essentially piggy-back off the functionality of the links package (which already has URL checking and click through tracking).
If you do use an md5 hash to build a table of unique links, be sure that you run the URL through link_cleanup_url() first, so that the http:// and such gets added and duplicate links are reduced.
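As a rough sketch of the hashing idea, assuming a simplified cleanup step: the Python below normalizes each URL, hashes it with md5, hands out one lid per unique hash, and keeps a second mapping from lid to every (nid, vid, field_name, delta) that uses it. The `cleanup_url` function is a hypothetical stand-in and does not reproduce what link_cleanup_url() actually does; `LinkIndex` and its attributes are likewise invented for illustration.

```python
import hashlib


def cleanup_url(url):
    """Simplified, hypothetical cleanup: trim whitespace and add http://
    when the URL has no scheme separator. Real cleanup would handle more
    cases (mailto:, relative paths, and so on)."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    return url


def url_hash(url):
    """md5 hex digest of the cleaned-up URL."""
    return hashlib.md5(cleanup_url(url).encode("utf-8")).hexdigest()


class LinkIndex:
    """In-memory model of two tables: unique links, and where each is used."""

    def __init__(self):
        self.lid_by_hash = {}  # url hash -> lid
        self.usages = {}       # lid -> set of (nid, vid, field_name, delta)

    def add(self, url, nid, vid, field_name, delta):
        """Register one use of a URL and return its lid."""
        digest = url_hash(url)
        # Assign the next sequential lid only if this hash is new.
        lid = self.lid_by_hash.setdefault(digest, len(self.lid_by_hash) + 1)
        self.usages.setdefault(lid, set()).add((nid, vid, field_name, delta))
        return lid
```

With this layout each unique URL only needs to be checked once, and the usage mapping still records which nodes to flag when a check fails.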
Comment #14
jredding commented
Hmm.. a hash wouldn't quite work, as it's entirely possible to have the same link posted under different content types (for example, if you create a link reference on a node type but continue to use wikipedia.org as a main reference). Even if I ran them through a hash and only stored unique links for the purpose of checking, I would still need to create a link back to the nid and vid so that I can reference the original node in order to flag it as having bad data (unpublished, moderation queue, etc.).
The hash would certainly make it run more efficiently if I were only checking a link once but it would make the backend even more messy.
I think the best way is just to leave it as is for now and continue to work this as an external module and flesh out plans over the long term.
Thanks for all the help; it's been greatly appreciated. I'm going to rework some code so that I'm making fewer assumptions (thanks for the code snippets above) and then roll another release. I'll also look a bit more closely to see if we could somehow merge with or piggyback off the links package module. My first run through it, though, made it seem messier to get the two to work together.
Thanks again!
Comment #15
wwwoliondorcom commented
Hi,
On a digg-like website, any idea how to make a list of the websites to which I send visitors?
http://drupal.org/node/295425
Thanks.