Download: drupal.org/project/honeytrap
Summary
The Honeytrap module allows site owners and system administrators to monitor web crawlers that do not follow the rules set out in the robot.txt file or via the RobotsText module or similar method and as a result put an unnecessarily high load on servers.
This is especially important for large, very high traffic and/or high profile sites where the activity of these non-compliant crawlers can bring servers to their knees. If these crawlers are not blocked or slowed down quickly enough then these crawlers can result in servers being knocked completely offline.
Please note that the Honeytrap module does not directly block or slow down offending IP addresses itself; it only logs and reports them, leaving you in full control of how you want to deal with them.
WARNING: When set up correctly with a firewall or similar this module will slow down and block traffic to your website. This potentially could include yourself, website owners, maintainers and legitimate visitors to your website. If you use this module then you do so at your own risk.
Requirements
- Views 2 module
- A configurable firewall (ideally)
- A running Drupal cron task (ideally)
- RobotsTxt module or access to your robots.txt file
- This module will not currently work on a Windows environment because it assumes that the '/' character separates path components.
More information
For much more detailed help and information see the README.html file in the module's docs folder or visit http://www.moofreechocolates.com/mikey-bunny/honeytrap.
Comment | File | Size | Author |
---|---|---|---|
#43 | honeytrap-1089490-43.patch | 8.4 KB | pillarsdotnet |
#43 | honeytrap-coder-review.png | 17.03 KB | pillarsdotnet |
#45 | honeytrap-1089490-45.patch | 21.52 KB | pillarsdotnet |
#25 | honeytrap-coder-review.png | 63.86 KB | pillarsdotnet |
#21 | honeytrap-suggestions.patch | 3.77 KB | pillarsdotnet |
Comments
Comment #1
Mikey Bunny CreditAttribution: Mikey Bunny commentedPlease review and provide me with any feedback. Thanks.
Comment #2
mermentau CreditAttribution: mermentau commentedYou forgot your sandbox link.
Comment #3
Mikey Bunny CreditAttribution: Mikey Bunny commentedThanks, I knew that I'd miss something. The Honeytrap sandbox link is:
http://drupal.org/sandbox/MikeyBunny/1087828
Comment #4
Mikey Bunny CreditAttribution: Mikey Bunny commentedAfter talking to the local OPs guys and another developer I am making some minor changes to the module in order to make its use slightly simpler.
Comment #5
mermentau CreditAttribution: mermentau commentedFrom what I'm seeing the Assigned too: would be the name of the one taking on the review of your project. Changed it for you.
Comment #6
Jackinloadup CreditAttribution: Jackinloadup commentedsubscribing. Looks like an interesting module.
Comment #7
Mikey Bunny CreditAttribution: Mikey Bunny commentedReady for review. I'll be releasing a firewall module which will also work with Honeytrap to provide a simple firewall solutions after this.
Comment #8
clemens.tolboomThe forms directory is not really drupal.
You can user either honeytrap.pages.inc or honeytrap.admin.inc for this depending on the purpose of the pages / forms.
See ie http://api.drupal.org/api/drupal/modules--node--node.pages.inc/6 and http://api.drupal.org/api/drupal/modules--node--node.admin.inc/6
Comment #9
Mikey Bunny CreditAttribution: Mikey Bunny commentedI have now moved all of the form functions in to honeytrap.admin.inc and tested that they still work. If you already have the Honeytrap module installed you will need to clear your menu's cache once you get the update.
Comment #11
Mikey Bunny CreditAttribution: Mikey Bunny commentedI will be releasing another module shortly which Honeytrap will optionally integrate with to provide firewall functionality for those that do not have access to their firewall settings.
Comment #12
BerdirSome hints...
- You should not add a LICENSE.txt file, that will be added automatically
- Never seen an @ingroup tag directly on the @file docblock, not sure if that is wrong but I doubt it gets parsed. Also, it's not really necessary to do that for your install hooks, this is already quite obvious as they are in the .install file :)
- When adding @see tags, also add (), e.g. "@see honeytrap_uninstall()".
- You forgot to commit honeytags.admin.inc
- "$form_path = drupal_get_path('module', 'honeytrap') . '/forms';" that line seems unecessary now
- Haven't seen your admin settings callback as it is missing, but when you save all settings in a single variable, then you can't use system_settings_form() I guess and will have to deal with saving them yourself. Not really wrong, just uncommon...
- No reason to make the hook_cron() implementation directly accessible through the menu system?
- It's not necessary to prefix your page callbacks with _
- The text after @param $variable should be on the next line, intended by two spaces.
- You should use l() in your footer hook,you can pass in additional attributes, see http://api.drupal.org/api/drupal/includes--common.inc/function/l/7
Comment #13
Mikey Bunny CreditAttribution: Mikey Bunny commentedI've been battling with git for the past week to try and recover the contents of the missing file. I'll add and commit the updates just as soon as I can work out how to recover everything.
Comment #14
Mikey Bunny CreditAttribution: Mikey Bunny commentedThank's Berdir. I have updated the code with your recommendations and added the missing file. I have also fixed an issue with the Options field in the view and have made some minor documentation updates regarding the cron.
Comment #15
pillarsdotnet CreditAttribution: pillarsdotnet commentedPlease:
admin/modules
to enable both "Coder" and "Coder Review" modulesadmin/config/development/coder/review
.I have done this for you once. The results are attached.
Additional problems not covered by Coder module:
I'm impressed. There was very little I could pick at.
Leaving this at "needs review" because someone may consider it RTBC-worthy despite the minor errors found.
Comment #16
Mikey Bunny CreditAttribution: Mikey Bunny commentedThank you pillarsdotnet. I'm not sure what to do as http://drupal.org/documentation/install/modules-themes/modules-7 refers to Drupal 7 installations and this is currently only a Drupal 6 module. I will spend time making any necessary changes to make a D7 version of this module one it is approved. I'd appreciate any feedback that you may have regarding this.
Thanks
Mike
Comment #17
pillarsdotnet CreditAttribution: pillarsdotnet commentedSorry; the proper link for d6 installation instructions is http://drupal.org/documentation/install/modules-themes/modules-5-6. I was reviewing several d7 modules and I must have gotten yours mixed in with them.
(looking...)
Well, most of the coder review notices apply whether you're running D6 or D7. Do your best to clean things up and I'll take another look.
Here are corrected instructions for d6 coder review:
admin/build/modules
to enable the "Coder" module.coder
to run Code review.Comment #18
Mikey Bunny CreditAttribution: Mikey Bunny commentedThanks again pillarsdotnet. I've ran coder and made the corrections. There is still one outstanding issue on several lines I get the following:
@see references should be separated by "," followed by a single space and with no trailing punctuation
I've tried everything that I can think of but can't get this to pass. I've looked at the Drupal doxygen standards for @see at http://drupal.org/node/1354 but still can't see that I've done anything different from the examples there.
Any advice on how to fix the @see problem would be greatly appreciated.
Thanks
MB
Comment #19
mermentau CreditAttribution: mermentau commentedI had the same problem and think it is a bug. If you comply with the coding standards page I wouldn't worry.
Comment #20
pillarsdotnet CreditAttribution: pillarsdotnet commentedYour honeytrap.admin.inc file has DOS line-endings. Please convert to Unix line-endings.
In honeytrap.module, I think
@sa
should be changed to@see
, unless@sa
means something I'm not familiar with.In honeytrap.views.default.inc on line 25, you have a block comment that should be changed to an inline comment. Additionally, it should probably be moved to its own line to avoid running over 80 characters line length.
I'm trying to install and test now.
Comment #21
pillarsdotnet CreditAttribution: pillarsdotnet commentedI got some undefined variable warnings for $cron_url.
Also, you might want to note either in the module documentation or in a hook_requirements() implementation that this module will not work on Windows, because it assumes that the '/' character separates path components.
Here are some additional suggested changes.
Comment #22
pillarsdotnet CreditAttribution: pillarsdotnet commentedChanging the DOS line endings to Unix line endings resolves the bogus @see warnings from coder.
Comment #23
pillarsdotnet CreditAttribution: pillarsdotnet commented(looking...) Actually, my suggested change to honeytrap.admin.inc is wrong. It should make the same check as _honeytrap_get_list_filename() before checking with is_writable(). But I see you've already begun to make some changes in the repository, so I'll wait for you to finish.
Comment #24
Mikey Bunny CreditAttribution: Mikey Bunny commentedThanks again pillarsdotnet. I have updated the code as you suggested. I'm so glad that you spotted the DOS line endings; I usually work on Linux but I edited some on the code on a windows machine and the coder error that it caused was driving me mad.
Re you comment: "In honeytrap.views.default.inc on line 25, you have a block comment that should be changed to an inline comment. Additionally, it should probably be moved to its own line to avoid running over 80 characters line length."
The entry that you refer to is generated by a views 2 export. Because this code is automatically generated anyway I will remove the comment.
I've also added a hook for the robotstxt module which will automatically add the required "Disallow" entries if the robotstxt module is enabled. Finally I ran coder again after all of my code changes.
I look forwards to receiving further feedback.
Comment #25
pillarsdotnet CreditAttribution: pillarsdotnet commentedIf I were you, I'd open a bug report against views 2.
Reviewing now... I must say that I like the idea of your module very much. Once it's approved, I'll probably submit a d7-upgrade patch.
Turned on Internationalization checks and found more coder issues. See attached. Also:
You need some more description here. I'm testing http://example.com/d6/index.php?q=admin/settings/honeytrap and trying to get my own IP address added to the blacklist. I've tried inserting the following into the Disallowed paths block, to no success:
Your users shouldn't have to resort to trial-and-error to figure out what values are appropriate.
You need some more description here. At a minimum, you should explain:
"/"
character.As an aside, you might want to consider asking for three complete paths (black-list path, white-list path, naughty-list path) rather than three path parts (file path, prefix, suffix). This would provide additional flexibility at no additional cost.
This description is confusing because you state one of the "either/or" options but not the other. Also, if there is a separate URL for invoking honeytrap cron, you should state it here.
There should be a
'/'
between$honeytrap_settings['list_files']['path']
and$honeytrap_settings['list_files']['prefix']
.Comment #26
Mikey Bunny CreditAttribution: Mikey Bunny commentedThank you again pillarsdotnet for your very concise and helpful instructions and advice.
I have now enabled internationalization on coder and made the required corrections. I have simplified the settings page and added more descriptions and help text. I also changed the list files to be three separate fields as you suggested as I agree that the interface is much simpler that way. Finally after testing everything I ran coder again to ensure that the code is still ship-shape and bristol fashion.
When you get time, please let me know what you think this time around. Cheers.
Comment #27
Mikey Bunny CreditAttribution: Mikey Bunny commentedComment #28
pillarsdotnet CreditAttribution: pillarsdotnet commentedt()
function should be a single literal text string with no concatenation, variables, or operators. In some places, you have concatenated several strings together, probably in an effort to keep your line length down. This is incompatible with the way the Drupal localization server will process your files. For instance, inhoneytrap.admin.inc
lines 44-48, you have:One way to simultaneously keep line length down and satisfy multilingual requirements is as follows:
See the Localization API and Dynamic strings with placeholders pages for details.
admin/settings/honeytrap
page, I received the following notices:View lists
tab, I received the following notices:Comment #29
Mikey Bunny CreditAttribution: Mikey Bunny commentedThanks again. I have changed the t() functions and switched on PHP notices. I then fixed the code that was causing the first notice. However, I'm not getting the notices on the view list page. Please can you let me know the steps to get this? Is it as simple as visiting the list?
Thanks
Comment #30
pillarsdotnet CreditAttribution: pillarsdotnet commentedOkay, here's what I did:
admin/settings/honeytrap
/admin
/tmp/black
/tmp/naughty
(Investigating the views notice...)
Ah. This is a bug in Views 3 that has since been fixed. See #1062506: Misnamed parameter for creation of broken handlers.
(Pulling latest views from git repository and trying again...)
Still getting:
(Investigating...)
Another views bug. Filed patch at #1174130-7: Strict warning: Creating default object from empty value in views_many_to_one_helper->ensure_my_table()
Okay, no PHP notices now. Let's see if I triggered the blacklist...
(running cron manually...)
Got another notice:
Yup. That
return $retval
should bereturn TRUE
or removed altogether.Let's see if my IP showed up in a list...
Nope. Why not?
(Looking at code some more...)
Since your list names are exposed in user-visible admin forms, you ought to make them translatable. Right now they're stored in HONEY_TRAP_LISTS which is not translatable. If you instead defined them in a
hook_init()
implementation, you could wrap the static text int()
.(looking some more...)
Now I understand! The Disallowed paths block must be filled with paths that do not otherwise exist in the menu system. So instead of
/admin
I should have supplied and then visitied/admin/foo
.(testing...)
That worked, but even a rogue robot isn't going to visit
/admin/foo
unless I supply a link that points there.Okay, so changes needed for final review:
honeytrap_cron()
, remove thereturn $retval
line ashook_cron()
doesn't need to return anything.Some nice-to-have items on my personal wishlist, not at all required for project approval:
hook_block()
implementation that contains invisible links with randomly-generated link text that lead to the Disallowed paths. This should then be added to the front page header region.NOTE TO SELF: ping mlncn on IRC when this is RTBC
Comment #31
sunThese should be separate variables normally.
This needs to happen during form validation.
In general, these dynamically defined callbacks are risky. The form validation also has to verify that the entered path does not clash with an existing system path (which likely involves checking the router directly).
This second router item can be removed. The callback of the first definition already gets any additional arguments.
Watchdog category names are normally all-lowercase.
Also, a module should normally use only one category name, if possible.
A page callback is not supposed to print/output anything directly. The output has to be returned.
Any reason for why this is not a variable?
The page callback should simply return MENU_NOT_FOUND instead.
I'd recommend to move the Views hook implementations into a separate honeytrap.views.inc file, so all of that code is not loaded on every single request (improves performance).
Usage of switch statements in form submit handlers is discouraged since Drupal 6. Instead, each form button is able to specify a dedicated form submit handler, if necessary:
Comment #32
pillarsdotnet CreditAttribution: pillarsdotnet commentedI agree with most of @sun's review, but:
For the record, that's a matter of personal preference and not part of coding standards.
Comment #33
Mikey Bunny CreditAttribution: Mikey Bunny commentedThank you very much for all you help and feedback pillarsdotnet and sun. I have addressed most of these issues but have a few questions about the few remaining ones below:
both: The variable array is now built in to the structure of the module and the way that it uses it. As changing it will be very disruptive to pretty much all functionality in the module and require extensive testing in order to ensure that everything still works as intended. Can we live with this as it is?
pillarsdotnet: This can already be done by defining your own theme hook for "honeytrap_list_item". Did you have something else in mind?
sun: I've removed these callbacks now as they are no longer required. However, please could you let me know what is the approved way of rendering the output returned from a callback function while ensuring that the content is not marked-up in any way with headers/footer etc?
sun: I already tried this. If I move them in to the .inc files the view will stop working with the following errors:
Any suggestions on how to resolve this other than keeping the functions in the .module file?
Comment #34
pillarsdotnet CreditAttribution: pillarsdotnet commentedThat works. Docs/example would be nice. Will review the rest later today.
Comment #35
Mikey Bunny CreditAttribution: Mikey Bunny commentedThanks pillarsdotnet. Please could you elaborate on exactly what you would like to see in Docs/example so that I get it right? Do you mean add an example of how to define a theme for this in the existing documentation? It does already mention that you will need to define this theme to change it I'm not sure if this is enought?
Comment #36
pillarsdotnet CreditAttribution: pillarsdotnet commentedThe point is that without any links to the honey-trapped pages, a web crawler that ignores robots.txt would not find them anyway.
So your docs should explain this, and explain how to create links that are visible to a web crawler but invisible (or at least obviously undesirable) to a real human.
Ideally, the links should be inserted automatically near the top of the front (home) page.
Comment #37
Mikey Bunny CreditAttribution: Mikey Bunny commentedThe links are already automatically inserted in to every page - see honeytrap_footer(). I will look at making this clearer in the documentation. This is already stated at the top of the settings page as follows:
I hope that this helps?
Comment #38
pillarsdotnet CreditAttribution: pillarsdotnet commentedYup. Will do a more thorough review later today.
Comment #39
Mikey Bunny CreditAttribution: Mikey Bunny commentedComment #40
Mikey Bunny CreditAttribution: Mikey Bunny commentedApplication priority changes as per http://drupal.org/node/894256 guidelines.
Comment #40.0
Mikey Bunny CreditAttribution: Mikey Bunny commentedUpdated description and sanbox url.
Comment #41
Mikey Bunny CreditAttribution: Mikey Bunny commentedApplication priority changes as per http://drupal.org/node/894256 guidelines.
Comment #42
Mikey Bunny CreditAttribution: Mikey Bunny commented...and still the tortuous wait for someone to review modules goes on. It has got to the point now after waiting for over 5 month where I am helping to review 3 other modules now myself in order to try and help get the queue down.
I have several other very useful modules that I would also like to contribute. Does anyone now if I submit more modules to the review queue and then finally one of them gets reviewed will all the other modules have to wait for reviewing after that as well?
Unfortunately I fear that I may just be talking to myself. I know, how about an infuriatingly slow review process that drives the applicant insane? It could be an impossible challenge a bit like trying to get in to the special forces. Now there's an idea...
MB
Comment #43
pillarsdotnet CreditAttribution: pillarsdotnet commentedHey; we're all volunteers here. I didn't even know the module review queue existed until just recently.
Hooray! Thanks for helping!
Suggested changes recommended by @sun, coder review, and myself attached and annotated as follows:
EDIT: Attachment is wrong. See #45.
<br />
instead of<br>
:if
blocks should always use curly-braces:watchdog()
argument:l()
function so that your links work when Drupal is in a subdirectory or when clean urls are turned off:Other modules that had
'operator' => 'or',
also had'exposed' => FALSE,
I'm not a views expert, but this change makes the error go away:
Comment #44
pillarsdotnet CreditAttribution: pillarsdotnet commentedPlease note that most of my suggested changes are just that; suggestions.
The problems that are blocking RTBC are:
Fix those bits, and once your module is approved I'll move my other suggestions to a patch in your module issue queue.
Comment #45
pillarsdotnet CreditAttribution: pillarsdotnet commentedBah; some of my suggestions didn't make it to the attachment. Re-attaching.
Comment #46
Mikey Bunny CreditAttribution: Mikey Bunny commentedHi pillars, thanks again for you exhaustive feedback. I have applied the changes and will commit them later on today once I have an environment to test the code and ensure no additional issues have crept in.
Unfortunately I am on a windows machine for the rest of today with a CentOS VM running by web services. My VM killed itself this morning due to lack of disk space on the host drive and I'm struggling to rescue it at present. It is insisting on more disk space for the repair but as the VM automatically gobbled up all the disk space in the first place so it's the chicken and egg scenario. I may just end up throwing in the towel and creating a new VM but that will take a few hours - which I have already wasted trying to recover it.
Oh, one comment on your comment about my comment:
The above sentence already ends with a period on the second line "...a view of the Honeytrap table(s) to be created.". It's probably a little confusing as it spans 2 lines to stay within the width restriction for comments.
You definitely deserve a mention if this module ever makes it live.
Thanks again
MB
Comment #47
pillarsdotnet CreditAttribution: pillarsdotnet commentedSorry; that's why it's always good to have a second pair of eyes look at changes before committing them.
I'm just looking forward to porting it to D7 so I can actually use it.
Comment #48
Mikey Bunny CreditAttribution: Mikey Bunny commentedThe D7 help is great news. The latest D7s won't install for me at present - there's an outstanding issue about it somewhere.
If your going to find this module useful I think that you may love my next one. I know that is it something driving sys admins and developers on large scale Drupal projects mad everywhere that I have worked at so far.
MB
Comment #49
Mikey Bunny CreditAttribution: Mikey Bunny commentedApologies for the extended delay, my machine trashed itself when it ran out of disk space and then I got very ill for a few weeks. With the exception of the update below I have made the suggested changes and updates with a few minor modification.
This code is generated by a views export. If it is modified to use the l() function then the view's "[ip]" token replacement will not work as the l() function will replace the square brackets with their html encoded entities.
Comment #50
pillarsdotnet CreditAttribution: pillarsdotnet commentedFile a bug against views; the export is wrong. At the very least, you should prefix the path with
base_path()
. If you don't, and someone has Drupal installed in a subdirectory of the web root, all your links will be broken.I'll test the rest soon.
Comment #51
pillarsdotnet CreditAttribution: pillarsdotnet commentedAccording to the guidelines on Process redesign, we probably should have approved this long ago.
Comment #52
gregglesThe installation instructions could use some more text about going to admin > setttings > honeytrap.
Module looks great to me as well. I was concerned about XSS via the browser user agent or IP or comment, but your views handlers are filtering the data. Well done!
Thanks for your contribution, Mikey Bunny! Welcome to the community of project contributors on drupal.org.
I've granted you the git vetted user role which will let you promote this to a full project and also create new projects as either sandbox or "full" projects depending on which you feel is best.
Thanks, also, for your patience with the review process. Anyone is welcome to participate in the review process. Please consider reviewing other projects that are pending review. I encourage you to learn more about that process and join the group of reviewers.
Comment #53
Mikey Bunny CreditAttribution: Mikey Bunny commentedThanks greggles that's excellent news and thank you everyone that helped, especially you pillarsdotnet. I will cut an alpha release just as soon as I work out how to do it.
Comment #53.0
Mikey Bunny CreditAttribution: Mikey Bunny commentedCorrected description formatting issues.
Comment #54.0
(not verified) CreditAttribution: commentedUpdated download location with full project url.